Package tech.tablesaw.columns.strings
Interface StringMapFunctions
All Known Implementing Classes: AbstractStringColumn, StringColumn, TextColumn
String utility functions. Each function takes one or more String columns as input and produces
another Column as output. The resulting column need not be a string column.
This code was developed as part of Apache Commons Text.
-
Method Summary
Modifier and Type, Method, Description
default StringColumn abbreviate(int maxWidth)
  Abbreviates a String using ellipses.
default StringColumn capitalize()
  Capitalizes each String, changing the first character of each to title case as per Character.toTitleCase(int), as if in a sentence.
default StringColumn commonPrefix(Column<String> column2)
default StringColumn commonSuffix(Column<String> column2)
default StringColumn concatenate(Object... stringsToAppend)
  Return a copy of this column with the given string appended to each element.
default StringColumn concatenate(Column<?>... stringColumns)
  Return a copy of this column with the corresponding value of each column argument appended to each element. getString is used to ensure that the values returned by the arguments are strings.
default DoubleColumn countTokens(String separator)
default DoubleColumn distance(Column<String> column2)
  Returns a column containing the Levenshtein distance between the two given string columns.
default StringColumn format(String formatString)
default StringColumn join(String separator, Column<?>... columns)
  Return a copy of this column with the given string appended.
default DoubleColumn length()
  Returns a column containing the character length of each string in this column. The returned column is the same size as the original.
default StringColumn lowerCase()
default StringColumn padEnd(int minLength, char padChar)
default StringColumn padStart(int minLength, char padChar)
default DoubleColumn parseDouble()
  Returns a DoubleColumn containing all the values of this string column as doubles, assuming all the values are stringified doubles in the first place.
default FloatColumn parseFloat()
  Returns a FloatColumn containing all the values of this string column as floats, assuming all the values are stringified floats in the first place.
default IntColumn parseInt()
  Returns an IntColumn containing all the values of this string column as integers, assuming all the values are stringified ints in the first place.
default StringColumn repeat(int times)
  Repeats each of the column's values elementwise, concatenating the results into a new StringColumn.
default StringColumn replaceAll(String[] regexArray, String replacement)
  Creates a new column, replacing each string in this column with a new string formed by replacing any substring that matches the regex.
default StringColumn replaceAll(String regex, String replacement)
default StringColumn replaceFirst(String regex, String replacement)
default StringColumn substring(int start)
  Returns a column containing the substrings from start to the end of the input.
default StringColumn substring(int start, int end)
default StringColumn tokenizeAndRemoveDuplicates(String separator)
default StringColumn tokenizeAndSort()
  Splits on whitespace and returns the lexicographically sorted result.
default StringColumn tokenizeAndSort(String separator)
default StringColumn tokens(String separator)
  Returns a column of arbitrary size containing each token in this column, where a token is defined using the given separator.
default StringColumn trim()
default StringColumn uniqueTokens(String separator)
  Returns a column of arbitrary size containing each unique token in this column, where a token is defined using the given separator, and uniqueness is calculated across the entire column.
default StringColumn upperCase()
Methods inherited from interface tech.tablesaw.columns.Column
allMatch, anyMatch, append, append, append, appendCell, appendCell, appendMissing, appendObj, asBytes, asList, asObjectArray, asSet, asStringColumn, byteSize, clear, columnWidth, contains, copy, count, count, countMissing, countUnique, emptyCopy, emptyCopy, equals, filter, first, get, getString, getUnformattedString, indexOf, inRange, interpolate, isEmpty, isMissing, isMissing, isNotMissing, lag, last, lead, map, map, mapInto, max, max, min, min, name, noneMatch, parser, print, reduce, reduce, removeMissing, rolling, rowComparator, sampleN, sampleX, set, set, set, set, set, set, setMissing, setMissingTo, setName, setParser, size, sortAscending, sortDescending, sorted, subset, summary, title, type, unique, valueHash, where
Methods inherited from interface java.util.Comparator
compare, equals, reversed, thenComparing, thenComparing, thenComparing, thenComparingDouble, thenComparingInt, thenComparingLong
Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator
-
Method Details
-
upperCase
-
lowerCase
-
capitalize
Capitalizes each String, changing the first character of each to title case as per Character.toTitleCase(int), as if in a sentence. No other characters are changed.
capitalize(null) = null
capitalize("") = ""
capitalize("cat") = "Cat"
capitalize("cAt") = "CAt"
capitalize("'cat'") = "'cat'"
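The per-element rule listed above can be sketched in plain Java without the library. This is an illustration of the documented behavior only, not Tablesaw's own code; the class name CapitalizeSketch and its helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch of the documented rule; not the library implementation.
class CapitalizeSketch {

    // Title-cases the first code point and leaves every other character unchanged.
    static String capitalize(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        int first = s.codePointAt(0);
        int titled = Character.toTitleCase(first);
        if (first == titled) {
            return s; // already title case, or not a cased letter (for example a quote)
        }
        return new StringBuilder(s.length())
                .appendCodePoint(titled)
                .append(s, Character.charCount(first), s.length())
                .toString();
    }

    public static void main(String[] args) {
        // A plain list stands in for a string column here.
        List<String> column = List.of("cat", "cAt", "'cat'");
        List<String> result = new ArrayList<>();
        for (String value : column) {
            result.add(capitalize(value));
        }
        System.out.println(result); // [Cat, CAt, 'cat']
    }
}
```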
-
repeat
Repeats each of the column's values elementwise, concatenating the results into a new StringColumn.
repeat("", 2) = ""
repeat("cat", 3) = "catcatcat"
- Parameters:
times
- the number of repetitions desired
- Returns:
- the new StringColumn
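As a rough illustration of what "elementwise" means here, the following plain-Java sketch repeats every value of a list that stands in for the column. The class RepeatSketch is hypothetical, and String.repeat (Java 11+) is used as a stand-in for whatever the library does internally.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a List<String> plays the role of the string column.
class RepeatSketch {

    // Returns a new list of the same size in which each value is repeated `times` times.
    static List<String> repeat(List<String> column, int times) {
        return column.stream()
                .map(value -> value.repeat(times))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = repeat(List.of("", "cat"), 3);
        System.out.println(result.get(1)); // catcatcat
        System.out.println(result.size()); // 2, same size as the input
    }
}
```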
-
trim
-
replaceAll
-
replaceFirst
-
substring
-
substring
Returns a column containing the substrings from start to the end of the input.
- Throws:
StringIndexOutOfBoundsException
- if any string in the column is shorter than start
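The failure mode described above follows directly from java.lang.String.substring, which the sketch below applies to each value. SubstringSketch is a hypothetical name, and a plain list stands in for the column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-element behavior using String.substring(int).
class SubstringSketch {

    // Applies substring(start) to each value; a value shorter than `start`
    // makes String.substring throw StringIndexOutOfBoundsException.
    static List<String> substring(List<String> column, int start) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            out.add(value.substring(start));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(substring(List.of("tablesaw", "column"), 3)); // [lesaw, umn]
        try {
            substring(List.of("tablesaw", "ab"), 3); // "ab" is shorter than 3
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("one value was shorter than start");
        }
    }
}
```

A value whose length equals start is not an error under this rule; it simply yields the empty string.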
-
abbreviate
Abbreviates a String using ellipses. This will turn "Now is the time for all good men" into "Now is the time for...".
- Parameters:
maxWidth
- the maximum width of the resulting strings, including the ellipsis
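A minimal sketch of the abbreviation rule, assuming the common "keep maxWidth - 3 characters, then append three dots" behavior and a minimum width of 4. Both assumptions, and the class name AbbreviateSketch, are illustrative rather than taken from the text above. Under this rule a maxWidth of 22 reproduces the documented example.

```java
// Hypothetical sketch; the truncation rule and the minimum width are assumptions.
class AbbreviateSketch {

    static String abbreviate(String s, int maxWidth) {
        if (maxWidth < 4) {
            throw new IllegalArgumentException("maxWidth must be at least 4");
        }
        if (s == null || s.length() <= maxWidth) {
            return s; // short enough already, nothing to abbreviate
        }
        // Reserve three characters for the ellipsis so the result is exactly maxWidth long.
        return s.substring(0, maxWidth - 3) + "...";
    }

    public static void main(String[] args) {
        String text = "Now is the time for all good men";
        System.out.println(abbreviate(text, 22)); // Now is the time for...
    }
}
```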
-
format
-
parseInt
Returns an IntColumn containing all the values of this string column as integers, assuming all the values are stringified ints in the first place. Otherwise, an exception is thrown.
- Returns:
- an IntColumn containing ints parsed from the strings in this column
-
parseDouble
Returns a DoubleColumn containing all the values of this string column as doubles, assuming all the values are stringified doubles in the first place. Otherwise, an exception is thrown.
- Returns:
- a DoubleColumn containing doubles parsed from the strings in this column
-
parseFloat
Returns a FloatColumn containing all the values of this string column as floats, assuming all the values are stringified floats in the first place. Otherwise, an exception is thrown.
- Returns:
- a FloatColumn containing floats parsed from the strings in this column
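The three parse methods share one contract: every value must already be a well-formed number. The sketch below shows that contract for the integer case with Integer.parseInt. The text above does not say which exception the library raises, so the NumberFormatException here belongs to the sketch only, and ParseSketch is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch of the "all values must parse" contract, shown for ints.
class ParseSketch {

    // Parses every value; a single malformed value aborts the whole conversion.
    static int[] parseInts(List<String> column) {
        int[] out = new int[column.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = Integer.parseInt(column.get(i)); // throws NumberFormatException on bad input
        }
        return out;
    }

    public static void main(String[] args) {
        int[] parsed = parseInts(List.of("1", "-20", "300"));
        System.out.println(parsed[0] + parsed[1] + parsed[2]); // 281
        try {
            parseInts(List.of("1", "two"));
        } catch (NumberFormatException e) {
            System.out.println("not every value was a stringified int");
        }
    }
}
```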
-
padEnd
-
padStart
-
commonPrefix
-
commonSuffix
-
distance
Returns a column containing the Levenshtein distance between the two given string columns.
-
join
Return a copy of this column with the given string appended.
- Parameters:
columns
- the columns to append
- Returns:
- the new column
-
concatenate
Return a copy of this column with the given string appended to each element.
- Parameters:
stringsToAppend
- the stringified objects to append
- Returns:
- the new column
-
concatenate
Return a copy of this column with the corresponding value of each column argument appended to each element. getString is used to ensure that the values returned by the arguments are strings.
- Parameters:
stringColumns
- the string columns to append
- Returns:
- the new column
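Row-wise concatenation as described above can be pictured with parallel lists standing in for columns. In this sketch String.valueOf plays the role that getString plays in the description; ConcatenateSketch and its signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parallel lists stand in for columns of equal size.
class ConcatenateSketch {

    // For each row, appends the stringified value of every other column to the base value.
    static List<String> concatenate(List<String> base, List<?>... others) {
        List<String> out = new ArrayList<>(base.size());
        for (int row = 0; row < base.size(); row++) {
            StringBuilder sb = new StringBuilder(base.get(row));
            for (List<?> other : others) {
                sb.append(String.valueOf(other.get(row))); // stand-in for getString
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> result = concatenate(List.of("a", "b"), List.of(1, 2), List.of("x", "y"));
        System.out.println(result); // [a1x, b2y]
    }
}
```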
-
replaceAll
Creates a new column, replacing each string in this column with a new string formed by replacing any substring that matches the regex.
- Parameters:
regexArray
- the array of regexes to replace
replacement
- the replacement string
- Returns:
- the new column
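One plausible reading of the array overload is that each regex is applied in turn to every value, all with the same replacement. The sketch below implements that reading with String.replaceAll. The sequential order is an assumption, and ReplaceAllSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; applying the regexes one after another is an assumption.
class ReplaceAllSketch {

    static List<String> replaceAll(List<String> column, String[] regexArray, String replacement) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            String current = value;
            for (String regex : regexArray) {
                current = current.replaceAll(regex, replacement);
            }
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] regexes = {"\\d", "-"}; // strip digits, then hyphens
        System.out.println(replaceAll(List.of("a1b2", "x-y"), regexes, "")); // [ab, xy]
    }
}
```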
-
tokenizeAndSort
-
countTokens
-
uniqueTokens
Returns a column of arbitrary size containing each unique token in this column, where a token is defined using the given separator, and uniqueness is calculated across the entire column.
NOTE: Unlike other map functions, this method produces a column whose size may be different from the source, so the two cannot safely be combined in a table.
- Parameters:
separator
- the delimiter used in the tokenizing operation
- Returns:
- a new column
-
tokens
Returns a column of arbitrary size containing each token in this column, where a token is defined using the given separator.
NOTE: Unlike other map functions, this method produces a column whose size may be different from the source, so the two cannot safely be combined in a table.
- Parameters:
separator
- the delimiter used in the tokenizing operation
- Returns:
- a new column
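The size warning on tokens and uniqueTokens is easiest to see with a small example. The sketch below treats the separator as a literal string, trims each token, drops empty tokens, and keeps unique tokens in first-seen order. All four choices are assumptions made for illustration, and TokensSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; trimming, empty-token handling and ordering are assumptions.
class TokensSketch {

    // Flattens every value of the column into its tokens.
    static List<String> tokens(List<String> column, String separator) {
        List<String> out = new ArrayList<>();
        for (String value : column) {
            for (String token : value.split(Pattern.quote(separator))) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    out.add(trimmed);
                }
            }
        }
        return out;
    }

    // Same tokens, but each one only once across the entire column.
    static List<String> uniqueTokens(List<String> column, String separator) {
        return new ArrayList<>(new LinkedHashSet<>(tokens(column, separator)));
    }

    public static void main(String[] args) {
        List<String> column = List.of("a,b", "b,c,d", "a");
        System.out.println(tokens(column, ","));       // [a, b, b, c, d, a]
        System.out.println(uniqueTokens(column, ",")); // [a, b, c, d]
    }
}
```

Here a three-row source yields six tokens and four unique tokens, which is why neither result can be placed alongside the source column in the same table.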
-
length
Returns a column containing the character length of each string in this column. The returned column is the same size as the original.
-
tokenizeAndSort
Splits on whitespace and returns the lexicographically sorted result.
- Returns:
- a StringColumn
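For a single value, the split-and-sort step might look like the sketch below. Rejoining the sorted tokens with a single space is an assumption, since the description does not say how the result is assembled; TokenizeAndSortSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch; joining the sorted tokens with one space is an assumption.
class TokenizeAndSortSketch {

    static String tokenizeAndSort(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+"); // any run of whitespace separates tokens
        Arrays.sort(tokens);                     // natural String order, i.e. lexicographic
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(tokenizeAndSort("the quick brown fox")); // brown fox quick the
    }
}
```

Because the ordering is lexicographic by character code, uppercase tokens sort before lowercase ones.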
-
tokenizeAndRemoveDuplicates
-