Package tech.tablesaw.columns.strings
Interface StringMapFunctions
-
- All Known Implementing Classes:
StringColumn
public interface StringMapFunctions
String utility functions. Each function takes one or more String columns as input and produces another Column as output. The resulting column need not be a string column.
This code was developed as part of Apache Commons Text.
-
-
Method Summary
Modifier and Type
Method
Description
default StringColumn
abbreviate(int maxWidth)
Abbreviates a String using ellipses.
default StringColumn
capitalize()
Capitalizes each String, changing the first character of each to title case as per Character.toTitleCase(int), as if in a sentence.
default StringColumn
commonPrefix(Column<String> column2)
default StringColumn
commonSuffix(Column<String> column2)
default StringColumn
concatenate(Object... stringsToAppend)
Returns a copy of this column with the given string appended to each element.
default StringColumn
concatenate(Column<?>... stringColumns)
Returns a copy of this column with the corresponding value of each column argument appended to each element.
default DoubleColumn
countTokens(String separator)
default DoubleColumn
distance(Column<String> column2)
Returns a column containing the Levenshtein distance between the two given string columns.
default StringColumn
format(String formatString)
String
getString(int idx)
default StringColumn
join(String separator, Column<?>... columns)
Returns a copy of this column with the corresponding value of each column argument appended to each element, separated by the given separator.
default DoubleColumn
length()
Returns a column containing the character length of each string in this column. The returned column is the same size as the original.
default StringColumn
lowerCase()
String
name()
default StringColumn
padEnd(int minLength, char padChar)
default StringColumn
padStart(int minLength, char padChar)
default DoubleColumn
parseDouble()
Returns a DoubleColumn containing all the values of this string column as doubles, assuming all the values are stringified doubles in the first place.
default FloatColumn
parseFloat()
Returns a FloatColumn containing all the values of this string column as floats, assuming all the values are stringified floats in the first place.
default IntColumn
parseInt()
Returns an IntColumn containing all the values of this string column as integers, assuming all the values are stringified ints in the first place.
default StringColumn
repeat(int times)
Repeats each of the column's values elementwise, concatenating the results into a new StringColumn.
default StringColumn
replaceAll(String[] regexArray, String replacement)
Creates a new column, replacing each string in this column with a new string formed by replacing any substring that matches the regex.
default StringColumn
replaceAll(String regex, String replacement)
default StringColumn
replaceFirst(String regex, String replacement)
int
size()
default StringColumn
substring(int start)
Returns a column containing the substrings from start to the end of the input.
default StringColumn
substring(int start, int end)
default StringColumn
tokenizeAndRemoveDuplicates(String separator)
default StringColumn
tokenizeAndSort()
Splits on whitespace and returns the lexicographically sorted result.
default StringColumn
tokenizeAndSort(String separator)
default StringColumn
tokens(String separator)
Returns a column of arbitrary size containing each token in this column, where a token is defined using the given separator.
default StringColumn
trim()
default StringColumn
uniqueTokens(String separator)
Returns a column of arbitrary size containing each unique token in this column, where a token is defined using the given separator, and uniqueness is calculated across the entire column.
default StringColumn
upperCase()
-
-
-
Method Detail
-
size
int size()
-
getString
String getString(int idx)
-
upperCase
default StringColumn upperCase()
-
lowerCase
default StringColumn lowerCase()
-
capitalize
default StringColumn capitalize()
Capitalizes each String, changing the first character of each to title case as per Character.toTitleCase(int), as if in a sentence. No other characters are changed.
capitalize(null) = null
capitalize("") = ""
capitalize("cat") = "Cat"
capitalize("cAt") = "CAt"
capitalize("'cat'") = "'cat'"
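The per-element rule can be sketched in plain Java. This is an illustration of the behavior listed above, not the library's implementation; the class name CapitalizeSketch is hypothetical.

```java
// Plain-Java sketch of the per-element capitalize rule (hypothetical helper,
// not part of tablesaw).
class CapitalizeSketch {
    static String capitalize(String s) {
        if (s == null || s.isEmpty()) {
            return s; // null and "" pass through unchanged
        }
        int first = s.codePointAt(0);
        int title = Character.toTitleCase(first);
        if (first == title) {
            return s; // already title case, or not a letter (for example a quote)
        }
        // Replace only the first code point; every other character is kept.
        return new StringBuilder(s.length())
                .appendCodePoint(title)
                .append(s, Character.charCount(first), s.length())
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("cat"));
        System.out.println(capitalize("cAt"));
        System.out.println(capitalize("'cat'"));
    }
}
```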
-
repeat
default StringColumn repeat(int times)
Repeats each of the column's values elementwise, concatenating the results into a new StringColumn.
repeat("", 2) = ""
repeat("cat", 3) = "catcatcat"
- Parameters:
times
- the number of repetitions desired
- Returns:
- the new StringColumn
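As a rough illustration of the elementwise behavior, the sketch below models a column as a java.util.List of strings. That modeling and the class name RepeatSketch are assumptions made for the example; it does not use the tablesaw API.

```java
import java.util.ArrayList;
import java.util.List;

// Elementwise repeat over a list standing in for a string column
// (hypothetical helper, not part of tablesaw).
class RepeatSketch {
    static List<String> repeat(List<String> column, int times) {
        List<String> result = new ArrayList<>(column.size());
        for (String value : column) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < times; i++) {
                sb.append(value); // append the same value 'times' times
            }
            result.add(sb.toString());
        }
        return result; // same size as the input
    }

    public static void main(String[] args) {
        List<String> column = new ArrayList<>();
        column.add("");
        column.add("cat");
        System.out.println(repeat(column, 3));
    }
}
```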
-
trim
default StringColumn trim()
-
replaceAll
default StringColumn replaceAll(String regex, String replacement)
-
replaceFirst
default StringColumn replaceFirst(String regex, String replacement)
-
substring
default StringColumn substring(int start, int end)
-
substring
default StringColumn substring(int start)
Returns a column containing the substrings from start to the end of the input.
- Throws:
StringIndexOutOfBoundsException
- if any string in the column is shorter than start
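The failure mode can be reproduced with java.lang.String.substring(int) alone. The sketch below models a column as a list of strings, which is an assumption of the example, and SubstringSketch is a hypothetical name. With the plain String method, a value whose length equals start yields an empty string, while a strictly shorter value throws.

```java
import java.util.ArrayList;
import java.util.List;

// Elementwise substring(start) over a list standing in for a string column
// (hypothetical helper, not part of tablesaw).
class SubstringSketch {
    static List<String> substring(List<String> column, int start) {
        List<String> result = new ArrayList<>(column.size());
        for (String value : column) {
            // String.substring throws StringIndexOutOfBoundsException
            // when start is greater than value.length().
            result.add(value.substring(start));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> column = new ArrayList<>();
        column.add("hello");
        column.add("abc");
        System.out.println(substring(column, 3));
    }
}
```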
-
abbreviate
default StringColumn abbreviate(int maxWidth)
Abbreviates a String using ellipses. This will turn "Now is the time for all good men" into "Now is the time for...".
- Parameters:
maxWidth
- the maximum width of the resulting strings, including the ellipsis.
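A minimal sketch of the rule for a single string follows. It assumes a three-character "..." marker and rejects widths below 4, which mirrors the usual Apache Commons convention but is an assumption here; AbbreviateSketch is a hypothetical name.

```java
// Per-element abbreviation sketch (hypothetical helper, not part of tablesaw).
class AbbreviateSketch {
    static String abbreviate(String s, int maxWidth) {
        if (maxWidth < 4) {
            // Assumed lower bound: one character plus the three-dot marker.
            throw new IllegalArgumentException("maxWidth must be at least 4");
        }
        if (s == null || s.length() <= maxWidth) {
            return s; // already short enough
        }
        // Keep maxWidth - 3 characters so the result, marker included,
        // is exactly maxWidth long.
        return s.substring(0, maxWidth - 3) + "...";
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("Now is the time for all good men", 22));
    }
}
```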
-
format
default StringColumn format(String formatString)
-
parseInt
default IntColumn parseInt()
Returns an IntColumn containing all the values of this string column as integers, assuming all the values are stringified ints in the first place. Otherwise an exception is thrown.
- Returns:
- An IntColumn containing ints parsed from the strings in this column
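The parsing contract can be illustrated with Integer.parseInt over a list of strings. The list stands in for a column, ParseIntSketch is a hypothetical name, and the exception type shown (NumberFormatException) is what Integer.parseInt raises in this sketch; the source only states that an exception is thrown.

```java
import java.util.Arrays;
import java.util.List;

// Elementwise integer parsing sketch (hypothetical helper, not part of tablesaw).
class ParseIntSketch {
    static int[] parseInts(List<String> column) {
        int[] out = new int[column.size()];
        for (int i = 0; i < out.length; i++) {
            // Throws NumberFormatException if the value is not a stringified int.
            out[i] = Integer.parseInt(column.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseInts(Arrays.asList("1", "-20", "300"))));
    }
}
```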
-
parseDouble
default DoubleColumn parseDouble()
Returns a DoubleColumn containing all the values of this string column as doubles, assuming all the values are stringified doubles in the first place. Otherwise an exception is thrown.
- Returns:
- A DoubleColumn containing doubles parsed from the strings in this column
-
parseFloat
default FloatColumn parseFloat()
Returns a FloatColumn containing all the values of this string column as floats, assuming all the values are stringified floats in the first place. Otherwise an exception is thrown.
- Returns:
- A FloatColumn containing floats parsed from the strings in this column
-
padEnd
default StringColumn padEnd(int minLength, char padChar)
-
padStart
default StringColumn padStart(int minLength, char padChar)
-
commonPrefix
default StringColumn commonPrefix(Column<String> column2)
-
commonSuffix
default StringColumn commonSuffix(Column<String> column2)
-
distance
default DoubleColumn distance(Column<String> column2)
Returns a column containing the Levenshtein distance between the two given string columns.
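For reference, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The following is a standard two-row dynamic-programming sketch for one pair of strings; it is not the library's code, and LevenshteinSketch is a hypothetical name.

```java
// Two-row dynamic-programming Levenshtein distance
// (hypothetical helper, not part of tablesaw).
class LevenshteinSketch {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
    }
}
```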
-
join
default StringColumn join(String separator, Column<?>... columns)
Returns a copy of this column with the corresponding value of each column argument appended to each element, separated by the given separator.
- Parameters:
columns
- the columns to append
- Returns:
- the new column
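The sketch below shows one plausible reading of the signature: values are combined row by row, with the separator placed between this column's value and each appended value. That reading, the list-based modeling of columns, and the name JoinSketch are all assumptions of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Row-wise join sketch (hypothetical helper, not part of tablesaw).
class JoinSketch {
    static List<String> join(List<String> base, String separator, List<?>... others) {
        List<String> result = new ArrayList<>(base.size());
        for (int row = 0; row < base.size(); row++) {
            StringBuilder sb = new StringBuilder(base.get(row));
            for (List<?> other : others) {
                // Stringify the value from the same row of each other column.
                sb.append(separator).append(String.valueOf(other.get(row)));
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b"), "-",
                Arrays.asList(1, 2), Arrays.asList("x", "y")));
    }
}
```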
-
concatenate
default StringColumn concatenate(Object... stringsToAppend)
Returns a copy of this column with the given string appended to each element.
- Parameters:
stringsToAppend
- the stringified objects to append
- Returns:
- the new column
-
concatenate
default StringColumn concatenate(Column<?>... stringColumns)
Returns a copy of this column with the corresponding value of each column argument appended to each element. getString is used to ensure that the values returned by the arguments are strings.
- Parameters:
stringColumns
- the string columns to append
- Returns:
- the new column
-
replaceAll
default StringColumn replaceAll(String[] regexArray, String replacement)
Creates a new column, replacing each string in this column with a new string formed by replacing any substring that matches any of the given regexes.
- Parameters:
regexArray
- the array of regexes to replace
replacement
- the replacement string
- Returns:
- the new column
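One way to picture the array overload is as repeated application of String.replaceAll, one regex at a time, with the same replacement. Applying the regexes in array order is an assumption of this sketch, and ReplaceAllSketch is a hypothetical name.

```java
// Multi-regex replacement sketch for a single value
// (hypothetical helper, not part of tablesaw).
class ReplaceAllSketch {
    static String replaceAll(String value, String[] regexArray, String replacement) {
        String result = value;
        for (String regex : regexArray) {
            // Each pass sees the output of the previous one.
            result = result.replaceAll(regex, replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] regexes = {"[0-9]", "[-_]"};
        System.out.println(replaceAll("a1-b2_c3", regexes, ""));
    }
}
```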
-
tokenizeAndSort
default StringColumn tokenizeAndSort(String separator)
-
countTokens
default DoubleColumn countTokens(String separator)
-
uniqueTokens
default StringColumn uniqueTokens(String separator)
Returns a column of arbitrary size containing each unique token in this column, where a token is defined using the given separator, and uniqueness is calculated across the entire column.
NOTE: Unlike other map functions, this method produces a column whose size may be different from the source, so they cannot safely be combined in a table.
- Parameters:
separator
- the delimiter used in the tokenizing operation
- Returns:
- a new column
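The size mismatch described in the note is easy to see in a small sketch. Here a column is modeled as a list of strings, tokens are kept in first-seen order, and empty tokens are dropped; those three choices and the name UniqueTokensSketch are assumptions of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Column-wide unique-token sketch (hypothetical helper, not part of tablesaw).
class UniqueTokensSketch {
    static List<String> uniqueTokens(List<String> column, String separator) {
        Set<String> seen = new LinkedHashSet<>(); // uniqueness across the whole column
        String literal = Pattern.quote(separator); // treat the separator literally
        for (String value : column) {
            for (String token : value.split(literal)) {
                if (!token.isEmpty()) {
                    seen.add(token);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("a,b", "b,c,a", "d");
        // Three input rows produce four output rows.
        System.out.println(uniqueTokens(column, ","));
    }
}
```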
-
tokens
default StringColumn tokens(String separator)
Returns a column of arbitrary size containing each token in this column, where a token is defined using the given separator.
NOTE: Unlike other map functions, this method produces a column whose size may be different from the source, so they cannot safely be combined in a table.
- Parameters:
separator
- the delimiter used in the tokenizing operation
- Returns:
- a new column
-
length
default DoubleColumn length()
Returns a column containing the character length of each string in this column. The returned column is the same size as the original.
-
tokenizeAndSort
default StringColumn tokenizeAndSort()
Splits on whitespace and returns the lexicographically sorted result.
- Returns:
- a StringColumn
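For a single value, the operation can be sketched as split, sort, and rejoin. Rejoining the sorted tokens with a single space is an assumption of this sketch, as is the name TokenizeAndSortSketch. Note that natural String ordering places uppercase letters before lowercase ones.

```java
import java.util.Arrays;

// Per-element tokenize-and-sort sketch (hypothetical helper, not part of tablesaw).
class TokenizeAndSortSketch {
    static String tokenizeAndSort(String value) {
        // Trim first so leading whitespace does not yield an empty token.
        String[] tokens = value.trim().split("\\s+");
        Arrays.sort(tokens); // lexicographic (natural String) order
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(tokenizeAndSort("the quick  brown fox"));
    }
}
```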
-
tokenizeAndRemoveDuplicates
default StringColumn tokenizeAndRemoveDuplicates(String separator)
-
name
String name()
-
-