Interface StringMapFunctions

All Superinterfaces:
Column<String>, Comparator<String>, Iterable<String>
All Known Implementing Classes:
AbstractStringColumn, StringColumn, TextColumn

public interface StringMapFunctions extends Column<String>
String utility functions. Each function takes one or more String columns as input and produces another Column as output. The resulting column need not be a string column.

This code was developed as part of Apache Commons Text.

  • Method Details

    • upperCase

      default StringColumn upperCase()
    • lowerCase

      default StringColumn lowerCase()
    • capitalize

      default StringColumn capitalize()
      Capitalizes each String, changing the first character of each to title case as per Character.toTitleCase(int), as if in a sentence. No other characters are changed.
       capitalize(null)  = null
       capitalize("")    = ""
       capitalize("cat") = "Cat"
       capitalize("cAt") = "CAt"
       capitalize("'cat'") = "'cat'"
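The capitalize examples above can be reproduced per element with plain JDK code. This is a minimal sketch, not the library's implementation. It models a column as a List<String> with null standing in for a missing value, and the class name CapitalizeSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a "column" is a List<String>, and null marks a missing value.
final class CapitalizeSketch {

    // Title-cases the first code point via Character.toTitleCase(int); nothing else changes.
    static String capitalize(String s) {
        if (s == null || s.isEmpty()) {
            return s; // capitalize(null) = null, capitalize("") = ""
        }
        int first = s.codePointAt(0);
        int title = Character.toTitleCase(first);
        if (first == title) {
            return s; // already title case, or not a letter, as in "'cat'"
        }
        return new StringBuilder(s.length())
                .appendCodePoint(title)
                .append(s, Character.charCount(first), s.length())
                .toString();
    }

    // Applies capitalize to every element, producing a new list of the same size.
    static List<String> capitalizeColumn(List<String> column) {
        List<String> result = new ArrayList<>(column.size());
        for (String value : column) {
            result.add(capitalize(value));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(capitalizeColumn(Arrays.asList("cat", "cAt", "'cat'", "", null)));
    }
}
```

Working on code points rather than chars keeps a supplementary character at the start of a string intact.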
       
    • repeat

      default StringColumn repeat(int times)
      Repeats each of the column's values the given number of times, concatenating the copies into a new StringColumn.
      Parameters:
      times - the number of times each value is repeated
        repeat("", 2)   = ""
        repeat("cat", 3) = "catcatcat"
       
      Returns:
      the new StringColumn
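An elementwise version of repeat is a one-liner per value with String.repeat (Java 11+). This is a hedged sketch: the class RepeatSketch is hypothetical, and passing null (missing) values through unchanged is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a "column" is a List<String>; null values are passed through (an assumption).
final class RepeatSketch {

    // Concatenates `times` copies of each value. String.repeat (Java 11+) throws
    // IllegalArgumentException when times is negative.
    static List<String> repeat(List<String> column, int times) {
        List<String> result = new ArrayList<>(column.size());
        for (String value : column) {
            result.add(value == null ? null : value.repeat(times));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(repeat(Arrays.asList("", "cat", "ab"), 3));
    }
}
```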
    • trim

      default StringColumn trim()
    • replaceAll

      default StringColumn replaceAll(String regex, String replacement)
    • replaceFirst

      default StringColumn replaceFirst(String regex, String replacement)
    • substring

      default StringColumn substring(int start, int end)
    • substring

      default StringColumn substring(int start)
      Returns a column containing, for each string, the substring from start to the end of that string.
      Throws:
      StringIndexOutOfBoundsException - if any string in the column is shorter than start
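The documented exception falls out naturally if each element is passed to String.substring(int), which throws StringIndexOutOfBoundsException when start exceeds the string's length. A minimal sketch under that assumption; SubstringSketch is a hypothetical name and missing values are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a "column" is a List<String>; missing values are not modeled.
final class SubstringSketch {

    // String.substring(start) throws StringIndexOutOfBoundsException when
    // start > value.length(); start == value.length() yields "".
    static List<String> substring(List<String> column, int start) {
        List<String> result = new ArrayList<>(column.size());
        for (String value : column) {
            result.add(value.substring(start));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(substring(Arrays.asList("hello", "abc"), 3));
        try {
            substring(Arrays.asList("hello", "ab"), 3); // "ab" is shorter than start
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Note that a single short value is enough to make the whole call fail, so callers may want to filter or pad the column first.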
    • abbreviate

      default StringColumn abbreviate(int maxWidth)
      Abbreviates each String using an ellipsis. This will turn "Now is the time for all good men" into "Now is the time for..."
      Parameters:
      maxWidth - the maximum width of the resulting strings, including the ellipsis.
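Since maxWidth includes the ellipsis, an over-long value keeps maxWidth - 3 leading characters. The sketch below follows the contract of Commons Lang's StringUtils.abbreviate(String, int); the class AbbreviateSketch and the minimum width of 4 are assumptions, not taken from this interface.

```java
// Hypothetical sketch in the style of Commons Lang's StringUtils.abbreviate(String, int).
final class AbbreviateSketch {

    private static final String ELLIPSIS = "...";

    static String abbreviate(String value, int maxWidth) {
        // Assumed lower bound: at least one character must fit in front of the ellipsis.
        if (maxWidth < ELLIPSIS.length() + 1) {
            throw new IllegalArgumentException("maxWidth must be at least 4");
        }
        if (value == null || value.length() <= maxWidth) {
            return value; // already fits, so it is returned unchanged
        }
        // Keep maxWidth - 3 leading characters so the result, ellipsis included, is exactly maxWidth long.
        return value.substring(0, maxWidth - ELLIPSIS.length()) + ELLIPSIS;
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("Now is the time for all good men", 22));
    }
}
```

With maxWidth = 22 the sketch reproduces the example in the description: "Now is the time for" is 19 characters, plus the 3-character ellipsis.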
    • format

      default StringColumn format(String formatString)
    • parseInt

      default IntColumn parseInt()
      Returns an IntColumn containing all the values of this string column as integers, assuming all the values are stringified ints in the first place. Otherwise, an exception is thrown.
      Returns:
      An IntColumn containing ints parsed from the strings in this column
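The "otherwise an exception is thrown" rule can be pictured as calling Integer.parseInt on every value, so one malformed entry aborts the whole conversion with NumberFormatException. This is a sketch under that assumption; ParseIntSketch is hypothetical and the library's handling of missing values is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a "column" is a List<String>; missing-value handling is not modeled.
final class ParseIntSketch {

    // Integer.parseInt throws NumberFormatException for any value that is not a
    // base-10 int literal, so one bad value makes the whole conversion fail.
    static List<Integer> parseInt(List<String> column) {
        List<Integer> result = new ArrayList<>(column.size());
        for (String value : column) {
            result.add(Integer.parseInt(value));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseInt(Arrays.asList("1", "-42", "+7")));
        try {
            parseInt(Arrays.asList("1", "3.5"));
        } catch (NumberFormatException e) {
            System.out.println("not an int column: " + e.getMessage());
        }
    }
}
```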
    • parseDouble

      default DoubleColumn parseDouble()
      Returns a DoubleColumn containing all the values of this string column as doubles, assuming all the values are stringified doubles in the first place. Otherwise, an exception is thrown.
      Returns:
      A DoubleColumn containing doubles parsed from the strings in this column
    • parseFloat

      default FloatColumn parseFloat()
      Returns a FloatColumn containing all the values of this string column as floats, assuming all the values are stringified floats in the first place. Otherwise, an exception is thrown.
      Returns:
      A FloatColumn containing floats parsed from the strings in this column
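parseDouble and parseFloat accept the same strings but differ in precision, which matters when choosing between them. The sketch assumes each value goes through Double.parseDouble or Float.parseFloat respectively; ParseFloatingSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch contrasting the two floating-point parsers on the same strings.
final class ParseFloatingSketch {

    static List<Double> parseDouble(List<String> column) {
        List<Double> result = new ArrayList<>(column.size());
        for (String value : column) {
            result.add(Double.parseDouble(value)); // NumberFormatException on bad input
        }
        return result;
    }

    static List<Float> parseFloat(List<String> column) {
        List<Float> result = new ArrayList<>(column.size());
        for (String value : column) {
            result.add(Float.parseFloat(value)); // rounds to the nearest float
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("16777217", "0.1");
        System.out.println(parseDouble(values));
        System.out.println(parseFloat(values));
    }
}
```

16777217 (2^24 + 1) has no exact float representation, so parseFloat yields 16777216 while parseDouble keeps the value exact.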
    • padEnd

      default StringColumn padEnd(int minLength, char padChar)
    • padStart

      default StringColumn padStart(int minLength, char padChar)
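padStart and padEnd carry no description here; the sketch below assumes they behave like Guava's Strings.padStart and Strings.padEnd, which pad a value to at least minLength and leave longer values unchanged. PadSketch is a hypothetical name, and whether the library delegates to Guava is an assumption.

```java
// Hypothetical sketch with the semantics of Guava's Strings.padStart/padEnd:
// values already at least minLength long are returned unchanged.
final class PadSketch {

    static String padStart(String value, int minLength, char padChar) {
        if (value.length() >= minLength) {
            return value;
        }
        StringBuilder sb = new StringBuilder(minLength);
        for (int i = value.length(); i < minLength; i++) {
            sb.append(padChar); // padding goes in front
        }
        return sb.append(value).toString();
    }

    static String padEnd(String value, int minLength, char padChar) {
        if (value.length() >= minLength) {
            return value;
        }
        StringBuilder sb = new StringBuilder(minLength).append(value);
        for (int i = value.length(); i < minLength; i++) {
            sb.append(padChar); // padding goes after the value
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(padStart("7", 3, '0'));
        System.out.println(padEnd("ab", 4, '.'));
    }
}
```

A typical use is zero-padding numeric codes stored as strings so they sort correctly.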
    • commonPrefix

      default StringColumn commonPrefix(Column<String> column2)
    • commonSuffix

      default StringColumn commonSuffix(Column<String> column2)
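commonPrefix and commonSuffix take a second column, which suggests a row-by-row comparison. The sketch assumes exactly that: row r of the result is the longest shared prefix (or suffix) of the two row-r values. The class CommonAffixSketch and its size check are hypothetical, and the char-by-char comparison does not guard against splitting a surrogate pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: compares char by char, so it does not guard against splitting a surrogate pair.
final class CommonAffixSketch {

    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    static String commonSuffix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(a.length() - 1 - i) == b.charAt(b.length() - 1 - i)) {
            i++;
        }
        return a.substring(a.length() - i);
    }

    // Row r of the result pairs row r of column1 with row r of column2.
    static List<String> commonPrefix(List<String> column1, List<String> column2) {
        if (column1.size() != column2.size()) {
            // Assumed behavior: mismatched sizes are rejected.
            throw new IllegalArgumentException("columns must have the same size");
        }
        List<String> result = new ArrayList<>(column1.size());
        for (int r = 0; r < column1.size(); r++) {
            result.add(commonPrefix(column1.get(r), column2.get(r)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(commonPrefix(Arrays.asList("interview", "apple"), Arrays.asList("internet", "banana")));
        System.out.println(commonSuffix("running", "jumping"));
    }
}
```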
    • distance

      default DoubleColumn distance(Column<String> column2)
      Returns a column containing the Levenshtein distance between corresponding values of the two string columns.
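The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. The sketch uses the standard Wagner-Fischer dynamic program with two rolling rows and returns doubles to mirror the DoubleColumn return type; LevenshteinSketch is a hypothetical name and the library may compute it differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: Wagner-Fischer dynamic programming over UTF-16 chars, keeping two rows.
final class LevenshteinSketch {

    // Minimum number of single-character insertions, deletions and substitutions turning a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b's first j chars takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting a's first i chars
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev; // the row just computed becomes the previous row
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Row-wise distances, returned as doubles to mirror the DoubleColumn return type.
    static List<Double> distance(List<String> column1, List<String> column2) {
        List<Double> result = new ArrayList<>(column1.size());
        for (int r = 0; r < column1.size(); r++) {
            result.add((double) levenshtein(column1.get(r), column2.get(r)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(distance(Arrays.asList("kitten", "flaw"), Arrays.asList("sitting", "lawn")));
    }
}
```

For example, kitten to sitting needs two substitutions and one insertion, a distance of 3.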
    • join

      default StringColumn join(String separator, Column<?>... columns)
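      Returns a copy of this column in which each element is followed by the corresponding value of each given column, with the separator placed between values.
      Parameters:
      separator - the string inserted between joined values
      columns - the columns to append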
      Returns:
      the new column
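A row-wise join can be sketched as: start with this column's row-r value, then for each argument column append the separator followed by that column's row-r value. The sketch assumes equal-sized columns and stringifies non-string values via StringBuilder.append(Object); JoinSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: columns are Lists of equal size; values are stringified with String.valueOf.
final class JoinSketch {

    static List<String> join(List<String> base, String separator, List<?>... columns) {
        List<String> result = new ArrayList<>(base.size());
        for (int r = 0; r < base.size(); r++) {
            StringBuilder sb = new StringBuilder(String.valueOf(base.get(r)));
            for (List<?> column : columns) {
                sb.append(separator).append(column.get(r)); // separator precedes each appended value
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b");
        List<String> codes = Arrays.asList("x", "y");
        List<Integer> counts = Arrays.asList(1, 2);
        System.out.println(join(ids, "-", codes, counts));
    }
}
```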
    • concatenate

      default StringColumn concatenate(Object... stringsToAppend)
      Returns a copy of this column with the given strings appended to each element.
      Parameters:
      stringsToAppend - the stringified objects to append
      Returns:
      the new column
    • concatenate

      default StringColumn concatenate(Column<?>... stringColumns)
      Returns a copy of this column with the corresponding value of each column argument appended to each element. getString is used so that the values taken from the argument columns are strings.
      Parameters:
      stringColumns - the string columns to append
      Returns:
      the new column
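The two concatenate overloads differ in where the appended text comes from: the Object... form appends the same suffix to every row, while the column form appends each column's row-r value to row r. The sketch assumes no separator is inserted in either case; ConcatenateSketch is hypothetical, and the column variant is named concatenateColumns here only to keep the sketch free of overload ambiguity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of both concatenate variants; no separator is inserted.
final class ConcatenateSketch {

    // Object... variant: the same stringified suffix is appended to every element.
    static List<String> concatenate(List<String> base, Object... stringsToAppend) {
        StringBuilder suffix = new StringBuilder();
        for (Object o : stringsToAppend) {
            suffix.append(o); // StringBuilder.append(Object) uses String.valueOf
        }
        List<String> result = new ArrayList<>(base.size());
        for (String value : base) {
            result.add(value + suffix);
        }
        return result;
    }

    // Column variant (hypothetical name): row r gets each column's row-r value appended in order.
    static List<String> concatenateColumns(List<String> base, List<?>... columns) {
        List<String> result = new ArrayList<>(base.size());
        for (int r = 0; r < base.size(); r++) {
            StringBuilder sb = new StringBuilder(base.get(r));
            for (List<?> column : columns) {
                sb.append(column.get(r));
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> base = Arrays.asList("a", "b");
        System.out.println(concatenate(base, "-", 1));
        System.out.println(concatenateColumns(base, Arrays.asList("x", "y"), Arrays.asList(1, 2)));
    }
}
```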
    • replaceAll

      default StringColumn replaceAll(String[] regexArray, String replacement)
      Creates a new column, replacing each string in this column with a new string formed by replacing every substring that matches any of the regexes in the array with the replacement.
      Parameters:
      regexArray - the regex array to replace
      replacement - the replacement string
      Returns:
      the new column
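One way to apply several regexes with a single replacement is to run String.replaceAll once per pattern, in array order. The sketch assumes that ordering, which means later patterns see the output of earlier ones; ReplaceAllSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each regex is applied in array order via String.replaceAll.
final class ReplaceAllSketch {

    static List<String> replaceAll(List<String> column, String[] regexArray, String replacement) {
        List<String> result = new ArrayList<>(column.size());
        for (String value : column) {
            String replaced = value;
            for (String regex : regexArray) {
                // Later patterns see the output of earlier ones; note that '$' and '\'
                // in the replacement are special to String.replaceAll.
                replaced = replaced.replaceAll(regex, replacement);
            }
            result.add(replaced);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] patterns = {"\\d+", "\\s+"};
        System.out.println(replaceAll(Arrays.asList("a1 b22", "no digits"), patterns, "_"));
    }
}
```

Here "a1 b22" first becomes "a_ b_" (digits) and then "a__b_" (whitespace).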
    • tokenizeAndSort

      default StringColumn tokenizeAndSort(String separator)
    • countTokens

      default DoubleColumn countTokens(String separator)
    • uniqueTokens

      default StringColumn uniqueTokens(String separator)
      Returns a column of arbitrary size containing each unique token in this column, where a token is defined using the given separator, and uniqueness is calculated across the entire column.

      NOTE: Unlike other map functions, this method produces a column whose size may differ from that of the source, so the result cannot safely be combined with the source's table.

      Parameters:
      separator - the delimiter used in the tokenizing operation
      Returns:
      a new column
    • tokens

      default StringColumn tokens(String separator)
      Returns a column of arbitrary size containing each token in this column, where a token is defined using the given separator.

      NOTE: Unlike other map functions, this method produces a column whose size may differ from that of the source, so the result cannot safely be combined with the source's table.

      Parameters:
      separator - the delimiter used in the tokenizing operation
      Returns:
      a new column
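The size caveat for tokens and uniqueTokens is easiest to see by flattening a small column. In the sketch below the separator is matched literally via Pattern.quote and uniqueTokens keeps first-seen order; both choices, and the class name TokenSketch, are assumptions. Note also that String.split drops trailing empty tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: the separator is treated literally (Pattern.quote) and unique tokens keep
// first-seen order; both choices are assumptions, not documented behavior.
final class TokenSketch {

    // Every token from every row, flattened into one list; its size generally differs from the input's.
    static List<String> tokens(List<String> column, String separator) {
        List<String> result = new ArrayList<>();
        for (String value : column) {
            result.addAll(Arrays.asList(value.split(Pattern.quote(separator))));
        }
        return result;
    }

    // Duplicates are removed across the whole column, not per row.
    static List<String> uniqueTokens(List<String> column, String separator) {
        return new ArrayList<>(new LinkedHashSet<>(tokens(column, separator)));
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("red,green", "green,blue");
        System.out.println(tokens(column, ","));
        System.out.println(uniqueTokens(column, ","));
    }
}
```

A two-row input yields four tokens and three unique tokens, so neither result lines up row-for-row with the source.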
    • length

      default DoubleColumn length()
      Returns a column containing the character length of each string in this column. The returned column is the same size as the original.
    • tokenizeAndSort

      default StringColumn tokenizeAndSort()
      Splits each string on whitespace and returns the lexicographically sorted result.
      Returns:
      a StringColumn
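Per value, tokenizeAndSort() can be sketched as: split on runs of whitespace, sort the tokens with String's natural ordering, and rejoin them. Rejoining with single spaces is an assumption, as is the class name TokenizeAndSortSketch. Natural String ordering compares UTF-16 code units, so uppercase letters sort before lowercase ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: sorted tokens are rejoined with single spaces (an assumption).
final class TokenizeAndSortSketch {

    static String tokenizeAndSort(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return ""; // no tokens
        }
        String[] tokens = trimmed.split("\\s+"); // runs of whitespace act as one delimiter
        Arrays.sort(tokens); // natural String order compares UTF-16 code units, so "B" < "a"
        return String.join(" ", tokens);
    }

    // Applies the per-value version to every element; the result has the same size as the input.
    static List<String> tokenizeAndSort(List<String> column) {
        List<String> result = new ArrayList<>(column.size());
        for (String value : column) {
            result.add(tokenizeAndSort(value));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenizeAndSort(Arrays.asList("pear apple  fig", "b a B")));
    }
}
```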
    • tokenizeAndRemoveDuplicates

      default StringColumn tokenizeAndRemoveDuplicates(String separator)