Interface StringMapFunctions

  • All Known Implementing Classes:
    StringColumn

    public interface StringMapFunctions
    String utility functions. Each function takes one or more String columns as input and produces another Column as output. The resulting column need not be a string column.

    This code was developed as part of Apache Commons Text.

    • Method Detail

      • size

        int size()
      • getString

        String getString(int idx)
      • capitalize

        default StringColumn capitalize()
        Capitalizes each String, changing the first character of each to title case as per Character.toTitleCase(int), as if in a sentence. No other characters are changed.
         capitalize(null)  = null
         capitalize("")    = ""
         capitalize("cat") = "Cat"
         capitalize("cAt") = "CAt"
         capitalize("'cat'") = "'cat'"
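The elementwise behavior above can be sketched in plain Java. This is not Tablesaw's implementation: a List<String> stands in for a StringColumn, and CapitalizeSketch and its methods are hypothetical names. It title-cases only the first code point with Character.toTitleCase(int), which reproduces the listed examples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a List<String> stands in for a StringColumn.
final class CapitalizeSketch {

    // Title-cases the first code point of one value; null and "" pass through unchanged.
    static String capitalize(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        int first = s.codePointAt(0);
        int titled = Character.toTitleCase(first);
        if (first == titled) {
            return s; // already title case, or not a letter (e.g. a leading quote)
        }
        return new StringBuilder(s.length())
                .appendCodePoint(titled)
                .append(s, Character.charCount(first), s.length())
                .toString();
    }

    // Maps capitalize over every element; the result has the same size as the input.
    static List<String> capitalizeAll(List<String> column) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            out.add(capitalize(value));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Cat, CAt, 'cat']
        System.out.println(capitalizeAll(List.of("cat", "cAt", "'cat'")));
    }
}
```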
         
      • repeat

        default StringColumn repeat(int times)
        Repeats each of the column's values the given number of times, concatenating the repetitions of each value into the corresponding element of a new StringColumn.
        Parameters:
        times - the number of repetitions desired
          repeat("", 2)   = ""
          repeat("cat", 3) = "catcatcat"
         
        Returns:
        the new StringColumn
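A minimal sketch of the elementwise repeat, assuming Java 11+ for String.repeat(int). A List<String> stands in for a StringColumn, RepeatSketch is a hypothetical name, and null values are not handled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a List<String> stands in for a StringColumn.
final class RepeatSketch {

    // Replaces each value with `times` concatenated copies of itself.
    static List<String> repeat(List<String> column, int times) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            // String.repeat throws IllegalArgumentException if times is negative.
            out.add(value.repeat(times));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(repeat(List.of("", "cat"), 3));
    }
}
```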
      • substring

        default StringColumn substring(int start,
                                       int end)
      • substring

        default StringColumn substring(int start)
        Returns a column containing, for each string, the substring from start to the end of that string.
        Throws:
        StringIndexOutOfBoundsException - if any string in the column is shorter than start
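The failure mode described above follows from how java.lang.String.substring(int) behaves. In this hypothetical SubstringSketch (a List<String> standing in for the column), a single value shorter than start aborts the whole operation, while a value exactly start characters long yields an empty string. It is an illustration only, not Tablesaw's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a List<String> stands in for a StringColumn.
final class SubstringSketch {

    // Returns the suffix of each value beginning at index `start`.
    // String.substring throws StringIndexOutOfBoundsException when start > length,
    // so one short value makes the whole call fail.
    static List<String> substring(List<String> column, int start) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            out.add(value.substring(start));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(substring(List.of("abcdef", "xyz"), 2));
        try {
            substring(List.of("abcdef", "x"), 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("a value was shorter than start: " + e.getMessage());
        }
    }
}
```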
      • abbreviate

        default StringColumn abbreviate(int maxWidth)
        Abbreviates each String using an ellipsis. This will turn "Now is the time for all good men" into "Now is the time for..."
        Parameters:
        maxWidth - the maximum width of the resulting strings, including the ellipsis.
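A sketch of the per-value rule, modeled on the contract of Apache Commons Lang StringUtils.abbreviate(String, int): values no longer than maxWidth are kept, and longer ones are cut so that the kept prefix plus "..." is exactly maxWidth characters. AbbreviateSketch is hypothetical, and the minimum width of 4 is borrowed from Commons Lang rather than confirmed for Tablesaw. With maxWidth = 22 (a value chosen here, not given in the text) it reproduces the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch modeled on Commons Lang StringUtils.abbreviate(String, int).
final class AbbreviateSketch {
    private static final String MARKER = "...";

    // Keeps short values; otherwise keeps (maxWidth - 3) characters and appends "...".
    static String abbreviate(String s, int maxWidth) {
        // Assumed minimum: room for the marker plus at least one character.
        if (maxWidth < MARKER.length() + 1) {
            throw new IllegalArgumentException("Minimum abbreviation width is " + (MARKER.length() + 1));
        }
        if (s == null || s.length() <= maxWidth) {
            return s;
        }
        return s.substring(0, maxWidth - MARKER.length()) + MARKER;
    }

    // Applies abbreviate to every element of a List<String> standing in for a StringColumn.
    static List<String> abbreviateAll(List<String> column, int maxWidth) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            out.add(abbreviate(value, maxWidth));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints Now is the time for...
        System.out.println(abbreviate("Now is the time for all good men", 22));
    }
}
```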
      • parseInt

        default IntColumn parseInt()
        Returns an IntColumn containing all the values of this string column as integers, assuming all the values are stringified ints in the first place. Otherwise, an exception is thrown.
        Returns:
        An IntColumn containing ints parsed from the strings in this column
      • parseDouble

        default DoubleColumn parseDouble()
        Returns a DoubleColumn containing all the values of this string column as doubles, assuming all the values are stringified doubles in the first place. Otherwise, an exception is thrown.
        Returns:
        A DoubleColumn containing doubles parsed from the strings in this column
      • parseFloat

        default FloatColumn parseFloat()
        Returns a FloatColumn containing all the values of this string column as floats, assuming all the values are stringified floats in the first place. Otherwise, an exception is thrown.
        Returns:
        A FloatColumn containing floats parsed from the strings in this column
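The all-or-nothing contract of parseInt, parseDouble, and parseFloat can be illustrated with the JDK parsers. In this hypothetical ParseSketch, a List<String> stands in for the column and primitive arrays stand in for IntColumn, DoubleColumn, and FloatColumn; the first unparseable value raises a NumberFormatException. How Tablesaw treats missing values is not covered here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: primitive arrays stand in for IntColumn, DoubleColumn and FloatColumn.
final class ParseSketch {

    // Integer.parseInt throws NumberFormatException on the first non-int value.
    static int[] parseInts(List<String> column) {
        int[] out = new int[column.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = Integer.parseInt(column.get(i));
        }
        return out;
    }

    static double[] parseDoubles(List<String> column) {
        double[] out = new double[column.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = Double.parseDouble(column.get(i));
        }
        return out;
    }

    static float[] parseFloats(List<String> column) {
        float[] out = new float[column.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = Float.parseFloat(column.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseInts(List.of("1", "42", "-7"))));
        try {
            parseInts(List.of("1", "two"));
        } catch (NumberFormatException e) {
            System.out.println("not an int: " + e.getMessage());
        }
    }
}
```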
      • padEnd

        default StringColumn padEnd(int minLength,
                                    char padChar)
      • padStart

        default StringColumn padStart(int minLength,
                                      char padChar)
      • distance

        default DoubleColumn distance(Column<String> column2)
        Returns a column containing the Levenshtein distance between each value in this column and the corresponding value in column2.
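The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below uses the standard two-row dynamic-programming algorithm and applies it row by row, returning doubles because distance yields a DoubleColumn. LevenshteinSketch is hypothetical, and rejecting inputs of different sizes is an assumption made here.

```java
import java.util.List;

// Hypothetical sketch: Lists stand in for the two string columns.
final class LevenshteinSketch {

    // Two-row dynamic programming: prev holds row i-1 of the edit-distance table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Row-wise distance between two equally sized columns (size check is an assumption).
    static double[] distance(List<String> left, List<String> right) {
        if (left.size() != right.size()) {
            throw new IllegalArgumentException("columns must have the same size");
        }
        double[] out = new double[left.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = levenshtein(left.get(i), right.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints 3
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```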
      • join

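        default StringColumn join(String separator,
                                  Column<?>... columns)
        Returns a copy of this column in which each value has the corresponding value from each of the given columns appended, with the given separator between the joined values.
        Parameters:
        separator - the string placed between joined values
        columns - the columns whose values are appended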
        Returns:
        the new column
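A sketch of the row-wise join, assuming the separator is placed between this column's value and each appended value, with nothing added at the end. JoinSketch is hypothetical, and List objects stand in for columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Lists stand in for columns of equal size.
final class JoinSketch {

    // Row r becomes base[r] + separator + other1[r] + separator + other2[r] ...
    static List<String> join(List<String> base, String separator, List<?>... others) {
        List<String> out = new ArrayList<>(base.size());
        for (int r = 0; r < base.size(); r++) {
            StringBuilder sb = new StringBuilder(base.get(r));
            for (List<?> other : others) {
                sb.append(separator).append(other.get(r)); // append(Object) uses String.valueOf
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [a-1-x, b-2-y]
        System.out.println(join(List.of("a", "b"), "-", List.of(1, 2), List.of("x", "y")));
    }
}
```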
      • concatenate

        default StringColumn concatenate(Object... stringsToAppend)
        Returns a copy of this column with the string forms of the given objects appended to each element.
        Parameters:
        stringsToAppend - the stringified objects to append
        Returns:
        the new column
      • concatenate

        default StringColumn concatenate(Column<?>... stringColumns)
        Returns a copy of this column with the corresponding value of each column argument appended to each element. getString is used to ensure that the values taken from the argument columns are strings.
        Parameters:
        stringColumns - the string columns to append
        Returns:
        the new column
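The two concatenate overloads differ in where the appended text comes from: the Object... form appends the same suffix to every element, while the Column<?>... form appends row r of each argument column to row r of this one. The hypothetical ConcatenateSketch uses distinct method names (concatenateObjects, concatenateColumns) to keep the two cases apart, with List objects standing in for columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two concatenate overloads.
final class ConcatenateSketch {

    // Object... form: one suffix, built from the objects' string forms, appended to every row.
    static List<String> concatenateObjects(List<String> column, Object... stringsToAppend) {
        StringBuilder suffix = new StringBuilder();
        for (Object o : stringsToAppend) {
            suffix.append(o);
        }
        String s = suffix.toString();
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            out.add(value + s);
        }
        return out;
    }

    // Column<?>... form: row r receives the string form of row r of each argument, no separator.
    static List<String> concatenateColumns(List<String> column, List<?>... others) {
        List<String> out = new ArrayList<>(column.size());
        for (int r = 0; r < column.size(); r++) {
            StringBuilder sb = new StringBuilder(column.get(r));
            for (List<?> other : others) {
                sb.append(other.get(r));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(concatenateObjects(List.of("a", "b"), "-", 1));
        System.out.println(concatenateColumns(List.of("a", "b"), List.of("x", "y")));
    }
}
```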
      • replaceAll

        default StringColumn replaceAll(String[] regexArray,
                                        String replacement)
        Creates a new column, replacing each string in this column with a new string formed by replacing every substring that matches any of the regular expressions in regexArray with replacement.
        Parameters:
        regexArray - the regular expressions to match
        replacement - the replacement string
        Returns:
        the new column
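A sketch of the per-value replacement, assuming the patterns are applied with String.replaceAll in array order, so a later pattern sees the output of the earlier ones. ReplaceAllSketch is hypothetical. Note that String.replaceAll treats $ and \ in the replacement string specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a List<String> stands in for a StringColumn.
final class ReplaceAllSketch {

    // Applies every regex in order to each value, substituting `replacement` for each match.
    static List<String> replaceAll(List<String> column, String[] regexArray, String replacement) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            String result = value;
            for (String regex : regexArray) {
                result = result.replaceAll(regex, replacement);
            }
            out.add(result);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [pet and pet]
        System.out.println(replaceAll(List.of("cat and dog"), new String[] {"cat", "dog"}, "pet"));
    }
}
```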
      • uniqueTokens

        default StringColumn uniqueTokens(String separator)
        Returns a column of arbitrary size containing each unique token in this column, where a token is defined using the given separator and uniqueness is calculated across the entire column.

        NOTE: Unlike other map functions, this method produces a column whose size may differ from that of the source, so the result cannot safely be added to the same table as the source column.

        Parameters:
        separator - the delimiter used in the tokenizing operation
        Returns:
        a new column
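A sketch showing why the result size is independent of the source size: tokens from all rows are pooled and then deduplicated. UniqueTokensSketch is hypothetical; treating separator as a literal string (via Pattern.quote) and keeping first-seen order are assumptions, not documented behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch: a List<String> stands in for a StringColumn.
final class UniqueTokensSketch {

    // Splits every value on the literal separator and keeps each token once, in first-seen order.
    static List<String> uniqueTokens(List<String> column, String separator) {
        Set<String> seen = new LinkedHashSet<>();
        String pattern = Pattern.quote(separator); // assumption: separator is not a regex
        for (String value : column) {
            for (String token : value.split(pattern)) {
                seen.add(token);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // two source rows, three unique tokens: prints [a, b, c]
        System.out.println(uniqueTokens(List.of("a,b,c", "b,c"), ","));
    }
}
```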
      • tokens

        default StringColumn tokens(String separator)
        Returns a column of arbitrary size containing each token in this column, where a token is defined using the given separator.

        NOTE: Unlike other map functions, this method produces a column whose size may differ from that of the source, so the result cannot safely be added to the same table as the source column.

        Parameters:
        separator - the delimiter used in the tokenizing operation
        Returns:
        a new column
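A sketch of how tokens flattens every row into one list, so the result size depends on the total token count rather than the row count. TokensSketch is hypothetical; treating separator as a literal (via Pattern.quote) is an assumption, and String.split drops trailing empty tokens, which may differ from Tablesaw's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: a List<String> stands in for a StringColumn.
final class TokensSketch {

    // Splits every value on the literal separator and collects all tokens, duplicates included.
    static List<String> tokens(List<String> column, String separator) {
        List<String> out = new ArrayList<>();
        String pattern = Pattern.quote(separator); // assumption: separator is not a regex
        for (String value : column) {
            for (String token : value.split(pattern)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // two source rows, three tokens: prints [a, b, b]
        System.out.println(tokens(List.of("a,b", "b"), ","));
    }
}
```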
      • length

        default DoubleColumn length()
        Returns a column containing the character length of each string in this column. The returned column is the same size as the original.
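A minimal sketch of this size-preserving mapping, with a double[] standing in for the returned DoubleColumn. LengthSketch is hypothetical, and it assumes "character length" means String.length(), which counts UTF-16 code units (so a character outside the Basic Multilingual Plane counts as 2).

```java
import java.util.List;

// Hypothetical sketch: a List<String> stands in for the StringColumn, a double[] for the DoubleColumn.
final class LengthSketch {

    // One output entry per input value, so the result has the same size as the input.
    static double[] length(List<String> column) {
        double[] out = new double[column.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = column.get(i).length();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(length(List.of("", "cat", "hello"))));
    }
}
```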
      • tokenizeAndSort

        default StringColumn tokenizeAndSort()
        Splits each string on whitespace and returns the lexicographically sorted result.
        Returns:
        a StringColumn
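A sketch of the per-value transformation, assuming the sorted tokens are rejoined with single spaces. TokenizeAndSortSketch is hypothetical. Arrays.sort orders strings by UTF-16 code unit, so uppercase letters sort before lowercase ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a List<String> stands in for a StringColumn.
final class TokenizeAndSortSketch {

    // Splits one value on runs of whitespace, sorts the tokens, and rejoins them with single spaces.
    static String tokenizeAndSort(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] tokens = trimmed.split("\\s+");
        Arrays.sort(tokens); // lexicographic order by UTF-16 code unit
        return String.join(" ", tokens);
    }

    // Applies tokenizeAndSort to every element; the result has the same size as the input.
    static List<String> tokenizeAndSortAll(List<String> column) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            out.add(tokenizeAndSort(value));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints brown fox quick the
        System.out.println(tokenizeAndSort("the quick  brown fox"));
    }
}
```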
      • tokenizeAndRemoveDuplicates

        default StringColumn tokenizeAndRemoveDuplicates(String separator)