Class StringColumn

All Implemented Interfaces:
Iterable<String>, Comparator<String>, CategoricalColumn<String>, Column<String>, StringFilters, StringMapFunctions, StringReduceUtils, FilterSpec<Selection>, StringFilterSpec<Selection>

public class StringColumn extends AbstractStringColumn<StringColumn>
A column that contains String values. The values are assumed to be 'categorical' rather than free-form text, so they are stored in an encoding that takes advantage of the expected repetition of string values.

Because the MISSING_VALUE for this column type is an empty string, there is little or no need for special handling of missing values in this class's methods.
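
To make the idea of an encoding that exploits repeated values concrete, here is a minimal standalone sketch of dictionary encoding. It is an illustration only and is not Tablesaw's DictionaryMap; the class name DictionaryEncodedStrings and all of its members are hypothetical. Each distinct string is stored once, and every row holds only a small integer code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dictionary encoding; not Tablesaw's implementation. */
class DictionaryEncodedStrings {
  // Maps each distinct value to its integer code.
  private final Map<String, Integer> codeByValue = new HashMap<>();
  // Reverse lookup: the position in this list is the code.
  private final List<String> valueByCode = new ArrayList<>();
  // One code per row, in row order.
  private final List<Integer> codes = new ArrayList<>();

  /** Appends a value, assigning a new code only if the value has not been seen before. */
  void append(String value) {
    Integer code = codeByValue.get(value);
    if (code == null) {
      code = valueByCode.size();
      codeByValue.put(value, code);
      valueByCode.add(value);
    }
    codes.add(code);
  }

  /** Returns the value at the given zero-based row index. */
  String get(int rowIndex) {
    return valueByCode.get(codes.get(rowIndex));
  }

  /** Number of rows. */
  int size() {
    return codes.size();
  }

  /** Number of distinct values stored in the dictionary. */
  int countUnique() {
    return valueByCode.size();
  }

  public static void main(String[] args) {
    DictionaryEncodedStrings column = new DictionaryEncodedStrings();
    for (String s : new String[] {"cat", "dog", "cat", "cat"}) {
      column.append(s);
    }
    System.out.println(column.size());        // 4
    System.out.println(column.countUnique()); // 2
    System.out.println(column.get(2));        // cat
  }
}
```

In this sketch four rows need only two stored strings, which is why the approach pays off when values repeat often.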

  • Method Details

    • valueIsMissing

      public static boolean valueIsMissing(String string)
    • appendMissing

      public StringColumn appendMissing()
      Appends a missing value appropriate to the column
    • valueHash

      public int valueHash(int rowNumber)
      Returns an int suitable as a hash for the value in this column at the given index
    • equals

      public boolean equals(int rowNumber1, int rowNumber2)
      Returns true if the value in this column at rowNumber1 is equal to the value at rowNumber2
    • create

      public static StringColumn create(String name)
    • create

      public static StringColumn create(String name, String... strings)
    • create

      public static StringColumn create(String name, Collection<String> strings)
    • createInternal

      public static StringColumn createInternal(String name, DictionaryMap map)
    • create

      public static StringColumn create(String name, int size)
    • create

      public static StringColumn create(String name, Stream<String> stream)
    • isMissing

      public boolean isMissing(int rowNumber)
      Returns true if the value at rowNumber is missing
    • emptyCopy

      public StringColumn emptyCopy()
      Returns a copy of the receiver with no data. The column name and type are the same.
      Returns:
      an empty copy of this column
    • emptyCopy

      public StringColumn emptyCopy(int rowSize)
      Returns an empty copy of the receiver, with its internal storage initialized to the given row size.
      Parameters:
      rowSize - the initial row size
      Returns:
      a Column
    • sortAscending

      public void sortAscending()
      Sorts my values in ascending order
    • sortDescending

      public void sortDescending()
      Sorts my values in descending order
    • size

      public int size()
      Returns the number of elements (a.k.a. rows or cells) in the column
      Returns:
      size as int
    • get

      public String get(int rowIndex)
      Returns the value at rowIndex in this column. The index is zero-based.
      Parameters:
      rowIndex - index of the row
      Returns:
      value as String
      Throws:
      IndexOutOfBoundsException - if the given rowIndex is not in the column
    • asList

      public List<String> asList()
      Returns a List<String> representation of all the values in this column

      NOTE: Unless you really need a List, consider using the column itself for large datasets, as it uses much less memory.

      Returns:
      values as a list of String.
    • summary

      public Table summary()
      Returns a table containing a ColumnType specific summary of the data in this column
    • countByCategory

      public Table countByCategory()
      Returns a count of the number of elements in each category (i.e., the number of repetitions of each value).
      TODO: This needs to be well tested, especially for IntColumn
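
As a rough illustration of what a per-category count means, the following standalone sketch tallies repetitions of each value in a plain list. It does not use Tablesaw and does not return a Table; the class and method names are hypothetical, and the insertion-ordered map is an assumption made for readable output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of counting by category; not Tablesaw's implementation. */
class CategoryCountSketch {
  /** Returns each distinct value mapped to the number of times it occurs. */
  static Map<String, Integer> countByCategory(List<String> values) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String value : values) {
      // Start at 1 for a new category, otherwise add 1 to the running count.
      counts.merge(value, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(countByCategory(Arrays.asList("cat", "dog", "cat")));
    // {cat=2, dog=1}
  }
}
```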
    • clear

      public void clear()
      Removes all elements.
      TODO: Make this return this column
    • lead

      public StringColumn lead(int n)
      Returns a column of the same type as the receiver, containing the receiver's values offset by -n. For example, if you lead a column containing 2, 3, 4 by 1, you get a column containing 3, 4, NA.
    • lag

      public StringColumn lag(int n)
      Returns a column of the same type and size as the receiver, containing the receiver's values offset by n.

      For example, if you lag a column containing 2, 3, 4 by 1, you get a column containing NA, 2, 3.
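
The offset semantics of lead and lag can be sketched with plain lists. This is a standalone illustration under stated assumptions, not Tablesaw's code: the class LeadLagSketch is hypothetical, and the empty string stands in for the missing value (NA), following the class description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of lead/lag offsets; not Tablesaw's implementation. */
class LeadLagSketch {
  // Assumption: the empty string marks a missing value, as the class description states.
  static final String MISSING = "";

  /** Shifts values down by n rows; vacated rows are filled with MISSING. */
  static List<String> lag(List<String> values, int n) {
    List<String> result = new ArrayList<>(values.size());
    for (int i = 0; i < values.size(); i++) {
      int source = i - n;
      boolean inRange = source >= 0 && source < values.size();
      result.add(inRange ? values.get(source) : MISSING);
    }
    return result;
  }

  /** Leading by n is the same as lagging by -n. */
  static List<String> lead(List<String> values, int n) {
    return lag(values, -n);
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("2", "3", "4");
    System.out.println(lag(values, 1));  // [, 2, 3]
    System.out.println(lead(values, 1)); // [3, 4, ]
  }
}
```

The result always has the same size as the input, so it can sit alongside the original column row for row.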

    • set

      public StringColumn set(Selection rowSelection, String newValue)
      Conditionally updates this column, replacing current values with newValue for all rows where the current value matches the selection criteria.

      Examples:
      myCatColumn.set(myCatColumn.isEqualTo("Cat"), "Dog"); // no more cats
      myCatColumn.set(myCatColumn.isMissing(), "Fox"); // no more missing values

      Specified by:
      set in interface Column<String>
      Overrides:
      set in class AbstractColumn<StringColumn,String>
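
The two-step pattern used by the selection-based set (first build a selection of matching rows, then overwrite those rows) can be sketched without the library. Modeling the selection as a java.util.BitSet is an assumption made for this illustration, and the class ConditionalSetSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration of a selection-driven update; not Tablesaw's implementation. */
class ConditionalSetSketch {
  /** Marks every row whose value equals the target. */
  static BitSet isEqualTo(List<String> values, String target) {
    BitSet selection = new BitSet(values.size());
    for (int i = 0; i < values.size(); i++) {
      if (values.get(i).equals(target)) {
        selection.set(i);
      }
    }
    return selection;
  }

  /** Overwrites the selected rows in place and returns the same list. */
  static List<String> set(List<String> values, BitSet selection, String newValue) {
    for (int i = selection.nextSetBit(0); i >= 0; i = selection.nextSetBit(i + 1)) {
      values.set(i, newValue);
    }
    return values;
  }

  public static void main(String[] args) {
    List<String> pets = new ArrayList<>(Arrays.asList("Cat", "Dog", "", "Cat"));
    set(pets, isEqualTo(pets, "Cat"), "Dog"); // no more cats
    set(pets, isEqualTo(pets, ""), "Fox");    // empty string treated as missing
    System.out.println(pets); // [Dog, Dog, Fox, Dog]
  }
}
```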
    • set

      public StringColumn set(int rowIndex, String stringValue)
      Sets the value at rowIndex to the given value and returns this column
    • countUnique

      public int countUnique()
      Returns the count of unique values in this column.
      Returns:
      the number of unique values as int
    • contains

      public boolean contains(String aString)
      Returns true if this column contains a cell with the given string, and false otherwise
      Parameters:
      aString - the value to look for
      Returns:
      true if contains, false otherwise
    • setMissing

      public StringColumn setMissing(int i)
      Sets the value at index i to the missing-value indicator for this column type, and returns this column
    • addAll

      public StringColumn addAll(List<String> stringValues)
      Adds all the strings in the list to this column
      Parameters:
      stringValues - a list of values
    • appendCell

      public StringColumn appendCell(String object)
      Adds one element to the bottom of this column and sets its value to the parsed value of the given String. Parsing is type-specific.
    • appendCell

      public StringColumn appendCell(String object, AbstractColumnParser<?> parser)
      Adds one element to the bottom of this column and sets its value to the parsed value of the given String, as performed by the given parser
    • rowComparator

      public it.unimi.dsi.fastutil.ints.IntComparator rowComparator()
      Returns an IntComparator for sorting my rows
    • isEmpty

      public boolean isEmpty()
      Returns true if the column has no data
      Returns:
      true if empty, false if not
    • isEqualTo

      public Selection isEqualTo(String string)
    • isNotEqualTo

      public Selection isNotEqualTo(String string)
    • getDummies

      public List<BooleanColumn> getDummies()
      Returns a list of boolean columns suitable for use as dummy variables in, for example, regression analysis. A column of categorical data must be encoded as a list of columns, such that each column represents a single category and indicates whether it is present (1) or not present (0).
      Returns:
      a list of BooleanColumn
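
The dummy-variable encoding described above can be sketched with plain Java collections. This is a standalone illustration, not Tablesaw's implementation: the class DummyVariableSketch is hypothetical, and boolean arrays stand in for BooleanColumn, with true meaning present (1) and false meaning not present (0).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dummy-variable encoding; not Tablesaw's implementation. */
class DummyVariableSketch {
  /** Returns one indicator array per category; entry i is true when row i holds that category. */
  static Map<String, boolean[]> dummies(List<String> values) {
    Map<String, boolean[]> result = new LinkedHashMap<>();
    for (int row = 0; row < values.size(); row++) {
      // Create the indicator (all false) the first time a category is seen.
      boolean[] indicator =
          result.computeIfAbsent(values.get(row), key -> new boolean[values.size()]);
      indicator[row] = true;
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, boolean[]> encoded = dummies(Arrays.asList("red", "blue", "red"));
    for (Map.Entry<String, boolean[]> entry : encoded.entrySet()) {
      System.out.println(entry.getKey() + "=" + Arrays.toString(entry.getValue()));
    }
    // red=[true, false, true]
    // blue=[false, true, false]
  }
}
```

Every row is true in exactly one indicator, which is the property that makes the encoding usable as regression input.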
    • unique

      public StringColumn unique()
      Returns a new Column containing all the unique values in this column
      Returns:
      a column with unique values.
    • asDoubleColumn

      public DoubleColumn asDoubleColumn()
    • where

      public StringColumn where(Selection selection)
      Returns a new column containing the subset referenced by the Selection
    • copy

      public StringColumn copy()
      Returns a deep copy of the receiver
      Returns:
      a Column
    • append

      public StringColumn append(Column<String> column)
      Appends all the values in the argument to the bottom of this column and returns this column
    • countMissing

      public int countMissing()
      Returns the count of missing values in this column
      Returns:
      missing values as int
    • removeMissing

      public StringColumn removeMissing()
      Returns a copy of this column with the missing values removed
    • iterator

      public Iterator<String> iterator()
    • asSet

      public Set<String> asSet()
      Description copied from interface: Column
      Returns a Set containing all the unique values in this column
    • asBytes

      public byte[] asBytes(int rowNumber)
      Returns the contents of the cell at rowNumber as a byte[]
      Parameters:
      rowNumber - index of the row
      Returns:
      content as byte[]
    • getDouble

      public double getDouble(int i)
    • asDoubleArray

      public double[] asDoubleArray()
    • append

      public StringColumn append(String value)
      Appends the given value to the bottom of this column. Added for naming consistency with all other columns.
    • appendObj

      public StringColumn appendObj(Object obj)
      Appends the given value to the bottom of this column and returns this column
    • isIn

      public Selection isIn(String... strings)
    • isIn

      public Selection isIn(Collection<String> strings)
    • isNotIn

      public Selection isNotIn(String... strings)
    • isNotIn

      public Selection isNotIn(Collection<String> strings)
    • firstIndexOf

      public int firstIndexOf(String value)
    • countOccurrences

      public int countOccurrences(String value)
    • asObjectArray

      public String[] asObjectArray()
      Returns an array of objects as appropriate for my type of column
    • asStringColumn

      public StringColumn asStringColumn()
      Returns a StringColumn consisting of the (unformatted) String representation of this column's values
      Specified by:
      asStringColumn in interface Column<String>
      Overrides:
      asStringColumn in class AbstractColumn<StringColumn,String>
      Returns:
      a StringColumn built using the column's Column.getUnformattedString(int) method
    • asTextColumn

      public TextColumn asTextColumn()
    • getDictionary

      public DictionaryMap getDictionary()
      For tablesaw internal use only