Package tech.tablesaw.api
Class StringColumn
- java.lang.Object
-
- tech.tablesaw.columns.AbstractColumn<StringColumn,String>
-
- tech.tablesaw.api.StringColumn
-
- All Implemented Interfaces:
Iterable<String>, Comparator<String>, CategoricalColumn<String>, Column<String>, StringFilters, StringMapFunctions, StringReduceUtils, FilterSpec<Selection>, StringFilterSpec<Selection>
public class StringColumn extends AbstractColumn<StringColumn,String> implements CategoricalColumn<String>, StringFilters, StringMapFunctions, StringReduceUtils
A column that contains String values. They are assumed to be 'categorical' rather than free-form text, so they are stored in an encoding that takes advantage of the expected repetition of string values.
Because the MISSING_VALUE for this column type is an empty string, there is little or no need for special handling of missing values in this class's methods.
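The encoding described above can be pictured as a dictionary that maps each distinct string to a small integer key, so that the column stores one key per row rather than one string per row. The following is a minimal, standard-library-only sketch of that idea. The class `DictionaryEncodingSketch` and its methods are hypothetical names chosen for illustration; this is not Tablesaw's DictionaryMap implementation, whose internals this page does not describe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dictionary encoding; not Tablesaw code. */
final class DictionaryEncodingSketch {
    private final Map<String, Integer> keyByValue = new HashMap<>();
    private final List<String> valueByKey = new ArrayList<>();
    private final List<Integer> keys = new ArrayList<>(); // one key per row

    /** Appends a value, reusing the existing key if the string was seen before. */
    void append(String value) {
        Integer key = keyByValue.get(value);
        if (key == null) {
            key = valueByKey.size();
            keyByValue.put(value, key);
            valueByKey.add(value);
        }
        keys.add(key);
    }

    /** Decodes the key stored at the given zero-based row. */
    String get(int row) {
        return valueByKey.get(keys.get(row));
    }

    /** Number of rows. */
    int size() {
        return keys.size();
    }

    /** Number of distinct strings, which is the size of the dictionary. */
    int countUnique() {
        return valueByKey.size();
    }

    public static void main(String[] args) {
        DictionaryEncodingSketch column = new DictionaryEncodingSketch();
        for (String s : new String[] {"cat", "dog", "cat", "cat"}) {
            column.append(s);
        }
        System.out.println(column.size() + " rows, " + column.countUnique() + " distinct strings");
    }
}
```

With highly repetitive data, the dictionary stays small while the per-row storage is a fixed-width key, which is the kind of saving the class description alludes to.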
-
-
Field Summary
-
Fields inherited from class tech.tablesaw.columns.AbstractColumn
DEFAULT_ARRAY_SIZE
-
-
Method Summary
Modifier and Type | Method | Description
StringColumn | addAll(List<String> stringValues) | Add all the strings in the list to this column
StringColumn | append(String value) | Added for naming consistency with all other columns
StringColumn | append(Column<String> column) | Appends all the values in the argument to the bottom of this column and returns this column
Column<String> | append(Column<String> column, int row) | Appends the value at the given row in the given column to the bottom of this column and returns this column
StringColumn | appendCell(String object) | Add one element to the bottom of this column and set its value to the parsed value of the given String.
StringColumn | appendCell(String object, AbstractColumnParser<?> parser) | Add one element to the bottom of this column and set its value to the parsed value of the given String, as performed by the given parser
StringColumn | appendMissing() | Appends a missing value appropriate to the column
StringColumn | appendObj(Object obj) | Appends the given value to the bottom of this column and returns this column
byte[] | asBytes(int rowNumber) | Returns the contents of the cell at rowNumber as a byte[]
double[] | asDoubleArray() |
DoubleColumn | asDoubleColumn() |
List<String> | asList() | Returns a List<String> representation of all the values in this column
String[] | asObjectArray() | Returns an array of objects as appropriate for my type of column
Set<String> | asSet() | Returns a Set containing all the unique values in this column
StringColumn | asStringColumn() | Returns a StringColumn consisting of the (unformatted) String representation of this column's values
List<String> | bottom(int n) | Returns the smallest ("bottom") n values in the column
int | byteSize() | Returns the width of a cell in this column, in bytes.
void | clear() | Removes all elements. TODO: Make this return this column
int | compare(String o1, String o2) |
boolean | contains(String aString) | Returns true if this column contains a cell with the given string, and false otherwise
StringColumn | copy() | Returns a deep copy of the receiver
Table | countByCategory() | Returns a count of the number of elements in each category (i.e., the number of repetitions of each value). TODO: This needs to be well tested, especially for IntColumn
int | countMissing() | Returns the count of missing values in this column
int | countOccurrences(String value) |
int | countUnique() | Returns the count of unique values in this column.
static StringColumn | create(String name) |
static StringColumn | create(String name, int size) |
static StringColumn | create(String name, String... strings) |
static StringColumn | create(String name, Collection<String> strings) |
static StringColumn | create(String name, Stream<String> stream) |
static StringColumn | createInternal(String name, DictionaryMap map) |
StringColumn | emptyCopy() | Returns a copy of the receiver with no data.
StringColumn | emptyCopy(int rowSize) | Returns an empty copy of the receiver, with its internal storage initialized to the given row size.
boolean | equals(int rowNumber1, int rowNumber2) | Returns true if the value in this column at rowNumber1 is equal to the value at rowNumber2
int | firstIndexOf(String value) |
String | get(int rowIndex) | Returns the value at rowIndex in this column.
DictionaryMap | getDictionary() | For Tablesaw internal use. Note: this method returns null if the stringDataType is TEXTUAL
double | getDouble(int i) |
List<BooleanColumn> | getDummies() | Returns a list of boolean columns suitable for use as dummy variables in, for example, regression analysis, where a column of categorical data must be encoded as a list of columns, such that each column represents a single category and indicates whether it is present (1) or not present (0)
StringColumnFormatter | getPrintFormatter() | Returns the current StringColumnFormatter.
String | getString(int row) | Returns a string representation of the value at the given row.
String | getUnformattedString(int row) | Returns a String representation of the value at the given row, without any formatting applied
boolean | isEmpty() | Returns true if the column has no data
Selection | isEqualTo(String string) |
Selection | isIn(String... strings) |
Selection | isIn(Collection<String> strings) |
Selection | isMissing() | Returns a selection containing an index for every missing value in this column
boolean | isMissing(int rowNumber) | Returns true if the value at rowNumber is missing
Selection | isNotEqualTo(String string) |
Selection | isNotIn(String... strings) |
Selection | isNotIn(Collection<String> strings) |
Selection | isNotMissing() | Returns a selection containing an index for every non-missing value in this column
Iterator<String> | iterator() |
StringColumn | lag(int n) | Returns a column of the same type and size as the receiver, containing the receiver's values offset by n.
StringColumn | lead(int n) | Returns a column of the same type as the receiver, containing the receiver's values offset by -n. For example, if you lead a column containing 2, 3, 4 by 1, you get a column containing 3, 4, NA.
StringColumn | removeMissing() | Returns a copy of this column with the missing values removed
it.unimi.dsi.fastutil.ints.IntComparator | rowComparator() | Returns an IntComparator for sorting my rows
StringColumn | set(int rowIndex, String stringValue) | Sets the value at rowIndex to the given value and returns this column
Column<String> | set(int row, Column<String> column, int sourceRow) | Sets the value at row to the value at sourceRow in the given column and returns this column
StringColumn | set(Selection rowSelection, String newValue) | Conditionally update this column, replacing current values with newValue for all rows where the current value matches the selection criteria
StringColumn | setMissing(int i) | Sets the value at index i to the missing-value indicator for this column type, and returns this column
void | setPrintFormatter(StringColumnFormatter formatter) | Sets a StringColumnFormatter, which will be used to format the display of data from this column when it is printed (using, for example, Table:print()) and optionally when written to a text file like a CSV.
int | size() | Returns the number of elements (a.k.a. rows or cells) in the column
void | sortAscending() | Sorts my values in ascending order
void | sortDescending() | Sorts my values in descending order
Table | summary() | Returns a table containing a ColumnType specific summary of the data in this column
List<String> | top(int n) | Returns the largest ("top") n values in the column
StringColumn | unique() | Returns a new Column containing all the unique values in this column
int | valueHash(int rowNumber) | Returns an int suitable as a hash for the value in this column at the given index
static boolean | valueIsMissing(String string) |
StringColumn | where(Selection selection) | Returns a new column containing the subset referenced by the Selection
-
Methods inherited from class tech.tablesaw.columns.AbstractColumn
filter, first, indexOf, inRange, last, lastIndexOf, map, max, min, name, parser, sampleN, sampleX, set, setName, setParser, sorted, subset, toString, type
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface tech.tablesaw.columns.Column
allMatch, anyMatch, columnWidth, count, count, filter, first, indexOf, inRange, interpolate, last, lastIndexOf, map, map, mapInto, max, max, min, min, name, noneMatch, parser, print, reduce, reduce, rolling, sampleN, sampleX, set, set, set, setMissingTo, setName, setParser, sorted, subset, title, type
-
Methods inherited from interface java.util.Comparator
equals, reversed, thenComparing, thenComparing, thenComparing, thenComparingDouble, thenComparingInt, thenComparingLong
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Methods inherited from interface tech.tablesaw.columns.strings.StringFilters
containsString, endsWith, equalsIgnoreCase, equalsIgnoreCase, eval, eval, eval, eval, isAlpha, isAlphaNumeric, isEmptyString, isEqualTo, isIn, isLongerThan, isLowerCase, isNotEqualTo, isNotIn, isNumeric, isShorterThan, isUpperCase, lengthEquals, matchesRegex, startsWith, startsWith
-
Methods inherited from interface tech.tablesaw.columns.strings.StringMapFunctions
abbreviate, capitalize, commonPrefix, commonSuffix, concatenate, concatenate, countTokens, distance, format, join, length, lowerCase, name, padEnd, padStart, parseDouble, parseFloat, parseInt, repeat, replaceAll, replaceAll, replaceFirst, substring, substring, tokenizeAndRemoveDuplicates, tokenizeAndSort, tokenizeAndSort, tokens, trim, uniqueTokens, upperCase
-
Methods inherited from interface tech.tablesaw.columns.strings.StringReduceUtils
appendAll, appendAll
-
-
-
-
Method Detail
-
valueIsMissing
public static boolean valueIsMissing(String string)
-
appendMissing
public StringColumn appendMissing()
Appends a missing value appropriate to the column
- Specified by:
appendMissing in interface Column<String>
-
valueHash
public int valueHash(int rowNumber)
Returns an int suitable as a hash for the value in this column at the given index
-
equals
public boolean equals(int rowNumber1, int rowNumber2)
Returns true if the value in this column at rowNumber1 is equal to the value at rowNumber2
-
create
public static StringColumn create(String name)
-
create
public static StringColumn create(String name, String... strings)
-
create
public static StringColumn create(String name, Collection<String> strings)
-
createInternal
public static StringColumn createInternal(String name, DictionaryMap map)
-
create
public static StringColumn create(String name, int size)
-
create
public static StringColumn create(String name, Stream<String> stream)
-
setPrintFormatter
public void setPrintFormatter(StringColumnFormatter formatter)
Sets a StringColumnFormatter, which will be used to format the display of data from this column when it is printed (using, for example, Table:print()) and optionally when written to a text file like a CSV.
-
getPrintFormatter
public StringColumnFormatter getPrintFormatter()
Returns the current StringColumnFormatter.
-
isMissing
public boolean isMissing(int rowNumber)
Returns true if the value at rowNumber is missing
-
emptyCopy
public StringColumn emptyCopy()
Returns a copy of the receiver with no data. The column name and type are the same.
- Specified by:
emptyCopy in interface Column<String>
- Specified by:
emptyCopy in class AbstractColumn<StringColumn,String>
- Returns:
an empty copy of Column
-
emptyCopy
public StringColumn emptyCopy(int rowSize)
Returns an empty copy of the receiver, with its internal storage initialized to the given row size.
-
sortAscending
public void sortAscending()
Sorts my values in ascending order
- Specified by:
sortAscending in interface Column<String>
-
sortDescending
public void sortDescending()
Sorts my values in descending order
- Specified by:
sortDescending in interface Column<String>
-
size
public int size()
Returns the number of elements (a.k.a. rows or cells) in the column
- Specified by:
size in interface Column<String>
- Specified by:
size in interface StringFilters
- Specified by:
size in interface StringMapFunctions
- Specified by:
size in interface StringReduceUtils
- Returns:
size as int
-
get
public String get(int rowIndex)
Returns the value at rowIndex in this column. The index is zero-based.
- Specified by:
get in interface Column<String>
- Specified by:
get in interface StringFilters
- Parameters:
rowIndex - index of the row
- Returns:
value as String
- Throws:
IndexOutOfBoundsException - if the given rowIndex is not in the column
-
asList
public List<String> asList()
Returns a List<String> representation of all the values in this column.
NOTE: Unless you really need a list, consider using the column itself for large datasets, as it uses much less memory.
-
summary
public Table summary()
Returns a table containing a ColumnType specific summary of the data in this column
-
countByCategory
public Table countByCategory()
Returns a count of the number of elements in each category (i.e., the number of repetitions of each value). TODO: This needs to be well tested, especially for IntColumn
- Specified by:
countByCategory in interface CategoricalColumn<String>
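To make "the number of repetitions of each value" concrete, here is a minimal standard-library sketch of the counting step. It is an illustration only: the real method returns a Table, whereas this hypothetical `CountByCategorySketch.countByCategory` helper returns a plain Map from category to count, and nothing here is taken from Tablesaw's source.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of per-category counting; not Tablesaw code. */
final class CountByCategorySketch {

    /** Counts how many times each distinct string occurs, in first-seen order. */
    static Map<String, Integer> countByCategory(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the current count
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByCategory(List.of("cat", "dog", "cat", "fox", "cat")));
    }
}
```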
-
clear
public void clear()
Removes all elements. TODO: Make this return this column
-
lead
public StringColumn lead(int n)
Returns a column of the same type as the receiver, containing the receiver's values offset by -n. For example, if you lead a column containing 2, 3, 4 by 1, you get a column containing 3, 4, NA.
-
lag
public StringColumn lag(int n)
Returns a column of the same type and size as the receiver, containing the receiver's values offset by n. For example, if you lag a column containing 2, 3, 4 by 1, you get a column containing NA, 2, 3.
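The offset semantics of lead and lag can be sketched with plain lists. This is a hypothetical, standard-library-only illustration (`LeadLagSketch` is not a Tablesaw class). It assumes that lead(n) is equivalent to lag(-n), as the "offset by -n" wording suggests, and it uses the empty string as the missing value, following the class description, where the examples above write NA.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of lead/lag offsets; not Tablesaw code. */
final class LeadLagSketch {
    /** Assumed missing-value marker: the empty string, per the class description. */
    static final String MISSING = "";

    /** Shifts values down by n rows (or up, if n is negative), padding with MISSING. */
    static List<String> lag(List<String> values, int n) {
        int size = values.size();
        List<String> result = new ArrayList<>(size);
        for (int row = 0; row < size; row++) {
            int source = row - n; // the row whose value lands at this position
            result.add(source >= 0 && source < size ? values.get(source) : MISSING);
        }
        return result;
    }

    /** Assumption: leading by n is the same as lagging by -n. */
    static List<String> lead(List<String> values, int n) {
        return lag(values, -n);
    }

    public static void main(String[] args) {
        List<String> values = List.of("2", "3", "4");
        System.out.println("lag 1:  " + lag(values, 1));
        System.out.println("lead 1: " + lead(values, 1));
    }
}
```

In both directions the result has the same number of rows as the input; values shifted past either end are dropped and the vacated positions are filled with the missing marker.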
-
set
public StringColumn set(Selection rowSelection, String newValue)
Conditionally update this column, replacing current values with newValue for all rows where the current value matches the selection criteria.
Examples:
myCatColumn.set(myCatColumn.isEqualTo("Cat"), "Dog"); // no more cats
myCatColumn.set(myCatColumn.isMissing(), "Fox"); // no more missing values
- Specified by:
set in interface Column<String>
- Overrides:
set in class AbstractColumn<StringColumn,String>
-
set
public StringColumn set(int rowIndex, String stringValue)
Sets the value at rowIndex to the given value and returns this column
-
countUnique
public int countUnique()
Returns the count of unique values in this column.
- Specified by:
countUnique in interface Column<String>
- Returns:
unique values as int
-
contains
public boolean contains(String aString)
Returns true if this column contains a cell with the given string, and false otherwise
-
setMissing
public StringColumn setMissing(int i)
Sets the value at index i to the missing-value indicator for this column type, and returns this column
- Specified by:
setMissing in interface Column<String>
-
addAll
public StringColumn addAll(List<String> stringValues)
Add all the strings in the list to this column
- Parameters:
stringValues - a list of values
-
appendCell
public StringColumn appendCell(String object)
Add one element to the bottom of this column and set its value to the parsed value of the given String. Parsing is type-specific.
- Specified by:
appendCell in interface Column<String>
-
appendCell
public StringColumn appendCell(String object, AbstractColumnParser<?> parser)
Add one element to the bottom of this column and set its value to the parsed value of the given String, as performed by the given parser
- Specified by:
appendCell in interface Column<String>
-
rowComparator
public it.unimi.dsi.fastutil.ints.IntComparator rowComparator()
Returns an IntComparator for sorting my rows
- Specified by:
rowComparator in interface Column<String>
-
isMissing
public Selection isMissing()
Description copied from interface: Column
Returns a selection containing an index for every missing value in this column
- Specified by:
isMissing in interface Column<String>
- Specified by:
isMissing in interface FilterSpec<Selection>
- Specified by:
isMissing in interface StringFilters
-
isNotMissing
public Selection isNotMissing()
Description copied from interface: Column
Returns a selection containing an index for every non-missing value in this column
- Specified by:
isNotMissing in interface Column<String>
- Specified by:
isNotMissing in interface FilterSpec<Selection>
- Specified by:
isNotMissing in interface StringFilters
-
isEmpty
public boolean isEmpty()
Returns true if the column has no data
-
isEqualTo
public Selection isEqualTo(String string)
- Specified by:
isEqualTo in interface StringFilters
- Specified by:
isEqualTo in interface StringFilterSpec<Selection>
-
isNotEqualTo
public Selection isNotEqualTo(String string)
- Specified by:
isNotEqualTo in interface StringFilters
- Specified by:
isNotEqualTo in interface StringFilterSpec<Selection>
-
getDummies
public List<BooleanColumn> getDummies()
Returns a list of boolean columns suitable for use as dummy variables in, for example, regression analysis, where a column of categorical data must be encoded as a list of columns, such that each column represents a single category and indicates whether it is present (1) or not present (0).
- Returns:
a list of BooleanColumn
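The dummy-variable encoding described here can be illustrated without the library. The sketch below is hypothetical and standard-library-only (`DummyEncodingSketch` is not a Tablesaw class): it produces one boolean array per distinct category, standing in for the BooleanColumn objects the real method returns, with true playing the role of "present (1)".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dummy (one-hot) encoding; not Tablesaw code. */
final class DummyEncodingSketch {

    /** Returns one boolean array per category; element i is true if row i has that category. */
    static Map<String, boolean[]> dummies(List<String> values) {
        Map<String, boolean[]> result = new LinkedHashMap<>();
        for (int row = 0; row < values.size(); row++) {
            // Create the all-false indicator array the first time a category is seen
            boolean[] flags =
                result.computeIfAbsent(values.get(row), category -> new boolean[values.size()]);
            flags[row] = true;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, boolean[]> encoded = dummies(List.of("red", "green", "red"));
        encoded.forEach((category, flags) ->
            System.out.println(category + " -> " + Arrays.toString(flags)));
    }
}
```

Each row is marked true in exactly one of the arrays, which is what makes the result usable as a set of indicator variables.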
-
unique
public StringColumn unique()
Returns a new Column containing all the unique values in this column
-
asDoubleColumn
public DoubleColumn asDoubleColumn()
-
where
public StringColumn where(Selection selection)
Returns a new column containing the subset referenced by the Selection
-
copy
public StringColumn copy()
Returns a deep copy of the receiver
-
append
public StringColumn append(Column<String> column)
Appends all the values in the argument to the bottom of this column and returns this column
-
countMissing
public int countMissing()
Returns the count of missing values in this column
- Specified by:
countMissing in interface Column<String>
- Returns:
missing values as int
-
removeMissing
public StringColumn removeMissing()
Returns a copy of this column with the missing values removed
- Specified by:
removeMissing in interface Column<String>
-
asSet
public Set<String> asSet()
Description copied from interface: Column
Returns a Set containing all the unique values in this column
-
asBytes
public byte[] asBytes(int rowNumber)
Returns the contents of the cell at rowNumber as a byte[]
-
getDouble
public double getDouble(int i)
-
asDoubleArray
public double[] asDoubleArray()
-
append
public StringColumn append(String value)
Added for naming consistency with all other columns
-
appendObj
public StringColumn appendObj(Object obj)
Appends the given value to the bottom of this column and returns this column
-
isIn
public Selection isIn(String... strings)
- Specified by:
isIn in interface StringFilters
- Specified by:
isIn in interface StringFilterSpec<Selection>
-
isIn
public Selection isIn(Collection<String> strings)
- Specified by:
isIn in interface StringFilters
- Specified by:
isIn in interface StringFilterSpec<Selection>
-
isNotIn
public Selection isNotIn(String... strings)
- Specified by:
isNotIn in interface StringFilters
- Specified by:
isNotIn in interface StringFilterSpec<Selection>
-
isNotIn
public Selection isNotIn(Collection<String> strings)
- Specified by:
isNotIn in interface StringFilters
- Specified by:
isNotIn in interface StringFilterSpec<Selection>
-
firstIndexOf
public int firstIndexOf(String value)
-
countOccurrences
public int countOccurrences(String value)
-
asObjectArray
public String[] asObjectArray()
Returns an array of objects as appropriate for my type of column
- Specified by:
asObjectArray in interface Column<String>
-
asStringColumn
public StringColumn asStringColumn()
Returns a StringColumn consisting of the (unformatted) String representation of this column's values
- Specified by:
asStringColumn in interface Column<String>
- Overrides:
asStringColumn in class AbstractColumn<StringColumn,String>
- Returns:
a StringColumn built using the column's Column.getUnformattedString(int) method
-
getDictionary
@Nullable public DictionaryMap getDictionary()
For Tablesaw internal use. Note: this method returns null if the stringDataType is TEXTUAL
-
getString
public String getString(int row)
Returns a string representation of the value at the given row.
- Specified by:
getString in interface Column<String>
- Specified by:
getString in interface StringMapFunctions
- Parameters:
row - The index of the row.
- Returns:
value as String
-
getUnformattedString
public String getUnformattedString(int row)
Returns a String representation of the value at the given row, without any formatting applied
- Specified by:
getUnformattedString in interface Column<String>
-
top
public List<String> top(int n)
Returns the largest ("top") n values in the column
- Parameters:
n - The maximum number of records to return. The actual number will be smaller if n is greater than the number of observations in the column
- Returns:
A list, possibly empty, of the largest observations
-
bottom
public List<String> bottom(int n)
Returns the smallest ("bottom") n values in the column
- Parameters:
n - The maximum number of records to return. The actual number will be smaller if n is greater than the number of observations in the column
- Returns:
A list, possibly empty, of the smallest n observations
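The "at most n" behaviour of top and bottom can be sketched with a sort and a bounded sublist. This is a hypothetical, standard-library-only illustration (`TopBottomSketch` is not a Tablesaw class). It assumes that "largest" and "smallest" refer to String's natural (lexicographic) ordering, that duplicates are kept, and that n is non-negative; none of these points is confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of top/bottom n selection; not Tablesaw code. */
final class TopBottomSketch {

    /** Returns up to n values in descending natural order. */
    static List<String> top(List<String> values, int n) {
        return firstN(values, n, Comparator.reverseOrder());
    }

    /** Returns up to n values in ascending natural order. */
    static List<String> bottom(List<String> values, int n) {
        return firstN(values, n, Comparator.naturalOrder());
    }

    /** Sorts a copy of the input and keeps the first min(n, size) elements. */
    private static List<String> firstN(List<String> values, int n, Comparator<String> order) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(order);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> values = List.of("pear", "apple", "fig");
        System.out.println("top 2:    " + top(values, 2));
        System.out.println("bottom 5: " + bottom(values, 5));
    }
}
```

Asking for more values than the input holds returns every value, and an empty input yields an empty list, matching the "possibly empty" wording of the return descriptions.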
-
append
public Column<String> append(Column<String> column, int row)
Appends the value at the given row in the given column to the bottom of this column and returns this column
-
set
public Column<String> set(int row, Column<String> column, int sourceRow)
Sets the value at row to the value at sourceRow in the given column and returns this column
-
byteSize
public int byteSize()
Returns the width of a cell in this column, in bytes.
-
compare
public int compare(String o1, String o2)
- Specified by:
compare in interface Comparator<String>
-
-