Package tech.tablesaw.api
Class StringColumn
- java.lang.Object
-
- tech.tablesaw.columns.AbstractColumn<StringColumn,String>
-
- tech.tablesaw.api.StringColumn
-
- All Implemented Interfaces:
Iterable<String>, Comparator<String>, CategoricalColumn<String>, Column<String>, StringFilters, StringMapFunctions, StringReduceUtils, FilterSpec<Selection>, StringFilterSpec<Selection>
public class StringColumn extends AbstractColumn<StringColumn,String> implements CategoricalColumn<String>, StringFilters, StringMapFunctions, StringReduceUtils
A column that contains String values. They are assumed to be 'categorical' rather than free-form text, so they are stored in an encoding that takes advantage of the expected repetition of string values. Because the MISSING_VALUE for this column type is an empty string, there is little or no need for special handling of missing values in this class's methods.
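The encoding mentioned above can be pictured as dictionary encoding: each distinct string is stored once, and every row holds only a small integer code. The following is a minimal stand-alone sketch of that idea in plain Java. It is not Tablesaw's implementation; the class name DictionarySketch and all of its members are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dictionary encoding; not Tablesaw's storage code. */
final class DictionarySketch {
    // Each distinct string is stored once and mapped to an integer code.
    private final Map<String, Integer> codeByValue = new HashMap<>();
    private final List<String> valueByCode = new ArrayList<>();
    // The column itself holds only one integer code per row.
    private final List<Integer> rowCodes = new ArrayList<>();

    /** Appends a value, reusing the existing code if the value was seen before. */
    void append(String value) {
        Integer code = codeByValue.get(value);
        if (code == null) {
            code = valueByCode.size();
            codeByValue.put(value, code);
            valueByCode.add(value);
        }
        rowCodes.add(code);
    }

    /** Decodes the value stored at the given zero-based row. */
    String get(int rowIndex) {
        return valueByCode.get(rowCodes.get(rowIndex));
    }

    int size() {
        return rowCodes.size();
    }

    int countUnique() {
        return valueByCode.size();
    }

    public static void main(String[] args) {
        DictionarySketch column = new DictionarySketch();
        for (String s : new String[] {"cat", "dog", "cat", "cat"}) {
            column.append(s);
        }
        // Four rows are stored, but only two distinct strings are kept.
        System.out.println(column.size() + " rows, " + column.countUnique() + " distinct values");
    }
}
```

With many repeated values, the per-row cost in this sketch is one integer code rather than a full string, which is the kind of saving the class description alludes to.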
-
-
Field Summary
-
Fields inherited from class tech.tablesaw.columns.AbstractColumn
DEFAULT_ARRAY_SIZE
-
-
Method Summary
Modifier and Type, Method, Description
StringColumn addAll(List<String> stringValues)
Adds all the strings in the list to this column.
StringColumn append(String value)
Added for naming consistency with all other columns.
StringColumn append(Column<String> column)
Appends all the values in the argument to the bottom of this column and returns this column.
Column<String> append(Column<String> column, int row)
Appends the value at the given row in the given column to the bottom of this column and returns this column.
StringColumn appendCell(String object)
Adds one element to the bottom of this column and sets its value to the parsed value of the given String.
StringColumn appendCell(String object, AbstractColumnParser<?> parser)
Adds one element to the bottom of this column and sets its value to the parsed value of the given String, as performed by the given parser.
StringColumn appendMissing()
Appends a missing value appropriate to the column.
StringColumn appendObj(Object obj)
Appends the given value to the bottom of this column and returns this column.
byte[] asBytes(int rowNumber)
Returns the contents of the cell at rowNumber as a byte[].
double[] asDoubleArray()
DoubleColumn asDoubleColumn()
List<String> asList()
Returns a List<String> representation of all the values in this column.
String[] asObjectArray()
Returns an array of objects as appropriate for my type of column.
Set<String> asSet()
Returns a Set containing all the unique values in this column.
StringColumn asStringColumn()
Returns a StringColumn consisting of the (unformatted) String representation of this column's values.
List<String> bottom(int n)
Returns the smallest ("bottom") n values in the column.
int byteSize()
Returns the width of a cell in this column, in bytes.
void clear()
Removes all elements. TODO: Make this return this column.
int compare(String o1, String o2)
boolean contains(String aString)
Returns true if this column contains a cell with the given string, and false otherwise.
StringColumn copy()
Returns a deep copy of the receiver.
Table countByCategory()
Returns a count of the number of elements in each category (i.e., the number of repetitions of each value). TODO: This needs to be well tested, especially for IntColumn.
int countMissing()
Returns the count of missing values in this column.
int countOccurrences(String value)
int countUnique()
Returns the count of unique values in this column.
static StringColumn create(String name)
static StringColumn create(String name, int size)
static StringColumn create(String name, String... strings)
static StringColumn create(String name, Collection<String> strings)
static StringColumn create(String name, Stream<String> stream)
static StringColumn createInternal(String name, DictionaryMap map)
StringColumn emptyCopy()
Returns a copy of the receiver with no data.
StringColumn emptyCopy(int rowSize)
Returns an empty copy of the receiver, with its internal storage initialized to the given row size.
boolean equals(int rowNumber1, int rowNumber2)
Returns true if the value in this column at rowNumber1 is equal to the value at rowNumber2.
int firstIndexOf(String value)
String get(int rowIndex)
Returns the value at rowIndex in this column.
DictionaryMap getDictionary()
For Tablesaw internal use. Note: This method returns null if the stringDataType is TEXTUAL.
double getDouble(int i)
List<BooleanColumn> getDummies()
Returns a list of boolean columns suitable for use as dummy variables in, for example, regression analysis, where a column of categorical data must be encoded as a list of columns, such that each column represents a single category and indicates whether it is present (1) or not present (0).
StringColumnFormatter getPrintFormatter()
Returns the current StringColumnFormatter.
String getString(int row)
Returns a string representation of the value at the given row.
String getUnformattedString(int row)
Returns a String representation of the value at the given row, without any formatting applied.
boolean isEmpty()
Returns true if the column has no data.
Selection isEqualTo(String string)
Selection isIn(String... strings)
Selection isIn(Collection<String> strings)
Selection isMissing()
Returns a selection containing an index for every missing value in this column.
boolean isMissing(int rowNumber)
Returns true if the value at rowNumber is missing.
Selection isNotEqualTo(String string)
Selection isNotIn(String... strings)
Selection isNotIn(Collection<String> strings)
Selection isNotMissing()
Returns a selection containing an index for every non-missing value in this column.
Iterator<String> iterator()
StringColumn lag(int n)
Returns a column of the same type and size as the receiver, containing the receiver's values offset by n.
StringColumn lead(int n)
Returns a column of the same type as the receiver, containing the receiver's values offset by -n. For example, if you lead a column containing 2, 3, 4 by 1, you get a column containing 3, 4, NA.
StringColumn removeMissing()
Returns a copy of this column with the missing values removed.
it.unimi.dsi.fastutil.ints.IntComparator rowComparator()
Returns an IntComparator for sorting my rows.
StringColumn set(int rowIndex, String stringValue)
Sets the value at rowIndex to the given value and returns this column.
Column<String> set(int row, Column<String> column, int sourceRow)
Sets the value at row to the value at sourceRow in the given column and returns this column.
StringColumn set(Selection rowSelection, String newValue)
Conditionally updates this column, replacing current values with newValue for all rows where the current value matches the selection criteria.
StringColumn setMissing(int i)
Sets the value at index i to the missing-value indicator for this column type, and returns this column.
void setPrintFormatter(StringColumnFormatter formatter)
Sets a StringColumnFormatter, which will be used to format the display of data from this column when it is printed (using, for example, Table:print()) and optionally when written to a text file like a CSV.
int size()
Returns the number of elements (a.k.a. rows or cells) in the column.
void sortAscending()
Sorts my values in ascending order.
void sortDescending()
Sorts my values in descending order.
Table summary()
Returns a table containing a ColumnType specific summary of the data in this column.
List<String> top(int n)
Returns the largest ("top") n values in the column.
StringColumn unique()
Returns a new Column containing all the unique values in this column.
int valueHash(int rowNumber)
Returns an int suitable as a hash for the value in this column at the given index.
static boolean valueIsMissing(String string)
StringColumn where(Selection selection)
Returns a new column containing the subset referenced by the Selection.
-
Methods inherited from class tech.tablesaw.columns.AbstractColumn
filter, first, indexOf, inRange, last, lastIndexOf, map, max, min, name, parser, sampleN, sampleX, set, setName, setParser, sorted, subset, toString, type
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface tech.tablesaw.columns.Column
allMatch, anyMatch, columnWidth, count, count, filter, first, indexOf, inRange, interpolate, last, lastIndexOf, map, map, mapInto, max, max, min, min, name, noneMatch, parser, print, reduce, reduce, rolling, sampleN, sampleX, set, set, set, setMissingTo, setName, setParser, sorted, subset, title, type
-
Methods inherited from interface java.util.Comparator
equals, reversed, thenComparing, thenComparing, thenComparing, thenComparingDouble, thenComparingInt, thenComparingLong
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Methods inherited from interface tech.tablesaw.columns.strings.StringFilters
containsString, endsWith, equalsIgnoreCase, equalsIgnoreCase, eval, eval, eval, eval, isAlpha, isAlphaNumeric, isEmptyString, isEqualTo, isIn, isLongerThan, isLowerCase, isNotEqualTo, isNotIn, isNumeric, isShorterThan, isUpperCase, lengthEquals, matchesRegex, startsWith, startsWith
-
Methods inherited from interface tech.tablesaw.columns.strings.StringMapFunctions
abbreviate, capitalize, commonPrefix, commonSuffix, concatenate, concatenate, countTokens, distance, format, join, length, lowerCase, name, padEnd, padStart, parseDouble, parseFloat, parseInt, repeat, replaceAll, replaceAll, replaceFirst, substring, substring, tokenizeAndRemoveDuplicates, tokenizeAndSort, tokenizeAndSort, tokens, trim, uniqueTokens, upperCase
-
Methods inherited from interface tech.tablesaw.columns.strings.StringReduceUtils
appendAll, appendAll
-
-
-
-
Method Detail
-
valueIsMissing
public static boolean valueIsMissing(String string)
-
appendMissing
public StringColumn appendMissing()
Appends a missing value appropriate to the column.
- Specified by: appendMissing in interface Column<String>
-
valueHash
public int valueHash(int rowNumber)
Returns an int suitable as a hash for the value in this column at the given index
-
equals
public boolean equals(int rowNumber1, int rowNumber2)
Returns true if the value in this column at rowNumber1 is equal to the value at rowNumber2
-
create
public static StringColumn create(String name)
-
create
public static StringColumn create(String name, String... strings)
-
create
public static StringColumn create(String name, Collection<String> strings)
-
createInternal
public static StringColumn createInternal(String name, DictionaryMap map)
-
create
public static StringColumn create(String name, int size)
-
create
public static StringColumn create(String name, Stream<String> stream)
-
setPrintFormatter
public void setPrintFormatter(StringColumnFormatter formatter)
Sets a StringColumnFormatter, which will be used to format the display of data from this column when it is printed (using, for example, Table:print()) and optionally when written to a text file like a CSV.
-
getPrintFormatter
public StringColumnFormatter getPrintFormatter()
Returns the current StringColumnFormatter.
-
isMissing
public boolean isMissing(int rowNumber)
Returns true if the value at rowNumber is missing
-
emptyCopy
public StringColumn emptyCopy()
Returns a copy of the receiver with no data. The column name and type are the same.
- Specified by: emptyCopy in interface Column<String>
- Specified by: emptyCopy in class AbstractColumn<StringColumn,String>
- Returns: an empty copy of Column
-
emptyCopy
public StringColumn emptyCopy(int rowSize)
Returns an empty copy of the receiver, with its internal storage initialized to the given row size.
-
sortAscending
public void sortAscending()
Sorts my values in ascending order.
- Specified by: sortAscending in interface Column<String>
-
sortDescending
public void sortDescending()
Sorts my values in descending order.
- Specified by: sortDescending in interface Column<String>
-
size
public int size()
Returns the number of elements (a.k.a. rows or cells) in the column.
- Specified by: size in interface Column<String>
- Specified by: size in interface StringFilters
- Specified by: size in interface StringMapFunctions
- Specified by: size in interface StringReduceUtils
- Returns: size as int
-
get
public String get(int rowIndex)
Returns the value at rowIndex in this column. The index is zero-based.
- Specified by: get in interface Column<String>
- Specified by: get in interface StringFilters
- Parameters: rowIndex - index of the row
- Returns: value as String
- Throws: IndexOutOfBoundsException - if the given rowIndex is not in the column
-
asList
public List<String> asList()
Returns a List<String> representation of all the values in this column. NOTE: Unless you really need a List<String>, consider using the column itself for large datasets, as it uses much less memory.
-
summary
public Table summary()
Returns a table containing a ColumnType specific summary of the data in this column
-
countByCategory
public Table countByCategory()
Returns a count of the number of elements in each category (i.e., the number of repetitions of each value). TODO: This needs to be well tested, especially for IntColumn.
- Specified by: countByCategory in interface CategoricalColumn<String>
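As a rough illustration of what "a count of the number of elements in each category" means, the sketch below tallies repetitions with a plain Java map. It assumes nothing about Tablesaw internals: countByCategory() returns a Table, whereas this hypothetical CategoryCountSketch helper returns a Map purely to show the counting logic.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; shows the counting idea only, not Tablesaw's Table-based result. */
final class CategoryCountSketch {
    /** Counts how many times each distinct value occurs, in order of first appearance. */
    static Map<String, Integer> countByCategory(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            // merge() inserts 1 for a new key or adds 1 to the existing count.
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByCategory(Arrays.asList("cat", "dog", "cat")));
    }
}
```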
-
clear
public void clear()
Removes all elements. TODO: Make this return this column.
-
lead
public StringColumn lead(int n)
Returns a column of the same type as the receiver, containing the receiver's values offset by -n. For example, if you lead a column containing 2, 3, 4 by 1, you get a column containing 3, 4, NA.
-
lag
public StringColumn lag(int n)
Returns a column of the same type and size as the receiver, containing the receiver's values offset by n. For example, if you lag a column containing 2, 3, 4 by 1, you get a column containing NA, 2, 3.
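The lead and lag offsets described above can be sketched with plain lists. This is a hedged illustration, not Tablesaw's code: ShiftSketch and its methods are hypothetical, and the empty string stands in for NA because the class description gives the empty string as this column type's missing value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating lead/lag offsets; not Tablesaw code. */
final class ShiftSketch {
    // Stand-in for NA: the documented missing value for string columns.
    static final String MISSING = "";

    /** Row i of the result holds the value from row i - n, or MISSING if that row does not exist. */
    static List<String> lag(List<String> values, int n) {
        List<String> result = new ArrayList<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            int source = i - n;
            boolean inRange = source >= 0 && source < values.size();
            result.add(inRange ? values.get(source) : MISSING);
        }
        return result;
    }

    /** Leading by n is the same as lagging by -n. */
    static List<String> lead(List<String> values, int n) {
        return lag(values, -n);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("2", "3", "4");
        System.out.println(lag(values, 1));
        System.out.println(lead(values, 1));
    }
}
```

In this sketch the result always has the same size as the input, matching the 2, 3, 4 examples in the descriptions.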
-
set
public StringColumn set(Selection rowSelection, String newValue)
Conditionally updates this column, replacing current values with newValue for all rows where the current value matches the selection criteria.
Examples:
myCatColumn.set(myCatColumn.isEqualTo("Cat"), "Dog"); // no more cats
myCatColumn.set(myCatColumn.isMissing(), "Fox"); // no more missing values
- Specified by: set in interface Column<String>
- Overrides: set in class AbstractColumn<StringColumn,String>
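The conditional update can be pictured as "test each row, overwrite the ones that match". The sketch below mimics that with a java.util.function.Predicate standing in for a Selection. It is an assumption-laden illustration: ConditionalSetSketch is hypothetical, and a real Selection holds row indexes rather than a per-value test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper; a Predicate stands in for Tablesaw's Selection. */
final class ConditionalSetSketch {
    /** Replaces every value that satisfies the predicate with newValue, in place. */
    static void set(List<String> values, Predicate<String> selection, String newValue) {
        for (int row = 0; row < values.size(); row++) {
            if (selection.test(values.get(row))) {
                values.set(row, newValue);
            }
        }
    }

    public static void main(String[] args) {
        List<String> pets = new ArrayList<>(Arrays.asList("Cat", "", "Bird"));
        set(pets, "Cat"::equals, "Dog");   // no more cats
        set(pets, String::isEmpty, "Fox"); // no more missing (empty) values
        System.out.println(pets);
    }
}
```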
-
set
public StringColumn set(int rowIndex, String stringValue)
Sets the value at rowIndex to the given value and returns this column
-
countUnique
public int countUnique()
Returns the count of unique values in this column.
- Specified by: countUnique in interface Column<String>
- Returns: unique values as int
-
contains
public boolean contains(String aString)
Returns true if this column contains a cell with the given string, and false otherwise
-
setMissing
public StringColumn setMissing(int i)
Sets the value at index i to the missing-value indicator for this column type, and returns this column.
- Specified by: setMissing in interface Column<String>
-
addAll
public StringColumn addAll(List<String> stringValues)
Adds all the strings in the list to this column.
- Parameters: stringValues - a list of values
-
appendCell
public StringColumn appendCell(String object)
Adds one element to the bottom of this column and sets its value to the parsed value of the given String. Parsing is type-specific.
- Specified by: appendCell in interface Column<String>
-
appendCell
public StringColumn appendCell(String object, AbstractColumnParser<?> parser)
Adds one element to the bottom of this column and sets its value to the parsed value of the given String, as performed by the given parser.
- Specified by: appendCell in interface Column<String>
-
rowComparator
public it.unimi.dsi.fastutil.ints.IntComparator rowComparator()
Returns an IntComparator for sorting my rows.
- Specified by: rowComparator in interface Column<String>
-
isMissing
public Selection isMissing()
Description copied from interface: Column
Returns a selection containing an index for every missing value in this column.
- Specified by: isMissing in interface Column<String>
- Specified by: isMissing in interface FilterSpec<Selection>
- Specified by: isMissing in interface StringFilters
-
isNotMissing
public Selection isNotMissing()
Description copied from interface: Column
Returns a selection containing an index for every non-missing value in this column.
- Specified by: isNotMissing in interface Column<String>
- Specified by: isNotMissing in interface FilterSpec<Selection>
- Specified by: isNotMissing in interface StringFilters
-
isEmpty
public boolean isEmpty()
Returns true if the column has no data
-
isEqualTo
public Selection isEqualTo(String string)
- Specified by: isEqualTo in interface StringFilters
- Specified by: isEqualTo in interface StringFilterSpec<Selection>
-
isNotEqualTo
public Selection isNotEqualTo(String string)
- Specified by: isNotEqualTo in interface StringFilters
- Specified by: isNotEqualTo in interface StringFilterSpec<Selection>
-
getDummies
public List<BooleanColumn> getDummies()
Returns a list of boolean columns suitable for use as dummy variables in, for example, regression analysis, where a column of categorical data must be encoded as a list of columns, such that each column represents a single category and indicates whether it is present (1) or not present (0).
- Returns: a list of BooleanColumn
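Dummy-variable (one-hot) encoding as described above can be sketched with plain Java arrays. This is only an illustration under stated assumptions: DummySketch is hypothetical, it returns a Map of boolean arrays rather than a list of BooleanColumn, and the ordering of categories (first appearance) is a choice made here, not something the description specifies.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating dummy-variable encoding; not Tablesaw code. */
final class DummySketch {
    /** Builds one indicator array per distinct category: true where the row holds that category. */
    static Map<String, boolean[]> dummies(List<String> values) {
        Map<String, boolean[]> result = new LinkedHashMap<>();
        for (int row = 0; row < values.size(); row++) {
            // Create an all-false indicator the first time a category is seen.
            boolean[] indicator =
                result.computeIfAbsent(values.get(row), key -> new boolean[values.size()]);
            indicator[row] = true;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, boolean[]> encoded = dummies(Arrays.asList("red", "blue", "red"));
        encoded.forEach((category, flags) ->
            System.out.println(category + " -> " + Arrays.toString(flags)));
    }
}
```

Each row is true in exactly one indicator array, which corresponds to the "present (1) or not present (0)" wording above.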
-
unique
public StringColumn unique()
Returns a new Column containing all the unique values in this column
-
asDoubleColumn
public DoubleColumn asDoubleColumn()
-
where
public StringColumn where(Selection selection)
Returns a new column containing the subset referenced by the Selection
-
copy
public StringColumn copy()
Returns a deep copy of the receiver
-
append
public StringColumn append(Column<String> column)
Appends all the values in the argument to the bottom of this column and returns this column
-
countMissing
public int countMissing()
Returns the count of missing values in this column.
- Specified by: countMissing in interface Column<String>
- Returns: missing values as int
-
removeMissing
public StringColumn removeMissing()
Returns a copy of this column with the missing values removed.
- Specified by: removeMissing in interface Column<String>
-
asSet
public Set<String> asSet()
Description copied from interface: Column
Returns a Set containing all the unique values in this column
-
asBytes
public byte[] asBytes(int rowNumber)
Returns the contents of the cell at rowNumber as a byte[]
-
getDouble
public double getDouble(int i)
-
asDoubleArray
public double[] asDoubleArray()
-
append
public StringColumn append(String value)
Added for naming consistency with all other columns
-
appendObj
public StringColumn appendObj(Object obj)
Appends the given value to the bottom of this column and returns this column
-
isIn
public Selection isIn(String... strings)
- Specified by: isIn in interface StringFilters
- Specified by: isIn in interface StringFilterSpec<Selection>
-
isIn
public Selection isIn(Collection<String> strings)
- Specified by: isIn in interface StringFilters
- Specified by: isIn in interface StringFilterSpec<Selection>
-
isNotIn
public Selection isNotIn(String... strings)
- Specified by: isNotIn in interface StringFilters
- Specified by: isNotIn in interface StringFilterSpec<Selection>
-
isNotIn
public Selection isNotIn(Collection<String> strings)
- Specified by: isNotIn in interface StringFilters
- Specified by: isNotIn in interface StringFilterSpec<Selection>
-
firstIndexOf
public int firstIndexOf(String value)
-
countOccurrences
public int countOccurrences(String value)
-
asObjectArray
public String[] asObjectArray()
Returns an array of objects as appropriate for my type of column.
- Specified by: asObjectArray in interface Column<String>
-
asStringColumn
public StringColumn asStringColumn()
Returns a StringColumn consisting of the (unformatted) String representation of this column's values.
- Specified by: asStringColumn in interface Column<String>
- Overrides: asStringColumn in class AbstractColumn<StringColumn,String>
- Returns: a StringColumn built using the column's Column.getUnformattedString(int) method
-
getDictionary
@Nullable public DictionaryMap getDictionary()
For Tablesaw internal use. Note: This method returns null if the stringDataType is TEXTUAL.
-
getString
public String getString(int row)
Returns a string representation of the value at the given row.
- Specified by: getString in interface Column<String>
- Specified by: getString in interface StringMapFunctions
- Parameters: row - The index of the row.
- Returns: value as String
-
getUnformattedString
public String getUnformattedString(int row)
Returns a String representation of the value at the given row, without any formatting applied.
- Specified by: getUnformattedString in interface Column<String>
-
top
public List<String> top(int n)
Returns the largest ("top") n values in the column.
- Parameters: n - The maximum number of records to return. The actual number will be smaller if n is greater than the number of observations in the column.
- Returns: A list, possibly empty, of the largest observations
-
bottom
public List<String> bottom(int n)
Returns the smallest ("bottom") n values in the column.
- Parameters: n - The maximum number of records to return. The actual number will be smaller if n is greater than the number of observations in the column.
- Returns: A list, possibly empty, of the smallest n observations
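The "at most n values" contract of top and bottom can be sketched as sort-then-truncate. The sketch assumes natural String ordering and keeps duplicates; whether Tablesaw orders or de-duplicates the same way is not stated here, so treat TopBottomSketch and its methods as hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper; assumes natural String ordering, which may differ from Tablesaw's. */
final class TopBottomSketch {
    /** Returns up to n of the largest values, largest first. */
    static List<String> top(List<String> values, int n) {
        return firstN(values, n, Comparator.<String>reverseOrder());
    }

    /** Returns up to n of the smallest values, smallest first. */
    static List<String> bottom(List<String> values, int n) {
        return firstN(values, n, Comparator.<String>naturalOrder());
    }

    private static List<String> firstN(List<String> values, int n, Comparator<String> order) {
        List<String> sorted = new ArrayList<>(values); // copy so the input is not reordered
        sorted.sort(order);
        // Fewer than n values are returned when the input is shorter than n.
        int count = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, count));
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("pear", "apple", "fig");
        System.out.println(top(values, 2));
        System.out.println(bottom(values, 5));
    }
}
```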
-
append
public Column<String> append(Column<String> column, int row)
Appends the value at the given row in the given column to the bottom of this column and returns this column
-
set
public Column<String> set(int row, Column<String> column, int sourceRow)
Sets the value at row to the value at sourceRow in the given column and returns this column
-
byteSize
public int byteSize()
Returns the width of a cell in this column, in bytes.
-
compare
public int compare(String o1, String o2)
- Specified by: compare in interface Comparator<String>
-
-