Package tech.tablesaw.api
Class StringColumn
- All Implemented Interfaces:
Iterable<String>, Comparator<String>, CategoricalColumn<String>, Column<String>, StringFilters, StringMapFunctions, StringReduceUtils, FilterSpec<Selection>, StringFilterSpec<Selection>
A column that contains String values. They are assumed to be 'categorical' rather than free-form
text, so are stored in an encoding that takes advantage of the expected repetition of string
values.
Because the MISSING_VALUE for this column type is an empty string, there is little or no need for special handling of missing values in this class's methods.
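To make the encoding idea above concrete, here is a minimal plain-Java sketch of dictionary encoding. It is an illustration under assumptions, not Tablesaw's implementation: the class DictionaryEncodedStrings and all of its methods are hypothetical names, and the layout of the real DictionaryMap is not described on this page. The sketch only shows why repeated values are cheap to store and why an empty-string missing value needs no special case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of dictionary encoding; not Tablesaw's DictionaryMap. */
final class DictionaryEncodedStrings {
  // Each distinct string is stored once and mapped to a small integer code.
  private final Map<String, Integer> codeByValue = new HashMap<>();
  private final List<String> valueByCode = new ArrayList<>();
  // One code per row: a repeated string costs one extra code, not another String.
  private final List<Integer> rowCodes = new ArrayList<>();

  /** Appends a value, assigning a new code only the first time the value is seen. */
  void append(String value) {
    Integer code = codeByValue.get(value);
    if (code == null) {
      code = valueByCode.size();
      codeByValue.put(value, code);
      valueByCode.add(value);
    }
    rowCodes.add(code);
  }

  /** Decodes the value stored at the given zero-based row. */
  String get(int rowIndex) {
    return valueByCode.get(rowCodes.get(rowIndex));
  }

  int size() {
    return rowCodes.size();
  }

  /** Number of distinct strings seen so far (the empty string counts as one of them). */
  int distinctCount() {
    return valueByCode.size();
  }

  /** Counts rows holding the empty string, which this sketch treats as the missing value. */
  int countMissing() {
    Integer missingCode = codeByValue.get("");
    if (missingCode == null) {
      return 0;
    }
    int count = 0;
    for (int code : rowCodes) {
      if (code == missingCode) {
        count++;
      }
    }
    return count;
  }
}
```

Because the missing value is an ordinary string, it receives a code like any other entry, so appending, reading, and counting it follow the same path as every other value.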
-
Field Summary
Fields inherited from class tech.tablesaw.columns.AbstractColumn
DEFAULT_ARRAY_SIZE, DEFAULT_COLUMN_TYPE_MISMATCH_MESSAGE
-
Method Summary
Modifier and Type, Method: Description
addAll: Add all the strings in the list to this column
append: Added for naming consistency with all other columns
append: Appends all the values in the argument to the bottom of this column and return this column
appendCell(String object): Add one element to the bottom of this column and set its value to the parsed value of the given String.
appendCell(String object, AbstractColumnParser<?> parser): Add one element to the bottom of this column and set its value to the parsed value of the given String, as performed by the given parser
appendMissing: Appends a missing value appropriate to the column
appendObj: Appends the given value to the bottom of this column and return this column
byte[] asBytes(int rowNumber): Returns the contents of the cell at rowNumber as a byte[]
double[] asDoubleArray()
asList(): Returns a List<String> representation of all the values in this column
String[] asObjectArray: Returns an array of objects as appropriate for my type of column
asSet(): Returns a Set containing all the unique values in this column
asStringColumn: Returns a StringColumn consisting of the (unformatted) String representation of this column's values
void clear(): Removes all elements. TODO: Make this return this column
boolean contains: Returns true if this column contains a cell with the given string, and false otherwise
copy(): Returns a deep copy of the receiver
countByCategory: Returns a count of the number of elements in each category (i.e., the number of repetitions of each value). TODO: This needs to be well tested, especially for IntColumn
int countMissing(): Returns the count of missing values in this column
int countOccurrences(String value)
int countUnique(): Returns the count of unique values in this column.
static StringColumn create
static StringColumn create
static StringColumn create
static StringColumn create(String name, Collection<String> strings)
static StringColumn create
static StringColumn createInternal(String name, DictionaryMap map)
emptyCopy: Returns a copy of the receiver with no data.
emptyCopy(int rowSize): Returns an empty copy of the receiver, with its internal storage initialized to the given row size.
boolean equals(int rowNumber1, int rowNumber2): Returns true if the value in this column at rowNumber1 is equal to the value at rowNumber2
int firstIndexOf(String value)
get(int rowIndex): Returns the value at rowIndex in this column.
getDictionary: For tablesaw internal use only
double getDouble(int i)
getDummies: Returns a list of boolean columns suitable for use as dummy variables in, for example, regression analysis, where a column of categorical data must be encoded as a list of columns such that each column represents a single category and indicates whether it is present (1) or not present (0)
boolean isEmpty(): Returns true if the column has no data
isIn(Collection<String> strings)
boolean isMissing(int rowNumber): Returns true if the value at rowNumber is missing
isNotEqualTo(String string)
isNotIn(Collection<String> strings)
iterator()
lag(int n): Returns a column of the same type and size as the receiver, containing the receiver's values offset by n.
lead(int n): Returns a column of the same type as the receiver, containing the receiver's values offset by -n. For example, if you lead a column containing 2, 3, 4 by 1, you get a column containing 3, 4, NA.
removeMissing: Returns a copy of this column with the missing values removed
it.unimi.dsi.fastutil.ints.IntComparator rowComparator(): Returns an IntComparator for sorting my rows
set: Sets the value at index row to the given value and return this column
set: Conditionally update this column, replacing current values with newValue for all rows where the current value matches the selection criteria
setMissing(int i): Sets the value at index i to the missing-value indicator for this column type, and return this column
int size(): Returns the number of elements (a.k.a. rows or cells) in the column
void sortAscending(): Sorts my values in ascending order
void sortDescending(): Sorts my values in descending order
summary(): Returns a table containing a ColumnType specific summary of the data in this column
unique(): Returns a new Column containing all the unique values in this column
int valueHash(int rowNumber): Returns an int suitable as a hash for the value in this column at the given index
static boolean valueIsMissing(String string)
where: Returns a new column containing the subset referenced by the Selection
Methods inherited from class tech.tablesaw.columns.strings.AbstractStringColumn
append, bottom, byteSize, compare, getPrintFormatter, getString, getUnformattedString, set, setPrintFormatter, top
Methods inherited from class tech.tablesaw.columns.AbstractColumn
filter, first, indexOf, inRange, last, map, max, min, name, parser, sampleN, sampleX, set, setName, setParser, sorted, subset, toString, type
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface tech.tablesaw.columns.Column
allMatch, anyMatch, columnWidth, count, count, filter, first, indexOf, inRange, interpolate, last, map, map, mapInto, max, max, min, min, name, noneMatch, parser, print, reduce, reduce, rolling, sampleN, sampleX, set, set, set, setMissingTo, setName, setParser, sorted, subset, title, type
Methods inherited from interface java.util.Comparator
equals, reversed, thenComparing, thenComparing, thenComparing, thenComparingDouble, thenComparingInt, thenComparingLong
Methods inherited from interface java.lang.Iterable
forEach, spliterator
Methods inherited from interface tech.tablesaw.columns.strings.StringFilters
containsString, endsWith, equalsIgnoreCase, equalsIgnoreCase, eval, eval, eval, eval, isAlpha, isAlphaNumeric, isEmptyString, isEqualTo, isIn, isLongerThan, isLowerCase, isMissing, isNotEqualTo, isNotIn, isNotMissing, isNumeric, isShorterThan, isUpperCase, lengthEquals, matchesRegex, startsWith, startsWith
Methods inherited from interface tech.tablesaw.columns.strings.StringMapFunctions
abbreviate, capitalize, commonPrefix, commonSuffix, concatenate, concatenate, countTokens, distance, format, join, length, lowerCase, padEnd, padStart, parseDouble, parseFloat, parseInt, repeat, replaceAll, replaceAll, replaceFirst, substring, substring, tokenizeAndRemoveDuplicates, tokenizeAndSort, tokenizeAndSort, tokens, trim, uniqueTokens, upperCase
Methods inherited from interface tech.tablesaw.columns.strings.StringReduceUtils
appendAll, appendAll
-
Method Details
-
valueIsMissing
-
appendMissing
Appends a missing value appropriate to the column
-
valueHash
public int valueHash(int rowNumber)
Returns an int suitable as a hash for the value in this column at the given index
-
equals
public boolean equals(int rowNumber1, int rowNumber2)
Returns true if the value in this column at rowNumber1 is equal to the value at rowNumber2
-
create
-
create
-
create
-
createInternal
-
create
-
create
-
isMissing
public boolean isMissing(int rowNumber)
Returns true if the value at rowNumber is missing
-
emptyCopy
Returns a copy of the receiver with no data. The column name and type are the same.
- Returns:
- an empty copy of Column
-
emptyCopy
Returns an empty copy of the receiver, with its internal storage initialized to the given row size.
- Parameters:
rowSize - the initial row size
- Returns:
- a Column
-
sortAscending
public void sortAscending()
Sorts my values in ascending order
-
sortDescending
public void sortDescending()
Sorts my values in descending order
-
size
public int size()
Returns the number of elements (a.k.a. rows or cells) in the column
- Returns:
- size as int
-
get
Returns the value at rowIndex in this column. The index is zero-based.
- Parameters:
rowIndex - index of the row
- Returns:
- value as String
- Throws:
IndexOutOfBoundsException - if the given rowIndex is not in the column
-
asList
Returns a List<String> representation of all the values in this column.
NOTE: Unless you really need a list of strings, consider using the column itself for large datasets, as it uses much less memory.
- Returns:
- values as a list of String.
-
summary
Returns a table containing a ColumnType specific summary of the data in this column
-
countByCategory
Returns a count of the number of elements in each category (i.e., the number of repetitions of each value). TODO: This needs to be well tested, especially for IntColumn
-
clear
public void clear()
Removes all elements. TODO: Make this return this column
-
lead
Returns a column of the same type as the receiver, containing the receiver's values offset by -n. For example, if you lead a column containing 2, 3, 4 by 1, you get a column containing 3, 4, NA.
-
lag
Returns a column of the same type and size as the receiver, containing the receiver's values offset by n. For example, if you lag a column containing 2, 3, 4 by 1, you get a column containing NA, 2, 3.
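The offsets described for lead and lag can be sketched in plain Java as follows. This is a hedged illustration, not the library's code: LeadLagSketch and its static methods are hypothetical, a List<String> stands in for the column, and the empty string stands in for NA (consistent with the MISSING_VALUE noted in the class description).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for lead/lag on a list of strings. */
final class LeadLagSketch {
  // Assumption: the empty string plays the role of NA.
  static final String MISSING = "";

  /** Shifts values down by n rows; rows with no source value become MISSING. */
  static List<String> lag(List<String> values, int n) {
    int size = values.size();
    List<String> result = new ArrayList<>(size);
    for (int row = 0; row < size; row++) {
      int source = row - n; // the row this position copies from
      result.add(source >= 0 && source < size ? values.get(source) : MISSING);
    }
    return result;
  }

  /** A lead by n is a lag by -n, so values shift up instead of down. */
  static List<String> lead(List<String> values, int n) {
    return lag(values, -n);
  }
}
```

With the input 2, 3, 4, a lag of 1 gives MISSING, 2, 3 and a lead of 1 gives 3, 4, MISSING, matching the examples in the descriptions.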
-
set
Conditionally update this column, replacing current values with newValue for all rows where the current value matches the selection criteria.
Examples:
myCatColumn.set(myCatColumn.isEqualTo("Cat"), "Dog"); // no more cats
myCatColumn.set(myCatColumn.isMissing(), "Fox"); // no more missing values
- Specified by:
set in interface Column<String>
- Overrides:
set in class AbstractColumn<StringColumn, String>
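The conditional update can be pictured as a two-step process: a filter marks the matching rows, then only those rows are overwritten. The plain-Java sketch below illustrates that idea under stated assumptions: ConditionalSetSketch is hypothetical, a java.util.BitSet stands in for a Selection, and a List<String> stands in for the column. It does not reproduce how Tablesaw represents selections internally.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of "select rows, then overwrite them". */
final class ConditionalSetSketch {
  /** Stand-in for a filter such as isEqualTo: marks the rows whose value equals target. */
  static BitSet rowsEqualTo(List<String> values, String target) {
    BitSet rows = new BitSet(values.size());
    for (int i = 0; i < values.size(); i++) {
      if (values.get(i).equals(target)) {
        rows.set(i);
      }
    }
    return rows;
  }

  /** Overwrites only the selected rows in place and returns the same list for chaining. */
  static List<String> set(List<String> values, BitSet selection, String newValue) {
    for (int row = selection.nextSetBit(0); row >= 0; row = selection.nextSetBit(row + 1)) {
      values.set(row, newValue);
    }
    return values;
  }
}
```

Note that the selection is computed before any value changes, so the update cannot affect which rows were chosen.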
-
set
Sets the value at index row to the given value and return this column
-
countUnique
public int countUnique()
Returns the count of unique values in this column.
- Returns:
- unique values as int
-
contains
Returns true if this column contains a cell with the given string, and false otherwise
- Parameters:
aString - the value to look for
- Returns:
- true if contains, false otherwise
-
setMissing
Sets the value at index i to the missing-value indicator for this column type, and return this column
-
addAll
Add all the strings in the list to this column
- Parameters:
stringValues - a list of values
-
appendCell
Add one element to the bottom of this column and set its value to the parsed value of the given String. Parsing is type-specific
-
appendCell
Add one element to the bottom of this column and set its value to the parsed value of the given String, as performed by the given parser
-
rowComparator
public it.unimi.dsi.fastutil.ints.IntComparator rowComparator()
Returns an IntComparator for sorting my rows
-
isEmpty
public boolean isEmpty()
Returns true if the column has no data
- Returns:
- true if empty, false if not
-
isEqualTo
-
isNotEqualTo
-
getDummies
Returns a list of boolean columns suitable for use as dummy variables in, for example, regression analysis, where a column of categorical data must be encoded as a list of columns such that each column represents a single category and indicates whether it is present (1) or not present (0).
- Returns:
- a list of BooleanColumn
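The dummy-variable encoding described above can be illustrated with a short plain-Java sketch. It is an assumption-laden illustration rather than the library's code: DummyColumnsSketch is a hypothetical name, a boolean[] stands in for each BooleanColumn, and the first-seen ordering of categories is a choice made here, not something this page specifies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of one-hot (dummy) encoding for categorical strings. */
final class DummyColumnsSketch {
  /** Returns one boolean array per distinct category, keyed in first-seen order. */
  static Map<String, boolean[]> dummies(List<String> values) {
    Map<String, boolean[]> result = new LinkedHashMap<>();
    for (int row = 0; row < values.size(); row++) {
      // Create the category's array on first sight; every flag starts out false.
      boolean[] flags =
          result.computeIfAbsent(values.get(row), k -> new boolean[values.size()]);
      flags[row] = true; // this row belongs to this category
    }
    return result;
  }
}
```

Each row is true in exactly one of the resulting arrays, which is the property a regression routine relies on when consuming dummy variables.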
-
unique
Returns a new Column containing all the unique values in this column
- Returns:
- a column with unique values.
-
asDoubleColumn
-
where
Returns a new column containing the subset referenced by the Selection
-
copy
Returns a deep copy of the receiver
- Returns:
- a Column
-
append
Appends all the values in the argument to the bottom of this column and return this column
-
countMissing
public int countMissing()
Returns the count of missing values in this column
- Returns:
- missing values as int
-
removeMissing
Returns a copy of this column with the missing values removed
-
iterator
-
asSet
Description copied from interface: Column
Returns a Set containing all the unique values in this column
-
asBytes
public byte[] asBytes(int rowNumber)
Returns the contents of the cell at rowNumber as a byte[]
- Parameters:
rowNumber - index of the row
- Returns:
- content as byte[]
-
getDouble
public double getDouble(int i)
-
asDoubleArray
public double[] asDoubleArray()
-
append
Added for naming consistency with all other columns
-
appendObj
Appends the given value to the bottom of this column and return this column
-
isIn
-
isIn
-
isNotIn
-
isNotIn
-
firstIndexOf
-
countOccurrences
-
asObjectArray
Returns an array of objects as appropriate for my type of column
-
asStringColumn
Returns a StringColumn consisting of the (unformatted) String representation of this column's values
- Specified by:
asStringColumn in interface Column<String>
- Overrides:
asStringColumn in class AbstractColumn<StringColumn, String>
- Returns:
- a StringColumn built using the column's Column.getUnformattedString(int) method
-
asTextColumn
-
getDictionary
For tablesaw internal use only
-