Package tech.tablesaw.columns.strings
Class TextualStringData
- java.lang.Object
-
- tech.tablesaw.columns.strings.TextualStringData
-
- All Implemented Interfaces: Iterable<String>, StringData, StringFilters, StringReduceUtils, FilterSpec<Selection>, StringFilterSpec<Selection>
public class TextualStringData extends Object implements StringData
A column that contains String values, which are assumed to be free-form text. For categorical data, use StringColumn. This is the default column type for SQL longvarchar and longnvarchar types.
Because the MISSING_VALUE for this column type is an empty string, there is little or no need for special handling of missing values in this class's methods.
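The following is a minimal sketch of why an empty-string missing value needs little special handling. It uses only the JDK: the class `MissingValueSketch`, its `MISSING_VALUE` constant, and its helper methods are hypothetical stand-ins, not Tablesaw code. They only mirror the names `valueIsMissing`, `countMissing`, and `removeMissing` from this page, under the assumption that a missing cell is stored as "".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a text column; not Tablesaw code.
public class MissingValueSketch {

    // Assumption: the missing-value marker is the empty string, as stated above.
    public static final String MISSING_VALUE = "";

    // A value is missing exactly when it equals the marker.
    public static boolean valueIsMissing(String value) {
        return MISSING_VALUE.equals(value);
    }

    // Counts the cells that hold the missing-value marker.
    public static int countMissing(List<String> values) {
        int count = 0;
        for (String value : values) {
            if (valueIsMissing(value)) {
                count++;
            }
        }
        return count;
    }

    // Returns a new list without the missing cells; the input is left unchanged.
    public static List<String> removeMissing(List<String> values) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (!valueIsMissing(value)) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("first note", "", "second note", "");
        System.out.println(countMissing(values));   // 2
        System.out.println(removeMissing(values));  // [first note, second note]
        // Ordinary string operations work on a missing cell without a null check.
        System.out.println(values.get(1).length()); // 0
    }
}
```

Because the marker is itself a valid String, calls such as `length()` or `equals()` behave normally on missing cells, which is the point the paragraph above makes.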
-
-
Method Summary
Modifier and Type
Method
Description
TextualStringData
addAll(List<String> stringValues)
Adds all the strings in the list to this column.
TextualStringData
append(String value)
Appends the given value to this column. Added for naming consistency with all other columns.
void
append(Column<String> column)
TextualStringData
appendMissing()
TextualStringData
appendObj(Object obj)
byte[]
asBytes(int rowNumber)
Returns the contents of the cell at rowNumber as a byte[].
double[]
asDoubleArray()
List<String>
asList()
Returns a List<String> representation of all the values in this column.
String[]
asObjectArray()
Set<String>
asSet()
void
clear()
boolean
contains(String aString)
Returns true if this column contains a cell with the given string, and false otherwise.
TextualStringData
copy()
Table
countByCategory(String columnName)
int
countMissing()
Returns the count of missing values in this column.
int
countOccurrences(String value)
int
countUnique()
static TextualStringData
create()
static TextualStringData
create(int size)
static TextualStringData
create(String... strings)
static TextualStringData
create(Collection<String> strings)
static TextualStringData
create(Stream<String> stream)
TextualStringData
emptyCopy()
TextualStringData
emptyCopy(int rowSize)
boolean
equals(int rowNumber1, int rowNumber2)
int
firstIndexOf(String value)
String
get(int rowIndex)
Returns the value at rowIndex in this column.
DictionaryMap
getDictionary()
Returns null, as this column is not backed by a DictionaryMap.
double
getDouble(int i)
Returns a double that can stand in for the string at index i in some ML applications.
List<BooleanColumn>
getDummies()
Unsupported operation. This can't be used on a text column, as the number of BooleanColumns would likely be excessive.
boolean
isEmpty()
Selection
isIn(String... strings)
Selection
isIn(Collection<String> strings)
boolean
isMissing(int rowNumber)
Selection
isNotIn(String... strings)
Selection
isNotIn(Collection<String> strings)
Iterator<String>
iterator()
TextualStringData
lag(int n)
TextualStringData
lead(int n)
TextualStringData
removeMissing()
it.unimi.dsi.fastutil.ints.IntComparator
rowComparator()
TextualStringData
set(int rowIndex, String stringValue)
TextualStringData
set(Selection rowSelection, String newValue)
Conditionally updates this column, replacing current values with newValue for all rows where the current value matches the selection criteria.
TextualStringData
setMissing(int i)
int
size()
Returns the number of elements (a.k.a. rows or cells) in the column.
void
sortAscending()
void
sortDescending()
Table
summary()
TextualStringData
unique()
Returns a new column containing all the unique values in this column.
int
valueHash(int rowNumber)
static boolean
valueIsMissing(String string)
TextualStringData
where(Selection selection)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Methods inherited from interface tech.tablesaw.columns.strings.StringData
subset
-
Methods inherited from interface tech.tablesaw.columns.strings.StringFilters
containsString, endsWith, equalsIgnoreCase, equalsIgnoreCase, eval, eval, eval, eval, isAlpha, isAlphaNumeric, isEmptyString, isEqualTo, isEqualTo, isIn, isLongerThan, isLowerCase, isMissing, isNotEqualTo, isNotEqualTo, isNotIn, isNotMissing, isNumeric, isShorterThan, isUpperCase, lengthEquals, matchesRegex, startsWith, startsWith
-
Methods inherited from interface tech.tablesaw.columns.strings.StringReduceUtils
appendAll, appendAll
-
-
-
-
Method Detail
-
valueHash
public int valueHash(int rowNumber)
-
equals
public boolean equals(int rowNumber1, int rowNumber2)
-
valueIsMissing
public static boolean valueIsMissing(String string)
-
appendMissing
public TextualStringData appendMissing()
- Specified by: appendMissing in interface StringData
-
create
public static TextualStringData create()
-
create
public static TextualStringData create(String... strings)
-
create
public static TextualStringData create(Collection<String> strings)
-
create
public static TextualStringData create(int size)
-
create
public static TextualStringData create(Stream<String> stream)
-
isMissing
public boolean isMissing(int rowNumber)
- Specified by: isMissing in interface StringData
-
emptyCopy
public TextualStringData emptyCopy()
- Specified by: emptyCopy in interface StringData
-
emptyCopy
public TextualStringData emptyCopy(int rowSize)
- Specified by: emptyCopy in interface StringData
-
sortAscending
public void sortAscending()
- Specified by: sortAscending in interface StringData
-
sortDescending
public void sortDescending()
- Specified by: sortDescending in interface StringData
-
size
public int size()
Returns the number of elements (a.k.a. rows or cells) in the column.
- Specified by: size in interface StringFilters
- Specified by: size in interface StringReduceUtils
- Returns: size as int
-
get
public String get(int rowIndex)
Returns the value at rowIndex in this column. The index is zero-based.
- Specified by: get in interface StringFilters
- Parameters: rowIndex - index of the row
- Returns: value as String
- Throws: IndexOutOfBoundsException - if the given rowIndex is not in the column
-
asList
public List<String> asList()
Returns a List<String> representation of all the values in this column.
NOTE: Unless you really need a list of strings, consider using the column itself for large datasets, as it uses much less memory.
- Specified by: asList in interface StringData
- Returns: values as a list of String.
-
countByCategory
public Table countByCategory(String columnName)
- Specified by: countByCategory in interface StringData
-
summary
public Table summary()
-
clear
public void clear()
- Specified by: clear in interface StringData
-
lead
public TextualStringData lead(int n)
- Specified by: lead in interface StringData
-
lag
public TextualStringData lag(int n)
- Specified by: lag in interface StringData
-
set
public TextualStringData set(Selection rowSelection, String newValue)
Conditionally updates this column, replacing current values with newValue for all rows where the current value matches the selection criteria.
Examples:
myCatColumn.set(myCatColumn.isEqualTo("Cat"), "Dog"); // no more cats
myCatColumn.set(myCatColumn.isMissing(), "Fox"); // no more missing values
- Specified by: set in interface StringData
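The two-step pattern used by this method (first build a selection, then apply a new value to the selected rows) can be sketched with the JDK alone. Everything here is a hypothetical stand-in rather than Tablesaw code: the class `ConditionalSetSketch`, the use of `java.util.BitSet` to model a selection as a set of row indexes, and the helper `isEqualTo` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for a selection-based update; not Tablesaw code.
public class ConditionalSetSketch {

    // Builds a selection: bit i is set when row i equals the target string.
    public static BitSet isEqualTo(List<String> values, String target) {
        BitSet selection = new BitSet(values.size());
        for (int i = 0; i < values.size(); i++) {
            if (target.equals(values.get(i))) {
                selection.set(i);
            }
        }
        return selection;
    }

    // Replaces the value in every selected row with newValue, in place.
    public static List<String> set(List<String> values, BitSet selection, String newValue) {
        for (int row = selection.nextSetBit(0); row >= 0; row = selection.nextSetBit(row + 1)) {
            values.set(row, newValue);
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> pets = new ArrayList<>(Arrays.asList("Cat", "Dog", "", "Cat"));
        set(pets, isEqualTo(pets, "Cat"), "Dog"); // no more cats
        set(pets, isEqualTo(pets, ""), "Fox");    // no more missing values (assumed to be "")
        System.out.println(pets); // [Dog, Dog, Fox, Dog]
    }
}
```

Note that the selection is evaluated before any value is changed, so rows updated by one call are only affected by a later call if they match that call's own selection.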
-
set
public TextualStringData set(int rowIndex, String stringValue)
- Specified by: set in interface StringData
-
countUnique
public int countUnique()
- Specified by: countUnique in interface StringData
-
contains
public boolean contains(String aString)
Returns true if this column contains a cell with the given string, and false otherwise.
- Specified by: contains in interface StringData
- Parameters: aString - the value to look for
- Returns: true if this column contains aString, false otherwise
-
setMissing
public TextualStringData setMissing(int i)
- Specified by: setMissing in interface StringData
-
addAll
public TextualStringData addAll(List<String> stringValues)
Adds all the strings in the list to this column.
- Parameters: stringValues - a list of values
-
rowComparator
public it.unimi.dsi.fastutil.ints.IntComparator rowComparator()
- Specified by: rowComparator in interface StringData
-
isEmpty
public boolean isEmpty()
- Specified by: isEmpty in interface StringData
-
unique
public TextualStringData unique()
Returns a new column containing all the unique values in this column.
- Specified by: unique in interface StringData
- Returns: a column with unique values.
-
where
public TextualStringData where(Selection selection)
- Specified by: where in interface StringData
-
copy
public TextualStringData copy()
- Specified by: copy in interface StringData
-
append
public void append(Column<String> column)
- Specified by: append in interface StringData
-
countMissing
public int countMissing()
Returns the count of missing values in this column.
- Specified by: countMissing in interface StringData
-
removeMissing
public TextualStringData removeMissing()
- Specified by: removeMissing in interface StringData
-
asSet
public Set<String> asSet()
- Specified by: asSet in interface StringData
-
asBytes
public byte[] asBytes(int rowNumber)
Returns the contents of the cell at rowNumber as a byte[].
- Specified by: asBytes in interface StringData
-
append
public TextualStringData append(String value)
Appends the given value to this column. Added for naming consistency with all other columns.
- Specified by: append in interface StringData
-
appendObj
public TextualStringData appendObj(Object obj)
- Specified by: appendObj in interface StringData
-
isIn
public Selection isIn(String... strings)
- Specified by: isIn in interface StringFilters
- Specified by: isIn in interface StringFilterSpec<Selection>
-
isIn
public Selection isIn(Collection<String> strings)
- Specified by: isIn in interface StringFilters
- Specified by: isIn in interface StringFilterSpec<Selection>
-
isNotIn
public Selection isNotIn(String... strings)
- Specified by: isNotIn in interface StringFilters
- Specified by: isNotIn in interface StringFilterSpec<Selection>
-
isNotIn
public Selection isNotIn(Collection<String> strings)
- Specified by: isNotIn in interface StringFilters
- Specified by: isNotIn in interface StringFilterSpec<Selection>
-
firstIndexOf
public int firstIndexOf(String value)
- Specified by: firstIndexOf in interface StringData
-
asObjectArray
public String[] asObjectArray()
- Specified by: asObjectArray in interface StringData
-
getDouble
public double getDouble(int i)
Returns a double that can stand in for the string at index i in some ML applications.
TODO: Evaluate use of hashCode() here for uniqueness.
- Specified by: getDouble in interface StringData
- Parameters: i - The index in this column
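This page does not say how the stand-in value is computed. As one possible approach, assumed here purely for illustration and not taken from Tablesaw, each distinct string can be mapped to the order in which it first appears, which guarantees that distinct strings get distinct doubles. The hypothetical class `StandInDoubleSketch` below shows that encoding, and also shows that `String.hashCode()` on its own does not guarantee uniqueness, which is the concern raised by the TODO for this method.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for mapping strings to doubles; not Tablesaw code.
public class StandInDoubleSketch {

    // Assigns each distinct string the order of its first appearance: 0.0, 1.0, 2.0, ...
    public static double[] encode(List<String> values) {
        Map<String, Integer> codes = new LinkedHashMap<>();
        double[] result = new double[values.size()];
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            Integer existing = codes.get(value);
            int code = (existing != null) ? existing : codes.size();
            codes.put(value, code);
            result[i] = code;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("Aa", "BB", "Aa", "cc");
        System.out.println(Arrays.toString(encode(values))); // [0.0, 1.0, 0.0, 2.0]
        // Distinct strings can share a hash code, so hashCode() alone is not unique.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
    }
}
```

The first-appearance encoding depends on row order, so two columns with the same values in a different order would produce different codes; whether that matters depends on the ML application.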
-
asDoubleArray
public double[] asDoubleArray()
- Specified by: asDoubleArray in interface StringData
-
countOccurrences
public int countOccurrences(String value)
- Specified by: countOccurrences in interface StringData
-
getDummies
public List<BooleanColumn> getDummies()
Unsupported operation. This can't be used on a text column, as the number of BooleanColumns would likely be excessive.
- Specified by: getDummies in interface StringData
-
getDictionary
@Nullable public DictionaryMap getDictionary()
Returns null, as this column is not backed by a DictionaryMap.
- Specified by: getDictionary in interface StringData
-
-