Package org.datavec.api.writable
Class Text
- java.lang.Object
- org.datavec.api.io.BinaryComparable
- org.datavec.api.writable.Text
- All Implemented Interfaces:
Serializable, Comparable<BinaryComparable>, WritableComparable<BinaryComparable>, Writable
public class Text extends BinaryComparable implements WritableComparable<BinaryComparable>
- See Also:
Serialized Form
-
Nested Class Summary
Modifier and Type, Class, Description:
static class Text.Comparator
A WritableComparator optimized for Text keys.
-
Method Summary
Modifier and Type, Method, Description:
void append(byte[] utf8, int start, int len)
Append a range of bytes to the end of the given text.
static int bytesToCodePoint(ByteBuffer bytes)
Returns the next code point at the current position in the buffer.
int charAt(int position)
Returns the Unicode scalar value (32-bit integer value) for the character at position.
void clear()
Clear the string to empty.
static String decode(byte[] utf8)
Converts the provided byte array to a String using the UTF-8 encoding.
static String decode(byte[] utf8, int start, int length)
static String decode(byte[] utf8, int start, int length, boolean replace)
Converts the provided byte array to a String using the UTF-8 encoding.
static ByteBuffer encode(String string)
Converts the provided String to bytes using the UTF-8 encoding.
static ByteBuffer encode(String string, boolean replace)
Converts the provided String to bytes using the UTF-8 encoding.
boolean equals(Object o)
Returns true iff o is a Text with the same contents.
int find(String what)
int find(String what, int start)
Finds any occurrence of what in the backing buffer, starting at position start.
byte[] getBytes()
Returns the raw bytes; however, only data up to getLength() is valid.
int getLength()
Returns the number of bytes in the byte array.
WritableType getType()
Get the type of the writable.
int hashCode()
Return a hash of the bytes returned from getBytes().
void readFields(DataInput in)
Deserialize.
static String readString(DataInput in)
Read a UTF-8 encoded string from in.
void set(byte[] utf8)
Set to a UTF-8 byte array.
void set(byte[] utf8, int start, int len)
Set the Text to a range of bytes.
void set(String string)
Set to contain the contents of a string.
void set(Text other)
Copy a Text.
static void skip(DataInput in)
Skips over one Text in the input.
double toDouble()
Convert Writable to double.
float toFloat()
Convert Writable to float.
int toInt()
Convert Writable to int.
long toLong()
Convert Writable to long.
String toString()
Convert text back to string.
static int utf8Length(String string)
For the given string, returns the number of UTF-8 bytes required to encode the string.
static void validateUTF8(byte[] utf8)
Check if a byte array contains valid UTF-8.
static void validateUTF8(byte[] utf8, int start, int len)
Check to see if a byte array is valid UTF-8.
void write(DataOutput out)
Serialize: write this object to out. The length uses zero-compressed encoding.
static int writeString(DataOutput out, String s)
Write a UTF-8 encoded string to out.
void writeType(DataOutput out)
Write the type (a single short value) to the DataOutput.
-
Methods inherited from class org.datavec.api.io.BinaryComparable
compareTo, compareTo
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.lang.Comparable
compareTo
-
Method Detail
-
getBytes
public byte[] getBytes()
Returns the raw bytes; however, only data up to getLength() is valid.
- Specified by:
getBytes in class BinaryComparable
-
getLength
public int getLength()
Returns the number of bytes in the byte array.
- Specified by:
getLength in class BinaryComparable
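The contract described for getBytes() and getLength() (a backing array that may be longer than the valid content) can be illustrated with plain JDK arrays. This is only a sketch of the idea: `ValidBytesSketch` and `validPrefix` are hypothetical names, not part of Text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ValidBytesSketch {
    // Hypothetical helper (not part of Text): copy only the valid prefix
    // of a backing array that may be larger than the content it holds.
    static byte[] validPrefix(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    public static void main(String[] args) {
        byte[] backing = new byte[16]; // over-allocated buffer
        byte[] content = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(content, 0, backing, 0, content.length);

        // Only the first content.length bytes are meaningful; the rest is padding.
        byte[] valid = validPrefix(backing, content.length);
        System.out.println(new String(valid, StandardCharsets.UTF_8));
    }
}
```

With a Text instance, the analogous call would pass the results of getBytes() and getLength() to such a helper.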
-
charAt
public int charAt(int position)
Returns the Unicode scalar value (32-bit integer value) for the character at position. Note that this method avoids using the converter or instantiating a String.
- Returns:
the Unicode scalar value at position, or -1 if the position is invalid or points to a trailing byte
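The return contract above (a scalar value for a leading byte, -1 for an invalid position or a trailing byte) can be sketched with a small hand-written UTF-8 decoder. This is an illustration under stated assumptions, not the library's implementation: `CodePointAtSketch` is a hypothetical name, and the sketch does not reject overlong encodings or surrogate code points.

```java
import java.nio.charset.StandardCharsets;

final class CodePointAtSketch {
    // Decodes the code point whose encoding starts at byte offset `position`.
    // Returns -1 if the offset is out of range, points at a continuation
    // (trailing) byte, or the sequence is truncated or malformed.
    static int codePointAt(byte[] utf8, int length, int position) {
        if (position < 0 || position >= length) return -1;
        int b0 = utf8[position] & 0xFF;
        if (b0 < 0x80) return b0;            // single-byte (ASCII)
        int extra;
        int cp;
        if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
        else return -1;                      // continuation byte or invalid lead
        if (position + extra >= length) return -1; // sequence runs past the data
        for (int i = 1; i <= extra; i++) {
            int b = utf8[position + i] & 0xFF;
            if ((b & 0xC0) != 0x80) return -1;     // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        return cp;
    }

    public static void main(String[] args) {
        byte[] utf8 = "a\u00e9\u20ac".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < utf8.length; i++) {
            System.out.println(i + " -> " + codePointAt(utf8, utf8.length, i));
        }
    }
}
```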
-
find
public int find(String what)
-
find
public int find(String what, int start)
Finds any occurrence of what in the backing buffer, starting at position start. The starting position is measured in bytes and the return value is in terms of byte position in the buffer. The backing buffer is not converted to a string for this operation.
- Returns:
byte position of the first occurrence of the search string in the UTF-8 buffer, or -1 if not found
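Because positions are measured in bytes, the result can differ from String.indexOf whenever multi-byte characters precede the match. The following is a minimal sketch of a byte-level search using only the JDK; `ByteFindSketch` is a hypothetical name and the naive scan is an assumption, not necessarily how the library searches.

```java
import java.nio.charset.StandardCharsets;

final class ByteFindSketch {
    // Searches the first `length` bytes of `buffer` for the UTF-8 encoding of
    // `what`, starting at byte offset `start`. Returns a byte offset or -1.
    static int find(byte[] buffer, int length, String what, int start) {
        if (start < 0) return -1;
        byte[] needle = what.getBytes(StandardCharsets.UTF_8);
        for (int i = start; i + needle.length <= length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (buffer[i + j] != needle[j]) { match = false; break; }
            }
            if (match) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 au lait";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // The two-byte character before the match shifts the byte offset by one.
        System.out.println("byte offset: " + find(utf8, utf8.length, "au", 0));
        System.out.println("char offset: " + s.indexOf("au"));
    }
}
```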
-
set
public void set(String string)
Set to contain the contents of a string.
-
set
public void set(byte[] utf8)
Set to a UTF-8 byte array.
-
set
public void set(Text other)
Copy a Text.
-
set
public void set(byte[] utf8, int start, int len)
Set the Text to a range of bytes.
- Parameters:
utf8 - the data to copy from
start - the first position of the new string
len - the number of bytes of the new string
-
append
public void append(byte[] utf8, int start, int len)
Append a range of bytes to the end of the given text.
- Parameters:
utf8 - the data to copy from
start - the first position to append from utf8
len - the number of bytes to append
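The (utf8, start, len) parameters of set and append describe a byte range of the source array. A minimal sketch of a growable buffer with the same parameter meaning is shown below. `ByteRangeBufferSketch` is a hypothetical stand-in for illustration and its growth policy is an assumption, not taken from Text.

```java
import java.nio.charset.StandardCharsets;

final class ByteRangeBufferSketch {
    private byte[] bytes = new byte[0];
    private int length;

    // Replace the content with src[start .. start+len).
    void set(byte[] src, int start, int len) {
        ensureCapacity(len, false);
        System.arraycopy(src, start, bytes, 0, len);
        length = len;
    }

    // Add src[start .. start+len) after the current content.
    void append(byte[] src, int start, int len) {
        ensureCapacity(length + len, true);
        System.arraycopy(src, start, bytes, length, len);
        length += len;
    }

    int getLength() {
        return length;
    }

    // Grow the backing array if needed, optionally preserving the content.
    private void ensureCapacity(int needed, boolean keepData) {
        if (bytes.length < needed) {
            byte[] grown = new byte[Math.max(needed, bytes.length * 2)];
            if (keepData) System.arraycopy(bytes, 0, grown, 0, length);
            bytes = grown;
        }
    }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] src = "hello world".getBytes(StandardCharsets.UTF_8);
        ByteRangeBufferSketch buf = new ByteRangeBufferSketch();
        buf.set(src, 6, 5);    // bytes of "world"
        buf.append(src, 5, 1); // the space
        buf.append(src, 0, 5); // bytes of "hello"
        System.out.println(buf);
    }
}
```

Since start and len count bytes, a range that cuts through a multi-byte character yields invalid UTF-8; callers are responsible for choosing boundaries that fall between characters.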
-
clear
public void clear()
Clear the string to empty.
-
toString
public String toString()
Convert text back to string.
- Overrides:
toString in class Object
- See Also:
Object.toString()
-
readFields
public void readFields(DataInput in) throws IOException
Deserialize.
- Specified by:
readFields in interface Writable
- Parameters:
in - DataInput to deserialize this object from.
- Throws:
IOException
-
writeType
public void writeType(DataOutput out) throws IOException
Description copied from interface: Writable
Write the type (a single short value) to the DataOutput. See WritableFactory for details.
- Specified by:
writeType in interface Writable
- Parameters:
out - DataOutput to write to
- Throws:
IOException - For errors during writing
-
skip
public static void skip(DataInput in) throws IOException
Skips over one Text in the input.
- Throws:
IOException
-
write
public void write(DataOutput out) throws IOException
Serialize: write this object to out. The length uses zero-compressed encoding.
- Specified by:
write in interface Writable
- Parameters:
out - DataOutput to serialize this object into.
- Throws:
IOException
- See Also:
Writable.write(DataOutput)
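The general shape of a length-prefixed record (a compact length, then the raw bytes) is what makes operations such as skip possible without decoding the content. The sketch below illustrates that shape with a simple 7-bits-per-byte variable-length integer. That integer format is an assumption chosen for brevity and is not necessarily the byte layout this class writes; `LengthPrefixedSketch` and all of its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedSketch {
    // Writes a non-negative int using 7 payload bits per byte (assumed format).
    static void writeVarInt(DataOutput out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80); // high bit: more bytes follow
            value >>>= 7;
        }
        out.writeByte(value);
    }

    static int readVarInt(DataInput in) throws IOException {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
            if (shift > 28) throw new IOException("variable-length int too long");
        }
    }

    // Record layout: length prefix, then that many UTF-8 bytes.
    static void write(DataOutput out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVarInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static String read(DataInput in) throws IOException {
        byte[] utf8 = new byte[readVarInt(in)];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Skips one record by reading its length and discarding that many bytes.
    static void skip(DataInput in) throws IOException {
        int remaining = readVarInt(in);
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) throw new EOFException("truncated record");
            remaining -= skipped;
        }
    }

    // Writes two records, skips the first, and returns the second.
    static String skipFirstReadSecond(String first, String second) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            write(out, first);
            write(out, second);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            skip(in);
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(skipFirstReadSecond("first", "second"));
    }
}
```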
-
equals
public boolean equals(Object o)
Returns true iff o is a Text with the same contents.
- Overrides:
equals in class BinaryComparable
-
hashCode
public int hashCode()
Description copied from class: BinaryComparable
Return a hash of the bytes returned from getBytes().
- Overrides:
hashCode in class BinaryComparable
- See Also:
org.apache.hadoop.io.WritableComparator#hashBytes(byte[],int)
-
decode
public static String decode(byte[] utf8) throws CharacterCodingException
Converts the provided byte array to a String using the UTF-8 encoding. If the input is malformed, it is replaced by a default value.
- Throws:
CharacterCodingException
-
decode
public static String decode(byte[] utf8, int start, int length) throws CharacterCodingException
- Throws:
CharacterCodingException
-
decode
public static String decode(byte[] utf8, int start, int length, boolean replace) throws CharacterCodingException
Converts the provided byte array to a String using the UTF-8 encoding. If replace is true, then malformed input is replaced with the substitution character, which is U+FFFD. Otherwise the method throws a MalformedInputException.
- Throws:
CharacterCodingException
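The replace flag corresponds closely to the two error actions of the JDK's CharsetDecoder. The following sketch shows that behavior using only java.nio; it is an illustration of the described semantics, not the library's code, and `Utf8DecodeSketch` and `decodeOrNull` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8DecodeSketch {
    // REPLACE substitutes U+FFFD for malformed input; REPORT throws instead.
    static String decode(byte[] utf8, int start, int length, boolean replace)
            throws CharacterCodingException {
        CodingErrorAction action =
            replace ? CodingErrorAction.REPLACE : CodingErrorAction.REPORT;
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(action)
            .onUnmappableCharacter(action);
        return decoder.decode(ByteBuffer.wrap(utf8, start, length)).toString();
    }

    // Convenience wrapper for the demo: null signals a decoding failure.
    static String decodeOrNull(byte[] utf8, boolean replace) {
        try {
            return decode(utf8, 0, utf8.length, replace);
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] malformed = {0x61, (byte) 0xFF, 0x62}; // 0xFF never occurs in UTF-8
        System.out.println(decodeOrNull(malformed, true));
        System.out.println(decodeOrNull(malformed, false));
    }
}
```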
-
encode
public static ByteBuffer encode(String string) throws CharacterCodingException
Converts the provided String to bytes using the UTF-8 encoding. If the input is malformed, invalid chars are replaced by a default value.
- Returns:
ByteBuffer: the bytes are stored in ByteBuffer.array() and the length is ByteBuffer.limit()
- Throws:
CharacterCodingException
-
encode
public static ByteBuffer encode(String string, boolean replace) throws CharacterCodingException
Converts the provided String to bytes using the UTF-8 encoding. If replace is true, then malformed input is replaced with the substitution character, which is U+FFFD. Otherwise the method throws a MalformedInputException.
- Returns:
ByteBuffer: the bytes are stored in ByteBuffer.array() and the length is ByteBuffer.limit()
- Throws:
CharacterCodingException
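The Returns note matters because the array behind an encoded ByteBuffer can be larger than the encoded data. A minimal sketch of extracting only the valid bytes is shown below, using the JDK's CharsetEncoder as a stand-in for encode. `Utf8EncodeSketch` and its methods are hypothetical, and the sketch assumes an array-backed buffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8EncodeSketch {
    // Strict encoding: malformed input (e.g. an unpaired surrogate) throws.
    static ByteBuffer encodeStrict(String s) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .encode(CharBuffer.wrap(s));
    }

    // Copies only the bytes between position and limit; array() alone may
    // include unused capacity. Assumes buf.hasArray() is true.
    static byte[] validBytes(ByteBuffer buf) {
        int from = buf.arrayOffset() + buf.position();
        int to = buf.arrayOffset() + buf.limit();
        return Arrays.copyOfRange(buf.array(), from, to);
    }

    // Convenience wrapper for the demo that avoids a checked exception.
    static byte[] utf8Bytes(String s) {
        try {
            return validBytes(encodeStrict(s));
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws CharacterCodingException {
        ByteBuffer buf = encodeStrict("caf\u00e9");
        System.out.println("limit = " + buf.limit()
            + ", array length = " + buf.array().length);
        System.out.println(Arrays.toString(validBytes(buf)));
    }
}
```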
-
readString
public static String readString(DataInput in) throws IOException
Read a UTF-8 encoded string from in.
- Throws:
IOException
-
writeString
public static int writeString(DataOutput out, String s) throws IOException
Write a UTF-8 encoded string to out.
- Throws:
IOException
-
validateUTF8
public static void validateUTF8(byte[] utf8) throws MalformedInputException
Check if a byte array contains valid UTF-8.
- Parameters:
utf8 - byte array
- Throws:
MalformedInputException - if the byte array contains invalid UTF-8
-
validateUTF8
public static void validateUTF8(byte[] utf8, int start, int len) throws MalformedInputException
Check to see if a byte array is valid UTF-8.
- Parameters:
utf8 - the array of bytes
start - the offset of the first byte in the array
len - the length of the byte sequence
- Throws:
MalformedInputException - if the byte array contains invalid bytes
-
bytesToCodePoint
public static int bytesToCodePoint(ByteBuffer bytes)
Returns the next code point at the current position in the buffer. The buffer's position will be incremented. Any mark set on this buffer will be changed by this method!
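Reading one code point at a time while the buffer's position advances makes it possible to iterate over UTF-8 data without building a String. The sketch below shows that pattern with a hand-written decoder. `NextCodePointSketch` is a hypothetical name, the code assumes well-formed input, and it is not the library's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class NextCodePointSketch {
    // Reads one code point starting at the buffer's current position and
    // leaves the position just past it. Assumes well-formed UTF-8.
    static int nextCodePoint(ByteBuffer bytes) {
        int b0 = bytes.get() & 0xFF;
        if (b0 < 0x80) return b0;
        int extra;
        int cp;
        if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
        else throw new IllegalArgumentException("not a UTF-8 leading byte");
        for (int i = 0; i < extra; i++) {
            cp = (cp << 6) | (bytes.get() & 0x3F); // append 6 payload bits
        }
        return cp;
    }

    // Collects every code point by calling nextCodePoint until the buffer is drained.
    static List<Integer> codePoints(String s) {
        ByteBuffer bytes = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
        List<Integer> out = new ArrayList<>();
        while (bytes.hasRemaining()) {
            out.add(nextCodePoint(bytes));
        }
        return out;
    }

    public static void main(String[] args) {
        for (int cp : codePoints("a\u20ac\uD83D\uDE00")) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```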
-
utf8Length
public static int utf8Length(String string)
For the given string, returns the number of UTF-8 bytes required to encode the string.
- Parameters:
string - text to encode
- Returns:
number of UTF-8 bytes required to encode
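The byte count can be computed from code point ranges alone, without allocating an encoded copy. The following is a minimal sketch of that calculation; `Utf8LengthSketch` is a hypothetical name, and its handling of unpaired surrogates (counted as three bytes) is an assumption that may differ from the library.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthSketch {
    // Sums the encoded size of each code point: 1 byte below U+0080,
    // 2 below U+0800, 3 below U+10000, and 4 otherwise.
    static int utf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) bytes += 1;
            else if (cp < 0x800) bytes += 2;
            else if (cp < 0x10000) bytes += 3;
            else bytes += 4;
            i += Character.charCount(cp); // supplementary chars use two UTF-16 units
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u20ac\uD83D\uDE00";
        System.out.println(utf8Length(s));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```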
-
toDouble
public double toDouble()
Description copied from interface: Writable
Convert Writable to double. Whether this is supported depends on the specific writable.
-
toFloat
public float toFloat()
Description copied from interface: Writable
Convert Writable to float. Whether this is supported depends on the specific writable.
-
toInt
public int toInt()
Description copied from interface: Writable
Convert Writable to int. Whether this is supported depends on the specific writable.
-
toLong
public long toLong()
Description copied from interface: Writable
Convert Writable to long. Whether this is supported depends on the specific writable.
-
getType
public WritableType getType()
Description copied from interface: Writable
Get the type of the writable.
-
-