Class Text

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  Text.Comparator
      A WritableComparator optimized for Text keys.
    • Constructor Summary

      Constructors 
      Constructor Description
      Text()
      Text(byte[] utf8)
      Construct from a byte array.
      Text(String string)
      Construct from a string.
      Text(Text utf8)
      Construct from another text.
    • Method Summary

      Modifier and Type Method Description
      void append(byte[] utf8, int start, int len)
      Append a range of bytes to the end of the given text.
      static int bytesToCodePoint(ByteBuffer bytes)
      Returns the next code point at the current position in the buffer.
      int charAt(int position)
      Returns the Unicode Scalar Value (32-bit integer value) for the character at position.
      void clear()
      Clear the string to empty.
      static String decode(byte[] utf8)
      Converts the provided byte array to a String using the UTF-8 encoding.
      static String decode(byte[] utf8, int start, int length)
      Converts a range of the provided byte array to a String using the UTF-8 encoding.
      static String decode(byte[] utf8, int start, int length, boolean replace)
      Converts a range of the provided byte array to a String using the UTF-8 encoding.
      static ByteBuffer encode(String string)
      Converts the provided String to bytes using the UTF-8 encoding.
      static ByteBuffer encode(String string, boolean replace)
      Converts the provided String to bytes using the UTF-8 encoding.
      boolean equals(Object o)
      Returns true iff o is a Text with the same contents.
      int find(String what)
      Finds the first occurrence of what in the backing buffer, starting at position 0.
      int find(String what, int start)
      Finds the first occurrence of what in the backing buffer, starting at position start.
      byte[] getBytes()
      Returns the raw bytes; however, only data up to getLength() is valid.
      int getLength()
      Returns the number of valid bytes in the byte array.
      WritableType getType()
      Get the type of the writable.
      int hashCode()
      Return a hash of the first getLength() bytes returned from getBytes().
      void readFields(DataInput in)
      Deserialize the fields of this object from in.
      static String readString(DataInput in)
      Read a UTF-8 encoded string from in.
      void set(byte[] utf8)
      Set to a UTF-8 byte array.
      void set(byte[] utf8, int start, int len)
      Set the Text to a range of bytes.
      void set(String string)
      Set to contain the contents of a string.
      void set(Text other)
      Copy a text.
      static void skip(DataInput in)
      Skips over one Text in the input.
      double toDouble()
      Convert Writable to double.
      float toFloat()
      Convert Writable to float.
      int toInt()
      Convert Writable to int.
      long toLong()
      Convert Writable to long.
      String toString()
      Convert the text back to a string.
      static int utf8Length(String string)
      For the given string, returns the number of UTF-8 bytes required to encode the string.
      static void validateUTF8(byte[] utf8)
      Check if a byte array contains valid UTF-8.
      static void validateUTF8(byte[] utf8, int start, int len)
      Check whether a range of a byte array is valid UTF-8.
      void write(DataOutput out)
      Serialize this object to out. The length is written using zero-compressed encoding, followed by the bytes.
      static int writeString(DataOutput out, String s)
      Write a UTF-8 encoded string to out.
      void writeType(DataOutput out)
      Write the type (a single short value) to the DataOutput.
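The write entry says the length is stored with zero-compressed encoding before the data. The JDK-only sketch below illustrates one such scheme under a stated assumption: it follows the variable-length integer layout of Hadoop's WritableUtils.writeVInt (suggested, but not confirmed, by this class's reference to org.apache.hadoop.io.WritableComparator), restricted to non-negative lengths. VIntSketch and its methods are hypothetical names, not this class's API; the skip method shows why skipping a serialized text needs only the length prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of length-prefixed strings with a zero-compressed length.
// The byte layout is ASSUMED to follow Hadoop's vint format; non-negative values only.
final class VIntSketch {
    private VIntSketch() {}

    // 0..127 take one byte. Larger values get a marker byte (-113 for one
    // payload byte, -114 for two, ...) followed by big-endian payload bytes.
    static void writeVInt(DataOutput out, int value) throws IOException {
        if (value < 0) {
            throw new IllegalArgumentException("sketch supports non-negative values only");
        }
        if (value <= 127) {
            out.writeByte(value);
            return;
        }
        int payload = 0;
        for (int tmp = value; tmp != 0; tmp >>>= 8) {
            payload++;
        }
        out.writeByte(-112 - payload);
        for (int i = payload - 1; i >= 0; i--) {
            out.writeByte(value >>> (8 * i)); // writeByte keeps the low 8 bits
        }
    }

    static int readVInt(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= 0) {
            return first; // single-byte value
        }
        int payload = -112 - first;
        if (payload < 1 || payload > 4) {
            throw new IOException("marker not produced by this sketch: " + first);
        }
        int value = 0;
        for (int i = 0; i < payload; i++) {
            value = (value << 8) | in.readUnsignedByte();
        }
        return value;
    }

    // Length (vint) first, then the raw UTF-8 bytes.
    static void writeString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    static String readString(DataInput in) throws IOException {
        byte[] bytes = new byte[readVInt(in)];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Skips one serialized string using only its length prefix.
    static void skip(DataInput in) throws IOException {
        int remaining = readVInt(in);
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                throw new EOFException();
            }
            remaining -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        writeString(out, "first");
        writeString(out, "second");
        out.flush();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        skip(in);
        System.out.println(readString(in)); // prints second
    }
}
```

Short strings pay a single byte of length overhead under this layout, which is the point of compressing the length.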
    • Constructor Detail

      • Text

        public Text()
      • Text

        public Text(String string)
        Construct from a string.
      • Text

        public Text(Text utf8)
        Construct from another text.
      • Text

        public Text(byte[] utf8)
        Construct from a byte array.
    • Method Detail

      • getLength

        public int getLength()
        Returns the number of valid bytes in the byte array.
        Specified by:
        getLength in class BinaryComparable
      • charAt

        public int charAt(int position)
        Returns the Unicode Scalar Value (32-bit integer value) for the character at position, which is a byte offset into the UTF-8 buffer. Note that this method avoids using the converter or instantiating a String.
        Returns:
        the Unicode scalar value at position or -1 if the position is invalid or points to a trailing byte
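Because position is a byte offset, an offset inside a multi-byte character has no value of its own. The JDK-only sketch below illustrates the documented contract (a code point for a lead byte, -1 for an out-of-range offset or a trailing byte); CharAtSketch is a hypothetical name, it assumes well-formed UTF-8, and it is not the class's actual implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only sketch of the charAt contract; not the library's code.
// Assumes utf8[0, length) holds well-formed UTF-8.
final class CharAtSketch {
    private CharAtSketch() {}

    // Returns the code point whose encoding starts at byte offset `position`,
    // or -1 if the offset is out of range, points at a trailing (10xxxxxx)
    // byte, or the sequence is cut off by `length`.
    static int charAt(byte[] utf8, int length, int position) {
        if (position < 0 || position >= length) {
            return -1;
        }
        int lead = utf8[position] & 0xFF;
        int extra;     // number of trailing bytes after the lead byte
        int codePoint; // payload bits taken from the lead byte
        if (lead < 0x80) {
            return lead;                  // 0xxxxxxx: ASCII
        } else if (lead < 0xC0) {
            return -1;                    // 10xxxxxx: trailing byte
        } else if (lead < 0xE0) {
            extra = 1;
            codePoint = lead & 0x1F;      // 110xxxxx
        } else if (lead < 0xF0) {
            extra = 2;
            codePoint = lead & 0x0F;      // 1110xxxx
        } else {
            extra = 3;
            codePoint = lead & 0x07;      // 11110xxx
        }
        if (position + extra >= length) {
            return -1;                    // truncated sequence
        }
        for (int i = 1; i <= extra; i++) {
            codePoint = (codePoint << 6) | (utf8[position + i] & 0x3F);
        }
        return codePoint;
    }

    public static void main(String[] args) {
        byte[] bytes = "a\u00e9\u20ac".getBytes(StandardCharsets.UTF_8);
        for (int pos = 0; pos < bytes.length; pos++) {
            System.out.println(pos + " -> " + charAt(bytes, bytes.length, pos));
        }
    }
}
```

In the usage loop, offsets 2, 4 and 5 print -1 because they land on trailing bytes of é and €.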
      • find

        public int find(String what)
        Finds the first occurrence of what in the backing buffer, starting at position 0.
      • find

        public int find(String what,
                        int start)
        Finds the first occurrence of what in the backing buffer, starting at position start. The starting position is measured in bytes, and the return value is a byte position in the buffer. The backing buffer is not converted to a string for this operation.
        Returns:
        byte position of the first occurrence of the search string in the UTF-8 buffer, or -1 if not found
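Since the search runs over UTF-8 bytes, byte offsets and char indexes diverge as soon as a multi-byte character precedes the match. The sketch below illustrates that contract with a naive O(n·m) byte scan; FindSketch is a hypothetical name, and the class's own search strategy is not described here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the find(what, start) contract: the pattern is
// encoded to UTF-8 and matched byte by byte, so `start` and the result are
// byte offsets, not char indexes. Uses a naive O(n*m) scan for clarity.
final class FindSketch {
    private FindSketch() {}

    static int find(byte[] buffer, int length, String what, int start) {
        byte[] pattern = what.getBytes(StandardCharsets.UTF_8);
        for (int i = Math.max(start, 0); i + pattern.length <= length; i++) {
            int j = 0;
            while (j < pattern.length && buffer[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i; // first match at or after start
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] text = "h\u00e9llo h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // "llo" starts at char index 2 but at byte offset 3, because é takes two bytes.
        System.out.println(find(text, text.length, "llo", 0)); // prints 3
        System.out.println(find(text, text.length, "llo", 4)); // prints 10
    }
}
```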
      • set

        public void set(String string)
        Set to contain the contents of a string.
      • set

        public void set(byte[] utf8)
        Set to a UTF-8 byte array.
      • set

        public void set(Text other)
        Copy a text.
      • set

        public void set(byte[] utf8,
                        int start,
                        int len)
        Set the Text to a range of bytes.
        Parameters:
        utf8 - the data to copy from
        start - the first position of the new string
        len - the number of bytes of the new string
      • append

        public void append(byte[] utf8,
                           int start,
                           int len)
        Append a range of bytes to the end of the given text.
        Parameters:
        utf8 - the data to copy from
        start - the first position to append from utf8
        len - the number of bytes to append
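The set and append entries, together with the getBytes() and getLength() summaries, describe a backing array that can be longer than its valid contents. The hypothetical ByteTextSketch class below models that contract with plain JDK code; its doubling growth policy is an assumption, not documented behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of a reusable UTF-8 buffer: the backing array may be
// longer than the valid data, so only getLength() bytes of getBytes() count.
// The growth policy (at least doubling) is an assumption of this sketch.
final class ByteTextSketch {
    private byte[] bytes = new byte[0];
    private int length;

    // Replaces the contents with utf8[start, start + len).
    void set(byte[] utf8, int start, int len) {
        ensureCapacity(len, false);
        System.arraycopy(utf8, start, bytes, 0, len);
        length = len;
    }

    // Copies utf8[start, start + len) after the current valid bytes.
    void append(byte[] utf8, int start, int len) {
        ensureCapacity(length + len, true);
        System.arraycopy(utf8, start, bytes, length, len);
        length += len;
    }

    // Empties the text but keeps the backing array for reuse.
    void clear() {
        length = 0;
    }

    byte[] getBytes() { return bytes; }

    int getLength() { return length; }

    private void ensureCapacity(int needed, boolean keepData) {
        if (bytes.length < needed) {
            int capacity = Math.max(needed, bytes.length * 2);
            bytes = keepData ? Arrays.copyOf(bytes, capacity) : new byte[capacity];
        }
    }

    public static void main(String[] args) {
        byte[] src = "hello world".getBytes(StandardCharsets.UTF_8);
        ByteTextSketch text = new ByteTextSketch();
        text.set(src, 0, 5);     // "hello"
        text.append(src, 5, 6);  // "hello world"
        text.set(src, 0, 2);     // "he": the array keeps its old capacity
        System.out.println(text.getBytes().length + " bytes allocated, "
                + text.getLength() + " valid"); // prints "11 bytes allocated, 2 valid"
    }
}
```

Reading the whole getBytes() array after the last set would pick up the stale "llo world" bytes, which is why callers must bound reads by getLength().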
      • clear

        public void clear()
        Clear the string to empty.
      • equals

        public boolean equals(Object o)
        Returns true iff o is a Text with the same contents.
        Overrides:
        equals in class BinaryComparable
      • hashCode

        public int hashCode()
        Description copied from class: BinaryComparable
        Return a hash of the first getLength() bytes returned from getBytes().
        Overrides:
        hashCode in class BinaryComparable
        See Also:
        org.apache.hadoop.io.WritableComparator#hashBytes(byte[],int)
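Assuming the hash follows Hadoop's WritableComparator.hashBytes (the method named under See Also), it is a 31-multiplier polynomial over the first getLength() bytes, the same polynomial java.util.Arrays.hashCode(byte[]) computes. The sketch below shows why stale bytes past the valid length cannot change the hash; HashBytesSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch ASSUMING the 31-multiplier scheme of Hadoop's
// WritableComparator.hashBytes: only the first `length` bytes are hashed.
final class HashBytesSketch {
    private HashBytesSketch() {}

    static int hashBytes(byte[] bytes, int length) {
        int hash = 1;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + bytes[i]; // bytes are sign-extended to int
        }
        return hash;
    }

    public static void main(String[] args) {
        byte[] a = {'h', 'i', 0, 0};     // "hi" followed by stale bytes
        byte[] b = {'h', 'i', 'x', 'y'}; // "hi" followed by different stale bytes
        System.out.println(hashBytes(a, 2) == hashBytes(b, 2)); // prints true
        // Same value as Arrays.hashCode over just the valid prefix.
        System.out.println(hashBytes(a, 2) == Arrays.hashCode(Arrays.copyOf(a, 2))); // prints true
    }
}
```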
      • decode

        public static String decode(byte[] utf8,
                                    int start,
                                    int length,
                                    boolean replace)
                             throws CharacterCodingException
        Converts the provided byte array to a String using the UTF-8 encoding. If replace is true, then malformed input is replaced with the substitution character, which is U+FFFD. Otherwise the method throws a MalformedInputException.
        Throws:
        CharacterCodingException
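The replace flag maps naturally onto the JDK's CodingErrorAction: REPLACE substitutes U+FFFD, the default replacement of a UTF-8 CharsetDecoder, while REPORT makes decoding throw MalformedInputException. The sketch below shows that behavior using only java.nio.charset; DecodeSketch is a hypothetical name, and whether the class itself is built this way is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of decode(utf8, start, length, replace) built on the JDK
// decoder; not necessarily how the class itself is implemented.
final class DecodeSketch {
    private DecodeSketch() {}

    static String decode(byte[] utf8, int start, int length, boolean replace)
            throws CharacterCodingException {
        // REPLACE substitutes the decoder's replacement (U+FFFD for UTF-8);
        // REPORT makes decode() throw MalformedInputException instead.
        CodingErrorAction action = replace ? CodingErrorAction.REPLACE : CodingErrorAction.REPORT;
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        return decoder.decode(ByteBuffer.wrap(utf8, start, length)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bad = {'o', 'k', (byte) 0xFF}; // 0xFF never occurs in UTF-8
        System.out.println(decode(bad, 0, bad.length, true).equals("ok\uFFFD")); // prints true
        try {
            decode(bad, 0, bad.length, false);
        } catch (MalformedInputException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```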
      • encode

        public static ByteBuffer encode(String string)
                                 throws CharacterCodingException
        Converts the provided String to bytes using the UTF-8 encoding. If the input is malformed, invalid characters are replaced by a default value.
        Returns:
        ByteBuffer: the bytes are stored in ByteBuffer.array(), and the length is ByteBuffer.limit()
        Throws:
        CharacterCodingException
      • encode

        public static ByteBuffer encode(String string,
                                        boolean replace)
                                 throws CharacterCodingException
        Converts the provided String to bytes using the UTF-8 encoding. If replace is true, then malformed input is replaced with the substitution character, which is U+FFFD. Otherwise the method throws a MalformedInputException.
        Returns:
        ByteBuffer: the bytes are stored in ByteBuffer.array(), and the length is ByteBuffer.limit()
        Throws:
        CharacterCodingException
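The Returns entry for encode means only the first limit() bytes of array() belong to the result; the array itself may be larger. The sketch below encodes with a JDK CharsetEncoder and copies just that region; EncodeSketch and validBytes are hypothetical names, and the sketch does not claim to reproduce the class's replacement behavior.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: encode with a JDK CharsetEncoder, then read the result
// the way the Returns entry describes (data in array(), length in limit()).
final class EncodeSketch {
    private EncodeSketch() {}

    static ByteBuffer encode(String string, boolean replace) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        if (replace) {
            // REPLACE writes the encoder's replacement() bytes for bad input.
            encoder.onMalformedInput(CodingErrorAction.REPLACE)
                   .onUnmappableCharacter(CodingErrorAction.REPLACE);
        }
        // Otherwise the JDK default, REPORT, throws on malformed input.
        return encoder.encode(CharBuffer.wrap(string));
    }

    // array() may be longer than the encoded data, so copy only up to limit().
    static byte[] validBytes(ByteBuffer buffer) {
        return Arrays.copyOf(buffer.array(), buffer.limit());
    }

    public static void main(String[] args) throws CharacterCodingException {
        ByteBuffer encoded = encode("h\u00e9llo", false);
        System.out.println(encoded.limit()); // prints 6: é takes two bytes
        System.out.println(Arrays.toString(validBytes(encoded)));
    }
}
```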
      • validateUTF8

        public static void validateUTF8(byte[] utf8)
                                 throws MalformedInputException
        Check if a byte array contains valid UTF-8.
        Parameters:
        utf8 - byte array
        Throws:
        MalformedInputException - if the byte array contains invalid UTF-8
      • validateUTF8

        public static void validateUTF8(byte[] utf8,
                                        int start,
                                        int len)
                                 throws MalformedInputException
        Check whether a range of a byte array is valid UTF-8.
        Parameters:
        utf8 - the array of bytes
        start - the offset of the first byte in the array
        len - the length of the byte sequence
        Throws:
        MalformedInputException - if the byte array contains invalid bytes
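A range check like this can be expressed with the JDK's strict UTF-8 decoder in REPORT mode instead of a hand-written state machine. The sketch below does that for utf8[start, start + len); Utf8ValidateSketch is a hypothetical name, and whether the class's own check accepts exactly the same byte sequences as the JDK decoder is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: validates utf8[start, start + len) with the JDK's strict
// UTF-8 decoder rather than a hand-written state machine.
final class Utf8ValidateSketch {
    private Utf8ValidateSketch() {}

    static void validateUTF8(byte[] utf8, int start, int len) throws MalformedInputException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // UTF-8 never decodes to more chars than it has bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(len);
        // endOfInput = true: a sequence cut off at the end is reported as malformed.
        CoderResult result = decoder.decode(ByteBuffer.wrap(utf8, start, len), out, true);
        if (result.isError()) {
            throw new MalformedInputException(result.length());
        }
    }

    static void validateUTF8(byte[] utf8) throws MalformedInputException {
        validateUTF8(utf8, 0, utf8.length);
    }

    public static void main(String[] args) {
        byte[] data = {'o', 'k', (byte) 0xC3}; // 0xC3 starts a 2-byte sequence
        try {
            validateUTF8(data, 0, 2); // valid: just "ok"
            validateUTF8(data);       // invalid: truncated sequence
        } catch (MalformedInputException e) {
            System.out.println("invalid UTF-8: " + e.getMessage());
        }
    }
}
```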
      • bytesToCodePoint

        public static int bytesToCodePoint(ByteBuffer bytes)
        Returns the next code point at the current position in the buffer. The buffer's position will be incremented. Any mark set on this buffer will be changed by this method!
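The side effects described for bytesToCodePoint (position advanced, mark changed) fall out naturally if the lead byte is peeked with mark()/reset() before the sequence is consumed. The sketch below is built that way to mirror the documented behavior; CodePointReaderSketch is a hypothetical name, returning -1 without advancing on a trailing byte is an assumption borrowed from the charAt contract, and the sketch assumes well-formed input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decode the UTF-8 sequence at the buffer's position and
// advance the position past it. Peeking with mark()/reset() replaces any mark
// the caller had set, mirroring the documented side effect. Assumes well-formed input.
final class CodePointReaderSketch {
    private CodePointReaderSketch() {}

    static int bytesToCodePoint(ByteBuffer bytes) {
        bytes.mark();                  // overwrites any existing mark
        int lead = bytes.get() & 0xFF;
        bytes.reset();                 // peek only: back to the lead byte
        int extra;
        int codePoint;
        if (lead < 0x80) {
            extra = 0;
            codePoint = lead;
        } else if (lead < 0xC0) {
            return -1;                 // trailing byte: position is not advanced
        } else if (lead < 0xE0) {
            extra = 1;
            codePoint = lead & 0x1F;
        } else if (lead < 0xF0) {
            extra = 2;
            codePoint = lead & 0x0F;
        } else {
            extra = 3;
            codePoint = lead & 0x07;
        }
        bytes.get();                   // consume the lead byte
        for (int i = 0; i < extra; i++) {
            codePoint = (codePoint << 6) | (bytes.get() & 0x3F);
        }
        return codePoint;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap("a\u00e9\u20ac".getBytes(StandardCharsets.UTF_8));
        while (buffer.hasRemaining()) {
            int position = buffer.position();
            System.out.printf("byte %d: U+%04X%n", position, bytesToCodePoint(buffer));
        }
    }
}
```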
      • utf8Length

        public static int utf8Length(String string)
        For the given string, returns the number of UTF-8 bytes required to encode the string.
        Parameters:
        string - text to encode
        Returns:
        number of UTF-8 bytes required to encode
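The byte count can be derived from the UTF-16 chars of a String without allocating the encoded bytes: 1 byte below U+0080, 2 below U+0800, 4 for a surrogate pair, and 3 otherwise. The sketch below applies those ranges; Utf8LengthSketch is a hypothetical name, it assumes well-formed input, and counting an unpaired surrogate as 3 bytes is an arbitrary choice of the sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: count UTF-8 bytes from the UTF-16 chars of a String
// without allocating the encoded form. Assumes well-formed input; an unpaired
// surrogate is simply counted as 3 bytes here.
final class Utf8LengthSketch {
    private Utf8LengthSketch() {}

    static int utf8Length(String string) {
        int total = 0;
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (c < 0x80) {
                total += 1;                       // U+0000..U+007F
            } else if (c < 0x800) {
                total += 2;                       // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c) && i + 1 < string.length()
                    && Character.isLowSurrogate(string.charAt(i + 1))) {
                total += 4;                       // surrogate pair: U+10000 and above
                i++;                              // skip the low surrogate
            } else {
                total += 3;                       // rest of the BMP
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u20ac\uD83D\uDE00";
        System.out.println(utf8Length(s));                              // prints 10
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // prints 10
    }
}
```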
      • toDouble

        public double toDouble()
        Description copied from interface: Writable
        Convert Writable to double. Whether this is supported depends on the specific writable.
        Specified by:
        toDouble in interface Writable
      • toFloat

        public float toFloat()
        Description copied from interface: Writable
        Convert Writable to float. Whether this is supported depends on the specific writable.
        Specified by:
        toFloat in interface Writable
      • toInt

        public int toInt()
        Description copied from interface: Writable
        Convert Writable to int. Whether this is supported depends on the specific writable.
        Specified by:
        toInt in interface Writable
      • toLong

        public long toLong()
        Description copied from interface: Writable
        Convert Writable to long. Whether this is supported depends on the specific writable.
        Specified by:
        toLong in interface Writable