Charset enhanced with features allowing it to work with Daffodil's Bit-wise DataInputStream and DataOutputStream.
Base class for byte based decoders
Provides methods to get a single byte. Also handles the logic related to the encoding error policy and the replacement characters. An implementing class only needs to use the provided methods to get one or more bytes, convert them to a char, and validate the code point.
Some encodings need state, but only to store the low surrogate of a surrogate pair. This class encapsulates that logic. A class that extends this class must implement decodeOneUnicodeChar, which should decode one character; if the result is a high/low surrogate pair, it should call setLowSurrogate on the low surrogate and return the high surrogate.
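The surrogate handling described above can be sketched in Java as follows. This is a minimal illustration, not Daffodil's implementation: the classes SurrogateDecoder and CodePointDecoder, the decodeOne method, and the parameterless decodeOneUnicodeChar signature are all assumptions made for the example, and the input is an in-memory array of code points rather than a bit stream.

```java
public class SurrogateSketch {
    // Hypothetical stand-in for the base class: it holds at most one pending low surrogate.
    static abstract class SurrogateDecoder {
        private char pendingLow;
        private boolean hasPendingLow = false;

        // Called by subclasses when a decoded code point needs a surrogate pair.
        protected void setLowSurrogate(char low) {
            pendingLow = low;
            hasPendingLow = true;
        }

        // Subclasses decode exactly one code point and return the first char for it.
        protected abstract char decodeOneUnicodeChar();

        // Returns the pending low surrogate if there is one, otherwise decodes afresh.
        public final char decodeOne() {
            if (hasPendingLow) {
                hasPendingLow = false;
                return pendingLow;
            }
            return decodeOneUnicodeChar();
        }
    }

    // Hypothetical subclass that "decodes" from an array of code points.
    static final class CodePointDecoder extends SurrogateDecoder {
        private final int[] codePoints;
        private int pos = 0;

        CodePointDecoder(int... codePoints) {
            this.codePoints = codePoints;
        }

        @Override
        protected char decodeOneUnicodeChar() {
            int cp = codePoints[pos++];
            if (Character.isSupplementaryCodePoint(cp)) {
                // Store the low surrogate for the next call and return the high one now.
                setLowSurrogate(Character.lowSurrogate(cp));
                return Character.highSurrogate(cp);
            }
            return (char) cp;
        }
    }

    // Decodes all code points into a UTF-16 string, one char per decodeOne call.
    static String decodeAll(int... codePoints) {
        CodePointDecoder decoder = new CodePointDecoder(codePoints);
        int charCount = 0;
        for (int cp : codePoints) {
            charCount += Character.charCount(cp);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < charCount; i++) {
            sb.append(decoder.decodeOne());
        }
        return sb.toString();
    }
}
```

With this shape, the only state carried between calls is the single pending low surrogate, which is what lets the subclass stay otherwise stateless.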
Implements BitsCharset based on encapsulation of a regular JavaCharset.
Some encodings are not byte-oriented.
If we know the correspondence from integers to characters, and we can express that as a string, then everything else can be derived.
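As an illustration of deriving a charset from a string, the sketch below uses a lookup string in which the character at index i is the decoding of code unit value i. The class name, the 3-bit width, the table contents, and the most-significant-bit-first packing are assumptions chosen for the example; they are not taken from Daffodil's code.

```java
public class LookupCharsetSketch {
    // Hypothetical 3-bit charset: code unit value i decodes to TABLE.charAt(i).
    static final String TABLE = "01234567";
    static final int BITS_PER_UNIT = 3;

    // Decodes nChars code units packed most-significant-bit first into data.
    static String decode(byte[] data, int nChars) {
        StringBuilder sb = new StringBuilder();
        int bitPos = 0;
        for (int i = 0; i < nChars; i++) {
            int unit = 0;
            for (int b = 0; b < BITS_PER_UNIT; b++, bitPos++) {
                // Pick out one bit, counting bit 0 as the MSB of each byte.
                int bit = (data[bitPos / 8] >> (7 - bitPos % 8)) & 1;
                unit = (unit << 1) | bit;
            }
            sb.append(TABLE.charAt(unit));
        }
        return sb.toString();
    }

    // Encoding is derived from the same string: the index of the char is its code unit.
    // Returns -1 when the character is not representable in this charset.
    static int encodeUnit(char c) {
        return TABLE.indexOf(c);
    }
}
```

Both directions come from the one table: decoding is an index lookup, and encoding is the inverse search, which also identifies unrepresentable characters.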
This class is explicitly not a java.nio.charset.Charset. It is a BitsCharset, which is not a compatible type with a java.nio.charset.Charset on purpose so we don't confuse the two.
The problem is that java.nio.charset.Charset is designed in such a way that one cannot implement a proxy class that redirects methods to another class. This is due to all the final methods on the class.
So instead we do the opposite. We implement our own BitsCharset API, but implement the behavior in terms of a proxy JavaCharsetDecoder and proxy JavaCharsetEncoder that drive the decodeLoop and encodeLoop. This way we don't have to re-implement all the error handling and flush/end logic.
Implements BitsCharsetEncoder by encapsulating a standard JavaCharsetEncoder
Hijack a JavaCharsetEncoder to drive the encodeLoop.
This avoids us reimplementing all the error handling and flush/end logic.
TODO: Similar to our decoders, we should create custom encoders. Then we wouldn't need all this complex code related to proxying java charsets.
X-DFDL-5-BIT-PACKED-LSBF occupies only 5 bits with each code unit.
X-DFDL-6-BIT-DFI-264-DUI-001, special 6 bit encoding
Special purpose. This is not used for decoding anything. The encoder is used to convert strings using the characters allowed, into binary data using the AIS Payload Armoring described here:
http://catb.org/gpsd/AIVDM.html#_aivdm_aivdo_payload_armoring
Converting a string of length N bytes yields 6N bits.
The decoder can be used for unit testing, but the point of this class is to make the encoder available for use in un-doing the AIS Payload armoring when parsing, and performing this armoring when unparsing.
When encoding from an 8-bit charset, say ASCII or ISO-8859-1, this can only encode characters that stay within the 64 allowed characters. dfdl:encodingErrorPolicy='error' would check this (once implemented); otherwise, wherever this is used, the checking needs to be done separately.
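The 6-bits-per-character conversion can be sketched as follows. The mapping follows the rule given in the AIVDM payload armoring description linked above (subtract 48 from the ASCII value; if the result is greater than 40, subtract 8). The class and method names are hypothetical, and the output is a string of '0' and '1' characters for readability rather than packed binary data.

```java
public class AisArmorSketch {
    // Maps one armored character to its 6-bit value (0..63).
    // Allowed characters are '0'..'W' and '`'..'w', 64 in total.
    static int sixBitValue(char c) {
        boolean allowed = (c >= '0' && c <= 'W') || (c >= '`' && c <= 'w');
        if (!allowed) {
            // The separate check the text calls for: reject anything outside the 64.
            throw new IllegalArgumentException("Not an AIS armoring character: " + c);
        }
        int v = c - 48;
        if (v > 40) {
            v -= 8;
        }
        return v;
    }

    // Converts an armored string of N characters into a string of 6N bit digits,
    // most significant bit of each 6-bit value first.
    static String toBits(String armored) {
        StringBuilder sb = new StringBuilder();
        for (char c : armored.toCharArray()) {
            int v = sixBitValue(c);
            for (int b = 5; b >= 0; b--) {
                sb.append((v >> b) & 1);
            }
        }
        return sb.toString();
    }
}
```

For example, `AisArmorSketch.toBits("0w")` produces twelve bit digits, six for each input character.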
X-DFDL-BITS-LSBF occupies only 1 bit with each code unit.
X-DFDL-BITS-MSBF occupies only 1 bit with each code unit.
X-DFDL-HEX-LSBF occupies only 4 bits with each code unit.
X-DFDL-HEX-MSBF occupies only 4 bits with each code unit.
X-DFDL-OCTAL-LSBF occupies only 3 bits with each code unit.
X-DFDL-OCTAL-MSBF occupies only 3 bits with each code unit.
X-DFDL-US-ASCII-6-BIT-PACKED occupies only 6 bits with each code unit.
X-DFDL-US-ASCII-7-BIT-PACKED occupies only 7 bits with each code unit.
Provides BitsCharset objects corresponding to the usual java charsets found in StandardCharsets.
Daffodil uses BitsCharset as its primary abstraction for dealing with character sets, which enables it to support character sets where the code units are smaller than 1 byte.
Note that BitsCharset is NOT derived from java.nio.charset.Charset, nor are BitsCharsetDecoder or BitsCharsetEncoder derived from java.nio.charset.CharsetDecoder or CharsetEncoder respectively. This is partly because these Java classes have many final methods that make it impossible for us to implement what we need by extending them. But more importantly, we need much more low-level control over how characters are decoded and what kind of information is returned during decode operations. Getting that information within the limitations of the Java Charset API becomes an encumbrance. Replacing it with our own Charset decoders greatly simplifies the code and allows for future enhancements as needed.