An Iterator[Char] with additional peek and peek2 methods.
Backtracking
The mark and reset system is more sophisticated than that of Java's BufferedInputStream, which allows only a single outstanding mark.
This trait lets a stack of marks be created and reset. Marks denote nested locations, so they must be created and released in stack order: the most recently created mark is released first.
The mark contains additional state beyond just the position (which is maintained at bit granularity). All the other state aspects (decoder, bit order, etc.) are also maintained by a mark.
Implementation note: The oldest/deepest mark is expected to be built on top of a Java BufferedInputStream mark.
Use of mark/reset should eliminate any need for random-access setters of the bit position.
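The stack discipline described above can be sketched as follows. This is a minimal illustration, not Daffodil's implementation: StreamState, Mark, MarkableStream, skipBits, setBitOrder, and discard are all hypothetical names, and the state is reduced to three fields to show that a mark captures more than the bit position.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for the state a mark must capture.
final case class StreamState(bitPos: Long, bitOrder: String, charsetName: String)

// A mark is an opaque handle holding a snapshot of the full state.
final class Mark(val saved: StreamState)

final class MarkableStream(initial: StreamState) {
  private var current: StreamState = initial
  private val marks = ArrayBuffer.empty[Mark]

  def state: StreamState = current

  def skipBits(n: Long): Unit = {
    current = current.copy(bitPos = current.bitPos + n)
  }

  def setBitOrder(order: String): Unit = {
    current = current.copy(bitOrder = order)
  }

  // Push a new mark on top of any outstanding ones.
  def mark(): Mark = {
    val m = new Mark(current)
    marks += m
    m
  }

  // Only the innermost outstanding mark may be released.
  private def pop(m: Mark): Unit = {
    require(marks.nonEmpty && (marks.last eq m), "marks must be released in stack order")
    marks.remove(marks.length - 1)
    ()
  }

  // Backtrack: restore every aspect of the saved state, then release the mark.
  def reset(m: Mark): Unit = {
    pop(m)
    current = m.saved
  }

  // Commit: release the mark without restoring anything.
  def discard(m: Mark): Unit = pop(m)
}
```

Resetting an inner mark restores the bit order as well as the position, and attempting to reset an outer mark while an inner one is still outstanding fails the require check.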
For mini-marks that just mark/reset the position
This trait defines the low level API called by Daffodil's Parsers.
It has features to support
A goal is that this API does not allocate objects as I/O operations are performed unless boxed objects are being returned. For example getSignedLong(...) should not allocate anything per call; however, getSignedBigInt(...) does, because a BigInt is a heap-allocated object.
Internal buffers and such may be dropped/resized/reallocated as needed during method calls. The point is not that you never allocate. It's that the per-I/O operation overhead does not require object allocation for every data-accessing method call.
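To make the allocation goal concrete, here is a rough sketch of the difference between the two kinds of accessors. BitReader, its array-backed storage, the bitLength parameter, and the big-endian, most-significant-bit-first layout are assumptions for illustration; the real signatures of getSignedLong and getSignedBigInt are not given here.

```scala
// Hypothetical reader over an in-memory byte array.
final class BitReader(bytes: Array[Byte]) {
  private var pos: Long = 0L // current position, in bits

  def remainingBits: Long = bytes.length.toLong * 8 - pos

  // Reads one bit, most significant bit of each byte first.
  private def nextBit(): Int = {
    val byte = bytes((pos >>> 3).toInt)
    val bit = (byte >> (7 - (pos & 7).toInt)) & 1
    pos += 1
    bit
  }

  // Uses only primitive locals, so nothing is allocated per call.
  def getSignedLong(bitLength: Int): Long = {
    require(bitLength >= 1 && bitLength <= 64 && bitLength <= remainingBits)
    var acc = 0L
    var i = 0
    while (i < bitLength) {
      acc = (acc << 1) | nextBit()
      i += 1
    }
    val shift = 64 - bitLength
    (acc << shift) >> shift // sign-extend from bitLength bits
  }

  // Necessarily allocates: the result is a heap object.
  def getSignedBigInt(bitLength: Int): BigInt = {
    require(bitLength >= 1 && bitLength <= remainingBits)
    var acc = BigInt(0)
    var i = 0
    while (i < bitLength) {
      acc = (acc << 1) | BigInt(nextBit())
      i += 1
    }
    // Two's complement: a set top bit means the value is negative.
    if (acc.testBit(bitLength - 1)) acc - (BigInt(1) << bitLength) else acc
  }
}
```

The Long path touches only local primitives and the existing byte array, while the BigInt path cannot avoid creating at least the result object.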
Similarly, text data can be retrieved into a char buffer, and the char buffer can provide a limit on size (available capacity of the char buffer) in characters. The text can be examined in the char buffer, or a string can be created from the char buffer's contents when needed.
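A sketch of this char-buffer style, using the standard java.nio decoder. The helper name fillCharBuffer and its signature are assumptions for illustration; only CharsetDecoder.decode(ByteBuffer, CharBuffer, Boolean) is the real JDK call. The capacity of the caller's CharBuffer bounds how many characters are produced per call.

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.CharsetDecoder

object CharFill {
  // Decodes at most cb.capacity() characters from `in` into `cb`.
  // Returns the number of characters now readable from cb; 0 means no more data.
  // Assumes `in` already holds all remaining input (hence endOfInput = true).
  def fillCharBuffer(in: ByteBuffer, decoder: CharsetDecoder, cb: CharBuffer): Int = {
    cb.clear()
    decoder.decode(in, cb, true) // stops early when cb is full
    cb.flip()
    cb.remaining()
  }
}
```

The caller can inspect the characters in place, or call cb.toString only when a String is actually needed. One reusable CharBuffer serves any number of calls.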
This API is very stateful and not thread-safe; that is, each thread must have its own object. Some of this is inherent in this API style, and some is inherited from the underlying objects this API uses (such as CharsetDecoder).
This API is also intended to support some very highly optimized implementations. For example, if the schema is all text, and the encoding is known to be iso-8859-1, then there is no notion of a decode error, and every byte value, extended to a Char value, *is* the Unicode codepoint. No decoder needs to be used in this case and this API becomes quite a thin layer on top of a java.io.BufferedInputStream.
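The iso-8859-1 shortcut amounts to zero-extending each byte to a Char. A minimal sketch, with Latin1 and decodeInto as hypothetical names:

```scala
object Latin1 {
  // Copies bytes into out as chars without any CharsetDecoder.
  // Returns the number of chars written (bounded by out's length).
  def decodeInto(bytes: Array[Byte], out: Array[Char]): Int = {
    val n = math.min(bytes.length, out.length)
    var i = 0
    while (i < n) {
      // Mask to 0..255 so bytes >= 0x80 are not sign-extended.
      out(i) = (bytes(i) & 0xFF).toChar
      i += 1
    }
    n
  }
}
```

For all 256 byte values this produces the same characters as decoding with java.nio.charset.StandardCharsets.ISO_8859_1, and no input can trigger a decode error.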
Terminology:
Available Data - this is the data that is between the current bit position, and some limit. The limit can either be set (via setBitLimit calls), or it can be limited by tunable values, or implementation-specific upper limits, or it can simply be the end of the data stream.
Different kinds of DataInputStreams can have different limits. For example, a File-based DataInputStream may have no limit on the forward speculation distance, because the file can be randomly accessed if necessary. By contrast, a data stream that is directly connected to a network socket may have an upper limit on the amount of data that it is willing to buffer.
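The notion of available data can be sketched as the distance from the current bit position to whichever limit is in force. LimitedStream, availableBits, skip, and the Boolean-returning shape of setBitLimit are assumptions made for this illustration; only the name setBitLimit appears in the text above.

```scala
// Hypothetical stream that tracks positions only; totalBits stands in for
// the end of the data stream or an implementation-specific upper limit.
final class LimitedStream(totalBits: Long) {
  private var pos: Long = 0L
  private var limit: Long = totalBits // no explicit limit yet

  def bitPos: Long = pos

  // Available data: from the current position up to the limit.
  def availableBits: Long = limit - pos

  // Returns false, rather than throwing, if the limit is not usable.
  def setBitLimit(newLimit: Long): Boolean =
    if (newLimit < pos || newLimit > totalBits) false
    else { limit = newLimit; true }

  // Returns false, leaving the position unchanged, if too little data is available.
  def skip(nBits: Long): Boolean =
    if (nBits < 0 || nBits > availableBits) false
    else { pos += nBits; true }
}
```

A caller that sets a limit of 16 bits on a 64-bit stream and skips 10 bits has 6 bits available; a further skip of 10 returns false without moving the position.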
None of this is a commitment that this API will in fact have multiple specialized implementations. It's just a possibility for the future.
Implementation Note: It is the implementation of this interface which implements the Bucket Algorithm as described on the Daffodil Wiki. All of that bucket stuff is beneath this API.
In general, this API tries to return a value rather than throw exceptions whenever the behavior may be very common. This leaves it up to the caller to decide whether or not to throw an exception, and avoids the overhead of try-catch blocks. The exceptions to this rule are the methods that involve character decoding for textual data. These methods may throw CharacterCodingException when the encoding error policy is 'error'.
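The decoding exception can be illustrated with the standard java.nio decoder, whose CodingErrorAction setting plays a role similar to the encoding error policy mentioned above. This is a sketch under that assumption: DecodePolicy and decodeUtf8 are hypothetical names, and the mapping from the 'error' policy to CodingErrorAction.REPORT is assumed, not taken from Daffodil.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, CodingErrorAction, StandardCharsets}

object DecodePolicy {
  // Decodes UTF-8 bytes, handling malformed input according to onError.
  // With CodingErrorAction.REPORT this throws a CharacterCodingException;
  // with CodingErrorAction.REPLACE bad input becomes U+FFFD instead.
  @throws[CharacterCodingException]
  def decodeUtf8(bytes: Array[Byte], onError: CodingErrorAction): String = {
    val decoder = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(onError)
      .onUnmappableCharacter(onError)
    decoder.decode(ByteBuffer.wrap(bytes)).toString
  }
}
```

Given the bytes 'a', 0xFF, 'b', the REPLACE action yields a three-character string with U+FFFD in the middle, while REPORT throws, so only callers that select the strict policy need a try-catch.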