Implements the InputSource interface, reading data from a generic java.io.InputStream and storing the data in buckets of a defined size. Buckets are freed when no "locks" exist inside the bucket, to minimize memory usage. Note that "locks" in this sense are the InputSource locks on bytePosition and have nothing to do with synchronization. They act more like a reference count: the locks in each bucket are counted to determine which buckets are no longer needed, and a bucket can be freed when its count goes to zero.
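The lock bookkeeping can be sketched as follows. This is a minimal sketch with data held in memory, and the names `BucketLockSketch`, `store`, `lockPosition`, `releasePosition` and `advanceTo` are hypothetical, not Daffodil's API. One further assumption: buckets are freed only from the oldest end, stopping at the first bucket that still holds a lock. That way, a backtrack to a locked position can still re-read the buckets after it.

```scala
import scala.collection.mutable

// Hypothetical sketch (not Daffodil's API): per-bucket lock counts decide when
// bucket storage can be freed. "Lock" here is a reference count, not a mutex.
final class BucketLockSketch(val bucketSize: Int) {
  require(bucketSize > 0, "bucketSize must be positive")

  private val buckets = mutable.Map.empty[Long, Array[Byte]]                // bucket index -> bytes
  private val lockCounts = mutable.Map.empty[Long, Int].withDefaultValue(0) // bucket index -> locks
  private var oldest: Long = 0L     // lowest bucket index that may still be stored
  private var currentPos: Long = 0L // current read position, in bytes

  private def bucketOf(bytePos: Long): Long = bytePos / bucketSize

  def store(bucketIndex: Long, data: Array[Byte]): Unit = buckets(bucketIndex) = data
  def isStored(bucketIndex: Long): Boolean = buckets.contains(bucketIndex)

  // The parser might backtrack to bytePos later, so keep its bucket alive.
  def lockPosition(bytePos: Long): Unit = lockCounts(bucketOf(bytePos)) += 1

  // Backtracking to bytePos is no longer possible.
  def releasePosition(bytePos: Long): Unit = {
    val b = bucketOf(bytePos)
    require(lockCounts(b) > 0, s"no lock held in bucket $b")
    lockCounts(b) -= 1
    freeUnneeded()
  }

  def advanceTo(bytePos: Long): Unit = {
    currentPos = bytePos
    freeUnneeded()
  }

  // Free buckets from the oldest end while they hold no locks and lie before the
  // current bucket (stopping at the first locked bucket is this sketch's assumption).
  private def freeUnneeded(): Unit = {
    val current = bucketOf(currentPos)
    while (oldest < current && lockCounts(oldest) == 0) {
      buckets.remove(oldest)
      oldest += 1
    }
  }
}
```

In this sketch, a lock in bucket 0 keeps buckets 0 and 1 alive even after reading has moved into bucket 2. Releasing that lock frees both of them at once.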
Wraps a java.nio.ByteBuffer in an InputSource
When an instance of this class is created, it creates a read-only copy of the ByteBuffer. The current position of the ByteBuffer is considered index 0. For example, if the passed-in ByteBuffer had position 2, calling setPosition(0) would reset the ByteBuffer back to position 2. The limit of the ByteBuffer is considered the end of data.
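The index translation can be sketched with the standard `ByteBuffer.asReadOnlyBuffer()`, which shares content with the original but has an independent position and limit. The class and method names here are hypothetical; only the "original position is index 0, limit is end of data" rule comes from the description above.

```scala
import java.nio.ByteBuffer

// Hypothetical sketch: positions are relative to the ByteBuffer's position at wrap time.
final class ByteBufferSourceSketch(original: ByteBuffer) {
  // Read-only view: the caller's buffer and its position are never modified.
  private val bb: ByteBuffer = original.asReadOnlyBuffer()
  private val base: Int = bb.position()

  // Index 0 corresponds to the original position; the limit marks end of data.
  def length: Long = (bb.limit() - base).toLong

  def position: Long = (bb.position() - base).toLong

  def setPosition(pos: Long): Unit = {
    require(pos >= 0 && pos <= length, s"position $pos out of range")
    bb.position(base + pos.toInt)
  }

  // Returns the next byte as 0..255, or -1 at end of data.
  def get(): Int = if (bb.hasRemaining) bb.get() & 0xff else -1
}
```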
Hex/Bits and text dump formats for debug/trace purposes.
By definition this is a dump, so it knows nothing about where the fields in the data are. (To know that, you would need a format description language such as DFDL; but this code exists to help debug DFDL descriptions, so it cannot exploit any information about the data format.)
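A format-agnostic dump can be quite small. The sketch below covers only the hex and text columns, not a bits view, and its layout is an assumption: an 8-digit hex offset, the hex bytes, then printable ASCII with other bytes shown as `.`. `HexDumpSketch` is a hypothetical name.

```scala
// Hypothetical sketch of a hex + text dump; it knows nothing about fields in the data.
object HexDumpSketch {
  // One row: 8-digit hex offset, hex bytes, then printable ASCII (others shown as '.').
  def dumpLine(offset: Long, row: Array[Byte]): String = {
    val hex = row.map(b => f"${b & 0xff}%02x").mkString(" ")
    val text = row.map { b =>
      val c = (b & 0xff).toChar
      if (c >= 0x20 && c < 0x7f) c else '.'
    }.mkString
    f"$offset%08x  $hex  $text"
  }

  def dump(data: Array[Byte], bytesPerRow: Int = 16): List[String] =
    data.grouped(bytesPerRow).zipWithIndex.map { case (row, i) =>
      dumpLine(i.toLong * bytesPerRow, row)
    }.toList
}
```

For example, `HexDumpSketch.dump("Hi!\n".getBytes("US-ASCII"))` yields the single line `00000000  48 69 21 0a  Hi!.`.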
There is an asymmetry between DataInputStream and DataOutputStream with respect to the positions and limits in the bit stream.
For the DataInputStream, we have the concept of the current bitPos0b, and optionally there may be a bound called bitLimit0b. There are also 1b variants of these.
For parsing, these are always absolute values; that is, they contain the bit position relative to the ultimate start of the input stream where parsing began.
For DataOutputStream, we have slightly different concepts.
There are absolute and relative variants. The absolute bitPos0b, or absBitPos0b, is symmetric to the parser's bitPos0b: it is the position relative to the ultimate start of the output stream.
However, we often do not know this value. So the UState and DataOutputStream have a maybeAbsBitPos0b which can be MaybeULong.Nope if the value isn't known.
In addition we have the relative or relBitPos0b. This is relative to the start of whatever buffer we are doing unparsing into.
When unparsing, we often have to unparse into a buffer where the ultimate actual absolute position isn't yet known, but we have to do the unparsing anyway, for example so that we can measure exactly how long something is.
Conversely, sometimes we simply must have the absolute output bit position, for example, when computing the number of bits to insert to achieve the required alignment.
Hence we have relBitPos0b, which is always known and is >= 0, and maybeAbsBitPos0b, which is a MaybeULong and, if known, is >= 0.
Corresponding to the bit position we have a bit limit, which is measured in the same 0b or 1b units but is *always* a maybe type, because even when we know the absolute position, there may or may not be a limit in place. Hence the UState and DataOutputStream have a maybeRelBitLimit0b and a maybeAbsBitLimit0b.
One invariant is this: when the absolute bit pos is known, then it is the same as the relative bit pos. Similarly when the absolute bit limit is known, then the relative bit limit is known and is equal.
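The alignment case above can be made concrete with a minimal sketch. `Option[Long]` stands in for `MaybeULong`, and `OutputPositionsSketch` and `maybeAlignmentFillBits` are hypothetical names. The point it shows is that the number of fill bits is itself unknown whenever the absolute position is unknown.

```scala
// Hypothetical sketch; Option[Long] stands in for Daffodil's MaybeULong.
final case class OutputPositionsSketch(
    relBitPos0b: Long,             // always known, relative to the current buffer
    maybeAbsBitPos0b: Option[Long] // None when the absolute position is not yet known
) {
  require(relBitPos0b >= 0 && maybeAbsBitPos0b.forall(_ >= 0))

  def relBitPos1b: Long = relBitPos0b + 1
  def maybeAbsBitPos1b: Option[Long] = maybeAbsBitPos0b.map(_ + 1)

  // Alignment fill needs the absolute position; if that is unknown, so is the answer.
  def maybeAlignmentFillBits(alignmentInBits: Long): Option[Long] = {
    require(alignmentInBits > 0)
    maybeAbsBitPos0b.map { abs =>
      val rem = abs % alignmentInBits
      if (rem == 0) 0L else alignmentInBits - rem
    }
  }
}
```

For instance, at absolute bit position 13, reaching 8-bit alignment needs 3 fill bits. With no absolute position, the sketch returns `None`, and the fill has to be deferred.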
This is an interface trait, and it defines methods shared by both DataInputStream and DataOutputStream.
Implementation (partial) is in DataStreamCommonImplMixin.
Shared by both DataInputStream and DataOutputStream implementations
To support dfdl:outputValueCalc, we must suspend output. This is done by taking the current "direct" output, and splitting it into a still direct part, and a following buffered output.
The direct part waits for the OVC calculation to complete. When that value is written, the direct part is finished and collapses into the following part, which was buffered but becomes direct as a result of this collapsing.
Hence most output goes to direct data output streams; some output is buffered while an OVC is pending, but that buffering is eliminated as soon as possible.
A Buffered DOS can be finished or not. Not finished means that it might still be appended to. Not concurrently, but by other code invoked from this thread of control (which might traverse different co-routine "stack" threads, but it's still one thread of control).
Finished means that the Buffered DOS can never be appended to again.
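The split-and-collapse behavior can be sketched with a much-simplified model. It works in whole bytes, with a `ByteArrayOutputStream` as the "real" sink. `SplitDOSSketch` and `addBuffered` are hypothetical names; `setFinished` loosely mirrors the method of that name mentioned elsewhere in these notes. A direct stream writes straight to the sink, while a buffered one accumulates bytes. Finishing a direct stream copies the follower's buffer to the sink and makes the follower direct, and the collapse chains onward when the follower was already finished.

```scala
import java.io.ByteArrayOutputStream

// Hypothetical, simplified model of direct vs. buffered output (bytes only, no bits).
final class SplitDOSSketch private (private var sink: Option[ByteArrayOutputStream]) {
  private val buffer = new ByteArrayOutputStream()
  private var following: Option[SplitDOSSketch] = None
  private var finished = false

  def isDirect: Boolean = sink.isDefined
  def isFinished: Boolean = finished

  def write(bytes: Array[Byte]): Unit = {
    require(!finished, "a finished stream can never be appended to again")
    sink match {
      case Some(out) => out.write(bytes)
      case None      => buffer.write(bytes)
    }
  }

  // Split: later output goes to a new buffered stream that follows this one.
  def addBuffered(): SplitDOSSketch = {
    require(following.isEmpty, "already split")
    val next = new SplitDOSSketch(None)
    following = Some(next)
    next
  }

  def setFinished(): Unit = {
    finished = true
    if (isDirect) collapseFollowing()
  }

  // Copy the follower's buffered bytes to the sink and make it direct; if it was
  // already finished, it collapses into its own follower in turn.
  private def collapseFollowing(): Unit = following.foreach { next =>
    sink.foreach(out => next.buffer.writeTo(out))
    next.buffer.reset()
    next.sink = sink
    if (next.finished) next.collapseFollowing()
  }
}

object SplitDOSSketch {
  def direct(out: ByteArrayOutputStream): SplitDOSSketch = new SplitDOSSketch(Some(out))
}
```

Here, output written after the split lands in the sink only once the OVC value has been written to the direct part and that part is finished.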
Has two modes of operation, buffering or direct. When buffering, all output goes into a buffer. When direct, all output goes into a "real" DataOutputStream.
The isLayer parameter defines whether this instance originated from a layer. This is important because this class is responsible for closing the associated Java OutputStream, whose output ultimately goes to the underlying DataOutputStream. However, if the DataOutputStream is not related to a layer, the associated Java OutputStream came from the user, and it is the user's responsibility to close it. The isLayer flag indicates which streams should be closed.
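The ownership rule reduces to one conditional. The following is a minimal sketch, not the actual class: `OwnedOutputSketch` is a hypothetical name, and it only decides whether to close the wrapped stream.

```scala
import java.io.OutputStream

// Hypothetical sketch of the isLayer ownership rule.
final class OwnedOutputSketch(out: OutputStream, isLayer: Boolean) {
  def write(bytes: Array[Byte]): Unit = out.write(bytes)

  // Layer streams are owned here and closed; user-supplied streams are left open,
  // because closing them is the user's responsibility.
  def close(): Unit = if (isLayer) out.close()
}
```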
This class can be used with any InputStream to restrict what is read from it to N bytes.
This can be used to forcibly stop consumption of data from a stream at a length obtained explicitly.
Thread safety: This is inherently stateful, so it is not thread safe to use this object from more than one thread.
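A length-limited stream can be sketched by overriding the two `read` methods and counting down a remaining-bytes counter. `LimitedInputStreamSketch` is a hypothetical name. The mutable `remaining` field is exactly the state that makes such an object unsafe to share between threads.

```scala
import java.io.InputStream

// Hypothetical sketch: exposes at most `limit` bytes of the underlying stream.
final class LimitedInputStreamSketch(in: InputStream, limit: Long) extends InputStream {
  require(limit >= 0)
  private var remaining: Long = limit // mutable, unsynchronized state

  override def read(): Int =
    if (remaining <= 0) -1
    else {
      val b = in.read()
      if (b >= 0) remaining -= 1
      b
    }

  override def read(buf: Array[Byte], off: Int, len: Int): Int =
    if (len == 0) 0
    else if (remaining <= 0) -1
    else {
      val n = in.read(buf, off, math.min(len.toLong, remaining).toInt)
      if (n > 0) remaining -= n
      n
    }

  override def available(): Int = math.min(in.available().toLong, remaining).toInt
}
```

Once the limit is reached, the underlying stream is left positioned at the first byte after the limit.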
Abstract interface to obtain format properties or values derived from properties.
This includes anything the I/O layer needs, which includes properties that can be runtime-valued expressions, or that depend on such.
By passing in an object that provides quick access to these, we avoid the need to have setters/getters that call setters that change state in the I/O layer.
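The pass-it-in style can be sketched with a single property. `FormatInfoSketch` and `ReaderSketch` are hypothetical names, and the real interface exposes many more properties. Because `byteOrder` is a `def`, an implementation is free to compute it per call, for example from a runtime-valued expression. The reader never needs a setter.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical single-property stand-in for the format-info interface.
trait FormatInfoSketch {
  def byteOrder: ByteOrder
}

object ReaderSketch {
  // The reader asks the format info for what it needs at call time instead of
  // relying on setters that mutate reader state.
  def getInt(bytes: Array[Byte], finfo: FormatInfoSketch): Int =
    ByteBuffer.wrap(bytes).order(finfo.byteOrder).getInt()
}
```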
The InputSource class is really just a mechanism to provide bytes to an InputSourceDataInputStream, which does the heavy lifting of converting bits/bytes to numbers and characters. This class does not need to know anything about bits; it is purely byte-centric. One core difference between this and an InputStream is that it must be able to backtrack to arbitrary points in the InputStream's history. To aid in this, methods are called to let the InputSource know which byte positions it might need to backtrack to, which allows it to free data that is no longer needed. One can almost think of it as an InputStream that supports multiple marks with random access.
Realization of the DataInputStream API
Underlying representation is an InputSource containing all input data.
Warning: Only mix this into thread-local state objects. If mixed into a regular class this will end up sharing the local stack object across threads, which is a very bad idea (not thread safe).
The state that must be saved and restored by mark/reset calls
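Mark/reset state can be sketched as an immutable snapshot pushed on a per-instance stack. The fields captured here (bit position and optional bit limit) are an assumption, as are the names `MarkStateSketch` and `MarkStackSketch`. The stack is a plain mutable field with no synchronization, which is why such a mixin belongs only in thread-local state objects.

```scala
// Hypothetical sketch; the captured fields are an assumption, not the real mark state.
final case class MarkStateSketch(bitPos0b: Long, maybeBitLimit0b: Option[Long])

final class MarkStackSketch {
  var bitPos0b: Long = 0L
  var maybeBitLimit0b: Option[Long] = None
  private var marks: List[MarkStateSketch] = Nil // unsynchronized: one instance per thread

  def mark(): MarkStateSketch = {
    val m = MarkStateSketch(bitPos0b, maybeBitLimit0b)
    marks = m :: marks
    m
  }

  // Restore the saved state; marks must be reset or discarded in LIFO order.
  def reset(m: MarkStateSketch): Unit = {
    pop(m)
    bitPos0b = m.bitPos0b
    maybeBitLimit0b = m.maybeBitLimit0b
  }

  // Keep the current state and just drop the mark.
  def discard(m: MarkStateSketch): Unit = pop(m)

  private def pop(m: MarkStateSketch): Unit = {
    require(marks.headOption.exists(_ eq m), "marks must be released in LIFO order")
    marks = marks.tail
  }
}
```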
Can be used with any InputStream to restrict what is read from it to stop before a particular regex match.
The regex must have a finite maximum length match string.
This can be used to forcibly stop consumption of data from a stream at a length obtained from a delimiter that is described using a regex.
The delimiter matching the regex is consumed from the underlying stream (if found), and the underlying stream is left positioned at the byte after the regex match string.
IMPORTANT: The delimiter regex cannot contain any Capturing Groups! Use (?: ... ) which is non-capturing, instead of regular ( ... ). For example: this regex matches CRLF not followed by tab or space:
"""\r\n(?!(?:\t|\ ))"""
Notice use of the ?: to avoid a capture group around the alternatives of tab or space.
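The no-capturing-groups rule can be checked with `java.util.regex.Matcher.groupCount()`, which counts capturing groups even when they sit inside a lookahead. The sketch below compiles the delimiter from the text alongside the same regex written with a plain group. It does not implement the regex-bounded stream itself, and `DelimiterRegexSketch` is a hypothetical name.

```scala
import java.util.regex.Pattern

// Hypothetical helper object; only the first regex comes from the text above.
object DelimiterRegexSketch {
  // CRLF not followed by tab or space, with no capturing groups.
  val crlfNotFollowedByTabOrSpace: Pattern = Pattern.compile("""\r\n(?!(?:\t|\ ))""")

  // The same regex with a plain ( ... ) group, which captures and so is not allowed.
  val capturingVariant: Pattern = Pattern.compile("""\r\n(?!(\t|\ ))""")

  def captureGroupCount(p: Pattern): Int = p.matcher("").groupCount()
}
```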
Thread safety: This is inherently stateful, so it is not thread safe to use this object from more than one thread.
When unparsing, we reuse all the DFA logic to identify delimiters within the data that need to be escaped, so we need to treat the string data being unparsed as a DataInputStream.
Mixin to classes that are supposed to exist 1 to 1 with threads. Such as DataInputStream derived classes and DataOutputStream derived classes.
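One way to enforce the one-to-one pairing is to record the constructing thread and check it on use. This is a minimal sketch, and `ThreadConfinedSketch` and `CounterSketch` are hypothetical names. Daffodil's own mixin may not perform any such check.

```scala
// Hypothetical sketch of a thread-confinement check.
trait ThreadConfinedSketch {
  private val ownerThread: Thread = Thread.currentThread()

  // Call at the top of stateful methods to catch accidental sharing across threads.
  protected def checkOwnerThread(): Unit =
    if (Thread.currentThread() ne ownerThread)
      throw new IllegalStateException(
        s"used from ${Thread.currentThread().getName}, owned by ${ownerThread.getName}")
}

final class CounterSketch extends ThreadConfinedSketch {
  private var count = 0
  def increment(): Int = { checkOwnerThread(); count += 1; count }
}
```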
Can be used with any InputStream to restrict what is read from it to stop before a boundary mark string.
The boundary mark string is exactly that, a string of characters. Not a regex, nor anything involving DFDL Character Entities or Character Class Entities. (No %WSP; no %NL; )
This can be used to forcibly stop consumption of data from a stream at a length obtained from a delimiter.
The boundary mark string is consumed from the underlying stream (if found), and the underlying stream is left positioned at the byte after the boundary mark string.
Thread safety: This is inherently stateful, so it is not thread safe to use this object from more than one thread.
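The boundary-mark behavior can be sketched with a lookahead window as long as the boundary. It assumes the boundary string has already been encoded to bytes (for example as ASCII), and `BoundaryMarkInputStreamSketch` is a hypothetical name. When the window equals the boundary, the boundary is consumed and the stream reports end of data, so the underlying stream sits just after the boundary.

```scala
import java.io.InputStream
import scala.collection.mutable

// Hypothetical sketch: the boundary is given as already-encoded bytes.
final class BoundaryMarkInputStreamSketch(in: InputStream, boundary: Array[Byte]) extends InputStream {
  require(boundary.nonEmpty, "boundary must not be empty")

  private val window = mutable.Queue.empty[Int] // lookahead bytes not yet returned
  private var done = false

  override def read(): Int =
    if (done) -1
    else {
      fillWindow()
      if (windowMatchesBoundary) {
        window.clear() // boundary consumed; `in` is now just after it
        done = true
        -1
      } else if (window.isEmpty) {
        done = true // end of data without finding the boundary
        -1
      } else window.dequeue()
    }

  // Keep boundary.length bytes of lookahead unless the underlying stream ends first.
  private def fillWindow(): Unit = {
    var eof = false
    while (!eof && window.size < boundary.length) {
      val b = in.read()
      if (b < 0) eof = true else window.enqueue(b)
    }
  }

  private def windowMatchesBoundary: Boolean =
    window.size == boundary.length &&
      window.iterator.zip(boundary.iterator).forall { case (w, b) => w == (b & 0xff) }
}
```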
This trait defines the low level API called by Daffodil's Parsers.
It has features to support
A goal is that this API does not allocate objects as I/O operations are performed unless boxed objects are being returned. For example getSignedLong(...) should not allocate anything per call; however, getSignedBigInt(...) does, because a BigInt is a heap-allocated object.
Internal buffers and such may be dropped/resized/reallocated as needed during method calls. The point is not that you never allocate. It's that the per-I/O operation overhead does not require object allocation for every data-accessing method call.
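A fixed-width integer read that allocates nothing per call can be sketched over an in-memory byte array. It assumes MSB-first bit order only, and the `BitReaderSketch` class, its `getSignedLong(nBits)` signature and `isDefinedForLength` are illustrative, not the real API. The result is a primitive `Long` built with shifts and masks. `isDefinedForLength` lets a caller check for enough data up front instead of catching an exception.

```scala
// Hypothetical sketch: MSB-first bit order only, data held in memory.
final class BitReaderSketch(data: Array[Byte]) {
  private var bitPos0b: Long = 0L

  def isDefinedForLength(nBits: Int): Boolean = bitPos0b + nBits <= data.length.toLong * 8

  // Reads nBits (1..64) as an unsigned value; no objects are allocated per call.
  def getUnsignedLongBits(nBits: Int): Long = {
    require(nBits >= 1 && nBits <= 64 && isDefinedForLength(nBits))
    var result = 0L
    var i = 0
    while (i < nBits) {
      val p = bitPos0b + i
      val byte = data((p >>> 3).toInt)
      val bit = (byte >>> (7 - (p & 7).toInt)) & 1
      result = (result << 1) | bit
      i += 1
    }
    bitPos0b += nBits
    result
  }

  // Sign-extends the nBits-wide value via an arithmetic shift.
  def getSignedLong(nBits: Int): Long = {
    val shift = 64 - nBits
    (getUnsignedLongBits(nBits) << shift) >> shift
  }
}
```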
Similarly, text data can be retrieved into a char buffer, and the char buffer can provide a limit on size (available capacity of the char buffer) in characters. The text can be examined in the char buffer, or a string can be created from the char buffer's contents when needed.
This API is very stateful and not thread-safe; i.e., each thread must have its own object. Some of this is inherent in this API style, and some is inherited from the underlying objects this API uses (such as CharsetDecoder).
This API is also intended to support some very highly optimized implementations. For example, if the schema is all text and the encoding is known to be iso-8859-1, then there is no notion of a decode error, and every byte value, extended to a Char value, *is* the Unicode code point. No decoder needs to be used in this case, and this API becomes a quite thin layer on top of a java.io.BufferedInputStream.
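The iso-8859-1 fast path amounts to a widening loop. It decodes into a caller-supplied `CharBuffer` whose remaining capacity bounds how many characters are produced. `Latin1FastPathSketch` is a hypothetical name, and the result should agree with the JDK's ISO-8859-1 charset.

```scala
import java.nio.{ByteBuffer, CharBuffer}

// Hypothetical sketch of an iso-8859-1 fast path: no CharsetDecoder, no decode errors.
object Latin1FastPathSketch {
  // Each byte value, zero-extended to a Char, is the code point; returns chars written.
  def decode(in: ByteBuffer, out: CharBuffer): Int = {
    val n = math.min(in.remaining(), out.remaining())
    var i = 0
    while (i < n) {
      out.put((in.get() & 0xff).toChar)
      i += 1
    }
    n
  }
}
```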
Terminology:
Available Data - this is the data that is between the current bit position, and some limit. The limit can either be set (via setBitLimit calls), or it can be limited by tunable values, or implementation-specific upper limits, or it can simply be the end of the data stream.
Different kinds of DataInputStreams can have different limits. For example, a File-based DataInputStream may have no limit on the forward speculation distance, because the file can be randomly accessed if necessary. In contrast, a data stream that is directly connected to a network socket may have an upper limit on the amount of data that it is willing to buffer.
None of this is a commitment that this API will in fact have multiple specialized implementations. It's just a possibility for the future.
Implementation Note: It is the implementation of this interface which implements the Bucket Algorithm as described on the Daffodil Wiki. All of that bucket stuff is beneath this API.
In general, this API tries to return a value rather than throw exceptions whenever the behavior may be very common. This leaves it up to the caller to decide whether or not to throw an exception, and avoids the overhead of try-catch blocks. The exceptions to this rule are the methods that involve character decoding for textual data. These methods may throw CharacterCodingException when the encoding error policy is 'error'.
Highly optimized converter for ASCII to Unicode
Factory for creating this type of DataInputStream
Provides only core input sources to avoid making any assumptions about the incoming data (i.e. should a File be mapped to a ByteBuffer or be streamed as an InputStream). The user knows better than us, so have them make the decision.
Throw to indicate that bitOrder changed, but not on a byte boundary.
Must be caught at higher level and turned into a RuntimeSDE where we have the context to do so.
All calls to setFinished should, somewhere, be surrounded by a catch of this.
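The throw-low, report-high pattern can be sketched as follows. The exception class, `checkBitOrderChange` and `finishAndReport` are hypothetical names, and the error string merely stands in for a RuntimeSDE. The low-level check has no schema context, so it only throws. A caller that wraps the finish step and knows the context converts the exception into an error carrying that context.

```scala
// Hypothetical names standing in for the exception above and the code that catches it.
final class BitOrderChangeNotOnByteBoundarySketch(val bitPos0b: Long)
    extends Exception(s"bitOrder changed at bit position $bitPos0b, which is not a byte boundary")

object BitOrderSketch {
  // Low level: no schema context here, so just throw.
  def checkBitOrderChange(bitPos0b: Long): Unit =
    if (bitPos0b % 8 != 0) throw new BitOrderChangeNotOnByteBoundarySketch(bitPos0b)

  // Higher level: has the context, so it converts the throw into an error result.
  def finishAndReport(bitPos0b: Long, context: String): Either[String, Unit] =
    try Right(checkBitOrderChange(bitPos0b))
    catch {
      case e: BitOrderChangeNotOnByteBoundarySketch =>
        Left(s"Schema Definition Error in $context: ${e.getMessage}")
    }
}
```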