package org.apache.daffodil.io

Type Members

  1. class BitOrderChangeException extends Exception with ThinThrowable

    Throw to indicate that bitOrder changed, but not on a byte boundary.

    It must be caught at a higher level, where there is enough context to turn it into a RuntimeSDE.

    All calls to setFinished should, somewhere up the call chain, be surrounded by a catch of this exception.
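
    The sketch below shows the throw-low, catch-high pattern described above. It makes two assumptions. First, "thin" is taken to mean an exception that skips stack-trace capture so it is cheap to throw. Second, BitOrderChangeSketch, setFinishedSketch and finishWithContext are hypothetical names, not Daffodil's API. The Left value stands in for a RuntimeSDE.

    ```scala
    // Hypothetical "thin" exception: writableStackTrace = false skips stack-trace capture.
    final class BitOrderChangeSketch(val bitPos0b: Long)
        extends Exception(s"bitOrder changed at bit $bitPos0b, not on a byte boundary", null, false, false)

    object BitOrderSketch {
      // Low level: no diagnostic context is available here, so just throw.
      def setFinishedSketch(bitPos0b: Long, bitOrderChanged: Boolean): Unit =
        if (bitOrderChanged && bitPos0b % 8 != 0) throw new BitOrderChangeSketch(bitPos0b)

      // Higher level: has the context to turn the exception into a diagnostic
      // (a String stands in for a RuntimeSDE in this sketch).
      def finishWithContext(bitPos0b: Long, bitOrderChanged: Boolean): Either[String, Unit] =
        try Right(setFinishedSketch(bitPos0b, bitOrderChanged))
        catch { case e: BitOrderChangeSketch => Left(s"Runtime SDE (sketch): ${e.getMessage}") }
    }
    ```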

  2. class BucketingInputSource extends InputSource

    Implements the InputSource interface, reading data from a generic java.io.InputStream and storing the data in buckets of a defined size. Buckets are freed when no "locks" exist inside the bucket, to minimize memory usage. Note that "locks" in this sense are the InputSource locks on bytePosition and are not about synchronization. They behave more like a reference count: locks are counted per bucket to determine which buckets are no longer needed and can be freed when the count goes to zero.
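
    The reference-counting idea can be modeled with the standard library alone. This is a hypothetical sketch: BucketRefCounter and its methods are illustrative names, and it tracks only bucket indices, not bucket contents. It goes beyond the text in one respect. A bucket is treated as freeable only when it lies before both the oldest locked position and the current read position, because backtracking to an earlier lock and reading forward would need every bucket after it.

    ```scala
    import scala.collection.mutable

    // Hypothetical sketch: tracks which buckets are live; bucket contents are omitted.
    final class BucketRefCounter(bucketSize: Int) {
      private val lockCounts = mutable.Map.empty[Long, Int] // bucket index -> number of locks
      private val live = mutable.Set.empty[Long]            // bucket indices still in memory

      def bucketOf(bytePos0b: Long): Long = bytePos0b / bucketSize
      def addBucket(index: Long): Unit = live += index
      def liveBuckets: Set[Long] = live.toSet

      def lock(bytePos0b: Long): Unit = {
        val b = bucketOf(bytePos0b)
        lockCounts(b) = lockCounts.getOrElse(b, 0) + 1
      }

      def release(bytePos0b: Long, currentBytePos0b: Long): Unit = {
        val b = bucketOf(bytePos0b)
        val remaining = lockCounts(b) - 1
        if (remaining == 0) lockCounts -= b else lockCounts(b) = remaining
        // Free every bucket that lies before both the oldest lock and the current position.
        val keepFrom = (lockCounts.keysIterator ++ Iterator(bucketOf(currentBytePos0b))).min
        live --= live.filter(_ < keepFrom).toList
      }
    }
    ```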

  3. class ByteBufferInputSource extends InputSource

    Wraps a java.nio.ByteBuffer in an InputSource.

    When an instance of this class is created, it creates a read-only copy of the ByteBuffer. The current position of the ByteBuffer is considered index 0. For example, if the passed-in ByteBuffer had position 2, calling setPosition(0) would reset the ByteBuffer back to position 2. The limit of the ByteBuffer is considered the end of data.
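
    The position arithmetic described above can be sketched with java.nio.ByteBuffer alone. ByteBufferSourceSketch and its methods are hypothetical names, not the real class's API. asReadOnlyBuffer() returns a view with its own position and limit over the same content. Moving the sketch's position therefore never disturbs the caller's buffer.

    ```scala
    import java.nio.ByteBuffer

    // Hypothetical sketch: index 0 is wherever the caller's buffer was positioned.
    final class ByteBufferSourceSketch(original: ByteBuffer) {
      // Read-only view with its own position/limit; the caller's buffer is never moved.
      private val buf: ByteBuffer = original.asReadOnlyBuffer()
      private val start: Int = buf.position()

      def position: Long = (buf.position() - start).toLong
      def setPosition(pos0b: Long): Unit = { buf.position(start + pos0b.toInt); () }
      def remaining: Int = buf.remaining() // the original limit marks the end of data
      def get(): Int = buf.get() & 0xff
    }
    ```

    Using original.slice() instead would also give a view whose position 0 is the caller's current position.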

  4. sealed trait DOSState extends AnyRef

  5. class DataDumper extends AnyRef

    Hex/Bits and text dump formats for debug/trace purposes.

    By definition this is a dump, so it doesn't know much about where the fields in the data are. (To do that you'd need a format description language, like DFDL, but this is here to help debug DFDL descriptions, so it cannot exploit any information about the data format.)
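
    The following hypothetical hex-plus-text dumper illustrates a format-agnostic dump. HexDumpSketch is not DataDumper's API, and the line layout is an assumption. It knows nothing about field boundaries. It shows only offsets, byte values, and printable ASCII, and renders all other bytes as '.'.

    ```scala
    // Hypothetical hex + text dump; the layout is an illustrative assumption.
    object HexDumpSketch {
      def dump(data: Array[Byte], bytesPerLine: Int = 16): String =
        data.grouped(bytesPerLine).zipWithIndex.map { case (chunk, lineNo) =>
          // Two hex digits per byte, with no knowledge of field boundaries.
          val hex = chunk.map(b => f"${b & 0xff}%02x").mkString(" ")
          // Printable ASCII as-is; everything else (including bytes >= 0x80) as '.'.
          val text = chunk.map(b => if (b >= 0x20 && b < 0x7f) b.toChar else '.').mkString
          val offset = lineNo.toLong * bytesPerLine
          f"$offset%08x  ${hex.padTo(bytesPerLine * 3 - 1, ' ')}  $text"
        }.mkString("\n")
    }
    ```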

  6. trait DataInputStream extends DataStreamCommon

  7. trait DataInputStreamImplMixin extends DataInputStream with DataStreamCommonImplMixin with LocalBufferMixin

  8. trait DataOutputStream extends DataStreamCommon with Logging

    There is an asymmetry between DataInputStream and DataOutputStream with respect to the positions and limits in the bit stream.

    For the DataInputStream, we have the concept of the current bitPos0b, and optionally there may be a bound called bitLimit0b. There are 1b variants of these.

    For parsing, these are always absolute values; that is, they contain the bit position relative to the ultimate start of the input stream where parsing began.

    For DataOutputStream, we have slightly different concepts.

    There are absolute and relative variants. The absolute bitPos0b, or absBitPos0b, is symmetric to the parser's bitPos0b. It's the position relative to the ultimate start of the output stream.

    However, we often do not know this value. So the UState and DataOutputStream have a maybeAbsBitPos0b which can be MaybeULong.Nope if the value isn't known.

    In addition we have the relative or relBitPos0b. This is relative to the start of whatever buffer we are doing unparsing into.

    When unparsing, we often have to unparse into a buffer where the ultimate actual absolute position isn't yet known, but we have to do the unparsing anyway, for example so that we can measure exactly how long something is.

    Conversely, sometimes we simply must have the absolute output bit position, for example, when computing the number of bits to insert to achieve the required alignment.

    Hence we have relBitPos0b, which is always known and is a value >= 0, and maybeAbsBitPos0b, which is a MaybeULong and, if known, is >= 0.

    Corresponding to bit position we have bit limit, which is measured in the same 0b or 1b units, but is *always* a maybe type, because even in the case where we know the absolute position, we still may or may not have any limit in place. Hence the UState and DataOutputStream have a maybeRelBitLimit0b and a maybeAbsBitLimit0b.

    One invariant is this: when the absolute bit pos is known, then it is the same as the relative bit pos. Similarly when the absolute bit limit is known, then the relative bit limit is known and is equal.
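
    The alignment case can be sketched with Option[Long] standing in for MaybeULong. This substitution keeps the example self-contained, and alignmentFillBits is a hypothetical helper. When the absolute position is unknown, the helper returns None, mirroring the case where the fill cannot be computed yet.

    ```scala
    // Hypothetical sketch: Option[Long] stands in for MaybeULong.
    object AlignmentSketch {
      /** Bits of fill needed to reach the next multiple of alignmentBits,
        * or None when the absolute bit position is not yet known. */
      def alignmentFillBits(maybeAbsBitPos0b: Option[Long], alignmentBits: Int): Option[Long] = {
        require(alignmentBits > 0)
        maybeAbsBitPos0b.map { abs =>
          val rem = abs % alignmentBits
          if (rem == 0) 0L else alignmentBits - rem
        }
      }
    }
    ```

    A relative position cannot substitute for the absolute one here. For example, 13 bits into a buffer whose absolute start is unknown could need anywhere from 0 to 7 fill bits for byte alignment.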

  9. trait DataOutputStreamImplMixin extends DataStreamCommonState with DataOutputStream with DataStreamCommonImplMixin with LocalBufferMixin

  10. trait DataStreamCommon extends AnyRef

    This is an interface trait, and it defines methods shared by both DataInputStream and DataOutputStream.

    Implementation (partial) is in DataStreamCommonImplMixin.

  11. trait DataStreamCommonImplMixin extends DataStreamCommon with Logging


    Shared by both DataInputStream and DataOutputStream implementations

  12. trait DataStreamCommonState extends AnyRef

  13. final class DirectOrBufferedDataOutputStream extends DataOutputStreamImplMixin

    To support dfdl:outputValueCalc, we must suspend output. This is done by taking the current "direct" output, and splitting it into a still direct part, and a following buffered output.

    The direct part waits for the OVC calculation to complete. When that value is written, the direct part is finished and collapses into the following part, which was buffered but becomes direct as a result of this collapsing.

    Hence, most output goes to direct data output streams. Some output, produced while an OVC is pending, is buffered, but this buffering is eliminated as soon as possible.

    A Buffered DOS can be finished or not. Not finished means that it might still be appended to: not concurrently, but by other code invoked from this thread of control (which might traverse different co-routine "stack" threads, but it's still one thread of control).

    Finished means that the Buffered DOS can never be appended to again.

    It has two modes of operation, buffering or direct. When buffering, all output goes into a buffer. When direct, all output goes into a "real" DataOutputStream.

    The isLayer parameter defines whether or not this instance originated from a layer. This is important to specify because this class is responsible for closing the associated Java OutputStream, to which the output is ultimately written. However, if the DataOutputStream is not related to a layer, the associated Java OutputStream came from the user, and it is the user's responsibility to close it. The isLayer flag indicates which streams should be closed.
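
    The following is a much-simplified, hypothetical model of the direct/buffered split and collapse. SplitDOSSketch, addBuffered and setFinished are illustrative names. The model works at byte level only and ignores bit positions, layers, and closing. Output written while a stream is buffered is held in a ByteArrayOutputStream. When the preceding direct stream is finished, that buffer is flushed to the real OutputStream and the following stream becomes direct.

    ```scala
    import java.io.{ByteArrayOutputStream, OutputStream}

    // Hypothetical, simplified model of direct vs. buffered output; not Daffodil's class.
    final class SplitDOSSketch(real: OutputStream, buffered: Boolean) {
      private var buffer: Option[ByteArrayOutputStream] =
        if (buffered) Some(new ByteArrayOutputStream()) else None
      private var following: Option[SplitDOSSketch] = None
      private var finished = false

      def isDirect: Boolean = buffer.isEmpty

      def write(bytes: Array[Byte]): Unit = {
        require(!finished, "a finished stream can never be appended to again")
        buffer match {
          case Some(buf) => buf.write(bytes)  // held until this stream becomes direct
          case None      => real.write(bytes) // goes straight to the real output
        }
      }

      // Split: later output goes to a new buffered stream that follows this one.
      def addBuffered(): SplitDOSSketch = {
        val next = new SplitDOSSketch(real, buffered = true)
        following = Some(next)
        next
      }

      // Finishing a direct stream collapses the following stream into direct mode.
      def setFinished(): Unit = {
        finished = true
        if (isDirect) following.foreach(_.becomeDirect())
      }

      private def becomeDirect(): Unit = {
        buffer.foreach(_.writeTo(real)) // flush everything buffered so far, in order
        buffer = None
        if (finished) following.foreach(_.becomeDirect()) // cascade through finished streams
      }
    }
    ```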

  14. class ExplicitLengthLimitingStream extends FilterInputStream

    This class can be used with any InputStream to restrict what is read from it to N bytes.

    This can be used to forcibly stop consumption of data from a stream at a length obtained explicitly.

    Thread safety: This is inherently stateful, so it is not thread safe to use this object from more than one thread.
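
    Here is a minimal sketch of the same idea built on java.io.FilterInputStream. LengthLimitedStream is a hypothetical name, and this is not the real class's code. It overrides the single-byte read, the bulk read (which FilterInputStream's read(byte[]) delegates to), and skip, so no read path can go past the limit.

    ```scala
    import java.io.{FilterInputStream, InputStream}

    // Hypothetical sketch: never lets the caller read more than `limit` bytes from `underlying`.
    final class LengthLimitedStream(underlying: InputStream, limit: Long)
        extends FilterInputStream(underlying) {
      private var remaining: Long = limit

      override def read(): Int =
        if (remaining <= 0) -1
        else {
          val b = super.read()
          if (b >= 0) remaining -= 1
          b
        }

      // read(byte[]) in FilterInputStream delegates here, so it is limited too.
      override def read(buf: Array[Byte], off: Int, len: Int): Int =
        if (remaining <= 0) -1
        else {
          val n = super.read(buf, off, math.min(len.toLong, remaining).toInt)
          if (n > 0) remaining -= n
          n
        }

      override def skip(n: Long): Long = {
        val skipped = super.skip(math.min(n, remaining))
        remaining -= skipped
        skipped
      }

      // mark/reset would need to restore `remaining`; this sketch simply opts out.
      override def markSupported(): Boolean = false
    }
    ```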

  15. trait FormatInfo extends AnyRef

    Abstract interface to obtain format properties or values derived from properties.

    This includes anything the I/O layer needs, which includes properties that can be runtime-valued expressions, or that depend on such.

    By passing in an object that provides quick access to these, we avoid the need to have setters/getters that call setters that change state in the I/O layer.
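
    Below is a hypothetical illustration of passing a property provider into an I/O call, instead of mutating I/O-layer state through setters. For brevity, FormatInfoSketch exposes only one property, byteOrder. The real FormatInfo trait is not reproduced here.

    ```scala
    import java.nio.{ByteBuffer, ByteOrder}

    // Hypothetical stand-in for FormatInfo, reduced to a single property.
    trait FormatInfoSketch {
      def byteOrder: ByteOrder
    }

    object IOLayerSketch {
      // The I/O routine asks finfo for what it needs at the moment it needs it.
      def readInt32(bytes: Array[Byte], finfo: FormatInfoSketch): Int =
        ByteBuffer.wrap(bytes).order(finfo.byteOrder).getInt()
    }
    ```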

  16. abstract class InputSource extends AnyRef

    The InputSource class is really just a mechanism to provide bytes to an InputSourceDataInputStream, which does the heavy lifting of converting bits/bytes to numbers and characters. This class does not need to know anything about bits; it is purely byte-centric. One core difference between this and an InputStream is that it must be able to backtrack to arbitrary points in the InputStream's history. To aid in this, methods are called to let the InputSource know which byte positions it might need to backtrack to, which allows it to free data that is no longer needed. One can almost think of it as an InputStream that supports multiple marks with random access.
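
    The following hypothetical byte-centric source makes the "multiple marks with random access" description concrete. ByteSourceSketch and ArrayByteSource are illustrative names, not InputSource's real API. The array-backed version holds all data in memory, so it never actually frees anything; it only records locks. oldestLock shows the boundary before which a streaming implementation could discard data.

    ```scala
    import scala.collection.mutable

    // Hypothetical shape of a byte-centric, backtrackable source.
    abstract class ByteSourceSketch {
      def position: Long
      def setPosition(pos0b: Long): Unit     // jump back (or forward) to any retained position
      def get(): Int                         // next byte as 0-255, or -1 at end of data
      def lockPosition(pos0b: Long): Unit    // "we might need to backtrack to here"
      def releasePosition(pos0b: Long): Unit // "we will not backtrack to here any more"
    }

    final class ArrayByteSource(data: Array[Byte]) extends ByteSourceSketch {
      private var pos = 0L
      private val locks = mutable.Map.empty[Long, Int] // position -> lock count

      def position: Long = pos
      def setPosition(pos0b: Long): Unit = {
        require(pos0b >= 0 && pos0b <= data.length, s"position $pos0b out of range")
        pos = pos0b
      }
      def get(): Int =
        if (pos >= data.length) -1
        else { val b = data(pos.toInt) & 0xff; pos += 1; b }
      def lockPosition(pos0b: Long): Unit = {
        locks(pos0b) = locks.getOrElse(pos0b, 0) + 1
      }
      def releasePosition(pos0b: Long): Unit = {
        val n = locks(pos0b) - 1
        if (n == 0) locks -= pos0b else locks(pos0b) = n
      }
      // Data before this position could be freed by a streaming implementation.
      def oldestLock: Option[Long] = if (locks.isEmpty) None else Some(locks.keys.min)
    }
    ```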

  17. final class InputSourceDataInputStream extends DataInputStreamImplMixin

    Realization of the DataInputStream API

    Underlying representation is an InputSource containing all input data.

  18. class InputSourceDataInputStreamCharIterator extends CharIterator

  19. class InputSourceDataInputStreamCharIteratorState extends AnyRef

  20. class LayerBoundaryMarkInsertingJavaOutputStream extends FilterOutputStream

  21. abstract class LocalBuffer[T <: Buffer] extends AnyRef

  22. trait LocalBufferMixin extends AnyRef

    Warning: Only mix this into thread-local state objects. If mixed into a regular class this will end up sharing the local stack object across threads, which is a very bad idea (not thread safe).
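
    One standard way to guarantee per-thread scratch buffers is java.lang.ThreadLocal. This is a swapped-in technique shown for illustration, not necessarily how LocalBufferMixin works, and PerThreadCharBuffer is a hypothetical name. Each thread that calls get receives its own CharBuffer, so no buffer is ever shared across threads.

    ```scala
    import java.nio.CharBuffer

    // Hypothetical helper: each thread lazily gets its own reusable CharBuffer.
    object PerThreadCharBuffer {
      private val local: ThreadLocal[CharBuffer] = new ThreadLocal[CharBuffer] {
        override protected def initialValue(): CharBuffer = CharBuffer.allocate(64)
      }

      def get: CharBuffer = {
        val cb = local.get()
        cb.clear() // reset position and limit before reuse; contents are not zeroed
        cb
      }
    }
    ```

    Identity (eq) is the right comparison for these buffers, because CharBuffer.equals compares remaining content, and two fresh buffers would compare equal.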

  23. final class MarkState extends DataStreamCommonState with Mark


    The state that must be saved and restored by mark/reset calls
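
    Below is a hypothetical sketch of mark/reset as snapshot-and-restore. The fields chosen here, a bit position and an optional bit limit, are assumptions for illustration; the real MarkState may save more. Because each snapshot is immutable, several marks can be outstanding at once, as nested backtracking points would require.

    ```scala
    // Hypothetical immutable snapshot of the state that reset must restore.
    final case class MarkSnapshot(bitPos0b: Long, bitLimit0b: Option[Long])

    final class MarkableStream {
      var bitPos0b: Long = 0L
      var bitLimit0b: Option[Long] = None

      def mark(): MarkSnapshot = MarkSnapshot(bitPos0b, bitLimit0b)

      def reset(m: MarkSnapshot): Unit = {
        bitPos0b = m.bitPos0b
        bitLimit0b = m.bitLimit0b
      }
    }
    ```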

  24. class RegexLimitingStream extends InputStream

    Can be used with any InputStream to restrict what is read from it to stop before a particular regex match.

    The regex must have a finite maximum length match string.

    This can be used to forcibly stop consumption of data from a stream at a length obtained from a delimiter that is described using a regex.

    The delimiter matching the regex is consumed from the underlying stream (if found), and the underlying stream is left positioned at the byte after the regex match string.

    IMPORTANT: The delimiter regex cannot contain any Capturing Groups! Use (?: ... ) which is non-capturing, instead of regular ( ... ). For example: this regex matches CRLF not followed by tab or space:

    """\r\n(?!(?:\t|\ ))"""

    Notice use of the ?: to avoid a capture group around the alternatives of tab or space.

    Thread safety: This is inherently stateful, so it is not thread safe to use this object from more than one thread.
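
    The regex rules above can be checked with java.util.regex directly. This sketch works on an in-memory String rather than a stream, and splitAtDelimiter is a hypothetical helper. A streaming version must bound its lookahead, which is why the regex needs a finite maximum match length. The helper rejects patterns that contain capturing groups, using Matcher.groupCount(). It returns the data before the first match and treats the delimiter itself as consumed.

    ```scala
    import java.util.regex.Pattern

    object DelimiterSplitSketch {
      // CRLF not followed by tab or space; (?: ... ) keeps the alternation non-capturing.
      val delim: Pattern = Pattern.compile("""\r\n(?!(?:\t|\ ))""")

      /** Hypothetical helper: (data before the first delimiter, data after it), or None. */
      def splitAtDelimiter(s: String, p: Pattern): Option[(String, String)] = {
        require(p.matcher("").groupCount() == 0, "delimiter regex must not contain capturing groups")
        val m = p.matcher(s)
        if (m.find()) Some((s.substring(0, m.start()), s.substring(m.end()))) else None
      }
    }
    ```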

  25. class StreamIterator[T] extends Iterator[T]

  26. final class StringDataInputStreamForUnparse extends DataInputStreamImplMixin


    When unparsing, we reuse all the DFA logic to identify delimiters within the data that need to be escaped, so we need to treat the string data being unparsed as a DataInputStream.

  27. trait ThreadCheckMixin extends AnyRef

    Mixin for classes that are supposed to exist 1 to 1 with threads, such as classes derived from DataInputStream and DataOutputStream.
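
    A minimal sketch of such a check records the constructing thread and fails fast when any other thread calls in. ThreadCheckSketch, checkThread and OneThreadCounter are hypothetical names, and the real mixin's behavior may differ.

    ```scala
    // Hypothetical mixin: remembers the thread that created the object.
    trait ThreadCheckSketch {
      private val ownerThread: Thread = Thread.currentThread()

      protected final def checkThread(): Unit = {
        val current = Thread.currentThread()
        if (current ne ownerThread)
          throw new IllegalStateException(
            s"object owned by thread '${ownerThread.getName}' used from '${current.getName}'")
      }
    }

    // Example class meant to exist 1 to 1 with a thread.
    final class OneThreadCounter extends ThreadCheckSketch {
      private var n = 0
      def increment(): Int = { checkThread(); n += 1; n }
    }
    ```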

Value Members

  1. object BoundaryMarkLimitingStream

    Can be used with any InputStream to restrict what is read from it to stop before a boundary mark string.

    The boundary mark string is exactly that, a string of characters. Not a regex, nor anything involving DFDL Character Entities or Character Class Entities. (No %WSP; no %NL; )

    This can be used to forcibly stop consumption of data from a stream at a length obtained from a delimiter.

    The boundary mark string is consumed from the underlying stream (if found), and the underlying stream is left positioned at the byte after the boundary mark string.

    Thread safety: This is inherently stateful, so it is not thread safe to use this object from more than one thread.
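
    Here is a byte-at-a-time sketch of this behavior under two assumptions. First, the mark is encoded as UTF-8 for comparison. Second, a simple sliding window (O(n*k), no KMP-style optimization) is good enough for illustration. BoundaryMarkStreamSketch is a hypothetical name. The sketch never reads past the mark, so the underlying stream ends up positioned at the byte after it, as described above.

    ```scala
    import java.io.InputStream
    import java.nio.charset.StandardCharsets
    import scala.collection.mutable

    // Hypothetical sketch: yields bytes up to (not including) the first boundary mark.
    final class BoundaryMarkStreamSketch(underlying: InputStream, boundaryMark: String) extends InputStream {
      private val markBytes: Array[Int] = boundaryMark.getBytes(StandardCharsets.UTF_8).map(_ & 0xff)
      require(markBytes.nonEmpty, "boundary mark must not be empty")

      private val window = mutable.Queue.empty[Int] // bytes read but not yet returned
      private var stopped = false                   // mark found or underlying EOF

      override def read(): Int = {
        // Top the window up to the mark's length, unless we have already stopped.
        while (!stopped && window.size < markBytes.length) {
          val b = underlying.read()
          if (b < 0) stopped = true
          else {
            window.enqueue(b)
            if (window.size == markBytes.length && window.iterator.sameElements(markBytes.iterator)) {
              window.clear() // the mark is consumed, never returned
              stopped = true
            }
          }
        }
        if (window.isEmpty) -1 else window.dequeue()
      }
    }
    ```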

  2. object DataInputStream

    This trait defines the low level API called by Daffodil's Parsers.

    It has features to support

    • backtracking
    • regex pattern matching using Java Pattern regexes (for lengthKind pattern and pattern asserts)
    • character-by-character access as needed by our DFA delimiter/escaping
    • very efficient access to small binary data (64-bits or smaller)
    • alignment and skipping
    • encodingErrorPolicy 'error' and 'replace'
    • convenient use of zero-based values because java/scala APIs for I/O are all zero-based
    • convenient use of 1-based values because DFDL is 1-based, so debug/trace and such all want to be 1-based values.

    A goal is that this API does not allocate objects as I/O operations are performed unless boxed objects are being returned. For example getSignedLong(...) should not allocate anything per call; however, getSignedBigInt(...) does, because a BigInt is a heap-allocated object.

    Internal buffers and such may be dropped/resized/reallocated as needed during method calls. The point is not that you never allocate. It's that the per-I/O operation overhead does not require object allocation for every data-accessing method call.

    Similarly, text data can be retrieved into a char buffer, and the char buffer can provide a limit on size (available capacity of the char buffer) in characters. The text can be examined in the char buffer, or a string can be created from the char buffer's contents when needed.

    This API is very stateful and not thread-safe; i.e., each thread must have its own object. Some of this is inherent in this API style, and some is inherited from the underlying objects this API uses (such as CharsetDecoder).

    This API is also intended to support some very highly optimized implementations. For example, if the schema is all text, and the encoding is known to be iso-8859-1, then there is no notion of a decode error, and every byte value, extended to a Char value, *is* the Unicode code point. No decoder needs to be used in this case, and this API becomes a quite thin layer on top of a java.io.BufferedInputStream.

    Terminology:

    Available Data - this is the data that is between the current bit position, and some limit. The limit can either be set (via setBitLimit calls), or it can be limited by tunable values, or implementation-specific upper limits, or it can simply be the end of the data stream.

    Different kinds of DataInputStreams can have different limits. For example, a File-based DataInputStream may have no limit on the forward speculation distance, because the file can be randomly accessed if necessary. By contrast, a data stream that is directly connected to a network socket may have an upper limit on the amount of data that it is willing to buffer.

    None of this is a commitment that this API will in fact have multiple specialized implementations. It's just a possibility for the future.

    Implementation Note: It is the implementation of this interface which implements the Bucket Algorithm as described on the Daffodil Wiki. All of that bucket stuff is beneath this API.

    In general, this API tries to return a value rather than throw exceptions whenever the behavior may be very common. This leaves it up to the caller to decide whether or not to throw an exception, and avoids the overhead of try-catch blocks. The exceptions to this rule are the methods that involve character decoding for textual data. These methods may throw CharacterCodingException when the encoding error policy is 'error'.
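
    The following hypothetical bit-level reader illustrates reading small binary values without per-call allocation. It assumes big-endian byte order with most-significant-bit-first bit order, and it reads from a plain byte array. readSignedBits is an illustrative name, not the real API. All work stays in primitive Long arithmetic, so a successful call allocates nothing.

    ```scala
    // Hypothetical sketch: big-endian, most-significant-bit-first, 1 to 64 bits.
    object BitReadSketch {
      def readSignedBits(data: Array[Byte], bitPos0b: Long, nBits: Int): Long = {
        require(nBits >= 1 && nBits <= 64)
        var acc = 0L
        var i = 0
        while (i < nBits) {
          val p = bitPos0b + i
          val byte = data((p >>> 3).toInt) & 0xff        // which byte holds bit p
          val bit = (byte >>> (7 - (p & 7).toInt)) & 1   // bit 0 of a byte is its MSB
          acc = (acc << 1) | bit
          i += 1
        }
        // Sign-extend from nBits to 64 bits using an arithmetic right shift.
        if (nBits == 64) acc else (acc << (64 - nBits)) >> (64 - nBits)
      }
    }
    ```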

  3. object DirectOrBufferedDataOutputStream

  4. object FastAsciiToUnicodeConverter


    Highly optimized converter for ASCII to Unicode.
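
    Such a converter can be fast because every ASCII byte value (0x00 to 0x7F) is also its own Unicode code point. Decoding is then a widening copy with no CharsetDecoder. In this hypothetical sketch (AsciiDecodeSketch is not the real object's API), bytes 0x80 and above, which are not ASCII, become U+FFFD. That is an assumption; the real converter's handling of such bytes is not described here.

    ```scala
    // Hypothetical sketch: ASCII bytes map directly to the same Unicode code point.
    object AsciiDecodeSketch {
      /** Decodes into `out` without a CharsetDecoder; returns the number of chars written.
        * Assumption: non-ASCII bytes (0x80 to 0xFF, negative as Byte) become U+FFFD. */
      def decode(in: Array[Byte], out: Array[Char]): Int = {
        val n = math.min(in.length, out.length)
        var i = 0
        while (i < n) {
          val b = in(i)
          out(i) = if (b >= 0) b.toChar else '\uFFFD'
          i += 1
        }
        n
      }
    }
    ```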

  5. object InputSourceDataInputStream

    Factory for creating this type of DataInputStream

    Provides only core input sources to avoid making any assumptions about the incoming data (i.e. should a File be mapped to a ByteBuffer or be streamed as an InputStream). The user knows better than us, so have them make the decision.

  6. object Utils
