io.dylemma.spac
SPaC (short for "Streaming Parser Combinators") is a library for building stream consumers in a declarative style, specialized for tree-like data types such as XML and JSON.
Many utilities for handling XML and JSON data involve parsing the entire "document" into some DOM model, then inspecting and transforming that model to extract information. The downside of these utilities is that when the document is very large, the DOM may not fit in memory. The workaround for this type of problem is to treat the document as a stream of "events", e.g. "StartElement" and "EndElement" for XML, or "StartObject" and "EndObject" for JSON. The drawback of this workaround is that code to handle these streams can be complicated and error-prone to write, especially when the document's structure is complicated.
SPaC's goal is to drastically simplify the process of creating code to handle these streams.
This package contains the "core" SPaC traits: Parser, Transformer, Splitter, and ContextMatcher.
See the xml and json subpackages (provided by the xml-spac and json-spac libraries, respectively) for specific utilities related to handling XML and JSON event streams.
Type members
Classlikes
Represents a location in code that called a method.
An implicit instance of this class will be automatically derived by a macro on demand.
CallerPos's ultimate purpose is to be present in certain SpacTraceElement classes, helping to point to specific splitters or parse calls in the event of a parsing error.
- Companion:
- object
Represents either entering (ContextPush) or exiting (ContextPop) some matched context within a stream of inputs.
ContextChanges will generally be used to designate "sub-stream" boundaries, i.e. a selection of XML elements from within a stream, but may be used more generally to attach stack-like state to stream transformers.
- Type parameters:
- C
The type of the matched context
- In
The value type of the elements in the stream being inspected
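To make the push/pop idea concrete, here is a minimal Scala 3 sketch. It makes a few assumptions. Context changes are modeled as markers interleaved with ordinary inputs, and a stack of contexts is maintained so that each input can be paired with the innermost context that was active when it arrived. The names Step, Push, Pop, Input and tagInputs are hypothetical and are not part of the spac API.

```scala
object ContextChangeSketch:
  // Hypothetical model (not spac's API): push/pop markers interleaved with ordinary inputs.
  sealed trait Step[+C, +In] extends Product with Serializable
  final case class Push[C](context: C) extends Step[C, Nothing]
  case object Pop extends Step[Nothing, Nothing]
  final case class Input[In](value: In) extends Step[Nothing, In]

  /** Pairs each input with the innermost context that was active when it arrived. */
  def tagInputs[C, In](steps: List[Step[C, In]]): List[(Option[C], In)] =
    val (_, tagged) = steps.foldLeft((List.empty[C], List.empty[(Option[C], In)])) {
      case ((stack, acc), Push(c))  => (c :: stack, acc)     // enter a context
      case ((stack, acc), Pop)      => (stack.drop(1), acc)  // exit the innermost context
      case ((stack, acc), Input(v)) => (stack, (stack.headOption, v) :: acc)
    }
    tagged.reverse
```

Inputs that arrive outside any pushed context are tagged with None, which mirrors how events outside a matched context would not belong to any sub-stream.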
A map-like representation of some location in a stream, used like stack trace elements for reporting errors in stream processing.
- Companion:
- object
An object responsible for inspecting a stack of StartElement events and determining if they correspond to some "context" value of type A.
ContextMatchers play a primary role in splitting an XML event stream into "substreams", i.e. each substream is defined as the series of consecutive events during which the XML tag stack matches a context.
ContextMatchers are intended to be transformed and combined with each other in order to build up more complex matching functionality. See also: SingleElementContextMatcher, which contains additional combination methods and some specialized transformation methods.
- Type parameters:
- A
The type of the matched context.
- Companion:
- object
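The following sketch illustrates the "transform and combine" idea on a plain stack of element names, listed outermost first. It assumes a simplified representation in which a matcher is a function from a stack and an offset to an optional (context, entries consumed) pair. Matcher, elem, anyElem and matches are hypothetical names. The real ContextMatcher works on StartElement events rather than strings.

```scala
object ContextMatcherSketch:
  // Hypothetical simplified matcher over a stack of element names (outermost first).
  // `run(stack, offset)` returns the matched context and how many entries it consumed.
  final case class Matcher[A](run: (IndexedSeq[String], Int) => Option[(A, Int)]):
    /** Transforms the matched context without changing what is matched. */
    def map[B](f: A => B): Matcher[B] =
      Matcher((stack, offset) => run(stack, offset).map { case (a, n) => (f(a), n) })

    /** Sequences two matchers: `this` must match first, then `next` right after it. */
    def \[B](next: Matcher[B]): Matcher[(A, B)] =
      Matcher { (stack, offset) =>
        for
          (a, n1) <- run(stack, offset)
          (b, n2) <- next.run(stack, offset + n1)
        yield ((a, b), n1 + n2)
      }

  /** Matches one element with the given name, yielding no useful context (Unit). */
  def elem(name: String): Matcher[Unit] =
    Matcher((stack, offset) =>
      if offset < stack.length && stack(offset) == name then Some(((), 1)) else None)

  /** Matches any single element, yielding its name as the context. */
  val anyElem: Matcher[String] =
    Matcher((stack, offset) => if offset < stack.length then Some((stack(offset), 1)) else None)

  /** Runs a matcher starting from the outermost end of the stack. */
  def matches[A](m: Matcher[A], stack: IndexedSeq[String]): Option[A] =
    m.run(stack, 0).map(_._1)
```

Combining two matchers pairs up their contexts. As a result, name-only matchers contribute Unit to the combined result type.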
Marker trait used by SpacTraceElement.InInput to extract location information from inputs that cause parsing exceptions.
Primary "spac" abstraction which represents a sink for data events.
Parsers are responsible for interpreting a stream of In events as a single result of type Out.
The actual interpretation is performed by a Parser.Handler, which the Parser is responsible for constructing.
Handlers may be internally mutable, and so they are generally only constructed by the parse helper methods or by other handlers.
Parsers themselves are immutable, acting as "handler factories", and so they may be freely reused.
A parser differs from typical "fold" operations in that it may choose to abort early with a result, leaving the remainder of the data stream untouched.
- Type parameters:
- In
event/input type
- Out
result type
- Companion:
- object
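Below is a minimal sketch of the "immutable parser, mutable handler" design described above. It assumes a simplified handler interface with two methods: step, which returns Some(result) to stop early, and finish, which is called at end of input. Handler, Parser, newHandler and sumUntil are hypothetical stand-ins, not spac's actual Parser.Handler signatures.

```scala
object ParserSketch:
  // Hypothetical, simplified stand-ins for Parser and Parser.Handler.
  trait Handler[-In, +Out]:
    /** Consumes one input; Some(result) means the handler finished early. */
    def step(in: In): Option[Out]
    /** Called at end of input if the handler did not finish early. */
    def finish(): Out

  trait Parser[-In, +Out]:
    /** Immutable factory: every call returns a fresh, possibly mutable handler. */
    def newHandler(): Handler[In, Out]

    def parse(inputs: Iterator[In]): Out =
      val handler = newHandler()
      var result: Option[Out] = None
      while result.isEmpty && inputs.hasNext do
        result = handler.step(inputs.next())
      result.getOrElse(handler.finish())

  /** Sums integers, stopping early once the running total exceeds `limit`. */
  def sumUntil(limit: Int): Parser[Int, Int] = new Parser[Int, Int] {
    def newHandler(): Handler[Int, Int] = new Handler[Int, Int] {
      private var total = 0 // mutable state lives in the handler, not the parser
      def step(in: Int): Option[Int] = {
        total += in
        if total > limit then Some(total) else None
      }
      def finish(): Int = total
    }
  }
```

Because sumUntil stops as soon as its running total exceeds the limit, the rest of the iterator is left unconsumed. This early abort is what distinguishes a parser from a plain fold. The same parser value can be reused, since each parse call builds a fresh handler.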
Convenience version of the Parser companion object, which provides parser constructors with the In type already specified.
Integrations for XML and JSON will generally create implicit classes to add methods to this class for In = XmlEvent and In = JsonEvent, respectively.
Value used by Transformer.Handler to indicate to its upstream producer whether or not the handler wants to continue receiving values.
- Companion:
- object
Specialization of ContextMatcher which only checks the first element in the stack for matching operations. Transformation operations on single-element matchers will yield other single-element matchers (rather than the base ContextMatcher type). Combination operations involving other single-element matchers will also yield single-element matchers. SingleElementContextMatchers form the building blocks of more complex matchers.
- Type parameters:
- A
The type of the matched context.
- Companion:
- object
A Source[A] is like an Iterable[A], but with a built-in assumption that the iterator may be closeable, intended for use as a convenient argument to a Parser's parse method.
The spac core library avoids depending on Cats-Effect and FS2 (to avoid introducing "dependency hell" situations for projects that must depend on pre-3.0 versions of those projects), so this class acts as a stand-in for both cats.effect.Resource and fs2.Stream for non-async usage.
- Type parameters:
- A
Type of item emitted by Iterators from this Source
- Companion:
- object
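Here is a rough sketch of what "an Iterable whose iterator may need closing" means. It assumes a source can be reduced to an open method that returns an iterator together with a close action. ToySource, use and fromList are hypothetical names. The real Source may expose a different interface.

```scala
object SourceSketch:
  // Hypothetical stand-in for a Source: each `open()` returns a fresh iterator plus a
  // close action, loosely like a Resource wrapping an Iterator, but without effects.
  trait ToySource[A]:
    def open(): (Iterator[A], () => Unit)

    /** Runs `f` on a freshly opened iterator and always closes it afterwards. */
    def use[B](f: Iterator[A] => B): B =
      val (iterator, close) = open()
      try f(iterator) finally close()

  /** Builds a source over an in-memory list; `onClose` observes each close. */
  def fromList[A](items: List[A], onClose: () => Unit): ToySource[A] =
    new ToySource[A] {
      def open(): (Iterator[A], () => Unit) = (items.iterator, onClose)
    }
```

Unlike a bare Iterator, a source like this can be consumed more than once. Each use opens a new iterator and closes it even if the consumer stops early or throws.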
Note: this companion object provides a few very basic Source-constructor helpers, but the really useful functionality is provided by the "parser backend" modules like xml-spac-javax and json-spac-jackson, via JavaxSource and JacksonSource.
- Companion:
- class
Base class for all exceptions thrown by Spac parsers.
A SpacException holds a spacTrace, which is similar to a stack trace, but uses a specialized element type to hold helpful debug information about the cause and context of the exception, and the input that caused it.
SpacException uses NoStackTrace to suppress the usual stack trace, since exceptions thrown by a Parser will not have useful stack trace information for end users of the Spac framework.
- Type parameters:
- Self
self-type used in the type signature of withSpacTrace
- Value parameters:
- detail
a Left containing a spac-specific error message, or a Right containing some non-Spac exception that was caught inside a Parser
- spacTrace
chain of SpacTraceElements, with the "top" of the stack at the beginning, and the "bottom" of the stack at the end
- Companion:
- object
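The following is a toy exception that follows the description above. It assumes a trace built from two hypothetical element kinds: a location in parser logic, or an input position. The exception extends scala.util.control.NoStackTrace to skip the JVM stack trace and carries the Either-typed detail. Outer layers can append their own location at the "bottom" of the trace. ToyParseException, TraceElem, InParser, AtInput and withOuter are illustrative names only.

```scala
import scala.util.control.NoStackTrace

object ExceptionSketch:
  // Hypothetical trace elements: a location in parser logic, or the position of an input.
  sealed trait TraceElem
  final case class InParser(description: String) extends TraceElem
  final case class AtInput(line: Int, column: Int) extends TraceElem

  private def messageOf(detail: Either[String, Throwable]): String = detail match
    case Left(msg)  => msg
    case Right(err) => err.toString

  /** Toy exception modeled on the description above; not the real SpacException. */
  final class ToyParseException(
      val detail: Either[String, Throwable],
      val trace: List[TraceElem] // "top" of the stack first, "bottom" last
  ) extends Exception(messageOf(detail), detail.toOption.orNull) with NoStackTrace:
    /** Returns a copy with an outer location appended at the "bottom" of the trace. */
    def withOuter(elem: TraceElem): ToyParseException =
      new ToyParseException(detail, trace :+ elem)
```

A Left detail becomes the exception message with no cause. A Right detail would also be attached as the JVM cause, so the original non-Spac exception is not lost.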
A play on words on StackTraceElement: a Spac trace element represents some contextual location inside the logic of a spac Parser, or the location of an input to that parser.
SpacTraceElements are used by SpacException to provide useful debugging information when a Parser fails.
- Companion:
- object
Primary "spac" abstraction that acts as a selector for sub-streams within a single input stream.
A "sub-stream" is some series of consecutive values from the original stream, identified by a "context" value. Sub-streams do not overlap with each other.
For example, when handling a stream of XML events, you might want to create a Splitter that identifies the events representing elements at a specific location within the XML; something like an XPath that operates on streams.
When using xml-spac, you might construct a splitter like Splitter.xml("rootElem" \ "things" \ "thing").
This would identify a new sub-stream for each <thing> element that appears inside a <things> element, inside the <rootElem> element.
An example sub-stream for a <thing> element might be ElemStart("thing"), Text("hello"), ElemEnd("thing").
A Splitter's general goal is to attach a Parser or Transformer to each sub-stream, passing the contents of that sub-stream
through the attached Parser or Transformer in order to get an interpretation of that sub-stream (i.e. the Parser's result,
or some emitted outputs from a Transformer).
With the <thing> example above, you might attach a parser that concatenates the contents of all Text events it sees, i.e. XmlParser.forText. Since a separate parser handler will run for each sub-stream, this becomes something like "a stream of Strings which each represent the concatenated text from an individual <thing> element".
- Type parameters:
- C
Context type used to identify each sub-stream
- In
Data event type for the input stream
- Companion:
- object
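The <thing> example can be worked through with a self-contained sketch. It uses hypothetical ElemStart, Text and ElemEnd case classes rather than xml-spac's real event types. The split function stands in for a Splitter built from the path rootElem / things / thing: it starts a sub-stream whenever the stack of open elements begins with that path. The concatText function plays the role of a per-sub-stream text parser such as XmlParser.forText. This is an illustration of the idea, not how spac implements splitting.

```scala
import scala.collection.mutable

object SplitterSketch:
  // Hypothetical XML-like events; the names echo the prose, not xml-spac's real types.
  sealed trait Event extends Product with Serializable
  final case class ElemStart(name: String) extends Event
  final case class Text(value: String) extends Event
  final case class ElemEnd(name: String) extends Event

  /** Groups events into sub-streams. A sub-stream spans the consecutive events during
    * which the stack of open elements starts with `path` (assumed non-empty). */
  def split(events: List[Event], path: List[String]): List[List[Event]] =
    val stack = mutable.ArrayBuffer.empty[String]
    val current = mutable.ListBuffer.empty[Event]
    val out = mutable.ListBuffer.empty[List[Event]]
    def inContext: Boolean = stack.startsWith(path)

    for ev <- events do
      val wasIn = inContext
      ev match
        case ElemStart(name) => stack += name
        case _               => ()
      // Checked before an ElemEnd pops the stack, so the closing event is kept too.
      if inContext then current += ev
      ev match
        case ElemEnd(_) if stack.nonEmpty => stack.remove(stack.length - 1)
        case _                            => ()
      if wasIn && !inContext then // just left the context: emit the sub-stream
        out += current.toList
        current.clear()
    out.toList

  /** Per-sub-stream interpretation: concatenate the contents of all Text events. */
  def concatText(subStream: List[Event]): String =
    subStream.collect { case Text(v) => v }.mkString
```

Mapping concatText over the sub-streams gives the "stream of Strings, one per <thing>" described above. Nested elements inside a <thing> stay within that thing's sub-stream.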
Outcome of a StackLike[In, Elem], indicating whether a given input was a stack push/pop, and whether that push/pop should be treated as happening before or after the input that caused it.
- Companion:
- object
Typeclass that perceives a subset of In values as either "stack push" or "stack pop" events.
For example, with XML, an ElemStart event can be perceived as a "stack push", and a corresponding ElemEnd event can be perceived as a "stack pop".
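Here is a minimal sketch of the typeclass idea. It assumes a three-way classification (push, pop, or neither) and ignores the before/after timing that the real interpretation type also tracks. StackLikeToy, StackOp, stacks and the XML-like event classes are hypothetical.

```scala
object StackLikeSketch:
  // Hypothetical three-way classification; the real interpretation type also records
  // whether a push/pop happens before or after the input that caused it.
  enum StackOp[+E]:
    case Push(elem: E)
    case Pop
    case NoOp

  /** Hypothetical typeclass: perceives some `In` values as stack pushes or pops. */
  trait StackLikeToy[In, E]:
    def interpret(in: In): StackOp[E]

  sealed trait XmlEvent extends Product with Serializable
  final case class ElemStart(name: String) extends XmlEvent
  final case class ElemEnd(name: String) extends XmlEvent
  final case class Text(value: String) extends XmlEvent

  object XmlEvent:
    // ElemStart pushes its name, ElemEnd pops, anything else leaves the stack alone.
    given StackLikeToy[XmlEvent, String] with
      def interpret(in: XmlEvent): StackOp[String] = in match
        case ElemStart(name) => StackOp.Push(name)
        case ElemEnd(_)      => StackOp.Pop
        case Text(_)         => StackOp.NoOp

  /** Replays inputs through the typeclass, returning the stack after each input. */
  def stacks[In, E](inputs: List[In])(using s: StackLikeToy[In, E]): List[List[E]] =
    inputs.scanLeft(List.empty[E]) { (stack, in) =>
      s.interpret(in) match
        case StackOp.Push(e) => e :: stack
        case StackOp.Pop     => stack.drop(1)
        case StackOp.NoOp    => stack
    }.tail
```

Keeping the push/pop perception in a typeclass lets generic stack-tracking code such as stacks work for any event type that has an instance.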
Primary "spac" abstraction which represents a transformation stage for a stream of data events.
Transformers effectively transform a stream of In events into a stream of Out events.
The actual stream handling logic is defined by a Transformer.Handler, which a Transformer is responsible for constructing.
Handlers may be internally mutable, and so they are generally only constructed by other handlers.
Transformers themselves are immutable, acting as "handler factories", and so they may be freely reused.
A transformer may choose to abort in response to any input event, as well as emit any number of outputs in response to an input event or the EOF signal.
- Type parameters:
- In
The incoming event type
- Out
The outgoing event type
- Companion:
- object
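The sketch below shows a transformer stage with an explicit continue/stop signal. It rests on several simplifying assumptions. The hypothetical Handler pushes outputs through an emit callback and returns a Signal after each input. Its finish method is called only when the input actually reaches EOF. These names and signatures illustrate the description above; they are not the spac Transformer.Handler API.

```scala
import scala.collection.mutable

object TransformerSketch:
  // Hypothetical simplified Signal: does the handler want more input?
  enum Signal:
    case Continue, Stop

  // Hypothetical handler: may emit any number of outputs per input and at EOF.
  trait Handler[-In, +Out]:
    def push(in: In, emit: Out => Unit): Signal
    def finish(emit: Out => Unit): Unit

  trait Transformer[-In, +Out]:
    /** Immutable factory for fresh (possibly mutable) handlers. */
    def newHandler(): Handler[In, Out]

    /** Drives a fresh handler; stops pulling input as soon as it signals Stop. */
    def runOn(inputs: Iterator[In]): List[Out] =
      val handler = newHandler()
      val buffer = mutable.ListBuffer.empty[Out]
      var signal: Signal = Signal.Continue
      while signal == Signal.Continue && inputs.hasNext do
        signal = handler.push(inputs.next(), out => buffer += out)
      if signal == Signal.Continue then handler.finish(out => buffer += out) // EOF reached
      buffer.toList

  /** Splits each line into words (zero or more outputs per input), stops after a line
    * equal to "END", and emits "<eof>" only if the input actually ends. */
  val words: Transformer[String, String] = new Transformer[String, String] {
    def newHandler(): Handler[String, String] = new Handler[String, String] {
      def push(in: String, emit: String => Unit): Signal =
        if in == "END" then Signal.Stop
        else {
          in.split(' ').filter(_.nonEmpty).foreach(emit)
          Signal.Continue
        }
      def finish(emit: String => Unit): Unit = emit("<eof>")
    }
  }
```

An empty line produces no outputs and a two-word line produces two. After Stop, the remaining inputs are left untouched for the upstream producer.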
Convenience version of the Transformer companion object, which provides transformer constructors with the In type already specified.
Type-level tuple reduction function that treats Unit as an identity.
For example:
TypeReduce[(Unit, Unit)]{ type Out = Unit }
TypeReduce[(T, Unit)]{ type Out = T }
TypeReduce[(Unit, T)]{ type Out = T }
TypeReduce[(L, R)]{ type Out = (L, R) }
- Companion:
- object
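One possible encoding of these rules uses a Scala 3 match type. This is an assumption made for illustration; spac's TypeReduce may be encoded differently (for example as an implicit typeclass with an Out type member). Each summon line only compiles if the corresponding reduction holds.

```scala
object TypeReduceSketch:
  // Hypothetical Scala 3 match-type encoding of the four rules listed above.
  type Reduce[T] = T match
    case (Unit, Unit) => Unit
    case (t, Unit)    => t
    case (Unit, t)    => t
    case (l, r)       => (l, r)

  // Compile-time checks: each line only compiles if the reduction holds.
  summon[Reduce[(Unit, Unit)] =:= Unit]
  summon[Reduce[(Int, Unit)] =:= Int]
  summon[Reduce[(Unit, String)] =:= String]
  summon[Reduce[(Int, String)] =:= (Int, String)]
```

Case order matters here: the (Unit, Unit) case must come first, otherwise the (t, Unit) case would reduce (Unit, Unit) to Unit by a different route and the intent would be less clear.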
Typeclass for collections that can be efficiently split into a head element and a tail collection as long as they are not empty.
- Companion:
- object
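Here is a minimal sketch of such a typeclass, with instances for List and Vector. Uncons and toListVia are hypothetical names, not the library's own. Code written against the typeclass can consume any supported collection from the head without knowing its concrete type.

```scala
object UnconsSketch:
  // Hypothetical typeclass: split a non-empty collection into head and tail.
  trait Uncons[C[_]]:
    def uncons[A](coll: C[A]): Option[(A, C[A])]

  object Uncons:
    given Uncons[List] with
      def uncons[A](coll: List[A]): Option[(A, List[A])] = coll match
        case head :: tail => Some((head, tail))
        case Nil          => None

    given Uncons[Vector] with
      def uncons[A](coll: Vector[A]): Option[(A, Vector[A])] =
        if coll.isEmpty then None else Some((coll.head, coll.tail))

  /** Consumes any supported collection from the head, knowing only the typeclass. */
  def toListVia[C[_], A](coll: C[A])(using u: Uncons[C]): List[A] =
    @annotation.tailrec
    def loop(rest: C[A], acc: List[A]): List[A] = u.uncons(rest) match
      case Some((head, tail)) => loop(tail, head :: acc)
      case None               => acc.reverse
    loop(coll, Nil)
```

Both instances split in constant or effectively constant time (List via pattern matching, Vector via head and tail), which is the "efficiently split" property the description asks for.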