Package

org.apache.daffodil

grammar

package grammar

Grammar Mixins for Schema Compilation

DFDL is defined using a data syntax grammar. Daffodil compiles a DFDL schema into runtime data structures using a technique that tries to be faithful to this notion of a data syntax grammar. The exact data syntax grammar used in the DFDL specification is not really suitable as the basis for an implementation, but where possible we have named the grammar production rules and terminals consistently with the DFDL specification.

The grammar rules are members of the DSOM traits/classes and are written in a grammar notation. The applicability of each rule is controlled by boolean guard expressions.

In addition to the grammar rules, these mixins contain the methods that analyze the DFDL schema with its DFDL annotations, to determine whether the grammar rule is applicable. Most optimizations in the Daffodil schema compiler are implemented by way of a grammar rule or Terminal that is included or excluded (aka guarded) by an optimization.

The grammar rule mixins all inherit from the Gram trait, which provides operators for expressing the rules. The concrete classes are the terminals of the grammar (instances of Terminal), and these are either primitives or combinators. The primitives and combinators are defined in the org.apache.daffodil.grammar.primitives package.

This all works very much like Scala's scala.util.parsing.combinator library, which is described in the Programming in Scala book in the chapter on Combinator Parsing. However, Daffodil's grammar adds the notion of rich predicate guards controlling the rules, and of course the result of the grammar is an entirely different data structure. But the way rules are expressed, and the use of operators like "~" and "||" to create a little grammar composition language, are similar.

It is best to illustrate how the grammar works by an example drawn from model groups. This example no longer matches the code of the actual implementation, but illustrates the ideas behind the grammar:

Example:
    trait ModelGroupMixin extends ... {
      lazy val modelGroup = groupLeftFraming ~ groupContent ~ groupRightFraming
      lazy val groupLeftFraming = LeadingSkipRegion(this) ~ AlignmentFill(this)
      lazy val groupRightFraming = TrailingSkipRegion(this)
    }

    Non-terminals of the data syntax grammar are ordinary lazy val members whose names begin with a lower case letter. Terminals of the grammar are classes, and so have names beginning with an upper case letter.

    A terminal like LeadingSkipRegion will optimize itself out by evaluating a guard predicate test. Basically, if the DFDL schema has a dfdl:leadingSkip property of '0', then the LeadingSkipRegion of the data syntax grammar is zero width, so this terminal defines itself so that the isEmpty method returns true. That enables the "~" operator to see that it has an empty grammar term on its left, hence the "~" operator can reduce to just the right grammar term. The right grammar term, AlignmentFill, requires a much more sophisticated analysis of the schema and properties, but ultimately could also decide that it is not needed, and so isEmpty. Then the "~" operator will see that both sides are empty, and it itself is then empty. That allows the modelGroup rule to recognize that for this DFDL schema there is no groupLeftFraming.

    The net result is that all the grammar regions that are not applicable disappear from the grammar. What is left is a nest of combinators and primitives suitable for generating this specific DFDL schema's runtime. Lazy evaluation and by-name argument passing ensure that this occurs without evaluating any irrelevant grammar rules.

    At the end, the schema compiler generates a parser or unparser instance by "asking" the now-optimized grammar data structure for one, that is, by invoking the corresponding method (parser() or unparser()) on the grammar object. Recursively, each combinator constructs an instance of a class that implements org.apache.daffodil.processors.parsers.Parser (or org.apache.daffodil.processors.unparsers.Unparser), and that instance recursively calls the parsers (or unparsers) generated from the parts of the grammar which did not optimize themselves away. Primitives construct atomic primitive parsers (and unparsers) for things like delimiters, alignment regions, or simple type values.
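
The way empty terms drop out of the grammar can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and heavily simplified: the names G, Empty, Seq2, SkipRegion, and Content are stand-ins invented for illustration, not Daffodil's actual Gram classes, and the real guard analysis is far richer than a comparison with zero.

```scala
object GrammarSketch {
  // Hypothetical, simplified stand-in for a grammar term; not Daffodil's Gram.
  sealed trait G {
    def isEmpty: Boolean = false

    // Sequential composition. The right operand is by-name, so it is only
    // evaluated when the operator is actually applied.
    def ~(right: => G): G = {
      val r = right
      if (this.isEmpty) r       // empty on the left: reduce to the right term
      else if (r.isEmpty) this  // empty on the right: reduce to the left term
      else Seq2(this, r)
    }
  }

  case object Empty extends G { override def isEmpty: Boolean = true }
  final case class Seq2(left: G, right: G) extends G
  final case class Content(name: String) extends G

  // A terminal that guards itself: a zero-width region reports isEmpty.
  final case class SkipRegion(name: String, skipBits: Long) extends G {
    override def isEmpty: Boolean = skipBits == 0
  }

  // Non-terminals are lazy vals with lower case names, as in the example above.
  def modelGroup(leadingSkip: Long, trailingSkip: Long): G = {
    lazy val groupLeftFraming = SkipRegion("leading", leadingSkip)
    lazy val groupContent = Content("groupContent")
    lazy val groupRightFraming = SkipRegion("trailing", trailingSkip)
    groupLeftFraming ~ groupContent ~ groupRightFraming
  }
}
```

With both skips set to 0, modelGroup(0, 0) reduces to just the content term. With a non-zero leading skip, the leading region stays in the resulting term and only the trailing region disappears.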

    Futures

    Many grammar rules are defined as instances of the Prod class using the prod method. This is not strictly speaking needed (the example above doesn't use it), but is intended to define named extension points in the grammar for a future DFDL extension capability. The idea is that the named productions would be the places where external extensions to the grammar would be examined and utilized to augment the internal grammar rule definitions. These externally defined productions could then add to the grammar new externally defined primitives, or possibly even combinators, which ultimately generate calls to externally defined parser (or unparser) instances. Currently the use of prod is ad hoc and haphazard.
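
As a rough illustration of a named, guarded production, the sketch below defines a toy Prod class and prod helper. These are assumptions made for illustration: the signature shown here is hypothetical and is not claimed to match Daffodil's actual prod method. The point it shows is that the guard is an ordinary value while the grammar body is passed by name, so the body is never evaluated when the guard is false.

```scala
object ProdSketch {
  // Hypothetical, simplified grammar terms; not Daffodil's classes.
  sealed trait G { def isEmpty: Boolean = false }
  case object Empty extends G { override def isEmpty: Boolean = true }
  final case class Term(name: String) extends G

  // guard is passed by value; gram is passed by name and is only forced
  // (at most once, via the lazy val) when the guard is true.
  final class Prod(val name: String, guard: Boolean, gram: => G) extends G {
    lazy val body: G = if (guard) gram else Empty
    override def isEmpty: Boolean = body.isEmpty
    override def toString: String = name // displays as its name
  }

  // Hypothetical helper; the second parameter list lets the body be a block.
  def prod(name: String, guard: Boolean = true)(gram: => G): Prod =
    new Prod(name, guard, gram)
}
```

A call such as prod("groupLeftFraming", guard = false) { ... } yields a term that reports isEmpty without ever running the block, and it prints as "groupLeftFraming" rather than as an anonymous object.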


Type Members

  1. trait AlignedMixin extends GrammarMixin

  2. case class AlignmentMultipleOf(nBits: Long) extends Product with Serializable

  3. abstract class BinaryGram extends Gram

    BinaryGram isn't really 'binary'; it's n-ary. It is called binary because it comes from the binary grammar operations ~ and |, but in the abstract syntax tree we want these flattened to lists of children, so that a ~ b ~ c is ONE SeqComp with 3 children, not a tree of two binary SeqComps.

  4. trait BitOrderMixin extends GrammarMixin with ByteOrderAnalysisMixin

  5. trait ByteOrderAnalysisMixin extends GrammarMixin

  6. trait ChoiceGrammarMixin extends GrammarMixin with ChoiceTermRuntime1Mixin

  7. trait ElementBaseGrammarMixin extends InitiatedTerminatedMixin with AlignedMixin with HasStatementsGrammarMixin with PaddingInfoMixin with ElementBaseRuntime1Mixin

  8. abstract class Gram extends OOLAGHostImpl with BasicComponent with GramRuntime1Mixin

    Gram - short for "Grammar Term"

    These are the objects in the grammar.

    This grammar really differs a great deal from what we find in the DFDL specification because it actually has to be operationalized.

    Many of the grammar productions really aren't terribly grammar-like in appearance, because the conditional logic overwhelms the aspects that look like grammar productions.

    Another way to think of this is that it's just the "second tree". Daffodil starts by creating the DSOM "first tree", which is just the AST (Abstract Syntax Tree) of the DFDL language. Then, by way of compiling, it creates this Gram tree from that, which enables a variety of optimizations based on a simple rules-with-guards idiom.

    This Gram tree is then a generator of a Parser and an Unparser which incorporate both the parsing/unparsing logic and all RuntimeData structures in their members. If something completely optimizes out then it becomes the EmptyGram which other Gram combinators recognize and optimize out.

  9. trait GrammarMixin extends AnyRef

  10. trait HasStatementsGrammarMixin extends GrammarMixin

  11. trait LengthApprox extends AnyRef

  12. case class LengthExact(nBits: Long) extends LengthApprox with Product with Serializable

  13. case class LengthMultipleOf(nBits: Long) extends LengthApprox with Product with Serializable

  14. trait LocalElementGrammarMixin extends GrammarMixin

  15. trait ModelGroupGrammarMixin extends InitiatedTerminatedMixin with AlignedMixin with HasStatementsGrammarMixin with GroupCommonAGMixin with ModelGroupRuntime1Mixin

  16. abstract class NamedGram extends Gram

  17. final class Prod extends NamedGram

    Prod or Grammar Production

    Note the call by name on the gramArg. We don't evaluate the gramArg at all unless the guard is true.

    Guards are used so we can have grammars that include all possibilities, but where examining the format properties specifically would indicate that some of those possibilities are precluded. The guard causes that term to just splice itself out of the grammar.

    Note that it is crucial that the guardArg is passed by value, and the gramArg is passed by name.

    Prod objects are not required. They essentially provide some useful debug capability, because a grammar term object will display as its name, not as some anonymous object.

  18. trait RootGrammarMixin extends LocalElementGrammarMixin

  19. class SeqComp extends BinaryGram

  20. trait SequenceGrammarMixin extends GrammarMixin with SequenceTermRuntime1Mixin

  21. trait TermGrammarMixin extends AlignedMixin with BitOrderMixin with TermRuntime1Mixin

  22. abstract class Terminal extends NamedGram

    Primitives will derive from this base
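
The description of Gram above says the Gram tree acts as a generator of a Parser, with terms that optimized out being skipped. The following toy model sketches that idea only. The Parser trait here is a hypothetical one-method interface over strings, and Lit and SeqN are invented names; none of this reflects Daffodil's real Parser, RuntimeData, or combinator classes.

```scala
object ParserGenSketch {
  // Hypothetical runtime parser: consumes a prefix of the input and returns
  // the remaining input, or None on failure.
  trait Parser { def parse(in: String): Option[String] }

  sealed trait G {
    def isEmpty: Boolean = false
    def parser: Parser // "asking" the grammar term for its parser
  }

  case object Empty extends G {
    override def isEmpty: Boolean = true
    def parser: Parser = new Parser {
      def parse(in: String): Option[String] = Some(in)
    }
  }

  // A primitive: builds an atomic parser for a literal delimiter.
  final case class Lit(text: String) extends G {
    def parser: Parser = new Parser {
      def parse(in: String): Option[String] =
        if (in.startsWith(text)) Some(in.drop(text.length)) else None
    }
  }

  // A combinator: its parser calls only the parsers of non-empty children.
  final case class SeqN(kids: List[G]) extends G {
    private val live = kids.filterNot(_.isEmpty)
    override def isEmpty: Boolean = live.isEmpty
    def parser: Parser = {
      val childParsers = live.map(_.parser)
      new Parser {
        def parse(in: String): Option[String] =
          childParsers.foldLeft(Option(in))((rest, p) => rest.flatMap(p.parse))
      }
    }
  }
}
```

In this sketch, the parser generated for SeqN(List(Lit("<"), Empty, Lit("x"))) holds two child parsers, since the Empty term contributes nothing at runtime.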

Value Members

  1. object ENoWarn

  2. object EmptyGram extends Gram

  3. object INoWarn

  4. object SeqComp

    Sequential composition of grammar terms.

    Flattens nests of these into a flat list of terms.

  5. package primitives
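
The flattening described for BinaryGram and SeqComp, where a ~ b ~ c becomes one node with three children, can be sketched as follows. This is a minimal illustration under assumed names (G, T, SeqN); it is not the actual SeqComp implementation, which may do more than this.

```scala
object FlattenSketch {
  // Hypothetical, simplified grammar terms; not Daffodil's classes.
  sealed trait G {
    // Builds ONE n-ary node by splicing in the children of any operand
    // that is already a sequence, instead of nesting binary nodes.
    def ~(right: G): G = SeqN(children(this) ++ children(right))
  }
  final case class T(name: String) extends G
  final case class SeqN(kids: List[G]) extends G

  private def children(g: G): List[G] = g match {
    case SeqN(ks) => ks
    case other    => List(other)
  }
}
```

Here T("a") ~ T("b") ~ T("c") evaluates to a single SeqN holding a, b, and c in order, rather than a SeqN whose first child is another SeqN.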
