



package grammar

Grammar Mixins for Schema Compilation

DFDL is defined using a data syntax grammar. Daffodil compiles a DFDL schema into runtime data structures using a technique that tries to be faithful to this notion of a data syntax grammar. The exact data syntax grammar used in the DFDL specification is not really suitable to base an implementation on, but where possible we have named the production rules and terminals of the grammar consistently with the DFDL specification.

The grammar rules are members of the DSOM traits/classes. They are written in a grammar notation, and the applicability of each rule is controlled by a boolean guard expression.

In addition to the grammar rules, these mixins contain the methods that analyze the DFDL schema with its DFDL annotations, to determine whether a grammar rule is applicable. Most optimizations in the Daffodil schema compiler are implemented by way of a grammar rule or Terminal which is included or excluded (aka guarded) by an optimization.

The grammar rule mixins all inherit from the Gram trait, which provides operators for expressing the rules. The concrete classes are the terminals of the grammar (instances of Terminal), and these are either primitives or combinators. The primitives and combinators are defined in the org.apache.daffodil.grammar.primitives package.

This all works very much like Scala's scala.util.parsing.combinator library, which is described in the Programming in Scala book in the chapter on Combinator Parsing. However, Daffodil's grammar adds the notion of rich predicate guards controlling the rules, and of course the result of the grammar is an entirely different data structure. But the way rules are expressed, and the use of operators like "~" and "||" to create a little grammar composition language, are similar.

It is best to illustrate how the grammar works by an example drawn from model groups. This example no longer matches the code of the actual implementation, but illustrates the ideas behind the grammar:

    trait ModelGroupMixin extends ... {
      lazy val modelGroup = groupLeftFraming ~ groupContent ~ groupRightFraming
      lazy val groupLeftFraming = LeadingSkipRegion(this) ~ AlignmentFill(this)
      lazy val groupRightFraming = TrailingSkipRegion(this)
    }

    Non-terminals of the data syntax grammar are ordinary lazy val members whose names begin with a lower case letter. Terminals of the grammar are classes, and so their names begin with an upper case letter.

    A terminal like LeadingSkipRegion will optimize itself out by evaluating a guard predicate test. Basically, if the DFDL schema has a dfdl:leadingSkip property of '0', then the LeadingSkipRegion of the data syntax grammar is zero width, so this terminal defines itself so that the isEmpty method returns true. That enables the "~" operator to see that it has an empty grammar term on its left; hence, the "~" operator can reduce to just the right grammar term. The right grammar term, AlignmentFill, requires a much more sophisticated analysis of the schema and properties, but ultimately could also decide that it is not needed, and so isEmpty. Then the "~" operator will see that both sides are empty, and it is itself then empty. That allows the modelGroup rule to recognize that for this DFDL schema there is no groupLeftFraming.

    The net result of this is that all the grammar regions that are not applicable disappear from the grammar. What is left is a nest of combinators and primitives suitable for this specific DFDL schema's runtime to be generated. Lazy evaluation and by-name argument passing ensure that this occurs without evaluating any irrelevant grammar rules.

    At the end, the schema compiler generates a parser or unparser instance by "asking" the now-optimized grammar data structure for a parser or unparser. In practice this means invoking the corresponding method (parser() or unparser()) on the grammar object. Recursively, each combinator constructs an instance of a class that implements org.apache.daffodil.processors.parsers.Parser (or org.apache.daffodil.processors.unparsers.Unparser), and that instance recursively calls the parsers (or unparsers) generated from the parts of the grammar which did not optimize themselves away. Primitives construct atomic primitive parsers (and unparsers) for things like delimiters, alignment regions, or simple type values.
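The way empty terms drop out of a sequence can be sketched with a small toy model. This is only an illustration, not Daffodil's implementation: the names `GrammarSketch`, `Props`, `needsAlignmentFill`, `SeqOf`, and `describe` are hypothetical, terminals take a plain boolean guard instead of analyzing a schema, and `describe` merely stands in for the step where the real grammar would be asked for a parser or unparser.

```scala
// Toy model only. These classes mirror names used in the text but are
// NOT Daffodil's real classes; every definition here is an assumption.
object GrammarSketch {

  sealed trait Gram {
    def isEmpty: Boolean
    // Stand-in for "asking" the grammar for a parser or unparser.
    def describe: String

    // Sequential composition that reduces away empty operands.
    def ~(right: Gram): Gram =
      if (isEmpty) right            // empty left: just the right term (which may itself be empty)
      else if (right.isEmpty) this  // empty right: just the left term
      else SeqOf(this, right)
  }

  // A terminal whose guard decides whether it is part of the grammar at all.
  final case class Terminal(name: String, guard: Boolean) extends Gram {
    def isEmpty: Boolean = !guard
    def describe: String = name
  }

  final case class SeqOf(left: Gram, right: Gram) extends Gram {
    def isEmpty: Boolean = false
    def describe: String = s"(${left.describe} ~ ${right.describe})"
  }

  // Hypothetical, greatly simplified stand-in for schema properties.
  final case class Props(leadingSkip: Int, needsAlignmentFill: Boolean, trailingSkip: Int)

  class ModelGroup(p: Props) {
    lazy val groupLeftFraming: Gram =
      Terminal("LeadingSkipRegion", p.leadingSkip > 0) ~
        Terminal("AlignmentFill", p.needsAlignmentFill)
    lazy val groupContent: Gram = Terminal("GroupContent", guard = true)
    lazy val groupRightFraming: Gram = Terminal("TrailingSkipRegion", p.trailingSkip > 0)
    lazy val modelGroup: Gram = groupLeftFraming ~ groupContent ~ groupRightFraming
  }
}
```

With `Props(0, false, 0)` both framing rules are empty, so `modelGroup` reduces to the content term alone. With a non-zero `leadingSkip`, only the leading skip region survives alongside the content.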


    Many grammar rules are defined as instances of the Prod class using the prod method. This is not strictly speaking needed (the example above doesn't use it), but is intended to define named extension points in the grammar for a future DFDL extension capability. The idea is that the named productions would be the places where external extensions to the grammar would be examined and utilized to augment the internal grammar rule definitions. These externally defined productions could then incorporate new externally defined primitives, or possibly even combinators, into the grammar, and these would ultimately generate calls to externally defined parser (or unparser) instances. Currently the use of prod is ad-hoc and haphazard.
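A guarded, named production can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Prod class: `ProdSketch`, `Term`, and the exact shape of the `prod` helper are hypothetical. The point it shows is that the guard is passed by value while the grammar body is passed by name, so a body whose guard is false is never evaluated.

```scala
// Toy model only; names and signatures are assumptions, not Daffodil's API.
object ProdSketch {

  trait Gram {
    def isEmpty: Boolean
    def name: String
  }

  case object EmptyGram extends Gram {
    val isEmpty = true
    val name = "Empty"
  }

  final case class Term(name: String) extends Gram {
    val isEmpty = false
  }

  // `guard` is an ordinary (by-value) Boolean; `body` is by-name, so the
  // expression supplied for it runs only if `gram` is forced with a true guard.
  final class Prod(val name: String, guard: Boolean, body: => Gram) extends Gram {
    lazy val gram: Gram = if (guard) body else EmptyGram
    def isEmpty: Boolean = gram.isEmpty
    // A production displays as its name, which helps when debugging.
    override def toString: String = name
  }

  def prod(name: String, guard: Boolean = true)(body: => Gram): Prod =
    new Prod(name, guard, body)
}
```

A rule written as `prod("leadingSkip", guard = someCondition) { ... }` then either wraps the body or splices itself out as an empty term, depending on the guard.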

Type Members

  1. trait AlignedMixin extends GrammarMixin

  2. case class AlignmentMultipleOf(nBits: Long) extends Product with Serializable

  3. class AltComp extends BinaryGram with HasNoUnparser

  4. abstract class BinaryGram extends Gram


    BinaryGram isn't really 'binary'; it's n-ary. It is called binary because it comes from the binary grammar operations ~ and ||, but in the abstract syntax tree we want these flattened to lists of children, so that a ~ b ~ c is ONE SeqComp with 3 children, not a tree of two binary SeqComps.

  5. trait BitOrderMixin extends GrammarMixin with ByteOrderAnalysisMixin

  6. trait ByteOrderAnalysisMixin extends GrammarMixin

  7. trait ChoiceGrammarMixin extends GrammarMixin

  8. trait ComplexTypeBaseGrammarMixin extends GrammarMixin

  9. trait ElementBaseGrammarMixin extends InitiatedTerminatedMixin with AlignedMixin with HasStatementsGrammarMixin with PaddingInfoMixin

  10. trait ElementReferenceGrammarMixin extends AnyRef

  11. abstract class Gram extends OOLAGHostImpl


    Gram - short for "Grammar Term"

    These are the objects in the grammar. The grammar is supposed to be roughly the grammar in the DFDL specification, but some differences are expected because this one has to actually be operationalized.

  12. trait GrammarMixin extends AnyRef

  13. trait GroupRefGrammarMixin extends GrammarMixin

  14. trait HasNoUnparser extends AnyRef

  15. trait HasStatementsGrammarMixin extends GrammarMixin

  16. trait LengthApprox extends AnyRef

  17. case class LengthExact(nBits: Long) extends LengthApprox with Product with Serializable

  18. case class LengthMultipleOf(nBits: Long) extends LengthApprox with Product with Serializable

  19. trait LocalElementGrammarMixin extends GrammarMixin

  20. trait ModelGroupGrammarMixin extends InitiatedTerminatedMixin with AlignedMixin with HasStatementsGrammarMixin with GroupCommonAGMixin

  21. abstract class NamedGram extends Gram

  22. final class Prod extends NamedGram


    Prod or Grammar Production

    Note the call by name on the gramArg. We don't evaluate the gramArg at all unless the guard is true.

    Guards are used so we can have grammars that include all possibilities, but where examining the format properties specifically would indicate that some of those possibilities are precluded. The guard causes that term to just splice itself out of the grammar.

    Note that it is crucial that the guardArg is passed by value, and the gramArg is passed by name.

    Prod objects are not required. They essentially provide some useful debug capability, because a grammar term object will display as its name, not as some anonymous object.

  23. trait RootGrammarMixin extends LocalElementGrammarMixin

  24. class SeqComp extends BinaryGram

  25. trait SequenceGrammarMixin extends GrammarMixin

  26. trait TermGrammarMixin extends AlignedMixin with BitOrderMixin

  27. abstract class Terminal extends NamedGram


    Primitives will derive from this base

  28. abstract class UnaryGram extends NamedGram


Value Members

  1. object AltComp


    Alternative composition of grammar terms.

    Flattens nests of these into a single flat list.

  2. object ENoWarn

  3. object ENoWarn2

  4. object EmptyGram extends Gram

  5. object ErrorGram extends Gram with HasNoUnparser

  6. object INoWarn

  7. object SeqComp


    Sequential composition of grammar terms.

    Flattens nests of these into a flat list of terms.

  8. package primitives
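
The flattening described for BinaryGram, and performed by the SeqComp and AltComp companion objects, can be pictured with a small sketch. It is an illustration under assumptions rather than the real code: `FlattenSketch`, `Term`, and `childrenOf` are hypothetical names, and only sequential composition is modeled.

```scala
// Toy model only; not Daffodil's SeqComp. All names here are assumptions.
object FlattenSketch {

  sealed trait Gram {
    // Every use of ~ goes through the flattening factory below.
    def ~(that: Gram): Gram = SeqComp(this, that)
  }

  final case class Term(name: String) extends Gram

  // The constructor is private so that instances are only built flattened.
  final class SeqComp private (val children: List[Gram]) extends Gram

  object SeqComp {
    // Splice the children of nested SeqComps into one flat list.
    def apply(left: Gram, right: Gram): SeqComp =
      new SeqComp(childrenOf(left) ++ childrenOf(right))

    // A SeqComp contributes its children; any other term contributes itself.
    def childrenOf(g: Gram): List[Gram] = g match {
      case s: SeqComp => s.children
      case other      => List(other)
    }
  }
}
```

In this sketch `Term("a") ~ Term("b") ~ Term("c")` yields a single SeqComp with three children, and the same holds when the nesting is on the right, as in `a ~ (b ~ c)`.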


