An integer which is the alignment of this term. This takes into account the representation, type, charset encoding and alignment-related properties.
Anything annotated must be able to construct the appropriate DFDLAnnotation object from the xml.
The DFDL annotations on the component, as objects that are subtypes of DFDLAnnotation.
Override to perform necessary checks that require information about the concrete Term.
This avoids the need for the checking code to have a backpointer to the Term.
Provides unordered sequence checks. Issues a schema definition error (SDE) if invalid.
Checks for overlap.
Ensures at compile time that, if the separator and terminator are both statically known, they are not the same.
If a terminator, or an enclosing group's separator, could appear after this sequence, then it must not be ambiguous with this sequence's separator.
Note that checking, in general, for whether two delimiter DFA things can accept the same string, or one can accept a prefix of something the other accepts, is generally hard, and even if someone creates things with some ambiguity of that sort, real data might not ever run into that ambiguity. So spurious warnings are a possible outcome.
DFDL specifically does not check for, nor require detection of this sort of ambiguity at runtime or at compile time. But when it's completely obvious at compile time it's sensible to give an error.
TODO: An improvement - the enclosing sequence object should really be passing a list of possible terminating markup down to each sequence child object. Those that aren't runtime-valued expressions could be checked for ambiguity.
For now, we just check if this sequence itself has a constant separator and terminator that are the same. That is, we're checking for an obvious kind of cut/paste error by the schema author.
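A minimal sketch of the constant separator/terminator check described above. The names `DelimiterCheckSketch` and `checkSeparatorTerminatorConflict` are hypothetical, and modeling a statically known delimiter as a single literal in an `Option[String]` is an assumption made for illustration, not the actual implementation.

```scala
object DelimiterCheckSketch {
  // None models a delimiter that is absent or only known at runtime
  // (that is, given by an expression), so nothing can be proven about it.
  def checkSeparatorTerminatorConflict(
      separator: Option[String],
      terminator: Option[String]
  ): Either[String, Unit] =
    (separator, terminator) match {
      case (Some(sep), Some(term)) if sep == term =>
        // The obvious cut/paste error: both constants are identical.
        Left(s"Separator and terminator are both '$sep'; they must differ.")
      case _ =>
        Right(()) // not provably ambiguous at compile time
    }
}
```

Only the completely obvious case is reported; anything involving a runtime-valued delimiter passes, which matches the conservative intent stated above.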
Perform checking of an object against the supplied Term arg.
Used to recursively go through Terms, look for DFDL properties that have not been accessed, and record each as a warning. This function uses the property cache state to determine which properties have been accessed, so it must only be called after all property accesses are complete (e.g., schema compilation has finished) to ensure there are no false positives.
Abbreviation. We use this very often.
Set of elements referenced from an expression in the scope of this term.
Specific to certain function call contexts e.g., only elements referenced by dfdl:valueLength or dfdl:contentLength.
Separated by parser/unparser since parsers have to derive from dfdl:inputValueCalc, and must include discriminators and assert test expressions. Unparsers must derive from dfdl:outputValueCalc and exclude discriminators and asserts. Both must include setVariable/newVariableInstance, and property expressions are nearly the same. There are some unparser-specific properties that take runtime-valued expressions - dfdl:outputNewLine is one example.
Any element referenced from an expression in the scope of this term is in this set.
Always false as model groups can't be elements.
For streaming unparser, determines if this Term could have suspensions associated with it.
True if this term is known to have some text aspect. This can be the value, or it can be delimiters.
False only if this term cannot ever have text in it. Example: a sequence with no delimiters. Example: a binary int with no delimiters.
Note: this is not recursive - it does not roll-up from children terms. TODO: it does have to deal with the prefix length situation. The type of the prefix may be textual.
Override in element base to take simple type or prefix length situations into account
Mandatory text alignment for delimiters
Here we establish an invariant: every annotatable schema component definitely has an annotation object. It may have no properties on it, but it will be there. Hence, we can delegate various property-related attribute calculations to it.
To realize this, every concrete class must implement (or inherit) an implementation of emptyFormatFactory, which constructs an empty format annotation, and isMyFormatAnnotation which tests if an annotation is the corresponding kind.
Given that, formatAnnotation then either finds the right annotation, or constructs one, but our invariant is imposed. There *is* a formatAnnotation.
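The invariant can be sketched as follows. The member names `emptyFormatFactory`, `isMyFormatAnnotation`, and `formatAnnotation` come from the text above; the `Annotation` types, the `Annotated` trait, and `ElementLike` are hypothetical stand-ins, so this is an illustration under those assumptions rather than the real class hierarchy.

```scala
object FormatAnnotationSketch {
  // Hypothetical stand-ins for subtypes of DFDLAnnotation.
  sealed trait Annotation
  final case class ElementFormat(props: Map[String, String]) extends Annotation
  final case class AssertStatement(test: String) extends Annotation

  trait Annotated {
    def annotations: Seq[Annotation]

    // Constructs an empty format annotation of the right kind.
    protected def emptyFormatFactory: Annotation

    // Tests whether an annotation is the kind this component expects.
    protected def isMyFormatAnnotation(a: Annotation): Boolean

    // Either finds the right annotation or constructs an empty one,
    // so callers can always rely on a format annotation being there.
    lazy val formatAnnotation: Annotation =
      annotations.find(isMyFormatAnnotation).getOrElse(emptyFormatFactory)
  }

  // A hypothetical concrete component that expects ElementFormat annotations.
  final class ElementLike(val annotations: Seq[Annotation]) extends Annotated {
    protected def emptyFormatFactory: Annotation = ElementFormat(Map.empty)
    protected def isMyFormatAnnotation(a: Annotation): Boolean =
      a.isInstanceOf[ElementFormat]
  }
}
```

Because the fallback is built by the component itself, property lookups can be delegated to `formatAnnotation` without a null or `Option` check at every call site.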
The enclosing component. This follows back-references from types to their elements, from globalElementDef to elementRefs, from simpleType defs to derived simpleType defs, and from global group defs to group refs.
Note: the enclosing component of a global element or global group referenced from an element ref or group ref is NOT the ref object, but the component that contains the ref object.
Define this for schema components that have back-references to ref objects. So group def to group ref, globalelementdecl to element ref, type to element, base type to derived type.
Not for format annotations however. We don't backpoint those to other format annotations that ref them.
All schema components except the root have an enclosing element.
Elements that enclose this.
If this is already an element, this still walks outward to find the next tier out.
The terms that can enclose this.
Even if this is already a term, this walks outward to find those enclosing this.
Public for unit testing use.
For unit testing, we want to create GrammarMixin objects that are not schema components. So we can't use a self-type here. Instead we define this abstract grammarContext.
Returns the group members that are elements or model groups.
True if this term has an initiator, terminator, or separator that is either statically present or given by an expression. (Such expressions are not allowed to evaluate to "" - you can't turn off a delimiter by providing "" at runtime. Minimum length is 1 for these at runtime.)
Override in SequenceTermBase to also check for separator.
FIXME: DAFFODIL-2132. This tells us if framing is expressed on the schema. It does NOT tell us if that framing occupies bits in the data stream or not.
True if the term has an initiator expressed on it.
Do not confuse with the concept of the delimiter being able to match or not match zero-length data. Whether the representation of a term in the data stream "has an initiator", as in the initiator occupies a non-zero number of bits in the data stream, is an entirely different question.
True if the term has some syntax itself or recursively within itself that must appear in the data stream.
False only if the term has possibly no representation whatsoever in the data stream.
These are static properties even though the delimiters can have runtime-computed values. The existence of an expression to compute a delimiter is assumed to imply a non-zero-length, aka a real delimiter.
True if the term has a separator expressed on it.
Do not confuse with the concept of the delimiter being able to match or not match zero-length data. Whether the representation of a term in the data stream "has a separator", as in a specific separator occupies a non-zero number of bits, is an entirely different question.
Does this term always have statically required instances in the data stream?
This excludes elements that have no representation e.g., elements with dfdl:inputValueCalc.
Terms that are optional either via element having zero occurrences, or via a choice branch fail this test.
True if the term has a terminator expressed on it.
Do not confuse with the concept of the delimiter being able to match or not match zero-length data. Whether the representation of a term in the data stream "has a terminator", as in the terminator occupies a non-zero number of bits, is an entirely different question.
True if this term has no alignment properties that would explicitly create a need to align in a way that is not on a suitable boundary for a character.
Not the same as AlignedMixin.isKnownToBeTextAligned. That depends on this but goes further to consider whether alignment is achieved even when this is false.
An array can have more than 1 occurrence.
An optional element (minOccurs=0, maxOccurs=1) is an array only if occursCountKind is parsed, because then the max/min are ignored.
Whether the component is hidden.
Override this in the components that can hide - SequenceGroupRef and ChoiceGroupRef
Character encoding common attributes
Note that since encoding can be computed at runtime, we create values to tell us if the encoding is known or not so that we can decide things at compile time when possible.
Conservatively determines if this term is known to have the same bit order as the previous thing.
If uncertain, returns false.
True if we can statically determine that the start of this will be properly aligned by where the prior thing left us positioned. Hence we are guaranteed to be properly aligned.
True if alignment for a text feature of this Term (e.g., an initiator) is provably not needed, either because there is no requirement for such alignment, or we can prove that the required alignment is already established.
This goes further than TermEncodingMixin.hasTextAlignment because it considers whether the surrounding context meets the alignment needs.
True if this term is the last one in the enclosing sequence that is represented in the data stream. That is, it is not an element with dfdl:inputValueCalc.
This means whether the enclosing sequence's separator (if one is defined) is relevant.
True if this element itself consists only of text. No binary stuff like alignment or skips.
Not recursive into contained children.
Determines if the element is optional, as in has zero or one instance only.
There are two senses of optional
1) Optional as in "might not be present" but for any reason. Consistent with this is Required meaning must occur but for any reason. So all the occurrences of an array that has fixed number of occurrences are required, and some of the occurrences of an array that has a variable number of occurrences are optional.
2) Optional as in minOccurs="0" maxOccurs="1".
Consistent with (2) is defining array as maxOccurs >= 2, and Required as minOccurs=maxOccurs=1, but there are also special cases for occursCountKind parsed and stopValue since they don't examine min/max occurs - they are only used for validation in those occursCountKinds.
I don't believe the DFDL spec is entirely consistent here either.
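Sense (2) can be sketched as three predicates over minOccurs/maxOccurs. The object and parameter names are hypothetical, `ignoresMinMax` stands for the occursCountKind parsed and stopValue special cases, and defining an array as "neither scalar nor optional" is an assumption chosen so that those special cases fall out naturally.

```scala
object OccurrenceSketch {
  // maxOccurs = None models maxOccurs="unbounded".

  // Required in sense (2): exactly one occurrence.
  def isScalar(minOccurs: Int, maxOccurs: Option[Int]): Boolean =
    minOccurs == 1 && maxOccurs.contains(1)

  // Optional in sense (2): zero or one occurrence, unless min/max are
  // ignored because occursCountKind is parsed or stopValue.
  def isOptional(minOccurs: Int, maxOccurs: Option[Int], ignoresMinMax: Boolean): Boolean =
    minOccurs == 0 && maxOccurs.contains(1) && !ignoresMinMax

  // Assumption: everything that is neither scalar nor optional is an array.
  def isArray(minOccurs: Int, maxOccurs: Option[Int], ignoresMinMax: Boolean): Boolean =
    !isScalar(minOccurs, maxOccurs) && !isOptional(minOccurs, maxOccurs, ignoresMinMax)
}
```

Under these definitions an element with minOccurs=0 and maxOccurs=1 is optional normally, but counts as an array when min/max are ignored.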
The concept of potentially trailing is defined in the DFDL specification.
This concept applies to terms that are direct children of a sequence only.
It is true for terms that may be absent from the representation, but furthermore, may be last in a sequence, so that the notion of whether they are trailing, and so their separator may not be present, is a relevant issue.
If an element is an array, and has some required instances, then it is not potentially trailing, as some instances will have to appear, with separators.
This concept applies only to elements and model groups that have representation in the data stream.
Previously there was a misguided notion that, since only DFDL elements can have minOccurs/maxOccurs, this notion of potentially trailing didn't apply to model groups (sequences and choices, the other kind of Term). But this is not the case.
A sequence/choice which has no framing, and whose content doesn't exist - no child elements, any contained model groups recursively with no framing and no content - such a model group effectively "disappears" from the data stream, and in some cases need not have a separator.
This is computed by way of couldBePotentiallyTrailing. This value means that the term, in isolation, looking only at its own characteristics, disregarding its following siblings in any given sequence, has the characteristics of being potentially trailing.
Then that is combined with information about following siblings in a sequence to determine if a given term, that is a child of a sequence, is in fact potentially trailing within that sequence.
These two concepts are mutually recursive, since a sequence that is entirely composed of potentially trailing children satisfies couldBePotentiallyTrailing in whatever sequence encloses it.
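The mutual recursion can be sketched on a toy model. Only the name `couldBePotentiallyTrailing` comes from the text; the `Term`, `Elem`, and `Sequence` types, the `isPotentiallyTrailing` helper, and the exact rules (an element qualifies when it has no required instances, a sequence when it has no framing and all its children qualify) are simplifying assumptions. Choices and unrepresented elements are left out.

```scala
object PotentiallyTrailingSketch {
  sealed trait Term
  // hasRequiredInstances: e.g., minOccurs > 0 for a represented element.
  final case class Elem(hasRequiredInstances: Boolean) extends Term
  final case class Sequence(hasFraming: Boolean, children: List[Term]) extends Term

  // The term in isolation, disregarding its following siblings.
  def couldBePotentiallyTrailing(t: Term): Boolean = t match {
    case Elem(required) => !required
    case s @ Sequence(framing, children) =>
      // No framing, and every child is potentially trailing within s.
      !framing && children.indices.forall(i => isPotentiallyTrailing(s, i))
  }

  // The child at `index` within `parent`: it, and every sibling after it,
  // must have the characteristics of being potentially trailing.
  def isPotentiallyTrailing(parent: Sequence, index: Int): Boolean =
    parent.children.drop(index).forall(couldBePotentiallyTrailing)
}
```

An empty sequence with no framing satisfies `couldBePotentiallyTrailing` here, which corresponds to the "disappearing" model group case described above.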
Overridden as false for elements with dfdl:inputValueCalc property.
A scalar means has no dimension. Exactly one occurrence.
Since terms include both model groups and elements, in DFDL v1.0, model groups are always scalar, as DFDL v1.0 doesn't allow min/max occurs on model groups.
True if it is sensible to scan this data, e.g., with a regular expression. Requires that all children have the same encoding as enclosing groups and elements, and that there are no leading or trailing alignment regions or skips. We have to be able to determine that we are for sure going to always be properly aligned for text.
Caveat: we only care that the encoding is the same if the term actually could have text (couldHaveText is an LV) as part of its representation. For example, a sequence with no initiator, terminator, nor separators can have any encoding at all, without disqualifying an element containing it from being scannable. There has to be text that would be part of the scan.
If the root element isScannable, and encodingErrorPolicy is 'replace', then we can use a lower-overhead I/O layer - basically we can use a java.io.InputStreamReader directly.
We are going to depend on the fact that if the encoding is going to be this X-DFDL-US-ASCII-7-BIT-PACKED thingy (7-bits wide code units, so aligned at 1 bit) that this encoding must be specified statically in the schema.
If an encoding is determined at runtime, then we will insist on it being 8-bit aligned code units.
True when a term's immediately enclosing model group is a Sequence.
Can have a varying number of occurrences.
Overridden for elements. See ParticleMixin.isVariableOccurrences
When the encoding is known, this tells us the mandatory alignment required. This is always 1 or 8.
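A small sketch of the 1-or-8 rule. The function name `mandatoryAlignmentInBits` is hypothetical, and treating the 7-bit packed encoding named earlier as the only 1-bit-aligned case is an assumption of this example. A runtime-determined encoding is modeled as `None` and falls back to 8, following the rule that runtime encodings must use 8-bit aligned code units.

```scala
object TextAlignmentSketch {
  // Assumption: only this encoding is treated as bit-aligned in the sketch.
  private val bitAlignedEncodings = Set("X-DFDL-US-ASCII-7-BIT-PACKED")

  // None models an encoding that is only known at runtime.
  def mandatoryAlignmentInBits(staticEncoding: Option[String]): Int =
    staticEncoding match {
      case Some(name)
          if bitAlignedEncodings.contains(name.toUpperCase(java.util.Locale.ROOT)) =>
        1 // 7-bit-wide code units can start on any bit
      case _ =>
        8 // byte-oriented encoding, or encoding determined at runtime
    }
}
```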
Annotations can contain expressions, so we need to be able to compile them.
We need our own instance so that the expression compiler has this schema component as its context.
Does lookup of property using DFDL scoping rules, checking first non-default properties, then default property locations.
Use when we might or might not need the outputNewLine property
Mandatory text alignment or mta
mta can only apply to things with encodings. No encoding, no MTA.
In addition, it has to be textual data. Just because there's an encoding in the property environment shouldn't get you an MTA region. It has to be textual.
Namespace scope for resolving QNames.
We insist that the prefix "xsi" is properly defined for use in xsi:nil attributes, which is how we represent nilled elements when we convert to XML.
nearestEnclosingSequence
An attribute that looks upward to the surrounding context of the schema, and not just lexically surrounding context. It needs to see what declarations will physically surround the place. This is the dynamic scope, not just the lexical scope. So, a named global type still has to be able to ask what sequence is surrounding the element that references the global type.
This is why we have to have the GlobalXYZDefFactory stuff. Because this kind of back pointer (contextual sensitivity) prevents sharing.
Used as factory for the XML Node with the right namespace and prefix etc.
Given "element" it creates <dfdl:element /> with the namespace definitions based on this schema component's corresponding XSD construct.
Makes sure to inherit the scope so we have all the namespace bindings.
The lexically enclosing schema component
Combine our statements with those of what we reference. Elements reference types, ElementRefs reference elements, etc.
The order here is important. The statements from type come first, then from declaration, then from reference.
Changed to use findProperty, and to resolve the namespace properly.
We lookup a property like escapeSchemeRef, and that actual property binding can be local, in scope, by way of a format reference, etc.
Its value is a QName, and the definition of the prefix is from the location where we found the property, and NOT where we consume the property.
Hence, we resolve w.r.t. the location that provided the property.
The point of findProperty vs. getProperty is just that the former returns both the value, and the object that contained it. That object is what we resolve QNames with respect to.
Note: Same is needed for properties that have expressions as their values. E.g., consider "{ ../foo:bar/.. }". That foo prefix must be resolved relative to the object where this property was written, not where it is evaluated. (JIRA issue DFDL-77)
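To illustrate why the providing location matters, here is a toy model. Everything in it is hypothetical: `Location`, `Found`, and `resolveQName` are invented for the example, and this `findProperty` only mimics the idea of returning the value together with the object that supplied it.

```scala
object PropertyLookupSketch {
  // A location is wherever a property was written, together with the
  // namespace bindings in scope at that point in the schema.
  final case class Location(name: String, nsBindings: Map[String, String])
  final case class Found(value: String, location: Location)

  // Returns both the value and the object that contained it.
  def findProperty(
      name: String,
      searchOrder: Seq[(Location, Map[String, String])]
  ): Option[Found] =
    searchOrder.collectFirst {
      case (loc, props) if props.contains(name) => Found(props(name), loc)
    }

  // Resolves "prefix:local" against the providing location's bindings,
  // not against the bindings of whoever consumes the property.
  def resolveQName(found: Found): Option[(String, String)] =
    found.value.split(":", 2) match {
      case Array(prefix, local) =>
        found.location.nsBindings.get(prefix).map(ns => (ns, local))
      case _ => None // unprefixed names are out of scope for this sketch
    }
}
```

If the consuming element and the providing format both bind the same prefix to different namespaces, only the format's binding is used, which is the behavior the note above asks for.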
The PartialNextElementResolver is used to determine what infoset event comes next, and "resolves" which is to say determines the ElementRuntimeData for that infoset event. This can be used to construct the initial infoset from a stream of XML events.
path is used in diagnostic messages and code debug messages; hence, it is very important that it be very dependable.
One-based position in the nearest enclosing sequence. Follows backpointers from group defs to group refs until it finds a sequence.
Returns a tuple, where the first item is the children that could be last, and the second is a boolean indicating whether all children could be optional, and thus this could be last.
Returns a tuple, where the first item in the tuple is the list of sibling terms that could appear before this. The second item in the tuple is a One(enclosingParent) if all prior siblings are optional or this element has no prior siblings.
Use when production has no guard, but you want to name the production anyway (for debug visibility perhaps).
Use when production has a guard predicate
Convenience method to make gathering up all elements referenced in expressions easier.
For property combining only. E.g., doesn't refer from an element to its complex type because we don't combine properties with that in DFDL v1.0. (I consider that a language design bug in DFDL v1.0, but that is the way it's defined.)
Elements only e.g., /foo/ex:bar
Roll up from the bottom. This is abstract interpretation. The top (aka conflicting encodings) is "mixed". The bottom is "noText" (combines with anything). The values are encoding names, or "runtime" for expressions.
By doing expression analysis we could do a better job here and determine when things that use expressions to get the encoding are all going to get the same expression value. For now, if it is an expression then we lose.
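The roll-up can be sketched as a join over a small lattice. The type and function names are hypothetical, and one rule is an assumption of this sketch: an expression-valued encoding combined with a named encoding or another expression is summarized as "runtime" (not statically known) rather than "mixed".

```scala
object EncodingLatticeSketch {
  sealed trait Enc
  case object NoText extends Enc                  // bottom: combines with anything
  case object Mixed extends Enc                   // top: conflicting encodings
  case object Runtime extends Enc                 // encoding given by an expression
  final case class Named(name: String) extends Enc

  def join(a: Enc, b: Enc): Enc = (a, b) match {
    case (NoText, x) => x
    case (x, NoText) => x
    case (Mixed, _) | (_, Mixed) => Mixed
    case (Named(x), Named(y)) => if (x == y) a else Mixed
    // Assumption: once an expression is involved, the summary is "runtime".
    case _ => Runtime
  }

  // Rolls up from the bottom: a term's own value joined with its children's.
  def summarize(own: Enc, children: Seq[Enc]): Enc =
    children.foldLeft(own)(join)
}
```

With this shape, a sequence that has no text of its own simply takes on whatever its children agree on, and any disagreement between named encodings goes straight to the top.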
This is the root, or basic target namespace. Every schema component gets its target namespace from its xmlSchemaDocument.
Abbreviation analogous to trd, tci is the compile-time counterpart.
The termChildren are the children that are Terms, i.e., derived from the Term base class. This is to make it clear we're not talking about the XML structures inside the XML parent (which might include annotations, etc.).
For elements this is Nil for simple types, a single model group for complex types. For model groups there can be more children.
Used in diagnostic messages; hence, valueOrElse to avoid problems when this can't get a value due to an error.
Any element referenced from an expression in the scope of this term is in this set.
Any element referenced from an expression in the scope of this term is in this set.
Represents a local sequence definition.