no alignment properties that would explicitly create a need to align in a way that is not on a suitable boundary for a character.
Character encoding common attributes
Because the encoding can be computed at runtime, we define values that tell us whether the encoding is statically known, so that decisions can be made at schema compile time whenever possible.
True if this element itself consists only of text, with no binary content such as alignment regions or skips.
Does not recurse into contained children.
True if it is sensible to scan this data, e.g., with a regular expression. This requires that all children have the same encoding as their enclosing groups and elements, and that there are no leading or trailing alignment regions or skips. We must be able to determine that the data will always be properly aligned for text.
Caveat: we only care that the encoding is the same if the term could actually have text as part of its representation (couldHaveText is an LV). For example, a sequence with no initiator, terminator, or separator can have any encoding at all without disqualifying an element that contains it from being scannable. There has to be text that would be part of the scan.
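The scannability rule and its caveat can be sketched as follows. This is a minimal illustration, not the actual implementation: the Term case class, its fields, and the Scannable object are hypothetical names invented here, and the sketch assumes every encoding is statically known.

```scala
// Hypothetical, simplified model of a schema term (not the real classes).
final case class Term(
  name: String,
  encoding: String,             // assumed statically known encoding name
  couldHaveText: Boolean,       // does this term itself contribute text?
  hasAlignmentOrSkips: Boolean, // any leading/trailing alignment region or skip
  children: List[Term] = Nil
)

object Scannable {
  // Every term in the subtree, including the root itself.
  def allTerms(t: Term): List[Term] = t :: t.children.flatMap(allTerms)

  def isScannable(root: Term): Boolean = {
    val terms = allTerms(root)
    // No binary regions anywhere in the subtree.
    val noBinaryRegions = terms.forall(!_.hasAlignmentOrSkips)
    // Only terms that could have text constrain the encoding.
    val textEncodings =
      terms.filter(_.couldHaveText).map(_.encoding.toLowerCase).distinct
    // Exactly one encoding: there must be text, and it must all agree.
    noBinaryRegions && textEncodings.size == 1
  }
}
```

In this sketch, a delimiter-free sequence (couldHaveText = false) may declare any encoding without affecting the result, which mirrors the caveat above.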
If the root element isScannable and encodingErrorPolicy is 'replace', then we can use a lower-overhead I/O layer: essentially, we can use a java.io.InputStreamReader directly.
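As a rough sketch of what such a lower-overhead layer might look like, the following builds a java.io.InputStreamReader over a decoder configured to replace bad input. The ReplacingReader object is a hypothetical helper introduced for illustration; only the java.io and java.nio.charset calls are standard library APIs.

```scala
import java.io.{ByteArrayInputStream, InputStream, InputStreamReader, Reader}
import java.nio.charset.{Charset, CodingErrorAction}

// Hypothetical helper, not part of any real API.
object ReplacingReader {
  // Build a Reader whose decoder substitutes the replacement character
  // for malformed or unmappable bytes instead of reporting an error.
  def apply(in: InputStream, charsetName: String): Reader = {
    val decoder = Charset.forName(charsetName).newDecoder()
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
    new InputStreamReader(in, decoder)
  }
}
```

Configuring the decoder explicitly keeps the 'replace' behavior visible at the call site rather than relying on reader defaults.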
We depend on the fact that if the encoding is X-DFDL-US-ASCII-7-BIT-PACKED (code units 7 bits wide, so aligned at 1 bit), then this encoding must be specified statically in the schema.
If an encoding is determined at runtime, then we insist that its code units be 8-bit aligned.
When the encoding is known, this tells us the mandatory alignment required. This is always 1 or 8.
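The alignment rule can be illustrated with a small sketch. The EncodingAlignment object and mandatoryAlignmentInBits are hypothetical names, and modeling a runtime-determined encoding as None is an assumption made only for this example.

```scala
// Hypothetical sketch of the "1 or 8" rule; names are illustrative only.
object EncodingAlignment {
  // The 7-bit packed encoding named in the text; it must be declared statically.
  val SevenBitPacked = "X-DFDL-US-ASCII-7-BIT-PACKED"

  // knownEncoding is None when the encoding comes from a runtime expression
  // (an assumption of this sketch).
  def mandatoryAlignmentInBits(knownEncoding: Option[String]): Int =
    knownEncoding match {
      case Some(name) if name.equalsIgnoreCase(SevenBitPacked) => 1
      case _ => 8 // every other static encoding, and any runtime encoding
    }
}
```

Because the 7-bit packed case can only arise statically, the runtime branch never needs to consider 1-bit alignment.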
Roll up from the bottom. This is abstract interpretation. The top (aka conflicting encodings) is "mixed". The bottom is "noText" (combines with anything). The values are encoding names, or "runtime" for expressions.
Expression analysis could do a better job here by determining when all the expressions that supply the encoding are guaranteed to produce the same value. For now, if the encoding is an expression, we give up on determining it statically.
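The bottom-up roll-up can be sketched as a join over a small lattice. This is an illustrative reconstruction under stated assumptions, not the actual code: the EncodingLattice object and the names Summary, NoText, Mixed, Runtime, Named, join, and summarize are all hypothetical, and treating two "runtime" values as still "runtime" (rather than "mixed") is a choice made for this sketch.

```scala
// Hypothetical sketch of the encoding summary lattice.
object EncodingLattice {
  sealed trait Summary
  case object NoText extends Summary  // bottom: combines with anything
  case object Mixed extends Summary   // top: conflicting encodings
  case object Runtime extends Summary // encoding given by an expression
  final case class Named(name: String) extends Summary

  // Least upper bound of two summaries.
  def join(a: Summary, b: Summary): Summary = (a, b) match {
    case (NoText, x) => x
    case (x, NoText) => x
    case (Mixed, _) | (_, Mixed) => Mixed
    case (Named(x), Named(y)) if x.equalsIgnoreCase(y) => a
    case (Runtime, Runtime) => Runtime // assumption: equal values stay put
    case _ => Mixed                    // any other disagreement conflicts
  }

  // Roll a term's own summary together with those of its children.
  def summarize(own: Summary, children: Seq[Summary]): Summary =
    children.foldLeft(own)(join)
}
```

For example, summarizing a NoText sequence whose children are all Named("ASCII") or NoText yields Named("ASCII"), while mixing a named encoding with a runtime one yields Mixed, which matches the conservative "we lose" behavior described above.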
Captures concepts around the dfdl:encoding property and Terms.
Factored out into a trait to isolate related code.