Parent class of all DFA text parsers.
StartState for Field portion of EscapeBlock. Here compiledDelims should only contain the endBlock DFADelimiter.
Ambiguity Truth Table
0 means not same, 1 means same
------------------------------------
EC   EEC   PTERM   Ambiguous?
0    1     1       Y
1    0     1       Y
1    1     0       Y
1    1     1       Y
This base class handles the connections from state to state (which form a directed graph) through use of call-by-name here and lazy vals within. This allows us to define the state-to-state connections functionally.
This object itself will exist and be a member of the states ArrayBuffer as well as the other states before any access to the lazy val members.
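The wiring described above can be sketched roughly as follows. This is a minimal illustration, not the actual class hierarchy: SketchState, StartSketch, OtherSketch and SketchGraph are hypothetical names. Each state receives the shared buffer by name, so the buffer is not read while the states are still being constructed, and each outgoing connection is a lazy val resolved on first access.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical base class: the buffer of all states is passed call-by-name,
// so it is not evaluated while the states are being constructed.
abstract class SketchState(statesArg: => ArrayBuffer[SketchState]) {
  // Forced on first access only, after every state is in the buffer.
  lazy val states: ArrayBuffer[SketchState] = statesArg
  def stateNum: Int
  def nextNum: Int
  // The state-to-state connection, resolved lazily.
  lazy val next: SketchState = states(nextNum)
}

final class StartSketch(s: => ArrayBuffer[SketchState]) extends SketchState(s) {
  val stateNum = 0
  val nextNum = 1
}

final class OtherSketch(s: => ArrayBuffer[SketchState]) extends SketchState(s) {
  val stateNum = 1
  val nextNum = 0 // points back to the start state, forming a cycle
}

object SketchGraph {
  // `all` is mentioned inside its own initializer; this is safe only because
  // the constructor parameters are by-name and nothing forces them yet.
  lazy val all: ArrayBuffer[SketchState] =
    ArrayBuffer(new StartSketch(all), new OtherSketch(all))
}
```

Following `next` twice from state 0 returns to state 0, which shows that a cyclic graph can be declared without mutation after construction.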
Assumes that the delims DFAs were constructed with the Esc and EscEsc in mind.
Assumes that endBlock DFA was constructed with the EscEsc in mind.
When 'escapeCharacter': On unparsing, a single character of the data is escaped by adding a dfdl:escapeCharacter before it. The following are escaped if they are in the data:
- Any in-scope terminating delimiter, by escaping its first character.
- dfdl:escapeCharacter (escaped by dfdl:escapeEscapeCharacter)
- Any dfdl:extraEscapedCharacters
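The three rules above can be illustrated with a small string-based sketch. It is not the DFA-based implementation; EscapeCharSketch and its parameters are hypothetical, and it assumes the escape character, the escape-escape character, the delimiters and the extra escaped characters do not overlap with one another.

```scala
object EscapeCharSketch {
  // ec = escape character, eec = escape-escape character (assumed distinct).
  def escape(data: String, ec: Char, eec: Char,
             delims: Seq[String], extra: Set[Char]): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < data.length) {
      val c = data.charAt(i)
      // Longest in-scope delimiter that starts at position i, if any.
      val hit = delims
        .filter(d => d.nonEmpty && data.startsWith(d, i))
        .sortBy(d => -d.length)
        .headOption
      if (c == ec) {
        // The escape character itself is escaped by the escape-escape character.
        sb.append(eec).append(c)
        i += 1
      } else if (hit.isDefined) {
        // Only the first character of the delimiter is escaped.
        sb.append(ec).append(hit.get)
        i += hit.get.length
      } else if (extra.contains(c)) {
        sb.append(ec).append(c)
        i += 1
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }
}
```

For example, with '/' as the escape character and '%' as the escape-escape character, the delimiter "::" inside "a::b" comes out as "a/::b".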
When 'escapeBlock': On unparsing, the entire data is escaped by adding dfdl:escapeBlockStart to the beginning and dfdl:escapeBlockEnd to the end of the data. The data is either always escaped or escaped only when needed, as specified by dfdl:generateEscapeBlock. If the data is escaped and contains the dfdl:escapeBlockEnd, then the first character of each appearance of the dfdl:escapeBlockEnd is escaped by the dfdl:escapeEscapeCharacter.
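A string-based sketch of the block case follows. EscapeBlockSketch and its parameters are hypothetical. The "when needed" test used here (the data contains an in-scope delimiter or begins with the block start) is an assumption made for illustration; the text above does not spell out the exact condition.

```scala
object EscapeBlockSketch {
  // always = true models "always escaped"; false models "escaped when needed".
  def escape(data: String, blockStart: String, blockEnd: String, eec: Char,
             always: Boolean, delims: Seq[String]): String = {
    // Assumed "when needed" condition; see the lead-in.
    val needed = always ||
      (blockStart.nonEmpty && data.startsWith(blockStart)) ||
      delims.exists(d => d.nonEmpty && data.contains(d))
    if (!needed) data
    else {
      // Escape the first character of every block end found inside the data.
      val inner =
        if (blockEnd.isEmpty) data
        else data.replace(blockEnd, eec.toString + blockEnd)
      blockStart + inner + blockEnd
    }
  }
}
```

With "[" and "]" as the block markers and '%' as the escape-escape character, the data "a]b" unparses as "[a%]b]" when escaping is always on.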
(12:12:21 PM) Mike Beckerle: I think I understand this. Let me explain why the "backtrack" is ok.
(12:12:38 PM) Mike Beckerle: We have this DFA, but in our pictures there's this PTERM state.
(12:12:56 PM) Mike Beckerle: That PTERM state is a "macro" for a much more complicated DFA.
(12:13:54 PM) Mike Beckerle: The upshot is this "macro" accepts - having consumed a bunch of input, or it fails, and we are supposed to be back at the start of the delimiter.
(12:15:06 PM) Mike Beckerle: That "macro" can be expressed by exploding it into a big DFA, or by some clever logic that orchestrates part-specific matchers.
(12:16:46 PM) Mike Beckerle: The fastest thing would be to explode it into a big DFA, but pragmatically this might not be noticeably faster than the other way.
(12:17:38 PM) Taylor: Why I was wondering if we should have two DFA types. Field and delimiter. Where the 'Field' can be 'paused' while we check the delimiters.
(12:17:47 PM) Taylor: If the delimiter match succeeds, we're done.
(12:18:06 PM) Taylor: if it fails, resume the 'field' DFA with the r.data0 as a char.
(12:18:37 PM) Taylor: all the field is doing is handling escape schemes if any
(12:18:39 PM) Mike Beckerle: That is, in principle, what the transition to the "macro" PTERM state is. It's suspending current state. So yeah, I'm good with that.
(12:18:46 PM) Taylor: ok cool :D
(12:19:13 PM) Mike Beckerle: Cut paste this dialog into a comment in the code somewhere.
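The pause-and-resume idea from the dialog can be sketched without any DFA machinery. PauseResumeSketch and scanField are hypothetical names, and escape schemes are deliberately left out: the field scan "pauses" whenever the current character could start a delimiter, tries a longest match of a whole delimiter, and otherwise resumes by treating the character as field data.

```scala
object PauseResumeSketch {
  // Returns the field text and the delimiter that ended it, if any.
  def scanField(input: String, delims: Seq[String]): (String, Option[String]) = {
    val field = new StringBuilder
    var pos = 0
    var found: Option[String] = None
    while (found.isEmpty && pos < input.length) {
      val c = input.charAt(pos)
      // "Pause": this character might begin a delimiter.
      val mightBeDelim = delims.exists(d => d.nonEmpty && d.charAt(0) == c)
      if (mightBeDelim) {
        // Delimiter check: longest whole delimiter starting here.
        found = delims
          .filter(d => d.nonEmpty && input.startsWith(d, pos))
          .sortBy(d => -d.length)
          .headOption
      }
      if (found.isEmpty) {
        // "Resume": no whole delimiter matched, so c is ordinary field data.
        field.append(c)
        pos += 1
      }
    }
    (field.toString, found)
  }
}
```

A lone ':' in "a:b::c" pauses the scan but does not match the delimiter "::", so it is kept in the field; the later "::" ends the field.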
Some constants
Convenient thingy for hand-created rules
Can write things like Rule { r.data0 == EC } { r.resultString.append(...)...}
Examples in other file.
TODO: Get rid of all the anonymous Rule objects. No reason these can't be ordinary classes with test and act methods.
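One way such a helper could look is sketched below. The names inside RuleSketch are hypothetical stand-ins, and the signatures are an assumption: here the test and action are passed as functions of the registers, so the call site names `r` explicitly. The sketch also shows the alternative suggested by the TODO, an ordinary class with test and act methods.

```scala
object RuleSketch {
  // Hypothetical stand-in for the register set a rule reads and writes.
  final class Registers {
    var data0: Char = 0.toChar
    val resultString = new StringBuilder
    var nextState: Int = 0
  }

  abstract class Rule {
    def test(r: Registers): Boolean
    def act(r: Registers): Unit
  }

  object Rule {
    // Two parameter lists allow the Rule { test } { action } call shape.
    def apply(testBody: Registers => Boolean)(actionBody: Registers => Unit): Rule =
      new Rule {
        def test(r: Registers): Boolean = testBody(r)
        def act(r: Registers): Unit = actionBody(r)
      }
  }

  val EC = '/'

  // Anonymous rule built through the helper.
  val anonymousRule: Rule = Rule { r => r.data0 == EC } { r =>
    r.resultString.append(r.data0)
    r.nextState = 1
  }

  // The same rule as an ordinary named class, in the spirit of the TODO.
  final class AppendOnChar(ch: Char, next: Int) extends Rule {
    def test(r: Registers): Boolean = r.data0 == ch
    def act(r: Registers): Unit = {
      r.resultString.append(r.data0)
      r.nextState = next
    }
  }
}
```

Both forms behave the same; the named class is easier to find in a stack trace or profiler, which may be part of the motivation for the TODO.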
Reflects the status of the DFA's State.
EndOfData - Reached end of data character
Failed - All rules within a DFA failed at a state
Succeeded - There was some combination of rules in the states that succeeded and matched.
Paused - We encountered something that could be a delimiter (only applicable to DFAField). We need to make a determination of what comes next (the longest match of a whole delimiter?). If it's not a whole delimiter then we will add the character to the field and continue/resume.
A state is an ordered array of rules. Each rule is a guard/test and an action. Only one rule "fires" for each state, and the action modifies the register state and returns the identifying integer for the next state (in r.nextState), if successful. Otherwise, a status is returned indicating whether or not we Succeeded, Failed, reached EndOfData, or need to Pause to gather further information.
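The evaluation loop described above can be sketched as follows. All names in StateSketch are hypothetical, and returning the outcome as an Either is a choice made for this sketch rather than the mechanism described above (which records the next state in r.nextState). The first rule whose guard passes fires; its action either names the next state or ends the run with a status.

```scala
object StateSketch {
  sealed trait Status
  case object Succeeded extends Status
  case object Failed extends Status
  case object EndOfData extends Status
  case object Paused extends Status

  // Hypothetical register set: input, cursor, and accumulated field text.
  final class Regs(val input: String) {
    var pos = 0
    val result = new StringBuilder
    def atEnd: Boolean = pos >= input.length
    def data0: Char = input.charAt(pos)
  }

  // A rule is a guard plus an action; the action yields either the number
  // of the next state (Right) or a terminating status (Left).
  final case class Rule(test: Regs => Boolean, act: Regs => Either[Status, Int])

  type State = Array[Rule]

  // Only the first rule whose guard holds fires; if none does, the state fails.
  def step(state: State, r: Regs): Either[Status, Int] =
    state.find(_.test(r)) match {
      case Some(rule) => rule.act(r)
      case None       => Left(Failed)
    }

  def run(states: Array[State], r: Regs): Status = {
    var current = 0
    var status: Option[Status] = None
    while (status.isEmpty) {
      step(states(current), r) match {
        case Right(n) => current = n
        case Left(s)  => status = Some(s)
      }
    }
    status.get
  }

  // A one-state field DFA: copy characters until ',' (Paused) or end of data.
  val fieldState: State = Array(
    Rule(r => r.atEnd, _ => Left(EndOfData)),
    Rule(r => r.data0 == ',', _ => Left(Paused)),
    Rule(_ => true, r => { r.result.append(r.data0); r.pos += 1; Right(0) })
  )
}
```

Rule order matters: the end-of-data guard comes first so that data0 is never read past the end of the input.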