Package org.antlr.tool
Class NFAFactory
- java.lang.Object
-
- org.antlr.tool.NFAFactory
-
public class NFAFactory extends java.lang.Object
Routines to construct StateClusters from EBNF grammar constructs. No optimization is done to remove unnecessary epsilon edges. TODO: add an optimization that reduces number of states and transitions will help with speed of conversion and make it easier to view NFA. For example, o-A->o-->o-B->o should be o-A->o-B->o
-
-
Constructor Summary
Constructors Constructor Description NFAFactory(NFA nfa)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description StateCluster
build_AB(StateCluster A, StateCluster B)
From A B build A-e->B (that is, build an epsilon arc from right of A to left of B).StateCluster
build_Action(GrammarAST action)
Build what amounts to an epsilon transition with an action.StateCluster
build_AlternativeBlock(java.util.List<StateCluster> alternativeStateClusters)
From A|B|..|Z alternative block build o->o-A->o->o (last NFAState is blockEndNFAState pointed to by all alts) | ^ o->o-B->o--| | | ...StateCluster
build_AlternativeBlockFromSet(StateCluster set)
From a set ('a'|'b') build o->o-'a'..'b'->o->o (last NFAState is blockEndNFAState pointed to by all alts)StateCluster
build_Aoptional(StateCluster A)
From (A)? build either: o--A->o | ^ o---->| or, if A is a block, just add an empty alt to the end of the blockStateCluster
build_Aplus(StateCluster A)
From (A)+ build |---| (Transition 2 from A.right points at alt 1) v | (follow of loop is Transition 1) o->o-A-o->o Meaning that the last NFAState in A points back to A's left Transition NFAState and we add a new begin/end NFAState.StateCluster
build_Astar(StateCluster A)
From (A)* build |---| v | o->o-A-o--o (Transition 2 from block end points at alt 1; follow is Transition 1) | ^ o---------| (optional branch is 2nd alt of optional block containing A+) Meaning that the last (end) NFAState in A points back to A's left side NFAState and we add 3 new NFAStates (the optional branch is built just like an optional subrule).StateCluster
build_Atom(int label, GrammarAST associatedAST)
From label A build Graph o-A->oStateCluster
build_Atom(GrammarAST atomAST)
StateCluster
build_CharLiteralAtom(GrammarAST charLiteralAST)
From char 'c' build StateCluster o-intValue(c)->oStateCluster
build_CharRange(java.lang.String a, java.lang.String b)
From char 'c' build StateCluster o-intValue(c)->o can include unicode spec likes '$' later.int
build_EOFStates(java.util.Collection<Rule> rules)
add an EOF transition to any rule end NFAState that points to nothing (i.e., for all those rules not invoked by another rule).StateCluster
build_Epsilon()
From an empty alternative build StateCluster o-e->oStateCluster
build_Range(int a, int b)
Can only complement block of simple alts; can complement build_Set() result, that is.StateCluster
build_RuleRef(Rule refDef, NFAState ruleStart)
For reference to rule r, build o-e->(r) o where (r) is the start of rule r and the trailing o is not linked to from rule ref state directly (it's done thru the transition(0) RuleClosureTransition.StateCluster
build_SemanticPredicate(GrammarAST pred)
Build what amounts to an epsilon transition with a semantic predicate action.StateCluster
build_Set(IntSet set, GrammarAST associatedAST)
From set build single edge graph o->o-set->o.StateCluster
build_StringLiteralAtom(GrammarAST stringLiteralAST)
For a non-lexer, just build a simple token reference atom.StateCluster
build_Wildcard(GrammarAST associatedAST)
Build an atom with all possible values in its labelStateCluster
build_WildcardTree(GrammarAST associatedAST)
Build a subrule matching ^(.protected IntSet
getCollapsedBlockAsSet(State blk)
Given a collapsed block of alts (a set of atoms), pull out the set and return it.Rule
getCurrentRule()
NFAState
newState()
void
optimizeAlternative(StateCluster alt)
Optimize an alternative (list of grammar elements).void
setCurrentRule(Rule currentRule)
-
-
-
Constructor Detail
-
NFAFactory
public NFAFactory(NFA nfa)
-
-
Method Detail
-
getCurrentRule
public Rule getCurrentRule()
-
setCurrentRule
public void setCurrentRule(Rule currentRule)
-
newState
public NFAState newState()
-
optimizeAlternative
public void optimizeAlternative(StateCluster alt)
Optimize an alternative (list of grammar elements). Walk the chain of elements (which can be complicated loop blocks...) and throw away any epsilon transitions used to link up simple elements. This only removes 195 states from the java.g's NFA, but every little bit helps. Perhaps I can improve in the future.
-
build_Atom
public StateCluster build_Atom(int label, GrammarAST associatedAST)
From label A build Graph o-A->o
-
build_Atom
public StateCluster build_Atom(GrammarAST atomAST)
-
build_Set
public StateCluster build_Set(IntSet set, GrammarAST associatedAST)
From set build single edge graph o->o-set->o. To conform to what an alt block looks like, must have extra state on left.
-
build_Range
public StateCluster build_Range(int a, int b)
Can only complement block of simple alts; can complement build_Set() result, that is. Get set and complement, replace old with complement. public StateCluster build_AlternativeBlockComplement(StateCluster blk) { State s0 = blk.left; IntSet set = getCollapsedBlockAsSet(s0); if ( set!=null ) { // if set is available, then structure known and blk is a set set = nfa.grammar.complement(set); Label label = s0.transition(0).target.transition(0).label; label.setSet(set); } return blk; }
-
build_CharLiteralAtom
public StateCluster build_CharLiteralAtom(GrammarAST charLiteralAST)
From char 'c' build StateCluster o-intValue(c)->o
-
build_CharRange
public StateCluster build_CharRange(java.lang.String a, java.lang.String b)
From char 'c' build StateCluster o-intValue(c)->o can include unicode spec likes '$' later. Accepts actual unicode 16-bit now, of course, by default. TODO not supplemental char clean!
-
build_StringLiteralAtom
public StateCluster build_StringLiteralAtom(GrammarAST stringLiteralAST)
For a non-lexer, just build a simple token reference atom. For a lexer, a string is a sequence of char to match. That is, "fog" is treated as 'f' 'o' 'g' not as a single transition in the DFA. Machine== o-'f'->o-'o'->o-'g'->o and has n+1 states for n characters.
-
build_RuleRef
public StateCluster build_RuleRef(Rule refDef, NFAState ruleStart)
For reference to rule r, build o-e->(r) o where (r) is the start of rule r and the trailing o is not linked to from rule ref state directly (it's done thru the transition(0) RuleClosureTransition. If the rule r is just a list of tokens, it's block will be just a set on an edge o->o->o-set->o->o->o, could inline it rather than doing the rule reference, but i'm not doing this yet as I'm not sure it would help much in the NFA→DFA construction. TODO add to codegen: collapse alt blks that are sets into single matchSet
-
build_Epsilon
public StateCluster build_Epsilon()
From an empty alternative build StateCluster o-e->o
-
build_SemanticPredicate
public StateCluster build_SemanticPredicate(GrammarAST pred)
Build what amounts to an epsilon transition with a semantic predicate action. The pred is a pointer into the AST of the SEMPRED token.
-
build_Action
public StateCluster build_Action(GrammarAST action)
Build what amounts to an epsilon transition with an action. The action goes into NFA though it is ignored during analysis. It slows things down a bit, but I must ignore predicates after having seen an action (5-5-2008).
-
build_EOFStates
public int build_EOFStates(java.util.Collection<Rule> rules)
add an EOF transition to any rule end NFAState that points to nothing (i.e., for all those rules not invoked by another rule). These are start symbols then. Return the number of grammar entry points; i.e., how many rules are not invoked by another rule (they can only be invoked from outside). These are the start rules.
-
build_AB
public StateCluster build_AB(StateCluster A, StateCluster B)
From A B build A-e->B (that is, build an epsilon arc from right of A to left of B). As a convenience, return B if A is null or return A if B is null.
-
build_AlternativeBlockFromSet
public StateCluster build_AlternativeBlockFromSet(StateCluster set)
From a set ('a'|'b') build o->o-'a'..'b'->o->o (last NFAState is blockEndNFAState pointed to by all alts)
-
build_AlternativeBlock
public StateCluster build_AlternativeBlock(java.util.List<StateCluster> alternativeStateClusters)
From A|B|..|Z alternative block build o->o-A->o->o (last NFAState is blockEndNFAState pointed to by all alts) | ^ o->o-B->o--| | | ... | | | o->o-Z->o--| So every alternative gets begin NFAState connected by epsilon and every alt right side points at a block end NFAState. There is a new NFAState in the NFAState in the StateCluster for each alt plus one for the end NFAState. Special case: only one alternative: don't make a block with alt begin/end. Special case: if just a list of tokens/chars/sets, then collapse to a single edge'd o-set->o graph. Set alt number (1..n) in the left-Transition NFAState.
-
build_Aoptional
public StateCluster build_Aoptional(StateCluster A)
From (A)? build either: o--A->o | ^ o---->| or, if A is a block, just add an empty alt to the end of the block
-
build_Aplus
public StateCluster build_Aplus(StateCluster A)
From (A)+ build |---| (Transition 2 from A.right points at alt 1) v | (follow of loop is Transition 1) o->o-A-o->o Meaning that the last NFAState in A points back to A's left Transition NFAState and we add a new begin/end NFAState. A can be single alternative or multiple. During analysis we'll call the follow link (transition 1) alt n+1 for an n-alt A block.
-
build_Astar
public StateCluster build_Astar(StateCluster A)
From (A)* build |---| v | o->o-A-o--o (Transition 2 from block end points at alt 1; follow is Transition 1) | ^ o---------| (optional branch is 2nd alt of optional block containing A+) Meaning that the last (end) NFAState in A points back to A's left side NFAState and we add 3 new NFAStates (the optional branch is built just like an optional subrule). See the Aplus() method for more on the loop back Transition. The new node on right edge is set to RIGHT_EDGE_OF_CLOSURE so we can detect nested (A*)* loops and insert an extra node. Previously, two blocks shared same EOB node. There are 2 or 3 decision points in a A*. If A is not a block (i.e., it only has one alt), then there are two decisions: the optional bypass and then loopback. If A is a block of alts, then there are three decisions: bypass, loopback, and A's decision point. Note that the optional bypass must be outside the loop as (A|B)* is not the same thing as (A|B|)+. This is an accurate NFA representation of the meaning of (A)*, but for generating code, I don't need a DFA for the optional branch by virtue of how I generate code. The exit-loopback-branch decision is sufficient to let me make an appropriate enter, exit, loop determination. See codegen.g
-
build_Wildcard
public StateCluster build_Wildcard(GrammarAST associatedAST)
Build an atom with all possible values in its label
-
build_WildcardTree
public StateCluster build_WildcardTree(GrammarAST associatedAST)
Build a subrule matching ^(. .*) (any tree or node). Let's use (^(. .+) | .) to be safe.
-
-