Class DFAOptimizer


  • public class DFAOptimizer
    extends java.lang.Object
    A module to perform optimizations on DFAs. I could more easily (and more quickly) do some optimizations (such as PRUNE_EBNF_EXIT_BRANCHES) during DFA construction, but then it messes up the determinism checking. For example, it looks like loop exit branches are unreachable if you prune exit branches during DFA construction and before determinism checks. In general, ANTLR's NFA→DFA→codegen pipeline seems very robust to me which I attribute to a uniform and consistent set of data structures. Regardless of what I want to "say"/implement, I do so within the confines of, for example, a DFA. The code generator can then just generate code--it doesn't have to do much thinking. Putting optimizations in the code gen code really starts to make it a spagetti factory (uh oh, now I'm hungry!). The pipeline is very testable; each stage has well defined input/output pairs. ### Optimization: PRUNE_EBNF_EXIT_BRANCHES There is no need to test EBNF block exit branches. Not only is it an unneeded computation, but counter-intuitively, you actually get better errors. You can report an error at the missing or extra token rather than as soon as you've figured out you will fail. Imagine optional block "( DOT CLASS )? SEMI". ANTLR generates: int alt=0; if ( input.LA(1)==DOT ) { alt=1; } else if ( input.LA(1)==SEMI ) { alt=2; } Clearly, since Parser.match() will ultimately find the error, we do not want to report an error nor do we want to bother testing lookahead against what follows the (...)? We want to generate simply "should I enter the subrule?": int alt=2; if ( input.LA(1)==DOT ) { alt=1; } NOTE 1. Greedy loops cannot be optimized in this way. For example, "(greedy=false:'x'|.)* '\n'". You specifically need the exit branch to tell you when to terminate the loop as the same input actually predicts one of the alts (i.e., staying in the loop). NOTE 2. I do not optimize cyclic DFAs at the moment as it doesn't seem to work. ;) I'll have to investigate later to see what work I can do on cyclic DFAs to make them have fewer edges. Might have something to do with the EOT token. ### PRUNE_SUPERFLUOUS_EOT_EDGES When a token is a subset of another such as the following rules, ANTLR quietly assumes the first token to resolve the ambiguity. EQ : '=' ; ASSIGNOP : '=' | '+=' ; It can yield states that have only a single edge on EOT to an accept state. This is a waste and messes up my code generation. ;) If Tokens rule DFA goes s0 -'='-> s3 -EOT-> s5 (accept) then s5 should be pruned and s3 should be made an accept. Do NOT do this for keyword versus ID as the state with EOT edge emanating from it will also have another edge. ### Optimization: COLLAPSE_ALL_INCIDENT_EDGES Done during DFA construction. See method addTransition() in NFAToDFAConverter. ### Optimization: MERGE_STOP_STATES Done during DFA construction. See addDFAState() in NFAToDFAConverter.
    • Field Detail

      • PRUNE_EBNF_EXIT_BRANCHES

        public static boolean PRUNE_EBNF_EXIT_BRANCHES
      • PRUNE_TOKENS_RULE_SUPERFLUOUS_EOT_EDGES

        public static boolean PRUNE_TOKENS_RULE_SUPERFLUOUS_EOT_EDGES
      • COLLAPSE_ALL_PARALLEL_EDGES

        public static boolean COLLAPSE_ALL_PARALLEL_EDGES
      • MERGE_STOP_STATES

        public static boolean MERGE_STOP_STATES
      • visited

        protected java.util.Set<java.lang.Integer> visited
        Used by DFA state machine generator to avoid infinite recursion resulting from cycles int the DFA. This is a set of int state #s. This is a side-effect of calling optimize; can't clear after use because code gen needs it.
      • grammar

        protected Grammar grammar
    • Constructor Detail

      • DFAOptimizer

        public DFAOptimizer​(Grammar grammar)
    • Method Detail

      • optimize

        public void optimize()
      • optimize

        protected void optimize​(DFA dfa)
      • optimizeExitBranches

        protected void optimizeExitBranches​(DFAState d)
      • optimizeEOTBranches

        protected void optimizeEOTBranches​(DFAState d)