Class: ToknInternal::DFABuilder
- Inherits:
-
Object
- Object
- ToknInternal::DFABuilder
- Defined in:
- lib/tokn/dfa_builder.rb
Overview
Converts NFAs (nondeterministic, finite state automata) to minimal DFAs.
Performs the subset construction algorithm described in (among other placess) en.wikipedia.org/wiki/Powerset_construction
Also implements an innovative algorithm to partition a set of edge labels into a set that has the property that no two elements have overlapping regions. This allows us to perform the subset construction (and closure operations) efficiently while supporting large possible character sets (e.g., unicode, which ranges from 0..0x10ffff. See RangePartition.rb for more details.
Class Method Summary collapse
-
.nfa_to_dfa(startState, db = false) ⇒ Object
Convert an NFA to a DFA.
Instance Method Summary collapse
-
#build(partition = true, normalize = true) ⇒ Object
Perform the build algorithm.
-
#initialize(nfaStartState) ⇒ DFABuilder
constructor
Constructs a builder object.
Constructor Details
#initialize(nfaStartState) ⇒ DFABuilder
Constructs a builder object
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 |
# File 'lib/tokn/dfa_builder.rb', line 80 def initialize(nfaStartState) @nextId = 0 @nfaStart = nfaStartState # Build a map of nfa state ids => nfa states @nfaStateMap = {} nfas, _, _ = @nfaStart.reachableStates nfas.each {|s| @nfaStateMap[s.id] = s} # Initialize an array of nfa state lists, indexed by dfa state id @nfaStateLists = [] # Map of existing DFA states; key is array of NFA state ids @dfaStateMap = {} end |
Class Method Details
.nfa_to_dfa(startState, db = false) ⇒ Object
Convert an NFA to a DFA.
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 |
# File 'lib/tokn/dfa_builder.rb', line 28 def self.nfa_to_dfa(startState, db = false) !db || startState.generatePDF("original_nfa") # Reverse this NFA, convert to DFA, then # reverse it, and convert it again. Apparently this # produces a minimal DFA. rev = startState.reverseNFA() !db || rev.generatePDF("reversed_nfa") bld = DFABuilder.new(rev) dfa = bld.build(true, false) # partition, but don't normalize !db || dfa.generatePDF("reversed_dfa") rev2 = dfa.reverseNFA() bld = DFABuilder.new(rev2) # Don't regenerate the partition; it is still valid # for this second build process # dfa = bld.build(false, true) # don't partition, but do normalize # If there are edges that contain more than one token identifier, # remove all but the first (i.e. the one with the highest token id) stSet, _, _ = dfa.reachableStates stSet.each do |s| s.edges.each do |lbl, dest| a = lbl.array if !a.size next end primeId = a[0] next if primeId >= EPSILON-1 lbl.difference!(CodeSet.new(primeId+1, EPSILON)) end end !db || dfa.generatePDF("minimal_dfa") dfa end |
Instance Method Details
#build(partition = true, normalize = true) ⇒ Object
Perform the build algorithm
101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 |
# File 'lib/tokn/dfa_builder.rb', line 101 def build(partition = true, normalize = true) db = false !partition || partitionEdges(@nfaStart) iset = Set.new iset.add(@nfaStart) epsClosure(iset) @dfaStart,_ = createDFAState(stateSetToIdArray(iset)) markedStates = Set.new unmarked = [@dfaStart] until unmarked.empty? dfaState = unmarked.pop nfaIds = @nfaStateLists[dfaState.id] # map of CodeSet => set of NFA states moveMap = {} nfaIds.each do |nfaId| nfaState = @nfaStateMap[nfaId] nfaState.edges.each do |lbl,dest| if lbl.array[0] == EPSILON next end nfaStates = moveMap[lbl] if !nfaStates nfaStates = Set.new moveMap[lbl] = nfaStates end nfaStates.add(dest) end end moveMap.each_pair do |charRange,nfaStates| # May be better to test if already in set before calc closure; or simply has closure epsClosure(nfaStates) dfaDestState, isNew = createDFAState(stateSetToIdArray(nfaStates)) if isNew unmarked.push(dfaDestState) end dfaState.addEdge(charRange, dfaDestState) end end if normalize !db || @dfaStart.generatePDF("prior_normalize") !db || pr("Normalizing states for:\n\n%s\n",State.dumpNFA(@dfaStart)) State.normalizeStates(@dfaStart) !db || pr("After normalizing:\n\n%s\n",State.dumpNFA(@dfaStart)) !db || @dfaStart.generatePDF("post_normalize") end @dfaStart end |