Class: ToknInternal::RangePartition
- Inherits:
-
Object
- Object
- ToknInternal::RangePartition
- Defined in:
- lib/tokn/range_partition.rb
Overview
A data structure that transforms a set of CodeSets to a disjoint set of them, such that no two range sets overlap.
This is improve the efficiency of the NFA => DFA algorithm, which involves gathering information about what states are reachable on certain characters. We can’t afford to treat each character as a singleton, since the ranges can be quite large. Hence, we want to treat ranges of characters as single entities; this will only work if no two such ranges overlap.
It works by starting with a tree whose node is labelled with the maximal superset of character values. Then, for each edge in the NFA, performs a DFS on this tree, splitting any node that only partially intersects any one set that appears in the edge label. The running time is O(n log k), where n is the size of the NFA, and k is the height of the resulting tree.
We encourage k to be small by sorting the NFA edges by their label complexity.
Instance Method Summary collapse
- #addSet(s) ⇒ Object
-
#apply(s) ⇒ Object
Apply the partition to a CodeSet.
-
#generatePDF(test_dir = nil, name = "partition") ⇒ Object
Generate a .dot file, and from that, a PDF, for debug purposes.
-
#initialize ⇒ RangePartition
constructor
include Tokn.
- #prepare ⇒ Object
Constructor Details
#initialize ⇒ RangePartition
include Tokn
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
# File 'lib/tokn/range_partition.rb', line 29 def initialize() # We will build a tree, where each node has a CodeSet # associated with it, and the child nodes (if present) # partition this CodeSet into smaller, nonempty sets. # A tree is represented by a node, where each node is a pair [x,y], # with x the node's CodeSet, and y a list of the node's children. @nextNodeId = 0 # Make the root node hold the largest possible CodeSet. # We want to be able to include all the token ids as well. @rootNode = buildNode(CodeSet.new(CODEMIN,CODEMAX)) @setsToAdd = Set.new # Add epsilon immediately, so it's always in its own subset addSet(CodeSet.new(EPSILON)) @prepared = false end |
Instance Method Details
#addSet(s) ⇒ Object
52 53 54 55 56 57 |
# File 'lib/tokn/range_partition.rb', line 52 def addSet(s) if @prepared raise IllegalStateException end @setsToAdd.add(s) end |
#apply(s) ⇒ Object
Apply the partition to a CodeSet
> s CodeSet < array of subsets from the partition whose union equals s
(this array will be the single element s if no partitioning was necessary)
117 118 119 120 121 122 123 124 125 126 127 128 129 130 |
# File 'lib/tokn/range_partition.rb', line 117 def apply(s) if !@prepared raise IllegalStateException end list = [] s2 = s.makeCopy applyAux(@rootNode, s2, list) # Sort the list of subsets by their first elements list.sort! { |x,y| x.array[0] <=> y.array[0] } list end |
#generatePDF(test_dir = nil, name = "partition") ⇒ Object
Generate a .dot file, and from that, a PDF, for debug purposes
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 |
# File 'lib/tokn/range_partition.rb', line 82 def generatePDF(test_dir = nil, name = "partition") if !@prepared raise IllegalStateException end g = "" g += "digraph "+name+" {\n\n" nodes = [] buildNodeList(nodes) nodes.each do |node| g += " '" + d(node) + "' [shape=rect] [label='" + node.set.to_s_alt + "']\n" end g += "\n" nodes.each do |node| node.children.each do |ch| g += " '" + d(node) + "' -> '" + d(ch) + "'\n" end end g += "\n}\n" g.gsub!( /'/, '"' ) dotToPDF(g,name, test_dir) end |
#prepare ⇒ Object
59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
# File 'lib/tokn/range_partition.rb', line 59 def prepare() if @prepared raise IllegalStateException end # Construct partition from previously added sets list = @setsToAdd.to_a # Sort set by cardinality: probably get a more balanced tree # if larger sets are processed first list.sort!{ |x,y| y.cardinality <=> x.cardinality } list.each do |s| addSetAux(s) end @prepared = true end |