Class: ToknInternal::RangePartition

Inherits:
Object
  • Object
show all
Defined in:
lib/tokn/range_partition.rb

Overview

A data structure that transforms a set of CodeSets to a disjoint set of them, such that no two range sets overlap.

This is improve the efficiency of the NFA => DFA algorithm, which involves gathering information about what states are reachable on certain characters. We can’t afford to treat each character as a singleton, since the ranges can be quite large. Hence, we want to treat ranges of characters as single entities; this will only work if no two such ranges overlap.

It works by starting with a tree whose node is labelled with the maximal superset of character values. Then, for each edge in the NFA, performs a DFS on this tree, splitting any node that only partially intersects any one set that appears in the edge label. The running time is O(n log k), where n is the size of the NFA, and k is the height of the resulting tree.

We encourage k to be small by sorting the NFA edges by their label complexity.

Instance Method Summary collapse

Constructor Details

#initializeRangePartition

include Tokn



29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
# File 'lib/tokn/range_partition.rb', line 29

def initialize()
  # We will build a tree, where each node has a CodeSet
  # associated with it, and the child nodes (if present)
  # partition this CodeSet into smaller, nonempty sets.
  
  # A tree is represented by a node, where each node is a pair [x,y],
  # with x the node's CodeSet, and y a list of the node's children.
  
  @nextNodeId = 0
  
  # Make the root node hold the largest possible CodeSet.
  # We want to be able to include all the token ids as well.
  
  @rootNode = buildNode(CodeSet.new(CODEMIN,CODEMAX))
  
  @setsToAdd = Set.new
  
  # Add epsilon immediately, so it's always in its own subset
  addSet(CodeSet.new(EPSILON))
  
  @prepared = false
end

Instance Method Details

#addSet(s) ⇒ Object



52
53
54
55
56
57
# File 'lib/tokn/range_partition.rb', line 52

def addSet(s)
  if @prepared
    raise IllegalStateException
  end
  @setsToAdd.add(s)
end

#apply(s) ⇒ Object

Apply the partition to a CodeSet

> s CodeSet < array of subsets from the partition whose union equals s

(this array will be the single element s if no partitioning was necessary)


117
118
119
120
121
122
123
124
125
126
127
128
129
130
# File 'lib/tokn/range_partition.rb', line 117

def apply(s)
  if !@prepared
    raise IllegalStateException
  end
  
  list = []
  s2 = s.makeCopy
  applyAux(@rootNode, s2, list)
  
  # Sort the list of subsets by their first elements
  list.sort! { |x,y| x.array[0] <=> y.array[0] }
  
  list
end

#generatePDF(test_dir = nil, name = "partition") ⇒ Object

Generate a .dot file, and from that, a PDF, for debug purposes



82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
# File 'lib/tokn/range_partition.rb', line 82

def generatePDF(test_dir = nil, name = "partition")
  if !@prepared
    raise IllegalStateException
  end
  
  g = ""
  g += "digraph "+name+" {\n\n"
  
  nodes = []
  buildNodeList(nodes)
  nodes.each do |node|
    g += " '" + d(node) + "' [shape=rect] [label='" + node.set.to_s_alt + "']\n"
  end
  
  g += "\n"
  nodes.each do |node|
    node.children.each do |ch|
      g += " '" + d(node) + "' -> '" + d(ch) + "'\n"
    end
  end
 
  g += "\n}\n"
  g.gsub!( /'/, '"' )
  
  dotToPDF(g,name, test_dir)
  
end

#prepareObject



59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
# File 'lib/tokn/range_partition.rb', line 59

def prepare()
  if @prepared
    raise IllegalStateException
  end
  
  # Construct partition from previously added sets
  
  list = @setsToAdd.to_a
  
  # Sort set by cardinality: probably get a more balanced tree
  # if larger sets are processed first
  list.sort!{ |x,y| y.cardinality <=> x.cardinality }
  
  list.each do |s|
    addSetAux(s)
  end
  
  @prepared = true
end