# File 'lib/pacer/transform/reduce.rb', line 15
def help(section = nil)
  case section
  when nil
    puts <<HELP
HELP
  when :example
    puts <<HELP
This example usage is from the pacer-xml plugin v0.2. I transform a raw
stream of lines from a 79MB file that contains more than 4000 concatenated
XML documents, averaging 600 lines each, into a stream of imported nodes.

First, a little setup: create a graph, open the file, and make a route of
its lines:

  graph = Pacer.tg
  f = File.open '/tmp/ipgb20120103.xml'
  lines = f.each_line.to_route(element_type: :string).route

Create a simple reducer that delimits sections when it hits an XML
declaration (<?xml) or reaches the end of the file (that's the s.nil?
check), and reduces the stream by pushing each section's lines into an
array. When a section is entered, its initial value is the return value of
the enter block.

  reducer = lines.reducer(element_type: :array).route
  reducer.enter { |s| [] if s =~ /<\?xml/ }
  reducer.reduce { |s, lines| lines << s }
  reducer.leave { |s, lines| s.nil? or s =~ /<\?xml/ }

Now we're back in the territory of fairly vanilla routes. We join each
section, use the pacer-xml gem's StringRoute#xml method to parse the XML
with Nokogiri, and then its XmlRoute#import method to turn those XML
nodes into graph elements.

  vertex = reducer.map(element_type: :string, &:join).xml.limit(1).import(graph).first
  graph    #=> #<PacerGraph tinkergraph[vertices:88 edges:90]>
  vertex   #=> #<V[0] us-patent-grant>

We can see that we've now got a graph with 88 vertices and 90 edges.
HELP
  end
end
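
The enter/reduce/leave flow described in the help text can be sketched in plain Ruby, without Pacer. This is only an illustration of how the three blocks appear to cooperate in the example above, not Pacer's actual implementation: `split_sections` is a hypothetical helper, and the order in which the blocks are checked is an assumption.

```ruby
# Plain-Ruby sketch of the enter/reduce/leave pattern (not Pacer's code).
# `split_sections` is a hypothetical name introduced for illustration.
def split_sections(lines)
  enter  = ->(s) { [] if s =~ /<\?xml/ }            # start a section, return its initial value
  reduce = ->(s, acc) { acc << s }                  # fold each line into the open section
  leave  = ->(s, _acc) { s.nil? || s =~ /<\?xml/ }  # close on end of input or next declaration

  sections = []
  current = nil
  # A trailing nil stands in for end-of-stream, mirroring the s.nil? check.
  (lines.to_a + [nil]).each do |s|
    if current && leave.call(s, current)
      sections << current
      current = nil
    end
    next if s.nil?
    current ||= enter.call(s)
    reduce.call(s, current) if current
  end
  sections
end

docs = split_sections([
  "<?xml version=\"1.0\"?>\n", "<a/>\n",
  "<?xml version=\"1.0\"?>\n", "<b/>\n"
])
puts docs.map(&:join)
```

Each element of `docs` is the array of lines for one document, so joining it yields one XML string per section, which is the shape the `map(&:join)` step in the help text relies on. Lines that appear before the first XML declaration are dropped in this sketch because no section is open yet.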