# File 'lib/pacer/transform/reduce.rb', line 15
def help(section = nil)
  case section
  when nil
    puts "\n"
  when :example
    puts "This example usage is from pacer-xml plugin v0.2. I transform a raw\nstream of lines from a 79MB file that contains > 4000 concatenated XML\ndocuments averaging 600 lines each, to a stream of imported nodes:\n\nFirst, a little setup: create a graph, open the file, and make a route of\nits lines:\n\n graph = Pacer.tg\n f = File.open '/tmp/ipgb20120103.xml'\n lines = f.each_line.to_route(element_type: :string).route\n\nCreate a simple reducer that delimits sections when it hits an XML\ndeclaration (<?xml) and when it reaches the end of the file (that's the\ns.nil? check), and reduces the stream by pushing each section's lines into\nan array. When a section is entered, the initial value is provided by the\nreturn value of the enter block.\n\n reducer = lines.reducer(element_type: :array).route\n reducer.enter { |s| [] if s =~ /<\\?xml/ }\n reducer.reduce { |s, lines| lines << s }\n reducer.leave { |s, lines| s.nil? or s =~ /<\\?xml/ }\n\nNow we're back in the territory of fairly vanilla routes. We join each\nsection, use the pacer-xml gem's StringRoute#xml method to parse the XML\nwith Nokogiri, and then use its XmlRoute#import method to turn those XML\nnodes into graph elements.\n\n vertex = reducer.map(element_type: :string, &:join).xml.limit(1).import(graph).first\n\n graph #=> #<PacerGraph tinkergraph[vertices:88 edges:90]\n vertex #=> #<V[0] us-patent-grant>\n\nWe can see that we've now got a graph with 88 vertices and 90 edges.\n\n"
  end
end
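The enter/reduce/leave protocol that the help text describes can be illustrated without Pacer, a graph, or Nokogiri. What follows is a minimal, stdlib-only Ruby sketch. `reduce_sections` is a hypothetical helper, not part of Pacer's API. It assumes that for each element `leave` is consulted first, then `enter` (only when no section is open), then `reduce`, and that `leave` is called once more with `nil` at the end of the stream. Pacer's real reducer may order these checks differently.

```ruby
# Hypothetical, stdlib-only sketch of enter/reduce/leave sectioning.
# NOT Pacer's implementation: it only mimics the behaviour described in
# the help text, assuming leave is checked before enter and reduce.
def reduce_sections(elements, enter:, reduce:, leave:)
  sections = []
  acc = nil
  in_section = false

  elements.each do |s|
    # Close the open section if the leave block says this element ends it.
    if in_section && leave.call(s, acc)
      sections << acc
      in_section = false
    end

    # Outside a section, the enter block decides whether s opens one;
    # a non-nil return value becomes the section's initial accumulator.
    unless in_section
      initial = enter.call(s)
      next if initial.nil? # elements before the first section are dropped
      acc = initial
      in_section = true
    end

    # Fold the current element into the open section's accumulator.
    acc = reduce.call(s, acc)
  end

  # End of stream: leave sees nil, mirroring the `s.nil?` check.
  sections << acc if in_section && leave.call(nil, acc)
  sections
end

# Usage with the same three blocks as the example, on an in-memory stream
# of two concatenated (tiny) XML documents.
raw = [
  '<?xml version="1.0"?>', '<a>', '</a>',
  '<?xml version="1.0"?>', '<b/>'
]
docs = reduce_sections(
  raw,
  enter:  ->(s) { [] if s =~ /<\?xml/ },
  reduce: ->(s, lines) { lines << s },
  leave:  ->(s, _lines) { s.nil? || s =~ /<\?xml/ }
).map(&:join)

puts docs
# <?xml version="1.0"?><a></a>
# <?xml version="1.0"?><b/>
```

Checking `leave` before `reduce` is what keeps each `<?xml` line out of the previous section and makes it the first line of the next one.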