Module: Pacer::Transform::Reduce

Defined in:
lib/pacer/transform/reduce.rb

Defined Under Namespace

Classes: ReducerPipe

Instance Attribute Summary

Instance Method Summary

Instance Attribute Details

#enter(&block) ⇒ Object



# File 'lib/pacer/transform/reduce.rb', line 63

def enter(&block)
  if block
    @enter = block
  end
  self
end

#leave(same_as = nil, &block) ⇒ Object



# File 'lib/pacer/transform/reduce.rb', line 77

def leave(same_as = nil, &block)
  if same_as == :enter
    @leave = @enter
  elsif block
    @leave = block
  end
  self
end

#reduce(&block) ⇒ Object



# File 'lib/pacer/transform/reduce.rb', line 70

def reduce(&block)
  if block
    @reduce = block
  end
  self
end
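
Each of the three setters stores its block and returns self, so the calls can be chained, and leave(:enter) reuses whatever block enter has already stored. The sketch below copies the three method bodies from the listings above into a stand-in class so the behaviour can be tried without a graph. SketchReducer and its blocks helper are hypothetical names for illustration and are not part of Pacer.

```ruby
# Hypothetical stand-in class: the three setters are copied from the listings
# above; SketchReducer and #blocks are illustrative only, not Pacer API.
class SketchReducer
  def enter(&block)
    if block
      @enter = block
    end
    self
  end

  def reduce(&block)
    if block
      @reduce = block
    end
    self
  end

  def leave(same_as = nil, &block)
    if same_as == :enter
      @leave = @enter
    elsif block
      @leave = block
    end
    self
  end

  # Illustrative helper: expose the stored blocks for inspection.
  def blocks
    { enter: @enter, reduce: @reduce, leave: @leave }
  end
end

reducer = SketchReducer.new

# Every setter returns self, so the three calls chain.
chained = reducer
  .enter  { |s| [] if s =~ /<\?xml/ }
  .reduce { |s, lines| lines << s }
  .leave(:enter)

# leave(:enter) stored the very same Proc object that enter stored.
same_block = reducer.blocks[:leave].equal?(reducer.blocks[:enter])
puts "chained returns the reducer: #{chained.equal?(reducer)}"
puts "leave reuses the enter block: #{same_block}"
```

Note that leave(:enter) copies @enter at the moment it is called, so in this sketch it has to come after enter; called first, it would store nil.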

Instance Method Details

#attach_pipe(end_pipe) ⇒ Object



# File 'lib/pacer/transform/reduce.rb', line 86

def attach_pipe(end_pipe)
  if @enter and @reduce and @leave
    pipe = ReducerPipe.new self, @enter, @reduce, @leave
    pipe.setStarts end_pipe if end_pipe
    pipe
  else
    fail Pacer::ClientError, 'enter, reduce, and leave must all be specified for reducers'
  end
end
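
The guard in attach_pipe means a reducer with any of its three blocks missing fails as soon as the pipe is built. A minimal sketch of that check follows, using stand-ins so it runs without Pacer: SketchClientError and SketchPipe are hypothetical replacements for Pacer::ClientError and ReducerPipe, and build_pipe is an illustrative name.

```ruby
# Stand-ins so the guard can run without Pacer; all three names below are
# hypothetical and only mirror the shape of the attach_pipe listing above.
class SketchClientError < StandardError; end
SketchPipe = Struct.new(:enter, :reduce, :leave)

def build_pipe(enter, reduce, leave)
  if enter and reduce and leave
    SketchPipe.new(enter, reduce, leave)
  else
    fail SketchClientError, 'enter, reduce, and leave must all be specified for reducers'
  end
end

# All three blocks present: a pipe object is returned.
pipe = build_pipe(->(s) { [] }, ->(s, acc) { acc << s }, ->(s, acc) { s.nil? })
puts pipe.class

# One block missing: the guard raises instead of building a partial pipe.
begin
  build_pipe(->(s) { [] }, nil, nil)
rescue SketchClientError => e
  puts e.message
end
```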

#help(section = nil) ⇒ Object

The goal is to turn the XML stream from a black-box iterator into a job done in a few explicit steps:



# File 'lib/pacer/transform/reduce.rb', line 15

def help(section = nil)
  case section
  when nil
    puts <<HELP

HELP
  when :example
    puts <<'HELP'
This example usage is from pacer-xml plugin v0.2. I transform a raw
stream of lines from a 79MB file that contains > 4000 concatenated XML
documents, averaging 600 lines each, into a stream of imported nodes.

First, a little setup: create a graph, open the file, and make a route of
its lines:

  graph = Pacer.tg
  f = File.open '/tmp/ipgb20120103.xml'
  lines = f.each_line.to_route(element_type: :string).route

Create a simple reducer that delimits sections when it hits an XML
declaration (<?xml) or reaches the end of the file (that's the s.nil?),
and reduces the stream by pushing each section's lines into an array.
When a section is entered, the initial value is provided by the return
value of the enter block.

  reducer = lines.reducer(element_type: :array).route
  reducer.enter  { |s|        []     if s =~ /<\?xml/ }
  reducer.reduce { |s, lines| lines << s              }
  reducer.leave  { |s, lines| s.nil? or s =~ /<\?xml/ }

Now we're back in the territory of fairly vanilla routes. We join each
section, use the pacer-xml gem's StringRoute#xml method to parse the XML
with Nokogiri and then its XmlRoute#import method to turn those XML
nodes into graph elements.

  vertex = reducer.map(element_type: :string, &:join).xml.limit(1).import(graph).first

  graph           #=> #<PacerGraph tinkergraph[vertices:88 edges:90]>
  vertex          #=> #<V[0] us-patent-grant>

We can see that we've now got a graph with 88 vertices and 90 edges.

HELP
  end
end
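
The enter/reduce/leave protocol described in the help text can be approximated in plain Ruby without a graph. This is a sketch under stated assumptions: sketch_sections is a hypothetical helper, and the order in which it checks leave, enter, and reduce for each element is a guess that happens to reproduce the example's behaviour. ReducerPipe's real implementation is not shown on this page and may differ.

```ruby
# Plain-Ruby approximation of the enter/reduce/leave protocol.
# ASSUMPTION: the per-element order (leave, then enter, then reduce) is a
# guess; sketch_sections is a hypothetical helper, not Pacer API.
def sketch_sections(items, enter:, reduce:, leave:)
  results = []
  in_section = false
  acc = nil

  # A trailing nil stands in for end of input, matching the s.nil? check.
  (items.to_a + [nil]).each do |s|
    # Close the current section if the leave block says so.
    if in_section && leave.call(s, acc)
      results << acc
      in_section = false
      acc = nil
    end
    next if s.nil?

    # Outside a section, a truthy enter value starts one and seeds it.
    unless in_section
      initial = enter.call(s)
      if initial
        acc = initial
        in_section = true
      end
    end

    # Inside a section, fold the element into the accumulated value.
    acc = reduce.call(s, acc) if in_section
  end

  results
end

lines = [
  '<?xml version="1.0"?>', '<a>', '</a>',
  '<?xml version="1.0"?>', '<b/>'
]

docs = sketch_sections(
  lines,
  enter:  ->(s)      { [] if s =~ /<\?xml/ },
  reduce: ->(s, acc) { acc << s },
  leave:  ->(s, acc) { s.nil? || s =~ /<\?xml/ }
).map { |section| section.join("\n") }

puts docs.size
```

With the same three blocks as the help example, each XML declaration both closes the previous section and opens the next one, and the trailing nil flushes the last section.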