Class: EagleClaw::Scraper

Inherits:
Object
Extended by:
Callbacks
Includes:
Browser
Defined in:
lib/eagleclaw.rb

Class Attribute Summary

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Callbacks

register, run_callbacks

Methods included from Browser

#agent, #method_missing

Constructor Details

#initialize ⇒ Scraper

Create a new EagleClaw::Scraper instance.

By default, just sets @data and @problems to empty `Array`s.



# File 'lib/eagleclaw.rb', line 94

def initialize
  @data = []
  @problems = []
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the module EagleClaw::Browser.

Class Attribute Details

.properties ⇒ Object

Returns the value of attribute properties.



# File 'lib/eagleclaw.rb', line 13

def properties
  @properties
end

Instance Attribute Details

#data ⇒ Object

An `Array` which holds data collected during a run.

See Also:



# File 'lib/eagleclaw.rb', line 82

def data
  @data
end

#problems ⇒ Object

An `Array` which collects problems encountered during a run.



# File 'lib/eagleclaw.rb', line 86

def problems
  @problems
end

Class Method Details

.after(:each, :method_name) ⇒ nil
.after(:all, :method_name) ⇒ nil
.after(:each, &block) ⇒ nil
.after(:all, &block) ⇒ nil

Define a post-processor to run in a certain context.

Overloads:

  • .after(:each, :method_name) ⇒ nil

    Run the given method after each component of the run.

  • .after(:all, :method_name) ⇒ nil

    Run the given method after the run itself.

  • .after(:each, &block) ⇒ nil

    Run the given block (using `instance_eval`) after each component of the run.

  • .after(:all, &block) ⇒ nil

    Run the given block (using `instance_eval`) after the entire run.

Parameters:

  • context (Symbol)

    either `:each` or `:all`.

  • meth (optional, Symbol) (defaults to: nil)

    name of method to call.

Returns:

  • (nil)

See Also:



# File 'lib/eagleclaw.rb', line 65

def after(context, meth = nil, &block)
  register([:after, context], meth, &block)
end
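To see both kinds of `.after` overload in action without a network connection, here is a minimal, self-contained sketch. The `Callbacks` module in it is a hypothetical stand-in: this page documents only the names `register` and `run_callbacks`, not how they store or invoke callbacks, so the sketch assumes a Symbol is sent to the scraper and a block is `instance_eval`'d, as the overload descriptions state. `Browser` is omitted, and `TitleScraper`, `normalize_last` and the sample strings are made up for illustration.

```ruby
# Hypothetical stand-in for EagleClaw::Callbacks (its internals are not shown
# on this page). Callbacks are stored under keys such as [:after, :each];
# a Symbol is sent to the scraper, a block is instance_eval'd on it.
module Callbacks
  def callbacks
    @callbacks ||= Hash.new { |hash, key| hash[key] = [] }
  end

  def register(key, meth = nil, &block)
    callbacks[key] << (meth || block)
    nil
  end

  def run_callbacks(key, scraper)
    callbacks[key].each do |cb|
      cb.is_a?(Proc) ? scraper.instance_eval(&cb) : scraper.send(cb)
    end
  end
end

# Copy of the parts of EagleClaw::Scraper used here (Browser omitted).
class Scraper
  extend Callbacks

  class << self
    attr_reader :properties

    def after(context, meth = nil, &block)
      register([:after, context], meth, &block)
    end

    def prop(prop_name, meth = nil, &block)
      (@properties ||= []) << prop_name.to_sym
      register([:property, prop_name.to_sym], meth, &block)
    end
  end

  attr_reader :data, :problems

  def initialize
    @data = []
    @problems = []
  end

  def run
    self.class.run_callbacks([:before, :all], self)
    self.class.properties.each do |property|
      self.class.run_callbacks([:before, :each], self)
      self.class.run_callbacks([:property, property], self)
      self.class.run_callbacks([:after, :each], self)
    end
    self.class.run_callbacks([:after, :all], self)
    data
  end
end

# Hypothetical scraper showing the two .after overloads.
class TitleScraper < Scraper
  prop(:title)   { data << "Example Domain" }
  prop(:heading) { data << "Example heading" }

  after(:each, :normalize_last)  # method-name overload
  after(:all) { data.freeze }    # block overload, runs once at the end

  # Upcase the value the current property just collected.
  def normalize_last
    data[-1] = data[-1].upcase
  end
end

result = TitleScraper.new.run # ["EXAMPLE DOMAIN", "EXAMPLE HEADING"], frozen
```

Because `after(:each)` fires once per property, `normalize_last` upcases each value right after it is collected, while the `after(:all)` block runs only once, after every property has finished.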

.before(:each, :method_name) ⇒ nil
.before(:all, :method_name) ⇒ nil
.before(:each, &block) ⇒ nil
.before(:all, &block) ⇒ nil

Define a pre-processor to run in a certain context.

Examples:

Fetch a page before the run

before(:all) do
  agent.get("http://google.com/")
end

Reset the page before each component of the run

before(:each) do
  agent.get("http://google.com/")
end

Overloads:

  • .before(:each, :method_name) ⇒ nil

    Run the given method before each component of the run.

  • .before(:all, :method_name) ⇒ nil

    Run the given method before the run itself.

  • .before(:each, &block) ⇒ nil

    Run the given block (using `instance_eval`) before each component of the run.

  • .before(:all, &block) ⇒ nil

    Run the given block (using `instance_eval`) before the run itself.

Parameters:

  • context (Symbol)

    either `:each` or `:all`.

  • meth (optional, Symbol) (defaults to: nil)

    name of method to call.

Returns:

  • (nil)


# File 'lib/eagleclaw.rb', line 42

def before(context, meth = nil, &block)
  register([:before, context], meth, &block)
end
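The examples above use the block form. The method-name overload, and the difference between `:all` (once per run) and `:each` (once per property), can be sketched offline as follows. The `Callbacks` module here is a hypothetical stand-in for the real one, whose internals this page does not show; `CountingScraper`, `open_session` and `@visits` are illustrative names only.

```ruby
# Hypothetical stand-in for EagleClaw::Callbacks (internals not documented
# here): a Symbol is sent to the scraper, a block is instance_eval'd on it.
module Callbacks
  def callbacks
    @callbacks ||= Hash.new { |hash, key| hash[key] = [] }
  end

  def register(key, meth = nil, &block)
    callbacks[key] << (meth || block)
    nil
  end

  def run_callbacks(key, scraper)
    callbacks[key].each do |cb|
      cb.is_a?(Proc) ? scraper.instance_eval(&cb) : scraper.send(cb)
    end
  end
end

# Copy of the parts of EagleClaw::Scraper used here (Browser omitted).
class Scraper
  extend Callbacks

  class << self
    attr_reader :properties

    def before(context, meth = nil, &block)
      register([:before, context], meth, &block)
    end

    def prop(prop_name, meth = nil, &block)
      (@properties ||= []) << prop_name.to_sym
      register([:property, prop_name.to_sym], meth, &block)
    end
  end

  attr_reader :data, :problems

  def initialize
    @data = []
    @problems = []
  end

  def run
    self.class.run_callbacks([:before, :all], self)
    self.class.properties.each do |property|
      self.class.run_callbacks([:before, :each], self)
      self.class.run_callbacks([:property, property], self)
      self.class.run_callbacks([:after, :each], self)
    end
    self.class.run_callbacks([:after, :all], self)
    data
  end
end

# Hypothetical scraper: open_session runs once, the counter bumps per property.
class CountingScraper < Scraper
  before(:all, :open_session)     # method-name overload
  before(:each) { @visits += 1 }  # block overload, instance_eval'd

  prop(:first)  { data << @visits }
  prop(:second) { data << @visits }

  def open_session
    @visits = 0
  end
end

visits = CountingScraper.new.run # [1, 2]
```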

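.prop(prop_name, meth = nil, &block) ⇒ Object

Define a property to scrape during #run. The name is converted to a Symbol and appended to properties, and meth (or the given block) is registered as that property's callback.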



# File 'lib/eagleclaw.rb', line 69

def prop(prop_name, meth = nil, &block)
  (@properties ||= []) << prop_name.to_sym
  register([:property, prop_name.to_sym], meth, &block)
end
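Judging from the three lines of `.prop` shown above, a property name may be given as a String or a Symbol (it is normalized with `to_sym`), and its work may be supplied as a method name or a block, assuming `meth` is treated like the method name accepted by `.before` and `.after`. The sketch below relies on a hypothetical `Callbacks` stand-in, since the real module is not documented here; `PageScraper`, `scrape_title` and the sample values are illustrative.

```ruby
# Hypothetical stand-in for EagleClaw::Callbacks (internals not documented
# here): a Symbol is sent to the scraper, a block is instance_eval'd on it.
module Callbacks
  def callbacks
    @callbacks ||= Hash.new { |hash, key| hash[key] = [] }
  end

  def register(key, meth = nil, &block)
    callbacks[key] << (meth || block)
    nil
  end

  def run_callbacks(key, scraper)
    callbacks[key].each do |cb|
      cb.is_a?(Proc) ? scraper.instance_eval(&cb) : scraper.send(cb)
    end
  end
end

# Copy of the parts of EagleClaw::Scraper used here (Browser omitted).
class Scraper
  extend Callbacks

  class << self
    attr_reader :properties

    def prop(prop_name, meth = nil, &block)
      (@properties ||= []) << prop_name.to_sym
      register([:property, prop_name.to_sym], meth, &block)
    end
  end

  attr_reader :data, :problems

  def initialize
    @data = []
    @problems = []
  end

  def run
    self.class.run_callbacks([:before, :all], self)
    self.class.properties.each do |property|
      self.class.run_callbacks([:before, :each], self)
      self.class.run_callbacks([:property, property], self)
      self.class.run_callbacks([:after, :each], self)
    end
    self.class.run_callbacks([:after, :all], self)
    data
  end
end

class PageScraper < Scraper
  prop :title, :scrape_title               # Symbol name, method-name form
  prop("links") { data << ["/a", "/b"] }   # String name, block form

  def scrape_title
    data << "Example Domain"
  end
end

names = PageScraper.properties   # [:title, :links]
page_data = PageScraper.new.run  # ["Example Domain", ["/a", "/b"]]
```

Properties run in the order they were declared, because `properties` is a plain `Array` appended to by each `prop` call.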

Instance Method Details

#reset ⇒ nil

This method is abstract.

Subclass and extend to reset the scraper state.

Reset this scraper instance’s state.

The default version of this method just clears @data and @problems.

Returns:

  • (nil)


# File 'lib/eagleclaw.rb', line 108

def reset
  data.clear
  problems.clear
end
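Since `#reset` is meant to be subclassed and extended, a subclass that keeps extra state would typically call `super` and then reset its own fields. In the sketch below, `PagedScraper`, `page` and `next_page!` are hypothetical. As listed, the base `#reset` returns the result of `problems.clear` (the now-empty `Array`, since `Array#clear` returns its receiver) rather than `nil`, so the override returns `nil` explicitly to match the documented return value.

```ruby
# Minimal copy of the documented constructor, attribute readers and #reset
# of EagleClaw::Scraper (Callbacks and Browser omitted).
class Scraper
  attr_reader :data, :problems

  def initialize
    @data = []
    @problems = []
  end

  def reset
    data.clear
    problems.clear
  end
end

# Hypothetical subclass with its own state: a page counter.
class PagedScraper < Scraper
  attr_reader :page

  def initialize
    super       # sets @data and @problems to []
    @page = 1
  end

  def next_page!
    @page += 1
  end

  def reset
    super       # clears data and problems
    @page = 1   # then resets the subclass's own state
    nil         # match the documented nil return value
  end
end

scraper = PagedScraper.new
scraper.data << "row 1"
scraper.problems << "timeout on page 2"
scraper.next_page!
reset_result = scraper.reset
```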

#run ⇒ Object

Run the scraper.

Operating procedure:

  1. Run before(:all) blocks.

  2. For each property (defined with prop(:prop_name)):

    1. Run before(:each) blocks.

    2. Run the property itself.

    3. Run after(:each) blocks.

  3. Run after(:all) blocks.

  4. Return data.

See Also:



# File 'lib/eagleclaw.rb', line 128

def run
  self.class.run_callbacks([:before, :all], self)
  self.class.properties.each do |property|
    self.class.run_callbacks([:before, :each], self)
    self.class.run_callbacks([:property, property], self)
    self.class.run_callbacks([:after, :each], self)
  end
  self.class.run_callbacks([:after, :all], self)
  data
end
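The operating procedure above can be traced offline by having every callback record its own name in `data`. The sketch copies `#run` and the class methods listed on this page; the `Callbacks` module is a hypothetical stand-in (its internals are not shown here), and `TracedScraper` and its property names are illustrative only.

```ruby
# Hypothetical stand-in for EagleClaw::Callbacks (internals not documented
# here): a Symbol is sent to the scraper, a block is instance_eval'd on it.
module Callbacks
  def callbacks
    @callbacks ||= Hash.new { |hash, key| hash[key] = [] }
  end

  def register(key, meth = nil, &block)
    callbacks[key] << (meth || block)
    nil
  end

  def run_callbacks(key, scraper)
    callbacks[key].each do |cb|
      cb.is_a?(Proc) ? scraper.instance_eval(&cb) : scraper.send(cb)
    end
  end
end

# Copy of EagleClaw::Scraper's class methods, constructor and #run
# (Browser omitted).
class Scraper
  extend Callbacks

  class << self
    attr_reader :properties

    def before(context, meth = nil, &block)
      register([:before, context], meth, &block)
    end

    def after(context, meth = nil, &block)
      register([:after, context], meth, &block)
    end

    def prop(prop_name, meth = nil, &block)
      (@properties ||= []) << prop_name.to_sym
      register([:property, prop_name.to_sym], meth, &block)
    end
  end

  attr_reader :data, :problems

  def initialize
    @data = []
    @problems = []
  end

  def run
    self.class.run_callbacks([:before, :all], self)
    self.class.properties.each do |property|
      self.class.run_callbacks([:before, :each], self)
      self.class.run_callbacks([:property, property], self)
      self.class.run_callbacks([:after, :each], self)
    end
    self.class.run_callbacks([:after, :all], self)
    data
  end
end

# Every callback appends its own label, so data records the call order.
class TracedScraper < Scraper
  before(:all)  { data << "before all" }
  before(:each) { data << "before each" }
  after(:each)  { data << "after each" }
  after(:all)   { data << "after all" }

  prop(:first)  { data << "first" }
  prop(:second) { data << "second" }
end

trace = TracedScraper.new.run
# => ["before all",
#     "before each", "first", "after each",
#     "before each", "second", "after each",
#     "after all"]
```

The trace makes the nesting explicit: `before(:each)` and `after(:each)` wrap every property, while the `:all` callbacks run once around the whole loop. In the code shown here, `properties` stays `nil` until `prop` has been called at least once, so calling `run` on a subclass that defines no properties would raise `NoMethodError`.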