Class: WebScraper
- Inherits: Object
- Defined in: lib/web_scraper.rb
Overview
WebScraper lets you describe an HTML structure declaratively, extract the matching blocks, and work with them as Ruby objects.
Defined Under Namespace
Classes: BaseDefentitionError, ConfigurationError, KeyDefentitionError, PropertyDefentitionError, ResourceDefentitionError
Class Attribute Summary
-
._base ⇒ Object
readonly
Returns the value of attribute _base.
-
._key ⇒ Object
readonly
Returns the value of attribute _key.
-
._resource ⇒ Object
readonly
Returns the value of attribute _resource.
-
.properties ⇒ Object
readonly
Returns the value of attribute properties.
Class Method Summary
-
.all ⇒ Object
Loads the HTML page, detects the matching blocks, and wraps them in objects.
-
.base(_base) ⇒ Object
Defines the base – the selector that determines the blocks of content.
-
.count ⇒ Object
Returns the number of objects found.
-
.find(key) ⇒ Object
Finds the first object with the given key.
-
.key(_key) ⇒ Object
Defines the key – the property that the find method matches against.
-
.property(*args) ⇒ Object
Defines a property – a name (optionally with a type) and a selector.
-
.reset ⇒ Object
Resets the cache of the HTML data.
-
.resource(_resource) ⇒ Object
Defines the resource – the URL of the HTML page.
-
.valid? ⇒ Boolean
Checks whether the resource, base, and key have all been set.
-
.valid_info?(info) ⇒ Boolean
Checks whether the property information (i.e. name and type) was defined correctly.
-
.valid_selector?(selector) ⇒ Boolean
Checks whether the selector was defined correctly.
Instance Method Summary
-
#css(*args) ⇒ Object
Allows you to call the Nokogiri css method directly on your object.
-
#initialize(node) ⇒ WebScraper
constructor
Sets the Nokogiri node.
-
#method_missing(name, *args, &block) ⇒ Object
Returns the value of the matching property, if one is defined.
-
#xpath(*args) ⇒ Object
Allows you to call the Nokogiri xpath method directly on your object.
Constructor Details
#initialize(node) ⇒ WebScraper
Sets the Nokogiri node. This is a private method.
# File 'lib/web_scraper.rb', line 248
def initialize(node)
  @node = node
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args, &block) ⇒ Object
Returns the value of the matching property, if one is defined, converted to the property's declared type. Any other method name is passed on to super.
# File 'lib/web_scraper.rb', line 271
def method_missing(name, *args, &block)
  if self.class.properties.key? name
    property = self.class.properties[name]
    type = property[:type]
    value = @node.send(*property[:selector])
    case type
    when :string then value.text.strip
    when :integer then value.text.to_i
    when :float then value.text.to_f
    when :node then value
    end
  else
    super(name, *args, &block)
  end
end
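The dispatch above can be tried without Nokogiri. The following is a minimal sketch, not the library's code: FakeResult, FakeNode, Item, and the PROPERTIES constant are hypothetical stand-ins invented for illustration, and only the send(*selector) call and the type case mirror the source. The respond_to_missing? override is an addition that the source shown here does not define.

```ruby
# Hypothetical stand-ins for a Nokogiri node and its query result.
FakeResult = Struct.new(:text)

class FakeNode
  def initialize(texts)
    @texts = texts # maps a CSS selector string to raw text
  end

  # Mimics the shape of node.css(selector): returns something with #text.
  def css(selector)
    FakeResult.new(@texts.fetch(selector, ''))
  end
end

class Item
  # Same shape that .property stores: name => { type:, selector: }.
  PROPERTIES = {
    title: { type: :string,  selector: [:css, 'h2'] },
    price: { type: :float,   selector: [:css, '.price'] },
    stock: { type: :integer, selector: [:css, '.stock'] }
  }.freeze

  def initialize(node)
    @node = node
  end

  def method_missing(name, *args, &block)
    return super unless PROPERTIES.key?(name)

    property = PROPERTIES[name]
    # [:css, 'h2'] is splatted into @node.css('h2').
    value = @node.send(*property[:selector])
    case property[:type]
    when :string  then value.text.strip
    when :integer then value.text.to_i
    when :float   then value.text.to_f
    when :node    then value
    end
  end

  # Not in the source shown; keeps respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    PROPERTIES.key?(name) || super
  end
end

node = FakeNode.new('h2' => "  Blue mug \n", '.price' => '12.50', '.stock' => '7 left')
item = Item.new(node)
item.title # "Blue mug"
item.price # 12.5
item.stock # 7
```

Note that String#to_i and String#to_f parse a leading number and ignore trailing text, so markup such as "7 left" still yields 7, while text with no leading number yields 0.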
Class Attribute Details
._base ⇒ Object (readonly)
Returns the value of attribute _base.
# File 'lib/web_scraper.rb', line 156
def _base
  @_base
end
._key ⇒ Object (readonly)
Returns the value of attribute _key.
# File 'lib/web_scraper.rb', line 217
def _key
  @_key
end
._resource ⇒ Object (readonly)
Returns the value of attribute _resource.
# File 'lib/web_scraper.rb', line 139
def _resource
  @_resource
end
.properties ⇒ Object (readonly)
Returns the value of attribute properties.
# File 'lib/web_scraper.rb', line 201
def properties
  @properties
end
Class Method Details
.all ⇒ Object
Loads the HTML page, detects the matching blocks, and wraps them in objects. The result is cached. Raises ConfigurationError if the resource, base, or key has not been set.
# File 'lib/web_scraper.rb', line 94
def all
  raise ConfigurationError unless valid?
  @all ||= Nokogiri::HTML(open(_resource))
    .send(*_base).map { |node| new(node) }
end
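The caching relies on Ruby's ||= memoization idiom, which .reset undoes by assigning nil. A minimal sketch of that interaction, assuming a hypothetical CachedList class whose load_blocks method stands in for fetching and parsing the page:

```ruby
# Hypothetical class; load_blocks stands in for the page fetch and parse.
class CachedList
  class << self
    attr_reader :loads

    def all
      @all ||= load_blocks # runs load_blocks only while @all is nil or false
    end

    def reset
      @all = nil
    end

    private

    def load_blocks
      @loads = (@loads || 0) + 1 # count how often the expensive step runs
      %w[first second]
    end
  end
end

CachedList.all
CachedList.all
CachedList.loads # 1, the second call reused the cached array
CachedList.reset
CachedList.all
CachedList.loads # 2, reset forced a reload
```

An empty array is truthy in Ruby, so a page with no matching blocks stays cached as well; only nil or false triggers a reload.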
.base(_base) ⇒ Object
Defines the base – the selector that determines the blocks of content. You can use CSS or XPath selectors. Raises BaseDefentitionError if the selector is invalid.
# File 'lib/web_scraper.rb', line 150
def base(_base)
  raise BaseDefentitionError unless valid_selector? _base
  @_base = _base.to_a.flatten
end
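The to_a.flatten call turns the one-pair selector hash into a [method, query] array that can later be splatted into send. A short sketch of that normalization; FakeDocument is a hypothetical object used here in place of a parsed Nokogiri document:

```ruby
selector = { css: '.article' }
normalized = selector.to_a.flatten # [:css, ".article"]

# Hypothetical document that reports which query method it received.
class FakeDocument
  def css(query)
    "css query: #{query}"
  end

  def xpath(query)
    "xpath query: #{query}"
  end
end

# send(*normalized) is equivalent to calling .css('.article') here.
result = FakeDocument.new.send(*normalized) # "css query: .article"
```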
.count ⇒ Object
Returns the number of objects found.
# File 'lib/web_scraper.rb', line 105
def count
  all.size
end
.find(key) ⇒ Object
Finds the first object whose key property equals the given key. Returns nil if no object matches.
# File 'lib/web_scraper.rb', line 121
def find(key)
  all.find { |e| e.send(_key) == key }
end
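The lookup is a plain Enumerable#find over the cached objects, comparing each object's key property with the argument. A small sketch using hypothetical Struct-based entries (Entry, slug, and title are invented for illustration):

```ruby
# Hypothetical records standing in for the wrapped page blocks.
Entry = Struct.new(:slug, :title)

entries = [
  Entry.new('a', 'First'),
  Entry.new('b', 'Second'),
  Entry.new('a', 'Duplicate')
]
key_property = :slug # plays the role of _key

found   = entries.find { |e| e.send(key_property) == 'a' }
missing = entries.find { |e| e.send(key_property) == 'zzz' }

found.title # "First", only the first match is returned
missing     # nil
```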
.key(_key) ⇒ Object
Defines the key – the property that the find method matches against. The property must already be defined; otherwise KeyDefentitionError is raised.
# File 'lib/web_scraper.rb', line 211
def key(_key)
  raise KeyDefentitionError unless properties.keys.include? _key
  @_key = _key
end
.property(*args) ⇒ Object
Defines a property – a name (optionally with a type) and a selector. You can use CSS or XPath selectors. The type determines how the returned value is converted. Available types: string (the default), integer, float, and node. The node type returns the raw Nokogiri result without conversion.
# File 'lib/web_scraper.rb', line 171
def property(*args)
  @properties ||= {}
  exception = PropertyDefentitionError
  case args.length
  when 1
    params = args[0]
    raise exception unless params.is_a? Hash
    info = params.reject { |k| [:css, :xpath].include? k }
    selector = params.select { |k| [:css, :xpath].include? k }
  when 2
    name, selector = args
    info = { name => :string }
  else
    raise exception
  end
  raise exception unless valid_selector? selector
  raise exception unless valid_info? info
  name = info.keys.first
  type = info.values.first
  selector = selector.to_a.flatten
  @properties[name] = { type: type, selector: selector }
end
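The method accepts two call shapes: a single hash that carries both the name/type pair and the selector, or a name followed by a selector hash, in which case the type defaults to :string. The sketch below mirrors that argument handling in a stand-alone helper. parse_property and SELECTOR_KEYS are hypothetical names, validation is omitted, and ArgumentError replaces the library's own exception class.

```ruby
SELECTOR_KEYS = %i[css xpath].freeze

# Hypothetical helper: returns [name, definition] instead of storing it.
def parse_property(*args)
  case args.length
  when 1
    params = args[0]
    raise ArgumentError, 'expected a Hash' unless params.is_a?(Hash)

    # Split the single hash into the name/type pair and the selector pair.
    info     = params.reject { |k, _v| SELECTOR_KEYS.include?(k) }
    selector = params.select { |k, _v| SELECTOR_KEYS.include?(k) }
  when 2
    name, selector = args
    info = { name => :string } # type defaults to :string
  else
    raise ArgumentError, 'expected one or two arguments'
  end

  [info.keys.first, { type: info.values.first, selector: selector.to_a.flatten }]
end

typed   = parse_property(price: :float, css: '.price')
# [:price, { type: :float, selector: [:css, ".price"] }]
untyped = parse_property(:title, xpath: './/h2')
# [:title, { type: :string, selector: [:xpath, ".//h2"] }]
```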
.reset ⇒ Object
Resets the cache of the HTML data, so the next call to all reloads the page.
# File 'lib/web_scraper.rb', line 113
def reset
  @all = nil
end
.resource(_resource) ⇒ Object
Defines the resource – the URL of the HTML page. Raises ResourceDefentitionError unless the argument is a String.
# File 'lib/web_scraper.rb', line 133
def resource(_resource)
  raise ResourceDefentitionError unless _resource.is_a? String
  @_resource = _resource
end
.valid? ⇒ Boolean
Checks whether the resource, base, and key have all been set.
# File 'lib/web_scraper.rb', line 221
def valid?
  _resource && _base && _key
end
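Because the body is a chain of && operators, the method as listed returns its last operand (or the first falsy one) rather than a strict true or false. A short illustration with hypothetical values standing in for the three attributes:

```ruby
# Hypothetical values standing in for _resource, _base, and _key.
resource = 'http://example.com/list'
base     = [:css, '.item']
key      = :title

configured   = resource && base && key     # :title, the last operand
unconfigured = resource && nil && key      # nil, stops at the first falsy operand
strict       = !!(resource && base && key) # true
```

This makes no difference in conditions such as `unless valid?`, but callers that compare the result with true would need the double negation.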
.valid_info?(info) ⇒ Boolean
Checks whether the property information (i.e. name and type) was defined correctly: a one-pair Hash with a Symbol name and one of the supported types.
# File 'lib/web_scraper.rb', line 236
def valid_info?(info)
  (info.is_a? Hash) &&
  (info.size == 1) &&
  (info.keys.first.is_a? Symbol) &&
  ([:string, :integer, :float, :node].include? info.values.first)
end
.valid_selector?(selector) ⇒ Boolean
Checks whether the selector was defined correctly: a one-pair Hash whose key is :css or :xpath and whose value is a String.
# File 'lib/web_scraper.rb', line 227
def valid_selector?(selector)
  (selector.is_a? Hash) &&
  (selector.size == 1) &&
  ([:css, :xpath].include? selector.keys.first) &&
  (selector.values.first.is_a? String)
end
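To see which inputs pass, the predicate can be copied into a stand-alone lambda. The name valid_selector and the sample inputs are illustrative only:

```ruby
# Stand-alone copy of the predicate for experimentation.
valid_selector = lambda do |selector|
  selector.is_a?(Hash) &&
    selector.size == 1 &&
    %i[css xpath].include?(selector.keys.first) &&
    selector.values.first.is_a?(String)
end

valid_selector.call({ css: '.item' })             # true
valid_selector.call({ xpath: '//div' })           # true
valid_selector.call({ css: '.a', xpath: '//a' })  # false, more than one pair
valid_selector.call({ id: 'main' })               # false, unsupported key
valid_selector.call({ css: :item })               # false, value is not a String
valid_selector.call('.item')                      # false, not a Hash
```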
Instance Method Details
#css(*args) ⇒ Object
Allows you to call the Nokogiri css method directly on your object. The call is proxied to the underlying Nokogiri node.
# File 'lib/web_scraper.rb', line 255
def css(*args)
  @node.css(*args)
end
#xpath(*args) ⇒ Object
Allows you to call the Nokogiri xpath method directly on your object. The call is proxied to the underlying Nokogiri node.
# File 'lib/web_scraper.rb', line 262
def xpath(*args)
  @node.xpath(*args)
end