Module: FreeScrape

Defined in:
lib/free_scrape/item.rb,
lib/free_scrape/version.rb,
lib/free_scrape/category.rb,
lib/free_scrape/item_link.rb,
lib/free_scrape/free_scrape.rb

Defined Under Namespace

Classes: Category, Item, ItemLink

Constant Summary

VERSION = '0.1.0'

COMMON_PROXY_PORT = 8080
  Common proxy port.

DEFAULT_LANGUAGE = :en
  Default language.

Class Method Details

.item(descriptor) ⇒ Object

Returns the Item with the specified descriptor, which can be either a URI to freebase.com, an Item GUID or an Item name.

FreeScrape.item('Aphex Twin')
# => #<FreeScrape::Item:0xb73fdba0 ...>


# File 'lib/free_scrape/free_scrape.rb', line 182

def FreeScrape.item(descriptor)
  Item.from(descriptor)
end

.language ⇒ Object

Returns the language used to access FreeScrape. Defaults to DEFAULT_LANGUAGE (:en) if no language has been set.



# File 'lib/free_scrape/free_scrape.rb', line 164

def FreeScrape.language
  @@free_scrape_language ||= DEFAULT_LANGUAGE
end

.language=(new_language) ⇒ Object

Sets the language used to access FreeScrape to new_language, which is converted to a Symbol.



# File 'lib/free_scrape/free_scrape.rb', line 171

def FreeScrape.language=(new_language)
  @@free_scrape_language = new_language.to_sym
end
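
The getter and setter above amount to a memoized default plus Symbol normalization. The following is a minimal stand-alone sketch of that pattern, not FreeScrape's own code: `LanguageSetting` is a hypothetical stand-in module, and it uses a module-level instance variable where the source uses a class variable.

```ruby
# Hypothetical stand-in for FreeScrape; only the language accessors are modeled.
module LanguageSetting
  DEFAULT_LANGUAGE = :en

  # Returns the current language, falling back to the default on first use.
  def self.language
    @language ||= DEFAULT_LANGUAGE
  end

  # Stores the language as a Symbol, so 'de' and :de are treated the same.
  def self.language=(new_language)
    @language = new_language.to_sym
  end
end

LanguageSetting.language         # :en until a language is set
LanguageSetting.language = 'de'  # a String is accepted
LanguageSetting.language         # :de
```

Because the setter calls to_sym, callers may pass either a String or a Symbol.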

.open_page(uri, options = {}) ⇒ Object

Similar to FreeScrape.open_uri but returns an Hpricot document.



# File 'lib/free_scrape/free_scrape.rb', line 119

def FreeScrape.open_page(uri,options={})
  Hpricot(FreeScrape.open_uri(uri,options))
end

.open_uri(uri, options = {}) ⇒ Object

Opens the uri with the given options. The contents of the uri will be returned.

options may contain the following keys:

:user_agent_alias

The User-Agent Alias to use.

:user_agent

The User-Agent String to use.

:proxy

A Hash of proxy information which may contain the following keys:

:host

The proxy host.

:port

The proxy port.

:user

The user name to log in as.

:password

The password to log in with.

FreeScrape.open_uri('http://www.hackety.org/')

FreeScrape.open_uri('http://tenderlovemaking.com/',
  :user_agent_alias => 'Linux Mozilla')
FreeScrape.open_uri('http://www.wired.com/',
  :user_agent => 'the future')


# File 'lib/free_scrape/free_scrape.rb', line 97

def FreeScrape.open_uri(uri,options={})
  headers = {}

  if options[:user_agent_alias]
    headers['User-Agent'] = WWW::Mechanize::AGENT_ALIASES[options[:user_agent_alias]]
  elsif options[:user_agent]
    headers['User-Agent'] = options[:user_agent]
  elsif FreeScrape.user_agent
    headers['User-Agent'] = FreeScrape.user_agent
  end

  proxy = (options[:proxy] || FreeScrape.proxy)
  if proxy[:host]
    headers[:proxy] = FreeScrape.proxy_uri(proxy)
  end

  return Kernel.open(uri,headers)
end
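
The User-Agent selection in open_uri follows a fixed precedence: an alias wins over an explicit string, which wins over the module-wide default. A minimal sketch of just that decision, written as a pure function so it runs without Mechanize; `pick_user_agent` and the one-entry `AGENT_ALIASES` table are hypothetical names for illustration, and the alias value is made up.

```ruby
# Hypothetical alias table; the real one is WWW::Mechanize::AGENT_ALIASES.
AGENT_ALIASES = {
  'Linux Mozilla' => 'Mozilla/5.0 (X11; Linux) ExampleAgent/1.0'
}.freeze

# Picks the User-Agent string in the same order open_uri checks its options:
# alias first, then an explicit string, then the supplied default.
def pick_user_agent(options, default_agent)
  if options[:user_agent_alias]
    AGENT_ALIASES[options[:user_agent_alias]]
  elsif options[:user_agent]
    options[:user_agent]
  else
    default_agent
  end
end

# Both keys given: the alias takes priority over the explicit string.
pick_user_agent({ :user_agent_alias => 'Linux Mozilla',
                  :user_agent => 'the future' }, 'fallback')
```

An unknown alias yields nil in this sketch, since the lookup is a plain Hash access.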

.proxy ⇒ Object

Returns the Hash of proxy information.



# File 'lib/free_scrape/free_scrape.rb', line 18

def FreeScrape.proxy
  @@free_scrape_proxy ||= {
    :host => nil,
    :port => COMMON_PROXY_PORT,
    :user => nil,
    :password => nil
  }
end

.proxy_uri(proxy_info = FreeScrape.proxy) ⇒ Object

Creates an HTTP URI from the given proxy_info hash. If proxy_info is not given, it defaults to FreeScrape.proxy.

proxy_info may contain the following keys:

:host

The proxy host.

:port

The proxy port. Defaults to COMMON_PROXY_PORT, if not specified.

:user

The user name to log in as.

:password

The password to log in with.



# File 'lib/free_scrape/free_scrape.rb', line 38

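def FreeScrape.proxy_uri(proxy_info=FreeScrape.proxy)
  # Read from the proxy_info argument rather than always using FreeScrape.proxy.
  if proxy_info[:host]
    # Only include credentials when a user name is present.
    userinfo = "#{proxy_info[:user]}:#{proxy_info[:password]}" if proxy_info[:user]
    return URI::HTTP.build(:host => proxy_info[:host],
                           :port => (proxy_info[:port] || COMMON_PROXY_PORT),
                           :userinfo => userinfo,
                           :path => '/')
  end
end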
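
As an illustration of the proxy_info keys described above, here is a minimal stand-alone sketch that builds a proxy URI with Ruby's standard `uri` library. `build_proxy_uri` and `DEFAULT_PORT` are hypothetical names chosen for this example; the sketch assumes that credentials should be omitted when no user is given and that a missing port falls back to 8080.

```ruby
require 'uri'

DEFAULT_PORT = 8080 # mirrors COMMON_PROXY_PORT

# Builds an HTTP proxy URI from a hash; returns nil when no host is given.
def build_proxy_uri(proxy_info)
  return nil unless proxy_info[:host]

  # Credentials are optional; add the password only alongside a user name.
  userinfo = nil
  if proxy_info[:user]
    userinfo = proxy_info[:user].to_s
    userinfo += ":#{proxy_info[:password]}" if proxy_info[:password]
  end

  URI::HTTP.build(:host => proxy_info[:host],
                  :port => proxy_info[:port] || DEFAULT_PORT,
                  :userinfo => userinfo,
                  :path => '/')
end

uri = build_proxy_uri(:host => 'proxy.example.com',
                      :user => 'alice',
                      :password => 'secret')
puts uri
```

Returning nil for a missing host lets callers skip proxy configuration with a simple truthiness check.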

.user_agent ⇒ Object

Returns the FreeScrape User-Agent. Defaults to the 'Windows IE 6' alias if no User-Agent has been set.



# File 'lib/free_scrape/free_scrape.rb', line 57

def FreeScrape.user_agent
  @@free_scrape_user_agent ||= FreeScrape.user_agent_aliases['Windows IE 6']
end

.user_agent=(agent) ⇒ Object

Sets the FreeScrape User-Agent to the specified agent.



# File 'lib/free_scrape/free_scrape.rb', line 64

def FreeScrape.user_agent=(agent)
  @@free_scrape_user_agent = agent
end

.user_agent_alias=(name) ⇒ Object

Sets the FreeScrape User-Agent using the specified user-agent alias name.



# File 'lib/free_scrape/free_scrape.rb', line 72

def FreeScrape.user_agent_alias=(name)
  @@free_scrape_user_agent = FreeScrape.user_agent_aliases[name.to_s]
end

.user_agent_aliases ⇒ Object

Returns the supported FreeScrape User-Agent Aliases.



# File 'lib/free_scrape/free_scrape.rb', line 50

def FreeScrape.user_agent_aliases
  WWW::Mechanize::AGENT_ALIASES
end

.web_agent(options = {}, &block) ⇒ Object

Creates a new WWW::Mechanize agent with the given options. If a block is given, the new agent is passed to it before being returned.

options may contain the following keys:

:user_agent_alias

The User-Agent Alias to use.

:user_agent

The User-Agent string to use.

:proxy

A Hash of proxy information which may contain the following keys:

:host

The proxy host.

:port

The proxy port.

:user

The user name to log in as.

:password

The password to log in with.

FreeScrape.web_agent

FreeScrape.web_agent(:user_agent_alias => 'Linux Mozilla')
FreeScrape.web_agent(:user_agent => 'Google Bot')


# File 'lib/free_scrape/free_scrape.rb', line 141

def FreeScrape.web_agent(options={},&block)
  agent = WWW::Mechanize.new

  if options[:user_agent_alias]
    agent.user_agent_alias = options[:user_agent_alias]
  elsif options[:user_agent]
    agent.user_agent = options[:user_agent]
  elsif FreeScrape.user_agent
    agent.user_agent = FreeScrape.user_agent
  end

  proxy = (options[:proxy] || FreeScrape.proxy)
  if proxy[:host]
    agent.set_proxy(proxy[:host],proxy[:port],proxy[:user],proxy[:password])
  end

  block.call(agent) if block
  return agent
end
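
web_agent follows a configure, yield, return pattern: the agent is set up from the options, handed to an optional block, and then returned. A minimal sketch of that pattern that runs without Mechanize; `FakeAgent` and `build_agent` are hypothetical stand-ins, and `'default-agent'` is a placeholder for the module-wide User-Agent.

```ruby
# Hypothetical stand-in for WWW::Mechanize, just enough to hold the settings.
FakeAgent = Struct.new(:user_agent, :proxy)

# Configures an agent from options, yields it to an optional block,
# and always returns the agent.
def build_agent(options = {}, default_agent = 'default-agent')
  agent = FakeAgent.new
  agent.user_agent = options[:user_agent] || default_agent

  # The proxy is applied only when a host is present.
  proxy = options[:proxy] || {}
  agent.proxy = [proxy[:host], proxy[:port]] if proxy[:host]

  yield agent if block_given?
  agent
end

# The block receives the same object that the method returns.
agent = build_agent(:user_agent => 'Google Bot') do |a|
  a.proxy = ['proxy.example.com', 8080]
end
```

Yielding before returning lets callers make further adjustments inline while still keeping a reference to the agent.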