Module: GScraper

Defined in:
lib/gscraper/page.rb,
lib/gscraper/hosts.rb,
lib/gscraper/version.rb,
lib/gscraper/gscraper.rb,
lib/gscraper/licenses.rb,
lib/gscraper/has_pages.rb,
lib/gscraper/languages.rb,
lib/gscraper/search/page.rb,
lib/gscraper/search/query.rb,
lib/gscraper/sponsored_ad.rb,
lib/gscraper/search/result.rb,
lib/gscraper/search/search.rb,
lib/gscraper/sponsored_links.rb,
lib/gscraper/search/web_query.rb,
lib/gscraper/search/ajax_query.rb,
lib/gscraper/search/exceptions/blocked.rb

Overview

GScraper - A web-scraping interface to various Google Services.

Copyright (c) 2007-2012 Hal Brodigan (postmodern.mod3 at gmail.com)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Defined Under Namespace

Modules: HasPages, Hosts, Languages, Licenses, Search

Classes: Page, SponsoredAd, SponsoredLinks

Constant Summary

VERSION =

The version of GScraper

'0.4.0'

COMMON_PROXY_PORT =

Common proxy port.

8080

Class Method Details

.proxy ⇒ Hash

The proxy information.

Returns:

  • (Hash)


# File 'lib/gscraper/gscraper.rb', line 34

def self.proxy
  @@gscraper_proxy ||= {
    :host     => nil,
    :port     => COMMON_PROXY_PORT,
    :user     => nil,
    :password => nil
  }
end
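
Because `.proxy` memoizes and returns a single mutable Hash, the settings can be changed in place. The following is a minimal stand-alone sketch of that pattern, not GScraper itself: `ProxySettings` is a hypothetical stand-in module, it uses a module-level instance variable rather than the class variable shown above, and `proxy.example.com` and port 3128 are placeholder values.

```ruby
# Hypothetical stand-in that mirrors the memoized-Hash pattern of GScraper.proxy.
# It is not part of GScraper and needs no gems to run.
module ProxySettings
  COMMON_PROXY_PORT = 8080

  # Returns the same Hash on every call, creating it on first use.
  def self.proxy
    @proxy ||= {
      :host     => nil,
      :port     => COMMON_PROXY_PORT,
      :user     => nil,
      :password => nil
    }
  end
end

# Until a :host is set, the Hash only carries the default port.
puts ProxySettings.proxy.inspect

# Mutating the returned Hash changes the shared settings for later callers.
ProxySettings.proxy[:host] = 'proxy.example.com' # placeholder host
ProxySettings.proxy[:port] = 3128                # placeholder port
puts ProxySettings.proxy.inspect
```

With GScraper loaded, the same assignments made on `GScraper.proxy` should behave the same way, since the method above returns the stored Hash itself rather than a copy.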

.proxy_uri(proxy = self.proxy) ⇒ Object

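Creates an HTTP URI for the given proxy.

Parameters:

  • proxy (Hash) (defaults to: self.proxy)

    The proxy information.

Returns:

  • (URI::HTTP, nil)

    The HTTP URI of the proxy, or nil if the proxy has no :host.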



# File 'lib/gscraper/gscraper.rb', line 61

def self.proxy_uri(proxy=self.proxy)
  if proxy[:host]
    return URI::HTTP.build(
      :host     => proxy[:host],
      :port     => proxy[:port],
      :userinfo => "#{proxy[:user]}:#{proxy[:password]}",
      :path     => '/'
    )
  end
end
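
The same construction can be tried with Ruby's standard `uri` library alone. This is a sketch under stated assumptions, not the library's code: `build_proxy_uri` is a hypothetical helper, the host and credentials are placeholders, and unlike the method above it only adds `userinfo` when a `:user` is present.

```ruby
require 'uri'

# Hypothetical helper, not part of GScraper: builds an HTTP URI from a proxy
# Hash, or returns nil when no :host is set.
def build_proxy_uri(proxy)
  return nil unless proxy[:host]

  components = {
    :host => proxy[:host],
    :port => proxy[:port],
    :path => '/'
  }

  # Assumption of this sketch: include userinfo only when a user is configured,
  # and drop the password part when it is nil.
  if proxy[:user]
    components[:userinfo] = [proxy[:user], proxy[:password]].compact.join(':')
  end

  URI::HTTP.build(components)
end

# Placeholder proxy settings, with and without credentials.
puts build_proxy_uri(:host => 'proxy.example.com', :port => 8080)
puts build_proxy_uri(
  :host => 'proxy.example.com', :port => 8080,
  :user => 'alice', :password => 'secret'
)
```

Returning nil for an unconfigured proxy lets callers test the result directly before handing it to an HTTP client.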

.user_agent ⇒ String

The GScraper User-Agent. Defaults to the User-Agent string of the 'Windows IE 6' alias.

Returns:

  • (String)


# File 'lib/gscraper/gscraper.rb', line 86

def self.user_agent
  @@gscraper_user_agent ||= self.user_agent_aliases['Windows IE 6']
end

.user_agent=(agent) ⇒ String

Sets the GScraper User-Agent.

Parameters:

  • agent (String)

    The new User-Agent string.

Returns:

  • (String)

    The new User-Agent string.



# File 'lib/gscraper/gscraper.rb', line 99

def self.user_agent=(agent)
  @@gscraper_user_agent = agent
end

.user_agent_alias=(name) ⇒ String

Sets the GScraper User-Agent by alias name.

Parameters:

  • name (String)

    The User-Agent alias.

Returns:

  • (String)

    The new User-Agent string.



# File 'lib/gscraper/gscraper.rb', line 112

def self.user_agent_alias=(name)
  @@gscraper_user_agent = self.user_agent_aliases[name.to_s]
end
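
The three User-Agent accessors interact in a way that is easy to miss: an alias is looked up in a Hash, so an unknown alias stores nil, and the next read of `.user_agent` then falls back to the default through `||=`. The sketch below reproduces that logic in isolation. `UserAgentSettings` is a hypothetical stand-in module, and its `ALIASES` table holds placeholder strings, not the real contents of `Mechanize::AGENT_ALIASES`.

```ruby
# Hypothetical stand-in for the alias handling shown above; not part of GScraper.
module UserAgentSettings
  # Placeholder table. The real library uses Mechanize::AGENT_ALIASES instead.
  ALIASES = {
    'Windows IE 6'  => 'placeholder UA string for IE 6',
    'Linux Mozilla' => 'placeholder UA string for Mozilla on Linux'
  }

  # Returns the current User-Agent, falling back to the default alias.
  def self.user_agent
    @user_agent ||= ALIASES['Windows IE 6']
  end

  # Stores a literal User-Agent string.
  def self.user_agent=(agent)
    @user_agent = agent
  end

  # Looks the alias up by name; an unknown alias stores nil.
  def self.user_agent_alias=(name)
    @user_agent = ALIASES[name.to_s]
  end
end

# A known alias selects its string (a Symbol also works because of to_s).
UserAgentSettings.user_agent_alias = :'Linux Mozilla'
puts UserAgentSettings.user_agent

# An unknown alias stores nil, so the next read falls back to the default.
UserAgentSettings.user_agent_alias = 'No Such Alias'
puts UserAgentSettings.user_agent
```

If silently reverting to the default is not wanted, a caller could check the alias against the table before assigning it.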

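.user_agent_aliases ⇒ Hash{String => String}

The supported GScraper User-Agent aliases, as defined by Mechanize::AGENT_ALIASES.

Returns:

  • (Hash{String => String})

    The alias names mapped to their User-Agent strings.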


# File 'lib/gscraper/gscraper.rb', line 77

def self.user_agent_aliases
  Mechanize::AGENT_ALIASES
end

.web_agent(options = {}) {|agent| ... } ⇒ Object

Creates a new Mechanize agent.

Examples:

GScraper.web_agent
GScraper.web_agent(:user_agent_alias => 'Linux Mozilla')
GScraper.web_agent(:user_agent => 'Google Bot')

Parameters:

  • options (Hash) (defaults to: {})

    Additional options.

Options Hash (options):

  • :user_agent_alias (String)

    The User-Agent Alias to use.

  • :user_agent (String)

    The User-Agent string to use.

  • :proxy (Hash)

    The proxy information to use.

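Yields:

  • (agent)

    If a block is given, it is passed the new Mechanize agent.

Returns:

  • (Mechanize)

    The new Mechanize agent.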


# File 'lib/gscraper/gscraper.rb', line 150

def self.web_agent(options={})
  agent = Mechanize.new

  if options[:user_agent_alias]
    agent.user_agent_alias = options[:user_agent_alias]
  elsif options[:user_agent]
    agent.user_agent = options[:user_agent]
  elsif user_agent
    agent.user_agent = self.user_agent
  end

  proxy = (options[:proxy] || self.proxy)
  if proxy[:host]
    agent.set_proxy(proxy[:host],proxy[:port],proxy[:user],proxy[:password])
  end

  yield agent if block_given?
  return agent
end
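
The option handling in `web_agent` follows a fixed precedence: `:user_agent_alias` wins over `:user_agent`, which wins over the module-wide default, and an explicit `:proxy` option replaces the module-wide proxy Hash rather than being merged with it. The sketch below isolates that decision logic so it can be run without Mechanize. `pick_user_agent` and `pick_proxy` are hypothetical helpers, not GScraper methods, and all values are placeholders.

```ruby
# Hypothetical helper, not part of GScraper: reports which User-Agent setting
# would win, following the if/elsif order used in web_agent.
def pick_user_agent(options, default_agent)
  if options[:user_agent_alias]
    [:alias, options[:user_agent_alias]]
  elsif options[:user_agent]
    [:string, options[:user_agent]]
  elsif default_agent
    [:default, default_agent]
  end
end

# Hypothetical helper: an explicit :proxy option replaces the default Hash
# entirely, and a proxy is only used when it has a :host.
def pick_proxy(options, default_proxy)
  proxy = options[:proxy] || default_proxy
  proxy[:host] ? proxy : nil
end

default_proxy = { :host => nil, :port => 8080, :user => nil, :password => nil }

# The alias takes priority even when a literal string is also given.
p pick_user_agent({ :user_agent_alias => 'Linux Mozilla', :user_agent => 'Google Bot' }, 'default UA')

# With no options, the module-wide default is used.
p pick_user_agent({}, 'default UA')

# No proxy is applied while the default Hash has no :host.
p pick_proxy({}, default_proxy)

# A per-call proxy overrides the default.
p pick_proxy({ :proxy => { :host => 'proxy.example.com', :port => 3128 } }, default_proxy)
```

Because the per-call `:proxy` Hash is used as given, it should include its own `:port`; the default port from the module-wide Hash is not filled in for it.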