Module: XboxLive::Scraper

Defined in:
lib/xbox_live/scraper.rb

Overview

Scraper is a collection of methods to log into the Xbox Live web site and retrieve web pages.

The only public function is XboxLive::Scraper.get_page(url).

Class Method Summary

Class Method Details

.agent ⇒ Object

Create and memoize the Mechanize agent



# File 'lib/xbox_live/scraper.rb', line 141

def self.agent
  log "  Initializing mechanize agent @ #{Time.now.to_s}" if !defined? @@agent
  @@agent ||= Mechanize.new { |a| a.user_agent_alias = 'Mac Safari' }
end

.get_page(url) ⇒ Object

Load a page from Xbox Live and return a Mechanize/Nokogiri page.

TODO: cache pages for some time to prevent duplicative HTTP activity.



# File 'lib/xbox_live/scraper.rb', line 16

def self.get_page(url)
  log "Loading page #{url}."

  # Check to see if there is a recent version of the page in cache
  if @cache[url]
    log "  Found page in cache."
    return @cache[url][:page] if Time.now - @cache[url][:updated_at] < XboxLive.options[:refresh_age]
    log "    but the cached page is stale."
  end

  # Load the specified page via Mechanize
  log "  Getting page from Xbox Live."
  page = safe_get(url)

  # Most pages require authentication. If the Mechanize agent has
  # not logged in yet, or if the session has expired, it will be
  # redirected to the Xbox Live login page.
  if login_page?(page)
    # Log the agent in via the returned login page.
    log "  Page load failed - not signed in."
    page = login(page)

    # The login SHOULD have returned the original page requested,
    # but the URL will be the POST URL, so there is no way to be
    # certain. Therefore, it is safest to just load the page again
    # now that the Mechanize agent has logged in.
    log "  Retrying page #{url}"
    page = safe_get(url)
  end

  if page.nil? or page.title.match /Error/
    log "  ERROR: failed to load page. Trying again."
    page = safe_get(url)
    if page.nil? or page.title.match /Error/
      log "  ERROR: failed on second try. Aborting."
      return nil
    else
      log "  SUCCESS: page loaded on retry."
    end
  end

  if page.uri.to_s != url
    log "  ERROR: loaded page URL does not match expected URL. Loaded: #{page.uri.to_s}"
    return nil
  end

  log "  Loaded page '#{page.title.strip}'. Storing in cache."
  @cache[url] = { page: page, updated_at: Time.now }
  page
end
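
The freshness check at the top of get_page can be read in isolation as a small time-based cache. The following is a minimal sketch of that idea, not the library's code: the PageCache class, its fetch method, and the example URL are hypothetical names introduced for illustration, and the refresh_age argument stands in for XboxLive.options[:refresh_age].

```ruby
# Hypothetical stand-alone cache; PageCache and fetch are illustrative names
# and are not part of XboxLive::Scraper.
class PageCache
  def initialize(refresh_age)
    @refresh_age = refresh_age # maximum age in seconds before an entry is stale
    @entries = {}
  end

  # Return the cached page if it is still fresh. Otherwise call the block,
  # store a non-nil result with a timestamp, and return it.
  def fetch(url, now = Time.now)
    entry = @entries[url]
    return entry[:page] if entry && now - entry[:updated_at] < @refresh_age

    page = yield
    @entries[url] = { page: page, updated_at: now } unless page.nil?
    page
  end
end

cache = PageCache.new(60)
start = Time.now
first  = cache.fetch('http://example.com/a', start)       { 'fresh copy' }
second = cache.fetch('http://example.com/a', start + 30)  { 'not used' }
third  = cache.fetch('http://example.com/a', start + 120) { 'reloaded copy' }
```

Passing the current time as an argument is a choice made here so the expiry behaviour can be exercised without sleeping; get_page itself calls Time.now directly.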

.log(message) ⇒ Object

Write out a log entry



# File 'lib/xbox_live/scraper.rb', line 147

def self.log(message)
  puts message if XboxLive.options[:debug]
end

.login(page) ⇒ Object

Log in to Xbox Live using the supplied login page.



# File 'lib/xbox_live/scraper.rb', line 88

def self.login(page)
  return nil if !login_page?(page)

  # Find the URL where the login form should be POSTed to.
  url = page.body.match(/srf_uPost='([^']+)/)[1]
  if url.empty?
    log "  ERROR: Trying to log in but 'Sign In' page doesn't contain needed info."
    return nil
  end

  # PPFT appears to be some kind of session identifier which is
  # required for the login process.
  ppft_html = page.body.match(/srf_sFT='([^']+)/)[1]
  ppft = ppft_html.match(/value="([^"]+)/)[1]

  # The rest of the parameters are either user-provided (i.e.
  # username and password) or are constants.
  params = {
    'login' => XboxLive.options[:username],
    'passwd' => XboxLive.options[:password],
    'type' => '11',
    'LoginOptions' => '3',
    'NewUser' => '1',
    'PPSX' => 'Passpor',
    'PPFT' => ppft,
    'idshbo' => '1'
  }

  # POST the login form and hope for the best.
  log "  Submitting login form via POST"
  page = agent.post(url, params)

  # The login will fail and return a page saying that Javascript must be
  # enabled. However, there is a hidden form in the page that can be
  # submitted to enable non-javascript support.
  form = page.form('fmHF')
  if form.nil?
    log "  ERROR: The non-JS login page doesn't contain form fmHF."
    return nil
  end

  # Submitting the form on the Javascript error page completes the
  # login process, and SHOULD return the originally requested page.
  log "  Submitting final non-JS login form"
  agent.submit(form)
end
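
The two regular-expression lookups in login call [1] on the result of match, which raises NoMethodError when the pattern is not found, so the url.empty? check cannot catch a missing value. A hedged sketch of a nil-safe extraction follows. The extract_login_fields helper, the SAMPLE_BODY string, and the example.com URL are assumptions made up for this illustration; the real login page markup may differ.

```ruby
# Hypothetical sample of the script variables the login page is assumed to
# contain. The URL and token are invented for this example.
SAMPLE_BODY = <<~HTML
  <script>
  var srf_uPost='https://login.example.com/ppsecure/post.srf';
  var srf_sFT='<input type="hidden" name="PPFT" value="abc123"/>';
  </script>
HTML

# Hypothetical helper: return [post_url, ppft], or nil if either value
# cannot be found. String#[] with a regexp and a capture index returns nil
# on no match instead of raising.
def extract_login_fields(body)
  url = body[/srf_uPost='([^']+)/, 1]
  ppft_html = body[/srf_sFT='([^']+)/, 1]
  return nil if url.nil? || ppft_html.nil?

  ppft = ppft_html[/value="([^"]+)/, 1]
  return nil if ppft.nil?

  [url, ppft]
end

fields = extract_login_fields(SAMPLE_BODY)
missing = extract_login_fields('<html>no login data here</html>')
```

With this shape, a page that lacks the expected variables produces nil, which the caller can log and handle in the same way as the existing "doesn't contain needed info" branch.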

.login_page?(page) ⇒ Boolean

Check whether the provided page is the Xbox Live login page.

Returns:

  • (Boolean)


# File 'lib/xbox_live/scraper.rb', line 136

def self.login_page?(page)
  page and page.title == "Welcome to Windows Live"
end

.post_page(url, params) ⇒ Object

POST a page to Xbox Live and return the result.



# File 'lib/xbox_live/scraper.rb', line 68

def self.post_page(url, params)
  log "POSTing page #{url} with params #{params}."
  page = agent.post(url, params)
  page
end

.safe_get(page) ⇒ Object

Get a page, but catch any errors so processing can continue



# File 'lib/xbox_live/scraper.rb', line 78

def self.safe_get(page)
  begin
    return agent.get(page)
  # rescue Errno::ETIMEDOUT, Timeout::Error, Mechanize::ResponseCodeError
  rescue
    return nil
  end
end
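
The bare rescue in safe_get catches every StandardError, which also hides programming mistakes. The commented-out line suggests a narrower list was considered. Below is a minimal sketch of that alternative using only the standard library; the nil_on_error helper is a hypothetical name, and in the scraper the list would presumably also include Mechanize::ResponseCodeError.

```ruby
require 'timeout'

# Hypothetical helper: run the block and return nil only for the listed
# error classes, so unexpected exceptions still surface to the caller.
def nil_on_error(*error_classes)
  yield
rescue *error_classes => e
  warn "  ERROR: #{e.class}: #{e.message}"
  nil
end

# A successful call passes the block's value through.
ok = nil_on_error(Timeout::Error, Errno::ETIMEDOUT) { 'page body' }

# A listed error is reported on stderr and converted to nil.
timed_out = nil_on_error(Timeout::Error, Errno::ETIMEDOUT) do
  raise Timeout::Error, 'request took too long'
end
```

The trade-off is that any network error missing from the list will propagate out of get_page, so the list needs to cover the failures the caller wants to treat as "page unavailable".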