Class: Infoboxer::MediaWiki

Inherits: Object
Defined in:
lib/infoboxer/media_wiki.rb,
lib/infoboxer/media_wiki/page.rb,
lib/infoboxer/media_wiki/traits.rb

Overview

MediaWiki client class.

Usage:

client = Infoboxer::MediaWiki
  .new('https://en.wikipedia.org/w/api.php', user_agent: 'My Own Project')
page = client.get('Argentina')

Consider using shortcuts like #wiki, #wikipedia, #wp and so on instead of instantiating this class directly (although you can if you want to!)

Defined Under Namespace

Classes: Page, Traits

Constant Summary

UA =
"Infoboxer/#{Infoboxer::VERSION} "\
'(https://github.com/molybdenum-99/infoboxer; [email protected])'

Default Infoboxer User-Agent header.

You can set yours as an option to Infoboxer#wiki and its shortcuts, or to #initialize.

Class Attribute Summary

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(api_base_url, ua: nil, user_agent: ua) ⇒ MediaWiki

Creates a new MediaWiki client. Infoboxer#wiki provides a shortcut for it, as well as shortcuts for some well-known wikis, like Infoboxer#wikipedia.



# File 'lib/infoboxer/media_wiki.rb', line 57

def initialize(api_base_url, ua: nil, user_agent: ua)
  @api_base_url = Addressable::URI.parse(api_base_url)
  @api = MediaWiktory::Wikipedia::Api.new(api_base_url, user_agent: user_agent(user_agent))
  @traits = Traits.get(@api_base_url.host, siteinfo)
end
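
For instance (the contact address below is an illustrative placeholder):

client = Infoboxer::MediaWiki.new(
  'https://en.wikipedia.org/w/api.php',
  user_agent: 'MyProject/1.0 (contact@example.com)'
)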

Class Attribute Details

.user_agent ⇒ String

User agent getter/setter.

Default value is UA.

You can also set it per instance; see #initialize.



# File 'lib/infoboxer/media_wiki.rb', line 40

def user_agent
  @user_agent
end
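
A minimal sketch of setting it globally (the value is an illustrative placeholder):

Infoboxer::MediaWiki.user_agent = 'MyProject/1.0 (https://example.com/myproject)'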

Instance Attribute Details

#api ⇒ MediaWiktory::Wikipedia::Client (readonly)



# File 'lib/infoboxer/media_wiki.rb', line 47

def api
  @api
end
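
This exposes the underlying MediaWiktory client, so you can build queries Infoboxer doesn't wrap. A sketch mirroring the chain used internally by #raw (see its source below):

response = client.api.query.titles('Argentina').response
response['pages'] # => hash of pageid => raw page data, as plain hashes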

Instance Method Details

#category(title, limit: 'max', &processor) ⇒ Tree::Nodes<Page>

Receive a list of parsed MediaWiki pages from the specified category.



# File 'lib/infoboxer/media_wiki.rb', line 178

def category(title, limit: 'max', &processor)
  title = normalize_category_title(title)

  list(@api.query.generator(:categorymembers).title(title), limit, &processor)
end
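
Usage sketch (the category name is illustrative; an integer limit is assumed to be accepted alongside the 'max' default):

pages = client.category('Category:Countries in South America', limit: 20)
pages.map(&:title)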

#get(*titles, interwiki: nil, &processor) ⇒ Page, Tree::Nodes<Page>

Receive a list of parsed MediaWiki pages for the titles provided. All pages are fetched with a single query to the MediaWiki API.

NB: if you request more than 50 titles at once (the MediaWiki limit for a single request), Infoboxer will make as many queries as necessary to fetch them all (that is, (titles.count / 50.0).ceil requests).



# File 'lib/infoboxer/media_wiki.rb', line 130

def get(*titles, interwiki: nil, &processor)
  return interwikis(interwiki).get(*titles, &processor) if interwiki

  pages = get_h(*titles, &processor).values.compact
  titles.count == 1 ? pages.first : Tree::Nodes[*pages]
end
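
Usage sketch of the two return shapes:

page  = client.get('Argentina')             # single title => Page
pages = client.get('Argentina', 'Bolivia')  # several titles => Tree::Nodes<Page>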

#get_h(*titles, &processor) ⇒ Hash<String, Page>

Same as #get, but returns a hash of {requested title => page}.

Useful quirks:

  • when a requested page does not exist, its key will still be present in the resulting hash (with a nil value);
  • when a requested page redirects to another, the key will still be the requested title. For example, get_h('Einstein') will return a hash with the key 'Einstein' and a page titled 'Albert Einstein'.

This allows you to stay in full control of which pages of a large list you've actually received (see the sketch after the source below).



# File 'lib/infoboxer/media_wiki.rb', line 157

def get_h(*titles, &processor)
  raw_pages = raw(*titles, &processor)
              .tap { |ps| ps.detect { |_, p| p['invalid'] }.tap { |_, i| i && fail(i['invalidreason']) } }
              .reject { |_, p| p.key?('missing') }
  titles.map { |title| [title, make_page(raw_pages, title)] }.to_h
end
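
A sketch illustrating both quirks ('No Such Page' stands for any nonexistent title):

pages = client.get_h('Einstein', 'No Such Page')
pages['Einstein'].title # => 'Albert Einstein' (redirect followed, requested key preserved)
pages['No Such Page']   # => nil (key present, value empty)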

#inspect ⇒ String



# File 'lib/infoboxer/media_wiki.rb', line 222

def inspect
  "#<#{self.class}(#{@api_base_url.host})>"
end
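
For example:

client.inspect # => "#<Infoboxer::MediaWiki(en.wikipedia.org)>"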

#prefixsearch(prefix, limit: 'max', &processor) ⇒ Tree::Nodes<Page>

Receive a list of parsed MediaWiki pages whose titles start with prefix. See MediaWiki API docs for details.



# File 'lib/infoboxer/media_wiki.rb', line 217

def prefixsearch(prefix, limit: 'max', &processor)
  list(@api.query.generator(:prefixsearch).search(prefix), limit, &processor)
end
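
Usage sketch (the prefix is illustrative; actual results depend on the wiki):

client.prefixsearch('Argent').map(&:title)
# => ['Argentina', 'Argentine Football Association', ...]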

#raw(*titles, &processor) ⇒ Hash{String => Hash}

Receive "raw" data from Wikipedia (without parsing or wrapping in classes).



# File 'lib/infoboxer/media_wiki.rb', line 75

def raw(*titles, &processor)
  # could emerge on "automatically" created page lists, should work
  return {} if titles.empty?

  titles.each_slice(50).map do |part|
    request = prepare_request(@api.query.titles(*part), &processor)
    response = request.response

    # If additional props are required, there may be additional pages, even despite each_slice(50)
    response = response.continue while response.continue?

    sources = response['pages'].values.map { |page| [page['title'], page] }.to_h
    redirects =
      if response['redirects']
        response['redirects'].map { |r| [r['from'], sources[r['to']]] }.to_h
      else
        {}
      end

    # This way for 'Einstein' query we'll have {'Albert Einstein' => page, 'Einstein' => same page}
    sources.merge(redirects)
  end.inject(:merge)
end
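
Usage sketch: keys are requested titles plus canonical titles of redirect targets, values are plain hashes straight from the API (see the comment on redirects in the source above):

data = client.raw('Einstein')
data.keys                 # => ['Albert Einstein', 'Einstein']
data['Einstein']['title'] # => 'Albert Einstein'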

#search(query, limit: 'max', &processor) ⇒ Tree::Nodes<Page>

Receive a list of parsed MediaWiki pages for the provided search query. See MediaWiki API docs for details.



# File 'lib/infoboxer/media_wiki.rb', line 200

def search(query, limit: 'max', &processor)
  list(@api.query.generator(:search).search(query), limit, &processor)
end
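
Usage sketch (the query string is illustrative; an integer limit is assumed to work as in #category):

client.search('Argentina history', limit: 20).map(&:title)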