Class: Infoboxer::MediaWiki

Inherits:
Object
Defined in:
lib/infoboxer/media_wiki.rb,
lib/infoboxer/media_wiki/page.rb,
lib/infoboxer/media_wiki/traits.rb

Overview

MediaWiki client class.

Usage:

client = Infoboxer::MediaWiki
  .new('http://en.wikipedia.org/w/api.php', user_agent: 'My Own Project')
page = client.get('Argentina')

Consider using shortcuts like #wiki, #wikipedia, #wp and so on instead of direct instantiation of this class (though you can instantiate it directly if you want to!)

Defined Under Namespace

Classes: Page, Traits

Constant Summary

UA =

Default Infoboxer User-Agent header.

You can set yours as an option to Infoboxer#wiki and its shortcuts, or to #initialize

"Infoboxer/#{Infoboxer::VERSION} "\
'(https://github.com/molybdenum-99/infoboxer; [email protected])'.freeze

Class Attribute Summary

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(api_base_url, ua: nil, user_agent: ua) ⇒ MediaWiki

Creates a new MediaWiki client. Infoboxer#wiki provides a shortcut for it, as well as shortcuts for some well-known wikis, like Infoboxer#wikipedia.

Parameters:

  • api_base_url (String)

    URL of the api.php file in your MediaWiki installation. Typically it's <domain>/w/api.php, but it can vary between wikis.

  • user_agent (String) (defaults to: ua)

    (also aliased as :ua) Custom User-Agent header.



# File 'lib/infoboxer/media_wiki.rb', line 55

def initialize(api_base_url, ua: nil, user_agent: ua)
  @api_base_url = Addressable::URI.parse(api_base_url)
  @api = MediaWiktory::Wikipedia::Api.new(api_base_url, user_agent: user_agent(user_agent))
  @traits = Traits.get(@api_base_url.host, siteinfo)
end

Class Attribute Details

.user_agent ⇒ String

User agent getter/setter.

Default value is UA.

You can also use a per-instance option; see #initialize.

Returns:

  • (String)


# File 'lib/infoboxer/media_wiki.rb', line 38

def user_agent
  @user_agent
end

Instance Attribute Details

#api ⇒ MediaWiktory::Wikipedia::Client (readonly)

Returns:

  • (MediaWiktory::Wikipedia::Client)


# File 'lib/infoboxer/media_wiki.rb', line 45

def api
  @api
end

Instance Method Details

#category(title, limit: 'max', &processor) ⇒ Tree::Nodes<Page>

Receives a list of parsed MediaWiki pages from the specified category.

Parameters:

  • title (String)

    Category title. You can use a namespaceless title (like "Countries in South America"), a title with namespace (like "Category:Countries in South America"), or a title with a local namespace (like "Catégorie:Argentine" for French Wikipedia).

  • limit (Integer, "max") (defaults to: 'max')
  • processor (Proc)

    Optional block to preprocess the MediaWiktory query. Refer to MediaWiktory::Actions::Query for its API. Infoboxer assumes that the block returns a new instance of Query, so be careful when using it.
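The title normalization described above can be sketched in plain Ruby. This is a hypothetical illustration, not Infoboxer's actual normalize_category_title (the real client derives recognized namespace prefixes, including local ones like "Catégorie:", from the wiki itself; here they are passed as a parameter):

```ruby
# Hypothetical sketch: ensure the title carries an explicit category
# namespace before it is sent to the API. `known_namespaces` is an
# assumption for illustration only.
def normalize_category_title(title, known_namespaces: ['Category', 'Catégorie'])
  return title if known_namespaces.any? { |ns| title.start_with?("#{ns}:") }

  "Category:#{title}"
end

puts normalize_category_title('Countries in South America')
# => Category:Countries in South America
puts normalize_category_title('Catégorie:Argentine')
# => Catégorie:Argentine
```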

Returns:

  • (Tree::Nodes<Page>)

# File 'lib/infoboxer/media_wiki.rb', line 176

def category(title, limit: 'max', &processor)
  title = normalize_category_title(title)

  list(@api.query.generator(:categorymembers).title(title), limit, &processor)
end

#get(*titles, interwiki: nil, &processor) ⇒ Page, Tree::Nodes<Page>

Receives a list of parsed MediaWiki pages for the list of titles provided. All pages are fetched with a single query to the MediaWiki API.

NB: if you are requesting more than 50 titles at once (a MediaWiki limitation for a single request), Infoboxer will perform as many queries as necessary to fetch them all (that is, (titles.count / 50.0).ceil requests).
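The batching arithmetic can be checked in plain Ruby, no network involved; each slice of 50 titles corresponds to one API request:

```ruby
# Sketch of the 50-titles-per-request batching described above.
titles = (1..120).map { |i| "Page #{i}" }

slices = titles.each_slice(50).to_a
request_count = (titles.count / 50.0).ceil

puts slices.map(&:length).inspect  # => [50, 50, 20]
puts request_count                 # => 3
```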

Parameters:

  • titles (Array<String>)

    List of page titles to get.

  • interwiki (Symbol) (defaults to: nil)

    Identifier of other wiki, related to current, to fetch pages from.

  • processor (Proc)

    Optional block to preprocess the MediaWiktory query. Refer to MediaWiktory::Actions::Query for its API. Infoboxer assumes that the block returns a new instance of Query, so be careful when using it.

Returns:

  • (Page, Tree::Nodes<Page>)

    array of parsed pages. Notes:

    • if you call get with only one title, a single page is returned instead of an array;
    • if some of the pages do not exist in the wiki, they are not returned, so the resulting array can be shorter than the titles array. You can always check pages.map(&:title) to see what you've really received. This approach lets you write absent-minded code like this:
      Infoboxer.wp.get('Argentina', 'Chile', 'Something non-existing').
         infobox.fetch('some value')

    and obtain meaningful results instead of a NoMethodError or SomethingNotFound.
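The result-shape rules above can be modeled without touching the network. In this sketch, the lookup is a hypothetical stand-in for the real page fetch (it simply drops a known-missing title; real pages are parsed objects, not strings):

```ruby
# Plain-Ruby model of #get's result shape: one title => single object,
# several titles => array; missing pages are silently dropped.
def get_shape(*titles)
  found = titles - ['Something non-existing'] # hypothetical stub fetch
  titles.count == 1 ? found.first : found
end

p get_shape('Argentina')                           # => "Argentina"
p get_shape('Argentina', 'Chile')                  # => ["Argentina", "Chile"]
p get_shape('Argentina', 'Something non-existing') # => ["Argentina"]
```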



# File 'lib/infoboxer/media_wiki.rb', line 128

def get(*titles, interwiki: nil, &processor)
  return interwikis(interwiki).get(*titles, &processor) if interwiki

  pages = get_h(*titles, &processor).values.compact
  titles.count == 1 ? pages.first : Tree::Nodes[*pages]
end

#get_h(*titles, &processor) ⇒ Hash<String, Page>

Same as #get, but returns a hash of {requested title => page}.

Useful quirks:

  • when a requested page does not exist, its key is still present in the resulting hash (with a nil value);
  • when a requested page redirects to another, the key is still the requested title. For example, get_h('Einstein') returns a hash with the key 'Einstein' and a page titled 'Albert Einstein'.

This lets you stay in full control of which pages of a large list you've actually received.
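Those quirks make it easy to audit a bulk fetch. A sketch with a plain hash standing in for a real get_h result (page values here are just strings for illustration):

```ruby
# nil values mark pages that don't exist; redirect keys stay as requested.
result = {
  'Argentina' => 'page: Argentina',
  'Einstein'  => 'page: Albert Einstein', # redirected, key unchanged
  'Nonsense'  => nil                      # missing page, key still present
}

missing_titles = result.select { |_, page| page.nil? }.keys
found          = result.reject { |_, page| page.nil? }

puts missing_titles.inspect  # => ["Nonsense"]
puts found.keys.inspect      # => ["Argentina", "Einstein"]
```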

Parameters:

  • titles (Array<String>)

    List of page titles to get.

  • processor (Proc)

    Optional block to preprocess the MediaWiktory query. Refer to MediaWiktory::Actions::Query for its API. Infoboxer assumes that the block returns a new instance of Query, so be careful when using it.

Returns:

  • (Hash<String, Page>)


# File 'lib/infoboxer/media_wiki.rb', line 155

def get_h(*titles, &processor)
  raw_pages = raw(*titles, &processor)
              .tap { |ps| ps.detect { |_, p| p['invalid'] }.tap { |_, i| i && fail(i['invalidreason']) } }
              .reject { |_, p| p.key?('missing') }
  titles.map { |title| [title, make_page(raw_pages, title)] }.to_h
end

#inspect ⇒ String

Returns:

  • (String)


# File 'lib/infoboxer/media_wiki.rb', line 220

def inspect
  "#<#{self.class}(#{@api_base_url.host})>"
end

#prefixsearch(prefix, limit: 'max', &processor) ⇒ Tree::Nodes<Page>

Receives a list of parsed MediaWiki pages with titles starting with prefix. See MediaWiki API docs for details.

Parameters:

  • prefix (String)

    Page title prefix.

  • limit (Integer, "max") (defaults to: 'max')
  • processor (Proc)

    Optional block to preprocess the MediaWiktory query. Refer to MediaWiktory::Actions::Query for its API. Infoboxer assumes that the block returns a new instance of Query, so be careful when using it.

Returns:

  • (Tree::Nodes<Page>)

# File 'lib/infoboxer/media_wiki.rb', line 215

def prefixsearch(prefix, limit: 'max', &processor)
  list(@api.query.generator(:prefixsearch).search(prefix), limit, &processor)
end

#raw(*titles, &processor) ⇒ Hash{String => Hash}

Receives "raw" data from the wiki (without parsing or wrapping in classes).

Parameters:

  • titles (Array<String>)

    List of page titles to get.

  • processor (Proc)

    Optional block to preprocess the MediaWiktory query. Refer to MediaWiktory::Actions::Query for its API. Infoboxer assumes that the block returns a new instance of Query, so be careful when using it.

Returns:

  • (Hash{String => Hash})

    Hash of {requested title => raw MediaWiki object}. Note that even missing (not existing in the current wiki) or invalid (impossible title) pages are still present in the response; they just carry a "missing" or "invalid" key, exactly as MediaWiki returns them.
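Given that shape, problem pages can be detected by key inspection. A sketch over a hand-made response (the field values are illustrative, not real API output):

```ruby
# Hand-made stand-in for a #raw result: one normal page, one missing,
# one invalid, keyed by the requested titles.
raw_pages = {
  'Argentina' => { 'title' => 'Argentina', 'pageid' => 123 },
  'Nonsense'  => { 'title' => 'Nonsense', 'missing' => '' },
  '<bad>'     => { 'title' => '<bad>', 'invalid' => '', 'invalidreason' => 'bad title' }
}

missing = raw_pages.select { |_, page| page.key?('missing') }.keys
invalid = raw_pages.select { |_, page| page.key?('invalid') }.keys

puts missing.inspect  # => ["Nonsense"]
puts invalid.inspect  # => ["<bad>"]
```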



# File 'lib/infoboxer/media_wiki.rb', line 73

def raw(*titles, &processor)
  # could emerge on "automatically" created page lists, should work
  return {} if titles.empty?

  titles.each_slice(50).map do |part|
    request = prepare_request(@api.query.titles(*part), &processor)
    response = request.response

    # If additional props are required, there may be additional pages, even despite each_slice(50)
    response = response.continue while response.continue?

    sources = response['pages'].values.map { |page| [page['title'], page] }.to_h
    redirects =
      if response['redirects']
        response['redirects'].map { |r| [r['from'], sources[r['to']]] }.to_h
      else
        {}
      end

    # This way for 'Einstein' query we'll have {'Albert Einstein' => page, 'Einstein' => same page}
    sources.merge(redirects)
  end.inject(:merge)
end
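The sources.merge(redirects) step at the end can be demonstrated on toy data: after the merge, the requested (redirecting) title and the canonical title point at the very same page object, as the 'Einstein' comment in the code describes.

```ruby
# Toy data mimicking a query response with one redirect resolved.
einstein = { 'title' => 'Albert Einstein' }

sources   = { 'Albert Einstein' => einstein }
redirects = { 'Einstein' => sources['Albert Einstein'] }

merged = sources.merge(redirects)

puts merged.keys.inspect                                  # => ["Albert Einstein", "Einstein"]
puts merged['Einstein'].equal?(merged['Albert Einstein']) # => true
```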

#search(query, limit: 'max', &processor) ⇒ Tree::Nodes<Page>

Receives a list of parsed MediaWiki pages for the provided search query. See MediaWiki API docs for details.

Parameters:

  • query (String)

    Search query.

  • limit (Integer, "max") (defaults to: 'max')
  • processor (Proc)

    Optional block to preprocess the MediaWiktory query. Refer to MediaWiktory::Actions::Query for its API. Infoboxer assumes that the block returns a new instance of Query, so be careful when using it.

Returns:

  • (Tree::Nodes<Page>)



# File 'lib/infoboxer/media_wiki.rb', line 198

def search(query, limit: 'max', &processor)
  list(@api.query.generator(:search).search(query), limit, &processor)
end