Class: Infoboxer::MediaWiki

Inherits: Object
Defined in:
lib/infoboxer/media_wiki.rb,
lib/infoboxer/media_wiki/page.rb,
lib/infoboxer/media_wiki/traits.rb

Overview

MediaWiki client class.

Usage:

client = Infoboxer::MediaWiki.new('http://en.wikipedia.org/w/api.php', user_agent: 'My Own Project')
page = client.get('Argentina')

Consider using shortcuts like #wiki, #wikipedia, #wp and so on instead of direct instantiation of this class (though you can instantiate it directly if you want to!)
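
For illustration, the same fetch via a shortcut might look like this (a sketch, using the Infoboxer.wikipedia and Infoboxer.wp shortcuts mentioned above):

page = Infoboxer.wikipedia.get('Argentina') # or Infoboxer.wp.get('Argentina')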

Defined Under Namespace

Classes: Page, Traits

Constant Summary

UA =
  "Infoboxer/#{Infoboxer::VERSION} (https://github.com/molybdenum-99/infoboxer; [email protected])"

Default Infoboxer User-Agent header. You can set yours as an option to Infoboxer#wiki and its shortcuts, or to #initialize.


Constructor Details

#initialize(api_base_url, options = {}) ⇒ MediaWiki

Creates a new MediaWiki client. Infoboxer#wiki provides a shortcut for it, as well as shortcuts for some well-known wikis, like Infoboxer#wikipedia.

Parameters:

  • api_base_url

URL of the api.php file in your MediaWiki installation. Typically it is <domain>/w/api.php, but it can vary between wikis.

  • options (defaults to: {})

    Only one option is currently supported:

    • :user_agent (also aliased as :ua) -- custom User-Agent header, as shown in the sketch below.
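
For illustration, passing a custom User-Agent might look like this (a sketch; the URL and header value are examples):

client = Infoboxer::MediaWiki.new(
  'https://en.wikipedia.org/w/api.php',
  user_agent: 'MyResearchBot/1.0 (https://example.com/bot)' # or ua: '...'
)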


# File 'lib/infoboxer/media_wiki.rb', line 52

def initialize(api_base_url, options = {})
  @api_base_url = Addressable::URI.parse(api_base_url)
  @client = MediaWiktory::Client.new(api_base_url, user_agent: user_agent(options))
  @traits = Traits.get(@api_base_url.host, namespaces: extract_namespaces)
end

Class Attribute Details

.user_agent ⇒ Object

User agent getter/setter.

Default value is UA.

You can also use a per-instance option; see #initialize.



# File 'lib/infoboxer/media_wiki.rb', line 38

def user_agent
  @user_agent
end
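
Since this is documented as a getter/setter, overriding the default for all subsequently created clients might look like this (a sketch, assuming the matching setter that the getter/setter note implies):

Infoboxer::MediaWiki.user_agent = 'MyProject/1.0 (https://example.com)'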

Instance Attribute Details

#api_base_url ⇒ Object (readonly)

Returns the value of attribute api_base_url.



# File 'lib/infoboxer/media_wiki.rb', line 41

def api_base_url
  @api_base_url
end

#traits ⇒ Object (readonly)

Returns the value of attribute traits.



# File 'lib/infoboxer/media_wiki.rb', line 41

def traits
  @traits
end

Instance Method Details

#category(title) ⇒ Tree::Nodes<Page>

Receives a list of parsed MediaWiki pages from the specified category.

NB: currently, this API always fetches all pages in the category; there is no option to "take the first 20 pages". Pages are fetched in 50-page batches, then parsed, so for a large category it can take a while to fetch everything.

Parameters:

  • title

    Category title. You can use a namespaceless title (like "Countries in South America"), a title with a namespace (like "Category:Countries in South America"), or a title with a localized namespace (like "Catégorie:Argentine" for French Wikipedia).

Returns:

  • (Tree::Nodes<Page>)

# File 'lib/infoboxer/media_wiki.rb', line 149

def category(title)
  title = normalize_category_title(title)
  
  list(categorymembers: {title: title, limit: 50})
end
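
For illustration, reusing the client from the overview (a sketch):

pages = client.category('Category:Countries in South America')
pages.map(&:title) # titles of all pages in the category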

#get(*titles) ⇒ Tree::Nodes<Page>

Receives a list of parsed MediaWiki pages for the titles provided. All pages are fetched with a single query to the MediaWiki API.

NB: if you are requesting more than 50 titles at once (the MediaWiki limit for a single request), Infoboxer will perform as many queries as necessary to fetch them all (that is, (titles.count / 50.0).ceil requests).

Returns:

  • (Tree::Nodes<Page>)

    array of parsed pages. Notes:

    • if you call get with only one title, a single page will be returned instead of an array;
    • if some of the pages do not exist in the wiki, they will not be returned, so the resulting array can be shorter than the titles array; you can always check pages.map(&:title) to see what you've really received. This approach allows you to write absent-minded code like this:

      Infoboxer.wp.get('Argentina', 'Chile', 'Something non-existing').
        infobox.fetch('some value')

      and obtain meaningful results instead of a NoMethodError or some NotFound.



# File 'lib/infoboxer/media_wiki.rb', line 102

def get(*titles)
  pages = raw(*titles).
    tap{|pages| pages.detect(&:invalid?).tap{|i| i && fail(i.raw.invalidreason)}}.
    select(&:exists?).
    map{|raw|
      Page.new(self,
        Parser.paragraphs(raw.content, traits),
        raw)
    }
  titles.count == 1 ? pages.first : Tree::Nodes[*pages]
end
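
To illustrate the single-vs-multiple behavior (a sketch):

page  = client.get('Argentina')           # a single Page
pages = client.get('Argentina', 'Chile')  # Tree::Nodes of Pages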

#get_h(*titles) ⇒ Hash<String, Page>

Same as #get, but returns a hash of title => page.

Useful quirks:

  • when a requested page does not exist, its key will still be present in the resulting hash (the value will be nil);
  • when a requested page redirects to another, the key will still be the requested title. For example, get_h('Einstein') will return a hash with the key 'Einstein' and a page titled 'Albert Einstein'.

This allows you to stay in full control of which pages from a large list you've actually received.

Returns:

  • (Hash<String, Page>)


# File 'lib/infoboxer/media_wiki.rb', line 128

def get_h(*titles)
  pages = [*get(*titles)]
  titles.map{|t|
    [t, pages.detect{|p| p.source.alt_titles.map(&:downcase).include?(t.downcase)}]
  }.to_h
end
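
A sketch of both quirks in action, using the example titles above:

pages = client.get_h('Einstein', 'Something non-existing')
pages['Einstein'].title         # => "Albert Einstein" (redirect followed)
pages['Something non-existing'] # => nil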

#inspectObject



# File 'lib/infoboxer/media_wiki.rb', line 192

def inspect
  "#<#{self.class}(#{@api_base_url.host})>"
end

#prefixsearch(prefix) ⇒ Tree::Nodes<Page>

Receives a list of parsed MediaWiki pages whose titles start with the given prefix. See MediaWiki API docs for details.

NB: currently, this API always fetches all matching pages; there is no option to "take the first 20 pages". Pages are fetched in 50-page batches, then parsed, so for a large result set it can take a while to fetch everything.

Parameters:

  • prefix

    page title prefix.

Returns:

  • (Tree::Nodes<Page>)

# File 'lib/infoboxer/media_wiki.rb', line 188

def prefixsearch(prefix)
  list(prefixsearch: {search: prefix, limit: 100})
end
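
For illustration (a sketch):

pages = client.prefixsearch('Argentin')
pages.map(&:title) # titles starting with "Argentin"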

#raw(*titles) ⇒ Array<Hash>

Receive "raw" data from Wikipedia (without parsing or wrapping in classes).

Returns:

  • (Array<Hash>)


# File 'lib/infoboxer/media_wiki.rb', line 62

def raw(*titles)
  return [] if titles.empty? # could emerge on "automatically" created page lists, should work
  
  titles.each_slice(50).map{|part|
    @client.query.
      titles(*part).
      prop(revisions: {prop: :content}, info: {prop: :url}).
      redirects(true). # FIXME: should be done transparently by MediaWiktory?
      perform.pages
  }.inject(:concat). # somehow flatten(1) fails!
  sort_by{|page|
    res_title = page.alt_titles.detect{|t| titles.map(&:downcase).include?(t.downcase)} # FIXME?..
    titles.index(res_title) || 1_000
  }
end
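
A quick sketch of working with the raw result (the element structure comes from MediaWiktory; content is the same field that #get parses):

raw_pages = client.raw('Argentina', 'Chile')
raw_pages.first.content # raw wikitext of the first page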

#search(query) ⇒ Tree::Nodes<Page>

Receives a list of parsed MediaWiki pages for the provided search query. See MediaWiki API docs for details.

NB: currently, this API always fetches all matching pages; there is no option to "take the first 20 pages". Pages are fetched in 50-page batches, then parsed, so for a large result set it can take a while to fetch everything.

Parameters:

  • query

    Search query string.

Returns:

  • (Tree::Nodes<Page>)


# File 'lib/infoboxer/media_wiki.rb', line 171

def search(query)
  list(search: {search: query, limit: 50})
end
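
For illustration (a sketch):

pages = client.search('Argentina history')
pages.map(&:title) # titles of pages matching the query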