Class: Infoboxer::MediaWiki
- Inherits: Object
- Defined in:
- lib/infoboxer/media_wiki.rb,
lib/infoboxer/media_wiki/page.rb,
lib/infoboxer/media_wiki/traits.rb
Overview
MediaWiki client class.
Usage:
client = Infoboxer::MediaWiki
.new('http://en.wikipedia.org/w/api.php', user_agent: 'My Own Project')
page = client.get('Argentina')
Consider using shortcuts like #wiki, #wikipedia, #wp and so on instead of instantiating this class directly (although you can if you want to!)
Defined Under Namespace
Classes: Page, Traits
Constant Summary
- UA =
Default Infoboxer User-Agent header.
You can set yours as an option to Infoboxer#wiki and its shortcuts, or to #initialize.

    "Infoboxer/#{Infoboxer::VERSION} " \
      '(https://github.com/molybdenum-99/infoboxer; [email protected])'.freeze
Class Attribute Summary
-
.user_agent ⇒ String
User agent getter/setter.
Instance Attribute Summary
Instance Method Summary
-
#category(title, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receive a list of parsed MediaWiki pages from the specified category.
-
#get(*titles, interwiki: nil, &processor) ⇒ Page, Tree::Nodes<Page>
Receive a list of parsed MediaWiki pages for the list of titles provided.
-
#get_h(*titles, &processor) ⇒ Hash<String, Page>
Same as #get, but returns a hash of {requested title => page}.
-
#initialize(api_base_url, ua: nil, user_agent: ua) ⇒ MediaWiki
constructor
Create a new MediaWiki client.
- #inspect ⇒ String
-
#prefixsearch(prefix, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receive a list of parsed MediaWiki pages whose titles start with the given prefix.
-
#raw(*titles, &processor) ⇒ Hash{String => Hash}
Receive "raw" data from Wikipedia (without parsing or wrapping in classes).
-
#search(query, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receive a list of parsed MediaWiki pages for the provided search query.
Constructor Details
#initialize(api_base_url, ua: nil, user_agent: ua) ⇒ MediaWiki
Create a new MediaWiki client. Infoboxer#wiki provides a shortcut for it, as well as shortcuts for some well-known wikis, like Infoboxer#wikipedia.
# File 'lib/infoboxer/media_wiki.rb', line 55

def initialize(api_base_url, ua: nil, user_agent: ua)
  @api_base_url = Addressable::URI.parse(api_base_url)
  @api = MediaWiktory::Wikipedia::Api.new(api_base_url, user_agent: user_agent(user_agent))
  @traits = Traits.get(@api_base_url.host, siteinfo)
end
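As a sketch of what the constructor does with its first argument: the wiki host is extracted from the API base URL and kept for later use (Traits lookup and #inspect). The real code uses the Addressable gem; stdlib URI is shown here for a dependency-free illustration:

```ruby
require 'uri'

# Dependency-free sketch: parse the API base URL and extract the host,
# mirroring what @api_base_url.host provides in the constructor above.
api_base_url = 'https://en.wikipedia.org/w/api.php'
host = URI.parse(api_base_url).host
# host == 'en.wikipedia.org'
```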
Class Attribute Details
.user_agent ⇒ String
User agent getter/setter.
Default value is UA.
You can also set it per instance; see #initialize.
# File 'lib/infoboxer/media_wiki.rb', line 38

def user_agent
  @user_agent
end
Instance Attribute Details
#api ⇒ MediaWiktory::Wikipedia::Client (readonly)
# File 'lib/infoboxer/media_wiki.rb', line 45

def api
  @api
end
Instance Method Details
#category(title, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receive a list of parsed MediaWiki pages from the specified category.
# File 'lib/infoboxer/media_wiki.rb', line 176

def category(title, limit: 'max', &processor)
  title = normalize_category_title(title)

  list(@api.query.generator(:categorymembers).title(title), limit, &processor)
end
#get(*titles, interwiki: nil, &processor) ⇒ Page, Tree::Nodes<Page>
Receive a list of parsed MediaWiki pages for the list of titles provided. All pages are fetched with a single query to the MediaWiki API.

NB: if you are requesting more than 50 titles at once (the MediaWiki limit for a single request), Infoboxer will perform as many queries as necessary to fetch them all (i.e. (titles.count / 50.0).ceil requests).
# File 'lib/infoboxer/media_wiki.rb', line 128

def get(*titles, interwiki: nil, &processor)
  return interwikis(interwiki).get(*titles, &processor) if interwiki

  pages = get_h(*titles, &processor).values.compact
  titles.count == 1 ? pages.first : Tree::Nodes[*pages]
end
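The batching arithmetic from the NB note above can be checked with plain Ruby (the titles here are made up; no network access involved):

```ruby
# Hypothetical list of 120 titles: with MediaWiki's 50-titles-per-request
# limit, fetching them all takes ceil(120 / 50.0) = 3 requests.
titles = (1..120).map { |i| "Page #{i}" }
requests = (titles.count / 50.0).ceil
# requests == 3
```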
#get_h(*titles, &processor) ⇒ Hash<String, Page>
Same as #get, but returns a hash of {requested title => page}.

Useful quirks:
- when a requested page does not exist, its key is still present in the resulting hash (with a nil value);
- when a requested page redirects to another, the key is still the requested title. For example, get_h('Einstein') will return a hash with the key 'Einstein' and the page titled 'Albert Einstein'.

This allows you to stay in full control of which pages of a large list you've received.
# File 'lib/infoboxer/media_wiki.rb', line 155

def get_h(*titles, &processor)
  raw_pages = raw(*titles, &processor)
              .tap { |ps| ps.detect { |_, p| p['invalid'] }.tap { |_, i| i && fail(i['invalidreason']) } }
              .reject { |_, p| p.key?('missing') }
  titles.map { |title| [title, make_page(raw_pages, title)] }.to_h
end
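A minimal, network-free sketch of the first quirk: a requested-but-missing page keeps its key in the result, with a nil value. The `raw_pages` hash below is a stub standing in for the API response, and the `map`-to-hash step mirrors the last line of the method:

```ruby
# Stubbed lookup table: 'Nonexistent' was requested but the API has no page for it
raw_pages = { 'Argentina' => { 'title' => 'Argentina' } }
titles = ['Argentina', 'Nonexistent']

result = titles.map { |title| [title, raw_pages[title]] }.to_h
# result.keys == ['Argentina', 'Nonexistent']; result['Nonexistent'] is nil
```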
#inspect ⇒ String
# File 'lib/infoboxer/media_wiki.rb', line 220

def inspect
  "#<#{self.class}(#{@api_base_url.host})>"
end
#prefixsearch(prefix, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receive a list of parsed MediaWiki pages whose titles start with the given prefix. See MediaWiki API docs for details.
# File 'lib/infoboxer/media_wiki.rb', line 215

def prefixsearch(prefix, limit: 'max', &processor)
  list(@api.query.generator(:prefixsearch).search(prefix), limit, &processor)
end
#raw(*titles, &processor) ⇒ Hash{String => Hash}
Receive "raw" data from Wikipedia (without parsing or wrapping in classes).
# File 'lib/infoboxer/media_wiki.rb', line 73

def raw(*titles, &processor)
  # could emerge on "automatically" created page lists, should work
  return {} if titles.empty?

  titles.each_slice(50).map do |part|
    request = prepare_request(@api.query.titles(*part), &processor)
    response = request.response

    # If additional props are required, there may be additional pages, even despite each_slice(50)
    response = response.continue while response.continue?

    sources = response['pages'].values.map { |page| [page['title'], page] }.to_h
    redirects =
      if response['redirects']
        response['redirects'].map { |r| [r['from'], sources[r['to']]] }.to_h
      else
        {}
      end

    # This way for 'Einstein' query we'll have {'Albert Einstein' => page, 'Einstein' => same page}
    sources.merge(redirects)
  end.inject(:merge)
end
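The redirect handling at the end of #raw can be illustrated with stubbed data (shapes assumed from the code above, no network involved): after the merge, both the canonical title and the requested redirecting title are keys pointing at the same page hash:

```ruby
# Stubbed fragments of a MediaWiki API response
sources = { 'Albert Einstein' => { 'title' => 'Albert Einstein' } }
redirect_entries = [{ 'from' => 'Einstein', 'to' => 'Albert Einstein' }]

# Map each redirect's source title onto the already-fetched target page
redirects = redirect_entries.map { |r| [r['from'], sources[r['to']]] }.to_h
merged = sources.merge(redirects)
# merged has keys 'Albert Einstein' and 'Einstein', both mapping to the same page
```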
#search(query, limit: 'max', &processor) ⇒ Tree::Nodes<Page>
Receive a list of parsed MediaWiki pages for the provided search query. See MediaWiki API docs for details.
# File 'lib/infoboxer/media_wiki.rb', line 198

def search(query, limit: 'max', &processor)
  list(@api.query.generator(:search).search(query), limit, &processor)
end