Class: Infoboxer::MediaWiki
Inherits: Object
  - Object
  - Infoboxer::MediaWiki
Defined in:
  - lib/infoboxer/media_wiki.rb
  - lib/infoboxer/media_wiki/page.rb
  - lib/infoboxer/media_wiki/traits.rb
Overview
MediaWiki client class.
Usage:
client = Infoboxer::MediaWiki.new('http://en.wikipedia.org/w/api.php', user_agent: 'My Own Project')
page = client.get('Argentina')
Consider using shortcuts like #wiki, #wikipedia, #wp and so on instead of direct instantiation of this class (although you can instantiate it directly if you want to!)
Defined Under Namespace
Classes: Page, Traits
Constant Summary
- UA = "Infoboxer/#{Infoboxer::VERSION} (https://github.com/molybdenum-99/infoboxer; [email protected])"
  Default Infoboxer User-Agent header. You can set yours as an option to Infoboxer#wiki and its shortcuts, or to #initialize.
Class Attribute Summary
-
.user_agent ⇒ Object
User agent getter/setter.
Instance Attribute Summary
-
#api_base_url ⇒ Object
readonly
Returns the value of attribute api_base_url.
-
#traits ⇒ Object
readonly
Returns the value of attribute traits.
Instance Method Summary
-
#category(title) ⇒ Tree::Nodes<Page>
Receive list of parsed MediaWiki pages from specified category.
-
#get(*titles) ⇒ Tree::Nodes<Page>
Receive list of parsed MediaWiki pages for list of titles provided.
-
#get_h(*titles) ⇒ Hash<String, Page>
Same as #get, but returns hash of title => page.
-
#initialize(api_base_url, options = {}) ⇒ MediaWiki
constructor
Creating new MediaWiki client.
- #inspect ⇒ Object
-
#prefixsearch(prefix) ⇒ Tree::Nodes<Page>
Receive list of parsed MediaWiki pages with titles starting with the given prefix.
-
#raw(*titles) ⇒ Array<Hash>
Receive "raw" data from Wikipedia (without parsing or wrapping in classes).
-
#search(query) ⇒ Tree::Nodes<Page>
Receive list of parsed MediaWiki pages for provided search query.
Constructor Details
#initialize(api_base_url, options = {}) ⇒ MediaWiki
Creates a new MediaWiki client. Infoboxer#wiki provides a shortcut for it, as well as shortcuts for some well-known wikis, like Infoboxer#wikipedia.
# File 'lib/infoboxer/media_wiki.rb', line 52

def initialize(api_base_url, options = {})
  @api_base_url = Addressable::URI.parse(api_base_url)
  @client = MediaWiktory::Client.new(api_base_url, user_agent: user_agent(options))
  @traits = Traits.get(@api_base_url.host, namespaces: extract_namespaces)
end
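The host of the parsed API URL is what selects site-specific Traits and what appears in #inspect output. A minimal sketch of that host extraction, using stdlib URI for illustration only (the real code uses Addressable::URI):

```ruby
require 'uri'

# Parse the API endpoint and pull out the host, as the constructor does
# before passing it to Traits.get.
api_base_url = URI.parse('https://en.wikipedia.org/w/api.php')
host = api_base_url.host
# host == "en.wikipedia.org"
```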
Class Attribute Details
.user_agent ⇒ Object
User agent getter/setter.
Default value is UA.
You can also set it per instance; see #initialize.
# File 'lib/infoboxer/media_wiki.rb', line 38

def user_agent
  @user_agent
end
Instance Attribute Details
#api_base_url ⇒ Object (readonly)
Returns the value of attribute api_base_url.
# File 'lib/infoboxer/media_wiki.rb', line 41

def api_base_url
  @api_base_url
end
#traits ⇒ Object (readonly)
Returns the value of attribute traits.
# File 'lib/infoboxer/media_wiki.rb', line 41

def traits
  @traits
end
Instance Method Details
#category(title) ⇒ Tree::Nodes<Page>
Receive list of parsed MediaWiki pages from specified category.
NB: currently, this API always fetches all pages in the category; there is no option to take only the first N pages. Pages are fetched in 50-page batches and then parsed, so for a large category it can take quite a while to fetch everything.
# File 'lib/infoboxer/media_wiki.rb', line 149

def category(title)
  title = normalize_category_title(title)
  list(categorymembers: {title: title, limit: 50})
end
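The private normalize_category_title helper is not shown in this document; as a hypothetical sketch, it likely ensures the title carries the category namespace prefix. The 'Category' prefix here is an assumption for an English-language wiki (the real helper consults the wiki's localized namespace names via Traits):

```ruby
# Hypothetical normalization: prepend the namespace prefix only when
# it is not already present.
def normalize_category_title(title, prefix: 'Category')
  title.start_with?("#{prefix}:") ? title : "#{prefix}:#{title}"
end

normalize_category_title('Countries in South America')
# => "Category:Countries in South America"
```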
#get(*titles) ⇒ Tree::Nodes<Page>
Receive list of parsed MediaWiki pages for the list of titles provided. All pages are received with a single query to the MediaWiki API.
NB: if you request more than 50 titles at once (the MediaWiki limit for a single request), Infoboxer will perform as many queries as necessary to fetch them all — that is, (titles.count / 50.0).ceil requests.
# File 'lib/infoboxer/media_wiki.rb', line 102

def get(*titles)
  pages = raw(*titles).
    tap { |pages| pages.detect(&:invalid?).tap { |i| i && fail(i.raw.invalidreason) } }.
    select(&:exists?).
    map { |raw| Page.new(self, Parser.paragraphs(raw.content, traits), raw) }
  titles.count == 1 ? pages.first : Tree::Nodes[*pages]
end
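The batching arithmetic from the note above can be sketched in plain Ruby: a single MediaWiki query accepts at most 50 titles, so N titles need (N / 50.0).ceil requests.

```ruby
# 120 titles split into 50-title batches -> 3 API requests.
titles = (1..120).map { |i| "Page #{i}" }
batches = titles.each_slice(50).to_a
batches.size               # => 3
(titles.count / 50.0).ceil # => 3
```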
#get_h(*titles) ⇒ Hash<String, Page>
Same as #get, but returns hash of title => page.
Useful quirks:
- when a requested page does not exist, its key will still be present in the resulting hash (with nil as the value);
- when a requested page redirects to another, the key will still be the requested title. For example, get_h('Einstein') will return a hash with key 'Einstein' and the page titled 'Albert Einstein'.
This allows you to stay in full control of which pages of a large list you've received.
# File 'lib/infoboxer/media_wiki.rb', line 128

def get_h(*titles)
  pages = [*get(*titles)]
  titles.map { |t|
    [t, pages.detect { |p| p.source.alt_titles.map(&:downcase).include?(t.downcase) }]
  }.to_h
end
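The case-insensitive matching #get_h relies on can be sketched in isolation: each requested title is matched against a page's alternative titles (requested name, redirect source, canonical title). PageStub below is a stand-in for the real page object, used only for illustration.

```ruby
# Each requested title maps either to the page whose alt_titles contain
# it (case-insensitively), or to nil when no page matched.
PageStub = Struct.new(:title, :alt_titles)
pages = [PageStub.new('Albert Einstein', ['Einstein', 'Albert Einstein'])]
requested = ['einstein', 'No Such Page']
result = requested.map { |t|
  [t, pages.detect { |p| p.alt_titles.map(&:downcase).include?(t.downcase) }]
}.to_h
result['einstein'].title # => "Albert Einstein"
result['No Such Page']   # => nil
```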
#inspect ⇒ Object
# File 'lib/infoboxer/media_wiki.rb', line 192

def inspect
  "#<#{self.class}(#{@api_base_url.host})>"
end
#prefixsearch(prefix) ⇒ Tree::Nodes<Page>
Receive list of parsed MediaWiki pages with titles starting with the given prefix. See MediaWiki API docs for details.
NB: currently, this API always fetches all matching pages; there is no option to take only the first N results. Pages are fetched in 100-page batches and then parsed, so for a broad prefix it can take quite a while to fetch everything.
# File 'lib/infoboxer/media_wiki.rb', line 188

def prefixsearch(prefix)
  list(prefixsearch: {search: prefix, limit: 100})
end
#raw(*titles) ⇒ Array<Hash>
Receive "raw" data from Wikipedia (without parsing or wrapping in classes).
# File 'lib/infoboxer/media_wiki.rb', line 62

def raw(*titles)
  return [] if titles.empty? # could emerge on "automatically" created page lists, should work

  titles.each_slice(50).map { |part|
    @client.query.
      titles(*part).
      prop(revisions: {prop: :content}, info: {prop: :url}).
      redirects(true). # FIXME: should be done transparently by MediaWiktory?
      perform.pages
  }.inject(:concat). # somehow flatten(1) fails!
    sort_by { |page|
      res_title = page.alt_titles.detect { |t| titles.map(&:downcase).include?(t.downcase) } # FIXME?..
      titles.index(res_title) || 1_000
    }
end
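The final sort_by step in #raw can be sketched on its own: API responses may come back in any order, so pages are re-sorted to match the order of the requested titles, matching case-insensitively via alt_titles. PageStub below is a stand-in for the real response object, used only for illustration.

```ruby
# Pages arrive in API order ('Bolivia' first); sorting by the index of
# the matched requested title restores the caller's order. Unmatched
# pages sink to the end via the 1_000 fallback.
PageStub = Struct.new(:alt_titles)
titles = ['Argentina', 'Bolivia']
pages = [PageStub.new(['Bolivia']), PageStub.new(['Argentina'])]
sorted = pages.sort_by { |page|
  res_title = page.alt_titles.detect { |t| titles.map(&:downcase).include?(t.downcase) }
  titles.index(res_title) || 1_000
}
sorted.map { |p| p.alt_titles.first } # => ["Argentina", "Bolivia"]
```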
#search(query) ⇒ Tree::Nodes<Page>
Receive list of parsed MediaWiki pages for the provided search query. See MediaWiki API docs for details.
NB: currently, this API always fetches all search results; there is no option to take only the first N. Pages are fetched in 50-page batches and then parsed, so for a broad query it can take quite a while to fetch everything.
# File 'lib/infoboxer/media_wiki.rb', line 171

def search(query)
  list(search: {search: query, limit: 50})
end