Class: Sunflower
- Inherits: Object
- Defined in: lib/sunflower/core.rb, lib/sunflower/list.rb
Overview
Main class. To start working, create a new Sunflower:
s = Sunflower.new('en.wikipedia.org')
Then log in:
s.login('Username','password')
If you have run setup, you can simply use:
s = Sunflower.new.login
Then you can request data from the API using the #API method.
To log data to a file, use the #log method (it works like puts). Use RestClient.log = <io> to log all HTTP requests.
You can use multiple Sunflowers at once to work on multiple wikis.
Defined Under Namespace
Constant Summary
- VERSION = '0.5.11'
- INVALID_CHARS = %w(# < > [ ] | { })
- INVALID_CHARS_REGEX = Regexp.union *INVALID_CHARS
- @@siteinfo = {}
  Used by #initialize to cache siteinfo data.
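#login rejects any username that matches INVALID_CHARS_REGEX. The standalone sketch below rebuilds the two constants exactly as listed above, to show what the regex catches. valid_username? is a hypothetical helper and is not part of Sunflower.

```ruby
# Rebuilt from the constant summary: characters not allowed in page titles.
INVALID_CHARS = %w(# < > [ ] | { })
# Regexp.union escapes each string, so the regex matches any one of them literally.
INVALID_CHARS_REGEX = Regexp.union(*INVALID_CHARS)

# Hypothetical helper mirroring the username check in #login.
def valid_username?(user)
  !user.match?(INVALID_CHARS_REGEX)
end

puts valid_username?('Example bot')  # prints true
puts valid_username?('Bad|Name')     # prints false
```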
Instance Attribute Summary
- #always_do_code_cleanup ⇒ Object
  Whether to run #code_cleanup when calling #save.
- #api_endpoint ⇒ Object (readonly)
  The URL of the wiki's api.php. It is either given as the :api_endpoint option to #initialize or discovered automatically.
- #log(message) ⇒ Object
  Log message to a file named log.txt in the current directory, if logging is enabled.
- #siteinfo ⇒ Object
  Siteinfo, as returned by the API call.
- #summary ⇒ Object
  Summary used when saving edits with this Sunflower.
- #username ⇒ Object (readonly)
  Username if logged in; nil otherwise.
- #warnings ⇒ Object (writeonly)
  Whether to output warning messages (using Kernel#warn).
- #wikiURL ⇒ Object (readonly)
  The wiki this Sunflower works on. This is the URL passed to #initialize, or the domain resolved from a shorthand identifier.
Class Method Summary
- .path ⇒ Object
  Path to the user data file.
- .read_userdata ⇒ Object
  Returns an array of [url, username, password], or nil if userdata is unavailable or invalid.
- .resolve_wikimedia_id(id) ⇒ Object
  Used by #initialize to convert short identifiers such as “b:pl” to domains such as “pl.wikibooks.org”.
Instance Method Summary
- #API(request) ⇒ Object
  Call the API.
- #API_continued(request, merge_on, xxcontinue, limit = nil) ⇒ Object
  Call the API repeatedly while more results are available, and merge the responses.
- #cleanup_title(title, preserve_case = false, preserve_colon = false) ⇒ Object
  Cleans up underscores, percent-encoding and title-casing in title (with optional anchor).
- #initialize(url = nil, opts = {}) ⇒ Sunflower (constructor)
  Initialize a new Sunflower working on a wiki with the given URL, e.g. “pl.wikipedia.org”.
- #inspect ⇒ Object
- #is_bot? ⇒ Boolean
  Whether this user (if logged in) has bot rights.
- #log? ⇒ Boolean
  Whether logging is enabled.
- #logged_in? ⇒ Boolean
  Whether we are logged in.
- #login(user = '', password = '') ⇒ Object
  Log in using the given credentials.
- #make_list(type, key, opts = {}) ⇒ Object
  Makes a list of articles.
- #ns_canon_for(ns) ⇒ Object
  Like #ns_local_for, but returns the canonical (English) name.
- #ns_local_for(ns) ⇒ Object
  Returns the localized namespace name for ns, which may be a namespace number, canonical name, or any namespace alias.
- #ns_regex_for(ns) ⇒ Object
  Returns a regular expression that will match the given namespace.
- #page(title) ⇒ Object
  Returns a Sunflower::Page with the given title belonging to this Sunflower.
- #warnings? ⇒ Boolean
  Whether warning messages are output.
Constructor Details
#initialize(url = nil, opts = {}) ⇒ Sunflower
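Initialize a new Sunflower working on a wiki with the given URL, e.g. “pl.wikipedia.org”. url can also be a shorthand identifier such as “b:pl”; see Sunflower.resolve_wikimedia_id for details. If url is omitted, it is read from the user data file (see Sunflower.read_userdata). An options hash may also be passed as the only argument.
There is currently one option available:
- api_endpoint: full URL to your api.php. If it is omitted and url is a domain, the endpoint is discovered from the EditURI (RSD) link in the page served at url. If it is omitted and url is a shorthand identifier, it defaults to http://<domain>/w/api.php (standard for WMF wikis).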
# File 'lib/sunflower/core.rb', line 142
def initialize url=nil, opts={}
  if url.is_a? Hash
    url, opts = nil, url
  end

  if !url
    userdata = Sunflower.read_userdata()

    if userdata
      url = userdata[0]
    else
      raise Sunflower::Error, 'initialize: no URL supplied and no userdata found!'
    end
  end

  # find out the base URL for this wiki and its API endpoint
  # we joyfully assume that all URLs contain at least a single dot, which is incorrect, but oh well
  if url.include?('.')
    # a regular external wiki; use the RSD discovery mechanism to find out the endpoint
    @wikiURL = url
    # let's not pull in a HTML parsing library, this regex will do
    @api_endpoint = opts[:api_endpoint] || RestClient.get(@wikiURL).to_str[/<link rel="EditURI" type="application\/rsd\+xml" href="([^"]+)\?action=rsd"/, 1]
  else
    # probably a Wikimedia wiki shorthand
    @wikiURL = Sunflower.resolve_wikimedia_id(url)
    @api_endpoint = opts[:api_endpoint] || 'http://'+@wikiURL+'/w/api.php'
  end

  # handle protocol-relative URLs
  u = URI.parse(@api_endpoint)
  u.scheme ||= URI.parse(@wikiURL).scheme || 'http'
  @api_endpoint = u.to_s

  @warnings = true
  @log = false
  @loggedin = false
  @username = nil
  @is_bot = false
  @cookies = {}

  siprop = 'general|namespaces|namespacealiases|specialpagealiases|magicwords|interwikimap|dbrepllag|statistics|usergroups|extensions|fileextensions|rightsinfo|languages|skins|extensiontags|functionhooks|showhooks|variables'
  @@siteinfo[@api_endpoint] ||= self.API(action: 'query', meta: 'siteinfo', siprop: siprop)['query']
  @siteinfo = @@siteinfo[@api_endpoint]

  _build_ns_map
end
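When url is a domain, #initialize scrapes the wiki's HTML for the RSD EditURI link. It then fills in a scheme if the link is protocol-relative. The sketch below isolates those two steps on a static HTML string so it runs without network access. The names discover_api_endpoint and default_scheme are hypothetical. In Sunflower itself, the fallback scheme is taken from the wiki URL, or 'http'.

```ruby
require 'uri'

# Same pattern #initialize uses to find the EditURI <link> in a wiki's HTML.
RSD_LINK = /<link rel="EditURI" type="application\/rsd\+xml" href="([^"]+)\?action=rsd"/

# Hypothetical helper: returns the api.php URL found in +html+, or nil if there is
# no EditURI link. Protocol-relative hrefs ("//host/...") get +default_scheme+.
def discover_api_endpoint(html, default_scheme = 'http')
  href = html[RSD_LINK, 1]
  return nil unless href

  uri = URI.parse(href)
  uri.scheme ||= default_scheme
  uri.to_s
end

html = '<link rel="EditURI" type="application/rsd+xml" href="//pl.wikipedia.org/w/api.php?action=rsd"/>'
puts discover_api_endpoint(html, 'https')  # prints https://pl.wikipedia.org/w/api.php
```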
Instance Attribute Details
#always_do_code_cleanup ⇒ Object
Whether to run #code_cleanup when calling #save.
# File 'lib/sunflower/core.rb', line 68
def always_do_code_cleanup
  @always_do_code_cleanup
end
#api_endpoint ⇒ Object (readonly)
The URL of the wiki's api.php. It is either given as the :api_endpoint option to #initialize or discovered automatically.
# File 'lib/sunflower/core.rb', line 70
def api_endpoint
  @api_endpoint
end
#log(message) ⇒ Object
Log message to a file named log.txt in the current directory, if logging is enabled. See #log= / #log?.
# File 'lib/sunflower/core.rb', line 346
def log message
  File.open('log.txt','a'){|f| f.puts message } if @log
end
#siteinfo ⇒ Object
Siteinfo, as returned by the API call (action=query&meta=siteinfo).
# File 'lib/sunflower/core.rb', line 72
def siteinfo
  @siteinfo
end
#summary ⇒ Object
Summary used when saving edits with this Sunflower.
# File 'lib/sunflower/core.rb', line 66
def summary
  @summary
end
#username ⇒ Object (readonly)
Username if logged in; nil otherwise.
# File 'lib/sunflower/core.rb', line 77
def username
  @username
end
#warnings=(value) ⇒ Object (writeonly)
Whether to output warning messages (using Kernel#warn). Defaults to true.
# File 'lib/sunflower/core.rb', line 83
def warnings=(value)
  @warnings = value
end
#wikiURL ⇒ Object (readonly)
The wiki this Sunflower works on. This is the URL passed to #initialize, or the domain resolved from a shorthand identifier.
# File 'lib/sunflower/core.rb', line 70
def wikiURL
  @wikiURL
end
Class Method Details
.path ⇒ Object
Path to the user data file (sunflower-userdata in the user's home directory).
# File 'lib/sunflower/core.rb', line 49
def self.path
  File.join(ENV['HOME'], 'sunflower-userdata')
end
.read_userdata ⇒ Object
Returns an array of [url, username, password], or nil if userdata is unavailable or invalid.
# File 'lib/sunflower/core.rb', line 54
def self.read_userdata
  data = nil
  data = File.read(Sunflower.path).split(/\r?\n/).map{|i| i.strip} rescue nil

  if data && data.length==3 && data.all?{|a| a and a != ''}
    return data
  else
    return nil
  end
end
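Sunflower.read_userdata expects the file at Sunflower.path to hold exactly three non-empty lines: the wiki URL, the username and the password. Below is a minimal sketch of the same validation applied to a string instead of the file. parse_userdata and the sample credentials are hypothetical.

```ruby
# Mirrors the checks in Sunflower.read_userdata, but takes the file contents as a string.
# Returns [url, username, password], or nil unless there are exactly three non-empty lines.
def parse_userdata(text)
  data = text.split(/\r?\n/).map(&:strip)
  return nil unless data.length == 3 && data.all? { |line| !line.empty? }
  data
end

sample = "pl.wikipedia.org\nExampleBot\nhunter2\n"
p parse_userdata(sample)          # => ["pl.wikipedia.org", "ExampleBot", "hunter2"]
p parse_userdata("only-a-url\n")  # => nil
```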
.resolve_wikimedia_id(id) ⇒ Object
Used by #initialize to convert short identifiers such as “b:pl” to domains such as “pl.wikibooks.org”. Identifier is of the format “type:lang” or “lang:type” (see below for valid values).
Either or both parts can be omitted; the default type is “w” and the default lang is “en”. (Since clashes are impossible, the colon can be omitted in such cases as well.)
lang can be any valid language code. It is ignored for type == “meta” or “commons”.
Valid values for type are the same as used for inter-wiki links, that is:
- w: Wikipedia
- b: Wikibooks
- n: Wikinews
- q: Wikiquote
- s: Wikisource
- v: Wikiversity
- wikt: Wiktionary
- species: Wikispecies
- commons: Wikimedia Commons
- meta: Wikimedia Meta-Wiki
# File 'lib/sunflower/core.rb', line 109
def self.resolve_wikimedia_id id
  keys = id.split(':').select{|a| a and !a.empty? }
  raise ArgumentError, 'invalid format' if keys.length > 2

  type_map = {
    'b' => 'XX.wikibooks.org',
    'q' => 'XX.wikiquote.org',
    'n' => 'XX.wikinews.org',
    'w' => 'XX.wikipedia.org',
    'wikt' => 'XX.wiktionary.org',
    'species' => 'XX.wikispecies.org',
    'v' => 'XX.wikiversity.org',
    's' => 'XX.wikisource.org',
    'commons' => 'commons.wikimedia.org',
    'meta' => 'meta.wikimedia.org',
  }

  types, langs = keys.partition{|a| type_map.keys.include? a }
  type = types.first || 'w'
  lang = langs.first || 'en'

  return type_map[type].sub 'XX', lang
end
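To try the identifier rules described above without installing the gem, here is a standalone copy of the method's logic, with the same type map and the same defaults. resolve_id is a hypothetical name used only for this sketch.

```ruby
# Standalone copy of the resolution logic in Sunflower.resolve_wikimedia_id.
TYPE_MAP = {
  'b' => 'XX.wikibooks.org', 'q' => 'XX.wikiquote.org', 'n' => 'XX.wikinews.org',
  'w' => 'XX.wikipedia.org', 'wikt' => 'XX.wiktionary.org', 'species' => 'XX.wikispecies.org',
  'v' => 'XX.wikiversity.org', 's' => 'XX.wikisource.org',
  'commons' => 'commons.wikimedia.org', 'meta' => 'meta.wikimedia.org',
}.freeze

def resolve_id(id)
  keys = id.split(':').reject(&:empty?)
  raise ArgumentError, 'invalid format' if keys.length > 2

  # Whichever part is a known type is the type; the other part is the language.
  types, langs = keys.partition { |k| TYPE_MAP.key?(k) }
  type = types.first || 'w'
  lang = langs.first || 'en'
  TYPE_MAP[type].sub('XX', lang)
end

%w[b:pl pl:b de wikt commons:fr].each { |id| puts "#{id} -> #{resolve_id(id)}" }
```

Because the type is recognized by lookup rather than by position, “b:pl” and “pl:b” resolve to the same domain.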
Instance Method Details
#API(request) ⇒ Object
Call the API. Returns the parsed JSON response as a hash. request can be either a URL-encoded query string or a hash of parameters; format=json is added automatically.
# File 'lib/sunflower/core.rb', line 218
def API request
  if request.is_a? String
    request += '&format=json'
  elsif request.is_a? Hash
    request = request.merge({format:'json'})
  end

  resp = RestClient.post(
    @api_endpoint,
    request,
    {:user_agent => "Sunflower #{VERSION} alpha", :cookies => @cookies}
  )
  JSON.parse resp.to_str
end
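#API treats the two request forms the same way except for how format=json is attached. Below is a minimal sketch of just that normalization step, without the HTTP call. with_json_format is a hypothetical helper name.

```ruby
# Attach format=json the way Sunflower#API does, for both request forms.
def with_json_format(request)
  case request
  when String then request + '&format=json'
  when Hash   then request.merge(format: 'json')
  else request
  end
end

puts with_json_format('action=query&meta=siteinfo')
# prints action=query&meta=siteinfo&format=json
with_json_format({ action: 'query', meta: 'siteinfo' })[:format]  # => "json"
```

So s.API('action=query&meta=siteinfo') and s.API(action: 'query', meta: 'siteinfo') (the form #initialize uses) describe the same request.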
#API_continued(request, merge_on, xxcontinue, limit = nil) ⇒ Object
Call the API. While more results are available via the xxcontinue parameter, call it again.
Assumes action=query.
Gathers all API responses and merges them into the response that would have been returned if the limit were infinite. The response hashes are merged recursively using Hash#sunflower_recursive_merge!, and the merged hash is returned. merge_on is the key of the response that contains the continuation data.
If limit is given, performs no more than this many API calls before returning. If limit is 1, behaves exactly like #API.
Example: get a list of all pages linking to Main Page:
sunflower.API_continued "action=query&list=backlinks&bllimit=max&bltitle=Main_Page", 'backlinks', 'blcontinue'
# File 'lib/sunflower/core.rb', line 248
def API_continued request, merge_on, xxcontinue, limit=nil
  out = []

  # gather
  res = self.API(request)
  out << res
  while res['query-continue'] and (!limit || out.length < limit)
    api_endpoint = if request.is_a? String
      request + "&#{xxcontinue}=#{res["query-continue"][merge_on][xxcontinue]}"
    elsif request.is_a? Hash
      request.merge({xxcontinue => res["query-continue"][merge_on][xxcontinue]})
    end

    res = self.API(api_endpoint)
    out << res
  end

  # merge
  merged = out[0]
  out.drop(1).each do |cur|
    merged.sunflower_recursive_merge! cur
  end

  return merged
end
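The continuation loop is easier to follow with the HTTP layer stubbed out. In this sketch, FAKE_API is a hypothetical lambda that stands in for #API. deep_merge is an assumed stand-in for Hash#sunflower_recursive_merge!: hashes are merged recursively, arrays are concatenated and other values are overwritten. The real method's rules may differ. Like the original, the sketch follows the legacy query-continue format, and it only handles Hash requests.

```ruby
# Assumed stand-in for Hash#sunflower_recursive_merge! (not its actual implementation).
def deep_merge(a, b)
  a.merge(b) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge(old_val, new_val)
    elsif old_val.is_a?(Array) && new_val.is_a?(Array)
      old_val + new_val
    else
      new_val
    end
  end
end

# Keep calling +api+ while the last response carries query-continue data
# (and fewer than +limit+ calls have been made), then merge all responses.
def fetch_all(api, request, merge_on, xxcontinue, limit = nil)
  responses = [api.call(request)]
  while responses.last['query-continue'] && (!limit || responses.length < limit)
    token = responses.last['query-continue'][merge_on][xxcontinue]
    responses << api.call(request.merge(xxcontinue => token))
  end
  responses.reduce { |merged, res| deep_merge(merged, res) }
end

# Hypothetical API stub that serves backlinks in two batches.
FAKE_API = lambda do |req|
  if req['blcontinue']
    { 'query' => { 'backlinks' => [{ 'title' => 'B' }] } }
  else
    { 'query' => { 'backlinks' => [{ 'title' => 'A' }] },
      'query-continue' => { 'backlinks' => { 'blcontinue' => '0|42' } } }
  end
end

result = fetch_all(FAKE_API, { 'list' => 'backlinks' }, 'backlinks', 'blcontinue')
p result['query']['backlinks'].map { |page| page['title'] }  # => ["A", "B"]
```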
#cleanup_title(title, preserve_case = false, preserve_colon = false) ⇒ Object
Cleans up underscores, percent-encoding and title-casing in title (with optional anchor).
# File 'lib/sunflower/core.rb', line 351
def cleanup_title title, preserve_case=false, preserve_colon=false
  # strip unicode bidi junk
  title = title.gsub /[\u200e\u200f\u202a\u202b\u202c\u202d\u202e]/, ''
  # strip unicode spaces
  title = title.gsub /[\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/, ' '

  return '' if title.strip == ''

  name, anchor = title.split '#', 2

  # CGI.unescape also changes pluses to spaces; code borrowed from there
  unescape = lambda{|a| a.gsub(/((?:%[0-9a-fA-F]{2})+)/){ [$1.delete('%')].pack('H*').force_encoding($1.encoding) } }

  ns = nil
  name = unescape.call(name).gsub(/[ _]+/, ' ').strip
  anchor = unescape.call(anchor.gsub(/\.([0-9a-fA-F]{2})/, '%\1')).gsub(/[ _]+/, ' ').strip if anchor

  leading_colon = name[0]==':'
  name = name.sub(/^:\s*/, '') if leading_colon
  leading_colon = false if !preserve_colon

  # FIXME unicode? downcase, upcase

  if name.include? ':'
    maybe_ns, part_name = name.split ':', 2
    if ns_id = @namespace_to_id[maybe_ns.strip.downcase]
      ns, name = @namespace_id_to_local[ns_id], part_name.strip
    end
  end

  name[0] = name[0].upcase if !preserve_case and @siteinfo["general"]["case"] == "first-letter"

  return [leading_colon ? ':' : nil, ns ? "#{ns}:" : nil, name, anchor ? "##{anchor}" : nil].join ''
end
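The core of #cleanup_title is a percent-decoder that, unlike CGI.unescape, leaves + alone, followed by collapsing underscores and spaces. The sketch below is a simplified, standalone version. It skips namespace, bidi and leading-colon handling, and it assumes the wiki's “first-letter” case setting. percent_decode and simple_cleanup_title are hypothetical names.

```ruby
# Decode runs of %XX sequences without turning '+' into a space (unlike CGI.unescape).
def percent_decode(str)
  str.gsub(/((?:%[0-9a-fA-F]{2})+)/) do
    [Regexp.last_match(1).delete('%')].pack('H*').force_encoding(str.encoding)
  end
end

# Simplified title cleanup; assumes "first-letter" casing and ignores namespaces.
def simple_cleanup_title(title)
  name, anchor = title.split('#', 2)
  name = percent_decode(name).gsub(/[ _]+/, ' ').strip
  name[0] = name[0].upcase unless name.empty?
  if anchor
    # Anchors may use ".XX" escaping, which is rewritten to "%XX" before decoding.
    anchor = percent_decode(anchor.gsub(/\.([0-9a-fA-F]{2})/, '%\1')).gsub(/[ _]+/, ' ').strip
  end
  anchor ? "#{name}##{anchor}" : name
end

puts simple_cleanup_title('main_page')           # prints Main page
puts simple_cleanup_title('foo%20bar#See_also')  # prints Foo bar#See also
puts simple_cleanup_title('C++_(language)')      # prints C++ (language)
```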
#inspect ⇒ Object
# File 'lib/sunflower/core.rb', line 191
def inspect
  "#<Sunflower #{@loggedin ? @username : "[anon]"}@#{@wikiURL}#{@is_bot ? ' [bot]' : ''}>"
end
#is_bot? ⇒ Boolean
Whether this user (if logged in) has bot rights.
# File 'lib/sunflower/core.rb', line 80
def is_bot?; @is_bot; end
#log? ⇒ Boolean
# File 'lib/sunflower/core.rb', line 88
def log?; @log; end
#logged_in? ⇒ Boolean
Whether we are logged in.
# File 'lib/sunflower/core.rb', line 75
def logged_in?; @loggedin; end
#login(user = '', password = '') ⇒ Object
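Log in using the given credentials. If user or password is empty, the missing values are read from the user data file (see Sunflower.read_userdata). Returns self.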
# File 'lib/sunflower/core.rb', line 280
def login user='', password=''
  if user=='' || password==''
    userdata = Sunflower.read_userdata()

    if userdata
      user = userdata[1] if user==''
      password = userdata[2] if password==''
    else
      raise Sunflower::Error, 'login: no user/pass supplied and no userdata found!'
    end
  end

  raise Sunflower::Error, 'bad username!' if user =~ INVALID_CHARS_REGEX

  # 1. get the login token
  response = RestClient.post(
    @api_endpoint,
    "action=login&lgname=#{CGI.escape user}&lgpassword=#{CGI.escape password}&format=json",
    {:user_agent => "Sunflower #{VERSION} alpha"}
  )
  @cookies = response.cookies
  raise Sunflower::Error, 'unable to log in (no cookies received)!' if !@cookies or @cookies.empty?
  json = JSON.parse response.to_str
  token, prefix = (json['login']['lgtoken']||json['login']['token']), json['login']['cookieprefix']

  # 2. actually log in
  response = RestClient.post(
    @api_endpoint,
    "action=login&lgname=#{CGI.escape user}&lgpassword=#{CGI.escape password}&lgtoken=#{token}&format=json",
    {:user_agent => "Sunflower #{VERSION} alpha", :cookies => @cookies}
  )
  json = JSON.parse response.to_str

  @cookies = @cookies.merge(response.cookies)
  raise Sunflower::Error, 'unable to log in (no cookies received)!' if !@cookies or @cookies.empty?

  # 3. confirm you did log in by checking the watchlist.
  @loggedin=true
  r=self.API('action=query&list=watchlistraw')
  if r['error'] && r['error']['code']=='wrnotloggedin'
    @loggedin=false
    raise Sunflower::Error, 'unable to log in!'
  end

  # set the username
  @username = user

  # 4. check bot rights
  r=self.API('action=query&list=allusers&aulimit=1&augroup=bot&aufrom='+(CGI.escape user))
  unless r['query']['allusers'][0] && r['query']['allusers'][0]['name']==user
    warn 'Sunflower - this user does not have bot rights!' if @warnings
    @is_bot=false
  else
    @is_bot=true
  end

  return self
end
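Steps 1 and 2 of #login form a two-request handshake against api.php. The sketch below only builds the two request bodies and extracts the token, so it runs offline. The helper names are hypothetical. URI.encode_www_form is used in place of the CGI.escape calls above; the two differ only in minor details, such as the handling of * and ~. Unlike the code above, the sketch also form-encodes the token, so characters such as + survive the round trip.

```ruby
require 'uri'

# Step 1: credentials only; the reply is expected to carry a login token and cookies.
def login_token_request(user, password)
  URI.encode_www_form(action: 'login', lgname: user, lgpassword: password, format: 'json')
end

# Step 2: the same credentials plus the token, sent back with the step-1 cookies.
def login_confirm_request(user, password, token)
  URI.encode_www_form(action: 'login', lgname: user, lgpassword: password,
                      lgtoken: token, format: 'json')
end

# Token lookup as in step 1 of #login: 'lgtoken' first, then 'token'.
def extract_login_token(json)
  json['login']['lgtoken'] || json['login']['token']
end

puts login_token_request('Example bot', 'p@ss')
# prints action=login&lgname=Example+bot&lgpassword=p%40ss&format=json
```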
#make_list(type, key, opts = {}) ⇒ Object
Makes a list of articles. Returns a Sunflower::List of titles. Old list type names are still accepted, with a deprecation warning.
# File 'lib/sunflower/list.rb', line 221
def make_list type, key, opts={}
  begin
    return Sunflower::List.new self, type, key, opts
  rescue Sunflower::Error => e
    if e.message == "no such list type available: #{type}"
      backwards_compat = {
        :categorieson => :categories_on,
        :categoryrecursive => :category_recursive,
        :categoryr => :category_recursive,
        :linkson => :links_on,
        :templateson => :templates_on,
        :transclusionson => :templates_on,
        :usercontribs => :contribs,
        :whatlinksto => :whatlinkshere,
        :whattranscludes => :whatembeds,
        :imageusage => :image_usage,
        :image => :image_usage,
        :searchtitles => :search_titles,
        :external => :linksearch,
        :regex => :grep,
        :regexp => :grep,
      }

      if type2 = backwards_compat[type.to_s.downcase.gsub(/[^a-z]/, '').to_sym]
        warn "warning: #{type} has been renamed to #{type2}, old name will be removed in v0.6"
        Sunflower::List.new self, type2, key, opts
      else
        raise e
      end
    else
      raise e
    end
  end
end
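The fallback in #make_list normalizes the requested type name before looking it up: it lowercases the name and drops everything except a–z. As a result, spellings like 'Category-Recursive' still find their renamed type. A minimal sketch of that lookup using a subset of the table above; renamed_list_type is a hypothetical helper.

```ruby
# A subset of the rename table used by Sunflower#make_list.
RENAMED_LIST_TYPES = {
  categoryrecursive: :category_recursive,
  whatlinksto: :whatlinkshere,
  image: :image_usage,
  regex: :grep,
}.freeze

# Normalize an old list type name the way make_list does and return its new
# name, or nil if the type was never renamed.
def renamed_list_type(type)
  RENAMED_LIST_TYPES[type.to_s.downcase.gsub(/[^a-z]/, '').to_sym]
end

p renamed_list_type('Category-Recursive')  # => :category_recursive
p renamed_list_type(:whatlinksto)          # => :whatlinkshere
p renamed_list_type(:grep)                 # => nil
```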
#ns_canon_for(ns) ⇒ Object
Like #ns_local_for, but returns the canonical (English) name.
# File 'lib/sunflower/core.rb', line 399
def ns_canon_for ns
  case ns
  when Numeric
    @namespace_id_to_canon[ns.to_i]
  when String
    @namespace_id_to_canon[ @namespace_to_id[cleanup_title(ns).downcase] ]
  end
end
#ns_local_for(ns) ⇒ Object
Returns the localized namespace name for ns, which may be a namespace number, canonical name, or any namespace alias.
Returns nil if passed an invalid namespace.
# File 'lib/sunflower/core.rb', line 389
def ns_local_for ns
  case ns
  when Numeric
    @namespace_id_to_local[ns.to_i]
  when String
    @namespace_id_to_local[ @namespace_to_id[cleanup_title(ns).downcase] ]
  end
end
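#ns_local_for and #ns_canon_for share a two-step lookup. First, any number, name or alias is resolved to a namespace id. Then that id is mapped to the local or canonical name. The sketch below uses small hypothetical tables; Sunflower builds its real tables from the wiki's siteinfo. As a simplification, names are only stripped and downcased rather than run through #cleanup_title.

```ruby
# Hypothetical namespace tables (illustrative contents only).
NAMESPACE_TO_ID = { 'user' => 2, 'użytkownik' => 2, 'talk' => 1, 'dyskusja' => 1 }.freeze
NAMESPACE_ID_TO_LOCAL = { 1 => 'Dyskusja', 2 => 'Użytkownik' }.freeze
NAMESPACE_ID_TO_CANON = { 1 => 'Talk', 2 => 'User' }.freeze

# Resolve a namespace number or any name/alias to its id; nil for anything else.
def ns_id_for(ns)
  case ns
  when Numeric then ns.to_i
  when String then NAMESPACE_TO_ID[ns.strip.downcase]
  end
end

def ns_local_for(ns)
  NAMESPACE_ID_TO_LOCAL[ns_id_for(ns)]
end

def ns_canon_for(ns)
  NAMESPACE_ID_TO_CANON[ns_id_for(ns)]
end

puts ns_local_for('User')      # prints Użytkownik
puts ns_canon_for('Dyskusja')  # prints Talk
p ns_local_for('Nonexistent')  # => nil
```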
#ns_regex_for(ns) ⇒ Object
Returns a regular expression that will match the given namespace. Rules for input are the same as for #ns_local_for.
Does NOT handle percent-encoding and underscores. Use #cleanup_title to canonicalize the namespace first.
# File 'lib/sunflower/core.rb', line 411
def ns_regex_for ns
  id = ns.is_a?(Numeric) ? ns.to_i : @namespace_to_id[cleanup_title(ns).downcase]
  return nil if !id

  /#{@namespace_to_id.to_a.select{|a| a[1] == id }.map{|a| Regexp.escape a[0] }.join '|' }/i
end
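#ns_regex_for builds a case-insensitive alternation of every name and alias that maps to the namespace id. The sketch below reproduces that construction over a hypothetical name-to-id table. It differs from the method in one respect: it also returns nil when no names map to the id. The resulting pattern is unanchored, so the usage anchors it at the start of a title before the colon.

```ruby
# Hypothetical name -> namespace id table (lowercased, as Sunflower stores it).
NS_NAMES = { 'user' => 2, 'użytkownik' => 2, 'talk' => 1 }.freeze

# Case-insensitive alternation of all names for namespace +id+; nil if there are none.
def ns_regex_for_id(id)
  names = NS_NAMES.select { |_name, ns_id| ns_id == id }.keys
  return nil if names.empty?
  /#{names.map { |n| Regexp.escape(n) }.join('|')}/i
end

user_ns = ns_regex_for_id(2)
# Interpolating a Regexp keeps its /i flag; anchor it to match only a title prefix.
title_re = /\A(?:#{user_ns}):/
p 'User:Example'.match?(title_re)        # => true
p 'Użytkownik:Example'.match?(title_re)  # => true
p 'Talk:Example'.match?(title_re)        # => false
```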
#page(title) ⇒ Object
Returns a Sunflower::Page with the given title belonging to this Sunflower.
# File 'lib/sunflower/core.rb', line 275
def page title
  Sunflower::Page.new title, self
end
#warnings? ⇒ Boolean
# File 'lib/sunflower/core.rb', line 84
def warnings?; @warnings; end