Class: Sunflower
- Inherits: Object
- Defined in: lib/sunflower/core.rb, lib/sunflower/list.rb
Overview
Main class. To start working, create a new Sunflower:
s = Sunflower.new('en.wikipedia.org')
And then log in:
s.login('Username','password')
If you have run setup, you can just use
s = Sunflower.new.login
Then you can request data from the API using the #API method.
To log data to a file, use the #log method (works like puts). Use RestClient.log=<io> to log all requests.
You can use multiple Sunflowers at once to work on multiple wikis.
Constant Summary
- VERSION = '0.5.13'
- USER_AGENT = "Sunflower #{VERSION} alpha <https://github.com/MatmaRex/Sunflower>"
- INVALID_CHARS = %w(# < > [ ] | { })
- INVALID_CHARS_REGEX = Regexp.union *INVALID_CHARS
- @@siteinfo = {}
  Used by #initialize to cache siteinfo data.
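As a rough illustration of how INVALID_CHARS_REGEX can be used (#login rejects usernames that match it), here is a minimal standalone sketch. The helper name valid_name? is hypothetical and not part of Sunflower; the two constants are copied from the summary above.

```ruby
# Same definitions as in the constant summary above.
INVALID_CHARS = %w(# < > [ ] | { })
INVALID_CHARS_REGEX = Regexp.union(*INVALID_CHARS)

# Hypothetical helper (not part of Sunflower): true if the name contains
# none of the characters listed in INVALID_CHARS.
def valid_name?(name)
  (name =~ INVALID_CHARS_REGEX).nil?
end

puts valid_name?('Example user')   # prints "true"
puts valid_name?('Bad[name]')      # prints "false"
```

Regexp.union escapes each string, so characters such as | and [ are matched literally.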
Instance Attribute Summary
- #always_do_code_cleanup ⇒ Object
  Whether to run #code_cleanup when calling #save.
- #api_endpoint ⇒ Object (readonly)
  The URL this Sunflower works on, as provided as argument to #initialize.
- #log(message) ⇒ Object
  Log message to a file named log.txt in current directory, if logging is enabled.
- #siteinfo ⇒ Object
  Siteinfo, as returned by API call.
- #summary ⇒ Object
  Summary used when saving edits with this Sunflower.
- #username ⇒ Object (readonly)
  Username if logged in; nil otherwise.
- #warnings ⇒ Object (writeonly)
  Whether to output warning messages (using Kernel#warn).
- #wikiURL ⇒ Object (readonly)
  The URL this Sunflower works on, as provided as argument to #initialize.
Class Method Summary
- .path ⇒ Object
  Path to user data file.
- .read_userdata ⇒ Object
  Returns array of [url, username, password], or nil if userdata is unavailable or invalid.
- .resolve_wikimedia_id(id) ⇒ Object
  Used by #initialize to convert short identifiers such as “b:pl” to domains such as “pl.wikibooks.org”.
Instance Method Summary
- #API(request) ⇒ Object
  Call the API.
- #API_continued(request, merge_on, xxcontinue, limit = nil) ⇒ Object
  Call the API.
- #cleanup_title(title, preserve_case = false, preserve_colon = false) ⇒ Object
  Cleans up underscores, percent-encoding and title-casing in title (with optional anchor).
- #initialize(url = nil, opts = {}) ⇒ Sunflower (constructor)
  Initialize a new Sunflower working on a wiki with the given URL, e.g. “pl.wikipedia.org”.
- #inspect ⇒ Object
- #is_bot? ⇒ Boolean
  Whether this user (if logged in) has bot rights.
- #log? ⇒ Boolean
- #logged_in? ⇒ Boolean
  Whether we are logged in.
- #login(user = '', password = '') ⇒ Object
  Log in using given info.
- #make_list(type, key, opts = {}) ⇒ Object
  Makes a list of articles.
- #ns_canon_for(ns) ⇒ Object
  Like #ns_local_for, but returns canonical (English) name.
- #ns_local_for(ns) ⇒ Object
  Returns the localized namespace name for ns, which may be namespace number, canonical name, or any namespace alias.
- #ns_regex_for(ns) ⇒ Object
  Returns a regular expression that will match given namespace.
- #page(title) ⇒ Object
  Returns a Sunflower::Page with the given title belonging to this Sunflower.
- #warnings? ⇒ Boolean
Constructor Details
#initialize(url = nil, opts = {}) ⇒ Sunflower
Initialize a new Sunflower working on a wiki with the given URL, e.g. “pl.wikipedia.org”. url can also be a shorthand identifier such as “b:pl”; see Sunflower.resolve_wikimedia_id for details.
There is currently one option available:
- api_endpoint: full URL to your api.php, if different from http://<url>/w/api.php (standard for WMF wikis)
# File 'lib/sunflower/core.rb', line 143
def initialize url=nil, opts={}
  if url.is_a? Hash
    url, opts = nil, url
  end

  if !url
    userdata = Sunflower.read_userdata()

    if userdata
      url = userdata[0]
    else
      raise Sunflower::Error, 'initialize: no URL supplied and no userdata found!'
    end
  end

  # find out the base URL for this wiki and its API endpoint
  # we joyfully assume that all URLs contain at least a single dot, which is incorrect, but oh well
  if url.include?('.')
    # a regular external wiki; use the RSD discovery mechanism to find out the endpoint
    @wikiURL = url
    # let's not pull in a HTML parsing library, this regex will do
    @api_endpoint = opts[:api_endpoint] || RestClient.get(@wikiURL).to_str[/<link rel="EditURI" type="application\/rsd\+xml" href="([^"]+)\?action=rsd"/, 1]
  else
    # probably a Wikimedia wiki shorthand
    @wikiURL = Sunflower.resolve_wikimedia_id(url)
    @api_endpoint = opts[:api_endpoint] || 'https://'+@wikiURL+'/w/api.php'
  end

  # handle protocol-relative URLs
  u = URI.parse(@api_endpoint)
  u.scheme ||= URI.parse(@wikiURL).scheme || 'http'
  @api_endpoint = u.to_s

  @warnings = true
  @log = false

  @loggedin = false
  @username = nil
  @is_bot = false

  @cookies = {}

  siprop = 'general|namespaces|namespacealiases|specialpagealiases|magicwords|interwikimap|dbrepllag|statistics|usergroups|extensions|fileextensions|rightsinfo|languages|skins|extensiontags|functionhooks|showhooks|variables'
  @@siteinfo[@api_endpoint] ||= self.API(action: 'query', meta: 'siteinfo', siprop: siprop)['query']
  @siteinfo = @@siteinfo[@api_endpoint]

  _build_ns_map
end
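The endpoint discovery in the constructor can be tried without any network access. The following is a minimal sketch, not the library's code: the helper discover_endpoint, the constant RSD_LINK and the host wiki.example.org are made up for illustration, and unlike the constructor the helper returns nil when no EditURI link is found.

```ruby
require 'uri'

# The same pattern the constructor uses to find the RSD link in a page.
RSD_LINK = /<link rel="EditURI" type="application\/rsd\+xml" href="([^"]+)\?action=rsd"/

# Hypothetical helper: extract the API endpoint from page HTML and give a
# protocol-relative URL ("//host/path") a scheme, as the constructor does.
def discover_endpoint(html, wiki_url)
  endpoint = html[RSD_LINK, 1]
  return nil unless endpoint           # assumption: return nil instead of failing

  u = URI.parse(endpoint)
  u.scheme ||= URI.parse(wiki_url).scheme || 'http'
  u.to_s
end

# Made-up HTML fragment such as a wiki page might contain.
html = '<link rel="EditURI" type="application/rsd+xml" href="//wiki.example.org/w/api.php?action=rsd" />'

puts discover_endpoint(html, 'wiki.example.org')          # prints "http://wiki.example.org/w/api.php"
puts discover_endpoint(html, 'https://wiki.example.org')  # prints "https://wiki.example.org/w/api.php"
```

A wiki URL given without a scheme parses as a bare path, so the fallback 'http' applies in the first call.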
Instance Attribute Details
#always_do_code_cleanup ⇒ Object
Whether to run #code_cleanup when calling #save.
# File 'lib/sunflower/core.rb', line 69
def always_do_code_cleanup
  @always_do_code_cleanup
end
#api_endpoint ⇒ Object (readonly)
The URL this Sunflower works on, as provided as argument to #initialize.
# File 'lib/sunflower/core.rb', line 71
def api_endpoint
  @api_endpoint
end
#log(message) ⇒ Object
Log message to a file named log.txt in current directory, if logging is enabled. See #log= / #log?.
# File 'lib/sunflower/core.rb', line 347
def log message
  File.open('log.txt','a'){|f| f.puts message} if @log
end
#siteinfo ⇒ Object
Siteinfo, as returned by API call.
# File 'lib/sunflower/core.rb', line 73
def siteinfo
  @siteinfo
end
#summary ⇒ Object
Summary used when saving edits with this Sunflower.
# File 'lib/sunflower/core.rb', line 67
def summary
  @summary
end
#username ⇒ Object (readonly)
Username if logged in; nil otherwise.
# File 'lib/sunflower/core.rb', line 78
def username
  @username
end
#warnings=(value) ⇒ Object (writeonly)
Whether to output warning messages (using Kernel#warn). Defaults to true.
# File 'lib/sunflower/core.rb', line 84
def warnings=(value)
  @warnings = value
end
#wikiURL ⇒ Object (readonly)
The URL this Sunflower works on, as provided as argument to #initialize.
# File 'lib/sunflower/core.rb', line 71
def wikiURL
  @wikiURL
end
Class Method Details
.path ⇒ Object
Path to user data file.
# File 'lib/sunflower/core.rb', line 50
def self.path
  File.join(ENV['HOME'], 'sunflower-userdata')
end
.read_userdata ⇒ Object
Returns array of [url, username, password], or nil if userdata is unavailable or invalid.
# File 'lib/sunflower/core.rb', line 55
def self.read_userdata
  data = nil
  data = File.read(Sunflower.path).split(/\r?\n/).map{|i| i.strip} rescue nil

  if data && data.length==3 && data.all?{|a| a and a != ''}
    return data
  else
    return nil
  end
end
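The code above implies that the userdata file holds exactly three non-empty lines: URL, username, password. Below is a minimal sketch of the same parsing step, assuming a hypothetical standalone function parse_userdata that takes the path as an argument (Sunflower itself always reads Sunflower.path). The file contents are made up.

```ruby
require 'tempfile'

# Hypothetical standalone version of the parsing done by read_userdata.
def parse_userdata(path)
  data = File.read(path).split(/\r?\n/).map { |line| line.strip } rescue nil

  if data && data.length == 3 && data.all? { |a| a && a != '' }
    data
  else
    nil
  end
end

# Write a throwaway file in the expected format and read it back.
Tempfile.create('sunflower-userdata') do |f|
  f.write("en.wikipedia.org\nExampleBot\nexample-password\n")
  f.flush
  p parse_userdata(f.path)   # prints ["en.wikipedia.org", "ExampleBot", "example-password"]
end
```

A missing file, a file with fewer or more than three lines, or a blank line all yield nil.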
.resolve_wikimedia_id(id) ⇒ Object
Used by #initialize to convert short identifiers such as “b:pl” to domains such as “pl.wikibooks.org”. Identifier is of the format “type:lang” or “lang:type” (see below for valid values).
Either or both parts can be omitted; default type is “w”, default lang is “en”. (Since clashes are impossible, the colon can be omitted in such cases as well.)
lang can be any valid language code. It is ignored for type == “meta” or “commons”.
Valid values for type are the same as used for inter-wiki links, that is:
- w: Wikipedia
- b: Wikibooks
- n: Wikinews
- q: Wikiquote
- s: Wikisource
- v: Wikiversity
- wikt: Wiktionary
- species: Wikispecies
- commons: Wikimedia Commons
- meta: Wikimedia Meta-Wiki
# File 'lib/sunflower/core.rb', line 110
def self.resolve_wikimedia_id id
  keys = id.split(':').select{|a| a and !a.empty? }

  raise ArgumentError, 'invalid format' if keys.length > 2

  type_map = {
    'b' => 'XX.wikibooks.org',
    'q' => 'XX.wikiquote.org',
    'n' => 'XX.wikinews.org',
    'w' => 'XX.wikipedia.org',
    'wikt' => 'XX.wiktionary.org',
    'species' => 'XX.wikispecies.org',
    'v' => 'XX.wikiversity.org',
    's' => 'XX.wikisource.org',
    'commons' => 'commons.wikimedia.org',
    'meta' => 'meta.wikimedia.org',
  }

  types, langs = keys.partition{|a| type_map.keys.include? a }
  type = types.first || 'w'
  lang = langs.first || 'en'

  return type_map[type].sub 'XX', lang
end
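To make the defaults concrete, here is a standalone sketch that mirrors the method above as a plain top-level function (so it runs without the gem) and shows what a few identifiers resolve to. The constant name TYPE_MAP is an assumption of this sketch; in the library the map is a local variable.

```ruby
# Copy of the type map from the method above.
TYPE_MAP = {
  'b' => 'XX.wikibooks.org',      'q' => 'XX.wikiquote.org',
  'n' => 'XX.wikinews.org',       'w' => 'XX.wikipedia.org',
  'wikt' => 'XX.wiktionary.org',  'species' => 'XX.wikispecies.org',
  'v' => 'XX.wikiversity.org',    's' => 'XX.wikisource.org',
  'commons' => 'commons.wikimedia.org',
  'meta' => 'meta.wikimedia.org',
}

# Standalone mirror of Sunflower.resolve_wikimedia_id, for illustration.
def resolve_wikimedia_id(id)
  keys = id.split(':').reject(&:empty?)
  raise ArgumentError, 'invalid format' if keys.length > 2

  # Anything that is a known type key is the type; the rest is the language.
  types, langs = keys.partition { |k| TYPE_MAP.key?(k) }
  type = types.first || 'w'
  lang = langs.first || 'en'
  TYPE_MAP[type].sub('XX', lang)
end

puts resolve_wikimedia_id('b:pl')     # prints "pl.wikibooks.org"
puts resolve_wikimedia_id('pl:b')     # prints "pl.wikibooks.org"
puts resolve_wikimedia_id('pl')       # prints "pl.wikipedia.org"
puts resolve_wikimedia_id('wikt')     # prints "en.wiktionary.org"
puts resolve_wikimedia_id('commons')  # prints "commons.wikimedia.org"
```

For “commons” and “meta” the map value contains no XX placeholder, which is why the language part is ignored.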
Instance Method Details
#API(request) ⇒ Object
Call the API. Returns a hash of JSON response. Request can be a HTTP request string or a hash.
# File 'lib/sunflower/core.rb', line 219
def API request
  if request.is_a? String
    request += '&format=json'
  elsif request.is_a? Hash
    request = request.merge({format:'json'})
  end

  resp = RestClient.post(
    @api_endpoint,
    request,
    {:user_agent => USER_AGENT, :cookies => @cookies}
  )
  JSON.parse resp.to_str
end
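The two accepted request forms differ only in how format=json is attached before the POST. A minimal sketch of that step in isolation, with no network access; the function name with_json_format is hypothetical and not part of Sunflower:

```ruby
# Hypothetical helper isolating the request preparation done by #API.
def with_json_format(request)
  case request
  when String then request + '&format=json'     # query string form
  when Hash   then request.merge(format: 'json') # hash form; caller's hash is not modified
  else request
  end
end

puts with_json_format('action=query&meta=siteinfo')
# prints "action=query&meta=siteinfo&format=json"

p with_json_format(action: 'query', meta: 'siteinfo')
```

Because the hash form goes through Hash#merge, a format key supplied by the caller is overridden with 'json'.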
#API_continued(request, merge_on, xxcontinue, limit = nil) ⇒ Object
Call the API. While more results are available via the xxcontinue parameter, call it again.
Assumes action=query.
Gathers all API responses and merges them into a single response approximating what would have been returned if the limit were infinite (the response hashes are merged recursively using Hash#sunflower_recursive_merge!). merge_on is the key of the response that contains the continuation data.
If limit is given, performs no more than this many API calls before returning. If limit is 1, behaves exactly like #API.
Example: get list of all pages linking to Main Page:
sunflower.API_continued "action=query&list=backlinks&bllimit=max&bltitle=Main_Page", 'backlinks', 'blcontinue'
# File 'lib/sunflower/core.rb', line 249
def API_continued request, merge_on, xxcontinue, limit=nil
  out = []

  # gather
  res = self.API(request)
  out << res
  while res['query-continue'] and (!limit || out.length < limit)
    api_endpoint = if request.is_a? String
      request + "&#{xxcontinue}=#{res["query-continue"][merge_on][xxcontinue]}"
    elsif request.is_a? Hash
      request.merge({xxcontinue => res["query-continue"][merge_on][xxcontinue]})
    end

    res = self.API(api_endpoint)
    out << res
  end

  # merge
  merged = out[0]
  out.drop(1).each do |cur|
    merged.sunflower_recursive_merge! cur
  end

  return merged
end
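The gather-then-merge loop can be exercised against a stubbed API. This is a sketch under stated assumptions: fetch_all, deep_merge! and fake_api are hypothetical names, the responses are made up, and deep_merge! only approximates Hash#sunflower_recursive_merge!, whose exact behavior is not shown on this page (here nested hashes are merged and arrays are concatenated).

```ruby
# Hypothetical stand-in for Hash#sunflower_recursive_merge!.
def deep_merge!(into, from)
  from.each do |key, value|
    if into[key].is_a?(Hash) && value.is_a?(Hash)
      deep_merge!(into[key], value)
    elsif into[key].is_a?(Array) && value.is_a?(Array)
      into[key].concat(value)
    else
      into[key] = value
    end
  end
  into
end

# Sketch of the continuation loop for hash requests; the block plays the role of #API.
def fetch_all(request, merge_on, xxcontinue, limit = nil, &api)
  out = [api.call(request)]
  while out.last['query-continue'] && (!limit || out.length < limit)
    token = out.last['query-continue'][merge_on][xxcontinue]
    out << api.call(request.merge(xxcontinue => token))
  end
  out.drop(1).each_with_object(out[0]) { |cur, merged| deep_merge!(merged, cur) }
end

# Stubbed two-page API: the first response carries a continuation token.
def fake_api(request)
  if request['blcontinue'] == 'page2'
    { 'query' => { 'backlinks' => [{ 'title' => 'B' }] } }
  else
    { 'query-continue' => { 'backlinks' => { 'blcontinue' => 'page2' } },
      'query' => { 'backlinks' => [{ 'title' => 'A' }] } }
  end
end

request = { 'action' => 'query', 'list' => 'backlinks' }
result = fetch_all(request, 'backlinks', 'blcontinue') { |req| fake_api(req) }
p result['query']['backlinks'].map { |b| b['title'] }   # prints ["A", "B"]
```

With limit set to 1 the loop body never runs, so only the first response is returned, matching the note that this behaves like #API.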
#cleanup_title(title, preserve_case = false, preserve_colon = false) ⇒ Object
Cleans up underscores, percent-encoding and title-casing in title (with optional anchor).
# File 'lib/sunflower/core.rb', line 352
def cleanup_title title, preserve_case=false, preserve_colon=false
  # strip unicode bidi junk
  title = title.gsub /[\u200e\u200f\u202a\u202b\u202c\u202d\u202e]/, ''
  # strip unicode spaces
  title = title.gsub /[\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/, ' '

  return '' if title.strip == ''

  name, anchor = title.split '#', 2

  # CGI.unescape also changes pluses to spaces; code borrowed from there
  unescape = lambda{|a| a.gsub(/((?:%[0-9a-fA-F]{2})+)/){ [$1.delete('%')].pack('H*').force_encoding($1.encoding) } }

  ns = nil
  name = unescape.call(name).gsub(/[ _]+/, ' ').strip
  anchor = unescape.call(anchor.gsub(/\.([0-9a-fA-F]{2})/, '%\1')).gsub(/[ _]+/, ' ').strip if anchor

  leading_colon = name[0]==':'
  name = name.sub(/^:\s*/, '') if leading_colon
  leading_colon = false if !preserve_colon

  # FIXME unicode? downcase, upcase

  if name.include? ':'
    maybe_ns, part_name = name.split ':', 2
    if ns_id = @namespace_to_id[maybe_ns.strip.downcase]
      ns, name = @namespace_id_to_local[ns_id], part_name.strip
    end
  end

  name[0] = name[0].upcase if !preserve_case and @siteinfo["general"]["case"] == "first-letter"

  return [leading_colon ? ':' : nil, ns ? "#{ns}:" : nil, name, anchor ? "##{anchor}" : nil].join ''
end
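The decoding steps above do not depend on siteinfo, so they can be tried in isolation. A minimal sketch, assuming hypothetical helper names (unescape_percent, normalize_name, normalize_anchor) that wrap the same expressions; namespace handling and first-letter capitalization are left out.

```ruby
# Percent-decoding as in cleanup_title: unlike CGI.unescape, "+" is left alone.
def unescape_percent(str)
  str.gsub(/((?:%[0-9a-fA-F]{2})+)/) do
    [$1.delete('%')].pack('H*').force_encoding($1.encoding)
  end
end

# Page name: decode, then collapse runs of spaces/underscores into one space.
def normalize_name(raw)
  unescape_percent(raw).gsub(/[ _]+/, ' ').strip
end

# Anchor: MediaWiki-style ".XX" escapes are first rewritten to "%XX".
def normalize_anchor(raw)
  unescape_percent(raw.gsub(/\.([0-9a-fA-F]{2})/, '%\1')).gsub(/[ _]+/, ' ').strip
end

puts normalize_name('Caf%C3%A9_au__lait ')   # prints "Café au lait"
puts normalize_name('C++')                   # prints "C++"
puts normalize_anchor('.C5.81.C3.B3d.C5.BA') # prints "Łódź"
```

Note that the anchor rewrite treats any dot followed by two hex digits as an escape, so an anchor that legitimately contains such a sequence would be altered.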
#inspect ⇒ Object
# File 'lib/sunflower/core.rb', line 192
def inspect
  "#<Sunflower #{@loggedin ? @username : "[anon]"}@#{@wikiURL}#{@is_bot ? ' [bot]' : ''}>"
end
#is_bot? ⇒ Boolean
Whether this user (if logged in) has bot rights.
# File 'lib/sunflower/core.rb', line 81
def is_bot?; @is_bot; end
#log? ⇒ Boolean
# File 'lib/sunflower/core.rb', line 89
def log?; @log; end
#logged_in? ⇒ Boolean
Whether we are logged in.
# File 'lib/sunflower/core.rb', line 76
def logged_in?; @loggedin; end
#login(user = '', password = '') ⇒ Object
Log in using given info.
# File 'lib/sunflower/core.rb', line 281
def login user='', password=''
  if user=='' || password==''
    userdata = Sunflower.read_userdata()

    if userdata
      user = userdata[1] if user==''
      password = userdata[2] if password==''
    else
      raise Sunflower::Error, 'login: no user/pass supplied and no userdata found!'
    end
  end

  raise Sunflower::Error, 'bad username!' if user =~ INVALID_CHARS_REGEX

  # 1. get the login token
  response = RestClient.post(
    @api_endpoint,
    "action=login&lgname=#{CGI.escape user}&lgpassword=#{CGI.escape password}&format=json",
    {:user_agent => USER_AGENT}
  )

  @cookies = response.cookies
  raise Sunflower::Error, 'unable to log in (no cookies received)!' if !@cookies or @cookies.empty?

  json = JSON.parse response.to_str
  token = json['login']['lgtoken'] || json['login']['token']

  # 2. actually log in
  response = RestClient.post(
    @api_endpoint,
    "action=login&lgname=#{CGI.escape user}&lgpassword=#{CGI.escape password}&lgtoken=#{CGI.escape token}&format=json",
    {:user_agent => USER_AGENT, :cookies => @cookies}
  )

  json = JSON.parse response.to_str

  @cookies = @cookies.merge(response.cookies)
  raise Sunflower::Error, 'unable to log in (no cookies received)!' if !@cookies or @cookies.empty?

  # 3. confirm you did log in by checking the watchlist.
  @loggedin=true
  r=self.API('action=query&list=watchlistraw')
  if r['error'] && r['error']['code']=='wrnotloggedin'
    @loggedin=false
    raise Sunflower::Error, 'unable to log in!'
  end

  # set the username
  @username = user

  # 4. check bot rights
  r=self.API('action=query&list=allusers&aulimit=1&augroup=bot&aufrom='+(CGI.escape user))
  unless r['query']['allusers'][0] && r['query']['allusers'][0]['name']==user
    warn 'Sunflower - this user does not have bot rights!' if @warnings
    @is_bot=false
  else
    @is_bot=true
  end

  return self
end
#make_list(type, key, opts = {}) ⇒ Object
Makes a list of articles. Returns array of titles.
# File 'lib/sunflower/list.rb', line 221
def make_list type, key, opts={}
  begin
    return Sunflower::List.new self, type, key, opts
  rescue Sunflower::Error => e
    if e.message == "no such list type available: #{type}"
      backwards_compat = {
        :categorieson => :categories_on,
        :categoryrecursive => :category_recursive,
        :categoryr => :category_recursive,
        :linkson => :links_on,
        :templateson => :templates_on,
        :transclusionson => :templates_on,
        :usercontribs => :contribs,
        :whatlinksto => :whatlinkshere,
        :whattranscludes => :whatembeds,
        :imageusage => :image_usage,
        :image => :image_usage,
        :searchtitles => :search_titles,
        :external => :linksearch,
        :regex => :grep,
        :regexp => :grep,
      }

      if type2 = backwards_compat[type.to_s.downcase.gsub(/[^a-z]/, '').to_sym]
        warn "warning: #{type} has been renamed to #{type2}, old name will be removed in v0.6"
        Sunflower::List.new self, type2, key, opts
      else
        raise e
      end
    else
      raise e
    end
  end
end
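The fallback lookup normalizes the old type name before consulting the rename table: it is lowercased and everything except the letters a-z is dropped. A small sketch of just that step, using an excerpt of the table above; the names BACKWARDS_COMPAT_EXCERPT and renamed_list_type are hypothetical.

```ruby
# Excerpt of the rename table from make_list (not the full list).
BACKWARDS_COMPAT_EXCERPT = {
  :categoryr   => :category_recursive,
  :whatlinksto => :whatlinkshere,
  :regexp      => :grep,
}

# Hypothetical helper: the normalization and lookup used for old type names.
def renamed_list_type(type)
  BACKWARDS_COMPAT_EXCERPT[type.to_s.downcase.gsub(/[^a-z]/, '').to_sym]
end

p renamed_list_type('Category_R')    # prints :category_recursive
p renamed_list_type(:whatLinksTo)    # prints :whatlinkshere
p renamed_list_type(:no_such_type)   # prints nil
```

In make_list this lookup only runs after Sunflower::List.new has already rejected the type, and a nil result re-raises the original error.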
#ns_canon_for(ns) ⇒ Object
Like #ns_local_for, but returns canonical (English) name.
# File 'lib/sunflower/core.rb', line 400
def ns_canon_for ns
  case ns
  when Numeric
    @namespace_id_to_canon[ns.to_i]
  when String
    @namespace_id_to_canon[ @namespace_to_id[cleanup_title(ns).downcase] ]
  end
end
#ns_local_for(ns) ⇒ Object
Returns the localized namespace name for ns, which may be namespace number, canonical name, or any namespace alias.
Returns nil if passed an invalid namespace.
# File 'lib/sunflower/core.rb', line 390
def ns_local_for ns
  case ns
  when Numeric
    @namespace_id_to_local[ns.to_i]
  when String
    @namespace_id_to_local[ @namespace_to_id[cleanup_title(ns).downcase] ]
  end
end
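Both namespace lookups follow the same two-step pattern: resolve any name or alias to a numeric id, then map the id to a display name. The sketch below illustrates this with made-up tables resembling a Polish-language wiki; the real tables are built from siteinfo by _build_ns_map, and the real methods normalize the input with #cleanup_title rather than the plain strip used here.

```ruby
# Made-up lookup tables (illustrative data only).
NAMESPACE_TO_ID       = { 'user' => 2, 'wikipedysta' => 2, 'template' => 10, 'szablon' => 10 }
NAMESPACE_ID_TO_LOCAL = { 2 => 'Wikipedysta', 10 => 'Szablon' }
NAMESPACE_ID_TO_CANON = { 2 => 'User', 10 => 'Template' }

# Shared lookup: numbers are used as ids directly, strings go through the alias table.
def lookup_ns(ns, table)
  case ns
  when Numeric then table[ns.to_i]
  when String  then table[NAMESPACE_TO_ID[ns.strip.downcase]]
  end
end

def ns_local_for(ns)
  lookup_ns(ns, NAMESPACE_ID_TO_LOCAL)
end

def ns_canon_for(ns)
  lookup_ns(ns, NAMESPACE_ID_TO_CANON)
end

puts ns_local_for('User')       # prints "Wikipedysta"
puts ns_local_for(10)           # prints "Szablon"
puts ns_canon_for('szablon')    # prints "Template"
p ns_local_for('No such ns')    # prints nil
```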
#ns_regex_for(ns) ⇒ Object
Returns a regular expression that will match given namespace. Rules for input like #ns_local_for.
Does NOT handle percent-encoding and underscores. Use #cleanup_title to canonicalize the namespace first.
# File 'lib/sunflower/core.rb', line 412
def ns_regex_for ns
  id = ns.is_a?(Numeric) ? ns.to_i : @namespace_to_id[cleanup_title(ns).downcase]
  return nil if !id

  /#{@namespace_to_id.to_a.select{|a| a[1] == id }.map{|a| Regexp.escape a[0] }.join '|' }/i
end
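The returned expression is a case-insensitive alternation of every known name for the namespace, and it is not anchored. A minimal sketch with a made-up alias table (NS_ALIASES and ns_regex_for_id are hypothetical names; only numeric ids are handled here):

```ruby
# Made-up alias table mapping lowercase names to namespace ids.
NS_ALIASES = { 'user' => 2, 'wikipedysta' => 2, 'template' => 10, 'szablon' => 10 }

# Hypothetical helper: build the alternation for one namespace id.
def ns_regex_for_id(id)
  names = NS_ALIASES.select { |_, ns_id| ns_id == id }.keys
  return nil if names.empty?

  /#{names.map { |n| Regexp.escape(n) }.join('|')}/i
end

re = ns_regex_for_id(2)
puts re.source                                     # prints "user|wikipedysta"
puts 'Wikipedysta:Example'.match?(/\A(?:#{re}):/)  # prints "true"
puts 'Szablon:Example'.match?(/\A(?:#{re}):/)      # prints "false"
```

Since the expression is unanchored, wrapping it as in the last two lines is one way to match only a namespace prefix at the start of a title.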
#page(title) ⇒ Object
Returns a Sunflower::Page with the given title belonging to this Sunflower.
# File 'lib/sunflower/core.rb', line 276
def page title
  Sunflower::Page.new title, self
end
#warnings? ⇒ Boolean
# File 'lib/sunflower/core.rb', line 85
def warnings?; @warnings; end