Class: Sunflower

Inherits:
Object
Defined in:
lib/sunflower/core.rb,
lib/sunflower/list.rb

Overview

Main class. To start working, create a new Sunflower:

s = Sunflower.new('en.wikipedia.org')

And then log in:

s.login('Username','password')

If you have run setup, you can just use:

s = Sunflower.new

Then you can request data from the API using the #API method.

To log data to a file, use the #log method (works like puts). Use RestClient.log=<io> to log all requests.

You can use multiple Sunflowers at once, to work on multiple wikis.

Defined Under Namespace

Classes: Error, List, Page

Constant Summary

VERSION =
'0.5.13'
USER_AGENT =
"Sunflower #{VERSION} alpha <https://github.com/MatmaRex/Sunflower>"
INVALID_CHARS =
%w(# < > [ ] | { })
INVALID_CHARS_REGEX =
Regexp.union *INVALID_CHARS
@@siteinfo =
{}

Used by #initialize to cache siteinfo data.

Constructor Details

#initialize(url = nil, opts = {}) ⇒ Sunflower

Initialize a new Sunflower working on a wiki with the given URL, for example “pl.wikipedia.org”. url can also be a shorthand identifier such as “b:pl”; see Sunflower.resolve_wikimedia_id for details.

There is currently one option available:

  • api_endpoint: full URL to your api.php, if different from https://<url>/w/api.php (standard for WMF wikis)



# File 'lib/sunflower/core.rb', line 143

def initialize url=nil, opts={}
	if url.is_a? Hash
		url, opts = nil, url
	end
	
	if !url
		userdata = Sunflower.read_userdata()
		
		if userdata
			url = userdata[0]
		else
			raise Sunflower::Error, 'initialize: no URL supplied and no userdata found!'
		end
	end
	
	# find out the base URL for this wiki and its API endpoint
	# we joyfully assume that all URLs contain at least a single dot, which is incorrect, but oh well
	if url.include?('.')
		# a regular external wiki; use the RSD discovery mechanism to find out the endpoint
		@wikiURL = url
		# let's not pull in a HTML parsing library, this regex will do
		@api_endpoint = opts[:api_endpoint] || RestClient.get(@wikiURL).to_str[/<link rel="EditURI" type="application\/rsd\+xml" href="([^"]+)\?action=rsd"/, 1]
	else
		# probably a Wikimedia wiki shorthand
		@wikiURL = Sunflower.resolve_wikimedia_id(url)
		@api_endpoint = opts[:api_endpoint] || 'https://'+@wikiURL+'/w/api.php'
	end
	
	# handle protocol-relative URLs
	u = URI.parse(@api_endpoint)
	u.scheme ||= URI.parse(@wikiURL).scheme || 'http'
	@api_endpoint = u.to_s
	
	@warnings = true
	@log = false
	
	@loggedin = false
	@username = nil
	@is_bot = false
	
	@cookies = {}
	
	siprop = 'general|namespaces|namespacealiases|specialpagealiases|magicwords|interwikimap|dbrepllag|statistics|usergroups|extensions|fileextensions|rightsinfo|languages|skins|extensiontags|functionhooks|showhooks|variables'
	@@siteinfo[@api_endpoint] ||= self.API(action: 'query', meta: 'siteinfo', siprop: siprop)['query']
	@siteinfo = @@siteinfo[@api_endpoint]
	
	_build_ns_map
end
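
The endpoint discovery and the protocol-relative handling in the constructor can be tried in isolation. The following is a minimal sketch, assuming a hard-coded HTML snippet in place of the RestClient.get response; example.org and the helper name discover_endpoint are illustrative and not part of Sunflower.

```ruby
require 'uri'

# Same pattern as in #initialize: capture the EditURI href up to "?action=rsd".
RSD_LINK = /<link rel="EditURI" type="application\/rsd\+xml" href="([^"]+)\?action=rsd"/

# Hypothetical helper: extract the endpoint from page HTML and fill in a
# missing scheme, falling back to 'http' as the constructor does.
def discover_endpoint(html, fallback_scheme = 'http')
  endpoint = html[RSD_LINK, 1]
  return nil unless endpoint

  uri = URI.parse(endpoint)
  uri.scheme ||= fallback_scheme
  uri.to_s
end

# A protocol-relative href, as some wikis emit it.
html = '<head><link rel="EditURI" type="application/rsd+xml" href="//example.org/w/api.php?action=rsd" /></head>'
puts discover_endpoint(html)
```

Because the regex expects the attributes in exactly this order, a page that writes the link tag differently yields nil; passing the api_endpoint option avoids the discovery step altogether.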

Instance Attribute Details

#always_do_code_cleanup ⇒ Object

Whether to run #code_cleanup when calling #save.



# File 'lib/sunflower/core.rb', line 69

def always_do_code_cleanup
  @always_do_code_cleanup
end

#api_endpoint ⇒ Object (readonly)

The full URL of the API endpoint (api.php) this Sunflower sends requests to, either discovered by #initialize or given via the api_endpoint option.



# File 'lib/sunflower/core.rb', line 71

def api_endpoint
  @api_endpoint
end

#log(message) ⇒ Object

Log message to a file named log.txt in the current directory, if logging is enabled. See #log= / #log?.



# File 'lib/sunflower/core.rb', line 347

def log message
	File.open('log.txt','a'){|f| f.puts message} if @log
end

#siteinfo ⇒ Object

Siteinfo, as returned by API call.



# File 'lib/sunflower/core.rb', line 73

def siteinfo
  @siteinfo
end

#summary ⇒ Object

Summary used when saving edits with this Sunflower.



# File 'lib/sunflower/core.rb', line 67

def summary
  @summary
end

#username ⇒ Object (readonly)

Username if logged in; nil otherwise.



# File 'lib/sunflower/core.rb', line 78

def username
  @username
end

#warnings=(value) ⇒ Object (writeonly)

Whether to output warning messages (using Kernel#warn). Defaults to true.



# File 'lib/sunflower/core.rb', line 84

def warnings=(value)
  @warnings = value
end

#wikiURL ⇒ Object (readonly)

The URL this Sunflower works on, as provided as argument to #initialize.



# File 'lib/sunflower/core.rb', line 71

def wikiURL
  @wikiURL
end

Class Method Details

.path ⇒ Object

Path to user data file.



# File 'lib/sunflower/core.rb', line 50

def self.path
	File.join(ENV['HOME'], 'sunflower-userdata')
end

.read_userdata ⇒ Object

Returns array of [url, username, password], or nil if userdata is unavailable or invalid.



# File 'lib/sunflower/core.rb', line 55

def self.read_userdata
	data = nil
	data = File.read(Sunflower.path).split(/\r?\n/).map{|i| i.strip} rescue nil
	
	if data && data.length==3 && data.all?{|a| a and a != ''}
		return data
	else
		return nil
	end
end
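
The parsing above implies a userdata file of exactly three non-empty lines: URL, username, password. A minimal sketch of that format, assuming a temporary file instead of the real path under HOME; parse_userdata is a hypothetical standalone mirror of the method, and the credentials are made up.

```ruby
require 'tempfile'

# Hypothetical mirror of .read_userdata that takes the path as an argument.
# Returns [url, username, password], or nil if the file is missing or invalid.
def parse_userdata(path)
  data = File.read(path).split(/\r?\n/).map(&:strip)
  data if data.length == 3 && data.none?(&:empty?)
rescue SystemCallError
  nil
end

Tempfile.create('sunflower-userdata') do |f|
  # One value per line; the trailing newline is harmless.
  f.write("en.wikipedia.org\nExampleBot\nnot-a-real-password\n")
  f.flush
  url, user, _password = parse_userdata(f.path)
  puts "#{user} on #{url}"
end
```

Note that the password is stored in plain text, so the file's permissions deserve some care.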

.resolve_wikimedia_id(id) ⇒ Object

Used by #initialize to convert short identifiers such as “b:pl” to domains such as “pl.wikibooks.org”. Identifier is of the format “type:lang” or “lang:type” (see below for valid values).

Either or both parts can be omitted; the default type is “w” and the default lang is “en”. (Since clashes are impossible, the colon can be omitted in such cases as well.)

lang can be any valid language code. It is ignored for type == “meta” or “commons”.

Valid values for type are the same as used for inter-wiki links, that is:

  • w: Wikipedia
  • b: Wikibooks
  • n: Wikinews
  • q: Wikiquote
  • s: Wikisource
  • v: Wikiversity
  • wikt: Wiktionary
  • species: Wikispecies
  • commons: Wikimedia Commons
  • meta: Wikimedia Meta-Wiki

Raises:

  • (ArgumentError)


# File 'lib/sunflower/core.rb', line 110

def self.resolve_wikimedia_id id
	keys = id.split(':').select{|a| a and !a.empty? }
	
	raise ArgumentError, 'invalid format' if keys.length > 2
	
	type_map = {
		'b' => 'XX.wikibooks.org',
		'q' => 'XX.wikiquote.org',
		'n' => 'XX.wikinews.org',
		'w' => 'XX.wikipedia.org',
		'wikt' => 'XX.wiktionary.org',
		'species' => 'XX.wikispecies.org',
		'v' => 'XX.wikiversity.org',
		's' => 'XX.wikisource.org',
		'commons' => 'commons.wikimedia.org',
		'meta' => 'meta.wikimedia.org',
	}
	
	types, langs = keys.partition{|a| type_map.keys.include? a }
	type = types.first || 'w'
	lang = langs.first || 'en'
	
	return type_map[type].sub 'XX', lang
end
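
A few identifiers and the domains they resolve to may make the rules concrete. The sketch below is a standalone mirror of the lookup for experimentation, so it runs without the gem; PROJECTS and resolve_id are hypothetical names, and with Sunflower loaded one would call Sunflower.resolve_wikimedia_id instead.

```ruby
# Same type map as in the method above, keyed by interwiki prefix.
PROJECTS = {
  'w' => 'XX.wikipedia.org',   'b' => 'XX.wikibooks.org',
  'n' => 'XX.wikinews.org',    'q' => 'XX.wikiquote.org',
  's' => 'XX.wikisource.org',  'v' => 'XX.wikiversity.org',
  'wikt' => 'XX.wiktionary.org', 'species' => 'XX.wikispecies.org',
  'commons' => 'commons.wikimedia.org', 'meta' => 'meta.wikimedia.org'
}.freeze

def resolve_id(id)
  keys = id.split(':').reject(&:empty?)
  raise ArgumentError, 'invalid format' if keys.length > 2

  # Whatever is a known type counts as the type; anything else is the language.
  types, langs = keys.partition { |k| PROJECTS.key?(k) }
  PROJECTS[types.first || 'w'].sub('XX', langs.first || 'en')
end

%w[b:pl pl:b de wikt commons].each do |id|
  puts "#{id} -> #{resolve_id(id)}"
end
```

Since the language is substituted without validation, a typo in the language code produces a domain that does not exist rather than an error.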

Instance Method Details

#API(request) ⇒ Object

Call the API. Returns the parsed JSON response as a hash. request can be a URL-encoded query string or a hash of parameters.



# File 'lib/sunflower/core.rb', line 219

def API request
	if request.is_a? String
		request += '&format=json'
	elsif request.is_a? Hash
		request = request.merge({format:'json'})
	end
	
	resp = RestClient.post(
		@api_endpoint,
		request,
		{:user_agent => USER_AGENT, :cookies => @cookies}
	)
	JSON.parse resp.to_str
end
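
The two accepted request forms are handled slightly differently before the POST. A minimal sketch of just that normalization step, with no network access; with_json_format is a hypothetical name for logic that lives inline in #API.

```ruby
# Force JSON output for either request form, as #API does before posting.
def with_json_format(request)
  case request
  when String then request + '&format=json'
  when Hash   then request.merge(format: 'json')
  else request
  end
end

p with_json_format('action=query&meta=siteinfo')
p with_json_format(action: 'query', meta: 'siteinfo')
```

A string request must already be URL-encoded by the caller, whereas a hash leaves parameter encoding to the HTTP client, which makes the hash form the safer default for values containing spaces or ampersands.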

#API_continued(request, merge_on, xxcontinue, limit = nil) ⇒ Object

Call the API. While more results are available via the xxcontinue parameter, call it again.

Assumes action=query.

Collects all API responses and merges them into a single response, approximating what would have been returned if the limit were infinite (the response hashes are merged recursively using Hash#sunflower_recursive_merge!). merge_on is the key of the response that contains the continuation data.

If limit is given, performs no more than this many API calls before returning. If limit is 1, behaves exactly like #API.

Example: get a list of all pages linking to Main Page:

sunflower.API_continued "action=query&list=backlinks&bllimit=max&bltitle=Main_Page", 'backlinks', 'blcontinue'


# File 'lib/sunflower/core.rb', line 249

def API_continued request, merge_on, xxcontinue, limit=nil
	out = []
	
	# gather
	res = self.API(request)
	out << res
	while res['query-continue'] and (!limit || out.length < limit)
		api_endpoint = if request.is_a? String
			request + "&#{xxcontinue}=#{res["query-continue"][merge_on][xxcontinue]}"
		elsif request.is_a? Hash
			request.merge({xxcontinue => res["query-continue"][merge_on][xxcontinue]})
		end
		
		res = self.API(api_endpoint)
		out << res
	end
	
	# merge
	merged = out[0]
	out.drop(1).each do |cur|
		merged.sunflower_recursive_merge! cur
	end
	
	return merged
end
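
The gather-then-merge loop can be exercised offline. The sketch below assumes canned responses in place of real API calls; fake_api, api_continued and deep_merge! are hypothetical names, and deep_merge! is only a guess at what Hash#sunflower_recursive_merge! does (merge nested hashes, concatenate arrays).

```ruby
# Assumed stand-in for Hash#sunflower_recursive_merge!; the real helper
# may differ in details.
def deep_merge!(target, other)
  other.each do |key, value|
    if target[key].is_a?(Hash) && value.is_a?(Hash)
      deep_merge!(target[key], value)
    elsif target[key].is_a?(Array) && value.is_a?(Array)
      target[key].concat(value)
    else
      target[key] = value
    end
  end
  target
end

# Two canned "pages" of results; the first carries continuation data.
def fake_api(request)
  case request['blcontinue']
  when nil
    { 'query' => { 'backlinks' => [{ 'title' => 'A' }] },
      'query-continue' => { 'backlinks' => { 'blcontinue' => 'p2' } } }
  when 'p2'
    { 'query' => { 'backlinks' => [{ 'title' => 'B' }] } }
  end
end

def api_continued(request, merge_on, xxcontinue, limit = nil)
  responses = [fake_api(request)]
  # Keep requesting while the last response says there is more.
  while (cont = responses.last['query-continue']) && (!limit || responses.length < limit)
    responses << fake_api(request.merge(xxcontinue => cont[merge_on][xxcontinue]))
  end
  responses.drop(1).each_with_object(responses.first) { |cur, merged| deep_merge!(merged, cur) }
end

request = { 'action' => 'query', 'list' => 'backlinks', 'bltitle' => 'Main_Page' }
p api_continued(request, 'backlinks', 'blcontinue')['query']['backlinks']
```

Unlike the method above, this sketch checks the continuation data of the most recent response explicitly; it also relies on the older query-continue response format that the documented code expects.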

#cleanup_title(title, preserve_case = false, preserve_colon = false) ⇒ Object

Cleans up underscores, percent-encoding and title-casing in title (with optional anchor).



# File 'lib/sunflower/core.rb', line 352

def cleanup_title title, preserve_case=false, preserve_colon=false
	# strip unicode bidi junk
	title = title.gsub /[\u200e\u200f\u202a\u202b\u202c\u202d\u202e]/, ''
	# strip unicode spaces
	title = title.gsub /[\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+/, ' '
	
	return '' if title.strip == ''
	
	name, anchor = title.split '#', 2
	
	# CGI.unescape also changes pluses to spaces; code borrowed from there
	unescape = lambda{|a| a.gsub(/((?:%[0-9a-fA-F]{2})+)/){ [$1.delete('%')].pack('H*').force_encoding($1.encoding) } }
	
	ns = nil
	name = unescape.call(name).gsub(/[ _]+/, ' ').strip
	anchor = unescape.call(anchor.gsub(/\.([0-9a-fA-F]{2})/, '%\1')).gsub(/[ _]+/, ' ').strip if anchor
	
	leading_colon = name[0]==':'
	name = name.sub(/^:\s*/, '') if leading_colon
	leading_colon = false if !preserve_colon
	
	# FIXME unicode? downcase, upcase
	
	if name.include? ':'
		maybe_ns, part_name = name.split ':', 2
		if ns_id = @namespace_to_id[maybe_ns.strip.downcase]
			ns, name = @namespace_id_to_local[ns_id], part_name.strip
		end
	end
	
	name[0] = name[0].upcase if !preserve_case and @siteinfo["general"]["case"] == "first-letter"
	
	return [leading_colon ? ':' : nil,  ns ? "#{ns}:" : nil,  name,  anchor ? "##{anchor}" : nil].join ''
end
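
The decoding steps are the least obvious part of this method, so here is a minimal sketch of them alone, without the namespace and capitalization handling that needs siteinfo. The helper names percent_decode, normalize_name and normalize_anchor are hypothetical.

```ruby
# Decode runs of percent-escapes the way the unescape lambda does.
# Unlike CGI.unescape, this leaves "+" untouched.
def percent_decode(text)
  text.gsub(/((?:%[0-9a-fA-F]{2})+)/) do
    [$1.delete('%')].pack('H*').force_encoding(text.encoding)
  end
end

# Page name: decode, then collapse underscores and spaces.
def normalize_name(name)
  percent_decode(name).gsub(/[ _]+/, ' ').strip
end

# Anchor: MediaWiki-style ".XX" escapes are turned into "%XX" first.
def normalize_anchor(anchor)
  percent_decode(anchor.gsub(/\.([0-9a-fA-F]{2})/, '%\1')).gsub(/[ _]+/, ' ').strip
end

puts normalize_name('C%2B%2B__language')
puts normalize_anchor('Caf.C3.A9_menu')
```

The anchor rule is greedy: any dot followed by two hex digits is treated as an escape, so an anchor that legitimately contains such a sequence will be altered.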

#inspect ⇒ Object



# File 'lib/sunflower/core.rb', line 192

def inspect
	"#<Sunflower #{@loggedin ? @username : "[anon]"}@#{@wikiURL}#{@is_bot ? ' [bot]' : ''}>"
end

#is_bot? ⇒ Boolean

Whether this user (if logged in) has bot rights.

Returns:

  • (Boolean)


# File 'lib/sunflower/core.rb', line 81

def is_bot?; @is_bot; end

#log? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/sunflower/core.rb', line 89

def log?; @log; end

#logged_in? ⇒ Boolean

Whether we are logged in.

Returns:

  • (Boolean)


# File 'lib/sunflower/core.rb', line 76

def logged_in?; @loggedin; end

#login(user = '', password = '') ⇒ Object

Log in using given info.

Raises:

  • (Sunflower::Error)

# File 'lib/sunflower/core.rb', line 281

def login user='', password=''
	if user=='' || password==''
		userdata = Sunflower.read_userdata()
		
		if userdata
			user = userdata[1] if user==''
			password = userdata[2] if password==''
		else
			raise Sunflower::Error, 'login: no user/pass supplied and no userdata found!'
		end
	end
	
	raise Sunflower::Error, 'bad username!' if user =~ INVALID_CHARS_REGEX
	
	
	# 1. get the login token
	response = RestClient.post(
		@api_endpoint, 
		"action=login&lgname=#{CGI.escape user}&lgpassword=#{CGI.escape password}&format=json",
		{:user_agent => USER_AGENT}
	)
	
	@cookies = response.cookies
	raise Sunflower::Error, 'unable to log in (no cookies received)!' if !@cookies or @cookies.empty?
	
	json = JSON.parse response.to_str
	token = json['login']['lgtoken'] || json['login']['token']
	
	# 2. actually log in
	response = RestClient.post(
		@api_endpoint,
		"action=login&lgname=#{CGI.escape user}&lgpassword=#{CGI.escape password}&lgtoken=#{CGI.escape token}&format=json",
		{:user_agent => USER_AGENT, :cookies => @cookies}
	)
	
	json = JSON.parse response.to_str
	
	@cookies = @cookies.merge(response.cookies)
	
	raise Sunflower::Error, 'unable to log in (no cookies received)!' if !@cookies or @cookies.empty?
	
	
	# 3. confirm you did log in by checking the watchlist.
	@loggedin=true
	r=self.API('action=query&list=watchlistraw')
	if r['error'] && r['error']['code']=='wrnotloggedin'
		@loggedin=false
		raise Sunflower::Error, 'unable to log in!'
	end
	
	# set the username
	@username = user
	
	# 4. check bot rights
	r=self.API('action=query&list=allusers&aulimit=1&augroup=bot&aufrom='+(CGI.escape user))
	unless r['query']['allusers'][0] && r['query']['allusers'][0]['name']==user
		warn 'Sunflower - this user does not have bot rights!' if @warnings
		@is_bot=false
	else
		@is_bot=true
	end
	
	return self
end
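
Both login requests send the same form body, the second one with the token added. A minimal offline sketch of the username check and body construction follows. It swaps in URI.encode_www_form from the standard library for the hand-built string with CGI.escape, raises ArgumentError where Sunflower raises Sunflower::Error, and uses the hypothetical names login_body and INVALID_USERNAME_CHARS.

```ruby
require 'uri'

# Same character set as INVALID_CHARS above.
INVALID_USERNAME_CHARS = Regexp.union(*%w(# < > [ ] | { }))

# Build the POST body for action=login; pass the token on the second request.
def login_body(user, password, token = nil)
  raise ArgumentError, 'bad username!' if user =~ INVALID_USERNAME_CHARS

  params = { action: 'login', lgname: user, lgpassword: password }
  params[:lgtoken] = token if token
  params[:format] = 'json'
  URI.encode_www_form(params)
end

puts login_body('Example Bot', 'p&ss=word')
puts login_body('Example Bot', 'p&ss=word', 'abc+\\')
```

Escaping matters here because passwords and tokens routinely contain characters such as & or +, which would otherwise corrupt the form body.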

#make_list(type, key, opts = {}) ⇒ Object

Makes a list of articles. Returns array of titles.



# File 'lib/sunflower/list.rb', line 221

def make_list type, key, opts={}
	begin
		return Sunflower::List.new self, type, key, opts
	rescue Sunflower::Error => e
		if e.message == "no such list type available: #{type}"
			backwards_compat = {
				:categorieson => :categories_on,
				:categoryrecursive => :category_recursive,
				:categoryr => :category_recursive,
				:linkson => :links_on,
				:templateson => :templates_on,
				:transclusionson => :templates_on,
				:usercontribs => :contribs,
				:whatlinksto => :whatlinkshere,
				:whattranscludes => :whatembeds,
				:imageusage => :image_usage,
				:image => :image_usage,
				:searchtitles => :search_titles,
				:external => :linksearch,
				:regex => :grep,
				:regexp => :grep,
			}
			
			if type2 = backwards_compat[type.to_s.downcase.gsub(/[^a-z]/, '').to_sym]
				warn "warning: #{type} has been renamed to #{type2}, old name will be removed in v0.6"
				Sunflower::List.new self, type2, key, opts
			else
				raise e
			end
		else
			raise e
		end
	end
end
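
The fallback normalizes the requested type before looking it up, so several spellings of an old name map to the same new one. A minimal sketch of that lookup with a shortened table; LEGACY_NAMES and modern_list_type are hypothetical names.

```ruby
# Excerpt of the backwards-compatibility table above.
LEGACY_NAMES = {
  categoryr: :category_recursive,
  whatlinksto: :whatlinkshere,
  regexp: :grep
}.freeze

# Lowercase the type, drop everything except a-z, then look up the new name.
# Returns nil when the type is not a known legacy name.
def modern_list_type(type)
  LEGACY_NAMES[type.to_s.downcase.gsub(/[^a-z]/, '').to_sym]
end

p modern_list_type('Category_R')
p modern_list_type(:what_links_to)
p modern_list_type(:category)
```

In #make_list this lookup only runs after Sunflower::List.new has rejected the type, so current names never pass through the table.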

#ns_canon_for(ns) ⇒ Object

Like #ns_local_for, but returns canonical (English) name.



# File 'lib/sunflower/core.rb', line 400

def ns_canon_for ns
	case ns
	when Numeric
		@namespace_id_to_canon[ns.to_i]
	when String
		@namespace_id_to_canon[ @namespace_to_id[cleanup_title(ns).downcase] ]
	end
end

#ns_local_for(ns) ⇒ Object

Returns the localized namespace name for ns, which may be namespace number, canonical name, or any namespace alias.

Returns nil if passed an invalid namespace.



# File 'lib/sunflower/core.rb', line 390

def ns_local_for ns
	case ns
	when Numeric
		@namespace_id_to_local[ns.to_i]
	when String
		@namespace_id_to_local[ @namespace_to_id[cleanup_title(ns).downcase] ]
	end
end

#ns_regex_for(ns) ⇒ Object

Returns a regular expression that will match given namespace. Rules for input like #ns_local_for.

Does NOT handle percent-encoding and underscores. Use #cleanup_title to canonicalize the namespace first.



# File 'lib/sunflower/core.rb', line 412

def ns_regex_for ns
	id = ns.is_a?(Numeric) ? ns.to_i : @namespace_to_id[cleanup_title(ns).downcase]
	return nil if !id
	
	/#{@namespace_to_id.to_a.select{|a| a[1] == id }.map{|a| Regexp.escape a[0] }.join '|' }/i
end
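
The namespace helpers all go through the same two lookup tables. The sketch below shows the idea with hand-written data for a Polish-language wiki; the tables and the names ns_id, ns_local and ns_regex are hypothetical, input is only stripped and lowercased rather than passed through #cleanup_title, and Sunflower builds the real maps from siteinfo.

```ruby
# Every known name or alias (lowercased) maps to a namespace id.
NAMESPACE_TO_ID = {
  'file' => 6, 'image' => 6, 'plik' => 6,
  'talk' => 1, 'dyskusja' => 1
}.freeze
ID_TO_LOCAL = { 6 => 'Plik', 1 => 'Dyskusja' }.freeze

def ns_id(ns)
  ns.is_a?(Numeric) ? ns.to_i : NAMESPACE_TO_ID[ns.strip.downcase]
end

# Localized name for a number, canonical name or alias; nil if unknown.
def ns_local(ns)
  ID_TO_LOCAL[ns_id(ns)]
end

# Case-insensitive alternation of every name that shares the namespace id.
def ns_regex(ns)
  id = ns_id(ns)
  return nil unless id

  names = NAMESPACE_TO_ID.select { |_, v| v == id }.keys
  /#{names.map { |n| Regexp.escape(n) }.join('|')}/i
end

p ns_local('image')
p ns_regex('Image')
# The regex is unanchored, so anchor it when matching a title prefix.
p 'Plik:Foo.jpg' =~ /\A(?:#{ns_regex('image')}):/
```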

#page(title) ⇒ Object

Returns a Sunflower::Page with the given title belonging to this Sunflower.



# File 'lib/sunflower/core.rb', line 276

def page title
	Sunflower::Page.new title, self
end

#warnings? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/sunflower/core.rb', line 85

def warnings?; @warnings; end