Class: Sunflower::Page

Inherits:
Object
Defined in:
lib/sunflower/core.rb,
lib/sunflower/commontasks.rb

Overview

Class representing a single wiki page. To load a specific page, use #new. To save it back, use #save.

Constant Summary

INVALID_CHARS =

Characters that MediaWiki does not permit in a page title.

%w(# < > [ ] | { })
INVALID_CHARS_REGEX =

Regex matching the characters that MediaWiki does not permit in a page title.

Regexp.union *INVALID_CHARS
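
As a rough illustration of how these two constants work together, the sketch below rebuilds them locally so that it runs without the gem. The helper valid_title? is hypothetical and not part of Sunflower; it only mirrors the check that #initialize and #save perform.

```ruby
# Local copies of the two constants, so the snippet runs on its own.
INVALID_CHARS = %w(# < > [ ] | { })
# Regexp.union escapes each string, so "[" and "|" are matched literally.
INVALID_CHARS_REGEX = Regexp.union(*INVALID_CHARS)

# Hypothetical helper (not part of Sunflower): true when the title
# contains none of the forbidden characters.
def valid_title?(title)
  !(title =~ INVALID_CHARS_REGEX)
end

puts valid_title?('Main Page')   # => true
puts valid_title?('Foo[bar]')    # => false
```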

Constructor Details

#initialize(title = '', url = '') ⇒ Page

Load the specified page. Only the text is loaded immediately; attributes and the edit token are loaded when needed, or when you call #preload_attrs.

If you are using multiple Sunflowers, you must specify which one this page belongs to via the second argument. You can pass a Sunflower object, a wiki URL, or a shorthand id as described in Sunflower.resolve_wikimedia_id.

Raises:

(Sunflower::Error)



# File 'lib/sunflower/core.rb', line 478

def initialize title='', url=''
  raise Sunflower::Error, 'title invalid: '+title if title =~ INVALID_CHARS_REGEX
  
  case url
  when Sunflower
    @sunflower = url
  when '', nil
    count = ObjectSpace.each_object(Sunflower){|o| @sunflower=o}
    raise Sunflower::Error, 'no Sunflowers present' if count==0
    raise Sunflower::Error, 'you must pass wiki name if using multiple Sunflowers at once' if count>1
  else
    url = (url.include?('.') ? url : Sunflower.resolve_wikimedia_id(url))
    ObjectSpace.each_object(Sunflower){|o| @sunflower=o if o.wikiURL==url}
    raise Sunflower::Error, "no Sunflower for #{url}" if !@sunflower
  end
  
  @title = @sunflower.cleanup_title title
  
  @preloaded_text = false
  @preloaded_attrs = false
end
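
The way the second argument is resolved can be sketched in isolation. The code below is a simplified stand-in, not the library's implementation: FakeSunflower, SHORTHANDS, and pick_sunflower are hypothetical names, the shorthand table is an assumption standing in for Sunflower.resolve_wikimedia_id, and an explicit list replaces the ObjectSpace scan.

```ruby
# Stand-in for a Sunflower instance; only the wikiURL reader matters here.
FakeSunflower = Struct.new(:wikiURL)

# Hypothetical shorthand table (assumption); the real lookup is
# Sunflower.resolve_wikimedia_id.
SHORTHANDS = { 'pl' => 'pl.wikipedia.org', 'en' => 'en.wikipedia.org' }

# Pick the instance a page belongs to, following the same three branches
# as the constructor: object given, nothing given, URL or shorthand given.
def pick_sunflower(arg, instances)
  case arg
  when FakeSunflower
    arg
  when '', nil
    raise ArgumentError, 'no Sunflowers present' if instances.empty?
    if instances.size > 1
      raise ArgumentError, 'you must pass wiki name if using multiple Sunflowers at once'
    end
    instances.first
  else
    # Anything containing a dot is treated as a URL, otherwise as a shorthand id.
    url = arg.include?('.') ? arg : SHORTHANDS.fetch(arg)
    instances.find { |s| s.wikiURL == url } ||
      raise(ArgumentError, "no Sunflower for #{url}")
  end
end

pl = FakeSunflower.new('pl.wikipedia.org')
en = FakeSunflower.new('en.wikipedia.org')
puts pick_sunflower('en', [pl, en]).wikiURL                # => en.wikipedia.org
puts pick_sunflower('pl.wikipedia.org', [pl, en]).wikiURL  # => pl.wikipedia.org
```

With a single instance the argument can be omitted; with several, leaving it out raises an error, which matches the second branch of the constructor.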

Instance Attribute Details

#counter ⇒ Object (readonly)

Value of the counter attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def counter
  @counter
end

#edittoken ⇒ Object (readonly)

Value of the edittoken attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def edittoken
  @edittoken
end

#lastrevid ⇒ Object (readonly)

Value of the lastrevid attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def lastrevid
  @lastrevid
end

#length ⇒ Object (readonly)

Value of the length attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def length
  @length
end

#ns ⇒ Object (readonly)

Value of the ns attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def ns
  @ns
end

#orig_text ⇒ Object (readonly)

The text of the page, as of when it was loaded. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 432

def orig_text
  @orig_text
end

#pageid ⇒ Object (readonly)

Value of the pageid attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def pageid
  @pageid
end

#preloaded_attrs ⇒ Object

Whether the page attributes are already loaded. Can be set to true to suppress loading (used, for example, by Sunflower::List#pages_preloaded).



# File 'lib/sunflower/core.rb', line 446

def preloaded_attrs
  @preloaded_attrs
end

#preloaded_text ⇒ Object

Whether the page text is already loaded. Can be set to true to suppress loading (used, for example, by Sunflower::List#pages_preloaded).



# File 'lib/sunflower/core.rb', line 446

def preloaded_text
  @preloaded_text
end

#protection ⇒ Object (readonly)

Value of the protection attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def protection
  @protection
end

#real_title ⇒ Object (readonly)

Value of the 'title' attribute, as returned by the API call prop=info for this page. Lazy-loaded. See #title.



# File 'lib/sunflower/core.rb', line 442

def real_title
  @real_title
end

#starttimestamp ⇒ Object (readonly)

Value of the starttimestamp attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def starttimestamp
  @starttimestamp
end

#sunflower ⇒ Object (readonly)

The Sunflower instance this page belongs to.



# File 'lib/sunflower/core.rb', line 427

def sunflower
  @sunflower
end

#text ⇒ Object

The current text of the page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 430

def text
  @text
end

#title ⇒ Object (readonly)

Page title, as passed to #initialize and cleaned by Sunflower#cleanup_title. The real page title, as canonicalized by MediaWiki, is available via #real_title (it should always be the same).



# File 'lib/sunflower/core.rb', line 437

def title
  @title
end

#touched ⇒ Object (readonly)

Value of the touched attribute, as returned by the API call prop=info for this page. Lazy-loaded.



# File 'lib/sunflower/core.rb', line 440

def touched
  @touched
end

Class Method Details

.get(title, wiki = '') ⇒ Object

Convenience constructor; equivalent to new(title, wiki).



# File 'lib/sunflower/core.rb', line 572

def self.get title, wiki=''
  self.new(title, wiki)
end

.load(title, wiki = '') ⇒ Object

Convenience constructor; equivalent to new(title, wiki).



# File 'lib/sunflower/core.rb', line 576

def self.load title, wiki=''
  self.new(title, wiki)
end

Instance Method Details

#append(txt, newlines = 2) ⇒ Object

Appends newlines followed by txt to the page text (2 newlines by default). Trailing whitespace of the existing text is stripped first.



# File 'lib/sunflower/commontasks.rb', line 18

def append txt, newlines=2
  self.text = self.text.rstrip + ("\n"*newlines) + txt
end
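
To make the whitespace handling concrete, here is a minimal sketch that applies the same expression to a plain string. The function append_text is a hypothetical stand-alone version, not a Sunflower method.

```ruby
# Hypothetical stand-alone version of the expression used by #append:
# strip trailing whitespace, add the separator newlines, then the new text.
def append_text(text, txt, newlines = 2)
  text.rstrip + ("\n" * newlines) + txt
end

page_text = "Intro paragraph.  \n\n\n"
p append_text(page_text, '[[Category:Example]]')
# => "Intro paragraph.\n\n[[Category:Example]]"
p append_text(page_text, '{{stub}}', 1)
# => "Intro paragraph.\n{{stub}}"
```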

#change_category(from, to) ⇒ Object

Replace the category given as from with the category given as to in the page wikitext.

Both arguments may be given with or without the Category: prefix (or its localised version).



# File 'lib/sunflower/commontasks.rb', line 103

def change_category from, to
  cat_regex = self.sunflower.ns_regex_for 'Category'
  from = self.sunflower.cleanup_title(from).sub(/^#{cat_regex}:/, '')
  to   = self.sunflower.cleanup_title(to  ).sub(/^#{cat_regex}:/, '')
  
  self.text.gsub!(/\[\[ *#{cat_regex} *: *#{Regexp.escape from} *(\||\]\])/){
    rest = $1
    "[[#{self.sunflower.ns_local_for 'Category'}:#{to}#{rest}"
  }
end
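
The substitution can be tried on a plain string. In the sketch below, change_category_in is a hypothetical helper, and the cat_regex and cat_local values are assumptions standing in for what Sunflower#ns_regex_for and Sunflower#ns_local_for would return on a Polish-language wiki; the call to Sunflower#cleanup_title is left out.

```ruby
# Hypothetical stand-alone version of the substitution in #change_category.
# cat_regex: regex source matching every spelling of the category namespace.
# cat_local: the local namespace name written into the result.
def change_category_in(text, from, to, cat_regex, cat_local)
  from = from.sub(/^#{cat_regex}:/, '')
  to   = to.sub(/^#{cat_regex}:/, '')
  text.gsub(/\[\[ *#{cat_regex} *: *#{Regexp.escape from} *(\||\]\])/) do
    # Keep whatever followed the name: "|" (sort key) or "]]".
    "[[#{cat_local}:#{to}#{Regexp.last_match(1)}"
  end
end

cat_regex = '(?:[Cc]ategory|[Kk]ategoria)' # assumed value
text = "Body.\n[[Category:Old name|sort key]]\n[[kategoria: Old name ]]\n"
p change_category_in(text, 'Category:Old name', 'New name', cat_regex, 'Kategoria')
# => "Body.\n[[Kategoria:New name|sort key]]\n[[Kategoria:New name]]\n"
```

The sort key after the pipe survives because only the part up to "|" or "]]" is rewritten.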

#code_cleanup ⇒ Object

Simple, safe code cleanup. Set Sunflower#always_do_code_cleanup to true to run it automatically just before saving a page.



# File 'lib/sunflower/commontasks.rb', line 76

def code_cleanup
  str = self.text.gsub /\r\n/, "\n"
  
  str.gsub!(/\[\[([^\|\]]+)(\||\]\])/){
    name, rest = $1, $2
    "[[#{self.sunflower.cleanup_title name, true, true}#{rest}"
  }
  
  # headings
  str.gsub!(/(^|\n)(=+) *([^=\n]*[^ :=\n])[ :]*=/, '\1\2 \3 ='); # =a= > = a =, =a:= > = a =
  str.gsub!(/(^|\n)(=+[^=\n]+=+)[\n]{2,}/, "\\1\\2\n"); # one newline

  # spaced lists
  str.gsub!(/(\n[#*:;]+)([^ \t\n#*:;{])/, '\1 \2');
  
  if wikiid = self.sunflower.siteinfo['general']['wikiid']
    if self.respond_to? :"code_cleanup_#{wikiid}"
      str = self.send :"code_cleanup_#{wikiid}", str
    end
  end
  
  self.text = str
end
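
The heading and list rules can be exercised without a wiki connection. The sketch below copies the line-ending, heading, and list substitutions into a hypothetical function tidy_wikitext; the link cleanup step is omitted because it depends on Sunflower#cleanup_title.

```ruby
# Hypothetical stand-alone subset of #code_cleanup: line endings,
# heading spacing, blank lines after headings, and list spacing.
def tidy_wikitext(text)
  str = text.gsub(/\r\n/, "\n")
  # =a= becomes = a =, and a trailing colon in the heading is dropped
  str.gsub!(/(^|\n)(=+) *([^=\n]*[^ :=\n])[ :]*=/, '\1\2 \3 =')
  # collapse two or more newlines after a heading into one
  str.gsub!(/(^|\n)(=+[^=\n]+=+)[\n]{2,}/, "\\1\\2\n")
  # put a space after list markers
  str.gsub!(/(\n[#*:;]+)([^ \t\n#*:;{])/, '\1 \2')
  str
end

p tidy_wikitext("==History:==\r\n\r\n\r\nText\r\n*item\r\n#second")
# => "== History ==\nText\n* item\n# second"
```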

#code_cleanup_plwiki(str) ⇒ Object

plwiki-specific cleanup routines, based on Nux’s cleaner: pl.wikipedia.org/wiki/Wikipedysta:Nux/wp_sk.js



# File 'lib/sunflower/commontasks.rb', line 30

def code_cleanup_plwiki str
  str = str.dup
  
  str.gsub!(/\{\{\{(?:poprzednik|następca|pop|nast|lata|info|lang)\|(.+?)\}\}\}/i,'\1')
  str.gsub!(/(={1,5})\s*Przypisy\s*\1\s*<references\s?\/>/i){
    if $1=='=' || $1=='=='
      '{{Przypisy}}'
    else
      '{{Przypisy|stopień= '+$1+'}}'
    end
  }
  
  # merge link abbreviations
  str.gsub!(/m\.? ?\[\[n\.? ?p\.? ?m\.?\]\]/, 'm [[n.p.m.]]');

  # date corrections: remove the unnecessary comma
  str.gsub!(/(\[\[[0-9]+ (stycznia|lutego|marca|kwietnia|maja|czerwca|lipca|sierpnia|września|października|listopada|grudnia)\]\]), (\[\[[0-9]{4}\]\])/i, '\1 \3');

  # link to centuries
  str.gsub!(/\[\[([XVI]{1,5}) [wW]\.?\]\]/, '[[\1 wiek|\1 w.]]');
  str.gsub!(/\[\[([XVI]{1,5}) [wW]\.?\|/, '[[\1 wiek|');
  str.gsub!(/\[\[(III|II|IV|VIII|VII|VI|IX|XIII|XII|XI|XIV|XV|XVIII|XVII|XVI|XIX|XXI|XX)\]\]/, '[[\1 wiek|\1]]');
  str.gsub!(/\[\[(III|II|IV|VIII|VII|VI|IX|XIII|XII|XI|XIV|XV|XVIII|XVII|XVI|XIX|XXI|XX)\|/, '[[\1 wiek|');

  # expand typical links
  str.gsub!(/\[\[ang\.\]\]/, '[[język angielski|ang.]]');
  str.gsub!(/\[\[cz\.\]\]/, '[[język czeski|cz.]]');
  str.gsub!(/\[\[fr\.\]\]/, '[[język francuski|fr.]]');
  str.gsub!(/\[\[łac\.\]\]/, '[[łacina|łac.]]');
  str.gsub!(/\[\[niem\.\]\]/, '[[język niemiecki|niem.]]');
  str.gsub!(/\[\[pol\.\]\]/, '[[język polski|pol.]]');
  str.gsub!(/\[\[pl\.\]\]/, '[[język polski|pol.]]');
  str.gsub!(/\[\[ros\.\]\]/, '[[język rosyjski|ros.]]');
  str.gsub!(/\[\[(((G|g)iga|(M|m)ega|(K|k)ilo)herc|[GMk]Hz)\|/, '[[herc|');

  # unify headings
  str.gsub!(/[ \n\t]*\n'''? *(Zobacz|Patrz) (też|także):* *'''?[ \n\t]*/i, "\n\n== Zobacz też ==\n");
  str.gsub!(/[ \n\t]*\n(=+) *(Zobacz|Patrz) (też|także):* *=+[ \n\t]*/i, "\n\n\\1 Zobacz też \\1\n");
  str.gsub!(/[ \n\t]*\n'''? *((Zewnętrzn[ey] )?(Linki?|Łącza|Stron[ay]|Zobacz w (internecie|sieci))( zewn[eę]trzn[aey])?):* *'''?[ \n\t]*/i, "\n\n== Linki zewnętrzne ==\n");
  str.gsub!(/[ \n\t]*\n(=+) *((Zewnętrzn[ey] )?(Linki?|Łącza|Stron[ay]|Zobacz w (internecie|sieci))( zewn[eę]trzn[aey])?):* *=+[ \n\t]*/i, "\n\n\\1 Linki zewnętrzne \\1\n");

  return str
end
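
Two of the rules above, isolated for illustration. The function names link_centuries and drop_date_comma are hypothetical; the regular expressions are copied from the listing.

```ruby
# Century links: [[XIX w.]] is expanded to a piped link to "XIX wiek".
def link_centuries(str)
  str.gsub(/\[\[([XVI]{1,5}) [wW]\.?\]\]/, '[[\1 wiek|\1 w.]]')
end

# Dates: drop the comma between a linked day-month and a linked year.
def drop_date_comma(str)
  str.gsub(/(\[\[[0-9]+ (stycznia|lutego|marca|kwietnia|maja|czerwca|lipca|sierpnia|września|października|listopada|grudnia)\]\]), (\[\[[0-9]{4}\]\])/i, '\1 \3')
end

p link_centuries('w [[XIX w.]]')
# => "w [[XIX wiek|XIX w.]]"
p drop_date_comma('[[1 maja]], [[2004]]')
# => "[[1 maja]] [[2004]]"
```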

#dump ⇒ Object

Save the current text of this page to a file whose name is the page title with every character other than ASCII letters, digits, and hyphens replaced by an underscore, followed by the .txt extension.



# File 'lib/sunflower/core.rb', line 543

def dump
  self.dump_to @title.gsub(/[^a-zA-Z0-9\-]/,'_')+'.txt'
end
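
A short sketch of the file name derivation, using a hypothetical helper dump_filename that applies the same substitution to a plain string. It also shows that titles made mostly of non-ASCII characters collapse to runs of underscores, so different titles can map to the same file name.

```ruby
# Hypothetical helper: same substitution as in #dump, applied to a string.
def dump_filename(title)
  title.gsub(/[^a-zA-Z0-9\-]/, '_') + '.txt'
end

p dump_filename('User:Foo/Bar baz')  # => "User_Foo_Bar_baz.txt"
p dump_filename('Łódź')              # => "__d_.txt"
```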

#dump_to(file) ⇒ Object

Save the current text of this page to file (which can be either a filename or an IO).



# File 'lib/sunflower/core.rb', line 534

def dump_to file
  if file.respond_to? :write #probably file or IO
    file.write @text
  else #filename?
    File.open(file.to_s, 'w'){|f| f.write @text}
  end
end
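
The "filename or IO" behaviour relies on a respond_to? check. A minimal sketch, with a hypothetical function dump_text taking the text as an explicit argument instead of reading @text:

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical stand-alone version of #dump_to.
def dump_text(text, file)
  if file.respond_to?(:write)   # IO-like object: write directly
    file.write(text)
  else                          # anything else is treated as a file name
    File.open(file.to_s, 'w') { |f| f.write(text) }
  end
end

buffer = StringIO.new
dump_text("Hello wiki\n", buffer)
p buffer.string                 # => "Hello wiki\n"

Dir.mktmpdir do |dir|
  path = File.join(dir, 'page.txt')
  dump_text("Hello wiki\n", path)
  p File.read(path)             # => "Hello wiki\n"
end
```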

#gsub(from, to) ⇒ Object

Replaces every occurrence; equivalent to #replace(from, to).



# File 'lib/sunflower/commontasks.rb', line 9

def gsub from, to
  self.replace from, to
end

#preload_attrs ⇒ Object

Load the metadata associated with this page. Semi-private.



# File 'lib/sunflower/core.rb', line 522

def preload_attrs
  r = @sunflower.API('action=query&prop=info&inprop=protection&intoken=edit&titles='+CGI.escape(@title))
  r = r['query']['pages'].values.first
  r.each{|key, value|
    key = 'real_title' if key == 'title'
    self.instance_variable_set('@'+key, value)
  }
  
  @preloaded_attrs = true
end
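
The mapping from the API response to instance variables can be sketched with a canned response. Everything here is illustrative: PageInfo is a hypothetical class, and the sample hash is made up to resemble the shape the method expects, not captured from a real wiki.

```ruby
# Hypothetical holder that copies response keys into instance variables,
# renaming "title" to "real_title" as #preload_attrs does.
class PageInfo
  attr_reader :pageid, :ns, :real_title, :lastrevid

  def absorb(response)
    info = response['query']['pages'].values.first
    info.each do |key, value|
      key = 'real_title' if key == 'title'
      instance_variable_set('@' + key, value)
    end
    self
  end
end

# Made-up sample data in the expected shape (not a real API response).
sample = {
  'query' => {
    'pages' => {
      '123' => { 'pageid' => 123, 'ns' => 0, 'title' => 'Main Page', 'lastrevid' => 456 }
    }
  }
}

info = PageInfo.new.absorb(sample)
p info.real_title  # => "Main Page"
p info.lastrevid   # => 456
```

Because every key in the response becomes an instance variable, the set of attributes effectively follows whatever the API returns.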

#preload_text ⇒ Object

Load the text of this page. Semi-private.



# File 'lib/sunflower/core.rb', line 501

def preload_text
  if title == ''
    @text = ''
  else
    r = @sunflower.API('action=query&prop=revisions&rvprop=content&titles='+CGI.escape(@title))
    r = r['query']['pages'].values.first
    if r['missing']
      @text = ''
    elsif r['invalid']
      raise Sunflower::Error, 'title invalid: '+@title
    else
      @text = r['revisions'][0]['*']
    end
  end
  
  @orig_text = @text.dup
  
  @preloaded_text = true
end

#prepend(txt, newlines = 2) ⇒ Object

Prepends txt followed by newlines to the page text (2 newlines by default). Leading whitespace of the existing text is stripped first.



# File 'lib/sunflower/commontasks.rb', line 24

def prepend txt, newlines=2
  self.text = txt + ("\n"*newlines) + self.text.lstrip
end

#remove_category(cat) ⇒ Object

Remove the category cat from the page wikitext.

The argument may be given with or without the Category: prefix (or its localised version).



# File 'lib/sunflower/commontasks.rb', line 117

def remove_category cat
  cat_regex = self.sunflower.ns_regex_for 'Category'
  cat = self.sunflower.cleanup_title(cat).sub(/^#{cat_regex}:/, '')
  
  self.text.gsub!(/\[\[ *#{cat_regex} *: *#{Regexp.escape cat} *(\|[^\]]*)?\]\](\r?\n)?/, '')
end
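
A sketch of the removal on a plain string, assuming a simple value for the namespace regex (the real one comes from Sunflower#ns_regex_for). The helper remove_category_from is hypothetical and skips Sunflower#cleanup_title.

```ruby
# Hypothetical stand-alone version of the substitution in #remove_category.
# The optional (\r?\n)? group also removes the line break after the link.
def remove_category_from(text, cat, cat_regex)
  cat = cat.sub(/^#{cat_regex}:/, '')
  text.gsub(/\[\[ *#{cat_regex} *: *#{Regexp.escape cat} *(\|[^\]]*)?\]\](\r?\n)?/, '')
end

cat_regex = '(?:[Cc]ategory)' # assumed value
text = "Body.\n[[Category:Stubs|key]]\n[[Category:Other]]\n"
p remove_category_from(text, 'Category:Stubs', cat_regex)
# => "Body.\n[[Category:Other]]\n"
```

One detail worth keeping in mind: the original method calls String#gsub!, which returns nil when nothing matched, so its return value should not be treated as the resulting text.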

#replace(from, to, once = false) ⇒ Object

Replaces “from” with “to” in the page text. “from” may be a regex. If once is true, only the first occurrence is replaced.



# File 'lib/sunflower/commontasks.rb', line 6

def replace from, to, once=false
  self.text = self.text.send( (once ? 'sub' : 'gsub'), from, to )
end
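
The dispatch between sub and gsub can be seen on a plain string. The function replace_in below is a hypothetical stand-alone version of the same expression.

```ruby
# Hypothetical stand-alone version of #replace: picks String#sub or
# String#gsub by name depending on the once flag.
def replace_in(text, from, to, once = false)
  text.send((once ? 'sub' : 'gsub'), from, to)
end

p replace_in('a-b-c', '-', '+')            # => "a+b+c"
p replace_in('a-b-c', '-', '+', true)      # => "a+b-c"
p replace_in('rev 12 and 345', /\d+/, 'N') # => "rev N and N"
```

This is also how #gsub and #sub relate to #replace: the first passes the default once = false, the second passes true.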

#save(title = @title, summary = @sunflower.summary) ⇒ Object Also known as: put

Save the modifications to this page, possibly under a different title. The default summary is the summary of this page’s Sunflower (see Sunflower#summary=). The default title is the current title.

No API request is performed if no changes were made.

Calls #code_cleanup first if Sunflower#always_do_code_cleanup is set.

Returns the JSON result of the API call, or nil when no API call was made.

Raises:

(Sunflower::Error)



# File 'lib/sunflower/core.rb', line 554

def save title=@title, summary=@sunflower.summary
  preload_attrs unless @preloaded_attrs
  
  raise Sunflower::Error, 'title invalid: '+title if title =~ INVALID_CHARS_REGEX
  raise Sunflower::Error, 'empty or no summary!' if !summary or summary==''
  
  if @orig_text==@text && title==@title
    @sunflower.log('Page '+title+' not saved - no changes.')
    return nil
  end
  
  
  self.code_cleanup if @sunflower.always_do_code_cleanup && self.respond_to?('code_cleanup')
  
  return @sunflower.API("action=edit&bot=1&title=#{CGI.escape(title)}&text=#{CGI.escape(@text)}&summary=#{CGI.escape(summary)}&token=#{CGI.escape(@edittoken)}")
end
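
The guard conditions and the query string can be sketched without contacting a wiki. The function edit_request is hypothetical; it takes all values as arguments, raises ArgumentError in place of Sunflower::Error, and returns the query string instead of sending it. The token value is a placeholder.

```ruby
require 'cgi'

# Hypothetical stand-alone version of the core of #save.
# Returns nil when nothing changed, otherwise the edit query string.
def edit_request(orig_text, text, old_title, title, summary, token)
  raise ArgumentError, 'empty or no summary!' if summary.nil? || summary == ''
  return nil if orig_text == text && title == old_title

  "action=edit&bot=1&title=#{CGI.escape(title)}&text=#{CGI.escape(text)}" \
    "&summary=#{CGI.escape(summary)}&token=#{CGI.escape(token)}"
end

token = '+\\' # placeholder token: a plus sign and a backslash
p edit_request('a', 'a', 'Main Page', 'Main Page', 'fix typo', token)
# => nil
p edit_request('a', 'a b', 'Main Page', 'Main Page', 'fix typo', token)
# => "action=edit&bot=1&title=Main+Page&text=a+b&summary=fix+typo&token=%2B%5C"
```

Saving unchanged text under a different title still produces a request, because the no-change check compares both the text and the title.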

#sub(from, to) ⇒ Object

Replaces only the first occurrence; equivalent to #replace(from, to, true).



# File 'lib/sunflower/commontasks.rb', line 12

def sub from, to
  self.replace from, to, true
end