Class: Nokorexi

Inherits:
Object
Defined in:
lib/nokorexi.rb

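Instance Attribute Summary

  • #to_doc ⇒ Object (readonly)
    The Rexle document built from the parsed HTML.
  • #to_doc2 ⇒ Object (readonly)
    The underlying Nokogiri::HTML document.
  • #to_s ⇒ Object (readonly)
    The XML string serialized from the html element.

Instance Method Summary

  • #initialize(x, noscript: true, noevents: true, nosvg: true, nostyle: true, nolink: true, filter: false, debug: false) {|raw_doc| ... } ⇒ Nokorexi (constructor)
    Reads and parses HTML, optionally filters it, and builds a Rexle document.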

Constructor Details

#initialize(x, noscript: true, noevents: true, nosvg: true, nostyle: true, nolink: true, filter: false, debug: false) {|raw_doc| ... } ⇒ Nokorexi

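Returns a new instance of Nokorexi. The constructor reads x with RXFHelper.read, discards anything after the closing </html> tag, replaces &nbsp; entities with spaces, and parses the result with Nokogiri::HTML. The noscript, nosvg, nostyle, nolink and noevents options take effect only when filter is true (it defaults to false); noevents blanks the values of onclick and onmousedown attributes rather than removing them. The html element is then serialized to XML and loaded into a Rexle document.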

Yields:

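  • (raw_doc)
    The parsed Nokogiri::HTML document, after any filtering and before it is serialized for Rexle, so changes made in the block carry through to #to_s and #to_doc.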
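
The way the keyword options interact can be easy to miss, so here is a minimal sketch in plain Ruby. The method name active_steps is hypothetical and not part of Nokorexi; it only mirrors the conditionals in the constructor source to show which clean-up steps would run for a given set of arguments.

```ruby
# Hypothetical helper (not part of Nokorexi): lists the clean-up steps the
# constructor would run, in source order, for the given keyword arguments.
def active_steps(noscript: true, noevents: true, nosvg: true,
                 nostyle: true, nolink: true, filter: false)
  # The no* flags are ignored unless filter is true.
  return [] unless filter

  steps = []
  steps << :style  if nostyle
  steps << :link   if nolink
  steps << :script if noscript
  steps << :svg    if nosvg
  steps << :events if noevents # blanks onclick and onmousedown values
  steps
end

p active_steps                              # => []
p active_steps(filter: true)                # => [:style, :link, :script, :svg, :events]
p active_steps(filter: true, nosvg: false)  # => [:style, :link, :script, :events]
```

With the defaults, nothing is filtered; passing filter: true is what switches the other options on.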


# File 'lib/nokorexi.rb', line 14

def initialize(x, noscript: true, noevents: true, nosvg: true,
               nostyle: true, nolink: true, filter: false, debug: false)

  raws = RXFHelper.read(x).first
  s = raws[/.*<\/html>$/m] || raws
  puts 's: ' + s.inspect if debug

  @to_doc2 = raw_doc = Nokogiri::HTML(s.gsub("&nbsp;",' '))

  # The no* options below only take effect when filter is true.
  if filter

    raw_doc.xpath('//style').each(&:remove) if nostyle
    raw_doc.xpath('//link').each(&:remove) if nolink
    raw_doc.xpath('//script').each(&:remove) if noscript
    raw_doc.xpath('//svg').each(&:remove) if nosvg

    # Blank (rather than remove) inline click and mousedown handlers.
    if noevents

      raw_doc.xpath('//*[@onclick]').each do |e|
        e.attributes['onclick'].value = ''
      end

      raw_doc.xpath('//*[@onmousedown]').each do |e|
        e.attributes['onmousedown'].value = ''
      end
    end

  end

  yield(raw_doc) if block_given?

  @to_s = xml = raw_doc.xpath('html').to_xml
  @to_doc = Rexle.new(xml, debug: debug)

end
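
The input clean-up at the top of the constructor can be tried in isolation. The following is a minimal sketch using only core Ruby; trim_html is a hypothetical helper name, not part of Nokorexi, and it simply copies the regular expression and the &nbsp; substitution from the listing above.

```ruby
# Hypothetical helper (not part of Nokorexi) mirroring the constructor's
# input clean-up: keep everything up to a closing </html> that ends a line,
# fall back to the raw input otherwise, then turn &nbsp; into plain spaces.
def trim_html(raws)
  s = raws[/.*<\/html>$/m] || raws
  s.gsub('&nbsp;', ' ')
end

page = "<html><body><p>a&nbsp;b</p></body></html>\n<!-- trailing junk -->"
puts trim_html(page)                         # <html><body><p>a b</p></body></html>
puts trim_html('<p>fragment&nbsp;only</p>')  # <p>fragment only</p>
```

Because $ matches only at the end of a line, text that follows </html> on the same line prevents the match in this sketch, and the input is then used untrimmed.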

Instance Attribute Details

#to_doc ⇒ Object (readonly)

Returns the Rexle document built from the serialized html element (the value of attribute to_doc).



# File 'lib/nokorexi.rb', line 12

def to_doc
  @to_doc
end

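#to_doc2 ⇒ Object (readonly)

Returns the Nokogiri::HTML document, including any changes made by filtering or by the constructor block (the value of attribute to_doc2).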



# File 'lib/nokorexi.rb', line 12

def to_doc2
  @to_doc2
end

#to_s ⇒ Object (readonly)

Returns the XML string serialized from the document's html element (the value of attribute to_s).



# File 'lib/nokorexi.rb', line 12

def to_s
  @to_s
end