Class: Boilerpipe::SAX::Preprocessor

Inherits:
Object
Defined in:
lib/boilerpipe/sax/preprocessor.rb

Class Method Details

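.strip(text) ⇒ String

Returns a copy of the raw HTML `text` with all `<script>` elements removed and with a semicolon added to any `&nbsp` entity that lacks one and is followed by a space. This makes the markup parse the same way under MRI and JRuby.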

# File 'lib/boilerpipe/sax/preprocessor.rb', line 3

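def self.strip(text)
  # Work around a script-handling bug: delete <script> elements, including
  # their contents (/i: case-insensitive, /m: "." also matches newlines)
  text = text.gsub(/\<script.+?<\/script>/im, '')
  # Nokogiri uses libxml2 on MRI and NekoHTML on JRuby. On MRI, a "&nbsp"
  # entity without its trailing semicolon is left in the text instead of
  # being decoded, so add the semicolon (only where "&nbsp" is followed by a space)
  text.gsub(/(&nbsp) /, '\1; ')
end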
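The sketch below shows what `strip` does to a small piece of markup. It copies the two substitutions into a module named `PreprocessorSketch`, a hypothetical name used only so the example runs without the boilerpipe gem. The sample HTML is also made up for illustration. The sketch also shows two limits of the method as written. The non-greedy `.+?` removes each `<script>` element separately, so the text between them survives. The `&nbsp` fix only applies when a space follows the entity.

```ruby
# Hypothetical stand-alone copy of the preprocessing logic above,
# so it runs without the boilerpipe gem installed.
module PreprocessorSketch
  def self.strip(text)
    # Drop <script> elements and their contents (case-insensitive, multiline)
    text = text.gsub(/<script.+?<\/script>/im, '')
    # Add the missing semicolon to "&nbsp" only when a space follows it
    text.gsub(/(&nbsp) /, '\1; ')
  end
end

# Made-up sample markup: an unterminated &nbsp and a multi-line script block
html = "<p>Hi&nbsp there</p>\n<SCRIPT type=\"text/javascript\">\nvar x = 1;\n</script><p>Bye</p>"
cleaned = PreprocessorSketch.strip(html)
p cleaned # prints "<p>Hi&nbsp; there</p>\n<p>Bye</p>"

# Non-greedy matching removes each script separately, keeping the text between them
p PreprocessorSketch.strip("<script>a</script>keep<script>b</script>") # prints "keep"

# "&nbsp" not followed by a space is left unchanged
p PreprocessorSketch.strip("a&nbsp</b>") # prints "a&nbsp</b>"
```

Because the second pattern requires a following space, an entity such as `&nbsp` directly before a tag still reaches the parser without its semicolon.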