Class: WordToMarkdown::Document
- Inherits: Object
  - Object
    - WordToMarkdown::Document
- Defined in: lib/word-to-markdown/document.rb
Defined Under Namespace
Classes: ConverstionError, NotFoundError
Instance Attribute Summary
- #path ⇒ Object (readonly)
  Returns the value of attribute path.
- #tmpdir ⇒ Object (readonly)
  Returns the value of attribute tmpdir.
Instance Method Summary
- #encoding ⇒ Object
  Determine the document encoding.
- #extension ⇒ Object
- #html ⇒ Object
  Returns the HTML representation of the document.
- #initialize(path, tmpdir = nil) ⇒ Document (constructor)
  A new instance of Document.
- #to_s ⇒ Object
  Returns the Markdown representation of the document.
- #tree ⇒ Object
Constructor Details
#initialize(path, tmpdir = nil) ⇒ Document
Returns a new instance of Document.
    # File 'lib/word-to-markdown/document.rb', line 9
    def initialize(path, tmpdir = nil)
      @path = File.expand_path path, Dir.pwd
      @tmpdir = tmpdir || Dir.mktmpdir
      raise NotFoundError, "File #{@path} does not exist" unless File.exist?(@path)
    end
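The constructor's behavior can be illustrated without the gem installed. The following is a minimal sketch using only the standard library, assuming the path is resolved with File.expand_path against the current directory; the local NotFoundError class and the resolve_document_path helper are hypothetical stand-ins, not part of WordToMarkdown.

```ruby
require "tmpdir"

# Hypothetical stand-in for WordToMarkdown::Document::NotFoundError.
class NotFoundError < StandardError; end

# Illustrative helper (not part of the gem) that mirrors the constructor:
# resolve the path, pick a working directory, then fail if the file is missing.
def resolve_document_path(path, tmpdir = nil)
  absolute = File.expand_path(path, Dir.pwd)
  workdir  = tmpdir || Dir.mktmpdir
  raise NotFoundError, "File #{absolute} does not exist" unless File.exist?(absolute)

  [absolute, workdir]
end

begin
  # Passing an existing directory avoids creating a fresh temporary one.
  resolve_document_path("no-such-file.docx", Dir.tmpdir)
rescue NotFoundError => e
  warn e.message
end
```

As in the constructor, the working directory is chosen before the existence check, so omitting tmpdir creates a temporary directory even when the file turns out to be missing.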
Instance Attribute Details
#path ⇒ Object (readonly)
Returns the value of attribute path.
    # File 'lib/word-to-markdown/document.rb', line 7
    def path
      @path
    end
#tmpdir ⇒ Object (readonly)
Returns the value of attribute tmpdir.
    # File 'lib/word-to-markdown/document.rb', line 7
    def tmpdir
      @tmpdir
    end
Instance Method Details
#encoding ⇒ Object
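Determines the document encoding from the charset declaration in the raw HTML export.
A declared charset of “macintosh” is reported as “MacRoman”.
Returns the encoding name, defaulting to “UTF-8” when no charset is declared.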
    # File 'lib/word-to-markdown/document.rb', line 42
    def encoding
      match = raw_html.encode("UTF-8", :invalid => :replace, :replace => "").match(/charset=([^\"]+)/)
      if match
        match[1].sub("macintosh", "MacRoman")
      else
        "UTF-8"
      end
    end
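To see how the charset lookup behaves on concrete input, here is a small standalone sketch of the same logic. The detect_charset helper and the sample markup are hypothetical; in the class itself the input comes from raw_html.

```ruby
# Hypothetical standalone helper mirroring the logic of #encoding.
def detect_charset(raw_html)
  cleaned = raw_html.encode("UTF-8", invalid: :replace, replace: "")
  # Capture everything after "charset=" up to the next double quote.
  match = cleaned.match(/charset=([^\"]+)/)
  match ? match[1].sub("macintosh", "MacRoman") : "UTF-8"
end

sample = '<meta http-equiv="Content-Type" content="text/html; charset=macintosh">'

puts detect_charset(sample)                   # prints MacRoman
puts detect_charset("<p>no charset here</p>") # prints UTF-8
```

The "macintosh" to "MacRoman" substitution matters because Ruby's encoding is named MacRoman, so the returned value can presumably be used directly when re-reading the export.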
#extension ⇒ Object
    # File 'lib/word-to-markdown/document.rb', line 15
    def extension
      File.extname path
    end
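Since #extension delegates to File.extname, its return values follow that method's rules. A brief sketch with made-up file names:

```ruby
# File.extname returns the final extension, including the leading dot.
puts File.extname("/tmp/report.docx")  # prints .docx
puts File.extname("archive.tar.gz")    # prints .gz

# A name without an extension yields an empty string, not nil.
p File.extname("README")               # prints ""
```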
#html ⇒ Object
Returns the HTML representation of the document.
    # File 'lib/word-to-markdown/document.rb', line 28
    def html
      tree.to_html.gsub("</li>\n", "</li>")
    end
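The gsub call removes the newline that follows each closing list-item tag, presumably so that list items stay adjacent in the later Markdown conversion. A minimal sketch of that one step on a hand-written fragment; tighten_list_items is a hypothetical name, and the real method applies the substitution to tree.to_html:

```ruby
# Hypothetical helper isolating the substitution performed by #html.
def tighten_list_items(html)
  html.gsub("</li>\n", "</li>")
end

fragment = "<ul>\n<li>One</li>\n<li>Two</li>\n</ul>"

p tighten_list_items(fragment)
# prints "<ul>\n<li>One</li><li>Two</li></ul>"
```

Only newlines directly after `</li>` are removed; the newline after the opening `<ul>` is left untouched.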
#to_s ⇒ Object
Returns the Markdown representation of the document.
    # File 'lib/word-to-markdown/document.rb', line 33
    def to_s
      @markdown ||= scrub_whitespace(ReverseMarkdown.convert(html, WordToMarkdown::REVERSE_MARKDOWN_OPTIONS))
    end
#tree ⇒ Object
    # File 'lib/word-to-markdown/document.rb', line 19
    def tree
      @tree ||= begin
        tree = Nokogiri::HTML(normalized_html)
        tree.css("title").remove
        tree
      end
    end