Class: ArticleJSON::Import::GoogleDoc::HTML::NodeAnalyzer
- Inherits:
- Object
- Defined in:
- lib/article_json/import/google_doc/html/node_analyzer.rb
Instance Attribute Summary
-
#node ⇒ Object (readonly)
Returns the value of attribute node.
Instance Method Summary
-
#begins_with?(text) ⇒ Boolean
Check if the node text begins with a certain text.
-
#br? ⇒ Boolean
Check if the node is a linebreak.
-
#embed? ⇒ Boolean
Check if the node contains an embedded element.
-
#empty? ⇒ Boolean
Check if the node is empty, i.e. not containing any text.
-
#has_text?(text) ⇒ Boolean
Check if a node equals a certain text.
-
#heading? ⇒ Boolean
Check if the node is a header tag between <h1> and <h5>.
-
#hr? ⇒ Boolean
Check if the node is a horizontal line (i.e. <hr>).
-
#image? ⇒ Boolean
Check if the node contains an image.
-
#initialize(node) ⇒ NodeAnalyzer (constructor)
Returns a new instance of NodeAnalyzer.
-
#list? ⇒ Boolean
Check if the node contains an ordered or unordered list.
-
#paragraph? ⇒ Boolean
Check if the node is a normal text paragraph.
-
#quote? ⇒ Boolean
Check if the node starts a quote. Quotes start with a single line saying “Quote:”.
-
#text_box? ⇒ Boolean
Check if the node starts a text box. Text boxes start with a single line saying “Textbox:” or “Highlight:”.
-
#type ⇒ Symbol
Determine the type of this node. The type is one of the elements supported by article_json.
Constructor Details
#initialize(node) ⇒ NodeAnalyzer
Returns a new instance of NodeAnalyzer.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 9
    def initialize(node)
      @node = node
    end
Instance Attribute Details
#node ⇒ Object (readonly)
Returns the value of attribute node.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 6
    def node
      @node
    end
Instance Method Details
#begins_with?(text) ⇒ Boolean
Check if the node text begins with a certain text. Only the first whitespace-separated word of the node text is compared, ignoring case and surrounding whitespace.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 23
    def begins_with?(text)
      first_word = node.inner_text.strip.downcase.split(' ').first
      first_word == text.strip.downcase
    end
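The comparison can be tried on plain strings. The following is a minimal sketch; `first_word_matches?` is a hypothetical helper introduced here for illustration and is not part of article_json.

```ruby
# Hypothetical stand-alone helper mirroring the comparison in begins_with?,
# applied to a plain string instead of a node's inner_text.
def first_word_matches?(inner_text, text)
  first_word = inner_text.strip.downcase.split(' ').first
  first_word == text.strip.downcase
end

first_word_matches?('  Textbox: Key facts ', 'textbox:') # => true
first_word_matches?('Key facts textbox:', 'textbox:')    # => false
first_word_matches?('', 'textbox:')                      # => false
```

Because only the first word is compared, a multi-word argument such as 'text box:' can never match.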
#br? ⇒ Boolean
Check if the node is a linebreak. A span only containing whitespaces and <br> tags is considered a linebreak.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 104
    def br?
      return @is_br if defined? @is_br
      @is_br = node.name == 'br' || only_includes_brs?
    end
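The `return @is_br if defined? @is_br` guard, which recurs in most predicates of this class, caches the result even when it is false. A plain `||=` would recompute a false result on every call. The following sketch illustrates the difference; `MemoDemo` and its methods are hypothetical and not part of article_json.

```ruby
# Hypothetical class comparing two memoization styles for a boolean result.
class MemoDemo
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Style used by NodeAnalyzer: a cached false is still served from the cache.
  def with_defined
    return @with_defined if defined? @with_defined
    @with_defined = expensive_false
  end

  # ||= re-runs the computation whenever the cached value is false or nil.
  def with_or_equals
    @with_or_equals ||= expensive_false
  end

  private

  # Stands in for a costly check that happens to return false.
  def expensive_false
    @calls += 1
    false
  end
end

a = MemoDemo.new
2.times { a.with_defined }
a.calls # => 1

b = MemoDemo.new
2.times { b.with_or_equals }
b.calls # => 2
```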
#embed? ⇒ Boolean
Check if the node contains an embedded element.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 96
    def embed?
      return @is_embed if defined? @is_embed
      @is_embed = EmbeddedParser.supported?(node)
    end
#empty? ⇒ Boolean
Check if the node is empty, i.e. not containing any text. Images, horizontal lines and linebreaks also carry no text, so they are explicitly excluded.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 32
    def empty?
      return @is_empty if defined? @is_empty
      @is_empty = node.inner_text.strip.empty? && !image? && !hr? && !br?
    end
#has_text?(text) ⇒ Boolean
Check if the node text equals a certain text, ignoring case and surrounding whitespace.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 16
    def has_text?(text)
      node.inner_text.strip.downcase == text.strip.downcase
    end
#heading? ⇒ Boolean
Check if the node is a header tag between <h1> and <h5> that does not start a quote or a text box.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 39
    def heading?
      return @is_heading if defined? @is_heading
      @is_heading =
        !quote? && !text_box? && %w(h1 h2 h3 h4 h5).include?(node.name)
    end
#hr? ⇒ Boolean
Check if the node is a horizontal line (i.e. <hr>).
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 47
    def hr?
      node.name == 'hr'
    end
#image? ⇒ Boolean
Check if the node contains an image.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 89
    def image?
      return @is_image if defined? @is_image
      @is_image = node.xpath('.//img').length > 0
    end
#list? ⇒ Boolean
Check if the node contains an ordered or unordered list.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 66
    def list?
      return @is_list if defined? @is_list
      @is_list = %w(ul ol).include?(node.name)
    end
#paragraph? ⇒ Boolean
Check if the node is a normal text paragraph.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 53
    def paragraph?
      return @is_paragraph if defined? @is_paragraph
      @is_paragraph =
        node.name == 'p' &&
          !empty? &&
          !image? &&
          !text_box? &&
          !quote? &&
          !embed?
    end
#quote? ⇒ Boolean
Check if the node starts a quote. Quotes start with a single line saying “Quote:”.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 82
    def quote?
      return @is_quote if defined? @is_quote
      @is_quote = has_text?('quote:')
    end
#text_box? ⇒ Boolean
Check if the node starts a text box. Text boxes start with a single line saying “Textbox:” or “Highlight:”.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 74
    def text_box?
      return @is_text_box if defined? @is_text_box
      @is_text_box = begins_with?('textbox:') || begins_with?('highlight:')
    end
#type ⇒ Symbol
Determine the type of this node. The type is one of the elements supported by article_json. The checks run in a fixed order and the first match wins; :unknown is returned if none applies.
    # File 'lib/article_json/import/google_doc/html/node_analyzer.rb', line 112
    def type
      return :empty if empty?
      return :hr if hr?
      return :heading if heading?
      return :paragraph if paragraph?
      return :list if list?
      return :text_box if text_box?
      return :quote if quote?
      return :image if image?
      return :embed if embed?
      :unknown
    end
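To illustrate how the predicates interact when the type is resolved, here is a reduced sketch. `FakeNode` and `MiniAnalyzer` are hypothetical stand-ins, not part of article_json: the real class receives a parsed HTML node (presumably a Nokogiri node, judging by the `inner_text` and `xpath` calls), and the image, linebreak, list and embed checks are left out for brevity.

```ruby
# Hypothetical stand-in for a parsed HTML node, exposing only the two
# methods this sketch needs.
FakeNode = Struct.new(:name, :inner_text)

# Hypothetical, reduced re-implementation of the type resolution order.
# Image, linebreak, list and embed detection are intentionally omitted.
class MiniAnalyzer
  def initialize(node)
    @node = node
  end

  def type
    text = @node.inner_text.strip.downcase
    first_word = text.split(' ').first
    text_box = %w(textbox: highlight:).include?(first_word)
    quote = text == 'quote:'

    return :empty if text.empty? && @node.name != 'hr'
    return :hr if @node.name == 'hr'
    return :heading if !quote && !text_box && %w(h1 h2 h3 h4 h5).include?(@node.name)
    return :paragraph if @node.name == 'p' && !text_box && !quote
    return :text_box if text_box
    return :quote if quote
    :unknown
  end
end

MiniAnalyzer.new(FakeNode.new('h2', 'Introduction')).type        # => :heading
MiniAnalyzer.new(FakeNode.new('h2', 'Quote:')).type              # => :quote
MiniAnalyzer.new(FakeNode.new('p', 'Highlight: Key facts')).type # => :text_box
MiniAnalyzer.new(FakeNode.new('p', '  ')).type                   # => :empty
MiniAnalyzer.new(FakeNode.new('div', 'Some text')).type          # => :unknown
```

A heading or paragraph tag whose text marks a quote or text box is reported as :quote or :text_box, because heading? and paragraph? exclude those markers.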