Class: Llmsherpa::Paragraph

Inherits:
Block
  • Object
Defined in:
lib/llmsherpa/blocks.rb

Overview

A paragraph is a block of text. It can have children, such as lists, and its tag is ‘para’.

Instance Attribute Summary

Attributes inherited from Block

#bbox, #block_idx, #block_json, #children, #left, #level, #page_idx, #parent, #sentences, #tag, #top

Instance Method Summary

Methods inherited from Block

#add_child, #chunks, #initialize, #iter_children, #paragraphs, #parent_chain, #parent_text, #sections, #tables, #to_context_text

Constructor Details

This class inherits a constructor from Llmsherpa::Block

Instance Method Details

#to_html(include_children = false, recurse = false) ⇒ Object



# File 'lib/llmsherpa/blocks.rb', line 133

def to_html(include_children = false, recurse = false)
  # Wrap the paragraph's own sentences in a <p> element.
  html_str = "<p>"
  html_str += @sentences.join("\n")
  if include_children && !@children.empty?
    html_str += "<ul>"
    @children.each do |child|
      # Pass the flags positionally: keyword-style arguments would arrive as
      # a single, always-truthy Hash bound to include_children.
      html_str += child.to_html(recurse, recurse)
    end
    html_str += "</ul>"
  end
  html_str += "</p>"
  html_str
end
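
To make the effect of the two flags concrete, here is a minimal sketch that does not load the gem. SketchParagraph is a hypothetical stand-in that only mimics the shape of the method listed above, and it assumes the flags are forwarded to children positionally. The sample sentences are invented for illustration.

```ruby
# Hypothetical stand-in for Llmsherpa::Paragraph, reduced to what to_html needs.
# It is not the library class; it only mirrors the listed method's logic.
class SketchParagraph
  def initialize(sentences, children = [])
    @sentences = sentences
    @children = children
  end

  # Assumption: flags are forwarded to children as positional arguments.
  def to_html(include_children = false, recurse = false)
    html_str = "<p>"
    html_str += @sentences.join("\n")
    if include_children && !@children.empty?
      html_str += "<ul>"
      @children.each { |child| html_str += child.to_html(recurse, recurse) }
      html_str += "</ul>"
    end
    html_str += "</p>"
    html_str
  end
end

leaf   = SketchParagraph.new(["Nested detail."])
child  = SketchParagraph.new(["First point."], [leaf])
parent = SketchParagraph.new(["Intro sentence.", "Second sentence."], [child])

own_only   = parent.to_html             # the paragraph's own sentences only
one_level  = parent.to_html(true)       # direct children, but not grandchildren
all_levels = parent.to_html(true, true) # the whole subtree

puts own_only
puts one_level
puts all_levels
```

In this sketch, include_children controls whether direct children appear at all, and recurse controls whether each child in turn includes its own children.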

#to_text(include_children = false, recurse = false) ⇒ Object



# File 'lib/llmsherpa/blocks.rb', line 123

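def to_text(include_children = false, recurse = false)
  # Start with the paragraph's own sentences, one per line.
  para_text = @sentences.join("\n")
  if include_children
    @children.each do |child|
      # Pass the flags positionally: keyword-style arguments would arrive as
      # a single, always-truthy Hash bound to include_children.
      para_text += "\n#{child.to_text(recurse, recurse)}"
    end
  end
  para_text
end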
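
The same flags can be sketched for the text form. SketchTextBlock is a hypothetical stand-in, not the library class, and it assumes the flags are forwarded positionally. The last call illustrates a general Ruby behavior worth keeping in mind with this signature: a method that declares only optional positional parameters receives keyword-style arguments as one Hash in its first parameter.

```ruby
# Hypothetical stand-in that mirrors only the to_text logic listed above.
class SketchTextBlock
  def initialize(sentences, children = [])
    @sentences = sentences
    @children = children
  end

  # Assumption: flags are forwarded to children as positional arguments.
  def to_text(include_children = false, recurse = false)
    para_text = @sentences.join("\n")
    if include_children
      @children.each { |child| para_text += "\n#{child.to_text(recurse, recurse)}" }
    end
    para_text
  end
end

sub_item = SketchTextBlock.new(["Sub-item."])
item     = SketchTextBlock.new(["Item one."], [sub_item])
para     = SketchTextBlock.new(["Shopping list:"], [item])

own_only   = para.to_text             # own sentences only
one_level  = para.to_text(true)       # plus direct children
all_levels = para.to_text(true, true) # plus every descendant

# Keyword-style call: the Hash {include_children: false} is bound to the first
# positional parameter and is truthy, so children are still included.
keyword_call = para.to_text(include_children: false)

puts all_levels
```

Each child's text is appended on a new line, so the result reads top-down in document order.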