Class: Markdown::Merge::FileAnalysis
- Inherits: FileAnalysisBase
  - Object
  - FileAnalysisBase
  - Markdown::Merge::FileAnalysis
- Defined in: lib/markdown/merge/file_analysis.rb
Overview
File analysis for Markdown files using tree_haver backends.
Extends FileAnalysisBase with backend-agnostic parsing via tree_haver. Supports both Commonmarker and Markly backends through tree_haver’s unified API.
Parses Markdown source code and extracts:
- Top-level block elements (headings, paragraphs, lists, code blocks, etc.)
- Freeze blocks marked with HTML comments
- Structural signatures for matching elements between files
All nodes are wrapped with canonical types via NodeTypeNormalizer, enabling portable merge rules across backends.
Freeze blocks are marked with HTML comments:
<!-- markdown-merge:freeze -->
... content to preserve ...
<!-- markdown-merge:unfreeze -->
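As a rough illustration of the marker convention above, the following sketch scans source text for freeze regions with a regular expression. It is not the gem's implementation: the helper name `find_freeze_blocks` and the pattern are assumptions made only for this example.

```ruby
# Hypothetical helper (not part of markdown-merge): return the text found
# between <!-- TOKEN:freeze --> and <!-- TOKEN:unfreeze --> comments.
def find_freeze_blocks(source, token: "markdown-merge")
  open_marker  = /<!--\s*#{Regexp.escape(token)}:freeze\s*-->/
  close_marker = /<!--\s*#{Regexp.escape(token)}:unfreeze\s*-->/
  # /m lets the lazy capture span multiple lines.
  pattern = /#{open_marker}\n?(.*?)\n?#{close_marker}/m

  source.scan(pattern).map(&:first)
end

source = <<~MD
  # Title

  <!-- markdown-merge:freeze -->
  Keep this paragraph as-is.
  <!-- markdown-merge:unfreeze -->

  Regular content.
MD

blocks = find_freeze_blocks(source)
puts blocks.inspect
```

Passing a different `token:` would match markers such as `<!-- my-tool:freeze -->`, which is the role the `freeze_token` argument plays in the constructor below.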
Constant Summary
- DEFAULT_FREEZE_TOKEN = "markdown-merge"
  Default freeze token for identifying freeze blocks.
Instance Attribute Summary
- #backend ⇒ Symbol (readonly)
  The backend being used (:commonmarker, :markly).
- #parser_options ⇒ Hash (readonly)
  Parser-specific options.
Attributes inherited from FileAnalysisBase
#document, #errors, #statements
Instance Method Summary
- #collect_top_level_nodes ⇒ Array<Object>
  Collect top-level nodes from document, wrapping with canonical types.
- #compute_parser_signature(node) ⇒ Array?
  Compute signature for a tree_haver node.
- #extract_text_content(node) ⇒ String
  Extract all text content from a node and its children.
- #fallthrough_node?(value) ⇒ Boolean
  Override to detect tree_haver nodes for signature generator fallthrough.
- #freeze_node_class ⇒ Class
  Returns the FreezeNode class to use.
- #initialize(source, backend: :auto, freeze_token: DEFAULT_FREEZE_TOKEN, signature_generator: nil, **parser_options) ⇒ FileAnalysis (constructor)
  Initialize file analysis with tree_haver backend.
- #next_sibling(node) ⇒ Object?
  Get the next sibling of a node.
- #parse_document(source) ⇒ Object?
  Parse the source document using tree_haver backend.
- #parser_node?(value) ⇒ Boolean
  Check if value is a tree_haver node.
- #safe_string_content(node) ⇒ String
  Safely get string content from a node.
Methods inherited from FileAnalysisBase
#compute_node_signature, #source_range, #valid?
Constructor Details
#initialize(source, backend: :auto, freeze_token: DEFAULT_FREEZE_TOKEN, signature_generator: nil, **parser_options) ⇒ FileAnalysis
Initialize file analysis with tree_haver backend.
# File 'lib/markdown/merge/file_analysis.rb', line 61

def initialize(
  source,
  backend: :auto,
  freeze_token: DEFAULT_FREEZE_TOKEN,
  signature_generator: nil,
  **parser_options
)
  @requested_backend = backend
  @parser_options = parser_options

  # Resolve and initialize the backend
  @backend = resolve_backend(backend)
  @parser = create_parser

  super(source, freeze_token: freeze_token, signature_generator: signature_generator)
end
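To show how the constructor's keyword arguments separate, here is a minimal stand-in class that copies only the signature. `DemoAnalysis` and the `smart: true` option are hypothetical; the real class also resolves a backend and builds a parser, which this sketch omits.

```ruby
# Hypothetical stand-in that mirrors only the keyword handling of the
# constructor; it does not parse anything.
class DemoAnalysis
  DEFAULT_FREEZE_TOKEN = "markdown-merge"

  attr_reader :requested_backend, :freeze_token, :parser_options

  def initialize(source, backend: :auto, freeze_token: DEFAULT_FREEZE_TOKEN,
                 signature_generator: nil, **parser_options)
    @source = source
    @requested_backend = backend
    @freeze_token = freeze_token
    @signature_generator = signature_generator
    # Any keyword not named explicitly above is collected here.
    @parser_options = parser_options
  end
end

# `smart: true` is an invented option name used only for illustration.
analysis = DemoAnalysis.new("# Hello\n", backend: :markly, smart: true)
p analysis.requested_backend
p analysis.freeze_token
p analysis.parser_options
```

The double-splat keeps the named keywords stable while letting backend-specific options pass through untouched.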
Instance Attribute Details
#backend ⇒ Symbol (readonly)
Returns the backend being used (:commonmarker, :markly).
# File 'lib/markdown/merge/file_analysis.rb', line 47

def backend
  @backend
end
#parser_options ⇒ Hash (readonly)
Returns the parser-specific options.
# File 'lib/markdown/merge/file_analysis.rb', line 50

def parser_options
  @parser_options
end
Instance Method Details
#collect_top_level_nodes ⇒ Array<Object>
Collect top-level nodes from document, wrapping with canonical types.
# File 'lib/markdown/merge/file_analysis.rb', line 242

def collect_top_level_nodes
  nodes = []
  child = @document.first_child
  while child
    # Wrap each node with its canonical type
    wrapped = NodeTypeNormalizer.wrap(child, @backend)
    nodes << wrapped
    child = next_sibling(child)
  end
  nodes
end
#compute_parser_signature(node) ⇒ Array?
Compute signature for a tree_haver node.
Uses canonical types from NodeTypeNormalizer for portable signatures.
# File 'lib/markdown/merge/file_analysis.rb', line 150

def compute_parser_signature(node)
  # Get canonical type from wrapper or normalize raw type
  canonical_type = if Ast::Merge::NodeTyping.typed_node?(node)
    Ast::Merge::NodeTyping.merge_type_for(node)
  else
    NodeTypeNormalizer.canonical_type(node.type, @backend)
  end

  # Unwrap to access underlying node methods
  raw_node = Ast::Merge::NodeTyping.unwrap(node)

  case canonical_type
  when :heading
    # Content-based: Match headings by level and text content
    [:heading, raw_node.header_level, extract_text_content(raw_node)]
  when :paragraph
    # Content-based: Match paragraphs by content hash (first 32 chars of digest)
    text = extract_text_content(raw_node)
    [:paragraph, Digest::SHA256.hexdigest(text)[0, 32]]
  when :code_block
    # Content-based: Match code blocks by fence info and content hash
    content = safe_string_content(raw_node)
    fence_info = raw_node.respond_to?(:fence_info) ? raw_node.fence_info : nil
    [:code_block, fence_info, Digest::SHA256.hexdigest(content)[0, 16]]
  when :list
    # Structure-based: Match lists by type and item count (content may differ)
    list_type = raw_node.respond_to?(:list_type) ? raw_node.list_type : nil
    [:list, list_type, count_children(raw_node)]
  when :block_quote
    # Content-based: Match block quotes by content hash
    text = extract_text_content(raw_node)
    [:block_quote, Digest::SHA256.hexdigest(text)[0, 16]]
  when :thematic_break
    # Structure-based: All thematic breaks are equivalent
    [:thematic_break]
  when :html_block
    # Content-based: Match HTML blocks by content hash
    content = safe_string_content(raw_node)
    [:html_block, Digest::SHA256.hexdigest(content)[0, 16]]
  when :table
    # Content-based: Match tables by structure and header content
    header_content = extract_table_header_content(raw_node)
    [:table, count_children(raw_node), Digest::SHA256.hexdigest(header_content)[0, 16]]
  when :footnote_definition
    # Name/label-based: Match footnotes by name or label
    label = raw_node.respond_to?(:name) ? raw_node.name : safe_string_content(raw_node)
    [:footnote_definition, label]
  when :custom_block
    # Content-based: Match custom blocks by content hash
    text = extract_text_content(raw_node)
    [:custom_block, Digest::SHA256.hexdigest(text)[0, 16]]
  else
    # Unknown type - use canonical type and position
    pos = raw_node.source_position
    [:unknown, canonical_type, pos&.dig(:start_line)]
  end
end
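The content-based branches above all reduce a node to its type plus a truncated SHA-256 digest. The sketch below reproduces the paragraph and list shapes outside the class to show how two nodes compare; `paragraph_signature` and `list_signature` are hypothetical helpers, and the `:bullet` list type is an assumed value.

```ruby
require "digest/sha2"

# Hypothetical helper mirroring the :paragraph branch: type plus the first
# 32 hex characters of the SHA-256 digest of the text.
def paragraph_signature(text)
  [:paragraph, Digest::SHA256.hexdigest(text)[0, 32]]
end

# Hypothetical helper mirroring the :list branch: type, list type, item count.
def list_signature(list_type, items)
  [:list, list_type, items.length]
end

template    = paragraph_signature("Install the gem with Bundler.")
destination = paragraph_signature("Install the gem with Bundler.")
edited      = paragraph_signature("Install the gem with RubyGems.")

puts template == destination  # identical text gives an identical signature
puts template == edited       # any text change gives a different signature

# Lists match on structure only, so differing item text still matches.
puts list_signature(:bullet, %w[a b]) == list_signature(:bullet, %w[x y])
```

The trade-off is visible here: an edited paragraph no longer matches its original, while a list keeps matching as long as its type and item count are unchanged.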
#extract_text_content(node) ⇒ String
Extract all text content from a node and its children.
Override for tree_haver nodes which don’t have a walk method. Uses recursive traversal via children instead.
# File 'lib/markdown/merge/file_analysis.rb', line 215

def extract_text_content(node)
  text_parts = []
  collect_text_recursive(node, text_parts)
  text_parts.join
end
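The listing delegates to `collect_text_recursive`, whose body is not shown on this page. A minimal sketch of what such a recursive traversal could look like follows; `DemoNode`, `demo_collect_text`, and `demo_extract_text` are hypothetical, and the assumption that only `:text` nodes carry text is made for the example.

```ruby
# Hypothetical stand-in for a parsed node: a type, optional text, children.
DemoNode = Struct.new(:type, :text, :children)

# Assumed traversal: record text from :text leaves, then descend into
# children in document order.
def demo_collect_text(node, parts)
  parts << node.text if node.type == :text && node.text
  node.children.each { |child| demo_collect_text(child, parts) }
end

def demo_extract_text(node)
  parts = []
  demo_collect_text(node, parts)
  parts.join
end

heading = DemoNode.new(:heading, nil, [
  DemoNode.new(:text, "Getting ", []),
  DemoNode.new(:emph, nil, [DemoNode.new(:text, "started", [])]),
])

puts demo_extract_text(heading)  # prints "Getting started"
```

Because inline markup is flattened away, a heading with and without emphasis would produce the same text.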
#fallthrough_node?(value) ⇒ Boolean
Override to detect tree_haver nodes for signature generator fallthrough
# File 'lib/markdown/merge/file_analysis.rb', line 137

def fallthrough_node?(value)
  Ast::Merge::NodeTyping.typed_node?(value) ||
    value.is_a?(Ast::Merge::FreezeNodeBase) ||
    parser_node?(value) ||
    super
end
#freeze_node_class ⇒ Class
Returns the FreezeNode class to use.
# File 'lib/markdown/merge/file_analysis.rb', line 118

def freeze_node_class
  FreezeNode
end
#next_sibling(node) ⇒ Object?
Get the next sibling of a node.
Handles differences between backends:
- Commonmarker: node.next_sibling
- Markly: node.next
# File 'lib/markdown/merge/file_analysis.rb', line 106

def next_sibling(node)
  # tree_haver normalizes this, but handle both patterns for safety
  if node.respond_to?(:next_sibling)
    node.next_sibling
  elsif node.respond_to?(:next)
    node.next
  end
end
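The duck-typed lookup can be tried without either backend installed. In the sketch below, `SiblingStyleNode` and `NextStyleNode` are hypothetical structs that imitate the two accessor styles, and `sibling_names` walks a chain the same way `collect_top_level_nodes` does.

```ruby
# Hypothetical nodes imitating the two accessor styles described above.
SiblingStyleNode = Struct.new(:name, :next_sibling)
NextStyleNode    = Struct.new(:name, :next)

# Same dispatch as the listing: prefer #next_sibling, fall back to #next.
def demo_next_sibling(node)
  if node.respond_to?(:next_sibling)
    node.next_sibling
  elsif node.respond_to?(:next)
    node.next
  end
end

# Walk a sibling chain until the accessor returns nil.
def sibling_names(first)
  names = []
  node = first
  while node
    names << node.name
    node = demo_next_sibling(node)
  end
  names
end

b = SiblingStyleNode.new("b", nil)
a = SiblingStyleNode.new("a", b)
y = NextStyleNode.new("y", nil)
x = NextStyleNode.new("x", y)

p sibling_names(a)  # ["a", "b"]
p sibling_names(x)  # ["x", "y"]
```

An object that responds to neither method makes `demo_next_sibling` return nil, which simply ends the walk.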
#parse_document(source) ⇒ Object?
Parse the source document using tree_haver backend.
Error handling follows the same pattern as other *-merge gems:
- TreeHaver::Error (which inherits from Exception, not StandardError) is caught
- TreeHaver::NotAvailable is a subclass of TreeHaver::Error, so it’s also caught
- When an error occurs, the error is stored in @errors and nil is returned
- SmartMergerBase#parse_and_analyze checks valid? and raises the appropriate parse error
# File 'lib/markdown/merge/file_analysis.rb', line 88

def parse_document(source)
  tree = @parser.parse(source)
  tree.root_node
rescue TreeHaver::Error => e
  # TreeHaver::Error inherits from Exception, not StandardError.
  # This also catches TreeHaver::NotAvailable (subclass of Error).
  @errors << e.message
  nil
end
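The reason the rescue clause must name the error class is that a bare `rescue` only covers StandardError. The following sketch demonstrates this with a hypothetical hierarchy (`DemoError`, `DemoNotAvailable`) standing in for the tree_haver classes; recording the error message in the array is an assumption of this sketch.

```ruby
# Hypothetical hierarchy imitating the description above: the base error
# derives from Exception rather than StandardError.
class DemoError < Exception; end
class DemoNotAvailable < DemoError; end

# A bare rescue is equivalent to `rescue StandardError`, so it does not
# intercept DemoNotAvailable.
def parse_with_bare_rescue
  raise DemoNotAvailable, "backend missing"
rescue => e
  "caught: #{e.message}"
end

# Naming the base class catches it and its subclasses.
def parse_with_explicit_rescue(errors)
  raise DemoNotAvailable, "backend missing"
rescue DemoError => e
  errors << e.message
  nil
end

errors = []
result = parse_with_explicit_rescue(errors)
p result  # nil
p errors  # ["backend missing"]

begin
  parse_with_bare_rescue
rescue DemoError => e
  puts "bare rescue missed it: #{e.class}"  # prints "bare rescue missed it: DemoNotAvailable"
end
```

Returning nil and recording the failure lets a caller check validity afterwards instead of handling an exception at the parse site.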
#parser_node?(value) ⇒ Boolean
Check if value is a tree_haver node.
# File 'lib/markdown/merge/file_analysis.rb', line 126

def parser_node?(value)
  # Check for tree_haver node or wrapped node
  return true if value.respond_to?(:type) && value.respond_to?(:source_position)
  return true if Ast::Merge::NodeTyping.typed_node?(value)

  false
end
#safe_string_content(node) ⇒ String
Safely get string content from a node.
Override for tree_haver nodes which use text instead of string_content.
# File 'lib/markdown/merge/file_analysis.rb', line 227

def safe_string_content(node)
  if node.respond_to?(:string_content)
    node.string_content.to_s
  elsif node.respond_to?(:text)
    node.text.to_s
  else
    extract_text_content(node)
  end
rescue TypeError, NoMethodError
  extract_text_content(node)
end
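A small sketch of the same fallback order, runnable without a parser. The three structs and `demo_safe_content` are hypothetical, and the final fallback returns an empty string here where the real method calls `extract_text_content`.

```ruby
# Hypothetical nodes exposing different content accessors.
StringContentNode = Struct.new(:string_content)
TextOnlyNode      = Struct.new(:text)
OpaqueNode        = Struct.new(:children)

# Same order as the listing: #string_content, then #text, then a fallback.
def demo_safe_content(node)
  if node.respond_to?(:string_content)
    node.string_content.to_s
  elsif node.respond_to?(:text)
    node.text.to_s
  else
    "" # stand-in for extract_text_content(node)
  end
rescue TypeError, NoMethodError
  ""
end

puts demo_safe_content(StringContentNode.new("puts 1")).inspect  # "puts 1"
puts demo_safe_content(TextOnlyNode.new("<div>")).inspect        # "<div>"
puts demo_safe_content(TextOnlyNode.new(nil)).inspect            # ""
puts demo_safe_content(OpaqueNode.new([])).inspect               # ""
```

The `to_s` calls matter for the third case: an accessor that returns nil still yields a String, so callers can hash the result without a nil check.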