Class: Markdown::Merge::FileAnalysis

Inherits:
FileAnalysisBase
Defined in:
lib/markdown/merge/file_analysis.rb

Overview

File analysis for Markdown files using tree_haver backends.

Extends FileAnalysisBase with backend-agnostic parsing via tree_haver. Supports both Commonmarker and Markly backends through tree_haver’s unified API.

Parses Markdown source code and extracts:

  • Top-level block elements (headings, paragraphs, lists, code blocks, etc.)

  • Freeze blocks marked with HTML comments

  • Structural signatures for matching elements between files

All nodes are wrapped with canonical types via NodeTypeNormalizer, enabling portable merge rules across backends.

Freeze blocks are marked with HTML comments:

<!-- markdown-merge:freeze -->
... content to preserve ...
<!-- markdown-merge:unfreeze -->

Examples:

Basic usage with auto backend

analysis = FileAnalysis.new(markdown_source)
analysis.statements.each do |node|
  puts "#{node.merge_type}: #{node.type}"
end

With specific backend

analysis = FileAnalysis.new(markdown_source, backend: :markly)

With custom freeze token

analysis = FileAnalysis.new(source, freeze_token: "my-merge")
# Looks for: <!-- my-merge:freeze --> / <!-- my-merge:unfreeze -->

Constant Summary

DEFAULT_FREEZE_TOKEN = "markdown-merge"

Default freeze token for identifying freeze blocks.

Instance Attribute Summary

Attributes inherited from FileAnalysisBase

#document, #errors, #statements

Instance Method Summary

Methods inherited from FileAnalysisBase

#compute_node_signature, #source_range, #valid?

Constructor Details

#initialize(source, backend: :auto, freeze_token: DEFAULT_FREEZE_TOKEN, signature_generator: nil, **parser_options) ⇒ FileAnalysis

Initialize file analysis with tree_haver backend.

Parameters:

  • source

    Markdown source code to analyze

  • backend (defaults to: :auto)

    Backend to use (:commonmarker, :markly, :auto)

  • freeze_token (defaults to: DEFAULT_FREEZE_TOKEN)

    Token for freeze block markers

  • signature_generator (defaults to: nil)

    Custom signature generator

  • parser_options

    Backend-specific parser options.
    For commonmarker: { options: {} }
    For markly: { flags: Markly::DEFAULT, extensions: [:table] }



# File 'lib/markdown/merge/file_analysis.rb', line 61

def initialize(
  source,
  backend: :auto,
  freeze_token: DEFAULT_FREEZE_TOKEN,
  signature_generator: nil,
  **parser_options
)
  @requested_backend = backend
  @parser_options = parser_options

  # Resolve and initialize the backend
  @backend = resolve_backend(backend)
  @parser = create_parser

  super(source, freeze_token: freeze_token, signature_generator: signature_generator)
end
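
An illustrative call with backend-specific parser options. The option shapes mirror the parameter documentation above; the exact values are only examples:

# Markly backend with explicit flags and the table extension
analysis = Markdown::Merge::FileAnalysis.new(
  markdown_source,
  backend: :markly,
  flags: Markly::DEFAULT,
  extensions: [:table],
)

# Commonmarker backend with its own options hash
analysis = Markdown::Merge::FileAnalysis.new(
  markdown_source,
  backend: :commonmarker,
  options: {},
)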

Instance Attribute Details

#backend ⇒ Symbol (readonly)

Returns The backend being used (:commonmarker, :markly).

Returns:

  • The backend being used (:commonmarker, :markly)



# File 'lib/markdown/merge/file_analysis.rb', line 47

def backend
  @backend
end

#parser_options ⇒ Hash (readonly)

Returns Parser-specific options.

Returns:

  • Parser-specific options



# File 'lib/markdown/merge/file_analysis.rb', line 50

def parser_options
  @parser_options
end

Instance Method Details

#collect_top_level_nodes ⇒ Array<Object>

Collect top-level nodes from document, wrapping with canonical types.

Returns:

  • Wrapped nodes



# File 'lib/markdown/merge/file_analysis.rb', line 242

def collect_top_level_nodes
  nodes = []
  child = @document.first_child
  while child
    # Wrap each node with its canonical type
    wrapped = NodeTypeNormalizer.wrap(child, @backend)
    nodes << wrapped
    child = next_sibling(child)
  end
  nodes
end
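
For orientation, a short sketch of how the wrapped nodes look from the caller's side. It relies only on the merge_type and type readers already used in the Overview example:

analysis = FileAnalysis.new("# Title\n\nA paragraph.\n")
analysis.statements.each do |node|
  # merge_type is the canonical type from the wrapper; type is the backend's raw type
  puts "#{node.merge_type} (raw: #{node.type})"
end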

#compute_parser_signature(node) ⇒ Array?

Compute signature for a tree_haver node.

Uses canonical types from NodeTypeNormalizer for portable signatures.

Parameters:

  • node: The node (may be wrapped)

Returns:

  • Signature array



# File 'lib/markdown/merge/file_analysis.rb', line 150

def compute_parser_signature(node)
  # Get canonical type from wrapper or normalize raw type
  canonical_type = if Ast::Merge::NodeTyping.typed_node?(node)
    Ast::Merge::NodeTyping.merge_type_for(node)
  else
    NodeTypeNormalizer.canonical_type(node.type, @backend)
  end

  # Unwrap to access underlying node methods
  raw_node = Ast::Merge::NodeTyping.unwrap(node)

  case canonical_type
  when :heading
    # Content-based: Match headings by level and text content
    [:heading, raw_node.header_level, extract_text_content(raw_node)]
  when :paragraph
    # Content-based: Match paragraphs by content hash (first 32 chars of digest)
    text = extract_text_content(raw_node)
    [:paragraph, Digest::SHA256.hexdigest(text)[0, 32]]
  when :code_block
    # Content-based: Match code blocks by fence info and content hash
    content = safe_string_content(raw_node)
    fence_info = raw_node.respond_to?(:fence_info) ? raw_node.fence_info : nil
    [:code_block, fence_info, Digest::SHA256.hexdigest(content)[0, 16]]
  when :list
    # Structure-based: Match lists by type and item count (content may differ)
    list_type = raw_node.respond_to?(:list_type) ? raw_node.list_type : nil
    [:list, list_type, count_children(raw_node)]
  when :block_quote
    # Content-based: Match block quotes by content hash
    text = extract_text_content(raw_node)
    [:block_quote, Digest::SHA256.hexdigest(text)[0, 16]]
  when :thematic_break
    # Structure-based: All thematic breaks are equivalent
    [:thematic_break]
  when :html_block
    # Content-based: Match HTML blocks by content hash
    content = safe_string_content(raw_node)
    [:html_block, Digest::SHA256.hexdigest(content)[0, 16]]
  when :table
    # Content-based: Match tables by structure and header content
    header_content = extract_table_header_content(raw_node)
    [:table, count_children(raw_node), Digest::SHA256.hexdigest(header_content)[0, 16]]
  when :footnote_definition
    # Name/label-based: Match footnotes by name or label
    label = raw_node.respond_to?(:name) ? raw_node.name : safe_string_content(raw_node)
    [:footnote_definition, label]
  when :custom_block
    # Content-based: Match custom blocks by content hash
    text = extract_text_content(raw_node)
    [:custom_block, Digest::SHA256.hexdigest(text)[0, 16]]
  else
    # Unknown type - use canonical type and position
    pos = raw_node.source_position
    [:unknown, canonical_type, pos&.dig(:start_line)]
  end
end
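
The resulting signature shapes, shown as a hypothetical illustration (digest prefixes abbreviated and invented here for readability):

# "# Install"               => [:heading, 1, "Install"]
# "A paragraph of prose."   => [:paragraph, "3f2a..."]           # 32-char SHA-256 prefix
# a ruby fenced code block  => [:code_block, "ruby", "9c41..."]  # 16-char prefix
# "---"                     => [:thematic_break]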

#extract_text_content(node) ⇒ String

Extract all text content from a node and its children.

Override for tree_haver nodes, which don’t have a walk method. Uses recursive traversal via children instead.

Parameters:

  • node: The node

Returns:

  • Concatenated text content



# File 'lib/markdown/merge/file_analysis.rb', line 215

def extract_text_content(node)
  text_parts = []
  collect_text_recursive(node, text_parts)
  text_parts.join
end

#fallthrough_node?(value) ⇒ Boolean

Override to detect tree_haver nodes for signature generator fallthrough.

Parameters:

  • value: The value to check

Returns:

  • true if this is a fallthrough node



# File 'lib/markdown/merge/file_analysis.rb', line 137

def fallthrough_node?(value)
  Ast::Merge::NodeTyping.typed_node?(value) ||
    value.is_a?(Ast::Merge::FreezeNodeBase) ||
    parser_node?(value) ||
    super
end

#freeze_node_class ⇒ Class

Returns the FreezeNode class to use.

Returns:

  • Markdown::Merge::FreezeNode



# File 'lib/markdown/merge/file_analysis.rb', line 118

def freeze_node_class
  FreezeNode
end

#next_sibling(node) ⇒ Object?

Get the next sibling of a node.

Handles differences between backends:

  • Commonmarker: node.next_sibling

  • Markly: node.next

Parameters:

  • node: Current node

Returns:

  • Next sibling or nil



# File 'lib/markdown/merge/file_analysis.rb', line 106

def next_sibling(node)
  # tree_haver normalizes this, but handle both patterns for safety
  if node.respond_to?(:next_sibling)
    node.next_sibling
  elsif node.respond_to?(:next)
    node.next
  end
end

#parse_document(source) ⇒ Object?

Parse the source document using tree_haver backend.

Error handling follows the same pattern as other *-merge gems:

  • TreeHaver::Error (which inherits from Exception, not StandardError) is caught

  • TreeHaver::NotAvailable is a subclass of TreeHaver::Error, so it’s also caught

  • When an error occurs, the error is stored in @errors and nil is returned

  • SmartMergerBase#parse_and_analyze checks valid? and raises the appropriate parse error

Parameters:

  • source: Markdown source to parse

Returns:

  • Root document node from tree_haver, or nil on error



# File 'lib/markdown/merge/file_analysis.rb', line 88

def parse_document(source)
  tree = @parser.parse(source)
  tree.root_node
rescue TreeHaver::Error => e
  # TreeHaver::Error inherits from Exception, not StandardError.
  # This also catches TreeHaver::NotAvailable (subclass of Error).
  @errors << e.message
  nil
end
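
A minimal sketch of how a parse failure surfaces to callers, assuming only the #valid? and #errors members inherited from FileAnalysisBase:

analysis = FileAnalysis.new(markdown_source, backend: :markly)
unless analysis.valid?
  # Errors collected in parse_document are exposed via #errors
  warn "markdown-merge: parse failed: #{analysis.errors.join("; ")}"
end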

#parser_node?(value) ⇒ Boolean

Check if value is a tree_haver node.

Parameters:

  • value: Value to check

Returns:

  • true if this is a parser node



# File 'lib/markdown/merge/file_analysis.rb', line 126

def parser_node?(value)
  # Check for tree_haver node or wrapped node
  return true if value.respond_to?(:type) && value.respond_to?(:source_position)
  return true if Ast::Merge::NodeTyping.typed_node?(value)

  false
end

#safe_string_content(node) ⇒ String

Safely get string content from a node.

Override for tree_haver nodes, which use text instead of string_content.

Parameters:

  • node: The node

Returns:

  • String content or empty string



# File 'lib/markdown/merge/file_analysis.rb', line 227

def safe_string_content(node)
  if node.respond_to?(:string_content)
    node.string_content.to_s
  elsif node.respond_to?(:text)
    node.text.to_s
  else
    extract_text_content(node)
  end
rescue TypeError, NoMethodError
  extract_text_content(node)
end