Class: Markdown::Merge::FileAnalysisBase Abstract
- Inherits: Object
  - Object
  - Markdown::Merge::FileAnalysisBase
- Includes: Ast::Merge::FileAnalyzable
- Defined in: lib/markdown/merge/file_analysis_base.rb
Overview
This class is abstract. Subclass it and implement the parser-specific methods.
Base class for analyzing Markdown files.
Parses Markdown source code and extracts:
- Top-level block elements (headings, paragraphs, lists, code blocks, etc.)
- Freeze blocks marked with HTML comments
- Structural signatures for matching elements between files
Subclasses must implement parser-specific methods:
- #parse_document(source) - Parse source and return document node
- #next_sibling(node) - Get next sibling of a node
- #compute_parser_signature(node) - Compute signature for parser-specific nodes
- #node_type_name(type) - Map canonical type names if needed
Freeze blocks are marked with HTML comments:
<!-- markdown-merge:freeze -->
... content to preserve ...
<!-- markdown-merge:unfreeze -->
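As a rough illustration of how such markers could be located, the sketch below scans a source string line by line and returns the line ranges enclosed by a freeze/unfreeze pair. The helper name `find_freeze_ranges`, the regular expressions, and the returned `Range` objects are assumptions made for this example; the library's own detection logic is not shown on this page and may differ.

```ruby
# Hypothetical helper (not part of the library): find freeze blocks by
# scanning for the marker comments shown above.
FREEZE_TOKEN = "markdown-merge"

def find_freeze_ranges(source, token: FREEZE_TOKEN)
  escaped  = Regexp.escape(token)
  open_re  = /\A\s*<!--\s*#{escaped}:freeze\s*-->\s*\z/
  close_re = /\A\s*<!--\s*#{escaped}:unfreeze\s*-->\s*\z/

  ranges = []
  start_line = nil
  source.each_line(chomp: true).with_index(1) do |line, number|
    if start_line.nil? && line.match?(open_re)
      start_line = number            # remember where the block opened
    elsif start_line && line.match?(close_re)
      ranges << (start_line..number) # 1-based, inclusive of both markers
      start_line = nil
    end
  end
  ranges # an unclosed freeze marker yields no range in this sketch
end

sample = <<~MD
  # Title

  <!-- markdown-merge:freeze -->
  Keep this paragraph as is.
  <!-- markdown-merge:unfreeze -->

  Trailing text.
MD

p find_freeze_ranges(sample) # => [3..5]
```

Passing a different `token:` changes which comments count as markers, which mirrors the role of the `freeze_token:` constructor argument.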
Direct Known Subclasses
Constant Summary
- DEFAULT_FREEZE_TOKEN = "markdown-merge"
  Default freeze token for identifying freeze blocks
Instance Attribute Summary
- #document ⇒ Object (readonly)
  The root document node.
- #errors ⇒ Array (readonly)
  Parse errors if any.
- #statements ⇒ Array<Object, FreezeNode> (readonly)
  Get all statements (block nodes outside freeze blocks + FreezeNode instances).
Instance Method Summary
- #compute_node_signature(node) ⇒ Array?
  Compute default signature for a node.
- #compute_parser_signature(node) ⇒ Array? (abstract)
  Compute signature for a parser-specific node.
- #extract_text_content(node) ⇒ String
  Extract all text content from a node and its children.
- #fallthrough_node?(value) ⇒ Boolean
  Override to detect parser nodes for signature generator fallthrough.
- #initialize(source, freeze_token: DEFAULT_FREEZE_TOKEN, signature_generator: nil, **parser_options) ⇒ FileAnalysisBase (constructor)
  Initialize file analysis.
- #next_sibling(node) ⇒ Object? (abstract)
  Get the next sibling of a node.
- #parse_document(source) ⇒ Object (abstract)
  Parse the source document.
- #parser_node?(value) ⇒ Boolean
  Check if value is a parser-specific node.
- #safe_string_content(node) ⇒ String
  Safely get string content from a node.
- #source_range(start_line, end_line) ⇒ String
  Get the source text for a range of lines.
- #valid? ⇒ Boolean
  Check if parse was successful.
Constructor Details
#initialize(source, freeze_token: DEFAULT_FREEZE_TOKEN, signature_generator: nil, **parser_options) ⇒ FileAnalysisBase
Initialize file analysis
# File 'lib/markdown/merge/file_analysis_base.rb', line 58
def initialize(source, freeze_token: DEFAULT_FREEZE_TOKEN, signature_generator: nil, **)
  @source = source
  # Split by newlines, keeping trailing empty strings (-1)
  # But remove the final empty string if source ends with newline
  # (that empty string represents the "line after the last newline" which doesn't exist)
  @lines = source.split("\n", -1)
  @lines.pop if @lines.last == "" && source.end_with?("\n")
  @freeze_token = freeze_token
  @signature_generator = signature_generator
  @errors = []

  # Parse the Markdown source - subclasses implement this
  @document = DebugLogger.time("FileAnalysisBase#parse") do
    parse_document(source)
  end

  # Extract and integrate all nodes including freeze blocks
  @statements = extract_and_integrate_all_nodes

  DebugLogger.debug("FileAnalysisBase initialized", {
    signature_generator: signature_generator ? "custom" : "default",
    document_children: count_children(@document),
    statements_count: @statements.size,
    freeze_blocks: freeze_blocks.size,
  })
end
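The line-splitting step in the constructor is easy to misread, so here is that step in isolation. The helper name `split_lines` is hypothetical; the two statements inside it are the ones used in the constructor.

```ruby
# Hypothetical wrapper around the constructor's line-splitting step.
def split_lines(source)
  # A limit of -1 keeps trailing empty strings instead of discarding them.
  lines = source.split("\n", -1)
  # Drop the phantom "line" that follows a final newline.
  lines.pop if lines.last == "" && source.end_with?("\n")
  lines
end

p split_lines("a\nb\n")  # => ["a", "b"]
p split_lines("a\nb")    # => ["a", "b"]
p split_lines("a\n\n")   # => ["a", ""]
p "a\n\n".split("\n")    # => ["a"]
```

The last two calls show why the `-1` limit matters: a plain `split("\n")` silently drops the blank line at the end of the source, which would shift line counts for anything that relies on them.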
Instance Attribute Details
#document ⇒ Object (readonly)
Returns the root document node.
# File 'lib/markdown/merge/file_analysis_base.rb', line 46
def document
  @document
end
#errors ⇒ Array (readonly)
Returns parse errors, if any.
# File 'lib/markdown/merge/file_analysis_base.rb', line 49
def errors
  @errors
end
#statements ⇒ Array<Object, FreezeNode> (readonly)
Get all statements (block nodes outside freeze blocks + FreezeNode instances)
# File 'lib/markdown/merge/file_analysis_base.rb', line 115
def statements
  @statements
end
Instance Method Details
#compute_node_signature(node) ⇒ Array?
Compute default signature for a node
# File 'lib/markdown/merge/file_analysis_base.rb', line 120
def compute_node_signature(node)
  case node
  when Ast::Merge::FreezeNodeBase
    node.signature
  when LinkDefinitionNode
    node.signature
  when GapLineNode
    node.signature
  else
    compute_parser_signature(node)
  end
end
#compute_parser_signature(node) ⇒ Array?
Subclasses should override this method
Compute signature for a parser-specific node.
# File 'lib/markdown/merge/file_analysis_base.rb', line 158
def compute_parser_signature(node)
  type = node.type
  case type
  when :heading, :header
    # Content-based: Match headings by level and text content
    [:heading, node.header_level, extract_text_content(node)]
  when :paragraph
    # Content-based: Match paragraphs by content hash (first 32 chars of digest)
    text = extract_text_content(node)
    [:paragraph, Digest::SHA256.hexdigest(text)[0, 32]]
  when :code_block
    # Content-based: Match code blocks by fence info and content hash
    content = safe_string_content(node)
    fence_info = node.respond_to?(:fence_info) ? node.fence_info : nil
    [:code_block, fence_info, Digest::SHA256.hexdigest(content)[0, 16]]
  when :list
    # Structure-based: Match lists by type and item count (content may differ)
    list_type = node.respond_to?(:list_type) ? node.list_type : nil
    [:list, list_type, count_children(node)]
  when :block_quote, :blockquote
    # Content-based: Match block quotes by content hash
    text = extract_text_content(node)
    [:blockquote, Digest::SHA256.hexdigest(text)[0, 16]]
  when :thematic_break, :hrule
    # Structure-based: All thematic breaks are equivalent
    [:hrule]
  when :html_block, :html
    # Content-based: Match HTML blocks by content hash
    content = safe_string_content(node)
    [:html, Digest::SHA256.hexdigest(content)[0, 16]]
  when :table
    # Content-based: Match tables by structure and header content
    header_content = extract_table_header_content(node)
    [:table, count_children(node), Digest::SHA256.hexdigest(header_content)[0, 16]]
  when :footnote_definition
    # Name/label-based: Match footnotes by name or label
    label = node.respond_to?(:name) ? node.name : safe_string_content(node)
    [:footnote_definition, label]
  when :custom_block
    # Content-based: Match custom blocks by content hash
    text = extract_text_content(node)
    [:custom_block, Digest::SHA256.hexdigest(text)[0, 16]]
  else
    # Unknown type - use type and position
    pos = node.source_position
    [:unknown, type, pos&.dig(:start_line)]
  end
end
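To see how these signatures behave when two files are compared, the following sketch reproduces three of the branches against a stand-in node type. `SigNode` and `toy_signature` are hypothetical names used only here; real nodes come from whichever parser the subclass wraps.

```ruby
require "digest"

# Hypothetical stand-in for a parser node (assumption for illustration).
SigNode = Struct.new(:type, :text, :header_level)

# Simplified version of three branches from the listing above.
def toy_signature(node)
  case node.type
  when :heading
    [:heading, node.header_level, node.text]
  when :paragraph
    [:paragraph, Digest::SHA256.hexdigest(node.text)[0, 32]]
  when :thematic_break
    [:hrule]
  else
    [:unknown, node.type]
  end
end

same_a = SigNode.new(:paragraph, "Install the gem.")
same_b = SigNode.new(:paragraph, "Install the gem.")
edited = SigNode.new(:paragraph, "Install the gem!")

p toy_signature(same_a) == toy_signature(same_b) # => true
p toy_signature(same_a) == toy_signature(edited) # => false
p toy_signature(SigNode.new(:heading, "Usage", 2)) # => [:heading, 2, "Usage"]
```

Because paragraph signatures are content hashes, any edit to the text produces a different signature, whereas headings match on level plus text and thematic breaks always match each other.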
#extract_text_content(node) ⇒ String
Extract all text content from a node and its children
# File 'lib/markdown/merge/file_analysis_base.rb', line 220
def extract_text_content(node)
  text_parts = []
  node.walk do |child|
    if child.type == :text
      text_parts << child.string_content.to_s
    elsif child.type == :code
      text_parts << child.string_content.to_s
    end
  end
  text_parts.join
end
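The method relies on the node exposing a `walk` iterator that visits descendants. A minimal sketch of that traversal, assuming a depth-first `walk` that also yields the node itself, might look like the following; `WalkNode` and `collect_text` are hypothetical names, and a real parser's `walk` may differ in detail.

```ruby
# Hypothetical node type with a depth-first #walk (assumption for illustration).
class WalkNode
  attr_reader :type, :string_content, :children

  def initialize(type, string_content = nil, children = [])
    @type = type
    @string_content = string_content
    @children = children
  end

  # Yield this node, then every descendant, depth-first.
  def walk(&block)
    yield self
    children.each { |child| child.walk(&block) }
  end
end

# Same idea as extract_text_content: keep :text and :code leaves only.
def collect_text(node)
  parts = []
  node.walk do |child|
    parts << child.string_content.to_s if %i[text code].include?(child.type)
  end
  parts.join
end

paragraph = WalkNode.new(:paragraph, nil, [
  WalkNode.new(:text, "Run "),
  WalkNode.new(:code, "bundle install"),
  WalkNode.new(:emph, nil, [WalkNode.new(:text, " first")]),
])

puts collect_text(paragraph) # prints "Run bundle install first"
```

Inline markup such as emphasis contributes only its nested text, so two paragraphs that differ only in formatting yield the same extracted string.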
#fallthrough_node?(value) ⇒ Boolean
Override to detect parser nodes for signature generator fallthrough
# File 'lib/markdown/merge/file_analysis_base.rb', line 136
def fallthrough_node?(value)
  value.is_a?(Ast::Merge::FreezeNodeBase) ||
    value.is_a?(LinkDefinitionNode) ||
    value.is_a?(GapLineNode) ||
    parser_node?(value) ||
    super
end
#next_sibling(node) ⇒ Object?
Subclasses must implement this method
Get the next sibling of a node.
Different parsers use different methods (next vs next_sibling).
# File 'lib/markdown/merge/file_analysis_base.rb', line 103
def next_sibling(node)
  raise NotImplementedError, "#{self.class} must implement #next_sibling"
end
#parse_document(source) ⇒ Object
Subclasses must implement this method
Parse the source document.
# File 'lib/markdown/merge/file_analysis_base.rb', line 92
def parse_document(source)
  raise NotImplementedError, "#{self.class} must implement #parse_document"
end
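The abstract-method pattern used by `#parse_document` and `#next_sibling` can be shown with a miniature, self-contained pair of classes. `ToyAnalysisBase` and `LineAnalysis` are hypothetical and far simpler than the real class; the toy "parser" just treats each non-blank line as a block.

```ruby
# Hypothetical miniature of the abstract-method pattern (not the real class).
class ToyAnalysisBase
  attr_reader :document

  def initialize(source)
    # The base constructor calls the hook; subclasses supply the parser.
    @document = parse_document(source)
  end

  def parse_document(source)
    raise NotImplementedError, "#{self.class} must implement #parse_document"
  end
end

class LineAnalysis < ToyAnalysisBase
  # Toy "parser": every non-blank line becomes one block.
  def parse_document(source)
    source.lines.map(&:chomp).reject(&:empty?)
  end
end

p LineAnalysis.new("# Title\n\nBody text\n").document
# => ["# Title", "Body text"]

begin
  ToyAnalysisBase.new("anything")
rescue NotImplementedError => e
  puts e.message # prints "ToyAnalysisBase must implement #parse_document"
end
```

Note that `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` will not catch it; it has to be named explicitly as above.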
#parser_node?(value) ⇒ Boolean
Check if value is a parser-specific node.
# File 'lib/markdown/merge/file_analysis_base.rb', line 148
def parser_node?(value)
  # Default: check if it responds to :type (common for AST nodes)
  value.respond_to?(:type)
end
#safe_string_content(node) ⇒ String
Safely get string content from a node
# File 'lib/markdown/merge/file_analysis_base.rb', line 210
def safe_string_content(node)
  node.string_content.to_s
rescue TypeError
  # Some node types don't support string_content
  extract_text_content(node)
end
#source_range(start_line, end_line) ⇒ String
Get the source text for a range of lines
Lines are joined with newlines, and each line gets a trailing newline except for the last line of the file (which may or may not have one in the original).
# File 'lib/markdown/merge/file_analysis_base.rb', line 240
def source_range(start_line, end_line)
  return "" if start_line < 1 || end_line < start_line

  extracted_lines = @lines[(start_line - 1)..(end_line - 1)]
  return "" if extracted_lines.empty?

  # Add newlines between and after lines, but not after the last line of the file
  # unless it originally had one
  result = extracted_lines.join("\n")

  # Add trailing newline if this isn't the last line of the file
  # (the last line may or may not have a trailing newline in the original source)
  if end_line < @lines.length
    result += "\n"
  elsif @source&.end_with?("\n")
    # Last line of file, but original source ends with newline
    result += "\n"
  end

  result
end
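The trailing-newline rule is easiest to follow with concrete inputs. The standalone function below restates the logic outside the class so it can be run directly; `line_range` is a hypothetical name, and the extra `nil` check is an addition of this sketch, not something present in the listing above.

```ruby
# Hypothetical standalone restatement of the source_range logic.
def line_range(source, start_line, end_line)
  lines = source.split("\n", -1)
  lines.pop if lines.last == "" && source.end_with?("\n")
  return "" if start_line < 1 || end_line < start_line

  extracted = lines[(start_line - 1)..(end_line - 1)]
  # Added guard: Array#[] returns nil when the range starts past the end.
  return "" if extracted.nil? || extracted.empty?

  result = extracted.join("\n")
  # Every line except the file's last one is followed by a newline;
  # the last one gets a newline only if the source ended with one.
  result += "\n" if end_line < lines.length || source.end_with?("\n")
  result
end

no_newline   = "one\ntwo\nthree"
with_newline = "one\ntwo\nthree\n"

p line_range(no_newline, 1, 2)   # => "one\ntwo\n"
p line_range(no_newline, 3, 3)   # => "three"
p line_range(with_newline, 3, 3) # => "three\n"
p line_range(no_newline, 0, 2)   # => ""
```

With this rule, concatenating the ranges for consecutive line spans reproduces the original source exactly, including whether or not it ended with a newline.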
#valid? ⇒ Boolean
Check if parse was successful
# File 'lib/markdown/merge/file_analysis_base.rb', line 109
def valid?
  @errors.empty? && !@document.nil?
end