Class: Ast::Merge::FencedCodeBlockDetector
- Inherits: RegionDetectorBase
  - Object
  - RegionDetectorBase
  - Ast::Merge::FencedCodeBlockDetector
- Defined in: lib/ast/merge/fenced_code_block_detector.rb
Overview
Detects fenced code blocks with a specific language identifier.
This detector finds Markdown-style fenced code blocks (using ``` or ~~~) that have a specific language identifier. It can be configured for any language: ruby, json, yaml, mermaid, etc.
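As a rough illustration of what counts as an opening fence, the sketch below reuses the opening-fence pattern from the `#detect_all` source on this page, outside the class. `fence_language` is a hypothetical helper, not part of the API. Backtick and tilde fences of three or more characters both match, leading indentation is captured separately, and the language word is lowercased before it is compared with the detector's identifiers:

```ruby
# Opening-fence pattern copied from FencedCodeBlockDetector#detect_all.
# Group 1 = indent, group 2 = fence run, group 3 = language word.
OPENING_FENCE = /^(\s*)(`{3,}|~{3,})(\w*)\s*$/

# Hypothetical helper: the lowercased language of an opening fence line,
# or nil when the line is not an opening fence at all.
def fence_language(line)
  match = line.match(OPENING_FENCE)
  match && match[3].downcase
end

fence_language("```ruby\n")      # => "ruby"
fence_language("~~~~YAML\n")     # => "yaml"
fence_language("  ```json\n")    # => "json" (indent is captured separately)
fence_language("```\n")          # => ""
fence_language("not a fence\n")  # => nil
```

A bare fence yields an empty string, which none of the factory presets list, so unlabeled blocks are skipped.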
## When to Use This Detector
**Use FencedCodeBlockDetector when:**
- Working with raw Markdown text without parsing to AST
- Quick extraction from strings without parser dependencies
- Custom text processing requiring line-level precision
- Operating on source text directly (e.g., linters, formatters)

**Do NOT use FencedCodeBlockDetector when:**
- Working with parsed Markdown AST (use native code block nodes instead)
- Integrating with markdown-merge’s CodeBlockMerger (it uses native nodes)
- Using tree_haver’s unified Markdown backend API
## Comparison: FencedCodeBlockDetector vs Native AST Nodes
### Native AST Approach (Preferred for AST-based Tools)
When working with parsed Markdown AST via tree_haver (commonmarker/markly backends):
```ruby
# markdown-merge's CodeBlockMerger uses this approach:
language = node.fence_info.split(/\s+/).first # e.g., "ruby"
content = node.string_content # Raw code inside block

# Then delegate to a language-specific parser
# (template and dest are the template/destination blocks' string_content):
case language
when "ruby"
  merger = Prism::Merge::SmartMerger.new(template, dest, preference: :destination)
  merged_content = merger.merge # Prism parses Ruby code into full AST!
when "yaml"
  merger = Psych::Merge::SmartMerger.new(template, dest, preference: :destination)
  merged_content = merger.merge # Psych parses YAML into AST!
when "json"
  merger = Json::Merge::SmartMerger.new(template, dest, preference: :destination)
  merged_content = merger.merge # JSON parser creates AST!
when "bash"
  merger = Bash::Merge::SmartMerger.new(template, dest, preference: :destination)
  merged_content = merger.merge # tree-sitter parses bash into AST!
end
```
**Advantages of the native AST approach:**
- ✓ The Markdown parser handles fence edge cases (nested backticks, indentation, etc.)
- ✓ Respects node boundaries from the authoritative source
- ✓ No regex brittleness
- ✓ Automatic handling of ``` and ~~~ fence styles
- ✓ Enables TRUE language-aware merging (not just text replacement)
- ✓ Language-specific parsers create full ASTs of embedded code
- ✓ Smart merging at the semantic level (method definitions, YAML keys, JSON properties)
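The native approach reads the language from the first whitespace-separated word of a node's `fence_info`. A minimal sketch of just that step, assuming a plain `Struct` as a hypothetical stand-in for a commonmarker/markly code block node (only the two properties named above are modeled, and `block_language` is an illustrative helper):

```ruby
# Hypothetical stand-in for a parsed code_block node; real nodes come from
# the Markdown backend and expose more than these two properties.
CodeBlockNode = Struct.new(:fence_info, :string_content)

# First whitespace-separated word of the info string, e.g. "ruby" from
# "ruby title=example.rb". Returns nil for an empty or missing info string.
def block_language(node)
  node.fence_info.to_s.split(/\s+/).first
end

node = CodeBlockNode.new("ruby title=example.rb", "puts 1\n")
block_language(node)                        # => "ruby"
node.string_content                         # => "puts 1\n"
block_language(CodeBlockNode.new("", "x"))  # => nil
```

The `\s` escape matters: `/s+/` without the backslash would split on the letter "s" instead of on whitespace.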
### Text-Based Approach (This Class)
When working with raw text:
```ruby
detector = FencedCodeBlockDetector.ruby
regions = detector.detect_all(markdown_text)
regions.each do |region|
  puts "Ruby code at lines #{region.start_line}-#{region.end_line}"
  # region.content is just a string - NO parsing happens
end
```
**Limitations of the text-based approach:**
- Uses a regex to find blocks (may miss edge cases)
- Returns strings, not parsed structures
- Cannot perform semantic merging
- Manual handling of fence variations
- No language-specific intelligence
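One concrete edge case behind "may miss edge cases": the opening-fence pattern in `#detect_all` is `/^(\s*)(`{3,}|~{3,})(\w*)\s*$/`, so the language must be a single run of word characters followed only by whitespace. The sketch below reuses that pattern verbatim, outside the class, on info strings that CommonMark permits; the ones the pattern rejects would be skipped rather than detected:

```ruby
# Opening-fence pattern copied from FencedCodeBlockDetector#detect_all.
OPENING_FENCE = /^(\s*)(`{3,}|~{3,})(\w*)\s*$/

samples = [
  "```ruby\n",               # plain language word: matched
  "```ruby title=app.rb\n",  # extra info-string words: not matched
  "```c++\n",                # non-word characters: not matched
  "```objective-c\n",        # hyphen is not \w: not matched
]

# Map each sample (without its newline) to whether it opens a fence.
results = samples.to_h { |line| [line.chomp, line.match?(OPENING_FENCE)] }
results.each { |line, ok| puts "#{line.ljust(24)} #{ok ? 'opening fence' : 'ignored'}" }
```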
## Real-World Example: markdown-merge Inner Code Block Merging
When `inner_merge_code_blocks: true` is enabled in markdown-merge:
- **Markdown Parser** (commonmarker/markly) parses the Markdown into an AST
  - Creates code_block nodes with `fence_info` and `string_content`
- **CodeBlockMerger** extracts the code using native node properties:
  ```ruby
  language = node.fence_info.split(/\s+/).first
  template_code = template_node.string_content
  dest_code = dest_node.string_content
  ```
- **Language-Specific Parser** creates a FULL AST of the embedded code:
  - `Prism::Merge` → Prism parses Ruby into a complete AST (ClassNode, DefNode, etc.)
  - `Psych::Merge` → Psych parses YAML into a document structure
  - `Json::Merge` → JSON parser creates an object/array tree
  - `Bash::Merge` → tree-sitter creates a bash statement AST
- **Smart Merger** performs SEMANTIC merging at the AST level:
  - Ruby: Merges class definitions, preserves custom methods
  - YAML: Merges keys, preserves custom configuration values
  - JSON: Merges objects, destination values win on conflicts
  - Bash: Merges statements, preserves custom exports
- The result is intelligently merged code, not simple text concatenation!
**This means:** The embedded code is FULLY PARSED by its native language parser, enabling true semantic-level merging. FencedCodeBlockDetector would only find the text boundaries - it cannot perform this semantic merging.
Instance Attribute Summary
- #aliases ⇒ Array<String> (readonly)
  Alternative language identifiers.
- #language ⇒ String (readonly)
  The primary language identifier.
Class Method Summary
- .bash ⇒ FencedCodeBlockDetector
  Creates a detector for Bash/Shell code blocks.
- .css ⇒ FencedCodeBlockDetector
  Creates a detector for CSS code blocks.
- .html ⇒ FencedCodeBlockDetector
  Creates a detector for HTML code blocks.
- .javascript ⇒ FencedCodeBlockDetector
  Creates a detector for JavaScript code blocks.
- .json ⇒ FencedCodeBlockDetector
  Creates a detector for JSON code blocks.
- .markdown ⇒ FencedCodeBlockDetector
  Creates a detector for Markdown code blocks (nested markdown).
- .mermaid ⇒ FencedCodeBlockDetector
  Creates a detector for Mermaid diagram blocks.
- .python ⇒ FencedCodeBlockDetector
  Creates a detector for Python code blocks.
- .ruby ⇒ FencedCodeBlockDetector
  Creates a detector for Ruby code blocks.
- .sql ⇒ FencedCodeBlockDetector
  Creates a detector for SQL code blocks.
- .toml ⇒ FencedCodeBlockDetector
  Creates a detector for TOML code blocks.
- .typescript ⇒ FencedCodeBlockDetector
  Creates a detector for TypeScript code blocks.
- .yaml ⇒ FencedCodeBlockDetector
  Creates a detector for YAML code blocks.
Instance Method Summary
- #detect_all(source) ⇒ Array<Region>
  Detects all fenced code blocks with the configured language.
- #initialize(language, aliases: []) ⇒ FencedCodeBlockDetector (constructor)
  Creates a new detector for the specified language.
- #inspect ⇒ String
  A description of this detector.
- #matches_language?(lang) ⇒ Boolean
  Check if a language identifier matches this detector.
- #region_type ⇒ Symbol
  The region type (e.g., :ruby_code_block).
Methods inherited from RegionDetectorBase
Constructor Details
#initialize(language, aliases: []) ⇒ FencedCodeBlockDetector
Creates a new detector for the specified language.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 134
def initialize(language, aliases: [])
  super()
  @language = language.to_s.downcase
  @aliases = aliases.map { |a| a.to_s.downcase }
  @all_identifiers = [@language] + @aliases
end
```
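The constructor lowercases the language and every alias through `to_s.downcase` (so symbols are accepted too) and keeps them together in `@all_identifiers`, which `#detect_all` and `#matches_language?` consult. A standalone sketch mirroring only that normalization; `IdentifierSet` is a hypothetical class name used for illustration, not part of the library:

```ruby
# Hypothetical mirror of the normalization in FencedCodeBlockDetector#initialize.
class IdentifierSet
  attr_reader :language, :aliases, :all_identifiers

  def initialize(language, aliases: [])
    @language = language.to_s.downcase
    @aliases = aliases.map { |a| a.to_s.downcase }
    # Primary identifier first, then aliases, as in the real constructor.
    @all_identifiers = [@language] + @aliases
  end
end

ids = IdentifierSet.new(:Ruby, aliases: ["RB"])
ids.language         # => "ruby"
ids.aliases          # => ["rb"]
ids.all_identifiers  # => ["ruby", "rb"]
```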
Instance Attribute Details
#aliases ⇒ Array<String> (readonly)
Returns the alternative language identifiers.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 128
def aliases
  @aliases
end
```
#language ⇒ String (readonly)
Returns the primary language identifier.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 125
def language
  @language
end
```
Class Method Details
.bash ⇒ FencedCodeBlockDetector
Creates a detector for Bash/Shell code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 283
def bash
  new("bash", aliases: ["sh", "shell", "zsh"])
end
```
.css ⇒ FencedCodeBlockDetector
Creates a detector for CSS code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 301
def css
  new("css")
end
```
.html ⇒ FencedCodeBlockDetector
Creates a detector for HTML code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 295
def html
  new("html")
end
```
.javascript ⇒ FencedCodeBlockDetector
Creates a detector for JavaScript code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 265
def javascript
  new("javascript", aliases: ["js"])
end
```
.json ⇒ FencedCodeBlockDetector
Creates a detector for JSON code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 241
def json
  new("json")
end
```
.markdown ⇒ FencedCodeBlockDetector
Creates a detector for Markdown code blocks (nested markdown).
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 307
def markdown
  new("markdown", aliases: ["md"])
end
```
.mermaid ⇒ FencedCodeBlockDetector
Creates a detector for Mermaid diagram blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 259
def mermaid
  new("mermaid")
end
```
.python ⇒ FencedCodeBlockDetector
Creates a detector for Python code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 277
def python
  new("python", aliases: ["py"])
end
```
.ruby ⇒ FencedCodeBlockDetector
Creates a detector for Ruby code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 235
def ruby
  new("ruby", aliases: ["rb"])
end
```
.sql ⇒ FencedCodeBlockDetector
Creates a detector for SQL code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 289
def sql
  new("sql")
end
```
.toml ⇒ FencedCodeBlockDetector
Creates a detector for TOML code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 253
def toml
  new("toml")
end
```
.typescript ⇒ FencedCodeBlockDetector
Creates a detector for TypeScript code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 271
def typescript
  new("typescript", aliases: ["ts"])
end
```
.yaml ⇒ FencedCodeBlockDetector
Creates a detector for YAML code blocks.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 247
def yaml
  new("yaml", aliases: ["yml"])
end
```
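Taken together, the factory methods above fix which fence labels each preset recognizes. The sketch below restates those language/alias pairs as a plain hash, transcribed from the method bodies, and looks up which preset would claim a given label. The `PRESETS` constant and `preset_for` helper are hypothetical and not part of the class:

```ruby
# Language => aliases, transcribed from the factory method bodies above.
PRESETS = {
  "ruby" => ["rb"],
  "json" => [],
  "yaml" => ["yml"],
  "toml" => [],
  "mermaid" => [],
  "javascript" => ["js"],
  "typescript" => ["ts"],
  "python" => ["py"],
  "bash" => ["sh", "shell", "zsh"],
  "sql" => [],
  "html" => [],
  "css" => [],
  "markdown" => ["md"],
}.freeze

# Primary language of the preset that matches +label+, or nil.
# Comparison is case-insensitive, like #matches_language?.
def preset_for(label)
  key = label.to_s.downcase
  PRESETS.find { |lang, aliases| lang == key || aliases.include?(key) }&.first
end

preset_for("zsh")   # => "bash"
preset_for("YML")   # => "yaml"
preset_for("rust")  # => nil
```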
Instance Method Details
#detect_all(source) ⇒ Array<Region>
Detects all fenced code blocks with the configured language.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 158
def detect_all(source)
  return [] if source.nil? || source.empty?

  regions = []
  lines = source.lines
  in_block = false
  start_line = nil
  content_lines = []
  current_language = nil
  fence_char = nil
  fence_length = nil
  indent = ""

  lines.each_with_index do |line, idx|
    line_num = idx + 1

    if !in_block
      # Match opening fence: ```lang or ~~~lang (optionally indented)
      match = line.match(/^(\s*)(`{3,}|~{3,})(\w*)\s*$/)
      if match
        indent = match[1] || ""
        fence = match[2]
        lang = match[3].downcase

        if @all_identifiers.include?(lang)
          in_block = true
          start_line = line_num
          content_lines = []
          current_language = lang
          fence_char = fence[0]
          fence_length = fence.length
        end
      end
    elsif line.match?(/^#{Regexp.escape(indent)}#{Regexp.escape(fence_char)}{#{fence_length},}\s*$/)
      # Match closing fence (must use same char, same indent, and at least same length)
      opening_fence = "#{fence_char * fence_length}#{current_language}"
      closing_fence = fence_char * fence_length

      regions << build_region(
        type: region_type,
        content: content_lines.join,
        start_line: start_line,
        end_line: line_num,
        delimiters: [opening_fence, closing_fence],
        metadata: {language: current_language, indent: indent.empty? ? nil : indent},
      )

      in_block = false
      start_line = nil
      content_lines = []
      current_language = nil
      fence_char = nil
      fence_length = nil
      indent = ""
    else
      # Accumulate content lines (strip the indent if present)
      content_lines << if indent.empty?
        line
      else
        # Strip the common indent from content lines
        line.sub(/^#{Regexp.escape(indent)}/, "")
      end
    end
  end

  # Note: Unclosed blocks are ignored (no region created)
  regions
end
```
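The fence rules in `#detect_all` are easier to see on a small input. The sketch below is a simplified, standalone mirror of that method, not the method itself: `SketchRegion` is a hypothetical `Struct` standing in for the `Region` objects returned by `build_region` (which is not shown on this page), and the per-block state lives in one hash instead of separate locals. It shows an alias opening a block, a too-short closing fence being kept as content, a non-matching language being skipped, and an unclosed block producing no region:

```ruby
# Hypothetical stand-in for the Region objects produced by build_region.
SketchRegion = Struct.new(:content, :start_line, :end_line, :language, keyword_init: true)

# Simplified mirror of #detect_all: same fence rules, Struct output.
def sketch_detect_all(source, identifiers)
  regions = []
  current = nil # state of the block being read, or nil outside a block

  source.to_s.each_line.with_index(1) do |line, line_num|
    if current.nil?
      # Opening fence: optional indent, 3+ backticks or tildes, language word.
      match = line.match(/^(\s*)(`{3,}|~{3,})(\w*)\s*$/)
      next unless match && identifiers.include?(match[3].downcase)

      current = {indent: match[1], char: match[2][0], length: match[2].length,
                 language: match[3].downcase, start: line_num, lines: []}
    elsif line.match?(/^#{Regexp.escape(current[:indent])}#{Regexp.escape(current[:char])}{#{current[:length]},}\s*$/)
      # Closing fence: same indent, same character, at least as long.
      regions << SketchRegion.new(content: current[:lines].join, start_line: current[:start],
                                  end_line: line_num, language: current[:language])
      current = nil
    else
      # Content line: strip the opening fence's indent, if any.
      current[:lines] << line.sub(/^#{Regexp.escape(current[:indent])}/, "")
    end
  end

  regions # a block still open at the end of input yields no region
end

markdown = [
  "Intro text",
  "```ruby",   # line 2: opens a ruby block
  "puts 'hi'",
  "```",       # line 4: closes it
  "~~~~rb",    # line 5: alias with a four-tilde fence
  "x = 1",
  "~~~",       # line 7: shorter than the opener, so it is content
  "~~~~",      # line 8: closes the rb block
  "```python", # line 9: not a ruby identifier, skipped
  "```",
  "```RUBY",   # line 11: opens, but is never closed
  "unclosed",
].map { |l| "#{l}\n" }.join

sketch_detect_all(markdown, ["ruby", "rb"]).each do |r|
  puts "#{r.language} block at lines #{r.start_line}-#{r.end_line}: #{r.content.inspect}"
end
# Prints:
# ruby block at lines 2-4: "puts 'hi'\n"
# rb block at lines 5-8: "x = 1\n~~~\n"
```

As in the real method, the recorded language is the identifier that actually appeared on the fence ("rb" here), not necessarily the primary one.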
#inspect ⇒ String
Returns a description of this detector.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 227
def inspect
  aliases_str = @aliases.empty? ? "" : " aliases=#{@aliases.inspect}"
  "#<#{self.class.name} language=#{@language}#{aliases_str}>"
end
```
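To show what `#inspect` returns with and without aliases, here is the same string formatting pulled into a hypothetical standalone helper, `detector_inspect`, which takes the class name as an argument because it is not defined on the real class:

```ruby
# Standalone copy of the #inspect string format; the class name is passed in
# because this helper is not defined on the real class.
def detector_inspect(class_name, language, aliases)
  aliases_str = aliases.empty? ? "" : " aliases=#{aliases.inspect}"
  "#<#{class_name} language=#{language}#{aliases_str}>"
end

name = "Ast::Merge::FencedCodeBlockDetector"
puts detector_inspect(name, "json", [])
puts detector_inspect(name, "bash", ["sh", "shell", "zsh"])
# Prints:
# #<Ast::Merge::FencedCodeBlockDetector language=json>
# #<Ast::Merge::FencedCodeBlockDetector language=bash aliases=["sh", "shell", "zsh"]>
```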
#matches_language?(lang) ⇒ Boolean
Check if a language identifier matches this detector.
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 150
def matches_language?(lang)
  @all_identifiers.include?(lang.to_s.downcase)
end
```
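Because the argument goes through `to_s.downcase` before the lookup, matching is case-insensitive and accepts symbols, while `nil` becomes an empty string that matches nothing. A standalone sketch of the same check; `identifier_match?` is a hypothetical helper that takes the identifier list explicitly, and `["yaml", "yml"]` is what `FencedCodeBlockDetector.yaml` stores in `@all_identifiers`:

```ruby
# Standalone mirror of #matches_language?, given the identifier list explicitly.
def identifier_match?(all_identifiers, lang)
  all_identifiers.include?(lang.to_s.downcase)
end

yaml_ids = ["yaml", "yml"]
identifier_match?(yaml_ids, "YML")   # => true
identifier_match?(yaml_ids, :yaml)   # => true
identifier_match?(yaml_ids, nil)     # => false (nil.to_s is "")
identifier_match?(yaml_ids, "json")  # => false
```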
#region_type ⇒ Symbol
Returns the region type (e.g., :ruby_code_block).
```ruby
# File 'lib/ast/merge/fenced_code_block_detector.rb', line 142
def region_type
  :"#{@language}_code_block"
end
```
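The region type is built only from the primary `@language`, and `#detect_all` passes it as `type:` for every region, so a block opened with an alias (for example "rb" under `.ruby`) is still typed `:ruby_code_block`. A standalone sketch of the symbol construction, using a hypothetical `region_type_for` helper:

```ruby
# Standalone copy of the #region_type symbol construction.
def region_type_for(language)
  :"#{language}_code_block"
end

region_type_for("ruby")        # => :ruby_code_block
region_type_for("javascript")  # => :javascript_code_block
```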