Class: Ast::Merge::FencedCodeBlockDetector

Inherits:
RegionDetectorBase
Defined in:
lib/ast/merge/fenced_code_block_detector.rb

Overview

Detects fenced code blocks with a specific language identifier.

This detector finds Markdown-style fenced code blocks (using ``` or ~~~) that have a specific language identifier. It can be configured for any language: ruby, json, yaml, mermaid, etc.

## When to Use This Detector

**Use FencedCodeBlockDetector when:**

  • Working with raw Markdown text without parsing to AST

  • Quick extraction from strings without parser dependencies

  • Custom text processing requiring line-level precision

  • Operating on source text directly (e.g., linters, formatters)

**Do NOT use FencedCodeBlockDetector when:**

  • Working with parsed Markdown AST (use native code block nodes instead)

  • Integrating with markdown-merge’s CodeBlockMerger (it uses native nodes)

  • Using tree_haver’s unified Markdown backend API

## Comparison: FencedCodeBlockDetector vs Native AST Nodes

### Native AST Approach (Preferred for AST-based Tools)

When working with parsed Markdown AST via tree_haver (commonmarker/markly backends):

```ruby
# markdown-merge's CodeBlockMerger uses this approach:
language = node.fence_info.split(/\s+/).first  # e.g., "ruby"
content = node.string_content                  # Raw code inside block

# Then delegate to language-specific parser:
case language
when "ruby"
  merger = Prism::Merge::SmartMerger.new(template, dest, preference: :destination)
  merged_content = merger.merge  # Prism parses Ruby code into full AST!
when "yaml"
  merger = Psych::Merge::SmartMerger.new(template, dest, preference: :destination)
  merged_content = merger.merge  # Psych parses YAML into AST!
when "json"
  merger = Json::Merge::SmartMerger.new(template, dest, preference: :destination)
  merged_content = merger.merge  # JSON parser creates AST!
when "bash"
  merger = Bash::Merge::SmartMerger.new(template, dest, preference: :destination)
  merged_content = merger.merge  # tree-sitter parses bash into AST!
end
```

**Advantages of Native AST approach:**

  • ✓ Parser handles all edge cases (nested backticks, indentation, etc.)

  • ✓ Respects node boundaries from authoritative source

  • ✓ No regex brittleness

  • ✓ Automatic handling of ``` and ~~~ fence styles

  • ✓ Enables TRUE language-aware merging (not just text replacement)

  • ✓ Language-specific parsers create full ASTs of embedded code

  • ✓ Smart merging at semantic level (method definitions, YAML keys, JSON properties)

### Text-Based Approach (This Class)

When working with raw text:

```ruby
detector = FencedCodeBlockDetector.ruby
regions = detector.detect_all(markdown_text)
regions.each do |region|
  puts "Ruby code at lines #{region.start_line}-#{region.end_line}"
  # region.content is just a string - NO parsing happens
end
```

**Limitations of text-based approach:**

  • Uses regex to find blocks (may miss edge cases)

  • Returns strings, not parsed structures

  • Cannot perform semantic merging

  • Manual handling of fence variations

  • No language-specific intelligence
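To make the first limitation concrete, the sketch below reuses the opening-fence pattern shown in the `#detect_all` source listing on this page and checks which fence lines it accepts. The helper name `fence_language` and the sample lines are illustrative only and are not part of the library.

```ruby
# Returns the lowercased language tag of an opening fence line, or nil
# when the line is not recognized as an opening fence. The pattern is
# copied from #detect_all; the helper itself is a hypothetical wrapper.
def fence_language(line)
  match = line.match(/^(\s*)(`{3,}|~{3,})(\w*)\s*$/)
  match && match[3].downcase
end

samples = [
  "```ruby\n",                 # plain backtick fence
  "~~~RB\n",                   # tilde fence, upper-case tag
  "  ```yaml\n",               # indented fence
  "```ruby title=\"demo\"\n",  # info string with extra attributes
  "```c++\n",                  # tag containing non-word characters
]

samples.each do |line|
  puts "#{line.chomp.inspect} => #{fence_language(line).inspect}"
end
```

The last two samples return nil: the pattern expects only word characters after the fence, so an info string with attributes or a tag such as `c++` is not treated as an opening fence.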

## Real-World Example: markdown-merge Inner Code Block Merging

When `inner_merge_code_blocks: true` is enabled in markdown-merge:

  1. **Markdown Parser** (commonmarker/markly) parses markdown into AST

    • Creates code_block nodes with `fence_info` and `string_content`

  2. CodeBlockMerger extracts code using native node properties:

    ```ruby
    language = node.fence_info.split(/\s+/).first
    template_code = template_node.string_content
    dest_code = dest_node.string_content
    ```

  3. **Language-Specific Parser** creates FULL AST of the embedded code:

    • `Prism::Merge` → Prism parses Ruby into complete AST (ClassNode, DefNode, etc.)

    • `Psych::Merge` → Psych parses YAML into document structure

    • `Json::Merge` → JSON parser creates object/array tree

    • `Bash::Merge` → tree-sitter creates bash statement AST

  4. **Smart Merger** performs SEMANTIC merging at AST level:

    • Ruby: Merges class definitions, preserves custom methods

    • YAML: Merges keys, preserves custom configuration values

    • JSON: Merges objects, destination values win on conflicts

    • Bash: Merges statements, preserves custom exports

  5. Result is intelligently merged code, not simple text concatenation!

**This means:** The embedded code is FULLY PARSED by its native language parser, enabling true semantic-level merging. FencedCodeBlockDetector would only find the text boundaries - it cannot perform this semantic merging.
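Steps 2 and 3 above can be tried without a Markdown parser. This sketch applies the `split(/\s+/).first` expression to a plain info string and then maps the result to the merger family named in the list. Both helper names are hypothetical, the `to_s` guard is an addition of this sketch, and returning nil for unlisted languages is a choice made here, not documented behavior of markdown-merge.

```ruby
# Extracts the language tag from a fence info string, mirroring the
# `fence_info.split(/\s+/).first` expression quoted above.
def language_from_info(fence_info)
  fence_info.to_s.split(/\s+/).first
end

# Maps a language tag to the merger family listed in step 3.
# Returns nil for languages the list does not mention.
def merger_family_for(language)
  case language
  when "ruby" then "Prism::Merge"
  when "yaml" then "Psych::Merge"
  when "json" then "Json::Merge"
  when "bash" then "Bash::Merge"
  end
end

["ruby", "yaml title=\"config\"", "mermaid", ""].each do |info|
  language = language_from_info(info)
  puts "#{info.inspect}: language=#{language.inspect}, merger=#{merger_family_for(language).inspect}"
end
```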

Examples:

Detecting Ruby code blocks

detector = FencedCodeBlockDetector.new("ruby", aliases: ["rb"])
regions = detector.detect_all(markdown_source)

Using factory methods

detector = FencedCodeBlockDetector.ruby
detector = FencedCodeBlockDetector.yaml
detector = FencedCodeBlockDetector.json

Methods inherited from RegionDetectorBase

#name, #strip_delimiters?

Constructor Details

#initialize(language, aliases: []) ⇒ FencedCodeBlockDetector

Creates a new detector for the specified language.

Parameters:

  • language (String, Symbol)

    The language identifier (e.g., "ruby", "json")

  • aliases (Array<String, Symbol>) (defaults to: [])

    Alternative identifiers (e.g., ["rb"] for ruby)



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 134

def initialize(language, aliases: [])
  super()
  @language = language.to_s.downcase
  @aliases = aliases.map { |a| a.to_s.downcase }
  @all_identifiers = [@language] + @aliases
end
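The constructor normalizes every identifier to a lowercase String, so symbols and mixed-case input behave the same as plain lowercase strings. The standalone sketch below mirrors that normalization outside the class. `normalized_identifiers` is a hypothetical helper written for illustration, not a method of the library.

```ruby
# Mirrors the normalization in #initialize: each identifier is converted
# to a lowercase String, with the primary language placed first.
def normalized_identifiers(language, aliases: [])
  [language.to_s.downcase] + aliases.map { |a| a.to_s.downcase }
end

identifiers = normalized_identifiers(:Ruby, aliases: ["RB", :Rbx])
p identifiers

# A lookup in the style of #matches_language? then reduces to an
# inclusion test on the normalized list.
puts identifiers.include?("RB".downcase)
```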

Instance Attribute Details

#aliases ⇒ Array<String> (readonly)

Returns alternative language identifiers.

Returns:

  • (Array<String>)

    Alternative language identifiers



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 128

def aliases
  @aliases
end

#language ⇒ String (readonly)

Returns the primary language identifier.

Returns:

  • (String)

    The primary language identifier



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 125

def language
  @language
end

Class Method Details

.bash ⇒ FencedCodeBlockDetector

Creates a detector for Bash/Shell code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 283

def bash
  new("bash", aliases: ["sh", "shell", "zsh"])
end

.css ⇒ FencedCodeBlockDetector

Creates a detector for CSS code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 301

def css
  new("css")
end

.html ⇒ FencedCodeBlockDetector

Creates a detector for HTML code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 295

def html
  new("html")
end

.javascript ⇒ FencedCodeBlockDetector

Creates a detector for JavaScript code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 265

def javascript
  new("javascript", aliases: ["js"])
end

.json ⇒ FencedCodeBlockDetector

Creates a detector for JSON code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 241

def json
  new("json")
end

.markdown ⇒ FencedCodeBlockDetector

Creates a detector for Markdown code blocks (nested markdown).



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 307

def markdown
  new("markdown", aliases: ["md"])
end

.mermaid ⇒ FencedCodeBlockDetector

Creates a detector for Mermaid diagram blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 259

def mermaid
  new("mermaid")
end

.python ⇒ FencedCodeBlockDetector

Creates a detector for Python code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 277

def python
  new("python", aliases: ["py"])
end

.ruby ⇒ FencedCodeBlockDetector

Creates a detector for Ruby code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 235

def ruby
  new("ruby", aliases: ["rb"])
end

.sql ⇒ FencedCodeBlockDetector

Creates a detector for SQL code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 289

def sql
  new("sql")
end

.toml ⇒ FencedCodeBlockDetector

Creates a detector for TOML code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 253

def toml
  new("toml")
end

.typescript ⇒ FencedCodeBlockDetector

Creates a detector for TypeScript code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 271

def typescript
  new("typescript", aliases: ["ts"])
end

.yaml ⇒ FencedCodeBlockDetector

Creates a detector for YAML code blocks.



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 247

def yaml
  new("yaml", aliases: ["yml"])
end
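The factory methods above differ only in the language and alias list they pass to `new`. As a quick reference, this sketch collects those lists in one table and resolves an arbitrary identifier to the factory that would accept it. The table is transcribed from the listings on this page; `factory_aliases` and `factory_for` are hypothetical helpers, not part of the library.

```ruby
# Language => aliases, transcribed from the factory method listings.
def factory_aliases
  {
    "ruby" => ["rb"],
    "json" => [],
    "yaml" => ["yml"],
    "toml" => [],
    "mermaid" => [],
    "javascript" => ["js"],
    "typescript" => ["ts"],
    "python" => ["py"],
    "bash" => ["sh", "shell", "zsh"],
    "sql" => [],
    "html" => [],
    "css" => [],
    "markdown" => ["md"],
  }
end

# Returns the factory method name (as a Symbol) whose detector would
# recognize the identifier, or nil when no factory covers it.
def factory_for(identifier)
  wanted = identifier.to_s.downcase
  factory_aliases.each do |language, aliases|
    return language.to_sym if language == wanted || aliases.include?(wanted)
  end
  nil
end

%w[yml ZSH ts go].each do |identifier|
  puts "#{identifier} -> #{factory_for(identifier).inspect}"
end
```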

Instance Method Details

#detect_all(source) ⇒ Array<Region>

Detects all fenced code blocks with the configured language.

Parameters:

  • source (String)

    The full document content

Returns:

  • (Array<Region>)

    All detected code blocks, sorted by start_line



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 158

def detect_all(source)
  return [] if source.nil? || source.empty?

  regions = []
  lines = source.lines
  in_block = false
  start_line = nil
  content_lines = []
  current_language = nil
  fence_char = nil
  fence_length = nil
  indent = ""

  lines.each_with_index do |line, idx|
    line_num = idx + 1

    if !in_block
      # Match opening fence: ```lang or ~~~lang (optionally indented)
      match = line.match(/^(\s*)(`{3,}|~{3,})(\w*)\s*$/)
      if match
        indent = match[1] || ""
        fence = match[2]
        lang = match[3].downcase

        if @all_identifiers.include?(lang)
          in_block = true
          start_line = line_num
          content_lines = []
          current_language = lang
          fence_char = fence[0]
          fence_length = fence.length
        end
      end
    elsif line.match?(/^#{Regexp.escape(indent)}#{Regexp.escape(fence_char)}{#{fence_length},}\s*$/)
      # Match closing fence (must use same char, same indent, and at least same length)
      opening_fence = "#{fence_char * fence_length}#{current_language}"
      closing_fence = fence_char * fence_length

      regions << build_region(
        type: region_type,
        content: content_lines.join,
        start_line: start_line,
        end_line: line_num,
        delimiters: [opening_fence, closing_fence],
        metadata: {language: current_language, indent: indent.empty? ? nil : indent},
      )
      in_block = false
      start_line = nil
      content_lines = []
      current_language = nil
      fence_char = nil
      fence_length = nil
      indent = ""
    else
      # Accumulate content lines (strip the indent if present)
      content_lines << if indent.empty?
        line
      else
        # Strip the common indent from content lines
        line.sub(/^#{Regexp.escape(indent)}/, "")
      end
    end
  end

  # Note: Unclosed blocks are ignored (no region created)
  regions
end
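The closing-fence rule in the listing (same character, same indent, at least the same length) is easiest to see in isolation. The sketch below rebuilds the same pattern inside a standalone helper. `closing_fence?` and its keyword arguments are illustrative names, not part of the class.

```ruby
# Rebuilds the closing-fence check from #detect_all for a given opening
# fence, described by its indent, fence character, and fence length.
def closing_fence?(line, indent:, fence_char:, fence_length:)
  pattern = /^#{Regexp.escape(indent)}#{Regexp.escape(fence_char)}{#{fence_length},}\s*$/
  line.match?(pattern)
end

# An unindented opening fence made of three backticks.
opening = { indent: "", fence_char: "`", fence_length: 3 }

["```\n", "`````\n", "``\n", "~~~\n", "  ```\n", "``` ruby\n"].each do |line|
  puts "#{line.chomp.inspect} closes? #{closing_fence?(line, **opening)}"
end
```

Only the first two lines close the block. One consequence is that a `~~~` line inside a backtick-fenced block, or a fence with a different indent, is accumulated as content instead of ending the region.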

#inspect ⇒ String

Returns a description of this detector.

Returns:

  • (String)

    A description of this detector



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 227

def inspect
  aliases_str = @aliases.empty? ? "" : " aliases=#{@aliases.inspect}"
  "#<#{self.class.name} language=#{@language}#{aliases_str}>"
end

#matches_language?(lang) ⇒ Boolean

Check if a language identifier matches this detector.

Parameters:

  • lang (String)

    The language identifier to check

Returns:

  • (Boolean)

    true if the language matches



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 150

def matches_language?(lang)
  @all_identifiers.include?(lang.to_s.downcase)
end

#region_type ⇒ Symbol

Returns the region type (e.g., :ruby_code_block).

Returns:

  • (Symbol)

    The region type (e.g., :ruby_code_block)



# File 'lib/ast/merge/fenced_code_block_detector.rb', line 142

def region_type
  :"#{@language}_code_block"
end