Class: Markdown::Merge::Cleanse::CodeFenceSpacing

Inherits: Object

Defined in: lib/markdown/merge/cleanse/code_fence_spacing.rb

Overview

Parslet-based parser for fixing malformed fenced code blocks in Markdown.

The Problem

This class fixes **improperly formatted fenced code blocks** where there is unwanted whitespace between the fence markers (``` or ~~~) and the language identifier.

A bug in ast-merge (or its dependencies) caused fenced code blocks to be rendered with a space between the fence markers and the language identifier.

Bug Pattern

CommonMark and most Markdown parsers expect NO space between fence and language:

Correct: ```ruby or ~~~python
Incorrect: ``` ruby or ~~~ python (extra space)

The extra space can cause:

- Syntax highlighting to fail
- The language identifier to be ignored
- Rendering issues in various Markdown processors

Scope

This fixer handles:

- **Any indentation level** (0+ spaces before fence)
  - Top-level: ```ruby
  - In lists: ```python indented by 4 spaces
- **Both fence types:** backticks (```) and tildes (~~~)
- **Any fence length:** 3+ markers (````, ~~~~~, etc.)

How It Works

The parser uses a **PEG grammar** (via Parslet) to:

- Detect fence opening lines with optional indentation
- Identify spacing between fence and language identifier
- Track opening/closing fence pairs to avoid false positives
- Reconstruct fences with proper formatting (no space)

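**Why PEG?** The previous regex-based implementation used patterns like `([ \t]*)`, which can cause polynomial backtracking (a ReDoS vulnerability) on malicious input containing long runs of tabs or spaces. A PEG grammar uses ordered choice and commits to the first alternative that matches, so a simple line-oriented grammar like this one runs in time linear in the input length and avoids that class of ReDoS.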
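To make the single-pass idea concrete, here is a minimal sketch of a hand-written scanner that splits one line into indent, fence, spacing, and info string. It is not the library's Parslet grammar: `scan_fence_line` is a hypothetical name, it uses only core Ruby, and it deliberately ignores details such as tab indentation or CommonMark's limits on indentation depth.

```ruby
# Hypothetical helper, NOT part of Markdown::Merge. Each character is visited
# at most once, so the work is linear in the line length (no backtracking).
def scan_fence_line(line)
  text = line.chomp
  i = 0

  # 1. Optional indentation (spaces only in this sketch)
  i += 1 while text[i] == " "
  indent = text[0...i]

  # 2. Fence run: three or more identical backticks or tildes
  fence_char = text[i]
  return nil unless fence_char == "`" || fence_char == "~"

  fence_start = i
  i += 1 while text[i] == fence_char
  fence = text[fence_start...i]
  return nil if fence.length < 3

  # 3. Spacing between the fence and the info string
  spacing_start = i
  i += 1 while text[i] == " " || text[i] == "\t"
  spacing = text[spacing_start...i]

  # 4. Whatever remains is treated as the info string
  { indent: indent, fence: fence, spacing: spacing, info: text[i..] }
end

parts = scan_fence_line("  ~~~~ python\n")
puts parts[:spacing].length # => 1
```

A non-fence line such as `"plain text"` returns `nil`, which corresponds to the "not a fence line" case a caller would skip.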

Examples:

Malformed (buggy) input

"``` console\nsome code\n```"

Fixed output

"```console\nsome code\n```"

Basic usage

parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
fixed_content = parser.fix

Check if content has malformed fences

parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
parser.malformed? # => true/false

Process a file

content = File.read("README.md")
parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
if parser.malformed?
  File.write("README.md", parser.fix)
end

Get details about code blocks

parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
parser.code_blocks.each do |block|
  puts "#{block[:fence]}#{block[:language]}: malformed=#{block[:malformed]}"
end

Defined Under Namespace

Classes: CodeFenceGrammar

Instance Attribute Summary

#source ⇒ String (readonly)

The input text to parse.

Instance Method Summary

#code_blocks ⇒ Array<Hash>

Parse and return information about all fenced code blocks.

#count ⇒ Integer

Count the total number of code blocks.

#fix ⇒ String

Fix malformed fenced code blocks by removing improper spacing.

#initialize(source) ⇒ CodeFenceSpacing (constructor)

Create a new parser for the given text.

#malformed? ⇒ Boolean

Check if the source contains malformed fenced code blocks.

#malformed_count ⇒ Integer

Count the number of malformed code blocks.

Constructor Details

#initialize(source) ⇒ `CodeFenceSpacing`

Create a new parser for the given text.

Parameters:

source (String): the text that may contain malformed code fences

# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 133

def initialize(source)
  @source = source.to_s
  @grammar = CodeFenceGrammar.new
  @code_blocks = nil
end

Instance Attribute Details

#source ⇒ `String` (readonly)

Returns the input text to parse.

Returns:

(String): the input text to parse



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 128

def source
  @source
end

Instance Method Details

#code_blocks ⇒ `Array<Hash>`

Parse and return information about all fenced code blocks.

Only returns opening fences (not closing fences).

Returns:

(Array<Hash>): Array of code block info
- :indent [String] The indentation before the fence
- :fence [String] The fence markers (e.g., "```" or "~~~")
- :language [String, nil] The language identifier (first word of the info string)
- :info_string [String] The full info string, with surrounding whitespace stripped
- :spacing [String] Any spacing between fence and language
- :malformed [Boolean] Whether this block has improper spacing
- :line_number [Integer] Line number where block starts (1-based)
- :original [String] The original opening fence line

# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 161

def code_blocks
  return @code_blocks if @code_blocks

  @code_blocks = []
  line_number = 0
  in_code_block = false
  current_fence_char = nil

  source.each_line do |line|
    line_number += 1

    # Try to parse as fence line using PEG grammar
    parsed = parse_fence_line(line)
    next unless parsed

    fence = parsed[:fence]
    fence_char = fence[0]
    spacing = parsed[:spacing] || ""
    info = parsed[:info] || ""
    indent = parsed[:indent] || ""

    # Closing fence: matches current fence type and has no info
    if in_code_block && fence_char == current_fence_char && info.empty?
      in_code_block = false
      current_fence_char = nil
      next
    end

    # Opening fence
    in_code_block = true
    current_fence_char = fence_char

    # Extract just the language (first word of info string)
    language = info.strip.split(/\s+/).first
    language = nil if language&.empty?

    @code_blocks << {
      indent: indent,
      fence: fence,
      language: language,
      info_string: info.strip,
      spacing: spacing,
      malformed: !spacing.empty? && !language.nil?,
      line_number: line_number,
      original: line.chomp,
    }
  end

  @code_blocks
end
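The `:malformed` flag above is true only when there is spacing *and* a language follows it, so a bare fence with trailing whitespace is not reported. The sketch below isolates that rule in a standalone predicate; `malformed_fence?` is a hypothetical name used for illustration, not a method of the class.

```ruby
# Hypothetical predicate mirroring the :malformed expression in #code_blocks.
# spacing: whitespace captured right after the fence markers
# info:    the rest of the line (the info string)
def malformed_fence?(spacing, info)
  language = info.strip.split(/\s+/).first # nil when the info string is blank
  !spacing.empty? && !language.nil?
end

malformed_fence?(" ", "ruby") # => true  (space before the language)
malformed_fence?("", "ruby")  # => false (already well-formed)
malformed_fence?("  ", "")    # => false (trailing whitespace, no language)
```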

#count ⇒ `Integer`

Count the total number of code blocks.

Returns:

(Integer): total number of fenced code blocks



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 239

def count
  code_blocks.size
end

#fix ⇒ `String`

Fix malformed fenced code blocks by removing improper spacing.

Returns:

(String): the source with code fences fixed

# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 215

def fix
  return source unless malformed?

  result = source.dup

  # Process line by line, fixing malformed fences
  lines = result.lines
  fixed_lines = lines.map do |line|
    fix_fence_line(line)
  end

  fixed_lines.join
end
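The `fix_fence_line` helper is not shown on this page. As a rough sketch of what the reconstruction step could look like, the hypothetical `rebuild_fence_line` below takes a line plus already-parsed parts (a hash with `:indent`, `:fence`, `:spacing`, and `:info` keys, which is an assumed shape) and re-emits the line without the spacing. It does not track open/close fence state, so it should be read as an illustration of the rebuild only.

```ruby
# Hypothetical sketch, NOT the library's fix_fence_line.
# parts is nil for non-fence lines, otherwise a hash with
# :indent, :fence, :spacing, and :info (assumed shape).
def rebuild_fence_line(line, parts)
  # Leave the line alone unless there is spacing followed by a language.
  return line if parts.nil? || parts[:spacing].empty? || parts[:info].strip.empty?

  ending = line[/\r?\n\z/] || "" # preserve the original line ending, if any
  "#{parts[:indent]}#{parts[:fence]}#{parts[:info]}#{ending}"
end

parts = { indent: "", fence: "```", spacing: " ", info: "console" }
print rebuild_fence_line("``` console\n", parts) # prints the fence with no space
```

Returning the original line object for untouched lines keeps every non-fence line byte-for-byte identical, which matters when the result is written back to a file.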

#malformed? ⇒ `Boolean`

Check if the source contains malformed fenced code blocks.

Detects the pattern where there’s whitespace between the fence markers and the language identifier.

Returns:

(Boolean): true if malformed fences are detected



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 145

def malformed?
  code_blocks.any? { |block| block[:malformed] }
end

#malformed_count ⇒ `Integer`

Count the number of malformed code blocks.

Returns:

(Integer): number of malformed fences found



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 232

def malformed_count
  code_blocks.count { |block| block[:malformed] }
end
