Class: Markdown::Merge::Cleanse::CodeFenceSpacing

Inherits:
Object
Defined in:
lib/markdown/merge/cleanse/code_fence_spacing.rb

Overview

Parslet-based parser for fixing malformed fenced code blocks in Markdown.

The Problem

This class fixes **improperly formatted fenced code blocks** where there is unwanted whitespace between the fence markers (``` or ~~~) and the language identifier.

A bug in ast-merge (or its dependencies) caused fenced code blocks to be rendered with a space between the fence markers and the language identifier.

Bug Pattern

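Markdown is conventionally written with no space between the fence and the language identifier. CommonMark trims leading whitespace from the info string, but not every Markdown processor or tool is as tolerant: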

  • Correct: "```ruby" or "~~~python"

  • Incorrect: "``` ruby" or "~~~ python" (extra space)

The extra space can cause:

  • Syntax highlighting to fail

  • The language identifier to be ignored

  • Rendering issues in various Markdown processors

Scope

This fixer handles:

  • **Any indentation level** (0+ spaces before fence)

    • Top-level: "```ruby"

    • In lists: "```python" indented by 4 spaces

  • **Both fence types:** backticks (```) and tildes (~~~)

  • **Any fence length:** 3 or more markers (````, ~~~~~, etc.)
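The scope above can be illustrated with a small regex-free scanner that walks each line once: it accepts any number of leading spaces, either fence character, and a run of three or more markers, then separates the whitespace before the info string. This is a minimal sketch, not the class's Parslet grammar; `scan_fence_line` is a hypothetical name, and tab indentation is deliberately not handled.

```ruby
# Hypothetical sketch: split a line into fence components in a single
# left-to-right pass (no regular expressions). Returns nil for non-fence lines.
def scan_fence_line(line)
  body = line.chomp
  i = 0
  i += 1 while body[i] == " "                                # leading spaces (any number)

  fence_char = body[i]
  return nil unless fence_char == "`" || fence_char == "~"   # both fence types

  j = i
  j += 1 while body[j] == fence_char                         # fence of any length...
  return nil if j - i < 3                                    # ...but at least three markers

  k = j
  k += 1 while body[k] == " " || body[k] == "\t"             # whitespace before the info string

  { indent: body[0...i], fence: body[i...j], spacing: body[j...k], info: body[k..] }
end

p scan_fence_line("``` ruby\n")        # malformed: spacing is " "
p scan_fence_line("    ~~~~~python")   # indented, five tildes, no spacing
p scan_fence_line("``not a fence")     # only two backticks: nil
```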

How It Works

The parser uses a **PEG grammar** (via Parslet) to:

  • Detect fence opening lines with optional indentation

  • Identify spacing between fence and language identifier

  • Track opening/closing fence pairs to avoid false positives

  • Reconstruct fences with proper formatting (no space)
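The pairing step listed above can be sketched as a small state machine over lines. This is an illustrative sketch, not the class's implementation: `fence_of` and `opening_fences` are hypothetical helpers, and the closing rule here is an assumption (same fence character, at least as many markers as the opener, empty info string), with every other line inside an open block treated as content.

```ruby
# Hypothetical helper: returns [fence, info] for a fence line, or nil.
def fence_of(line)
  s = line.chomp.lstrip
  ch = s[0]
  return nil unless ch == "`" || ch == "~"

  fence = s.each_char.take_while { |c| c == ch }.join
  return nil if fence.length < 3

  [fence, s[fence.length..].strip]
end

# Returns [line_number, fence, info] for each opening fence (1-based line numbers).
# Assumed closing rule: same character, at least as many markers, empty info.
def opening_fences(text)
  openings = []
  open_fence = nil

  text.each_line.with_index(1) do |line, number|
    fence, info = fence_of(line)
    next unless fence

    if open_fence
      closes = fence[0] == open_fence[0] && fence.length >= open_fence.length && info.empty?
      open_fence = nil if closes # any other fence-like line inside the block is content
    else
      open_fence = fence
      openings << [number, fence, info]
    end
  end

  openings
end

doc = "``` ruby\nputs 1\n~~~ not a fence here\n```\n\n~~~~python\nx = 1\n~~~~\n"
p opening_fences(doc)
```

Here the "~~~ not a fence here" line sits inside an open backtick block, so it is neither reported as an opening fence nor allowed to close the block.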

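**Why PEG?** The previous regex-based implementation used patterns like `([ \t]*)`, which can cause polynomial backtracking (a ReDoS vulnerability) when processing malicious input with long runs of tabs or spaces. PEG repetition is greedy and never gives characters back, so a whitespace rule cannot backtrack the way a regex can; parsing stays linear in the input length and is not exposed to this kind of ReDoS.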

Examples:

Malformed (buggy) input

"``` console\nsome code\n```"

Fixed output

"```console\nsome code\n```"

Basic usage

parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
fixed_content = parser.fix

Check if content has malformed fences

parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
parser.malformed? # => true/false

Process a file

content = File.read("README.md")
parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
if parser.malformed?
  File.write("README.md", parser.fix)
end

Get details about code blocks

parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
parser.code_blocks.each do |block|
  puts "#{block[:fence]}#{block[:language]}: malformed=#{block[:malformed]}"
end

Defined Under Namespace

Classes: CodeFenceGrammar


Constructor Details

#initialize(source) ⇒ CodeFenceSpacing

Create a new parser for the given text.

Parameters:

  • source (String)

    the text that may contain malformed code fences



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 133

def initialize(source)
  @source = source.to_s
  @grammar = CodeFenceGrammar.new
  @code_blocks = nil
end

Instance Attribute Details

#source ⇒ String (readonly)

Returns the input text to parse.

Returns:

  • (String)

    the input text to parse



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 128

def source
  @source
end

Instance Method Details

#code_blocks ⇒ Array<Hash>

Parse and return information about all fenced code blocks.

Only returns opening fences (not closing fences).

Returns:

  • (Array<Hash>)

    Array of code block info

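    • :indent [String] The indentation before the fence

    • :fence [String] The fence markers (e.g., "```" or "~~~")

    • :language [String, nil] The language identifier (first word of the info string)

    • :info_string [String] The full info string after the fence, with surrounding whitespace stripped

    • :spacing [String] The whitespace between the fence and the info string

    • :malformed [Boolean] true when there is whitespace between the fence and a language identifier

    • :line_number [Integer] Line number where the block starts (1-based)

    • :original [String] The original opening fence line, without its trailing newline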



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 161

def code_blocks
  return @code_blocks if @code_blocks

  @code_blocks = []
  line_number = 0
  in_code_block = false
  current_fence_char = nil

  source.each_line do |line|
    line_number += 1

    # Try to parse as fence line using PEG grammar
    parsed = parse_fence_line(line)
    next unless parsed

    fence = parsed[:fence]
    fence_char = fence[0]
    spacing = parsed[:spacing] || ""
    info = parsed[:info] || ""
    indent = parsed[:indent] || ""

    # Closing fence: matches current fence type and has no info
    if in_code_block && fence_char == current_fence_char && info.empty?
      in_code_block = false
      current_fence_char = nil
      next
    end

    # Opening fence
    in_code_block = true
    current_fence_char = fence_char

    # Extract just the language (first word of info string)
    language = info.strip.split(/\s+/).first
    language = nil if language&.empty?

    @code_blocks << {
      indent: indent,
      fence: fence,
      language: language,
      info_string: info.strip,
      spacing: spacing,
      malformed: !spacing.empty? && !language.nil?,
      line_number: line_number,
      original: line.chomp,
    }
  end

  @code_blocks
end
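The `language` and `malformed` values above follow two small rules: the language is the first whitespace-separated word of the info string, and a block counts as malformed only when it has both spacing and a language. The sketch below isolates just those rules so the edge cases are easy to see; `classify` is a hypothetical helper that receives spacing and info strings already split off by the grammar.

```ruby
# Hypothetical helper mirroring the language/malformed rules in #code_blocks.
def classify(spacing, info)
  language = info.strip.split(/\s+/).first # first word of the info string, or nil
  language = nil if language&.empty?
  {
    language: language,
    info_string: info.strip,
    malformed: !spacing.empty? && !language.nil?
  }
end

p classify(" ", "ruby")          # space before a language: malformed
p classify("", "ruby linenos")   # correct fence; extra info words are kept
p classify("  ", "")             # bare fence with trailing spaces: not malformed
```

A bare fence followed only by whitespace is therefore never reported as malformed.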

#count ⇒ Integer

Count the total number of code blocks.

Returns:

  • (Integer)

    total number of fenced code blocks



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 239

def count
  code_blocks.size
end

#fix ⇒ String

Fix malformed fenced code blocks by removing improper spacing.

Returns:

  • (String)

    the source with malformed code fences fixed; if no malformed fences are found, the source is returned unchanged



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 215

def fix
  return source unless malformed?

  result = source.dup

  # Process line by line, fixing malformed fences
  lines = result.lines
  fixed_lines = lines.map do |line|
    fix_fence_line(line)
  end

  fixed_lines.join
end
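`fix` delegates each line to a private `fix_fence_line` method whose source is not shown here. Assuming it only removes the whitespace between the fence and the language while keeping the indentation, the rest of the info string, and the line ending, a per-line rewrite might look like the following; `fix_line` is hypothetical and not part of the class.

```ruby
# Hypothetical per-line fixer: "  ``` ruby\n" becomes "  ```ruby\n".
def fix_line(line)
  body = line.chomp
  ending = line[body.length..]                              # "\n", "\r\n", or ""
  indent = body.each_char.take_while { |c| c == " " }.join
  rest = body[indent.length..]
  ch = rest[0]
  return line unless ch == "`" || ch == "~"

  fence = rest.each_char.take_while { |c| c == ch }.join
  return line if fence.length < 3

  after = rest[fence.length..]
  info = after.lstrip                                       # info string without leading whitespace
  # Rewrite only the malformed case: whitespace followed by a language identifier.
  return line if info.length == after.length || info.strip.empty?

  indent + fence + info + ending
end

source = "``` console\nsome code\n```"
puts source.lines.map { |l| fix_line(l) }.join
```

Applied to the malformed example from the overview, this produces the fixed output shown there, and lines that are not malformed fences pass through untouched.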

#malformed? ⇒ Boolean

Check if the source contains malformed fenced code blocks.

Detects the pattern where there’s whitespace between the fence markers and the language identifier.

Returns:

  • (Boolean)

    true if malformed fences are detected



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 145

def malformed?
  code_blocks.any? { |block| block[:malformed] }
end

#malformed_count ⇒ Integer

Count the number of malformed code blocks.

Returns:

  • (Integer)

    number of malformed fences found



# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 232

def malformed_count
  code_blocks.count { |block| block[:malformed] }
end