Class: Markdown::Merge::Cleanse::CodeFenceSpacing
- Inherits: Object
- Defined in: lib/markdown/merge/cleanse/code_fence_spacing.rb
Overview
Parslet-based parser for fixing malformed fenced code blocks in Markdown.
The Problem
This class fixes **improperly formatted fenced code blocks** where there is unwanted whitespace between the fence markers (`` ``` `` or `~~~`) and the language identifier.
A bug in ast-merge (or its dependencies) caused fenced code blocks to be rendered with a space between the fence markers and the language identifier.
Bug Pattern
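The conventional form has no space between the fence and the language. CommonMark trims leading whitespace from the info string, but not every Markdown processor or highlighter does:
- Correct: `` ```ruby `` or `~~~python`
- Incorrect: `` ``` ruby `` or `~~~ python` (extra space)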
The extra space can cause:
- Syntax highlighting to fail
- The language identifier to be ignored
- Rendering issues in various Markdown processors
Scope
This fixer handles:
- **Any indentation level** (0+ spaces before fence)
  - Top-level: `` ```ruby ``
  - In lists: `` ```python `` (indented by 4 spaces)
- **Both fence types:** backticks (`` ``` ``) and tildes (`~~~`)
- **Any fence length:** 3+ markers (`` ```` ``, `~~~~~`, etc.)
How It Works
The parser uses a **PEG grammar** (via Parslet) to:
- Detect fence opening lines with optional indentation
- Identify spacing between fence and language identifier
- Track opening/closing fence pairs to avoid false positives
- Reconstruct fences with proper formatting (no space)
**Why PEG?** The previous regex-based implementation used patterns like `([ \t]*)`, which can cause polynomial backtracking (a ReDoS vulnerability) when processing malicious input with many tabs or spaces. A PEG grammar commits to an ordered choice at each step, and with memoization (packrat parsing) it runs in linear time, which removes this class of ReDoS risk.
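The four steps above can be sketched without Parslet as a hand-written, single-pass scanner. This is only an illustration under stated assumptions, not the library's `CodeFenceGrammar`: the names `split_fence_line` and `fix_fences` are hypothetical, indentation is assumed to be spaces only, and every line inside an open block is left untouched as a simplification. The cursor only moves forward, so each line is handled in time proportional to its length, however many spaces or tabs it contains.

```ruby
# Hypothetical stand-in for the Parslet grammar (names are illustrative).
FENCE_CHARS = ["`", "~"].freeze

# Split a line into indent, fence, spacing, and info.
# Returns nil when the line does not start with a fence of 3+ markers.
def split_fence_line(line)
  text = line.chomp
  pos = 0
  pos += 1 while text[pos] == " "          # step 1: optional indentation
  indent = text[0...pos]

  marker = text[pos]
  return nil unless FENCE_CHARS.include?(marker)

  fence_start = pos
  pos += 1 while text[pos] == marker       # run of identical fence markers
  fence = text[fence_start...pos]
  return nil if fence.length < 3

  space_start = pos
  pos += 1 while text[pos] == " " || text[pos] == "\t"  # step 2: spacing
  spacing = text[space_start...pos]

  { indent: indent, fence: fence, spacing: spacing, info: text[pos..] }
end

# Remove the spacing on opening fences only, tracking open/close pairs.
def fix_fences(source)
  open_marker = nil # fence character of the block we are inside, if any

  source.each_line.map do |line|
    parts = split_fence_line(line)
    next line unless parts

    marker = parts[:fence][0]
    if open_marker
      # Step 3: inside a block, only a bare fence of the same type closes it.
      open_marker = nil if marker == open_marker && parts[:info].empty?
      next line
    end

    open_marker = marker
    next line if parts[:spacing].empty? || parts[:info].empty?

    # Step 4: rebuild the opening fence without the spacing,
    # keeping the original line terminator.
    newline = line[line.chomp.length..]
    "#{parts[:indent]}#{parts[:fence]}#{parts[:info]}#{newline}"
  end.join
end

fix_fences("``` ruby\nputs 1\n```\n") # => "```ruby\nputs 1\n```\n"
```

Tracking the open fence character is what keeps a fence-like line inside another block (for example a backtick fence quoted inside a `~~~` block) from being rewritten.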
Defined Under Namespace
Classes: CodeFenceGrammar
Instance Attribute Summary
- #source ⇒ String (readonly)
  The input text to parse.
Instance Method Summary
- #code_blocks ⇒ Array<Hash>
  Parse and return information about all fenced code blocks.
- #count ⇒ Integer
  Count the total number of code blocks.
- #fix ⇒ String
  Fix malformed fenced code blocks by removing improper spacing.
- #initialize(source) ⇒ CodeFenceSpacing (constructor)
  Create a new parser for the given text.
- #malformed? ⇒ Boolean
  Check if the source contains malformed fenced code blocks.
- #malformed_count ⇒ Integer
  Count the number of malformed code blocks.
Constructor Details
#initialize(source) ⇒ CodeFenceSpacing
Create a new parser for the given text.
```ruby
# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 133

def initialize(source)
  @source = source.to_s
  @grammar = CodeFenceGrammar.new
  @code_blocks = nil
end
```
Instance Attribute Details
#source ⇒ String (readonly)
Returns the input text to parse.
```ruby
# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 128

def source
  @source
end
```
Instance Method Details
#code_blocks ⇒ Array<Hash>
Parse and return information about all fenced code blocks.
Only returns opening fences (not closing fences).
```ruby
# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 161

def code_blocks
  return @code_blocks if @code_blocks

  @code_blocks = []
  line_number = 0
  in_code_block = false
  current_fence_char = nil

  source.each_line do |line|
    line_number += 1

    # Try to parse as fence line using PEG grammar
    parsed = parse_fence_line(line)
    next unless parsed

    fence = parsed[:fence]
    fence_char = fence[0]
    spacing = parsed[:spacing] || ""
    info = parsed[:info] || ""
    indent = parsed[:indent] || ""

    # Closing fence: matches current fence type and has no info
    if in_code_block && fence_char == current_fence_char && info.empty?
      in_code_block = false
      current_fence_char = nil
      next
    end

    # Opening fence
    in_code_block = true
    current_fence_char = fence_char

    # Extract just the language (first word of info string)
    language = info.strip.split(/\s+/).first
    language = nil if language&.empty?

    @code_blocks << {
      indent: indent,
      fence: fence,
      language: language,
      info_string: info.strip,
      spacing: spacing,
      malformed: !spacing.empty? && !language.nil?,
      line_number: line_number,
      original: line.chomp,
    }
  end

  @code_blocks
end
```
#count ⇒ Integer
Count the total number of code blocks.
```ruby
# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 239

def count
  code_blocks.size
end
```
#fix ⇒ String
Fix malformed fenced code blocks by removing improper spacing.
```ruby
# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 215

def fix
  return source unless malformed?

  result = source.dup

  # Process line by line, fixing malformed fences
  lines = result.lines
  fixed_lines = lines.map do |line|
    fix_fence_line(line)
  end

  fixed_lines.join
end
```
#malformed? ⇒ Boolean
Check if the source contains malformed fenced code blocks.
Detects the pattern where there’s whitespace between the fence markers and the language identifier.
```ruby
# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 145

def malformed?
  code_blocks.any? { |block| block[:malformed] }
end
```
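To show which fences count as malformed, the sketch below isolates the `malformed` expression that `#code_blocks` stores for each opening fence. The helper name `malformed_fence?` is hypothetical and not part of the library. It takes the `spacing` and `info` parts as plain strings, which is an assumption about what the private `parse_fence_line` provides.

```ruby
# Illustrative helper (not part of the library): mirrors the `malformed`
# expression from #code_blocks for a single opening fence.
def malformed_fence?(spacing, info)
  language = info.strip.split(/\s+/).first # first word of the info string
  !spacing.empty? && !language.nil?
end

malformed_fence?(" ", "ruby")       # => true  (space before a language)
malformed_fence?("", "ruby")        # => false (already well-formed)
malformed_fence?("  ", "")          # => false (whitespace but no language)
malformed_fence?("\t", "ruby {1}")  # => true  (extra info after the language)
```

A fence followed only by whitespace is therefore not reported: both conditions, non-empty spacing and a language, must hold.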
#malformed_count ⇒ Integer
Count the number of malformed code blocks.
```ruby
# File 'lib/markdown/merge/cleanse/code_fence_spacing.rb', line 232

def malformed_count
  code_blocks.count { |block| block[:malformed] }
end
```