Class: Markdown::Merge::LinkParser

Inherits:
Object
Defined in:
lib/markdown/merge/link_parser.rb

Overview

Parslet-based parser for markdown link structures.

This parser extracts:

  • Link reference definitions: `[label]: url` or `[label]: url "title"`

  • Inline links: `[text](url)` or `[text](url "title")`

  • Inline images: `![alt](url)` or `![alt](url "title")`

  • Linked images: `[![alt](img-url)](link-url)` (nested structures)

Handles complex cases like:

  • Emoji in labels (e.g., `[🖼️galtzo-discord]`)

  • Nested brackets (for linked images like `[![alt]](url)`)

  • Multi-byte UTF-8 characters

Examples:

Parse link definitions

parser = LinkParser.new
defs = parser.parse_definitions("[example]: https://example.com\n[🎨logo]: https://logo.png")
# => [{ label: "example", url: "https://example.com" }, { label: "🎨logo", url: "https://logo.png" }]

Find inline links with nested structure detection

parser = LinkParser.new
items = parser.find_all_link_constructs("Click [![Logo](img.png)](link.com) here")
# Returns a tree structure with :children for nested items

Defined Under Namespace

Classes: DefinitionGrammar, InlineImageGrammar, InlineLinkGrammar

Constructor Details

#initialize ⇒ LinkParser

Returns a new instance of LinkParser.



# File 'lib/markdown/merge/link_parser.rb', line 143

def initialize
  @definition_grammar = DefinitionGrammar.new
  @link_grammar = InlineLinkGrammar.new
  @image_grammar = InlineImageGrammar.new
end

Instance Method Details

#build_link_tree(links, images) ⇒ Array<Hash>

Build a tree structure from links and images, detecting nesting.

Parameters:

  • links (Array<Hash>)

    Links with :start_pos and :end_pos

  • images (Array<Hash>)

    Images with :start_pos and :end_pos

Returns:

  • (Array<Hash>)

    Links/images with :children for nested items



# File 'lib/markdown/merge/link_parser.rb', line 239

def build_link_tree(links, images)
  # Combine all items
  all_items = links.map { |l| l.merge(type: :link) } +
    images.map { |i| i.merge(type: :image) }

  # Sort by start position
  sorted = all_items.sort_by { |item| item[:start_pos] }

  result = []
  skip_until = -1

  sorted.each do |item|
    # Skip items that are children of a previous item
    next if item[:start_pos] < skip_until

    # Find any items nested inside this one
    children = sorted.select do |other|
      other[:start_pos] > item[:start_pos] &&
        other[:end_pos] <= item[:end_pos] &&
        other != item
    end

    if children.any?
      item = item.merge(children: children)
      # Mark children to be skipped
      skip_until = item[:end_pos]
    end

    result << item
  end

  result
end
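
To make the containment test concrete, here is a minimal sketch that applies the same start/end comparison to hand-written position hashes for the string used in the Examples section. The positions, and the assumption that :end_pos is exclusive, are illustrative only; in practice the values come from find_inline_links and find_inline_images.

```ruby
content = "Click [![Logo](img.png)](link.com) here"

# Hypothetical position hashes (end_pos treated as exclusive; that is an assumption).
link  = { text: "![Logo](img.png)", url: "link.com", start_pos: 6, end_pos: 34 }
image = { alt: "Logo", url: "img.png", start_pos: 7, end_pos: 23 }

# The same comparison build_link_tree applies to decide whether `inner` is a child of `outer`.
nested_in = lambda do |outer, inner|
  inner[:start_pos] > outer[:start_pos] && inner[:end_pos] <= outer[:end_pos]
end

puts content[link[:start_pos]...link[:end_pos]]   # the whole linked image
puts content[image[:start_pos]...image[:end_pos]] # the image inside it
puts nested_in.call(link, image)                  # true: the image sits inside the link
puts nested_in.call(image, link)                  # false: the link is not inside the image
```

Because the image starts after the link and ends before it, the image would become the link's only entry under :children.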

#build_url_to_label_map(definitions) ⇒ Hash<String, String>

Build URL to label mapping from definitions.

Parameters:

  • definitions (Array<Hash>)

    From parse_definitions

Returns:

  • (Hash<String, String>)

    URL => best label



# File 'lib/markdown/merge/link_parser.rb', line 205

def build_url_to_label_map(definitions)
  url_to_labels = Hash.new { |h, k| h[k] = [] }

  definitions.each do |defn|
    url_to_labels[defn[:url]] << defn[:label]
  end

  url_to_labels.transform_values do |labels|
    labels.min_by { |l| [l.length, l] }
  end
end
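
The "best label" rule is easier to see with data. The sketch below repeats the selection step on a hand-written definitions array (the labels and URLs are made up for illustration): the shortest label wins, and ties are broken alphabetically because min_by compares the [length, label] pairs element by element.

```ruby
# Hypothetical definitions, shaped like the output of parse_definitions.
definitions = [
  { label: "example-site", url: "https://example.com" },
  { label: "ex",           url: "https://example.com" },
  { label: "eg",           url: "https://example.com" },
  { label: "logo",         url: "https://logo.png" },
]

# Group labels by URL, then keep the shortest label (alphabetical on ties).
labels_by_url = definitions
  .group_by { |defn| defn[:url] }
  .transform_values { |defs| defs.map { |defn| defn[:label] } }

best = labels_by_url.transform_values do |labels|
  labels.min_by { |label| [label.length, label] }
end

puts best["https://example.com"] # "eg": same length as "ex", but sorts first
puts best["https://logo.png"]    # "logo"
```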

#find_all_link_constructs(content) ⇒ Array<Hash>

Find all link constructs (links and images) with proper nesting structure.

This method returns a list of top-level items. A linked image is represented as a single link item whose :children array contains the nested image, which allows replacements to proceed from leaves to root.

Parameters:

  • content (String)

    Markdown content

Returns:

  • (Array<Hash>)

    Array of link/image constructs with :children for nested items



# File 'lib/markdown/merge/link_parser.rb', line 225

def find_all_link_constructs(content)
  # Find all images and links
  images = find_inline_images(content)
  links = find_inline_links(content)

  # Build a tree structure where images inside links are children
  build_link_tree(links, images)
end

#find_inline_images(content) ⇒ Array<Hash>

Find all inline images in content with positions.

Parameters:

  • content (String)

    Markdown content

Returns:

  • (Array<Hash>)

    Array of { alt:, url:, title:, start_pos:, end_pos: }



# File 'lib/markdown/merge/link_parser.rb', line 197

def find_inline_images(content)
  find_constructs(content, :image)
end

#find_inline_links(content) ⇒ Array<Hash>

Find all inline links in content with positions.

Parameters:

  • content (String)

    Markdown content

Returns:

  • (Array<Hash>)

    Array of { text:, url:, title:, start_pos:, end_pos: }



# File 'lib/markdown/merge/link_parser.rb', line 189

def find_inline_links(content)
  find_constructs(content, :link)
end

#flatten_leaf_first(items) ⇒ Array<Hash>

Flatten a tree of link constructs to leaf-first order for processing.

This is useful for replacement operations where we want to process innermost items first (depth-first, post-order traversal).

Parameters:

  • items (Array<Hash>)

    Items from find_all_link_constructs

Returns:

  • (Array<Hash>)

    Items in leaf-first order (children before parents)



# File 'lib/markdown/merge/link_parser.rb', line 280

def flatten_leaf_first(items)
  result = []

  items.each do |item|
    if item[:children]
      # First add children (recursively), then the parent
      result.concat(flatten_leaf_first(item[:children]))
    end
    # Add the item without children key for cleaner processing
    result << item.except(:children)
  end

  result
end
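
As an illustration of the resulting order, the following standalone sketch restates the traversal in compact form and runs it on a hand-built tree. The method name leaf_first and the sample hashes are hypothetical; the point is only that children come out before their parent and that the :children key is dropped.

```ruby
# Standalone restatement of the post-order traversal (not the library method itself).
def leaf_first(items)
  items.flat_map do |item|
    children = item.fetch(:children, [])
    # Children first (recursively), then the parent without its :children key.
    leaf_first(children) + [item.reject { |key, _| key == :children }]
  end
end

# Hypothetical tree, shaped like the output of find_all_link_constructs.
tree = [
  {
    type: :link, url: "link.com", start_pos: 6, end_pos: 34,
    children: [{ type: :image, url: "img.png", start_pos: 7, end_pos: 23 }],
  },
  { type: :link, url: "other.com", start_pos: 40, end_pos: 60 },
]

order = leaf_first(tree).map { |item| [item[:type], item[:url]] }
order.each { |type, url| puts "#{type} #{url}" }
```

Here the image for img.png is listed before the link to link.com that wraps it, followed by the unrelated link to other.com.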

#parse_definition_line(line) ⇒ Hash?

Parse a single line as a link reference definition.

Parameters:

  • line (String)

    A single line

Returns:

  • (Hash, nil)

    { label:, url:, title: } or nil



# File 'lib/markdown/merge/link_parser.rb', line 168

def parse_definition_line(line)
  result = @definition_grammar.parse(line)

  url = result[:url].to_s
  # Strip angle brackets if present
  url = url[1..-2] if url.start_with?("<") && url.end_with?(">")

  definition = {
    label: result[:label].to_s,
    url: url,
  }
  definition[:title] = result[:title].to_s if result[:title]
  definition
rescue Parslet::ParseFailed
  nil
end
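
The angle-bracket handling can be tried in isolation. This sketch copies only the stripping step from the method above; strip_angle_brackets is a hypothetical helper name and is not part of LinkParser.

```ruby
# Remove one pair of enclosing angle brackets, if both are present.
def strip_angle_brackets(url)
  return url unless url.start_with?("<") && url.end_with?(">")

  url[1..-2]
end

puts strip_angle_brackets("<https://example.com/a b>") # brackets removed
puts strip_angle_brackets("https://example.com")       # returned unchanged
puts strip_angle_brackets("<https://example.com")      # unbalanced, returned unchanged
```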

#parse_definitions(content) ⇒ Array<Hash>

Parse link reference definitions from content.

Parameters:

  • content (String)

    Markdown content

Returns:

  • (Array<Hash>)

    Array of { label:, url:, title: (optional) }



# File 'lib/markdown/merge/link_parser.rb', line 153

def parse_definitions(content)
  definitions = []

  content.each_line do |line|
    result = parse_definition_line(line.chomp)
    definitions << result if result
  end

  definitions
end
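
The line-by-line flow (chomp each line, try to parse it, keep only the successes) can be sketched without Parslet. The example below swaps the DefinitionGrammar for a deliberately simplified regular expression, so it is not the library's grammar and will not accept everything the real parser does; SIMPLE_DEFINITION and simple_parse_definitions are hypothetical names.

```ruby
# Simplified stand-in for the Parslet grammar: [label]: url "optional title"
SIMPLE_DEFINITION = /\A\[([^\]]+)\]:\s+(\S+)(?:\s+"([^"]*)")?\s*\z/

def simple_parse_definitions(content)
  definitions = []

  content.each_line do |line|
    match = SIMPLE_DEFINITION.match(line.chomp)
    next unless match # lines that are not definitions are skipped

    definition = { label: match[1], url: match[2] }
    definition[:title] = match[3] if match[3] # :title only when present
    definitions << definition
  end

  definitions
end

content = "[example]: https://example.com\nSome prose\n[🎨logo]: https://logo.png \"Logo\"\n"
simple_parse_definitions(content).each { |defn| puts defn.inspect }
```

Only the two definition lines produce hashes; the prose line is skipped, and :title appears only on the entry that has one.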