Class: Markdown::Merge::TableMatchAlgorithm

Inherits:
Object
Defined in:
lib/markdown/merge/table_match_algorithm.rb

Overview

Algorithm for computing match scores between two Markdown tables.

This algorithm uses multiple factors to determine how well two tables match:

  • (A) Percentage of matching header cells (using Levenshtein similarity)

  • (B) Percentage of matching cells in the first column (using Levenshtein similarity)

  • (C) Average percentage of matching cells in rows with matching first column

  • (D) Percentage of matching total cells

  • (E) Position distance weight (closer tables score higher)

Cell comparisons use Levenshtein distance to compute similarity, allowing partial matches (e.g., “Value” vs “Values” would get a high similarity score).

The final score is the weighted average of these factors.
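The similarity measure itself is not shown on this page. As a minimal sketch, assuming similarity is one minus the Levenshtein distance divided by the longer string's length, a cell comparison could look like the following. The method names `levenshtein_distance` and `cell_similarity` are hypothetical and are not part of the documented API.

```ruby
# Classic dynamic-programming Levenshtein distance (two-row variant).
def levenshtein_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost, # substitution
      ].min
    end
    previous = current
  end
  previous.last
end

# Assumed normalization: 1.0 for identical strings, 0.0 for nothing in common.
def cell_similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein_distance(a, b).to_f / longest
end

puts cell_similarity("Value", "Values").round(3) # one edit over six characters
puts cell_similarity("Name", "Name")
```

Under this assumed normalization, “Value” vs “Values” differs by a single insertion, so the pair scores about 0.83.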

Examples:

Basic usage

algorithm = TableMatchAlgorithm.new
score = algorithm.call(table_a, table_b)

With position information

algorithm = TableMatchAlgorithm.new(
  position_a: 0,  # First table in template
  position_b: 2,  # Third table in destination
  total_tables_a: 3,
  total_tables_b: 3
)
score = algorithm.call(table_a, table_b)
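This page does not show how the position factor (E) is derived from these arguments. One plausible reading, offered only as an illustration, compares each table's relative position within its document. The `position_score` method and its neutral fallback for missing positions are assumptions, not the library's implementation.

```ruby
# Hypothetical position score: compare relative positions within each document.
def position_score(position_a, position_b, total_a, total_b)
  # Assumption: with no position information, treat the factor as neutral.
  return 1.0 if position_a.nil? || position_b.nil?

  # Guard against a zero total, as the constructor does with [total, 1].max.
  relative_a = position_a.to_f / [total_a, 1].max
  relative_b = position_b.to_f / [total_b, 1].max
  1.0 - (relative_a - relative_b).abs
end

puts position_score(0, 0, 3, 3)          # same slot in both documents
puts position_score(0, 2, 3, 3).round(3) # first table vs third table
```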

Constant Summary

DEFAULT_WEIGHTS =

Default weights for each factor in the algorithm

{
  header_match: 0.25,      # (A) Header row matching
  first_column: 0.20,      # (B) First column matching
  row_content: 0.25,       # (C) Content in matching rows
  total_cells: 0.15,       # (D) Overall cell matching
  position: 0.15,          # (E) Position distance
}.freeze
FIRST_COLUMN_SIMILARITY_THRESHOLD =

Minimum similarity threshold to consider cells as potentially matching for first column lookup (used in row content matching)

0.7
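To illustrate how a threshold like this can gate first-column lookup, here is a rough sketch. The `best_row_for` helper is hypothetical, and the prefix-based similarity is a simple stand-in for the Levenshtein similarity the class describes.

```ruby
# Find the row whose first cell best matches `key`, or nil when no candidate
# reaches the threshold. `similarity` is any callable returning 0.0..1.0.
def best_row_for(key, rows, similarity, threshold: 0.7)
  best = rows.max_by { |row| similarity.call(key, row.first) }
  return nil if best.nil?

  similarity.call(key, best.first) >= threshold ? best : nil
end

# Stand-in similarity: length of the shared prefix over the longer length.
prefix_ratio = lambda do |a, b|
  longest = [a.length, b.length].max
  next 1.0 if longest.zero?

  shared = a.chars.zip(b.chars).take_while { |x, y| x == y }.length
  shared.to_f / longest
end

rows = [["Name", "x"], ["Values", "y"]]
puts best_row_for("Value", rows, prefix_ratio).inspect # close enough to "Values"
puts best_row_for("Total", rows, prefix_ratio).inspect # nothing above the threshold
```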

Constructor Details

#initialize(position_a: nil, position_b: nil, total_tables_a: 1, total_tables_b: 1, weights: {}, backend: :commonmarker) ⇒ TableMatchAlgorithm

Initialize the table match algorithm.

Parameters:

  • position_a (Integer, nil) (defaults to: nil)

    Position of the first table in its document (0-indexed)

  • position_b (Integer, nil) (defaults to: nil)

    Position of the second table in its document (0-indexed)

  • total_tables_a (Integer) (defaults to: 1)

    Total number of tables in the first document; values below 1 are raised to 1

  • total_tables_b (Integer) (defaults to: 1)

    Total number of tables in the second document; values below 1 are raised to 1

  • weights (Hash) (defaults to: {})

    Custom weights for scoring factors, merged over DEFAULT_WEIGHTS so that omitted factors keep their defaults

  • backend (Symbol) (defaults to: :commonmarker)

    Markdown backend for type normalization



# File 'lib/markdown/merge/table_match_algorithm.rb', line 71

def initialize(position_a: nil, position_b: nil, total_tables_a: 1, total_tables_b: 1, weights: {}, backend: :commonmarker)
  @position_a = position_a
  @position_b = position_b
  @total_tables_a = [total_tables_a, 1].max
  @total_tables_b = [total_tables_b, 1].max
  @weights = DEFAULT_WEIGHTS.merge(weights)
  @backend = backend
end
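The constructor relies on two standard Ruby behaviors: `Hash#merge` returns a new hash in which the supplied keys override the defaults, and `[value, 1].max` raises any total below 1 up to 1. A short illustration, with the default weights copied into a local variable for the example:

```ruby
# Copy of the documented defaults, held in a local for illustration.
default_weights = {
  header_match: 0.25,
  first_column: 0.20,
  row_content: 0.25,
  total_cells: 0.15,
  position: 0.15,
}.freeze

# Hash#merge returns a new hash; the frozen defaults are left untouched.
weights = default_weights.merge(position: 0.0, header_match: 0.40)

puts weights[:header_match]      # overridden by the caller
puts weights[:first_column]      # falls back to the default
puts default_weights[:position]  # original value is unchanged
puts [0, 1].max                  # a total of 0 is clamped to 1
```

Passing only the keys to change, for example `weights: { position: 0.0 }`, therefore leaves the other four factors at their defaults.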

Instance Attribute Details

#backend ⇒ Symbol (readonly)

Returns the Markdown backend being used.

Returns:

  • (Symbol)

    The markdown backend being used



# File 'lib/markdown/merge/table_match_algorithm.rb', line 61

def backend
  @backend
end

#position_a ⇒ Integer? (readonly)

Returns the position of table A in its document (0-indexed).

Returns:

  • (Integer, nil)

    Position of table A in its document (0-indexed)



# File 'lib/markdown/merge/table_match_algorithm.rb', line 46

def position_a
  @position_a
end

#position_b ⇒ Integer? (readonly)

Returns the position of table B in its document (0-indexed).

Returns:

  • (Integer, nil)

    Position of table B in its document (0-indexed)



# File 'lib/markdown/merge/table_match_algorithm.rb', line 49

def position_b
  @position_b
end

#total_tables_a ⇒ Integer (readonly)

Returns the total number of tables in document A.

Returns:

  • (Integer)

    Total number of tables in document A



# File 'lib/markdown/merge/table_match_algorithm.rb', line 52

def total_tables_a
  @total_tables_a
end

#total_tables_b ⇒ Integer (readonly)

Returns the total number of tables in document B.

Returns:

  • (Integer)

    Total number of tables in document B



# File 'lib/markdown/merge/table_match_algorithm.rb', line 55

def total_tables_b
  @total_tables_b
end

#weights ⇒ Hash (readonly)

Returns the weights for each scoring factor.

Returns:

  • (Hash)

    Weights for each scoring factor



# File 'lib/markdown/merge/table_match_algorithm.rb', line 58

def weights
  @weights
end

Instance Method Details

#call(table_a, table_b) ⇒ Float

Compute the match score between two tables.

Parameters:

  • table_a (Object)

    First table node

  • table_b (Object)

    Second table node

Returns:

  • (Float)

    Score between 0.0 and 1.0



# File 'lib/markdown/merge/table_match_algorithm.rb', line 85

def call(table_a, table_b)
  rows_a = extract_rows(table_a)
  rows_b = extract_rows(table_b)

  return 0.0 if rows_a.empty? || rows_b.empty?

  scores = {
    header_match: compute_header_match(rows_a, rows_b),
    first_column: compute_first_column_match(rows_a, rows_b),
    row_content: compute_row_content_match(rows_a, rows_b),
    total_cells: compute_total_cells_match(rows_a, rows_b),
    position: compute_position_score,
  }

  weighted_average(scores)
end
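The private `weighted_average` helper is not shown on this page. A minimal sketch of a weighted average over the factor scores, assuming the result is normalized by the sum of the weights in use, might look like this. The two-argument signature here is an illustration only; the real method takes just `scores` and presumably reads the instance's weights.

```ruby
# Hypothetical stand-in for the private helper: a weighted mean of factor scores.
def weighted_average(scores, weights)
  total_weight = scores.keys.sum { |factor| weights.fetch(factor, 0.0) }
  return 0.0 if total_weight.zero?

  weighted_sum = scores.sum { |factor, score| score * weights.fetch(factor, 0.0) }
  weighted_sum / total_weight
end

weights = {
  header_match: 0.25, first_column: 0.20, row_content: 0.25,
  total_cells: 0.15, position: 0.15,
}

# Headers match perfectly; every other factor scores zero.
scores = {
  header_match: 1.0, first_column: 0.0, row_content: 0.0,
  total_cells: 0.0, position: 0.0,
}

puts weighted_average(scores, weights).round(2)
```

Dividing by the total weight keeps the result between 0.0 and 1.0 even when custom weights do not sum to 1, which is consistent with the documented return range.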