Class: Markdown::Merge::TableMatchAlgorithm
- Inherits: Object
- Defined in: lib/markdown/merge/table_match_algorithm.rb
Overview
Algorithm for computing match scores between two Markdown tables.
This algorithm uses multiple factors to determine how well two tables match:
- (A) Percentage of matching header cells (using Levenshtein similarity)
- (B) Percentage of matching cells in the first column (using Levenshtein similarity)
- (C) Average percentage of matching cells in rows with matching first column
- (D) Percentage of matching total cells
- (E) Position distance weight (closer tables score higher)
Cell comparisons use Levenshtein distance to compute similarity, allowing partial matches (e.g., “Value” vs “Values” would get a high similarity score).
The final score is the weighted average of these factors.
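As a rough illustration of the cell comparison described above, the following sketch computes a Levenshtein-based similarity between two cell strings. The helper names `levenshtein_distance` and `cell_similarity` are hypothetical, and dividing the distance by the longer string's length is an assumed normalization; this page does not show how the library itself normalizes.

```ruby
# Hypothetical helpers for illustration; not part of the documented API.

# Classic dynamic-programming Levenshtein distance between two strings.
def levenshtein_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # previous[j] is the distance between the processed prefix of a
  # and the first j characters of b.
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost, # substitution
      ].min
    end
    previous = current
  end
  previous.last
end

# Assumed normalization: 1.0 for identical strings,
# 0.0 when every character has to be replaced.
def cell_similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein_distance(a, b).to_f / longest
end
```

Under this normalization, “Value” and “Values” differ by one edit over six characters, so the similarity is 5/6, or roughly 0.83.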
Constant Summary
- DEFAULT_WEIGHTS =
Default weights for each factor in the algorithm
    {
      header_match: 0.25, # (A) Header row matching
      first_column: 0.20, # (B) First column matching
      row_content: 0.25,  # (C) Content in matching rows
      total_cells: 0.15,  # (D) Overall cell matching
      position: 0.15,     # (E) Position distance
    }.freeze
- FIRST_COLUMN_SIMILARITY_THRESHOLD =
Minimum similarity threshold to consider cells as potentially matching for first column lookup (used in row content matching)
0.7
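One way to read the threshold above is as a cut-off when looking up which row of table B corresponds to a row of table A by its first-column cell. The sketch below takes precomputed similarity scores and picks the best candidate at or above the threshold. The helper `best_matching_row` is hypothetical, and treating the threshold as inclusive is an assumption; the library's own lookup is not shown on this page.

```ruby
FIRST_COLUMN_SIMILARITY_THRESHOLD = 0.7

# Hypothetical helper: `similarities` holds the similarity between one
# first-column cell of table A and each first-column cell of table B.
# Returns the index of the best candidate row in B, or nil when no
# candidate reaches the threshold (assumed inclusive).
def best_matching_row(similarities, threshold: FIRST_COLUMN_SIMILARITY_THRESHOLD)
  candidates = similarities.each_with_index.select { |score, _index| score >= threshold }
  best = candidates.max_by { |score, _index| score }
  best&.last
end
```

With scores such as `[0.2, 0.83, 0.75]`, two rows clear the threshold and the row at index 1 is chosen; with `[0.1, 0.69]`, no row qualifies and the result is `nil`.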
Instance Attribute Summary
- #backend ⇒ Symbol (readonly): The markdown backend being used.
- #position_a ⇒ Integer? (readonly): Position of table A in its document (0-indexed).
- #position_b ⇒ Integer? (readonly): Position of table B in its document (0-indexed).
- #total_tables_a ⇒ Integer (readonly): Total number of tables in document A.
- #total_tables_b ⇒ Integer (readonly): Total number of tables in document B.
- #weights ⇒ Hash (readonly): Weights for each scoring factor.
Instance Method Summary
- #call(table_a, table_b) ⇒ Float: Compute the match score between two tables.
- #initialize(position_a: nil, position_b: nil, total_tables_a: 1, total_tables_b: 1, weights: {}, backend: :commonmarker) ⇒ TableMatchAlgorithm (constructor): Initialize the table match algorithm.
Constructor Details
#initialize(position_a: nil, position_b: nil, total_tables_a: 1, total_tables_b: 1, weights: {}, backend: :commonmarker) ⇒ TableMatchAlgorithm
Initialize the table match algorithm.
    # File 'lib/markdown/merge/table_match_algorithm.rb', line 71

    def initialize(position_a: nil, position_b: nil, total_tables_a: 1, total_tables_b: 1, weights: {}, backend: :commonmarker)
      @position_a = position_a
      @position_b = position_b
      @total_tables_a = [total_tables_a, 1].max
      @total_tables_b = [total_tables_b, 1].max
      @weights = DEFAULT_WEIGHTS.merge(weights)
      @backend = backend
    end
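Two details of the constructor are easy to miss: custom weights are merged over the defaults rather than replacing them, and the table totals are clamped to at least 1. The standalone sketch below reproduces just those two expressions outside the class; the local names `custom` and `clamped` are illustrative only.

```ruby
DEFAULT_WEIGHTS = {
  header_match: 0.25,
  first_column: 0.20,
  row_content: 0.25,
  total_cells: 0.15,
  position: 0.15,
}.freeze

# Same expression as in the constructor: caller-supplied keys win,
# every other factor keeps its default weight.
custom = DEFAULT_WEIGHTS.merge(position: 0.0)

# Same clamp the constructor applies to total_tables_a and total_tables_b:
# a total of 0 (or a negative value) becomes 1.
clamped = [0, 1].max
```

Overriding a single weight leaves the merged weights summing to something other than 1.0 (0.85 here). Whether that affects the final score depends on how the private `weighted_average` helper normalizes, which this page does not show.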
Instance Attribute Details
#backend ⇒ Symbol (readonly)
Returns the markdown backend being used.
    # File 'lib/markdown/merge/table_match_algorithm.rb', line 61

    def backend
      @backend
    end
#position_a ⇒ Integer? (readonly)
Returns the position of table A in its document (0-indexed).
    # File 'lib/markdown/merge/table_match_algorithm.rb', line 46

    def position_a
      @position_a
    end
#position_b ⇒ Integer? (readonly)
Returns the position of table B in its document (0-indexed).
    # File 'lib/markdown/merge/table_match_algorithm.rb', line 49

    def position_b
      @position_b
    end
#total_tables_a ⇒ Integer (readonly)
Returns the total number of tables in document A.
    # File 'lib/markdown/merge/table_match_algorithm.rb', line 52

    def total_tables_a
      @total_tables_a
    end
#total_tables_b ⇒ Integer (readonly)
Returns the total number of tables in document B.
    # File 'lib/markdown/merge/table_match_algorithm.rb', line 55

    def total_tables_b
      @total_tables_b
    end
#weights ⇒ Hash (readonly)
Returns the weights for each scoring factor.
    # File 'lib/markdown/merge/table_match_algorithm.rb', line 58

    def weights
      @weights
    end
Instance Method Details
#call(table_a, table_b) ⇒ Float
Compute the match score between two tables.
    # File 'lib/markdown/merge/table_match_algorithm.rb', line 85

    def call(table_a, table_b)
      rows_a = extract_rows(table_a)
      rows_b = extract_rows(table_b)

      return 0.0 if rows_a.empty? || rows_b.empty?

      scores = {
        header_match: compute_header_match(rows_a, rows_b),
        first_column: compute_first_column_match(rows_a, rows_b),
        row_content: compute_row_content_match(rows_a, rows_b),
        total_cells: compute_total_cells_match(rows_a, rows_b),
        position: compute_position_score,
      }

      weighted_average(scores)
    end
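The body of the private `weighted_average` helper is not shown on this page. As a minimal sketch of what combining the five factor scores could look like, the stand-in below divides the weighted sum by the total weight. Both that normalization and the explicit `weights` argument are assumptions made for illustration, and the sample scores are invented.

```ruby
# Hypothetical stand-in for the private weighted_average helper.
# Assumes scores and weights are hashes keyed by the same factor names.
def weighted_average(scores, weights)
  total_weight = weights.values_at(*scores.keys).sum
  return 0.0 if total_weight.zero?

  scores.sum { |factor, score| score * weights.fetch(factor) } / total_weight
end

weights = {
  header_match: 0.25,
  first_column: 0.20,
  row_content: 0.25,
  total_cells: 0.15,
  position: 0.15,
}

# Invented per-factor scores, each in the range 0.0..1.0.
scores = {
  header_match: 1.0,
  first_column: 0.8,
  row_content: 0.6,
  total_cells: 0.5,
  position: 1.0,
}

final = weighted_average(scores, weights)
```

With these sample inputs the weighted sum is 0.25 + 0.16 + 0.15 + 0.075 + 0.15 = 0.785, and since the default weights sum to 1.0 the final score is approximately 0.785.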