Class: LexM::LemmaList
- Inherits: Object
- Defined in: lib/lexm/lemma_list.rb
Overview
Represents a collection of lemmas
Instance Attribute Summary
- #lemmas ⇒ Object (readonly)
  The array of lemmas.
Instance Method Summary
- #[](index) ⇒ Lemma
  Get lemma by index.
- #addLemma(lemma, merge = true) ⇒ LemmaList
  Adds a lemma to the list. If a lemma with the same headword already exists and merge is true, the annotations and sublemmas of the new lemma are merged into the existing one.
- #addLemmas(lemmas, merge = true) ⇒ LemmaList
  Add multiple lemmas at once.
- #allWords ⇒ Array<String>
  Get an array of all words (both lemmas and sublemmas).
- #clear ⇒ LemmaList
  Clear all lemmas.
- #detectCycles(graph, start, visited = [], path = [], location_map = {}) ⇒ Boolean
  Helper method for validateCircularDependencies. Recursively traverses the dependency graph with a depth-first search (DFS) to find cycles.
- #each {|Lemma| ... } ⇒ Object
  Iterate through all lemmas.
- #eachWord {|String| ... } ⇒ Object
  Iterate through all words (both lemmas and sublemmas).
- #findByAnnotation(type, value = nil) ⇒ Array<Lemma>
  Find lemmas by annotation.
- #findByText(text) ⇒ Array<Lemma>
  Find lemmas by headword text.
- #findRedirectionsTo(target, type = nil) ⇒ Array<Lemma>
  Find lemmas that redirect to a given target, optionally filtered by type.
- #initialize(input = nil) ⇒ LemmaList (constructor)
  Initialize a new lemma list, optionally from a string or file.
- #normalLemmas ⇒ Array<Lemma>
  Find normal lemmas (not redirection lemmas).
- #parseFile(filename) ⇒ LemmaList
  Parse lemmas from a file.
- #parseString(text) ⇒ LemmaList
  Parse lemmas from a multi-line string.
- #redirectedLemmas ⇒ Array<Lemma>
  Find redirection lemmas.
- #removeLemma(lemma) ⇒ LemmaList
  Remove a lemma.
- #save(filename) ⇒ void
  Save to a file.
- #size ⇒ Integer
  Get the number of lemmas.
- #sort(&block) ⇒ LemmaList
  Sort the lemmas by headword, or with a comparison block if one is given (non-destructive).
- #sort!(&block) ⇒ LemmaList
  Sort the lemmas by headword, or with a comparison block if one is given (destructive).
- #sort_by(&block) ⇒ LemmaList
  Sort the lemmas using a custom key function (non-destructive).
- #sort_by!(&block) ⇒ LemmaList
  Sort the lemmas using a custom key function (destructive).
- #source_location_str(item) ⇒ String
  Helper method to format source location.
- #to_s ⇒ String
  Convert to a string, one lemma per line.
- #track_sublemma_positions(lemma, line, filename, line_number) ⇒ void
  Track source positions for sublemmas.
- #validate ⇒ Boolean
  Validate the entire lemma list for consistency. Runs all validation checks.
- #validateAll ⇒ Array<String>
  Performs all validation checks and returns an array of all errors instead of raising on the first error encountered.
- #validateCircularDependencies ⇒ Boolean
  Detects circular dependencies between lemmas and sublemmas. A circular dependency would result in infinite recursion when expanding or processing the lemma structure.
- #validateHeadwords ⇒ Boolean
  Ensures no headword appears more than once in the list. This prevents ambiguity and conflicts in the dictionary.
- #validateRedirections ⇒ Boolean
  Checks for circular redirection chains, for example, A redirects to B, which redirects back to A.
- #validateSublemmaRelationships ⇒ Boolean
  Ensures that words don’t appear as both headwords and sublemmas, and that the same sublemma doesn’t appear under multiple headwords.
Constructor Details
#initialize(input = nil) ⇒ LemmaList
Initialize a new lemma list, optionally from a string or file. A String containing a newline is parsed as text; any other String is treated as a filename.
# File 'lib/lexm/lemma_list.rb', line 19
def initialize(input = nil)
  @lemmas = []
  if input.is_a?(String)
    # Assume it's a filename if it doesn't contain newlines
    if input.include?("\n")
      parseString(input)
    else
      parseFile(input)
    end
  end
end
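The constructor relies on a single check to choose between parsing and file reading. The sketch below isolates that rule in a hypothetical helper, `input_kind`, which is not part of LexM. It makes one edge case explicit: single-line text is routed to `parseFile` and read as a path.

```ruby
# Hypothetical helper (not part of LexM) mirroring the dispatch rule in
# LemmaList#initialize.
def input_kind(input)
  return :none unless input.is_a?(String)  # nil or non-String: empty list
  input.include?("\n") ? :text : :file     # newline present => parseString
end

input_kind(nil)              # => :none
input_kind("first\nsecond")  # => :text
input_kind("words.lexm")     # => :file
input_kind("only one line")  # => :file (single-line text is read as a path)
```

To parse a single line of text, start from an empty list and call `parseString` directly, for example `LexM::LemmaList.new.parseString(text)`. This works because `parseString` returns `self`.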
Instance Attribute Details
#lemmas ⇒ Object (readonly)
The array of lemmas
# File 'lib/lexm/lemma_list.rb', line 15
def lemmas
  @lemmas
end
Instance Method Details
#[](index) ⇒ Lemma
Get lemma by index
# File 'lib/lexm/lemma_list.rb', line 609
def [](index)
  @lemmas[index]
end
#addLemma(lemma, merge = true) ⇒ LemmaList
Adds a lemma to the list. If a lemma with the same headword already exists and merge is true, the annotations and sublemmas of the new lemma are merged into the existing one; otherwise the lemma is appended as a new entry.
# File 'lib/lexm/lemma_list.rb', line 543
def addLemma(lemma, merge = true)
  # Find existing lemma with the same headword
  existing = findByText(lemma.text).first

  if existing && merge
    # Merge annotations
    lemma.annotations.each do |key, value|
      existing.setAnnotation(key, value)
    end

    # Merge sublemmas
    lemma.sublemmas.each do |sublemma|
      # Check if this sublemma already exists
      sublemma_exists = existing.sublemmas.any? do |existing_sublemma|
        existing_sublemma.text == sublemma.text && (!existing_sublemma.redirected? && !sublemma.redirected?)
      end

      # Add the sublemma if it doesn't exist
      unless sublemma_exists
        existing.sublemmas << sublemma
      end
    end
  else
    # Add as new lemma
    @lemmas << lemma
  end

  self
end
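The merge rule can be tried out without LexM itself. The sketch below uses hypothetical `Struct` stand-ins (`Entry`, `Sub`) in place of `Lemma`. It assumes that `setAnnotation` overwrites an existing key, which the code above suggests but does not show.

```ruby
# Hypothetical stand-ins for illustration only (not the LexM classes).
Sub   = Struct.new(:text, :redirected)
Entry = Struct.new(:text, :annotations, :sublemmas)

# Mirrors addLemma: with merge on, annotations overwrite existing keys
# (assumed setAnnotation behavior), and a non-redirecting sublemma is
# appended only if no non-redirecting sublemma with the same text exists.
def merge_entry(list, entry, merge = true)
  existing = list.find { |e| e.text == entry.text }
  if existing && merge
    existing.annotations.merge!(entry.annotations)
    entry.sublemmas.each do |sub|
      duplicate = existing.sublemmas.any? do |old|
        old.text == sub.text && !old.redirected && !sub.redirected
      end
      existing.sublemmas << sub unless duplicate
    end
  else
    list << entry
  end
  list
end

list = []
merge_entry(list, Entry.new("run", { "pos" => "v" }, [Sub.new("runner", false)]))
merge_entry(list, Entry.new("run", { "pos" => "n" },
                            [Sub.new("runner", false), Sub.new("running", false)]))
list.size                         # => 1
list.first.annotations["pos"]     # => "n"
list.first.sublemmas.map(&:text)  # => ["runner", "running"]

merge_entry(list, Entry.new("run", {}, []), false)
list.size                         # => 2 (merge disabled: duplicate headword)
```

Passing `merge = false` can therefore produce duplicate headwords. `validateHeadwords` rejects such lists.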
#addLemmas(lemmas, merge = true) ⇒ LemmaList
Add multiple lemmas at once
# File 'lib/lexm/lemma_list.rb', line 578
def addLemmas(lemmas, merge = true)
  lemmas.each do |lemma|
    addLemma(lemma, merge)
  end
  self
end
#allWords ⇒ Array<String>
Get an array of all words (both lemmas and sublemmas)
# File 'lib/lexm/lemma_list.rb', line 56
def allWords
  words = []
  eachWord { |word| words << word }
  words
end
#clear ⇒ LemmaList
Clear all lemmas
# File 'lib/lexm/lemma_list.rb', line 595
def clear
  @lemmas = []
  self
end
#detectCycles(graph, start, visited = [], path = [], location_map = {}) ⇒ Boolean
Helper method for validateCircularDependencies. Recursively traverses the dependency graph with a depth-first search (DFS) to find cycles.
# File 'lib/lexm/lemma_list.rb', line 334
def detectCycles(graph, start, visited = [], path = [], location_map = {})
  # Mark the current node as visited and add to path
  visited << start
  path << start

  # Visit all neighbors
  if graph.key?(start)
    graph[start].each do |neighbor|
      # Skip if neighbor is not a headword (not in graph)
      next unless graph.key?(neighbor)

      if !visited.include?(neighbor)
        detectCycles(graph, neighbor, visited, path, location_map)
      elsif path.include?(neighbor)
        # Cycle detected
        cycle_start_index = path.index(neighbor)
        cycle = path[cycle_start_index..-1] << neighbor

        # Format the cycle with source locations
        cycle_with_locations = cycle.map do |word|
          loc = location_map[word] || "unknown location"
          "#{word} (#{loc})"
        end

        raise "Circular dependency detected: #{cycle_with_locations.join(' -> ')}"
      end
    end
  end

  # Remove the current node from path
  path.pop
  true
end
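The DFS above keeps two lists. `visited` holds every node reached so far. `path` holds only the current recursion stack. A neighbor counts as a cycle only when it is still on `path`. The standalone sketch below shows why that distinction matters. `find_cycle` is a hypothetical name, and it returns the cycle instead of raising. The graph is a plain Hash from headword to sublemma texts, the same shape `validateCircularDependencies` builds.

```ruby
# Standalone sketch of the DFS in detectCycles (hypothetical, not LexM API).
# Returns the cycle with its first node repeated at the end, or nil.
def find_cycle(graph, node, visited = [], path = [])
  visited << node  # reached at some point during this search
  path << node     # on the current recursion stack
  graph.fetch(node, []).each do |neighbor|
    next unless graph.key?(neighbor)  # only headwords are graph nodes
    if !visited.include?(neighbor)
      cycle = find_cycle(graph, neighbor, visited, path)
      return cycle if cycle
    elsif path.include?(neighbor)     # back edge to an ancestor: a cycle
      return path[path.index(neighbor)..-1] + [neighbor]
    end
  end
  path.pop
  nil
end

cyclic  = { "a" => ["b"], "b" => ["c"], "c" => ["a"] }
diamond = { "a" => ["b", "c"], "b" => ["c"], "c" => [] }
find_cycle(cyclic, "a")   # => ["a", "b", "c", "a"]
find_cycle(diamond, "a")  # => nil ("c" is reached twice, never while on the path)
```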
#each {|Lemma| ... } ⇒ Object
Iterate through all lemmas
# File 'lib/lexm/lemma_list.rb', line 34
def each
  @lemmas.each do |lemma|
    yield lemma
  end
end
#eachWord {|String| ... } ⇒ Object
Iterate through all words (both lemmas and sublemmas)
# File 'lib/lexm/lemma_list.rb', line 42
def eachWord
  @lemmas.each do |lemma|
    # Yield the main lemma text
    yield lemma.text if lemma.text

    # Yield all sublemma texts
    lemma.sublemmas.each do |sublemma|
      yield sublemma.text if sublemma.text
    end
  end
end
#findByAnnotation(type, value = nil) ⇒ Array<Lemma>
Find lemmas by annotation
# File 'lib/lexm/lemma_list.rb', line 527
def findByAnnotation(type, value = nil)
  @lemmas.select do |lemma|
    if value.nil?
      lemma.annotations.key?(type)
    else
      lemma.annotations[type] == value
    end
  end
end
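`findByAnnotation` applies one of two tests. With no value, it checks whether the annotation key is present. With a value, it checks for equality. The sketch below reproduces that rule with a hypothetical `Item` struct. The annotation name "pos" and its values are made up for the example.

```ruby
# Hypothetical stand-in: only the annotations hash matters here.
Item = Struct.new(:text, :annotations)

# Same selection rule as findByAnnotation.
def find_by_annotation(items, type, value = nil)
  items.select do |item|
    if value.nil?
      item.annotations.key?(type)      # any value, as long as the key exists
    else
      item.annotations[type] == value  # exact value match
    end
  end
end

items = [
  Item.new("go",   { "pos" => "v" }),
  Item.new("walk", {}),
  Item.new("dog",  { "pos" => "n" })
]
find_by_annotation(items, "pos").map(&:text)       # => ["go", "dog"]
find_by_annotation(items, "pos", "n").map(&:text)  # => ["dog"]
```

Because `nil` means "any value", the method cannot search for an annotation whose stored value is `nil`. It also inspects only headword annotations, not those of sublemmas.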
#findByText(text) ⇒ Array<Lemma>
Find lemmas by lemma text
# File 'lib/lexm/lemma_list.rb', line 489
def findByText(text)
  @lemmas.select { |lemma| lemma.text == text }
end
#findRedirectionsTo(target, type = nil) ⇒ Array<Lemma>
Find lemmas that redirect to a given target, optionally filtered by type
# File 'lib/lexm/lemma_list.rb', line 509
def findRedirectionsTo(target, type = nil)
  @lemmas.select do |lemma|
    if lemma.redirected? && lemma.redirect.target == target
      type.nil? || lemma.redirect.types.include?(type)
    else
      lemma.sublemmas.any? do |sublemma|
        sublemma.redirected? &&
          sublemma.redirect.target == target &&
          (type.nil? || sublemma.redirect.types.include?(type))
      end
    end
  end
end
#normalLemmas ⇒ Array<Lemma>
Find normal lemmas (not redirection lemmas)
# File 'lib/lexm/lemma_list.rb', line 495
def normalLemmas
  @lemmas.select { |lemma| not lemma.redirected? }
end
#parseFile(filename) ⇒ LemmaList
Parse from a file
# File 'lib/lexm/lemma_list.rb', line 80
def parseFile(filename)
  begin
    line_number = 0
    File.open(filename, 'r') do |file|
      file.each_line do |line|
        line_number += 1
        line = line.strip
        next if line.empty? || line.start_with?('#')

        begin
          # Create lemma with source location info
          lemma = Lemma.new(line, filename, line_number, 1)
          @lemmas << lemma

          # Track sublemma positions
          track_sublemma_positions(lemma, line, filename, line_number)
        rescue StandardError => e
          raise "Error on line #{line_number}: #{e.message} (#{line})"
        end
      end
    end
  rescue Errno::ENOENT
    raise "File not found: #{filename}"
  rescue Errno::EACCES
    raise "Permission denied: #{filename}"
  rescue StandardError => e
    raise "Error reading file: #{e.message}"
  end
  self
end
#parseString(text) ⇒ LemmaList
Parse a multi-line string
# File 'lib/lexm/lemma_list.rb', line 65
def parseString(text)
  line_number = 0
  text.each_line do |line|
    line_number += 1
    line = line.strip
    next if line.empty? || line.start_with?('#')

    lemma = Lemma.new(line, "string input", line_number, 1)
    @lemmas << lemma
  end
  self
end
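`parseString` and `parseFile` share the same line filtering. Each line is stripped, and blank lines and lines starting with `#` are skipped. Line numbers still count the skipped lines, so they match the physical line in the input. The sketch below isolates that filtering in a hypothetical helper, `content_lines`, which returns the `[line_number, text]` pairs that would reach `Lemma.new`. One difference from `parseFile` is visible in the code above: `parseString` calls neither `track_sublemma_positions` nor the per-line error wrapping.

```ruby
# Hypothetical helper mirroring the line filtering in parseString/parseFile.
# Physical line numbers are kept, including those of skipped lines.
def content_lines(text)
  result = []
  text.each_line.with_index(1) do |line, number|
    stripped = line.strip
    next if stripped.empty? || stripped.start_with?("#")
    result << [number, stripped]
  end
  result
end

sample = "# comment\n\nfirst entry\n   second entry  \n"
content_lines(sample)  # => [[3, "first entry"], [4, "second entry"]]
```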
#redirectedLemmas ⇒ Array<Lemma>
Find redirection lemmas
# File 'lib/lexm/lemma_list.rb', line 501
def redirectedLemmas
  @lemmas.select { |lemma| lemma.redirected? }
end
#removeLemma(lemma) ⇒ LemmaList
Remove a lemma
# File 'lib/lexm/lemma_list.rb', line 588
def removeLemma(lemma)
  @lemmas.delete(lemma)
  self
end
#save(filename) ⇒ void
This method returns an undefined value.
Save to a file
# File 'lib/lexm/lemma_list.rb', line 662
def save(filename)
  begin
    File.open(filename, 'w') do |file|
      @lemmas.each do |lemma|
        file.puts(lemma.to_s)
      end
    end
  rescue Errno::EACCES
    raise "Permission denied: Cannot write to #{filename}"
  rescue StandardError => e
    raise "Error writing to file: #{e.message}"
  end
end
#size ⇒ Integer
Get number of lemmas
# File 'lib/lexm/lemma_list.rb', line 602
def size
  @lemmas.size
end
#sort(&block) ⇒ LemmaList
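Sort the lemmas based on their headwords (non-destructive). Without a block, headwords are compared case-insensitively; with a block, the block is used as the comparator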
# File 'lib/lexm/lemma_list.rb', line 616
def sort(&block)
  if block_given?
    sorted_list = LemmaList.new
    sorted_list.instance_variable_set(:@lemmas, @lemmas.sort(&block))
    sorted_list
  else
    # Default sort by headword text
    sorted_list = LemmaList.new
    sorted_list.instance_variable_set(:@lemmas, @lemmas.sort_by { |lemma| lemma.text.to_s.downcase })
    sorted_list
  end
end
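When no block is given, `sort` and `sort!` use `lemma.text.to_s.downcase` as the key. The sort is therefore case-insensitive, and a `nil` headword sorts as an empty string. The plain-string sketch below contrasts that key with Ruby's default byte-order comparison.

```ruby
words = ["mango", "Zebra", "apple", nil]

# Default String#<=> compares bytes, so uppercase letters sort first.
words.compact.sort                          # => ["Zebra", "apple", "mango"]

# The key used by LemmaList#sort without a block: case-insensitive,
# with nil turned into "" by to_s.
words.sort_by { |text| text.to_s.downcase }  # => [nil, "apple", "mango", "Zebra"]
```

On a `LemmaList`, `list.sort { |a, b| b.text <=> a.text }` passes the block to `Array#sort` as a two-argument comparator. By contrast, `list.sort_by { |lemma| lemma.text.length }` takes a one-argument key function. Both return a new `LemmaList` and leave the receiver unchanged.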
#sort!(&block) ⇒ LemmaList
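Sort the lemmas based on their headwords (destructive). Without a block, headwords are compared case-insensitively; with a block, the block is used as the comparator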
# File 'lib/lexm/lemma_list.rb', line 632
def sort!(&block)
  if block_given?
    @lemmas.sort!(&block)
  else
    # Default sort by headword text
    @lemmas.sort_by! { |lemma| lemma.text.to_s.downcase }
  end
  self
end
#sort_by(&block) ⇒ LemmaList
Sort the lemmas using a custom key function (non-destructive)
# File 'lib/lexm/lemma_list.rb', line 645
def sort_by(&block)
  sorted_list = LemmaList.new
  sorted_list.instance_variable_set(:@lemmas, @lemmas.sort_by(&block))
  sorted_list
end
#sort_by!(&block) ⇒ LemmaList
Sort the lemmas using a custom key function (destructive)
# File 'lib/lexm/lemma_list.rb', line 654
def sort_by!(&block)
  @lemmas.sort_by!(&block)
  self
end
#source_location_str(item) ⇒ String
Helper method to format source location
# File 'lib/lexm/lemma_list.rb', line 144
def source_location_str(item)
  if item.source_file && item.source_line
    col_info = item.source_column ? ", col: #{item.source_column}" : ""
    "#{item.source_file}:#{item.source_line}#{col_info}"
  else
    "unknown location"
  end
end
#to_s ⇒ String
Convert to string
# File 'lib/lexm/lemma_list.rb', line 678
def to_s
  @lemmas.map(&:to_s).join("\n")
end
#track_sublemma_positions(lemma, line, filename, line_number) ⇒ void
This method returns an undefined value.
Track source positions for sublemmas
# File 'lib/lexm/lemma_list.rb', line 117
def track_sublemma_positions(lemma, line, filename, line_number)
  return if line.nil? || lemma.redirected? || !line.include?("|")

  # Find where sublemmas begin
  sublemmas_start = line.index("|") + 1

  # For each sublemma, try to find its position in the line
  lemma.sublemmas.each do |sublemma|
    sublemma.source_file = filename
    sublemma.source_line = line_number

    # Determine column position
    if sublemma.text
      # Find position of this sublemma text in the line
      text_pos = line.index(sublemma.text, sublemmas_start)
      sublemma.source_column = text_pos ? text_pos + 1 : sublemmas_start
    elsif sublemma.redirect
      # Find position of redirection marker
      redirect_pos = line.index('>', sublemmas_start)
      sublemma.source_column = redirect_pos ? redirect_pos + 1 : sublemmas_start
    end
  end
end
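`track_sublemma_positions` derives 1-based columns with `String#index(substring, offset)`. Every search starts just past the first `|`. When a sublemma is not found, the fallback value `sublemmas_start` equals the 1-based column of the `|` itself. The sketch below reproduces that arithmetic in a hypothetical helper, `sublemma_columns`. The sample lines and the comma separator are illustrative only and are not verified LexM syntax.

```ruby
# Hypothetical helper reproducing the column arithmetic for text sublemmas.
def sublemma_columns(line, texts)
  return [] unless line.include?("|")
  start = line.index("|") + 1       # 0-based index just past the first "|"
  texts.map do |text|
    pos = line.index(text, start)   # first occurrence at or after start
    pos ? pos + 1 : start           # 1-based column; fallback = column of "|"
  end
end

sublemma_columns("head|alpha,beta", ["alpha", "beta", "gamma"])  # => [6, 12, 5]
sublemma_columns("head|abc,ab", ["abc", "ab"])                   # => [6, 6]
```

The second call shows a limitation of restarting each search at the same offset. A sublemma whose text also occurs earlier among the sublemmas (here `ab` inside `abc`) is reported at the earlier position.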
#validate ⇒ Boolean
Validate the entire lemma list for consistency. Runs all validation checks; on the first failure it prints the error message and returns false instead of raising
# File 'lib/lexm/lemma_list.rb', line 371
def validate
  begin
    validateHeadwords
    validateSublemmaRelationships
    validateCircularDependencies
    validateRedirections
    return true
  rescue StandardError => e
    puts "Validation error: #{e.message}"
    return false
  end
end
#validateAll ⇒ Array<String>
Performs all validation checks and returns an array of all errors instead of raising on the first error encountered
# File 'lib/lexm/lemma_list.rb', line 387
def validateAll
  errors = []

  # Create maps for tracking word usage with source locations
  normal_headwords = {}
  redirection_headwords = {}
  sublemmas_map = {}

  # First, map out all words and their locations
  @lemmas.each do |lemma|
    location = source_location_str(lemma)

    if lemma.redirected?
      redirection_headwords[lemma.text] = location
    else
      normal_headwords[lemma.text] = location

      # Process sublemmas for non-redirecting lemmas
      lemma.sublemmas.each do |sublemma|
        next if sublemma.redirected?

        sub_location = source_location_str(sublemma)

        # Record which headword this sublemma belongs to with location
        if sublemmas_map.key?(sublemma.text)
          sublemmas_map[sublemma.text] << [lemma.text, sub_location]
        else
          sublemmas_map[sublemma.text] = [[lemma.text, sub_location]]
        end
      end
    end
  end

  # Check for duplicate headwords with locations
  headword_locations = {}
  @lemmas.each do |lemma|
    location = source_location_str(lemma)
    if headword_locations.key?(lemma.text)
      headword_locations[lemma.text] << location
    else
      headword_locations[lemma.text] = [location]
    end
  end

  headword_locations.each do |word, locations|
    if locations.size > 1
      errors << "Duplicate headword detected: '#{word}' at #{locations.join(' and ')}"
    end
  end

  # Check for words that are both normal headwords and redirection headwords
  normal_headwords.each do |word, location|
    if redirection_headwords.key?(word)
      errors << "Word '#{word}' is both a normal headword (#{location}) and a redirection headword (#{redirection_headwords[word]})"
    end
  end

  # Check for words that are both headwords and sublemmas
  normal_headwords.each do |word, location|
    if sublemmas_map.key?(word)
      sublemma_info = sublemmas_map[word].map { |h, l| "#{h} (#{l})" }.join(', ')
      errors << "Word '#{word}' is both a headword (#{location}) and a sublemma of #{sublemma_info}"
    end
  end

  # Check for words that are both redirection headwords and sublemmas
  redirection_headwords.each do |word, location|
    if sublemmas_map.key?(word)
      sublemma_info = sublemmas_map[word].map { |h, l| "#{h} (#{l})" }.join(', ')
      errors << "Word '#{word}' is both a redirection headword (#{location}) and a sublemma of #{sublemma_info}"
    end
  end

  # Check for sublemmas that appear in multiple entries
  sublemmas_map.each do |sublemma, headword_list|
    if headword_list.size > 1
      headword_info = headword_list.map { |h, l| "#{h} (#{l})" }.join(', ')
      errors << "Sublemma '#{sublemma}' appears in multiple entries: #{headword_info}"
    end
  end

  # Check for circular dependencies and redirections if no errors so far
  if errors.empty?
    begin
      validateCircularDependencies
    rescue StandardError => e
      errors << e.message
    end

    begin
      validateRedirections
    rescue StandardError => e
      errors << e.message
    end
  end

  errors
end
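`validateAll` replaces the raise-on-first-error style of the individual validators with an accumulator. Each structural check appends a message and moves on. The cycle checks run only when the structural checks found nothing, and each of them contributes at most one message. The sketch below reduces the pattern to two of those checks. `collect_errors` is a hypothetical name. The input is a list of `[headword, location, sublemma_texts]` triples, a data shape assumed for illustration and not LexM's `Lemma` API.

```ruby
# Sketch of the collect-instead-of-raise pattern (hypothetical, simplified).
def collect_errors(entries)
  errors = []

  # Check 1: headwords that occur at more than one location.
  locations = Hash.new { |hash, key| hash[key] = [] }
  entries.each { |word, loc, _subs| locations[word] << loc }
  locations.each do |word, locs|
    next unless locs.size > 1
    errors << "Duplicate headword detected: '#{word}' at #{locs.join(' and ')}"
  end

  # Check 2: headwords that also appear as a sublemma of some entry.
  owners = Hash.new { |hash, key| hash[key] = [] }
  entries.each { |word, _loc, subs| subs.each { |sub| owners[sub] << word } }
  locations.each_key do |word|
    next unless owners.key?(word)  # key? does not trigger the default block
    errors << "Word '#{word}' is both a headword and a sublemma of #{owners[word].join(', ')}"
  end

  errors
end

entries = [
  ["run",    "words.lexm:1", ["runner"]],
  ["runner", "words.lexm:2", []],
  ["run",    "words.lexm:3", []]
]
collect_errors(entries).each { |message| puts message }
# Duplicate headword detected: 'run' at words.lexm:1 and words.lexm:3
# Word 'runner' is both a headword and a sublemma of run
```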
#validateCircularDependencies ⇒ Boolean
Detects circular dependencies between lemmas and sublemmas. A circular dependency would result in infinite recursion when expanding or processing the lemma structure
# File 'lib/lexm/lemma_list.rb', line 295
def validateCircularDependencies
  # Build a graph of dependencies (headword -> sublemmas) with locations
  dependency_graph = {}
  location_map = {}

  @lemmas.each do |lemma|
    next if lemma.redirected?

    # Track lemma location
    location_map[lemma.text] = source_location_str(lemma)

    # Initialize headword in the graph if not present
    dependency_graph[lemma.text] ||= []

    # Add all non-redirecting sublemmas as dependencies
    lemma.sublemmas.each do |sublemma|
      next if sublemma.redirected?
      dependency_graph[lemma.text] << sublemma.text
      location_map[sublemma.text] ||= source_location_str(sublemma)
    end
  end

  # For each headword, check for circular dependencies
  dependency_graph.each_key do |start|
    detectCycles(dependency_graph, start, [], [], location_map)
  end

  true
end
#validateHeadwords ⇒ Boolean
Ensures no headword appears more than once in the list. This prevents ambiguity and conflicts in the dictionary
# File 'lib/lexm/lemma_list.rb', line 198
def validateHeadwords
  # Check for duplicate headwords
  headwords = {}

  @lemmas.each do |lemma|
    if headwords.key?(lemma.text)
      location1 = source_location_str(headwords[lemma.text])
      location2 = source_location_str(lemma)
      raise "Duplicate headword detected: '#{lemma.text}' at #{location1} and #{location2}"
    end
    headwords[lemma.text] = lemma
  end

  true
end
#validateRedirections ⇒ Boolean
Check for circular redirection chains, for example, A redirects to B, which redirects back to A
# File 'lib/lexm/lemma_list.rb', line 157
def validateRedirections
  # Build a redirection graph with locations
  redirection_map = {}
  location_map = {}

  @lemmas.each do |lemma|
    if lemma.redirected?
      redirection_map[lemma.text] = lemma.redirect.target
      location_map[lemma.text] = source_location_str(lemma)
    end
  end

  # Check for cycles
  redirection_map.each_key do |start|
    visited = []
    current = start

    while redirection_map.key?(current) && !visited.include?(current)
      visited << current
      current = redirection_map[current]
    end

    if redirection_map.key?(current) && current == start
      # Format the cycle with locations
      cycle_path = visited.map do |word|
        loc = location_map[word] || "unknown location"
        "#{word} (#{loc})"
      end
      cycle_path << "#{current} (#{location_map[current]})"

      raise "Circular redirection detected: #{cycle_path.join(' -> ')}"
    end
  end

  true
end
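Each redirecting headword has exactly one target, so `validateRedirections` follows a chain rather than running a full DFS. It reports a cycle only when the walk returns to its own starting word. A chain that merely leads into a loop is not reported from its own start. The loop is still caught, because the outer `each_key` eventually starts from one of the loop's members. The sketch below reproduces the walk in a hypothetical helper, `redirection_cycle`, over a plain Hash from redirecting headword to target.

```ruby
# Hypothetical helper reproducing the walk in validateRedirections.
# Returns the cycle through `start` (start repeated at the end), or nil.
def redirection_cycle(redirects, start)
  visited = []
  current = start
  # Follow targets until reaching a non-redirecting word or a repeat.
  while redirects.key?(current) && !visited.include?(current)
    visited << current
    current = redirects[current]
  end
  return nil unless current == start && redirects.key?(current)
  visited + [current]
end

redirects = { "a" => "b", "b" => "a", "x" => "y", "y" => "z", "z" => "y" }
redirection_cycle(redirects, "a")  # => ["a", "b", "a"]
redirection_cycle(redirects, "x")  # => nil (x leads into the y/z loop but is not part of it)
redirection_cycle(redirects, "y")  # => ["y", "z", "y"]
```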
#validateSublemmaRelationships ⇒ Boolean
Ensures that words don’t appear as both headwords and sublemmas, and that the same sublemma doesn’t appear under multiple headwords
# File 'lib/lexm/lemma_list.rb', line 218
def validateSublemmaRelationships
  # Build word maps with source tracking
  normal_headwords = {}
  redirection_headwords = {}
  sublemmas_map = {}

  # First, capture all headwords and their sublemmas
  @lemmas.each do |lemma|
    if lemma.redirected?
      redirection_headwords[lemma.text] = lemma
    else
      normal_headwords[lemma.text] = lemma

      # Process sublemmas for non-redirecting lemmas
      lemma.sublemmas.each do |sublemma|
        # Skip redirecting sublemmas, we only care about actual sublemmas with text
        next if sublemma.redirected?

        # Record which headword this sublemma belongs to
        if sublemmas_map.key?(sublemma.text)
          sublemmas_map[sublemma.text] << [lemma, sublemma]
        else
          sublemmas_map[sublemma.text] = [[lemma, sublemma]]
        end
      end
    end
  end

  # Check for words that are both normal headwords and redirection headwords
  normal_headwords.each do |word, lemma|
    if redirection_headwords.key?(word)
      location1 = source_location_str(lemma)
      location2 = source_location_str(redirection_headwords[word])
      raise "Word '#{word}' is both a normal headword (#{location1}) and a redirection headword (#{location2})"
    end
  end

  # Check for words that are both headwords and sublemmas
  normal_headwords.each do |word, lemma|
    if sublemmas_map.key?(word)
      location1 = source_location_str(lemma)
      sublemma_info = sublemmas_map[word].map do |l, s|
        "#{l.text} (#{source_location_str(s)})"
      end.join(', ')
      raise "Word '#{word}' is both a headword (#{location1}) and a sublemma of #{sublemma_info}"
    end
  end

  # Check for words that are both redirection headwords and sublemmas
  redirection_headwords.each do |word, lemma|
    if sublemmas_map.key?(word)
      location1 = source_location_str(lemma)
      sublemma_info = sublemmas_map[word].map do |l, s|
        "#{l.text} (#{source_location_str(s)})"
      end.join(', ')
      raise "Word '#{word}' is both a redirection headword (#{location1}) and a sublemma of #{sublemma_info}"
    end
  end

  # Check for sublemmas that appear in multiple entries
  sublemmas_map.each do |sublemma, entries|
    if entries.size > 1
      headword_info = entries.map do |l, s|
        "#{l.text} (#{source_location_str(s)})"
      end.join(', ')
      raise "Sublemma '#{sublemma}' appears in multiple entries: #{headword_info}"
    end
  end

  true
end