Module: BlackStack::Strings::Comparing
Defined in: lib/functions.rb
Overview
Fuzzy string comparison functions: how similar are two strings that are not exactly equal?
Class Method Summary
- .levenshtein_distance(s, t) ⇒ Object
  Returns 0 if the strings are equal. See stackoverflow.com/questions/16323571/measure-the-distance-between-two-strings-with-ruby.
- .max_sardi_distance(s) ⇒ Object
  Returns the number of words longer than 3 characters found in the parameter s.
- .sardi_distance(s, t) ⇒ Object
  Returns the number of words longer than 3 characters in the parameter s that are not found in the parameter t.
Class Method Details
.levenshtein_distance(s, t) ⇒ Object
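Returns the Levenshtein (edit) distance between s and t, ignoring case; the result is 0 if the strings are equal. Both arguments are downcased in place. Based on stackoverflow.com/questions/16323571/measure-the-distance-between-two-strings-with-ruby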
    # File 'lib/functions.rb', line 505
    def self.levenshtein_distance(s, t)
      s.downcase!
      t.downcase!
      m = s.length
      n = t.length
      return m if n == 0
      return n if m == 0
      d = Array.new(m+1) {Array.new(n+1)}
      (0..m).each {|i| d[i][0] = i}
      (0..n).each {|j| d[0][j] = j}
      (1..n).each do |j|
        (1..m).each do |i|
          d[i][j] = if s[i-1] == t[j-1] # adjust index into string
                      d[i-1][j-1]       # no operation required
                    else
                      [ d[i-1][j]+1,    # deletion
                        d[i][j-1]+1,    # insertion
                        d[i-1][j-1]+1,  # substitution
                      ].min
                    end
        end
      end
      d[m][n]
    end
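Because the listed method calls `downcase!` on its arguments, it modifies the caller's strings and raises an error when given frozen strings. The following is a minimal standalone sketch of the same distance that works on copies instead. The name `levenshtein_sketch` is hypothetical and not part of BlackStack, and the sketch keeps only two rows of the matrix rather than the full table, which is a swapped-in space optimization of the same recurrence.

```ruby
# Hypothetical standalone sketch; not part of BlackStack::Strings::Comparing.
def levenshtein_sketch(s, t)
  s = s.downcase # non-mutating copies, so frozen strings are accepted
  t = t.downcase
  m = s.length
  n = t.length
  return m if n == 0
  return n if m == 0

  # prev[j] is the distance between the first i-1 chars of s and the first j chars of t.
  prev = (0..n).to_a
  (1..m).each do |i|
    curr = [i] # distance from the first i chars of s to the empty string
    (1..n).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      curr << [
        prev[j] + 1,       # deletion
        curr[j - 1] + 1,   # insertion
        prev[j - 1] + cost # substitution, or no operation when the chars match
      ].min
    end
    prev = curr
  end
  prev[n]
end

puts levenshtein_sketch("kitten", "sitting") # 3
puts levenshtein_sketch("Kitten", "kitten")  # 0, comparison ignores case
puts levenshtein_sketch("", "abc")           # 3
```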
.max_sardi_distance(s) ⇒ Object
Returns the number of words longer than 3 characters found in the parameter s. This is the largest value that sardi_distance(s, t) can return for the same s.
    # File 'lib/functions.rb', line 533
    def self.max_sardi_distance(s)
      s.downcase!
      s.gsub!(/-/,' ')
      ss = s.scan(/\b([a-z]+)\b/)
      n = 0
      ss.each { |x|
        x = x[0]
        if (x.size > 3) # to skip trivial keywords such as 'and'
          n += 1
        end
      }
      n
    end
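To make the counting rule concrete, here is a small standalone sketch that applies the same steps as the listing (downcase, turn hyphens into spaces, extract runs of a-z, keep words longer than 3 characters) without modifying its argument. The name `count_significant_words` is hypothetical and not part of BlackStack.

```ruby
# Hypothetical standalone sketch; not part of BlackStack::Strings::Comparing.
def count_significant_words(s)
  words = s.downcase.gsub('-', ' ').scan(/\b[a-z]+\b/)
  words.count { |w| w.size > 3 } # skip trivial keywords such as 'and'
end

# Words: the, quick, brown, fox, and, dogs. Longer than 3 chars: quick, brown, dogs.
puts count_significant_words("The quick-brown fox and dogs") # 3
```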
.sardi_distance(s, t) ⇒ Object
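Returns the number of words longer than 3 characters in the parameter s that are not found in the parameter t. A result of 0 means every such word of s also appears in t; the maximum possible result is max_sardi_distance(s).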
    # File 'lib/functions.rb', line 548
    def self.sardi_distance(s, t)
      s.downcase!
      t.downcase!
      s.gsub!(/-/,' ')
      t.gsub!(/-/,' ')
      max_distance = max_sardi_distance(s)
      ss = s.scan(/\b([a-z]+)\b/)
      tt = t.scan(/\b([a-z]+)\b/)
      n = 0
      ss.each { |x|
        x = x[0]
        if (x.size > 3) # to skip trivial keywords such as 'and'
          if ( tt.select { |y| y[0] == x }.size > 0 )
            n += 1
          end
        end
      }
      return max_distance - n
    end
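The listing computes `max_distance - n`, which is the number of significant words of s that are missing from t. The sketch below computes that count directly and shows that the measure is not symmetric. It is a standalone illustration under that reading of the listing: the name `sardi_distance_sketch` is hypothetical, it does not modify its arguments, and it uses a `Set` for lookups instead of the listing's `select`.

```ruby
require 'set'

# Hypothetical standalone sketch; not part of BlackStack::Strings::Comparing.
def sardi_distance_sketch(s, t)
  # Same tokenization as the listing: downcase, hyphens to spaces, runs of a-z.
  tokenize = ->(str) { str.downcase.gsub('-', ' ').scan(/\b[a-z]+\b/) }
  t_words = tokenize.call(t).to_set
  # Count words of s longer than 3 chars that do not occur in t.
  tokenize.call(s).count { |w| w.size > 3 && !t_words.include?(w) }
end

puts sardi_distance_sketch("software engineer", "senior software engineer") # 0
puts sardi_distance_sketch("senior software engineer", "software engineer") # 1
puts sardi_distance_sketch("Co-Founder and CEO", "founder")                 # 0
```

The first two calls differ only in argument order, so callers who need a symmetric score would have to combine both directions themselves.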