Class: Ferret::Analysis::MappingFilter

Inherits:
Object
Defined in:
ext/r_analysis.c

Overview

Summary

A MappingFilter maps strings in tokens. It is usually used to map UTF-8 characters to ASCII characters for easier searching and better search recall. The mapping is compiled into a deterministic finite automaton (DFA), so lookups are very fast and the filter can therefore be used for indexing very large datasets. Regular expressions are not currently supported. If you are really interested in the feature, please contact me at [email protected].

Example

mapping = {
  ['à','á','â','ã','ä','å','ā','ă']         => 'a',
  'æ'                                       => 'ae',
  ['ď','đ']                                 => 'd',
  ['ç','ć','č','ĉ','ċ']                     => 'c',
  ['è','é','ê','ë','ē','ę','ě','ĕ','ė']     => 'e',
  ['ƒ']                                     => 'f',
  ['ĝ','ğ','ġ','ģ']                         => 'g',
  ['ĥ','ħ']                                 => 'h',
  ['ì','í','î','ï','ī','ĩ','ĭ']             => 'i',
  ['į','ı','ĳ','ĵ']                         => 'j',
  ['ķ','ĸ']                                 => 'k',
  ['ł','ľ','ĺ','ļ','ŀ']                     => 'l',
  ['ñ','ń','ň','ņ','ŉ','ŋ']                 => 'n',
  ['ò','ó','ô','õ','ö','ø','ō','ő','ŏ']     => 'o',
  ['œ']                                     => 'oe',
  ['ą']                                     => 'a',
  ['ŕ','ř','ŗ']                             => 'r',
  ['ś','š','ş','ŝ','ș']                     => 's',
  ['ť','ţ','ŧ','ț']                         => 't',
  ['ù','ú','û','ü','ū','ů','ű','ŭ','ũ','ų'] => 'u',
  ['ŵ']                                     => 'w',
  ['ý','ÿ','ŷ']                             => 'y',
  ['ž','ż','ź']                             => 'z'
}
filt = MappingFilter.new(token_stream, mapping)
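
The effect of such a mapping can be illustrated without Ferret installed. The following is a minimal pure-Ruby sketch, not Ferret's implementation (which compiles the mapping into a DFA in C). The helpers flatten_mapping and apply_mapping are hypothetical names introduced here for illustration, and trying longer source strings first is a choice made for this sketch rather than documented MappingFilter behaviour.

```ruby
# Hypothetical helper, not part of Ferret: expand array keys so that
# every source string gets its own entry in a flat hash.
def flatten_mapping(mapping)
  mapping.each_with_object({}) do |(from, to), flat|
    Array(from).each { |s| flat[s] = to }
  end
end

# Hypothetical helper: replace every mapped substring in +text+.
# Longer source strings are tried first (an assumption of this sketch).
def apply_mapping(text, flat)
  keys = flat.keys.sort_by { |k| -k.length }
  text.gsub(Regexp.union(keys)) { |match| flat[match] }
end

# A shortened version of the mapping shown above.
mapping = {
  ['à', 'á', 'â'] => 'a',
  'æ'             => 'ae',
  ['è', 'é', 'ê'] => 'e'
}

flat = flatten_mapping(mapping)
puts apply_mapping('déjà vu', flat)      # prints "deja vu"
puts apply_mapping('encyclopædia', flat) # prints "encyclopaedia"
```

A key may be either a single string or an array of strings. Flattening first gives one lookup table regardless of how the entries were grouped.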