Class: Pho::Analyzers

Inherits:
Object
  • Object
Defined in:
lib/pho/field_predicate_map.rb

Overview

Declares URI constants for the various text analyzers supported by the Talis Platform.

Analyzers are configured to operate on specific datatype properties using the FieldPredicateMap.
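
As a minimal sketch of how these constants might be used, the snippet below pairs property URIs with analyzer URIs in a plain Hash. To stay runnable without the pho gem, it defines a local stand-in module (`AnalyzerSketch`) whose values are copied from the constant summary on this page. The `ANALYZER_FOR` hash, the `analyzer_for` helper, and the choice of property URIs are illustrative assumptions, not part of the Pho API.

```ruby
# Local stand-in for Pho::Analyzers so the sketch runs without the gem.
# The URI values are copied from the constant summary on this page.
module AnalyzerSketch
  STANDARD = "http://schemas.talis.com/2007/bigfoot/analyzers#standard-en".freeze
  KEYWORD  = "http://schemas.talis.com/2007/bigfoot/analyzers#keyword".freeze
end

# Hypothetical pairing of property URIs with analyzers (illustrative only).
ANALYZER_FOR = {
  "http://purl.org/dc/elements/1.1/title"      => AnalyzerSketch::STANDARD,
  "http://purl.org/dc/elements/1.1/identifier" => AnalyzerSketch::KEYWORD
}.freeze

# Falls back to the standard English analyzer, which this page describes
# as the default when no analyzer is specified.
def analyzer_for(property_uri)
  ANALYZER_FOR.fetch(property_uri, AnalyzerSketch::STANDARD)
end
```

With the gem loaded, the stand-in module would be replaced by references such as `Pho::Analyzers::KEYWORD`.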

Constant Summary

STANDARD =

A standard English analyzer and the default if no analyzer is specified. Words are split on punctuation characters, removing the punctuation. Words containing a dot are not split. Words containing both hyphens and numbers are not split. Email addresses and hostnames are not split. Stop words are removed. Searches on fields with this type of analyzer are case insensitive.

The following words are considered to be stop words and will not be indexed: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
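
To make the effect of stop word removal and case insensitivity concrete, here is a rough approximation in plain Ruby. It is not the Platform's implementation: the helper name `approximate_standard_tokens` is hypothetical, and the sketch only lowercases, splits on whitespace, trims surrounding punctuation, and drops the stop words listed above. It does not reproduce the rules about dots, hyphens, email addresses, or hostnames.

```ruby
require "set"

# Stop words copied from the list above.
ENGLISH_STOP_WORDS = Set.new(%w[
  a an and are as at be but by for if in into is it no not of on or such
  that the their then there these they this to was will with
]).freeze

# Rough approximation only: lowercase, split on whitespace, trim
# punctuation at either end of each word, then drop stop words.
def approximate_standard_tokens(text)
  text.downcase
      .split(/\s+/)
      .map { |word| word.gsub(/\A[^a-z0-9]+|[^a-z0-9]+\z/, "") }
      .reject { |word| word.empty? || ENGLISH_STOP_WORDS.include?(word) }
end

approximate_standard_tokens("The Hound of the Baskervilles!")
# => ["hound", "baskervilles"]
```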

"http://schemas.talis.com/2007/bigfoot/analyzers#standard-en".freeze
GREEK =

A standard Greek language analyzer. Words are split on punctuation characters, removing the punctuation. Words containing a dot are not split. Words containing both hyphens and numbers are not split. Email addresses and hostnames are not split. Stop words are removed. Searches on fields with this type of analyzer are case insensitive.

"http://schemas.talis.com/2007/bigfoot/analyzers#standard-el".freeze
GERMAN =

The following words are considered to be stop words and will not be indexed: einer, eine, eines, einem, einen, der, die, das, dass, daß, du, er, sie, es, was, wer, wie, wir, und, oder, ohne, mit, am, im, in, aus, auf, ist, sein, war, wird, ihr, ihre, ihres, als, für, von, mit, dich, dir, mich, mir, mein, sein, kein, durch, wegen, wird

"http://schemas.talis.com/2007/bigfoot/analyzers#standard-de".freeze
FRENCH =

A standard French language analyzer. Words are split on punctuation characters, removing the punctuation. Words containing a dot are not split. Words containing both hyphens and numbers are not split. Email addresses and hostnames are not split. Stop words are removed and any remaining words are stemmed. Searches on fields with this type of analyzer are case insensitive.

The following words are considered to be stop words and will not be indexed: a, afin, ai, ainsi, après, attendu, au, aujourd, auquel, aussi, autre, autres, aux, auxquelles, auxquels, avait, avant, avec, avoir, c, car, ce, ceci, cela, celle, celles, celui, cependant, certain, certaine, certaines, certains, ces, cet, cette, ceux, chez, ci, combien, comme, comment, concernant, contre, d, dans, de, debout, dedans, dehors, delà, depuis, derrière, des, désormais, desquelles, desquels, dessous, dessus, devant, devers, devra, divers, diverse, diverses, doit, donc, dont, du, duquel, durant, dès, elle, elles, en, entre, environ, est, et, etc, etre, eu, eux, excepté, hormis, hors, hélas, hui, il, ils, j, je, jusqu, jusque, l, la, laquelle, le, lequel, les, lesquelles, lesquels, leur, leurs, lorsque, lui, là, ma, mais, malgré, me, merci, mes, mien, mienne, miennes, miens, moi, moins, mon, moyennant, même, mêmes, n, ne, ni, non, nos, notre, nous, néanmoins, nôtre, nôtres, on, ont, ou, outre, où, par, parmi, partant, pas, passé, pendant, plein, plus, plusieurs, pour, pourquoi, proche, près, puisque, qu, quand, que, quel, quelle, quelles, quels, qui, quoi, quoique, revoici, revoilà, s, sa, sans, sauf, se, selon, seront, ses, si, sien, sienne, siennes, siens, sinon, soi, soit, son, sont, sous, suivant, sur, ta, te, tes, tien, tienne, tiennes, tiens, toi, ton, tous, tout, toute, toutes, tu, un, une, va, vers, voici, voilà, vos, votre, vous, vu, vôtre, vôtres, y, à, ça, ès, été, être, ô.

"http://schemas.talis.com/2007/bigfoot/analyzers#standard-fr".freeze
CJK =

A standard CJK language analyzer.

"http://schemas.talis.com/2007/bigfoot/analyzers#standard-cjk".freeze
DUTCH =

The following words are considered to be stop words and will not be indexed: de, en, van, ik, te, dat, die, in, een, hij, het, niet, zijn, is, was, op, aan, met, als, voor, had, er, maar, om, hem, dan, zou, of, wat, mijn, men, dit, zo, door, over, ze, zich, bij, ook, tot, je, mij, uit, der, daar, haar, naar, heb, hoe, heeft, hebben, deze, u, want, nog, zal, me, zij, nu, ge, geen, omdat, iets, worden, toch, al, waren, veel, meer, doen, toen, moet, ben, zonder, kan, hun, dus, alles, onder, ja, eens, hier, wie, werd, altijd, doch, wordt, wezen, kunnen, ons, zelf, tegen, na, reeds, wil, kon, niets, uw, iemand, geweest, andere

"http://schemas.talis.com/2007/bigfoot/analyzers#standard-nl".freeze
CHINESE =
"http://schemas.talis.com/2007/bigfoot/analyzers#standard-cn".freeze
KEYWORD =

This analyzer does not split the field at all. The entire value of the field is indexed as a single token.
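
The difference from a splitting analyzer can be sketched in a few lines. Both helpers below are hypothetical illustrations written for this page, not Pho or Platform code; the whitespace split stands in for any analyzer that breaks a value into several tokens.

```ruby
# Keyword-style handling: the whole field value is a single token.
def keyword_tokens(value)
  [value]
end

# Naive whitespace split, shown only for contrast.
def whitespace_tokens(value)
  value.split
end

keyword_tokens("ISBN 978-0-14-043786-7")    # => ["ISBN 978-0-14-043786-7"]
whitespace_tokens("ISBN 978-0-14-043786-7") # => ["ISBN", "978-0-14-043786-7"]
```

A single-token index of this kind suits identifiers and codes that should only match when the entire value is supplied.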

"http://schemas.talis.com/2007/bigfoot/analyzers#keyword".freeze
NO_STOP_WORD_STANDARD =

English analyzer without stop words. This is identical to the standard English analyzer but all words are indexed.

"http://schemas.talis.com/2007/bigfoot/analyzers#nostop-en".freeze
NORMALISE_STANDARD =

English analyzer without stop words and with accent support. This is identical to the standard English analyzer, but all words are indexed and any accented characters in the ISO Latin 1 character set are replaced by their unaccented equivalents. See the API documentation at n2.talis.com/wiki/Field_Predicate_Map for details of the replacements.
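
The accent replacement can be illustrated with `String#tr`. The mapping below is an assumed, partial subset of lowercase Latin 1 characters chosen for this sketch; the authoritative replacement table is the one in the API documentation mentioned above, and `fold_accents` is a hypothetical helper name.

```ruby
# Assumed, partial mapping of accented lowercase characters to their
# unaccented equivalents. Both strings have the same length, so each
# character maps to the one at the same position.
ACCENTED   = "àáâãäåçèéêëìíîïñòóôõöùúûüýÿ"
UNACCENTED = "aaaaaaceeeeiiiinooooouuuuyy"

def fold_accents(text)
  text.tr(ACCENTED, UNACCENTED)
end

fold_accents("café crème") # => "cafe creme"
```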

"http://schemas.talis.com/2007/bigfoot/analyzers#norm-en".freeze
PORTER_NORMALIZE_STANDARD =

English analyzer with Porter stemming, case normalization, Latin 1 normalization, and stop word removal.

"http://schemas.talis.com/2007/bigfoot/analyzers#porter-norm-en".freeze
PORTER_NO_STOP_WORD_STANDARD =

English analyzer with Porter stemming, case normalization, and Latin 1 normalization.

"http://schemas.talis.com/2007/bigfoot/analyzers#porter-nostop-norm-en".freeze