Class: DiscourseAi::Tokenizer::MultilingualE5LargeTokenizer

Inherits:
  • Object
    • BasicTokenizer
      • DiscourseAi::Tokenizer::MultilingualE5LargeTokenizer
Defined in:
lib/discourse_ai/tokenizer/multilingual_e5_large_tokenizer.rb

Overview

Tokenizer of the multilingual-e5-large model, the first multilingual embeddings model used in Discourse.

Class Method Summary

.tokenizer ⇒ Object

Methods inherited from BasicTokenizer

available_llm_tokenizers, below_limit?, decode, encode, size, tokenize, truncate
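This class defines only .tokenizer; the inherited class methods listed above do the actual work. The following is a minimal, self-contained sketch of that arrangement, assuming the base class simply delegates to whatever object .tokenizer returns. SketchBasicTokenizer, SketchE5Tokenizer, StubWordTokenizer and StubEncoding are hypothetical names, the stub only imitates the encode(text).ids / encode(text).tokens / decode(ids) shape of the tokenizers gem, and BasicTokenizer's real method bodies may differ.

```ruby
# Hypothetical stand-in for the encoding object returned by a real tokenizer.
StubEncoding = Struct.new(:ids, :tokens)

# Hypothetical whitespace tokenizer used instead of ::Tokenizers.from_file.
class StubWordTokenizer
  def initialize(vocab)
    @vocab = vocab
    @inverse = vocab.invert
  end

  # Splits on whitespace; unknown words map to id 0.
  def encode(text)
    tokens = text.split
    StubEncoding.new(tokens.map { |t| @vocab.fetch(t, 0) }, tokens)
  end

  # Maps ids back to words and joins them with single spaces.
  def decode(ids)
    ids.map { |id| @inverse.fetch(id, "[UNK]") }.join(" ")
  end
end

# Sketch (assumed, not BasicTokenizer's source): every class method
# delegates to .tokenizer, which subclasses must provide.
class SketchBasicTokenizer
  class << self
    def tokenizer
      raise NotImplementedError, "#{name} must define .tokenizer"
    end

    def tokenize(text)
      tokenizer.encode(text).tokens
    end

    def encode(text)
      tokenizer.encode(text).ids
    end

    def decode(ids)
      tokenizer.decode(ids)
    end

    def size(text)
      tokenize(text).size
    end

    # Keeps the first max_length token ids and decodes them back to text.
    def truncate(text, max_length)
      decode(encode(text).take(max_length))
    end
  end
end

# The subclass only supplies a memoized tokenizer, like the class documented here.
class SketchE5Tokenizer < SketchBasicTokenizer
  def self.tokenizer
    @tokenizer ||= StubWordTokenizer.new({ "hello" => 1, "world" => 2, "again" => 3 })
  end
end
```

With the stub vocabulary, SketchE5Tokenizer.size("hello world again") is 3 and SketchE5Tokenizer.truncate("hello world again", 2) returns "hello world". In this sketch truncation counts token ids rather than characters, so the result fits a token budget.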

Class Method Details

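.tokenizer ⇒ Object

Lazily loads the tokenizer definition from the vendored multilingual-e5-large.json file with ::Tokenizers.from_file and memoizes the result in the class-level @tokenizer instance variable, so later calls reuse the same object.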



# File 'lib/discourse_ai/tokenizer/multilingual_e5_large_tokenizer.rb', line 7

def self.tokenizer
  @tokenizer ||=
    ::Tokenizers.from_file(
      DiscourseAi::Tokenizers.vendor_path("multilingual-e5-large.json")
    )
end
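The method above memoizes with @tokenizer ||= on the class itself. The sketch below isolates that pattern so its behavior can be observed; CountingLoader is a hypothetical stand-in for ::Tokenizers.from_file that counts its calls, and MemoizedTokenizer and ChildTokenizer are likewise illustrative names, not part of Discourse.

```ruby
# Hypothetical loader standing in for ::Tokenizers.from_file; it records
# how many times it is called so the memoization is visible.
class CountingLoader
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def from_file(path)
    @calls += 1
    "tokenizer loaded from #{path}"
  end
end

LOADER = CountingLoader.new

class MemoizedTokenizer
  # Same shape as the documented method: @tokenizer is an instance
  # variable of the class object and is assigned on the first call only.
  def self.tokenizer
    @tokenizer ||= LOADER.from_file("multilingual-e5-large.json")
  end
end

# Inherits .tokenizer but gets its own, initially empty, @tokenizer.
class ChildTokenizer < MemoizedTokenizer
end
```

Calling MemoizedTokenizer.tokenizer twice runs the loader once and returns the same object. Because the cache lives on the class object, ChildTokenizer.tokenizer loads again instead of sharing its parent's value. Note that ||= is not atomic: two threads making the first call at the same time can both run the loader, in which case the file is parsed twice and one result is discarded.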