Class: DiscourseAi::Tokenizer::BgeLargeEnTokenizer

Inherits:
BasicTokenizer
Defined in:
lib/discourse_ai/tokenizer/bge_large_en_tokenizer.rb

Overview

Tokenizer for bge-large-en-v1.5, the embeddings model most commonly used with Discourse.

Class Method Summary

Methods inherited from BasicTokenizer

available_llm_tokenizers, below_limit?, decode, encode, size, tokenize, truncate

Class Method Details

.tokenizer ⇒ Object



# File 'lib/discourse_ai/tokenizer/bge_large_en_tokenizer.rb', line 7

def self.tokenizer
  @tokenizer ||=
    ::Tokenizers.from_file(
      DiscourseAi::Tokenizers.vendor_path("bge-large-en.json")
    )
end
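
The `@tokenizer ||=` idiom in the method above memoizes the loaded tokenizer in a class-level instance variable, so the vocabulary file is read only on the first call. The following is a minimal, self-contained sketch of that pattern, not the plugin's code: `FakeLoader` and `ExampleTokenizer` are hypothetical stand-ins, and `FakeLoader.from_file` only imitates the shape of `::Tokenizers.from_file` so the example runs without the gem or the vendored JSON file.

```ruby
# Hypothetical stand-in for an expensive loader such as ::Tokenizers.from_file.
# It counts how many times a "file" is loaded.
class FakeLoader
  @loads = 0

  class << self
    attr_reader :loads

    def from_file(path)
      @loads += 1
      "tokenizer loaded from #{path}"
    end
  end
end

# Hypothetical class that mirrors the shape of BgeLargeEnTokenizer.tokenizer.
class ExampleTokenizer
  def self.tokenizer
    # Inside a class method, @tokenizer is an instance variable of the class
    # object itself. ||= assigns only when it is nil or false, so from_file
    # runs on the first call and the cached object is returned afterwards.
    @tokenizer ||= FakeLoader.from_file("bge-large-en.json")
  end
end

first = ExampleTokenizer.tokenizer
second = ExampleTokenizer.tokenizer

puts first.equal?(second) # both calls return the same cached object
puts FakeLoader.loads     # the loader ran a single time
```

Note that `||=` is not atomic: if several threads make the first call at the same time, the loader may run more than once before the value is cached.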