Class: DiscourseAi::Tokenizer::BertTokenizer

Inherits:
BasicTokenizer
Defined in:
lib/discourse_ai/tokenizer/bert_tokenizer.rb

Overview

BERT tokenizer, loaded from the vendored bert-base-uncased.json file. Useful for many embedding models and small classification models.

Class Method Summary

.tokenizer ⇒ Object

Methods inherited from BasicTokenizer

available_llm_tokenizers, below_limit?, decode, encode, size, tokenize, truncate

Class Method Details

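.tokenizer ⇒ Object

Returns the tokenizer loaded from the vendored bert-base-uncased.json file. The result is memoized at the class level, so the file is read only on the first call.
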
# File 'lib/discourse_ai/tokenizer/bert_tokenizer.rb', line 7

def self.tokenizer
  @tokenizer ||=
    ::Tokenizers.from_file(
      DiscourseAi::Tokenizers.vendor_path("bert-base-uncased.json")
    )
end
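
The `@tokenizer ||=` idiom above caches the loaded tokenizer in a class-level instance variable. The following is a minimal sketch of that memoization pattern only, not of the plugin itself: `FakeTokenizers` and `SketchTokenizer` are hypothetical stand-ins introduced for illustration, so that the example runs without the `tokenizers` gem or a Discourse install.

```ruby
# Hypothetical stand-in for ::Tokenizers; it only counts how often
# from_file is called and returns a placeholder string.
module FakeTokenizers
  @loads = 0

  class << self
    attr_reader :loads

    def from_file(path)
      @loads += 1
      "tokenizer loaded from #{path}"
    end
  end
end

# Hypothetical class mirroring the shape of BertTokenizer.tokenizer.
class SketchTokenizer
  def self.tokenizer
    # ||= assigns only when @tokenizer is nil or false,
    # so from_file runs on the first call and is skipped afterwards.
    @tokenizer ||= FakeTokenizers.from_file("bert-base-uncased.json")
  end
end

first = SketchTokenizer.tokenizer
second = SketchTokenizer.tokenizer

puts first.equal?(second)   # both calls return the same cached object
puts FakeTokenizers.loads   # number of times the file was "loaded"
```

Because the cache lives on the class rather than on an instance, every caller in the same process shares one tokenizer object. The inherited class methods listed above (encode, decode, size, truncate, and so on) presumably go through this accessor, though that is an assumption not shown on this page.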