Class: Langchain::Vectorsearch::Base

Inherits: Object

Extended by: Forwardable

Includes: DependencyHelper

Defined in: lib/langchain/vectorsearch/base.rb

Overview

Vector Databases

A vector database is a type of database that stores data as high-dimensional vectors, which are mathematical representations of features or attributes. Each vector has a fixed number of dimensions, ranging from tens to thousands depending on the complexity and granularity of the data.

Available vector databases

Usage

Pick a vector database from the list.
Review its documentation to install the required gems and complete any other setup, such as creating an account and getting an API key.

Instantiate the vector database class:

weaviate = Langchain::Vectorsearch::Weaviate.new(
  url:         ENV["WEAVIATE_URL"],
  api_key:     ENV["WEAVIATE_API_KEY"],
  index_name:  "Documents",
  llm:         Langchain::LLM::OpenAI.new(api_key:)
)

# You can instantiate other supported vector databases the same way:
epsilla  = Langchain::Vectorsearch::Epsilla.new(...)
milvus   = Langchain::Vectorsearch::Milvus.new(...)
qdrant   = Langchain::Vectorsearch::Qdrant.new(...)
pinecone = Langchain::Vectorsearch::Pinecone.new(...)
chroma   = Langchain::Vectorsearch::Chroma.new(...)
pgvector = Langchain::Vectorsearch::Pgvector.new(...)

Schema Creation

`create_default_schema()` creates the default schema in your vector database.

search.create_default_schema

(We plan on offering customizable schema creation shortly)

Adding Data

You can add data with:

`add_data(paths:)` to add data from files of any supported type

my_pdf = Langchain.root.join("path/to/my.pdf")
my_text = Langchain.root.join("path/to/my.txt")
my_docx = Langchain.root.join("path/to/my.docx")
my_csv = Langchain.root.join("path/to/my.csv")

search.add_data(paths: [my_pdf, my_text, my_docx, my_csv])

`add_texts(texts:)` to add only textual data

search.add_texts(
  texts: [
    "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
    "Lorem Ipsum has been the industry's standard dummy text ever since the 1500s"
  ]
)

Retrieving Data

`similarity_search_by_vector(embedding:, k:)` searches the vector database for the `k` closest embeddings.

search.similarity_search_by_vector(
  embedding: ...,
  k: # number of results to be retrieved
)
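To illustrate what "the `k` closest embeddings" means, here is a minimal in-memory sketch that ranks stored vectors by cosine similarity, the metric named by `DEFAULT_METRIC`. The `cosine_similarity` and `top_k` helpers and the sample vectors are hypothetical and not part of the library; real vector databases typically use an index rather than the linear scan shown here.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Return the k stored entries most similar to the query embedding
# (hypothetical helper; a linear scan over an in-memory array).
def top_k(store, embedding:, k:)
  store
    .sort_by { |entry| -cosine_similarity(entry[:vector], embedding) }
    .first(k)
end

# Made-up two-dimensional "embeddings" for illustration only.
store = [
  {text: "cats", vector: [1.0, 0.0]},
  {text: "dogs", vector: [0.8, 0.6]},
  {text: "cars", vector: [0.0, 1.0]}
]

results = top_k(store, embedding: [1.0, 0.1], k: 2)
puts results.map { |r| r[:text] }.inspect
```

With these sample vectors the two nearest entries are "cats" and "dogs", in that order.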

`similarity_search(query:, k:)` generates an embedding for the query and searches the vector database for the `k` closest embeddings.

search.similarity_search(
  query: ...,
  k: # number of results to be retrieved
)

`ask(question:)` generates an embedding for the passed-in question, searches the vector database for the closest embeddings, and then passes these as context to the LLM to generate an answer to the question.

search.ask(question: "What is lorem ipsum?")
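The flow that `ask` describes (embed the question, retrieve the closest texts, pass them to the LLM as context) can be sketched with stand-ins. `FakeLLM`, `FakeStore`, the letter-count "embedding", and the prompt wording are all hypothetical placeholders rather than library classes; a real subclass would call its database client and an actual LLM.

```ruby
# Stand-in LLM: "embeds" text as letter counts and echoes the prompt back.
class FakeLLM
  def embed(text)
    [text.count("a"), text.count("e"), text.count("o")]
  end

  def complete(prompt:)
    "ANSWER based on: #{prompt}"
  end
end

# Stand-in vector store that keeps texts and vectors in memory.
class FakeStore
  def initialize(llm:)
    @llm = llm
    @rows = []
  end

  def add_texts(texts:)
    texts.each { |t| @rows << {text: t, vector: @llm.embed(t)} }
  end

  # Embed the query, then return the k rows with the smallest distance.
  def similarity_search(query:, k: 4)
    q = @llm.embed(query)
    @rows.min_by(k) { |row| distance(row[:vector], q) }
  end

  # Retrieve context, build a prompt from it, and ask the LLM.
  def ask(question:, k: 4)
    context = similarity_search(query: question, k: k).map { |r| r[:text] }.join("\n---\n")
    prompt = "Context:\n#{context}\n\nQuestion: #{question}"
    @llm.complete(prompt: prompt)
  end

  private

  # Squared Euclidean distance, chosen only to keep the sketch short.
  def distance(a, b)
    a.zip(b).sum { |x, y| (x - y)**2 }
  end
end

store = FakeStore.new(llm: FakeLLM.new)
store.add_texts(texts: ["alpha", "echo", "zoo"])
answer = store.ask(question: "zoo?", k: 1)
puts answer
```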

Direct Known Subclasses

Chroma, Elasticsearch, Epsilla, Hnswlib, Milvus, Pgvector, Pinecone, Qdrant, Weaviate

Constant Summary

DEFAULT_METRIC = "cosine"

Instance Attribute Summary

#client ⇒ Object readonly

Returns the value of attribute client.

#index_name ⇒ Object readonly

Returns the value of attribute index_name.

#llm ⇒ Object readonly

Returns the value of attribute llm.

Class Method Summary

.logger_options ⇒ Object

Instance Method Summary

#add_data(paths:, options: {}, chunker: Langchain::Chunker::Text) ⇒ Object
#add_texts ⇒ Object

Method supported by Vectorsearch DB to add a list of texts to the index.
#ask ⇒ Object

Method supported by Vectorsearch DB to answer a question given a context (data) pulled from your Vectorsearch DB.
#create_default_schema ⇒ Object

Method supported by Vectorsearch DB to create a default schema.
#destroy_default_schema ⇒ Object

Method supported by Vectorsearch DB to delete the default schema.
#generate_hyde_prompt(question:) ⇒ String

HyDE-style prompt.
#generate_rag_prompt(question:, context:) ⇒ String

Retrieval Augmented Generation (RAG).
#get_default_schema ⇒ Object

Method supported by Vectorsearch DB to retrieve a default schema.
#initialize(llm:) ⇒ Base constructor

A new instance of Base.
#remove_texts ⇒ Object

Method supported by Vectorsearch DB to delete a list of texts from the index.
#similarity_search ⇒ Object

Method supported by Vectorsearch DB to search for similar texts in the index.
#similarity_search_by_vector ⇒ Object

Method supported by Vectorsearch DB to search for similar texts in the index by the passed in vector.
#similarity_search_with_hyde(query:, k: 4) ⇒ String

Hypothetical Document Embeddings (HyDE)-augmented similarity search. Paper: arxiv.org/abs/2212.10496
#update_texts ⇒ Object

Method supported by Vectorsearch DB to update a list of texts in the index.

Methods included from DependencyHelper

#depends_on

Constructor Details

#initialize(llm:) ⇒ `Base`

Returns a new instance of Base.

Parameters:

llm (Object) — The LLM client to use

# File 'lib/langchain/vectorsearch/base.rb', line 98

def initialize(llm:)
  @llm = llm
end

Instance Attribute Details

#client ⇒ `Object` (readonly)

Returns the value of attribute client.

# File 'lib/langchain/vectorsearch/base.rb', line 93

def client
  @client
end

#index_name ⇒ `Object` (readonly)

Returns the value of attribute index_name.

# File 'lib/langchain/vectorsearch/base.rb', line 93

def index_name
  @index_name
end

#llm ⇒ `Object` (readonly)

Returns the value of attribute llm.

# File 'lib/langchain/vectorsearch/base.rb', line 93

def llm
  @llm
end

Class Method Details

.logger_options ⇒ `Object`

# File 'lib/langchain/vectorsearch/base.rb', line 198

def self.logger_options
  {
    color: :blue
  }
end

Instance Method Details

#add_data(paths:, options: {}, chunker: Langchain::Chunker::Text) ⇒ `Object`

Raises:

(ArgumentError)

# File 'lib/langchain/vectorsearch/base.rb', line 183

def add_data(paths:, options: {}, chunker: Langchain::Chunker::Text)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  texts = Array(paths)
    .flatten
    .map do |path|
      data = Langchain::Loader.new(path, options, chunker: chunker)&.load&.chunks
      data.map { |chunk| chunk.text }
    end

  texts.flatten!

  add_texts(texts: texts)
end
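The guard and `Array(paths).flatten` in `add_data` mean that a single path, a flat list, or a nested list are all accepted, while `nil` or an empty list raises `ArgumentError`. A minimal sketch of just that normalization step, using a hypothetical `normalize_paths` helper that is not part of the library:

```ruby
# Hypothetical helper mirroring the guard and flattening in add_data.
def normalize_paths(paths)
  raise ArgumentError, "Paths must be provided" if Array(paths).empty?

  Array(paths).flatten
end

# A single path is wrapped in an array.
single = normalize_paths("a.pdf")

# Nested lists are flattened into one list of paths.
nested = normalize_paths(["a.pdf", ["b.txt", "c.csv"]])

# nil (or an empty array) is rejected before any file is loaded.
error_message =
  begin
    normalize_paths(nil)
  rescue ArgumentError => e
    e.message
  end

p single
p nested
puts error_message
```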

#add_texts ⇒ `Object`

Method supported by Vectorsearch DB to add a list of texts to the index

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 118

def add_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support adding texts"
end
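`Base` defines methods such as `add_texts` only as placeholders that raise `NotImplementedError`; each database subclass overrides the ones it supports. The sketch below reproduces that pattern with stand-in classes so it runs without any dependencies. `SketchBase` and `InMemorySearch` are hypothetical names, not part of the gem.

```ruby
# Stand-in for the base class: every operation raises by default.
class SketchBase
  def add_texts(...)
    raise NotImplementedError, "#{self.class.name} does not support adding texts"
  end

  def remove_texts(...)
    raise NotImplementedError, "#{self.class.name} does not support deleting texts"
  end
end

# Stand-in subclass that supports adding texts but not removing them.
class InMemorySearch < SketchBase
  attr_reader :texts

  def initialize
    @texts = []
  end

  # Overrides the placeholder; remove_texts is left unimplemented.
  def add_texts(texts:)
    @texts.concat(texts)
  end
end

search = InMemorySearch.new
search.add_texts(texts: ["hello", "world"])

message =
  begin
    search.remove_texts(ids: [1])
  rescue NotImplementedError => e
    e.message
  end

puts message
```

Note that `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it; the class has to be named explicitly as above.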

#ask ⇒ `Object`

Method supported by Vectorsearch DB to answer a question given a context (data) pulled from your Vectorsearch DB.

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 155

def ask(...)
  raise NotImplementedError, "#{self.class.name} does not support asking questions"
end

#create_default_schema ⇒ `Object`

Method supported by Vectorsearch DB to create a default schema

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 108

def create_default_schema
  raise NotImplementedError, "#{self.class.name} does not support creating a default schema"
end

#destroy_default_schema ⇒ `Object`

Method supported by Vectorsearch DB to delete the default schema

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 113

def destroy_default_schema
  raise NotImplementedError, "#{self.class.name} does not support deleting a default schema"
end

#generate_hyde_prompt(question:) ⇒ `String`

HyDE-style prompt

Parameters:

question (String) — User's question

Returns:

(String) —

Prompt

# File 'lib/langchain/vectorsearch/base.rb', line 163

def generate_hyde_prompt(question:)
  prompt_template = Langchain::Prompt.load_from_path(
    # Zero-shot prompt to generate a hypothetical document based on a given question
    file_path: Langchain.root.join("langchain/vectorsearch/prompts/hyde.yaml")
  )
  prompt_template.format(question: question)
end

#generate_rag_prompt(question:, context:) ⇒ `String`

Retrieval Augmented Generation (RAG)

Parameters:

question (String) — User’s question

context (String) — The context to synthesize the answer from

Returns:

(String) —

Prompt

# File 'lib/langchain/vectorsearch/base.rb', line 176

def generate_rag_prompt(question:, context:)
  prompt_template = Langchain::Prompt.load_from_path(
    file_path: Langchain.root.join("langchain/vectorsearch/prompts/rag.yaml")
  )
  prompt_template.format(question: question, context: context)
end
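`generate_rag_prompt` loads a template and fills in `question` and `context`. The same idea can be sketched with Ruby's built-in `format` and named placeholders. The template text and the `build_rag_prompt` helper below are hypothetical; the real template lives in `prompts/rag.yaml` and its wording is not shown here.

```ruby
# Hypothetical template text, for illustration only.
TEMPLATE = "Context:\n%{context}\n---\nQuestion: %{question}\n---\nAnswer:"

# Fill the named placeholders using Kernel#format.
def build_rag_prompt(question:, context:)
  format(TEMPLATE, question: question, context: context)
end

prompt = build_rag_prompt(
  question: "What is lorem ipsum?",
  context: "Lorem Ipsum is simply dummy text."
)
puts prompt
```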

#get_default_schema ⇒ `Object`

Method supported by Vectorsearch DB to retrieve a default schema

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 103

def get_default_schema
  raise NotImplementedError, "#{self.class.name} does not support retrieving a default schema"
end

#remove_texts ⇒ `Object`

Method supported by Vectorsearch DB to delete a list of texts from the index

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 128

def remove_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support deleting texts"
end

#similarity_search ⇒ `Object`

Method supported by Vectorsearch DB to search for similar texts in the index

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 133

def similarity_search(...)
  raise NotImplementedError, "#{self.class.name} does not support similarity search"
end

#similarity_search_by_vector ⇒ `Object`

Method supported by Vectorsearch DB to search for similar texts in the index by the passed in vector. You must generate your own vector using the same LLM that generated the embeddings stored in the Vectorsearch DB.

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 150

def similarity_search_by_vector(...)
  raise NotImplementedError, "#{self.class.name} does not support similarity search by vector"
end

#similarity_search_with_hyde(query:, k: 4) ⇒ `String`

Hypothetical Document Embeddings (HyDE)-augmented similarity search. Paper: arxiv.org/abs/2212.10496

Parameters:

query (String) — The query to search for

k (Integer) (defaults to: 4) — The number of results to return

Returns:

(String) —

Response

# File 'lib/langchain/vectorsearch/base.rb', line 143

def similarity_search_with_hyde(query:, k: 4)
  hyde_completion = llm.complete(prompt: generate_hyde_prompt(question: query)).completion
  similarity_search(query: hyde_completion, k: k)
end
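In the method above, the LLM first writes a hypothetical document that answers the query, and that document, rather than the raw query, is what gets searched for. A dependency-free sketch of the substitution follows. `StubLLM`, `StubSearch`, the canned completion, and the prompt wording are hypothetical stand-ins, not library code.

```ruby
# Stand-in LLM whose "completion" is a canned hypothetical document.
class StubLLM
  Response = Struct.new(:completion)

  def complete(prompt:)
    Response.new("Lorem ipsum is placeholder text used in printing.")
  end
end

# Stand-in search class that records which query it was asked to search for.
class StubSearch
  attr_reader :llm, :last_query

  def initialize(llm:)
    @llm = llm
  end

  # Records the query so the HyDE substitution is visible.
  def similarity_search(query:, k: 4)
    @last_query = query
    ["result for: #{query}"].first(k)
  end

  # Same two steps as the method above: complete, then search with the completion.
  def similarity_search_with_hyde(query:, k: 4)
    hyde_completion = llm.complete(prompt: "Write a passage answering: #{query}").completion
    similarity_search(query: hyde_completion, k: k)
  end
end

search = StubSearch.new(llm: StubLLM.new)
results = search.similarity_search_with_hyde(query: "What is lorem ipsum?", k: 1)
puts search.last_query
```

The recorded query is the generated passage, not the original question, which is the point of the technique.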

#update_texts ⇒ `Object`

Method supported by Vectorsearch DB to update a list of texts in the index

Raises:

(NotImplementedError)

# File 'lib/langchain/vectorsearch/base.rb', line 123

def update_texts(...)
  raise NotImplementedError, "#{self.class.name} does not support updating texts"
end
