Class: RubyLLM::SemanticCache::Middleware

Inherits:
Object
Defined in:
lib/ruby_llm/semantic_cache/middleware.rb

Overview

Middleware wrapper for RubyLLM::Chat that automatically caches responses

Examples:

Basic usage

chat = RubyLLM.chat(model: "gpt-5.2")
cached_chat = RubyLLM::SemanticCache.wrap(chat)
cached_chat.ask("What is 2+2?")  # First call - executes LLM

With custom threshold

cached_chat = RubyLLM::SemanticCache.wrap(chat, threshold: 0.95)

Direct Known Subclasses

ScopedMiddleware

Constant Summary

DELEGATED_METHODS =

Methods to delegate directly to the wrapped chat (no caching)

%i[
  model messages tools params headers schema
  with_instructions with_tool with_tools with_model
  with_temperature with_context with_params with_headers with_schema
  on_new_message on_end_message on_tool_call on_tool_result
  each reset_messages!
].freeze
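
A brief sketch of what delegation means in practice (all method names come from the constant above; what these calls return is whatever the wrapped chat returns, which is not shown on this page):

chat = RubyLLM.chat(model: "gpt-5.2")
cached_chat = RubyLLM::SemanticCache.wrap(chat)

# These calls bypass the cache entirely and go straight to the wrapped chat
cached_chat.with_instructions("Answer in one sentence")
cached_chat.with_temperature(0.0)
cached_chat.messages  # the underlying chat's message history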

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(chat, threshold: nil, ttl: nil, on_cache_hit: nil, max_messages: nil) ⇒ Middleware

Returns a new instance of Middleware.

Parameters:

  • chat (RubyLLM::Chat)

    the chat instance to wrap

  • threshold (Float, nil) (defaults to: nil)

    similarity threshold override

  • ttl (Integer, nil) (defaults to: nil)

    TTL override in seconds

  • on_cache_hit (Proc, nil) (defaults to: nil)

    callback when cache hit occurs, receives (chat, user_message, cached_response)

  • max_messages (Integer, :unlimited, false, nil) (defaults to: nil)

    max conversation messages before skipping cache

    • Integer: skip cache after N messages (default: 1, only first message cached)

    • :unlimited or false: cache all messages regardless of conversation length

    • nil: use config default



# File 'lib/ruby_llm/semantic_cache/middleware.rb', line 37

def initialize(chat, threshold: nil, ttl: nil, on_cache_hit: nil, max_messages: nil)
  @chat = chat
  @threshold = threshold
  @ttl = ttl
  @on_cache_hit = on_cache_hit
  @max_messages = max_messages
end
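
As an illustration, the middleware can also be constructed directly with these keyword options; most callers go through RubyLLM::SemanticCache.wrap, and whether wrap forwards every keyword shown here is an assumption. The specific values (0.9, 3600, the callback body) are illustrative only.

chat = RubyLLM.chat(model: "gpt-5.2")

cached_chat = RubyLLM::SemanticCache::Middleware.new(
  chat,
  threshold: 0.9,            # similarity threshold override
  ttl: 3600,                 # cache entries expire after an hour
  max_messages: :unlimited,  # cache every turn, not just the first message
  on_cache_hit: ->(chat, user_message, cached_response) {
    puts "cache hit for #{user_message.inspect}"
  }
)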

Instance Attribute Details

#chat ⇒ Object (readonly)

Returns the value of attribute chat.



# File 'lib/ruby_llm/semantic_cache/middleware.rb', line 27

def chat
  @chat
end

Instance Method Details

#ask(message = nil, with: nil, &block) ⇒ RubyLLM::Message Also known as: say

Ask a question with automatic caching

Parameters:

  • message (String) (defaults to: nil)

    the message to send

  • with (Object) (defaults to: nil)

    attachments to include (when attachments are present, caching is skipped)

  • block (Proc)

    optional streaming block; when a block is given, caching is skipped and the call is passed through to the wrapped chat

Returns:

  • (RubyLLM::Message)

    the response message
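
A short usage sketch: the return type comes from the Returns section above, and #say from the "Also known as" note.

response = cached_chat.ask("What is 2+2?")
response.class                   # => RubyLLM::Message
cached_chat.say("What is 2+2?")  # #say is an alias for #ask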



# File 'lib/ruby_llm/semantic_cache/middleware.rb', line 49

def ask(message = nil, with: nil, &block)
  # Skip caching if message has attachments
  return @chat.ask(message, with: with, &block) if with

  # Skip caching for tool-enabled chats (responses may vary)
  return @chat.ask(message, with: with, &block) if @chat.tools.any?

  # Skip caching if conversation exceeds max_messages (excluding system messages)
  return @chat.ask(message, with: with, &block) if conversation_too_long?

  # Skip caching for streaming (too complex to handle correctly)
  return @chat.ask(message, with: with, &block) if block_given?

  # Use cache for non-streaming
  cache_key = build_cache_key(message)

  cached = cache_lookup(cache_key)
  if cached
    handle_cache_hit(message, cached)
    return cached
  end

  # Execute the actual LLM call
  response = @chat.ask(message)

  # Cache the response
  store_in_cache(cache_key, response)
  RubyLLM::SemanticCache.record_miss!

  response
end
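
To make the branches above concrete, a sketch of how calls are routed; whether the second question actually hits depends on the embedding similarity clearing the configured threshold, and the attachment path is illustrative.

cached_chat.ask("What is the capital of France?")           # miss - calls the LLM, response stored
cached_chat.ask("What's France's capital city?")            # semantically similar - may be served from cache
cached_chat.ask("Describe this", with: "diagram.png")       # attachments - cache bypassed
cached_chat.ask("Tell me a story") { |chunk| print chunk }  # streaming block - cache bypassed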