Class: HTM::LongTermMemory

Inherits:
Object
Defined in:
lib/htm/long_term_memory.rb

Overview

Long-term Memory - PostgreSQL/TimescaleDB-backed permanent storage

LongTermMemory provides durable storage for all memory nodes with:

  • Vector similarity search (RAG)

  • Full-text search

  • Time-range queries

  • Relationship graphs

  • Tag system

  • ActiveRecord ORM for data access

  • Query result caching for efficiency

Constant Summary

DEFAULT_QUERY_TIMEOUT = 30_000

Query timeout in milliseconds (30 seconds)

MAX_VECTOR_DIMENSION = 2000

Maximum supported dimension with HNSW index (pgvector limitation)

DEFAULT_CACHE_SIZE = 1000

Number of queries to cache

DEFAULT_CACHE_TTL = 300

Cache lifetime in seconds (5 minutes)

Constructor Details

#initialize(config, pool_size: nil, query_timeout: DEFAULT_QUERY_TIMEOUT, cache_size: DEFAULT_CACHE_SIZE, cache_ttl: DEFAULT_CACHE_TTL) ⇒ LongTermMemory

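Returns a new instance of LongTermMemory. The constructor applies query_timeout (milliseconds) as the PostgreSQL statement_timeout on the current ActiveRecord connection and creates the query result cache unless cache_size is 0. The pool_size argument is accepted but not used in the constructor body.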



# File 'lib/htm/long_term_memory.rb', line 28

def initialize(config, pool_size: nil, query_timeout: DEFAULT_QUERY_TIMEOUT, cache_size: DEFAULT_CACHE_SIZE, cache_ttl: DEFAULT_CACHE_TTL)
  @config = config
  @query_timeout = query_timeout  # in milliseconds

  # Set statement timeout for ActiveRecord queries
  ActiveRecord::Base.connection.execute("SET statement_timeout = #{@query_timeout}")

  # Initialize query result cache (disable with cache_size: 0)
  if cache_size > 0
    @query_cache = LruRedux::TTL::ThreadSafeCache.new(cache_size, cache_ttl)
    @cache_stats = { hits: 0, misses: 0 }
  end
end

Instance Attribute Details

#query_timeout ⇒ Object (readonly)

Returns the value of attribute query_timeout (milliseconds).



# File 'lib/htm/long_term_memory.rb', line 26

def query_timeout
  @query_timeout
end

Instance Method Details

#add(content:, source:, token_count: 0, robot_id:, embedding: nil) ⇒ Integer

Add a node to long-term memory

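Embeddings should be generated client-side and provided via the embedding parameter. Vectors shorter than 2000 dimensions are zero-padded to 2000 before storage, and the original length is recorded in embedding_dimension.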

Parameters:

  • content (String)

    Conversation message/utterance

  • source (String)

    Who said it: 'user' or robot name

  • token_count (Integer) (defaults to: 0)

    Token count

  • robot_id (String)

    Robot identifier

  • embedding (Array<Float>, nil) (defaults to: nil)

    Pre-generated embedding vector

Returns:

  • (Integer)

    Node database ID



# File 'lib/htm/long_term_memory.rb', line 53

def add(content:, source:, token_count: 0, robot_id:, embedding: nil)
  # Prepare embedding if provided
  if embedding
    # Pad embedding to 2000 dimensions if needed
    actual_dimension = embedding.length
    if actual_dimension < 2000
      padded_embedding = embedding + Array.new(2000 - actual_dimension, 0.0)
    else
      padded_embedding = embedding
    end
    embedding_str = "[#{padded_embedding.join(',')}]"
  end

  # Create node using ActiveRecord
  node = HTM::Models::Node.create!(
    content: content,
    source: source,
    token_count: token_count,
    robot_id: robot_id,
    embedding: embedding ? embedding_str : nil,
    embedding_dimension: embedding ? embedding.length : nil
  )

  # Invalidate cache since database content changed
  invalidate_cache!

  node.id
end
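
The padding step in the listing above can be tried in isolation. This is a minimal sketch, not the library's code path: pad_embedding and to_vector_literal are hypothetical helper names, and only the zero-padding rule and the bracketed text format are taken from the listing.

```ruby
# Hypothetical helpers; only the padding rule and text format come from the listing.
TARGET_DIMENSION = 2000

# Zero-pad a vector to the target length; longer vectors pass through unchanged.
def pad_embedding(embedding, target: TARGET_DIMENSION)
  return embedding if embedding.length >= target

  embedding + Array.new(target - embedding.length, 0.0)
end

# Serialize as the bracketed, comma-separated string built in the listing.
def to_vector_literal(embedding)
  "[#{embedding.join(',')}]"
end

original = [0.25, -0.5, 1.0]
puts pad_embedding(original).length                        # 2000
puts to_vector_literal(pad_embedding(original, target: 5)) # [0.25,-0.5,1.0,0.0,0.0]
```

Vectors longer than the target are returned as-is here, matching the listing, which does not truncate them.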

#add_tag(node_id:, tag:) ⇒ void

This method returns an undefined value.

Add a tag to a node

Parameters:

  • node_id (Integer)

    Node database ID

  • tag (String)

    Tag name



# File 'lib/htm/long_term_memory.rb', line 231

def add_tag(node_id:, tag:)
  tag_record = HTM::Models::Tag.find_or_create_by(name: tag)
  HTM::Models::NodeTag.create(
    node_id: node_id,
    tag_id: tag_record.id
  )
rescue ActiveRecord::RecordNotUnique
  # Tag association already exists, ignore
end

#calculate_relevance(node:, query_tags: [], vector_similarity: nil) ⇒ Float

Calculate dynamic relevance score for a node given query context

Combines multiple signals:

  • Vector similarity (semantic match)

  • Tag overlap (categorical match)

  • Recency (freshness)

  • Access frequency (popularity/utility)

Parameters:

  • node (Hash)

    Node attributes with string keys; reads id, similarity (optional), created_at, and access_count. Tags are looked up by id.

  • query_tags (Array<String>) (defaults to: [])

    Tags associated with the query

  • vector_similarity (Float, nil) (defaults to: nil)

    Pre-computed vector similarity (0-1)

Returns:

  • (Float)

    Composite relevance score (0-10)



# File 'lib/htm/long_term_memory.rb', line 413

def calculate_relevance(node:, query_tags: [], vector_similarity: nil)
  # 1. Vector similarity (semantic match) - weight: 0.5
  semantic_score = if vector_similarity
    vector_similarity
  elsif node['similarity']
    node['similarity'].to_f
  else
    0.5  # Neutral if no embedding
  end

  # 2. Tag overlap (categorical relevance) - weight: 0.3
  node_tags = get_node_tags(node['id'])
  tag_score = if query_tags.any? && node_tags.any?
    weighted_hierarchical_jaccard(query_tags, node_tags)
  else
    0.5  # Neutral if no tags
  end

  # 3. Recency (temporal relevance) - weight: 0.1
  age_hours = (Time.now - Time.parse(node['created_at'].to_s)) / 3600.0
  recency_score = Math.exp(-age_hours / 168.0)  # Exponential decay, 1-week time constant (half-life about 116 hours)

  # 4. Access frequency (behavioral signal) - weight: 0.1
  access_count = node['access_count'] || 0
  access_score = Math.log(1 + access_count) / 10.0  # Log-scaled; below 1.0 until roughly 22,000 accesses

  # Weighted composite (scale to 0-10)
  relevance = (
    (semantic_score * 0.5) +
    (tag_score * 0.3) +
    (recency_score * 0.1) +
    (access_score * 0.1)
  ) * 10.0

  relevance.clamp(0.0, 10.0)
end
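
To see how the four signals combine, the arithmetic can be reproduced without a database. The sketch below assumes the semantic and tag scores are already known; composite_relevance is a hypothetical name, while the weights and decay constants are copied from the listing.

```ruby
# Hypothetical standalone version of the weighted sum; constants are from the listing.
def composite_relevance(semantic:, tag:, age_hours:, access_count:)
  recency = Math.exp(-age_hours / 168.0)         # exponential decay over hours
  access  = Math.log(1 + access_count) / 10.0    # log-scaled access frequency
  score   = (semantic * 0.5 + tag * 0.3 + recency * 0.1 + access * 0.1) * 10.0
  score.clamp(0.0, 10.0)
end

fresh    = composite_relevance(semantic: 0.8, tag: 0.5, age_hours: 0,   access_count: 0)
week_old = composite_relevance(semantic: 0.8, tag: 0.5, age_hours: 168, access_count: 0)

puts fresh.round(2)     # 6.5
puts week_old.round(2)  # 5.87
```

Because the recency term is exp(-age_hours / 168), it falls to about 0.37 after one week and reaches 0.5 after roughly 116 hours.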

#delete(node_id) ⇒ void

This method returns an undefined value.

Delete a node

Parameters:

  • node_id (Integer)

    Node database ID



# File 'lib/htm/long_term_memory.rb', line 115

def delete(node_id)
  node = HTM::Models::Node.find_by(id: node_id)
  node&.destroy

  # Invalidate cache since database content changed
  invalidate_cache!
end

#exists?(node_id) ⇒ Boolean

Check if a node exists

Parameters:

  • node_id (Integer)

    Node database ID

Returns:

  • (Boolean)

    True if node exists



# File 'lib/htm/long_term_memory.rb', line 128

def exists?(node_id)
  HTM::Models::Node.exists?(node_id)
end

#get_node_tags(node_id) ⇒ Array<String>

Get tags for a specific node

Parameters:

  • node_id (Integer)

    Node database ID

Returns:

  • (Array<String>)

    Tag names



# File 'lib/htm/long_term_memory.rb', line 503

def get_node_tags(node_id)
  HTM::Models::Tag
    .joins(:node_tags)
    .where(node_tags: { node_id: node_id })
    .pluck(:name)
rescue
  []
end

#mark_evicted(node_ids) ⇒ void

This method returns an undefined value.

Mark nodes as evicted from working memory

Parameters:

  • node_ids (Array<Integer>)

    Node IDs



# File 'lib/htm/long_term_memory.rb', line 246

def mark_evicted(node_ids)
  return if node_ids.empty?

  HTM::Models::Node.where(id: node_ids).update_all(in_working_memory: false)
end

#node_topics(node_id) ⇒ Array<String>

Get topics for a specific node

Parameters:

  • node_id (Integer)

    Node database ID

Returns:

  • (Array<String>)

    Topic paths



# File 'lib/htm/long_term_memory.rb', line 392

def node_topics(node_id)
  HTM::Models::Tag
    .joins(:node_tags)
    .where(node_tags: { node_id: node_id })
    .order(:name)
    .pluck(:name)
end

#nodes_by_topic(topic_path, exact: false, limit: 50) ⇒ Array<Hash>

Retrieve nodes by ontological topic

Parameters:

  • topic_path (String)

    Topic hierarchy path

  • exact (Boolean) (defaults to: false)

    Exact match or prefix match

  • limit (Integer) (defaults to: 50)

    Maximum results

Returns:

  • (Array<Hash>)

    Matching nodes



# File 'lib/htm/long_term_memory.rb', line 332

def nodes_by_topic(topic_path, exact: false, limit: 50)
  if exact
    nodes = HTM::Models::Node
      .joins(:tags)
      .where(tags: { name: topic_path })
      .distinct
      .order(created_at: :desc)
      .limit(limit)
  else
    nodes = HTM::Models::Node
      .joins(:tags)
      .where("tags.name LIKE ?", "#{topic_path}%")
      .distinct
      .order(created_at: :desc)
      .limit(limit)
  end

  nodes.map(&:attributes)
end
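
The difference between exact and prefix matching can be illustrated in memory. This is a sketch under assumptions: the tag names and the ':' separator are hypothetical (this page does not state the path separator), and String#start_with? stands in for the SQL LIKE 'prefix%' test.

```ruby
# Hypothetical tag names; the ':' separator is an assumption for illustration.
TAGS = %w[ai ai:llm ai:llm:embedding aircraft database:postgresql].freeze

# exact: true compares whole names; otherwise any name starting with the path matches.
def matching_topics(tags, topic_path, exact: false)
  exact ? tags.select { |t| t == topic_path } : tags.select { |t| t.start_with?(topic_path) }
end

p matching_topics(TAGS, 'ai', exact: true)  # ["ai"]
p matching_topics(TAGS, 'ai')               # ["ai", "ai:llm", "ai:llm:embedding", "aircraft"]
p matching_topics(TAGS, 'ai:')              # ["ai:llm", "ai:llm:embedding"]
```

A bare prefix also matches unrelated names that share its leading characters, so including the separator narrows the match to descendants. Unlike start_with?, the LIKE pattern in the listing treats % and _ inside topic_path as wildcards.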

#ontology_structure ⇒ Array<Hash>

Get ontology structure view

Returns:

  • (Array<Hash>)

    Ontology structure



# File 'lib/htm/long_term_memory.rb', line 356

def ontology_structure
  result = ActiveRecord::Base.connection.select_all(
    "SELECT * FROM ontology_structure WHERE root_topic IS NOT NULL ORDER BY root_topic, level1_topic, level2_topic"
  )
  result.to_a
end

#pool_size ⇒ Object

Returns the size of the ActiveRecord connection pool. Kept for backwards compatibility with tests and code that expect pool_size.



# File 'lib/htm/long_term_memory.rb', line 321

def pool_size
  ActiveRecord::Base.connection_pool.size
end

#popular_tags(limit: 20, timeframe: nil) ⇒ Array<Hash>

Get most popular tags

Parameters:

  • limit (Integer) (defaults to: 20)

    Number of tags to return

  • timeframe (Range, nil) (defaults to: nil)

    Optional time range filter

Returns:

  • (Array<Hash>)

    Tags with usage counts



# File 'lib/htm/long_term_memory.rb', line 562

def popular_tags(limit: 20, timeframe: nil)
  query = HTM::Models::Tag
    .joins(:node_tags)
    .joins('INNER JOIN nodes ON nodes.id = node_tags.node_id')
    .group('tags.id', 'tags.name')
    .select('tags.name, COUNT(node_tags.id) as usage_count')

  query = query.where('nodes.created_at >= ? AND nodes.created_at <= ?', timeframe.begin, timeframe.end) if timeframe

  query
    .order('usage_count DESC')
    .limit(limit)
    .map { |tag| { name: tag.name, usage_count: tag.usage_count } }
end
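
The counting and ordering performed by the query above can be mirrored on plain arrays. The sketch assumes hypothetical input rows of [tag_name, node_created_at], one per node-tag link; popular_tags_in_memory is not part of the library.

```ruby
# Hypothetical in-memory equivalent: links is an array of [tag_name, created_at] pairs.
def popular_tags_in_memory(links, limit: 20, timeframe: nil)
  links = links.select { |_name, created_at| timeframe.cover?(created_at) } if timeframe

  links
    .map(&:first)                                  # keep tag names only
    .tally                                         # name => usage count
    .sort_by { |name, count| [-count, name] }      # most used first, name as tie-break
    .first(limit)
    .map { |name, count| { name: name, usage_count: count } }
end

links = [
  ['ruby',     Time.utc(2024, 1, 5)],
  ['ruby',     Time.utc(2024, 2, 1)],
  ['postgres', Time.utc(2024, 1, 10)],
  ['ruby',     Time.utc(2024, 1, 20)]
]
january = Time.utc(2024, 1, 1)..Time.utc(2024, 1, 31)

p popular_tags_in_memory(links, timeframe: january).map { |t| [t[:name], t[:usage_count]] }
# [["ruby", 2], ["postgres", 1]]
```

The listing compares against timeframe.begin and timeframe.end with >= and <=, so both ends are inclusive there even for an exclusive Range; the sketch uses an inclusive Range to match. The name tie-break is an addition, since the SQL orders by usage_count only.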

#register_robot(robot_name) ⇒ Object

Register a robot, creating its record if it does not exist, and refresh its last_active timestamp

Parameters:

  • robot_name (String)

    Robot name

Returns:

  • (Object)

    Robot database ID (the method's last expression is robot.id)



# File 'lib/htm/long_term_memory.rb', line 274

def register_robot(robot_name)
  robot = HTM::Models::Robot.find_or_create_by(name: robot_name)
  robot.update(last_active: Time.current)
  robot.id
end

#retrieve(node_id) ⇒ Hash?

Retrieve a node by ID

Automatically tracks access by incrementing access_count and updating last_accessed

Parameters:

  • node_id (Integer)

    Node database ID

Returns:

  • (Hash, nil)

    Node data or nil



# File 'lib/htm/long_term_memory.rb', line 89

def retrieve(node_id)
  node = HTM::Models::Node.find_by(id: node_id)
  return nil unless node

  # Track access (atomic increment)
  node.increment!(:access_count)
  node.touch(:last_accessed)

  node.attributes
end

#search(timeframe:, query:, limit:, embedding_service:) ⇒ Array<Hash>

Vector similarity search

Parameters:

  • timeframe (Range)

    Time range to search

  • query (String)

    Search query

  • limit (Integer)

    Maximum results

  • embedding_service (Object)

    Service to generate embeddings

Returns:

  • (Array<Hash>)

    Matching nodes



# File 'lib/htm/long_term_memory.rb', line 140

def search(timeframe:, query:, limit:, embedding_service:)
  # Return uncached if cache disabled
  return search_uncached(timeframe: timeframe, query: query, limit: limit, embedding_service: embedding_service) unless @query_cache

  # Generate cache key
  cache_key = cache_key_for(:search, timeframe, query, limit)

  # Try to get from cache
  cached = @query_cache[cache_key]
  if cached
    @cache_stats[:hits] += 1
    return cached
  end

  # Cache miss - execute query
  @cache_stats[:misses] += 1
  result = search_uncached(timeframe: timeframe, query: query, limit: limit, embedding_service: embedding_service)

  # Store in cache
  @query_cache[cache_key] = result
  result
end
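
The cached search methods follow one read-through pattern: build a key, return on a hit, otherwise run the uncached query and store the result. A minimal sketch of that pattern follows. It is not the library's cache: a plain Hash stands in for the LRU/TTL cache (so there is no eviction or expiry), CachedLookup is a hypothetical class, and the key scheme is assumed because cache_key_for is not shown on this page.

```ruby
# Hypothetical read-through cache; a Hash replaces the LRU/TTL cache.
class CachedLookup
  attr_reader :stats

  def initialize
    @cache = {}
    @stats = { hits: 0, misses: 0 }
  end

  # Runs the block only on a miss; the block plays the role of the uncached query.
  def fetch(*key_parts)
    key = key_parts.inspect   # assumed key scheme, for illustration only
    if @cache.key?(key)
      @stats[:hits] += 1
      return @cache[key]
    end

    @stats[:misses] += 1
    @cache[key] = yield
  end

  # Drop every entry, as a write to the underlying store would require.
  def invalidate!
    @cache.clear
  end
end

lookup = CachedLookup.new
calls = 0
2.times { lookup.fetch(:search, 'ruby', 10) { calls += 1; ['node-1'] } }

puts calls                  # 1
puts lookup.stats[:hits]    # 1
puts lookup.stats[:misses]  # 1
```

One difference from the listing: this sketch checks for key presence, whereas the listing tests the cached value for truthiness.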

#search_by_tags(tags:, match_all: false, timeframe: nil, limit: 20) ⇒ Array<Hash>

Search nodes by tags

Parameters:

  • tags (Array<String>)

    Tags to search for

  • match_all (Boolean) (defaults to: false)

    If true, match ALL tags; if false, match ANY tag

  • timeframe (Range, nil) (defaults to: nil)

    Optional time range filter

  • limit (Integer) (defaults to: 20)

    Maximum results

Returns:

  • (Array<Hash>)

    Matching nodes with relevance scores



# File 'lib/htm/long_term_memory.rb', line 520

def search_by_tags(tags:, match_all: false, timeframe: nil, limit: 20)
  return [] if tags.empty?

  # Build base query
  query = HTM::Models::Node
    .joins(:tags)
    .where(tags: { name: tags })
    .distinct

  # Apply timeframe filter if provided
  query = query.where(created_at: timeframe) if timeframe

  if match_all
    # Match ALL tags (intersection)
    query = query
      .group('nodes.id')
      .having('COUNT(DISTINCT tags.name) = ?', tags.size)
  end

  # Get results
  nodes = query.limit(limit).map(&:attributes)

  # Calculate relevance and enrich with tags
  nodes.map do |node|
    relevance = calculate_relevance(
      node: node,
      query_tags: tags
    )

    node.merge({
      'relevance' => relevance,
      'tags' => get_node_tags(node['id'])
    })
  end.sort_by { |n| -n['relevance'] }
end
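
The ANY versus ALL semantics of match_all can be shown on in-memory data. In this sketch NODE_TAGS and match_nodes are hypothetical; the ALL branch applies the same test as the HAVING COUNT(DISTINCT tags.name) = tags.size clause in the listing.

```ruby
# Hypothetical data: node id => tag names.
NODE_TAGS = {
  1 => %w[ruby database],
  2 => %w[ruby],
  3 => %w[python database]
}.freeze

# ANY: at least one requested tag is present. ALL: every requested tag is present.
def match_nodes(node_tags, tags, match_all: false)
  wanted = tags.uniq
  node_tags.select do |_id, names|
    shared = (names & wanted).size
    match_all ? shared == wanted.size : shared.positive?
  end.keys
end

p match_nodes(NODE_TAGS, %w[ruby database])                   # [1, 2, 3]
p match_nodes(NODE_TAGS, %w[ruby database], match_all: true)  # [1]
```

The sketch deduplicates the requested tags first. The listing compares against tags.size directly, so a tags array containing duplicates would make the ALL condition impossible to satisfy there.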

#search_fulltext(timeframe:, query:, limit:) ⇒ Array<Hash>

Full-text search

Parameters:

  • timeframe (Range)

    Time range to search

  • query (String)

    Search query

  • limit (Integer)

    Maximum results

Returns:

  • (Array<Hash>)

    Matching nodes



# File 'lib/htm/long_term_memory.rb', line 170

def search_fulltext(timeframe:, query:, limit:)
  # Return uncached if cache disabled
  return search_fulltext_uncached(timeframe: timeframe, query: query, limit: limit) unless @query_cache

  # Generate cache key
  cache_key = cache_key_for(:fulltext, timeframe, query, limit)

  # Try to get from cache
  cached = @query_cache[cache_key]
  if cached
    @cache_stats[:hits] += 1
    return cached
  end

  # Cache miss - execute query
  @cache_stats[:misses] += 1
  result = search_fulltext_uncached(timeframe: timeframe, query: query, limit: limit)

  # Store in cache
  @query_cache[cache_key] = result
  result
end

#search_hybrid(timeframe:, query:, limit:, embedding_service:, prefilter_limit: 100) ⇒ Array<Hash>

Hybrid search (full-text + vector)

Parameters:

  • timeframe (Range)

    Time range to search

  • query (String)

    Search query

  • limit (Integer)

    Maximum results

  • embedding_service (Object)

    Service to generate embeddings

  • prefilter_limit (Integer) (defaults to: 100)

    Candidates to consider (default: 100)

Returns:

  • (Array<Hash>)

    Matching nodes



# File 'lib/htm/long_term_memory.rb', line 202

def search_hybrid(timeframe:, query:, limit:, embedding_service:, prefilter_limit: 100)
  # Return uncached if cache disabled
  return search_hybrid_uncached(timeframe: timeframe, query: query, limit: limit, embedding_service: embedding_service, prefilter_limit: prefilter_limit) unless @query_cache

  # Generate cache key
  cache_key = cache_key_for(:hybrid, timeframe, query, limit, prefilter_limit)

  # Try to get from cache
  cached = @query_cache[cache_key]
  if cached
    @cache_stats[:hits] += 1
    return cached
  end

  # Cache miss - execute query
  @cache_stats[:misses] += 1
  result = search_hybrid_uncached(timeframe: timeframe, query: query, limit: limit, embedding_service: embedding_service, prefilter_limit: prefilter_limit)

  # Store in cache
  @query_cache[cache_key] = result
  result
end
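
The uncached hybrid implementation is not shown on this page. One common reading of "full-text + vector" with a prefilter_limit is: select up to prefilter_limit candidates by text match, then re-rank them by vector similarity. The sketch below illustrates that reading only and is an assumption about the approach, not the library's code; hybrid_search, cosine, and the document shape are hypothetical, and substring matching stands in for PostgreSQL full-text search.

```ruby
# Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
def cosine(a, b)
  dot  = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Assumed two-stage flow: text prefilter, then vector re-ranking of the candidates.
def hybrid_search(docs, query_terms:, query_vector:, limit:, prefilter_limit: 100)
  docs
    .select { |d| query_terms.any? { |t| d[:content].downcase.include?(t.downcase) } }
    .first(prefilter_limit)
    .sort_by { |d| -cosine(d[:embedding], query_vector) }
    .first(limit)
end

docs = [
  { id: 1, content: 'Ruby on Rails guide', embedding: [1.0, 0.0] },
  { id: 2, content: 'Ruby vectors',        embedding: [0.0, 1.0] },
  { id: 3, content: 'Python guide',        embedding: [0.0, 1.0] }
]

result = hybrid_search(docs, query_terms: ['ruby'], query_vector: [0.0, 1.0], limit: 1)
p result.map { |d| d[:id] }  # [2]
```

Document 3 has the closest vector but never reaches the re-ranking stage because it fails the text prefilter, which is the trade-off a small prefilter_limit makes under this reading.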

#search_with_relevance(timeframe:, query: nil, query_tags: [], limit: 20, embedding_service: nil) ⇒ Array<Hash>

Search with dynamic relevance scoring

Returns nodes with calculated relevance scores based on query context

Parameters:

  • timeframe (Range)

    Time range to search

  • query (String, nil) (defaults to: nil)

    Search query

  • query_tags (Array<String>) (defaults to: [])

    Tags to match

  • limit (Integer) (defaults to: 20)

    Maximum results

  • embedding_service (Object, nil) (defaults to: nil)

    Service to generate embeddings

Returns:

  • (Array<Hash>)

    Nodes with relevance scores



# File 'lib/htm/long_term_memory.rb', line 461

def search_with_relevance(timeframe:, query: nil, query_tags: [], limit: 20, embedding_service: nil)
  # Get candidates from appropriate search method
  candidates = if query && embedding_service
    # Vector search
    search_uncached(timeframe: timeframe, query: query, limit: limit * 2, embedding_service: embedding_service)
  elsif query
    # Full-text search
    search_fulltext_uncached(timeframe: timeframe, query: query, limit: limit * 2)
  else
    # Time-range only
    HTM::Models::Node
      .where(created_at: timeframe)
      .order(created_at: :desc)
      .limit(limit * 2)
      .map(&:attributes)
  end

  # Calculate relevance for each candidate
  scored_nodes = candidates.map do |node|
    relevance = calculate_relevance(
      node: node,
      query_tags: query_tags,
      vector_similarity: node['similarity']&.to_f
    )

    node.merge({
      'relevance' => relevance,
      'tags' => get_node_tags(node['id'])
    })
  end

  # Sort by relevance and return top K
  scored_nodes
    .sort_by { |n| -n['relevance'] }
    .take(limit)
end
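
The listing fetches limit * 2 candidates before scoring because re-ranking can promote nodes that the initial ordering placed outside the top limit. A small sketch of that effect, with a hypothetical rerank helper and a simplified scoring block standing in for calculate_relevance:

```ruby
# Hypothetical helper: score each candidate with the given block, sort, keep the top K.
def rerank(candidates, limit:)
  candidates
    .map { |node| node.merge('relevance' => yield(node)) }
    .sort_by { |node| -node['relevance'] }
    .take(limit)
end

# Candidates arrive ordered by similarity; tag_score is a made-up second signal.
candidates = [
  { 'id' => 1, 'similarity' => 0.90, 'tag_score' => 0.0 },
  { 'id' => 2, 'similarity' => 0.85, 'tag_score' => 0.0 },
  { 'id' => 3, 'similarity' => 0.80, 'tag_score' => 1.0 },
  { 'id' => 4, 'similarity' => 0.70, 'tag_score' => 1.0 }
]

top = rerank(candidates, limit: 2) do |n|
  (n['similarity'] * 0.5 + n['tag_score'] * 0.3) * 10.0
end

p top.map { |n| n['id'] }  # [3, 4]
```

With limit 2 and no over-fetching, only nodes 1 and 2 would have been scored, and the two highest-scoring nodes would have been missed.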

#shutdown ⇒ Object

Shutdown - no-op with ActiveRecord (connection pool managed by ActiveRecord)



# File 'lib/htm/long_term_memory.rb', line 315

def shutdown
  # ActiveRecord handles connection pool shutdown
  # This method kept for API compatibility
end

#stats ⇒ Hash

Get memory statistics

Returns:

  • (Hash)

    Statistics



# File 'lib/htm/long_term_memory.rb', line 294

def stats
  base_stats = {
    total_nodes: HTM::Models::Node.count,
    nodes_by_robot: HTM::Models::Node.group(:robot_id).count,
    total_tags: HTM::Models::Tag.count,
    oldest_memory: HTM::Models::Node.minimum(:created_at),
    newest_memory: HTM::Models::Node.maximum(:created_at),
    active_robots: HTM::Models::Robot.count,
    robot_activity: HTM::Models::Robot.select(:id, :name, :last_active).map(&:attributes),
    database_size: ActiveRecord::Base.connection.select_value("SELECT pg_database_size(current_database())").to_i
  }

  # Include cache statistics if cache is enabled
  if @query_cache
    base_stats[:cache] = cache_stats
  end

  base_stats
end

#topic_relationships(min_shared_nodes: 2, limit: 50) ⇒ Array<Hash>

Get topic relationships (co-occurrence)

Parameters:

  • min_shared_nodes (Integer) (defaults to: 2)

    Minimum shared nodes

  • limit (Integer) (defaults to: 50)

    Maximum relationships

Returns:

  • (Array<Hash>)

    Topic relationships



# File 'lib/htm/long_term_memory.rb', line 369

def topic_relationships(min_shared_nodes: 2, limit: 50)
  result = ActiveRecord::Base.connection.select_all(
    <<~SQL,
      SELECT t1.name AS topic1, t2.name AS topic2, COUNT(DISTINCT nt1.node_id) AS shared_nodes
      FROM tags t1
      JOIN node_tags nt1 ON t1.id = nt1.tag_id
      JOIN node_tags nt2 ON nt1.node_id = nt2.node_id
      JOIN tags t2 ON nt2.tag_id = t2.id
      WHERE t1.name < t2.name
      GROUP BY t1.name, t2.name
      HAVING COUNT(DISTINCT nt1.node_id) >= #{min_shared_nodes.to_i}
      ORDER BY shared_nodes DESC
      LIMIT #{limit.to_i}
    SQL
  )
  result.to_a
end
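
The co-occurrence query above counts, for each pair of tags, how many nodes carry both. The same computation on in-memory data is sketched below; topic_pairs and the sample data are hypothetical, and the name tie-break is an addition because the SQL orders by shared_nodes only.

```ruby
# Hypothetical in-memory version: node_tags maps node id => tag names.
def topic_pairs(node_tags, min_shared_nodes: 2, limit: 50)
  counts = Hash.new(0)
  node_tags.each_value do |names|
    # Sorting before combination(2) yields each unordered pair once with topic1 < topic2,
    # which is the role of WHERE t1.name < t2.name in the SQL.
    names.uniq.sort.combination(2) { |a, b| counts[[a, b]] += 1 }
  end

  counts
    .select { |_pair, n| n >= min_shared_nodes }
    .sort_by { |pair, n| [-n, pair] }
    .first(limit)
    .map { |(t1, t2), n| { 'topic1' => t1, 'topic2' => t2, 'shared_nodes' => n } }
end

data = {
  1 => %w[ruby postgres],
  2 => %w[postgres ruby vector],
  3 => %w[vector postgres]
}

topic_pairs(data).each { |row| puts row.values.join(' | ') }
# postgres | ruby | 2
# postgres | vector | 2
```

The pair ruby/vector appears on only one node, so it falls below the default min_shared_nodes of 2 and is dropped.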

#track_access(node_ids) ⇒ void

This method returns an undefined value.

Track access for multiple nodes (bulk operation)

Updates access_count and last_accessed for all nodes in the array

Parameters:

  • node_ids (Array<Integer>)

    Node IDs that were accessed



# File 'lib/htm/long_term_memory.rb', line 259

def track_access(node_ids)
  return if node_ids.empty?

  # Atomic batch update
  HTM::Models::Node.where(id: node_ids).update_all(
    "access_count = access_count + 1, last_accessed = NOW()"
  )
end

#update_last_accessed(node_id) ⇒ void

This method returns an undefined value.

Update last_accessed timestamp

Parameters:

  • node_id (Integer)

    Node database ID



# File 'lib/htm/long_term_memory.rb', line 105

def update_last_accessed(node_id)
  node = HTM::Models::Node.find_by(id: node_id)
  node&.update(last_accessed: Time.current)
end

#update_robot_activity(robot_id) ⇒ void

This method returns an undefined value.

Update robot activity timestamp

Parameters:

  • robot_id (String)

    Robot identifier



# File 'lib/htm/long_term_memory.rb', line 285

def update_robot_activity(robot_id)
  robot = HTM::Models::Robot.find_by(id: robot_id)
  robot&.update(last_active: Time.current)
end