ruby-claw 🦀

AI Agent framework for Ruby. Built on ruby-mana.

What is Claw?

Claw turns ruby-mana's embedded LLM engine into a full agent with persistent memory, interactive chat, and session recovery. Think of it as the agent layer on top of mana's execution engine.

gem install ruby-claw

Features

Interactive TUI

Running claw launches a full-screen terminal UI (built on Charm Ruby's bubbletea) with 4 zones: top status bar, left chat panel, right status panel, and bottom command bar.

Claw.chat still works for the legacy REPL mode:

require "claw"
Claw.chat

Auto-detects Ruby code vs natural language
Streaming output with markdown rendering
! prefix forces Ruby eval
Session persists across restarts

Persistent Memory

Claw stores memories as human-readable Markdown in .ruby-claw/:

.ruby-claw/
  MEMORY.md          # Long-term facts (editable!)
  session.md         # Conversation summary
  system_prompt.md   # Custom agent personality
  values.json        # Variable snapshots
  definitions.rb     # Method definitions
  log/
    2026-03-29.md    # Daily interaction log
  traces/
    20260405_103000.md  # Execution traces
  evolution/
    20260405_accept.md  # Evolution logs
  gems/              # Editable gem source (after claw init)

The LLM can remember facts that persist across sessions:

claw> remember that the API uses OAuth2
claw> # ... next session ...
claw> what auth does our API use?
# => "OAuth2 — I remembered this from a previous session"

Runtime Persistence

Variables and method definitions survive across sessions:

claw> a = 42
claw> def greet(name) = "Hello #{name}"
claw> exit

$ claw  # restart
claw> a        # => 42
claw> greet("world")  # => "Hello world"
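The variable half of this round trip can be pictured as a dump to values.json on exit and a reload on start. The following is a minimal sketch under that assumption, not Claw's actual implementation: `save_values`, `load_values`, and `fresh_session` are hypothetical helpers, and only JSON-serializable values are handled.

```ruby
require "json"
require "tmpdir"

# Hypothetical helpers for illustration; not Claw's actual API.

# Dump every local variable visible in the binding to a JSON file.
def save_values(bind, path)
  values = bind.local_variables.to_h do |name|
    [name.to_s, bind.local_variable_get(name)]
  end
  File.write(path, JSON.pretty_generate(values))
end

# Read the file back and define each entry as a local variable in the binding.
def load_values(bind, path)
  JSON.parse(File.read(path)).each do |name, value|
    bind.local_variable_set(name.to_sym, value)
  end
end

# A binding with no local variables of its own, standing in for one session.
def fresh_session
  binding
end

# Simulate "set a variable, exit, restart, read it back".
def round_trip_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "values.json")

    first = fresh_session
    first.local_variable_set(:a, 42)
    save_values(first, path)

    second = fresh_session
    load_values(second, path)
    second.local_variable_get(:a)
  end
end
```

Method definitions would need a different mechanism (for example, storing their source text in definitions.rb and evaluating it on start); that part is not sketched here.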

Memory Compaction

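When the conversation grows large, older messages are automatically summarized in the background. The trigger point and the number of recent rounds kept verbatim are controlled by memory_pressure and memory_keep_recent (see Configuration).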
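As a rough illustration of what compaction involves, the sketch below replaces older rounds with a single summary once token usage passes a fraction of the context window. The `compact` function, the round hash layout, and the `summarizer` lambda (a stand-in for an LLM call) are all assumptions made for this example.

```ruby
# Hypothetical sketch; the summarizer lambda stands in for an LLM call.
# Each round is a hash with :role, :text and a precomputed :tokens count.
def compact(rounds, context_window:, summarizer:, pressure: 0.7, keep_recent: 4)
  total = rounds.sum { |round| round[:tokens] }

  # Nothing to do while usage stays at or under the pressure threshold,
  # or when there are no rounds older than the ones we always keep.
  return rounds if total <= context_window * pressure
  return rounds if rounds.size <= keep_recent

  old    = rounds[0...-keep_recent]
  recent = rounds.last(keep_recent)
  text   = summarizer.call(old)

  # Crude token estimate for the summary: one token per word.
  summary = { role: "summary", text: text, tokens: text.split.size }
  [summary] + recent
end
```

The default values mirror the memory_pressure and memory_keep_recent settings shown under Configuration.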

Incognito Mode

Temporarily disable memory loading and saving:

Claw.incognito do
  ~"translate <text> to French, store in <french>"
  # No memories loaded, nothing remembered
end

Claw::Memory.incognito?  # => true inside the block
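A block-scoped switch like this is commonly built from a flag plus an `ensure` clause, so the flag is restored even if the block raises. The module below is a minimal sketch of that pattern; `IncognitoSketch` and its `remember` method are hypothetical names and do not reflect how Claw::Memory is written.

```ruby
# Hypothetical sketch of a block-scoped incognito flag.
module IncognitoSketch
  @incognito = false

  class << self
    def incognito?
      @incognito
    end

    # Turn the flag on for the duration of the block, then restore
    # the previous value, even when the block raises.
    def incognito
      previous = @incognito
      @incognito = true
      yield
    ensure
      @incognito = previous
    end

    # A save path that skips the write while incognito.
    def remember(store, fact)
      store << fact unless incognito?
      store
    end
  end
end
```

A real implementation that runs across threads would likely keep the flag per thread rather than in one module-level variable.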

Keyword Memory Search

When more than 20 memories are stored, only the most relevant ones are injected into prompts, up to memory_top_k (see Configuration).
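One simple way to rank memories by keyword is to count the words each memory shares with the query. The sketch below does that; the scoring rule, the `relevant_memories` function, and its parameter names are assumptions for illustration, since the README does not describe the actual ranking.

```ruby
# Hypothetical keyword ranking; the real scoring may differ.

# Lowercase a string and split it into alphanumeric words.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Return all memories when there are few of them; otherwise return the
# top_k memories that share at least one word with the query.
def relevant_memories(memories, query, top_k: 10, threshold: 20)
  return memories if memories.size <= threshold

  query_words = tokenize(query).uniq
  scored = memories.each_with_index.map do |memory, index|
    [memory, (tokenize(memory) & query_words).size, index]
  end

  scored.reject { |_, score, _| score.zero? }
        .sort_by { |_, score, index| [-score, index] } # best first, stable on ties
        .first(top_k)
        .map(&:first)
end
```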

Reversible Runtime

Snapshot and roll back the entire agent state (context, memory, variables, filesystem):

claw> /snapshot before-refactor
  ✓ snapshot #2 created (before-refactor)

claw> # ... make changes ...

claw> /rollback 2
  ✓ rolled back to snapshot #2

REPL commands:

| Command | Description |
|---------|-------------|
| /snapshot [label] | Snapshot all resources |
| /rollback <id> | Rollback to a snapshot |
| /diff [id_a id_b] | Show diff between snapshots |
| /history | List all snapshots |
| /status | Show current resource state |
| /evolve | Run a self-evolution cycle |
| /role <name> | Switch agent role/identity |
| /forge <method> | Promote a method to a formal tool |
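For in-memory state, snapshot and rollback can be pictured as storing a deep copy and handing a fresh copy back later. The class below is a minimal sketch of that idea using Marshal; `SnapshotStore` is a hypothetical name, and it covers only Ruby objects (the filesystem part is handled separately, per the .git/ directory in the project layout).

```ruby
# Hypothetical in-memory snapshot store; ids are 1-based like the REPL output.
class SnapshotStore
  def initialize
    @snapshots = []
  end

  # Serialize the state so later mutations cannot affect the snapshot.
  def snapshot(state, label = nil)
    @snapshots << { label: label, data: Marshal.dump(state) }
    @snapshots.size
  end

  # Return a fresh deep copy of the state as it was at the given snapshot.
  def rollback(id)
    unless id.between?(1, @snapshots.size)
      raise ArgumentError, "unknown snapshot ##{id}"
    end
    Marshal.load(@snapshots[id - 1][:data])
  end

  # List [id, label] pairs, oldest first.
  def history
    @snapshots.each_with_index.map { |entry, index| [index + 1, entry[:label]] }
  end
end
```

Marshal cannot serialize procs, IO objects, or bindings, so a real runtime would need special handling for those.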

Plan Mode

/plan toggles plan mode. When active, the LLM generates a step-by-step plan without executing any tools. The user reviews the proposed steps, then confirms execution -- which runs in a safe fork so the original state is preserved if anything goes wrong.

Roles

Role files are Markdown documents stored in .ruby-claw/roles/. Each role defines an agent identity (system prompt, constraints, tool permissions).

/role <name> switches the active agent identity at runtime
claw init creates a default role

Benchmark

claw benchmark run executes the benchmark suite -- 9 built-in tasks spanning the mana, claw, runtime, and evolution layers. Each task runs 3 times, and scoring covers:

Correctness -- did the agent produce the right result?
Rounds efficiency -- how many LLM round-trips were needed?
Token efficiency -- total token usage
Tool path accuracy -- did the agent call the expected tools in the expected order?

claw benchmark diff <a> <b> compares two benchmark reports side by side. An evolution cycle is triggered automatically on a score regression or after 3 consecutive failures.
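The trigger rule can be expressed as a small predicate over past runs. This is a sketch under assumed data shapes: `evolution_needed?` and the `:score` / `:passed` keys are hypothetical, and "regression" is taken to mean the latest score is lower than the previous one.

```ruby
# Hypothetical trigger check over a list of past runs, oldest first.
# Each run is a hash such as { score: 0.82, passed: true }.
def evolution_needed?(history, max_failures: 3)
  return false if history.empty?

  # Regression: the latest score dropped compared with the run before it.
  regression = history.size >= 2 && history[-1][:score] < history[-2][:score]

  # Failure streak: the last max_failures runs all failed.
  streak = history.last(max_failures)
  failing = streak.size == max_failures && streak.none? { |run| run[:passed] }

  regression || failing
end
```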

Multi-Agent

runtime.fork_async(prompt:, vars:, role:) spawns a child agent that runs in an isolated thread with deep-copied variables and an optional git worktree for filesystem isolation.

Child lifecycle methods:

child.join -- block until the child finishes
child.cancel! -- abort the child
child.diff -- inspect changes made by the child
child.merge! -- merge the child's results back into the parent

All operations are thread-safe with Mutex protection.
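The lifecycle above can be approximated in plain Ruby with a thread, a deep copy, and a mutex around the merge. The classes below are a simplified sketch, not the real runtime: `RuntimeSketch` and `ChildSketch` are hypothetical, variables are a plain hash, git worktrees are left out, and `diff` only reports added or changed keys.

```ruby
# Hypothetical parent runtime holding variables in a hash.
class RuntimeSketch
  attr_reader :vars

  def initialize(vars)
    @vars = vars
    @lock = Mutex.new
  end

  # Start a child on a deep copy so it cannot touch the parent's objects.
  def fork_async(&work)
    ChildSketch.new(self, Marshal.load(Marshal.dump(@vars)), &work)
  end

  # Apply a child's changes; the mutex serializes concurrent merges.
  def apply(changes)
    @lock.synchronize { @vars.merge!(changes) }
  end
end

# Hypothetical child agent running its work in a separate thread.
class ChildSketch
  def initialize(parent, vars, &work)
    @parent = parent
    @base = Marshal.load(Marshal.dump(vars)) # untouched copy used by diff
    @vars = vars
    @thread = Thread.new { work.call(@vars) }
  end

  def join
    @thread.join
    self
  end

  def cancel!
    @thread.kill
    self
  end

  # Keys whose values differ from the state at fork time.
  def diff
    @vars.reject { |key, value| @base[key] == value }
  end

  def merge!
    join
    @parent.apply(diff)
  end
end
```

For example, `child = runtime.fork_async { |vars| vars[:count] += 1 }` leaves the parent untouched until `child.merge!` is called.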

Execution Traces

Every LLM interaction is logged as a Markdown file in .ruby-claw/traces/:

# Task: compute average of numbers
- Model: claude-sonnet-4-20250514
- Steps: 2
- Total tokens: 1100 in / 350 out
- Total latency: 1400ms

## Step 1
- Latency: 800ms
- Tokens: 500 in / 200 out
### Tool calls
- **read_var**(name: "numbers") -> [1, 2, 3]
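A trace in this layout is straightforward to generate from structured step data. The renderer below is a sketch that reproduces the format shown above; `render_trace`, `format_tool_call`, and the hash keys are hypothetical and say nothing about how Claw records traces internally.

```ruby
# Hypothetical renderer for the trace layout shown above.

# Format one tool call as "- **name**(arg: value) -> result".
def format_tool_call(call)
  args = call[:args].map { |key, value| "#{key}: #{value.inspect}" }.join(", ")
  "- **#{call[:name]}**(#{args}) -> #{call[:result].inspect}"
end

# Build the Markdown document: a header with totals, then one section per step.
def render_trace(task:, model:, steps:)
  tokens_in  = steps.sum { |step| step[:tokens_in] }
  tokens_out = steps.sum { |step| step[:tokens_out] }
  latency    = steps.sum { |step| step[:latency_ms] }

  lines = [
    "# Task: #{task}",
    "- Model: #{model}",
    "- Steps: #{steps.size}",
    "- Total tokens: #{tokens_in} in / #{tokens_out} out",
    "- Total latency: #{latency}ms"
  ]

  steps.each_with_index do |step, index|
    lines << "" << "## Step #{index + 1}"
    lines << "- Latency: #{step[:latency_ms]}ms"
    lines << "- Tokens: #{step[:tokens_in]} in / #{step[:tokens_out]} out"

    calls = Array(step[:tool_calls])
    next if calls.empty?

    lines << "### Tool calls"
    calls.each { |call| lines << format_tool_call(call) }
  end

  lines.join("\n") + "\n"
end
```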

Tool System

Claw has a three-layer tool architecture:

Core tools (always loaded): read_var, write_var, call_func, eval, remember, search_tools, load_tool
Project tools (on-demand): .ruby-claw/tools/*.rb — indexed at startup, loaded via load_tool
Hub tools (remote): community tools from a ruby-claw-toolhub, downloaded on demand

Create a project tool:

# .ruby-claw/tools/format_report.rb
class FormatReport
  include Claw::Tool
  tool_name   "format_report"
  description "Format raw data into a readable report"
  parameter   :data,  type: "Hash",   required: true,  desc: "Raw data"
  parameter   :style, type: "String", required: false, desc: "brief or detailed"

  def call(data:, style: "brief")
    # ...
  end
end

The agent discovers tools via search_tools and loads them via load_tool. Use /forge <method_name> to promote an eval-defined method into a formal tool class.
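The "indexed at startup, loaded on demand" behavior can be sketched as an index that stores only metadata plus a loader, and runs the loader the first time a tool is requested. `ToolIndexSketch` and its methods are hypothetical; the real search_tools and load_tool are agent-facing tools whose internals the README does not show.

```ruby
# Hypothetical tool index: metadata is stored eagerly, tools load lazily.
class ToolIndexSketch
  Entry = Struct.new(:name, :description, :loader)

  def initialize
    @entries = {}
    @loaded = {}
  end

  # Record metadata only; the loader block is not run yet.
  def index(name, description, &loader)
    @entries[name] = Entry.new(name, description, loader)
  end

  # Case-insensitive substring match on name or description.
  def search_tools(query)
    needle = query.downcase
    matches = @entries.values.select do |entry|
      entry.name.downcase.include?(needle) || entry.description.downcase.include?(needle)
    end
    matches.map(&:name)
  end

  # Run the loader on first use and cache the result.
  def load_tool(name)
    @loaded[name] ||= @entries.fetch(name).loader.call
  end

  def loaded?(name)
    @loaded.key?(name)
  end
end
```

In a file-based setup the loader block would typically `require` or `load` the tool's .rb file and return an instance of its class.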

Web Console

claw console launches a local web UI at http://127.0.0.1:4567 for observability and operations:

Dashboard — version, tool/memory/snapshot counts
Prompt Inspector — view and edit the assembled system prompt
LLM Monitor — real-time event stream via Server-Sent Events
Trace Explorer — browse execution traces
Memory Manager — add/remove long-term memories
Tool Manager — view core tools, load/unload project tools
Snapshot Manager — create snapshots, rollback state

All data is served via a REST API (/api/status, /api/traces, /api/memory, etc.).
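With the console running, the endpoints can be queried from any HTTP client. The snippet below is a small sketch using Ruby's standard library; `console_uri` and `fetch_console` are hypothetical helper names, and the response format is not assumed, so the raw status code and body are returned as-is.

```ruby
require "net/http"
require "uri"

# Hypothetical helpers; host and port follow the defaults stated above.

# Build the URL for a console API path such as "/api/status".
def console_uri(path, host: "127.0.0.1", port: 4567)
  URI::HTTP.build(host: host, port: port, path: path)
end

# Perform a GET request and return [status_code, body] without parsing.
def fetch_console(path, **options)
  response = Net::HTTP.get_response(console_uri(path, **options))
  [response.code, response.body]
end
```

Usage, once `claw console` is running: `code, body = fetch_console("/api/status")`.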

Project Scaffolding

Initialize a project with editable gem source for self-evolution:

claw init

Creates:

.ruby-claw/
  gems/
    ruby-claw/    # Editable source
    ruby-mana/
  tools/            # Project tool classes
  roles/            # Agent role definitions
  benchmarks/       # Benchmark reports
  system_prompt.md  # Customizable agent personality
  MEMORY.md
  .git/             # Filesystem snapshots

Self-Evolution

The agent can improve its own code:

claw> /evolve
  ⚡ running evolution cycle...
  ✓ accepted: Improve error message specificity

Flow: read traces → LLM diagnoses improvement → fork runtime → apply change → run tests → keep or rollback.

Evolution logs are written to .ruby-claw/evolution/.
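The fork, apply, test, keep-or-rollback part of that flow can be sketched with a deep copy and a list of checks. Everything here is illustrative: `evolve` is a hypothetical function, the proposal lambda stands in for the LLM-generated change, and state is a plain hash rather than gem source on disk.

```ruby
# Hypothetical evolution step: try a change on a copy, keep it only if
# every check passes, otherwise return the original state untouched.
def evolve(state, proposal:, tests:)
  candidate = Marshal.load(Marshal.dump(state)) # fork: deep copy of the state
  proposal.call(candidate)                      # apply the proposed change

  if tests.all? { |test| test.call(candidate) }
    [:accepted, candidate]
  else
    [:rejected, state]
  end
rescue StandardError
  # A proposal or test that raises counts as a rejection.
  [:rejected, state]
end
```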

CLI Subcommands

Command	Description
`claw`	Launch the TUI (default)
`claw init`	Scaffold a new project
`claw status`	Show current resource state
`claw history`	List all snapshots
`claw rollback <id>`	Rollback to a snapshot
`claw trace [id]`	View execution traces
`claw evolve`	Run a self-evolution cycle
`claw benchmark run`	Run the benchmark suite
`claw benchmark diff <a> <b>`	Compare two benchmark reports
`claw console`	Launch the web console UI
`claw version`	Print version
`claw help`	Show help

Configuration

Claw.configure do |c|
  c.memory_pressure = 0.7       # Compact when tokens > 70% of context window
  c.memory_keep_recent = 4      # Keep last 4 conversation rounds during compaction
  c.compact_model = nil          # nil = use main model for summarization
  c.persist_session = true       # Save/restore session across restarts
  c.memory_top_k = 10           # Max memories to inject when searching
  c.on_compact = ->(summary) { puts summary }
  c.tools_dir = nil              # Custom tools directory (default: .ruby-claw/tools)
  c.hub_url = nil                # Remote tool hub URL
  c.console_port = 4567          # Web console port
end

# Mana config (inherited)
Mana.configure do |c|
  c.model = "claude-sonnet-4-6"
  c.api_key = "sk-..."
end

Architecture

Claw extends mana via its tool registration interface — no monkey-patching:

# Claw registers the "remember" tool into mana's engine
Mana.register_tool(remember_tool_definition) { |input| ... }

# Claw injects long-term memories into mana's system prompt
Mana.register_prompt_section { |context| memory_text }
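To make the extension pattern concrete, here is a self-contained sketch of an engine that exposes registration hooks and an agent layer that plugs into them. `EngineSketch` and `AgentLayerSketch` are hypothetical stand-ins; they mimic the shape of the two calls above but are not mana's or claw's code.

```ruby
# Hypothetical engine exposing two registration hooks.
module EngineSketch
  @tools = {}
  @prompt_sections = []

  class << self
    # Store a handler block under the tool's name.
    def register_tool(definition, &handler)
      @tools[definition.fetch(:name)] = handler
    end

    # Store a block that contributes text to the system prompt.
    def register_prompt_section(&section)
      @prompt_sections << section
    end

    def call_tool(name, input)
      @tools.fetch(name).call(input)
    end

    # Join the base prompt with every non-nil registered section.
    def system_prompt(base, context = {})
      extras = @prompt_sections.map { |section| section.call(context) }.compact
      ([base] + extras).join("\n\n")
    end
  end
end

# Hypothetical agent layer that extends the engine without patching it.
module AgentLayerSketch
  MEMORIES = []

  def self.install
    EngineSketch.register_tool({ name: "remember" }) do |input|
      MEMORIES << input.fetch("content")
      "remembered"
    end

    EngineSketch.register_prompt_section do |_context|
      next nil if MEMORIES.empty?
      "Long-term memories:\n" + MEMORIES.map { |memory| "- #{memory}" }.join("\n")
    end
  end
end
```

Because the agent layer only calls public hooks, the engine stays usable on its own.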

ruby-mana = Embedded LLM engine (~"..." syntax, binding manipulation, tool calling)
ruby-claw = Agent framework (chat REPL, memory, persistence, knowledge)

Claw depends on mana. You can use mana standalone to embed an LLM in Ruby code, or add claw for interactive agent features.

License

MIT