| 📍 NOTE |
|---|
| RubyGems (the GitHub org, not the website) suffered a hostile takeover in September 2025. |
| Ultimately 4 maintainers were hard removed and a reason has been given for only 1 of those, while 2 others resigned in protest. |
| It is a complicated story which is difficult to parse quickly. |
| Simply put: there were active policies for adding or removing maintainers/owners of rubygems and bundler, and those policies were not followed. |
| I'm adding notes like this to gems because I don't condone theft of repositories or gems from their rightful owners. |
| If a similar theft happened with my repos/gems, I'd hope some would stand up for me. |
| Disenfranchised former maintainers have started gem.coop. |
| Once available, I will publish there exclusively, unless RubyCentral makes amends with the community. |
| The "Technology for Humans: Joel Draper" podcast episode by reinteractive is the most cogent summary I'm aware of. |
| See here, here and here for more info on what comes next. |
| What I'm doing: A (WIP) proposal for bundler/gem scopes, and a (WIP) proposal for a federated gem server. |
🌴 TreeHaver
if ci_badges.map(&:color).detect { it != "green"} ☝️ let me know, as I may have missed the discord notification.
if ci_badges.map(&:color).all? { it == "green"} 👇️ send money so I can do more of this. FLOSS maintenance is now my full-time job.
🌻 Synopsis
TreeHaver is a cross-Ruby adapter for the tree-sitter and Citrus parsing libraries and other dedicated parsing tools that works seamlessly across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using grammars, regardless of your Ruby implementation.
The Adapter Pattern: Like Faraday, but for Parsing
If you've used Faraday, multi_json, or multi_xml, you'll feel right at home with TreeHaver. These gems share a common philosophy:
| Gem | Unified API for | Backend Examples |
|---|---|---|
| Faraday | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon |
| multi_json | JSON parsing | Oj, Yajl, JSON gem |
| multi_xml | XML parsing | Nokogiri, LibXML, Ox |
| TreeHaver | Code parsing | MRI, Rust, FFI, Java, Prism, Psych, Commonmarker, Markly, Citrus (& Co.) |
Write once, run anywhere.
Learn once, write anywhere.
Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap tree-sitter backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
# Your code stays the same regardless of backend
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")
tree = parser.parse(source_code)
# TreeHaver automatically picks the best backend:
# - MRI → ruby_tree_sitter (C extensions)
# - JRuby → FFI (system's libtree-sitter)
# - TruffleRuby → FFI (system's libtree-sitter)
Key Features
- Universal Ruby Support: Works on MRI Ruby, JRuby, and TruffleRuby
- 10 Parsing Backends, so you can choose the right one for your needs:
  - Tree-sitter Backends (high-performance, incremental parsing):
    - MRI Backend: Leverages the ruby_tree_sitter gem (C extension, fastest on MRI)
    - Rust Backend: Uses the tree_stump gem (Rust with precompiled binaries)
      - Note: Currently requires pboling's fork until PRs #5, #7, #11, and #13 are merged
    - FFI Backend: Pure Ruby FFI bindings to libtree-sitter (ideal for JRuby, TruffleRuby)
    - Java Backend: Native Java integration for JRuby with java-tree-sitter grammar JARs
  - Language-Specific Backends (native parser integration):
    - Prism Backend: Ruby's official parser (Prism, stdlib in Ruby 3.4+)
    - Psych Backend: Ruby's YAML parser (Psych, stdlib)
    - Commonmarker Backend: Fast Markdown parser (Commonmarker, comrak Rust)
    - Markly Backend: GitHub Flavored Markdown (Markly, cmark-gfm C)
  - Pure Ruby Fallback:
    - Citrus Backend: Pure Ruby parsing via the citrus gem (no native dependencies)
- Automatic Backend Selection: Intelligently selects the best backend for your Ruby implementation
- Language Agnostic: Parse any language for which a grammar or native parser is available, e.g. Ruby, Markdown, YAML, JSON, Bash, TOML, JavaScript
- Grammar Discovery: Built-in GrammarFinder utility for platform-aware grammar library discovery
- Unified Position API: Consistent start_line, end_line, and source_position across all backends
- Thread-Safe: Built-in language registry with thread-safe caching
- Minimal API Surface: Simple, focused API that covers the most common use cases
Backend Requirements
TreeHaver has minimal dependencies and automatically selects the best backend for your Ruby implementation. Each backend has specific version requirements:
MRI Backend (ruby_tree_sitter, C extensions)
Requires ruby_tree_sitter v2.0+
In ruby_tree_sitter v2.0, all TreeSitter exceptions were changed to inherit from Exception (not StandardError). This was an intentional breaking change made for thread-safety and signal handling reasons.
Exception Mapping: TreeHaver catches TreeSitter::TreeSitterError and its subclasses, converting them to TreeHaver::NotAvailable while preserving the original error message. This provides a consistent exception API across all backends:
| ruby_tree_sitter Exception | TreeHaver Exception | When It Occurs |
|---|---|---|
| TreeSitter::ParserNotFoundError | TreeHaver::NotAvailable | Parser library file cannot be loaded |
| TreeSitter::LanguageLoadError | TreeHaver::NotAvailable | Language symbol loads but returns nothing |
| TreeSitter::SymbolNotFoundError | TreeHaver::NotAvailable | Symbol not found in library |
| TreeSitter::ParserVersionError | TreeHaver::NotAvailable | Parser version incompatible with tree-sitter |
| TreeSitter::QueryCreationError | TreeHaver::NotAvailable | Query creation fails |
# Add to your Gemfile for MRI backend
gem "ruby_tree_sitter", "~> 2.0"
Rust Backend (tree_stump)
Currently requires pboling's fork until upstream PRs are merged.
# Add to your Gemfile for Rust backend
gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver"
FFI Backend
Requires the ffi gem and a system installation of libtree-sitter:
# Add to your Gemfile for FFI backend
gem "ffi", ">= 1.15", "< 2.0"
# Install libtree-sitter on your system:
# macOS
brew install tree-sitter
# Ubuntu/Debian
apt-get install libtree-sitter0 libtree-sitter-dev
# Fedora
dnf install tree-sitter tree-sitter-devel
Citrus Backend
Pure Ruby parser with no native dependencies:
# Add to your Gemfile for Citrus backend
gem "citrus", "~> 3.0"
Java Backend (JRuby only)
No additional dependencies required beyond grammar JARs built for java-tree-sitter.
Why TreeHaver?
tree-sitter is a powerful parser generator that creates incremental parsers for many programming languages. However, integrating it into Ruby applications can be challenging:
- MRI-based C extensions don't work on JRuby
- FFI-based solutions may not be optimal for MRI
- Managing different backends for different Ruby implementations is cumbersome
TreeHaver solves these problems by providing a unified API that automatically selects the appropriate backend for your Ruby implementation, allowing you to write code once and run it anywhere.
Comparison with Other Ruby AST / Parser Bindings
| Feature | tree_haver | ruby_tree_sitter | tree_stump | citrus |
|---|---|---|---|---|
| MRI Ruby | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| JRuby | ✅ Yes (FFI, Java, or Citrus backend) | ❌ No | ❌ No | ✅ Yes |
| TruffleRuby | ✅ Yes (FFI or Citrus) | ❌ No | ❓ Unknown | ✅ Yes |
| Backend | Multi (MRI C, Rust, FFI, Java, Citrus) | C extension only | Rust extension | Pure Ruby |
| Incremental Parsing | ✅ Via MRI C/Rust/Java backend | ✅ Yes | ✅ Yes | ❌ No |
| Query API | ⚡ Via MRI/Rust/Java backend | ✅ Yes | ✅ Yes | ❌ No |
| Grammar Discovery | ✅ Built-in GrammarFinder | ❌ Manual | ❌ Manual | ❌ Manual |
| Security Validations | ✅ PathValidator | ❌ No | ❌ No | ❌ No |
| Language Registration | ✅ Thread-safe registry | ❌ No | ❌ No | ❌ No |
| Native Performance | ⚡ Backend-dependent | ✅ Native C | ✅ Native Rust | ❌ Pure Ruby |
| Precompiled Binaries | ⚡ Via Rust backend | ✅ Yes | ✅ Yes | ✅ Pure Ruby |
| Zero Native Deps | ⚡ Via Citrus backend | ❌ No | ❌ No | ✅ Yes |
| Minimum Ruby | 3.2+ | 3.0+ | 3.1+ | 0+ |
Note: The Java backend only works with grammar JARs built specifically for java-tree-sitter, or with grammar .so files that statically link tree-sitter. Because of this limitation, FFI is the recommended backend for JRuby & TruffleRuby.
Note: TreeHaver can use ruby_tree_sitter (MRI) or tree_stump (MRI, JRuby?) as backends, or jruby-tree-sitter (JRuby), giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.
Note: tree_stump currently requires pboling's fork (tree_haver branch) until upstream PRs #5, #7, #11, and #13 are merged.
When to Use Each
Choose TreeHaver when:
- You need JRuby or TruffleRuby support
- You're building a library that should work across Ruby implementations
- You want automatic grammar discovery and security validations
- You want flexibility to switch backends without code changes
- You need incremental parsing with a unified API
Choose ruby_tree_sitter directly when:
- You only target MRI Ruby
- You need the full Query API without abstraction
- You want the most battle-tested C bindings
- You don't need TreeHaver's grammar discovery
Choose tree_stump directly when:
- You only target MRI Ruby
- You prefer Rust-based native extensions
- You want precompiled binaries without system dependencies
- You don't need TreeHaver's grammar discovery
- Note: Use pboling's fork (tree_haver branch) until PRs #5, #7, #11, #13 are merged
Choose citrus directly when:
- You need zero native dependencies (pure Ruby)
- You're using a Citrus grammar (not tree-sitter grammars)
- Performance is less critical than portability
- You don't need TreeHaver's unified API
Compatibility
Compatible with MRI Ruby 3.2.0+ and concordant releases of JRuby and TruffleRuby.
| 🚚 Amazing test matrix was brought to you by | 🔎 appraisal2 🔎 and the color 💚 green 💚 |
|---|---|
| 👟 Check it out! | ✨ github.com/appraisal-rb/appraisal2 ✨ |
Federated DVCS
Find this repo on federated forges (Coming soon!)
| Federated [DVCS][💎d-in-dvcs] Repository | Status | Issues | PRs | Wiki | CI | Discussions |
|---|---|---|---|---|---|---|
| 🧪 [kettle-rb/tree_haver on GitLab][📜src-gl] | The Truth | [💚][🤝gl-issues] | [💚][🤝gl-pulls] | [💚][📜gl-wiki] | 🐭 Tiny Matrix | ➖ |
| 🧊 [kettle-rb/tree_haver on CodeBerg][📜src-cb] | An Ethical Mirror ([Donate][🤝cb-donate]) | [💚][🤝cb-issues] | [💚][🤝cb-pulls] | ➖ | ⭕️ No Matrix | ➖ |
| 🐙 [kettle-rb/tree_haver on GitHub][📜src-gh] | Another Mirror | [💚][🤝gh-issues] | [💚][🤝gh-pulls] | [💚][📜gh-wiki] | 💯 Full Matrix | [💚][gh-discussions] |
| 🎮️ [Discord Server][✉️discord-invite] | [![Live Chat on Discord][✉️discord-invite-img-ftb]][✉️discord-invite] | [Let's][✉️discord-invite] | [talk][✉️discord-invite] | [about][✉️discord-invite] | [this][✉️discord-invite] | [library!][✉️discord-invite] |
Enterprise Support
Available as part of the Tidelift Subscription.
Need enterprise-level guarantees?
The maintainers of this and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source packages you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact packages you use.
[![Get help from me on Tidelift][🏙️entsup-tidelift-img]][🏙️entsup-tidelift]
- 💡Subscribe for support guarantees covering _all_ your FLOSS dependencies
- 💡Tidelift is part of [Sonar][🏙️entsup-tidelift-sonar]
- 💡Tidelift pays maintainers to maintain the software you depend on!
📊`@`Pointy Haired Boss: An [enterprise support][🏙️entsup-tidelift] subscription is "[never gonna let you down][🧮kloc]", and _supports_ open source maintainers
Alternatively:
- [![Live Chat on Discord][✉️discord-invite-img-ftb]][✉️discord-invite]
- [![Get help from me on Upwork][👨🏼🏫expsup-upwork-img]][👨🏼🏫expsup-upwork]
- [![Get help from me on Codementor][👨🏼🏫expsup-codementor-img]][👨🏼🏫expsup-codementor]
✨ Installation
Install the gem and add to the application's Gemfile by executing:
bundle add tree_haver
If bundler is not being used to manage dependencies, install the gem by executing:
gem install tree_haver
🔒 Secure Installation
For Medium or High Security Installations
This gem is cryptographically signed, and has verifiable [SHA-256 and SHA-512][💎SHA_checksums] checksums by [stone_checksums][💎stone_checksums]. Be sure the gem you install hasn’t been tampered with by following the instructions below.
Add my public key (if you haven’t already, expires 2045-04-29) as a trusted certificate:
```console
gem cert --add <(curl -Ls https://raw.github.com/galtzo-floss/certs/main/pboling.pem)
```
You only need to do that once. Then proceed to install with:
```console
gem install tree_haver -P HighSecurity
```
The `HighSecurity` trust profile will verify signed gems, and not allow the installation of unsigned dependencies.
If you want to up your security game full-time:
```console
bundle config set --global trust-policy MediumSecurity
```
`MediumSecurity` instead of `HighSecurity` is necessary if not all the gems you use are signed.
NOTE: Be prepared to track down certs for signed gems and add them the same way you added mine.
⚙️ Configuration
Available Backends
TreeHaver supports 10 parsing backends, each with different trade-offs. The auto backend automatically selects the best available option.
Tree-sitter Backends (Universal Parsing)
| Backend | Description | Performance | Portability | Examples |
|---|---|---|---|---|
| Auto | Auto-selects best backend | Varies | ✅ Universal | JSON · JSONC · Bash · TOML |
| MRI | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | JSON · JSONC · ~~Bash~~* · TOML |
| Rust | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | JSON · JSONC · ~~Bash~~* · TOML |
| FFI | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | JSON · JSONC · Bash · TOML |
| Java | JNI bindings | ⚡ Very Fast | JRuby only | JSON · JSONC · Bash · TOML |
Language-Specific Backends (Native Parser Integration)
| Backend | Description | Performance | Portability | Examples |
|---|---|---|---|---|
| Prism | Ruby's official parser | ⚡ Very Fast | ✅ Universal | Ruby |
| Psych | Ruby's YAML parser (stdlib) | ⚡ Very Fast | ✅ Universal | YAML |
| Commonmarker | Markdown via comrak (Rust) | ⚡ Very Fast | ✅ Good | Markdown · Merge |
| Markly | GFM via cmark-gfm (C) | ⚡ Very Fast | ✅ Good | Markdown · Merge |
| Citrus | Pure Ruby parsing | 🟡 Slower | ✅ Universal | TOML · Finitio · Dhall |
Selection Priority (Auto mode): MRI → Rust → FFI → Java → Prism → Psych → Commonmarker → Markly → Citrus
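Conceptually, auto mode walks this list and keeps the first backend that can load. A minimal sketch of that idea in plain Ruby follows; `pick_backend` and the availability predicate are hypothetical names, not TreeHaver API, and the sketch ignores any per-language or per-platform filtering the real auto mode may apply.

```ruby
# Hypothetical sketch: NOT TreeHaver's implementation.
# Priority order copied from the "Selection Priority (Auto mode)" line above.
AUTO_PRIORITY = %i[mri rust ffi java prism psych commonmarker markly citrus].freeze

# Returns the first backend, in priority order, for which the availability
# predicate (any object responding to #call) returns true.
def pick_backend(available)
  AUTO_PRIORITY.find { |name| available.call(name) } ||
    raise(ArgumentError, "no parsing backend is available")
end

# Example: a JRuby-like environment where only FFI, Java, and Citrus load.
jruby_like = ->(name) { %i[ffi java citrus].include?(name) }
pick_backend(jruby_like) # => :ffi
```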
Known Issues:
- *MRI + Bash: ABI incompatibility (use FFI instead)
- *Rust + Bash: Version mismatch (use FFI instead)
Backend Requirements:
# Tree-sitter backends
gem "ruby_tree_sitter", "~> 2.0" # MRI backend
gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver" # Rust backend (fork until upstream PRs are merged)
gem "ffi", ">= 1.15", "< 2.0" # FFI backend
# Java backend: no gem required (uses JRuby's built-in JNI)
# Language-specific backends
gem "prism", "~> 1.0" # Ruby parsing (stdlib in Ruby 3.4+)
# Psych: no gem required (Ruby stdlib)
gem "commonmarker", ">= 0.23" # Markdown parsing (comrak)
gem "markly", "~> 0.11" # GFM parsing (cmark-gfm)
# Pure Ruby fallback
gem "citrus", "~> 3.0" # Citrus backend
# Plus grammar gems: toml-rb, dhall, finitio, etc.
Force Specific Backend:
# Tree-sitter backends
TreeHaver.backend = :mri # Force MRI backend (ruby_tree_sitter)
TreeHaver.backend = :rust # Force Rust backend (tree_stump)
TreeHaver.backend = :ffi # Force FFI backend
TreeHaver.backend = :java # Force Java backend (JRuby only)
# Language-specific backends
TreeHaver.backend = :prism # Force Prism (Ruby parsing)
TreeHaver.backend = :psych # Force Psych (YAML parsing)
TreeHaver.backend = :commonmarker # Force Commonmarker (Markdown)
TreeHaver.backend = :markly # Force Markly (GFM Markdown)
# Pure Ruby fallback
TreeHaver.backend = :citrus # Force Citrus backend
# Auto-selection (default)
TreeHaver.backend = :auto # Let TreeHaver choose
Block-based Backend Switching:
Use with_backend to temporarily switch backends for a specific block of code.
This is thread-safe and supports nesting—the previous backend is automatically
restored when the block exits (even if an exception is raised).
# Temporarily use a specific backend
TreeHaver.with_backend(:mri) do
parser = TreeHaver::Parser.new
tree = parser.parse(source)
# All operations in this block use the MRI backend
end
# Backend is restored to its previous value here
# Nested blocks work correctly
TreeHaver.with_backend(:rust) do
# Uses :rust
TreeHaver.with_backend(:citrus) do
# Uses :citrus
parser = TreeHaver::Parser.new
end
# Back to :rust
end
# Back to original backend
This is particularly useful for:
- Testing: Test the same code with different backends
- Performance comparison: Benchmark different backends
- Fallback scenarios: Try one backend, fall back to another
- Thread isolation: Each thread can use a different backend safely
# Example: Testing with multiple backends
[:mri, :rust, :citrus].each do |backend_name|
TreeHaver.with_backend(backend_name) do
parser = TreeHaver::Parser.new
result = parser.parse(source)
puts "#{backend_name}: #{result.root_node.type}"
end
end
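The nesting and restore-on-exception guarantees described above match a common Ruby idiom: save the old value, set the new one, and restore it in an `ensure` clause. A minimal sketch, assuming the current backend is kept in fiber-local storage via `Thread.current[]`; `BackendSwitch` and its storage key are hypothetical names, not TreeHaver internals.

```ruby
# Hypothetical sketch of a scoped backend switch; not TreeHaver's code.
module BackendSwitch
  KEY = :hypothetical_backend
  DEFAULT = :auto

  # Thread.current[] is fiber-local, so each thread (and fiber) sees its own value.
  def self.current
    Thread.current[KEY] || DEFAULT
  end

  # Sets the backend for the duration of the block, then restores the
  # previous value, even when the block raises. Returns the block's value.
  def self.with_backend(name)
    previous = Thread.current[KEY]
    Thread.current[KEY] = name
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

BackendSwitch.with_backend(:rust) do
  BackendSwitch.with_backend(:citrus) { BackendSwitch.current } # => :citrus
  BackendSwitch.current # => :rust
end
BackendSwitch.current # => :auto
```

Because the value lives in per-thread storage, a thread spawned inside a `with_backend` block starts from the default rather than inheriting the caller's backend.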
Check Backend Capabilities:
TreeHaver.backend # => :ffi
TreeHaver.backend_module # => TreeHaver::Backends::FFI
TreeHaver.capabilities # => { backend: :ffi, parse: true, query: false, ... }
See the examples/ directory for 26 complete working examples demonstrating all 10 backends with multiple languages (JSON, JSONC, Bash, TOML, Ruby, YAML, Markdown), plus markdown-merge integration examples.
Security Considerations
⚠️ Loading shared libraries (.so/.dylib/.dll) executes arbitrary native code.
TreeHaver provides defense-in-depth validations, but you should understand the risks:
Attack Vectors Mitigated
TreeHaver's PathValidator module protects against:
- Path traversal: Paths containing /../ or /./ are rejected
- Null byte injection: Paths containing null bytes are rejected
- Non-absolute paths: Relative paths are rejected to prevent CWD-based attacks
- Invalid extensions: Only .so, .dylib, and .dll files are accepted
- Malicious filenames: Filenames must match a safe pattern (alphanumeric, hyphens, underscores)
- Invalid language names: Language names must be lowercase alphanumeric with underscores
- Invalid symbol names: Symbol names must be valid C identifiers
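To make these rules concrete, here is a minimal sketch of how such checks can be expressed with Ruby's File helpers and regular expressions. `grammar_path_errors`, the helper predicates, the regexes, and the error strings are all hypothetical; this is not TreeHaver::PathValidator's actual implementation.

```ruby
# Hypothetical sketch of the checks listed above; not TreeHaver::PathValidator.
ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze
SAFE_BASENAME = /\A[A-Za-z0-9_-]+\z/       # letters, digits, hyphens, underscores
LANGUAGE_NAME = /\A[a-z0-9_]+\z/           # lowercase alphanumeric + underscores
C_IDENTIFIER  = /\A[A-Za-z_][A-Za-z0-9_]*\z/

def grammar_path_errors(path)
  # Check null bytes first: Ruby's File path helpers raise ArgumentError on them.
  return ["Path contains null byte"] if path.include?("\0")

  errors = []
  errors << "Path is not absolute" unless File.absolute_path?(path)
  errors << "Path contains traversal sequence" if path.include?("/../") || path.include?("/./")
  ext = File.extname(path)
  errors << "Invalid extension" unless ALLOWED_EXTENSIONS.include?(ext)
  errors << "Unsafe filename" unless SAFE_BASENAME.match?(File.basename(path, ext))
  errors
end

def valid_language_name?(name) = LANGUAGE_NAME.match?(name.to_s)
def valid_symbol_name?(symbol) = C_IDENTIFIER.match?(symbol.to_s)
```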
Secure Usage
# Standard usage - paths from ENV are validated
finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path # Validates ENV path before returning
# Maximum security - only trusted system directories
path = finder.find_library_path_safe # Ignores ENV, only /usr/lib etc.
# Manual validation
if TreeHaver::PathValidator.safe_library_path?(user_provided_path)
language = TreeHaver::Language.from_library(user_provided_path)
end
# Get validation errors for debugging
errors = TreeHaver::PathValidator.validation_errors(path)
# => ["Path is not absolute", "Path contains traversal sequence"]
Trusted Directories
The find_library_path_safe method only returns paths in trusted directories.
Default trusted directories:
- /usr/lib, /usr/lib64
- /usr/lib/x86_64-linux-gnu, /usr/lib/aarch64-linux-gnu
- /usr/local/lib
- /opt/homebrew/lib, /opt/local/lib
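A trusted-directory check is only safe if the path is normalized before the prefix comparison; otherwise a raw string such as /usr/lib/../../tmp/evil.so would appear to live under /usr/lib. A minimal sketch, assuming a prefix match on expanded paths; `in_trusted_directory?` is a hypothetical helper, not TreeHaver's method.

```ruby
# Hypothetical sketch; not TreeHaver's implementation.
DEFAULT_TRUSTED_DIRS = %w[
  /usr/lib /usr/lib64
  /usr/lib/x86_64-linux-gnu /usr/lib/aarch64-linux-gnu
  /usr/local/lib
  /opt/homebrew/lib /opt/local/lib
].freeze

# True when +path+, after normalization, sits inside one of +trusted_dirs+.
# File.expand_path collapses "." and ".." segments, and appending the separator
# stops "/usr/libevil" from matching the "/usr/lib" prefix.
def in_trusted_directory?(path, trusted_dirs = DEFAULT_TRUSTED_DIRS)
  expanded = File.expand_path(path)
  trusted_dirs.any? do |dir|
    expanded.start_with?(File.expand_path(dir) + File::SEPARATOR)
  end
end
```

File.expand_path does not resolve symlinks, and it turns relative paths into paths under the current directory, so a sketch like this should be combined with the absolute-path rule above; a stricter variant could compare File.realpath results for files that exist.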
Adding custom trusted directories:
For non-standard installations (Homebrew on Linux, luarocks, mise, asdf, etc.), register additional trusted directories:
# Programmatically at application startup
TreeHaver::PathValidator.add_trusted_directory("/home/linuxbrew/.linuxbrew/Cellar")
TreeHaver::PathValidator.add_trusted_directory("~/.local/share/mise/installs/lua")
# Or via environment variable (comma-separated, in your shell profile)
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"
Example: Fedora Silverblue with Homebrew and luarocks
# In ~/.bashrc or ~/.zshrc
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"
# tree-sitter runtime library
export TREE_SITTER_RUNTIME_LIB=/home/linuxbrew/.linuxbrew/Cellar/tree-sitter/0.26.3/lib/libtree-sitter.so
# Language grammar (luarocks-installed)
export TREE_SITTER_TOML_PATH=~/.local/share/mise/installs/lua/5.4.8/luarocks/lib/luarocks/rocks-5.4/tree-sitter-toml/0.0.31-1/parser/toml.so
Recommendations
- Production: Consider using find_library_path_safe to ignore ENV overrides
- Development: Standard find_library_path is convenient for testing
- User Input: Always validate paths before passing to Language.from_library
- CI/CD: Be cautious of ENV vars that could be set by untrusted sources
- Custom installs: Register trusted directories via TREE_HAVER_TRUSTED_DIRS or add_trusted_directory
Backend Selection
TreeHaver automatically selects the best backend for your Ruby implementation, but you can override this behavior:
# Automatic backend selection (default)
TreeHaver.backend = :auto
# Force a specific backend
TreeHaver.backend = :mri # Use ruby_tree_sitter (MRI only, C extension)
TreeHaver.backend = :rust # Use tree_stump (MRI, Rust extension with precompiled binaries)
# Note: Requires pboling's fork until PRs #5, #7, #11, #13 are merged
# See: https://github.com/pboling/tree_stump/tree/tree_haver
TreeHaver.backend = :ffi # Use FFI bindings (works on MRI and JRuby)
TreeHaver.backend = :java # Use Java bindings (JRuby only)
TreeHaver.backend = :citrus # Use Citrus pure Ruby parser
# NOTE: Portable, all Ruby implementations
# CAVEAT: few major language grammars, but many esoteric grammars
Auto-selection priority on MRI: MRI → Rust → FFI → Citrus
You can also set the backend via environment variable:
export TREE_HAVER_BACKEND=rust
Environment Variables
TreeHaver recognizes several environment variables for configuration:
Note: All path-based environment variables are validated before use. Invalid paths are ignored.
Security Configuration
TREE_HAVER_TRUSTED_DIRS: Comma-separated list of additional trusted directories for grammar libraries
# For Homebrew on Linux and luarocks
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"
Tilde (~) is expanded to the user's home directory. Directories listed here are considered safe for find_library_path_safe.
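Parsing such a variable takes only a few lines of Ruby. A minimal sketch, assuming surrounding whitespace and empty entries are ignored; `trusted_dirs_from_env` is a hypothetical helper, not part of TreeHaver.

```ruby
# Hypothetical sketch; not TreeHaver's implementation.
# Splits a comma-separated directory list and expands "~" to the home directory.
def trusted_dirs_from_env(env = ENV)
  env.fetch("TREE_HAVER_TRUSTED_DIRS", "")
    .split(",")
    .map(&:strip)
    .reject(&:empty?)
    .map { |dir| File.expand_path(dir) }
end
```

Passing the environment as an argument (defaulting to ENV) keeps the helper easy to exercise with a plain Hash in tests.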
Core Runtime Library
TREE_SITTER_RUNTIME_LIB: Absolute path to the core libtree-sitter shared library
export TREE_SITTER_RUNTIME_LIB=/usr/local/lib/libtree-sitter.so
If not set, TreeHaver tries these names in order:
- tree-sitter
- libtree-sitter.so.0
- libtree-sitter.so
- libtree-sitter.dylib
- libtree-sitter.dll
Language Symbol Resolution
When loading a language grammar, if you don't specify the symbol: parameter, TreeHaver resolves it in this precedence:
- TREE_SITTER_LANG_SYMBOL: Explicit symbol override
- Guessed from filename (e.g., libtree-sitter-toml.so → tree_sitter_toml)
- Default fallback (tree_sitter_toml)
export TREE_SITTER_LANG_SYMBOL=tree_sitter_toml
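The first two steps of that precedence can be sketched as follows. `guess_symbol` is hypothetical, and its filename rules (strip the extension and any numeric version suffix, drop the `lib` and `tree-sitter-` prefixes, turn hyphens into underscores) are assumptions chosen to reproduce the documented libtree-sitter-toml.so → tree_sitter_toml example, not TreeHaver's exact logic.

```ruby
# Hypothetical sketch of the symbol precedence; not TreeHaver's implementation.
def guess_symbol(library_path, env: ENV)
  # 1. An explicit override wins.
  override = env["TREE_SITTER_LANG_SYMBOL"]
  return override unless override.nil? || override.empty?

  # 2. Derive from the filename: strip the extension (plus any ".0"-style
  #    version suffix), then the "lib" and "tree-sitter-" prefixes.
  base = File.basename(library_path).sub(/\.(so|dylib|dll)(\.\d+)*\z/, "")
  lang = base.sub(/\Alib/, "").sub(/\Atree-sitter-/, "")
  "tree_sitter_#{lang.tr("-", "_")}"
end

guess_symbol("/usr/local/lib/libtree-sitter-toml.so", env: {}) # => "tree_sitter_toml"
```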
Language Library Paths
For specific languages, you can set environment variables to point to grammar libraries:
export TREE_SITTER_TOML_PATH=/usr/local/lib/libtree-sitter-toml.so
export TREE_SITTER_JSON_PATH=/usr/local/lib/libtree-sitter-json.so
JRuby-Specific: Java Backend JARs
For the Java backend on JRuby:
export TREE_SITTER_JAVA_JARS_DIR=/path/to/java-tree-sitter/jars
Language Registration
Register languages once at application startup for convenient access:
# Register a TOML grammar
TreeHaver.register_language(
:toml,
path: "/usr/local/lib/libtree-sitter-toml.so",
symbol: "tree_sitter_toml", # optional, will be inferred if omitted
)
# Now you can use the convenient helper
language = TreeHaver::Language.toml
# Or still override path/symbol per-call
language = TreeHaver::Language.toml(
path: "/custom/path/libtree-sitter-toml.so",
)
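Registration plus the TreeHaver::Language.toml helper suggests a name-keyed registry guarded by a lock, with dynamic method lookup. The sketch below illustrates that shape using a Mutex-protected Hash and method_missing. `LanguageRegistry` and its methods are hypothetical, and defaulting the symbol to tree_sitter_<name> is an assumption based on the "will be inferred if omitted" comment above.

```ruby
# Hypothetical sketch; not TreeHaver's registry.
class LanguageRegistry
  Entry = Struct.new(:name, :path, :symbol, keyword_init: true)

  def initialize
    @mutex = Mutex.new
    @entries = {}
  end

  # Stores (or replaces) an entry; the symbol defaults to tree_sitter_<name>.
  def register(name, path:, symbol: nil)
    entry = Entry.new(name: name.to_sym, path: path, symbol: symbol || "tree_sitter_#{name}")
    @mutex.synchronize { @entries[entry.name] = entry }
  end

  def registered?(name)
    @mutex.synchronize { @entries.key?(name.to_sym) }
  end

  # Looks up an entry; per-call keywords (path:, symbol:) override it
  # without mutating the stored registration.
  def fetch(name, **overrides)
    entry = @mutex.synchronize { @entries[name.to_sym] }
    raise KeyError, "language #{name.inspect} is not registered" unless entry

    Entry.new(**entry.to_h.merge(overrides))
  end

  # Enables registry.toml / registry.toml(path: "...") style helpers.
  def method_missing(name, *args, **overrides, &block)
    registered?(name) ? fetch(name, **overrides) : super
  end

  def respond_to_missing?(name, include_private = false)
    registered?(name) || super
  end
end
```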
Grammar Discovery with GrammarFinder
For libraries that need to automatically locate tree-sitter grammars (like the *-merge family of gems), TreeHaver provides the GrammarFinder utility class. It handles platform-aware grammar discovery without requiring language-specific code in TreeHaver itself.
# Create a finder for any language
finder = TreeHaver::GrammarFinder.new(:toml)
# Check if the grammar is available
if finder.available?
puts "TOML grammar found at: #{finder.find_library_path}"
else
puts finder.
# => "tree-sitter toml grammar not found. Searched: /usr/lib/libtree-sitter-toml.so, ..."
end
# Register the language if available
finder.register! if finder.available?
# Now use the registered language
language = TreeHaver::Language.toml
GrammarFinder Automatic Derivation
Given just the language name, GrammarFinder automatically derives:
| Property | Derived Value (for :toml) |
|---|---|
| ENV var | TREE_SITTER_TOML_PATH |
| Library filename | libtree-sitter-toml.so (Linux) or .dylib (macOS) |
| Symbol name | tree_sitter_toml |
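The derivations in this table are simple string transformations. A minimal sketch, covering only the Linux (.so) and macOS (.dylib) cases listed above; `derive_grammar_names` is a hypothetical helper, not GrammarFinder's API.

```ruby
require "rbconfig"

# Hypothetical sketch of the derivations in the table above; not TreeHaver's code.
def derive_grammar_names(lang, host_os: RbConfig::CONFIG["host_os"])
  name = lang.to_s
  ext = host_os.match?(/darwin/) ? ".dylib" : ".so"

  {
    env_var: "TREE_SITTER_#{name.upcase}_PATH",
    library_filename: "libtree-sitter-#{name}#{ext}",
    symbol: "tree_sitter_#{name}",
  }
end

derive_grammar_names(:toml, host_os: "linux-gnu")
# => {env_var: "TREE_SITTER_TOML_PATH", library_filename: "libtree-sitter-toml.so", symbol: "tree_sitter_toml"}
```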
Search Order
GrammarFinder searches for grammars in this order:
- Environment variable: TREE_SITTER_<LANG>_PATH (highest priority)
- Extra paths: Custom paths provided at initialization
- System paths: Common installation directories (/usr/lib, /usr/local/lib, /opt/homebrew/lib, etc.)
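The search order can be sketched as building an ordered candidate list and returning the first file that exists. `find_grammar` and its keyword arguments are hypothetical; unlike the real GrammarFinder, the sketch does not validate the ENV value with PathValidator.

```ruby
# Hypothetical sketch of the search order above; not TreeHaver's code.
DEFAULT_SYSTEM_DIRS = %w[/usr/lib /usr/local/lib /opt/homebrew/lib].freeze

def find_grammar(lang, extra_paths: [], env: ENV, system_dirs: DEFAULT_SYSTEM_DIRS, ext: ".so")
  filename = "libtree-sitter-#{lang}#{ext}"
  candidates = []

  # 1. TREE_SITTER_<LANG>_PATH names a library file directly.
  env_value = env["TREE_SITTER_#{lang.to_s.upcase}_PATH"]
  candidates << env_value unless env_value.nil? || env_value.empty?

  # 2. Caller-supplied directories, then 3. common system directories.
  candidates.concat((extra_paths + system_dirs).map { |dir| File.join(dir, filename) })

  # The first candidate that exists as a regular file wins; nil if none do.
  candidates.find { |path| File.file?(path) }
end
```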
Usage in *-merge Gems
The GrammarFinder pattern enables clean integration in language-specific merge gems:
# In toml-merge
finder = TreeHaver::GrammarFinder.new(:toml)
finder.register! if finder.available?
# In json-merge
finder = TreeHaver::GrammarFinder.new(:json)
finder.register! if finder.available?
# In bash-merge
finder = TreeHaver::GrammarFinder.new(:bash)
finder.register! if finder.available?
Each gem uses the same API—only the language name changes.
Adding Custom Search Paths
For non-standard installations, provide extra search paths:
finder = TreeHaver::GrammarFinder.new(:toml, extra_paths: [
"/opt/custom/lib",
"/home/user/.local/lib",
])
Debug Information
Get detailed information about the grammar search:
finder = TreeHaver::GrammarFinder.new(:toml)
puts finder.search_info
# => {
# language: :toml,
# env_var: "TREE_SITTER_TOML_PATH",
# env_value: nil,
# symbol: "tree_sitter_toml",
# library_filename: "libtree-sitter-toml.so",
# search_paths: ["/usr/lib/libtree-sitter-toml.so", ...],
# found_path: "/usr/lib/libtree-sitter-toml.so",
# available: true
# }
Checking Capabilities
Different backends may support different features:
TreeHaver.capabilities
# => { backend: :mri, query: true, bytes_field: true }
# or
# => { backend: :ffi, parse: true, query: false, bytes_field: true }
# or
# => { backend: :citrus, parse: true, query: false, bytes_field: false }
Compatibility Mode
For codebases migrating from ruby_tree_sitter, TreeHaver provides a compatibility shim:
require "tree_haver/compat"
# Now TreeSitter constants map to TreeHaver
parser = TreeSitter::Parser.new # Actually creates TreeHaver::Parser
This is safe and idempotent—if the real TreeSitter module is already loaded, the shim does nothing.
⚠️ Important: Exception Hierarchy
Both ruby_tree_sitter v2+ and TreeHaver exceptions inherit from Exception (not StandardError).
This design decision follows ruby_tree_sitter's lead for thread-safety and signal handling reasons. See ruby_tree_sitter PR #83 for the rationale.
What this means for exception handling:
# ⚠️ This will NOT catch TreeHaver errors
begin
TreeHaver::Language.from_library("/nonexistent.so")
rescue => e
puts "Caught!" # Never reached - TreeHaver::Error inherits Exception
end
# ✅ Explicit rescue is required
begin
TreeHaver::Language.from_library("/nonexistent.so")
rescue TreeHaver::Error => e
puts "Caught!" # This works
end
# ✅ Or rescue specific exceptions
begin
TreeHaver::Language.from_library("/nonexistent.so")
rescue TreeHaver::NotAvailable => e
puts "Grammar not available: #{e.message}"
end
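This behaviour is plain Ruby semantics: a bare `rescue` (or `rescue => e`) only matches StandardError and its subclasses. The sketch below demonstrates it without TreeHaver, using a hypothetical stand-in class `DemoParseError` that inherits directly from Exception.

```ruby
# Stand-in class for illustration only; TreeHaver defines its own error classes.
class DemoParseError < Exception; end # rubocop:disable Lint/InheritException

# Reports which rescue clause handled the error raised by the block.
def which_rescue
  begin
    yield
  rescue # a bare rescue only matches StandardError and its subclasses
    return :bare_rescue
  end
  :no_error
rescue DemoParseError
  :explicit_rescue
end

which_rescue { raise DemoParseError, "grammar missing" } # => :explicit_rescue
which_rescue { raise ArgumentError, "bad input" }        # => :bare_rescue
```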
TreeHaver Exception Hierarchy:
Exception
Compatibility Mode Behavior:
The compat mode (require "tree_haver/compat") creates aliases but does not change the exception hierarchy:
require "tree_haver/compat"
# TreeSitter constants are now aliases to TreeHaver
TreeSitter::Error # => TreeHaver::Error (still inherits Exception)
TreeSitter::Parser # => TreeHaver::Parser
TreeSitter::Language # => TreeHaver::Language
# Exception handling remains the same
begin
TreeSitter::Language.load("missing", "/nonexistent.so")
rescue TreeSitter::Error => e # Still requires explicit rescue
puts "Error: #{e.message}"
end
Best Practices:
Always use explicit rescue for TreeHaver errors:
begin
  finder = TreeHaver::GrammarFinder.new(:toml)
  finder.register! if finder.available?
  language = TreeHaver::Language.toml
rescue TreeHaver::NotAvailable => e
  warn("TOML grammar not available: #{e.message}")
  # Fallback to another backend or fail gracefully
end
Never rely on rescue => e to catch TreeHaver errors (it won't work)
Why inherit from Exception?
Following ruby_tree_sitter's reasoning:
- Thread safety: Prevents accidental catching in thread cleanup code
- Signal handling: Ensures parsing errors don't interfere with SIGTERM/SIGINT
- Intentional handling: Forces developers to explicitly handle parsing errors
See lib/tree_haver/compat.rb for compatibility layer documentation.
🔧 Basic Usage
Quick Start
TreeHaver supports many languages through its 10 backends. Here are examples for different parsing needs:
Parsing with Tree-sitter (Universal Languages)
require "tree_haver"
# Load a tree-sitter grammar (works with MRI, Rust, FFI, or Java backend)
language = TreeHaver::Language.from_library(
"/usr/local/lib/libtree-sitter-toml.so",
symbol: "tree_sitter_toml",
)
# Create a parser
parser = TreeHaver::Parser.new
parser.language = language
# Parse source code
source = "[package]\nname = \"my-app\"\nversion = \"1.0.0\"\n"
tree = parser.parse(source)
# Access the unified Position API (works across all backends)
root = tree.root_node
puts "Root type: #{root.type}" # => "document"
puts "Start line: #{root.start_line}" # => 1 (1-based)
puts "End line: #{root.end_line}" # => 3
puts "Position: #{root.source_position}" # => {start_line: 1, end_line: 3, ...}
# Traverse the tree
root.each do |child|
puts "Child: #{child.type} at line #{child.start_line}"
end
Parsing Ruby with Prism
require "tree_haver"
TreeHaver.backend = :prism
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Prism::Language.ruby
source = "class Example\n def hello\n puts \"Hello, world!\"\n end\nend\n"
tree = parser.parse(source)
root = tree.root_node
# Find all method definitions
def find_methods(node, results = [])
results << node if node.type == "def_node"
node.children.each { |child| find_methods(child, results) }
results
end
methods = find_methods(root)
methods.each do |method_node|
pos = method_node.source_position
puts "Method at lines #{pos[:start_line]}-#{pos[:end_line]}"
end
Parsing YAML with Psych
require "tree_haver"
TreeHaver.backend = :psych
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Psych::Language.yaml
source = "database:\n host: localhost\n port: 5432\n"
tree = parser.parse(source)
root = tree.root_node
# Navigate YAML structure
def show_structure(node, indent = 0)
prefix = " " * indent
puts "#{prefix}#{node.type} (line #{node.start_line})"
node.children.each { |child| show_structure(child, indent + 1) }
end
show_structure(root)
Parsing Markdown with Commonmarker or Markly
require "tree_haver"
# Choose your backend
TreeHaver.backend = :commonmarker # or :markly for GFM
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Commonmarker::Language.markdown
source = "# My Document\n\n## Section\n\n- Item 1\n- Item 2\n"
tree = parser.parse(source)
root = tree.root_node
# Find all headings
def find_headings(node, results = [])
results << node if node.type == "heading"
node.children.each { |child| find_headings(child, results) }
results
end
headings = find_headings(root)
headings.each do |heading|
level = heading.header_level
text = heading.children.map(&:text).join
puts "H#{level}: #{text} (line #{heading.start_line})"
end
Using Language Registration
For cleaner code, register languages at startup:
# At application initialization
TreeHaver.register_language(
:toml,
path: "/usr/local/lib/libtree-sitter-toml.so",
)
TreeHaver.register_language(
:json,
path: "/usr/local/lib/libtree-sitter-json.so",
)
# Later in your code
toml_language = TreeHaver::Language.toml
json_language = TreeHaver::Language.json
parser = TreeHaver::Parser.new
parser.language = toml_language
tree = parser.parse(toml_source)
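The absolute paths above are only examples. Shared-library locations and extensions differ between systems (for example `.so` on Linux and `.dylib` on macOS). A small helper like the hypothetical `find_grammar_library` below can pick the first candidate that exists before you call `register_language`. The `libtree-sitter-NAME` naming and the search directories are assumptions, so adjust them for your setup.

```ruby
require "rbconfig"

# Hypothetical helper: return the first existing grammar library path, or nil.
# Assumes Linux/macOS naming (libtree-sitter-NAME.so / .dylib); Windows is not handled.
def find_grammar_library(name, dirs: ["/usr/local/lib", "/usr/lib", "/opt/homebrew/lib"])
  ext = RbConfig::CONFIG["host_os"].match?(/darwin/) ? "dylib" : "so"
  dirs
    .map { |dir| File.join(dir, "libtree-sitter-#{name}.#{ext}") }
    .find { |path| File.file?(path) }
end

p find_grammar_library("toml") # a path string, or nil if nothing was found
```

You could then guard registration with something like `path = find_grammar_library("toml")` followed by `TreeHaver.register_language(:toml, path: path) if path`.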
Flexible Language Names
The name parameter in register_language is an arbitrary identifier you choose—it doesn't
need to match the actual language name. The actual grammar identity comes from the path
and symbol parameters (for tree-sitter) or grammar_module (for Citrus).
This flexibility is useful for:
- Aliasing: Register the same grammar under multiple names
- Versioning: Register different grammar versions (e.g., :ruby_2, :ruby_3)
- Testing: Use unique names to avoid collisions between tests
- Context-specific naming: Use names that make sense for your application
# Register the same TOML grammar under different names for different purposes
TreeHaver.register_language(
:config_parser, # Custom name for your app
path: "/usr/local/lib/libtree-sitter-toml.so",
symbol: "tree_sitter_toml",
)
TreeHaver.register_language(
:toml_v1, # Version-specific name
path: "/usr/local/lib/libtree-sitter-toml.so",
symbol: "tree_sitter_toml",
)
# Use your custom names
config_lang = TreeHaver::Language.config_parser
versioned_lang = TreeHaver::Language.toml_v1
Parsing Different Languages
TreeHaver works with any tree-sitter grammar:
# Parse Ruby code
ruby_lang = TreeHaver::Language.from_library(
"/path/to/libtree-sitter-ruby.so",
)
parser = TreeHaver::Parser.new
parser.language = ruby_lang
tree = parser.parse("class Foo; end")
# Parse JavaScript
js_lang = TreeHaver::Language.from_library(
"/path/to/libtree-sitter-javascript.so",
)
parser.language = js_lang # Reuse the same parser
tree = parser.parse("const x = 42;")
Walking the AST
TreeHaver provides simple node traversal:
tree = parser.parse(source)
root = tree.root_node
# Recursive tree walk
def walk_tree(node, depth = 0)
puts "#{" " * depth}#{node.type}"
node.each { |child| walk_tree(child, depth + 1) }
end
walk_tree(root)
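The recursive `walk_tree` above is the simplest approach, but very deep trees (for example, long chains of nested expressions) can exhaust Ruby's call stack. Here is a sketch of the same depth-first, pre-order walk using an explicit stack. It assumes only that each node responds to `type` and `children`, as the earlier examples do. The `DemoNode` struct is a stand-in so the snippet runs without a parser.

```ruby
# Depth-first, pre-order walk using an explicit stack instead of recursion.
# Yields each node with its depth; assumes nodes respond to #children.
def each_node(root)
  return enum_for(:each_node, root) unless block_given?

  stack = [[root, 0]]
  until stack.empty?
    node, depth = stack.pop
    yield node, depth
    # Push children in reverse so the leftmost child is visited first.
    node.children.reverse_each { |child| stack.push([child, depth + 1]) }
  end
end

# Stand-in nodes for demonstration only.
DemoNode = Struct.new(:type, :children)
root = DemoNode.new("document", [
  DemoNode.new("table", [DemoNode.new("pair", [])]),
  DemoNode.new("comment", []),
])

each_node(root) { |node, depth| puts "#{"  " * depth}#{node.type}" }
```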
Incremental Parsing
TreeHaver supports incremental parsing when using the MRI, Rust, or Java backends. This is a major performance optimization for editors and IDEs that need to re-parse on every keystroke.
# Check if current backend supports incremental parsing
if TreeHaver.capabilities[:incremental]
puts "Incremental parsing is available!"
end
# Initial parse
parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse_string(nil, "x = 1")
# User edits the source: "x = 1" -> "x = 42"
# Mark the tree as edited (tell tree-sitter what changed)
tree.edit(
start_byte: 4, # edit starts at byte 4
old_end_byte: 5, # old text "1" ended at byte 5
new_end_byte: 6, # new text "42" ends at byte 6
start_point: {row: 0, column: 4},
old_end_point: {row: 0, column: 5},
new_end_point: {row: 0, column: 6},
)
# Re-parse incrementally - tree-sitter reuses unchanged nodes
new_tree = parser.parse_string(tree, "x = 42")
Note: Incremental parsing requires the MRI (ruby_tree_sitter), Rust (tree_stump), or Java (java-tree-sitter) backend. The FFI and Citrus backends do not currently support incremental parsing. You can check support with:
tree.supports_editing? # => true if edit() is available
Note: tree_stump requires pboling's fork (tree_haver branch) until PRs #5, #7, #11, #13 are merged.
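Computing the `tree.edit` arguments by hand is error-prone, especially because tree-sitter measures offsets and columns in bytes rather than characters. The hypothetical helper below derives all six values from the old source, the byte range being replaced, and the replacement text. It is a plain-Ruby sketch and does not call TreeHaver itself.

```ruby
# Convert a byte offset into a tree-sitter style point: 0-based row and
# 0-based column, where the column is counted in bytes, not characters.
def point_at(source, byte_offset)
  prefix = source.b.byteslice(0, byte_offset) # binary copy: char index == byte index
  newline = prefix.rindex("\n")
  column = newline ? byte_offset - (newline + 1) : byte_offset
  { row: prefix.count("\n"), column: column }
end

# Hypothetical helper: describe replacing old_source's bytes
# [start_byte, old_end_byte) with new_text. Returns [edit_hash, new_source].
def edit_for_replacement(old_source, start_byte, old_end_byte, new_text)
  bytes = old_source.b
  new_source = bytes.byteslice(0, start_byte) + new_text.b +
    bytes.byteslice(old_end_byte, bytes.bytesize - old_end_byte)
  new_source.force_encoding(old_source.encoding)
  new_end_byte = start_byte + new_text.bytesize

  edit = {
    start_byte: start_byte,
    old_end_byte: old_end_byte,
    new_end_byte: new_end_byte,
    start_point: point_at(old_source, start_byte),
    old_end_point: point_at(old_source, old_end_byte),
    new_end_point: point_at(new_source, new_end_byte),
  }
  [edit, new_source]
end

edit, new_source = edit_for_replacement("x = 1", 4, 5, "42")
p new_source # => "x = 42"
p edit
```

For the "x = 1" to "x = 42" change, this reproduces the values passed to `tree.edit` in the example above. Because that call takes keyword arguments, you could pass the hash as `tree.edit(**edit)` and then re-parse with `parser.parse_string(tree, new_source)`.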
Error Handling
begin
language = TreeHaver::Language.from_library("/path/to/grammar.so")
rescue TreeHaver::NotAvailable => e
puts "Failed to load grammar: #{e.message}"
end
# Check if a backend is available
if TreeHaver.backend_module.nil?
puts "No TreeHaver backend is available!"
puts "Install ruby_tree_sitter (MRI), ffi gem with libtree-sitter, or citrus gem"
end
Platform-Specific Examples
MRI Ruby
On MRI, TreeHaver uses ruby_tree_sitter by default:
# Gemfile
gem "tree_haver"
gem "ruby_tree_sitter" # MRI backend
# Code - no changes needed, TreeHaver auto-selects MRI backend
parser = TreeHaver::Parser.new
JRuby
On JRuby, TreeHaver can use the FFI backend, Java backend, or Citrus backend:
Option 1: FFI Backend (recommended for tree-sitter grammars)
# Gemfile
gem "tree_haver"
gem "ffi" # Required for FFI backend
# Ensure libtree-sitter is installed on your system
# On macOS with Homebrew:
# brew install tree-sitter
# On Ubuntu/Debian:
# sudo apt-get install libtree-sitter0 libtree-sitter-dev
# Code - TreeHaver auto-selects FFI backend on JRuby
parser = TreeHaver::Parser.new
Option 2: Java Backend (native JVM performance)
# 1. Download java-tree-sitter JAR from Maven Central
mkdir -p vendor/jars
curl -fSL -o vendor/jars/jtreesitter-0.23.2.jar \
"https://repo1.maven.org/maven2/io/github/tree-sitter/jtreesitter/0.23.2/jtreesitter-0.23.2.jar"
# 2. Set environment variables
export CLASSPATH="$(pwd)/vendor/jars:$CLASSPATH"
export LD_LIBRARY_PATH="/path/to/libtree-sitter/lib:$LD_LIBRARY_PATH"
# 3. Run with JRuby (requires Java 22+ for Foreign Function API)
JAVA_OPTS="--enable-native-access=ALL-UNNAMED" jruby your_script.rb
# Force Java backend
TreeHaver.backend = :java
# Check if Java backend is available
if TreeHaver::Backends::Java.available?
puts "Java backend is ready!"
puts TreeHaver.capabilities
# => { backend: :java, parse: true, query: true, bytes_field: true, incremental: true }
end
⚠️ Java Backend Limitation: Symbol Resolution
The Java backend uses Java's Foreign Function & Memory (FFM) API which loads libraries in isolation. Unlike the system's dynamic linker (dlopen), FFM's SymbolLookup.or() chains symbol lookups but doesn't resolve dynamic library dependencies.
This means grammar .so files with unresolved references to libtree-sitter.so symbols won't load correctly. Most grammars from luarocks, npm, or other sources have these dependencies.
Recommended approach for JRuby: Use the FFI backend:
# On JRuby, use FFI backend (recommended)
TreeHaver.backend = :ffi
The FFI backend uses Ruby's FFI gem which relies on the system's dynamic linker, correctly resolving symbol dependencies between libtree-sitter.so and grammar libraries.
The Java backend will work with:
- Grammar JARs built specifically for java-tree-sitter (self-contained)
- Grammar .so files that statically link tree-sitter
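To check whether a particular grammar library falls into the problematic category, you can list its undefined dynamic symbols with `nm` (GNU binutils, on Linux) and look for tree-sitter runtime symbols, which use the `ts_` prefix. The sketch below is a diagnostic aid built on those assumptions and is not part of TreeHaver. `grammar_tree_sitter_imports` is a hypothetical wrapper that shells out to `nm -D --undefined-only`, and `tree_sitter_imports` filters its output.

```ruby
require "open3"

# Extract undefined symbols with tree-sitter's "ts_" prefix from `nm` output.
# Each line ends with the symbol name, possibly with a "@VERSION" suffix.
def tree_sitter_imports(nm_output)
  nm_output.each_line
    .map { |line| line.split.last.to_s.sub(/@.*\z/, "") }
    .select { |symbol| symbol.start_with?("ts_") }
    .uniq
end

# Hypothetical wrapper: run GNU nm on a grammar library (Linux only).
def grammar_tree_sitter_imports(path)
  output, status = Open3.capture2("nm", "-D", "--undefined-only", path)
  raise "nm failed for #{path}" unless status.success?

  tree_sitter_imports(output)
end
```

A non-empty result suggests the grammar relies on symbols from libtree-sitter.so being resolved by the dynamic linker, which is the situation described above. An empty result does not by itself guarantee that the Java backend will load the grammar.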
Option 3: Citrus Backend (pure Ruby, portable)
# Gemfile
gem "tree_haver"
gem "citrus" # Pure Ruby parser, zero native dependencies
# Code - Force Citrus backend for maximum portability
TreeHaver.backend = :citrus
# Check if Citrus backend is available
if TreeHaver::Backends::Citrus.available?
puts "Citrus backend is ready!"
puts TreeHaver.capabilities
# => { backend: :citrus, parse: true, query: false, bytes_field: false }
end
⚠️ Citrus Backend Limitations:
- Uses Citrus grammars (not tree-sitter grammars)
- No incremental parsing support
- No query API
- Pure Ruby performance (slower than native backends)
- Best for: prototyping, environments without native extension support, teaching
TruffleRuby
TruffleRuby can use the MRI, FFI, or Citrus backend:
# Use FFI backend (recommended for tree-sitter grammars)
TreeHaver.backend = :ffi
# Or try MRI backend if ruby_tree_sitter compiles on your TruffleRuby version
TreeHaver.backend = :mri
# Or use Citrus backend for zero native dependencies
TreeHaver.backend = :citrus
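If you prefer to choose the backend explicitly rather than rely on auto-selection, `RUBY_ENGINE` tells you which implementation is running (`"ruby"` for MRI, `"jruby"`, or `"truffleruby"`). The hypothetical helper below encodes the recommendations from this section as ordered lists. The ordering for MRI after `:mri` is an assumption.

```ruby
# Hypothetical helper: backend preference order per Ruby implementation,
# following this README's recommendations. The MRI order after :mri is assumed.
BACKEND_PREFERENCES = {
  "ruby" => [:mri, :rust, :ffi, :citrus],
  "jruby" => [:ffi, :java, :citrus],
  "truffleruby" => [:ffi, :mri, :citrus],
}.freeze

def preferred_backends(engine = RUBY_ENGINE)
  # Unknown engines fall back to the pure-Ruby Citrus backend only.
  BACKEND_PREFERENCES.fetch(engine, [:citrus])
end

p preferred_backends # the list for the Ruby you are running
```

For example, `TreeHaver.backend = preferred_backends.first` would pin the top choice for the running implementation.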
Advanced: Thread-Safe Backend Switching
TreeHaver provides with_backend for thread-safe, temporary backend switching. This is
essential for testing, benchmarking, and applications that need different backends in
different contexts.
Testing with Multiple Backends
Test the same code path with different backends using with_backend:
# In your test setup
RSpec.describe("MyParser") do
# Test with each available backend
[:mri, :rust, :citrus].each do |backend_name|
context "with #{backend_name} backend" do
it "parses correctly" do
TreeHaver.with_backend(backend_name) do
parser = TreeHaver::Parser.new
result = parser.parse("x = 42")
expect(result.root_node.type).to(eq("document"))
end
# Backend automatically restored after block
end
end
end
end
Thread Isolation
Each thread can use a different backend safely—with_backend uses thread-local storage:
threads = []
threads << Thread.new do
TreeHaver.with_backend(:mri) do
# This thread uses MRI backend
parser = TreeHaver::Parser.new
100.times { parser.parse("x = 1") }
end
end
threads << Thread.new do
TreeHaver.with_backend(:citrus) do
# This thread uses Citrus backend simultaneously
parser = TreeHaver::Parser.new
100.times { parser.parse("x = 1") }
end
end
threads.each(&:join)
Nested Blocks
with_backend supports nesting—inner blocks override outer blocks:
TreeHaver.with_backend(:rust) do
puts TreeHaver.effective_backend # => :rust
TreeHaver.with_backend(:citrus) do
puts TreeHaver.effective_backend # => :citrus
end
puts TreeHaver.effective_backend # => :rust (restored)
end
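For intuition about how a scoped, per-thread override like this can work, here is a minimal plain-Ruby sketch. It is not TreeHaver's implementation, and `BackendOverride` is a hypothetical module. It stores the override in `Thread.current[]` (which Ruby actually scopes per fiber) and uses `ensure` to restore the previous value. That gives both nesting and restoration after exceptions.

```ruby
# Minimal sketch of a scoped, per-thread setting (not TreeHaver's code).
module BackendOverride
  KEY = :backend_override_demo

  # Current override for this thread (fiber), or the given default.
  def self.current(default = :auto)
    Thread.current[KEY] || default
  end

  # Set the override for the duration of the block, then restore the
  # previous value, even if the block raises. Nested calls stack naturally.
  def self.with(name)
    previous = Thread.current[KEY]
    Thread.current[KEY] = name
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

BackendOverride.with(:rust) do
  puts BackendOverride.current # prints rust
  BackendOverride.with(:citrus) { puts BackendOverride.current } # prints citrus
  puts BackendOverride.current # prints rust
end
puts BackendOverride.current # prints auto
```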
Fallback Pattern
Try one backend, fall back to another on failure:
def parse_with_fallback(source)
TreeHaver.with_backend(:mri) do
TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
end
rescue TreeHaver::NotAvailable
# Fall back to Citrus if MRI backend unavailable
TreeHaver.with_backend(:citrus) do
TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
end
end
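The same idea generalizes to an ordered list of backends. The helper below is a plain-Ruby sketch and not a TreeHaver API. It calls the block once per candidate, returns the first result, and re-raises the last error if every candidate fails. To keep it runnable on its own, the error class to rescue is a parameter, and `DemoUnavailable` simulates missing backends. With TreeHaver you would presumably pass `TreeHaver::NotAvailable` and run the parse inside `TreeHaver.with_backend(backend)` in the block.

```ruby
# Try each candidate in order; return the first successful block result.
# Only errors of the given class trigger a fallback; others propagate.
def first_successful(candidates, rescue_from: StandardError)
  raise ArgumentError, "no candidates given" if candidates.empty?

  last_error = nil
  candidates.each do |candidate|
    return yield(candidate)
  rescue rescue_from => e
    last_error = e
  end
  raise last_error
end

# Stand-in error for this demo.
class DemoUnavailable < StandardError; end

result = first_successful([:mri, :rust, :citrus], rescue_from: DemoUnavailable) do |backend|
  # Simulate only the Citrus backend being installed.
  raise DemoUnavailable, "#{backend} not installed" unless backend == :citrus
  "parsed with #{backend}"
end
puts result # prints: parsed with citrus
```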
Complete Real-World Example
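Here's a simplified example showing how you might walk a TOML file's tree to find the name entry in its [package] table (extracting the text value is left out for brevity):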
require "tree_haver"
# Setup
TreeHaver.register_language(
:toml,
path: "/usr/local/lib/libtree-sitter-toml.so",
)
def extract_package_name(toml_content)
# Create parser
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.toml
# Parse
tree = parser.parse(toml_content)
root = tree.root_node
# Find [package] table
root.each do |child|
next unless child.type == "table"
child.each do |table_elem|
if table_elem.type == "pair"
# Look for name = "..." pair
key = table_elem.each.first&.type
# In a real implementation, you'd extract the text value
# This is simplified for demonstration
end
end
end
end
# Usage
toml = "[package]\nname = \"awesome-app\"\nversion = \"2.0.0\"\n"
package_name = extract_package_name(toml)
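The piece the example above leaves out is turning a node back into source text. Tree-sitter nodes record byte offsets into the source, so one approach is to slice the original string by those offsets. This sketch assumes nodes expose `start_byte`, `end_byte`, and `children`, so check that your backend's nodes provide them before relying on it. It uses `StubNode` stand-ins so it runs without a parser. The `bare_key` and `string` type names are assumed labels only, and the quote stripping ignores TOML escape sequences.

```ruby
# Stand-in for a parsed node; real backend nodes are assumed to expose
# the same start_byte, end_byte, and children methods.
StubNode = Struct.new(:type, :start_byte, :end_byte, :children, keyword_init: true)

# Slice a node's text out of the source using byte offsets (not character
# offsets), so multi-byte UTF-8 content before the node is handled correctly.
def node_text(node, source)
  source.byteslice(node.start_byte, node.end_byte - node.start_byte)
end

# For a `key = "value"` pair node, return the value if the key matches.
# Taking the first and last children works whether or not the "=" token
# is included among the children. TOML escape sequences are not handled.
def pair_value(pair, source, key)
  key_node = pair.children.first
  value_node = pair.children.last
  return nil unless key_node && node_text(key_node, source) == key

  text = node_text(value_node, source)
  text.start_with?('"') && text.end_with?('"') ? text[1..-2] : text
end

source = "[package]\nname = \"awesome-app\"\n"
key = StubNode.new(type: "bare_key", start_byte: 10, end_byte: 14, children: [])
value = StubNode.new(type: "string", start_byte: 17, end_byte: 30, children: [])
pair = StubNode.new(type: "pair", start_byte: 10, end_byte: 30, children: [key, value])

puts pair_value(pair, source, "name") # prints awesome-app
```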
🦷 FLOSS Funding
While kettle-rb tools are free software and will always be, the project would benefit immensely from some funding. Raising a monthly budget of... "dollars" would make the project more sustainable.
We welcome both individual and corporate sponsors! We also offer a wide array of funding channels to account for your preferences (although currently Open Collective is our preferred funding platform).
If you're working in a company that's making significant use of kettle-rb tools we'd appreciate it if you suggest to your company to become a kettle-rb sponsor.
You can support the development of kettle-rb tools via GitHub Sponsors, Liberapay, PayPal, Open Collective and Tidelift.
| 📍 NOTE |
|---|
| If doing a sponsorship in the form of donation is problematic for your company from an accounting standpoint, we'd recommend the use of Tidelift, where you can get a support-like subscription instead. |
Open Collective for Individuals
Support us with a monthly donation and help us continue our activities. [Become a backer]
NOTE: kettle-readme-backers updates this list every day, automatically.
No backers yet. Be the first!
Open Collective for Organizations
Become a sponsor and get your logo on our README on GitHub with a link to your site. [Become a sponsor]
NOTE: kettle-readme-backers updates this list every day, automatically.
No sponsors yet. Be the first!
Another way to support open-source
I’m driven by a passion to foster a thriving open-source community – a space where people can tackle complex problems, no matter how small. Revitalizing libraries that have fallen into disrepair, and building new libraries focused on solving real-world challenges, are my passions. I was recently affected by layoffs, and the tech jobs market is unwelcoming. I’m reaching out here because your support would significantly aid my efforts to provide for my family, and my farm (11 🐔 chickens, 2 🐶 dogs, 3 🐰 rabbits, 8 🐈 cats).
If you work at a company that uses my work, please encourage them to support me as a corporate sponsor. My work on gems you use might show up in bundle fund.
I’m developing a new library, floss_funding, designed to empower open-source developers like myself to get paid for the work we do, in a sustainable way. Please give it a look.
Floss-Funding.dev: 👉️ No network calls. 👉️ No tracking. 👉️ No oversight. 👉️ Minimal crypto hashing. 💡 Easily disabled nags
🔐 Security
See SECURITY.md.
🤝 Contributing
If you need some ideas of where to help, you could work on adding more code coverage, or if it is already 💯 (see below) check reek, issues, or PRs, or use the gem and think about how it could be better.
We keep a changelog (CHANGELOG.md), so if you make changes, remember to update it.
See CONTRIBUTING.md for more detailed instructions.
🚀 Release Instructions
See CONTRIBUTING.md.
Code Coverage
🪇 Code of Conduct
Everyone interacting with this project's codebases, issue trackers, chat rooms and mailing lists agrees to follow the Code of Conduct.
🌈 Contributors
Made with contributors-img.
Also see GitLab Contributors: https://gitlab.com/kettle-rb/tree_haver/-/graphs/main
📌 Versioning
This library adheres to Semantic Versioning.
Violations of this scheme should be reported as bugs.
Specifically, if a minor or patch version is released that breaks backward compatibility,
a new version should be immediately released that restores compatibility.
Breaking changes to the public API will only be introduced with new major versions.
dropping support for a platform is both obviously and objectively a breaking change
—Jordan Harband (@ljharb, maintainer of SemVer) in SemVer issue 716
I understand that policy doesn't work universally ("exceptions to every rule!"), but it is the policy here. As such, in many cases it is good to specify a dependency on this library using the Pessimistic Version Constraint with two digits of precision.
For example:
spec.add_dependency("tree_haver", "~> 1.0")
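To see what the pessimistic constraint allows, you can ask RubyGems directly. `~> 1.0` accepts any 1.x release from 1.0 upward but not 2.0, while a three-digit `~> 1.0.0` would also exclude 1.1. This uses only RubyGems' own `Gem::Requirement` and `Gem::Version` classes.

```ruby
require "rubygems"

two_digit = Gem::Requirement.new("~> 1.0")     # >= 1.0, < 2
three_digit = Gem::Requirement.new("~> 1.0.0") # >= 1.0.0, < 1.1

%w[1.0.0 1.9.3 2.0.0].each do |version|
  v = Gem::Version.new(version)
  puts "#{version}: ~> 1.0 #{two_digit.satisfied_by?(v)}, ~> 1.0.0 #{three_digit.satisfied_by?(v)}"
end
```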
📌 Is "Platform Support" part of the public API?
SemVer should, IMO, but doesn't explicitly, say that dropping support for specific Platforms is a _breaking change_ to an API, and for that reason the bike shedding is endless. To get a better understanding of how SemVer is intended to work over a project's lifetime, read this article from the creator of SemVer: "Major Version Numbers are Not Sacred".
See CHANGELOG.md for a list of releases.
📄 License
The gem is available as open source under the terms of the MIT License.
See LICENSE.txt for the official Copyright Notice.
© Copyright
- Copyright (c) 2025 Peter H. Boling, of Galtzo.com, and tree_haver contributors.
🤑 A request for help
Maintainers have teeth and need to pay their dentists. After getting laid off in an RIF in March, and encountering difficulty finding a new job, I began spending most of my time building open source tools. I'm hoping to be able to pay for my kids' health insurance this month, so if you value the work I am doing, I need your support. Please consider sponsoring me or the project.
To join the community or get help 👇️ Join the Discord.
To say "thanks!" ☝️ Join the Discord or 👇️ send money.
Please give the project a star ⭐ ♥.
Thanks for RTFM. ☺️