Module: TreeHaver
Defined in:
- lib/tree_haver.rb
- lib/tree_haver/node.rb
- lib/tree_haver/tree.rb
- lib/tree_haver/version.rb
- lib/tree_haver/backends/ffi.rb
- lib/tree_haver/backends/mri.rb
- lib/tree_haver/backends/java.rb
- lib/tree_haver/backends/rust.rb
- lib/tree_haver/backends/prism.rb
- lib/tree_haver/backends/psych.rb
- lib/tree_haver/grammar_finder.rb
- lib/tree_haver/path_validator.rb
- lib/tree_haver/backends/citrus.rb
- lib/tree_haver/backends/markly.rb
- lib/tree_haver/language_registry.rb
- lib/tree_haver/backends/commonmarker.rb
- lib/tree_haver/citrus_grammar_finder.rb
Overview
TreeHaver is a cross-Ruby adapter for code parsing with nine backends.
It provides a unified API for parsing source code across MRI Ruby, JRuby, and TruffleRuby using tree-sitter grammars or language-specific native parsers.
Supports nine backends:
- Tree-sitter: MRI (C), Rust, FFI, Java
- Native parsers: Prism (Ruby), Psych (YAML), Commonmarker (Markdown), Markly (GFM)
- Pure Ruby: Citrus (portable fallback)
Defined Under Namespace
Modules: Backends, LanguageRegistry, PathValidator, Version
Classes: BackendConflict, CitrusGrammarFinder, Error, GrammarFinder, Language, Node, NotAvailable, Parser, Point, Tree
Constant Summary
- VERSION = Version::VERSION
  Traditional location for the VERSION constant.
Class Method Summary
- .backend ⇒ Object
- .backend=(name) ⇒ Symbol?
  Set the backend to use.
- .backend_module ⇒ Module?
  Determine the concrete backend module to use.
- .backend_protect ⇒ Object
  Alias for backend_protect?.
- .backend_protect=(value) ⇒ Boolean
  Whether backend conflict protection is enabled.
- .backend_protect? ⇒ Boolean
  Check if backend conflict protection is enabled.
- .backends_used ⇒ Set<Symbol>
  Track which backends have been used in this process.
- .capabilities ⇒ Hash{Symbol => Object}
  Get capabilities of the current backend.
- .check_backend_conflict!(backend) ⇒ void
  Check if using a backend would cause a conflict.
- .conflicting_backends_for(backend) ⇒ Array<Symbol>
  Check if a backend would conflict with previously used backends.
- .current_backend_context ⇒ Hash{Symbol => Object}
  Thread-local backend context storage.
- .effective_backend ⇒ Symbol
  Get the effective backend for current context.
- .record_backend_usage(backend) ⇒ void (private)
  Record that a backend has been used.
- .register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil) ⇒ void
  Register a language helper by name (backend-agnostic).
- .registered_language(name) ⇒ Hash? (private)
  Fetch a registered language entry.
- .reset_backend!(to: :auto) ⇒ void
  Reset backend selection memoization.
- .resolve_backend_module(explicit_backend = nil) ⇒ Module?
  Get backend module for a specific backend (with explicit override).
- .resolve_effective_backend(explicit_backend = nil) ⇒ Symbol
  Resolve the effective backend considering explicit override.
- .with_backend(name) { ... } ⇒ Object
  Execute a block with a specific backend in thread-local context.
Class Method Details
.backend ⇒ Object
# File 'lib/tree_haver.rb', line 299

def backend
  @backend ||= case (ENV["TREE_HAVER_BACKEND"] || :auto).to_s # rubocop:disable ThreadSafety/ClassInstanceVariable
  when "mri" then :mri
  when "rust" then :rust
  when "ffi" then :ffi
  when "java" then :java
  when "citrus" then :citrus
  when "prism" then :prism
  when "psych" then :psych
  when "commonmarker" then :commonmarker
  when "markly" then :markly
  else :auto
  end
end
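The listing above derives the default backend from the TREE_HAVER_BACKEND environment variable. The following is a minimal standalone sketch of that mapping, not the library's own code: `KNOWN_BACKENDS` and `backend_from_env` are hypothetical names, and the environment is passed in as a plain hash so the behavior can be tried without changing the real process environment.

```ruby
# Hypothetical helper mirroring the documented mapping; not part of TreeHaver.
KNOWN_BACKENDS = %w[mri rust ffi java citrus prism psych commonmarker markly].freeze

# Returns the backend symbol named by TREE_HAVER_BACKEND, or :auto when the
# variable is unset or holds a value that is not an exact, lowercase match.
def backend_from_env(env = ENV)
  value = env["TREE_HAVER_BACKEND"].to_s
  KNOWN_BACKENDS.include?(value) ? value.to_sym : :auto
end

backend_from_env("TREE_HAVER_BACKEND" => "ffi")    # => :ffi
backend_from_env("TREE_HAVER_BACKEND" => "FFI")    # => :auto (match is case-sensitive)
backend_from_env("TREE_HAVER_BACKEND" => "bogus")  # => :auto
backend_from_env({})                               # => :auto
```

As in the listing, an unrecognized value falls back to :auto rather than raising, so a typo in the variable goes unnoticed unless the selected backend is checked afterwards.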
.backend=(name) ⇒ Symbol?
Set the backend to use
# File 'lib/tree_haver.rb', line 322

def backend=(name)
  @backend = name&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
end
.backend_module ⇒ Module?
Determine the concrete backend module to use
This method performs backend auto-selection when backend is :auto. On JRuby, prefers Java backend if available, then FFI, then Citrus. On MRI, prefers MRI backend if available, then Rust, then FFI, then Citrus. Citrus is the final fallback as it’s pure Ruby and works everywhere.
# File 'lib/tree_haver.rb', line 558

def backend_module
  case effective_backend # Changed from: backend
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :commonmarker
    Backends::Commonmarker
  when :markly
    Backends::Markly
  else
    # auto-select: prefer native/fast backends, fall back to pure Ruby (Citrus)
    if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby" && Backends::Java.available?
      Backends::Java
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && Backends::MRI.available?
      Backends::MRI
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && Backends::Rust.available?
      Backends::Rust
    elsif Backends::FFI.available?
      Backends::FFI
    elsif Backends::Citrus.available?
      Backends::Citrus # Pure Ruby fallback
    else
      # No backend available
      nil
    end
  end
end
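The auto-selection order described for backend_module can be summarized as a per-engine preference list. This is a standalone sketch under that reading, not TreeHaver's implementation: `AUTO_ORDER`, `DEFAULT_ORDER`, and `pick_backend` are hypothetical names, and availability is passed in as a plain array instead of calling each backend's `available?`.

```ruby
# Hypothetical preference table derived from the documented auto-selection order.
AUTO_ORDER = {
  "jruby" => [:java, :ffi, :citrus],
  "ruby"  => [:mri, :rust, :ffi, :citrus],
}.freeze

# Engines without a dedicated entry only try the engine-independent backends.
DEFAULT_ORDER = [:ffi, :citrus].freeze

# Returns the first preferred backend that is available, or nil if none is.
def pick_backend(engine, available)
  order = AUTO_ORDER.fetch(engine, DEFAULT_ORDER)
  order.find { |name| available.include?(name) }
end

pick_backend("ruby", [:rust, :ffi, :citrus])  # => :rust
pick_backend("jruby", [:ffi, :citrus])        # => :ffi
pick_backend("truffleruby", [:citrus])        # => :citrus
pick_backend("ruby", [])                      # => nil
```

Note that the native-parser backends (Prism, Psych, Commonmarker, Markly) never appear in the auto-selection branch of the listing; they are only used when requested by name.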
.backend_protect ⇒ Object
Alias for backend_protect?
# File 'lib/tree_haver.rb', line 251

def backend_protect
  backend_protect?
end
.backend_protect=(value) ⇒ Boolean
Whether backend conflict protection is enabled
When true (default), TreeHaver will raise BackendConflict if you try to use a backend that is known to conflict with a previously used backend. For example, FFI will not work after MRI has been used.
Set to false to disable protection (useful for testing compatibility).
# File 'lib/tree_haver.rb', line 237

def backend_protect=(value)
  @backend_protect_mutex ||= Mutex.new
  @backend_protect_mutex.synchronize { @backend_protect = value }
end
.backend_protect? ⇒ Boolean
Check if backend conflict protection is enabled
# File 'lib/tree_haver.rb', line 245

def backend_protect?
  return @backend_protect if defined?(@backend_protect) # rubocop:disable ThreadSafety/ClassInstanceVariable
  true # Default is protected
end
.backends_used ⇒ Set<Symbol>
Track which backends have been used in this process
# File 'lib/tree_haver.rb', line 258

def backends_used
  @backends_used ||= Set.new # rubocop:disable ThreadSafety/ClassInstanceVariable
end
.capabilities ⇒ Hash{Symbol => Object}
Get capabilities of the current backend
Returns a hash describing what features the selected backend supports, or an empty hash when no backend is available. Common keys include:
- :backend - Symbol identifying the backend (e.g. :mri, :rust, :ffi, :java)
- :parse - Whether parsing is implemented
- :query - Whether the Query API is available
- :bytes_field - Whether byte position fields are available
- :incremental - Whether incremental parsing is supported
# File 'lib/tree_haver.rb', line 611

def capabilities
  mod = backend_module
  return {} unless mod
  mod.capabilities
end
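Because capabilities returns an empty hash when no backend module is resolved, callers may want to treat a missing key as "unsupported" instead of indexing the hash directly. A small sketch of that pattern follows; `supports?` is a hypothetical helper and the hash values are made up for illustration, not taken from any real backend.

```ruby
# Hypothetical helper: a missing key (or an empty hash) counts as unsupported.
def supports?(capabilities, feature)
  capabilities.fetch(feature, false) ? true : false
end

# Made-up capability hash in the documented shape.
sample_caps = { backend: :ffi, parse: true, query: false }

supports?(sample_caps, :parse)        # => true
supports?(sample_caps, :query)        # => false
supports?(sample_caps, :incremental)  # => false (key absent)
supports?({}, :parse)                 # => false (no backend available)
```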
.check_backend_conflict!(backend) ⇒ void
This method returns an undefined value.
Check if using a backend would cause a conflict
# File 'lib/tree_haver.rb', line 285

def check_backend_conflict!(backend)
  return unless backend_protect?

  conflicts = conflicting_backends_for(backend)
  return if conflicts.empty?

  raise BackendConflict,
    "Cannot use #{backend} backend: it is blocked by previously used backend(s): #{conflicts.join(", ")}. " \
    "The #{backend} backend will segfault when #{conflicts.first} has already loaded. " \
    "To disable this protection (at risk of segfaults), set TreeHaver.backend_protect = false"
end
.conflicting_backends_for(backend) ⇒ Array<Symbol>
Check if a backend would conflict with previously used backends
# File 'lib/tree_haver.rb', line 275

def conflicting_backends_for(backend)
  blockers = Backends::BLOCKED_BY[backend] || []
  blockers & backends_used.to_a
end
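Conflict detection amounts to intersecting a backend's known blockers with the set of backends already used in the process. The sketch below illustrates that idea in isolation. `DEMO_BLOCKED_BY`, `DemoBackendConflict`, and `ConflictTracker` are hypothetical names, and the blocker table contains only the one pairing stated in this document (FFI does not work after MRI); the real contents of `Backends::BLOCKED_BY` are not shown here.

```ruby
require "set"

# Assumed blocker table: only the FFI-after-MRI pairing is documented above.
DEMO_BLOCKED_BY = { ffi: [:mri] }.freeze

class DemoBackendConflict < StandardError; end

# Hypothetical stand-in for the module-level usage tracking.
class ConflictTracker
  attr_accessor :protect

  def initialize
    @used = Set.new
    @protect = true # protection is on by default, as documented
  end

  # Blockers of `backend` that have already been used.
  def conflicts_for(backend)
    DEMO_BLOCKED_BY.fetch(backend, []) & @used.to_a
  end

  # Raises when protection is on and a blocker was used; otherwise records usage.
  def use!(backend)
    conflicts = conflicts_for(backend)
    if protect && !conflicts.empty?
      raise DemoBackendConflict, "#{backend} is blocked by #{conflicts.join(", ")}"
    end
    @used << backend
  end
end

tracker = ConflictTracker.new
tracker.use!(:mri)
tracker.conflicts_for(:ffi) # => [:mri]

begin
  tracker.use!(:ffi)
rescue DemoBackendConflict => e
  puts e.message # prints "ffi is blocked by mri"
end

tracker.protect = false
tracker.use!(:ffi) # no error once protection is disabled
```

Disabling protection only skips the check; per the error message in the listing, the underlying risk of a segfault remains.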
.current_backend_context ⇒ Hash{Symbol => Object}
Thread-local backend context storage
Returns a hash containing the thread-local backend context with keys:
- :backend - The backend name (Symbol) or nil if using global default
- :depth - The nesting depth (Integer) for proper cleanup
# File 'lib/tree_haver.rb', line 351

def current_backend_context
  Thread.current[:tree_haver_backend_context] ||= {
    backend: nil, # nil means "use global default"
    depth: 0, # Track nesting depth for proper cleanup
  }
end
.effective_backend ⇒ Symbol
Get the effective backend for current context
Priority: thread-local context → global @backend → :auto
# File 'lib/tree_haver.rb', line 369

def effective_backend
  ctx = current_backend_context
  ctx[:backend] || backend || :auto
end
.record_backend_usage(backend) ⇒ void
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
This method returns an undefined value.
Record that a backend has been used
# File 'lib/tree_haver.rb', line 267

def record_backend_usage(backend)
  backends_used << backend
end
.register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil) ⇒ void
This method returns an undefined value.
Register a language helper by name (backend-agnostic)
After registration, you can use dynamic helpers like `TreeHaver::Language.toml` to load the registered language. TreeHaver will automatically use the appropriate grammar based on the active backend.
The `name` parameter is an arbitrary identifier you choose - it doesn’t need to match the actual language name. This is useful for:
- Testing: Use unique names like `:toml_test` to avoid collisions
- Aliasing: Register the same grammar under multiple names
- Versioning: Register different grammar versions as `:ruby_2` and `:ruby_3`
The actual grammar identity comes from `path`/`symbol` (tree-sitter) or `grammar_module` (Citrus), not from the name.
IMPORTANT: This method INTENTIONALLY allows registering BOTH a tree-sitter library AND a Citrus grammar for the same language IN A SINGLE CALL. This is achieved by using separate `if` statements (not `elsif`) and no early returns. This design is deliberate and provides significant benefits:
Why register both backends for one language?
- Backend flexibility: Code works regardless of which backend is active
- Performance testing: Compare tree-sitter vs Citrus performance
- Gradual migration: Transition between backends without breaking code
- Fallback scenarios: Use Citrus when tree-sitter library unavailable
- Platform portability: tree-sitter on Linux/Mac, Citrus on JRuby/Windows
The active backend determines which registration is used automatically. No code changes needed to switch backends - just change TreeHaver.backend.
# File 'lib/tree_haver.rb', line 690

def register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil)
  # Register tree-sitter backend if path provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  if path
    LanguageRegistry.register(name, :tree_sitter, path: path, symbol: symbol)
  end

  # Register Citrus backend if grammar_module provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  # This allows maximum flexibility - register once, use with any backend
  if grammar_module
    unless grammar_module.respond_to?(:parse)
      raise ArgumentError, "Grammar module must respond to :parse"
    end

    LanguageRegistry.register(name, :citrus, grammar_module: grammar_module, gem_name: gem_name)
  end

  # Require at least one backend to be registered
  if path.nil? && grammar_module.nil?
    raise ArgumentError, "Must provide at least one of: path (tree-sitter) or grammar_module (Citrus)"
  end

  # Note: No early return! This method intentionally processes both `if` blocks
  # above to allow registering multiple backends for the same language.
  # Both tree-sitter and Citrus can be registered simultaneously for maximum
  # flexibility. See method documentation for rationale.
  nil
end
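To make the "register both in one call" behavior concrete, here is a standalone sketch using an in-memory hash in place of LanguageRegistry. Everything in it is illustrative: `DEMO_REGISTRY`, `demo_register_language`, and `FakeTomlGrammar` are hypothetical, and the library path, symbol, and gem name are placeholders. One deliberate difference from the listing: this sketch validates its arguments before storing anything, whereas the listing registers the tree-sitter entry before checking the grammar module.

```ruby
# Hypothetical in-memory registry keyed by language name, then backend type.
DEMO_REGISTRY = Hash.new { |hash, key| hash[key] = {} }

def demo_register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil)
  if path.nil? && grammar_module.nil?
    raise ArgumentError, "Must provide at least one of: path or grammar_module"
  end
  if grammar_module && !grammar_module.respond_to?(:parse)
    raise ArgumentError, "Grammar module must respond to :parse"
  end

  # Two independent statements (no elsif), so one call can store both entries.
  DEMO_REGISTRY[name][:tree_sitter] = { path: path, symbol: symbol } if path
  DEMO_REGISTRY[name][:citrus] = { grammar_module: grammar_module, gem_name: gem_name } if grammar_module
  nil
end

# Stand-in grammar: anything responding to .parse passes the check.
module FakeTomlGrammar
  def self.parse(source)
    source
  end
end

demo_register_language(
  :toml,
  path: "/placeholder/libtree-sitter-toml.so", # placeholder path
  symbol: "tree_sitter_toml",                  # placeholder symbol
  grammar_module: FakeTomlGrammar,
  gem_name: "placeholder-gem",
)

DEMO_REGISTRY[:toml].keys # => [:tree_sitter, :citrus]
```

With both entries stored under one name, which one is used becomes a lookup by backend type at load time, which is the property the documentation above relies on.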
.registered_language(name) ⇒ Hash?
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Fetch a registered language entry
# File 'lib/tree_haver.rb', line 725

def registered_language(name)
  LanguageRegistry.registered(name)
end
.reset_backend!(to: :auto) ⇒ void
This method returns an undefined value.
Reset backend selection memoization
Primarily useful in tests to switch backends without cross-example leakage.
# File 'lib/tree_haver.rb', line 336

def reset_backend!(to: :auto)
  @backend = to&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
end
.resolve_backend_module(explicit_backend = nil) ⇒ Module?
Get backend module for a specific backend (with explicit override)
# File 'lib/tree_haver.rb', line 491

def resolve_backend_module(explicit_backend = nil)
  # Temporarily override effective backend
  requested = resolve_effective_backend(explicit_backend)

  mod = case requested
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :commonmarker
    Backends::Commonmarker
  when :markly
    Backends::Markly
  when :auto
    backend_module # Fall back to normal resolution for :auto
  else
    # Unknown backend name - return nil to trigger error in caller
    nil
  end

  # Return nil if the module doesn't exist
  return unless mod

  # Check for backend conflicts FIRST, before checking availability
  # This is critical because the conflict causes the backend to report unavailable
  # We want to raise a clear error explaining WHY it's unavailable
  # Use the requested backend name directly (not capabilities) because
  # capabilities may be empty when the backend is blocked/unavailable
  check_backend_conflict!(requested) if requested && requested != :auto

  # Now check if the backend is available
  # Why assume modules without available? are available?
  # - Some backends might be mocked in tests without an available? method
  # - This makes the code more defensive and test-friendly
  # - It allows graceful degradation if a backend module is incomplete
  # - Backward compatibility: if a module doesn't declare availability, assume it works
  return if mod.respond_to?(:available?) && !mod.available?

  # Record that this backend is being used
  record_backend_usage(requested) if requested && requested != :auto

  mod
end
.resolve_effective_backend(explicit_backend = nil) ⇒ Symbol
Resolve the effective backend considering explicit override
Priority: explicit > thread context > global > :auto
# File 'lib/tree_haver.rb', line 478

def resolve_effective_backend(explicit_backend = nil)
  return explicit_backend.to_sym if explicit_backend
  effective_backend
end
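The documented priority (explicit > thread context > global > :auto) is a first-non-nil chain. A minimal sketch, assuming the three sources are passed in as keyword arguments; `resolve_backend_demo` is a hypothetical name, not a TreeHaver method.

```ruby
# Hypothetical helper: the first non-nil source wins, with :auto as the default.
def resolve_backend_demo(explicit: nil, thread_local: nil, global: nil)
  (explicit || thread_local || global || :auto).to_sym
end

resolve_backend_demo(explicit: "ffi", thread_local: :mri, global: :rust) # => :ffi
resolve_backend_demo(thread_local: :mri, global: :rust)                  # => :mri
resolve_backend_demo(global: :rust)                                      # => :rust
resolve_backend_demo                                                     # => :auto
```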
.with_backend(name) { ... } ⇒ Object
Execute a block with a specific backend in thread-local context
This method provides temporary, thread-safe backend switching for a block of code. The backend setting is automatically restored when the block exits, even if an exception is raised. Supports nesting—inner blocks override outer blocks, and each level is properly unwound.
Thread Safety: Each thread maintains its own backend context, so concurrent threads can safely use different backends without interfering with each other.
Use Cases:
- Testing: Test the same code path with different backends
- Performance comparison: Benchmark parsing with different backends
- Fallback scenarios: Try one backend, fall back to another on failure
- Thread isolation: Different threads can use different backends safely
# File 'lib/tree_haver.rb', line 442

def with_backend(name)
  raise ArgumentError, "Backend name required" if name.nil?

  # Get context FIRST to ensure it exists
  ctx = current_backend_context
  old_backend = ctx[:backend]
  old_depth = ctx[:depth]

  begin
    # Set new backend and increment depth
    ctx[:backend] = name.to_sym
    ctx[:depth] += 1

    # Execute block
    yield
  ensure
    # Restore previous backend and depth
    # This ensures proper unwinding even with exceptions
    ctx[:backend] = old_backend
    ctx[:depth] = old_depth
  end
end
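The nesting and thread-isolation behavior can be observed with a standalone sketch that follows the same save, set, yield, restore pattern as the listing. `BackendScope` and its storage key are hypothetical stand-ins, so this runs without TreeHaver installed and says nothing about actual parsing.

```ruby
# Hypothetical stand-in for the thread-local context handling shown above.
module BackendScope
  KEY = :backend_scope_demo

  def self.context
    Thread.current[KEY] ||= { backend: nil, depth: 0 }
  end

  def self.with_backend(name)
    raise ArgumentError, "Backend name required" if name.nil?

    ctx = context
    old_backend = ctx[:backend]
    old_depth = ctx[:depth]
    begin
      ctx[:backend] = name.to_sym
      ctx[:depth] += 1
      yield
    ensure
      # Runs on normal exit and on exceptions alike.
      ctx[:backend] = old_backend
      ctx[:depth] = old_depth
    end
  end
end

seen = []
BackendScope.with_backend(:mri) do
  seen << BackendScope.context[:backend]
  BackendScope.with_backend(:citrus) { seen << BackendScope.context[:backend] }
  seen << BackendScope.context[:backend]
end
seen << BackendScope.context[:backend]
# seen is now [:mri, :citrus, :mri, nil]

# A second thread gets its own context and does not affect this one.
worker = Thread.new do
  BackendScope.with_backend(:ffi) { BackendScope.context[:backend] }
end
worker_backend = worker.value # => :ffi
```

The block's return value is passed through, which is why the worker thread's value is the backend symbol read inside the block.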