Module: TreeHaver
- Defined in:
- lib/tree_haver.rb,
lib/tree_haver/node.rb,
lib/tree_haver/tree.rb,
lib/tree_haver/point.rb,
lib/tree_haver/parser.rb,
lib/tree_haver/version.rb,
lib/tree_haver/language.rb,
lib/tree_haver/backends/ffi.rb,
lib/tree_haver/backends/mri.rb,
lib/tree_haver/backends/java.rb,
lib/tree_haver/backends/rust.rb,
lib/tree_haver/backends/prism.rb,
lib/tree_haver/backends/psych.rb,
lib/tree_haver/grammar_finder.rb,
lib/tree_haver/path_validator.rb,
lib/tree_haver/backends/citrus.rb,
lib/tree_haver/backends/markly.rb,
lib/tree_haver/language_registry.rb,
lib/tree_haver/library_path_utils.rb,
lib/tree_haver/backends/commonmarker.rb,
lib/tree_haver/citrus_grammar_finder.rb,
lib/tree_haver/rspec/dependency_tags.rb
Overview
TreeHaver is a cross-Ruby adapter for code parsing with 9 backends.
Provides a unified API for parsing source code across MRI Ruby, JRuby, and TruffleRuby using tree-sitter grammars or language-specific native parsers.
Backends
Supports 9 backends:
- Tree-sitter: MRI (C), Rust, FFI, Java
- Native parsers: Prism (Ruby), Psych (YAML), Commonmarker (Markdown), Markly (GFM)
- Pure Ruby: Citrus (portable fallback)
Platform Compatibility
Not all backends work on all Ruby platforms:
| Backend | MRI | JRuby | TruffleRuby |
|--------------|-----|-------|-------------|
| MRI (C ext) |
- JRuby: Cannot load native C/Rust extensions; use FFI, Java, or pure Ruby backends
- TruffleRuby: FFI doesn’t support STRUCT_BY_VALUE; magnus/rb-sys incompatible with C API; use Prism, Psych, Citrus, or potentially Commonmarker/Markly
Defined Under Namespace
Modules: Backends, LanguageRegistry, LibraryPathUtils, PathValidator, RSpec, Version
Classes: BackendConflict, CitrusGrammarFinder, Error, GrammarFinder, Language, Node, NotAvailable, Parser, Point, Tree
Constant Summary
- CITRUS_DEFAULTS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Default Citrus configurations for known languages
These are used by parser_for when no explicit citrus_config is provided and tree-sitter backends are not available (e.g., on TruffleRuby).
    {
      toml: {
        gem_name: "toml-rb",
        grammar_const: "TomlRB::Document",
        require_path: "toml-rb",
      },
    }.freeze
- NATIVE_BACKENDS =
Native tree-sitter backends that support loading shared libraries (.so files). These backends wrap the tree-sitter C library via various bindings. Pure Ruby backends (Citrus, Prism, Psych, Commonmarker, Markly) are excluded.
    %i[mri rust ffi java].freeze
- VERSION =
Traditional location for VERSION constant
Version::VERSION
Class Method Summary
- .backend ⇒ Object
- .backend=(name) ⇒ Symbol?
  Set the backend to use.
- .backend_module ⇒ Module?
  Determine the concrete backend module to use.
- .backend_protect ⇒ Object
  Alias for backend_protect?.
- .backend_protect=(value) ⇒ Boolean
  Whether backend conflict protection is enabled.
- .backend_protect? ⇒ Boolean
  Check if backend conflict protection is enabled.
- .backends_used ⇒ Set<Symbol>
  Track which backends have been used in this process.
- .capabilities ⇒ Hash{Symbol => Object}
  Get capabilities of the current backend.
- .check_backend_conflict!(backend) ⇒ void
  Check if using a backend would cause a conflict.
- .conflicting_backends_for(backend) ⇒ Array<Symbol>
  Check if a backend would conflict with previously used backends.
- .current_backend_context ⇒ Hash{Symbol => Object}
  Thread-local backend context storage.
- .effective_backend ⇒ Symbol
  Get the effective backend for current context.
- .parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil) ⇒ TreeHaver::Parser
  Create a parser configured for a specific language.
- .record_backend_usage(backend) ⇒ void
  private
  Record that a backend has been used.
- .register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil) ⇒ void
  Register a language helper by name (backend-agnostic).
- .registered_language(name) ⇒ Hash?
  private
  Fetch a registered language entry.
- .reset_backend!(to: :auto) ⇒ void
  Reset backend selection memoization.
- .resolve_backend_module(explicit_backend = nil) ⇒ Module?
  Get backend module for a specific backend (with explicit override).
- .resolve_effective_backend(explicit_backend = nil) ⇒ Symbol
  Resolve the effective backend considering explicit override.
- .resolve_native_backend_module(explicit_backend = nil) ⇒ Module?
  Resolve a native tree-sitter backend module (for from_library).
- .with_backend(name) { ... } ⇒ Object
  Execute a block with a specific backend in thread-local context.
Class Method Details
.backend ⇒ Object
    # File 'lib/tree_haver.rb', line 352
    def backend
      @backend ||= case (ENV["TREE_HAVER_BACKEND"] || :auto).to_s # rubocop:disable ThreadSafety/ClassInstanceVariable
      when "mri" then :mri
      when "rust" then :rust
      when "ffi" then :ffi
      when "java" then :java
      when "citrus" then :citrus
      when "prism" then :prism
      when "psych" then :psych
      when "commonmarker" then :commonmarker
      when "markly" then :markly
      else :auto
      end
    end
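The lookup above reads TREE_HAVER_BACKEND once, memoizes the result in @backend, and maps any unset or unrecognized value to :auto. A minimal standalone sketch of that mapping follows. It does not load the gem; the helper name `backend_from_env` and the constant `KNOWN_BACKEND_NAMES` are hypothetical and exist only for illustration.

```ruby
# Hypothetical standalone sketch; it does not require tree_haver.
# The names below mirror the `when` branches of the listing above.
KNOWN_BACKEND_NAMES = %w[mri rust ffi java citrus prism psych commonmarker markly].freeze

# Maps an environment-like hash to a backend symbol.
# Unset or unrecognized values fall back to :auto, as in the case/else above.
def backend_from_env(env = ENV)
  value = (env["TREE_HAVER_BACKEND"] || :auto).to_s
  KNOWN_BACKEND_NAMES.include?(value) ? value.to_sym : :auto
end

p backend_from_env({ "TREE_HAVER_BACKEND" => "ffi" })
p backend_from_env({ "TREE_HAVER_BACKEND" => "not-a-backend" })
p backend_from_env({})
```

Passing the environment as a plain hash keeps the sketch easy to exercise without touching the real process environment.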
.backend=(name) ⇒ Symbol?
Set the backend to use
    # File 'lib/tree_haver.rb', line 375
    def backend=(name)
      @backend = name&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
    end
.backend_module ⇒ Module?
Determine the concrete backend module to use
This method performs backend auto-selection when backend is :auto. On JRuby, prefers Java backend if available, then FFI, then Citrus. On MRI, prefers MRI backend if available, then Rust, then FFI, then Citrus. Citrus is the final fallback as it’s pure Ruby and works everywhere.
    # File 'lib/tree_haver.rb', line 666
    def backend_module
      case effective_backend # Changed from: backend
      when :mri
        Backends::MRI
      when :rust
        Backends::Rust
      when :ffi
        Backends::FFI
      when :java
        Backends::Java
      when :citrus
        Backends::Citrus
      when :prism
        Backends::Prism
      when :psych
        Backends::Psych
      when :commonmarker
        Backends::Commonmarker
      when :markly
        Backends::Markly
      else
        # auto-select: prefer native/fast backends, fall back to pure Ruby (Citrus)
        if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby" && Backends::Java.available?
          Backends::Java
        elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && Backends::MRI.available?
          Backends::MRI
        elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && Backends::Rust.available?
          Backends::Rust
        elsif Backends::FFI.available?
          Backends::FFI
        elsif Backends::Citrus.available?
          Backends::Citrus # Pure Ruby fallback
        else
          # No backend available
          nil
        end
      end
    end
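The auto-selection branch amounts to "take the first available backend from an engine-specific priority list". The sketch below restates that idea in standalone form. The method `auto_select` and its `available` argument (a list standing in for each backend's available? check) are assumptions made for illustration, not part of TreeHaver.

```ruby
# Hypothetical standalone sketch of the auto-selection order; not the gem's code.
# `available` stands in for the per-backend available? checks.
def auto_select(engine, available)
  candidates =
    case engine
    when "jruby" then %i[java ffi citrus]
    when "ruby"  then %i[mri rust ffi citrus]
    else              %i[ffi citrus] # other engines only reach the FFI and Citrus checks
    end
  # First candidate that is available, or nil when none is.
  candidates.find { |name| available.include?(name) }
end

p auto_select("jruby", %i[ffi citrus])
p auto_select("ruby", %i[rust citrus])
p auto_select("truffleruby", %i[citrus])
```

Expressing the order as data makes it easy to see that Citrus is always the last candidate, and that nil is returned only when nothing at all is available.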
.backend_protect ⇒ Object
Alias for backend_protect?
    # File 'lib/tree_haver.rb', line 304
    def backend_protect
      backend_protect?
    end
.backend_protect=(value) ⇒ Boolean
Whether backend conflict protection is enabled
When true (default), TreeHaver will raise BackendConflict if you try to use a backend that is known to conflict with a previously used backend. For example, FFI will not work after MRI has been used.
Set to false to disable protection (useful for testing compatibility).
    # File 'lib/tree_haver.rb', line 290
    def backend_protect=(value)
      @backend_protect_mutex ||= Mutex.new
      @backend_protect_mutex.synchronize { @backend_protect = value }
    end
.backend_protect? ⇒ Boolean
Check if backend conflict protection is enabled
    # File 'lib/tree_haver.rb', line 298
    def backend_protect?
      return @backend_protect if defined?(@backend_protect) # rubocop:disable ThreadSafety/ClassInstanceVariable
      true # Default is protected
    end
.backends_used ⇒ Set<Symbol>
Track which backends have been used in this process
    # File 'lib/tree_haver.rb', line 311
    def backends_used
      @backends_used ||= Set.new # rubocop:disable ThreadSafety/ClassInstanceVariable
    end
.capabilities ⇒ Hash{Symbol => Object}
Get capabilities of the current backend
Returns a hash describing what features the selected backend supports. Common keys include:
- :backend - Symbol identifying the backend (:mri, :rust, :ffi, :java)
- :parse - Whether parsing is implemented
- :query - Whether the Query API is available
- :bytes_field - Whether byte position fields are available
- :incremental - Whether incremental parsing is supported

    # File 'lib/tree_haver.rb', line 719
    def capabilities
      mod = backend_module
      return {} unless mod
      mod.capabilities
    end
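Because the method returns an empty hash when no backend module resolves, callers probably want to treat a missing key as "unsupported" rather than index the hash directly. A small sketch of that pattern follows. The hashes `FULL_CAPABILITIES` and `NO_CAPABILITIES` and the helper `supports?` are hypothetical; real values depend on the selected backend.

```ruby
# Hypothetical capability hashes; real values come from the selected backend.
FULL_CAPABILITIES = { backend: :mri, parse: true, query: true, incremental: true }.freeze
NO_CAPABILITIES   = {}.freeze # shape returned when no backend module resolves

# Treats a missing key the same as an unsupported feature.
def supports?(capabilities, feature)
  capabilities.fetch(feature, false) ? true : false
end

p supports?(FULL_CAPABILITIES, :query)
p supports?(NO_CAPABILITIES, :parse)
```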
.check_backend_conflict!(backend) ⇒ void
This method returns an undefined value.
Check if using a backend would cause a conflict
    # File 'lib/tree_haver.rb', line 338
    def check_backend_conflict!(backend)
      return unless backend_protect?

      conflicts = conflicting_backends_for(backend)
      return if conflicts.empty?

      raise BackendConflict,
        "Cannot use #{backend} backend: it is blocked by previously used backend(s): #{conflicts.join(", ")}. " \
        "The #{backend} backend will segfault when #{conflicts.first} has already loaded. " \
        "To disable this protection (at risk of segfaults), set TreeHaver.backend_protect = false"
    end
.conflicting_backends_for(backend) ⇒ Array<Symbol>
Check if a backend would conflict with previously used backends
    # File 'lib/tree_haver.rb', line 328
    def conflicting_backends_for(backend)
      blockers = Backends::BLOCKED_BY[backend] || []
      blockers & backends_used.to_a
    end
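Conflict detection is a set intersection between the backends that block the requested one and the backends already used in the process. The standalone sketch below illustrates the check together with the protection switch. The contents of `BLOCKED_BY` are an assumption: the real Backends::BLOCKED_BY table is not shown on this page, so the single entry only reflects the documented example that FFI does not work after MRI has been used. The explicit `used` and `protect:` parameters are also illustrative.

```ruby
require "set"

# Assumption: stand-in for Backends::BLOCKED_BY, which this page does not list.
# The single entry reflects the documented "FFI after MRI" example.
BLOCKED_BY = { ffi: [:mri] }.freeze

# Stand-in for TreeHaver::BackendConflict.
class BackendConflict < StandardError; end

# Backends already used that block the requested backend.
def conflicting_backends_for(backend, used)
  (BLOCKED_BY[backend] || []) & used.to_a
end

# Raises when protection is on and at least one blocker has been used.
def check_backend_conflict!(backend, used, protect: true)
  return unless protect

  conflicts = conflicting_backends_for(backend, used)
  return if conflicts.empty?

  raise BackendConflict, "#{backend} is blocked by #{conflicts.join(", ")}"
end

used = Set[:mri]
check_backend_conflict!(:rust, used)                # no blockers recorded for :rust
check_backend_conflict!(:ffi, used, protect: false) # protection disabled, no error
begin
  check_backend_conflict!(:ffi, used)
rescue BackendConflict => e
  puts e.message
end
```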
.current_backend_context ⇒ Hash{Symbol => Object}
Thread-local backend context storage
Returns a hash containing the thread-local backend context with keys:
- :backend - The backend name (Symbol) or nil if using global default
- :depth - The nesting depth (Integer) for proper cleanup

    # File 'lib/tree_haver.rb', line 404
    def current_backend_context
      Thread.current[:tree_haver_backend_context] ||= {
        backend: nil, # nil means "use global default"
        depth: 0, # Track nesting depth for proper cleanup
      }
    end
.effective_backend ⇒ Symbol
Get the effective backend for current context
Priority: thread-local context → global @backend → :auto
    # File 'lib/tree_haver.rb', line 422
    def effective_backend
      ctx = current_backend_context
      ctx[:backend] || backend || :auto
    end
.parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil) ⇒ TreeHaver::Parser
Create a parser configured for a specific language
Respects the effective backend setting (via TREE_HAVER_BACKEND env var, TreeHaver.backend=, or with_backend block).
    # File 'lib/tree_haver.rb', line 854
    def parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil)
      name = language_name.to_sym
      symbol ||= "tree_sitter_#{name}"
      requested = effective_backend

      # Determine which backends to try based on effective_backend
      try_tree_sitter = (requested == :auto) || NATIVE_BACKENDS.include?(requested)
      try_citrus = (requested == :auto) || (requested == :citrus)

      language = nil

      # Try tree-sitter if applicable
      if try_tree_sitter && !language
        language = load_tree_sitter_language(name, library_path: library_path, symbol: symbol)
      end

      # Try Citrus if applicable
      if try_citrus && !language
        language = load_citrus_language(name, citrus_config: citrus_config)
      end

      # Raise if nothing worked
      raise NotAvailable, "No parser available for #{name}. " \
        "Install tree-sitter-#{name} or configure a Citrus grammar." unless language

      # Create and configure parser
      parser = Parser.new
      parser.language = language
      parser
    end
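The fallback logic in the listing can be read as two gated attempts followed by an error. The sketch below isolates that control flow without loading the gem. The method `pick_language`, the constant `NATIVE`, the local `NotAvailable` class, and the two callables (stand-ins for the private load_tree_sitter_language and load_citrus_language helpers) are all hypothetical.

```ruby
# Hypothetical standalone sketch of the fallback order; not the gem's code.
NATIVE = %i[mri rust ffi java].freeze

# Stand-in for TreeHaver::NotAvailable.
class NotAvailable < StandardError; end

# Each loader is a callable returning a language object or nil.
def pick_language(requested, tree_sitter:, citrus:)
  try_tree_sitter = requested == :auto || NATIVE.include?(requested)
  try_citrus      = requested == :auto || requested == :citrus

  language = nil
  language = tree_sitter.call if try_tree_sitter
  language ||= citrus.call if try_citrus

  raise NotAvailable, "No parser available" unless language

  language
end

no_grammar = -> { nil }
p pick_language(:auto, tree_sitter: no_grammar, citrus: -> { :citrus_toml })
p pick_language(:ffi, tree_sitter: -> { :ts_toml }, citrus: -> { :citrus_toml })
```

Read this way, the listing suggests that a requested backend outside NATIVE_BACKENDS and :citrus (for example :prism) enables neither attempt, so the call ends in NotAvailable; that reading is based only on the code shown here.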
.record_backend_usage(backend) ⇒ void
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
This method returns an undefined value.
Record that a backend has been used
    # File 'lib/tree_haver.rb', line 320
    def record_backend_usage(backend)
      backends_used << backend
    end
.register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil) ⇒ void
This method returns an undefined value.
Register a language helper by name (backend-agnostic)
After registration, you can use dynamic helpers like TreeHaver::Language.toml to load the registered language. TreeHaver will automatically use the appropriate grammar based on the active backend.
The name parameter is an arbitrary identifier you choose - it doesn’t need to match the actual language name. This is useful for:
- Testing: Use unique names like :toml_test to avoid collisions
- Aliasing: Register the same grammar under multiple names
- Versioning: Register different grammar versions as :ruby_2 and :ruby_3
The actual grammar identity comes from path/symbol (tree-sitter) or grammar_module (Citrus), not from the name.
IMPORTANT: This method INTENTIONALLY allows registering BOTH a tree-sitter library AND a Citrus grammar for the same language IN A SINGLE CALL. This is achieved by using separate if statements (not elsif) and no early returns. This design is deliberate and provides significant benefits:
Why register both backends for one language?
- Backend flexibility: Code works regardless of which backend is active
- Performance testing: Compare tree-sitter vs Citrus performance
- Gradual migration: Transition between backends without breaking code
- Fallback scenarios: Use Citrus when tree-sitter library unavailable
- Platform portability: tree-sitter on Linux/Mac, Citrus on JRuby/Windows
The active backend determines which registration is used automatically. No code changes needed to switch backends - just change TreeHaver.backend.
    # File 'lib/tree_haver.rb', line 798
    def register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil)
      # Register tree-sitter backend if path provided
      # Note: Uses `if` not `elsif` so both backends can be registered in one call
      if path
        LanguageRegistry.register(name, :tree_sitter, path: path, symbol: symbol)
      end

      # Register Citrus backend if grammar_module provided
      # Note: Uses `if` not `elsif` so both backends can be registered in one call
      # This allows maximum flexibility - register once, use with any backend
      if grammar_module
        unless grammar_module.respond_to?(:parse)
          raise ArgumentError, "Grammar module must respond to :parse"
        end

        LanguageRegistry.register(name, :citrus, grammar_module: grammar_module, gem_name: gem_name)
      end

      # Require at least one backend to be registered
      if path.nil? && grammar_module.nil?
        raise ArgumentError, "Must provide at least one of: path (tree-sitter) or grammar_module (Citrus)"
      end

      # Note: No early return! This method intentionally processes both `if` blocks
      # above to allow registering multiple backends for the same language.
      # Both tree-sitter and Citrus can be registered simultaneously for maximum
      # flexibility. See method documentation for rationale.
      nil
    end
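To illustrate the dual-registration design, here is a standalone sketch that stores both a tree-sitter entry and a Citrus entry under one name in a single call. It does not load the gem: `REGISTRY` is a hypothetical in-memory stand-in for LanguageRegistry, `FakeTomlGrammar` is a made-up module that merely responds to parse, and the library path is a placeholder. Unlike the listing above, this sketch validates its arguments before storing anything.

```ruby
# Hypothetical in-memory registry; the real LanguageRegistry is not shown here.
REGISTRY = Hash.new { |hash, name| hash[name] = {} }

def register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil)
  # Sketch-only choice: validate up front, then store.
  if path.nil? && grammar_module.nil?
    raise ArgumentError, "Must provide at least one of: path or grammar_module"
  end
  if grammar_module && !grammar_module.respond_to?(:parse)
    raise ArgumentError, "Grammar module must respond to :parse"
  end

  # Two independent `if` modifiers, so one call can fill both slots.
  REGISTRY[name][:tree_sitter] = { path: path, symbol: symbol } if path
  REGISTRY[name][:citrus] = { grammar_module: grammar_module, gem_name: gem_name } if grammar_module
  nil
end

# Made-up grammar: anything that responds to parse passes the check.
module FakeTomlGrammar
  def self.parse(source)
    source
  end
end

register_language(
  :toml_test,
  path: "/hypothetical/path/libtree-sitter-toml.so", # placeholder path
  symbol: "tree_sitter_toml",
  grammar_module: FakeTomlGrammar,
  gem_name: "toml-rb",
)

p REGISTRY[:toml_test].keys
```

Keeping the two entries side by side under one name is what lets the active backend decide later which registration to use.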
.registered_language(name) ⇒ Hash?
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Fetch a registered language entry
    # File 'lib/tree_haver.rb', line 833
    def registered_language(name)
      LanguageRegistry.registered(name)
    end
.reset_backend!(to: :auto) ⇒ void
This method returns an undefined value.
Reset backend selection memoization
Primarily useful in tests to switch backends without cross-example leakage.
    # File 'lib/tree_haver.rb', line 389
    def reset_backend!(to: :auto)
      @backend = to&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
    end
.resolve_backend_module(explicit_backend = nil) ⇒ Module?
Get backend module for a specific backend (with explicit override)
    # File 'lib/tree_haver.rb', line 544
    def resolve_backend_module(explicit_backend = nil)
      # Temporarily override effective backend
      requested = resolve_effective_backend(explicit_backend)

      mod = case requested
      when :mri
        Backends::MRI
      when :rust
        Backends::Rust
      when :ffi
        Backends::FFI
      when :java
        Backends::Java
      when :citrus
        Backends::Citrus
      when :prism
        Backends::Prism
      when :psych
        Backends::Psych
      when :commonmarker
        Backends::Commonmarker
      when :markly
        Backends::Markly
      when :auto
        backend_module # Fall back to normal resolution for :auto
      else
        # Unknown backend name - return nil to trigger error in caller
        nil
      end

      # Return nil if the module doesn't exist
      return unless mod

      # Check for backend conflicts FIRST, before checking availability
      # This is critical because the conflict causes the backend to report unavailable
      # We want to raise a clear error explaining WHY it's unavailable
      # Use the requested backend name directly (not capabilities) because
      # capabilities may be empty when the backend is blocked/unavailable
      check_backend_conflict!(requested) if requested && requested != :auto

      # Now check if the backend is available
      # Why assume modules without available? are available?
      # - Some backends might be mocked in tests without an available? method
      # - This makes the code more defensive and test-friendly
      # - It allows graceful degradation if a backend module is incomplete
      # - Backward compatibility: if a module doesn't declare availability, assume it works
      return if mod.respond_to?(:available?) && !mod.available?

      # Record that this backend is being used
      record_backend_usage(requested) if requested && requested != :auto

      mod
    end
.resolve_effective_backend(explicit_backend = nil) ⇒ Symbol
Resolve the effective backend considering explicit override
Priority: explicit > thread context > global > :auto
    # File 'lib/tree_haver.rb', line 531
    def resolve_effective_backend(explicit_backend = nil)
      return explicit_backend.to_sym if explicit_backend
      effective_backend
    end
.resolve_native_backend_module(explicit_backend = nil) ⇒ Module?
Resolve a native tree-sitter backend module (for from_library)
This method is similar to resolve_backend_module but ONLY considers backends that support loading shared libraries (.so files):
- MRI (ruby_tree_sitter C extension)
- Rust (tree_stump)
- FFI (ffi gem with libtree-sitter)
- Java (jtreesitter on JRuby)

Pure Ruby backends (Citrus, Prism, Psych, Commonmarker, Markly) are NOT considered because they don’t support from_library.

    # File 'lib/tree_haver.rb', line 613
    def resolve_native_backend_module(explicit_backend = nil)
      # Short-circuit on TruffleRuby: no native backends work
      # - MRI: C extension, MRI only
      # - Rust: magnus requires MRI's C API
      # - FFI: STRUCT_BY_VALUE not supported
      # - Java: requires JRuby's Java interop
      if defined?(RUBY_ENGINE) && RUBY_ENGINE == "truffleruby"
        return unless explicit_backend # Auto-select: no backends available
        # If explicit backend requested, let it fail with proper error below
      end

      # Get the effective backend (considers thread-local and global settings)
      requested = resolve_effective_backend(explicit_backend)

      # If the effective backend is a native backend, use it
      if NATIVE_BACKENDS.include?(requested)
        return resolve_backend_module(requested)
      end

      # If a specific non-native backend was explicitly requested, return nil
      # (from_library only works with native backends that load .so files)
      return if explicit_backend

      # If effective backend is :auto, auto-select from native backends in priority order
      # Note: non-native backends set via with_backend are NOT used here because
      # from_library only works with native backends
      native_priority = if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby"
        %i[java ffi] # JRuby: Java first, then FFI
      else
        %i[mri rust ffi] # MRI: MRI first, then Rust, then FFI
      end

      native_priority.each do |backend|
        mod = resolve_backend_module(backend)
        return mod if mod
      end

      nil # No native backend available
    end
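The decision order in this method has four steps: the TruffleRuby short-circuit, a direct hit on a native backend, rejection of an explicit non-native request, and finally the engine-specific priority list. The standalone sketch below traces those steps with plain symbols. The method `native_backend` and its keyword arguments are hypothetical; `usable` stands in for the backends that resolve_backend_module would return a module for.

```ruby
# Hypothetical standalone sketch of the decision order; not the gem's code.
NATIVE_BACKENDS = %i[mri rust ffi java].freeze

# `usable` stands in for backends whose modules would resolve successfully.
def native_backend(engine:, effective:, explicit: nil, usable: [])
  # 1. TruffleRuby without an explicit request: nothing to auto-select.
  return nil if engine == "truffleruby" && explicit.nil?

  requested = explicit || effective

  # 2. A native backend was requested (explicitly or via context): use it or fail.
  return (usable.include?(requested) ? requested : nil) if NATIVE_BACKENDS.include?(requested)

  # 3. An explicit non-native request cannot load a shared library.
  return nil if explicit

  # 4. Otherwise walk the engine-specific priority list.
  order = engine == "jruby" ? %i[java ffi] : %i[mri rust ffi]
  order.find { |name| usable.include?(name) }
end

p native_backend(engine: "jruby", effective: :auto, usable: %i[ffi])
p native_backend(engine: "ruby", effective: :citrus, usable: %i[rust ffi])
p native_backend(engine: "ruby", effective: :auto, explicit: :citrus, usable: %i[mri])
```

The second call shows the case the source comments single out: a non-native backend set through the thread-local context is ignored here, so the priority list still applies.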
.with_backend(name) { ... } ⇒ Object
Execute a block with a specific backend in thread-local context
This method provides temporary, thread-safe backend switching for a block of code. The backend setting is automatically restored when the block exits, even if an exception is raised. Supports nesting—inner blocks override outer blocks, and each level is properly unwound.
Thread Safety: Each thread maintains its own backend context, so concurrent threads can safely use different backends without interfering with each other.
Use Cases:
- Testing: Test the same code path with different backends
- Performance comparison: Benchmark parsing with different backends
- Fallback scenarios: Try one backend, fall back to another on failure
- Thread isolation: Different threads can use different backends safely

    # File 'lib/tree_haver.rb', line 495
    def with_backend(name)
      raise ArgumentError, "Backend name required" if name.nil?

      # Get context FIRST to ensure it exists
      ctx = current_backend_context
      old_backend = ctx[:backend]
      old_depth = ctx[:depth]

      begin
        # Set new backend and increment depth
        ctx[:backend] = name.to_sym
        ctx[:depth] += 1

        # Execute block
        yield
      ensure
        # Restore previous backend and depth
        # This ensures proper unwinding even with exceptions
        ctx[:backend] = old_backend
        ctx[:depth] = old_depth
      end
    end
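The save, set, yield, restore pattern above can be exercised in isolation. The sketch below reimplements it in a standalone module so that nesting, exception unwinding, and per-thread isolation can be observed without loading the gem. The module name `BackendContext`, its storage key, and the `effective` helper (which folds in the thread-local → global → :auto priority) are hypothetical.

```ruby
# Hypothetical standalone module mirroring the pattern above; not the gem itself.
module BackendContext
  KEY = :sketch_backend_context

  # Per-thread context hash, created lazily.
  def self.context
    Thread.current[KEY] ||= { backend: nil, depth: 0 }
  end

  # Priority: block-scoped backend, then the supplied global, then :auto.
  def self.effective(global = nil)
    context[:backend] || global || :auto
  end

  def self.with_backend(name)
    raise ArgumentError, "Backend name required" if name.nil?

    ctx = context
    old_backend = ctx[:backend]
    old_depth = ctx[:depth]
    begin
      ctx[:backend] = name.to_sym
      ctx[:depth] += 1
      yield
    ensure
      # Runs on normal exit and on exceptions alike.
      ctx[:backend] = old_backend
      ctx[:depth] = old_depth
    end
  end
end

BackendContext.with_backend(:ffi) do
  BackendContext.with_backend(:citrus) { p BackendContext.effective } # inner block wins
  p BackendContext.effective                                          # back to the outer block
  p Thread.new { BackendContext.effective }.value                     # other threads are unaffected
end
p BackendContext.effective
```

One detail worth knowing when reading the original: Ruby's Thread#[] storage is fiber-local, so a context stored this way is also separate for each Fiber, not only for each Thread.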