Module: TreeHaver
Defined in:
lib/tree_haver.rb,
lib/tree_haver/node.rb,
lib/tree_haver/tree.rb,
lib/tree_haver/point.rb,
lib/tree_haver/parser.rb,
lib/tree_haver/version.rb,
lib/tree_haver/language.rb,
lib/tree_haver/backends/ffi.rb,
lib/tree_haver/backends/mri.rb,
lib/tree_haver/backends/java.rb,
lib/tree_haver/backends/rust.rb,
lib/tree_haver/backends/prism.rb,
lib/tree_haver/backends/psych.rb,
lib/tree_haver/grammar_finder.rb,
lib/tree_haver/path_validator.rb,
lib/tree_haver/backends/citrus.rb,
lib/tree_haver/backends/markly.rb,
lib/tree_haver/language_registry.rb,
lib/tree_haver/library_path_utils.rb,
lib/tree_haver/backends/commonmarker.rb,
lib/tree_haver/citrus_grammar_finder.rb,
lib/tree_haver/rspec/dependency_tags.rb
Overview
TreeHaver is a cross-Ruby adapter for code parsing with nine backends.
It provides a unified API for parsing source code across MRI Ruby, JRuby, and TruffleRuby using tree-sitter grammars or language-specific native parsers.
Backends
The nine backends fall into three groups:
- Tree-sitter: MRI (C), Rust, FFI, Java
- Native parsers: Prism (Ruby), Psych (YAML), Commonmarker (Markdown), Markly (GFM)
- Pure Ruby: Citrus (portable fallback)
Platform Compatibility
Not all backends work on all Ruby platforms:
| Backend | MRI | JRuby | TruffleRuby |
|--------------|-----|-------|-------------|
| MRI (C ext) | ✓ | ✗ | ✗ |
| Rust | ✓ | ✗ | ✗ |
| FFI | ✓ | ✓ | ✗ |
| Java | ✗ | ✓ | ✗ |
| Prism | ✓ | ✓ | ✓ |
| Psych | ✓ | ✓ | ✓ |
| Citrus | ✓ | ✓ | ✓ |
| Commonmarker | ✓ | ✗ | ? |
| Markly | ✓ | ✗ | ? |
- JRuby: Cannot load native C/Rust extensions; use the FFI, Java, or pure Ruby backends.
- TruffleRuby: Its FFI doesn’t support STRUCT_BY_VALUE, and magnus/rb-sys are incompatible with its C API; use Prism, Psych, Citrus, or potentially Commonmarker/Markly.
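The compatibility table above can be encoded as a lookup keyed by `RUBY_ENGINE`. The following is a minimal sketch and not part of TreeHaver: the constant `COMPATIBLE_BACKENDS` and the helper `backends_for_engine` are hypothetical names, and Commonmarker/Markly are omitted for TruffleRuby because the table marks them as unknown.

```ruby
# Hypothetical lookup mirroring the compatibility table above.
# Neither name below exists in TreeHaver.
COMPATIBLE_BACKENDS = {
  "ruby"        => %i[mri rust ffi prism psych citrus commonmarker markly],
  "jruby"       => %i[ffi java prism psych citrus],
  "truffleruby" => %i[prism psych citrus], # commonmarker/markly are "?" in the table
}.freeze

# Returns the backends the table marks as working on the given engine,
# or an empty list for an engine the table does not cover.
def backends_for_engine(engine = RUBY_ENGINE)
  COMPATIBLE_BACKENDS.fetch(engine, [])
end

puts backends_for_engine("jruby").inspect
```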
Defined Under Namespace
Modules: Backends, LanguageRegistry, LibraryPathUtils, PathValidator, RSpec, Version
Classes: BackendConflict, CitrusGrammarFinder, Error, GrammarFinder, Language, Node, NotAvailable, Parser, Point, Tree
Constant Summary
- CITRUS_DEFAULTS =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Default Citrus configurations for known languages.
These are used by parser_for when no explicit citrus_config is provided and tree-sitter backends are not available (e.g., on TruffleRuby).
{
  toml: {
    gem_name: "toml-rb",
    grammar_const: "TomlRB::Document",
    require_path: "toml-rb",
  },
}.freeze
- NATIVE_BACKENDS =
Native tree-sitter backends that support loading shared libraries (.so files). These backends wrap the tree-sitter C library via various bindings. The non-tree-sitter backends (Citrus, Prism, Psych, Commonmarker, Markly) are excluded.
%i[mri rust ffi java].freeze
- VERSION =
Traditional location for the VERSION constant.
Version::VERSION
Class Method Summary
- .backend ⇒ Object
- .backend=(name) ⇒ Symbol?
  Set the backend to use.
- .backend_module ⇒ Module?
  Determine the concrete backend module to use.
- .backend_protect ⇒ Object
  Alias for backend_protect?.
- .backend_protect=(value) ⇒ Boolean
  Whether backend conflict protection is enabled.
- .backend_protect? ⇒ Boolean
  Check if backend conflict protection is enabled.
- .backends_used ⇒ Set<Symbol>
  Track which backends have been used in this process.
- .capabilities ⇒ Hash{Symbol => Object}
  Get capabilities of the current backend.
- .check_backend_conflict!(backend) ⇒ void
  Check if using a backend would cause a conflict.
- .conflicting_backends_for(backend) ⇒ Array<Symbol>
  Check if a backend would conflict with previously used backends.
- .current_backend_context ⇒ Hash{Symbol => Object}
  Thread-local backend context storage.
- .effective_backend ⇒ Symbol
  Get the effective backend for current context.
- .parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil) ⇒ TreeHaver::Parser
  Create a parser configured for a specific language.
- .record_backend_usage(backend) ⇒ void
  private
  Record that a backend has been used.
- .register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil) ⇒ void
  Register a language helper by name (backend-agnostic).
- .registered_language(name) ⇒ Hash?
  private
  Fetch a registered language entry.
- .reset_backend!(to: :auto) ⇒ void
  Reset backend selection memoization.
- .resolve_backend_module(explicit_backend = nil) ⇒ Module?
  Get backend module for a specific backend (with explicit override).
- .resolve_effective_backend(explicit_backend = nil) ⇒ Symbol
  Resolve the effective backend considering explicit override.
- .resolve_native_backend_module(explicit_backend = nil) ⇒ Module?
  Resolve a native tree-sitter backend module (for from_library).
- .with_backend(name) { ... } ⇒ Object
  Execute a block with a specific backend in thread-local context.
Class Method Details
.backend ⇒ Object
# File 'lib/tree_haver.rb', line 347

def backend
  @backend ||= case (ENV["TREE_HAVER_BACKEND"] || :auto).to_s # rubocop:disable ThreadSafety/ClassInstanceVariable
  when "mri" then :mri
  when "rust" then :rust
  when "ffi" then :ffi
  when "java" then :java
  when "citrus" then :citrus
  when "prism" then :prism
  when "psych" then :psych
  when "commonmarker" then :commonmarker
  when "markly" then :markly
  else :auto
  end
end
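The environment-variable mapping above can be exercised in isolation. The following is a minimal sketch, assuming a hypothetical helper `backend_from_env` that accepts any hash-like object in place of `ENV`; it is not part of TreeHaver. As in the method above, matching is exact, so an unrecognized or differently cased value resolves to `:auto`.

```ruby
# Backend names recognized by the case statement in the method above.
KNOWN_BACKEND_NAMES = %w[mri rust ffi java citrus prism psych commonmarker markly].freeze

# Hypothetical helper: maps TREE_HAVER_BACKEND to a backend symbol,
# falling back to :auto for unset or unrecognized values.
def backend_from_env(env = ENV)
  value = (env["TREE_HAVER_BACKEND"] || "auto").to_s
  KNOWN_BACKEND_NAMES.include?(value) ? value.to_sym : :auto
end

puts backend_from_env({ "TREE_HAVER_BACKEND" => "prism" }).inspect
puts backend_from_env({ "TREE_HAVER_BACKEND" => "PRISM" }).inspect
```

Because the real method memoizes its result in `@backend`, changing the environment variable after the first call has no effect until `reset_backend!` or `backend=` is used.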
.backend=(name) ⇒ Symbol?
Set the backend to use
# File 'lib/tree_haver.rb', line 370

def backend=(name)
  @backend = name&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
end
.backend_module ⇒ Module?
Determine the concrete backend module to use
This method performs backend auto-selection when backend is :auto. On JRuby, prefers Java backend if available, then FFI, then Citrus. On MRI, prefers MRI backend if available, then Rust, then FFI, then Citrus. Citrus is the final fallback as it’s pure Ruby and works everywhere.
# File 'lib/tree_haver.rb', line 666

def backend_module
  case effective_backend # Changed from: backend
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :commonmarker
    Backends::Commonmarker
  when :markly
    Backends::Markly
  else
    # auto-select: prefer native/fast backends, fall back to pure Ruby (Citrus)
    if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby" && Backends::Java.available?
      Backends::Java
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && Backends::MRI.available?
      Backends::MRI
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && Backends::Rust.available?
      Backends::Rust
    elsif Backends::FFI.available?
      Backends::FFI
    elsif Backends::Citrus.available?
      Backends::Citrus # Pure Ruby fallback
    else
      # No backend available
      nil
    end
  end
end
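The auto-selection branch above amounts to walking an engine-specific priority list and taking the first available backend. The following is a minimal sketch of that idea using stand-in objects; `StubBackend` and `auto_select` are hypothetical names, and the real method checks `Backends::*` modules rather than a hash.

```ruby
# Stand-in for a backend module; only the availability check is modeled.
StubBackend = Struct.new(:name, :available) do
  def available?
    available
  end
end

# Returns the first available backend for the engine, or nil if none is.
# Priority orders follow the if/elsif chain in the method above.
def auto_select(engine, backends)
  order =
    case engine
    when "jruby" then %i[java ffi citrus]
    when "ruby"  then %i[mri rust ffi citrus]
    else              %i[ffi citrus]
    end
  order.map { |key| backends[key] }.compact.find(&:available?)
end

stubs = {
  mri:    StubBackend.new(:mri, false),
  rust:   StubBackend.new(:rust, true),
  ffi:    StubBackend.new(:ffi, true),
  citrus: StubBackend.new(:citrus, true),
}
puts auto_select("ruby", stubs).name.inspect
```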
.backend_protect ⇒ Object
Alias for backend_protect?
# File 'lib/tree_haver.rb', line 299

def backend_protect
  backend_protect?
end
.backend_protect=(value) ⇒ Boolean
Whether backend conflict protection is enabled
When true (default), TreeHaver will raise BackendConflict if you try to use a backend that is known to conflict with a previously used backend. For example, FFI will not work after MRI has been used.
Set to false to disable protection (useful for testing compatibility).
# File 'lib/tree_haver.rb', line 285

def backend_protect=(value)
  @backend_protect_mutex ||= Mutex.new
  @backend_protect_mutex.synchronize { @backend_protect = value }
end
.backend_protect? ⇒ Boolean
Check if backend conflict protection is enabled
# File 'lib/tree_haver.rb', line 293

def backend_protect?
  return @backend_protect if defined?(@backend_protect) # rubocop:disable ThreadSafety/ClassInstanceVariable
  true # Default is protected
end
.backends_used ⇒ Set<Symbol>
Track which backends have been used in this process
# File 'lib/tree_haver.rb', line 306

def backends_used
  @backends_used ||= Set.new # rubocop:disable ThreadSafety/ClassInstanceVariable
end
.capabilities ⇒ Hash{Symbol => Object}
Get capabilities of the current backend
Returns a hash describing what features the selected backend supports. Common keys include:
- :backend - Symbol identifying the backend (e.g., :mri, :rust, :ffi, :java)
- :parse - Whether parsing is implemented
- :query - Whether the Query API is available
- :bytes_field - Whether byte position fields are available
- :incremental - Whether incremental parsing is supported

# File 'lib/tree_haver.rb', line 719

def capabilities
  mod = backend_module
  return {} unless mod
  mod.capabilities
end
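Since the method returns an empty hash when no backend is available, callers that gate features on it need to treat a missing key as "unsupported". The following is a minimal sketch of such a check; `EXAMPLE_CAPABILITIES` holds illustrative values only (not output from any real backend), and `feature_supported?` is a hypothetical helper.

```ruby
# Illustrative capabilities hash using the keys listed above.
# The values are assumptions, not what any real backend reports.
EXAMPLE_CAPABILITIES = {
  backend: :ffi,
  parse: true,
  query: true,
  bytes_field: true,
  incremental: false,
}.freeze

# Treats a missing key the same as an unsupported feature, which also
# covers the empty hash returned when no backend is available.
def feature_supported?(capabilities, feature)
  capabilities.fetch(feature, false) == true
end

puts feature_supported?(EXAMPLE_CAPABILITIES, :query)
puts feature_supported?({}, :query)
```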
.check_backend_conflict!(backend) ⇒ void
This method returns an undefined value.
Check if using a backend would cause a conflict
# File 'lib/tree_haver.rb', line 333

def check_backend_conflict!(backend)
  return unless backend_protect?

  conflicts = conflicting_backends_for(backend)
  return if conflicts.empty?

  raise BackendConflict,
    "Cannot use #{backend} backend: it is blocked by previously used backend(s): #{conflicts.join(", ")}. " \
      "The #{backend} backend will segfault when #{conflicts.first} has already loaded. " \
      "To disable this protection (at risk of segfaults), set TreeHaver.backend_protect = false"
end
.conflicting_backends_for(backend) ⇒ Array<Symbol>
Check if a backend would conflict with previously used backends
# File 'lib/tree_haver.rb', line 323

def conflicting_backends_for(backend)
  blockers = Backends::BLOCKED_BY[backend] || []
  blockers & backends_used.to_a
end
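The conflict check is an array intersection between a backend's known blockers and the set of backends already used. The following is a minimal sketch under one assumption: the `BLOCKED_BY_SKETCH` table contains only the FFI-after-MRI rule mentioned earlier on this page, since the actual contents of `Backends::BLOCKED_BY` are not shown here. `conflicts_for` is a hypothetical stand-in for the method above.

```ruby
require "set"

# Assumed blocker table: only the "FFI will not work after MRI has been
# used" rule stated in this document. The real table may contain more.
BLOCKED_BY_SKETCH = { ffi: [:mri] }.freeze

# Returns the blockers of `backend` that appear in `used`.
def conflicts_for(backend, used)
  blockers = BLOCKED_BY_SKETCH[backend] || []
  blockers & used.to_a
end

used = Set[:mri, :prism]
puts conflicts_for(:ffi, used).inspect
puts conflicts_for(:citrus, used).inspect
```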
.current_backend_context ⇒ Hash{Symbol => Object}
Thread-local backend context storage
Returns a hash containing the thread-local backend context with keys:
- :backend - The backend name (Symbol) or nil if using global default
- :depth - The nesting depth (Integer) for proper cleanup

# File 'lib/tree_haver.rb', line 399

def current_backend_context
  Thread.current[:tree_haver_backend_context] ||= {
    backend: nil, # nil means "use global default"
    depth: 0, # Track nesting depth for proper cleanup
  }
end
.effective_backend ⇒ Symbol
Get the effective backend for current context
Priority: thread-local context → global @backend → :auto
# File 'lib/tree_haver.rb', line 417

def effective_backend
  ctx = current_backend_context
  ctx[:backend] || backend || :auto
end
.parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil) ⇒ TreeHaver::Parser
Create a parser configured for a specific language
This is the recommended high-level API for creating a parser. It handles:
- Checking if the language is already registered
- Auto-discovering tree-sitter grammar via GrammarFinder
- Falling back to Citrus grammar if tree-sitter is unavailable
- Creating and configuring the parser

# File 'lib/tree_haver.rb', line 865

def parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil)
  name = language_name.to_sym
  symbol ||= "tree_sitter_#{name}"

  # Step 1: Try to get the language (may already be registered)
  language = begin
    # Check if already registered and loadable
    if registered_language(name)
      Language.public_send(name, path: library_path, symbol: symbol)
    end
  rescue NotAvailable, ArgumentError, LoadError
    nil
  end

  # Step 2: If not registered, try GrammarFinder for tree-sitter
  unless language
    # Principle of Least Surprise: If user provides an explicit path,
    # it MUST exist. Don't silently fall back to auto-discovery.
    if library_path && !library_path.empty?
      unless File.exist?(library_path)
        raise NotAvailable, "Specified parser path does not exist: #{library_path}"
      end

      begin
        register_language(name, path: library_path, symbol: symbol)
        language = Language.public_send(name)
      rescue NotAvailable, ArgumentError, LoadError => e
        # Re-raise with more context since user explicitly provided this path
        raise NotAvailable, "Failed to load parser from specified path #{library_path}: #{e.message}"
      end
    else
      # Auto-discover via GrammarFinder (no explicit path provided)
      begin
        finder = GrammarFinder.new(name)
        if finder.available?
          finder.register!
          language = Language.public_send(name)
        end
      rescue NotAvailable, ArgumentError, LoadError
        language = nil
      end
    end
  end

  # Step 3: Try Citrus fallback if tree-sitter failed
  unless language
    # Use explicit config, or fall back to built-in defaults for known languages
    citrus_config ||= CITRUS_DEFAULTS[name] || {}

    # Only attempt if we have the required configuration
    if citrus_config[:gem_name] && citrus_config[:grammar_const]
      begin
        citrus_finder = CitrusGrammarFinder.new(
          language: name,
          gem_name: citrus_config[:gem_name],
          grammar_const: citrus_config[:grammar_const],
          require_path: citrus_config[:require_path],
        )
        if citrus_finder.available?
          citrus_finder.register!
          language = Language.public_send(name)
        end
      rescue NotAvailable, ArgumentError, LoadError, NameError, TypeError
        language = nil
      end
    end
  end

  # Step 4: Raise if nothing worked
  unless language
    raise NotAvailable,
      "No parser available for #{name}. " \
        "Install tree-sitter-#{name} or the appropriate Ruby gem. " \
        "Set TREE_SITTER_#{name.to_s.upcase}_PATH for custom grammar location."
  end

  # Step 5: Create and configure parser
  parser = Parser.new
  parser.language = language
  parser
end
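Steps 1 through 4 above follow a "first loader that succeeds wins" pattern, where load errors are swallowed so the next strategy can be tried. The following is a minimal, self-contained sketch of that pattern only. `NotAvailableSketch` and `first_available` are hypothetical names, the loaders are plain lambdas rather than TreeHaver finders, and the sketch deliberately omits the explicit-path rule (a user-supplied `library_path` that fails raises instead of falling through).

```ruby
# Hypothetical stand-in for TreeHaver::NotAvailable.
class NotAvailableSketch < StandardError; end

# Tries each loader in order and returns [label, result] for the first one
# that yields a non-nil result. Load-style errors move on to the next loader.
def first_available(name, loaders)
  loaders.each do |label, loader|
    begin
      result = loader.call(name)
      return [label, result] if result
    rescue NotAvailableSketch, ArgumentError, LoadError
      next
    end
  end
  raise NotAvailableSketch, "No parser available for #{name}"
end

loaders = {
  registered:  ->(_name) { nil },                                   # nothing registered yet
  tree_sitter: ->(_name) { raise LoadError, "grammar not found" },  # discovery fails
  citrus:      ->(name)  { "citrus grammar for #{name}" },          # fallback succeeds
}
label, language = first_available(:toml, loaders)
puts "#{label}: #{language}"
```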
.record_backend_usage(backend) ⇒ void
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
This method returns an undefined value.
Record that a backend has been used
# File 'lib/tree_haver.rb', line 315

def record_backend_usage(backend)
  backends_used << backend
end
.register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil) ⇒ void
This method returns an undefined value.
Register a language helper by name (backend-agnostic)
After registration, you can use dynamic helpers like TreeHaver::Language.toml to load the registered language. TreeHaver will automatically use the appropriate grammar based on the active backend.
The name parameter is an arbitrary identifier you choose - it doesn’t need to match the actual language name. This is useful for:
- Testing: Use unique names like :toml_test to avoid collisions
- Aliasing: Register the same grammar under multiple names
- Versioning: Register different grammar versions as :ruby_2 and :ruby_3
The actual grammar identity comes from path/symbol (tree-sitter) or grammar_module (Citrus), not from the name.
IMPORTANT: This method INTENTIONALLY allows registering BOTH a tree-sitter library AND a Citrus grammar for the same language IN A SINGLE CALL. This is achieved by using separate if statements (not elsif) and no early returns. This design is deliberate and provides significant benefits:
Why register both backends for one language?
- Backend flexibility: Code works regardless of which backend is active
- Performance testing: Compare tree-sitter vs Citrus performance
- Gradual migration: Transition between backends without breaking code
- Fallback scenarios: Use Citrus when tree-sitter library unavailable
- Platform portability: tree-sitter on Linux/Mac, Citrus on JRuby/Windows
The active backend determines which registration is used automatically. No code changes needed to switch backends - just change TreeHaver.backend.
# File 'lib/tree_haver.rb', line 798

def register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil)
  # Register tree-sitter backend if path provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  if path
    LanguageRegistry.register(name, :tree_sitter, path: path, symbol: symbol)
  end

  # Register Citrus backend if grammar_module provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  # This allows maximum flexibility - register once, use with any backend
  if grammar_module
    unless grammar_module.respond_to?(:parse)
      raise ArgumentError, "Grammar module must respond to :parse"
    end

    LanguageRegistry.register(name, :citrus, grammar_module: grammar_module, gem_name: gem_name)
  end

  # Require at least one backend to be registered
  if path.nil? && grammar_module.nil?
    raise ArgumentError, "Must provide at least one of: path (tree-sitter) or grammar_module (Citrus)"
  end

  # Note: No early return! This method intentionally processes both `if` blocks
  # above to allow registering multiple backends for the same language.
  # Both tree-sitter and Citrus can be registered simultaneously for maximum
  # flexibility. See method documentation for rationale.
  nil
end
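The dual-registration behavior can be illustrated with a small in-memory registry. The following is a minimal sketch, not TreeHaver's `LanguageRegistry`: `RegistrySketch`, `FakeTomlGrammar`, and the library path are all hypothetical, and the sketch validates its arguments up front rather than in the order used by the method above.

```ruby
# Hypothetical in-memory registry keyed by language name, then backend type.
class RegistrySketch
  def initialize
    @entries = Hash.new { |hash, key| hash[key] = {} }
  end

  # Two independent `if` checks, so one call can register both backends.
  def register(name, path: nil, symbol: nil, grammar_module: nil)
    if path.nil? && grammar_module.nil?
      raise ArgumentError, "Must provide at least one of: path or grammar_module"
    end
    if grammar_module && !grammar_module.respond_to?(:parse)
      raise ArgumentError, "Grammar module must respond to :parse"
    end

    @entries[name.to_sym][:tree_sitter] = { path: path, symbol: symbol } if path
    @entries[name.to_sym][:citrus] = { grammar_module: grammar_module } if grammar_module
    nil
  end

  # Returns the per-backend entries for a name, or nil if unregistered.
  def registered(name)
    @entries.fetch(name.to_sym, nil)
  end
end

# Hypothetical grammar object; anything responding to :parse is accepted.
module FakeTomlGrammar
  def self.parse(source)
    source
  end
end

registry = RegistrySketch.new
registry.register(
  :toml_test,
  path: "/hypothetical/libtree-sitter-toml.so", # illustrative path only
  symbol: "tree_sitter_toml",
  grammar_module: FakeTomlGrammar,
)
puts registry.registered(:toml_test).keys.inspect
```

Registering under an arbitrary name such as `:toml_test` mirrors the point above that the name is only an identifier; the grammar identity comes from the path/symbol or the grammar module.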
.registered_language(name) ⇒ Hash?
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Fetch a registered language entry
# File 'lib/tree_haver.rb', line 833

def registered_language(name)
  LanguageRegistry.registered(name)
end
.reset_backend!(to: :auto) ⇒ void
This method returns an undefined value.
Reset backend selection memoization
Primarily useful in tests to switch backends without cross-example leakage.
# File 'lib/tree_haver.rb', line 384

def reset_backend!(to: :auto)
  @backend = to&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
end
.resolve_backend_module(explicit_backend = nil) ⇒ Module?
Get backend module for a specific backend (with explicit override)
# File 'lib/tree_haver.rb', line 539

def resolve_backend_module(explicit_backend = nil)
  # Temporarily override effective backend
  requested = resolve_effective_backend(explicit_backend)

  mod = case requested
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :commonmarker
    Backends::Commonmarker
  when :markly
    Backends::Markly
  when :auto
    backend_module # Fall back to normal resolution for :auto
  else
    # Unknown backend name - return nil to trigger error in caller
    nil
  end

  # Return nil if the module doesn't exist
  return unless mod

  # Check for backend conflicts FIRST, before checking availability
  # This is critical because the conflict causes the backend to report unavailable
  # We want to raise a clear error explaining WHY it's unavailable
  # Use the requested backend name directly (not capabilities) because
  # capabilities may be empty when the backend is blocked/unavailable
  check_backend_conflict!(requested) if requested && requested != :auto

  # Now check if the backend is available
  # Why assume modules without available? are available?
  # - Some backends might be mocked in tests without an available? method
  # - This makes the code more defensive and test-friendly
  # - It allows graceful degradation if a backend module is incomplete
  # - Backward compatibility: if a module doesn't declare availability, assume it works
  return if mod.respond_to?(:available?) && !mod.available?

  # Record that this backend is being used
  record_backend_usage(requested) if requested && requested != :auto

  mod
end
.resolve_effective_backend(explicit_backend = nil) ⇒ Symbol
Resolve the effective backend considering explicit override
Priority: explicit > thread context > global > :auto
# File 'lib/tree_haver.rb', line 526

def resolve_effective_backend(explicit_backend = nil)
  return explicit_backend.to_sym if explicit_backend
  effective_backend
end
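The priority chain (explicit > thread context > global > :auto) can be written as a pure function, which makes the ordering easy to check. The following is a minimal sketch; `resolve_priority` and its keyword arguments are hypothetical and stand in for the state that the real methods read from the thread-local context and `@backend`.

```ruby
# Hypothetical pure-function version of the priority chain.
# explicit:     backend passed directly by the caller
# thread_local: backend set by a surrounding with_backend block
# global:       backend set via TreeHaver.backend= or the environment
def resolve_priority(explicit: nil, thread_local: nil, global: nil)
  return explicit.to_sym if explicit
  thread_local || global || :auto
end

puts resolve_priority(explicit: "ffi", thread_local: :citrus, global: :mri).inspect
puts resolve_priority(thread_local: :citrus, global: :mri).inspect
puts resolve_priority(global: :mri).inspect
puts resolve_priority.inspect
```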
.resolve_native_backend_module(explicit_backend = nil) ⇒ Module?
Resolve a native tree-sitter backend module (for from_library)
This method is similar to resolve_backend_module but ONLY considers backends that support loading shared libraries (.so files):
- MRI (ruby_tree_sitter C extension)
- Rust (tree_stump)
- FFI (ffi gem with libtree-sitter)
- Java (jtreesitter on JRuby)
The other backends (Citrus, Prism, Psych, Commonmarker, Markly) are NOT considered because they don’t support from_library.

# File 'lib/tree_haver.rb', line 613

def resolve_native_backend_module(explicit_backend = nil)
  # Short-circuit on TruffleRuby: no native backends work
  # - MRI: C extension, MRI only
  # - Rust: magnus requires MRI's C API
  # - FFI: STRUCT_BY_VALUE not supported
  # - Java: requires JRuby's Java interop
  if defined?(RUBY_ENGINE) && RUBY_ENGINE == "truffleruby"
    return unless explicit_backend # Auto-select: no backends available
    # If explicit backend requested, let it fail with proper error below
  end

  # Get the effective backend (considers thread-local and global settings)
  requested = resolve_effective_backend(explicit_backend)

  # If the effective backend is a native backend, use it
  if NATIVE_BACKENDS.include?(requested)
    return resolve_backend_module(requested)
  end

  # If a specific non-native backend was explicitly requested, return nil
  # (from_library only works with native backends that load .so files)
  return if explicit_backend

  # If effective backend is :auto, auto-select from native backends in priority order
  # Note: non-native backends set via with_backend are NOT used here because
  # from_library only works with native backends
  native_priority = if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby"
    %i[java ffi] # JRuby: Java first, then FFI
  else
    %i[mri rust ffi] # MRI: MRI first, then Rust, then FFI
  end

  native_priority.each do |backend|
    mod = resolve_backend_module(backend)
    return mod if mod
  end

  nil # No native backend available
end
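The decision logic above reduces to computing which native backends are worth trying, given the engine and the requested backend. The following is a minimal sketch of that decision as a pure function; `native_candidates` is a hypothetical helper that returns candidate names only, whereas the real method goes on to resolve each candidate through `resolve_backend_module`.

```ruby
NATIVE_BACKENDS_SKETCH = %i[mri rust ffi java].freeze

# Returns the native backends to try, in order.
# requested: the effective backend (explicit, thread-local, or global)
# explicit:  true when the caller passed the backend directly
def native_candidates(engine, requested: :auto, explicit: false)
  # TruffleRuby auto-selection: no native backend works there.
  return [] if engine == "truffleruby" && !explicit
  # A native backend was requested: try only that one.
  return [requested] if NATIVE_BACKENDS_SKETCH.include?(requested)
  # An explicit non-native backend cannot load shared libraries.
  return [] if explicit
  # Otherwise fall back to the engine's native priority order.
  engine == "jruby" ? %i[java ffi] : %i[mri rust ffi]
end

puts native_candidates("jruby").inspect
puts native_candidates("ruby", requested: :citrus).inspect
puts native_candidates("ruby", requested: :prism, explicit: true).inspect
```

The second call shows the case called out in the comments above: a non-native backend set through `with_backend` or the global setting is ignored, and the native priority order is used instead.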
.with_backend(name) { ... } ⇒ Object
Execute a block with a specific backend in thread-local context
This method provides temporary, thread-safe backend switching for a block of code. The backend setting is automatically restored when the block exits, even if an exception is raised. Nesting is supported: inner blocks override outer blocks, and each level is properly unwound.
Thread Safety: Each thread maintains its own backend context, so concurrent threads can safely use different backends without interfering with each other.
Use Cases:
- Testing: Test the same code path with different backends
- Performance comparison: Benchmark parsing with different backends
- Fallback scenarios: Try one backend, fall back to another on failure
- Thread isolation: Different threads can use different backends safely

# File 'lib/tree_haver.rb', line 490

def with_backend(name)
  raise ArgumentError, "Backend name required" if name.nil?

  # Get context FIRST to ensure it exists
  ctx = current_backend_context
  old_backend = ctx[:backend]
  old_depth = ctx[:depth]

  begin
    # Set new backend and increment depth
    ctx[:backend] = name.to_sym
    ctx[:depth] += 1

    # Execute block
    yield
  ensure
    # Restore previous backend and depth
    # This ensures proper unwinding even with exceptions
    ctx[:backend] = old_backend
    ctx[:depth] = old_depth
  end
end
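The nesting and thread-isolation behavior can be observed with a standalone copy of the same save/restore pattern. The following is a minimal sketch: `BackendContextSketch` is a hypothetical module that mirrors the method above without depending on TreeHaver, and it stores its state under its own thread-local key.

```ruby
# Hypothetical standalone copy of the thread-local save/restore pattern.
module BackendContextSketch
  KEY = :backend_context_sketch

  def self.context
    Thread.current[KEY] ||= { backend: nil, depth: 0 }
  end

  def self.with_backend(name)
    raise ArgumentError, "Backend name required" if name.nil?

    ctx = context
    old_backend = ctx[:backend]
    old_depth = ctx[:depth]
    begin
      ctx[:backend] = name.to_sym
      ctx[:depth] += 1
      yield
    ensure
      # Restore the outer level even if the block raised.
      ctx[:backend] = old_backend
      ctx[:depth] = old_depth
    end
  end
end

seen = []
BackendContextSketch.with_backend(:ffi) do
  seen << BackendContextSketch.context[:backend]
  BackendContextSketch.with_backend(:citrus) do
    seen << BackendContextSketch.context[:backend]
  end
  seen << BackendContextSketch.context[:backend]
  # A new thread starts with its own, empty context.
  seen << Thread.new { BackendContextSketch.context[:backend] }.value
end
puts seen.inspect
```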