Module: TreeHaver

Defined in:
lib/tree_haver.rb,
lib/tree_haver/node.rb,
lib/tree_haver/tree.rb,
lib/tree_haver/point.rb,
lib/tree_haver/parser.rb,
lib/tree_haver/version.rb,
lib/tree_haver/language.rb,
lib/tree_haver/backend_api.rb,
lib/tree_haver/backends/ffi.rb,
lib/tree_haver/backends/mri.rb,
lib/tree_haver/backends/java.rb,
lib/tree_haver/backends/rust.rb,
lib/tree_haver/backends/prism.rb,
lib/tree_haver/backends/psych.rb,
lib/tree_haver/grammar_finder.rb,
lib/tree_haver/path_validator.rb,
lib/tree_haver/backends/citrus.rb,
lib/tree_haver/backends/markly.rb,
lib/tree_haver/language_registry.rb,
lib/tree_haver/library_path_utils.rb,
lib/tree_haver/backends/commonmarker.rb,
lib/tree_haver/citrus_grammar_finder.rb,
lib/tree_haver/rspec/dependency_tags.rb

Overview

TreeHaver is a cross-Ruby adapter for code parsing with 9 backends.

Provides a unified API for parsing source code across MRI Ruby, JRuby, and TruffleRuby using tree-sitter grammars or language-specific native parsers.

Backends

Supports 9 backends:

  • Tree-sitter: MRI (C), Rust, FFI, Java

  • Native parsers: Prism (Ruby), Psych (YAML), Commonmarker (Markdown), Markly (GFM)

  • Pure Ruby: Citrus (portable fallback)

Platform Compatibility

Not all backends work on all Ruby platforms:

| Backend      | MRI | JRuby | TruffleRuby |
|--------------|-----|-------|-------------|
| MRI (C ext)  | 
  • JRuby: Cannot load native C/Rust extensions; use FFI, Java, or pure Ruby backends

  • TruffleRuby: FFI doesn’t support STRUCT_BY_VALUE; magnus/rb-sys incompatible with C API; use Prism, Psych, Citrus, or potentially Commonmarker/Markly

Examples:

Basic usage with tree-sitter

# Load a language grammar
language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)

# Create and configure a parser
parser = TreeHaver::Parser.new
parser.language = language

# Parse source code
tree = parser.parse("[package]\nname = \"my-app\"")
root = tree.root_node

# Use unified Position API (works across all backends)
puts root.start_line      # => 1 (1-based)
puts root.source_position # => {start_line:, end_line:, start_column:, end_column:}

Using language-specific backends

# Parse Ruby with Prism
TreeHaver.backend = :prism
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Prism::Language.ruby
tree = parser.parse("class Example; end")

# Parse YAML with Psych
TreeHaver.backend = :psych
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Psych::Language.yaml
tree = parser.parse("key: value")

# Parse Markdown with Commonmarker
TreeHaver.backend = :commonmarker
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Commonmarker::Language.markdown
tree = parser.parse("# Heading\nParagraph")

Using language registration

TreeHaver.register_language(:toml, path: "/usr/local/lib/libtree-sitter-toml.so")
language = TreeHaver::Language.toml

Using GrammarFinder for automatic discovery

# GrammarFinder automatically locates grammar libraries on the system
finder = TreeHaver::GrammarFinder.new(:toml)
finder.register! if finder.available?
language = TreeHaver::Language.toml

Selecting a backend

TreeHaver.backend = :mri          # Force MRI (ruby_tree_sitter)
TreeHaver.backend = :rust         # Force Rust (tree_stump)
TreeHaver.backend = :ffi          # Force FFI
TreeHaver.backend = :java         # Force Java (JRuby)
TreeHaver.backend = :prism        # Force Prism (Ruby)
TreeHaver.backend = :psych        # Force Psych (YAML)
TreeHaver.backend = :commonmarker # Force Commonmarker (Markdown)
TreeHaver.backend = :markly       # Force Markly (GFM)
TreeHaver.backend = :citrus       # Force Citrus (pure Ruby)
TreeHaver.backend = :auto         # Auto-select (default)

Defined Under Namespace

Modules: BackendAPI, Backends, LanguageRegistry, LibraryPathUtils, PathValidator, RSpec, Version

Classes: BackendConflict, CitrusGrammarFinder, Error, GrammarFinder, Language, Node, NotAvailable, Parser, Point, Tree

Constant Summary

CITRUS_DEFAULTS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Default Citrus configurations for known languages

These are used by parser_for when no explicit citrus_config is provided and tree-sitter backends are not available (e.g., on TruffleRuby).

{
  toml: {
    gem_name: "toml-rb",
    grammar_const: "TomlRB::Document",
    require_path: "toml-rb",
  },
}.freeze
NATIVE_BACKENDS =

Native tree-sitter backends that support loading shared libraries (.so files). These backends wrap the tree-sitter C library via various bindings. Pure Ruby backends (Citrus, Prism, Psych, Commonmarker, Markly) are excluded.

%i[mri rust ffi java].freeze
VALID_NATIVE_BACKENDS =

Valid native backend names (require native extensions)

%w[mri rust ffi java].freeze
VALID_RUBY_BACKENDS =

Valid pure Ruby backend names (no native extensions)

%w[citrus prism psych commonmarker markly].freeze
VALID_BACKENDS =

All valid backend names

(VALID_NATIVE_BACKENDS + VALID_RUBY_BACKENDS + %w[auto none]).freeze
VERSION =

Traditional location for VERSION constant

Returns:

  • (String)

    the version string

Version::VERSION

Class Method Details

.allowed_native_backends ⇒ Array<Symbol>

Get allowed native backends from TREE_HAVER_NATIVE_BACKEND environment variable

Supports comma-separated values like “mri,ffi”. Special values:

  • “auto” or empty/unset: automatically select from available native backends

  • “none”: no native backends allowed (pure Ruby only)

Examples:

Allow only MRI and FFI

# TREE_HAVER_NATIVE_BACKEND=mri,ffi
TreeHaver.allowed_native_backends  # => [:mri, :ffi]

Auto-select native backends (default)

# TREE_HAVER_NATIVE_BACKEND not set, empty, or "auto"
TreeHaver.allowed_native_backends  # => [:auto]

Disable all native backends

# TREE_HAVER_NATIVE_BACKEND=none
TreeHaver.allowed_native_backends  # => [:none]

Returns:

  • (Array<Symbol>)

    list of allowed native backend symbols, or [:auto] or [:none]



# File 'lib/tree_haver.rb', line 399

def allowed_native_backends
  @allowed_native_backends ||= parse_backend_list_env("TREE_HAVER_NATIVE_BACKEND", VALID_NATIVE_BACKENDS) # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.allowed_ruby_backends ⇒ Array<Symbol>

Get allowed Ruby backends from TREE_HAVER_RUBY_BACKEND environment variable

Supports comma-separated values like “citrus,prism”. Special values:

  • “auto” or empty/unset: automatically select from available Ruby backends

  • “none”: no Ruby backends allowed (native only)

Examples:

Allow only Citrus

# TREE_HAVER_RUBY_BACKEND=citrus
TreeHaver.allowed_ruby_backends  # => [:citrus]

Auto-select Ruby backends (default)

# TREE_HAVER_RUBY_BACKEND not set, empty, or "auto"
TreeHaver.allowed_ruby_backends  # => [:auto]

Returns:

  • (Array<Symbol>)

    list of allowed Ruby backend symbols, or [:auto] or [:none]



# File 'lib/tree_haver.rb', line 417

def allowed_ruby_backends
  @allowed_ruby_backends ||= parse_backend_list_env("TREE_HAVER_RUBY_BACKEND", VALID_RUBY_BACKENDS) # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.backend ⇒ Object

Examples:

TreeHaver.backend  # => :auto


# File 'lib/tree_haver.rb', line 367

def backend
  return @backend if defined?(@backend) && @backend # rubocop:disable ThreadSafety/ClassInstanceVariable

  @backend = parse_single_backend_env # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.backend=(name) ⇒ Symbol?

Set the backend to use

Examples:

Force FFI backend

TreeHaver.backend = :ffi

Force Rust backend

TreeHaver.backend = :rust

Parameters:

  • name (Symbol, String, nil)

    backend name (:auto, :mri, :rust, :ffi, :java, :citrus, :prism, :psych, :commonmarker, :markly)

Returns:

  • (Symbol, nil)

    the backend that was set



# File 'lib/tree_haver.rb', line 464

def backend=(name)
  @backend = name&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.backend_allowed?(backend_name) ⇒ Boolean

Check if a specific backend is allowed based on environment variables

Checks TREE_HAVER_NATIVE_BACKEND for native backends and TREE_HAVER_RUBY_BACKEND for pure Ruby backends.

Examples:

# TREE_HAVER_NATIVE_BACKEND=mri
TreeHaver.backend_allowed?(:mri)    # => true
TreeHaver.backend_allowed?(:ffi)    # => false
TreeHaver.backend_allowed?(:citrus) # => true (Ruby backends use separate env var)

Parameters:

  • backend_name (Symbol, String)

    the backend to check

Returns:

  • (Boolean)

    true if the backend is allowed



# File 'lib/tree_haver.rb', line 433

def backend_allowed?(backend_name)
  backend_sym = backend_name.to_sym

  # Check if it's a native backend
  if VALID_NATIVE_BACKENDS.include?(backend_sym.to_s)
    allowed = allowed_native_backends
    return true if allowed == [:auto]
    return false if allowed == [:none]
    return allowed.include?(backend_sym)
  end

  # Check if it's a Ruby backend
  if VALID_RUBY_BACKENDS.include?(backend_sym.to_s)
    allowed = allowed_ruby_backends
    return true if allowed == [:auto]
    return false if allowed == [:none]
    return allowed.include?(backend_sym)
  end

  # Unknown backend or :auto - allow
  true
end
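
The allow-list check above can be tried without setting environment variables. The following is a minimal standalone sketch, not TreeHaver's code: `allowed?` is a hypothetical helper that receives the two parsed allow-lists as arguments instead of reading them from `TREE_HAVER_NATIVE_BACKEND` and `TREE_HAVER_RUBY_BACKEND`.

```ruby
# Standalone sketch; names below are illustrative, not part of TreeHaver.
NATIVE_NAMES = %w[mri rust ffi java].freeze
RUBY_NAMES = %w[citrus prism psych commonmarker markly].freeze

# Hypothetical helper: the allow-lists are passed in rather than read from ENV.
def allowed?(name, native_allowed:, ruby_allowed:)
  key = name.to_s
  list = if NATIVE_NAMES.include?(key)
    native_allowed
  elsif RUBY_NAMES.include?(key)
    ruby_allowed
  end

  # Unknown names (including :auto) and an [:auto] list always pass.
  return true if list.nil? || list == [:auto]
  return false if list == [:none]

  list.include?(key.to_sym)
end

p allowed?(:mri, native_allowed: [:mri], ruby_allowed: [:auto])    # => true
p allowed?(:ffi, native_allowed: [:mri], ruby_allowed: [:auto])    # => false
p allowed?(:citrus, native_allowed: [:mri], ruby_allowed: [:auto]) # => true
```

Each family of backends is governed only by its own list, so restricting native backends leaves the pure Ruby ones untouched.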

.backend_module ⇒ Module?

Determine the concrete backend module to use

This method performs backend auto-selection when backend is :auto. On JRuby, prefers Java backend if available, then FFI, then Citrus. On MRI, prefers MRI backend if available, then Rust, then FFI, then Citrus. Citrus is the final fallback as it’s pure Ruby and works everywhere.

Examples:

mod = TreeHaver.backend_module
if mod
  puts "Using #{mod.capabilities[:backend]} backend"
end

Returns:

  • (Module, nil)

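    the backend module (Backends::MRI, Backends::Rust, Backends::FFI, Backends::Java, or Backends::Citrus under auto-selection; Backends::Prism, Backends::Psych, Backends::Commonmarker, or Backends::Markly only when explicitly requested), or nil if none is available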



# File 'lib/tree_haver.rb', line 860

def backend_module
  requested = effective_backend  # Changed from: backend

  # For explicit backends (not :auto), check for conflicts first
  # If the backend is blocked, fall through to auto-select
  if requested != :auto && backend_protect?
    conflicts = conflicting_backends_for(requested)
    unless conflicts.empty?
      # The explicitly requested backend is blocked - fall through to auto-select
      requested = :auto
    end
  end

  case requested
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :commonmarker
    Backends::Commonmarker
  when :markly
    Backends::Markly
  else
    # auto-select: prefer native/fast backends, fall back to pure Ruby (Citrus)
    # Each backend must be both allowed (by ENV) and available (gem installed)
    if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby" && backend_allowed?(:java) && Backends::Java.available?
      Backends::Java
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && backend_allowed?(:mri) && Backends::MRI.available?
      Backends::MRI
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && backend_allowed?(:rust) && Backends::Rust.available?
      Backends::Rust
    elsif backend_allowed?(:ffi) && Backends::FFI.available?
      Backends::FFI
    elsif backend_allowed?(:citrus) && Backends::Citrus.available?
      Backends::Citrus  # Pure Ruby fallback
    else
      # No backend available
      nil
    end
  end
end
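
The auto-selection branch above amounts to an ordered list of candidates, some of which are restricted to one Ruby engine. A minimal standalone sketch of that ordering follows. `auto_select` and `AUTO_ORDER` are hypothetical names, and the `usable` argument stands in for "allowed by the environment and reported available".

```ruby
# Candidate order mirroring the auto-select branch; a nil engine list
# means the candidate is considered on any engine.
AUTO_ORDER = [
  [:java,   ["jruby"]],
  [:mri,    ["ruby"]],
  [:rust,   ["ruby"]],
  [:ffi,    nil],
  [:citrus, nil],
].freeze

# Hypothetical helper: returns the first usable candidate for the engine.
def auto_select(engine, usable)
  AUTO_ORDER.each do |name, engines|
    next if engines && !engines.include?(engine)
    return name if usable.include?(name)
  end
  nil
end

p auto_select("jruby", [:ffi, :java, :citrus]) # => :java
p auto_select("ruby", [:rust, :ffi])           # => :rust
p auto_select("truffleruby", [:mri, :citrus])  # => :citrus
p auto_select("ruby", [])                      # => nil
```

Note that Prism, Psych, Commonmarker, and Markly do not appear in this order: in the listing above they are returned only when requested by name.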

.backend_protect ⇒ Object

Alias for backend_protect?



# File 'lib/tree_haver.rb', line 319

def backend_protect
  backend_protect?
end

.backend_protect=(value) ⇒ Boolean

Whether backend conflict protection is enabled

When true (default), TreeHaver will raise BackendConflict if you try to use a backend that is known to conflict with a previously used backend. For example, FFI will not work after MRI has been used.

Set to false to disable protection (useful for testing compatibility).

Examples:

Disable protection for testing

TreeHaver.backend_protect = false

Returns:

  • (Boolean)


# File 'lib/tree_haver.rb', line 305

def backend_protect=(value)
  @backend_protect_mutex ||= Mutex.new
  @backend_protect_mutex.synchronize { @backend_protect = value }
end

.backend_protect? ⇒ Boolean

Check if backend conflict protection is enabled

Returns:

  • (Boolean)

    true if protection is enabled (default)



# File 'lib/tree_haver.rb', line 313

def backend_protect?
  return @backend_protect if defined?(@backend_protect) # rubocop:disable ThreadSafety/ClassInstanceVariable
  true  # Default is protected
end

.backends_used ⇒ Set<Symbol>

Track which backends have been used in this process

Returns:

  • (Set<Symbol>)

    set of backend symbols that have been used



# File 'lib/tree_haver.rb', line 326

def backends_used
  @backends_used ||= Set.new # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.builtin_backends_registered? ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Check if built-in backends have been registered

Returns:

  • (Boolean)


# File 'lib/tree_haver.rb', line 519

def builtin_backends_registered?
  @builtin_backends_registered ||= false # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.capabilities ⇒ Hash{Symbol => Object}

Get capabilities of the current backend

Returns a hash describing what features the selected backend supports. Common keys include:

  • :backend - Symbol identifying the backend (e.g., :mri, :rust, :ffi, :java)

  • :parse - Whether parsing is implemented

  • :query - Whether the Query API is available

  • :bytes_field - Whether byte position fields are available

  • :incremental - Whether incremental parsing is supported

Examples:

TreeHaver.capabilities
# => { backend: :mri, query: true, bytes_field: true }

Returns:

  • (Hash{Symbol => Object})

    capability map, or empty hash if no backend available



# File 'lib/tree_haver.rb', line 926

def capabilities
  mod = backend_module
  return {} unless mod
  mod.capabilities
end
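
Because the hash may be empty or may omit keys, callers probably want to treat a missing key as "not supported". A minimal sketch of such a check follows. `supports?` is a hypothetical helper, and the `caps` hash is made-up sample data standing in for the return value of `TreeHaver.capabilities`.

```ruby
# Hypothetical helper: a missing or falsy key counts as unsupported.
def supports?(caps, feature)
  caps.fetch(feature, false) ? true : false
end

# Sample data only; real capability maps depend on the selected backend.
caps = { backend: :mri, query: true, bytes_field: true }

p supports?(caps, :query)       # => true
p supports?(caps, :incremental) # => false
p supports?({}, :parse)         # => false
```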

.check_backend_conflict!(backend) ⇒ void

This method returns an undefined value.

Check if using a backend would cause a conflict

Parameters:

  • backend (Symbol)

    the backend to check

Raises:

  • (BackendConflict)

    if protection is enabled and a previously used backend blocks this one



# File 'lib/tree_haver.rb', line 353

def check_backend_conflict!(backend)
  return unless backend_protect?

  conflicts = conflicting_backends_for(backend)
  return if conflicts.empty?

  raise BackendConflict,
    "Cannot use #{backend} backend: it is blocked by previously used backend(s): #{conflicts.join(", ")}. " \
      "The #{backend} backend will segfault when #{conflicts.first} has already loaded. " \
      "To disable this protection (at risk of segfaults), set TreeHaver.backend_protect = false"
end

.conflicting_backends_for(backend) ⇒ Array<Symbol>

Check if a backend would conflict with previously used backends

Parameters:

  • backend (Symbol)

    the backend to check

Returns:

  • (Array<Symbol>)

    list of previously used backends that block this one



# File 'lib/tree_haver.rb', line 343

def conflicting_backends_for(backend)
  blockers = Backends::BLOCKED_BY[backend] || []
  blockers & backends_used.to_a
end
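
The conflict lookup is an array intersection between the backends that block the requested one and the backends already used. A minimal standalone sketch follows. The table `BLOCKED_BY_SKETCH` is hypothetical: only the "FFI after MRI" entry is taken from this page, and the real `Backends::BLOCKED_BY` may hold other entries.

```ruby
require "set"

# Hypothetical conflict table; only the FFI-after-MRI entry comes from the docs.
BLOCKED_BY_SKETCH = { ffi: [:mri] }.freeze

# Returns the already-used backends that block `backend`.
def conflicts_for(backend, used)
  blockers = BLOCKED_BY_SKETCH[backend] || []
  blockers & used.to_a
end

used = Set.new
p conflicts_for(:ffi, used)    # => []
used << :mri
p conflicts_for(:ffi, used)    # => [:mri]
p conflicts_for(:citrus, used) # => []
```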

.current_backend_context ⇒ Hash{Symbol => Object}

Thread-local backend context storage

Returns a hash containing the thread-local backend context with keys:

  • :backend - The backend name (Symbol) or nil if using global default

  • :depth - The nesting depth (Integer) for proper cleanup

Examples:

ctx = TreeHaver.current_backend_context
ctx[:backend]  # => nil or :ffi, :mri, etc.
ctx[:depth]    # => 0, 1, 2, etc.

Returns:

  • (Hash{Symbol => Object})

    context hash with :backend and :depth keys



# File 'lib/tree_haver.rb', line 588

def current_backend_context
  Thread.current[:tree_haver_backend_context] ||= {
    backend: nil,  # nil means "use global default"
    depth: 0,       # Track nesting depth for proper cleanup
  }
end

.effective_backend ⇒ Symbol

Get the effective backend for current context

Priority: thread-local context → global @backend → :auto

Examples:

TreeHaver.effective_backend  # => :auto (default)

With thread-local context

TreeHaver.with_backend(:ffi) do
  TreeHaver.effective_backend  # => :ffi
end

Returns:

  • (Symbol)

    the backend to use



# File 'lib/tree_haver.rb', line 606

def effective_backend
  ctx = current_backend_context
  ctx[:backend] || backend || :auto
end
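
The priority chain is easier to see with a block-scoped override in place. The sketch below is a standalone illustration, not TreeHaver's implementation: `BackendContext` and its `with_backend` are hypothetical stand-ins that store the override in `Thread.current` and restore the previous value when the block exits.

```ruby
# Standalone sketch; BackendContext is a hypothetical stand-in.
module BackendContext
  KEY = :sketch_backend_context

  def self.current
    Thread.current[KEY] ||= { backend: nil, depth: 0 }
  end

  # Override the backend for the block, then restore the previous value
  # so that nested calls unwind in order.
  def self.with_backend(name)
    ctx = current
    previous = ctx[:backend]
    ctx[:backend] = name.to_sym
    ctx[:depth] += 1
    begin
      yield
    ensure
      ctx[:backend] = previous
      ctx[:depth] -= 1
    end
  end

  # Priority: context override, then the global value, then :auto.
  def self.effective(global = nil)
    current[:backend] || global || :auto
  end
end

p BackendContext.effective(:mri)     # => :mri
BackendContext.with_backend(:ffi) do
  p BackendContext.effective(:mri)   # => :ffi
  BackendContext.with_backend(:citrus) { p BackendContext.effective(:mri) } # => :citrus
  p BackendContext.effective(:mri)   # => :ffi
end
p Thread.new { BackendContext.effective }.value # => :auto
```

A new thread starts with an empty context, so an override set in one thread does not leak into another.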

.ensure_builtin_backends_registered! ⇒ void

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

This method returns an undefined value.

Ensure built-in backends are registered (idempotent)



# File 'lib/tree_haver.rb', line 527

def ensure_builtin_backends_registered!
  return if builtin_backends_registered?
  register_builtin_backends!
  @builtin_backends_registered = true # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.parse_backend_list_env(env_var, valid_backends) ⇒ Array<Symbol>

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Parse a backend list environment variable

Parameters:

  • env_var (String)

    the environment variable name

  • valid_backends (Array<String>)

    list of valid backend names

Returns:

  • (Array<Symbol>)

    list of backend symbols, or [:auto] or [:none]



# File 'lib/tree_haver.rb', line 553

def parse_backend_list_env(env_var, valid_backends)
  env_value = ENV[env_var]

  # Empty or unset means "auto"
  return [:auto] if env_value.nil? || env_value.strip.empty?

  normalized = env_value.strip.downcase

  # Handle special values
  return [:auto] if normalized == "auto"
  return [:none] if normalized == "none"

  # Split on comma and parse each backend
  backends = normalized.split(",").map(&:strip).uniq

  # Convert to symbols, filtering out invalid ones
  parsed = backends.filter_map do |name|
    valid_backends.include?(name) ? name.to_sym : nil
  end

  # Return :auto if no valid backends found
  parsed.empty? ? [:auto] : parsed
end
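
The parsing rules can be exercised in isolation. The following is a minimal standalone sketch of the same rules, not TreeHaver's code: `parse_backend_list` is a hypothetical helper that takes the raw string directly instead of reading `ENV`, which makes the fallback behavior easy to observe.

```ruby
# Standalone sketch; parse_backend_list is a hypothetical helper.
VALID_NATIVE = %w[mri rust ffi java].freeze

def parse_backend_list(raw, valid_backends)
  # Empty or unset means "auto".
  return [:auto] if raw.nil? || raw.strip.empty?

  normalized = raw.strip.downcase
  return [:auto] if normalized == "auto"
  return [:none] if normalized == "none"

  # Keep only recognized names, preserving the order given.
  parsed = normalized.split(",").map(&:strip).uniq.filter_map do |name|
    name.to_sym if valid_backends.include?(name)
  end
  parsed.empty? ? [:auto] : parsed
end

p parse_backend_list("MRI, ffi", VALID_NATIVE)  # => [:mri, :ffi]
p parse_backend_list("mri,bogus", VALID_NATIVE) # => [:mri]
p parse_backend_list("bogus", VALID_NATIVE)     # => [:auto]
p parse_backend_list("none", VALID_NATIVE)      # => [:none]
```

Unrecognized names are dropped silently, so a value made up entirely of typos falls back to [:auto] rather than raising.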

.parse_single_backend_env ⇒ Symbol

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Parse TREE_HAVER_BACKEND environment variable (single backend)

Returns:

  • (Symbol)

    the backend symbol (:auto if not set or invalid)



# File 'lib/tree_haver.rb', line 537

def parse_single_backend_env
  env_value = ENV["TREE_HAVER_BACKEND"]
  return :auto if env_value.nil? || env_value.strip.empty?

  name = env_value.strip.downcase
  return :auto unless VALID_BACKENDS.include?(name) && name != "all" && name != "none"

  name.to_sym
end

.parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil) ⇒ TreeHaver::Parser

Create a parser configured for a specific language

Respects the effective backend setting (via TREE_HAVER_BACKEND env var, TreeHaver.backend=, or with_backend block).

Supports three types of backends:

  1. Tree-sitter native backends (auto-discovered or explicit path)

  2. Citrus grammars (pure Ruby, via CITRUS_DEFAULTS or explicit config)

  3. Pure Ruby backends (registered via backend_module, e.g., Prism, Psych, RBS)

Examples:

Basic usage (auto-discovers grammar)

parser = TreeHaver.parser_for(:toml)

Force Citrus backend

TreeHaver.with_backend(:citrus) { TreeHaver.parser_for(:toml) }

Use registered pure Ruby backend (e.g., RBS)

# First, rbs-merge registers its backend:
# TreeHaver.register_language(:rbs, backend_module: Rbs::Merge::RbsBackend, backend_type: :rbs)
parser = TreeHaver.parser_for(:rbs)

Parameters:

  • language_name (Symbol, String)

    the language to parse (e.g., :toml, :json, :ruby, :yaml, :rbs)

  • library_path (String, nil) (defaults to: nil)

    optional explicit path to tree-sitter grammar library

  • symbol (String, nil) (defaults to: nil)

    optional tree-sitter symbol name (defaults to “tree_sitter_<name>”)

  • citrus_config (Hash, nil) (defaults to: nil)

    optional Citrus fallback configuration

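Returns:

  • (TreeHaver::Parser)

    a parser configured with the requested language

Raises:

  • (NotAvailable)

    if no parser is available for the language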



# File 'lib/tree_haver.rb', line 1088

def parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil)
  # Ensure built-in pure Ruby backends are registered
  ensure_builtin_backends_registered!

  name = language_name.to_sym
  symbol ||= "tree_sitter_#{name}"
  requested = effective_backend

  # Determine which backends to try based on effective_backend
  try_tree_sitter = (requested == :auto) || NATIVE_BACKENDS.include?(requested)
  try_citrus = (requested == :auto) || (requested == :citrus)

  language = nil
  parser = nil

  # First, check for registered pure Ruby backends
  # These take precedence when explicitly requested or when no other backend is available
  registration = registered_language(name)
  # Find any registered backend_module (not tree_sitter or citrus)
  registration&.each do |backend_type, config|
    next if %i[tree_sitter citrus].include?(backend_type)
    next unless config[:backend_module]

    backend_mod = config[:backend_module]
    # Check if this backend is available
    next unless backend_mod.respond_to?(:available?) && backend_mod.available?

    # If a specific backend was requested, only use if it matches
    next if requested != :auto && requested != backend_type

    # Create parser from the backend module
    if backend_mod.const_defined?(:Parser)
      parser = backend_mod::Parser.new
      if backend_mod.const_defined?(:Language)
        lang_class = backend_mod::Language
        # Try to get language by name (e.g., Language.ruby, Language.yaml, Language.rbs)
        if lang_class.respond_to?(name)
          parser.language = lang_class.public_send(name)
        elsif lang_class.respond_to?(:from_library)
          parser.language = lang_class.from_library(nil, name: name)
        end
      end
      return parser
    end
  end

  # Try tree-sitter if applicable
  if try_tree_sitter && !language
    language = load_tree_sitter_language(name, library_path: library_path, symbol: symbol)
  end

  # Try Citrus if applicable
  if try_citrus && !language
    language = load_citrus_language(name, citrus_config: citrus_config)
  end

  # Raise if nothing worked
  raise NotAvailable, "No parser available for #{name}. " \
    "Install tree-sitter-#{name} or configure a Citrus grammar." unless language

  # Create and configure parser
  parser = Parser.new
  parser.language = language
  parser
end
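
The two flags computed near the top of the method decide which fallback loaders run once the registered backend modules have been checked. A minimal standalone sketch of that decision follows. `loaders_to_try` is a hypothetical helper that returns the loaders as a list instead of setting two booleans.

```ruby
# Standalone sketch; loaders_to_try is a hypothetical helper.
NATIVE_SKETCH = %i[mri rust ffi java].freeze

def loaders_to_try(requested)
  tries = []
  tries << :tree_sitter if requested == :auto || NATIVE_SKETCH.include?(requested)
  tries << :citrus if requested == :auto || requested == :citrus
  tries
end

p loaders_to_try(:auto)   # => [:tree_sitter, :citrus]
p loaders_to_try(:ffi)    # => [:tree_sitter]
p loaders_to_try(:citrus) # => [:citrus]
p loaders_to_try(:prism)  # => []
```

The empty list for :prism reflects the listing above: when such a backend is requested and no matching registered module is available, neither fallback runs and NotAvailable is raised.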

.record_backend_usage(backend) ⇒ void

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

This method returns an undefined value.

Record that a backend has been used

Parameters:

  • backend (Symbol)

    the backend that was used



# File 'lib/tree_haver.rb', line 335

def record_backend_usage(backend)
  backends_used << backend
end

.register_builtin_backends! ⇒ void

This method returns an undefined value.

Register built-in pure Ruby backends in the LanguageRegistry

This registers Prism, Psych, Commonmarker, and Markly using the same registration API that external backends use. This ensures consistent behavior whether a backend is built-in or provided by an external gem.

Called automatically when TreeHaver is first used, but can be called manually in tests or when reset! has cleared the registry.

Examples:

Manual registration (usually not needed)

TreeHaver.register_builtin_backends!


# File 'lib/tree_haver.rb', line 496

def register_builtin_backends!
  Backends::PURE_RUBY_BACKENDS.each do |backend_type, info|
    language = info[:language]
    module_name = info[:module_name]

    # Get the backend module
    backend_mod = Backends.const_get(module_name)
    next unless backend_mod

    # Register if available (lazy check - doesn't require the gem yet)
    LanguageRegistry.register(
      language,
      backend_type,
      backend_module: backend_mod,
      gem_name: module_name.downcase,
    )
  end
end

.register_language(name, path: nil, symbol: nil, grammar_module: nil, backend_module: nil, backend_type: nil, gem_name: nil) ⇒ void

This method returns an undefined value.

Register a language helper by name (backend-agnostic)

After registration, you can use dynamic helpers like TreeHaver::Language.toml to load the registered language. TreeHaver will automatically use the appropriate grammar based on the active backend.

The name parameter is an arbitrary identifier you choose - it doesn’t need to match the actual language name. This is useful for:

  • Testing: Use unique names like :toml_test to avoid collisions

  • Aliasing: Register the same grammar under multiple names

  • Versioning: Register different grammar versions as :ruby_2 and :ruby_3

The actual grammar identity comes from path/symbol (tree-sitter) or grammar_module (Citrus), not from the name.

IMPORTANT: This method INTENTIONALLY allows registering BOTH a tree-sitter library AND a Citrus grammar for the same language IN A SINGLE CALL. This is achieved by using separate if statements (not elsif) and no early returns. This design is deliberate and provides significant benefits:

Why register both backends for one language?

  • Backend flexibility: Code works regardless of which backend is active

  • Performance testing: Compare tree-sitter vs Citrus performance

  • Gradual migration: Transition between backends without breaking code

  • Fallback scenarios: Use Citrus when tree-sitter library unavailable

  • Platform portability: tree-sitter on Linux/Mac, Citrus on JRuby/Windows

The active backend determines which registration is used automatically. No code changes needed to switch backends - just change TreeHaver.backend.

Examples:

Register tree-sitter grammar only

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)

Register Citrus grammar only

TreeHaver.register_language(
  :toml,
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)

Register pure Ruby backend (external gem like rbs-merge)

TreeHaver.register_language(
  :rbs,
  backend_module: Rbs::Merge::Backends::RbsBackend,
  backend_type: :rbs,
  gem_name: "rbs"
)

Register BOTH backends in separate calls

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)
TreeHaver.register_language(
  :toml,
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)

Register BOTH backends in ONE call (recommended for maximum flexibility)

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)
# Now TreeHaver::Language.toml works with ANY backend!

Parameters:

  • name (Symbol, String)

    identifier for this registration (can be any name you choose)

  • path (String, nil) (defaults to: nil)

    absolute path to the language shared library (for tree-sitter)

  • symbol (String, nil) (defaults to: nil)

    optional exported factory symbol (e.g., “tree_sitter_toml”)

  • grammar_module (Module, nil) (defaults to: nil)

    Citrus grammar module that responds to .parse(source)

  • backend_module (Module, nil) (defaults to: nil)

    pure Ruby backend module with Language/Parser classes

  • backend_type (Symbol, nil) (defaults to: nil)

    backend type for backend_module (defaults to the last segment of the module name, downcased)

  • gem_name (String, nil) (defaults to: nil)

    optional gem name for error messages



# File 'lib/tree_haver.rb', line 1014

def register_language(name, path: nil, symbol: nil, grammar_module: nil, backend_module: nil, backend_type: nil, gem_name: nil)
  # Register tree-sitter backend if path provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  if path
    LanguageRegistry.register(name, :tree_sitter, path: path, symbol: symbol)
  end

  # Register Citrus backend if grammar_module provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  # This allows maximum flexibility - register once, use with any backend
  if grammar_module
    unless grammar_module.respond_to?(:parse)
      raise ArgumentError, "Grammar module must respond to :parse"
    end

    LanguageRegistry.register(name, :citrus, grammar_module: grammar_module, gem_name: gem_name)
  end

  # Register pure Ruby backend if backend_module provided
  # This is used by external gems (like rbs-merge) to register their own backends
  if backend_module
    # Derive backend_type from module name if not provided
    type = backend_type || backend_module.name.split("::").last.downcase.to_sym
    LanguageRegistry.register(name, type, backend_module: backend_module, gem_name: gem_name)
  end

  # Require at least one backend to be registered
  if path.nil? && grammar_module.nil? && backend_module.nil?
    raise ArgumentError, "Must provide at least one of: path (tree-sitter), grammar_module (Citrus), or backend_module (pure Ruby)"
  end

  # Note: No early return! This method intentionally processes both `if` blocks
  # above to allow registering multiple backends for the same language.
  # Both tree-sitter and Citrus can be registered simultaneously for maximum
  # flexibility. See method documentation for rationale.
  nil
end
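
Registering several backends under one name works because each registration is stored under its own backend type. The sketch below illustrates that idea with a hypothetical `SketchRegistry`. It is a stand-in for LanguageRegistry, whose real storage layout is not shown on this page, and `Object` is only a placeholder for a grammar module.

```ruby
# Standalone sketch; SketchRegistry is a hypothetical stand-in that keeps
# one config hash per backend type under each language name.
module SketchRegistry
  @entries = {}

  def self.register(name, backend_type, **config)
    (@entries[name.to_sym] ||= {})[backend_type] = config.compact
  end

  def self.registered(name)
    @entries[name.to_sym]
  end
end

SketchRegistry.register(:toml, :tree_sitter,
  path: "/usr/local/lib/libtree-sitter-toml.so", symbol: "tree_sitter_toml")
SketchRegistry.register(:toml, :citrus, grammar_module: Object, gem_name: "toml-rb")

p SketchRegistry.registered(:toml).keys # => [:tree_sitter, :citrus]

# The fallback used when backend_type is omitted, applied to a module name:
derived = "Rbs::Merge::Backends::RbsBackend".split("::").last.downcase.to_sym
p derived # => :rbsbackend
```

The derived type here is :rbsbackend rather than :rbs, which may be why the RBS example above passes backend_type explicitly.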

.registered_language(name) ⇒ Hash?

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Fetch a registered language entry

Parameters:

  • name (Symbol, String)

    language identifier

Returns:

  • (Hash, nil)

    registration hash keyed by backend type (e.g., :tree_sitter with :path and :symbol), or nil if not registered



# File 'lib/tree_haver.rb', line 1057

def registered_language(name)
  LanguageRegistry.registered(name)
end

.reset_backend!(to: :auto) ⇒ void

This method returns an undefined value.

Reset backend selection memoization

Primarily useful in tests to switch backends without cross-example leakage.

Examples:

Reset to auto-selection

TreeHaver.reset_backend!

Reset to specific backend

TreeHaver.reset_backend!(to: :ffi)

Parameters:

  • to (Symbol, String, nil) (defaults to: :auto)

    backend name or nil to clear (defaults to :auto)



# File 'lib/tree_haver.rb', line 478

def reset_backend!(to: :auto)
  @backend = to&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
  @allowed_native_backends = nil # rubocop:disable ThreadSafety/ClassInstanceVariable
  @allowed_ruby_backends = nil # rubocop:disable ThreadSafety/ClassInstanceVariable
end
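
The two allow-lists are cleared here because they are memoized with `||=`, so a later change to the environment is not seen until the cached value is dropped. A minimal standalone sketch of that effect follows. `MemoSketch` and the `SKETCH_BACKEND` variable are hypothetical and unrelated to TreeHaver's own environment variables.

```ruby
# Standalone sketch; MemoSketch and SKETCH_BACKEND are hypothetical.
module MemoSketch
  # Reads the environment once and caches the parsed result.
  def self.allowed
    @allowed ||= (ENV["SKETCH_BACKEND"] || "auto").split(",").map(&:to_sym)
  end

  # Drops the cached value so the next call re-reads the environment.
  def self.reset!
    @allowed = nil
  end
end

ENV["SKETCH_BACKEND"] = "mri"
first = MemoSketch.allowed
ENV["SKETCH_BACKEND"] = "ffi"
stale = MemoSketch.allowed
MemoSketch.reset!
fresh = MemoSketch.allowed

p first # => [:mri]
p stale # => [:mri]
p fresh # => [:ffi]
```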

.resolve_backend_module(explicit_backend = nil) ⇒ Module?

Get backend module for a specific backend (with explicit override)

Examples:

mod = TreeHaver.resolve_backend_module(:ffi)
mod.capabilities[:backend]  # => :ffi

Parameters:

  • explicit_backend (Symbol, String, nil) (defaults to: nil)

    explicitly requested backend

Returns:

  • (Module, nil)

    the backend module or nil if not available

Raises:

  • (BackendConflict)

    if the backend conflicts with previously used backends



# File 'lib/tree_haver.rb', line 728

def resolve_backend_module(explicit_backend = nil)
  # Resolve the backend to use (an explicit argument wins over context and global settings)
  requested = resolve_effective_backend(explicit_backend)

  mod = case requested
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :commonmarker
    Backends::Commonmarker
  when :markly
    Backends::Markly
  when :auto
    backend_module  # Fall back to normal resolution for :auto
  else
    # Unknown backend name - return nil to trigger error in caller
    nil
  end

  # Return nil if the module doesn't exist
  return unless mod

  # Check if the backend is allowed by environment variables FIRST
  # This enforces TREE_HAVER_NATIVE_BACKEND and TREE_HAVER_RUBY_BACKEND as hard restrictions
  return if requested && requested != :auto && !backend_allowed?(requested)

  # Check for backend conflicts, before checking availability
  # This is critical because the conflict causes the backend to report unavailable
  # We want to raise a clear error explaining WHY it's unavailable
  # Use the requested backend name directly (not capabilities) because
  # capabilities may be empty when the backend is blocked/unavailable
  check_backend_conflict!(requested) if requested && requested != :auto

  # Now check if the backend is available
  # Why assume modules without available? are available?
  # - Some backends might be mocked in tests without an available? method
  # - This makes the code more defensive and test-friendly
  # - It allows graceful degradation if a backend module is incomplete
  # - Backward compatibility: if a module doesn't declare availability, assume it works
  return if mod.respond_to?(:available?) && !mod.available?

  # Record that this backend is being used
  record_backend_usage(requested) if requested && requested != :auto

  mod
end
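
The order of checks in the listing above (allowed, then conflict, then availability, then usage recording) can be illustrated with a stand-alone model. This is a minimal sketch, not TreeHaver's implementation: `Resolver`, `ConflictError`, `Stub`, and the `ffi`/`mri` conflict pairing are hypothetical names and data chosen for illustration.

```ruby
# Hypothetical model of the check order: allowed -> conflict -> available -> record.
class ConflictError < StandardError; end

class Resolver
  attr_reader :used

  def initialize(modules:, allowed:, conflicts: {})
    @modules = modules      # name => object that may respond to available?
    @allowed = allowed      # names permitted by configuration
    @conflicts = conflicts  # name => names it cannot coexist with (assumed data)
    @used = []
  end

  def resolve(name)
    mod = @modules[name]
    return unless mod                      # unknown backend
    return unless @allowed.include?(name)  # blocked by configuration

    # Raise before the availability check so the caller learns WHY it failed.
    blockers = @conflicts.fetch(name, []) & @used
    raise ConflictError, "#{name} conflicts with #{blockers.join(", ")}" unless blockers.empty?

    # Objects without available? are assumed to be usable.
    return if mod.respond_to?(:available?) && !mod.available?

    @used << name unless @used.include?(name)
    mod
  end
end

Stub = Struct.new(:available) do
  def available?
    available
  end
end

resolver = Resolver.new(
  modules: {mri: Stub.new(true), ffi: Stub.new(true), bare: Object.new},
  allowed: %i[mri ffi bare],
  conflicts: {ffi: [:mri]}
)

resolver.resolve(:mri)   # returns the stub and records :mri as used
resolver.resolve(:bare)  # no available? method, so it is returned as-is

begin
  resolver.resolve(:ffi) # :mri was used first, so this raises
rescue ConflictError => e
  conflict_message = e.message
end
```

Raising on a conflict before checking availability mirrors the comment in the listing: a blocked backend would otherwise just look unavailable, and the caller would get nil with no explanation.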

.resolve_effective_backend(explicit_backend = nil) ⇒ Symbol

Resolve the effective backend considering explicit override

Priority: explicit > thread context > global > :auto

Examples:

TreeHaver.resolve_effective_backend(:ffi)  # => :ffi

With thread-local context

TreeHaver.with_backend(:mri) do
  TreeHaver.resolve_effective_backend(nil)  # => :mri
  TreeHaver.resolve_effective_backend(:ffi)  # => :ffi (explicit wins)
end

Parameters:

  • explicit_backend (Symbol, String, nil) (defaults to: nil)

    explicitly requested backend

Returns:

  • (Symbol)

    the backend to use



715
716
717
718
# File 'lib/tree_haver.rb', line 715

def resolve_effective_backend(explicit_backend = nil)
  return explicit_backend.to_sym if explicit_backend
  effective_backend
end
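
The priority chain (explicit > thread context > global > :auto) can be sketched in isolation. `BackendPriority` and its accessors below are hypothetical names for illustration only; how TreeHaver stores its thread context and global setting is not shown in this section.

```ruby
# Hypothetical stand-alone model of the lookup order:
# explicit > thread context > global > :auto.
module BackendPriority
  @global = nil

  class << self
    attr_accessor :global

    # Per-thread override. Thread#[] storage is local to the current
    # thread (strictly, to the current fiber).
    def thread_backend
      Thread.current[:backend_priority_demo]
    end

    def thread_backend=(value)
      Thread.current[:backend_priority_demo] = value
    end

    def resolve(explicit = nil)
      return explicit.to_sym if explicit
      (thread_backend || global || :auto).to_sym
    end
  end
end

BackendPriority.resolve                # nothing set: falls through to :auto
BackendPriority.global = :ffi
BackendPriority.resolve                # global setting applies
BackendPriority.thread_backend = :mri
BackendPriority.resolve                # thread context beats global
BackendPriority.resolve("rust")        # explicit argument beats everything
```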

.resolve_native_backend_module(explicit_backend = nil) ⇒ Module?

Resolve a native tree-sitter backend module (for from_library)

This method is similar to resolve_backend_module but ONLY considers backends that support loading shared libraries (.so files):

  • MRI (ruby_tree_sitter C extension)

  • Rust (tree_stump)

  • FFI (ffi gem with libtree-sitter)

  • Java (jtreesitter on JRuby)

Backends that do not load tree-sitter shared libraries (Citrus, Prism, Psych, Commonmarker, Markly) are NOT considered because they don’t support from_library.

Parameters:

  • explicit_backend (Symbol, String, nil) (defaults to: nil)

    explicitly requested backend

Returns:

  • (Module, nil)

    the backend module or nil if none available

Raises:

  • (BackendConflict)

    if the backend conflicts with previously used backends



# File 'lib/tree_haver.rb', line 801

def resolve_native_backend_module(explicit_backend = nil)
  # Short-circuit on TruffleRuby: no native backends work
  # - MRI: C extension, MRI only
  # - Rust: magnus requires MRI's C API
  # - FFI: STRUCT_BY_VALUE not supported
  # - Java: requires JRuby's Java interop
  if defined?(RUBY_ENGINE) && RUBY_ENGINE == "truffleruby"
    return unless explicit_backend # Auto-select: no backends available
    # If explicit backend requested, let it fail with proper error below
  end

  # Get the effective backend (considers thread-local and global settings)
  requested = resolve_effective_backend(explicit_backend)

  # If the effective backend is a native backend, use it
  if NATIVE_BACKENDS.include?(requested)
    return resolve_backend_module(requested)
  end

  # If a specific non-native backend was explicitly requested, return nil
  # (from_library only works with native backends that load .so files)
  return if explicit_backend

  # If effective backend is :auto, auto-select from native backends in priority order
  # Note: non-native backends set via with_backend are NOT used here because
  # from_library only works with native backends
  native_priority = if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby"
    %i[java ffi] # JRuby: Java first, then FFI
  else
    %i[mri rust ffi] # MRI: MRI first, then Rust, then FFI
  end

  native_priority.each do |backend|
    # Rescue BackendConflict to allow iteration to continue
    # This enables graceful fallback when a backend is blocked

    mod = resolve_backend_module(backend)
    return mod if mod
  rescue BackendConflict
    # This backend is blocked by a previously used backend, try the next one
    next
  end

  nil # No native backend available
end
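
The fallback loop at the end of the listing relies on a `rescue` clause inside a `do ... end` block (Ruby 2.6 or later), so a blocked candidate is skipped rather than aborting the search. A minimal stand-alone sketch of that pattern follows; `first_usable` and `BlockedBackend` are hypothetical names, and the block stands in for the per-backend resolution call.

```ruby
# Hypothetical error standing in for a backend that is blocked by an earlier one.
class BlockedBackend < StandardError; end

# Walk candidates in priority order and return the first non-nil result.
# A candidate that raises BlockedBackend is skipped, not fatal.
def first_usable(priority)
  priority.each do |name|
    mod = yield(name)
    return mod if mod
  rescue BlockedBackend
    next
  end
  nil # nothing usable
end

picked = first_usable(%i[mri rust ffi]) do |name|
  case name
  when :mri then raise BlockedBackend, "mri is blocked"
  when :rust then nil # not available
  else "module for #{name}"
  end
end
# picked is "module for ffi"
```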

.with_backend(name) { ... } ⇒ Object

Execute a block with a specific backend in thread-local context

This method provides temporary, thread-safe backend switching for a block of code. The backend setting is automatically restored when the block exits, even if an exception is raised. Nesting is supported: inner blocks override outer blocks, and each level is unwound when its block exits.

Thread Safety: Each thread maintains its own backend context, so concurrent threads can safely use different backends without interfering with each other.

Use Cases:

  • Testing: Test the same code path with different backends

  • Performance comparison: Benchmark parsing with different backends

  • Fallback scenarios: Try one backend, fall back to another on failure

  • Thread isolation: Different threads can use different backends safely

Examples:

Basic usage

TreeHaver.with_backend(:mri) do
  parser = TreeHaver::Parser.new
  tree = parser.parse(source)
end
# Backend is automatically restored here

Nested blocks (inner overrides outer)

TreeHaver.with_backend(:rust) do
  parser1 = TreeHaver::Parser.new  # Uses :rust
  TreeHaver.with_backend(:citrus) do
    parser2 = TreeHaver::Parser.new  # Uses :citrus
  end
  parser3 = TreeHaver::Parser.new  # Back to :rust
end

Testing multiple backends

[:mri, :rust, :citrus].each do |backend_name|
  TreeHaver.with_backend(backend_name) do
    parser = TreeHaver::Parser.new
    result = parser.parse(source)
    puts "#{backend_name}: #{result.root_node.type}"
  end
end

Exception safety (backend restored even on error)

TreeHaver.with_backend(:mri) do
  raise "Something went wrong"
rescue
  # Handle error
end
# Backend is still restored to its previous value

Thread isolation

threads = [:mri, :rust].map do |backend_name|
  Thread.new do
    TreeHaver.with_backend(backend_name) do
      # Each thread uses its own backend independently
      TreeHaver::Parser.new
    end
  end
end
threads.each(&:join)

Parameters:

  • name (Symbol, String)

    backend name (:mri, :rust, :ffi, :java, :citrus, :prism, :psych, :commonmarker, :markly, :auto)

Yields:

  • block to execute with the specified backend

Returns:

  • (Object)

    the return value of the block

Raises:

  • (ArgumentError)

    if backend name is nil

  • (BackendConflict)

    if the requested backend conflicts with a previously used backend

See Also:

  • #effective_backend
  • #current_backend_context


# File 'lib/tree_haver.rb', line 679

def with_backend(name)
  raise ArgumentError, "Backend name required" if name.nil?

  # Get context FIRST to ensure it exists
  ctx = current_backend_context
  old_backend = ctx[:backend]
  old_depth = ctx[:depth]

  begin
    # Set new backend and increment depth
    ctx[:backend] = name.to_sym
    ctx[:depth] += 1

    # Execute block
    yield
  ensure
    # Restore previous backend and depth
    # This ensures proper unwinding even with exceptions
    ctx[:backend] = old_backend
    ctx[:depth] = old_depth
  end
end
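
The save-and-restore behaviour can be observed with a stand-alone model that mirrors the listing above. This is a sketch under assumptions: `BackendScope` and its `context` helper are hypothetical, and storing the context in `Thread.current` is a guess at how a per-thread context might be kept, not a description of `current_backend_context`.

```ruby
# Hypothetical stand-alone model of block-scoped, per-thread backend switching.
module BackendScope
  KEY = :backend_scope_demo

  # Lazily create one context hash per thread.
  def self.context
    Thread.current[KEY] ||= {backend: nil, depth: 0}
  end

  def self.with_backend(name)
    raise ArgumentError, "Backend name required" if name.nil?

    ctx = context
    old_backend = ctx[:backend]
    old_depth = ctx[:depth]
    begin
      ctx[:backend] = name.to_sym
      ctx[:depth] += 1
      yield
    ensure
      # Runs on normal exit and on exceptions, so each level unwinds.
      ctx[:backend] = old_backend
      ctx[:depth] = old_depth
    end
  end
end

seen = []
BackendScope.with_backend(:rust) do
  seen << BackendScope.context[:backend]
  BackendScope.with_backend(:citrus) { seen << BackendScope.context[:backend] }
  seen << BackendScope.context[:backend]
end
seen << BackendScope.context[:backend]
# seen is now [:rust, :citrus, :rust, nil]
```

Capturing the old values before the `begin` is what makes nesting work: each level restores exactly what it saw on entry, regardless of what inner blocks did.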