Module: TreeHaver

Defined in:
lib/tree_haver.rb,
lib/tree_haver/base.rb,
lib/tree_haver/node.rb,
lib/tree_haver/tree.rb,
lib/tree_haver/point.rb,
lib/tree_haver/parser.rb,
lib/tree_haver/version.rb,
lib/tree_haver/language.rb,
lib/tree_haver/base/node.rb,
lib/tree_haver/base/tree.rb,
lib/tree_haver/base/point.rb,
lib/tree_haver/backend_api.rb,
lib/tree_haver/base/parser.rb,
lib/tree_haver/backends/ffi.rb,
lib/tree_haver/backends/mri.rb,
lib/tree_haver/backends/java.rb,
lib/tree_haver/backends/rust.rb,
lib/tree_haver/base/language.rb,
lib/tree_haver/backends/prism.rb,
lib/tree_haver/backends/psych.rb,
lib/tree_haver/grammar_finder.rb,
lib/tree_haver/path_validator.rb,
lib/tree_haver/backends/citrus.rb,
lib/tree_haver/backend_registry.rb,
lib/tree_haver/backends/parslet.rb,
lib/tree_haver/language_registry.rb,
lib/tree_haver/library_path_utils.rb,
lib/tree_haver/rspec/testable_node.rb,
lib/tree_haver/citrus_grammar_finder.rb,
lib/tree_haver/rspec/dependency_tags.rb,
lib/tree_haver/parslet_grammar_finder.rb

Overview

TreeHaver is a cross-Ruby adapter for code parsing with 10 backends.

Provides a unified API for parsing source code across MRI Ruby, JRuby, and TruffleRuby using tree-sitter grammars or language-specific native parsers.

Backends

Supports 10 backends:

  • Tree-sitter: MRI (C), Rust, FFI, Java

  • Native parsers: Prism (Ruby), Psych (YAML), Commonmarker (Markdown), Markly (GFM)

  • Pure Ruby: Citrus (portable fallback), Parslet

Platform Compatibility

Not all backends work on all Ruby platforms:

| Backend      | MRI | JRuby | TruffleRuby |
|--------------|-----|-------|-------------|
| MRI (C ext)  | ✓   | ✗     | ✗           |
| Rust         | ✓   | ✗     | ✗           |
| FFI          | ✓   | ✓     | ✗           |
| Java         | ✗   | ✓     | ✗           |
| Prism        | ✓   | ✓     | ✓           |
| Psych        | ✓   | ✓     | ✓           |
| Citrus       | ✓   | ✓     | ✓           |
| Commonmarker | ✓   | ✗     | ?           |
| Markly       | ✓   | ✗     | ?           |

  • JRuby: Cannot load native C/Rust extensions; use FFI, Java, or pure Ruby backends

  • TruffleRuby: FFI doesn’t support STRUCT_BY_VALUE; magnus/rb-sys incompatible with C API; use Prism, Psych, Citrus, or potentially Commonmarker/Markly

Examples:

Basic usage with tree-sitter

# Load a language grammar
language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)

# Create and configure a parser
parser = TreeHaver::Parser.new
parser.language = language

# Parse source code
tree = parser.parse("[package]\nname = \"my-app\"")
root = tree.root_node

# Use unified Position API (works across all backends)
puts root.start_line      # => 1 (1-based)
puts root.source_position # => {start_line:, end_line:, start_column:, end_column:}

Using language-specific backends

# Parse Ruby with Prism
TreeHaver.backend = :prism
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Prism::Language.ruby
tree = parser.parse("class Example; end")

# Parse YAML with Psych
TreeHaver.backend = :psych
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Psych::Language.yaml
tree = parser.parse("key: value")

# Parse Markdown with Commonmarker
TreeHaver.backend = :commonmarker
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Commonmarker::Language.markdown
tree = parser.parse("# Heading\nParagraph")

Using language registration

TreeHaver.register_language(:toml, path: "/usr/local/lib/libtree-sitter-toml.so")
language = TreeHaver::Language.toml

Using GrammarFinder for automatic discovery

# GrammarFinder automatically locates grammar libraries on the system
finder = TreeHaver::GrammarFinder.new(:toml)
finder.register! if finder.available?
language = TreeHaver::Language.toml

Selecting a backend

TreeHaver.backend = :mri          # Force MRI (ruby_tree_sitter)
TreeHaver.backend = :rust         # Force Rust (tree_stump)
TreeHaver.backend = :ffi          # Force FFI
TreeHaver.backend = :java         # Force Java (JRuby)
TreeHaver.backend = :prism        # Force Prism (Ruby)
TreeHaver.backend = :psych        # Force Psych (YAML)
TreeHaver.backend = :commonmarker # Force Commonmarker (Markdown)
TreeHaver.backend = :markly       # Force Markly (GFM)
TreeHaver.backend = :citrus       # Force Citrus (pure Ruby)
TreeHaver.backend = :auto         # Auto-select (default)

Defined Under Namespace

Modules: BackendAPI, BackendRegistry, Backends, Base, Language, LanguageRegistry, LibraryPathUtils, PathValidator, RSpec, Version

Classes: BackendConflict, CitrusGrammarFinder, Error, GrammarFinder, Node, NotAvailable, Parser, ParsletGrammarFinder, Tree

Constant Summary

CITRUS_DEFAULTS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Default Citrus configurations for known languages

These are used by parser_for when no explicit citrus_config is provided and tree-sitter backends are not available (e.g., on TruffleRuby).

{
  toml: {
    gem_name: "toml-rb",
    grammar_const: "TomlRB::Document",
    require_path: "toml-rb",
  },
}.freeze
PARSLET_DEFAULTS =

This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.

Default Parslet configurations for known languages

These are used by parser_for when no explicit parslet_config is provided and tree-sitter backends are not available (e.g., on TruffleRuby).

{
  toml: {
    gem_name: "toml",
    grammar_const: "TOML::Parslet",
    require_path: "toml",
  },
}.freeze
NATIVE_BACKENDS =

Native tree-sitter backends that support loading shared libraries (.so files). These backends wrap the tree-sitter C library via various bindings. Pure Ruby backends (Citrus, Prism, Psych, Commonmarker, Markly) are excluded.

%i[mri rust ffi java].freeze
VALID_NATIVE_BACKENDS =

Valid native backend names (require native extensions)

%w[mri rust ffi java].freeze
VALID_RUBY_BACKENDS =

Valid pure Ruby backend names (no native extensions)

%w[citrus parslet prism psych commonmarker markly].freeze
VALID_BACKENDS =

All valid backend names

(VALID_NATIVE_BACKENDS + VALID_RUBY_BACKENDS + %w[auto none]).freeze
Point =

Point class that works as both a Hash and an object with row/column accessors

This provides compatibility with code expecting either:

  • Hash access: point[:row], point[:column]

  • Method access: point.row, point.column

TreeHaver::Point is an alias for TreeHaver::Base::Point, which is a Struct providing all the necessary functionality.

Examples:

Method access

point = TreeHaver::Point.new(5, 10)
point.row    # => 5
point.column # => 10

Hash-like access

point[:row]    # => 5
point[:column] # => 10

Converting to hash

point.to_h # => {row: 5, column: 10}

See Also:

Base::Point
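The dual access described for Point falls out of Ruby's Struct, which answers both accessor methods and `[]` lookups. A minimal sketch, using a hypothetical `SketchPoint` rather than the library's own class:

```ruby
# SketchPoint is a stand-in name for illustration; it is not part of TreeHaver.
SketchPoint = Struct.new(:row, :column)

point = SketchPoint.new(5, 10)

p point.row      # method access
p point[:column] # Hash-like access with a Symbol key
p point.to_h     # plain Hash with :row and :column keys
```

Code written against either style keeps working because both go through the same Struct members.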
VERSION =

Traditional location for VERSION constant

Version::VERSION

Class Method Details

.allowed_native_backends ⇒ Array<Symbol>

Get allowed native backends from TREE_HAVER_NATIVE_BACKEND environment variable

Supports comma-separated values like “mri,ffi”. Special values:

  • “auto” or empty/unset: automatically select from available native backends

  • “none”: no native backends allowed (pure Ruby only)

Examples:

Allow only MRI and FFI

# TREE_HAVER_NATIVE_BACKEND=mri,ffi
TreeHaver.allowed_native_backends  # => [:mri, :ffi]

Auto-select native backends (default)

# TREE_HAVER_NATIVE_BACKEND not set, empty, or "auto"
TreeHaver.allowed_native_backends  # => [:auto]

Disable all native backends

# TREE_HAVER_NATIVE_BACKEND=none
TreeHaver.allowed_native_backends  # => [:none]


# File 'lib/tree_haver.rb', line 423

def allowed_native_backends
  @allowed_native_backends ||= parse_backend_list_env("TREE_HAVER_NATIVE_BACKEND", VALID_NATIVE_BACKENDS) # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.allowed_ruby_backends ⇒ Array<Symbol>

Get allowed Ruby backends from TREE_HAVER_RUBY_BACKEND environment variable

Supports comma-separated values like “citrus,prism”. Special values:

  • “auto” or empty/unset: automatically select from available Ruby backends

  • “none”: no Ruby backends allowed (native only)

Examples:

Allow only Citrus

# TREE_HAVER_RUBY_BACKEND=citrus
TreeHaver.allowed_ruby_backends  # => [:citrus]

Auto-select Ruby backends (default)

# TREE_HAVER_RUBY_BACKEND not set, empty, or "auto"
TreeHaver.allowed_ruby_backends  # => [:auto]


# File 'lib/tree_haver.rb', line 441

def allowed_ruby_backends
  @allowed_ruby_backends ||= parse_backend_list_env("TREE_HAVER_RUBY_BACKEND", VALID_RUBY_BACKENDS) # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.backend ⇒ Object

Examples:

TreeHaver.backend  # => :auto


# File 'lib/tree_haver.rb', line 391

def backend
  return @backend if defined?(@backend) && @backend # rubocop:disable ThreadSafety/ClassInstanceVariable

  @backend = parse_single_backend_env # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.backend=(name) ⇒ Symbol?

Set the backend to use

Examples:

Force FFI backend

TreeHaver.backend = :ffi

Force Rust backend

TreeHaver.backend = :rust


# File 'lib/tree_haver.rb', line 488

def backend=(name)
  @backend = name&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.backend_allowed?(backend_name) ⇒ Boolean

Check if a specific backend is allowed based on environment variables

Checks TREE_HAVER_NATIVE_BACKEND for native backends and TREE_HAVER_RUBY_BACKEND for pure Ruby backends.

Examples:

# TREE_HAVER_NATIVE_BACKEND=mri
TreeHaver.backend_allowed?(:mri)    # => true
TreeHaver.backend_allowed?(:ffi)    # => false
TreeHaver.backend_allowed?(:citrus) # => true (Ruby backends use separate env var)


# File 'lib/tree_haver.rb', line 457

def backend_allowed?(backend_name)
  backend_sym = backend_name.to_sym

  # Check if it's a native backend
  if VALID_NATIVE_BACKENDS.include?(backend_sym.to_s)
    allowed = allowed_native_backends
    return true if allowed == [:auto]
    return false if allowed == [:none]
    return allowed.include?(backend_sym)
  end

  # Check if it's a Ruby backend
  if VALID_RUBY_BACKENDS.include?(backend_sym.to_s)
    allowed = allowed_ruby_backends
    return true if allowed == [:auto]
    return false if allowed == [:none]
    return allowed.include?(backend_sym)
  end

  # Unknown backend or :auto - allow
  true
end
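The allow-list check can be tried in isolation. The following is a minimal sketch, not the library's code: `sketch_allowed?` and its two keyword arguments are hypothetical stand-ins for the values that `allowed_native_backends` and `allowed_ruby_backends` would return.

```ruby
# Hypothetical names throughout; the lists mirror the constants documented above.
SKETCH_NATIVE = %i[mri rust ffi java].freeze
SKETCH_PURE_RUBY = %i[citrus parslet prism psych commonmarker markly].freeze

def sketch_allowed?(backend, native_allowed:, ruby_allowed:)
  # Pick the allow-list that governs this kind of backend (nil if unknown).
  list =
    if SKETCH_NATIVE.include?(backend)
      native_allowed
    elsif SKETCH_PURE_RUBY.include?(backend)
      ruby_allowed
    end

  return true if list.nil? || list == [:auto] # unknown backends are allowed
  return false if list == [:none]

  list.include?(backend)
end

# Equivalent to TREE_HAVER_NATIVE_BACKEND=mri with the Ruby variable unset.
p sketch_allowed?(:mri, native_allowed: [:mri], ruby_allowed: [:auto])
p sketch_allowed?(:ffi, native_allowed: [:mri], ruby_allowed: [:auto])
p sketch_allowed?(:citrus, native_allowed: [:mri], ruby_allowed: [:auto])
```

The two lists are independent, so restricting native backends never affects a pure Ruby backend.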

.backend_module ⇒ Module?

Determine the concrete backend module to use

This method performs backend auto-selection when backend is :auto. On JRuby, prefers Java backend if available, then FFI, then Citrus. On MRI, prefers MRI backend if available, then Rust, then FFI, then Citrus. Citrus is the final fallback as it’s pure Ruby and works everywhere.

Examples:

mod = TreeHaver.backend_module
if mod
  puts "Using #{mod.capabilities[:backend]} backend"
end


# File 'lib/tree_haver.rb', line 886

def backend_module
  requested = effective_backend  # Changed from: backend

  # For explicit backends (not :auto), check for conflicts first
  # If the backend is blocked, fall through to auto-select
  if requested != :auto && backend_protect?
    conflicts = conflicting_backends_for(requested)
    unless conflicts.empty?
      # The explicitly requested backend is blocked - fall through to auto-select
      requested = :auto
    end
  end

  case requested
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :parslet
    Backends::Parslet
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  else
    # auto-select: prefer native/fast backends, fall back to pure Ruby (Citrus)
    # Each backend must be both allowed (by ENV) and available (gem installed)
    if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby" && backend_allowed?(:java) && Backends::Java.available?
      Backends::Java
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && backend_allowed?(:mri) && Backends::MRI.available?
      Backends::MRI
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && backend_allowed?(:rust) && Backends::Rust.available?
      Backends::Rust
    elsif backend_allowed?(:ffi) && Backends::FFI.available?
      Backends::FFI
    elsif backend_allowed?(:citrus) && Backends::Citrus.available?
      Backends::Citrus  # Pure Ruby fallback
    else
      # No backend available
      nil
    end
  end
end
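The auto-selection branch amounts to walking a fixed preference list and taking the first usable entry. A minimal sketch of that idea, assuming a hypothetical `usable` list that stands in for "allowed by ENV and available":

```ruby
# Hypothetical table: [backend, Ruby engines it is tried on (nil means any engine)].
SKETCH_AUTO_ORDER = [
  [:java, ["jruby"]],
  [:mri, ["ruby"]],
  [:rust, ["ruby"]],
  [:ffi, nil],
  [:citrus, nil],
].freeze

def sketch_auto_select(engine:, usable:)
  match = SKETCH_AUTO_ORDER.find do |name, engines|
    (engines.nil? || engines.include?(engine)) && usable.include?(name)
  end
  match&.first # nil when nothing is usable
end

p sketch_auto_select(engine: "jruby", usable: %i[ffi citrus java])
p sketch_auto_select(engine: "ruby", usable: %i[rust ffi])
p sketch_auto_select(engine: "truffleruby", usable: %i[mri citrus])
```

Engine-specific backends are skipped on other engines even when they appear usable, which is why the last call falls through to Citrus.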

.backend_protect ⇒ Object

Alias for backend_protect?



# File 'lib/tree_haver.rb', line 343

def backend_protect
  backend_protect?
end

.backend_protect=(value) ⇒ Boolean

Whether backend conflict protection is enabled

When true (default), TreeHaver will raise BackendConflict if you try to use a backend that is known to conflict with a previously used backend. For example, FFI will not work after MRI has been used.

Set to false to disable protection (useful for testing compatibility).

Examples:

Disable protection for testing

TreeHaver.backend_protect = false


# File 'lib/tree_haver.rb', line 329

def backend_protect=(value)
  @backend_protect_mutex ||= Mutex.new
  @backend_protect_mutex.synchronize { @backend_protect = value }
end

.backend_protect? ⇒ Boolean

Check if backend conflict protection is enabled



# File 'lib/tree_haver.rb', line 337

def backend_protect?
  return @backend_protect if defined?(@backend_protect) # rubocop:disable ThreadSafety/ClassInstanceVariable
  true  # Default is protected
end

.backends_used ⇒ Set<Symbol>

Track which backends have been used in this process



# File 'lib/tree_haver.rb', line 350

def backends_used
  @backends_used ||= Set.new # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.builtin_backends_registered? ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Check if built-in backends have been registered



# File 'lib/tree_haver.rb', line 543

def builtin_backends_registered?
  @builtin_backends_registered ||= false # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.capabilities ⇒ Hash{Symbol => Object}

Get capabilities of the current backend

Returns a hash describing what features the selected backend supports. Common keys include:

  • :backend - Symbol identifying the backend (:mri, :rust, :ffi, :java)

  • :parse - Whether parsing is implemented

  • :query - Whether the Query API is available

  • :bytes_field - Whether byte position fields are available

  • :incremental - Whether incremental parsing is supported

Examples:

TreeHaver.capabilities
# => { backend: :mri, query: true, bytes_field: true }


# File 'lib/tree_haver.rb', line 950

def capabilities
  mod = backend_module
  return {} unless mod
  mod.capabilities
end

.check_backend_conflict!(backend) ⇒ void

This method returns an undefined value.

Check if using a backend would cause a conflict

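Raises:

  • (BackendConflict)

    if protection is enabled and the backend is blocked by a previously used backend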



# File 'lib/tree_haver.rb', line 377

def check_backend_conflict!(backend)
  return unless backend_protect?

  conflicts = conflicting_backends_for(backend)
  return if conflicts.empty?

  raise BackendConflict,
    "Cannot use #{backend} backend: it is blocked by previously used backend(s): #{conflicts.join(", ")}. " \
      "The #{backend} backend will segfault when #{conflicts.first} has already loaded. " \
      "To disable this protection (at risk of segfaults), set TreeHaver.backend_protect = false"
end

.conflicting_backends_for(backend) ⇒ Array<Symbol>

Check if a backend would conflict with previously used backends



# File 'lib/tree_haver.rb', line 367

def conflicting_backends_for(backend)
  blockers = Backends::BLOCKED_BY[backend] || []
  blockers & backends_used.to_a
end
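The conflict lookup is an array intersection between a backend's blockers and the set of backends already used. A minimal sketch follows; the contents of `SKETCH_BLOCKED_BY` are an assumption based only on the statement that FFI does not work after MRI, and the real `Backends::BLOCKED_BY` table may differ.

```ruby
require "set"

# Assumed blocker table for illustration only.
SKETCH_BLOCKED_BY = { ffi: [:mri] }.freeze

def sketch_conflicts(backend, used)
  # Blockers for this backend that have actually been used in the process.
  (SKETCH_BLOCKED_BY[backend] || []) & used.to_a
end

used = Set[:mri, :citrus]
p sketch_conflicts(:ffi, used) # => [:mri]
p sketch_conflicts(:mri, used) # => []
```

An empty result means the backend is safe to use; a non-empty one is what triggers BackendConflict when protection is enabled.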

.current_backend_context ⇒ Hash{Symbol => Object}

Thread-local backend context storage

Returns a hash containing the thread-local backend context with keys:

  • :backend - The backend name (Symbol) or nil if using global default

  • :depth - The nesting depth (Integer) for proper cleanup

Examples:

ctx = TreeHaver.current_backend_context
ctx[:backend]  # => nil or :ffi, :mri, etc.
ctx[:depth]    # => 0, 1, 2, etc.


# File 'lib/tree_haver.rb', line 612

def current_backend_context
  Thread.current[:tree_haver_backend_context] ||= {
    backend: nil,  # nil means "use global default"
    depth: 0,       # Track nesting depth for proper cleanup
  }
end

.effective_backend ⇒ Symbol

Get the effective backend for current context

Priority: thread-local context → global @backend → :auto

Examples:

TreeHaver.effective_backend  # => :auto (default)

With thread-local context

TreeHaver.with_backend(:ffi) do
  TreeHaver.effective_backend  # => :ffi
end


# File 'lib/tree_haver.rb', line 630

def effective_backend
  ctx = current_backend_context
  ctx[:backend] || backend || :auto
end
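The priority chain (thread-local context, then global setting, then :auto) can be sketched without the library. Everything below is hypothetical: `BackendContextSketch` and its `with_backend` only illustrate how a nested, thread-local override with a depth counter might be restored, and may not match how TreeHaver.with_backend is implemented.

```ruby
module BackendContextSketch
  KEY = :backend_context_sketch

  # Per-thread storage (Thread#[] is in fact fiber-local in Ruby).
  def self.context
    Thread.current[KEY] ||= { backend: nil, depth: 0 }
  end

  # Override the backend for the duration of the block, restoring the
  # previous value afterwards even if the block raises.
  def self.with_backend(name)
    backend = name.to_sym
    ctx = context
    previous = ctx[:backend]
    ctx[:backend] = backend
    ctx[:depth] += 1
    begin
      yield
    ensure
      ctx[:depth] -= 1
      ctx[:backend] = previous
    end
  end

  # Priority: thread-local context, then the global setting, then :auto.
  def self.effective_backend(global = nil)
    context[:backend] || global || :auto
  end
end

BackendContextSketch.with_backend(:ffi) do
  p BackendContextSketch.effective_backend        # => :ffi
  BackendContextSketch.with_backend(:citrus) do
    p BackendContextSketch.effective_backend      # => :citrus
  end
  p BackendContextSketch.effective_backend        # => :ffi
end
p BackendContextSketch.effective_backend(:mri)    # => :mri
```

Because the override lives in per-thread storage, a block in one thread does not change what other threads resolve.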

.ensure_builtin_backends_registered! ⇒ void

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

This method returns an undefined value.

Ensure built-in backends are registered (idempotent)



# File 'lib/tree_haver.rb', line 551

def ensure_builtin_backends_registered!
  return if builtin_backends_registered?
  register_builtin_backends!
  @builtin_backends_registered = true # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.parse_backend_list_env(env_var, valid_backends) ⇒ Array<Symbol>

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Parse a backend list environment variable



# File 'lib/tree_haver.rb', line 577

def parse_backend_list_env(env_var, valid_backends)
  env_value = ENV[env_var]

  # Empty or unset means "auto"
  return [:auto] if env_value.nil? || env_value.strip.empty?

  normalized = env_value.strip.downcase

  # Handle special values
  return [:auto] if normalized == "auto"
  return [:none] if normalized == "none"

  # Split on comma and parse each backend
  backends = normalized.split(",").map(&:strip).uniq

  # Convert to symbols, filtering out invalid ones
  parsed = backends.filter_map do |name|
    valid_backends.include?(name) ? name.to_sym : nil
  end

  # Return :auto if no valid backends found
  parsed.empty? ? [:auto] : parsed
end
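The parsing rules are easiest to see with a few inputs. This is a minimal standalone sketch: `sketch_parse_backend_list` is a hypothetical helper that takes the raw string directly instead of reading ENV, but otherwise follows the same steps as the method above.

```ruby
SKETCH_VALID_NATIVE = %w[mri rust ffi java].freeze

def sketch_parse_backend_list(raw, valid)
  # Unset or blank means "auto".
  return [:auto] if raw.nil? || raw.strip.empty?

  normalized = raw.strip.downcase
  return [:auto] if normalized == "auto"
  return [:none] if normalized == "none"

  # Split on commas, drop duplicates and unknown names.
  parsed = normalized.split(",").map(&:strip).uniq
    .select { |name| valid.include?(name) }
    .map(&:to_sym)

  # A list with no valid names falls back to "auto".
  parsed.empty? ? [:auto] : parsed
end

p sketch_parse_backend_list("mri, FFI", SKETCH_VALID_NATIVE) # => [:mri, :ffi]
p sketch_parse_backend_list("bogus", SKETCH_VALID_NATIVE)    # => [:auto]
p sketch_parse_backend_list("none", SKETCH_VALID_NATIVE)     # => [:none]
p sketch_parse_backend_list(nil, SKETCH_VALID_NATIVE)        # => [:auto]
```

Note that "none" is only special as the whole value; inside a comma-separated list it is treated as an invalid name and dropped.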

.parse_single_backend_env ⇒ Symbol

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Parse TREE_HAVER_BACKEND environment variable (single backend)



# File 'lib/tree_haver.rb', line 561

def parse_single_backend_env
  env_value = ENV["TREE_HAVER_BACKEND"]
  return :auto if env_value.nil? || env_value.strip.empty?

  name = env_value.strip.downcase
  return :auto unless VALID_BACKENDS.include?(name) && name != "all" && name != "none"

  name.to_sym
end

.parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil, parslet_config: nil) ⇒ TreeHaver::Parser

Create a parser configured for a specific language

Respects the effective backend setting (via TREE_HAVER_BACKEND env var, TreeHaver.backend=, or with_backend block).

Supports four types of backends:

  1. Tree-sitter native backends (auto-discovered or explicit path)

  2. Citrus grammars (pure Ruby, via CITRUS_DEFAULTS or explicit config)

  3. Parslet grammars (pure Ruby, via PARSLET_DEFAULTS or explicit config)

  4. Pure Ruby backends (registered via backend_module, e.g., Prism, Psych, RBS)

Examples:

Basic usage (auto-discovers grammar)

parser = TreeHaver.parser_for(:toml)

Force Citrus backend

TreeHaver.with_backend(:citrus) { TreeHaver.parser_for(:toml) }

Force Parslet backend

TreeHaver.with_backend(:parslet) { TreeHaver.parser_for(:toml) }

Use registered pure Ruby backend (e.g., RBS)

# First, rbs-merge registers its backend:
# TreeHaver.register_language(:rbs, backend_module: Rbs::Merge::RbsBackend, backend_type: :rbs)
parser = TreeHaver.parser_for(:rbs)

Raises:

  • (NotAvailable)

    if no parser could be loaded for the language



# File 'lib/tree_haver.rb', line 1156

def parser_for(language_name, library_path: nil, symbol: nil, citrus_config: nil, parslet_config: nil)
  # Ensure built-in pure Ruby backends are registered
  ensure_builtin_backends_registered!

  name = language_name.to_sym
  symbol ||= "tree_sitter_#{name}"
  requested = effective_backend

  # Determine which backends to try based on effective_backend
  # When a specific backend is requested, only try that backend
  try_tree_sitter = (requested == :auto) || NATIVE_BACKENDS.include?(requested)
  try_citrus = (requested == :auto) || (requested == :citrus)
  try_parslet = (requested == :auto) || (requested == :parslet)

  # When Citrus or Parslet is explicitly requested, don't try tree-sitter
  if requested == :citrus || requested == :parslet
    try_tree_sitter = false
  end

  language = nil

  # First, check for registered pure Ruby backends
  # These take precedence when explicitly requested or when no other backend is available
  registration = registered_language(name)
  # Find any registered backend_module (not tree_sitter, citrus, or parslet)
  registration&.each do |backend_type, config|
    next if %i[tree_sitter citrus parslet].include?(backend_type)
    next unless config[:backend_module]

    backend_mod = config[:backend_module]
    # Check if this backend is available
    next unless backend_mod.respond_to?(:available?) && backend_mod.available?

    # If a specific backend was requested, only use if it matches
    next if requested != :auto && requested != backend_type

    # Create parser from the backend module
    if backend_mod.const_defined?(:Parser)
      parser = backend_mod::Parser.new
      if backend_mod.const_defined?(:Language)
        lang_class = backend_mod::Language
        # Try to get language by name (e.g., Language.ruby, Language.yaml, Language.rbs)
        if lang_class.respond_to?(name)
          parser.language = lang_class.public_send(name)
        elsif lang_class.respond_to?(:from_library)
          parser.language = lang_class.from_library(nil, name: name)
        end
      end
      return parser
    end
  end

  # Try tree-sitter if applicable
  if try_tree_sitter && !language
    language = load_tree_sitter_language(name, library_path: library_path, symbol: symbol)
  end

  # Try Citrus if applicable
  if try_citrus && !language
    language = load_citrus_language(name, citrus_config: citrus_config)
  end

  # Try Parslet if applicable
  if try_parslet && !language
    language = load_parslet_language(name, parslet_config: parslet_config)
  end

  # Raise if nothing worked
  raise NotAvailable, "No parser available for #{name}. " \
    "Install tree-sitter-#{name} or configure a Citrus/Parslet grammar." unless language

  # Create and configure parser
  parser = Parser.new
  parser.language = language
  parser
end
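How the effective backend narrows the grammar loaders can be condensed into a small function. A minimal sketch, with a hypothetical `sketch_loaders_to_try` that returns the loader kinds in the order the method above attempts them:

```ruby
def sketch_loaders_to_try(requested)
  native = %i[mri rust ffi java]
  order = []
  order << :tree_sitter if requested == :auto || native.include?(requested)
  order << :citrus if requested == :auto || requested == :citrus
  order << :parslet if requested == :auto || requested == :parslet
  order
end

p sketch_loaders_to_try(:auto)   # => [:tree_sitter, :citrus, :parslet]
p sketch_loaders_to_try(:ffi)    # => [:tree_sitter]
p sketch_loaders_to_try(:citrus) # => [:citrus]
p sketch_loaders_to_try(:prism)  # => []
```

An empty list, as for :prism, means only a registered backend_module can satisfy the request; if none matches, the method raises NotAvailable.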

.record_backend_usage(backend) ⇒ void

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

This method returns an undefined value.

Record that a backend has been used



# File 'lib/tree_haver.rb', line 359

def record_backend_usage(backend)
  backends_used << backend
end

.register_backend(name, mod) ⇒ void

This method returns an undefined value.

Register a backend module

Allows external gems to register their backend implementation so it can be found by TreeHaver.backend = :name and other lookup methods.



# File 'lib/tree_haver.rb', line 1101

def register_backend(name, mod)
  @backend_registry ||= {}
  @backend_registry[name.to_sym] = mod
end

.register_builtin_backends! ⇒ void

This method returns an undefined value.

Register built-in pure Ruby backends in the LanguageRegistry

This registers Prism, Psych, Commonmarker, and Markly using the same registration API that external backends use. This ensures consistent behavior whether a backend is built-in or provided by an external gem.

Called automatically when TreeHaver is first used, but can be called manually in tests or when reset! has cleared the registry.

Examples:

Manual registration (usually not needed)

TreeHaver.register_builtin_backends!


# File 'lib/tree_haver.rb', line 520

def register_builtin_backends!
  Backends::PURE_RUBY_BACKENDS.each do |backend_type, info|
    language = info[:language]
    module_name = info[:module_name]

    # Get the backend module
    backend_mod = Backends.const_get(module_name)
    next unless backend_mod

    # Register if available (lazy check - doesn't require the gem yet)
    LanguageRegistry.register(
      language,
      backend_type,
      backend_module: backend_mod,
      gem_name: module_name.downcase,
    )
  end
end

.register_language(name, path: nil, symbol: nil, grammar_module: nil, grammar_class: nil, backend_module: nil, backend_type: nil, gem_name: nil) ⇒ void

This method returns an undefined value.

Register a language helper by name (backend-agnostic)

After registration, you can use dynamic helpers like TreeHaver::Language.toml to load the registered language. TreeHaver will automatically use the appropriate grammar based on the active backend.

The name parameter is an arbitrary identifier you choose - it doesn’t need to match the actual language name. This is useful for:

  • Testing: Use unique names like :toml_test to avoid collisions

  • Aliasing: Register the same grammar under multiple names

  • Versioning: Register different grammar versions as :ruby_2 and :ruby_3

The actual grammar identity comes from path/symbol (tree-sitter) or grammar_module (Citrus), not from the name.

IMPORTANT: This method INTENTIONALLY allows registering BOTH a tree-sitter library AND a Citrus grammar for the same language IN A SINGLE CALL. This is achieved by using separate if statements (not elsif) and no early returns. This design is deliberate and provides significant benefits:

Why register both backends for one language?

  • Backend flexibility: Code works regardless of which backend is active

  • Performance testing: Compare tree-sitter vs Citrus performance

  • Gradual migration: Transition between backends without breaking code

  • Fallback scenarios: Use Citrus when tree-sitter library unavailable

  • Platform portability: tree-sitter on Linux/Mac, Citrus on JRuby/Windows

The active backend determines which registration is used automatically. No code changes needed to switch backends - just change TreeHaver.backend.

Examples:

Register tree-sitter grammar only

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)

Register Citrus grammar only

TreeHaver.register_language(
  :toml,
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)

Register Parslet grammar only

TreeHaver.register_language(
  :toml,
  grammar_class: TOML::Parslet,
  gem_name: "toml"
)

Register pure Ruby backend (external gem like rbs-merge)

TreeHaver.register_language(
  :rbs,
  backend_module: Rbs::Merge::Backends::RbsBackend,
  backend_type: :rbs,
  gem_name: "rbs"
)

Register BOTH backends in separate calls

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)
TreeHaver.register_language(
  :toml,
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)

Register BOTH backends in ONE call (recommended for maximum flexibility)

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)
# Now TreeHaver::Language.toml works with ANY backend!


# File 'lib/tree_haver.rb', line 1045

def register_language(name, path: nil, symbol: nil, grammar_module: nil, grammar_class: nil, backend_module: nil, backend_type: nil, gem_name: nil)
  # Register tree-sitter backend if path provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  if path
    LanguageRegistry.register(name, :tree_sitter, path: path, symbol: symbol)
  end

  # Register Citrus backend if grammar_module provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  # This allows maximum flexibility - register once, use with any backend
  if grammar_module
    unless grammar_module.respond_to?(:parse)
      raise ArgumentError, "Grammar module must respond to :parse"
    end

    LanguageRegistry.register(name, :citrus, grammar_module: grammar_module, gem_name: gem_name)
  end

  # Register Parslet backend if grammar_class provided
  # Note: Uses `if` not `elsif` so multiple backends can be registered in one call
  if grammar_class
    unless grammar_class.respond_to?(:new)
      raise ArgumentError, "Grammar class must respond to :new"
    end

    LanguageRegistry.register(name, :parslet, grammar_class: grammar_class, gem_name: gem_name)
  end

  # Register pure Ruby backend if backend_module provided
  # This is used by external gems (like rbs-merge) to register their own backends
  if backend_module
    # Derive backend_type from module name if not provided
    type = backend_type || backend_module.name.split("::").last.downcase.to_sym
    LanguageRegistry.register(name, type, backend_module: backend_module, gem_name: gem_name)
  end

  # Require at least one backend to be registered
  if path.nil? && grammar_module.nil? && grammar_class.nil? && backend_module.nil?
    raise ArgumentError, "Must provide at least one of: path (tree-sitter), grammar_module (Citrus), grammar_class (Parslet), or backend_module (pure Ruby)"
  end

  # Note: No early return! This method intentionally processes all `if` blocks
  # above to allow registering multiple backends for the same language.
  # tree-sitter, Citrus, and Parslet can be registered simultaneously for maximum
  # flexibility. See method documentation for rationale.
  nil
end
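The "independent `if` statements, no early return" design can be illustrated with a toy registry. This is a minimal sketch under stated assumptions: `SKETCH_REGISTRY` and `sketch_register` are hypothetical, cover only two of the four registration kinds, and validate arguments up front rather than at the end.

```ruby
# Toy registry: language name => { backend_type => config }.
SKETCH_REGISTRY = Hash.new { |hash, key| hash[key] = {} }

def sketch_register(name, path: nil, grammar_module: nil)
  if path.nil? && grammar_module.nil?
    raise ArgumentError, "provide path and/or grammar_module"
  end

  # Independent checks, so one call can register several backends.
  SKETCH_REGISTRY[name][:tree_sitter] = { path: path } if path
  SKETCH_REGISTRY[name][:citrus] = { grammar_module: grammar_module } if grammar_module
  nil
end

# Module.new is a placeholder for a real Citrus grammar module.
sketch_register(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  grammar_module: Module.new
)
p SKETCH_REGISTRY[:toml].keys # => [:tree_sitter, :citrus]
```

With both entries stored under one name, whichever backend is active can pick its own configuration at lookup time.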

.registered_backend(name) ⇒ Module?

Get a registered backend module



# File 'lib/tree_haver.rb', line 1110

def registered_backend(name)
  @backend_registry ||= {}
  @backend_registry[name.to_sym]
end

.registered_language(name) ⇒ Hash?

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Fetch a registered language entry



# File 'lib/tree_haver.rb', line 1120

def registered_language(name)
  LanguageRegistry.registered(name)
end

.reset_backend!(to: :auto) ⇒ void

This method returns an undefined value.

Reset backend selection memoization

Primarily useful in tests to switch backends without cross-example leakage.

Examples:

Reset to auto-selection

TreeHaver.reset_backend!

Reset to specific backend

TreeHaver.reset_backend!(to: :ffi)


# File 'lib/tree_haver.rb', line 502

def reset_backend!(to: :auto)
  @backend = to&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
  @allowed_native_backends = nil # rubocop:disable ThreadSafety/ClassInstanceVariable
  @allowed_ruby_backends = nil # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.resolve_backend_module(explicit_backend = nil) ⇒ Module?

Get backend module for a specific backend (with explicit override)

Examples:

mod = TreeHaver.resolve_backend_module(:ffi)
mod.capabilities[:backend]  # => :ffi

Raises:

  • (BackendConflict)

    if the backend conflicts with previously used backends



# File 'lib/tree_haver.rb', line 752

def resolve_backend_module(explicit_backend = nil)
  # Resolve the effective backend (an explicit argument takes precedence)
  requested = resolve_effective_backend(explicit_backend)

  mod = case requested
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :parslet
    Backends::Parslet
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :auto
    backend_module  # Fall back to normal resolution for :auto
  else
    # Check if this is a registered plugin backend
    registered = registered_backend(requested)
    return registered if registered

    # Unknown backend name - return nil to trigger error in caller
    nil
  end

  # Return nil if the module doesn't exist
  return unless mod

  # Check if the backend is allowed by environment variables FIRST
  # This enforces TREE_HAVER_NATIVE_BACKEND and TREE_HAVER_RUBY_BACKEND as hard restrictions
  return if requested && requested != :auto && !backend_allowed?(requested)

  # Check for backend conflicts, before checking availability
  # This is critical because the conflict causes the backend to report unavailable
  # We want to raise a clear error explaining WHY it's unavailable
  # Use the requested backend name directly (not capabilities) because
  # capabilities may be empty when the backend is blocked/unavailable
  check_backend_conflict!(requested) if requested && requested != :auto

  # Now check if the backend is available
  # Why assume modules without available? are available?
  # - Some backends might be mocked in tests without an available? method
  # - This makes the code more defensive and test-friendly
  # - It allows graceful degradation if a backend module is incomplete
  # - Backward compatibility: if a module doesn't declare availability, assume it works
  return if mod.respond_to?(:available?) && !mod.available?

  # Record that this backend is being used
  record_backend_usage(requested) if requested && requested != :auto

  mod
end
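
The availability check in the listing above treats a module that does not define available? as usable. The following standalone sketch isolates that rule; AvailabilityDemo and the three stand-in modules are hypothetical and exist only to exercise the condition.

```ruby
# Hypothetical helper isolating the availability rule from the listing above.
module AvailabilityDemo
  # A module is rejected only when it defines available? and returns false.
  def self.usable?(mod)
    !(mod.respond_to?(:available?) && !mod.available?)
  end
end

# Stand-in backend modules (not real TreeHaver backends).
module ReadyBackend
  def self.available?
    true
  end
end

module BrokenBackend
  def self.available?
    false
  end
end

module SilentBackend; end # declares nothing about availability

AvailabilityDemo.usable?(ReadyBackend)   # => true
AvailabilityDemo.usable?(BrokenBackend)  # => false
AvailabilityDemo.usable?(SilentBackend)  # => true
```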

.resolve_effective_backend(explicit_backend = nil) ⇒ Symbol

Resolve the effective backend considering explicit override

Priority: explicit > thread context > global > :auto

Examples:

TreeHaver.resolve_effective_backend(:ffi)  # => :ffi

With thread-local context

TreeHaver.with_backend(:mri) do
  TreeHaver.resolve_effective_backend(nil)  # => :mri
  TreeHaver.resolve_effective_backend(:ffi)  # => :ffi (explicit wins)
end
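
The priority order can be modelled in a few lines. This is a toy sketch, not TreeHaver's implementation: the source of effective_backend is not shown here, so the thread-local and global lookups below are assumptions, and BackendPriorityDemo is a hypothetical name.

```ruby
# Toy model of "explicit > thread context > global > :auto".
module BackendPriorityDemo
  SLOT = :backend_priority_demo

  class << self
    # Stand-in for the process-wide setting.
    attr_accessor :global_backend

    # Stand-in for the per-thread context.
    def thread_backend
      Thread.current.thread_variable_get(SLOT)
    end

    def thread_backend=(value)
      Thread.current.thread_variable_set(SLOT, value)
    end

    def resolve(explicit = nil)
      # An explicit argument always wins.
      return explicit.to_sym if explicit
      # Otherwise fall through the remaining levels in order.
      thread_backend || global_backend || :auto
    end
  end
end
```

With nothing configured, resolve returns :auto; setting global_backend changes the default for every thread, a thread-level value shadows it, and an explicit argument shadows both.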


# File 'lib/tree_haver.rb', line 739

def resolve_effective_backend(explicit_backend = nil)
  return explicit_backend.to_sym if explicit_backend
  effective_backend
end

.resolve_native_backend_module(explicit_backend = nil) ⇒ Module?

Resolve a native tree-sitter backend module (for from_library)

This method is similar to resolve_backend_module but ONLY considers backends that support loading shared libraries (.so files):

  • MRI (ruby_tree_sitter C extension)

  • Rust (tree_stump)

  • FFI (ffi gem with libtree-sitter)

  • Java (jtreesitter on JRuby)

Non-tree-sitter backends (Citrus, Prism, Psych, Commonmarker, Markly) are NOT considered because they don’t support from_library.
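
The auto-selection path can be sketched as a priority walk that skips candidates blocked by a conflict. This is a simplified model under stated assumptions: NativeSelectDemo, its Conflict class, and the resolver callable are hypothetical stand-ins, and only the per-engine priority lists are taken from the source listing.

```ruby
# Toy model of priority-ordered native backend selection with fallback.
module NativeSelectDemo
  # Stand-in for a backend conflict error.
  class Conflict < StandardError; end

  # Priority lists follow the source listing; TruffleRuby has no candidates.
  def self.priority_for(engine)
    case engine
    when "truffleruby" then []
    when "jruby" then %i[java ffi]
    else %i[mri rust ffi]
    end
  end

  # resolver: callable that returns a backend, returns nil, or raises Conflict.
  def self.pick(engine, resolver)
    priority_for(engine).each do |name|
      begin
        found = resolver.call(name)
        return found if found
      rescue Conflict
        next # blocked by an earlier backend; try the next candidate
      end
    end
    nil # no native backend available
  end
end
```

For example, a resolver that raises Conflict for :mri and returns nil for :rust would make pick("ruby", resolver) fall through to :ffi.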

Raises:

  • (BackendConflict)

    if the backend conflicts with previously used backends



# File 'lib/tree_haver.rb', line 827

def resolve_native_backend_module(explicit_backend = nil)
  # Short-circuit on TruffleRuby: no native backends work
  # - MRI: C extension, MRI only
  # - Rust: magnus requires MRI's C API
  # - FFI: STRUCT_BY_VALUE not supported
  # - Java: requires JRuby's Java interop
  if defined?(RUBY_ENGINE) && RUBY_ENGINE == "truffleruby"
    return unless explicit_backend # Auto-select: no backends available
    # If explicit backend requested, let it fail with proper error below
  end

  # Get the effective backend (considers thread-local and global settings)
  requested = resolve_effective_backend(explicit_backend)

  # If the effective backend is a native backend, use it
  if NATIVE_BACKENDS.include?(requested)
    return resolve_backend_module(requested)
  end

  # If a specific non-native backend was explicitly requested, return nil
  # (from_library only works with native backends that load .so files)
  return if explicit_backend

  # If effective backend is :auto, auto-select from native backends in priority order
  # Note: non-native backends set via with_backend are NOT used here because
  # from_library only works with native backends
  native_priority = if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby"
    %i[java ffi] # JRuby: Java first, then FFI
  else
    %i[mri rust ffi] # MRI: MRI first, then Rust, then FFI
  end

  native_priority.each do |backend|
    # Rescue BackendConflict to allow iteration to continue
    # This enables graceful fallback when a backend is blocked

    mod = resolve_backend_module(backend)
    return mod if mod
  rescue BackendConflict
    # This backend is blocked by a previously used backend, try the next one
    next
  end

  nil # No native backend available
end

.with_backend(name) { ... } ⇒ Object

Execute a block with a specific backend in thread-local context

This method provides temporary, thread-safe backend switching for a block of code. The backend setting is automatically restored when the block exits, even if an exception is raised. Nesting is supported: inner blocks override outer blocks, and each level is properly unwound.

Thread Safety: Each thread maintains its own backend context, so concurrent threads can safely use different backends without interfering with each other.
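
The save, set, and restore pattern behind this behaviour can be sketched without TreeHaver. The module below is a minimal illustration, assuming a single thread-local slot; ScopedBackendDemo and its key are hypothetical, and the real method also tracks a depth counter in its context hash.

```ruby
# Minimal sketch of block-scoped, per-thread backend switching.
module ScopedBackendDemo
  KEY = :scoped_backend_demo

  # Backend currently in effect for the calling thread (nil when unset).
  def self.current
    Thread.current.thread_variable_get(KEY)
  end

  def self.with_backend(name)
    raise ArgumentError, "Backend name required" if name.nil?

    previous = current # remember the outer value before overriding it
    begin
      Thread.current.thread_variable_set(KEY, name.to_sym)
      yield
    ensure
      # Runs on normal exit and on exceptions, so nesting unwinds correctly.
      Thread.current.thread_variable_set(KEY, previous)
    end
  end
end
```

Because the value lives in a thread variable, a block running in one thread never changes what ScopedBackendDemo.current returns in another.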

Use Cases:

  • Testing: Test the same code path with different backends

  • Performance comparison: Benchmark parsing with different backends

  • Fallback scenarios: Try one backend, fall back to another on failure

  • Thread isolation: Different threads can use different backends safely

Examples:

Basic usage

TreeHaver.with_backend(:mri) do
  parser = TreeHaver::Parser.new
  tree = parser.parse(source)
end
# Backend is automatically restored here

Nested blocks (inner overrides outer)

TreeHaver.with_backend(:rust) do
  parser1 = TreeHaver::Parser.new  # Uses :rust
  TreeHaver.with_backend(:citrus) do
    parser2 = TreeHaver::Parser.new  # Uses :citrus
  end
  parser3 = TreeHaver::Parser.new  # Back to :rust
end

Testing multiple backends

[:mri, :rust, :citrus].each do |backend_name|
  TreeHaver.with_backend(backend_name) do
    parser = TreeHaver::Parser.new
    result = parser.parse(source)
    puts "#{backend_name}: #{result.root_node.type}"
  end
end

Exception safety (backend restored even on error)

TreeHaver.with_backend(:mri) do
  raise "Something went wrong"
rescue
  # Handle error
end
# Backend is still restored to its previous value

Thread isolation

threads = [:mri, :rust].map do |backend_name|
  Thread.new do
    TreeHaver.with_backend(backend_name) do
      # Each thread uses its own backend independently
      TreeHaver::Parser.new
    end
  end
end
threads.each(&:join)

Yields:

  • block to execute with the specified backend

Raises:

  • (ArgumentError)

    if backend name is nil

  • (BackendConflict)

    if the requested backend conflicts with a previously used backend

See Also:

  • #effective_backend
  • #current_backend_context


# File 'lib/tree_haver.rb', line 703

def with_backend(name)
  raise ArgumentError, "Backend name required" if name.nil?

  # Get context FIRST to ensure it exists
  ctx = current_backend_context
  old_backend = ctx[:backend]
  old_depth = ctx[:depth]

  begin
    # Set new backend and increment depth
    ctx[:backend] = name.to_sym
    ctx[:depth] += 1

    # Execute block
    yield
  ensure
    # Restore previous backend and depth
    # This ensures proper unwinding even with exceptions
    ctx[:backend] = old_backend
    ctx[:depth] = old_depth
  end
end