Module: TreeHaver

Defined in:
lib/tree_haver.rb,
lib/tree_haver/node.rb,
lib/tree_haver/tree.rb,
lib/tree_haver/version.rb,
lib/tree_haver/backends/ffi.rb,
lib/tree_haver/backends/mri.rb,
lib/tree_haver/backends/java.rb,
lib/tree_haver/backends/rust.rb,
lib/tree_haver/backends/prism.rb,
lib/tree_haver/backends/psych.rb,
lib/tree_haver/grammar_finder.rb,
lib/tree_haver/path_validator.rb,
lib/tree_haver/backends/citrus.rb,
lib/tree_haver/backends/markly.rb,
lib/tree_haver/language_registry.rb,
lib/tree_haver/backends/commonmarker.rb,
lib/tree_haver/citrus_grammar_finder.rb

Overview

TreeHaver is a cross-Ruby adapter for code parsing with 10 backends.

Provides a unified API for parsing source code across MRI Ruby, JRuby, and TruffleRuby using tree-sitter grammars or language-specific native parsers.

Supports 10 backends:

  • Tree-sitter: MRI (C), Rust, FFI, Java

  • Native parsers: Prism (Ruby), Psych (YAML), Commonmarker (Markdown), Markly (GFM)

  • Pure Ruby: Citrus (portable fallback)

Examples:

Basic usage with tree-sitter

# Load a language grammar
language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)

# Create and configure a parser
parser = TreeHaver::Parser.new
parser.language = language

# Parse source code
tree = parser.parse("[package]\nname = \"my-app\"")
root = tree.root_node

# Use unified Position API (works across all backends)
puts root.start_line      # => 1 (1-based)
puts root.source_position # => {start_line:, end_line:, start_column:, end_column:}

Using language-specific backends

# Parse Ruby with Prism
TreeHaver.backend = :prism
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Prism::Language.ruby
tree = parser.parse("class Example; end")

# Parse YAML with Psych
TreeHaver.backend = :psych
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Psych::Language.yaml
tree = parser.parse("key: value")

# Parse Markdown with Commonmarker
TreeHaver.backend = :commonmarker
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Backends::Commonmarker::Language.markdown
tree = parser.parse("# Heading\nParagraph")

Using language registration

TreeHaver.register_language(:toml, path: "/usr/local/lib/libtree-sitter-toml.so")
language = TreeHaver::Language.toml

Using GrammarFinder for automatic discovery

# GrammarFinder automatically locates grammar libraries on the system
finder = TreeHaver::GrammarFinder.new(:toml)
finder.register! if finder.available?
language = TreeHaver::Language.toml

Selecting a backend

TreeHaver.backend = :mri          # Force MRI (ruby_tree_sitter)
TreeHaver.backend = :rust         # Force Rust (tree_stump)
TreeHaver.backend = :ffi          # Force FFI
TreeHaver.backend = :java         # Force Java (JRuby)
TreeHaver.backend = :prism        # Force Prism (Ruby)
TreeHaver.backend = :psych        # Force Psych (YAML)
TreeHaver.backend = :commonmarker # Force Commonmarker (Markdown)
TreeHaver.backend = :markly       # Force Markly (GFM)
TreeHaver.backend = :citrus       # Force Citrus (pure Ruby)
TreeHaver.backend = :auto         # Auto-select (default)

Defined Under Namespace

Modules: Backends, LanguageRegistry, PathValidator, Version

Classes: BackendConflict, CitrusGrammarFinder, Error, GrammarFinder, Language, Node, NotAvailable, Parser, Point, Tree

Constant Summary

VERSION =

Traditional location for VERSION constant

Returns:

  • (String)

    the version string

Version::VERSION

Class Method Details

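.backend ⇒ Object

Get the globally configured backend

On first call, reads the TREE_HAVER_BACKEND environment variable; unset or unrecognized values resolve to :auto. The result is memoized.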

Examples:

TreeHaver.backend  # => :auto


# File 'lib/tree_haver.rb', line 299

def backend
  @backend ||= case (ENV["TREE_HAVER_BACKEND"] || :auto).to_s # rubocop:disable ThreadSafety/ClassInstanceVariable
  when "mri" then :mri
  when "rust" then :rust
  when "ffi" then :ffi
  when "java" then :java
  when "citrus" then :citrus
  when "prism" then :prism
  when "psych" then :psych
  when "commonmarker" then :commonmarker
  when "markly" then :markly
  else :auto
  end
end
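
The environment lookup above can be tried in isolation. The following is a minimal standalone sketch, not TreeHaver's own code: `KNOWN_BACKENDS` and `backend_from_env` are hypothetical names, and the environment is passed in as a hash so the example runs without the gem installed.

```ruby
# Standalone model of the TREE_HAVER_BACKEND lookup; does not require tree_haver.
# KNOWN_BACKENDS and backend_from_env are illustrative names, not part of the gem.
KNOWN_BACKENDS = %i[mri rust ffi java citrus prism psych commonmarker markly].freeze

def backend_from_env(env = ENV)
  requested = (env["TREE_HAVER_BACKEND"] || :auto).to_s.to_sym
  # Unset or unrecognized values fall back to :auto, like the `else` branch above.
  KNOWN_BACKENDS.include?(requested) ? requested : :auto
end

puts backend_from_env({ "TREE_HAVER_BACKEND" => "ffi" }).inspect # => :ffi
puts backend_from_env({ "TREE_HAVER_BACKEND" => "FFI" }).inspect # => :auto (match is case-sensitive)
puts backend_from_env({}).inspect                                # => :auto
```

Because the real method memoizes with `||=`, the variable is only consulted on the first call. Going by the source shown on this page, `reset_backend!(to: nil)` clears the memo so the next call reads the environment again, whereas the default `reset_backend!` pins the value to :auto.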

.backend=(name) ⇒ Symbol?

Set the backend to use

Examples:

Force FFI backend

TreeHaver.backend = :ffi

Force Rust backend

TreeHaver.backend = :rust

Parameters:

  • name (Symbol, String, nil)

    backend name (:auto, :mri, :rust, :ffi, :java, :citrus, :prism, :psych, :commonmarker, :markly)

Returns:

  • (Symbol, nil)

    the backend that was set



# File 'lib/tree_haver.rb', line 322

def backend=(name)
  @backend = name&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.backend_module ⇒ Module?

Determine the concrete backend module to use

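This method performs backend auto-selection when the effective backend is :auto. On JRuby, it prefers the Java backend if available, then FFI, then Citrus. On MRI, it prefers the MRI backend if available, then Rust, then FFI, then Citrus. On other engines, it tries FFI, then Citrus. Citrus is the final fallback because it is pure Ruby and works everywhere. The language-specific backends (Prism, Psych, Commonmarker, Markly) are never auto-selected and must be requested explicitly.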

Examples:

mod = TreeHaver.backend_module
if mod
  puts "Using #{mod.capabilities[:backend]} backend"
end

Returns:

  • (Module, nil)

    the backend module (Backends::MRI, Backends::Rust, Backends::FFI, Backends::Java, Backends::Citrus, Backends::Prism, Backends::Psych, Backends::Commonmarker, or Backends::Markly), or nil if none available



# File 'lib/tree_haver.rb', line 558

def backend_module
  case effective_backend  # Changed from: backend
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :commonmarker
    Backends::Commonmarker
  when :markly
    Backends::Markly
  else
    # auto-select: prefer native/fast backends, fall back to pure Ruby (Citrus)
    if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby" && Backends::Java.available?
      Backends::Java
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && Backends::MRI.available?
      Backends::MRI
    elsif defined?(RUBY_ENGINE) && RUBY_ENGINE == "ruby" && Backends::Rust.available?
      Backends::Rust
    elsif Backends::FFI.available?
      Backends::FFI
    elsif Backends::Citrus.available?
      Backends::Citrus  # Pure Ruby fallback
    else
      # No backend available
      nil
    end
  end
end
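
The auto-selection order in the `else` branch can be modelled as a pure function, which makes it easy to see what each engine ends up with. This is a sketch under stated assumptions: `auto_select` is a hypothetical helper, `engine` stands in for `RUBY_ENGINE`, and `available` stands in for the backends whose `available?` would return true.

```ruby
# Standalone model of the :auto branch of backend_module; does not load tree_haver.
# auto_select is a hypothetical helper, not part of the gem.
def auto_select(engine, available)
  return :java   if engine == "jruby" && available.include?(:java)
  return :mri    if engine == "ruby"  && available.include?(:mri)
  return :rust   if engine == "ruby"  && available.include?(:rust)
  return :ffi    if available.include?(:ffi)
  return :citrus if available.include?(:citrus)

  nil # no backend available
end

p auto_select("ruby", %i[rust ffi citrus])        # => :rust
p auto_select("jruby", %i[ffi citrus])            # => :ffi
p auto_select("truffleruby", %i[mri rust citrus]) # => :citrus
p auto_select("ruby", %i[prism psych])            # => nil
```

The last call illustrates that the language-specific backends play no part in auto-selection: with only those available, the model returns nil, just as the real method would.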

.backend_protect ⇒ Object

Alias for backend_protect?



# File 'lib/tree_haver.rb', line 251

def backend_protect
  backend_protect?
end

.backend_protect=(value) ⇒ Boolean

Enable or disable backend conflict protection

When true (default), TreeHaver will raise BackendConflict if you try to use a backend that is known to conflict with a previously used backend. For example, FFI will not work after MRI has been used.

Set to false to disable protection (useful for testing compatibility).

Examples:

Disable protection for testing

TreeHaver.backend_protect = false

Returns:

  • (Boolean)


# File 'lib/tree_haver.rb', line 237

def backend_protect=(value)
  @backend_protect_mutex ||= Mutex.new
  @backend_protect_mutex.synchronize { @backend_protect = value }
end

.backend_protect? ⇒ Boolean

Check if backend conflict protection is enabled

Returns:

  • (Boolean)

    true if protection is enabled (default)



# File 'lib/tree_haver.rb', line 245

def backend_protect?
  return @backend_protect if defined?(@backend_protect) # rubocop:disable ThreadSafety/ClassInstanceVariable
  true  # Default is protected
end

.backends_used ⇒ Set<Symbol>

Track which backends have been used in this process

Returns:

  • (Set<Symbol>)

    set of backend symbols that have been used



# File 'lib/tree_haver.rb', line 258

def backends_used
  @backends_used ||= Set.new # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.capabilities ⇒ Hash{Symbol => Object}

Get capabilities of the current backend

Returns a hash describing what features the selected backend supports. Common keys include:

  • :backend - Symbol identifying the backend (e.g., :mri, :rust, :ffi, :java)

  • :parse - Whether parsing is implemented

  • :query - Whether the Query API is available

  • :bytes_field - Whether byte position fields are available

  • :incremental - Whether incremental parsing is supported

Examples:

TreeHaver.capabilities
# => { backend: :mri, query: true, bytes_field: true }

Returns:

  • (Hash{Symbol => Object})

    capability map, or empty hash if no backend available



# File 'lib/tree_haver.rb', line 611

def capabilities
  mod = backend_module
  return {} unless mod
  mod.capabilities
end
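
Since the capability map may be empty or may omit keys, callers would typically look keys up with a default rather than assume they exist. The sketch below shows one way to gate a feature on such a hash; `traversal_strategy` and its return values are hypothetical, and the hashes are made up for illustration rather than taken from any real backend.

```ruby
# Hypothetical helper: picks a traversal strategy from a capabilities-style hash.
# Keys may be missing, so the lookup supplies a default instead of assuming presence.
def traversal_strategy(caps)
  return :no_backend if caps.empty?

  caps.fetch(:query, false) ? :query_api : :manual_walk
end

# Made-up capability hashes for illustration only.
p traversal_strategy({ backend: :mri, query: true, bytes_field: true }) # => :query_api
p traversal_strategy({ backend: :example })                             # => :manual_walk
p traversal_strategy({})                                                # => :no_backend
```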

.check_backend_conflict!(backend) ⇒ void

This method returns an undefined value.

Check if using a backend would cause a conflict

Parameters:

  • backend (Symbol)

    the backend to check

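Raises:

  • (BackendConflict)

    if protection is enabled and the backend is blocked by a previously used backend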



# File 'lib/tree_haver.rb', line 285

def check_backend_conflict!(backend)
  return unless backend_protect?

  conflicts = conflicting_backends_for(backend)
  return if conflicts.empty?

  raise BackendConflict,
    "Cannot use #{backend} backend: it is blocked by previously used backend(s): #{conflicts.join(", ")}. " \
      "The #{backend} backend will segfault when #{conflicts.first} has already loaded. " \
      "To disable this protection (at risk of segfaults), set TreeHaver.backend_protect = false"
end

.conflicting_backends_for(backend) ⇒ Array<Symbol>

Check if a backend would conflict with previously used backends

Parameters:

  • backend (Symbol)

    the backend to check

Returns:

  • (Array<Symbol>)

    list of previously used backends that block this one



# File 'lib/tree_haver.rb', line 275

def conflicting_backends_for(backend)
  blockers = Backends::BLOCKED_BY[backend] || []
  blockers & backends_used.to_a
end
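
The conflict check is a set intersection between a backend's known blockers and the backends already used in the process. The sketch below models that with a hypothetical blocker table: the only pairing this page states is that FFI does not work after MRI has been used, so the real `Backends::BLOCKED_BY` may well contain other entries.

```ruby
require "set"

# Hypothetical blocker table based on the one pairing named on this page
# (FFI is blocked once MRI has been used). Not the gem's actual constant.
DEMO_BLOCKED_BY = { ffi: [:mri] }.freeze

# Same shape as conflicting_backends_for, but the used-set is passed in explicitly.
def demo_conflicts_for(backend, used)
  (DEMO_BLOCKED_BY[backend] || []) & used.to_a
end

used = Set.new
p demo_conflicts_for(:ffi, used)  # => []
used << :mri
p demo_conflicts_for(:ffi, used)  # => [:mri]
p demo_conflicts_for(:rust, used) # => []
```

Backends with no entry in the table never conflict, which is why the lookup defaults to an empty array.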

.current_backend_context ⇒ Hash{Symbol => Object}

Thread-local backend context storage

Returns a hash containing the thread-local backend context with keys:

  • :backend - The backend name (Symbol) or nil if using global default

  • :depth - The nesting depth (Integer) for proper cleanup

Examples:

ctx = TreeHaver.current_backend_context
ctx[:backend]  # => nil or :ffi, :mri, etc.
ctx[:depth]    # => 0, 1, 2, etc.

Returns:

  • (Hash{Symbol => Object})

    context hash with :backend and :depth keys



# File 'lib/tree_haver.rb', line 351

def current_backend_context
  Thread.current[:tree_haver_backend_context] ||= {
    backend: nil,  # nil means "use global default"
    depth: 0,       # Track nesting depth for proper cleanup
  }
end

.effective_backend ⇒ Symbol

Get the effective backend for current context

Priority: thread-local context → global @backend → :auto

Examples:

TreeHaver.effective_backend  # => :auto (default)

With thread-local context

TreeHaver.with_backend(:ffi) do
  TreeHaver.effective_backend  # => :ffi
end

Returns:

  • (Symbol)

    the backend to use



# File 'lib/tree_haver.rb', line 369

def effective_backend
  ctx = current_backend_context
  ctx[:backend] || backend || :auto
end

.record_backend_usage(backend) ⇒ void

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

This method returns an undefined value.

Record that a backend has been used

Parameters:

  • backend (Symbol)

    the backend that was used



# File 'lib/tree_haver.rb', line 267

def record_backend_usage(backend)
  backends_used << backend
end

.register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil) ⇒ void

This method returns an undefined value.

Register a language helper by name (backend-agnostic)

After registration, you can use dynamic helpers like `TreeHaver::Language.toml` to load the registered language. TreeHaver will automatically use the appropriate grammar based on the active backend.

The `name` parameter is an arbitrary identifier you choose - it doesn’t need to match the actual language name. This is useful for:

  • Testing: Use unique names like `:toml_test` to avoid collisions

  • Aliasing: Register the same grammar under multiple names

  • Versioning: Register different grammar versions as `:ruby_2` and `:ruby_3`

The actual grammar identity comes from `path`/`symbol` (tree-sitter) or `grammar_module` (Citrus), not from the name.

IMPORTANT: This method INTENTIONALLY allows registering BOTH a tree-sitter library AND a Citrus grammar for the same language IN A SINGLE CALL. This is achieved by using separate `if` statements (not `elsif`) and no early returns. This design is deliberate and provides significant benefits:

Why register both backends for one language?

  • Backend flexibility: Code works regardless of which backend is active

  • Performance testing: Compare tree-sitter vs Citrus performance

  • Gradual migration: Transition between backends without breaking code

  • Fallback scenarios: Use Citrus when tree-sitter library unavailable

  • Platform portability: tree-sitter on Linux/Mac, Citrus on JRuby/Windows

The active backend determines which registration is used automatically. No code changes needed to switch backends - just change TreeHaver.backend.

Examples:

Register tree-sitter grammar only

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)

Register Citrus grammar only

TreeHaver.register_language(
  :toml,
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)

Register BOTH backends in separate calls

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)
TreeHaver.register_language(
  :toml,
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)

Register BOTH backends in ONE call (recommended for maximum flexibility)

TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
  grammar_module: TomlRB::Document,
  gem_name: "toml-rb"
)
# Now TreeHaver::Language.toml works with ANY backend!

Parameters:

  • name (Symbol, String)

    identifier for this registration (can be any name you choose)

  • path (String, nil) (defaults to: nil)

    absolute path to the language shared library (for tree-sitter)

  • symbol (String, nil) (defaults to: nil)

    optional exported factory symbol (e.g., "tree_sitter_toml")

  • grammar_module (Module, nil) (defaults to: nil)

    Citrus grammar module that responds to .parse(source)

  • gem_name (String, nil) (defaults to: nil)

    optional gem name for error messages



# File 'lib/tree_haver.rb', line 690

def register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil)
  # Register tree-sitter backend if path provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  if path
    LanguageRegistry.register(name, :tree_sitter, path: path, symbol: symbol)
  end

  # Register Citrus backend if grammar_module provided
  # Note: Uses `if` not `elsif` so both backends can be registered in one call
  # This allows maximum flexibility - register once, use with any backend
  if grammar_module
    unless grammar_module.respond_to?(:parse)
      raise ArgumentError, "Grammar module must respond to :parse"
    end

    LanguageRegistry.register(name, :citrus, grammar_module: grammar_module, gem_name: gem_name)
  end

  # Require at least one backend to be registered
  if path.nil? && grammar_module.nil?
    raise ArgumentError, "Must provide at least one of: path (tree-sitter) or grammar_module (Citrus)"
  end

  # Note: No early return! This method intentionally processes both `if` blocks
  # above to allow registering multiple backends for the same language.
  # Both tree-sitter and Citrus can be registered simultaneously for maximum
  # flexibility. See method documentation for rationale.
  nil
end
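
To see what registering both backends in one call amounts to, the method can be modelled against a toy registry. This is only a sketch: `DemoRegistry`, `FakeGrammar`, `demo_register_language`, and the nested-hash storage layout are all hypothetical, and the real `LanguageRegistry` may store entries differently.

```ruby
# Standalone model; does not load tree_haver. DemoRegistry and its storage layout
# are hypothetical stand-ins for LanguageRegistry internals.
module DemoRegistry
  @entries = {}

  def self.register(name, backend_type, **details)
    (@entries[name.to_sym] ||= {})[backend_type] = details
  end

  def self.registered(name)
    @entries[name.to_sym]
  end
end

# Stub grammar: anything that responds to .parse passes the check.
module FakeGrammar
  def self.parse(source)
    source
  end
end

def demo_register_language(name, path: nil, symbol: nil, grammar_module: nil, gem_name: nil)
  # Two independent `if` statements, so a single call can fill both slots.
  DemoRegistry.register(name, :tree_sitter, path: path, symbol: symbol) if path
  if grammar_module
    raise ArgumentError, "Grammar module must respond to :parse" unless grammar_module.respond_to?(:parse)

    DemoRegistry.register(name, :citrus, grammar_module: grammar_module, gem_name: gem_name)
  end
  raise ArgumentError, "Must provide path or grammar_module" if path.nil? && grammar_module.nil?

  nil
end

demo_register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
  grammar_module: FakeGrammar,
  gem_name: "toml-rb"
)
p DemoRegistry.registered(:toml).keys # => [:tree_sitter, :citrus]
```

In this model, one name maps to one entry per backend type, so a later lookup can pick whichever entry matches the active backend.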

.registered_language(name) ⇒ Hash?

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Fetch a registered language entry

Parameters:

  • name (Symbol, String)

    language identifier

Returns:

  • (Hash, nil)

    registration hash with keys :path and :symbol, or nil if not registered



# File 'lib/tree_haver.rb', line 725

def registered_language(name)
  LanguageRegistry.registered(name)
end

.reset_backend!(to: :auto) ⇒ void

This method returns an undefined value.

Reset backend selection memoization

Primarily useful in tests to switch backends without cross-example leakage.

Examples:

Reset to auto-selection

TreeHaver.reset_backend!

Reset to specific backend

TreeHaver.reset_backend!(to: :ffi)

Parameters:

  • to (Symbol, String, nil) (defaults to: :auto)

    backend name or nil to clear (defaults to :auto)



# File 'lib/tree_haver.rb', line 336

def reset_backend!(to: :auto)
  @backend = to&.to_sym # rubocop:disable ThreadSafety/ClassInstanceVariable
end

.resolve_backend_module(explicit_backend = nil) ⇒ Module?

Get backend module for a specific backend (with explicit override)

Examples:

mod = TreeHaver.resolve_backend_module(:ffi)
mod.capabilities[:backend]  # => :ffi

Parameters:

  • explicit_backend (Symbol, String, nil) (defaults to: nil)

    explicitly requested backend

Returns:

  • (Module, nil)

    the backend module or nil if not available

Raises:

  • (BackendConflict)

    if the backend conflicts with previously used backends



# File 'lib/tree_haver.rb', line 491

def resolve_backend_module(explicit_backend = nil)
  # Temporarily override effective backend
  requested = resolve_effective_backend(explicit_backend)

  mod = case requested
  when :mri
    Backends::MRI
  when :rust
    Backends::Rust
  when :ffi
    Backends::FFI
  when :java
    Backends::Java
  when :citrus
    Backends::Citrus
  when :prism
    Backends::Prism
  when :psych
    Backends::Psych
  when :commonmarker
    Backends::Commonmarker
  when :markly
    Backends::Markly
  when :auto
    backend_module  # Fall back to normal resolution for :auto
  else
    # Unknown backend name - return nil to trigger error in caller
    nil
  end

  # Return nil if the module doesn't exist
  return unless mod

  # Check for backend conflicts FIRST, before checking availability
  # This is critical because the conflict causes the backend to report unavailable
  # We want to raise a clear error explaining WHY it's unavailable
  # Use the requested backend name directly (not capabilities) because
  # capabilities may be empty when the backend is blocked/unavailable
  check_backend_conflict!(requested) if requested && requested != :auto

  # Now check if the backend is available
  # Why assume modules without available? are available?
  # - Some backends might be mocked in tests without an available? method
  # - This makes the code more defensive and test-friendly
  # - It allows graceful degradation if a backend module is incomplete
  # - Backward compatibility: if a module doesn't declare availability, assume it works
  return if mod.respond_to?(:available?) && !mod.available?

  # Record that this backend is being used
  record_backend_usage(requested) if requested && requested != :auto

  mod
end

.resolve_effective_backend(explicit_backend = nil) ⇒ Symbol

Resolve the effective backend considering explicit override

Priority: explicit > thread context > global > :auto

Examples:

TreeHaver.resolve_effective_backend(:ffi)  # => :ffi

With thread-local context

TreeHaver.with_backend(:mri) do
  TreeHaver.resolve_effective_backend(nil)  # => :mri
  TreeHaver.resolve_effective_backend(:ffi)  # => :ffi (explicit wins)
end

Parameters:

  • explicit_backend (Symbol, String, nil) (defaults to: nil)

    explicitly requested backend

Returns:

  • (Symbol)

    the backend to use



# File 'lib/tree_haver.rb', line 478

def resolve_effective_backend(explicit_backend = nil)
  return explicit_backend.to_sym if explicit_backend
  effective_backend
end
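
The full priority chain (explicit, then thread-local context, then global setting, then :auto) can be written as one small function. The sketch below is a standalone model: `resolve_backend` and its keyword arguments are hypothetical names standing in for the explicit argument, the thread-local context value, and the global setting.

```ruby
# Standalone model of the lookup order; resolve_backend is a hypothetical name.
def resolve_backend(explicit: nil, thread_local: nil, global: nil)
  return explicit.to_sym if explicit

  thread_local || global || :auto
end

p resolve_backend(explicit: "ffi", thread_local: :mri, global: :rust) # => :ffi
p resolve_backend(thread_local: :mri, global: :rust)                  # => :mri
p resolve_backend(global: :rust)                                      # => :rust
p resolve_backend                                                     # => :auto
```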

.with_backend(name) { ... } ⇒ Object

Execute a block with a specific backend in thread-local context

This method provides temporary, thread-safe backend switching for a block of code. The backend setting is automatically restored when the block exits, even if an exception is raised. Supports nesting—inner blocks override outer blocks, and each level is properly unwound.

Thread Safety: Each thread maintains its own backend context, so concurrent threads can safely use different backends without interfering with each other.

Use Cases:

  • Testing: Test the same code path with different backends

  • Performance comparison: Benchmark parsing with different backends

  • Fallback scenarios: Try one backend, fall back to another on failure

  • Thread isolation: Different threads can use different backends safely

Examples:

Basic usage

TreeHaver.with_backend(:mri) do
  parser = TreeHaver::Parser.new
  tree = parser.parse(source)
end
# Backend is automatically restored here

Nested blocks (inner overrides outer)

TreeHaver.with_backend(:rust) do
  parser1 = TreeHaver::Parser.new  # Uses :rust
  TreeHaver.with_backend(:citrus) do
    parser2 = TreeHaver::Parser.new  # Uses :citrus
  end
  parser3 = TreeHaver::Parser.new  # Back to :rust
end

Testing multiple backends

[:mri, :rust, :citrus].each do |backend_name|
  TreeHaver.with_backend(backend_name) do
    parser = TreeHaver::Parser.new
    result = parser.parse(source)
    puts "#{backend_name}: #{result.root_node.type}"
  end
end

Exception safety (backend restored even on error)

TreeHaver.with_backend(:mri) do
  raise "Something went wrong"
rescue
  # Handle error
end
# Backend is still restored to its previous value

Thread isolation

threads = [:mri, :rust].map do |backend_name|
  Thread.new do
    TreeHaver.with_backend(backend_name) do
      # Each thread uses its own backend independently
      TreeHaver::Parser.new
    end
  end
end
threads.each(&:join)

Parameters:

  • name (Symbol, String)

    backend name (:mri, :rust, :ffi, :java, :citrus, :prism, :psych, :commonmarker, :markly, :auto)

Yields:

  • block to execute with the specified backend

Returns:

  • (Object)

    the return value of the block

Raises:

  • (ArgumentError)

    if backend name is nil

  • (BackendConflict)

    if the requested backend conflicts with a previously used backend

See Also:

  • #effective_backend
  • #current_backend_context


# File 'lib/tree_haver.rb', line 442

def with_backend(name)
  raise ArgumentError, "Backend name required" if name.nil?

  # Get context FIRST to ensure it exists
  ctx = current_backend_context
  old_backend = ctx[:backend]
  old_depth = ctx[:depth]

  begin
    # Set new backend and increment depth
    ctx[:backend] = name.to_sym
    ctx[:depth] += 1

    # Execute block
    yield
  ensure
    # Restore previous backend and depth
    # This ensures proper unwinding even with exceptions
    ctx[:backend] = old_backend
    ctx[:depth] = old_depth
  end
end
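
The nesting, exception-safety, and thread-isolation behaviour described above can be checked without any parser backend installed. The following sketch re-creates the save-and-restore pattern in a hypothetical `BackendContextDemo` module with its own thread-local key; it mirrors the source shown here but is not the gem's code.

```ruby
# Standalone model of thread-local backend switching; does not load tree_haver.
# BackendContextDemo and its key name are hypothetical.
module BackendContextDemo
  KEY = :backend_context_demo

  def self.context
    Thread.current[KEY] ||= { backend: nil, depth: 0 }
  end

  def self.with_backend(name)
    raise ArgumentError, "Backend name required" if name.nil?

    ctx = context
    old_backend = ctx[:backend]
    old_depth = ctx[:depth]
    begin
      ctx[:backend] = name.to_sym
      ctx[:depth] += 1
      yield
    ensure
      # Runs on normal exit and on exceptions, so each level unwinds cleanly.
      ctx[:backend] = old_backend
      ctx[:depth] = old_depth
    end
  end
end

seen = []
BackendContextDemo.with_backend(:rust) do
  seen << BackendContextDemo.context[:backend]
  BackendContextDemo.with_backend(:citrus) { seen << BackendContextDemo.context[:backend] }
  seen << BackendContextDemo.context[:backend]
end
p seen # => [:rust, :citrus, :rust]

begin
  BackendContextDemo.with_backend(:mri) { raise "boom" }
rescue RuntimeError
  # The ensure clause has already restored the context by this point.
end
p BackendContextDemo.context[:backend] # => nil

# A new thread starts with its own fresh context.
in_thread = Thread.new { BackendContextDemo.with_backend(:ffi) { BackendContextDemo.context[:backend] } }.value
p in_thread                            # => :ffi
p BackendContextDemo.context[:backend] # => nil
```

One caveat worth knowing: `Thread.current[]` storage in Ruby is fiber-local, so code running in a separate Fiber also starts with a fresh context under this pattern.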