Class: TreeHaver::Parser

Inherits:
Base::Parser
  • Object
Defined in:
lib/tree_haver/parser.rb

Overview

Unified Parser facade providing a consistent API across all backends

This class acts as a facade/adapter that delegates to backend-specific parser implementations. It automatically selects the appropriate backend and provides a unified interface regardless of which parser is being used.

Backend Selection

The parser automatically selects a backend based on:

  1. Explicit backend: parameter in constructor

  2. TreeHaver.backend global setting

  3. TREE_HAVER_BACKEND environment variable

  4. Auto-detection (tries available backends in order)
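As a rough illustration of the selection order above, the sketch below picks a backend from three inputs. `pick_backend` and its parameters are hypothetical names, not part of TreeHaver, and the sketch assumes the numbered list is a strict precedence order.

```ruby
# Hypothetical helper, not TreeHaver API: returns the first backend choice
# that is set, treating the numbered list above as a precedence order.
def pick_backend(explicit: nil, global: nil, env: {})
  candidates = [explicit, global, env["TREE_HAVER_BACKEND"]]
  choice = candidates.find { |c| c && !c.to_s.empty? }
  # Nothing configured anywhere: leave the decision to auto-detection.
  choice ? choice.to_sym : :auto
end

pick_backend(explicit: :citrus, global: :ffi)             # the explicit argument wins
pick_backend(env: { "TREE_HAVER_BACKEND" => "parslet" })  # env is used when nothing else is set
pick_backend                                              # falls through to :auto
```

In this sketch the environment is passed in as a plain hash rather than read from `ENV`, which keeps the helper easy to exercise in isolation.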

Supported Backends

**Tree-sitter backends** (native, high-performance):

  • :mri - ruby_tree_sitter gem (C extension, MRI only)

  • :rust - tree_stump gem (Rust via magnus, MRI only)

  • :ffi - FFI bindings to libtree-sitter (MRI, JRuby)

  • :java - java-tree-sitter (JRuby only)

**Pure Ruby backends** (portable, no native dependencies):

  • :citrus - Citrus PEG parser (e.g., toml-rb)

  • :parslet - Parslet PEG parser (e.g., toml gem)

  • :prism - Ruby’s official parser (parses Ruby source only)

  • :psych - YAML parser (stdlib)

Wrapping/Unwrapping Responsibility

TreeHaver::Parser handles ALL object wrapping and unwrapping:

**Language objects:**

  • Unwraps Language wrappers before passing to backend.language=

  • MRI backend receives ::TreeSitter::Language

  • Rust backend receives String (language name)

  • FFI backend receives wrapped Language (needs to_ptr)

  • Citrus backend receives grammar module

  • Parslet backend receives grammar class

**Tree objects:**

  • parse() receives raw source, backend returns raw tree, Parser wraps it

  • parse_string() unwraps old_tree before passing to backend, wraps returned tree

  • Backends always work with raw backend trees, never TreeHaver::Tree

**Node objects:**

  • Backends return raw nodes, TreeHaver::Tree and TreeHaver::Node wrap them

This design ensures:

  • Principle of Least Surprise: wrapping happens at boundaries, consistently

  • Backends are simple: they don’t need to know about TreeHaver wrappers

  • Single Responsibility: wrapping logic is only in TreeHaver::Parser
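The boundary rule described above can be sketched with stand-in classes. Everything prefixed with `Sketch` (and `RawTree`) is hypothetical and exists only for this illustration; the real TreeHaver classes may differ in names and detail.

```ruby
# Stand-in classes for illustration only; none of these are TreeHaver's real classes.
RawTree = Struct.new(:root_type)
SketchLanguage = Struct.new(:inner_language)

class SketchBackendParser
  attr_accessor :language

  # A backend sees and returns only raw objects.
  def parse(source)
    RawTree.new(source.empty? ? :empty : :document)
  end
end

class SketchTree
  attr_reader :inner_tree, :source

  def initialize(inner_tree, source:)
    @inner_tree = inner_tree
    @source = source
  end
end

class SketchFacade
  def initialize(impl)
    @impl = impl
  end

  def language=(lang)
    # Unwrap at the boundary so the backend never sees the wrapper.
    @impl.language = lang.inner_language
    @language = lang
  end

  def parse(source)
    # Wrap at the boundary so callers never see the raw tree.
    SketchTree.new(@impl.parse(source), source: source)
  end
end

backend = SketchBackendParser.new
facade = SketchFacade.new(backend)
facade.language = SketchLanguage.new("toml")
tree = facade.parse("a = 1")
```

Because all wrapping lives in the facade, the stand-in backend stays unaware of `SketchTree` and `SketchLanguage`, which mirrors the single-responsibility point made above.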

Examples:

Basic parsing

parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.toml
tree = parser.parse("[package]\nname = \"foo\"")

Explicit backend selection

parser = TreeHaver::Parser.new(backend: :citrus)
parser.language = TreeHaver::Language.toml
tree = parser.parse(toml_source)

Constructor Details

#initialize(backend: nil) ⇒ Parser

Create a new parser instance

The parser automatically selects the best available backend unless explicitly specified. Use the backend: parameter to force a specific backend.

Examples:

Default (auto-selects best available backend)

parser = TreeHaver::Parser.new

Explicit backend

parser = TreeHaver::Parser.new(backend: :citrus)

Parameters:

  • backend (Symbol, String, nil) (defaults to: nil)

    optional backend to use (overrides context/global). Valid values: :auto, :mri, :rust, :ffi, :java, :citrus, :parslet, :prism, :psych

Raises:

  • (NotAvailable)

    if no backend is available or requested backend is unavailable



# File 'lib/tree_haver/parser.rb', line 84

def initialize(backend: nil)
  super()  # Initialize @language from Base::Parser

  # Convert string backend names to symbols for consistency
  backend = backend.to_sym if backend.is_a?(String)

  mod = TreeHaver.resolve_backend_module(backend)

  if mod.nil?
    if backend
      raise NotAvailable, "Requested backend #{backend.inspect} is not available"
    else
      raise NotAvailable, "No TreeHaver backend is available"
    end
  end

  # Try to create the parser, with fallback to pure Ruby if tree-sitter fails
  # This enables auto-fallback when tree-sitter runtime isn't available
  begin
    @impl = mod::Parser.new
    @explicit_backend = backend  # Remember for introspection (always a Symbol or nil)
  rescue NoMethodError, LoadError => e
    # Note: FFI::NotFoundError inherits from LoadError, so it's caught here too
    handle_parser_creation_failure(e, backend)
  end
end

Instance Method Details

#backend ⇒ Symbol

Get the backend this parser is using (for introspection)

Returns the actual backend in use, resolving :auto to the concrete backend.

Returns:

  • (Symbol)

    the backend name (:mri, :rust, :ffi, :java, :citrus, or :parslet; for any other implementation class, the value of TreeHaver.effective_backend)



# File 'lib/tree_haver/parser.rb', line 143

def backend
  if @explicit_backend && @explicit_backend != :auto
    @explicit_backend
  else
    # Determine actual backend from the implementation class
    case @impl.class.name
    when /MRI/
      :mri
    when /Rust/
      :rust
    when /FFI/
      :ffi
    when /Java/
      :java
    when /Citrus/
      :citrus
    when /Parslet/
      :parslet
    else
      # Fallback to effective_backend if we can't determine from class name
      TreeHaver.effective_backend
    end
  end
end

#handle_parser_creation_failure(error, backend) ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Handle parser creation failure with optional Citrus/Parslet fallback

Parameters:

  • error (Exception)

    the error that caused parser creation to fail

  • backend (Symbol, nil)

    the requested backend

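Raises:

  • (NotAvailable)

    if backend is nil or :auto and neither the Citrus nor the Parslet fallback is available

  • (Exception)

    the original error, re-raised unchanged when a specific backend was explicitly requested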



# File 'lib/tree_haver/parser.rb', line 117

def handle_parser_creation_failure(error, backend)
  # Tree-sitter backend failed (likely missing runtime library)
  # Try Citrus or Parslet as fallback if we weren't explicitly asked for a specific backend
  if backend.nil? || backend == :auto
    if Backends::Citrus.available?
      @impl = Backends::Citrus::Parser.new
      @explicit_backend = :citrus
    elsif Backends::Parslet.available?
      @impl = Backends::Parslet::Parser.new
      @explicit_backend = :parslet
    else
      # No fallback available: raise NotAvailable carrying the original error's message
      raise NotAvailable, "Tree-sitter backend failed: #{error.message}. " \
        "Citrus/Parslet fallback not available. Install tree-sitter runtime, citrus gem, or parslet gem."
    end
  else
    # Explicit backend was requested, don't fallback
    raise error
  end
end
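The fallback rule in the listing above (fall back only when no specific backend was requested) can be tried in isolation with the sketch below. `create_with_fallback`, `SketchNotAvailable`, and the lambda factories are hypothetical stand-ins, not TreeHaver code.

```ruby
# Hypothetical stand-ins to illustrate the fallback rule; not TreeHaver code.
class SketchNotAvailable < StandardError; end

# primary:   callable that builds the preferred parser (may raise)
# fallbacks: hash of name => callable, or nil when that fallback is unavailable
def create_with_fallback(requested, primary:, fallbacks:)
  [requested, primary.call]
rescue NoMethodError, LoadError => e
  # An explicitly requested backend never falls back.
  raise e unless requested.nil? || requested == :auto

  # Take the first fallback that is actually available, in insertion order.
  name, factory = fallbacks.find { |_name, candidate| candidate }
  raise SketchNotAvailable, "backend failed: #{e.message}" unless factory

  [name, factory.call]
end

broken = -> { raise LoadError, "runtime library not found" }
create_with_fallback(nil, primary: broken,
  fallbacks: { citrus: nil, parslet: -> { :parslet_parser } })
```

With an explicit request such as `create_with_fallback(:ffi, ...)`, the same `LoadError` propagates to the caller instead of being replaced by a fallback.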

#language=(lang) ⇒ Language

Set the language grammar for this parser

The language must be compatible with the parser’s backend. If a mismatch is detected (e.g., Citrus language on tree-sitter parser), the parser will automatically switch to the correct backend.

Examples:

parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")

Parameters:

  • lang (Language)

    the language to use for parsing

Returns:

  • (Language)

    the language that was set



# File 'lib/tree_haver/parser.rb', line 178

def language=(lang)
  # Auto-switch backend if language type doesn't match current parser
  # This handles the case where Language.toml returns a Citrus/Parslet language
  # but the parser was initialized with a tree-sitter backend
  switch_backend_for_language(lang)

  # Unwrap the language before passing to backend
  # Backends receive raw language objects, never TreeHaver wrappers
  inner_lang = unwrap_language(lang)
  @impl.language = inner_lang

  # Store on base class for API compatibility
  @language = lang
end

#parse(source) ⇒ Tree

Parse source code into a syntax tree

Examples:

tree = parser.parse("x = 1")
puts tree.root_node.type

Parameters:

  • source (String)

    the source code to parse (should be UTF-8)

Returns:

  • (Tree)

    the parsed syntax tree



# File 'lib/tree_haver/parser.rb', line 200

def parse(source)
  tree_impl = @impl.parse(source)
  # Wrap backend tree with source so Node#text works
  Tree.new(tree_impl, source: source)
end

#parse_string(old_tree, source) ⇒ Tree

Parse source code into a syntax tree (with optional incremental parsing)

This method provides API compatibility with ruby_tree_sitter, which uses `parse_string(old_tree, source)`.

Incremental Parsing

tree-sitter supports **incremental parsing** where you can pass a previously parsed tree along with edit information to efficiently re-parse only the changed portions of source code. This is a major performance optimization for editors and IDEs that need to re-parse on every keystroke.

The workflow for incremental parsing is:

  1. Parse the initial source: `tree = parser.parse_string(nil, source)`

  2. User edits the source (e.g., inserts a character)

  3. Call `tree.edit(…)` to update the tree’s position data

  4. Re-parse with the old tree: `new_tree = parser.parse_string(tree, new_source)`

  5. tree-sitter reuses unchanged nodes, only re-parsing affected regions

TreeHaver passes through to the underlying backend if it supports incremental parsing (MRI and Rust backends do). Check TreeHaver.capabilities[:incremental] to see if the current backend supports it.

Examples:

First parse (no old tree)

tree = parser.parse_string(nil, "x = 1")

Incremental parse

tree.edit(start_byte: 4, old_end_byte: 5, new_end_byte: 6, ...)
new_tree = parser.parse_string(tree, "x = 42")
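The byte offsets in the edit call above can be derived from the old and new source. The sketch below is a hypothetical helper, not part of TreeHaver or tree-sitter; it assumes a single contiguous edit and finds it by trimming the common prefix and suffix. It computes only the three byte offsets and leaves the arguments elided in the example untouched.

```ruby
# Hypothetical helper, not part of TreeHaver or tree-sitter: derives the byte
# range of one contiguous edit by trimming the common prefix and suffix.
def edit_byte_range(old_source, new_source)
  old_bytes = old_source.bytes
  new_bytes = new_source.bytes

  # Length of the shared leading bytes.
  prefix = 0
  max_prefix = [old_bytes.size, new_bytes.size].min
  prefix += 1 while prefix < max_prefix && old_bytes[prefix] == new_bytes[prefix]

  # Length of the shared trailing bytes, never overlapping the prefix.
  suffix = 0
  max_suffix = max_prefix - prefix
  suffix += 1 while suffix < max_suffix && old_bytes[-1 - suffix] == new_bytes[-1 - suffix]

  {
    start_byte: prefix,
    old_end_byte: old_bytes.size - suffix,
    new_end_byte: new_bytes.size - suffix
  }
end

edit_byte_range("x = 1", "x = 42")
# => {start_byte: 4, old_end_byte: 5, new_end_byte: 6} (hash inspect format varies by Ruby version)
```

The helper works on bytes rather than characters because the edit offsets are byte positions, which matters once the source contains multi-byte UTF-8 characters.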

Parameters:

  • old_tree (Tree, nil)

    previously parsed tree for incremental parsing, or nil for fresh parse

  • source (String)

    the source code to parse (should be UTF-8)

Returns:

  • (Tree)

    the parsed syntax tree



# File 'lib/tree_haver/parser.rb', line 239

def parse_string(old_tree, source)
  # Pass through to backend if it supports incremental parsing
  if old_tree && @impl.respond_to?(:parse_string)
    # Extract the underlying implementation from our Tree wrapper
    old_impl = if old_tree.respond_to?(:inner_tree)
      old_tree.inner_tree
    elsif old_tree.respond_to?(:instance_variable_get)
      # Fallback for compatibility
      old_tree.instance_variable_get(:@inner_tree) || old_tree.instance_variable_get(:@impl) || old_tree
    else
      old_tree
    end
    tree_impl = @impl.parse_string(old_impl, source)
    # Wrap backend tree with source so Node#text works
    Tree.new(tree_impl, source: source)
  elsif @impl.respond_to?(:parse_string)
    tree_impl = @impl.parse_string(nil, source)
    # Wrap backend tree with source so Node#text works
    Tree.new(tree_impl, source: source)
  else
    # Fallback for backends that don't support parse_string
    parse(source)
  end
end