Class: TreeHaver::Language

Inherits:
Object
Defined in:
lib/tree_haver/language.rb

Overview

Represents a language grammar for parsing source code

Language is the entry point for loading and using grammars. It provides a unified interface that works across all backends (MRI, Rust, FFI, Java, Citrus).

For tree-sitter backends, languages are loaded from shared library files (.so/.dylib/.dll). For pure-Ruby backends (Citrus, Prism, Psych), languages are built-in or provided by gems.

Loading Languages

The primary way to load a language is via registration:

TreeHaver.register_language(:toml, path: "/path/to/libtree-sitter-toml.so")
language = TreeHaver::Language.toml

For explicit loading without registration:

language = TreeHaver::Language.from_library(
  "/path/to/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)

For ruby_tree_sitter compatibility:

language = TreeHaver::Language.load("toml", "/path/to/libtree-sitter-toml.so")

Examples:

Register and load a language

TreeHaver.register_language(:toml, path: "/path/to/grammar.so")
language = TreeHaver::Language.toml

Class Method Details

.from_library(path, symbol: nil, name: nil, validate: true, backend: nil) ⇒ Language

Also known as: from_path

Load a language grammar from a shared library

The library must export a function that returns a pointer to a TSLanguage struct. By default, TreeHaver looks for a symbol named “tree_sitter_<name>”.
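The naming convention above can be illustrated with a small sketch. The helper `guess_tree_sitter_symbol` is hypothetical and not part of TreeHaver; how the library's own auto-detection works is not shown on this page, so treat the file-name heuristic below as an assumption.

```ruby
# Hypothetical helper (not part of TreeHaver): guesses the exported symbol
# name from a grammar library's file name, following the
# "tree_sitter_<name>" convention described above.
def guess_tree_sitter_symbol(path)
  base = File.basename(path, ".*")                 # e.g. "libtree-sitter-toml"
  lang = base.sub(/\Alib/, "").sub(/\Atree[-_]sitter[-_]/, "")
  "tree_sitter_#{lang.tr("-", "_")}"               # C identifiers cannot contain "-"
end

puts guess_tree_sitter_symbol("/usr/local/lib/libtree-sitter-toml.so")
# prints "tree_sitter_toml"
```

When the guess would be wrong for a given grammar, passing `symbol:` explicitly avoids relying on any heuristic.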

Security

By default, paths are validated using PathValidator to prevent path traversal and other attacks. Set `validate: false` to skip validation (not recommended unless you have already validated the path).
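As a rough illustration of what such validation might involve, here is a minimal sketch. The `SketchValidator` module and every rule in it are assumptions made for this example; PathValidator's actual checks are not documented on this page and may differ.

```ruby
# Sketch only: these rules are assumptions, not PathValidator's real checks.
module SketchValidator
  ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze
  C_IDENTIFIER = /\A[A-Za-z_][A-Za-z0-9_]*\z/

  module_function

  # Returns a list of human-readable problems; an empty list means "looks safe".
  def path_errors(path)
    return ["path must be a String"] unless path.is_a?(String)

    errors = []
    absolute = path.start_with?("/") || path.match?(%r{\A[A-Za-z]:[\\/]})
    errors << "path must be absolute" unless absolute
    errors << "path must not contain '..' segments" if path.split(%r{[\\/]}).include?("..")
    errors << "unexpected file extension" unless ALLOWED_EXTENSIONS.include?(File.extname(path))
    errors
  end

  # A symbol passed to the dynamic loader should be a plain C identifier.
  def safe_symbol_name?(symbol)
    symbol.is_a?(String) && symbol.match?(C_IDENTIFIER)
  end
end

p SketchValidator.path_errors("/usr/local/lib/libtree-sitter-toml.so")
p SketchValidator.path_errors("../evil.so")
p SketchValidator.safe_symbol_name?("tree_sitter_toml")
```

Collecting all errors, rather than stopping at the first, makes it possible to report them together in a single exception message.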

Examples:

language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
  name: "toml"
)

With explicit backend

language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
  backend: :ffi
)

Parameters:

  • path

    absolute path to the language shared library (.so/.dylib/.dll)

  • symbol (defaults to: nil)

    name of the exported function (defaults to auto-detection)

  • name (defaults to: nil)

    logical name for the language (used in caching)

  • validate (defaults to: true)

    if true, validates path and symbol for safety

  • backend (defaults to: nil)

    optional backend to use (overrides context/global)

Returns:

  • loaded language handle

Raises:

  • if the library cannot be loaded or the symbol is not found

  • ArgumentError if path or symbol fails security validation



# File 'lib/tree_haver/language.rb', line 83

def from_library(path, symbol: nil, name: nil, validate: true, backend: nil)
  if validate
    unless PathValidator.safe_library_path?(path)
      errors = PathValidator.validation_errors(path)
      raise ArgumentError, "Unsafe library path: #{path.inspect}. Errors: #{errors.join("; ")}"
    end

    if symbol && !PathValidator.safe_symbol_name?(symbol)
      raise ArgumentError, "Unsafe symbol name: #{symbol.inspect}. " \
        "Symbol names must be valid C identifiers."
    end
  end

  # from_library only works with tree-sitter backends that support .so files
  # Pure Ruby backends (Citrus, Prism, Psych, Commonmarker, Markly) don't support from_library
  mod = TreeHaver.resolve_native_backend_module(backend)

  if mod.nil?
    if backend
      raise NotAvailable, "Requested backend #{backend.inspect} is not available or does not support shared libraries"
    else
      raise NotAvailable,
        "No native tree-sitter backend is available for loading shared libraries. " \
          "Available native backends (MRI, Rust, FFI, Java) require platform-specific setup. " \
          "For pure-Ruby parsing, use backend-specific Language classes directly (e.g., Prism, Psych, Citrus)."
    end
  end

  # Backend must implement .from_library; fallback to .from_path for older impls
  # Include effective backend AND ENV vars in cache key since they affect loading
  effective_b = TreeHaver.resolve_effective_backend(backend)
  key = [effective_b, path, symbol, name, ENV["TREE_SITTER_LANG_SYMBOL"]]
  LanguageRegistry.fetch(key) do
    if mod::Language.respond_to?(:from_library)
      mod::Language.from_library(path, symbol: symbol, name: name)
    else
      mod::Language.from_path(path)
    end
  end
end
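The caching step at the end of the method can be sketched in isolation. `SketchRegistry` below is a hypothetical stand-in for LanguageRegistry, whose real implementation is not shown here; the point is only that a composite Array key lets the same library path yield separate cache entries per backend, symbol, name, and environment setting.

```ruby
# Hypothetical stand-in for LanguageRegistry: a thread-safe memo table
# keyed by any hashable object (here, an Array of loading inputs).
class SketchRegistry
  def initialize
    @store = {}
    @mutex = Mutex.new
  end

  # Returns the cached value for key, running the block only on a miss.
  def fetch(key)
    @mutex.synchronize do
      return @store[key] if @store.key?(key)

      @store[key] = yield
    end
  end
end

registry = SketchRegistry.new
path = "/path/to/libtree-sitter-toml.so"

# Arrays with equal elements are equal Hash keys, so these two calls share an entry.
first  = registry.fetch([:ffi, path, "tree_sitter_toml", "toml", nil]) { Object.new }
second = registry.fetch([:ffi, path, "tree_sitter_toml", "toml", nil]) { Object.new }
# A different backend in the key produces a separate entry.
other  = registry.fetch([:mri, path, "tree_sitter_toml", "toml", nil]) { Object.new }

puts first.equal?(second) # prints "true"
puts first.equal?(other)  # prints "false"
```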

.load(name, path, validate: true) ⇒ Language

Load a language grammar from a shared library (ruby_tree_sitter compatibility)

This method provides API compatibility with ruby_tree_sitter, which uses `Language.load(name, path)`.

Examples:

language = TreeHaver::Language.load("toml", "/usr/local/lib/libtree-sitter-toml.so")

Parameters:

  • name

    the language name (e.g., “toml”)

  • path

    absolute path to the language shared library

  • validate (defaults to: true)

    if true, validates the path for safety

Returns:

  • loaded language handle

Raises:

  • if the library cannot be loaded

  • ArgumentError if the path fails security validation



# File 'lib/tree_haver/language.rb', line 48

def load(name, path, validate: true)
  from_library(path, symbol: "tree_sitter_#{name}", name: name, validate: validate)
end

.method_missing(method_name, *args, **kwargs, &block) ⇒ Language

Dynamic helper to load a registered language by name

After registering a language with TreeHaver.register_language, you can load it by calling a class method named after it, such as Language.toml. The grammar used depends on what was registered and on the currently active backend.

Examples:

With tree-sitter

TreeHaver.register_language(:toml, path: "/path/to/libtree-sitter-toml.so")
language = TreeHaver::Language.toml

With both backends

TreeHaver.register_language(:toml,
  path: "/path/to/libtree-sitter-toml.so", symbol: "tree_sitter_toml")
TreeHaver.register_language(:toml,
  grammar_module: TomlRB::Document)
language = TreeHaver::Language.toml  # Uses appropriate grammar for active backend
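When both kinds of grammar are registered, a failed native load can fall back to the pure-Ruby grammar. The sketch below shows that general try-then-fall-back shape with a hypothetical helper, `load_with_fallback`, that takes two callables; it is not TreeHaver's own fallback code, and the set of rescued exceptions is an assumption.

```ruby
# Hypothetical helper: try the native loader first, then a pure-Ruby one.
def load_with_fallback(native_loader, fallback_loader = nil)
  native_loader.call
rescue LoadError, ArgumentError => error
  # No fallback registered: surface the original failure unchanged.
  raise error if fallback_loader.nil?

  fallback_loader.call
end

failing_native = -> { raise LoadError, "libtree-sitter-toml.so not found" }
pure_ruby      = -> { :citrus_grammar }

p load_with_fallback(failing_native, pure_ruby)           # prints :citrus_grammar
p load_with_fallback(-> { :native_grammar }, pure_ruby)   # prints :native_grammar
```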

Parameters:

  • method_name

    the registered language name

  • args

    positional arguments

  • kwargs

    keyword arguments

Returns:

  • loaded language handle

Raises:

  • NoMethodError if the language name is not registered



# File 'lib/tree_haver/language.rb', line 149

def method_missing(method_name, *args, **kwargs, &block)
  # Resolve only if the language name was registered
  all_backends = TreeHaver.registered_language(method_name)
  return super unless all_backends

  # Check current backend
  current_backend = TreeHaver.backend_module

  # Determine which backend type to use
  backend_type = if current_backend == Backends::Citrus
    :citrus
  else
    :tree_sitter  # MRI, Rust, FFI, Java all use tree-sitter
  end

  # Get backend-specific registration
  reg = all_backends[backend_type]

  # If Citrus backend is active
  if backend_type == :citrus
    if reg && reg[:grammar_module]
      return Backends::Citrus::Language.new(reg[:grammar_module])
    end

    # Fall back to error if no Citrus grammar registered
    raise NotAvailable,
      "Citrus backend is active but no Citrus grammar registered for :#{method_name}. " \
        "Either register a Citrus grammar or use a tree-sitter backend. " \
        "Registered backends: #{all_backends.keys.inspect}"
  end

  # For tree-sitter backends, try to load from path
  # If that fails, fall back to Citrus if available
  if reg && reg[:path]
    path = kwargs[:path] || args.first || reg[:path]
    # Symbol priority: kwargs override > registration > derive from method_name
    symbol = if kwargs.key?(:symbol)
      kwargs[:symbol]
    elsif reg[:symbol]
      reg[:symbol]
    else
      "tree_sitter_#{method_name}"
    end
    # Name priority: kwargs override > derive from symbol (strip tree_sitter_ prefix)
    # Using symbol-derived name ensures ruby_tree_sitter gets the correct language name
    # e.g., "toml" not "toml_both" when symbol is "tree_sitter_toml"
    name = kwargs[:name] || symbol&.sub(/\Atree_sitter_/, "")

    begin
      return from_library(path, symbol: symbol, name: name)
    rescue NotAvailable, ArgumentError, LoadError => e
      # Tree-sitter failed to load - check for Citrus fallback
      handle_tree_sitter_load_failure(e, all_backends)
    rescue => e
      # Also catch FFI::NotFoundError if FFI is loaded (can't reference directly as FFI may not exist)
      if defined?(::FFI::NotFoundError) && e.is_a?(::FFI::NotFoundError)
        handle_tree_sitter_load_failure(e, all_backends)
      else
        raise
      end
    end
  end

  # No tree-sitter path registered - check for Citrus fallback
  # This enables auto-fallback when tree-sitter grammar is not installed
  # but a Citrus grammar (pure Ruby) is available
  citrus_reg = all_backends[:citrus]
  if citrus_reg && citrus_reg[:grammar_module]
    return Backends::Citrus::Language.new(citrus_reg[:grammar_module])
  end

  # No appropriate registration found
  raise ArgumentError,
    "No grammar registered for :#{method_name} compatible with #{backend_type} backend. " \
      "Registered backends: #{all_backends.keys.inspect}"
end
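The symbol and name priority rules in the method above can be isolated into a small function for illustration. `resolve_symbol_and_name` is a hypothetical helper written for this example; it copies only the precedence logic from the source and does no loading.

```ruby
# Hypothetical helper mirroring the precedence rules in the source above:
#   symbol: explicit override > registration > derived from the method name
#   name:   explicit override > symbol with the "tree_sitter_" prefix removed
def resolve_symbol_and_name(method_name, registration, overrides = {})
  symbol =
    if overrides.key?(:symbol)
      overrides[:symbol]            # an explicit nil override is respected
    elsif registration[:symbol]
      registration[:symbol]
    else
      "tree_sitter_#{method_name}"
    end

  name = overrides[:name] || symbol&.sub(/\Atree_sitter_/, "")
  [symbol, name]
end

p resolve_symbol_and_name(:toml_both, { symbol: "tree_sitter_toml" })
# prints ["tree_sitter_toml", "toml"]
p resolve_symbol_and_name(:toml, {})
# prints ["tree_sitter_toml", "toml"]
p resolve_symbol_and_name(:toml, {}, { symbol: nil })
# prints [nil, nil]
```

The first call shows why the name is derived from the symbol rather than from the method name: a language registered as :toml_both still reports "toml".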

.respond_to_missing?(method_name, include_private = false) ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns:

  • true if method_name is a registered language name

API:

  • private



# File 'lib/tree_haver/language.rb', line 227

def respond_to_missing?(method_name, include_private = false)
  !!TreeHaver.registered_language(method_name) || super
end
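Pairing method_missing with respond_to_missing? keeps `respond_to?` consistent with the dynamically available methods. A minimal, self-contained sketch of that pairing follows; `GrammarCatalog` is a hypothetical class with no connection to TreeHaver's internals, and it returns the stored options instead of loading anything.

```ruby
# Hypothetical registry class illustrating the method_missing /
# respond_to_missing? pairing on the class level.
class GrammarCatalog
  @registered = {}

  class << self
    def register(name, **options)
      @registered[name.to_sym] = options
    end

    # Handles GrammarCatalog.<registered_name>; anything else goes to super,
    # which raises NoMethodError as usual.
    def method_missing(method_name, *args, **kwargs, &block)
      options = @registered[method_name]
      return super unless options

      options.merge(kwargs)
    end

    # Keeps respond_to? truthful for the dynamic methods.
    def respond_to_missing?(method_name, include_private = false)
      @registered.key?(method_name) || super
    end
  end
end

GrammarCatalog.register(:toml, path: "/path/to/libtree-sitter-toml.so")
p GrammarCatalog.respond_to?(:toml) # prints true
p GrammarCatalog.respond_to?(:yaml) # prints false
p GrammarCatalog.toml
# prints {:path=>"/path/to/libtree-sitter-toml.so"} (Ruby 3.4+ prints {path: "..."})
```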