Class: TreeHaver::Language

Inherits:
Object
Defined in:
lib/tree_haver.rb

Overview

Represents a tree-sitter language grammar

A Language object is an opaque handle to a TSLanguage* that defines the grammar rules for parsing a specific programming language.

Examples:

Load a language from a shared library

language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml"
)

Use a registered language

TreeHaver.register_language(:toml, path: "/path/to/libtree-sitter-toml.so")
language = TreeHaver::Language.toml

Class Method Details

.from_library(path, symbol: nil, name: nil, validate: true, backend: nil) ⇒ Language

Also known as: from_path

Load a language grammar from a shared library

The library must export a function that returns a pointer to a TSLanguage struct. By default, TreeHaver looks for a symbol named “tree_sitter_<name>”.

Security

By default, the path and any explicit symbol name are validated using PathValidator to prevent path traversal and other attacks. Set `validate: false` to skip validation (not recommended unless you’ve already validated the path).
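
The naming convention and the symbol check described above can be sketched in a few lines. This is a minimal illustration only: `default_symbol_for`, `c_identifier?`, and the `C_IDENTIFIER` pattern are hypothetical names, and the real PathValidator rules may be stricter or otherwise different.

```ruby
# Hypothetical stand-in for the symbol check; PathValidator's real rules may differ.
C_IDENTIFIER = /\A[A-Za-z_][A-Za-z0-9_]*\z/

# Conventional tree-sitter entry point name for a language ("tree_sitter_<name>").
def default_symbol_for(name)
  "tree_sitter_#{name}"
end

# True when the symbol is a String that looks like a plain C identifier.
def c_identifier?(symbol)
  symbol.is_a?(String) && C_IDENTIFIER.match?(symbol)
end

puts default_symbol_for("toml")               # prints "tree_sitter_toml"
puts c_identifier?("tree_sitter_toml")        # prints "true"
puts c_identifier?("tree_sitter_toml; rm")    # prints "false"
```

Anchoring the pattern with `\A` and `\z` (rather than `^` and `$`) matters here, because Ruby's line anchors would accept a valid identifier followed by a newline and extra text.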

Examples:

language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
  name: "toml"
)

With explicit backend

language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
  backend: :ffi
)

Parameters:

  • path (String)

    absolute path to the language shared library (.so/.dylib/.dll)

  • symbol (String, nil) (defaults to: nil)

    name of the exported function (defaults to auto-detection)

  • name (String, nil) (defaults to: nil)

    logical name for the language (used in caching)

  • validate (Boolean) (defaults to: true)

    if true, validates path and symbol for safety (default: true)

  • backend (Symbol, String, nil) (defaults to: nil)

    optional backend to use (overrides context/global)

Returns:

  • (Language)

Raises:

  • (NotAvailable)

    if the library cannot be loaded or the symbol is not found

  • (ArgumentError)

    if path or symbol fails security validation



# File 'lib/tree_haver.rb', line 794

def from_library(path, symbol: nil, name: nil, validate: true, backend: nil)
  if validate
    unless PathValidator.safe_library_path?(path)
      errors = PathValidator.validation_errors(path)
      raise ArgumentError, "Unsafe library path: #{path.inspect}. Errors: #{errors.join("; ")}"
    end

    if symbol && !PathValidator.safe_symbol_name?(symbol)
      raise ArgumentError, "Unsafe symbol name: #{symbol.inspect}. " \
        "Symbol names must be valid C identifiers."
    end
  end

  mod = TreeHaver.resolve_backend_module(backend)

  if mod.nil?
    if backend
      raise NotAvailable, "Requested backend #{backend.inspect} is not available"
    else
      raise NotAvailable, "No TreeHaver backend is available"
    end
  end

  # Backend must implement .from_library; fallback to .from_path for older impls
  # Include effective backend AND ENV vars in cache key since they affect loading
  effective_b = TreeHaver.resolve_effective_backend(backend)
  key = [effective_b, path, symbol, name, ENV["TREE_SITTER_LANG_SYMBOL"]]
  LanguageRegistry.fetch(key) do
    if mod::Language.respond_to?(:from_library)
      mod::Language.from_library(path, symbol: symbol, name: name)
    else
      mod::Language.from_path(path)
    end
  end
end
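
The listing above caches loaded languages under a key that includes the effective backend, so the same library loaded through two backends produces two cache entries. A minimal sketch of that memoization pattern follows. `SketchRegistry` is a hypothetical stand-in; the real `LanguageRegistry` is not shown in this page and may be implemented differently.

```ruby
# Hypothetical stand-in for LanguageRegistry; the real class may differ.
class SketchRegistry
  def initialize
    @mutex = Mutex.new
    @cache = {}
  end

  # Returns the cached value for key, running the block only on a cache miss.
  def fetch(key)
    @mutex.synchronize do
      @cache.fetch(key) { @cache[key] = yield }
    end
  end
end

registry = SketchRegistry.new
loads = 0
key = [:ffi, "/usr/local/lib/libtree-sitter-toml.so", "tree_sitter_toml", "toml"]

# Same key twice: the loader block runs only once.
2.times { registry.fetch(key) { loads += 1; :language_handle } }
puts loads # prints 1

# Same library, different backend: a separate cache entry.
registry.fetch([:mri, *key.drop(1)]) { loads += 1; :language_handle }
puts loads # prints 2
```

Arrays compare and hash by content in Ruby, which is why a freshly built array with the same elements hits the same cache entry.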

.load(name, path, validate: true) ⇒ Language

Load a language grammar from a shared library (ruby_tree_sitter compatibility)

This method provides API compatibility with ruby_tree_sitter, which uses `Language.load(name, path)`.

Examples:

language = TreeHaver::Language.load("toml", "/usr/local/lib/libtree-sitter-toml.so")

Parameters:

  • name (String)

    the language name (e.g., “toml”)

  • path (String)

    absolute path to the language shared library

  • validate (Boolean) (defaults to: true)

    if true, validates the path for safety (default: true)

Returns:

  • (Language)

Raises:

  • (NotAvailable)

    if the library cannot be loaded

  • (ArgumentError)

    if the path fails security validation



# File 'lib/tree_haver.rb', line 759

def load(name, path, validate: true)
  from_library(path, symbol: "tree_sitter_#{name}", name: name, validate: validate)
end

.method_missing(method_name, *args, **kwargs, &block) ⇒ Language

Dynamic helper to load a registered language by name

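After registering a language with TreeHaver.register_language, you can load it by calling a class method named after it. Which grammar is used depends on the registration and the active backend: the Citrus backend uses the registered grammar module, while tree-sitter backends load the registered library path and fall back to a registered Citrus grammar if loading fails.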

Examples:

With tree-sitter

TreeHaver.register_language(:toml, path: "/path/to/libtree-sitter-toml.so")
language = TreeHaver::Language.toml

With both backends

TreeHaver.register_language(:toml,
  path: "/path/to/libtree-sitter-toml.so", symbol: "tree_sitter_toml")
TreeHaver.register_language(:toml,
  grammar_module: TomlRB::Document)
language = TreeHaver::Language.toml  # Uses appropriate grammar for active backend

Parameters:

  • method_name (Symbol)

    the registered language name

  • args (Array)

    positional arguments

  • kwargs (Hash)

    keyword arguments

Returns:

  • (Language)

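Raises:

  • (NoMethodError)

    if the language name is not registered

  • (NotAvailable)

    if the Citrus backend is active but no Citrus grammar is registered, or if the library cannot be loaded and no Citrus fallback is registered

  • (ArgumentError)

    if no registered grammar is compatible with the active backend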



# File 'lib/tree_haver.rb', line 855

def method_missing(method_name, *args, **kwargs, &block)
  # Resolve only if the language name was registered
  all_backends = TreeHaver.registered_language(method_name)
  return super unless all_backends

  # Check current backend
  current_backend = TreeHaver.backend_module

  # Determine which backend type to use
  backend_type = if current_backend == Backends::Citrus
    :citrus
  else
    :tree_sitter  # MRI, Rust, FFI, Java all use tree-sitter
  end

  # Get backend-specific registration
  reg = all_backends[backend_type]

  # If Citrus backend is active
  if backend_type == :citrus
    if reg && reg[:grammar_module]
      return Backends::Citrus::Language.new(reg[:grammar_module])
    end

    # Fall back to error if no Citrus grammar registered
    raise NotAvailable,
      "Citrus backend is active but no Citrus grammar registered for :#{method_name}. " \
        "Either register a Citrus grammar or use a tree-sitter backend. " \
        "Registered backends: #{all_backends.keys.inspect}"
  end

  # For tree-sitter backends, try to load from path
  # If that fails, fall back to Citrus if available
  if reg && reg[:path]
    path = kwargs[:path] || args.first || reg[:path]
    # Symbol priority: kwargs override > registration > derive from method_name
    symbol = if kwargs.key?(:symbol)
      kwargs[:symbol]
    elsif reg[:symbol]
      reg[:symbol]
    else
      "tree_sitter_#{method_name}"
    end
    # Name priority: kwargs override > derive from symbol (strip tree_sitter_ prefix)
    # Using symbol-derived name ensures ruby_tree_sitter gets the correct language name
    # e.g., "toml" not "toml_both" when symbol is "tree_sitter_toml"
    name = kwargs[:name] || symbol&.sub(/\Atree_sitter_/, "")

    begin
      return from_library(path, symbol: symbol, name: name)
    rescue NotAvailable, ArgumentError, LoadError, FFI::NotFoundError => _e
      # Tree-sitter failed to load - check for Citrus fallback
      # This handles cases where:
      # - The .so file doesn't exist or can't be loaded (NotAvailable, LoadError)
      # - FFI can't find required symbols like ts_parser_new (FFI::NotFoundError)
      # - Invalid arguments were provided (ArgumentError)
      citrus_reg = all_backends[:citrus]
      if citrus_reg && citrus_reg[:grammar_module]
        return Backends::Citrus::Language.new(citrus_reg[:grammar_module])
      end
      # No Citrus fallback available, re-raise the original error
      raise
    end
  end

  # No tree-sitter path registered - check for Citrus fallback
  # This enables auto-fallback when tree-sitter grammar is not installed
  # but a Citrus grammar (pure Ruby) is available
  citrus_reg = all_backends[:citrus]
  if citrus_reg && citrus_reg[:grammar_module]
    return Backends::Citrus::Language.new(citrus_reg[:grammar_module])
  end

  # No appropriate registration found
  raise ArgumentError,
    "No grammar registered for :#{method_name} compatible with #{backend_type} backend. " \
      "Registered backends: #{all_backends.keys.inspect}"
end
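
The symbol and name priority rules in the listing above can be isolated into a small pure function, which makes them easier to reason about. `resolve_symbol_and_name` is a hypothetical helper written for illustration; it is not part of TreeHaver.

```ruby
# Hypothetical helper mirroring the priority rules in the listing above.
# Symbol: kwargs override > registration > derived from method_name.
# Name:   kwargs override > symbol with the "tree_sitter_" prefix stripped.
def resolve_symbol_and_name(method_name, reg, kwargs = {})
  symbol =
    if kwargs.key?(:symbol)
      kwargs[:symbol]
    elsif reg[:symbol]
      reg[:symbol]
    else
      "tree_sitter_#{method_name}"
    end
  name = kwargs[:name] || symbol&.sub(/\Atree_sitter_/, "")
  [symbol, name]
end

# No registration symbol: derive both from the method name.
p resolve_symbol_and_name(:toml, {})
# prints ["tree_sitter_toml", "toml"]

# Registered symbol wins over the method name, so the name is "toml", not "toml_both".
p resolve_symbol_and_name(:toml_both, { symbol: "tree_sitter_toml" })
# prints ["tree_sitter_toml", "toml"]

# Explicit keyword arguments win over the registration.
p resolve_symbol_and_name(:toml, { symbol: "tree_sitter_toml" }, { symbol: "ts_custom", name: "custom" })
# prints ["ts_custom", "custom"]

# An explicit symbol: nil is honored because the check is kwargs.key?, not truthiness.
p resolve_symbol_and_name(:toml, {}, { symbol: nil })
# prints [nil, nil]
```

The last case shows why the listing uses `kwargs.key?(:symbol)`: a caller can pass `symbol: nil` to request auto-detection even when the registration carries a symbol.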

.respond_to_missing?(method_name, include_private = false) ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns:

  • (Boolean)


# File 'lib/tree_haver.rb', line 935

def respond_to_missing?(method_name, include_private = false)
  !!TreeHaver.registered_language(method_name) || super
end
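
`respond_to_missing?` exists so that `respond_to?` agrees with what `method_missing` will actually handle. The pairing can be sketched without TreeHaver installed. `SketchLanguages` and its `register` method are hypothetical stand-ins, and the lookup returns the registration hash instead of loading a grammar.

```ruby
# Hypothetical stand-in; names here are illustrative, not TreeHaver's API.
module SketchLanguages
  REGISTRY = {}

  class << self
    def register(name, path:)
      REGISTRY[name.to_sym] = { path: path }
    end

    # Handle calls named after a registered language; defer everything else.
    def method_missing(method_name, *args, **kwargs, &block)
      reg = REGISTRY[method_name]
      return super unless reg
      reg
    end

    # Keep respond_to? consistent with method_missing.
    def respond_to_missing?(method_name, include_private = false)
      REGISTRY.key?(method_name) || super
    end
  end
end

SketchLanguages.register(:toml, path: "/path/to/libtree-sitter-toml.so")
puts SketchLanguages.respond_to?(:toml)   # prints "true"
puts SketchLanguages.respond_to?(:yaml)   # prints "false"
puts SketchLanguages.toml[:path]          # prints "/path/to/libtree-sitter-toml.so"

begin
  SketchLanguages.yaml
rescue NoMethodError => e
  puts e.class                            # prints "NoMethodError"
end
```

Calling `super` for unregistered names preserves Ruby's normal NoMethodError, which matches the documented behavior when a language name is not registered.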