Class: TreeHaver::GrammarFinder

Inherits:
Object
Defined in:
lib/tree_haver/grammar_finder.rb

Overview

Generic utility for finding tree-sitter grammar shared libraries.

GrammarFinder provides platform-aware discovery of tree-sitter grammar libraries. Given a language name, it searches common installation paths and supports environment variable overrides.

This class is designed to be used by language-specific merge gems (toml-merge, json-merge, bash-merge, etc.) without requiring TreeHaver to have knowledge of each specific language.

Security Considerations

Loading shared libraries is inherently dangerous as it executes arbitrary native code. GrammarFinder performs the following security validations:

  • Language names are validated to contain only safe characters

  • Paths from environment variables are validated before use

  • Path traversal attempts (../) are rejected

  • Only files with expected extensions (.so, .dylib, .dll) are accepted

For additional security, use #find_library_path_safe, which ignores the ENV override and only returns paths from trusted system directories.
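The checks listed above can be approximated roughly as follows. This is a minimal sketch, not PathValidator's actual implementation: the method name `sketch_safe_library_path?` is hypothetical, and the real validator may apply additional or different rules.

```ruby
# Hypothetical approximation of the documented checks; PathValidator's
# real rules may be stricter or different.
ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze

def sketch_safe_library_path?(path)
  return false unless path.is_a?(String) && !path.empty?
  # Reject NUL bytes; they have no place in a file path.
  return false if path.include?("\0")
  # Reject any ".." segment (path traversal), with / or \ separators.
  return false if path.split(%r{[/\\]}).include?("..")
  # Only shared-library extensions are accepted.
  ALLOWED_EXTENSIONS.include?(File.extname(path))
end
```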

Examples:

Basic usage

finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path
# => "/usr/lib/libtree-sitter-toml.so" (actual path depends on platform and install location)

Check availability

finder = TreeHaver::GrammarFinder.new(:json)
if finder.available?
  language = TreeHaver::Language.load(finder.language_name, finder.find_library_path)
end

Register with TreeHaver

finder = TreeHaver::GrammarFinder.new(:bash)
finder.register! if finder.available?
# Now you can use: TreeHaver::Language.bash

With custom search paths

finder = TreeHaver::GrammarFinder.new(:toml, extra_paths: ["/opt/custom/lib"])

Secure mode (trusted directories only)

finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path_safe  # Only returns paths in trusted dirs


Constant Summary

BASE_SEARCH_DIRS =

Common base directories where tree-sitter libraries are installed. The platform-specific library filename (including its extension) is appended to each directory automatically.

[
  "/usr/lib",
  "/usr/lib64",
  "/usr/local/lib",
  "/opt/homebrew/lib",
].freeze
TREE_SITTER_BACKENDS =

Backends that use tree-sitter and therefore require native runtime libraries. Other backends (Citrus, Prism, Psych, etc.) don’t use tree-sitter.

[
  TreeHaver::Backends::MRI,
  TreeHaver::Backends::FFI,
  TreeHaver::Backends::Rust,
  TreeHaver::Backends::Java,
].freeze

Constructor Details

#initialize(language_name, extra_paths: [], validate: true) ⇒ GrammarFinder

Initialize a grammar finder for a specific language

Parameters:

  • language_name (Symbol, String)

    the tree-sitter language name (e.g., :toml, :json, :bash)

  • extra_paths (Array<String>) (defaults to: [])

    additional directories to search (after the ENV override, before the system paths)

  • validate (Boolean) (defaults to: true)

    if true, validates the language name

Raises:

  • (ArgumentError)

    if language_name is invalid and validate is true



# File 'lib/tree_haver/grammar_finder.rb', line 75

def initialize(language_name, extra_paths: [], validate: true)
  name_str = language_name.to_s.downcase

  if validate && !PathValidator.safe_language_name?(name_str)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}. " \
      "Language names must start with a letter and contain only lowercase letters, numbers, and underscores."
  end

  @language_name = name_str.to_sym
  @extra_paths = Array(extra_paths)
end
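A minimal sketch of the language-name rule, assuming PathValidator.safe_language_name? behaves as the error message above describes (a leading lowercase letter, then lowercase letters, digits, or underscores). The helper `sketch_normalize_language` and the regular expression are hypothetical; the real check may differ, for example by enforcing a length limit.

```ruby
# Hypothetical pattern inferred from the ArgumentError message: a leading
# lowercase letter, then lowercase letters, digits, or underscores.
LANGUAGE_NAME_PATTERN = /\A[a-z][a-z0-9_]*\z/

# Normalizes a language name the way #initialize does (to_s, downcase,
# to_sym), optionally validating it first.
def sketch_normalize_language(language_name, validate: true)
  name = language_name.to_s.downcase
  if validate && !LANGUAGE_NAME_PATTERN.match?(name)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}"
  end

  name.to_sym
end
```

Because the name is downcased before validation, `:TOML` is accepted and normalized to `:toml`.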

Instance Attribute Details

#extra_paths ⇒ Array<String> (readonly)

Returns the additional search directories provided at initialization.

Returns:

  • (Array<String>)

    additional search directories provided at initialization



# File 'lib/tree_haver/grammar_finder.rb', line 67

def extra_paths
  @extra_paths
end

#language_name ⇒ Symbol (readonly)

Returns the language identifier.

Returns:

  • (Symbol)

    the language identifier



# File 'lib/tree_haver/grammar_finder.rb', line 64

def language_name
  @language_name
end

Class Method Details

.reset_runtime_check! ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Reset the cached tree-sitter runtime check (for testing)



# File 'lib/tree_haver/grammar_finder.rb', line 262

def reset_runtime_check!
  remove_instance_variable(:@tree_sitter_runtime_usable) if defined?(@tree_sitter_runtime_usable)
end

.tree_sitter_runtime_usable? ⇒ Boolean

Check if the tree-sitter runtime is usable

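Tests whether a tree-sitter parser can actually be created with the current backend. Non-tree-sitter backends (Citrus, Prism, Psych, etc.) always yield false. The result is cached at the class level because the check is expensive and is not expected to change at runtime.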

Returns:

  • (Boolean)

    true if tree-sitter runtime is functional



# File 'lib/tree_haver/grammar_finder.rb', line 239

def tree_sitter_runtime_usable?
  return @tree_sitter_runtime_usable if defined?(@tree_sitter_runtime_usable)

  @tree_sitter_runtime_usable = begin
    # Try to create a parser using the current backend
    mod = TreeHaver.resolve_backend_module(nil)

    # Only tree-sitter backends are relevant here.
    # Non-tree-sitter backends (Citrus, Prism, Psych, etc.) don't use grammar files.
    # Evaluate to false (instead of returning early) so a negative result is cached too.
    if mod.nil? || !TREE_SITTER_BACKENDS.include?(mod)
      false
    else
      # Try to instantiate a parser - this will fail if the runtime isn't available
      mod::Parser.new
      true
    end
  rescue NoMethodError, FFI::NotFoundError, LoadError, NotAvailable => _e
    false
  end
end
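The caching contract described above (compute once, cache both true and false, clear via reset_runtime_check!) can be illustrated with a self-contained sketch. `RuntimeProbe` and its injected `probe` callable are hypothetical stand-ins for TreeHaver.resolve_backend_module and `mod::Parser.new`; only LoadError and NoMethodError are rescued here, to keep the example free of an FFI dependency.

```ruby
# Hypothetical sketch of a class-level cached runtime probe.
class RuntimeProbe
  @probe_count = 0

  class << self
    # Number of times the (expensive) probe actually ran.
    attr_reader :probe_count

    # `probe` returns a parser factory, or nil when no tree-sitter
    # backend is active. Every branch produces a value that is assigned,
    # so a false result is cached just like a true one.
    def usable?(probe)
      return @usable if defined?(@usable)

      @probe_count += 1
      @usable =
        begin
          factory = probe.call
          if factory.nil?
            false
          else
            factory.call # raises if the native runtime is missing
            true
          end
        rescue LoadError, NoMethodError
          false
        end
    end

    # Clears the cache (mirrors the role of reset_runtime_check!).
    def reset!
      remove_instance_variable(:@usable) if defined?(@usable)
    end
  end
end
```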

Instance Method Details

#available? ⇒ Boolean

Check if the grammar library is available AND usable

This checks:

  1. The grammar library file exists

  2. The tree-sitter runtime is functional (can create a parser)

This prevents registering grammars when tree-sitter isn’t actually usable, allowing clean fallback to alternative backends like Citrus.

Returns:

  • (Boolean)

    true if the library can be found AND tree-sitter runtime works



# File 'lib/tree_haver/grammar_finder.rb', line 214

def available?
  path = find_library_path
  return false if path.nil?

  # Check if tree-sitter runtime is actually functional
  # This is cached at the class level since it's the same for all grammars
  self.class.tree_sitter_runtime_usable?
end

#available_safe? ⇒ Boolean

Check if the grammar library is available in a trusted directory

Returns:

  • (Boolean)

    true if the library can be found in a trusted directory

# File 'lib/tree_haver/grammar_finder.rb', line 271

def available_safe?
  !find_library_path_safe.nil?
end

#env_var_name ⇒ String

Get the environment variable name for this language

Returns:

  • (String)

    the ENV var name (e.g., “TREE_SITTER_TOML_PATH”)



# File 'lib/tree_haver/grammar_finder.rb', line 90

def env_var_name
  "TREE_SITTER_#{@language_name.to_s.upcase}_PATH"
end

#find_library_path ⇒ String?

Note:

Paths from ENV are validated with PathValidator.safe_library_path? to prevent path traversal and other attacks. An invalid ENV path is ignored, and the search continues with extra_paths and the system paths.

Note:

Setting the ENV variable to an empty string explicitly disables this grammar: no other paths are searched and nil is returned, which allows fallback to alternative backends (e.g., Citrus).

Find the grammar library path

Searches in order:

  1. Environment variable override (validated for safety)

  2. Extra paths provided at initialization

  3. Common system installation paths

Returns:

  • (String, nil)

    the path to the library, or nil if not found

# File 'lib/tree_haver/grammar_finder.rb', line 145

def find_library_path
  # Check environment variable first (highest priority)
  # Use key? to distinguish between "not set" and "set to empty"
  if ENV.key?(env_var_name)
    env_path = ENV[env_var_name]

    # Empty string means "explicitly skip this grammar"
    # This allows users to disable tree-sitter for specific languages
    # and fall back to alternative backends like Citrus
    if env_path.empty?
      @env_rejection_reason = "explicitly disabled (set to empty string)"
      return
    end

    # Store why env path was rejected for better error messages
    @env_rejection_reason = validate_env_path(env_path)
    return env_path if @env_rejection_reason.nil?
  end

  # Search all paths (these are constructed from trusted base dirs)
  search_paths.find { |path| File.exist?(path) }
end
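A dependency-injected sketch of the lookup order above, assuming `env` stands in for ENV, `validator` for #validate_env_path, and `exists` for File.exist?. The method name `sketch_find_library_path` is hypothetical, and unlike the real method it does not record @env_rejection_reason.

```ruby
# Hypothetical sketch of find_library_path's lookup order.
#   env       - stands in for ENV (anything responding to key? and [])
#   validator - stands in for validate_env_path: returns a reason or nil
#   exists    - stands in for File.exist?
def sketch_find_library_path(env_var, candidates, validator:, env: ENV, exists: File.method(:exist?))
  if env.key?(env_var)
    value = env[env_var]
    # An empty string explicitly disables the grammar: stop searching.
    return nil if value.empty?
    # A valid override wins outright.
    return value if validator.call(value).nil?
    # An invalid override is ignored; fall through to the normal search.
  end

  # First existing candidate (extra_paths, then system paths), or nil.
  candidates.find { |path| exists.call(path) }
end
```

Using key? rather than a truthiness check is what lets an empty value mean "disabled" while an unset variable means "search normally".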

#find_library_path_safe ⇒ String?

Find the grammar library path with strict security validation

This method ignores the ENV override and only returns paths located in trusted system directories; extra_paths entries outside those directories are skipped. Use it when you want maximum security and don’t need to support custom installation locations.

Returns:

  • (String, nil)

    the path to the library, or nil if not found

See Also:

  • For the list of trusted directories


# File 'lib/tree_haver/grammar_finder.rb', line 197

def find_library_path_safe
  # Environment variable is NOT checked in safe mode - only trusted system paths
  search_paths.find do |path|
    File.exist?(path) && PathValidator.in_trusted_directory?(path)
  end
end
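One plausible way to implement a trusted-directory check is to expand the path and compare it against directory prefixes. This is a hypothetical sketch of what PathValidator.in_trusted_directory? might do; `TRUSTED_DIRS` reuses the BASE_SEARCH_DIRS values as an assumption, and the real list may differ.

```ruby
# Assumption: the trusted list mirrors BASE_SEARCH_DIRS.
TRUSTED_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib", "/opt/homebrew/lib"].freeze

# Hypothetical stand-in for PathValidator.in_trusted_directory?.
def sketch_in_trusted_directory?(path, trusted = TRUSTED_DIRS)
  # Normalize "." and ".." so traversal cannot escape a trusted prefix.
  expanded = File.expand_path(path)
  trusted.any? do |dir|
    # File.join(dir, "") appends a separator, so "/usr/lib" matches
    # "/usr/lib/..." but not "/usr/lib-evil/...".
    expanded.start_with?(File.join(File.expand_path(dir), ""))
  end
end
```

This sketch does not resolve symlinks. File.realpath could do so, but it raises if the path does not exist.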

#library_filename ⇒ String

Get the library filename for the current platform

Returns:

  • (String)

    the library filename (e.g., “libtree-sitter-toml.so”)



# File 'lib/tree_haver/grammar_finder.rb', line 104

def library_filename
  ext = platform_extension
  "libtree-sitter-#{@language_name}#{ext}"
end
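library_filename depends on a private platform_extension helper that is not shown on this page. The sketch below assumes the common conventions (.dylib on macOS, .dll on Windows, .so elsewhere); both helper names are hypothetical.

```ruby
require "rbconfig"

# Hypothetical stand-in for the private platform_extension helper; the
# OS-to-extension mapping is an assumption based on common conventions.
def sketch_platform_extension(host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /darwin/i then ".dylib"
  when /mswin|mingw|cygwin/i then ".dll"
  else ".so"
  end
end

# Mirrors library_filename: "libtree-sitter-<language><extension>".
def sketch_library_filename(language, host_os = RbConfig::CONFIG["host_os"])
  "libtree-sitter-#{language}#{sketch_platform_extension(host_os)}"
end
```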

#not_found_message ⇒ String

Get a human-readable error message when the library is not found

Returns:

  • (String)

    error message with installation hints



# File 'lib/tree_haver/grammar_finder.rb', line 317

def not_found_message
  msg = "tree-sitter #{@language_name} grammar not found."

  # Check if env var is set but rejected
  env_value = ENV[env_var_name]
  msg += if env_value && @env_rejection_reason
    " #{env_var_name} is set to #{env_value.inspect} but #{@env_rejection_reason}."
  elsif env_value
    " #{env_var_name} is set but was not used (file may have been removed)."
  else
    " Searched: #{search_paths.join(", ")}."
  end

  msg + " Install tree-sitter-#{@language_name} or set #{env_var_name} to a valid path."
end

#register!(raise_on_missing: false) ⇒ Boolean

Register this language with TreeHaver

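After registration, the language can be loaded via a dynamic method (e.g., `TreeHaver::Language.toml`). register! only checks that the library file exists; it does not check whether the tree-sitter runtime is usable, so call #available? first if you need that guarantee.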

Parameters:

  • raise_on_missing (Boolean) (defaults to: false)

    if true, raises when the library is not found

Returns:

  • (Boolean)

    true if registration succeeded

Raises:

  • (NotAvailable)

    if the library is not found and raise_on_missing is true



# File 'lib/tree_haver/grammar_finder.rb', line 283

def register!(raise_on_missing: false)
  path = find_library_path
  unless path
    if raise_on_missing
      raise NotAvailable, not_found_message
    end
    return false
  end

  TreeHaver.register_language(@language_name, path: path, symbol: symbol_name)
  true
end
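TreeHaver’s registration internals are not shown here. The hypothetical `SketchLanguage` module below only illustrates one way a registered name could become callable as a method, similar to `TreeHaver::Language.toml`; a real implementation would load the library at `path` and look up `symbol` rather than return the stored entry.

```ruby
# Hypothetical miniature registry; not TreeHaver's implementation.
module SketchLanguage
  @registry = {}

  class << self
    # Records where a grammar lives and which symbol it exports.
    def register(name, path:, symbol:)
      @registry[name.to_sym] = { path: path, symbol: symbol }
    end

    # Registered names resolve as methods; anything else raises
    # NoMethodError through the default method_missing.
    def method_missing(name, *args, &block)
      entry = @registry[name]
      return super unless entry

      entry
    end

    def respond_to_missing?(name, include_private = false)
      @registry.key?(name) || super
    end
  end
end
```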

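#search_info ⇒ Hash

Get debug information about the search

Returns:

  • (Hash)

    diagnostic information; the :available key only reflects whether a library file was found and, unlike #available?, does not check whether the tree-sitter runtime is usable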



# File 'lib/tree_haver/grammar_finder.rb', line 299

def search_info
  found = find_library_path # This populates @env_rejection_reason
  {
    language: @language_name,
    env_var: env_var_name,
    env_value: ENV[env_var_name],
    env_rejection_reason: @env_rejection_reason,
    symbol: symbol_name,
    library_filename: library_filename,
    search_paths: search_paths,
    found_path: found,
    available: !found.nil?,
  }
end

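#search_paths ⇒ Array<String>

Generate the full list of search paths for this language

Order: extra_paths first, then the common system paths (BASE_SEARCH_DIRS). The ENV override is not part of this list; #find_library_path checks it separately before consulting these paths.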

Returns:

  • (Array<String>)

    all paths to search



# File 'lib/tree_haver/grammar_finder.rb', line 114

def search_paths
  paths = []

  # Extra paths provided at initialization (searched after ENV)
  @extra_paths.each do |dir|
    paths << File.join(dir, library_filename)
  end

  # Common system paths with platform-appropriate extension
  BASE_SEARCH_DIRS.each do |dir|
    paths << File.join(dir, library_filename)
  end

  paths
end
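A compact, hypothetical restatement of search_paths that makes the ordering easy to test. `sketch_search_paths` is an illustrative name; the directory list copies the BASE_SEARCH_DIRS constant documented above.

```ruby
# Copy of the documented BASE_SEARCH_DIRS values.
SKETCH_BASE_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib", "/opt/homebrew/lib"].freeze

# Extra directories come first, then the base directories; each is
# joined with the platform-specific library filename. No ENV entry here.
def sketch_search_paths(filename, extra_paths = [])
  (Array(extra_paths) + SKETCH_BASE_DIRS).map { |dir| File.join(dir, filename) }
end
```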

#symbol_name ⇒ String

Get the expected symbol name exported by the grammar library

Returns:

  • (String)

    the symbol name (e.g., “tree_sitter_toml”)



# File 'lib/tree_haver/grammar_finder.rb', line 97

def symbol_name
  "tree_sitter_#{@language_name}"
end

#validate_env_path(path) ⇒ String?

Validate an environment variable path and return reason if invalid

Returns:

  • (String, nil)

    rejection reason or nil if valid



# File 'lib/tree_haver/grammar_finder.rb', line 170

def validate_env_path(path)
  # Check for leading/trailing whitespace
  if path != path.strip
    return "contains leading or trailing whitespace (use #{path.strip.inspect})"
  end

  # Check if path is safe
  unless PathValidator.safe_library_path?(path)
    return "failed security validation (may contain path traversal or suspicious characters)"
  end

  # Check if file exists
  unless File.exist?(path)
    return "file does not exist"
  end

  nil # Valid!
end
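A sketch of the same check order with the dependencies injected, so each rejection reason can be exercised without real files. `sketch_env_rejection_reason` is hypothetical, `safe` stands in for PathValidator.safe_library_path?, `exists` stands in for File.exist?, and the security message is shortened compared with the original.

```ruby
# Hypothetical sketch of validate_env_path's check order.
# Returns a rejection reason (String) or nil when the path is acceptable.
def sketch_env_rejection_reason(path, safe:, exists:)
  stripped = path.strip
  # 1. Whitespace first, so the hint can show the corrected value.
  return "contains leading or trailing whitespace (use #{stripped.inspect})" if path != stripped
  # 2. Security validation (traversal, suspicious characters, extension).
  return "failed security validation" unless safe.call(path)
  # 3. Existence last.
  return "file does not exist" unless exists.call(path)

  nil
end
```

Because whitespace is checked first, a stray space produces a precise hint instead of a generic security failure.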