Class: TreeHaver::GrammarFinder
- Inherits: Object
- Defined in: lib/tree_haver/grammar_finder.rb
Overview
Generic utility for finding tree-sitter grammar shared libraries.
GrammarFinder provides platform-aware discovery of tree-sitter grammar libraries. Given a language name, it searches common installation paths and supports environment variable overrides.
This class is designed to be used by language-specific merge gems (toml-merge, json-merge, bash-merge, etc.) without requiring TreeHaver to have knowledge of each specific language.
Security Considerations
Loading shared libraries is inherently dangerous as it executes arbitrary native code. GrammarFinder performs the following security validations:
- Language names are validated to contain only safe characters
- Paths from environment variables are validated before use
- Path traversal attempts (../) are rejected
- Only files with expected extensions (.so, .dylib, .dll) are accepted
For additional security, use #find_library_path_safe, which ignores the environment variable and only returns paths from trusted system directories.
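The path checks listed above can be approximated as follows. This is a hypothetical sketch: the real PathValidator implementation is not shown on this page, and `plausibly_safe_library_path?` and `ALLOWED_EXTENSIONS` are illustrative names only.

```ruby
# Hypothetical approximation of the traversal and extension checks; not
# TreeHaver's actual PathValidator.
ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze

def plausibly_safe_library_path?(path)
  return false if path.nil? || path.empty?
  # Reject any ".." path component (path traversal).
  return false if path.split(%r{[/\\]}).include?("..")
  # Only accept the expected shared-library extensions.
  ALLOWED_EXTENSIONS.include?(File.extname(path))
end

plausibly_safe_library_path?("/usr/local/lib/libtree-sitter-toml.so") # => true
plausibly_safe_library_path?("/usr/lib/../../tmp/evil.so")            # => false
plausibly_safe_library_path?("/usr/lib/libtree-sitter-toml.txt")      # => false
```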
Constant Summary
- BASE_SEARCH_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib", "/opt/homebrew/lib"].freeze
  Common base directories where tree-sitter libraries are installed. The platform-specific library filename (including its extension) is appended to each directory automatically.
- TREE_SITTER_BACKENDS = [TreeHaver::Backends::MRI, TreeHaver::Backends::FFI, TreeHaver::Backends::Rust, TreeHaver::Backends::Java].freeze
  Backends that use tree-sitter and therefore require native runtime libraries. Other backends (Citrus, Prism, Psych, etc.) don't use tree-sitter.
Instance Attribute Summary
- #extra_paths ⇒ Array<String> (readonly): Additional search paths provided at initialization.
- #language_name ⇒ Symbol (readonly): The language identifier.
Class Method Summary
- .reset_runtime_check! ⇒ Object (private): Reset the cached tree-sitter runtime check (for testing).
- .tree_sitter_runtime_usable? ⇒ Boolean: Check if the tree-sitter runtime is usable.
Instance Method Summary
- #available? ⇒ Boolean: Check if the grammar library is available AND usable.
- #available_safe? ⇒ Boolean: Check if the grammar library is available in a trusted directory.
- #env_var_name ⇒ String: Get the environment variable name for this language.
- #find_library_path ⇒ String?: Find the grammar library path.
- #find_library_path_safe ⇒ String?: Find the grammar library path with strict security validation.
- #initialize(language_name, extra_paths: [], validate: true) ⇒ GrammarFinder (constructor): Initialize a grammar finder for a specific language.
- #library_filename ⇒ String: Get the library filename for the current platform.
- #not_found_message ⇒ String: Get a human-readable error message when the library is not found.
- #register!(raise_on_missing: false) ⇒ Boolean: Register this language with TreeHaver.
- #search_info ⇒ Hash: Get debug information about the search.
- #search_paths ⇒ Array<String>: Generate the full list of search paths for this language.
- #symbol_name ⇒ String: Get the expected symbol name exported by the grammar library.
- #validate_env_path(path) ⇒ String?: Validate an environment variable path and return the reason if invalid.
Constructor Details
#initialize(language_name, extra_paths: [], validate: true) ⇒ GrammarFinder
Initialize a grammar finder for a specific language
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 75
def initialize(language_name, extra_paths: [], validate: true)
  name_str = language_name.to_s.downcase
  if validate && !PathValidator.safe_language_name?(name_str)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}. " \
      "Language names must start with a letter and contain only lowercase letters, numbers, and underscores."
  end
  @language_name = name_str.to_sym
  @extra_paths = Array(extra_paths)
end
```
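The constructor downcases the name before validating it, so `"TOML"` is accepted and stored as `:toml`. The sketch below reproduces that normalization. The regular expression is inferred from the error message above and is an assumption; the real `PathValidator.safe_language_name?` may apply different rules, and `normalize_language_name` is a hypothetical helper.

```ruby
# Hypothetical stand-in for PathValidator.safe_language_name?, inferred from
# the constructor's error message: a letter first, then [a-z0-9_].
LANGUAGE_NAME_PATTERN = /\A[a-z][a-z0-9_]*\z/

def normalize_language_name(language_name, validate: true)
  # Downcase first, exactly as the constructor does, then validate.
  name_str = language_name.to_s.downcase
  if validate && !LANGUAGE_NAME_PATTERN.match?(name_str)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}"
  end
  name_str.to_sym
end

normalize_language_name("TOML") # => :toml
normalize_language_name(:json)  # => :json
# normalize_language_name("../etc") raises ArgumentError
```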
Instance Attribute Details
#extra_paths ⇒ Array<String> (readonly)
Returns additional search paths provided at initialization.
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 67
def extra_paths
  @extra_paths
end
```
#language_name ⇒ Symbol (readonly)
Returns the language identifier.
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 64
def language_name
  @language_name
end
```
Class Method Details
.reset_runtime_check! ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Reset the cached tree-sitter runtime check (for testing)
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 262
def reset_runtime_check!
  remove_instance_variable(:@tree_sitter_runtime_usable) if defined?(@tree_sitter_runtime_usable)
end
```
.tree_sitter_runtime_usable? ⇒ Boolean
Check if the tree-sitter runtime is usable
Tests whether we can actually create a tree-sitter parser. Result is cached since this is expensive and won’t change during runtime.
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 239
def tree_sitter_runtime_usable?
  return @tree_sitter_runtime_usable if defined?(@tree_sitter_runtime_usable)

  @tree_sitter_runtime_usable = begin
    # Try to create a parser using the current backend
    mod = TreeHaver.resolve_backend_module(nil)

    # Only tree-sitter backends are relevant here
    # Non-tree-sitter backends (Citrus, Prism, Psych, etc.) don't use grammar files
    return false if mod.nil?
    return false unless TREE_SITTER_BACKENDS.include?(mod)

    # Try to instantiate a parser - this will fail if runtime isn't available
    mod::Parser.new
    true
  rescue NoMethodError, FFI::NotFoundError, LoadError, NotAvailable => _e
    false
  end
end
```
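The caching pattern used by `tree_sitter_runtime_usable?` and `reset_runtime_check!` (memoize on a class-level instance variable, test it with `defined?`, clear it with `remove_instance_variable`) can be sketched in isolation. `RuntimeProbe`, `usable?`, `reset!` and `probe_calls` are hypothetical names, and the block stands in for the expensive parser-creation probe.

```ruby
# Hypothetical sketch of class-level memoization with a reset hook; these
# names are illustrative, not TreeHaver API.
class RuntimeProbe
  class << self
    # Counts how many times the expensive probe actually ran.
    attr_accessor :probe_calls

    def usable?
      return @usable if defined?(@usable)

      self.probe_calls = (probe_calls || 0) + 1
      @usable = begin
        # No explicit `return` in here: it would leave the method before the
        # assignment runs, so that result would not be cached.
        !!yield
      rescue LoadError, NotImplementedError
        false
      end
    end

    def reset!
      remove_instance_variable(:@usable) if defined?(@usable)
    end
  end
end

# RuntimeProbe.usable? { true }   # runs the probe, caches true
# RuntimeProbe.usable? { false }  # returns the cached true; block not called
```

One Ruby detail is relevant when reading the method above: an explicit `return` inside the `begin` block exits the method before `@tree_sitter_runtime_usable` is assigned. The early `false` results are therefore returned but, as far as this listing shows, not memoized.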
Instance Method Details
#available? ⇒ Boolean
Check if the grammar library is available AND usable
This checks:
- The grammar library file exists
- The tree-sitter runtime is functional (can create a parser)
This prevents registering grammars when tree-sitter isn’t actually usable, allowing clean fallback to alternative backends like Citrus.
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 214
def available?
  path = find_library_path
  return false if path.nil?

  # Check if tree-sitter runtime is actually functional
  # This is cached at the class level since it's the same for all grammars
  self.class.tree_sitter_runtime_usable?
end
```
#available_safe? ⇒ Boolean
Check if the grammar library is available in a trusted directory
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 271
def available_safe?
  !find_library_path_safe.nil?
end
```
#env_var_name ⇒ String
Get the environment variable name for this language
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 90
def env_var_name
  "TREE_SITTER_#{@language_name.to_s.upcase}_PATH"
end
```
#find_library_path ⇒ String?
Find the grammar library path
Searches in order:
- Environment variable override (validated for safety)
- Extra paths provided at initialization
- Common system installation paths
Note: Paths from ENV are validated (whitespace, PathValidator.safe_library_path?, file existence) to prevent path traversal and other attacks. An invalid ENV path is ignored and the search continues with the remaining paths.
Note: Setting the ENV variable to an empty string explicitly disables this grammar: the method returns nil without searching the other paths, even if the library is installed there. This allows fallback to alternative backends (e.g., Citrus).
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 145
def find_library_path
  # Check environment variable first (highest priority)
  # Use key? to distinguish between "not set" and "set to empty"
  if ENV.key?(env_var_name)
    env_path = ENV[env_var_name]

    # Empty string means "explicitly skip this grammar"
    # This allows users to disable tree-sitter for specific languages
    # and fall back to alternative backends like Citrus
    if env_path.empty?
      @env_rejection_reason = "explicitly disabled (set to empty string)"
      return
    end

    # Store why env path was rejected for better error messages
    @env_rejection_reason = validate_env_path(env_path)
    return env_path if @env_rejection_reason.nil?
  end

  # Search all paths (these are constructed from trusted base dirs)
  search_paths.find { |path| File.exist?(path) }
end
```
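The three ENV states handled above (unset, empty, set) lead to different outcomes. The sketch below reproduces that decision logic with the environment, candidate list, existence check and validator passed in, so it can run without touching the filesystem. `lookup_library`, `LookupResult` and all parameters are hypothetical names for illustration.

```ruby
# Hypothetical sketch of the lookup precedence in find_library_path.
LookupResult = Struct.new(:path, :env_rejection_reason)

def lookup_library(env_var, env:, candidates:, exists:, validate:)
  reason = nil
  if env.key?(env_var)
    value = env[env_var]
    # Empty string: grammar explicitly disabled, no fallback search at all.
    return LookupResult.new(nil, "explicitly disabled (set to empty string)") if value.empty?

    # A nil reason means the ENV path passed validation and wins outright.
    reason = validate.call(value)
    return LookupResult.new(value, nil) if reason.nil?
  end
  # Unset or rejected ENV value: first existing candidate path, if any.
  LookupResult.new(candidates.find { |p| exists.call(p) }, reason)
end
```

Keeping the rejection reason alongside the result mirrors why the method stores `@env_rejection_reason`: a later error message can explain that the override was present but unusable.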
#find_library_path_safe ⇒ String?
Find the grammar library path with strict security validation
This method only returns paths that are in trusted system directories. Use this when you want maximum security and don’t need to support custom installation locations.
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 197
def find_library_path_safe
  # Environment variable is NOT checked in safe mode - only trusted system paths
  search_paths.find do |path|
    File.exist?(path) && PathValidator.in_trusted_directory?(path)
  end
end
```
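A trusted-directory check of the kind `PathValidator.in_trusted_directory?` performs could look like the following. This is an assumption-laden sketch: the real trusted list and matching rules are not shown on this page. Here the list simply reuses the BASE_SEARCH_DIRS values, and `in_trusted_directory?` is a hypothetical top-level helper.

```ruby
# Hypothetical approximation; the trusted list reuses BASE_SEARCH_DIRS values.
TRUSTED_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib", "/opt/homebrew/lib"].freeze

def in_trusted_directory?(path, trusted = TRUSTED_DIRS)
  # Resolve ".." segments first so traversal cannot escape a trusted prefix.
  expanded = File.expand_path(path)
  # Compare against "dir/" so "/usr/lib" does not also match "/usr/libevil".
  trusted.any? { |dir| expanded.start_with?(File.join(dir, "")) }
end
```

Expanding the path before the prefix comparison matters: `/usr/lib/../../tmp/x.so` starts with `/usr/lib/` as a string but resolves to `/tmp/x.so`.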
#library_filename ⇒ String
Get the library filename for the current platform
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 104
def library_filename
  ext = platform_extension
  "libtree-sitter-#{@language_name}#{ext}"
end
```
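`platform_extension` is a private helper that is not documented on this page. A plausible version maps `RbConfig::CONFIG["host_os"]` to the usual shared-library suffix; the mapping below is an assumption, not TreeHaver's code, and the `host_os` parameter exists only so the sketch can be exercised for other platforms.

```ruby
require "rbconfig"

# Hypothetical guess at platform_extension; the real helper is not shown.
def platform_extension(host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /darwin/i then ".dylib"
  when /mswin|mingw|cygwin/i then ".dll"
  else ".so"
  end
end

def library_filename(language_name, host_os = RbConfig::CONFIG["host_os"])
  "libtree-sitter-#{language_name}#{platform_extension(host_os)}"
end

library_filename(:toml, "linux-gnu") # => "libtree-sitter-toml.so"
library_filename(:toml, "darwin23")  # => "libtree-sitter-toml.dylib"
```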
#not_found_message ⇒ String
Get a human-readable error message when library is not found
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 317
def not_found_message
  msg = "tree-sitter #{@language_name} grammar not found."

  # Check if env var is set but rejected
  env_value = ENV[env_var_name]
  msg += if env_value && @env_rejection_reason
    " #{env_var_name} is set to #{env_value.inspect} but #{@env_rejection_reason}."
  elsif env_value
    " #{env_var_name} is set but was not used (file may have been removed)."
  else
    " Searched: #{search_paths.join(", ")}."
  end

  msg + " Install tree-sitter-#{@language_name} or set #{env_var_name} to a valid path."
end
```
#register!(raise_on_missing: false) ⇒ Boolean
Register this language with TreeHaver
After registration, the language can be loaded via a dynamic method (e.g., `TreeHaver::Language.toml`).
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 283
def register!(raise_on_missing: false)
  path = find_library_path
  unless path
    if raise_on_missing
      raise NotAvailable, not_found_message
    end
    return false
  end

  TreeHaver.register_language(@language_name, path: path, symbol: symbol_name)
  true
end
```
#search_info ⇒ Hash
Get debug information about the search
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 299
def search_info
  found = find_library_path # This populates @env_rejection_reason

  {
    language: @language_name,
    env_var: env_var_name,
    env_value: ENV[env_var_name],
    env_rejection_reason: @env_rejection_reason,
    symbol: symbol_name,
    library_filename: library_filename,
    search_paths: search_paths,
    found_path: found,
    available: !found.nil?,
  }
end
```
#search_paths ⇒ Array<String>
Generate the full list of search paths for this language
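Order: extra_paths, then common system paths (BASE_SEARCH_DIRS). The ENV override is not part of this list; #find_library_path checks it before consulting these paths.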
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 114
def search_paths
  paths = []

  # Extra paths provided at initialization (searched after ENV)
  @extra_paths.each do |dir|
    paths << File.join(dir, library_filename)
  end

  # Common system paths with platform-appropriate extension
  BASE_SEARCH_DIRS.each do |dir|
    paths << File.join(dir, library_filename)
  end

  paths
end
```
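The candidate list is just the caller's directories followed by the fixed base directories, each joined with the same filename. A compact equivalent, with the filename passed in rather than computed, is sketched below; `candidate_paths` and `BASE_DIRS` are hypothetical names, with `BASE_DIRS` copying the BASE_SEARCH_DIRS values.

```ruby
# Hypothetical sketch; BASE_DIRS copies the BASE_SEARCH_DIRS values above.
BASE_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib", "/opt/homebrew/lib"].freeze

def candidate_paths(filename, extra_paths = [])
  # Caller-supplied directories come first, then the fixed system directories.
  (Array(extra_paths) + BASE_DIRS).map { |dir| File.join(dir, filename) }
end

candidate_paths("libtree-sitter-toml.so", ["/home/me/grammars"]).first
# => "/home/me/grammars/libtree-sitter-toml.so"
```

Because `find_library_path` takes the first existing entry, placing `extra_paths` first lets a gem ship or point at its own grammar build without it being shadowed by a system copy.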
#symbol_name ⇒ String
Get the expected symbol name exported by the grammar library
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 97
def symbol_name
  "tree_sitter_#{@language_name}"
end
```
#validate_env_path(path) ⇒ String?
Validate an environment variable path and return reason if invalid
```ruby
# File 'lib/tree_haver/grammar_finder.rb', line 170
def validate_env_path(path)
  # Check for leading/trailing whitespace
  if path != path.strip
    return "contains leading or trailing whitespace (use #{path.strip.inspect})"
  end

  # Check if path is safe
  unless PathValidator.safe_library_path?(path)
    return "failed security validation (may contain path traversal or suspicious characters)"
  end

  # Check if file exists
  unless File.exist?(path)
    return "file does not exist"
  end

  nil # Valid!
end
```
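`validate_env_path` follows a "first failing check wins" convention: each check returns a reason string and `nil` means the path is acceptable, which is what lets `not_found_message` report exactly why an override was ignored. The sketch below exercises that ordering against a real temporary file. `env_path_problem` is a hypothetical name, and the injected `safe` lambda stands in for `PathValidator.safe_library_path?`.

```ruby
require "tempfile"

# Hypothetical sketch of the reason-returning validation chain.
def env_path_problem(path, safe: ->(_p) { true })
  if path != path.strip
    return "contains leading or trailing whitespace (use #{path.strip.inspect})"
  end
  return "failed security validation" unless safe.call(path)
  return "file does not exist" unless File.exist?(path)

  nil # every check passed
end

# Usage against a temporary file that really exists:
Tempfile.create(["libtree-sitter-demo", ".so"]) do |f|
  p env_path_problem(f.path)       # no problem found
  p env_path_problem(" #{f.path}") # whitespace reason
end
```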