Module: TreeHaver::PathValidator

Defined in:
lib/tree_haver/path_validator.rb

Overview

Note:

These validations provide defense-in-depth but cannot guarantee safety. Loading shared libraries from untrusted sources is always risky.

Security utilities for validating paths and inputs before loading shared libraries.

Loading shared libraries (.so/.dylib/.dll) is inherently dangerous as it executes arbitrary native code. This module provides defense-in-depth validations to reduce the attack surface when paths come from potentially untrusted sources like environment variables or user input.

Examples:

Validate a path before loading

path = ENV["TREE_SITTER_TOML_PATH"]
if TreeHaver::PathValidator.safe_library_path?(path)
  language = TreeHaver::Language.from_library(path)
else
  raise "Unsafe path: #{path}"
end

Register custom trusted directories

# For Homebrew on Linux (linuxbrew)
TreeHaver::PathValidator.add_trusted_directory("/home/linuxbrew/.linuxbrew/Cellar")

# For luarocks-installed grammars
TreeHaver::PathValidator.add_trusted_directory("~/.local/share/mise/installs/lua")

# Or via environment variable (comma-separated)
# export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise"

Constant Summary

ALLOWED_EXTENSIONS =

Allowed shared library extensions by platform

%w[.so .dylib .dll].freeze
DEFAULT_TRUSTED_DIRECTORIES =

Default directories that are generally trusted for system libraries. The dynamic linker searches these anyway.

[
  "/usr/lib",
  "/usr/lib64",
  "/usr/lib/x86_64-linux-gnu",
  "/usr/lib/aarch64-linux-gnu",
  "/usr/local/lib",
  "/opt/homebrew/lib",
  "/opt/local/lib",
].freeze
TRUSTED_DIRS_ENV_VAR =

Environment variable for adding trusted directories (comma-separated)

"TREE_HAVER_TRUSTED_DIRS"
MAX_PATH_LENGTH =

Maximum reasonable path length (prevents DoS via extremely long paths)

4096
VALID_FILENAME_PATTERN =

Pattern for valid library filenames: alphanumerics, hyphens, underscores, and dots, starting with an alphanumeric character. This rejects shell metacharacters and other injection attempts.

/\A[a-zA-Z0-9][a-zA-Z0-9._-]*\z/
VALID_LANGUAGE_PATTERN =

Pattern for valid language names (lowercase alphanumeric and underscores)

/\A[a-z][a-z0-9_]*\z/
VALID_SYMBOL_PATTERN =

Pattern for valid symbol names (C identifier format)

/\A[a-zA-Z_][a-zA-Z0-9_]*\z/
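
As a quick illustration of what the three patterns accept and reject, the sketch below copies them into top-level constants so that it runs without the gem loaded. The sample strings are made up for this example.

```ruby
# Local copies of the patterns listed above, so the snippet runs on its own.
VALID_FILENAME_PATTERN = /\A[a-zA-Z0-9][a-zA-Z0-9._-]*\z/
VALID_LANGUAGE_PATTERN = /\A[a-z][a-z0-9_]*\z/
VALID_SYMBOL_PATTERN   = /\A[a-zA-Z_][a-zA-Z0-9_]*\z/

# Filenames: must start with a letter or digit.
"libtree-sitter-toml.so".match?(VALID_FILENAME_PATTERN) # => true
".hidden.so".match?(VALID_FILENAME_PATTERN)             # => false (leading dot)
"lib;rm.so".match?(VALID_FILENAME_PATTERN)              # => false (shell metacharacter)

# Language names: lowercase only, must start with a letter.
"c_sharp".match?(VALID_LANGUAGE_PATTERN) # => true
"TOML".match?(VALID_LANGUAGE_PATTERN)    # => false (uppercase)
"9lives".match?(VALID_LANGUAGE_PATTERN)  # => false (leading digit)

# Symbols: C identifiers, so a leading underscore is fine.
"_tree_sitter_toml".match?(VALID_SYMBOL_PATTERN) # => true
"tree-sitter-toml".match?(VALID_SYMBOL_PATTERN)  # => false (hyphen)
```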

Class Method Details

.add_trusted_directory(directory) ⇒ void

This method returns an undefined value.

Register a custom trusted directory

Use this to add directories where you install tree-sitter grammars, such as Homebrew locations, luarocks paths, or other package managers.

Examples:

Register linuxbrew directory

TreeHaver::PathValidator.add_trusted_directory("/home/linuxbrew/.linuxbrew/Cellar")

Register user’s luarocks directory

TreeHaver::PathValidator.add_trusted_directory("~/.local/share/mise/installs/lua")

Parameters:

  • directory (String)

    absolute path to trust (~ is expanded)

Raises:

  • (ArgumentError)

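    if the expanded directory does not start with "/" (File.expand_path is applied first, so relative paths are resolved against the current working directory rather than rejected)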



# File 'lib/tree_haver/path_validator.rb', line 106

def add_trusted_directory(directory)
  expanded = File.expand_path(directory)

  # :nocov:
  # File.expand_path always returns absolute paths on Unix/macOS.
  # This guard exists for defensive programming on exotic platforms
  # where expand_path might behave differently, but cannot be tested
  # in standard CI environments.
  unless expanded.start_with?("/")
    raise ArgumentError, "Trusted directory must be an absolute path: #{directory.inspect}"
  end
  # :nocov:

  @mutex.synchronize do
    @custom_trusted_directories << expanded unless @custom_trusted_directories.include?(expanded)
  end
  nil
end

.clear_custom_trusted_directories! ⇒ void

This method returns an undefined value.

Clear all custom trusted directories

Does not affect DEFAULT_TRUSTED_DIRECTORIES or ENV-based directories. Primarily useful for testing.



# File 'lib/tree_haver/path_validator.rb', line 141

def clear_custom_trusted_directories!
  @mutex.synchronize { @custom_trusted_directories.clear }
  nil
end

.custom_trusted_directories ⇒ Array<String>

Get the list of custom trusted directories (for debugging)

Returns:

  • (Array<String>)

    list of custom registered directories



# File 'lib/tree_haver/path_validator.rb', line 149

def custom_trusted_directories
  @mutex.synchronize { @custom_trusted_directories.dup }
end

.has_valid_extension?(path) ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Check if path has a valid library extension. Allows .so, .dylib, .dll, and versioned .so files such as .so.0 or .so.14.

Returns:

  • (Boolean)


# File 'lib/tree_haver/path_validator.rb', line 342

def has_valid_extension?(path)
  # Check for exact matches first (.so, .dylib, .dll)
  return true if ALLOWED_EXTENSIONS.any? { |ext| path.end_with?(ext) }

  # Check for versioned .so files (Linux convention)
  # e.g., libtree-sitter.so.0, libtree-sitter.so.14
  return true if path.match?(/\.so\.\d+\z/)

  false
end
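
The following standalone sketch reproduces the two checks above under a hypothetical name (`valid_extension?`) so their effect can be tried without the gem. It also shows a consequence of the versioned pattern: only a single numeric component after `.so` is matched.

```ruby
# Local copy of the constant, so the snippet runs on its own.
ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze

# Hypothetical standalone version of the check shown above.
def valid_extension?(path)
  ALLOWED_EXTENSIONS.any? { |ext| path.end_with?(ext) } ||
    path.match?(/\.so\.\d+\z/)
end

valid_extension?("/usr/lib/libtree-sitter.so")        # => true
valid_extension?("/opt/homebrew/lib/libtoml.dylib")   # => true
valid_extension?("/usr/lib/libtree-sitter.so.14")     # => true
valid_extension?("/usr/lib/libtree-sitter.so.0.25.0") # => false (more than one numeric component)
valid_extension?("/usr/lib/libtree-sitter.a")         # => false
```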

.in_trusted_directory?(path) ⇒ Boolean

Check if a path is within a trusted directory

Checks against DEFAULT_TRUSTED_DIRECTORIES, custom registered directories, and directories from TREE_HAVER_TRUSTED_DIRS environment variable.

Parameters:

  • path (String)

    the path to check

Returns:

  • (Boolean)

    true if the path is in a trusted directory



# File 'lib/tree_haver/path_validator.rb', line 208

def in_trusted_directory?(path)
  return false if path.nil?

  # Resolve the real path to handle symlinks
  check_path = resolve_check_path(path)
  return false if check_path.nil?

  trusted_directories.any? { |trusted| check_path.start_with?(trusted) }
end
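
One property of a plain string-prefix comparison is worth keeping in mind: it also accepts sibling directories whose names merely begin with a trusted entry (for example, /usr/libexec begins with /usr/lib). The sketch below contrasts that with a separator-aware comparison. `within_directory?` is a hypothetical helper written for this illustration and is not part of TreeHaver::PathValidator.

```ruby
# Hypothetical helper (not part of TreeHaver): true only when `path` is
# `dir` itself or lies underneath it, compared on a separator boundary.
def within_directory?(path, dir)
  base = dir.chomp("/")
  path == base || path.start_with?("#{base}/")
end

trusted = "/usr/lib"

# Plain prefix matching:
"/usr/lib/libtree-sitter-toml.so".start_with?(trusted) # => true
"/usr/libexec/evil.so".start_with?(trusted)            # => true (sibling directory)

# Separator-aware matching:
within_directory?("/usr/lib/libtree-sitter-toml.so", trusted) # => true
within_directory?("/usr/libexec/evil.so", trusted)            # => false
```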

.remove_trusted_directory(directory) ⇒ void

This method returns an undefined value.

Remove a custom trusted directory

Parameters:

  • directory (String)

    the directory to remove



# File 'lib/tree_haver/path_validator.rb', line 129

def remove_trusted_directory(directory)
  expanded = File.expand_path(directory)
  @mutex.synchronize { @custom_trusted_directories.delete(expanded) }
  nil
end

.resolve_check_path(path) ⇒ String?

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Resolve a path to its real path for trust checking

Parameters:

  • path (String)

    the path to resolve

Returns:

  • (String, nil)

    the resolved path or nil if unresolvable



# File 'lib/tree_haver/path_validator.rb', line 223

def resolve_check_path(path)
  File.realpath(path)
rescue Errno::ENOENT
  # File doesn't exist yet, check the directory
  dir = File.dirname(path)
  begin
    File.realpath(dir)
  rescue Errno::ENOENT
    nil
  end
end
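
To see the three outcomes of this fallback (symlink resolved, missing file resolved to its directory, missing directory giving nil), the sketch below runs a standalone copy of the logic against a temporary directory. The name `resolve_for_check` and the file names are hypothetical, and `File.symlink` is assumed to be available on the platform.

```ruby
require "tmpdir"

# Standalone copy of the fallback logic above (hypothetical name).
def resolve_for_check(path)
  File.realpath(path)
rescue Errno::ENOENT
  begin
    File.realpath(File.dirname(path))
  rescue Errno::ENOENT
    nil
  end
end

# Dir.mktmpdir returns the value of its block, so the results outlive the directory.
RESOLVED = Dir.mktmpdir do |dir|
  target = File.join(dir, "libdemo.so")
  File.write(target, "")
  link = File.join(dir, "link.so")
  File.symlink(target, link)

  {
    real_dir: File.realpath(dir),
    real_target: File.realpath(target),
    via_symlink: resolve_for_check(link),
    missing_file: resolve_for_check(File.join(dir, "missing.so")),
    missing_dir: resolve_for_check(File.join(dir, "no", "such", "dir", "x.so")),
  }
end

RESOLVED[:via_symlink] == RESOLVED[:real_target] # => true (symlink followed)
RESOLVED[:missing_file] == RESOLVED[:real_dir]   # => true (falls back to the directory)
RESOLVED[:missing_dir]                           # => nil
```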

.safe_backend_name?(backend) ⇒ Boolean

Validate a backend name

Parameters:

  • backend (String, Symbol, nil)

    the backend name

Returns:

  • (Boolean)

    true if it’s a valid backend name



# File 'lib/tree_haver/path_validator.rb', line 279

def safe_backend_name?(backend)
  return true if backend.nil? # nil means :auto

  %i[auto mri rust ffi java].include?(backend.to_s.to_sym)
end

.safe_language_name?(name) ⇒ Boolean

Validate a language name is safe

Language names are used to construct:

  • Environment variable names (TREE_SITTER_<LANG>_PATH)

  • Library filenames (libtree-sitter-<lang>.so)

  • Symbol names (tree_sitter_<lang>)

Examples:

PathValidator.safe_language_name?(:toml)  # => true
PathValidator.safe_language_name?("json") # => true
PathValidator.safe_language_name?("../../etc") # => false

Parameters:

  • name (String, Symbol, nil)

    the language name to validate

Returns:

  • (Boolean)

    true if the name is safe



# File 'lib/tree_haver/path_validator.rb', line 249

def safe_language_name?(name)
  return false if name.nil?

  name_str = name.to_s
  return false if name_str.empty?
  return false if name_str.length > 64 # Reasonable limit

  name_str.match?(VALID_LANGUAGE_PATTERN)
end
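
To make the three uses of a language name concrete, the sketch below derives each string from a name. `derived_names` is a hypothetical helper for illustration only. The formats follow the list in the description, and the `.so` extension is an assumption (other platforms use different extensions).

```ruby
# Hypothetical helper, not part of TreeHaver: shows the strings a language
# name ends up embedded in, following the formats listed in the description.
def derived_names(lang)
  name = lang.to_s
  {
    env_var: "TREE_SITTER_#{name.upcase}_PATH",
    library: "libtree-sitter-#{name}.so", # .so assumed; platform-dependent
    symbol: "tree_sitter_#{name}",
  }
end

names = derived_names(:toml)
names[:env_var] # => "TREE_SITTER_TOML_PATH"
names[:library] # => "libtree-sitter-toml.so"
names[:symbol]  # => "tree_sitter_toml"

# An unvalidated name would flow straight into a filename:
derived_names("../../etc")[:library] # => "libtree-sitter-../../etc.so"
```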

.safe_library_path?(path, require_trusted_dir: false) ⇒ Boolean

Validate a path is safe for loading as a shared library

Checks performed:

  • Path is not nil or empty

  • Path length is reasonable

  • Path is absolute (no relative path traversal)

  • Path contains no /../ or /./ segments and does not end with /.. or /.

  • Path has an allowed extension

  • Path does not contain null bytes

  • Filename portion matches safe pattern

Examples:

PathValidator.safe_library_path?("/usr/lib/libtree-sitter-toml.so")
# => true

PathValidator.safe_library_path?("../../../tmp/evil.so")
# => false

Parameters:

  • path (String, nil)

    the path to validate

  • require_trusted_dir (Boolean) (defaults to: false)

    if true, path must be in a trusted directory

Returns:

  • (Boolean)

    true if the path passes all safety checks



# File 'lib/tree_haver/path_validator.rb', line 173

def safe_library_path?(path, require_trusted_dir: false)
  return false if path.nil? || path.empty?
  return false if path.length > MAX_PATH_LENGTH
  return false if path.include?("\0") # Null byte injection

  # Must be absolute path (prevents relative path traversal)
  return false unless path.start_with?("/") || windows_absolute_path?(path)

  # Check for path traversal attempts
  return false if path.include?("/../") || path.end_with?("/..")
  return false if path.include?("/./") || path.end_with?("/.")

  # Validate extension
  # Allow versioned .so files like .so.0, .so.14, etc. (common on Linux)
  return false unless has_valid_extension?(path)

  # Validate filename portion
  filename = File.basename(path)
  return false unless filename.match?(VALID_FILENAME_PATTERN)

  # Optionally require the path to be in a trusted directory
  if require_trusted_dir
    return false unless in_trusted_directory?(path)
  end

  true
end
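
The traversal check matters because an absolute path can still leave its apparent directory once `..` segments are normalized. The lines below use only Ruby's standard `File.expand_path` to show this on a Unix-like system. The path is made up for illustration.

```ruby
# An absolute path that appears to live under /usr/lib ...
suspicious = "/usr/lib/../../tmp/evil.so"

# ... but normalizes to a completely different location.
File.expand_path(suspicious) # => "/tmp/evil.so"

# A substring test flags it without needing to normalize first.
suspicious.include?("/../") # => true
```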

.safe_symbol_name?(symbol) ⇒ Boolean

Validate a symbol name is safe for dlsym lookup

Examples:

PathValidator.safe_symbol_name?("tree_sitter_toml") # => true
PathValidator.safe_symbol_name?("evil; rm -rf /")   # => false

Parameters:

  • symbol (String, nil)

    the symbol name to validate

Returns:

  • (Boolean)

    true if the symbol name is safe



# File 'lib/tree_haver/path_validator.rb', line 267

def safe_symbol_name?(symbol)
  return false if symbol.nil?
  return false if symbol.empty?
  return false if symbol.length > 256 # Reasonable limit

  symbol.match?(VALID_SYMBOL_PATTERN)
end

.sanitize_language_name(name) ⇒ Symbol?

Sanitize a language name for safe use

Examples:

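PathValidator.sanitize_language_name("TOML")  # => :toml
PathValidator.sanitize_language_name("c++")   # => :c (disallowed characters are stripped)
PathValidator.sanitize_language_name("123")   # => nil (must start with a letter)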

Parameters:

  • name (String, Symbol)

    the language name

Returns:

  • (Symbol, nil)

    sanitized name or nil if invalid



# File 'lib/tree_haver/path_validator.rb', line 293

def sanitize_language_name(name)
  return if name.nil?

  sanitized = name.to_s.downcase.gsub(/[^a-z0-9_]/, "")
  return if sanitized.empty?
  return unless sanitized.match?(/\A[a-z]/)

  sanitized.to_sym
end

.trusted_directories ⇒ Array<String>

Get all trusted directories (default + custom + from ENV)

Returns:

  • (Array<String>)

    list of all trusted directory prefixes



# File 'lib/tree_haver/path_validator.rb', line 71

def trusted_directories
  dirs = DEFAULT_TRUSTED_DIRECTORIES.dup

  # Add custom registered directories
  @mutex.synchronize { dirs.concat(@custom_trusted_directories) }

  # Add directories from environment variable
  ENV[TRUSTED_DIRS_ENV_VAR]&.split(",")&.each do |dir|
    expanded = File.expand_path(dir.strip)
    # :nocov:
    # File.expand_path always returns absolute paths on Unix/macOS.
    # This guard exists for defensive programming on exotic platforms
    # where expand_path might behave differently, but cannot be tested
    # in standard CI environments.
    dirs << expanded if expanded.start_with?("/")
    # :nocov:
  end

  dirs.uniq
end
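
The environment-variable handling boils down to split, strip, expand, and de-duplicate. The sketch below mirrors those steps in a hypothetical standalone helper (`parse_trusted_dirs`); it omits the absolute-path guard and the default and custom lists, and assumes a Unix-like system.

```ruby
# Hypothetical standalone helper mirroring the split/strip/expand steps.
def parse_trusted_dirs(value)
  return [] if value.nil?

  value.split(",").map { |dir| File.expand_path(dir.strip) }.uniq
end

# Whitespace around entries is ignored and duplicates collapse.
parse_trusted_dirs("/opt/grammars, /srv/ts/lib ,/opt/grammars")
# => ["/opt/grammars", "/srv/ts/lib"]

# An unset variable contributes nothing.
parse_trusted_dirs(nil) # => []
```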

.validation_errors(path) ⇒ Array<String>

Get validation errors for a path (for debugging/error messages)

Parameters:

  • path (String, nil)

    the path to validate

Returns:

  • (Array<String>)

    list of validation errors (empty if valid)



# File 'lib/tree_haver/path_validator.rb', line 307

def validation_errors(path)
  errors = []

  if path.nil? || path.empty?
    errors << "Path is nil or empty"
    return errors
  end

  errors << "Path exceeds maximum length (#{MAX_PATH_LENGTH})" if path.length > MAX_PATH_LENGTH
  errors << "Path contains null byte" if path.include?("\0")
  errors << "Path is not absolute" unless path.start_with?("/") || windows_absolute_path?(path)
  errors << "Path contains traversal sequence (/../)" if path.include?("/../") || path.end_with?("/..")
  errors << "Path contains traversal sequence (/./)" if path.include?("/./") || path.end_with?("/.")

  unless has_valid_extension?(path)
    errors << "Path does not have allowed extension (.so, .so.X, .dylib, .dll)"
  end

  filename = File.basename(path)
  unless filename.match?(VALID_FILENAME_PATTERN)
    errors << "Filename contains invalid characters"
  end

  errors
end

.windows_absolute_path?(path) ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Check whether a path is a Windows-style absolute path, such as C:\path or D:/path.

Returns:

  • (Boolean)


# File 'lib/tree_haver/path_validator.rb', line 334

def windows_absolute_path?(path)
  # Match Windows absolute paths like C:\path or D:/path
  path.match?(/\A[A-Za-z]:[\\\/]/)
end
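
A few sample inputs show what the drive-letter pattern does and does not cover. The regular expression is copied from the method above into a constant with a hypothetical name, and the paths are made up for illustration.

```ruby
# Local copy of the pattern used above (the constant name is illustrative).
WINDOWS_ABSOLUTE = /\A[A-Za-z]:[\\\/]/

"C:\\grammars\\toml.dll".match?(WINDOWS_ABSOLUTE)      # => true
"D:/grammars/toml.dll".match?(WINDOWS_ABSOLUTE)        # => true
"C:toml.dll".match?(WINDOWS_ABSOLUTE)                  # => false (drive-relative)
"\\\\server\\share\\toml.dll".match?(WINDOWS_ABSOLUTE) # => false (UNC path)
```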