Module: TreeHaver::PathValidator

Defined in:
lib/tree_haver/path_validator.rb

Overview

Note:

These validations provide defense-in-depth but cannot guarantee safety. Loading shared libraries from untrusted sources is always risky.

Security utilities for validating paths and inputs before loading shared libraries.

Loading shared libraries (.so/.dylib/.dll) is inherently dangerous as it executes arbitrary native code. This module provides defense-in-depth validations to reduce the attack surface when paths come from potentially untrusted sources like environment variables or user input.

Examples:

Validate a path before loading

path = ENV["TREE_SITTER_TOML_PATH"]
if TreeHaver::PathValidator.safe_library_path?(path)
  language = TreeHaver::Language.from_library(path)
else
  raise "Unsafe path: #{path}"
end

Register custom trusted directories

# For Homebrew on Linux (linuxbrew)
TreeHaver::PathValidator.add_trusted_directory("/home/linuxbrew/.linuxbrew/Cellar")

# For luarocks-installed grammars
TreeHaver::PathValidator.add_trusted_directory("~/.local/share/mise/installs/lua")

# Or via environment variable (comma-separated)
# export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise"

Constant Summary

ALLOWED_EXTENSIONS =

Allowed shared library extensions by platform

%w[.so .dylib .dll].freeze
DEFAULT_TRUSTED_DIRECTORIES =

Default directories that are generally trusted for system libraries. These are searched by the dynamic linker anyway.

[
  "/usr/lib",
  "/usr/lib64",
  "/usr/lib/x86_64-linux-gnu",
  "/usr/lib/aarch64-linux-gnu",
  "/usr/local/lib",
  "/opt/homebrew/lib",
  "/opt/local/lib",
].freeze
TRUSTED_DIRS_ENV_VAR =

Environment variable for adding trusted directories (comma-separated)

"TREE_HAVER_TRUSTED_DIRS"
MAX_PATH_LENGTH =

Maximum reasonable path length (prevents DoS via extremely long paths)

4096
VALID_FILENAME_PATTERN =

Pattern for valid library filenames (alphanumeric, hyphens, underscores, dots). This prevents shell metacharacters and other injection attempts.

/\A[a-zA-Z0-9][a-zA-Z0-9._-]*\z/
VALID_LANGUAGE_PATTERN =

Pattern for valid language names (lowercase alphanumeric and underscores)

/\A[a-z][a-z0-9_]*\z/
VALID_SYMBOL_PATTERN =

Pattern for valid symbol names (C identifier format)

/\A[a-zA-Z_][a-zA-Z0-9_]*\z/
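
As a quick illustration of what the three patterns accept, the sketch below copies the regular expressions from the listing above into a throwaway module and probes a few inputs. The module name PatternDemo is hypothetical and not part of TreeHaver.

```ruby
# Hypothetical demo module; the regexes are copied from the constants above.
module PatternDemo
  VALID_FILENAME_PATTERN = /\A[a-zA-Z0-9][a-zA-Z0-9._-]*\z/
  VALID_LANGUAGE_PATTERN = /\A[a-z][a-z0-9_]*\z/
  VALID_SYMBOL_PATTERN   = /\A[a-zA-Z_][a-zA-Z0-9_]*\z/
end

# Filenames: must begin with a letter or digit, so dotfiles are rejected.
p "libtree-sitter-toml.so".match?(PatternDemo::VALID_FILENAME_PATTERN) # => true
p ".hidden.so".match?(PatternDemo::VALID_FILENAME_PATTERN)             # => false
p "lib;rm.so".match?(PatternDemo::VALID_FILENAME_PATTERN)              # => false

# Language names: lowercase only, must begin with a letter.
p "c_sharp".match?(PatternDemo::VALID_LANGUAGE_PATTERN) # => true
p "TOML".match?(PatternDemo::VALID_LANGUAGE_PATTERN)    # => false

# Symbols: C identifiers, so a leading underscore is fine but hyphens are not.
p "_tree_sitter_toml".match?(PatternDemo::VALID_SYMBOL_PATTERN) # => true
p "tree-sitter-toml".match?(PatternDemo::VALID_SYMBOL_PATTERN)  # => false

# \A and \z anchor the whole string, so an embedded newline cannot smuggle
# extra content past the check the way ^ and $ would allow.
p "toml\nevil".match?(PatternDemo::VALID_LANGUAGE_PATTERN) # => false
```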

Class Method Details

.add_trusted_directory(directory) ⇒ void

This method returns an undefined value.

Register a custom trusted directory

Use this to add directories where you install tree-sitter grammars, such as Homebrew locations, luarocks paths, or other package managers.

Examples:

Register linuxbrew directory

TreeHaver::PathValidator.add_trusted_directory("/home/linuxbrew/.linuxbrew/Cellar")

Register user’s luarocks directory

TreeHaver::PathValidator.add_trusted_directory("~/.local/share/mise/installs/lua")

Raises:

  • (ArgumentError)

    if directory is not an absolute path



# File 'lib/tree_haver/path_validator.rb', line 106

def add_trusted_directory(directory)
  expanded = File.expand_path(directory)

  # :nocov:
  # File.expand_path always returns absolute paths on Unix/macOS.
  # This guard exists for defensive programming on exotic platforms
  # where expand_path might behave differently, but cannot be tested
  # in standard CI environments.
  unless expanded.start_with?("/")
    raise ArgumentError, "Trusted directory must be an absolute path: #{directory.inspect}"
  end
  # :nocov:

  @mutex.synchronize do
    @custom_trusted_directories << expanded unless @custom_trusted_directories.include?(expanded)
  end
  nil
end
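
One consequence of the listing above is worth seeing in isolation: because the argument goes through File.expand_path before the absolute-path guard, a relative directory appears to be resolved against the current working directory rather than rejected. The snippet below is a minimal sketch of that standard-library behaviour only, assuming a Unix-like system. It does not call TreeHaver, and the directory name is made up.

```ruby
# Illustrative relative directory; the name is hypothetical.
relative = "vendor/grammars"

# File.expand_path resolves relative input against the working directory.
expanded = File.expand_path(relative)

puts expanded.start_with?("/")                 # true on Unix-like systems
puts expanded == File.join(Dir.pwd, relative)  # true
```

If relative input should be refused outright, the check would need to look at the original argument instead of the expanded one.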

.clear_custom_trusted_directories! ⇒ void

This method returns an undefined value.

Clear all custom trusted directories

Does not affect DEFAULT_TRUSTED_DIRECTORIES or ENV-based directories. Primarily useful for testing.



# File 'lib/tree_haver/path_validator.rb', line 141

def clear_custom_trusted_directories!
  @mutex.synchronize { @custom_trusted_directories.clear }
  nil
end

.custom_trusted_directories ⇒ Array<String>

Get the list of custom trusted directories (for debugging)



# File 'lib/tree_haver/path_validator.rb', line 149

def custom_trusted_directories
  @mutex.synchronize { @custom_trusted_directories.dup }
end

.has_valid_extension?(path) ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Check if path has a valid library extension. Allows: .so, .dylib, .dll, and versioned .so files like .so.0, .so.14.



# File 'lib/tree_haver/path_validator.rb', line 342

def has_valid_extension?(path)
  # Check for exact matches first (.so, .dylib, .dll)
  return true if ALLOWED_EXTENSIONS.any? { |ext| path.end_with?(ext) }

  # Check for versioned .so files (Linux convention)
  # e.g., libtree-sitter.so.0, libtree-sitter.so.14
  return true if path.match?(/\.so\.\d+\z/)

  false
end
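
To make the accepted suffixes concrete, the sketch below copies the method body from the listing above into a hypothetical ExtensionDemo module (not part of TreeHaver) and runs it on a few paths. Note that, as written, only a single numeric component after .so is accepted, and the comparison is case-sensitive. Whether either of those is intentional is not stated in the source.

```ruby
# Hypothetical demo module; the logic is copied from the listing above.
module ExtensionDemo
  ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze

  def self.has_valid_extension?(path)
    return true if ALLOWED_EXTENSIONS.any? { |ext| path.end_with?(ext) }
    return true if path.match?(/\.so\.\d+\z/)

    false
  end
end

p ExtensionDemo.has_valid_extension?("/usr/lib/libtree-sitter-toml.so")   # => true
p ExtensionDemo.has_valid_extension?("/usr/lib/libtree-sitter.so.14")     # => true
p ExtensionDemo.has_valid_extension?("/opt/homebrew/lib/libfoo.dylib")    # => true

# Multi-component versions and unexpected suffixes are rejected.
p ExtensionDemo.has_valid_extension?("/usr/lib/libtree-sitter.so.0.22.6") # => false
p ExtensionDemo.has_valid_extension?("/usr/lib/libfoo.so.bak")            # => false

# end_with? is case-sensitive, so an upper-case extension does not pass.
p ExtensionDemo.has_valid_extension?("/usr/lib/libfoo.SO")                # => false
```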

.in_trusted_directory?(path) ⇒ Boolean

Check if a path is within a trusted directory

Checks against DEFAULT_TRUSTED_DIRECTORIES, custom registered directories, and directories from TREE_HAVER_TRUSTED_DIRS environment variable.



# File 'lib/tree_haver/path_validator.rb', line 208

def in_trusted_directory?(path)
  return false if path.nil?

  # Resolve the real path to handle symlinks
  check_path = resolve_check_path(path)
  return false if check_path.nil?

  trusted_directories.any? { |trusted| check_path.start_with?(trusted) }
end
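
The final comparison in the listing above is a plain string prefix test, so a resolved path in a sibling directory that merely shares a prefix with a trusted one (for example /usr/libexec versus /usr/lib) would also pass. The sketch below contrasts that with a separator-aware comparison. The helper within_directory? is hypothetical and not part of TreeHaver; it is shown only as one possible way to tighten the check.

```ruby
# Hypothetical helper, not part of TreeHaver: separator-aware containment.
def within_directory?(path, directory)
  dir = directory.chomp("/")
  path == dir || path.start_with?("#{dir}/")
end

trusted = "/usr/lib"
inside  = "/usr/lib/libtree-sitter-toml.so"
sibling = "/usr/libexec/evil.so"

# Plain prefix matching, as in the listing above, accepts both paths.
puts inside.start_with?(trusted)   # true
puts sibling.start_with?(trusted)  # true

# The separator-aware variant accepts only the path under /usr/lib.
puts within_directory?(inside, trusted)   # true
puts within_directory?(sibling, trusted)  # false
```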

.remove_trusted_directory(directory) ⇒ void

This method returns an undefined value.

Remove a custom trusted directory



# File 'lib/tree_haver/path_validator.rb', line 129

def remove_trusted_directory(directory)
  expanded = File.expand_path(directory)
  @mutex.synchronize { @custom_trusted_directories.delete(expanded) }
  nil
end

.resolve_check_path(path) ⇒ String?

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Resolve a path to its real path for trust checking



# File 'lib/tree_haver/path_validator.rb', line 223

def resolve_check_path(path)
  File.realpath(path)
rescue Errno::ENOENT
  # File doesn't exist yet, check the directory
  dir = File.dirname(path)
  begin
    File.realpath(dir)
  rescue Errno::ENOENT
    nil
  end
end
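
The fallback behaviour can be exercised against a temporary directory. The sketch below copies the method from the listing above into a hypothetical ResolveDemo module (not part of TreeHaver) and checks the three cases: an existing file, a missing file in an existing directory, and a missing directory. Only Errno::ENOENT is rescued, so other failures such as permission errors would presumably propagate to the caller.

```ruby
require "tmpdir"

# Hypothetical demo module; resolve_check_path is copied from the listing above.
module ResolveDemo
  def self.resolve_check_path(path)
    File.realpath(path)
  rescue Errno::ENOENT
    begin
      File.realpath(File.dirname(path))
    rescue Errno::ENOENT
      nil
    end
  end

  # Runs the three cases in a scratch directory and reports each as a boolean.
  def self.demo
    Dir.mktmpdir do |dir|
      real_dir = File.realpath(dir)
      existing = File.join(dir, "libdemo.so")
      File.write(existing, "")

      {
        existing_file: resolve_check_path(existing) == File.join(real_dir, "libdemo.so"),
        missing_file: resolve_check_path(File.join(dir, "libmissing.so")) == real_dir,
        missing_dir: resolve_check_path(File.join(dir, "nope", "libmissing.so")).nil?,
      }
    end
  end
end

p ResolveDemo.demo
```

For a file that does not exist yet, the trust check is therefore made against the real path of its parent directory.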

.safe_backend_name?(backend) ⇒ Boolean

Validate a backend name



# File 'lib/tree_haver/path_validator.rb', line 279

def safe_backend_name?(backend)
  return true if backend.nil? # nil means :auto

  %i[auto mri rust ffi java].include?(backend.to_s.to_sym)
end
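
A short sketch of how the check behaves for different input types, with the method copied from the listing above into a hypothetical BackendDemo module (not part of TreeHaver). Strings and symbols are treated alike because the value is normalised with to_s.to_sym before the lookup.

```ruby
# Hypothetical demo module; the method body is copied from the listing above.
module BackendDemo
  def self.safe_backend_name?(backend)
    return true if backend.nil? # nil means :auto

    %i[auto mri rust ffi java].include?(backend.to_s.to_sym)
  end
end

p BackendDemo.safe_backend_name?(nil)      # => true
p BackendDemo.safe_backend_name?(:ffi)     # => true
p BackendDemo.safe_backend_name?("rust")   # => true
p BackendDemo.safe_backend_name?("python") # => false
p BackendDemo.safe_backend_name?("FFI")    # => false (the lookup is case-sensitive)
```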

.safe_language_name?(name) ⇒ Boolean

Validate a language name is safe

Language names are used to construct:

  • Environment variable names (TREE_SITTER_<LANG>_PATH)

  • Library filenames (libtree-sitter-<lang>.so)

  • Symbol names (tree_sitter_<lang>)

Examples:

PathValidator.safe_language_name?(:toml)  # => true
PathValidator.safe_language_name?("json") # => true
PathValidator.safe_language_name?("../../etc") # => false


# File 'lib/tree_haver/path_validator.rb', line 249

def safe_language_name?(name)
  return false if name.nil?

  name_str = name.to_s
  return false if name_str.empty?
  return false if name_str.length > 64 # Reasonable limit

  name_str.match?(VALID_LANGUAGE_PATTERN)
end

.safe_library_path?(path, require_trusted_dir: false) ⇒ Boolean

Validate a path is safe for loading as a shared library

Checks performed:

  • Path is not nil or empty

  • Path length is reasonable

  • Path is absolute (no relative path traversal)

  • Path has an allowed extension

  • Path does not contain null bytes

  • Filename portion matches safe pattern

  • Path contains no /../ or /./ segments and does not end in /.. or /.

  • With require_trusted_dir: true, path resolves to a location inside a trusted directory

Examples:

PathValidator.safe_library_path?("/usr/lib/libtree-sitter-toml.so")
# => true

PathValidator.safe_library_path?("../../../tmp/evil.so")
# => false


# File 'lib/tree_haver/path_validator.rb', line 173

def safe_library_path?(path, require_trusted_dir: false)
  return false if path.nil? || path.empty?
  return false if path.length > MAX_PATH_LENGTH
  return false if path.include?("\0") # Null byte injection

  # Must be absolute path (prevents relative path traversal)
  return false unless path.start_with?("/") || windows_absolute_path?(path)

  # Check for path traversal attempts
  return false if path.include?("/../") || path.end_with?("/..")
  return false if path.include?("/./") || path.end_with?("/.")

  # Validate extension
  # Allow versioned .so files like .so.0, .so.14, etc. (common on Linux)
  return false unless has_valid_extension?(path)

  # Validate filename portion
  filename = File.basename(path)
  return false unless filename.match?(VALID_FILENAME_PATTERN)

  # Optionally require the path to be in a trusted directory
  if require_trusted_dir
    return false unless in_trusted_directory?(path)
  end

  true
end
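
The reason an absolute path is still screened for /../ segments can be shown with the standard library alone. The snippet below is a minimal sketch, assuming a Unix-like system. It does not call TreeHaver, and the path is invented for illustration.

```ruby
# An absolute path that appears to live under /usr/lib.
candidate = "/usr/lib/../../tmp/evil.so"

puts candidate.start_with?("/")         # true: the path is absolute
puts candidate.start_with?("/usr/lib")  # true: it looks like a system library

# Once normalised, the path points somewhere else entirely.
puts File.expand_path(candidate)        # /tmp/evil.so

# The literal segment check in the listing above rejects it before that matters.
puts candidate.include?("/../")         # true
```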

.safe_symbol_name?(symbol) ⇒ Boolean

Validate a symbol name is safe for dlsym lookup

Examples:

PathValidator.safe_symbol_name?("tree_sitter_toml") # => true
PathValidator.safe_symbol_name?("evil; rm -rf /")   # => false


# File 'lib/tree_haver/path_validator.rb', line 267

def safe_symbol_name?(symbol)
  return false if symbol.nil?
  return false if symbol.empty?
  return false if symbol.length > 256 # Reasonable limit

  symbol.match?(VALID_SYMBOL_PATTERN)
end

.sanitize_language_name(name) ⇒ Symbol?

Sanitize a language name for safe use

Examples:

PathValidator.sanitize_language_name("TOML")  # => :toml
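PathValidator.sanitize_language_name("c++")   # => :c (unsupported characters are stripped)
PathValidator.sanitize_language_name("123")   # => nil (does not start with a letter)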


# File 'lib/tree_haver/path_validator.rb', line 293

def sanitize_language_name(name)
  return if name.nil?

  sanitized = name.to_s.downcase.gsub(/[^a-z0-9_]/, "")
  return if sanitized.empty?
  return unless sanitized.match?(/\A[a-z]/)

  sanitized.to_sym
end
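
Because the method strips disallowed characters instead of rejecting the input, it is worth seeing what a few inputs turn into. The sketch below copies the method from the listing above into a hypothetical SanitizeDemo module (not part of TreeHaver).

```ruby
# Hypothetical demo module; the method body is copied from the listing above.
module SanitizeDemo
  def self.sanitize_language_name(name)
    return if name.nil?

    sanitized = name.to_s.downcase.gsub(/[^a-z0-9_]/, "")
    return if sanitized.empty?
    return unless sanitized.match?(/\A[a-z]/)

    sanitized.to_sym
  end
end

p SanitizeDemo.sanitize_language_name("TOML")        # => :toml
p SanitizeDemo.sanitize_language_name("c++")         # => :c
p SanitizeDemo.sanitize_language_name("objective-c") # => :objectivec
p SanitizeDemo.sanitize_language_name("../../etc")   # => :etc

# nil is returned only when nothing usable is left or the result
# does not start with a letter.
p SanitizeDemo.sanitize_language_name("+++")         # => nil
p SanitizeDemo.sanitize_language_name("9p")          # => nil
```

When an unexpected name should be refused rather than repaired, safe_language_name? is probably the better fit.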

.trusted_directories ⇒ Array<String>

Get all trusted directories (default + custom + from ENV)



# File 'lib/tree_haver/path_validator.rb', line 71

def trusted_directories
  dirs = DEFAULT_TRUSTED_DIRECTORIES.dup

  # Add custom registered directories
  @mutex.synchronize { dirs.concat(@custom_trusted_directories) }

  # Add directories from environment variable
  ENV[TRUSTED_DIRS_ENV_VAR]&.split(",")&.each do |dir|
    expanded = File.expand_path(dir.strip)
    # :nocov:
    # File.expand_path always returns absolute paths on Unix/macOS.
    # This guard exists for defensive programming on exotic platforms
    # where expand_path might behave differently, but cannot be tested
    # in standard CI environments.
    dirs << expanded if expanded.start_with?("/")
    # :nocov:
  end

  dirs.uniq
end
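
The handling of the comma-separated environment value can be sketched on its own. The helper parse_trusted_dirs below is hypothetical: it mirrors the split, strip, and expand steps from the listing above but takes the raw string as an argument instead of reading ENV, and it assumes a Unix-like system.

```ruby
# Hypothetical helper that mirrors the ENV-handling loop in the listing above.
def parse_trusted_dirs(value)
  return [] if value.nil?

  value.split(",")
       .map { |dir| File.expand_path(dir.strip) }
       .select { |dir| dir.start_with?("/") }
       .uniq
end

# Whitespace and trailing slashes are normalised, and duplicates collapse.
p parse_trusted_dirs("/home/linuxbrew/.linuxbrew/Cellar, /opt/grammars/lib/,/opt/grammars/lib")
# => ["/home/linuxbrew/.linuxbrew/Cellar", "/opt/grammars/lib"]

# An empty entry (two adjacent commas) expands to the current working directory.
p parse_trusted_dirs("/opt/a,,/opt/b").include?(Dir.pwd) # => true
```

The second call suggests keeping the variable free of stray commas, since an empty entry would otherwise mark the process's working directory as trusted.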

.validation_errors(path) ⇒ Array<String>

Get validation errors for a path (for debugging/error messages)



# File 'lib/tree_haver/path_validator.rb', line 307

def validation_errors(path)
  errors = []

  if path.nil? || path.empty?
    errors << "Path is nil or empty"
    return errors
  end

  errors << "Path exceeds maximum length (#{MAX_PATH_LENGTH})" if path.length > MAX_PATH_LENGTH
  errors << "Path contains null byte" if path.include?("\0")
  errors << "Path is not absolute" unless path.start_with?("/") || windows_absolute_path?(path)
  errors << "Path contains traversal sequence (/../)" if path.include?("/../") || path.end_with?("/..")
  errors << "Path contains traversal sequence (/./)" if path.include?("/./") || path.end_with?("/.")

  unless has_valid_extension?(path)
    errors << "Path does not have allowed extension (.so, .so.X, .dylib, .dll)"
  end

  filename = File.basename(path)
  unless filename.match?(VALID_FILENAME_PATTERN)
    errors << "Filename contains invalid characters"
  end

  errors
end

.windows_absolute_path?(path) ⇒ Boolean

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Check if a path is a Windows absolute path (a drive letter and colon followed by a slash or backslash), such as C:\path or D:/path



# File 'lib/tree_haver/path_validator.rb', line 334

def windows_absolute_path?(path)
  # Match Windows absolute paths like C:\path or D:/path
  path.match?(/\A[A-Za-z]:[\\\/]/)
end
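
A few probes show what the drive-letter pattern does and does not cover. The sketch below copies the method from the listing above into a hypothetical WindowsPathDemo module (not part of TreeHaver). Drive-relative and UNC paths are not matched by the pattern as written.

```ruby
# Hypothetical demo module; the regex is copied from the listing above.
module WindowsPathDemo
  def self.windows_absolute_path?(path)
    path.match?(/\A[A-Za-z]:[\\\/]/)
  end
end

p WindowsPathDemo.windows_absolute_path?('C:\grammars\toml.dll')      # => true
p WindowsPathDemo.windows_absolute_path?("D:/grammars/toml.dll")      # => true

# Drive-relative, Unix, and UNC paths do not match.
p WindowsPathDemo.windows_absolute_path?("C:toml.dll")                # => false
p WindowsPathDemo.windows_absolute_path?("/usr/lib/libtoml.so")       # => false
p WindowsPathDemo.windows_absolute_path?('\\\\server\share\toml.dll') # => false
```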