Class: SchemaSherlock::PerformanceOptimizer

Inherits: Object

Defined in: lib/schema_sherlock/performance_optimizer.rb

Overview

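Centralizes performance optimizations for file and pattern processing: size-aware file reads, a cheap substring pre-filter before pattern matching, filename-based filtering of large file sets, and thread-pooled processing.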

Constant Summary

File size thresholds for processing strategies:

SMALL_FILE_THRESHOLD = 64 * 1024 (64 KB)

LARGE_FILE_THRESHOLD = 1024 * 1024 (1 MB)

Class Method Summary

.count_patterns_optimized(content, table_name, column_name) ⇒ Object

Fast pattern matching with pre-filtering.

.filter_relevant_files(file_paths, column_name) ⇒ Object

Smart file filtering to reduce I/O.

.process_files_parallel(file_paths, table_name, column_name) ⇒ Object

Parallel file processing with a capped thread count.

.read_file_optimized(file_path) ⇒ Object

File reading with size-based optimization.

Class Method Details

.count_patterns_optimized(content, table_name, column_name) ⇒ `Object`

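Counts references to a column in `content`. Returns 0 early when the content is nil, is shorter than the column name, or does not contain the column name (compared case-insensitively); otherwise delegates to `OptimizedScanner.count_column_references_native`.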

# File 'lib/schema_sherlock/performance_optimizer.rb', line 28

def count_patterns_optimized(content, table_name, column_name)
  # Early exit if content is empty or too short
  return 0 if content.nil? || content.length < column_name.length
  
  # Quick pre-filter: check if column name exists at all
  content_lower = content.downcase
  column_lower = column_name.downcase
  
  # If column name doesn't appear anywhere, skip expensive matching
  return 0 unless content_lower.include?(column_lower)
  
  # Use optimized scanner
  OptimizedScanner.count_column_references_native(content, table_name, column_name)
end
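
The pre-filter idea in the method above can be tried in isolation. The following is a minimal sketch, not the library's implementation: `expensive_count` is a hypothetical stand-in for `OptimizedScanner.count_column_references_native`, whose source is not shown on this page, and it simply counts whole-word matches.

```ruby
# Hypothetical stand-in for the native scanner (assumption: whole-word,
# case-insensitive matching; the real scanner may count differently).
def expensive_count(content, column_name)
  content.scan(/\b#{Regexp.escape(column_name)}\b/i).size
end

def count_with_prefilter(content, column_name)
  # Early exit: nil content or content too short to contain the name
  return 0 if content.nil? || content.length < column_name.length

  # Cheap substring check before the costlier regex scan
  return 0 unless content.downcase.include?(column_name.downcase)

  expensive_count(content, column_name)
end

count_with_prefilter(nil, "user_id")                                        # => 0
count_with_prefilter("Post.where(user_id: 1).or(Post.where(USER_ID: 2))", "user_id") # => 2
```

The substring check costs one pass over the content, so files that never mention the column skip the regex entirely.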

.filter_relevant_files(file_paths, column_name) ⇒ `Object`

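Reduces I/O by narrowing large file sets. When more than 1000 paths are given, keeps only files whose base name contains the column name, its association name (the column name without a trailing `_id`), or one of `model`, `service`, `query`. Smaller sets are returned unchanged.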

# File 'lib/schema_sherlock/performance_optimizer.rb', line 75

def filter_relevant_files(file_paths, column_name)
  # For very large sets, do a quick filename-based filter first
  if file_paths.size > 1000
    # Filter by filename patterns that are likely to contain the column
    association_name = column_name.gsub(/_id$/, '')
    relevant_patterns = [column_name, association_name, 'model', 'service', 'query']
    
    file_paths.select do |path|
      filename = File.basename(path, '.rb').downcase
      relevant_patterns.any? { |pattern| filename.include?(pattern) }
    end
  else
    file_paths
  end
end
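
To see which paths survive the filename filter, the matching rule can be pulled out on its own. This is an illustrative sketch: `filename_matches?` is a hypothetical helper name, and the size check (more than 1000 paths) is left out so the rule can be exercised on a short list.

```ruby
# Sketch of the filename rule only; the > 1000 size check is omitted.
def filename_matches?(path, column_name)
  association_name = column_name.gsub(/_id$/, '')
  patterns = [column_name, association_name, 'model', 'service', 'query']
  # Only the base name is inspected; directory names are ignored
  filename = File.basename(path, '.rb').downcase
  patterns.any? { |pattern| filename.include?(pattern) }
end

paths = [
  'app/models/user.rb',
  'app/services/billing_service.rb',
  'app/controllers/posts_controller.rb',
  'lib/user_id_lookup.rb'
]

kept = paths.select { |path| filename_matches?(path, 'user_id') }
# kept => ["app/models/user.rb", "app/services/billing_service.rb", "lib/user_id_lookup.rb"]
```

Because `File.basename` drops the directory, `app/models/user.rb` is kept for its file name `user`, not for sitting under `models/`; a file such as `app/models/post.rb` would be filtered out.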

.process_files_parallel(file_paths, table_name, column_name) ⇒ `Object`

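Reads and scans files concurrently on a fixed thread pool and returns the summed match count. The pool size is the smallest of the processor count, the number of files, and 8. A file whose future fails or returns nil contributes 0.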

# File 'lib/schema_sherlock/performance_optimizer.rb', line 44

def process_files_parallel(file_paths, table_name, column_name)
  return 0 if file_paths.empty?
  
  # Limit threads to avoid overwhelming the system
  max_threads = [Concurrent.processor_count, file_paths.size, 8].min
  
  futures = []
  thread_pool = Concurrent::FixedThreadPool.new(max_threads)
  
  file_paths.each do |file_path|
    future = Concurrent::Future.execute(executor: thread_pool) do
      content = read_file_optimized(file_path)
      count_patterns_optimized(content, table_name, column_name)
    end
    futures << future
  end
  
  # Collect results efficiently
  total_count = futures.sum do |future|
    future.value || 0
  rescue
    0
  end
  
  thread_pool.shutdown
  thread_pool.wait_for_termination(5)
  
  total_count
end
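
The same fan-out and sum pattern can be sketched with the standard library alone, which may help when experimenting without the concurrent-ruby gem. This is a swapped-in technique, not the method above: plain threads drain a shared `Queue` instead of futures on a `Concurrent::FixedThreadPool`, and `sum_in_parallel` is a hypothetical name.

```ruby
require 'etc'

# Stdlib-only sketch: a capped number of worker threads drain a shared queue
# and each returns a subtotal. Items must not be nil or false.
def sum_in_parallel(items, max_threads: 8, &block)
  return 0 if items.empty?

  # Same cap as the method above: processors, item count, and 8
  thread_count = [Etc.nprocessors, items.size, max_threads].min

  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # pop returns nil once a closed queue is empty

  workers = Array.new(thread_count) do
    Thread.new do
      subtotal = 0
      while (item = queue.pop)
        begin
          subtotal += block.call(item)
        rescue StandardError
          # A failing item contributes 0 to the total
        end
      end
      subtotal
    end
  end

  # Thread#value joins the thread and returns its block result
  workers.sum(&:value)
end

sum_in_parallel(%w[a bb ccc]) { |s| s.length } # => 6
```

Closing the queue before starting the workers lets each thread exit its loop on its own, so no explicit shutdown or timeout is needed in this sketch.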

.read_file_optimized(file_path) ⇒ `Object`

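Reads a file with a strategy chosen by size: files smaller than `LARGE_FILE_THRESHOLD` are read in a single call, and larger files are delegated to `read_large_file_chunked`. Returns an empty string for missing, unreadable, or empty files, and on any error.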

# File 'lib/schema_sherlock/performance_optimizer.rb', line 10

def read_file_optimized(file_path)
  return "" unless File.exist?(file_path) && File.readable?(file_path)
  
  file_size = File.size(file_path)
  return "" if file_size == 0
  
  if file_size < LARGE_FILE_THRESHOLD
    # Small/medium files: direct read
    File.read(file_path, encoding: 'UTF-8', invalid: :replace, undef: :replace)
  else
    # Large files: chunked reading with buffer
    read_large_file_chunked(file_path)
  end
rescue
  ""
end
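
The large-file branch calls `read_large_file_chunked`, which is not shown on this page. As a rough illustration of what chunked reading can look like, here is a hypothetical sketch; `read_in_chunks` and `CHUNK_SIZE` are assumed names and values, and the real helper may differ.

```ruby
# Assumed chunk size for illustration; not taken from the library.
CHUNK_SIZE = 64 * 1024

def read_in_chunks(file_path, chunk_size: CHUNK_SIZE)
  # Pre-size the buffer so appends do not repeatedly reallocate
  buffer = String.new(capacity: File.size(file_path))

  # Binary mode: chunks may end in the middle of a multibyte character,
  # so bytes are collected first and decoded only once at the end
  File.open(file_path, 'rb') do |file|
    while (chunk = file.read(chunk_size))
      buffer << chunk
    end
  end

  # Tag the bytes as UTF-8 and replace any invalid sequences
  buffer.force_encoding(Encoding::UTF_8).scrub('?')
end
```

Decoding after all chunks are joined avoids corrupting characters that straddle a chunk boundary.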
