## Scope
This gem collects a set of file paths starting from a wildcard rule, filters them by default or custom filters (access time, matching name and size range) and applies a set of actions to them via a block call.
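As a quick preview, here is a minimal end-to-end sketch using the API detailed in the Usage section below (the path and extension are illustrative, and the remaining factory arguments are assumed optional):

```ruby
require "file_scanner"

worker = FileScanner::Worker.factory(path: "/tmp", extensions: %w[log])
worker.call do |paths|
  paths.each { |path| puts path }
end
```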
## Motivation
This gem is useful for purging obsolete files or promoting relevant ones, by calling external services (CDN APIs) and/or performing local file system actions (copy, move, delete, etc.).
## Installation
Add this line to your application's Gemfile:
gem "file_scanner"
And then execute:
```shell
bundle
```
Or install it yourself as:
```shell
gem install file_scanner
```
## Usage
### Loader
The first step is to create a Loader instance by specifying the path where the files need to be scanned, along with an optional list of extensions:
require "file_scanner"
loader = FileScanner::Loader.new(path: ENV["HOME"], extensions: %w[html txt])
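To give an idea of the wildcard rule mentioned in the Scope section, the loader above presumably matches files the way a glob of this shape would (an illustration, not the gem's actual internals):

```ruby
# Hypothetical equivalent of the loader's wildcard rule:
Dir.glob(File.join(ENV["HOME"], "**", "*.{html,txt}"))
```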
### Filters
The second step is to provide the list of filters used to select the file paths: a path is selected when a filter's call method returns a truthy value.
Selection is done with the any? predicate, so a single matching filter is enough to select a path.
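Conceptually, selection works like the following sketch (an illustration of the any? semantics, not the gem's internals):

```ruby
# A path is kept as soon as one filter matches it:
selected = paths.select do |path|
  filters.any? { |filter| filter.call(path) }
end
```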
#### Defaults
If you specify no filters, the default ones are loaded (an explicit equivalent is sketched after this list), selecting files by:
- checking if the file is older than 30 days
- checking if the file size is within 0KB and 5KB
- checking if the file basename matches the specified regexp (if any)
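For illustration, the defaults above roughly correspond to building the documented filter classes by hand; the exact constructor arguments are assumptions based on the criteria listed:

```ruby
# Hypothetical explicit equivalent of the default filters:
thirty_days_ago = FileScanner::Filters::LastAccess.new(Time.now - 30*24*3600)
zero_to_5kb     = FileScanner::Filters::SizeRange.new(min: 0, max: 5*1024)
default_filters = [thirty_days_ago, zero_to_5kb]
```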
You can tweak the default filters' behaviour by passing custom arguments:
```ruby
a_week_ago = FileScanner::Filters::LastAccess.new(Time.now - 7*24*3600)
one_two_mb = FileScanner::Filters::SizeRange.new(min: 1024**2, max: 2*1024**2)
hidden = FileScanner::Filters::MatchingName.new(/^\./)

filters = [a_week_ago, one_two_mb, hidden]
```
#### Custom
The simplest way to create a custom filter is a Proc instance, since it satisfies the callable protocol:
```ruby
filters << ->(file) { File.directory?(file) }
```
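Any object responding to call works as well; for example, a small class (hypothetical name) can carry state that a bare lambda would otherwise have to capture:

```ruby
# Selects files owned by the given user id.
class OwnedBy
  def initialize(uid)
    @uid = uid
  end

  def call(file)
    File.stat(file).uid == @uid
  end
end

filters << OwnedBy.new(Process.uid)
```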
### Worker
Now that you have all of the collaborators in place, you can create the Worker instance to perform actions on the filtered paths:
```ruby
worker = FileScanner::Worker.new(loader: loader, filters: filters)

worker.call do |paths|
  # do whatever you want with the paths list
end
```
#### Batches
If you are going to scan a large number of files, it is advisable to work in batches.
The Worker constructor accepts a slice attribute that lets you distribute the load:
```ruby
worker = FileScanner::Worker.new(loader: loader, slice: 1000)

worker.call do |slice|
  # perform actions on 1000 paths at a time
end
```
#### Enumerator
In case you want to access the sliced enumerator directly, just do not pass a block to the method:
```ruby
slices = worker.call
count = slices.flatten.size
```
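Since the result is enumerable, you can consume the batches however you like, for example:

```ruby
slices.each_with_index do |slice, i|
  puts "batch #{i}: #{slice.size} paths"
end
```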
#### Logger
In case you need to trace what the worker is doing (errors included), you can pass a logger to the worker class:
```ruby
require "logger"

my_logger = Logger.new("my_file.log")

worker = FileScanner::Worker.new(loader: loader, logger: my_logger)
worker.call do |slice|
  fail "Doh!" # will log the error to my_file.log and re-raise the exception
end
```
The same logger instance is available as the second argument of the block, so you can easily use it within the actions you are performing:
require "fileutils"
worker.call do |slice, logger|
logger.info { "going to remove #{slice.size} files from disk!" }
FileUtils.rm_rf(slice)
end
### Factory
You can create the loader and worker instances at once by using the available factory method:
```ruby
worker = FileScanner::Worker.factory(path: ENV["HOME"],
                                     extensions: %w[html txt],
                                     filters: filters,
                                     logger: my_logger,
                                     slice: 1000)

worker.call do |slice, logger|
  # perform actions on 1000 paths at a time
end
```