Table of Contents

Scope

This gem is aimed to collect a set of file paths starting by a wildcard rule, filter them by any default/custom filters (access time, size range) and apply a set of custom policies to them.

Motivation

This gem is helpful to purge obsolete files or to promote relevant ones, by calling external services (CDN APIs) and/or local file system actions (copy, move, delete, etc).

Installation

Add this line to your application's Gemfile:

gem "file_scanner"

And then execute:

bundle

Or install it yourself as:

gem install file_scanner

Usage

Loader

The first step is to create a Loader instance by specifying the path where the files need to be scanned with optional extensions list:

require "file_scanner"

loader = FileScanner::Loader.new(path: ENV["HOME"], extensions: %w[html txt])

Filters

The second step is to provide the filters list to select file paths for which the call method is truthy.
Selection is done with the any? predicate, so also one matching filter will select the path.

Default

If you specify no filters the existing ones will select files by:

  • checking if file is older than 30 days
  • checking if file size is within 0KB and 5KB
  • checking if file basename matches the specified regexp (if any)

You can update default behaviours by passing custom arguments:

a_week_ago = FileScanner::Filters::LastAccess.new(Time.now-7*24*3600)
one_two_mb = FileScanner::Filters::SizeRange.new(min: 1024**2, max: 2*1024**2)
hidden = FileScanner::Filters::MatchingName.new(/^\./)

filters = []
filters << a_week_ago
filters << one_two_mb
filters << hidden

Custom

It is convenient to create custom filters by creating Proc instances that satisfy the callable protocol:

filters << ->(file) { File.directory?(file) }

Policies

The third step is creating custom policies objects (no defaults exist) to be applied to the list of filtered paths.
Again, it suffice the policy responds to the call method and accepts an array of paths as unique argument:

require "fileutils"

remove_from_disk = ->(paths) do
  FileUtils.rm_rf(paths)
end

policies = []
policies << remove_from_disk

Worker

Now that you have all of the collaborators in place, you can create the Worker instance:

worker = FileScanner::Worker.new(loader: loader, filters: filters, policies: policies)
worker.call # apply all the specified policies to the filtered file paths

Slice of files

In case you are going to scan a large number of files, it is better to work in batches.
The Worker constructor accepts a slice attribute to better distribute loading (no sleep by default, use block syntax):

worker = FileScanner::Worker.new(loader: loader, policies: policies, slice: 1000)
worker.call # call policies by slice of 1000 files with default filters

Block syntax

In case you prefer to specify the policies inside a block for a more granular control on the slice of paths, you must omit the policies argument and use the block syntax:

worker = FileScanner::Worker.new(loader: loader)
worker.call do |slice|
  policy = ->(slice) { FileUtils.chmod_R(0700, slice) }
  policy.call
  sleep 10 # wait 10 seconds before slurping next slice 
end

Use a logger

If you dare to trace what the worker is doing (including errors), you can specify a logger to the worker class:

my_logger = Logger.new("my_file.log")

worker = FileScanner::Worker.new(loader: loader, logger: my_logger)
worker.call do |slice|
  fail "Doh!" # will log error to my_file.log and re-raise exception
end