## Scope
This gem collects a set of file paths starting from a wildcard rule, filters them by default or custom filters (access time, matching name and size range) and applies a set of actions to them via a block call.
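As a quick preview, here is a minimal end-to-end sketch using the API detailed in the Usage section below (the path and extension are illustrative, and the remaining factory arguments are assumed optional):

```ruby
require "file_scanner"

worker = FileScanner::Worker.factory(path: "/tmp", extensions: %w[log])
worker.call do |paths|
  paths.each { |path| puts path }
end
```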
## Motivation
This gem is useful for purging obsolete files or promoting relevant ones, by calling external services (CDN APIs) and/or performing local file system actions (copy, move, delete, etc.).
## Installation
Add this line to your application's Gemfile:
gem "file_scanner"
And then execute:
```shell
bundle
```
Or install it yourself as:
```shell
gem install file_scanner
```
## Usage
### Loader
The first step is to create a Loader instance by specifying the path where the files need to be scanned, along with an optional list of extensions:
require "file_scanner"
loader = FileScanner::Loader.new(path: ENV["HOME"], extensions: %w[html txt])
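To give an idea of the wildcard rule mentioned in the Scope section, the loader above presumably matches files the way a glob of this shape would (an illustration, not the gem's actual internals):

```ruby
# Hypothetical equivalent of the loader's wildcard rule:
Dir.glob(File.join(ENV["HOME"], "**", "*.{html,txt}"))
```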
### Filters
The second step is to provide the list of filters used to select the file paths: a path is selected when a filter's call method returns a truthy value.
Selection is done with the any? predicate, so a single matching filter is enough to select a path.
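Conceptually, selection works like the following sketch (an illustration of the any? semantics, not the gem's internals):

```ruby
# A path is kept as soon as one filter matches it:
selected = paths.select do |path|
  filters.any? { |filter| filter.call(path) }
end
```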
#### Defaults
If you specify no filters, the default ones are loaded (an explicit equivalent is sketched after this list), selecting files by:
- checking if the file is older than 30 days
- checking if the file size is within 0KB and 5KB
- checking if the file basename matches the specified regexp (if any)
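For illustration, the defaults above roughly correspond to building the documented filter classes by hand; the exact constructor arguments are assumptions based on the criteria listed:

```ruby
# Hypothetical explicit equivalent of the default filters:
thirty_days_ago = FileScanner::Filters::LastAccess.new(Time.now - 30*24*3600)
zero_to_5kb     = FileScanner::Filters::SizeRange.new(min: 0, max: 5*1024)
default_filters = [thirty_days_ago, zero_to_5kb]
```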
You can tweak the default filters' behaviour by passing custom arguments:
```ruby
a_week_ago = FileScanner::Filters::LastAccess.new(Time.now - 7*24*3600)
one_two_mb = FileScanner::Filters::SizeRange.new(min: 1024**2, max: 2*1024**2)
hidden = FileScanner::Filters::MatchingName.new(/^\./)

filters = [a_week_ago, one_two_mb, hidden]
```
#### Custom
The simplest way to create a custom filter is a Proc instance, since it satisfies the callable protocol:
```ruby
filters << ->(file) { File.directory?(file) }
```
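Any object responding to call works as well; for example, a small class (hypothetical name) can carry state that a bare lambda would otherwise have to capture:

```ruby
# Selects files owned by the given user id.
class OwnedBy
  def initialize(uid)
    @uid = uid
  end

  def call(file)
    File.stat(file).uid == @uid
  end
end

filters << OwnedBy.new(Process.uid)
```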
### Worker
Now that you have all of the collaborators in place, you can create the Worker instance to perform actions on the filtered paths:
```ruby
worker = FileScanner::Worker.new(loader: loader, filters: filters)

worker.call do |paths|
  # do whatever you want with the paths list
end
```
#### Batches
If you are going to scan a large number of files, it is advisable to work in batches.
The Worker constructor accepts a slice attribute that lets you distribute the load:
```ruby
worker = FileScanner::Worker.new(loader: loader, slice: 1000)

worker.call do |slice|
  # perform actions on 1000 paths at a time
end
```
#### Enumerator
In case you want to access the sliced enumerator directly, just do not pass a block to the method:
```ruby
slices = worker.call
count = slices.flatten.size
```
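Since the result is enumerable, you can consume the batches however you like, for example:

```ruby
slices.each_with_index do |slice, i|
  puts "batch #{i}: #{slice.size} paths"
end
```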
#### Logger
In case you need to trace what the worker is doing (errors included), you can pass a logger to the worker class:
```ruby
require "logger"

my_logger = Logger.new("my_file.log")

worker = FileScanner::Worker.new(loader: loader, logger: my_logger)
worker.call do |slice|
  fail "Doh!" # will log the error to my_file.log and re-raise the exception
end
```
The same logger instance is available as the second argument of the block, so you can easily use it within the actions you are performing:
require "fileutils"
worker.call do |slice, logger|
logger.info { "going to remove #{slice.size} files from disk!" }
FileUtils.rm_rf(slice)
end
### Factory
You can create the loader and worker instances at once by using the available factory method:
```ruby
worker = FileScanner::Worker.factory(path: ENV["HOME"],
                                     extensions: %w[html txt],
                                     filters: filters,
                                     logger: my_logger,
                                     slice: 1000)

worker.call do |slice, logger|
  # perform actions on 1000 paths at a time
end
```