samplelines
A simple command line tool to take samples lines from a file or set of files.
samplelines [options] [filename(s) and/or STDIN and/or STDERR]
-v, --version print version info
-h, --help print usage information
-p, --percent set % chance of a line being output [trumps use of -1]
-1, --one-in compute odds of outputing a line as 1 out of every N
-m, --max return at most this many lines
-t, --time stop running after N seconds
Either -p
or -1
is required. Filenames (or the special strings STDIN
and STDERR) will be processed in the order you specify them.
As a convenience, samplelines
will automatically deal with gzipped files
that end in .gz
Note that there is no guarantee of output. If you give a low percentage and a small file, there's a chance nothing will be randomly chosen.
Examples
# Pick about 10% of the lines from bigfile.txt
samplelines -p 10 bigfile.txt
# Ditto, but get a maximum of 100 lines
samplelines -p 10 -m 10 bigfile.txt
# This time, get about one out of every 500 lines
samplelines -1 500 -m 10 bigfile.txt
# Get about 2% of the lines from a set of gzipped files
samplelines -p 2 *.txt.gz
# Ditto, but don't get more than 10_000 lines or
# run for more than five seconds
samplelines -p 2 -t 5 -m 10000 *.txt.gz
Installation
Add this line to your application's Gemfile:
gem 'samplelines'
And then execute:
$ bundle
Or install it yourself as:
$ gem install samplelines
Contributing
- Fork it
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request