bio-kmer_counter is a simple biogem for fingerprinting nucleotide sequences by counting the occurrences of particular kmers in each sequence. The methodology is not new; for references, see Teeling et al. 2004. The default parameters are derived from the methods section of Dick et al. 2009.
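The core idea can be sketched in a few lines of Ruby. This is a minimal illustration under stated assumptions, not the gem's actual code: the method name `kmer_fingerprint` is hypothetical, kmers containing characters other than A, C, G or T are simply skipped, and reverse complements are not merged.

```ruby
# Minimal sketch (hypothetical, not the gem's implementation): count every
# overlapping kmer of length k and report each kmer's share of the total.
# Assumptions: non-ACGT kmers are skipped; reverse complements are NOT merged.
def kmer_fingerprint(sequence, k)
  # Start every possible ACGT kmer at zero so absent kmers still appear.
  counts = %w[A C G T].repeated_permutation(k).map(&:join).to_h { |kmer| [kmer, 0] }
  seq = sequence.upcase
  (0..seq.length - k).each do |i|
    kmer = seq[i, k]
    counts[kmer] += 1 if counts.key?(kmer) # skips kmers containing N etc.
  end
  total = counts.values.sum
  # Proportions between 0 and 1 (all zeros if there were no valid kmers).
  counts.transform_values { |c| total.zero? ? 0.0 : c.fdiv(total) }
end

fp = kmer_fingerprint("ACGTACGTNNACGT", 4)
puts fp.size    # → 256
puts fp["ACGT"] # → 0.5
```

Of the 11 overlapping 4-mers in the example, 5 contain an N and are skipped, leaving 6 valid ones, 3 of which are ACGT.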

This methodology is quite different from that of other software, such as khmer, that counts kmer content using longer kmers. Here only small kmers (e.g. 1-mers or 4-mers) are intended.
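One way to see why small kmers suit this kind of fingerprinting is simple arithmetic: over the alphabet ACGT there are 4^k possible kmers, while a window of n bases holds only n - k + 1 overlapping kmers. The sketch below (helper names are hypothetical) prints the fingerprint length and the mean count per kmer for a 5 kb window.

```ruby
# Number of distinct kmers over ACGT: each of the k positions holds 1 of 4 bases.
def fingerprint_length(k)
  4**k
end

# Mean occurrences per possible kmer in one window of window_size bases
# (assumes window_size >= k).
def mean_count_per_kmer(k, window_size)
  (window_size - k + 1).fdiv(fingerprint_length(k))
end

[1, 4, 8, 12].each do |k|
  printf("k=%-2d %10d possible kmers, %.4f mean count each in 5 kb\n",
         k, fingerprint_length(k), mean_count_per_kmer(k, 5000))
end
```

With 4-mers, a 5 kb window yields about 19.5 occurrences per kmer on average; by k = 12 almost every count would be zero, so the fingerprint would carry little signal.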

Note: this software is under active development!

Installation

gem install bio-kmer_counter

Usage

To analyse a FASTA file (containing one or more sequences) for 4-mer (tetranucleotide) content, reporting the fingerprint of each 5 kb window of each sequence separately, plus the leftover part at the end of each sequence if it is longer than 2 kb:

kmer_counter.rb <fasta_file> >tetranucleotide_content.csv
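The windowing described above can be sketched as follows. The method name `windows` and its keyword arguments are hypothetical, not options of `kmer_counter.rb`; the defaults simply mirror the 5 kb window and 2 kb leftover threshold stated above. Each resulting piece would then be fingerprinted separately.

```ruby
# Cut a sequence into consecutive, non-overlapping windows of window_size
# bases; keep the final leftover piece only if it is longer than min_leftover.
def windows(sequence, window_size: 5000, min_leftover: 2000)
  full = sequence.length / window_size # number of complete windows
  pieces = Array.new(full) { |i| sequence[i * window_size, window_size] }
  leftover = sequence[full * window_size..] # "" when there is no remainder
  pieces << leftover if leftover.length > min_leftover
  pieces
end

p windows("A" * 12_500).map(&:length) # → [5000, 5000, 2500]
p windows("A" * 11_000).map(&:length) # → [5000, 5000]
```

In the second call the 1,000-base leftover is dropped because it is not longer than 2 kb.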

The fingerprints are reported as fractions between 0 and 1 rather than as percentages. What you do with the fingerprints from there is up to you, sorry. For the full set of options, see

kmer_counter.rb -h
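As one hypothetical example of using fingerprints, two of them can be compared with a Euclidean distance, a simple swapped-in choice rather than anything the gem itself computes. The sketch assumes each fingerprint is held as a Hash of kmer to fraction; it does not parse the CSV output, whose exact layout is not described here.

```ruby
# Euclidean distance between two fingerprints (Hash of kmer => fraction).
# Kmers missing from one Hash are treated as 0.
def fingerprint_distance(fp_a, fp_b)
  kmers = fp_a.keys | fp_b.keys
  Math.sqrt(kmers.sum { |kmer| (fp_a.fetch(kmer, 0.0) - fp_b.fetch(kmer, 0.0))**2 })
end

a = { "A" => 0.5, "C" => 0.0, "G" => 0.0, "T" => 0.5 }
b = { "A" => 0.25, "C" => 0.25, "G" => 0.25, "T" => 0.25 }
puts fingerprint_distance(a, b) # → 0.5
```

Smaller distances indicate more similar kmer composition, which is the usual basis for grouping sequence windows.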

Project home page

For information on the source tree, documentation, examples, issues and how to contribute, see

The BioRuby community is on IRC server:, channel: #bioruby.

Cite

This software is currently unpublished, so please just cite the homepage (thanks!).

Please also cite the tools upon which it is based, one of:


Copyright (c) 2012 Ben J Woodcroft. See LICENSE.txt for further details.