README.DEV

Copyright

Copyright (C) 2005, 2006 Toshiaki Katayama <k@bioruby.org>

Copyright

Copyright (C) 2006, 2008 Jan Aerts <jandot@bioruby.org>

Copyright

Copyright (C) 2011 Naohisa Goto <ng@bioruby.org>

HOW TO CONTRIBUTE TO THE BIORUBY PROJECT?

There are many possible ways to contribute to the BioRuby project, such as:

  • Join the discussion on the BioRuby mailing list

  • Send a bug report or write a bug fix patch

  • Add and correct documentation

  • Develop code for new features, etc.

All of these are welcome! This document mainly focuses on the last option, how to contribute your code to the BioRuby distribution. This may also be helpful when you send large patches for existing codes.

We would like to include your contribution as long as the scope of your module meets the field of bioinformatics.

Git

Bioruby is now under git source control at github.com/bioruby/bioruby. There are two basic ways to contribute: with patches or pull requests. Both are explained on the bioruby wiki at bioruby.open-bio.org/wiki.

Preparation before sending patches or pull requests

Before sending patches or pull requests, rewriting history and reordering or selecting patches are recommended. See “Creating the perfect patch series” in the Git User's Manual. www.kernel.org/pub/software/scm/git/docs/user-manual.html#patch-series

Sending your contribution

With patches

You can send patches with git-format-patch. For a smaller change, unified diff (diff -u) without using git can also be accepted.

With pull requests

We are happy if your commits can be pulled with fast-forward. For the purpose, using git-rebase before sending pull request is recommended. See “Keeping a patch series up to date using git rebase” in the Git User's Manual. www.kernel.org/pub/software/scm/git/docs/user-manual.html#using-git-rebase

Notes for the treatment of contributions in the blessed repository

Merging policy

We do not always merge your commits as is. We may edit, rewrite, reorder, select, and/or mix your commits before and/or after merging to the blessed repository.

Git commit management policy

We want to keep the commit history linear as far as possible, because it is easy to find problems and regressions in commits. See “Why bisecting merge commits can be harder than bisecting linear history” in the Git User's Manual. www.kernel.org/pub/software/scm/git/docs/user-manual.html#bisect-merges

Note that the above policy is only for the main 'blessed' repository, and it does not aim to restrict each user's fork.

LICENSE

If you would like your module to be included in the BioRuby distribution, you need to give us right to change the license of your module to make it compatible with other modules in BioRuby.

BioRuby was previously distributed under the LGPL license, but now is distributed under the same terms as Ruby.

CODING STYLE

You will need to follow the typical coding styles of the BioRuby modules:

Use the following naming conventions

  • CamelCase for module and class names

  • '_'-separated_lowercase for method names

  • '_'-separated_lowercase for variable names

  • all UPPERCASE for constants

Indentation must not include tabs

  • Use 2 spaces for indentation.

  • Don't replace spaces to tabs.

Parenthesis in the method definition line should be written

  • Good: def example(str, ary)

  • Discouraged: def example str, ary

Comments

Don't use =begin and =end blocks for comments. If you need to add comments, include it in the RDoc documentation.

Documentation should be written in the RDoc format in the source code

The RDoc format is becoming the popular standard for Ruby documentation. We are now in transition from the previously used RD format to the RDoc format in API documentation.

Additional tutorial documentation and working examples are encouraged with your contribution. You may use the header part of the file for this purpose as demonstrated in the previous section.

Standard documentation

of files

Each file should start with a header, which covers the following topics:

  • copyright

  • license

  • description of the file (not the classes; see below)

  • any references, if appropriate

The header should be formatted as follows:

#
# = bio/db/hoge.rb - Hoge database parser classes
#
# Copyright::  Copyright (C) 2001, 2003-2005 Bio R. Hacker <brh@example.org>,
# Copyright::  Copyright (C) 2006 Chem R. Hacker <crh@example.org>
#
# License::    The Ruby License
#
# == Description
#
# This file contains classes that implement an interface to the Hoge database.
#
# == References
#
# * Hoge F. et al., The Hoge database, Nucleic. Acid. Res. 123:100--123 (2030)
# * http://hoge.db/
#

require 'foo'

module Bio

  autoload :Bar, 'bio/bar'

class Hoge
  :
end  # Hoge

end  # Bio

of classes and methods within those files

Classes and methods should be documented in a standardized format, as in the following example (from lib/bio/sequence.rb):

# == Description
#
# Bio::Sequence objects represent annotated sequences in bioruby.
# A Bio::Sequence object is a wrapper around the actual sequence, 
# represented as either a Bio::Sequence::NA or a Bio::Sequence::AA object.
# For most users, this encapsulation will be completely transparent.
# Bio::Sequence responds to all methods defined for Bio::Sequence::NA/AA
# objects using the same arguments and returning the same values (even though 
# these methods are not documented specifically for Bio::Sequence).
#
# == Usage
# 
#   require 'bio'
#   
#   # Create a nucleic or amino acid sequence
#   dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA')
#   rna = Bio::Sequence.auto('augcaugcaugcaugcaaaa')
#   aa = Bio::Sequence.auto('ACDEFGHIKLMNPQRSTVWYU')
# 
#   # Print in FASTA format
#   puts dna.output(:fasta)
# 
#   # Print all codons
#   dna.window_search(3,3) do |codon|
#     puts codon
#   end
# 
class Sequence

  # Create a new Bio::Sequence object
  #
  #   s = Bio::Sequence.new('atgc')
  #   puts s                                  # => 'atgc'
  #
  # Note that this method does not intialize the contained sequence
  # as any kind of bioruby object, only as a simple string
  #
  #   puts s.seq.class                        # => String
  #
  # See Bio::Sequence#na, Bio::Sequence#aa, and Bio::Sequence#auto 
  # for methods to transform the basic String of a just created 
  # Bio::Sequence object to a proper bioruby object
  # ---
  # *Arguments*:
  # * (required) _str_: String or Bio::Sequence::NA/AA object
  # *Returns*:: Bio::Sequence object
  def initialize(str)
    @seq = str
  end

  # The sequence identifier.  For example, for a sequence
  # of Genbank origin, this is the accession number.
  attr_accessor :entry_id

  # An Array of Bio::Feature objects
  attr_accessor :features
end # Sequence

Preceding the class definition (class Sequence), there is at least a description and a usage example. Please use the Description and Usage headings. If appropriate, refer to other classes that interact with or are related to the class.

The code in the usage example should, if possible, be in a format that a user can copy-and-paste into a new script to run. It should illustrate the most important uses of the class. If possible and if it would not clutter up the example too much, try to provide any input data directly into the usage example, instead of refering to ARGV or ARGF for input.

dna = Bio::Sequence.auto('atgcatgcATGCATGCAAAA')

Otherwise, describe the input shortly, for example:

# input should be string consisting of nucleotides
dna = Bio::Sequence.auto(ARGF.read)

Methods should be preceded by a comment that describes what the method does, including any relevant usage examples. (In contrast to the documentation for the class itself, headings are not required.) In addition, any arguments should be listed, as well as the type of thing that is returned by the method. The format of this information is as follows:

# ---
# *Arguments*:
# * (required) _str_: String or Bio::Sequence::NA
# * (optional) _nr_: a number that means something
# *Returns*:: true or false

Attribute accessors can be preceded by a short description.

# P-value (Float)
attr_reader :pvalue

For writing rdoc documentation, putting two or more attributes in a line (such as attr_reader :evalue, :pvalue) is strongly discouraged.

Methods looks like attributes can also be preceded by a short description.

# Scientific name (String)
def scientific_name
  #...
end

# Scientific name (String)
def scientific_name=(str)
  #...
end

Exception handling

Don't use

$stderr.puts "WARNING"

in your code. Instead, try to avoid printing error messages. For fatal errors, use raise with an appropriate message.

Kernel#warn can only be used to notice incompatible changes to programmers. Typically it may be used for deprecated or obsolete usage of a method. For example,

warn "The Foo#bar method is obsoleted. Use Foo#baz instead."

Testing code should use 'test/unit'

Unit tests should come with your modules by which you can assure what you meant to do with each method. The test code is useful to make maintenance easy and ensure stability. The use of

if __FILE__ == $0

is deprecated.

Using autoload

To quicken the initial load time we have replaced most of 'require' to 'autoload' since BioRuby version 0.7. During this change, we have found some tips:

You should not separate the same namespace into several files.

  • For example, if you have separated definitions of the Bio::Foo class into two files (e.g. 'bio/foo.rb' and 'bio/bar.rb'), you need to resolve the dependencies (including the load order) yourself.

  • If you have a defined Bio::Foo in 'bio/foo.rb' and a defined Bio::Foo::Bar in 'bio/foo/bar.rb' add the following line in the 'bio/foo.rb' file:

    autoload :Bar, 'bio/foo/bar'

You should not put several top level namespaces in one file.

  • For example, if you have Bio::A, Bio::B and Bio::C in the file 'bio/foo.rb', you need

    autoload :A, 'bio/foo'
    autoload :B, 'bio/foo'
    autoload :C, 'bio/foo'

    to load the module automatically (instead of require 'bio/foo'). In this case, you should put them under the new namespace like Bio::Foo::A, Bio::Foo::B and Bio::Foo::C in the file 'bio/foo', then use

    autoload :Foo, 'bio/foo'

    so autoload can be written in 1 line.

NAMESPACE

Your module should be located under the top-level module Bio and put under the 'bioruby/lib/bio' directory. The class/module names and the file names should be short and descriptive.

There are already several sub directories in 'bioruby/lib':

bio/*.rb   -- general and widely used basic classes
bio/appl/  -- wrapper and parser for the external applications
bio/data/  -- basic biological data
bio/db/    -- flatfile database entry parsers
bio/io/    -- I/O interfaces for files, RDB, web services etc.
bio/util/  -- utilities and algorithms for bioinformatics

If your module doesn't match any of the above, please propose an appropriate directory name when you contribute. Please let the staff discuss on namespaces (class names), API (method names) before commiting a new module or making changes on existing modules.

MAINTENANCE

Finally, please maintain the code you've contributed. Please let us know (on the bioruby list) before you commit, so that users can discuss on the change.

RUBY VERSION and IMPLEMENTATION

We are mainly using Ruby MRI (Matz' Ruby Implementation, or Matz' Ruby Interpreter). Please confirm that your code is running on current stable release versions of Ruby MRI.

We are very happy if your code can run on both Ruby 1.8.x and 1.9.x. Note that Ruby 1.9.0 should be ignored because it was discontinued. Ruby 1.8.5 or earlier versions can also be ignored. See README.rdoc and RELEASE_NOTES.rdoc for recommended Ruby versions.

It is welcome to support JRuby, Rubinius, etc, in addition to Ruby MRI.

Of course, it is strongly encouraged to write code that is not affected by differences between Ruby versions and/or implementations, as far as possible.

OS and ARCHITECTURE

We hope BioRuby can be run on both UNIX (and UNIX-like OS) and Microsoft Windows.