bio-cnls_screenscraper

bio-cnls_screenscraper is a biogem that provides programmatic access to the cNLS Mapper server at nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi, which predicts importin α-dependent nuclear localization signals (NLSs).

First, cache the results for each sequence in your amino acid FASTA file. This contacts the cNLS server once per sequence, waiting 1 second between requests so as not to overload the server. Each result is saved as a separate HTML file, so it is best to run this command in an empty directory.

mkdir cNLS_cache
cd cNLS_cache
bio-nls_screenscraper.rb -h <fasta_file> 2>cNLS_caching.err
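
The caching loop can be sketched in plain Ruby as follows. This is a minimal illustration under stated assumptions, not the gem's implementation: `parse_fasta`, `cache_predictions` and the one-file-per-sequence naming scheme are hypothetical, and the actual request to the cNLS server is delegated to a caller-supplied `fetch_prediction` callable because the server's form parameters are not described here.

```ruby
# Hypothetical sketch of the caching step; NOT the gem's own code.

# Parse FASTA text into [[name, sequence], ...] pairs.
# The name is the first word of the header line.
def parse_fasta(text)
  records = []
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    if line.start_with?('>')
      records << [(line[1..-1].split.first || ''), String.new]
    elsif records.any?
      records.last[1] << line # append sequence lines to the current record
    end
  end
  records
end

# Save one HTML file per sequence into `dir`, pausing `delay` seconds
# between requests (the README says the real tool waits 1 second).
# `fetch_prediction` is a hypothetical callable: (name, seq) -> HTML string.
def cache_predictions(records, dir, fetch_prediction:, delay: 1.0)
  Dir.mkdir(dir) unless Dir.exist?(dir)
  records.each_with_index.map do |(name, seq), i|
    sleep(delay) if i > 0 # wait between requests, not before the first
    html = fetch_prediction.call(name, seq)
    safe_name = name.gsub(/[^\w.-]/, '_') # keep file names filesystem-safe
    path = File.join(dir, "#{safe_name}.html")
    File.write(path, html)
    path
  end
end
```

Passing `delay: 0` and a stub `fetch_prediction` lets you exercise the loop offline; against the real server, keep the 1-second delay.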

Then parse these HTML files and collate the results into a single tab-separated values file (the content is tab-separated despite the .csv extension in the example below). It is best to write the results file outside the cache directory. The parser applies default score cutoffs of 8.0 for monopartite NLSs and 7.0 for bipartite NLSs.

bio-nls_screenscraper.rb -cp >../cNLS_results.csv
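
The cutoff filtering and tab-separated output might look like the sketch below. It assumes the HTML has already been parsed into hypothetical `Prediction` structs (the HTML layout is not described here), and it treats each cutoff as inclusive (score >= cutoff), which is also an assumption.

```ruby
# Hypothetical sketch of the collation step; NOT the gem's parser.
# Prediction and its fields are illustrative assumptions.
Prediction = Struct.new(:sequence_name, :type, :start, :signal, :score)

# Default score cutoffs quoted in the README.
DEFAULT_CUTOFFS = { monopartite: 8.0, bipartite: 7.0 }.freeze

# Keep predictions whose score reaches the cutoff for their NLS type
# (inclusive comparison is an assumption).
def passing(predictions, cutoffs = DEFAULT_CUTOFFS)
  predictions.select { |p| p.score >= cutoffs.fetch(p.type) }
end

# Render predictions as tab-separated text with a header row.
def to_tsv(predictions)
  header = %w[sequence_name type start signal score].join("\t")
  rows = predictions.map do |p|
    [p.sequence_name, p.type, p.start, p.signal, p.score].join("\t")
  end
  ([header] + rows).join("\n") + "\n"
end
```

Keeping the cutoffs in a hash makes it easy to pass stricter or looser thresholds per NLS type without touching the filtering logic.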

The cNLS server rejects some sequences: those that are too short (fewer than 19 aa), too long, or that contain non-standard amino acid codes such as ‘X’.
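
Screening sequences locally before submission avoids wasted requests. The sketch below checks the 19-residue minimum and the 20 standard uppercase one-letter codes; the README gives no maximum length, so `max_length` is a hypothetical parameter that is disabled by default. `rejection_reason` is an illustrative name, not part of the gem.

```ruby
# Hypothetical pre-submission check; NOT part of the gem.
STANDARD_AMINO_ACIDS = 'ACDEFGHIKLMNPQRSTVWY'.chars.freeze

# Returns nil if the sequence looks acceptable, otherwise a reason string.
# Expects uppercase one-letter codes; max_length is an assumption (nil = no limit).
def rejection_reason(seq, min_length: 19, max_length: nil)
  return "too short (#{seq.length} aa)" if seq.length < min_length
  return "too long (#{seq.length} aa)" if max_length && seq.length > max_length
  bad = seq.chars.uniq - STANDARD_AMINO_ACIDS
  return "non-standard residues: #{bad.join(',')}" unless bad.empty?
  nil
end
```

For example, `records.reject { |_name, seq| rejection_reason(seq) }` keeps only the passing entries of a list of `[name, sequence]` pairs.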

Contributing to bio-cnls_screenscraper

  • Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet

  • Check out the issue tracker to make sure no one has already requested and/or contributed it

  • Fork the project

  • Start a feature/bugfix branch

  • Commit and push until you are happy with your contribution

  • Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Please try not to mess with the Rakefile, version, or history. If you want your own version, or if changing them is otherwise necessary, that is fine, but please isolate the change in its own commit so I can cherry-pick around it.

Copyright © 2011 Ben J. Woodcroft. See LICENSE.txt for further details.