BaseSpace Ruby SDK
Bio::BaseSpace is a Ruby-based SDK for developing Apps and scripts that work with Illumina's BaseSpace, a cloud-computing solution for next-gen sequencing data analysis.
The primary purpose of the SDK is to provide an easy-to-use Ruby environment enabling developers to authenticate a user, retrieve data, and upload data/results from their own analysis to BaseSpace.
Note: To run several of the following examples, you need a BaseSpace account with a registered App, as well as the client_key and client_secret codes for that App.
Availability
The current version of Bio::BaseSpace can be checked out with:
git clone https://github.com/joejimbo/basespace-ruby-sdk.git
or, using SSH:
git clone git@github.com:joejimbo/basespace-ruby-sdk.git
Note: We will make the SDK available as a Ruby gem when our implementation is production ready. It will then also be listed on Biogems.info.
Setup
Requirements: Ruby 1.9.3 and above. The multi-part file upload will currently only run on a Unix setup.
You can make Bio::BaseSpace available by setting the following environment variable:
export RUBYLIB=/path/to/basespace-ruby-sdk/lib/
or by adding the library path at the top of each Ruby script that uses Bio::BaseSpace:
$: << '/path/to/basespace-ruby-sdk/lib/'
To test that everything is working as expected, launch Interactive Ruby (irb) and try loading Bio::BaseSpace:
$ irb
>> require 'basespace'
>> include Bio::BaseSpace
Application triggering
This section demonstrates how to retrieve the AppSession object produced when a user triggers a BaseSpace App.
Further, we cover how to automatically generate the scope strings to request access to the data object (be it a project or a sample) that the App was triggered to analyze.
The initial HTTP request from BaseSpace to our App is identified by an ApplicationActionId. Using this piece of information,
we are able to obtain information about the user who launched the App and about the data the App is supposed to analyze.
First, we instantiate a BaseSpaceAPI object using the client_key and client_secret codes provided on the BaseSpace developer's website when registering our App, as well as the AppSessionId generated when the App was triggered:
require 'basespace'
include Bio::BaseSpace
# initialize an authentication object using the key and secret from your app
# Fill in with your own values
client_id = 'my client key'
client_secret = 'my client secret'
app_session_id = 'my app session id'
basespace_url = 'https://api.basespace.illumina.com/'
api_version = 'v1pre3'
# First we will initialize a BaseSpace API object using our app information and the appSessionId
bs_api = BaseSpaceAPI.new(client_id, client_secret, basespace_url, api_version, app_session_id)
# Using the BaseSpaceAPI instance we can request the AppSession object corresponding to the supplied AppSession id
my_app_session = bs_api.get_app_session
puts my_app_session
# An app session contains a reference to one or more appLaunchObjects, which reference the data
# the user launched the app on. This can be a list of projects, samples, or a mixture of objects
puts "Type of data the app was triggered on can be seen in 'references'"
puts my_app_session.references.inspect # .inspect prints the array with surrounding []
puts
The output will be:
App session by 600602: Toshiaki Katayama - Id: <my app session id> - status: Complete
Type of data the app was triggered on can be seen in 'references'
[Project]
We can also get a handle to the user who started the AppSession and further information on the AppLaunchObject:
# We can also get a handle to the user who started the AppSession
puts "We can get a handle for the user who triggered the app"
puts my_app_session.user_created_by
puts
# Let's have a closer look at the appSessionLaunchObject
my_reference = my_app_session.references.first
puts "We can get out information such as the href to the launch object:"
puts my_reference.href_content
puts
puts "and the specific type of that object:"
puts my_reference.type
puts
The output will be:
We can get a handle for the user who triggered the app
13039: Eri Kibukawa
We can get out information such as the href to the launch object:
v1pre3/projects/848850
and the specific type of that object:
Project
To start working, we will want to expand our permission scope for the trigger object so that we can read and write data. The details of this process are the subject of the next section. We end this section by demonstrating how one can easily obtain the so-called "scope string" and make the access request:
puts "\nWe can get out the specific project objects by using 'content':"
my_reference_content = my_reference.content
puts my_reference_content
puts "\nThe scope string for requesting write access to the reference object is:"
puts my_reference_content.get_access_str('write')
The output will be:
We can get out the specific project objects by using 'content':
MyProject - id=848850
The scope string for requesting write access to the reference object is:
write project 848850
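Scope strings follow a simple "<access> <type> <id>" pattern, as in 'write project 848850' above and 'browse global' or 'read project 183184' later in this document. A minimal sketch of building such a string yourself, for example when you only know an object's id; scope_string is a hypothetical helper, not part of Bio::BaseSpace, and get_access_str remains the SDK's way of doing this for objects you already hold:

```ruby
# Hypothetical helper (not part of Bio::BaseSpace): builds a BaseSpace
# scope string such as "write project 848850" or "browse global".
def scope_string(access, type, id = nil)
  # compact drops a missing id, so global scopes have only two words
  [access, type.to_s.downcase, id].compact.map(&:to_s).join(' ')
end

puts scope_string('write', 'Project', 848850)  # prints "write project 848850"
puts scope_string('browse', 'global')          # prints "browse global"
```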
We can easily request write access to the reference object so that our App can start contributing analysis. By default, we ask for write permission and for authentication of a device:
access_map = bs_api.get_access(my_reference_content, 'write')
puts "We get the following access map"
puts access_map
The output will be:
We get the following access map
{"device_code"=>"<my device code>", "user_code"=>"<my user code>", "verification_uri"=>"https://basespace.illumina.com/oauth/device", "verification_with_code_uri"=>"https://basespace.illumina.com/oauth/device?code=<my user code>", "expires_in"=>1800, "interval"=>1}
Next, have the user visit the verification URI to grant us access:
puts "\nPlease visit the uri within 15 seconds and grant access"
puts access_map['verification_with_code_uri']
The output will be:
Please visit the uri within 15 seconds and grant access
https://basespace.illumina.com/oauth/device?code=<my user code>
For this test code, the request is accepted through a web browser; the following snippet opens the verification URI in the default browser:
link = access_map['verification_with_code_uri']
host = RbConfig::CONFIG['host_os']
case host
when /mswin|mingw|cygwin/
  system("start #{link}")
when /darwin/
  system("open #{link}")
when /linux/
  system("xdg-open #{link}")
end
sleep(15)
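The same host-detection case statement appears several times in this document. A minimal sketch of factoring it into a reusable method; browser_command and open_in_browser are hypothetical helpers, not part of Bio::BaseSpace, and the Windows branch (which goes through cmd.exe) is untested:

```ruby
require 'rbconfig'

# Hypothetical helper (not part of Bio::BaseSpace): returns the command, as
# an argument list, that opens +link+ in the default browser on +host_os+.
# Returns nil for unrecognized platforms.
def browser_command(link, host_os = RbConfig::CONFIG['host_os'])
  case host_os
  when /mswin|mingw|cygwin/
    # 'start' is a cmd.exe built-in; the empty string is the window title
    ['cmd', '/c', 'start', '', link]
  when /darwin/
    ['open', link]
  when /linux/
    ['xdg-open', link]
  end
end

# Opens +link+, or asks the user to open it when the platform is unknown.
def open_in_browser(link)
  command = browser_command(link)
  if command
    # On macOS and Linux, passing an argument list means no shell parses the URL
    system(*command)
  else
    puts "Please open #{link} in your browser"
    false
  end
end
```

With these helpers, each case block reduces to open_in_browser(link) followed by the wait.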
Once the user has granted us access to the objects we requested, we can get the BaseSpace access token and start browsing simply by calling update_privileges on the BaseSpaceAPI instance:
code = access_map['device_code']
bs_api.update_privileges(code)
puts "The BaseSpaceAPI instance was updated with write privileges"
The output will be:
The BaseSpaceAPI instance was updated with write privileges
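The fixed sleep(15) gives the user a hard 15-second deadline. The access map also carries "interval" and "expires_in" values, which suggest polling instead: retry the exchange every interval seconds until it succeeds or the device code expires. A minimal sketch; poll_until_granted is a hypothetical helper, and it assumes (not confirmed by this document) that update_privileges raises an exception while the user has not yet approved the request. Positional parameters keep it compatible with Ruby 1.9.3, which lacks keyword arguments:

```ruby
# Hypothetical helper (not part of Bio::BaseSpace): calls the block every
# +interval+ seconds until it completes without raising, giving up (and
# re-raising the last error) once +expires_in+ seconds have elapsed.
def poll_until_granted(interval, expires_in)
  deadline = Time.now + expires_in
  begin
    yield
  rescue StandardError
    # Re-raise if waiting another interval would pass the deadline
    raise if Time.now + interval > deadline
    sleep(interval)
    retry
  end
end

# Usage with the access map from above (the raising behavior of
# update_privileges before approval is an assumption):
#   poll_until_granted(access_map['interval'], access_map['expires_in']) do
#     bs_api.update_privileges(access_map['device_code'])
#   end
```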
For more details on access requests and authentication, and for an example of the web-based case, see the example 1_authentication.rb.
Requesting an access-token for data browsing
Here we demonstrate the basic BaseSpace authentication process. The workflow outlined here consists of:
- Request of access to a specific data-scope
- User approval of access request
- Browsing data
Note: It will be useful if you are logged in to the BaseSpace web-site before launching this example to make the access grant procedure faster.
Again, we will start out by initializing a BaseSpaceAPI object:
require 'basespace'
include Bio::BaseSpace
client_id = 'my client key'
client_secret = 'my client secret'
app_session_id = 'my app session id'
basespace_url = 'https://api.basespace.illumina.com/'
api_version = 'v1pre3'
# First we will initialize a BaseSpace API object using our app information and the appSessionId
bs_api = BaseSpaceAPI.new(client_id, client_secret, basespace_url, api_version, app_session_id)
First, get the verification code and URI for the scope 'browse global':
device_info = bs_api.get_verification_code('browse global')
puts
puts "URL for user to visit and grant access: "
puts device_info['verification_with_code_uri']
At this point, the user must visit the verification URI to grant us access:
## PAUSE HERE
# Have the user visit the verification uri to grant us access
puts "\nPlease visit the uri within 15 seconds and grant access"
puts device_info['verification_with_code_uri']
link = device_info['verification_with_code_uri']
host = RbConfig::CONFIG['host_os']
case host
when /mswin|mingw|cygwin/
  system("start #{link}")
when /darwin/
  system("open #{link}")
when /linux/
  system("xdg-open #{link}")
end
sleep(15)
## PAUSE HERE
The output will be:
URL for user to visit and grant access:
https://basespace.illumina.com/oauth/device?code=<my code>
Please visit the uri within 15 seconds and grant access
https://basespace.illumina.com/oauth/device?code=<my code>
Once the user has granted us access to the objects we requested, we can get the BaseSpace access token and start browsing simply by calling update_privileges on the BaseSpaceAPI instance:
code = device_info['device_code']
bs_api.update_privileges(code)
For reference, the access token obtained this way can be retrieved from the BaseSpaceAPI object:
puts "My Access-token: #{bs_api.get_access_token}"
puts
The output will be:
My Access-token: <my access-token>
At this point we can start using the BaseSpaceAPI instance to browse the available data for the current user; the details of this process are the subject of the next section. Here we end by showing how the API object can be used to list all BaseSpace genome instances:
# We will get all available genomes with our new api!
all_genomes = bs_api.get_available_genomes
puts "Genomes: #{all_genomes}"
The output will be:
Genomes: [Arabidopsis thaliana, Bos Taurus, Escherichia coli, Homo sapiens, Mus musculus, Phix, Rhodobacter sphaeroides, Rattus norvegicus, Saccharomyces cerevisiae, Staphylococcus aureus]
Browsing data with global browse access
This section demonstrates basic browsing of BaseSpace objects once an access token for global browsing has been obtained. We will see how
objects can be retrieved either through the BaseSpaceAPI class or by method calls on related object instances (for example, once we have
a user instance, we can use it to retrieve all projects belonging to that user).
First we will initialize a BaseSpaceAPI using our access-token for global browse:
require 'basespace'
include Bio::BaseSpace
# REST server information and user access_token
client_id = 'my client key'
client_secret = 'my client secret'
access_token = 'your access token'
app_session_id = 'my app session id'
basespace_url = 'https://api.basespace.illumina.com/'
api_version = 'v1pre3'
# First, create a client for making calls for this user session
my_api = BaseSpaceAPI.new(client_id, client_secret, basespace_url, api_version, app_session_id, access_token)
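Hard-coding the client secret and access token in a script makes them easy to leak. A minimal sketch of reading the same six values from environment variables instead; basespace_settings and the BASESPACE_* variable names are hypothetical choices, not something the SDK defines. Missing credentials raise a KeyError, so the script fails early:

```ruby
# Hypothetical helper: reads BaseSpace settings from environment variables
# (the variable names are illustrative, not defined by the SDK).
# Credentials are required; URL and API version fall back to the values
# used in this document.
def basespace_settings(env = ENV)
  {
    client_id:      env.fetch('BASESPACE_CLIENT_KEY'),
    client_secret:  env.fetch('BASESPACE_CLIENT_SECRET'),
    app_session_id: env.fetch('BASESPACE_APP_SESSION_ID'),
    access_token:   env.fetch('BASESPACE_ACCESS_TOKEN'),
    basespace_url:  env.fetch('BASESPACE_URL', 'https://api.basespace.illumina.com/'),
    api_version:    env.fetch('BASESPACE_API_VERSION', 'v1pre3')
  }
end

# Usage (argument order as in the BaseSpaceAPI.new call above):
#   s = basespace_settings
#   my_api = BaseSpaceAPI.new(s[:client_id], s[:client_secret], s[:basespace_url],
#                             s[:api_version], s[:app_session_id], s[:access_token])
```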
Let us start by retrieving a genome object:
# Now grab the genome with id=4
my_genome = my_api.get_genome_by_id('4')
puts "The Genome is #{my_genome}"
puts "We can get more information from the genome object"
puts "Id: #{my_genome.id}"
puts "Href: #{my_genome.href}"
puts "DisplayName: #{my_genome.display_name}"
The output will be:
The Genome is Homo sapiens
We can get more information from the genome object
Id: 4
Href: v1pre3/genomes/4
DisplayName: Homo Sapiens - UCSC (hg19)
Using a comparable method we can get a list of all available genomes:
# Get a list of all genomes
all_genomes = my_api.get_available_genomes
puts "Genomes: #{all_genomes}"
The output will be:
Genomes: [Arabidopsis thaliana, Bos Taurus, Escherichia coli, Homo sapiens, Mus musculus, Phix, Rhodobacter sphaeroides, Rattus norvegicus, Saccharomyces cerevisiae, Staphylococcus aureus]
Now, let us retrieve the User object for the current user and list all projects for this user:
# Take a look at the current user
user = my_api.get_user_by_id('current')
puts "The current user is #{user}"
puts
# Now list the projects for this user
my_projects = my_api.get_project_by_user('current')
puts "The projects for this user are #{my_projects}"
puts
The output will be:
The current user is <user id>: Your Name
The projects for this user are [BaseSpaceDemo - id=2, Cancer Sequencing Demo - id=4, HiSeq 2500 - id=7, ResequencingPhixRun - id=12, TSChIP-Seq - id=14042, BCereusDemoData_Illumina - id=34061]
We can also achieve this by making a call on the user instance. Notice that these calls take, as a parameter, an instance of BaseSpaceAPI with the appropriate
privileges to complete the transaction; this is true for all retrieval method calls made on data objects:
my_projects2 = user.get_projects(my_api)
puts "Projects retrieved from the user instance"
puts my_projects2
The output will be:
Projects retrieved from the user instance
[BaseSpaceDemo - id=2, Cancer Sequencing Demo - id=4, HiSeq 2500 - id=7, ResequencingPhixRun - id=12, TSChIP-Seq - id=14042, BCereusDemoData_Illumina - id=34061]
In the same manner we can get a list of accessible user runs:
runs = user.get_runs(my_api)
puts "Runs retrieved from user instance"
puts runs
The output will be:
Runs retrieved from user instance
[BacillusCereus, Genome-in-a-Day, TSCA_test, 2x151PhiX, TruSeq Amplicon_Cancer Panel, CancerDemo]
Accessing file-trees and querying BAM or VCF files
In this section we demonstrate how to access the samples and analyses of projects and how to work with the file data available for such instances. In addition, we take a look at some of the special querying methods associated with BAM and VCF files.
Again, start out by initializing a BaseSpaceAPI instance and retrieving all projects belonging to the current user:
# First, create a client for making calls for this user session
require 'basespace'
include Bio::BaseSpace
client_id = 'my client key'
client_secret = 'my client secret'
access_token = 'your access token'
app_session_id = 'my app session id'
basespace_url = 'https://api.basespace.illumina.com/'
api_version = 'v1pre3'
my_api = BaseSpaceAPI.new(client_id, client_secret, basespace_url, api_version, app_session_id, access_token)
user = my_api.get_user_by_id('current')
my_projects = my_api.get_project_by_user('current')
# Declared outside the loop below so the values from the last project remain available afterwards
app_results = nil
samples = nil
Now we can list all the analyses and samples for these projects:
# Let's list all the AppResults and samples for these projects
my_projects.each do |single_project|
  puts "# Project: #{single_project}"
  app_results = single_project.get_app_results(my_api)
  puts " The App results for project #{single_project} are"
  puts " #{app_results}"
  samples = single_project.get_samples(my_api)
  puts " The samples for project #{single_project} are"
  puts " #{samples}"
end
The output will be:
# Project: BaseSpaceDemo - id=2
The App results for project BaseSpaceDemo - id=2 are
[Resequencing, Resequencing, Resequencing, Resequencing, Resequencing, Resequencing, Resequencing, Resequencing, Resequencing, Resequencing]
The samples for project BaseSpaceDemo - id=2 are
[BC_1, BC_2, BC_3, BC_4, BC_5, BC_6, BC_7, BC_8, BC_9, BC_10]
# Project: Cancer Sequencing Demo - id=4
The App results for project Cancer Sequencing Demo - id=4 are
[Amplicon, Amplicon]
The samples for project Cancer Sequencing Demo - id=4 are
[L2I]
# Project: HiSeq 2500 - id=7
The App results for project HiSeq 2500 - id=7 are
[Resequencing]
The samples for project HiSeq 2500 - id=7 are
[NA18507]
......
We'll take a closer look at the files belonging to the samples of the last project in the loop above:
samples.each do |sample|
  puts "# Sample: #{sample}"
  files = sample.get_files(my_api)
  puts files
end
The output will be:
# Sample: Bcereus_1
Bcereus-1_S1_L001_R1_001.fastq.gz - id: '14235852', size: '179971155'
Bcereus-1_S1_L001_R2_001.fastq.gz - id: '14235853', size: '193698522'
# Sample: Bcereus_2
Bcereus-2_S2_L001_R1_001.fastq.gz - id: '14235871', size: '126164153'
Bcereus-2_S2_L001_R2_001.fastq.gz - id: '14235872', size: '137077949'
......
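The file names above follow Illumina's FASTQ naming convention, <SampleName>_S<SampleNumber>_L<Lane>_R<Read>_001.fastq.gz, so paired-end reads can be matched by name. A minimal sketch; pair_fastq_names is a hypothetical helper that works on plain strings and only recognizes R1/R2 files (index reads such as I1 are skipped). To feed it from the SDK you would need each file object's name; exposing it as name is an assumption not confirmed here:

```ruby
# Illumina FASTQ naming convention:
#   <SampleName>_S<SampleNumber>_L<Lane>_R<Read>_001.fastq.gz
FASTQ_NAME = /\A(?<sample>.+)_S(?<number>\d+)_L(?<lane>\d{3})_R(?<read>[12])_001\.fastq\.gz\z/

# Hypothetical helper: groups FASTQ file names into read pairs, keyed by
# [sample name, sample number, lane]; names that do not match are skipped.
def pair_fastq_names(names)
  pairs = {}
  names.each do |name|
    m = FASTQ_NAME.match(name)
    next unless m
    key = [m[:sample], m[:number].to_i, m[:lane].to_i]
    (pairs[key] ||= {})[m[:read].to_i] = name
  end
  pairs
end

# Usage (assumes file objects expose their name as +name+):
#   pairs = pair_fastq_names(sample.get_files(my_api).map(&:name))
```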
Now, let us have a look at some of the method calls specific to BAM and VCF files. First, we request read access to project 183184 using the same verification flow as above. We then get a BAM file and retrieve the coverage information available for chromosome 1 between positions 50000 and 60000:
device_info = my_api.get_verification_code('read project 183184')
link = device_info['verification_with_code_uri']
host = RbConfig::CONFIG['host_os']
case host
when /mswin|mingw|cygwin/
  system("start #{link}")
when /darwin/
  system("open #{link}")
when /linux/
  system("xdg-open #{link}")
end
sleep(15)
code = device_info['device_code']
my_api.update_privileges(code)
# Now do some work with files
# we'll grab a BAM by id and get the coverage for an interval + accompanying meta-data
my_bam = my_api.get_file_by_id('44154664')
puts "# BAM: #{my_bam}"
cov = my_bam.get_interval_coverage(my_api, 'chr1', '50000', '60000')
puts cov
= my_bam.(my_api, 'chr1')
puts
The output will be:
# BAM: sorted_S1.bam - id: '44154664', size: '105789387933', status: 'complete'
Chrom chr1: 1-1792, BucketSize=2
CoverageMeta: max=1158602 gran=128
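get_interval_coverage takes the chromosome, start, and end as separate string arguments. If your regions come in the common samtools-style "chrom:start-end" notation, a small parser can produce those arguments. A minimal sketch; parse_region is a hypothetical helper, and only the call in the trailing comment uses the SDK, mirroring the call above:

```ruby
# Hypothetical helper: splits a "chrom:start-end" region (thousands
# separators allowed) into the [chrom, start, end] strings used by
# get_interval_coverage above.
def parse_region(region)
  m = /\A([^:\s]+):(\d+)-(\d+)\z/.match(region.delete(','))
  raise ArgumentError, "invalid region: #{region}" unless m
  chrom, first, last = m[1], m[2], m[3]
  raise ArgumentError, "start after end: #{region}" if first.to_i > last.to_i
  [chrom, first, last]
end

# Usage:
#   cov = my_bam.get_interval_coverage(my_api, *parse_region('chr1:50,000-60,000'))
```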
For VCF-files we can filter variant calls based on chromosome and location as well:
# and a vcf file
my_vcf = my_api.get_file_by_id('44154644')
# Get the variant meta info
= my_vcf.(my_api)
puts
var = my_vcf.filter_variant(my_api, '1', '20000', '30000')
puts var
The output will be:
VariantHeader: SampleCount=1
[Variant - chr2: 10236 id=['.'], Variant - chr2: 10249 id=['.'], ....]
Creating an AppResult and uploading files
In this section we will see how to create a new AppResult object, change the state of the related AppSession, upload result files to it, and retrieve files from it.
First, create a client for making calls for this user session:
my_api = BaseSpaceAPI.new(client_id, client_secret, basespace_url, api_version, app_session_id, access_token)
# Now we'll do some work of our own. First get a project to work on;
# we'll need write permission for the project we are working on,
# meaning we will need to get a new token and instantiate a new BaseSpaceAPI
project = my_api.get_project_by_id('89')
Assuming we have write access for the project, we will list the currently running analyses for the project:
app_res = project.get_app_results(my_api, {}, ['Running'])
puts "\nThe current running AppResults are\n#{app_res}"
The output will be:
The current running AppResults are
[Results for sample 123, Results for sample 124 ...]
To create an AppResult for a project, simply give its name and description:
app_results = project.create_app_result(my_api, "testing", "this is my results", '')
puts "\nSome info about our new app results"
puts app_results
puts app_results.id
puts "\nThe app results also comes with a reference to our AppSession"
my_app_session = app_results.app_session
puts my_app_session
The output will be:
Some info about our new app results
AppResult: testing
153153
The app results also comes with a reference to our AppSession
App session by 152152: Morten Kallberg - Id: <my appSession Id> - status: Running
We can change the status of our AppSession and add a status summary as follows:
my_app_session.set_status(my_api, 'needsattention', "We worked hard, but encountered some trouble.")
puts "\nAfter a change of status of the app sessions we get\n#{my_app_session}"
# we'll set our AppSession back to running so we can do some more work
my_app_session.set_status(my_api, 'running', "Back on track")
The output will be:
After a change of status of the app sessions we get
App session by 152152: Morten Kallberg - Id: <my appSession Id> - status: NeedsAttention
Now we will make another AppResult and try to upload a file to it:
app_results2 = project.create_app_result(my_api, "My second AppResult", "This one I will upload to")
app_results2.upload_file(my_api, '/home/mkallberg/Desktop/testFile2.txt', 'BaseSpaceTestFile.txt', '/mydir/', 'text/plain')
puts "\nMy AppResult number 2\n#{app_results2}"
# let's see if our new file made it
app_result_files = app_results2.get_files(my_api)
puts "\nThese are the files in the appResult"
puts app_result_files
f = app_result_files.last
The output will be:
My AppResult number 2
AppResult: My second AppResult
These are the files in the appResult
[BaseSpaceTestFile.txt]
We can even download our newly uploaded file in the following manner:
f = my_api.get_file_by_id(f.id)
f.download_file(my_api, '/home/mkallberg/Desktop/')
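The Setup section notes that the SDK's multi-part file upload currently only runs on Unix. As a sketch of how a large result file could be cut into parts in pure Ruby, without platform-specific tools, here are two hypothetical helpers, part_ranges and each_part. They do not talk to BaseSpace, and the part size you pass is an assumption to check against the BaseSpace upload documentation:

```ruby
# Hypothetical helper (not part of Bio::BaseSpace): splits +size+ bytes into
# consecutive [offset, length] pairs of at most +part_size+ bytes each.
def part_ranges(size, part_size)
  raise ArgumentError, 'part_size must be positive' unless part_size > 0
  (0...size).step(part_size).map { |offset| [offset, [part_size, size - offset].min] }
end

# Yields the 1-based part number and the bytes of each part of the file at
# +path+, reading them with File.binread (no external tools involved).
def each_part(path, part_size)
  part_ranges(File.size(path), part_size).each_with_index do |(offset, length), index|
    yield index + 1, File.binread(path, length, offset)
  end
end
```

Computing the ranges separately from the reading keeps the arithmetic easy to check and lets each part be read (or retried) independently.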
Cookbook
This section contains useful code-snippets demonstrating use-cases that frequently come up in App development.
Filtering file lists and AppResult lists using query-parameter hashes
Given a sample "a", we can retrieve a subset of the full file list using a hash of query parameters:
>> a.get_files(my_api)
=> [sorted.bam, sorted.bam.bai, genome.vcf]
>> a.get_files(my_api, {'Extensions' => 'bam'})
=> [sorted.bam]
Filter with multiple extensions:
>> a.get_files(my_api, {'Extensions' => 'bam,vcf'})
=> [sorted.bam, genome.vcf]
You can provide any other legal sorting/filtering keyword in this hash to further refine the list:
>> a.get_files(my_api, {'Extensions' => 'bam,vcf', 'SortBy' => 'Path'})
=> [genome.vcf, sorted.bam]
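Server-side filtering is preferable because only the matching entries are transferred. When you already hold a list of path strings, a client-side filter can approximate the 'Extensions' and 'SortBy' => 'Path' behavior. A minimal sketch; filter_by_extensions is hypothetical and its exact matching rule (a path ending in ".<extension>") is an assumption, not the server's documented semantics:

```ruby
# Hypothetical client-side counterpart of the 'Extensions' and
# 'SortBy' => 'Path' query parameters, applied to a list of path strings.
def filter_by_extensions(paths, extensions)
  # "bam,vcf" -> [".bam", ".vcf"]; a leading dot in the input is tolerated
  wanted = extensions.split(',').map { |ext| ".#{ext.strip.sub(/\A\./, '')}" }
  paths.select { |path| wanted.any? { |ext| path.end_with?(ext) } }.sort
end

files = ['sorted.bam', 'sorted.bam.bai', 'genome.vcf']
p filter_by_extensions(files, 'bam,vcf')  # prints ["genome.vcf", "sorted.bam"]
```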
You can supply a hash of query parameters when retrieving a project's AppResults, in the same way you filter file lists. Below is an example of how to limit the number of results from 100 (the default value for 'Limit') to 10:
>> res = project.get_app_results(my_api)
>> res.length
=> 100
>> res = project.get_app_results(my_api, {'Limit' => '10'})
>> res.length
=> 10
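Because 'Limit' caps each response (100 items by default, as shown above), retrieving every AppResult of a large project means requesting successive pages. A minimal sketch, assuming the API also accepts an 'Offset' query parameter (not shown in this document) and that the query-parameter hash is passed through as in the examples above; fetch_all is a hypothetical helper:

```ruby
# Hypothetical helper: collects all items by requesting successive pages of
# +limit+ items. The block receives the query-parameter hash for one page
# and must return that page as an array (nil is treated as empty).
def fetch_all(limit = 100)
  items = []
  offset = 0
  loop do
    page = yield('Offset' => offset.to_s, 'Limit' => limit.to_s) || []
    items.concat(page)
    # A short page means there is nothing left to fetch
    break if page.length < limit
    offset += limit
  end
  items
end

# Usage (assumes 'Offset' is honoured by the API):
#   all_results = fetch_all { |qp| project.get_app_results(my_api, qp) }
```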
Feature Requests and Bugs
Please feel free to report any feedback regarding the BaseSpace Ruby SDK directly via the GitHub repository. We appreciate any and all feedback about the SDK, and we will do what we can to improve its functionality and quality and to make it easy for developers to use.
SDK Development Manual
Building a New Version of the Gem
bundle exec rake gemspec
bundle exec gem build bio-basespace.gemspec
sudo gem install bio-basespace
Unit Testing
First, install the gem as described just above. Then use RSpec for unit testing:
rspec -c -f d
Porting
BaseSpace Ruby SDK was initially ported by translating the BaseSpace Python SDK to Ruby. If it becomes necessary to port further code from the Python SDK, then the following porting guidelines should be observed:
- indentation: Python 4 spaces, Ruby 2 spaces
- compound words: Python ExampleLabel, Ruby example_label
- constructors: Python def __init__(self):, Ruby def initialize
- instance variables: Python self.swaggerTypes = { "Key":"value" }, Ruby @swagger_types = { "Key" => "value" }
- void types: Python None, Ruby nil
- string representation: Python __str__(self), Ruby to_s (return @val.to_s)
- object dump: Python __repr__(self), Ruby to_str (return self.inspect) or self.attributes.inspect for attribute values
- exceptions: Python FooBarException, Ruby FooBarError
- types:
  - Python str, Ruby String
  - Python int, Ruby Integer
  - Python float, Ruby Float
  - Python bool, Ruby true/false
  - Python list<>, Ruby Array
  - Python dict, Ruby Hash
  - Python file, Ruby File
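The compound-word and instance-variable rules above are mechanical, so they can be applied with a small helper when porting. A minimal sketch; to_snake_case and to_ivar_name are hypothetical porting aids, not part of the SDK:

```ruby
# Hypothetical porting aid: converts Python SDK identifiers such as
# "getProjectById" or "swaggerTypes" to Ruby-style snake_case names.
def to_snake_case(name)
  name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2') # "HTTPRequest" -> "HTTP_Request"
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')     # "getProject"  -> "get_Project"
      .downcase
end

# Maps a Python attribute name to the corresponding Ruby instance variable.
def to_ivar_name(name)
  "@#{to_snake_case(name)}"
end

puts to_snake_case('getProjectById')  # prints "get_project_by_id"
puts to_ivar_name('swaggerTypes')     # prints "@swagger_types"
```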
Authors
Joachim Baran, Raoul Bonnal, Eri Kibukawa, Francesco Strozzi, Toshiaki Katayama
Copying / License
See License.txt for details on licensing and distribution.