Biothings APIs: high-performance bioentity-centric web services
-
Upload
chunlei-wu -
Category
Science
-
view
190 -
download
1
Transcript of Biothings APIs: high-performance bioentity-centric web services
Chunlei Wu, [email protected]
@chunleiwu
Associate Professor of Molecular MedicineDept. of Molecular Experimental Medicine
The Scripps Research InstituteLa Jolla, CA, USA
07/2016
Biothings APIs: high-performance bioentity-centric web services
BioThings APIs
Objective:
Building unified APIs for “Bio-things” (biological entities)
Biological knowledge is a complex network
No one-fit-all database can capture the entire knowledge space
Typical database representations
{ _id: 1017, name: CDK2, taxid: 9606}
Relational database
Document database
RDF triplestore
Tables JSON objects Triples
Key-value store
Key-value pairs
BioThings APIs are built on document databases
Why we picked document databases:• Object representation
• Rich data structures, handles heterogeneous data very well
• Atomic operations, built for big-data scale
Gene and Variant annotations represented in JSON documents
{ "_id": "chr1:g.196659237C>T", "cosmic": { "chrom": "1", "hg19": { "start": 196659237, "end": 196659237 }, "ref": "C", "alt": "T", "tumor_site": "breast", "mut_freq": 0.49, "mut_nt": "C>T", "cosmic_id": "COSM424915"}
{ “_id”: “1017”, “Symbol”: “CDK2”, “Ensembl”: “ENSG00000123374”, “RefSeq”: [ “NM_001798”, “NM_052827” ], “Reporter”: { “U95A”: [ “1792_g_at”, “1833_at” ], “U133A”:[ “211804_s_at”, “2045252_at”, “211803_at” ] }}
Keep data always up-to-date
Each data source is updated individually. Colors indicate their different updating
schedules.
Schematic view of MyVariant.info architecture
High-performance web service APIs
Schematic view of MyVariant.info architecture
MyGene.info + MyVariant.infoGene
G
Variant
V
MyVariant.info
MyGene.info
/v2/gene/<geneid>/v2/query?q=<query>
/v1/variant/<hgvsid>/v1/query?q=<query>
/v3/gene/<geneid>/v3/query?q=<query>
single query on GET, batch query on POST
We focus on building APIs. Try to …
Make it really easy to use
Just two endpoints
No registration/sign-in
No API key
Developer-friendly
Python/R clients(also js client for myvariant)
search “mygene” and “myvariant”in PyPI and Bioconductor
JSONPCORShttps
msgpackhttp compression
http cachingJSON-LD
Supported!
Aggregate Everything about gene and variant
MyVariant.info
MyGene.info
Support >15M genes for ~17K species ~ 200 annotation fields
Support > 334 M variants ~ 500 annotation fields
from 14 sources: ClinVar dbNSFP dbSNP …
Keep up-to-date
MyVariant.info
MyGene.info
Weekly Monthly
Support >15M genes for ~17K species ~ 200 annotation fields
Support > 334 M variants ~ 500 annotation fields
from 14 sources: ClinVar dbNSFP dbSNP …
High-performance and scalable
>95% queries response < 30ms
High-performance and scalable
High availability
99.999% over last year
MyVariant.info
MyGene.info
99.87% over last 6
months
Availability tracked by
Who is using
MinePath.org
Gene Wiki
JBrowse
Live applications:
Who is using
Many users use them in their
daily analysis pipelines or
simply caching annotations locally
MyGene.info recent usage statsrequests unique IPs
Jan-16 3,885,192 2,498Feb-16 5,313,950 2,786Mar-16 3,362,354 3,121Apr-16 10,918,104 3,065
May-16 10,776,858 3,803Jun-16 6,396,148 3,940
39%direct calls 38%
mygene.py
14%mygene.R
9%BioGPS
Over 40M requestsIn six months
MyVariant.info recent usage stats
requests unique IPsJan-16 83,519 1,330Feb-16 3,054,191 1,192Mar-16 272,424 1,771Apr-16 701,526 1,500
May-16 89,642 1,891Jun-16 213,767 1,924
21%direct calls 23%
myvariant.py
50%myvariant.R
6%myvariant.js
~4.5M requestsIn six months
Generalized BioThings SDK
BioThings SDK
MyVariant.info
MyGene.info JSON data aggregation mechanism
High-performance query engine
Well-designed REST API pattern
JSON-LD enabled Linked Data
Data-updating schedulerPython/R clients…
BioThings SDK
A tutorial here (more docs are coming):http://biothingsapi.readthedocs.io/en/latest/
v.biothings.io
g.biothings.io
BioThings SDK
gene
variant
s.biothings.io
species/taxonomy
alias to MyGene.info
alias to MyVariant.info
BioThings API for species/taxonomy
{"_id": "9606","_version": 1,"authority": [
"homo sapiens linnaeus, 1758"],"children": [ 63221, 741158],"common_name": "man","genbank_common_name": "human","has_gene": true,"lineage": [ 9606, 9605, 207598, …,131567, 1],"parent_taxid": 9605,"rank": "species","scientific_name": "homo sapiens","taxid": 9606,"uniprot_name": "homo sapiens"
}
http://s.biothings.io/v1/species/9606?include_children=true
BioThings API for species/taxonomy
{ "hits": [ { "_id": "1239", "_score": 10.971453, "common_name": […], "genbank_common_name": "gram-positive bacteria", "has_gene": false, "lineage": [1239, 1783272, 2, 131567, 1], "parent_taxid": 1783272, "rank": "phylum", "scientific_name": "firmicutes", "taxid": 1239, "uniprot_name": "firmicutes" } ], "max_score": 10.971453, "took": 12, "total": 1}
http://s.biothings.io/v1/query?q=rank:phylum AND common_name:gram-positive
Species API used in MyGene.info
You can now query for genes beyond species:
Q: Give me all lytic enzymes for any firmicutes
http://mygene.info/v2/query?q=lytic enzyme&species=1239&include_tax_tree=true
http://mygene.info/v2/query?q=lytic enzyme&species=1239
0 hits
5 hits
Very minimal code for building a species API
Have the flexibility to customize your query
v.biothings.io
g.biothings.io
BioThings SDK
s.biothings.io
c.biothings.io
gene
variant
species/taxonomydrugs/ compounds
∙ ∙ ∙ ∙ ∙ ∙
alias to MyGene.info
alias to MyVariant.info
disease d.biothings.io
BioThings APIs
A collection of data APIs A framework for building new APIs
Data as a service
Software as a serviceGot a new type of “BioThings”?
We can help you to build or even host your biothings API
BioThings TEAM
Funding and SupportU01HG008473U54GM114833
TSRI:
Chunlei WuAndrew SuJiwen XinCyrus AfrasiabiSebastien LelongGinger TsuengJulee AdesaraMike Mayers
U. Washington:
Sean MooneyMoritz JuchlerNikhil Gopal
More @ISMB:
Talk: TT13
Poster: G09
Source code
• MyGene.infohttps://github.com/sulab/mygene.info
• MyVariant.infohttps://github.com/sulab/myvariant.info
• BioThings API for species/taxonomyhttps://github.com/sulab/biothings.species
• BioThings SDKhttps://github.com/sulab/biothings.api