BioThings APIs: Linked High-performance APIs for Biological Entities
Chunlei Wu, [email protected]
@chunleiwu
Associate Professor of Molecular Medicine
Dept. of Molecular Experimental Medicine
The Scripps Research Institute
La Jolla, CA, USA
BD2K-AHM 11/2016
BioThings APIs
Objective:
Building unified APIs for “Bio-Things” (biological entities)
Biological knowledge is a complex network
No one-fits-all database can capture the entire knowledge space
Simplify the knowledge network as entities
Extracting those central hub nodes as flat lists:
Gene
Variant
Pathway
Metabolite
Disease
∙ ∙ ∙ ∙ ∙ ∙
Gene and Variant annotations represented in JSON documents
{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "chrom": "1",
    "hg19": {
      "start": 196659237,
      "end": 196659237
    },
    "ref": "C",
    "alt": "T",
    "tumor_site": "breast",
    "mut_freq": 0.49,
    "mut_nt": "C>T",
    "cosmic_id": "COSM424915"
  }
}
{
  "_id": "1017",
  "Symbol": "CDK2",
  "Ensembl": "ENSG00000123374",
  "RefSeq": [
    "NM_001798",
    "NM_052827"
  ],
  "Reporter": {
    "U95A": [
      "1792_g_at",
      "1833_at"
    ],
    "U133A": [
      "211804_s_at",
      "2045252_at",
      "211803_at"
    ]
  }
}
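Because each annotation is a nested JSON document, a single value can be addressed by a dotted path such as cosmic.hg19.start. The following is a minimal sketch of that idea using the variant document from the slide; the helper name get_field is hypothetical and is not taken from any BioThings client.

```python
# Variant document copied from the slide above.
variant_doc = {
    "_id": "chr1:g.196659237C>T",
    "cosmic": {
        "chrom": "1",
        "hg19": {"start": 196659237, "end": 196659237},
        "ref": "C",
        "alt": "T",
        "tumor_site": "breast",
        "mut_freq": 0.49,
        "mut_nt": "C>T",
        "cosmic_id": "COSM424915",
    },
}


def get_field(doc, dotted_path, default=None):
    """Hypothetical helper: walk a nested dict along a dotted path."""
    node = doc
    for key in dotted_path.split("."):
        # Stop as soon as the path leaves the document.
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


print(get_field(variant_doc, "cosmic.hg19.start"))
print(get_field(variant_doc, "cosmic.cosmic_id"))
```

A missing path returns the default instead of raising, which is convenient when different data sources contribute different fields to each document.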
Keep data always up-to-date
Schematic view of MyVariant.info architecture
Each data source is updated individually. Colors indicate their different updating schedules.
High-performance web service APIs
Schematic view of MyVariant.info architecture
MyGene.info + MyVariant.info
MyGene.info (Gene)
/v3/gene/<geneid>
/v3/query?q=<query>
MyVariant.info (Variant)
/v1/variant/<hgvsid>
/v1/query?q=<query>
single query on GET, batch query on POST
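The two endpoint patterns can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the official client: the base URLs https://mygene.info and https://myvariant.info, the helper names, and the POST form field q holding comma-separated terms are assumptions. Only the paths and the GET/POST split come from the slide. Nothing is sent over the network here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

MYGENE = "https://mygene.info"        # assumed base URL
MYVARIANT = "https://myvariant.info"  # assumed base URL


def annotation_url(base, version, entity, entity_id):
    """Build /<version>/<entity>/<id>, percent-encoding the id."""
    return "%s/%s/%s/%s" % (base, version, entity, quote(str(entity_id), safe=""))


def query_url(base, version, q):
    """Build /<version>/query?q=<query> for a single GET query."""
    return "%s/%s/query?%s" % (base, version, urlencode({"q": q}))


def batch_query_request(base, version, terms):
    """Prepare (but do not send) a batch query as a POST request.

    Assumption: the terms travel as one comma-separated form field "q".
    """
    body = urlencode({"q": ",".join(terms)}).encode("ascii")
    return Request("%s/%s/query" % (base, version), data=body, method="POST")


gene_url = annotation_url(MYGENE, "v3", "gene", 1017)
variant_url = annotation_url(MYVARIANT, "v1", "variant", "chr1:g.196659237C>T")
search_url = query_url(MYGENE, "v3", "cdk2")
batch = batch_query_request(MYGENE, "v3", ["1017", "1018"])

print(gene_url)
print(variant_url)
print(search_url)
print(batch.get_method(), batch.full_url, batch.data)
```

To actually run a request, the URL or the prepared Request object would be passed to urllib.request.urlopen.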
We focus on building APIs. Try to …
Make it really easy to use
Just two endpoints
No registration/sign-in
No API key
Developer-friendly
JSONP
CORS
https
msgpack
http compression
http caching
JSON-LD
Python/R clients (also a JS client for MyVariant)
search “mygene” and “myvariant” in PyPI and Bioconductor
Supported!
Aggregate everything about genes and variants
MyGene.info
Support >17M genes for ~18K species
~200 annotation fields
MyVariant.info
Support >340M variants
~500 annotation fields
from 14 sources: ClinVar, dbNSFP, dbSNP, …
Keep up-to-date
Weekly Monthly
High-performance and scalable
>95% of queries get a response in < 30 ms
Over 100M requests in Nov 2016
High availability
MyVariant.info, MyGene.info
99.999% over last year
99.935% over last year
Availability tracked by
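To put the two availability figures in perspective, they can be converted into downtime per year. A small sketch of the arithmetic, assuming a 365-day year; the function name is illustrative only. 99.999% corresponds to roughly 5.3 minutes of downtime per year, and 99.935% to roughly 5.7 hours.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime implied by an availability percentage."""
    return (100.0 - availability_percent) / 100.0 * period_minutes


for pct in (99.999, 99.935):
    minutes = downtime_minutes(pct)
    print("%.3f%% -> %.1f minutes (%.2f hours) per year" % (pct, minutes, minutes / 60))
```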
Who is using
Live applications:
MinePath.org
Gene Wiki
JBrowse
Who is using
Many users use them in their daily analysis pipelines, or simply to cache annotations locally
Generalized BioThings SDK
BioThings SDK
MyVariant.info
MyGene.info
JSON data aggregation mechanism
High-performance query engine
Well-designed REST API pattern
JSON-LD enabled Linked Data
Data-updating scheduler
Python/R clients
…
BioThings SDK
A tutorial here (more docs are coming): http://biothingsapi.readthedocs.io/en/latest/
BioThings SDK
g.biothings.io: gene (alias to MyGene.info)
v.biothings.io: variant (alias to MyVariant.info)
s.biothings.io: species/taxonomy
c.biothings.io: drugs/compounds
d.biothings.io: disease
…
JSON-LD brings the linkage between BioThings APIs
Apply JSON-LD context
JSON document
{
  "_id": "chr6:g.26093141G>A",
  "clinvar": {
    "gene": {
      "id": "3077",
      "symbol": "HFE"
    }
  },
  "dbsnp": {
    "rsid": "rs1800562"
  },
  "cadd": {
    "genename": "HFE"
  }
}
N-Quads format output
_:b0 <http://schema.myvariant.info/datasource/clinvar> _:b1 .
_:b0 <http://schema.myvariant.info/datasource/dbsnp> _:b3 .
_:b0 <http://schema.myvariant.info/datasource/cadd> _:b4 .
_:b1 <http://schema.myvariant.info/datanode/gene> _:b2 .
_:b2 <http://identifiers.org/hgnc.symbol> "HFE" .
_:b3 <http://identifiers.org/dbsnp/> "rs1800562" .
_:b4 <http://identifiers.org/hgnc.symbol> "HFE" .
JSON-LD Context
{
  "root": {
    "@context": {
      "clinvar": "http://schema.myvariant.info/datasource/clinvar",
      "dbsnp": "http://schema.myvariant.info/datasource/dbsnp",
      "genename": "http://identifiers.org/hgnc.symbol",
      "cadd": "http://schema.myvariant.info/datasource/cadd",
      "rsid": "http://identifiers.org/dbsnp/",
      "gene": "http://schema.myvariant.info/datanode/gene"
    }
  },
  "clinvar/gene": {
    "@context": {
      "symbol": "http://identifiers.org/hgnc.symbol"
    }
  }
}
N-Quads Transformation
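The transformation on this slide can be illustrated with a simplified, hand-rolled walk over the document. This is a sketch under stated assumptions and not a conforming JSON-LD processor: it treats the "root" entry as the top-level context, treats path keys such as "clinvar/gene" as contexts scoped to that nested path, drops keys with no mapping, and numbers blank nodes in the order objects are visited. The function name to_nquads is hypothetical. A real pipeline would more likely rely on a JSON-LD library.

```python
import itertools

# Context and document copied from the slide.
CONTEXT = {
    "root": {"@context": {
        "clinvar": "http://schema.myvariant.info/datasource/clinvar",
        "dbsnp": "http://schema.myvariant.info/datasource/dbsnp",
        "genename": "http://identifiers.org/hgnc.symbol",
        "cadd": "http://schema.myvariant.info/datasource/cadd",
        "rsid": "http://identifiers.org/dbsnp/",
        "gene": "http://schema.myvariant.info/datanode/gene",
    }},
    "clinvar/gene": {"@context": {
        "symbol": "http://identifiers.org/hgnc.symbol",
    }},
}

DOC = {
    "_id": "chr6:g.26093141G>A",
    "clinvar": {"gene": {"id": "3077", "symbol": "HFE"}},
    "dbsnp": {"rsid": "rs1800562"},
    "cadd": {"genename": "HFE"},
}


def to_nquads(doc, context):
    """Hypothetical helper: emit N-Quads-style lines for a nested document."""
    counter = itertools.count()
    lines = []

    def walk(node, path, terms):
        subject = "_:b%d" % next(counter)  # one blank node per JSON object
        for key, value in node.items():
            iri = terms.get(key)
            if iri is None:
                continue  # unmapped keys ("_id", "id") produce no statement
            if isinstance(value, dict):
                child_path = path + [key]
                # Merge in a context scoped to this path, if one is defined.
                scoped = context.get("/".join(child_path), {}).get("@context", {})
                obj = walk(value, child_path, {**terms, **scoped})
            else:
                obj = '"%s"' % value
            lines.append("%s <%s> %s ." % (subject, iri, obj))
        return subject

    walk(doc, [], dict(context["root"]["@context"]))
    return lines


nquads = to_nquads(DOC, CONTEXT)
print("\n".join(sorted(nquads)))
```

The sketch yields the same seven statements as the slide, although the line order may differ. Both clinvar and cadd end up pointing at the shared identifier http://identifiers.org/hgnc.symbol with the value "HFE", which is the linkage the context is meant to expose.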
BioThings TEAM
TSRI:
Chunlei Wu, Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Julee Adesara, Mike Mayers
U. Washington:
Sean Mooney, Moritz Juchler, Nikhil Gopal, Sicheng Song
Funding and Support
U01HG008473
U54GM114833