Biothings APIs: high-performance bioentity-centric web services

Chunlei Wu, Ph.D. (@chunleiwu), Associate Professor of Molecular Medicine, Dept. of Molecular Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA. 07/2016. Biothings APIs: high-performance bioentity-centric web services

Transcript of Biothings APIs: high-performance bioentity-centric web services

Page 1: Biothings APIs: high-performance bioentity-centric web services

Chunlei Wu

@chunleiwu

Associate Professor of Molecular Medicine

Dept. of Molecular Experimental Medicine

The Scripps Research Institute

La Jolla, CA, USA

07/2016

Biothings APIs: high-performance bioentity-centric web services

Page 2: Biothings APIs: high-performance bioentity-centric web services

BioThings APIs

Objective:

Building unified APIs for “Bio-things” (biological entities)

Page 3: Biothings APIs: high-performance bioentity-centric web services

Biological knowledge is a complex network

No one-fit-all database can capture the entire knowledge space

Page 4: Biothings APIs: high-performance bioentity-centric web services

Typical database representations

Example record: { _id: 1017, name: CDK2, taxid: 9606 }

• Relational database: tables

• Document database: JSON objects

• RDF triplestore: triples

• Key-value store: key-value pairs

Page 5: Biothings APIs: high-performance bioentity-centric web services

BioThings APIs are built on document databases

Why we picked document databases:

• Object representation

• Rich data structures that handle heterogeneous data very well

• Atomic operations, built for big-data scale

Page 6: Biothings APIs: high-performance bioentity-centric web services

Gene and Variant annotations represented in JSON documents

Variant annotation (MyVariant.info):

{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "chrom": "1",
    "hg19": { "start": 196659237, "end": 196659237 },
    "ref": "C",
    "alt": "T",
    "tumor_site": "breast",
    "mut_freq": 0.49,
    "mut_nt": "C>T",
    "cosmic_id": "COSM424915"
  }
}

Gene annotation (MyGene.info):

{
  "_id": "1017",
  "Symbol": "CDK2",
  "Ensembl": "ENSG00000123374",
  "RefSeq": [ "NM_001798", "NM_052827" ],
  "Reporter": {
    "U95A": [ "1792_g_at", "1833_at" ],
    "U133A": [ "211804_s_at", "2045252_at", "211803_at" ]
  }
}
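As a minimal sketch of how such a nested document might be consumed, the Python snippet below parses a trimmed copy of the gene document and flattens it into dotted field paths. The `flatten` helper is hypothetical, written only for illustration, and is not part of any BioThings client.

```python
import json

# Trimmed copy of the gene document shown on this slide.
GENE_DOC = json.loads("""
{
  "_id": "1017",
  "Symbol": "CDK2",
  "RefSeq": ["NM_001798", "NM_052827"],
  "Reporter": {"U95A": ["1792_g_at", "1833_at"]}
}
""")


def flatten(doc, prefix=""):
    """Map each leaf value of a nested dict to a dotted field path.

    Hypothetical helper for illustration; lists are kept as leaf values.
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


fields = flatten(GENE_DOC)
for path in sorted(fields):
    print(path, "=", fields[path])
```

Dotted paths such as `Reporter.U95A` make it easy to address one annotation field inside a heterogeneous document without knowing the full structure in advance.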

Page 7: Biothings APIs: high-performance bioentity-centric web services

Keep data always up-to-date

Each data source is updated individually. Colors indicate their different updating schedules.

Schematic view of MyVariant.info architecture

Page 8: Biothings APIs: high-performance bioentity-centric web services

High-performance web service APIs

Schematic view of MyVariant.info architecture

Page 9: Biothings APIs: high-performance bioentity-centric web services

MyGene.info + MyVariant.info

MyGene.info (gene):

/v2/gene/<geneid>

/v2/query?q=<query>

/v3/gene/<geneid>

/v3/query?q=<query>

MyVariant.info (variant):

/v1/variant/<hgvsid>

/v1/query?q=<query>

Single query on GET, batch query on POST
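The endpoint pattern above can be sketched in Python as follows. This only builds URLs and a request body, it does not send anything. The base URLs are taken from the host names and version prefixes on the slide; the `ids` form field used for the batch POST body is an assumption, since the slide does not spell out the POST parameters.

```python
from urllib.parse import quote, urlencode

MYGENE = "http://mygene.info/v2"        # assumed base URL (host + version from the slide)
MYVARIANT = "http://myvariant.info/v1"  # assumed base URL (host + version from the slide)


def annotation_url(base, entity, entity_id):
    """URL for a single-entity GET, e.g. /v2/gene/<geneid>."""
    # Keep ':' readable in HGVS ids; percent-encode everything else unsafe.
    return f"{base}/{entity}/{quote(entity_id, safe=':')}"


def query_url(base, q):
    """URL for a single query GET, e.g. /v2/query?q=<query>."""
    return f"{base}/query?{urlencode({'q': q})}"


def batch_body(ids):
    """Form-encoded body for a batch POST.

    The 'ids' field name is an assumption, not confirmed by the slide.
    """
    return urlencode({"ids": ",".join(ids)})


print(annotation_url(MYGENE, "gene", "1017"))
print(annotation_url(MYVARIANT, "variant", "chr1:g.196659237C>T"))
print(query_url(MYGENE, "symbol:cdk2"))
print(batch_body(["1017", "1018"]))
```

Keeping the URL construction separate from the HTTP call makes the two-endpoint design easy to see: one path for lookups by ID, one for queries.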

Page 10: Biothings APIs: high-performance bioentity-centric web services

We focus on building APIs. Try to …

Page 11: Biothings APIs: high-performance bioentity-centric web services

Make it really easy to use

Just two endpoints

No registration/sign-in

No API key

Page 12: Biothings APIs: high-performance bioentity-centric web services

Developer-friendly

Python/R clients (also a js client for myvariant)

Search "mygene" and "myvariant" in PyPI and Bioconductor

Supported:

• JSONP

• CORS

• https

• msgpack

• http compression

• http caching

• JSON-LD

Page 13: Biothings APIs: high-performance bioentity-centric web services

Aggregate Everything about gene and variant

MyGene.info: supports >15M genes for ~17K species, ~200 annotation fields

MyVariant.info: supports >334M variants, ~500 annotation fields from 14 sources: ClinVar, dbNSFP, dbSNP, …

Page 14: Biothings APIs: high-performance bioentity-centric web services

Keep up-to-date

Update schedules: weekly, monthly

MyGene.info: supports >15M genes for ~17K species, ~200 annotation fields

MyVariant.info: supports >334M variants, ~500 annotation fields from 14 sources: ClinVar, dbNSFP, dbSNP, …

Page 15: Biothings APIs: high-performance bioentity-centric web services

High-performance and scalable

>95% of queries return in under 30 ms

Page 16: Biothings APIs: high-performance bioentity-centric web services

High-performance and scalable

Page 17: Biothings APIs: high-performance bioentity-centric web services

High availability

99.999% over last year

MyVariant.info

MyGene.info

99.87% over last 6 months

Availability tracked by
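To put the availability figures in perspective, the short Python sketch below converts an availability percentage into the downtime it allows. Treating one year as 365 days and six months as 182.5 days is an assumption made here for the arithmetic, not something stated on the slide.

```python
def downtime_minutes(availability_pct, days):
    """Minutes of downtime implied by an availability percentage over a period."""
    return (1 - availability_pct / 100) * days * 24 * 60


# 99.999% over one year (assumed 365 days)
year = downtime_minutes(99.999, 365)

# 99.87% over six months (assumed 182.5 days)
half_year = downtime_minutes(99.87, 182.5)

print(f"{year:.1f} minutes, {half_year / 60:.1f} hours")
```

Under these assumptions, 99.999% over a year corresponds to roughly five minutes of downtime, and 99.87% over six months to a little under six hours.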

Page 18: Biothings APIs: high-performance bioentity-centric web services

Who is using

Live applications:

• MinePath.org

• Gene Wiki

• JBrowse

Page 19: Biothings APIs: high-performance bioentity-centric web services

Who is using

Many users call the APIs from their daily analysis pipelines, or simply use them to cache annotations locally

Page 20: Biothings APIs: high-performance bioentity-centric web services

MyGene.info recent usage stats

Month     Requests      Unique IPs
Jan-16    3,885,192     2,498
Feb-16    5,313,950     2,786
Mar-16    3,362,354     3,121
Apr-16    10,918,104    3,065
May-16    10,776,858    3,803
Jun-16    6,396,148     3,940

Requests by client: 39% direct calls, 38% mygene.py, 14% mygene.R, 9% BioGPS

Over 40M requests in six months
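The "over 40M" headline can be checked directly from the monthly request counts on this slide. The snippet below is a small worked sum in Python; the variable names are chosen here for illustration.

```python
# Monthly MyGene.info request counts as listed on the slide.
MYGENE_REQUESTS = {
    "Jan-16": 3_885_192,
    "Feb-16": 5_313_950,
    "Mar-16": 3_362_354,
    "Apr-16": 10_918_104,
    "May-16": 10_776_858,
    "Jun-16": 6_396_148,
}

total = sum(MYGENE_REQUESTS.values())
busiest = max(MYGENE_REQUESTS, key=MYGENE_REQUESTS.get)

print(f"total requests: {total:,}")
print(f"busiest month: {busiest}")
```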

Page 21: Biothings APIs: high-performance bioentity-centric web services

MyVariant.info recent usage stats

Month     Requests      Unique IPs
Jan-16    83,519        1,330
Feb-16    3,054,191     1,192
Mar-16    272,424       1,771
Apr-16    701,526       1,500
May-16    89,642        1,891
Jun-16    213,767       1,924

Requests by client: 21% direct calls, 23% myvariant.py, 50% myvariant.R, 6% myvariant.js

~4.5M requests in six months

Page 22: Biothings APIs: high-performance bioentity-centric web services

Generalized BioThings SDK

BioThings SDK: components generalized from MyGene.info and MyVariant.info

• JSON data aggregation mechanism

• High-performance query engine

• Well-designed REST API pattern

• JSON-LD enabled Linked Data

• Data-updating scheduler

• Python/R clients

• …

Page 23: Biothings APIs: high-performance bioentity-centric web services

BioThings SDK

A tutorial is available here (more docs are coming): http://biothingsapi.readthedocs.io/en/latest/

Page 24: Biothings APIs: high-performance bioentity-centric web services

BioThings SDK

• g.biothings.io: gene (alias to MyGene.info)

• v.biothings.io: variant (alias to MyVariant.info)

• s.biothings.io: species/taxonomy

Page 25: Biothings APIs: high-performance bioentity-centric web services

BioThings API for species/taxonomy

{
  "_id": "9606",
  "_version": 1,
  "authority": [ "homo sapiens linnaeus, 1758" ],
  "children": [ 63221, 741158 ],
  "common_name": "man",
  "genbank_common_name": "human",
  "has_gene": true,
  "lineage": [ 9606, 9605, 207598, …, 131567, 1 ],
  "parent_taxid": 9605,
  "rank": "species",
  "scientific_name": "homo sapiens",
  "taxid": 9606,
  "uniprot_name": "homo sapiens"
}

http://s.biothings.io/v1/species/9606?include_children=true

Page 26: Biothings APIs: high-performance bioentity-centric web services

BioThings API for species/taxonomy

{
  "hits": [
    {
      "_id": "1239",
      "_score": 10.971453,
      "common_name": […],
      "genbank_common_name": "gram-positive bacteria",
      "has_gene": false,
      "lineage": [1239, 1783272, 2, 131567, 1],
      "parent_taxid": 1783272,
      "rank": "phylum",
      "scientific_name": "firmicutes",
      "taxid": 1239,
      "uniprot_name": "firmicutes"
    }
  ],
  "max_score": 10.971453,
  "took": 12,
  "total": 1
}

http://s.biothings.io/v1/query?q=rank:phylum AND common_name:gram-positive
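The query above contains spaces and colons, so a client would normally URL-encode it before sending. The Python sketch below builds the encoded URL and then reads the fields of interest out of a trimmed copy of the response shown on this slide. The helper names `species_query_url` and `summarize_hits` are hypothetical, and no network request is made.

```python
import json
from urllib.parse import urlencode

BASE = "http://s.biothings.io/v1/query"  # endpoint as shown on the slide


def species_query_url(**field_terms):
    """Join field:term pairs with AND and URL-encode them (hypothetical helper)."""
    q = " AND ".join(f"{field}:{term}" for field, term in field_terms.items())
    return f"{BASE}?{urlencode({'q': q})}"


# Trimmed copy of the response shown on the slide.
RESPONSE = json.loads("""
{
  "hits": [
    {"_id": "1239", "_score": 10.971453, "rank": "phylum",
     "scientific_name": "firmicutes", "taxid": 1239,
     "lineage": [1239, 1783272, 2, 131567, 1]}
  ],
  "max_score": 10.971453, "took": 12, "total": 1
}
""")


def summarize_hits(response):
    """Return (taxid, scientific_name, rank) for each hit (hypothetical helper)."""
    return [(h["taxid"], h["scientific_name"], h["rank"]) for h in response["hits"]]


url = species_query_url(rank="phylum", common_name="gram-positive")
summary = summarize_hits(RESPONSE)
print(url)
print(summary)
```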

Page 27: Biothings APIs: high-performance bioentity-centric web services

Species API used in MyGene.info

You can now query for genes above the species level:

Q: Give me all lytic enzymes for any firmicutes

http://mygene.info/v2/query?q=lytic enzyme&species=1239

0 hits

http://mygene.info/v2/query?q=lytic enzyme&species=1239&include_tax_tree=true

5 hits
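One plausible reading of `include_tax_tree` is that a gene matches when the requested taxid appears anywhere in the lineage of the gene's own species, rather than only when the taxids are equal. The toy Python sketch below illustrates that idea on in-memory data. It is an assumption about the behaviour, not a description of how MyGene.info implements it; the species taxids 900001 to 900003, the phylum taxid 900100, and the gene names are all made up, while the firmicutes lineage is the one shown on the species API slide.

```python
FIRMICUTES_LINEAGE = [1239, 1783272, 2, 131567, 1]  # from the species API slide

# Hypothetical lineages: species taxid -> [self, ancestors...]. Taxids are made up.
LINEAGE = {
    900001: [900001] + FIRMICUTES_LINEAGE,
    900002: [900002] + FIRMICUTES_LINEAGE,
    900003: [900003, 900100, 2, 131567, 1],  # made-up species outside firmicutes
}

# Hypothetical gene records.
GENES = [
    {"name": "lytic enzyme A", "taxid": 900001},
    {"name": "lytic enzyme B", "taxid": 900002},
    {"name": "lytic enzyme C", "taxid": 900003},
    {"name": "kinase D", "taxid": 900001},
]


def search(genes, text, species, include_tax_tree=False):
    """Filter genes by name text and by species taxid.

    With include_tax_tree=True, a gene matches if `species` is anywhere in
    the lineage of the gene's own taxid (assumed semantics).
    """
    hits = []
    for gene in genes:
        if text not in gene["name"]:
            continue
        if include_tax_tree:
            matches = species in LINEAGE[gene["taxid"]]
        else:
            matches = gene["taxid"] == species
        if matches:
            hits.append(gene)
    return hits


exact = search(GENES, "lytic enzyme", 1239)
expanded = search(GENES, "lytic enzyme", 1239, include_tax_tree=True)
print(len(exact), len(expanded))
```

In this toy data no gene is annotated directly to the phylum taxid, so the exact-match search finds nothing while the lineage-aware search finds the genes of the species below it.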

Page 28: Biothings APIs: high-performance bioentity-centric web services

Very minimal code for building a species API

Page 29: Biothings APIs: high-performance bioentity-centric web services

Have the flexibility to customize your query

Page 30: Biothings APIs: high-performance bioentity-centric web services

BioThings SDK

• g.biothings.io: gene (alias to MyGene.info)

• v.biothings.io: variant (alias to MyVariant.info)

• s.biothings.io: species/taxonomy

• c.biothings.io: drugs/compounds

• d.biothings.io: disease

• ∙ ∙ ∙

Page 31: Biothings APIs: high-performance bioentity-centric web services

BioThings APIs

• A collection of data APIs: data as a service

• A framework for building new APIs: software as a service

Got a new type of "BioThings"? We can help you build or even host your BioThings API.

Page 32: Biothings APIs: high-performance bioentity-centric web services

BioThings TEAM

Funding and Support: U01HG008473, U54GM114833

TSRI: Chunlei Wu, Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Sebastien Lelong, Ginger Tsueng, Julee Adesara, Mike Mayers

U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal

More @ISMB:

• Talk: TT13

• Poster: G09

Page 33: Biothings APIs: high-performance bioentity-centric web services

Source code

• MyGene.info: https://github.com/sulab/mygene.info

• MyVariant.info: https://github.com/sulab/myvariant.info

• BioThings API for species/taxonomy: https://github.com/sulab/biothings.species

• BioThings SDK: https://github.com/sulab/biothings.api