20120220 Tri-Con Cloud Computing Symposium

BioGPS and mygene.info: Consuming and Providing Cloud Computing Resources Molecular Med Tri-Con February 20, 2012 Andrew Su, Ph.D. http://sulab.org @andrewsu +Andrew Su [email protected]

Transcript of 20120220 Tri-Con Cloud Computing Symposium

Page 1: 20120220 Tri-Con Cloud Computing Symposium

BioGPS and mygene.info: Consuming and Providing Cloud Computing Resources

Molecular Med Tri-Con, February 20, 2012

Andrew Su, Ph.D.

http://sulab.org

@andrewsu

+Andrew Su

[email protected]

Page 2: 20120220 Tri-Con Cloud Computing Symposium

High-throughput molecular profiling is powerful

Gene/protein list

m/z

Testable hypothesis

Page 3: 20120220 Tri-Con Cloud Computing Symposium

20 million papers

900,000 new papers / year

Page 4: 20120220 Tri-Con Cloud Computing Symposium

Gene databases are numerous and overlapping

… and hundreds more …

Page 5: 20120220 Tri-Con Cloud Computing Symposium

http://biogps.org

Community extensibility and user customizability

Page 6: 20120220 Tri-Con Cloud Computing Symposium

Crowdsourcing depends on positive feedback

[Diagram: feedback cycle linking Utility, Users, and Contributors]

Page 7: 20120220 Tri-Con Cloud Computing Symposium

Utility: A simple and universal plugin interface

Page 12: 20120220 Tri-Con Cloud Computing Symposium

Utility: A simple and universal plugin interface

Total of 389 gene-centric online databases registered as BioGPS plugins

Page 13: 20120220 Tri-Con Cloud Computing Symposium

Users: BioGPS has critical mass

• > 4100 registered users
• 4000 unique visitors per week
• 40,000 page views per week

Top 10 organizations

1. Harvard
2. NIH
3. UCSD
4. Scripps
5. MIT
6. Cambridge
7. U Penn
8. Stanford
9. Wash U
10. UNC

[Chart: daily pageviews]

Page 14: 20120220 Tri-Con Cloud Computing Symposium

Contributors: Explicit and implicit knowledge

389 plugins registered (65% publicly shared) by over 75 users, spanning 150+ domains

Page 15: 20120220 Tri-Con Cloud Computing Symposium

BioGPS architecture

http://mygene.info

Page 16: 20120220 Tri-Con Cloud Computing Symposium

mygene.info architecture

http://mygene.info

NGINX

Page 17: 20120220 Tri-Con Cloud Computing Symposium

BioGPS as a cloud computing consumer

EC2 Micro

EC2 Small

NGINX

EC2 Micro

EC2 Micro

Total monthly cost: ~$100

Page 18: 20120220 Tri-Con Cloud Computing Symposium

BioGPS as a cloud computing provider

Use case: Create web application to display custom Affymetrix data

“204252_at”

“CDK2”

[Chart: 204252_at; axes: Data set samples, Expression]

Gene Annotation as a Service (GAaaS)

Users

Developers

Page 19: 20120220 Tri-Con Cloud Computing Symposium

Gene query web service

http://mygene.info/query?q=cdk2

http://mygene.info/query?q=cdk?

http://mygene.info/query?q=GO:0000307

http://mygene.info/query?q=P24941

http://mygene.info/query?q=204252_at
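The query URLs on this slide all follow one pattern: a single `q` parameter appended to the `/query` endpoint. As a minimal sketch, assuming the endpoint still looks as it did on the 2012 slide (the live service may have changed since), a client could build these URLs as follows. The function name `build_query_url` and the constant names are hypothetical, chosen for illustration. The terms are copied from the slide and appear to be a gene symbol, a wildcard pattern, a GO identifier, a UniProt accession, and an Affymetrix probe set ID.

```python
from urllib.parse import urlencode

# Endpoint as printed on the slide (2012); treat it as an assumption today.
QUERY_ENDPOINT = "http://mygene.info/query"

# Example terms copied from the slide.
EXAMPLE_TERMS = ["cdk2", "cdk?", "GO:0000307", "P24941", "204252_at"]


def build_query_url(term: str) -> str:
    """Return a query URL for one search term (hypothetical helper).

    urlencode percent-encodes reserved characters such as ':' and '?',
    so the result is safe to send even when the term contains them.
    """
    return f"{QUERY_ENDPOINT}?{urlencode({'q': term})}"


if __name__ == "__main__":
    for term in EXAMPLE_TERMS:
        print(build_query_url(term))
```

The slide shows the terms unencoded, which is how one would type them into a browser; the sketch encodes them because a program cannot rely on the browser to do so.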

Page 20: 20120220 Tri-Con Cloud Computing Symposium

Gene annotation web service

http://mygene.info/query?q=cdk*

http://mygene.info/gene/1017
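The annotation service is addressed by gene ID in the URL path rather than by a query parameter. The sketch below shows one way a web application might wrap it, assuming the `/gene/<id>` pattern from the slide and assuming the response body is JSON. Neither the helper names nor the response structure come from the slides.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint as printed on the slide (2012); treat it as an assumption today.
GENE_ENDPOINT = "http://mygene.info/gene"


def build_gene_url(gene_id) -> str:
    """Return the annotation URL for one gene ID (hypothetical helper)."""
    return f"{GENE_ENDPOINT}/{quote(str(gene_id), safe='')}"


def fetch_annotation(gene_id, timeout: float = 10.0):
    """Fetch and decode the annotation for one gene.

    Assumes the service answers with a JSON document; the fields it
    contains are not specified on the slide, so none are accessed here.
    """
    with urlopen(build_gene_url(gene_id), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Build the URL for the gene ID used on the slide without contacting the server.
    print(build_gene_url(1017))
```

Keeping URL construction separate from the network call lets the application test the former offline and swap the endpoint in one place if the service moves.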

Page 21: 20120220 Tri-Con Cloud Computing Symposium

Optimized for performance in web apps

[Plot: Time (s) versus # of query terms; x-axis ticks from 10 to 100000, y-axis ticks from 0.01 to 10]

More documentation (paging, sorting, filtering, etc.) plus code snippets at http://mygene.info.

Page 22: 20120220 Tri-Con Cloud Computing Symposium

The future of BioGPS

Third party content providers

Page 23: 20120220 Tri-Con Cloud Computing Symposium

The future of BioGPS

Third party content providers

Semantic interpretation, change detection, etc.

Page 24: 20120220 Tri-Con Cloud Computing Symposium

Group members

Erik Clarke
Ben Good
Salvatore Loguercio
Ian Macleod
Chunlei Wu

Funding and Support

(BioGPS: GM83924, Gene Wiki: GM089820)

Contact

http://sulab.org

[email protected]

@andrewsu

+Andrew Su