Cyberinfrastructure Day 2010: Applications in Biocomputing
Jeremy Yang, Software Systems Manager
Division of Biocomputing, Dept. of Biochemistry & Molecular Biology, UNM School of Medicine
Cyberinfrastructure Day -- April 22, 2010
I. What is Biocomputing?
II. Cyber Revolution (~1980-2010+)
III. Cyberinfrastructure (To be or not to be?)
IV. Super Computing, Redefined
Division of Biocomputing http://biocomp.health.unm.edu/
Department of Biochemistry & Molecular Biology School of Medicine
Also affiliated with the NIH Roadmap-funded UNM Center for Molecular Discovery
Biomolecular screening informatics
Cheminformatics
Bioinformatics
Genomics
Virtual screening
Molecular modeling
SAR (Structure-Activity Relationship)
Data mining, machine learning
3D visualization
Public data integration
Collaborations in chemistry, biology, medicine, computer science
BIOMED 505 course
Software development, management, deployment & support
Larry Sklar, et al., UNMCMD (NIH Roadmap)
~$20M NIH awarded to date
32-CPU Linux cluster
32 GB RAM server
Linux: OpenSUSE, CentOS, Red Hat, Fedora, Ubuntu
SGI/IRIX
Windows, Mac OS X
Automated integration with NIH databases
2+ Oracle instances
PostgreSQL, MySQL
Stereo graphics workstation
25+ scientific software packages
Supported in-house applications
We are cyberinfrastructure users and providers!
Virtual chemistry: property prediction, chemical space navigation, computer-aided molecular design, graph theory, databases
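The "graph theory" item refers to treating a molecule as a graph, with atoms as vertices and bonds as edges. As a minimal sketch of what that buys, assuming molecules are entered by hand as hypothetical heavy-atom bond lists (not any toolkit's file format), the number of independent rings (the size of the smallest set of smallest rings) follows from the cyclomatic number E - V + C, where C is the number of connected components:

```python
from collections import defaultdict


def ring_count(num_atoms, bonds):
    """Return the number of independent rings (cyclomatic number E - V + C).

    num_atoms: atoms are numbered 0..num_atoms-1 (hydrogens omitted).
    bonds: list of (i, j) atom-index pairs; bond order is ignored.
    """
    graph = defaultdict(list)
    for i, j in bonds:
        graph[i].append(j)
        graph[j].append(i)

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in range(num_atoms):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            atom = stack.pop()
            for nbr in graph[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)

    return len(bonds) - num_atoms + components


# Hypothetical hand-entered heavy-atom graphs, for illustration only.
benzene = (6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
naphthalene = (10, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                    (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)])
ethanol = (3, [(0, 1), (1, 2)])

print(ring_count(*benzene))      # 1
print(ring_count(*naphthalene))  # 2
print(ring_count(*ethanol))      # 0
```

A real cheminformatics toolkit would parse SMILES or molfiles instead of hand-built lists; the ring-count formula itself does not depend on the input format.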
Nucleotide and protein sequence analysis
Genomics, proteomics
Merging with chemical biology, etc.
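Two of the most basic nucleotide sequence operations are taking the reverse complement of a DNA strand and measuring its GC content. A minimal sketch in plain Python, assuming DNA input over A/C/G/T only; IUPAC ambiguity codes such as N are not complemented:

```python
# Watson-Crick pairing for DNA; characters outside A/C/G/T are left uncomplemented.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


print(reverse_complement("ATGCGT"))  # ACGCAT
print(gc_content("ATGCGT"))          # 0.5
```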
Computational search for likely biological actives
Database may contain real or virtual compounds
2D and 3D methods:
2D similarity search
3D similarity search (shape, pharmacophore)
Docking (3D, protein binding site)
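2D similarity search commonly compares binary structural fingerprints with the Tanimoto (Jaccard) coefficient, |A ∩ B| / |A ∪ B|. A minimal sketch, assuming fingerprints have already been computed and are stored as sets of "on" bit positions; the compound IDs and bit sets below are hypothetical placeholders, not real fingerprints:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return len(fp_a & fp_b) / union


def similarity_search(query_fp, database, threshold=0.7):
    """Return (compound_id, score) pairs with score >= threshold, best hits first."""
    hits = []
    for compound_id, fp in database.items():
        score = tanimoto(query_fp, fp)
        if score >= threshold:
            hits.append((compound_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical fingerprints (sets of on-bit positions), for illustration only.
database = {
    "cpd-1": {1, 4, 9, 16, 25},
    "cpd-2": {1, 4, 9, 16, 36},
    "cpd-3": {2, 3, 5, 7, 11},
}
query = {1, 4, 9, 16, 25}

print(similarity_search(query, database, threshold=0.5))
# [('cpd-1', 1.0), ('cpd-2', 0.6666666666666666)]
```

The linear scan is the simplest design; it is shown only to make the scoring concrete, and says nothing about how large screening databases are indexed in practice.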
Example: 3D shape search, Prozac & Paxil (c/o OpenEye ROCS)
atoms, bonds, surfaces, fields, interactions, stereo
serotonin
hemoglobin
Computational models for protein-ligand binding
Interaction potential sites: hydrophobic (green), hydrophilic (purple), H-bond acceptors (red)
Gleevec is a leukemia drug known to bind to Abl kinase.
Abl kinase (1iep.pdb)
Gleevec in binding site
PyMOL movie: http://video.google.com/videoplay?docid=-5859274887925224981#
Jmol interactive DNA modeling demo: http://chemapps.stolaf.edu/pe/protexpl/htm/top.htm?id=1d66&&&chpa=true
Expert users can advance understanding via rich, dynamic, visual interfaces.
(Watch movies...)
E.g., Searching NIH PubChem for non-selectivity
Many biomedical data sources worldwide
Division of Biocomputing in 2008
Rapid change, challenge and opportunity
Learning from history and trends (new is not enough)
Winners and losers
Science and experts have both led and followed.
~1980-2010 covers 3σ (99.7%)
And evolution...
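The "3σ (99.7%)" figure is the three-sigma rule for a normal distribution: about 99.7% of values fall within three standard deviations of the mean. Reading 1980-2010 as roughly ±3σ around a midpoint is one interpretation of the slide, not something it spells out; the sketch below only checks the percentages with Python's standard `math.erf`:

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```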
1977: Atari 2600
1978: Space Invaders
1981: IBM PC (MS-DOS)
1983: cellphone
1983: GNU Project
1984: Neuromancer, William Gibson, “cyberspace”
1984: Apple Mac, mouse, windows & icons
1985: Oracle 5 (client-server)
1989: Intel 486 (~1M transistors, up to 50 MHz)
1990: MS Windows 3.0
1990: WWW (Berners-Lee)
1991: High Performance Computing and Communication Act (Al Gore)
1991: Linux (Linus Torvalds)
1991: AOL
1991: ETrade
1993: Jurassic Park (via SGI)
1993: NCSA Mosaic
1994: Netscape Navigator
1994: “Good Times” hoax
1994: Match.com
1995: “Concept” virus (Word)
1995: Internet Explorer
1995: Apache project
1995: Yahoo!
1995: Amazon.com
1995: My mother gets email
1997: Google
1997: eBay
1999: Melissa virus (Outlook)
1999: Napster (p2p)
2000: MS loses antitrust case
2000: 3M USA broadband*
2000: dot-com bubble pops
*Fixed, non-dial-up internet connections faster than 56 kbps (FCC).
2000: 802.11b wireless
2001: Apple iPod
2001: Apple iTunes
2001: Wikipedia
2003: Skype
2005: YouTube
2005: Rio power grid hacked
2005: NSA domestic surveillance
2006: Facebook
2006: Amazon Cloud
2007: DOD hacked
2008: 70M USA broadband*
2009: Cyberdefense USA priority
2009: Twitter role in Iran election protests
2010: UAVs are SOPs
2011: Cyber terrorism?
The dotted line keeps moving...
Case study: database cheminformatics in pharma research, 1990→2000.
In 1990, high-speed chemical searching was beyond standard IT capabilities. Research groups managed local servers in their labs and specialized database engines (e.g., Daylight Inc.). By 2000, this function had moved into corporate IT informatics infrastructure (via Oracle cartridges, etc.). The transition was not smooth, but it was very beneficial.
imidazoles
cocaine
Standard chemical searching functions: substructure, similarity, identity
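Substructure search asks whether a query fragment's graph embeds in a target molecule's graph, a subgraph matching problem. The sketch below is a naive backtracking atom-by-atom matcher, assuming hypothetical hand-built graphs with element symbols on atoms and bond orders ignored; production engines such as the Daylight and Oracle-cartridge systems mentioned above typically add fingerprint screens and far smarter matching, which this does not attempt:

```python
def molecule(atoms, bond_list):
    """Build a (atoms, bonds) graph; bonds are stored as unordered index pairs."""
    return atoms, {frozenset(pair) for pair in bond_list}


def substructure_match(query, target):
    """Return True if the query graph embeds in the target graph (subgraph monomorphism)."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target

    def consistent(q_idx, t_idx, mapping):
        # Element symbols must agree.
        if q_atoms[q_idx] != t_atoms[t_idx]:
            return False
        # Every query bond to an already-mapped atom must also exist in the target.
        for q_prev, t_prev in mapping.items():
            if (frozenset((q_idx, q_prev)) in q_bonds
                    and frozenset((t_idx, t_prev)) not in t_bonds):
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return True
        q_idx = len(mapping)  # map query atoms in index order
        used = set(mapping.values())
        for t_idx in range(len(t_atoms)):
            if t_idx not in used and consistent(q_idx, t_idx, mapping):
                mapping[q_idx] = t_idx
                if extend(mapping):
                    return True
                del mapping[q_idx]  # backtrack
        return False

    return extend({})


# Hypothetical heavy-atom graphs, for illustration only.
ethanol = molecule(["C", "C", "O"], [(0, 1), (1, 2)])
c_o_fragment = molecule(["C", "O"], [(0, 1)])
c_n_fragment = molecule(["C", "N"], [(0, 1)])

print(substructure_match(c_o_fragment, ethanol))  # True
print(substructure_match(c_n_fragment, ethanol))  # False
```

Because only query bonds are checked, the target may have extra bonds among matched atoms, which is the usual convention for substructure queries.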
(1) office equipment
(2) lab equipment
(3) experimental apparatus
(4) the experiment
(5) a commodity
(6) custom-configured experimental vehicle for exploration
(7) all of the above
Scientific software
Computational science
Commodity software
Engineering enables science
Science requires agile development, high performance, experimentation, risk taking, play.
Cyberinfrastructure users and developers/maintainers
Scientific research
Computational research
High performance computing as a research tool
High performance infrastructure as a productivity tool
Scientific software for experts
Enabling software for scientists
Commoditization (e.g. cloud computing)
Plumbing vs. experimental apparatus
Appropriate tiers and domains
IT: “Poorly managed computers and needy ill-trained users put the system at risk.”
Research: “We need power, flexibility and access and not another lame PC.”
And with other cyberfolks too. And with great results.
In ~5 years, super → un-super
Super computing? Define computer.
Advances from unexpected places:
gaming, movies (graphics, vs. AI)
social networking (crowdsourcing)
even business (web standards, UIs, security)
Super computing is pushing the current limits.
But where are the key frontiers?
Advances from unexpected places...
Colossus code-breaking computer, UK.
ENIAC computer, University of Pennsylvania.
Cray computer
High performance (super) computing is pushing the current limits.
This is what a “computer” looks like.
“The network is the computer.” - John Gage (Sun, NetDay founder)
Corollaries:
The network is the (semantic) database.
The network is cyberspace.
The network is us too.
Super users → super computing
Black-box AI/monolith paradigm is limiting
Human/computer co-evolution
Cytoscape biological network visualizer with drug-target interactions
“Super Computers” @ Division of Biocomputing:
Tudor Oprea, Cristian Bologa, Stephen Mathias, Oleg Ursu, Jerome Abear, Ramona Curpan, Liliana Halip, Andrei Leitao
Jeremy Yang
Happy Earth Day!