Post on 15-Jan-2016
North Carolina Bioinformatics Grid
Thom H. Dunning, Jr.
HPCC Division, MCNC
Chemistry, University of North Carolina
Genomics: A Compute- & Data-Intensive Science
* from TimeLogic
Data Explosion: Rapid Growth of GenBank
[Figure: Growth of GenBank. Vertical axis: No. Gbases (0 to 20); horizontal axis: year (1982 to 2002).]
• Number of base pairs increasing dramatically (exponentially)
• Growth in 2002 due to additions in just 21 days!
Data Explosion: Number and Diversity of Databases
Nucleic Acids Research, 2002, Vol. 30, No. 1
Table 1. Molecular Biology Database Collection
Major Public Sequence Repositories
DNA Data Bank of Japan (DDBJ) http://www.ddbj.nig.ac.jp All known nucleotide and protein sequences…
Varied Biomedical Content
…
VirOligo http://viroligo.okstate.edu Virus-specific oligonucleotides for PCR and…
333 Databases
Computing Explosion: Assembly and Analysis of Genomic Data
Celera Genomics: Assembling the Genome
• Compaq Alpha Clusters
• Number of processors: ~750
• Peak performance: 1 teraops
NuTech Sciences: Mining the Genome
• IBM p640 System
• Number of processors: ~5,000
• Peak performance: 7½ teraops
• Total memory: 2½ terabytes
• Total disk storage: 50 terabytes
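The peak figures on this slide can be put on a per-processor footing with a little arithmetic. The sketch below is illustrative only: the `Cluster` class and its method name are hypothetical, the processor counts are the approximate values quoted on the slide, and a decimal conversion (1 teraops = 1,000 gigaops, 1 terabyte = 1,000 gigabytes) is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical record holding the figures quoted on the slide."""
    name: str
    processors: int       # approximate processor count ("~" on the slide)
    peak_teraops: float   # quoted peak performance in teraops

    def peak_gigaops_per_processor(self) -> float:
        # Assumes 1 teraops = 1,000 gigaops.
        return self.peak_teraops * 1000.0 / self.processors


celera = Cluster("Celera Genomics (Compaq Alpha clusters)", 750, 1.0)
nutech = Cluster("NuTech Sciences (IBM p640 system)", 5000, 7.5)

for cluster in (celera, nutech):
    rate = cluster.peak_gigaops_per_processor()
    print(f"{cluster.name}: ~{rate:.2f} peak gigaops per processor")

# Memory per processor for the NuTech system, assuming 1 TB = 1,000 GB.
nutech_memory_gb_per_processor = 2.5 * 1000.0 / nutech.processors
print(f"NuTech memory: ~{nutech_memory_gb_per_processor:.1f} GB per processor")
```

Under these assumptions the two systems come out at roughly 1.33 and 1.50 peak gigaops per processor, so the larger aggregate figure for the NuTech system reflects mainly its processor count rather than much faster individual processors.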
Genomics: Meeting the Information Challenge
Grid Middleware
Data Storage
Computers
Network
North Carolina Supercomputing Center
North Carolina Research and Education Network
[Map: NCREN sites: Greensboro, Charlotte, Pembroke, Winston-Salem, NCSU, NCSU Centennial Campus, NCCU, Duke, UNC-CH, Wilmington, Elizabeth City, Asheville, Cullowhee, Fayetteville, Greenville, RTP, MCNC, Boone, Morehead City, Rocky Mount. Other labels: Qwest, RTP RPoP.]
NCREN3
• Increased bandwidth
• Increased reliability
• Increased resiliency
Grid Technologies
Major New Computing Technology
• Under development since mid-1990s
Distinguishing Characteristics
• “Middleware” to support efficient resource sharing in a distributed, heterogeneous computing and data storage environment
• Focus on use of large-scale computing and data storage
Some Major Grid Efforts
• NASA IPG: Testbed linking selected NASA centers
• DataGrid: International Grid being developed for high-energy physics (CERN)
Grid Technologies (cont’d)
Some Major Grid Efforts (cont’d)
• GriPhyN: Research in Grid technologies for physics applications (Argonne, Florida)
• e-Science Grid: Major effort in UK to develop a Grid infrastructure for science and engineering research
• BIRN: Data Grid focused on neuroimaging data (UCSD, SDSC)
North Carolina Genomics and Bioinformatics Consortium
Goal
• Provide a venue for Consortium members to share information and resources, plan strategic initiatives, and form alliances
Distributed Across North Carolina
• Concentration in Research Triangle, but extends across all of North Carolina
Diverse Goals and Expertise
• Human health, including animal models; agriculture and forestry; evolutionary biology basic research; tool development
Overall NC BioGrid Architecture
[Diagram: layered architecture]
• Computing and Data Resources: NCSC plus Member’s Computing Centers
• Network: NCREN3
• Grid Middleware: Globus, Legion, …
• Grid-aware, -enabled bioinformatics applications: BioApp#1, BioApp#2, BioApp#3, …
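The layering on this slide can be written down as a small data structure. This is a minimal sketch, not part of the NC BioGrid design itself: it assumes the layers stack bottom-up in the order listed on the slide (resources, network, middleware, applications), and the `Layer` class, `NC_BIOGRID_STACK`, and `layers_below` are hypothetical names introduced for illustration. The elided entries ("…") on the slide are left out rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    """One layer of the architecture diagram (hypothetical type)."""
    name: str
    examples: Tuple[str, ...]  # only the entries named on the slide


# Assumed ordering: bottom of the stack first, applications last.
NC_BIOGRID_STACK: List[Layer] = [
    Layer("Computing and Data Resources", ("NCSC", "member computing centers")),
    Layer("Network", ("NCREN3",)),
    Layer("Grid Middleware", ("Globus", "Legion")),
    Layer("Bioinformatics Applications", ("BioApp#1", "BioApp#2", "BioApp#3")),
]


def layers_below(stack: List[Layer], name: str) -> List[str]:
    """Return the names of the layers that the named layer sits on."""
    names = [layer.name for layer in stack]
    return names[: names.index(name)]


print(layers_below(NC_BIOGRID_STACK, "Grid Middleware"))
# → ['Computing and Data Resources', 'Network']
```

Read this way, the applications depend only on the middleware interface, which is what lets them be "Grid-aware" without being tied to a particular computing center.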
NC BioGrid Project
Two Phases
• Testbed Phase: test existing middleware, resolve issues, prepare detailed plan (12-18 months)
• Production Phase: create and operate NC BioGrid
• Funding for Testbed from MCNC
Project Manager
• Phil Emer, MCNC, Chief Architect/NC BioGrid
Project Oversight
• MCNC Board of Directors
• HPCC Advisory Board
• NC BioGrid Technical Advisory Group