1
NSF CC-NIE Integration: Bridging, Transferring and Analyzing Big Data over 10Gbps Campus-Wide Software Defined Networks
BIC-LSU
(Big Data Research Integration with Cyberinfrastructure for LSU)
Seung-Jong (Jay) Park
Associate Professor, Computer Science
Center for Computation & Technology, Louisiana State University
2
Big Data Research at LSU
• Biology & Veterinary
– Genome Sequencing
• Chemistry
– Experiment & Simulation
• Computer Science
– Data Mining & Visualization
• Coastal Science
– Hazard Simulation & Modeling
• Physics & Astronomy
– LIGO
Big Data requires: fast supercomputers, large storage, high-speed networks
3
Challenges @ LSU
HPC clusters
• How to Store
– Each research lab is at a remote location
– Labs have slow storage: HDD speed < 1Gbps
• How to Transfer
– Network between a lab and HPC: bandwidth < 1Gbps
• How to Process
– Message Passing Interface (MPI): hard to program
4
3 Objectives @ LSU
HPC clusters
1. How to Store: develop 8 SSD storage servers (12TB, 20Gbps I/O bandwidth)
2. How to Transfer: network between labs and HPC with bandwidth = 20Gbps
3. How to Process: develop a virtual Hadoop cluster
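To make the bandwidth objective concrete, the sketch below estimates the ideal time to move the 470 GB raw genome dataset quoted in the genome case study over links of different speeds. It assumes decimal units (1 GB = 8 gigabits) and a fully utilized link with no protocol or storage overhead, so real transfers would be slower. The function name `transfer_time_seconds` and the chosen link speeds are illustrative, not part of the project.

```python
def transfer_time_seconds(size_gb: float, link_gbps: float) -> float:
    """Ideal time to move size_gb gigabytes over a link of link_gbps gigabits/s."""
    size_gigabits = size_gb * 8  # 1 byte = 8 bits, decimal units assumed
    return size_gigabits / link_gbps


RAW_GENOME_GB = 470  # raw data size quoted in the genome case study

# Compare the "< 1Gbps" lab link with the 10Gbps and 20Gbps targets.
for gbps in (1, 10, 20):
    t = transfer_time_seconds(RAW_GENOME_GB, gbps)
    print(f"{gbps:>2} Gbps: {t:7.0f} s (~{t / 60:.1f} min)")
```

Under these idealized assumptions the transfer drops from roughly 63 minutes at 1 Gbps to roughly 3 minutes at 20 Gbps.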
5
LSU Cyberinfrastructure for Big Data
[Network diagram: component labels only; which link connects which components is not recoverable from the transcript]
• Storage servers: @Vet School, @Chemistry, @CCT, @Biology, @Coastal, @EECS
• OpenFlow (OF) switches: Edge OF Switch Pronto 3290; Aggregation OF Switch Pronto 3780; Pluribus Core OF Switch @D Boyd; Pluribus Core OF Switch @Frey
• Compute: Hadoop On Demand SuperMike II @Frey; Hadoop Cluster @Frey
• Instrument: Gene Sequencer
• Routers and external networks: Cisco AS9000; 100Gbps Router @Frey; LONI; Internet2 10Gbps
• Link labels: 10Gbps, 2 x 10Gbps, 40Gbps
Collaboration with Samsung for SSD storage servers
6
Case Study: Genome Sequence Analysis
• Human Genome Sequencing
– An NIH standard human genome sequence set has 470 GB of raw data and requires more than a TB of memory for assembly
• Hadoop/Giraph-based software framework
– Assembles billions of short reads into one 3-billion-base-pair sequence
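As an illustration of how a Hadoop-style stage can process short reads, the sketch below counts k-mers (length-k substrings) with an explicit map step and reduce step in plain Python. This is an assumption about one typical preprocessing stage, not the project's actual pipeline; the names `map_kmers` and `reduce_counts` and the toy reads are hypothetical.

```python
from collections import Counter
from itertools import chain


def map_kmers(read: str, k: int):
    """Map step: emit (k-mer, 1) for every length-k window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct k-mer."""
    counts = Counter()
    for kmer, n in pairs:
        counts[kmer] += n
    return counts


# Toy reads for illustration only, not real sequencing data.
reads = ["ACGTAC", "CGTACG", "GTACGT"]

# In Hadoop the map calls would run in parallel across reads; here they are
# simply chained together before the reduce.
kmer_counts = reduce_counts(chain.from_iterable(map_kmers(r, 3) for r in reads))
print(kmer_counts)
```

Because each read is mapped independently, the map step parallelizes across billions of reads, which is the property that makes this kind of workload a fit for Hadoop.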
7
Case Study: De novo Assembly
• Developing a Giraph/Hadoop-based de novo assembler
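The slide does not say which graph model the assembler uses. As a hedged illustration, the sketch below uses a de Bruijn graph, one common choice for de novo assembly of short reads, and walks the single unbranched path in a toy example. The function names and reads are hypothetical. A real assembler must also handle sequencing errors, repeats, and branching, which this sketch does not.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_contig(graph):
    """Follow the unbranched path that starts at the single source node."""
    targets = {t for succs in graph.values() for t in succs}
    starts = [n for n in graph if n not in targets]
    if len(starts) != 1:
        raise ValueError("toy walker expects exactly one source node")
    node = starts[0]
    contig = node
    visited = {node}
    # Extend by one base per edge while the next node is unambiguous.
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Toy overlapping reads drawn from the sequence "ATGGCGTGCA".
toy_reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
contig = walk_contig(build_de_bruijn(toy_reads, k=4))
print(contig)
```

In a vertex-centric system such as Giraph, each graph node would be a vertex and path compaction would proceed through message passing between neighbors rather than a single sequential walk.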
8
BIC-LSU: Milestones
• 1st year:
– 2013 Sept: Project start
– 2013 Dec: Constructed fibers at 2 sites (CCT, CS)
– 2013 Mar: SSD storage servers by Samsung
– 2013 Apr: Testing OpenFlow switches (PICA8, HP, Pluribus)
– 2013 May: Shipping SSD servers from Samsung
– 2013 July: Finish fibers at 4 sites (Bio, Vet, Chem, Coastal)
• 2nd year:
– 2013 Aug: Deploy OF switches
– 2013 Dec: Develop a POX-based OF controller
– 2014 Feb: Develop web-based gateway
– 2014 May: Demonstrate genome assembly over BIC-LSU