SimMillennium and Beyond
From “Computer Systems, Computational Science and Engineering
in the Large” to “petabyte stores”
David Culler
NSF Site Visit, March 5, 2003
SimMillennium Project Goals
• Vision: To work, think, and study in a computationally rich environment with deep information stores and powerful services
• Enable major advances in Computational Science and Engineering
– Simulation, Modeling, and Information Processing becoming ubiquitous
• Explore novel design techniques for large, complex systems
– Fundamental Computer Science problems ahead are problems of scale
– Organized in concert with Univ. structure => computational economy
• Develop fundamentally better ways of assimilating and interacting with large volumes of information and with each other
• Explore emerging technologies
– networking, OS, devices
Research Infrastructure We Built
• Cluster of Clusters (CLUMPS) distributed over multiple departments
– gigabit ethernet within and between
– Myrinet high-speed interconnect
• Vineyard Cluster System Architecture
– Rootstock remote cluster installation tools
– Ganglia remote cluster monitoring
– GEXEC remote execution, GM (Myricom) messaging, MPI
– PCP – parallel file tools
– collection of port daemons, tools to make it all hang together
• Gigabit to desktop, immersadesk, ...
Millennium Gen3 Network Topology: In a perfect world ....
[Figure: network topology diagram. Labeled elements: UCB campus core; Cisco-6500 (two); Backbone with BigIron8k, BigIron4k, and FastIron1500 switches, soda-bb, evans-bb; gateways math-gw, astro-gw, eecs-gw, mil-gw, ocean-gw 1, ocean-gw 2, citris-gw, admin-gw; Millennium Cluster (Soda), OceanStore Cluster (Soda), CITRIS Cluster (Soda), Administrative Cluster (Soda), EECS Cluster (Cory), Math Cluster (Evans), Astrophysics Cluster (Campbell); GigE Desktop (Soda): soda442-xlr, soda498-xlr, soda598-xlr, soda698-xlr, soda798-xlr (soda542-xlr); Future Clusters; NPACI Rocks Cluster, PlanetLab Cluster, CITRIS Pilot Cluster, NOW.]
Key: primary link, secondary link; all links are 1000Mbps. Networks are marked as under or not under Millennium management.
Future WAN/CITRIS: Millennium backbone provides plug and play support for 10 GigE LAN/WAN PHY and OC-3,12,48 POS.
Millennium | Clustered Computing Research Group | University of California, Berkeley | 15 Nov 02
Cluster Counts
• Millennium Central Cluster
– 99 Dell 2300/6400/6450 Xeon Dual/Quad: 336 processors
– Total: 238 GB memory, 2 TB disk
– Myrinet 2000 + 1000Mb fiber ethernet
• Millennium Campus Clusters (Astro, Math, CE, EE, Physics, Bio)
– 176 proc, 34 GB mem, 1.2 TB local disk
– total: 512 proc, 292 GB mem, 3.2 TB scratch
• NPACI ROCKS Cluster
– 8 proc, 2 GB mem, 36 GB
• OceanStore/ROC cluster
• PlanetLab Cluster
– 6 proc, 1.32 GHz, 3 GB mem, 180 GB
• CITRIS Cluster 1: 3/2002 deployment (Intel Donation)
– 4 Dell Precision 730 Itanium Duals: 8 processors
– Total: 8GB memory, 128GB disk
– Myrinet 2000 + 1000Mb copper ethernet (SimMil)
• CITRIS Cluster 2: deployment (Intel Donation)
– ~128 Dell McKinley class Duals: 256 processors
» 16x2 installed
– Total: ~512GB memory, ~8TB disk
– Myrinet 2000 + 1000Mb copper ethernet (SimMil)
• Many phasing out
– NOW, Ninja, Dig Lab. ...
Cluster Top Users 2/2003
• ~800 users total on central cluster
• 84 major users for 2/2003: average 62% total CPU utilization
– ROC – middle tier storage layer testing/performance (bling, ach, fox@stanford)
– Computer Vision Group – image recognition, boundary detection and segmentation, data mining (aberg, lwalk, dmartin, ryanw, xren) “2 hours on cluster vs. 2 weeks on local resources”
– Computational Biology Lab – large-scale biological sequence database searches in parallel (brenner@compbio)
– Tempest – TCAD tools for Next Generation Lithography (yunfei)
– Internet services – performance characteristics of multithreaded servers (jrvb, jcondit)
– Sensor Networks – power reduction (vwen)
– Economic modeling – (stanton@haas)
– Machine learning – information retrieval, text processing (blei)
– Analyzing trends in BGP routing tables (sagarwal, mccaesar)
– Graphics – Optical simulation and high quality rendering (adamb, csh)
– Digital Library Project – image retrieval by image content (loretta)
– Bottleneck Analysis of Fine-grain Parallelism – (bfields)
– SPUR – Earthquake simulation (jspark@ce)
– Titanium – compiler and runtime system design for high performance parallel programming languages (bonachea)
– AMANDA – neutrino detection from polar ice core samples (amanda)
http://ganglia.millennium.berkeley.edu
Impact
• Numerous groups doing research they could not have done without it
– Malik photorealistic rendering, physics simulation, ...
– Yelick, Titanium, Heart Modeling, ...
– Wilensky, Digital Library, image segmentation
– Brewer, Culler, Ninja Internet Service Arch...
– Price, AMANDA, ...
– Kubiatowicz, OceanStore, Katz, Sahara, Hellerstein PIER
• First eScience Portals
– Tempest, EUV lithography, Sugar MEMS simulation services
• safe.millennium.berkeley.edu on Sept 11
– built w/i hours, scaled to million hits per day
• CS267 – core of MS of computation science X
• Cluster tools widely adopted
– NPACI ROCKS
– Ganglia the most downloaded cluster tool, in all the distributions, OSCAR, open source development team
Computational Economy
• Developed economic-based resource allocation
– decentralized design
– interactive and batch
• Advanced the state of the art (SOA)
– controlled experiments with priced and unpriced clusters
– analysis of utility gain relative to traditional resource allocation algorithms
• Picked up in several other areas
– index – pricing internet bandwidth
– iceberg – pricing in telco/internet merge
– core to internet design for planetary scale services
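A minimal sketch of one economic allocation rule, proportional share, in which each user receives a fraction of the resource equal to their fraction of the total bids. The slide does not say which mechanism was used, so the rule itself, the function name `proportional_share`, and the bid values are illustrative assumptions.

```python
from typing import Dict


def proportional_share(bids: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Split `capacity` among users in proportion to their bids (hypothetical rule)."""
    total = sum(bids.values())
    if total <= 0:
        # No positive bids: nothing is allocated.
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}


# Hypothetical bids for a 100-CPU cluster: an interactive user bids three
# times as much as a batch user and receives three times the share.
allocation = proportional_share({"batch_job": 1.0, "interactive": 3.0}, capacity=100.0)
print(allocation)
```

Each user can compute their own share from their bid and the total, which suits a decentralized design and lets interactive and batch work express different urgency through price.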
Emergence of Planetary-Scale Services
• In past year Millennium became THE simulation engine for P2P
– OceanStore, I^3, Sahara, BGP alternatives, PIER
• Ganglia was the technical enabler for PlanetLab
– > 100 machines at > 50 sites in > 8 countries
– THE testbed for internet-scale systems research
Fundamental Bottleneck: Storage
• Current storage hierarchy
– based on NPACI reference
– 3 TB local /scratch and /net/MMxx/scratch 4-day deletion
– 0.5 TB global NFS /work 9-day deletion
» inadequate BW and capacity
– ~4 TB /home and /project
» uniform naming through automount
» doesn’t scale to cluster access
• => augment capacity, BW, and metadata BW
• we’ve been tracking cluster storage options since xFS on NOW and Tertiary Disk in 1995.
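The timed-deletion tiers above can be sketched as a scan for files older than a retention window. This is a dry-run illustration only: the slide gives the windows (4 days for scratch, 9 days for /work) but not how they were enforced, so the use of modification time and the helper name `expired_files` are assumptions.

```python
import os
import time
from typing import Iterator, Optional


def expired_files(root: str, max_age_days: float,
                  now: Optional[float] = None) -> Iterator[str]:
    """Yield files under `root` whose modification time is older than the window."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    yield path
            except OSError:
                continue  # file vanished between listing and stat


# Retention windows taken from the slide; mtime-based expiry is an assumption.
RETENTION_DAYS = {"/scratch": 4, "/work": 9}

if __name__ == "__main__":
    for root, days in RETENTION_DAYS.items():
        for path in expired_files(root, days):
            print(f"would delete: {path}")
```

The sketch only reports candidates. A real purge would also need to decide between access time and modification time and to coordinate with running jobs.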
Another Cluster – a storage cluster
[Figure: Millennium Clusters, Citris Clusters, Massive Storage Clusters, Scalable GigE Core, Myrinet SAN]
Designed for higher reliability
Avoid competition from on-going computation
Local disks heavily used as scratch
Initial Cluster Design with 3.5TB Distributed File Store
[Figure: cluster design diagram. Labeled components: 128 Dual Itanium 2 Compute Nodes (1TFlop, 1.6TB memory); 2 Frontend Nodes; 4 Storage Controllers; 2 MetaServers; 3.5TB Fibre Channel Storage; Myrinet 2000; Foundry8000 (two); Foundry1500; Campus Core. Link types: 1 Gigabit Ethernet, Myrinet, Fibre Channel.]
Initial 3.5 TB Cluster Data Store
[Figure: four Storage Controllers (864GB each) and two Meta Servers. Legend: 36GB 15K rpm disk, fibre channel, gbit ethernet, myrinet.]
BlueArc si8300 with 24 36GB 15K rpm disks and growth room
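The capacity figures on this slide are consistent with each other, as a short check shows. It assumes that the 24 disks quoted are per storage controller and that TB means 1000 GB; neither is stated on the slide.

```python
DISK_GB = 36               # from the slide: 36GB 15K rpm disks
DISKS_PER_CONTROLLER = 24  # assumption: the 24 disks quoted are per controller
CONTROLLERS = 4            # from the slide: four storage controllers

per_controller_gb = DISK_GB * DISKS_PER_CONTROLLER
total_gb = per_controller_gb * CONTROLLERS

print(per_controller_gb)   # 864, matching the per-controller label
print(total_gb / 1000)     # 3.456, which rounds to the quoted 3.5 TB
```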
Lustre: A High-Performance, Scalable, Distributed File System for Clusters and Shared-Data Environments
• Progress since xFS
– TruCluster, GPFS, pvfs, ...
– need “production quality”
– NAS is finally here
• History: CMU, Seagate, Los Alamos, Sandia, TriLabs
• Distributed Filesystem replacing NFS
• Object based file storage
– object like inode represents a file
• Open source development managed by Cluster File Systems, Inc.
• Gaining wide acceptance for production high-performance computing
– PNNL and LLNL
– Los Alamos and Sandia Labs
– HP support as part of Linux cluster effort
– Intel Enterprise Architecture Lab
Lustre: Key Advantages
• Open protocols, standards: Portals API, XML, LDAP
• Runs on commodity PC hardware + 3rd party OST
– such as BlueArc
• Uses commodity filesystems on OSTs
– such as ext3, JFS, ReiserFS and XFS
• Scalable and efficient design splits
– (qty 2) Metadata servers: storing file system metadata
– (up to 100) Object storage targets: storing files
– To support up to 2000+ clients
• Flexible model for adding new storage to existing Lustre file system.
• Metadata server failover
Lustre: Functionality
[Figure: Clients, Meta Servers (Meta Data Servers), and Storage Controllers (Object Storage Targets). Edge labels: system and parallel file I/O, file locking; directory metadata and concurrency; recovery, file status, file creation.]
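The division of labor between metadata servers and object storage targets can be illustrated with a toy in-memory model. This is not the Lustre API: the class names, the round-robin placement, and the one-object-per-file layout are simplifying assumptions made only to show clients consulting a metadata server for names and then moving file data directly to and from a storage target.

```python
class MetadataServer:
    """Toy stand-in for an MDS: maps file names to (OST index, object id)."""

    def __init__(self, n_osts: int):
        self.n_osts = n_osts
        self.names = {}
        self.next_id = 0

    def create(self, name: str):
        # Round-robin placement across OSTs (an assumption for illustration).
        location = (self.next_id % self.n_osts, self.next_id)
        self.next_id += 1
        self.names[name] = location
        return location

    def lookup(self, name: str):
        return self.names[name]


class ObjectStorageTarget:
    """Toy stand-in for an OST: stores file contents as numbered objects."""

    def __init__(self):
        self.objects = {}

    def write(self, obj_id: int, data: bytes) -> None:
        self.objects[obj_id] = data

    def read(self, obj_id: int) -> bytes:
        return self.objects[obj_id]


class Client:
    """Asks the MDS where a file lives, then does I/O directly with that OST."""

    def __init__(self, mds: MetadataServer, osts: list):
        self.mds = mds
        self.osts = osts

    def write_file(self, name: str, data: bytes) -> None:
        ost_index, obj_id = self.mds.create(name)
        self.osts[ost_index].write(obj_id, data)

    def read_file(self, name: str) -> bytes:
        ost_index, obj_id = self.mds.lookup(name)
        return self.osts[ost_index].read(obj_id)


# Four storage targets, as in the initial data store; one MDS for simplicity.
osts = [ObjectStorageTarget() for _ in range(4)]
client = Client(MetadataServer(n_osts=4), osts)
client.write_file("trace.log", b"cluster monitoring data")
client.write_file("frame.raw", b"vision experiment data")
print(client.read_file("trace.log"))
```

Because the metadata server never touches file contents, adding storage targets adds data bandwidth without adding load on the metadata path.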
Growth Plan
• based on conservative 50% per year density
– expect roughly double

          y03     y04    y05    y06    y07
Capacity  3.5 TB  8 TB   14 TB  23 TB  35 TB
SS        4       6      8      8      8
MS        2       3      3      3      3
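The plan can be checked against the 50% per year density assumption by dividing capacity by the SS count for each year. The sketch below reads "SS" as the number of storage servers, which is an interpretation of the abbreviation rather than something the slide states.

```python
years = ["y03", "y04", "y05", "y06", "y07"]
capacity_tb = [3.5, 8, 14, 23, 35]   # planned capacity per year, from the slide
storage_servers = [4, 6, 8, 8, 8]    # the "SS" row, read as storage servers

# Capacity per server, then its year-over-year growth factor.
per_server = [c / s for c, s in zip(capacity_tb, storage_servers)]
for year, prev, cur in zip(years[1:], per_server, per_server[1:]):
    print(f"{year}: {cur:.2f} TB per SS, x{cur / prev:.2f} over the previous year")

# Average annual growth in per-server capacity over the four-year span.
annual_growth = (per_server[-1] / per_server[0]) ** (1 / 4)
print(f"average annual growth: x{annual_growth:.2f}")
```

Per-server capacity grows fivefold from y03 to y07, an average of about 1.5x per year, which matches the stated density assumption. The rest of the growth in total capacity comes from adding servers in the first two years.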
Example Projects
• Cluster monitoring trace
– ¼ TB per year for 300 nodes
• ROC failure data
– ¼ TB per year, much higher if get industrial feeds
• Digital Library
• Video
– 100 GB/hour uncompressed
• Vision
– 100 GB per experiment
• PlanetLab
– internet wide instrumentation and logging
We will look back and say, “we are doing research today that we could not have done without this”
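To put the data rates listed for these projects in proportion, the following sketch compares them with a 3.5 TB store, the size of the initial cluster data store described earlier in the talk. Decimal units (1 TB = 1000 GB) are an assumption.

```python
GB_PER_TB = 1000  # decimal units (assumption)
store_gb = 3.5 * GB_PER_TB

video_hours = store_gb / 100          # 100 GB per hour, uncompressed
vision_experiments = store_gb / 100   # 100 GB per experiment
monitoring_gb_per_node_year = 0.25 * GB_PER_TB / 300  # 1/4 TB per year over 300 nodes

print(video_hours, vision_experiments, round(monitoring_gb_per_node_year, 2))
```

At these rates a few dozen hours of uncompressed video or a few dozen vision experiments would fill the whole initial store, while the monitoring trace costs under 1 GB per node per year.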
End of the Tape Era
[Inset slide: Aug, 1999, NSF RI 99, slide 18]
Massive Cheap Storage
• Basic unit: 2 PCs double-ending four SCSI chains
Currently serving Fine Art at http://www.thinker.org/imagebase/
[Figure: log $/GB versus year for disk and tape; 2001 marked]
Emergence of the Sensor Net Era
• 100s of research groups and companies using the Berkeley Mote / TinyOS platform
• dozens of projects on campus
• billions of networked devices connected to the physical world – constantly streaming data
• => start building the storage and processing infrastructure for this new class of system today!
Environment Monitoring Experience
• Canonical “patch” net architecture
• live & historical readings www.greatduckisland.net
• 43 nodes, 7/13-11/18
• above and below ground
• light, temperature, relative humidity, and occupancy data, at 1 minute resolution
• >1 million measurements
– Best nodes ~90,000
• 3 major maintenance events
• node design and packaging in harsh environment
– -20 – 100 degrees, rain, wind
• power mgmt and interplay with sensors and environment
[Figure: patch network architecture. Labels: Sensor Node, Sensor Patch, Patch Network, Gateway, Transit Network, Basestation, Base-Remote Link, Data Service, Internet, Client Data Browsing and Processing.]
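The reported measurement counts can be compared with what the deployment could have produced at most. The sketch assumes that the 7/13 to 11/18 window falls in 2002 (the slide gives no year), and that one measurement is one per-minute reading from one node.

```python
from datetime import date

NODES = 43
# Assumption: the deployment window is in 2002; the slide gives only month/day.
start, end = date(2002, 7, 13), date(2002, 11, 18)

days = (end - start).days
samples_per_node = days * 24 * 60        # one reading per minute, no gaps
upper_bound = NODES * samples_per_node   # every node alive for the whole window

best_node_yield = 90_000 / samples_per_node
print(days, samples_per_node, upper_bound)
print(f"best-node yield: {best_node_yield:.0%}")
```

Under these assumptions even the best nodes delivered roughly half of the readings a gap-free node would have produced. That is in line with the slide's emphasis on node lifetime, maintenance events, and power management.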
Sample Results
[Figure panels: Node Lifetime and Utility; Effective Communication Phase; Packet Loss Correlation]