[Harvard CS264] 01 - Introduction
Lecture #1: Introduction | January 25th, 2011
Nicolas Pinto (MIT, Harvard)
Massively Parallel Computing
CS 264 / CSCI E-292
...
Distant Students
Take a picture with...
I like a friend
I like his dog
cool hardware
your mom
Today
Outline
Human? “Computing”
Massively Parallel Computing
Supercomputing
Cloud Computing
High-Throughput Computing
Many-core Computing
MPC
http://www.youtube.com/watch?v=jj0WsQYtT7M
Modeling & Simulation
• Physics, astronomy, molecular dynamics, finance, etc.
• Data and processing intensive
• Requires high-performance computing (HPC)
• Driving HPC architecture development
Top Dog (2008)
• Roadrunner, LANL
• #1 on top500.org in 2008 (now #7)
• 1.105 petaflop/s
• 3000 nodes with dual-core AMD Opteron processors
• Each node connected via PCIe to two IBM Cell processors
• Nodes are connected via Infiniband 4x DDR
CS264 (2009)
http://www.top500.org/lists/2010/11
Tianhe-1A at NSC Tianjin
2.507 petaflop/s, 7168 Tesla M2050 GPUs
Slide courtesy of Bill Dally (NVIDIA)
1 Petaflop/s = ~1M high-end laptops = ~world population with hand calculators 24/7/365 for ~16 years
http://news.cnet.com/8301-13924_3-20021122-64.html
What $100+ million can buy you...
Roadrunner (#7) Jaguar (#2)
http://www.lanl.gov/roadrunner/
Who uses HPC?
Cloud Computing?
Buzzword ?
Careless Computing?
...
Response from the legend:
http://techcrunch.com/2010/12/14/stallman-cloud-computing-careless-computing/
Cloud Utility Computing? for CS264
http://code.google.com/appengine/
http://www.nilkanth.com/my-uploads/2008/04/comparingpaas.png
Web Data Explosion
How much Data?
• Google processes 24 PB / day, 8 EB / year (’10)
• Wayback Machine has 3 PB, 100 TB/month (’09)
• Facebook user data: 2.5 PB, 15 TB/day (’09)
• Facebook photos: 15 B, 3 TB/day (’09) - 90 B (now)
• eBay user data: 6.5 PB, 50 TB/day (’09)
• “all words ever spoken by human beings” ~ 42 ZB
Adapted from http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/
“640k ought to be enough for anybody.” - Bill Gates (1981), just a rumor
Disk Throughput
• Average Google job size: 180 GB
• 1 SATA HDD = 75 MB / sec
• Time to read 180 GB off disk: ~40 mins
• Solution: parallel reads
• 1000 HDDs = 75 GB / sec
• Google’s solutions: BigTable, MapReduce, etc.
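The disk-throughput numbers above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes purely sequential reads, decimal units (1 GB = 1000 MB), and perfectly parallel disks. The function name is illustrative. At 75 MB/sec a single disk needs roughly 40 minutes for 180 GB, which is the order of magnitude quoted on the slide, and 1000 disks bring that down to a few seconds.

```python
# Back-of-the-envelope disk throughput, using the figures from the slide.
JOB_SIZE_GB = 180       # average Google job size (from the slide)
HDD_MB_PER_SEC = 75     # one SATA HDD (from the slide)
MB_PER_GB = 1000        # assumption: decimal units


def read_time_minutes(size_gb, n_disks=1):
    """Minutes needed to stream size_gb when n_disks read in parallel.

    Assumes ideal scaling: aggregate bandwidth = n_disks * single-disk bandwidth.
    """
    aggregate_mb_per_sec = HDD_MB_PER_SEC * n_disks
    return size_gb * MB_PER_GB / aggregate_mb_per_sec / 60


if __name__ == "__main__":
    print(f"1 disk:     {read_time_minutes(JOB_SIZE_GB):.0f} min")
    print(f"1000 disks: {read_time_minutes(JOB_SIZE_GB, 1000) * 60:.1f} s")
```

Real systems fall short of this ideal because of seeks, network transfer, and stragglers, which is part of what infrastructure such as MapReduce is meant to manage.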
• Clear trend: centralization of computing resources in large data centers
• Q: What do Oregon, Iceland, and abandoned mines have in common?
• A: Fiber, juice, and space
• Utility computing!
Cloud Computing
Instrument Data Explosion
Sloan Digital Sky Survey
ATLUM / Connectome Project
Another example? Hint: Switzerland
CERN in 2005....
CERN Summer School 2005
bad taste party...
pitchers...
LHC
Maximilien Brice, © CERN
CERN’s Cluster: ~5000 nodes (’05)
CERN Summer School 2005
presentations...
Diesel Powered HPC
Life Support…
Slide courtesy of Hanspeter Pfister
Murchison Widefield Array
How much Data?
• NOAA has ~1 PB climate data (‘07)
• MWA radio telescope: 8 GB/sec of data
• Connectome: 1 PB / mm³ of brain tissue (1 EB for 1 cm³)
• CERN’s LHC will generate 15 PB a year (‘08)
High Flops / Watt
Computer Games
• PC gaming business:
• $15B / year market (2010)
• $22B / year in 2015 ?
• WOW: $1B / year
• NVIDIA Shipped 1B GPUs since 1993:
• 10 years to ship 200M GPUs (1993-2003)
• 1/3 of all PCs have more than one GPU
• High-end GPUs sell for around $300
• Now used for scientific applications
CryEngine 2, CRYTEK
Intel Core i7-980X Extreme: 6 cores, 1.17B transistors
NVIDIA GTX 580 SC: 512 cores, 3B transistors
Many-Core Processors
http://en.wikipedia.org/wiki/Transistor_count
[Figure: CPU vs. GPU. CPU: instruction-level parallelism, data fits in cache. GPU: massive data parallelism, huge data. Axis: data throughput.]
David Kirk, NVIDIA
3 of Top 5 Supercomputers
Bill Dally, NVIDIA
Personal Supercomputers
~4 Teraflops @ 1500 Watts
Disruptive Technologies
• Utility computing
• Commodity off-the-shelf (COTS) hardware
• Compute servers with 100s-1000s of processors
• High-throughput computing
• Mass-market hardware
• Many-core processors with 100s-1000s of cores
• High compute density / high flops/W
Green HPC
NVIDIA/NCSA Green 500 Entry
128 nodes, each with:
• 1x Core i3 530 (2 cores, 2.93 GHz => 23.4 GFLOP peak)
• 1x Tesla C2050 (14 cores, 1.15 GHz => 515.2 GFLOP peak)
• 4x QDR Infiniband
• 4 GB DRAM
Theoretical Peak Perf: 68.95 TF
Footprint: ~20 ft^2 => 3.45 TF/ft^2
Cost: $500K (street price) => 137.9 MF/$
Linpack: 33.62 TF, 36.0 kW => 934 MF/W
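The derived figures for this Green 500 entry follow directly from the per-node numbers. The sketch below simply redoes that arithmetic with the values quoted above. The variable names are illustrative, and small differences in the last digit come from rounding on the slide.

```python
# Per-node and system figures quoted on the slide.
NODES = 128
CPU_GFLOPS = 23.4        # Core i3 530 peak
GPU_GFLOPS = 515.2       # Tesla C2050 peak
FOOTPRINT_FT2 = 20
COST_USD = 500_000
LINPACK_TFLOPS = 33.62
POWER_KW = 36.0

# Theoretical peak: every node contributes one CPU and one GPU.
peak_tflops = NODES * (CPU_GFLOPS + GPU_GFLOPS) / 1000

# Compute density, cost efficiency, and power efficiency.
tflops_per_ft2 = peak_tflops / FOOTPRINT_FT2
mflops_per_dollar = peak_tflops * 1e6 / COST_USD
mflops_per_watt = LINPACK_TFLOPS * 1e6 / (POWER_KW * 1000)

if __name__ == "__main__":
    print(f"peak:         {peak_tflops:.2f} TF")
    print(f"density:      {tflops_per_ft2:.2f} TF/ft^2")
    print(f"cost:         {mflops_per_dollar:.1f} MF/$")
    print(f"power (meas): {mflops_per_watt:.0f} MF/W")
```

Note that the power-efficiency figure uses the measured Linpack result rather than the theoretical peak, which is the convention the slide follows.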
One more thing...
Massively Parallel Human Computing ???
• “Crowdsourcing”
• Amazon Mechanical Turk (artificial artificial intelligence)
• Wikipedia
• Stackoverflow
• etc.
What is this course about?
Massively parallel processors
• GPU computing with CUDA
Cloud computing
• Amazon’s EC2 as an example of utility computing
• MapReduce, the “back-end” of cloud computing
Less like Rodin...
More like Bob...
Outline
wikipedia.org
Anant Agarwal, MIT
Power Cost
• Power ∝ Voltage² × Frequency
• Frequency ∝ Voltage
• Power ∝ Frequency³
Jack Dongarra
Power Cost
             Cores   Freq   Perf   Power   Perf/W
CPU            1     1      1      1       1
“New” CPU      1     1.5    1.5    3.3     0.45x
Multicore      2     0.75   1.5    0.8     1.88x
Jack Dongarra
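The table follows from the cubic power rule stated earlier (Power ∝ Frequency³). Here is a minimal sketch that recomputes it, assuming performance scales linearly with both core count and frequency (an idealization) and that power adds up across cores. The function names are illustrative. The exact values come out slightly different from the rounded ones on the slide: for example, the multicore power is about 0.84 rather than 0.8, which gives a performance per watt of about 1.78 rather than 1.88.

```python
def relative_power(cores, freq):
    """Power relative to a 1-core, frequency-1 baseline: cores * freq^3."""
    return cores * freq ** 3


def relative_perf(cores, freq):
    """Performance relative to baseline, assuming ideal linear scaling."""
    return cores * freq


# (cores, relative frequency) for the three designs in the table.
DESIGNS = {
    "CPU": (1, 1.0),
    '"New" CPU': (1, 1.5),
    "Multicore": (2, 0.75),
}

if __name__ == "__main__":
    print(f"{'design':<12}{'perf':>6}{'power':>8}{'perf/W':>8}")
    for name, (cores, freq) in DESIGNS.items():
        perf = relative_perf(cores, freq)
        power = relative_power(cores, freq)
        print(f"{name:<12}{perf:>6.2f}{power:>8.2f}{perf / power:>8.2f}")
```

The point of the comparison survives the rounding: two slower cores deliver the same nominal performance as one overclocked core at roughly a quarter of the power.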
Anant Agarwal, MIT
Problem with Buses
Problem with Disks
Tom’s Hardware
64 MB / sec
Good News
• Moore’s Law marches on
• Chip real-estate is essentially free
• Many-core architectures are commodities
• Space for new innovations
Bad News
• Power limits improvements in clock speed
• Parallelism is the only route to improve performance
• Computation / communication ratio will get worse
• More frequent hardware failures?
A “Simple” Matter of Software
• We have to use all the cores efficiently
• Careful data and memory management
• Must rethink software design
• Must rethink algorithms
• Must learn new skills!
• Must learn new strategies!
• Must learn new tools...
Our mantra: always use the right tool!
Outline
Instructor: Nicolas Pinto
• biz card (joke on it abt PhD now)
• I’m like you guys
• not an expert
• we are all here to learn from each other
• recent graduate
• collaborative event this class
The Rowland Institute at Harvard
HARVARD UNIVERSITY
~50% of the brain is for vision!
Everyone knows that...
The Approach: Reverse and Forward Engineering the Brain
REVERSE: Study Natural System
FORWARD: Build Artificial System
brain = 20 petaflops !
Linus Pauling (double Nobel Prize winner)
“If you want to have good ideas you must have many ideas. Most of them will be wrong, and what you have to learn is which ones to throw away.”
High-throughput Screening
thousands of big models
The curse of speed... and the blessing of massively parallel computing
large amounts of unsupervised learning experience
No off-the-shelf solution? DIY!
Engineering (Hardware/SysAdmin/Software) Science
Leverage non-scientific high-tech markets and their $billions of R&D...
Gaming: Graphics Cards (GPUs), PlayStation 3
Web 2.0: Cloud Computing (Amazon, Google)
Build your own!
DIY GPU pr0n (since 2006) Sony Playstation 3s (since 2007)
The blessing of GPUs
Speed (in billion floating point operations per second):
Q9450 (Matlab/C) [2008]          0.3
Q9450 (C/SSE) [2008]             9.0
7900GTX (OpenGL/Cg) [2006]      68.2
PS3/Cell (C/ASM) [2007]        111.4
8800GTX (CUDA1.x) [2007]       192.7
GTX280 (CUDA2.x) [2008]        339.3
GTX480 (CUDA3.x) [2010]        974.3 (Fermi)
>1000X speedup is game changing...
Pinto, Doukhan, DiCarlo, Cox PLoS 2009
Pinto, Cox GPU Comp. Gems 2011
Tired Of Waiting For Your Computations?
6.963 (IAP09)
Supercomputing on your desktop:
Programming the next generation of cheap and
massively parallel hardware using CUDA
This IAP has been designed to give students extensive
hands-on experience in using a new, potentially disruptive
technology. This technology gives the masses
access to supercomputing capabilities.
We will introduce students to the CUDA programming
language developed by NVIDIA Corp., which has been an
essential step towards simplifying and unifying the
programming of massively parallel chips.
This IAP is supported by generous contributions from
NVIDIA Corp., The Rowland Institute at Harvard, and MIT
(OEIT, BCS, EECS) and will feature talks given by
experts from various fields.
Co-Instructor: Hanspeter Pfister
Visual Computing
• Large image & video collections
• Physically-based modeling
• Face modeling and recognition
• Visualization
VolumePro 500
Released 1999
GPGPU
Connectome
NSF CDI Grant ’08-’11
NVIDIA CUDA Center of Excellence
TFs
• Claudio Andreoni (MIT Course 18)
• Dwight Bell (Harvard DCE)
• Krunal Patel (Accelereyes)
• Jud Porter (Harvard SEAS)
• Justin Riley (MIT OEIT)
• Mike Roberts (Harvard SEAS)
About You
• Undergraduate ? Graduate ?
• Programming ? >5 years ? <2 years ?
• CUDA ? MPI ? MapReduce ?
• CS ? Life Sc ? Applied Sc ? Engineering ? Math ? Physics ?
• Humanities ? Social Sc ? Economy ?
Outline
CS 264 Goals
• Have fun!
• Learn basic principles of parallel computing
• Learn programming with CUDA
• Learn to program a cluster of GPUs (e.g. MPI)
• Learn basics of EC2 and MapReduce
• Learn new learning strategies, tools, etc.
• Implement a final project
Experimental Learning Strategy
Memory “recall”
Repeat, repeat, repeat
Lectures
• Theory, Architecture, Patterns ?
• Act I: GPU Computing
• Act II: Cloud Computing
• Act III: Guest Lectures
Lectures “Format”
• 2x ~ 45min regular “lectures”
• ~ 15min “Clinic”
  • we’ll be here to fix your problems
• ~ 5 min: Life and Code “Hacking”:
  • GTD Zen
  • Presentation Zen
  • Ninja Programming Tricks & Tools, etc.
• Interested? email [email protected]
Act I: GPU Computing
• Introduction to GPU Computing
• CUDA Basics
• CUDA Advanced
• CUDA Ninja Tricks !
Performance / Effort
          Performance (gflops)   Development Time (hours)
Matlab            0.3                    0.5
C/SSE             9.0                   10.0
PS3             111.4                   30.0
GT200           339.3                   10.0
3D Filterbank Convolution
Empirical results...
Performance (gflops)
Q9450 (Matlab/C) [2008]          0.3
Q9450 (C/SSE) [2008]             9.0
7900GTX (Cg) [2006]             68.2
PS3/Cell (C/ASM) [2007]        111.4
8800GTX (CUDA1.x) [2007]       192.7
GTX280 (CUDA2.x) [2008]        339.3
GTX480 (CUDA3.x) [2010]        974.3
>1000X speedup is game changing...
Act II: Cloud Computing
• Introduction to utility computing
• EC2 & starcluster (Justin Riley, MIT OEIT)
• Hadoop (Zak Stone, SEAS)
• MapReduce with GPU Jobs on EC2
Amazon’s Web Services
• Elastic Compute Cloud (EC2)
• Rent computing resources by the hour
• Basic unit of accounting = instance-hour
• Additional costs for bandwidth
• You’ll be getting free AWS credits for course assignments
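Since the instance-hour is the unit of accounting, a rough budget for an assignment is just instances × hours × hourly rate. The sketch below is illustrative only: the hourly rate in the example is a made-up placeholder (not an actual AWS price), rounding partial hours up to whole hours is an assumption, and bandwidth charges are ignored.

```python
import math


def estimate_cost(n_instances, hours, usd_per_instance_hour):
    """Rough compute cost for running n_instances for a given wall-clock time.

    Assumption: each started hour is billed as a full instance-hour.
    Bandwidth and storage costs are not included.
    """
    billed_hours = math.ceil(hours)
    instance_hours = n_instances * billed_hours
    return instance_hours * usd_per_instance_hour


if __name__ == "__main__":
    # Hypothetical example: 8 instances for 2.5 hours at a placeholder
    # rate of $0.50 per instance-hour.
    print(f"${estimate_cost(8, 2.5, 0.50):.2f}")
```

Under this model, a job that finishes a few minutes into a new hour costs as much as one that uses the whole hour, so it can pay to size a cluster so that runs end just before an hour boundary.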
MapReduce
• Functional programming meets distributed processing
• Processing of lists with <key, value> pairs
• Batch data processing infrastructure
• Move the computation where the data is
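To make the <key, value> idea concrete, here is a minimal single-process sketch of the MapReduce pattern applied to word counting. It only illustrates the programming model: nothing is distributed, the function names are made up for this example, and it is not the Hadoop API or Google's implementation.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a <word, 1> pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    docs = {"a": "the cat", "b": "the dog the"}
    print(word_count(docs))
```

Because each map call sees only one document and each reduce call sees only one key, a real framework can run them on whichever machines already hold the data, which is the sense of "move the computation where the data is".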
Act III: Guest Lectures
• Andreas Klöckner (NYU): OpenCL & PyOpenCL
• John Owens (UC Davis): fundamental algorithms/data structures and irregular parallelism
• Nathan Bell (NVIDIA): Thrust
• Duane Merrill* (Virginia Tech): Ninja Tricks
• Mike Bauer* (Stanford): Sequoia
• Greg Diamos (Georgia Tech): Ocelot
• Other lecturers* from Google, Yahoo, Sun, Intel, NCSA, AMD, Cloudera, etc.
Labs
• Led by TF(s)
• Work on an interesting small problem
• From skeleton code to solution
• Hands-on
53 Church St.
53 Church St., Room 104
Thu, Fri 7:35-9:35 pm
53 Church St., Room 105
NVIDIA Fx4800 Quadro
• MacPro
• NVIDIA Fx4800 Quadro, 1.5 GB
Resonance @ SEAS
• Quad-core Intel Xeon host, 3 GHz, 8 GB
• 8 Tesla S1070s (32 GPUs, 4 GB each)
• 16 quad-core Intel Xeons, 2 GHz, 16 GB
• http://community.crimsongrid.harvard.edu/getting-started/resources/resonance-cuda-host
What do you need to know?
• Programming (ideally in C / C++)
• See HW 0
• Basics of computer systems
• CS 61 or similar
Homeworks
• Programming assignments
• “Issue Spotter” (code debug & review, Q&A)
• Contribution to the community (OSS, Wikipedia, Stackoverflow, etc.)
• Due: Fridays at 11 pm EST
• Hard deadline - 2 “bonus” days
Office Hours
• Led by a TF
• 104 @ 53 Church St (check website and news feed)
Participation
• HW0 (this week)
• Mandatory attendance for guest lectures
• forum.cs264.org
• Answer questions, help others
• Post relevant links and discussions (!)
Final Project
• Implement a substantial project
• Pick from a list of suggested projects or design your own
• Milestones along the way (idea, proposal, etc.)
• In-class final presentations
• $500+ prize for the best project
Grading
• On a 0-100 scale
• Participation: 10%
• Homework: 50%
• Final project: 40%
www.cs264.org
• Detailed schedule (soon)
• News blog w/ RSS feed
• Video feeds
• Forum (forum.cs264.org)
• Academic honesty policy
• HW0 (due Fri 2/4)
iPhD
Thank you!
iPhD: one more thing from WikiLeaks?
Is this course for me ???
This course is not for you...
• If you’re not genuinely interested in the topic
• If you can’t cope with uncertainly, unpredictability, poor documentation, and immature software
• If you’re not ready to do a lot of programming
• If you’re not open to thinking about computing in new ways
• If you can’t put in the time
Slide after Jimmy Lin, iSchool, Maryland
Otherwise... it will be a richly rewarding experience!
Guaranteed?!
http://davidzinger.wordpress.com/2007/05/page/2/
Be Patient
Be Flexible
Be Constructive
It would be a win-win-win situation!
(The Office Season 2, Episode 27: Conflict Resolution)
Hypergrowth ?
Acknowledgements
• Hanspeter Pfister & Henry Leitner, DCE
• TFs
• Rob Parrott & IT Team, SEAS
• Gabe Russell & Video Team, DCE
• NVIDIA, esp. David Luebke
• Amazon
COME
Next?
• Fill out the survey: http://bit.ly/enrb1r
• Get ready for HW0 (Lab 1 & 2)
• Subscribe to http://forum.cs264.org
• Subscribe to RSS feed: http://bit.ly/eFIsqR