Accelerators
Ran Ginosar, Avinoam Kolodny
Yuval Cassuto, Koby Crammer, Shmuel Wimer, Dani Lischinski
(Research) (motivation) questions
• We love accelerators, but…
• What accelerators?
• What workload? What “killer applications”?
• Why study / develop them?
• Who needs them?
• What architecture(s)?
• What goals are we seeking to fulfill?
  – In addition to winning ICRI-CI research grants
Why accelerators?
• Semiconductor industry sells $300B/year (10% INTC)
  – 1M high-profit chips/day
    • $100/chip, $100M/day. Mostly CPUs
    • 10% of revenues. 100-1000% gross profit
  – 90M low-cost chips/day
    • $10/chip, $900M/day. 50% gross profit
• Growth < 10%
• In the year 2023?
  – Need to expand into another rich industry
• Store-and-compute accelerators will be the driver
• Which industry is
  – Rich
    • Much richer than semiconductors
  – Under-utilized
  – Begs for progress (and can pay for it)
  – Critical, will not disappear
• Video? Entertainment? Communication?
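The per-day revenue figures can be cross-checked with a few lines of Python. The inputs are the slide's own round numbers; this is only a sanity check of the arithmetic, not a market model. The two segments add up to about $1B/day, roughly $365B/year, which is in line with the ~$300B/year headline, and the high-profit segment comes out at 10% of revenue.

```python
# Back-of-envelope check of the semiconductor revenue figures on this slide.
# All inputs are the slide's round numbers.

DAYS_PER_YEAR = 365

segments = {
    # name: (chips per day, price per chip in USD)
    "high-profit (mostly CPU)": (1_000_000, 100),
    "low-cost": (90_000_000, 10),
}


def daily_revenue(chips_per_day: int, price_usd: int) -> int:
    """Revenue per day in USD for one market segment."""
    return chips_per_day * price_usd


total_per_day = 0
for name, (chips, price) in segments.items():
    rev = daily_revenue(chips, price)
    total_per_day += rev
    print(f"{name}: ${rev / 1e6:.0f}M/day")

total_per_year = total_per_day * DAYS_PER_YEAR
high_share = daily_revenue(*segments["high-profit (mostly CPU)"]) / total_per_day

print(f"total: ${total_per_day / 1e9:.1f}B/day, ${total_per_year / 1e9:.0f}B/year")
print(f"high-profit share of revenue: {high_share:.0%}")
```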
Health Care
• $2.5 Trillion in US alone
  – Already 10x the entire global semiconductor industry
  – $4.5T by 2020
  – Global is probably 3x, $15T by 2020
• Key challenge:
  – Today: imprecise, statistics-based diagnosis and treatment
  – Develop into a more efficient, more successful discipline by combining science & computing
Future health care is computerized (store and compute)
• Medical/health data about 10B people
  – Genomics, proteomics (5 GB/person)
  – Health & medical records (1 GB/person)
  – Continuously accumulating sensor readings (4 GB/person)
    • Medical, environmental, food & drugs
• Monitor and process all individuals
  – Machine learning
  – Predict and alert on medical conditions
  – Individualize drugs, diets, treatments
Storage required
• 10 GB/person
• 10B people
• 10^20 Bytes (100 ExaBytes, 100 Mega-TeraBytes)
  – 100 million of today’s 1 TByte disks
  – 100+ data centers
  – 500 MegaWatts to store, read and write
    • $350 Million / year
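The storage figures follow from the per-person breakdown on the previous slide (5 + 1 + 4 GB). The sketch below reproduces them; the 5 W per disk is simply what the slide's 500 MW over 100 million disks implies, while the electricity price of $0.08/kWh is our assumption, chosen because it reproduces the slide's ~$350M/year.

```python
# Storage estimate from the slide.
# WATTS_PER_DISK is derived from the slide (500 MW / 1e8 disks);
# USD_PER_KWH is an ASSUMPTION that reproduces the $350M/year figure.

HOURS_PER_YEAR = 24 * 365

GB = 10**9
TB = 10**12

per_person_bytes = (5 + 1 + 4) * GB      # genomics + records + sensors
people = 10 * 10**9                      # 10B people
total_bytes = per_person_bytes * people  # 10^20 bytes = 100 EB

disks = total_bytes // TB                # number of 1 TB disks
WATTS_PER_DISK = 5                       # implied by 500 MW / 100M disks
USD_PER_KWH = 0.08                       # assumption: electricity price

power_w = disks * WATTS_PER_DISK
energy_kwh_per_year = power_w / 1000 * HOURS_PER_YEAR
cost_per_year = energy_kwh_per_year * USD_PER_KWH

print(f"total storage: {total_bytes:.0e} bytes = {total_bytes / 10**18:.0f} EB")
print(f"1 TB disks: {disks:.0e}")
print(f"power: {power_w / 1e6:.0f} MW, cost: ${cost_per_year / 1e6:.0f}M/year")
```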
Computing required
• Run through 50% of the data each day
• Perform 10 op / byte
• 10^21 OP/day ≈ 10^16 OP/sec
  – Only 10M cores of 1 GOPS each
  – 100 data centers
• Power: only 10 MegaWatts
  – 2% of storage power
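The compute figures can be reproduced at the slide's order-of-magnitude precision. Half of 10^20 bytes at 10 op/byte is 5×10^20 OP/day, which the slide rounds to 10^21; per second this is about 5.8×10^15, i.e. ~10^16. The helper name `order_of_magnitude` is ours, and the 1 W/core figure is only what 10 MW over 10M cores implies, not a number stated on the slide.

```python
# Compute estimate from the slide, at order-of-magnitude precision.
import math

SECONDS_PER_DAY = 24 * 60 * 60


def order_of_magnitude(x: float) -> int:
    """Round x to the nearest power of ten, as the slide does (5e20 -> 1e21)."""
    return 10 ** round(math.log10(x))


total_bytes = 10**20            # from the storage estimate
scanned_fraction = 0.5          # run through 50% of the data each day
ops_per_byte = 10
core_ops_per_sec = 10**9        # 1 GOPS per core
compute_power_w = 10 * 10**6    # 10 MegaWatts
storage_power_w = 500 * 10**6   # 500 MegaWatts (storage slide)

ops_per_day = total_bytes * scanned_fraction * ops_per_byte      # 5e20
ops_per_sec = order_of_magnitude(ops_per_day / SECONDS_PER_DAY)  # ~5.8e15 -> 1e16
cores = ops_per_sec // core_ops_per_sec                          # 1e7 cores
watts_per_core = compute_power_w / cores                         # implied per-core budget
compute_share = compute_power_w / storage_power_w

print(f"{order_of_magnitude(ops_per_day):.0e} OP/day, {ops_per_sec:.0e} OP/sec")
print(f"{cores:,} cores at {watts_per_core:.1f} W each")
print(f"compute power is {compute_share:.0%} of storage power")
```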
Solution: move computing closer to data
• The Hybrid Memory Cube (HMC) industry is already taking the first step
• 100,000 TSV (through-silicon via) vertical interconnects
Not yet there
• Wish to get closer: stack memory on top of the CPU?
• NO. Too hot
  – CPU operates above 100°C
  – DRAM is useless above 85°C
• Solution
  – Dispose of the CPU
  – Create a 3D low-power (low temperature), uniform-power-density, high-performance store & compute machine
Store & Compute
• 1 TByte / chip in 2020
  – Combined DRAM + NVM
• Accelerators
  – 1000 cores “many-core”
    • MIMD
    • Associative Processors
    • SIMD
• Internal + external networks
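To make the associative-processor idea ("classic store & compute") concrete, here is a toy Python model of bit-serial, word-parallel addition done with compare-and-write passes over a truth table, the textbook way an associative processor computes inside its memory array. The class name, the row layout (operands A and B plus a carry bit per row) and the pass ordering are our illustrative assumptions, not this project's design; in hardware every compare and every write acts on all rows at once, which the Python loops only emulate.

```python
from typing import List, Tuple

Pattern = Tuple[int, int, int]   # (carry, a_bit, b_bit)
Update = Tuple[int, int]         # (new_carry, new_b_bit)

# Truth table for in-place addition B <- A + B, listing only the entries that
# change a row. Order matters: a write must never turn a row into a pattern whose
# pass for the current bit has not run yet (010 -> 011, so 011 goes first;
# 101 -> 100, so 100 goes first).
ADD_PASSES: List[Tuple[Pattern, Update]] = [
    ((0, 1, 1), (1, 0)),
    ((0, 1, 0), (0, 1)),
    ((1, 0, 0), (0, 1)),
    ((1, 0, 1), (1, 0)),
]


class AssociativeArray:
    """Toy associative processor: each row holds operand words A, B and a carry bit."""

    def __init__(self, a_words: List[int], b_words: List[int], width: int):
        self.width = width
        self.a = [[(w >> i) & 1 for i in range(width)] for w in a_words]
        self.b = [[(w >> i) & 1 for i in range(width)] for w in b_words]
        self.carry = [0] * len(a_words)
        self.passes = 0  # compare+write steps issued; independent of row count

    def compare(self, bit: int, pattern: Pattern) -> List[bool]:
        """Tag rows whose (carry, A[bit], B[bit]) equals pattern (all rows at once in hardware)."""
        c, a, b = pattern
        return [
            self.carry[r] == c and self.a[r][bit] == a and self.b[r][bit] == b
            for r in range(len(self.carry))
        ]

    def write(self, bit: int, tags: List[bool], update: Update) -> None:
        """Write (carry, B[bit]) into every tagged row."""
        new_c, new_b = update
        for r, tagged in enumerate(tags):
            if tagged:
                self.carry[r] = new_c
                self.b[r][bit] = new_b

    def add_in_place(self) -> None:
        """B <- (A + B) mod 2**width: bit-serial over columns, word-parallel over rows."""
        self.carry = [0] * len(self.carry)
        for bit in range(self.width):
            for pattern, update in ADD_PASSES:
                self.write(bit, self.compare(bit, pattern), update)
                self.passes += 1

    def b_words(self) -> List[int]:
        """Read the B field of every row back as integers."""
        return [sum(bit << i for i, bit in enumerate(row)) for row in self.b]


ap = AssociativeArray([3, 10, 255], [5, 7, 1], width=8)
ap.add_in_place()
print(ap.b_words())  # [8, 17, 0]  (255 + 1 wraps around in 8 bits)
print(ap.passes)     # 32 = 8 bits x 4 passes, whatever the number of rows
```

The pass count depends on the word width, not on the number of rows, which is where the massive parallelism of this style of processing comes from.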
(Figure: 2D accelerator built of p-m NOC tiles; 3D accelerator with NVM, DRAM+SRAM and 2D accelerator layers; dimensions 5 mm and 20 mm × 20 mm; 500 chips, 50 Watt)
Challenges
• Need 100M chips
• Max 0.1 W / chip
  – Total 10 MWatt
  – 100-1000 data centers
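The challenge numbers are mutually consistent, as a short check shows: 100M chips at 0.1 W is 10 MW, 100M chips at 1 TByte each is the 10^20 bytes of the storage estimate, and 500 chips at 0.1 W matches the 50 Watt figure label. The per-data-center split at the end is our own derivation from the 100-1000 range, not a number on the slide.

```python
# Chip-count and power budget from the "Challenges" bullets.
# Power is kept in integer milliwatts to avoid floating-point noise.

CHIPS_NEEDED = 100_000_000        # 100M chips
MAX_MW_PER_CHIP = 100             # max 0.1 W per chip
BYTES_PER_CHIP = 10**12           # 1 TByte per chip (Store & Compute slide)
MODULE_CHIPS = 500                # figure label: 500 chips, 50 Watt

total_power_w = CHIPS_NEEDED * MAX_MW_PER_CHIP // 1000
total_capacity_bytes = CHIPS_NEEDED * BYTES_PER_CHIP
module_power_w = MODULE_CHIPS * MAX_MW_PER_CHIP // 1000

print(f"total power: {total_power_w / 1e6:.0f} MW")
print(f"total capacity: {total_capacity_bytes:.0e} bytes")
print(f"500-chip module: {module_power_w} W")

# Derived here, not on the slide: spread over 100 to 1000 data centers.
for centers in (100, 1000):
    print(f"{centers} data centers: {CHIPS_NEEDED // centers:,} chips, "
          f"{total_power_w // centers / 1e3:.0f} kW each")
```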
More challenges
• Understand workload
• Understand algorithms
• Architect the store & compute accelerators
• Low low low power
• High (data-intensive) performance
Approaches
1. Associative processors
  – Classic store & compute
  – Uniform power distribution
  – Massive parallelism
  – Very low power
2. Orthogonal access SIMD processors
  – Sequential and parallel access
  – Mitigate data-movement bottleneck
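One way to picture orthogonal access is a bit matrix that can be read either along a row (a whole word, for sequential code) or along a column (the same bit of every word, for bit-serial SIMD code). The toy model below is our assumption of what such an interface could look like, not the project's memory design; the class and function names are hypothetical. It computes the same sum both ways, using the identity sum = Σ_k 2^k × popcount(slice k) for the column version.

```python
from typing import List


class OrthogonalMemory:
    """Toy bit matrix that can be read along a row or along a column."""

    def __init__(self, words: List[int], width: int):
        self.width = width
        self.bits = [[(w >> k) & 1 for k in range(width)] for w in words]
        self.reads = 0  # number of row or column accesses issued

    def read_word(self, i: int) -> int:
        """Row access: all bits of word i (sequential, word-at-a-time view)."""
        self.reads += 1
        return sum(bit << k for k, bit in enumerate(self.bits[i]))

    def read_slice(self, k: int) -> List[int]:
        """Column access: bit k of every word (bit-serial SIMD view)."""
        self.reads += 1
        return [row[k] for row in self.bits]


def sum_by_rows(mem: OrthogonalMemory) -> int:
    """One row read per word."""
    return sum(mem.read_word(i) for i in range(len(mem.bits)))


def sum_by_columns(mem: OrthogonalMemory) -> int:
    """One column read per bit position: sum = sum_k 2**k * popcount(slice k)."""
    return sum(sum(mem.read_slice(k)) << k for k in range(mem.width))


words = [3, 10, 255, 0, 42]
rows = OrthogonalMemory(words, width=8)
cols = OrthogonalMemory(words, width=8)
print(sum_by_rows(rows), rows.reads)      # 310 5
print(sum_by_columns(cols), cols.reads)   # 310 8
```

With column access the number of reads grows with the word width rather than with the number of words, so the same memory can serve both sequential code and massively parallel bit-serial code without first moving the data into a different layout.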
3. Average-case computing
  – ALU that runs faster than worst case
  – And dissipates less power than worst case
  – Enables low-power just-in-time architecture
4. Personalized vision/graphics for personal mobile devices
  – Inspires workload understanding
5. Memristive processors and resistive memories
  – Presented by Yuval Cassuto
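One classic illustration of why an ALU can finish earlier than its worst case: in a ripple-carry adder the critical path is the longest carry chain, and for uniformly random n-bit operands that chain is on average only about log2 n positions long, versus n positions in the worst case. The sketch below measures this empirically. The chain definition and the notion of an adder that detects completion are our assumptions for illustration, not a description of this project's average-case ALU.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest carry ripple, in bit positions, when a ripple-carry adder computes a + b.

    A chain starts at a generate bit (a_i = b_i = 1) and grows by one for each
    following propagate bit (a_i != b_i). It ends at a kill bit (a_i = b_i = 0),
    and a new generate bit starts a fresh chain.
    """
    longest = run = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:            # generate: a new carry starts here
            run = 1
        elif ai != bi and run:   # propagate: the incoming carry moves one bit further
            run += 1
        else:                    # kill, or propagate with no incoming carry
            run = 0
        longest = max(longest, run)
    return longest


def average_longest_chain(width: int, trials: int, seed: int = 0) -> float:
    """Mean longest carry chain over uniformly random width-bit operands."""
    rng = random.Random(seed)
    total = sum(
        longest_carry_chain(rng.getrandbits(width), rng.getrandbits(width), width)
        for _ in range(trials)
    )
    return total / trials


WIDTH = 64
worst = longest_carry_chain(2**WIDTH - 1, 1, WIDTH)   # carry ripples through every bit
typical = average_longest_chain(WIDTH, trials=10_000)
print(f"worst case: {worst} stages, random operands: {typical:.1f} stages on average")
```

An adder that signals completion as soon as its carries settle would, under these assumptions, usually finish after a handful of stages rather than 64, which is the kind of slack an average-case design can turn into speed or lower power.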