Accelerators

Accelerators. Ran Ginosar, Avinoam Kolodny, Yuval Cassuto, Koby Crammer, Shmuel Wimer, Dani Lichinski


Page 1: Accelerators

Accelerators

Ran Ginosar
Avinoam Kolodny
Yuval Cassuto
Koby Crammer
Shmuel Wimer
Dani Lichinski

Page 2: Accelerators

(research)(motivation) questions

• We love accelerators, but…
• What accelerators?
• What workload? What “killer applications”?
• Why study / develop them?
• Who needs them?
• What architecture(s)?
• What goals are we seeking to fulfill?
  – In addition to winning ICRI-CI research grants

Page 3: Accelerators

Why accelerators?

• Semiconductor industry sells $300B/year (10% INTC)

  – 1M high profit chips/day
    • $100/chip, $100M/day. Mostly CPU.
    • 10% of revenues. 100-1000% gross profit
  – 90M low cost chips/day
    • $10/chip, $900M/day. 50% gross profit
• Growth < 10%
• In the year 2023?
  – Need to expand into another rich industry

• Store-and-compute accelerators will be the driver

Page 4: Accelerators

• Which industry is
  – Rich
    • Much richer than semiconductors
  – Under-utilized
  – Begs for progress (and can pay for it)
  – Critical, will not disappear

• Video? Entertainment? Communication?

Page 5: Accelerators

Health Care

• $2.5 Trillion in US alone
  – Already 10x the entire global semiconductor industry
  – $4.5T by 2020
  – Global is probably 3X, $15T by 2020
• Key challenge:
  – Today: imprecise, statistics-based diagnosis and treatment
  – Develop into more efficient, more successful discipline by combining science & computing

Page 6: Accelerators

Future health care is computerized (store and compute)

• Medical/health data about 10B people
  – Genomics, proteomics (5 GB/person)
  – Health & medical record (1 GB/person)
  – Continuously accumulating sensor readings (4 GB/person)
    • Medical, environmental, food & drugs
• Monitor and process all individuals
  – Machine learning
  – Predict and alert medical conditions
  – Individualize drugs, diets, treatments

Page 7: Accelerators

Storage required

• 10 GB/person
• 10B people
• 10^20 Bytes (100 ExaBytes, 100 Mega-TeraBytes)
  – 100 million of today’s 1 TByte disks. 100+ data centers
  – 500 MegaWatts to store, read and write
    • $350 Million / year
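The storage figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes). The per-disk power and the electricity price are not stated on the slide; they are only the values implied by its 500 MW and $350M/year figures.

```python
# Back-of-the-envelope check of the storage slide (decimal units assumed).
GB = 10**9
TB = 10**12
EB = 10**18

bytes_per_person = 10 * GB        # 5 GB omics + 1 GB records + 4 GB sensor data
people = 10 * 10**9               # 10B people
total_bytes = bytes_per_person * people

disks = total_bytes // (1 * TB)   # number of 1 TB disks needed

power_watts = 500 * 10**6         # 500 MW, taken from the slide
cost_per_year = 350 * 10**6       # $350M/year, taken from the slide

# Values implied by the slide's numbers (not stated in the source).
watts_per_disk = power_watts / disks
kwh_per_year = power_watts / 1000 * 24 * 365
implied_price_per_kwh = cost_per_year / kwh_per_year

print(f"total: {total_bytes // EB} EB on {disks:,} disks")
print(f"implied power per disk: {watts_per_disk} W")
print(f"implied electricity price: ${implied_price_per_kwh:.3f}/kWh")
```

Under these assumptions the slide's numbers are mutually consistent: about 5 W per disk and roughly 8 cents per kWh.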

Page 8: Accelerators

Computing required

• Run through 50% of data each day
• Perform 10 op / byte
• 10^21 OP/day = 10^16 OP/sec
  – Only 10M cores of 1 GOPS each
  – 100 data centers
• Power: only 10 MegaWatt
  – 2% of storage power
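The compute estimate can be reproduced the same way. This is a rough sketch that takes the 10^20-byte total from the storage slide. Note that 50% of the data at 10 op/byte gives 5 x 10^20 op/day, which the slide rounds up to 10^21. The 1 W per core figure is only what the slide's 10 MW over 10M cores implies, not a stated number.

```python
# Back-of-the-envelope check of the computing slide.
total_bytes = 10**20              # from the storage slide
ops_per_byte = 10
seconds_per_day = 24 * 60 * 60

# Half of the data is processed each day.
ops_per_day = total_bytes // 2 * ops_per_byte
ops_per_sec = ops_per_day / seconds_per_day

core_ops_per_sec = 10**9          # 1 GOPS per core
cores_needed = ops_per_sec / core_ops_per_sec
cores_on_slide = 10 * 10**6       # the slide's rounded-up figure

compute_power_w = 10 * 10**6      # 10 MW, from the slide
storage_power_w = 500 * 10**6     # 500 MW, from the storage slide
watts_per_core = compute_power_w / cores_on_slide   # implied, not stated
power_ratio = compute_power_w / storage_power_w

print(f"{ops_per_day:.1e} op/day, {ops_per_sec:.1e} op/sec")
print(f"cores needed: {cores_needed / 1e6:.1f}M (slide: 10M)")
print(f"implied {watts_per_core} W/core, {power_ratio:.0%} of storage power")
```

The unrounded requirement is a little under 6M cores, so the slide's 10M cores leave some headroom.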

Page 9: Accelerators

Solution: move computing closer to data

• The HMC (Hybrid Memory Cube) industry has already taken the first step
• 100,000 TSV (through-silicon via) vertical interconnects

Page 10: Accelerators
Page 11: Accelerators
Page 12: Accelerators

Not yet there

• Wish to get closer: stack memory on top of the CPU?
• NO. Too hot
  – CPU operates above 100ºC
  – DRAM is useless above 85ºC
• Solution
  – Dispose of the CPU
  – Create 3D low-power (low temperature), uniform-power-density, high-performance store & compute machine

Page 13: Accelerators

Store & Compute

• 1 TByte / chip in 2020
  – Combined DRAM + NVM
• Accelerators
  – 1000 cores “many-core”
    • MIMD
    • Associative Processors
    • SIMD
• Internal + external networks

[Figure: 2D accelerator and 3D accelerator diagrams; labels: p-m, NOC, NVM, DRAM+SRAM]

Page 14: Accelerators

Challenges

• Need 100M chips
• Max 0.1 W / chip
  – Total 10 MWatt
  – 100-1000 data centers

[Figure: 500 chips, 50 Watt; dimensions 20 mm, 20 mm, 5 mm; labels: 2D Accelerator, p-m, NOC, NVM]
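The chip-count and power budget on this slide follow from the earlier numbers. The sketch below assumes 1 TByte per chip (from the Store & Compute slide) and decimal units. The name `chips_per_group` is a hypothetical label for the 500-chip unit shown in the figure.

```python
# Back-of-the-envelope check of the chip count and power budget.
TB = 10**12

chips = 100 * 10**6               # 100M chips
bytes_per_chip = 1 * TB           # 1 TByte/chip, from the Store & Compute slide
milliwatts_per_chip = 100         # 0.1 W/chip, kept in integers

total_capacity = chips * bytes_per_chip
total_power_w = chips * milliwatts_per_chip // 1000

# The figure shows a unit of 500 chips dissipating 50 W.
chips_per_group = 500             # hypothetical name for the unit in the figure
group_power_w = chips_per_group * milliwatts_per_chip / 1000
groups = chips // chips_per_group

# Chips per data center across the slide's range of 100 to 1000 data centers.
chips_per_dc_max = chips // 100
chips_per_dc_min = chips // 1000

print(f"capacity: {total_capacity:.0e} bytes, power: {total_power_w / 1e6} MW")
print(f"{groups:,} groups of {chips_per_group} chips at {group_power_w} W each")
print(f"{chips_per_dc_min:,} to {chips_per_dc_max:,} chips per data center")
```

With these inputs, 100M chips cover the 10^20 bytes of the storage estimate within the 10 MW budget of the compute estimate.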

Page 15: Accelerators

More challenges

• Understand workload
• Understand algorithms
• Architect the store & compute accelerators
• Low low low power
• High (data-intensive) performance

Page 16: Accelerators

Approaches

1. Associative processors
  – Classic store & compute
  – Uniform power distribution
  – Massive parallelism
  – Very low power
2. Orthogonal access SIMD processors
  – Sequential and parallel access
  – Mitigate data-movement bottleneck

Page 17: Accelerators

Approaches

3. Average case computing
  – ALU that runs faster than worst case
  – And dissipates less power than worst case
  – Enables low power just-in-time architecture
4. Personalized vision/graphics for personal mobile devices
  – Inspires workload understanding
5. Memristive processors and resistive memories
  – Presented by Yuval Cassuto