Emergence of the Memory Centric Architectures
Balint Fleischer, Chief Scientist
AI is Everywhere
Personal Assistants
New customer experiences
Understanding intentions
Anticipating needs
Advising the CEO
External Sensing: Market trends, Competitive environment, Customer sentiment, Demand
Internal Sensing: Production Systems, Supply chain, Asset utilization, Employee Morale
Business: Product Recommendation, Personalization, Sentiment Analytics, Preventative Maintenance, Fraud Detection, eDiscovery, Medical Diagnosis, Language Translation, Object recognition, Smart City Management, Chatbots, Smart Manufacturing, Threat Detection, Customer Service, etc.
Consumer: Health Assistant, Direction guide, Communication Assistant, Shopping Assistant, Entertainment, Education, Transportation, etc.
Real-Time AI is Putting Pressure on Platform Scaling
Operation (<Seconds): Personalization, Quality Control, Alerting, Routing, Network Mgmt, etc.
Efficiency (Minutes): Failure Prediction, Yield Mgmt, Service Pricing, Traffic Congestion, etc.
Business Optimization (Hours/Days): Asset utilization, Product mix, Customer sentiment, Competitive trends, etc.
Accuracy & Response Time Impact Business
AI is Impacting Enterprise IT Architecture
Sensing Layer
AI Enhanced Applications
Enterprise Data Base
User's view / Enterprise Goals:
Identify connections between events, people and trends
Discover new insights
Uncover breakthroughs and predict trends
Enable new customer experiences via service personalization
Reinvent business models and operations
Next generation Enterprise IT
AI Computing is Challenging
Classic CPU Perf Roadmap: Optimized for Accuracy (Logical operations, Arithmetic operations, Data Store and Retrieve)
AI Processing Perf Demand*: Optimized for Estimation (Probabilistic calculations, DNN, ML algorithms, etc.)
GAP: Data Intensive; New Algorithms, New Architecture
* Applied or narrow AI
Processor Research to Improve AI Performance
Processing & Circuits | Platform Architecture | Algorithms & Architectures
Reducing Energy of Data Movement: Memory Hierarchies, Computing in Memory, Low Latency Networks
Reducing Resource Requirement: Domain specific Architectures, SW optimizations, Compilers, Energy efficient ConvNets**, Binary Weight Networks*, XNOR Networks*, Compression, Pruning
Reducing Power and Cost: New process technologies, Near Threshold Switching, DL Optimized Architectures, Bio Inspired systems
* XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, Mohammad Rastegari et al., 2016
** Energy-Efficient ConvNets Through Approximate Computing, Bert Moons et al., 2016
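The Binary Weight and XNOR Networks cited above replace floating-point multiplies with 1-bit operations. Below is a minimal NumPy sketch of the core trick only (sign binarization, then a dot product computed as XNOR plus popcount); it illustrates the arithmetic identity, not the papers' full training or scaling schemes.

```python
import numpy as np

def binarize(x):
    """Sign-binarize to {-1, +1}, as in binary-weight / XNOR-style networks."""
    return np.where(x >= 0, 1.0, -1.0)

def xnor_dot(a_bin, b_bin):
    """Dot product of two {-1, +1} vectors via XNOR + popcount.

    With bit-packed operands: dot = 2 * popcount(XNOR(a, b)) - n.
    Here the bit logic is emulated with integer comparisons for clarity.
    """
    n = a_bin.size
    a_bits = (a_bin > 0).astype(np.int64)  # map -1 -> 0, +1 -> 1
    b_bits = (b_bin > 0).astype(np.int64)
    matches = int(np.sum(a_bits == b_bits))  # popcount of XNOR
    return 2 * matches - n

rng = np.random.default_rng(0)
w = rng.standard_normal(64)
x = rng.standard_normal(64)
wb, xb = binarize(w), binarize(x)

# The XNOR/popcount result equals the ordinary dot product of the
# binarized vectors: every multiply has become a 1-bit logic op.
assert xnor_dot(wb, xb) == int(np.dot(wb, xb))
```

This is the energy lever the slide points at: a hardware implementation needs only XNOR gates and a popcount instead of multipliers.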
Roadmap for a Faster, More Efficient AI Processor
Spectrum: General Purpose Systems -> Advanced AI Engines
Efficiency: Today's Best ~100 MMAC/s/mW -> ~1 MMAC/s/pW
Digital Analytics Systems:
CPU based: x86, ARM, Power
GPU based: NVIDIA, Xeon Phi, AMD
FPGA based: Teradeep, Altera, Xilinx, DeePhi
ASIC based: TPU, Wave, Graphcore, Movidius, Eyeriss, NeuFlow, Neurostream, Neurocube, others
Neuromorphic Systems (research examples): Minitaur, SpiNNaker, TrueNorth, NeuroGrid, Neurocluster, BrainScaleS, ROLLS, others
"Bio Inspired Computing" (research examples): Vector-Matrix Multipliers, MultiCore Systems, NeuroMemristive Systems (PRIME, ISAAC), etc.
Note: Company and project names for reference only. No implied endorsement by Huawei
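The two efficiency endpoints on this roadmap are easier to compare as energy per operation. A quick conversion, assuming the units are as printed on the slide (~100 MMAC/s/mW today, ~1 MMAC/s/pW for bio-inspired targets):

```python
# Convert the roadmap's efficiency figures to energy per MAC.
def energy_per_mac_joules(macs_per_sec, power_watts):
    """Energy per multiply-accumulate = power / throughput."""
    return power_watts / macs_per_sec

today = energy_per_mac_joules(100e6, 1e-3)   # 100 MMAC/s per mW
target = energy_per_mac_joules(1e6, 1e-12)   # 1 MMAC/s per pW

print(f"today:  {today * 1e12:.1f} pJ/MAC")   # 10.0 pJ/MAC
print(f"target: {target * 1e18:.1f} aJ/MAC")  # 1.0 aJ/MAC
print(f"gap: {today / target:.0e}x")          # 1e+07x
```

In other words, the slide's endpoints span roughly seven orders of magnitude: ~10 pJ/MAC for today's best digital engines versus ~1 aJ/MAC for the bio-inspired target.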
Platform Research to Improve AI Performance
Reducing Latency to improve Performance
Memory Hierarchy technology choices and value:
CPU L1-L3 cache, On-Die/On-Package Memory: Very Low Latency (<30 nsec), Extreme High BW
DRAM, SCM, NVDIMM-P, NVDIMM-N: Low Latency (<1 usec), Very High BW, Byte access, maybe Non-Volatile. Use case: Large Platform Memory
NVMe drives, SSD, NVDIMM-F: Medium Latency (<100 usec), Medium BW, Block access. Use case: Enterprise Data Base
Across the network: Moderate latency (>1 msec), Moderate BW
Data Management tools and architecture features are key
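One way to see why the tier management above is key: weight each tier's latency by the fraction of accesses it serves. The latencies below are the upper bounds from this slide; the hit-rate fractions are purely illustrative assumptions.

```python
# Back-of-the-envelope average access latency across the slide's tiers.
TIERS = [
    # (name, latency in seconds, assumed fraction of accesses served here)
    ("cache / on-package",  30e-9,   0.90),
    ("DRAM / SCM / NVDIMM", 1e-6,    0.09),
    ("NVMe / SSD",          100e-6,  0.009),
    ("across network",      1e-3,    0.001),
]

avg = sum(lat * frac for _, lat, frac in TIERS)
print(f"average access latency: {avg * 1e6:.2f} usec")

# Even with only 0.1% of accesses going over the network, that tier
# contributes the single largest term (1 usec of ~2 usec total),
# which is the argument for data proximity on the next slide.
```

Shifting traffic up the hierarchy (or shortening the network tier's latency) moves the average far more than speeding up the already-fast tiers.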
Optimizing Processor Data Movements
Data Proximity: Co-Locating Data and Processing
Application -> Server Memory -> In-Memory Data Store, on a General Purpose CPU
Low Latency (<1 usec), Medium/High BW, Large capacity, Improved energy efficiency (local data access)
In-Memory Compute: Integrating Processing into the memory array
CPU plus Processing Elements embedded in memory (Hybrid Memory Cube based concept)
Very Low Latency (<<1 usec), Extreme High BW, Limited capacity, Very good energy efficiency, Specialized/Embedded processing
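A toy energy model makes the data-proximity argument concrete. The per-byte and per-op energy figures below are rough hypothetical assumptions for illustration, not numbers from the slide.

```python
# Illustrative model: energy to reduce 1 GiB of data when compute sits
# next to the memory vs. when the data is shipped across the fabric.
# All energy constants are hypothetical, order-of-magnitude guesses.

E_MAC = 10e-12          # J per multiply-accumulate (compute cost)
E_MOVE_LOCAL = 20e-12   # J per byte, on-package memory access
E_MOVE_REMOTE = 2e-9    # J per byte, moving data across the fabric

def reduce_energy(n_bytes, e_move_per_byte):
    """Energy to fetch n bytes and accumulate them (1 MAC per 4 bytes)."""
    return n_bytes * e_move_per_byte + (n_bytes / 4) * E_MAC

gb = 1 << 30
near = reduce_energy(gb, E_MOVE_LOCAL)   # compute co-located with data
far = reduce_energy(gb, E_MOVE_REMOTE)   # pull the data to a distant CPU

print(f"near-data: {near:.3f} J, remote: {far:.3f} J, "
      f"saving: {far / near:.0f}x")
```

Under these assumptions, moving the data dominates the compute itself by orders of magnitude, which is exactly why the slide trades capacity for proximity.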
Supporting Ubiquitous AI Processing: Unifying CPU+AI into One Memory Centric Design for Scalability
Memory Hub based architecture
Distributed Memory-to-Memory Protocols
In-Memory Distributed Data Store support
Local Data Store support
In-Memory Compute for max BW to engine
Data Center Fabric
Large On-Package Memory
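The distributed in-memory data store mentioned above can be pictured as keys hash-partitioned across memory-hub nodes, each serving its own local store. The sketch below is a toy illustration of that partitioning idea; all class and method names are hypothetical, not part of any Huawei design.

```python
# Toy sketch: a hash-partitioned in-memory data store over "memory hub"
# nodes. Each key has one home node; reads and writes go to that node's
# local (in-memory) store.

class MemoryHubNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.local_store = {}  # local in-memory data store

class DistributedStore:
    def __init__(self, n_nodes):
        self.nodes = [MemoryHubNode(i) for i in range(n_nodes)]

    def _home(self, key):
        # Hash-partition: deterministically map each key to one hub.
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, value):
        self._home(key).local_store[key] = value

    def get(self, key):
        return self._home(key).local_store.get(key)

store = DistributedStore(4)
store.put("model/layer0/weights", [0.1, 0.2])
print(store.get("model/layer0/weights"))
```

In a memory-centric platform, `get`/`put` would ride the memory-to-memory protocol rather than a software RPC, but the partitioning logic is the same.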
Emerging Memory Technologies to Support Scalability
DRAM based: HBM2, HMC, DDR5
NAND based: 3D NAND, Z-NAND
DIMM based: NVDIMM-F, NVDIMM-N, NVDIMM-P
New Memory types: 3D XPoint, NRAM, MRAM, ReRAM
Note: Product names for reference only. No implied endorsement by Huawei
Data Center for AI based Workloads
High Bisectional BW Non-Blocking Fabric to support large data streams, predictable performance, provisioning flexibility
Latency optimized protocols for improving scalability, predictable performance, provisioning flexibility
AI workload Optimized Server Platform to lower processing cost, increase performance, improve prediction accuracy and lower energy consumption
Large Platform Memory for Data Proximity and Distributed In-Memory Data Store
Persistent Enterprise Data Base: NVM based architecture for improved scaling performance and enhanced response time
Special pool (BMP) for Low Latency Compute
Thank You
[email protected]
6/12/2017, Balint Fleischer, Huawei