Emergence of the Memory Centric Architectures
Balint Fleischer, Chief Scientist
AI is Everywhere
Personal Assistants
New customer experiences
Understanding intentions
Anticipating needs
Advising the CEO
External Sensing: Market trends, Competitive environment, Customer sentiment, Demand
Internal Sensing: Production Systems, Supply chain, Asset utilization, Employee Morale
Business: Product Recommendation, Personalization, Sentiment Analytics, Preventative Maintenance, Fraud Detection, eDiscovery, Medical Diagnosis, Language Translation, Object recognition, Smart City Management, Chatbots, Smart Manufacturing, Threat Detection, Customer Service, etc.
Consumer: Health Assistant, Direction guide, Communication Assistant, Shopping Assistant, Entertainment, Education, Transportation, etc.
Real-Time AI is Putting Pressure on Platform Scaling
Operation (<Seconds): Personalization, Quality Control, Alerting, Routing, Network Mgmt, etc.
Efficiency (Minutes): Failure Prediction, Yield Mgmt, Service Pricing, Traffic Congestion, etc.
Business Optimization (Hours/Days): Asset utilization, Product mix, Customer sentiment, Competitive trends, etc.
Accuracy & Response Time Impact Business
AI is Impacting Enterprise IT Architecture
Sensing Layer
AI Enhanced Applications
Enterprise Data Base
User's view / Enterprise Goals:
Identify connections between events, people and trends
Discover new insights
Uncover breakthroughs and predict trends
Enable new customer experiences via service personalization
Reinvent business models and operations
Next generation Enterprise IT
AI Computing is Challenging
Classic CPU Perf Roadmap: Optimized for Accuracy (Logical operations, Arithmetic operations, Data Store and Retrieve)
AI Processing Perf Demand*: Optimized for Estimation (Probabilistic calculations, DNN, ML algorithms, etc.)
GAP: Data Intensive; New Algorithms, New Architecture
* Applied or narrow AI
Processor Research to Improve AI Performance
Processing & Circuits | Platform Architecture | Algorithms & Architectures
Reducing Energy of Data Movement: Memory Hierarchies, Computing in Memory, Low Latency Networks
Reducing Resource Requirement: Domain specific Architectures, SW optimizations, Compilers, Energy efficient ConvNets**, Binary Weight Networks*, XNOR Networks*, Compression, Pruning
Reducing Power and Cost: New process technologies, Near Threshold Switching, DL Optimized Architectures, Bio Inspired systems
* XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, Mohammad Rastegari et al., 2016
** Energy-Efficient ConvNets Through Approximate Computing, Bert Moons et al., 2016
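The Binary Weight and XNOR Networks cited above replace floating-point multiplies with 1-bit operations. Below is a minimal NumPy sketch of the core trick only (sign binarization, then a dot product computed as XNOR plus popcount); it illustrates the arithmetic identity, not the papers' full training or scaling schemes.

```python
import numpy as np

def binarize(x):
    """Sign-binarize to {-1, +1}, as in binary-weight / XNOR-style networks."""
    return np.where(x >= 0, 1.0, -1.0)

def xnor_dot(a_bin, b_bin):
    """Dot product of two {-1, +1} vectors via XNOR + popcount.

    With bit-packed operands: dot = 2 * popcount(XNOR(a, b)) - n.
    Here the bit logic is emulated with integer comparisons for clarity.
    """
    n = a_bin.size
    a_bits = (a_bin > 0).astype(np.int64)  # map -1 -> 0, +1 -> 1
    b_bits = (b_bin > 0).astype(np.int64)
    matches = int(np.sum(a_bits == b_bits))  # popcount of XNOR
    return 2 * matches - n

rng = np.random.default_rng(0)
w = rng.standard_normal(64)
x = rng.standard_normal(64)
wb, xb = binarize(w), binarize(x)

# The XNOR/popcount result equals the ordinary dot product of the
# binarized vectors: every multiply has become a 1-bit logic op.
assert xnor_dot(wb, xb) == int(np.dot(wb, xb))
```

This is the energy lever the slide points at: a hardware implementation needs only XNOR gates and a popcount instead of multipliers.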
Roadmap for a Faster, More Efficient AI Processor
Spectrum: General Purpose Systems -> Advanced AI Engines
Efficiency: Today's Best ~100 MMAC/s/mW -> ~1 MMAC/s/pW
Digital Analytics Systems:
CPU based: x86, ARM, Power
GPU based: NVIDIA, Xeon Phi, AMD
FPGA based: Teradeep, Altera, Xilinx, DeePhi
ASIC based: TPU, Wave, Graphcore, Movidius, Eyeriss, NeuFlow, Neurostream, Neurocube, others
Neuromorphic Systems (research examples): Minitaur, SpiNNaker, TrueNorth, NeuroGrid, Neurocluster, BrainScaleS, ROLLS, others
"Bio Inspired Computing" (research examples): Vector-Matrix Multipliers, MultiCore Systems, NeuroMemristive Systems (PRIME, ISAAC), etc.
Note: Company and project names for reference only. No implied endorsement by Huawei
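The two efficiency endpoints on this roadmap are easier to compare as energy per operation. A quick conversion, assuming the units are as printed on the slide (~100 MMAC/s/mW today, ~1 MMAC/s/pW for bio-inspired targets):

```python
# Convert the roadmap's efficiency figures to energy per MAC.
def energy_per_mac_joules(macs_per_sec, power_watts):
    """Energy per multiply-accumulate = power / throughput."""
    return power_watts / macs_per_sec

today = energy_per_mac_joules(100e6, 1e-3)   # 100 MMAC/s per mW
target = energy_per_mac_joules(1e6, 1e-12)   # 1 MMAC/s per pW

print(f"today:  {today * 1e12:.1f} pJ/MAC")   # 10.0 pJ/MAC
print(f"target: {target * 1e18:.1f} aJ/MAC")  # 1.0 aJ/MAC
print(f"gap: {today / target:.0e}x")          # 1e+07x
```

In other words, the slide's endpoints span roughly seven orders of magnitude: ~10 pJ/MAC for today's best digital engines versus ~1 aJ/MAC for the bio-inspired target.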
Platform Research to Improve AI Performance
Reducing Latency to improve Performance
Memory Hierarchy technology choices and value:
CPU L1-L3 cache, On-Die/On-Package Memory: Very Low Latency (<30 nsec), Extreme High BW
DRAM, SCM, NVDIMM-P, NVDIMM-N: Low Latency (<1 usec), Very High BW, Byte access, maybe Non-Volatile. Use case: Large Platform Memory
NVMe drives, SSD, NVDIMM-F: Medium Latency (<100 usec), Medium BW, Block access. Use case: Enterprise Data Base
Across the network: Moderate latency (>1 msec), Moderate BW
Data Management tools and architecture features are key
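One way to see why the tier management above is key: weight each tier's latency by the fraction of accesses it serves. The latencies below are the upper bounds from this slide; the hit-rate fractions are purely illustrative assumptions.

```python
# Back-of-the-envelope average access latency across the slide's tiers.
TIERS = [
    # (name, latency in seconds, assumed fraction of accesses served here)
    ("cache / on-package",  30e-9,   0.90),
    ("DRAM / SCM / NVDIMM", 1e-6,    0.09),
    ("NVMe / SSD",          100e-6,  0.009),
    ("across network",      1e-3,    0.001),
]

avg = sum(lat * frac for _, lat, frac in TIERS)
print(f"average access latency: {avg * 1e6:.2f} usec")

# Even with only 0.1% of accesses going over the network, that tier
# contributes the single largest term (1 usec of ~2 usec total),
# which is the argument for data proximity on the next slide.
```

Shifting traffic up the hierarchy (or shortening the network tier's latency) moves the average far more than speeding up the already-fast tiers.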
Optimizing Processor Data Movements
Data Proximity: Co-Locating Data and Processing
Application -> Server Memory -> In-Memory Data Store, on a General Purpose CPU
Low Latency (<1 usec), Medium/High BW, Large capacity, Improved energy efficiency (local data access)
In-Memory Compute: Integrating Processing into the memory array
CPU plus Processing Elements embedded in memory (Hybrid Memory Cube based concept)
Very Low Latency (<<1 usec), Extreme High BW, Limited capacity, Very good energy efficiency, Specialized/Embedded processing
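A toy energy model makes the data-proximity argument concrete. The per-byte and per-op energy figures below are rough hypothetical assumptions for illustration, not numbers from the slide.

```python
# Illustrative model: energy to reduce 1 GiB of data when compute sits
# next to the memory vs. when the data is shipped across the fabric.
# All energy constants are hypothetical, order-of-magnitude guesses.

E_MAC = 10e-12          # J per multiply-accumulate (compute cost)
E_MOVE_LOCAL = 20e-12   # J per byte, on-package memory access
E_MOVE_REMOTE = 2e-9    # J per byte, moving data across the fabric

def reduce_energy(n_bytes, e_move_per_byte):
    """Energy to fetch n bytes and accumulate them (1 MAC per 4 bytes)."""
    return n_bytes * e_move_per_byte + (n_bytes / 4) * E_MAC

gb = 1 << 30
near = reduce_energy(gb, E_MOVE_LOCAL)   # compute co-located with data
far = reduce_energy(gb, E_MOVE_REMOTE)   # pull the data to a distant CPU

print(f"near-data: {near:.3f} J, remote: {far:.3f} J, "
      f"saving: {far / near:.0f}x")
```

Under these assumptions, moving the data dominates the compute itself by orders of magnitude, which is exactly why the slide trades capacity for proximity.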
Supporting Ubiquitous AI Processing: Unifying CPU+AI into One Memory Centric Design for Scalability
Memory Hub based architecture
Distributed Memory-to-Memory Protocols
In-Memory Distributed Data Store support
Local Data Store support
In-Memory Compute for max BW to engine
Data Center Fabric
Large On-Package Memory
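The distributed in-memory data store mentioned above can be pictured as keys hash-partitioned across memory-hub nodes, each serving its own local store. The sketch below is a toy illustration of that partitioning idea; all class and method names are hypothetical, not part of any Huawei design.

```python
# Toy sketch: a hash-partitioned in-memory data store over "memory hub"
# nodes. Each key has one home node; reads and writes go to that node's
# local (in-memory) store.

class MemoryHubNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.local_store = {}  # local in-memory data store

class DistributedStore:
    def __init__(self, n_nodes):
        self.nodes = [MemoryHubNode(i) for i in range(n_nodes)]

    def _home(self, key):
        # Hash-partition: deterministically map each key to one hub.
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, value):
        self._home(key).local_store[key] = value

    def get(self, key):
        return self._home(key).local_store.get(key)

store = DistributedStore(4)
store.put("model/layer0/weights", [0.1, 0.2])
print(store.get("model/layer0/weights"))
```

In a memory-centric platform, `get`/`put` would ride the memory-to-memory protocol rather than a software RPC, but the partitioning logic is the same.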
Emerging Memory Technologies to Support Scalability
DRAM based: HBM2, HMC, DDR5
NAND based: 3D NAND, Z-NAND
DIMM based: NVDIMM-F, NVDIMM-N, NVDIMM-P
New Memory types: 3D XPoint, NRAM, MRAM, ReRAM
Note: Product names for reference only. No implied endorsement by Huawei
Data Center for AI based Workloads
High Bisectional BW Non-Blocking Fabric to support large data streams, predictable performance, provisioning flexibility
Latency optimized protocols for improving scalability, predictable performance, provisioning flexibility
AI workload Optimized Server Platform to lower processing cost, increase performance, improve prediction accuracy and lower energy consumption
Large Platform Memory for Data Proximity and Distributed In-Memory Data Store
Persistent Enterprise Data Base: NVM based architecture for improved scaling performance and enhanced response time
Special pool (BMP) for Low Latency Compute
Thank You
[email protected]
6/12/2017, Balint Fleischer, Huawei