Architectural tricks to maximize memory bandwidth
Deepak Shankar, CEO, Mirabilis Design
Why Focus on Memory Sub-System
• Processors have a huge number of cycles and bandwidth
  – How do you take advantage of this?
• Memory access is a major bottleneck
  – Especially in high-performance systems like multimedia and networking
• Memory access is the largest source of power consumption
  – Too many ACT commands (RAS, RP and RCD) will dramatically increase the power
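As a rough illustration of why the access pattern drives the number of ACT commands, the sketch below counts row activations for two address streams under a simple open-page policy. The address mapping, the 2 KiB row size and the 8 banks are assumptions chosen for illustration, not values from any DRAM datasheet.

```python
def count_activations(addresses, row_bytes=2048, num_banks=8):
    """Count ACT commands under a simple open-page policy.

    Hypothetical mapping: consecutive rows are interleaved across banks.
    """
    open_row = {}      # bank -> row currently open in that bank
    activations = 0
    for addr in addresses:
        row_index = addr // row_bytes       # global row number
        bank = row_index % num_banks
        row = row_index // num_banks
        if open_row.get(bank) != row:       # row miss: a new ACT is needed
            activations += 1
            open_row[bank] = row
    return activations


# 16 KiB streamed in 64-byte bursts: many accesses share each open row.
sequential = [i * 64 for i in range(256)]
# Stride of one full bank rotation: every access hits a new row in bank 0.
strided = [i * 2048 * 8 for i in range(256)]

print(count_activations(sequential), count_activations(strided))
```

With these assumed parameters the sequential stream needs 8 activations and the strided stream needs 256 for the same number of accesses.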
Introduction
• Importance of improving Memory Performance
• Addressing challenges with Architecture Level Memory explorations
• Need for Performance vs. Power trade-off analysis
• Impact of the memory addressing scheme on performance
About Mirabilis Design
• Provider of system-level architecture exploration solutions for electronics and semiconductors
• Platform to conduct power-performance trade-offs, hardware-software partitioning and topology design
• VisualSim: modeling and simulation software
• Based in Silicon Valley with experts in system modeling and architectures
• Largest source of system modeling libraries with embedded timing, functionality and power
Explore/Simulate a Memory System
• Key attributes
  – DRAM datasheet
  – Memory controller attributes
  – Connected bus topology
  – Workloads including rate, size, command and back pressure
Statistical Memory Model for Performance Analysis
Challenges in Memory Usage
• Product
  – Multimedia, Networking, HPC, Avionics
• Situation
  – Using an off-the-shelf Processor, FPGA or SoC
• Challenge
  – What will be the performance and power consumption for my use-cases?
• Metrics
  – Power per frame or packet
  – Latency from sensor input to HDMI output
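The two metrics above can be sketched as simple calculations. This is a minimal example with made-up numbers: the stage names, latencies, average power and frame rate are all hypothetical, and "power per frame" is interpreted here as energy per frame (average power divided by frame rate).

```python
def energy_per_frame_mj(average_power_mw, frames_per_second):
    """Energy spent per frame in millijoules (mW * s = mJ)."""
    return average_power_mw / frames_per_second


def end_to_end_latency_us(stage_latencies_us):
    """Sum of per-stage latencies along a single, non-overlapping path."""
    return sum(stage_latencies_us.values())


# Hypothetical pipeline from sensor input to HDMI output (microseconds).
stages = {
    "sensor_capture": 120.0,
    "memory_write": 35.0,
    "processing": 400.0,
    "memory_read": 30.0,
    "hdmi_output": 90.0,
}

print(energy_per_frame_mj(1500.0, 30.0))   # assumed 1.5 W average at 30 fps
print(end_to_end_latency_us(stages))
```

A real pipeline overlaps stages and contends for memory, which is the reason a simulation model is used rather than a plain sum.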
Opportunities in Memory Usage
• Vary the data sizes
• Memory configuration
• Ordering of tasks in the use-case
• Multiple masters making asynchronous requests to memory addresses
• Task and data distribution across multi-core
Full System Analysis
Processor Performance
Challenges in Memory System Design
• SoC interface to memory
• AXI bus and NoC topology to minimize the overhead for each Master
• Single vs. dual channels
• Memory controller algorithm
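For the single vs. dual channel choice, a first-order comparison can start from theoretical peak bandwidth. The sketch below assumes a DDR-style interface that transfers data on both clock edges; the clock rates and bus widths are illustrative values, and sustained bandwidth will be lower once controller and bus overheads are included.

```python
def peak_bandwidth_gbps(clock_mhz, bus_bits, channels=1, transfers_per_clock=2):
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer * channels / 1e9


single = peak_bandwidth_gbps(800, 64)               # one 64-bit channel
dual = peak_bandwidth_gbps(800, 64, channels=2)     # two 64-bit channels
wide_slow = peak_bandwidth_gbps(400, 128)           # half the clock, twice the width

print(single, dual, wide_slow)
```

Under these assumptions the half-clock, double-width configuration matches the single-channel peak, so the trade-off has to be decided on power, pin count and achieved (not peak) bandwidth.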
Opportunity and Advantage of Design
• Consolidate read and write
• Split transaction
• Group transaction
• Read re-ordering
• Transaction priority assignment
• Lower clock frequency vs. wider bus
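To illustrate the grouping and re-ordering ideas in the list above, the following sketch interleaves requests from two masters that each stream through a different row of the same bank, then groups requests by row inside a small window. The single-bank model, the 2 KiB row size and the window of 8 requests are assumptions for illustration; this is not a specific controller algorithm.

```python
def count_row_misses(requests, row_bytes=2048):
    """Row misses for one bank under an open-page policy."""
    open_row = None
    misses = 0
    for addr in requests:
        row = addr // row_bytes
        if row != open_row:
            misses += 1
            open_row = row
    return misses


def reorder_by_row(requests, window=8, row_bytes=2048):
    """Within each window of pending requests, serve same-row requests together."""
    reordered = []
    for start in range(0, len(requests), window):
        batch = requests[start:start + window]
        # sorted() is stable, so requests to the same row keep arrival order.
        reordered.extend(sorted(batch, key=lambda addr: addr // row_bytes))
    return reordered


master_a = [0x0000 + 64 * i for i in range(8)]   # streams through row 0
master_b = [0x8000 + 64 * i for i in range(8)]   # streams through row 16
interleaved = [addr for pair in zip(master_a, master_b) for addr in pair]

print(count_row_misses(interleaved))
print(count_row_misses(reorder_by_row(interleaved)))
```

In this example the interleaved stream misses on all 16 requests, while the grouped stream misses 4 times. A larger window reduces misses further at the cost of added latency for the requests that wait.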
Cycle-accurate Memory Model for Architecture Exploration
Power vs. Timing
About VisualSim
[Figure: VisualSim use areas: Architecture Exploration, Performance Analysis, Power Analysis, HW-SW Partitioning; layers: Software, Interfaces, RTOS, Hardware]
• Graphical and hierarchical modeling
• Large library of stochastic and cycle-accurate components and IP blocks with embedded timing and power
• Library blocks are used to assemble hardware, software, network, traffic, reports and use-cases
System- vs. Pin-level Modeling
[Figure: One Router. Abstraction levels: System Design, Transaction-level, Cycle-accurate, Signal-level. Label: VisualSim]
Schematics and RTL are very slow and too detailed for end-to-end metrics
System- vs. Pin-level Modeling
Similarity
• Hardware attributes: width, clock speed, buffer depths
• Timing
• Algorithms & arbitration
• Data & control flow logic
• Use addresses
Differences
• Data & control combined in transactions, not bits
• No pin definitions
• No signal handshaking
• Skip cycles with no change
• Flexible to make major changes
• 100-1000X faster
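The "skip cycles with no change" difference can be sketched by comparing an event-driven loop with a cycle-by-cycle loop on the same events. This is a generic illustration with hypothetical event names and timestamps, not a description of how any particular simulator is implemented.

```python
import heapq


def run_event_driven(events):
    """Process only the time points where something happens."""
    queue = list(events)
    heapq.heapify(queue)
    trace, steps = [], 0
    while queue:
        time, label = heapq.heappop(queue)   # jump straight to the next event
        trace.append((time, label))
        steps += 1
    return trace, steps


def run_cycle_by_cycle(events, last_cycle):
    """Visit every cycle, including those where nothing changes."""
    pending = {}
    for time, label in events:
        pending.setdefault(time, []).append(label)
    trace, steps = [], 0
    for cycle in range(last_cycle + 1):
        steps += 1
        for label in pending.get(cycle, []):
            trace.append((cycle, label))
    return trace, steps


# Hypothetical memory read: request, row activation, data return (in cycles).
events = [(0, "read_request"), (40, "row_activated"), (55, "data_returned")]

event_trace, event_steps = run_event_driven(events)
cycle_trace, cycle_steps = run_cycle_by_cycle(events, last_cycle=55)
print(event_steps, cycle_steps)
```

Both loops produce the same trace, but the event-driven loop executes 3 steps where the cycle-by-cycle loop executes 56.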
05/03/2023 Mirabilis Design Inc.
System model accuracy and simulation are sufficient for the explorations
How can System Level Explorations Help improve Memory Performance
• Evaluate performance and power advantages of different types of memory technologies.
• Early prediction of latency, throughput, power, and energy
• Evaluation of next-generation storage devices for high-bandwidth, low-latency requirements
• Spend more time on analysis and less time on implementation
Modeling Libraries - Semiconductors
SoC
• AMBA (AHB/APB/AXI)
• CoreConnect: PLB & OPB
• NoC, Virtual Channel
• USB
Memory
• SDR, DDR, DDR2, DDR3
• QDR, RDRAM
• LPDDR, LPDDR2, LPDDR3, LPDDR4
• HBM
• Flash
Processors
• ARM
• PowerPC: Freescale and IBM
• Intel and AMD
• TI
• MIPS
• Tensilica
• Renesas SH
Interfaces
• PCI, PCI-X, PCIe
• RapidIO
• NVMe
• Serial Switch
• Crossbar
• Ethernet
• Fibre Channel
Benefits
Features and their benefits:
• Facilitating transition from concept to design
  – Creating realistic workload scenarios driving simulations
  – Models enable experimentation and enhance innovation
  – Simulations facilitate analysis and exchanges between teams
• Increasing productivity
  – Rapid exploration and analysis
  – Graphics are better suited to handle complexity
  – Graphics are 10x more efficient than C/C++ programming
• Optimizing design
  – HW footprint, buffers, timings, power
• Facilitating implementation and validation
  – Providing executable specifications for implementation
  – Reusing test cases for validation