Post on 20-May-2020
© 2017 Arm Limited Arm Tech Symposia 2017
DynamIQ Processor Designs Using Cortex-A75
& Cortex-A55 for 5G Networks
Jeff Maguire | Senior Product Manager
Infrastructure IP Product Management | Arm
2 © 2017 Arm Limited
Agenda
5G networks
Ecosystem software to support Arm based deployments of network hardware
Arm Cortex-A75 and Cortex-A55
Conclusions
4 © 2017 Arm Limited
What is fog computing?
CLOUD
FOG COMPUTING
A system-level horizontal architecture that distributescomputing, storage, and networking closer to users
AR/VRAutonomous VehiclesIoT/AI
5 © 2017 Arm Limited
5G Dynamics Across the Network: Edge to Cloud
Devices
ACC
ACS ion
Storage
ion
Storage
Packet Flows Packet Flows
Acceleration
Storage
Compute
Packet Flows
S
ACS
C
AC
A
S
SA
CIoT, Automotive,
Industrial, Medical
Latency30-50x
Throughput100x
Connections100x
Mobility1.5x
5G vs. LTE
Core / DCAccess/Edge EdgeCompute
6 © 2017 Arm Limited
Network/Wireless Infrastructure:Three Distinct Segments – Two Distinct Approaches
5G Antenna MEC Core Network/Data Centre
Server
NetworkOffload
Server
NetworkAnd RAN
Offload
CPU
Air Interface
Processing
Typically standardized platform designs
Typically customized platform designs
Base Station
7 © 2017 Arm Limited
Network/Wireless Infrastructure: Edge Access
Arm: Meet diverse needs with heterogeneous platforms
• Mix of latency, throughput and power
• Supplement sub-systems approach with accelerator offload.
• Sub-system/packet processing platform
• ASIC based designs and dedicated platforms
• Merchant Silicon from multiple suppliers
CCIX and connection to FPGA technology
Edge Network Architecture
5G Air Interface Management• Antenna, RRH, DFE, Distributed Base Band,
Multi-Access Compute Acceleration.• IoT Gateway and End applications.• Optimized power, performance, latency
5G Antenna
CPU
Air Interface
Processing
Base Station
Hardware acceleration for advanced signal processing and terabit throughput.
Customized platform designs
Packet processing / front end / edge design .
8 © 2017 Arm Limited
5G Network Hierarchy Design
• Multi-Access Edge Compute, Distributed Evolved Packet Core, Distributed Applications Processing, Distributed Content Delivery
• Application of virtualized/containerized workloads
Network/Wireless Infrastructure: Core Network
Leverage energy efficient Arm servers with specifics for network infrastructure
Linaro Server group activity, VNF porting activity, OPNFV actions, AIDC, OSEC
Standardized general purpose compute platforms on Arm
Performance/Density/Scalibility
• Support dedicated offload through silicon/software.
• DPDK/ODP support
Control plane functions, content and applications delivery can use general purpose virtualized cloud computing.
Core Network/Data Centre
Server
NetworkOffload
Server-like Architecture – more efficient on Arm
9 © 2017 Arm Limited
Network/Wireless Infrastructure: Mobile Edge Compute
• MEC will be critical in meeting 5G goals
• Low latency, low power required
• E2E Automotive assisted driving, IoT and industrial use cases
• 5G network Slicing dependent
• Designs range from chassis, multi chip to SOC designs
Unique designs tailored to MEC form factor will prevail.
Mix of: • Core network needs (software
driven, compute performance)• Edge Network needs (Latency,
throughput, power optimized)
5G network slicing – power and latency
MEC
Server
NetworkAnd RAN
Offload
Workload optimized general purpose compute and accelerators for user plane processing. Edge Compute Architecture
10 © 2017 Arm Limited
Linaro Segment Groups - Infrastructure
ODP cross-platform
support for SoCaccelerators
Reduces fragmentation,
cost, accelerates time to market
for servers based on Arm
OSS for networking
• Real Time Support
• Virtualization, Core isolation
• OpenDataPlane (ODP)
• OPNFV integration
OSS for Arm servers
• ATF/UEFI/ACPI
• KVM/Xen
• Armv8 optimization
• OpenJDK, Hadoop, OpenStack, Docker
Networking - LNG Enterprise - LEG
11 © 2017 Arm Limited
Arm SoC
Firmware & Network Boot ACPI
Virtualization and HAL
Operating System and VirtualInfrastructure Managers
Key Applications, VNFs, and Middleware
OEMs and ODMs
Networking-centric Ecosystem Enabling Deployments
12 © 2017 Arm Limited
• Initiated and hosted by Arm• Public vehicle to engage
infrastructure developer community (H/W & S/W)
AIDC
WorksOnArm, and AIDC
• Initiated and hosted by Packet.net
• Public vehicle to engage open source developers and share software readiness
WorksOnArm
13 © 2017 Arm Limited
Pharos LabsFully compliant NFVI software platform
POD
3x Control
Nodes
2x Compute
NodesBuild Server
ARM64
Jump Serverx86
VPN
Firewall
VPN
Gateway Router
INTERNET
• April 2016: 1st ARM-based OPNFV Lab by Enea
• Orange Labs, China Mobile, and others to be announced replicating Arm NFVI platform in their labs
Complete Pharos Pod @ ~1 cu. Ft. / 250WMarvell MACCHIATObin boards (Quad-CA72)
Arm reference platform
Colorado Reference Stack
15 © 2017 Arm Limited
Scalable solutions from edge to core
Access point Data center compute
System cache
AcceleratorCortex-A
CoreLink CMN-600
DMC-620
IO
NIC-450
IO
DMC-620100GbEPCIe
DMC-620 DMC-620
DMC-620
DM
C-6
20
DM
C-6
20
DM
C-6
20
DM
C-6
20
CPU CPU CPU CPU
CPU CPU CPU CPU
CPU CPU CPU CPU
CPU CPU CPU CPU
CoreLink CMN-600Bandwidth 1 TB/s20 GB/s
System cache 128MB0MB
DDR channels 81
Cortex-A CPUs 2564
Scalable system configurations
DMC620 : dynamic memory controller
CMN: coherent mesh network interconnect
16 © 2017 Arm Limited
New DynamIQ-based CPUs for new possibilities
>50% more performance
compared to current devices
2.5xhigher power efficiency
compared to current devices
Estimated device performance using SPECINT2006, final device results may varyComparison using Cortex-A72 at 2.4GHz vs Cortex-A75 at 3GHz
Comparison using Cortex-A53 in 28nm devices vs Cortex-A55 in 16nm devices
Arm Cortex-A55Arm Cortex-A75
17 © 2017 Arm Limited
New DynamIQ-based CPUs for new possibilities
All comparisons at ISO process and frequency
Baseline to Cortex-A72 Baseline to Cortex-A53
All comparisons at ISO process and frequency
1.97x
1.42x
1.21x
LMBench memcpy
SPECfp2006
SPECint2006
1.30x
1.39x
1.23x
LMBench memcpy
SPECfp2006
SPECint2006
Arm Cortex-A55Arm Cortex-A75
18 © 2017 Arm Limited
Next-generation features
Dot product and half-precision float for AI/ML processing.
Virtualized Host Extensions (VHE) offering Type-2 hypervisor (KVM) performance improvements.
Cache stashing and atomic operations improves multicore networking performance and improves latency.
Cache clean to persistence to support storage class memory.
Infrastructure class RAS enhancement including data poisoning and improved error management.
19 © 2017 Arm Limited
DynamIQ cluster
0 - 7 CoresCore 0
Private L2
Snoop filter
PowerManagement
L3 Cache
BusI/F
ACP andperipheral
port I/F
Core 7
Private L2
Asynchronous bridges
DynamIQ Shared Unit (DSU)
DynamIQ Shared Unit (DSU)
Streamlines traffic across bridges
Advanced power management features
Improved performance with large private cache
Support for multiple performance domains
Connection to high-performance interconnect, with stashing support
Supports large amounts of partitionable local memory
Low latency interfaces for closely coupled accelerators
Any mix of DynamIQ cores, up to four premium
20 © 2017 Arm Limited
Level 3 cache partition in a networking application
Infrastructure
• Process 1 = data plane
• Process 2 = control plane
• Packet processing data sent through low latency ACP interface
Sensors or I/O agents ACP
Process 2
Core group 2
Example configuration with two Core groups in a DynamIQ cluster
Group 1 Group 2
Core group 1
Process 1
L3 cache
Core0 Core1 Core2 Core3 Core4 Core5 Core6 Core7
Reserved for external accelerators via ACP Group 2
21 © 2017 Arm Limited
Increasing performance through cache stashing
Enables reads/writes into the shared L3 cache or per-core L2 cache.
Allows closely coupled accelerators and I/O agents to gain access to core memory.
AMBA 5 CHI and Accelerator Coherency Port (ACP) can be used for cache stashing.
More throughput with Peripheral Port (PP) for acceleration, network, storage use-cases.
Acceleratoror I/O
CoreLink CMN-600
DMC-620 DMC-620
Agile System Cache
DDR4 DDR4
L3 Cache
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
Cortex-A
Agile System Cache
L3 Cache
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
CPU
L2 Cache
Cortex-A
Stash critical data to any cache level
DynamIQ cluster
0–7 Cores
Core0
Snoop filter
PowerMngmt
L3 Cache
BusI/F
ACP andperipheral
port I/F
Core7
Asynchronous bridges
DynamIQ Shared Unit (DSU)
L1 cache
L2 cache
L1 cache
L2 cache
22 © 2017 Arm Limited
Networking: Compute and packet-processing solution
Build the right mix of compute
• High-performance Cortex-A75
• High-efficiency throughput Cortex-A55
• DSP, accelerators
Scale CoreLink CMN-600 for application
• Access points: 2~5W
• Macro BTS: 20~30W
• Telecom server: 60~100W
DSP / Accel
DMC-620
Cortex-A75
Level-3 Cache
DMC-620
Data plane processing• Throughput compute• Optional local accelerators
Scalable coherent SoC memory systemwith partitionable affinity
Control plane processing• Performance compute• Timing critical processing
PCIe
Multi-chiplet
Multi-socket
Accelerator
Agile System Cache Agile System Cache
Local
AcceleratorCortex-A55
Level-3 Cache
CoreLink CMN-600Enet
23 © 2017 Arm Limited
Data center server blade: Maximize compute density
Cortex-A75 can deliver 1.4x to 2.9x higher performance .
• >1.4x performance with 32 core Cortex-A75 + CMN-600 vs. 32 core Cortex-A72 + CCN-512
• 2.9x performance with 64 core Cortex-A75 +CMN-600
• Higher rate performance than competitive architectures per socket
• 1-1.5W/core enables sustained operation at maximum performance
Power: 60W compute.
Cache Cache Cache Cache
Cache Cache Cache Cache
Cache Cache Cache Cache
Cache Cache Cache CacheD
MC
-60
0
DD
R4
-32
00
DM
C-6
00
DD
R4
-32
00
DMC-600
DDR4-3200
DMC-600
DDR4-3200
DM
C-6
00
DD
R4
-32
00
DM
C-6
00
DD
R4
-32
00
DMC-600
DDR4-3200
DMC-600
DDR4-3200100GbE
CM
L
PCIe
CM
L
PC
IeP
CIe
24 © 2017 Arm Limited
Enabling 5G transformation
Transition to 5G networks and new use cases is transforming network architectures.
• Requires efficient compute closer to the edge
• Mobile Edge Compute (MEC) nodes combine server-like design (software driven, high aggregate performance) with edge network needs (latency, throughput, power optimized)
Software investments from Arm and ecosystem partners to enable real 5G deployment.
Cortex-A75 & Cortex-A55 together with CMN and other Arm IP enable scalable design.
• Address the compute requirements across the network in a range of power budgets
2626 © 2017 Arm Limited
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks