Nick Heaton, Distinguished Engineer
Cadence Design Systems
ARMv8-Based SoC HW/SW Integration and Verification Solution
2 © 2014 Cadence Design Systems, Inc. All rights reserved.
Example ARM®-based HW/SW System
[Figure: block diagram of an example ARM-based HW/SW system, drawn in three layers.
System on PCB: LPDDR2 and LPDDR3 DRAM, NAND flash, eMMC 4.5, UFS, SD 3.0 and SD 4.0 memory cards, cellular modem, RF FE, WiFi, Bluetooth, FM receiver, GPS receiver, motion sensors, power control, multimedia processor, touch screen controller, display driver, audio interface, camera interface, HDMI 1.4, USB 2.0, USB 3.0 OTG, OCP 2.0, OCP 3.0. Interfaces shown: LLI, DigRF, SLIMbus, DSI, CSI2, CSI3, SDIO, I2C, SPMI, GBT, cJTAG.
SoC: ARM CPU subsystem (Cortex-A15 and Cortex-A7 cores with L2 caches on a cache coherent fabric); application specific components (3D GFX, DSP, A/V, modem, apps accelerators); SoC interconnect fabric; high speed, wired interface peripherals (DDR3 PHY, USB3.0 with 3.0 and 2.0 PHYs, PCIe Gen 2,3 PHY, Ethernet PHY); other peripherals (SATA, MIPI, HDMI, WLAN, LTE); low-speed peripheral subsystem (PMU, MIPI, JTAG, INTC, I2C, SPI, Timer, GPIO, Display, UART).
Software: bare metal software; DSP software; modem comms stack (RTOS, drivers, firmware / HAL, communications L1, L2, L3); application processor stack (bare metal, firmware / HAL, drivers, operating systems (OS), middleware, applications).]
Challenges at the SoC, System, & SW level
[Figure: block diagram of the example ARM-based HW/SW system (system on PCB, SoC, and software stacks), here with Cortex-A57 and Cortex-A53 cores and L2 caches on the cache coherent fabric of the ARM CPU subsystem. The diagram is annotated with the challenges listed below.]
• Multi-core early software bring-up and integration on 64-bit
• How do I represent the SoC environment?
• Developing environments for hardware/software integration and use-case verification on simulation/emulation platforms
• Bare-metal software use-case testing to verify multi-core cache and I/O coherency, concurrency, power shut-off, etc.
• Debugging of complex multi-core SoC software scenarios on RTL simulation/emulation platforms
• Characterizing and analyzing system-on-chip (SoC) performance and efficiently debugging issues
• Verification of IPs on AMBA interconnect with adherence to ACE protocol
Accelerating ARM-based development
Early OS & Software Bring-up
Early SW Execution on Palladium
TLM Virtual Platform – VSP
- Up to 100MHz
- Early Availability for SW Developers
- Advanced SW Debug
- Fast SW Turnaround Time
Emulation – Palladium® XP I/II
- Up to 4MHz
- From early-RTL to full-SoC Validation
- Advanced HW Debug
- Fast HW Turnaround Time
Hybrid Solution with SW Integrator
- Boot Complex OS at 48MHz
- Speed up SW-driven tests 1-10X over emulation
- Early Availability for SW Developers
- Advanced HW + SW Debug
- Fast HW and SW Turnaround Time
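To put the three quoted clock rates in perspective, the following back-of-the-envelope sketch converts them into idealized run times. It assumes a hypothetical workload of 6 billion target clock cycles (an illustrative number, not from the slides) and treats run time as cycles divided by the effective rate, ignoring setup, compile, and synchronization costs. The 48MHz figure is quoted on the slide for OS boot specifically.

```python
# Effective execution rates quoted on the slide, in Hz.
PLATFORM_RATE_HZ = {
    "VSP virtual platform": 100e6,  # "Up to 100MHz"
    "Palladium emulation": 4e6,     # "Up to 4MHz"
    "Hybrid (OS boot)": 48e6,       # "Boot Complex OS at 48MHz"
}

# Hypothetical workload size, chosen only for illustration.
ASSUMED_WORKLOAD_CYCLES = 6e9


def wall_clock_seconds(cycles: float, rate_hz: float) -> float:
    """Idealized run time: cycle count divided by effective clock rate."""
    return cycles / rate_hz


def estimate_all(cycles: float = ASSUMED_WORKLOAD_CYCLES) -> dict:
    """Return the idealized run time in seconds for each platform."""
    return {name: wall_clock_seconds(cycles, hz)
            for name, hz in PLATFORM_RATE_HZ.items()}


if __name__ == "__main__":
    for name, seconds in estimate_all().items():
        print(f"{name}: {seconds / 60:.1f} min")
```

Under these assumptions the same workload takes about 1 minute on the virtual platform, 25 minutes on emulation alone, and roughly 2 minutes on the hybrid.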
Palladium/VSP Hybrid Solution
Architected for SW Performance
− High-speed virtual platform
− Asynchronous HW/SW Execution with Interrupt driven sync
− High-Speed Multi-Domain Memory Coherency
Designed to integrate HW and SW flows
− Does not require changes to HW or SW stacks
− Virtual connections into SW Engineer’s environments
− Seamless hybrid execution for both HW and SW users
Proven Methodology, Unique Expertise
− Cross-platform and design integration expertise
− Exclusive hybrid methodology delivers performance and repeatability
− Proven during successful application to SW-rich SoCs
[Figure: hybrid platform. VSP execution engines side: virtualized CPU sub-system, smart memory, VSP virtual models (UART, eMMC, USB), customer virtual models, and integration APIs. Palladium® side (customer design): GPU IP, memory controller IP, other IP, RTL fabric, DDR, and AVIP. SW Integrator and CPU bridges link the two sides, carrying ARM AMBA, interrupts, and resets.]
Hybrid Example
[Figure: hybrid platform block diagram with a component color key (customer RTL, RTL, TLM, Mem I/F). VSP side: fast processor model (A15 x 4, A7 x 2), TLM memory, smart DDR, eMMC, UARTs, timers, interrupt manager, reset manager, reconfigurable interconnect, system boot. SW Integrator: TLM / RTL bridge and CPU sub-system RTL I/F carrying AXI4 or ACE-Lite, interrupts, and resets. Palladium® XP side: SoC interconnect fabric and peripheral fabric with DDR3, display, INTC, timer, CSI, DSI, UART, GPU, memory controller, SATA, USB3, USB2, Ethernet, smart DDR MMP model, and AVIP.]
• Execute SW at 100MHz: with standard or custom processor models
• Validate SoC + OS at 5-10 MHz on PXP: high-performance memory coherency
• Shorten SoC debug: system messages, HW / SW debuggers
• Plug and play integration with RTL: SoC-specific transactors and RTL I/F
NVIDIA Example: Performance Results
Boot OSes, run real world applications and benchmarks
Linux kernel boot
Palladium only = 45 mins
Hybrid = 2 mins
Android
Palladium only = Hours*
Hybrid = 40 – 50 mins
Windows
Palladium only = Days*
Hybrid = 75 – 90 mins
Source: System to Silicon Verification Summit 2013, http://bit.ly/1cT4py2
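The Linux figures above translate directly into a speedup ratio. The short sketch below does only that arithmetic. Android and Windows are left out because the Palladium-only times are given only as "Hours*" and "Days*", so no ratio can be computed for them without guessing.

```python
# Linux kernel boot times in minutes, as reported on the slide.
LINUX_PALLADIUM_ONLY_MIN = 45
LINUX_HYBRID_MIN = 2


def speedup(baseline_minutes: float, improved_minutes: float) -> float:
    """Ratio of baseline run time to improved run time."""
    return baseline_minutes / improved_minutes


if __name__ == "__main__":
    ratio = speedup(LINUX_PALLADIUM_ONLY_MIN, LINUX_HYBRID_MIN)
    print(f"Linux kernel boot speedup: {ratio:.1f}x")
```

For the Linux kernel boot this works out to 22.5x.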
Performance Analysis
ARM ARMv8-A mobile example SoC
[Figure: block diagram of an ARMv8-A mobile example SoC. Coherent masters: a Cortex™-A57 cluster and a Cortex™-A53 cluster (cores #1 to #4 and an L2 cache each) and a customer or Mali™ GPU, attached to the CoreLink CCI-400 (ports S0 to S4). Non-coherent masters: PCIe RC, LCD, DMA, and customer DMA or CoreLink DMA-330, attached through CoreLink NIC-400 interconnects. Memory side: CoreLink NIC-400 (2x1), CoreLink TZC-400, customer DDR controller (ports F0 to F3), on-chip ROM, SRAM, and video SRAM. Also shown: CoreLink GIC-400, timers, UART, further IP blocks, a system control processor, DVFS CLK/PSO and CLK/PSO domains, and ADB bridges on the paths between blocks.]
ARM ARMv8-A mobile example SoC Performance challenges
[Figure: ARMv8-A mobile example SoC block diagram (CoreLink CCI-400, Cortex™-A57 and Cortex™-A53 clusters, GPU, CoreLink NIC-400, TZC-400, DDR controller, ADB bridges), annotated with the question below.]
• What is the latency of the processor clusters to memory paths including all async bridges?
ARM ARMv8-A mobile example SoC Performance challenges
[Figure: ARMv8-A mobile example SoC block diagram (CoreLink CCI-400, Cortex™-A57 and Cortex™-A53 clusters, GPU, CoreLink NIC-400, TZC-400, DDR controller, ADB bridges), annotated with the questions below.]
• What is the latency of the processor clusters to memory paths including all async bridges?
• What is the bandwidth of the paths from IP with high bandwidth demands to memory?
ARM ARMv8-A mobile example SoC Performance challenges
[Figure: ARMv8-A mobile example SoC block diagram (CoreLink CCI-400, Cortex™-A57 and Cortex™-A53 clusters, GPU, CoreLink NIC-400, TZC-400, DDR controller, ADB bridges), annotated with the question below.]
• What is the bandwidth and latency of the paths from real-time IP to memory?
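The latency and bandwidth questions on these slides both reduce to simple statistics over observed transactions. The sketch below shows one way to compute them per master. The `Transaction` record, the master names, and the sample numbers are all hypothetical, and bandwidth is taken here as total bytes over the window from the first request to the last response; other definitions are possible.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Transaction:
    master: str      # hypothetical name of the issuing IP
    start_ns: float  # time the request was issued
    end_ns: float    # time the final response arrived
    num_bytes: int   # payload size


def average_latency_ns(txns):
    """Mean request-to-response time."""
    return mean(t.end_ns - t.start_ns for t in txns)


def bandwidth_bytes_per_ns(txns):
    """Total bytes divided by the first-start to last-end window."""
    window = max(t.end_ns for t in txns) - min(t.start_ns for t in txns)
    return sum(t.num_bytes for t in txns) / window


def per_master(txns):
    """Map each master to (average latency, bandwidth)."""
    groups = {}
    for t in txns:
        groups.setdefault(t.master, []).append(t)
    return {m: (average_latency_ns(g), bandwidth_bytes_per_ns(g))
            for m, g in groups.items()}


# Made-up sample data for illustration only.
SAMPLE = [
    Transaction("cpu_cluster", 0.0, 100.0, 64),
    Transaction("cpu_cluster", 50.0, 170.0, 64),
    Transaction("lcd", 0.0, 200.0, 256),
    Transaction("lcd", 200.0, 400.0, 256),
]

if __name__ == "__main__":
    for master, (lat, bw) in per_master(SAMPLE).items():
        print(f"{master}: {lat:.0f} ns average latency, {bw:.2f} bytes/ns")
```

One byte per nanosecond equals 1 GB/s, so the made-up `lcd` stream above corresponds to 1.28 GB/s at 200 ns average latency.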
Interconnect Workbench
[Figure: Interconnect Workbench flow. Inputs: CoreLink 400 System IP RTL and IP-XACT, IP-specific traffic profiles, user meta-data, and the Cadence VIP Library for AMBA. Interconnect Workbench Assembly produces a UVM testbench (generated testbench flow: SoC traffic testbench), shown alongside the manual testbench flow (manual SoC testbench, SoC verification testbench). Performance measurements feed Interconnect Workbench Analysis and Debug (Performance Analyzer), leading to performance analysis, verification closure, and architecture tuning in an automate, simulate, analyze loop.]
For Interconnect IP Integration
• Performance of use-case traffic loads
• Verify configuration functionality
For SoC Integration
• Validate performance in context of IPs
Benefits
• Shorten performance tuning and analysis iteration loop from days to hours
• Reduce testbench development time from weeks to hours
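As a rough illustration of what an IP-specific traffic profile might capture, the sketch below describes a master by its burst length, beat width, read/write mix, and target bandwidth, then expands that into a timed list of transactions. This is not the Interconnect Workbench format or API; the `TrafficProfile` fields, the `generate` function, and the display example are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficProfile:
    name: str                   # hypothetical master name
    burst_len: int              # data beats per transaction
    bytes_per_beat: int         # data bus width in bytes
    read_fraction: float        # share of transactions that are reads
    target_bytes_per_ns: float  # requested average bandwidth


def generate(profile: TrafficProfile, count: int, seed: int = 0):
    """Return `count` (issue_time_ns, kind, num_bytes) tuples.

    Transactions are spaced evenly so that the average rate matches
    the target bandwidth; the read/write choice is pseudo-random but
    repeatable for a given seed.
    """
    rng = random.Random(seed)
    bytes_per_txn = profile.burst_len * profile.bytes_per_beat
    interval_ns = bytes_per_txn / profile.target_bytes_per_ns
    txns = []
    for i in range(count):
        kind = "read" if rng.random() < profile.read_fraction else "write"
        txns.append((i * interval_ns, kind, bytes_per_txn))
    return txns


# Made-up example: a read-only display stream at 1 byte/ns (1 GB/s).
DISPLAY_PROFILE = TrafficProfile("display", burst_len=16, bytes_per_beat=8,
                                 read_fraction=1.0, target_bytes_per_ns=1.0)

if __name__ == "__main__":
    for issue_ns, kind, size in generate(DISPLAY_PROFILE, count=3):
        print(f"{issue_ns:7.1f} ns  {kind:5s}  {size} bytes")
```

Keeping the profile as plain data separate from the generator means the same description could drive different testbenches, which is the spirit of the generated-testbench flow on the slide.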
Component interconnect testbench – Functional verification
Using ARM® AMBA® Designer IP-XACT
[Figure: UVM testbench around the DUT, a CoreLink CCI-400 (ports S0 to S4) with ADB bridges. Active AMBA VIP and passive AMBA VIP instances sit on the DUT interfaces, together with a system scoreboard and performance monitor VIP and a routing model. The testbench runs functional verification tests.]
Subsystem testbench
Using IP-XACT or CSV metadata
[Figure: UVM testbench around a subsystem DUT made up of the CoreLink CCI-400 (ports S0 to S4), NIC-400 (2x1), CoreLink TZC-400, customer DDR controller (ports F0 to F3), and ADB bridges. Active AMBA VIP and passive AMBA VIP instances sit on the DUT interfaces, together with a system scoreboard and performance monitor VIP and a routing model. The testbench runs characterization tests.]
Interconnect Workbench
Analyze performance results
• Charts split by burst length
• Increasing bandwidth as burst length increases
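One plausible reason for bandwidth to rise with burst length is that a fixed per-transaction overhead is amortized over more data beats. The toy model below illustrates that effect only. Its parameters (8 bytes per beat, 10 overhead cycles per transaction, one beat per cycle, no overlap between transactions) are assumptions for illustration and are not taken from the charts on the slide; real interconnects pipeline and overlap transactions.

```python
def efficiency(burst_len: int, overhead_cycles: int = 10) -> float:
    """Fraction of cycles that carry data in this non-overlapping model."""
    return burst_len / (burst_len + overhead_cycles)


def bandwidth_bytes_per_cycle(burst_len: int,
                              bytes_per_beat: int = 8,
                              overhead_cycles: int = 10) -> float:
    """Average bytes moved per clock cycle for back-to-back bursts."""
    return bytes_per_beat * efficiency(burst_len, overhead_cycles)


if __name__ == "__main__":
    for burst_len in (1, 2, 4, 8, 16):
        bw = bandwidth_bytes_per_cycle(burst_len)
        print(f"burst length {burst_len:2d}: {bw:.2f} bytes/cycle")
```

In this model the bandwidth approaches the bus peak of 8 bytes per cycle as the burst length grows, but never reaches it while the overhead is non-zero.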
Interconnect Workbench
• Significant challenges in predicting and optimizing SoC performance
– Multiplicity of IP configuration options, particularly in interconnect and DDR space
– Need a systematic approach with the potential to be automated
• Performance verification accomplished in three steps
– Characterization: Fully automated and can be checked as a standard regression step
– Architectural: Establish that QoS functions as expected
– Use case: Hunt for corner case issues
• Cadence® Interconnect Workbench supports all stages of the process
– Automation of testbench, supports ARM CoreLink® System IP
– Automation of the characterization tests
– Comprehensive analysis and checking capabilities
– Traffic synthesizers for architectural and use-case analysis
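The idea that characterization "can be checked as a standard regression step" can be sketched as a pass/fail comparison of measured numbers against per-path budgets. The example below is a minimal sketch of such a check, not the tool's own mechanism. The path names, budget values, and measurements are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_latency_ns: float
    min_bandwidth_gbps: float


@dataclass(frozen=True)
class Measurement:
    latency_ns: float
    bandwidth_gbps: float


def check_path(path: str, measured: Measurement, budget: Budget) -> list:
    """Return a list of human-readable violations for one path."""
    failures = []
    if measured.latency_ns > budget.max_latency_ns:
        failures.append(f"{path}: latency {measured.latency_ns} ns "
                        f"exceeds budget {budget.max_latency_ns} ns")
    if measured.bandwidth_gbps < budget.min_bandwidth_gbps:
        failures.append(f"{path}: bandwidth {measured.bandwidth_gbps} GB/s "
                        f"below budget {budget.min_bandwidth_gbps} GB/s")
    return failures


def check_all(measurements: dict, budgets: dict) -> list:
    """Check every budgeted path; a missing measurement is a failure."""
    failures = []
    for path, budget in budgets.items():
        if path not in measurements:
            failures.append(f"{path}: no measurement")
            continue
        failures.extend(check_path(path, measurements[path], budget))
    return failures


# Made-up budgets and results for illustration only.
BUDGETS = {
    "a57_to_ddr": Budget(max_latency_ns=150.0, min_bandwidth_gbps=2.0),
    "lcd_to_ddr": Budget(max_latency_ns=400.0, min_bandwidth_gbps=1.0),
}
MEASURED = {
    "a57_to_ddr": Measurement(latency_ns=120.0, bandwidth_gbps=2.5),
    "lcd_to_ddr": Measurement(latency_ns=450.0, bandwidth_gbps=1.2),
}

if __name__ == "__main__":
    problems = check_all(MEASURED, BUDGETS)
    for line in problems:
        print("FAIL", line)
    print("regression", "failed" if problems else "passed")
```

With the made-up numbers above, only the `lcd_to_ddr` latency budget is violated, so a regression run would flag exactly one failure.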
HW/SW Debug
ARMv8-based SoC hardware/software debug solutions
Cortex®-A53/-A57 post-process SoC debug
• Integrated and synchronized hardware/software debug with testbench
• For verification and design teams
• Enables off-line debugging
• Consistent across IES and PXP
Cortex-A53/-A57 JTAG software debugger
• Interactive software debugging on PXP
• Support for software developers using RealView, Lauterbach, etc.
[Figure: post-process debug on IES and PXP, synchronized with the design and testbench debugger, showing embedded C source code debug with assembly view and software variable tracing.]
ARMv8-based SoC hardware/software debug solutions
[Figure: JTAG debugger support for software developers on PXP, showing the ARM RealView Debugger and the Lauterbach Debugger.]
Verification IP
ARM-related Verification IP
Cadence VIP for ARM AMBA specifications
• Benefits
– Get to market first with latest I/Fs
– Verifies SoC data integrity
– Simplify protocol compliance
– Maximize team productivity
• Highlights
– #1 ACE VIP (ARM collaboration)
– Coherent interconnect validation
– Advanced compliance testing
– Formal and acceleration support
• Specification Support
– ARM AMBA CHI, ACE
– ARM AMBA AXI4, AXI3
– ARM AMBA AHB, APB
[Figure: table of verification technologies per specification, with a maturity level legend.
Verification Technologies: PureSuite, CMS, TripleCheck, Compliance Method, Protocol Checks, Trace Debug, PureView Configurator, Formal Analysis, Interconnect Validation, Acceleration Support (1).
Maturity Level: 1-20 projects, 20-100 projects, 100-500 projects, 500+ projects.
(1) Accelerated VIP sold separately.]
Cadence cache-coherent VIP for ACE
Full set of VIP agents to verify cache coherent designs
Active Master (M1)
• Generates coherent stimuli and responds to snoop bursts
• Includes cache model
• Can be configured as ACE or ACE-Lite
Passive Master (monitors DUT master M2)
• Monitors protocol correctness
• Collects coverage
• Includes cache model
• Can be configured as ACE or ACE-Lite
Active Slave (S1)
• Responds to read/write transactions
• Models sparse memory
• ACE-Lite port
Passive Slave (monitors DUT slave S3)
• Checks protocol correctness
• Collects coverage
• ACE-Lite
[Figure: CoreLink CCI-400 surrounded by DUT and VIP agents, color-keyed by a DUT / VIP legend: active master M1 and DUT master M2 with caches, active slave S1 and DUT slaves S2 and S3 with memories, plus passive master and passive slave VIP on the M2 and S3 interfaces.]
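The "sparse memory" behaviour listed for the active slave can be pictured as a memory that stores only the locations actually written, so a large address space costs nothing until it is used. The sketch below is a generic illustration of that idea, not the Cadence VIP implementation. The class name, the default fill value of 0x00 for unwritten bytes, and the example address are assumptions.

```python
class SparseMemory:
    """Byte-addressable memory that stores only written locations."""

    def __init__(self, default_byte: int = 0x00):
        self._bytes = {}              # address -> byte value
        self._default = default_byte  # returned for unwritten addresses

    def write(self, addr: int, data: bytes) -> None:
        """Store each byte of `data` at consecutive addresses."""
        for offset, value in enumerate(data):
            self._bytes[addr + offset] = value

    def read(self, addr: int, length: int) -> bytes:
        """Return `length` bytes, filling unwritten bytes with the default."""
        return bytes(self._bytes.get(addr + i, self._default)
                     for i in range(length))

    def allocated_bytes(self) -> int:
        """Number of byte locations that have ever been written."""
        return len(self._bytes)


if __name__ == "__main__":
    mem = SparseMemory()
    mem.write(0x8000_0000, b"\xde\xad\xbe\xef")
    print(mem.read(0x8000_0002, 4).hex())  # two written bytes, then defaults
    print(mem.allocated_bytes())
```

A dictionary keyed by address keeps the model small even when a test touches a few cache lines scattered across a 64-bit address space.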
Summary
Challenges at the SoC, System, & SW level
[Figure: block diagram of the example ARM-based HW/SW system (system on PCB, SoC, and software stacks), with Cortex-A57 and Cortex-A53 cores and L2 caches on the cache coherent fabric of the ARM CPU subsystem. The diagram is annotated with the challenges listed below.]
• Multi-core early software bring-up and integration on 64-bit
• How do I represent the SoC environment?
• Developing environments for hardware/software integration and use-case verification on simulation/emulation platforms
• Bare-metal software use-case testing to verify multi-core cache and I/O coherency, concurrency, power shut-off, etc.
• Debugging of complex multi-core SoC software scenarios on RTL simulation/emulation platforms
• Characterizing and analyzing system-on-chip (SoC) performance and efficiently debugging issues
• Verification of IPs on AMBA interconnect with adherence to ACE protocol
Accelerating ARM-based development