Reducing Time to Point of Interest with Accelerated OS Boot · PDF file ·...

Reducing Time to Point of Interest with Accelerated OS Boot Frank Schirrmeister, Cadence ® Robert Kaye, ARM ®

Page 1: Reducing Time to Point of Interest with Accelerated OS Boot · PDF file · 2017-02-11Reducing Time to Point of Interest with Accelerated OS Boot ... An Example Project Timeline IP

Reducing Time to Point of Interest with Accelerated OS Boot

Frank Schirrmeister, Cadence®

Robert Kaye, ARM®

Agenda

Design Challenge

The Hybrid Emulation – Virtual Approach
– Enabling technology in Palladium XP and Virtual System Platform
– Fast Models from ARM

ARM Example

Industry Examples and Outlook
– NVIDIA, CSR
– Improvement considerations

Design Challenge

Example SoC and System

[Figure: block diagram of an example SoC and the system around it, in three layers.]

System on Printed Circuit Board (PCB): LPDDR DRAM, NAND flash, cellular modem, WiFi, Bluetooth, FM receiver, GPS receiver, RF FE, motion sensors, power control, multimedia processor, memory card, touch screen controller, display driver, audio interface, camera interface. Board-level interfaces: LLI, DigRF, LPDDR2, LPDDR3, eMMC 4.5, UFS, SD 3.0, SD 4.0, SLIMbus, DSI, CSI2, CSI3, SDIO, cJTAG, GBT, SPMI, I2C, USB 2.0, USB 3.0 OTG, HDMI 1.4, OCP 2.0, OCP 3.0.

System on Chip (SoC):
– ARM CPU Subsystem: ARM® Cortex®-A15 cores with L2 cache, ARM® Cortex®-A7 cores with L2 cache, Cache Coherent Fabric
– Application Specific Components: 3D GFX, DSP A/V, Apps Accel, Modem
– SoC Interconnect Fabric
– High speed, wired interface peripherals: DDR3 PHY, USB 3.0 (3.0 PHY, 2.0 PHY), PCIe Gen 2,3 PHY, Ethernet PHY
– Other peripherals: SATA, MIPI, HDMI, WLAN, LTE
– Low-speed peripheral subsystem: PMU, MIPI, JTAG, INTC, I2C, SPI, Timer, GPIO, Display, UART

Software:
– Bare Metal Software
– DSP Software
– RTOS, Drivers, Firmware / HAL
– Modem Comms: Communications L1, L2, L3
– Application Software: Operating Systems (OS), Drivers, Firmware / HAL, Middleware, Applications

An Example Project Timeline

[Figure: project timeline from idea to production.]

Integration levels: IP, Sub-System, System on Chip

Phases: Idea to spec, Spec, IP Qualification, RTL-Design & IP Integration & Verification, Netlist to GDSII, Fab, Post silicon Validation, Production

Annotations: "RTL becomes stable"; "Only small gate level changes and ECOs"

Durations:
– Spec to GDSII: 49 - 83 wks
– Design & Integration & Verification: 35 - 63 wks
– Netlist to GDSII: 21 - 32 wks
– Other phase durations shown on the chart: 8 - 12 wks, 14 wks, 11 - 17 wks, 14 - 18 wks

Source: Cadence, IBS

Timeline for System Critical Bugs

[Figure: the same project timeline, extended with a "SoC in System" level.]

Integration levels: IP, Sub-System, System on Chip, SoC in System

Phases: Idea to spec, Spec, IP Qualification, RTL-Design & IP Integration & Verification, Netlist to GDSII, Fab, Post silicon Validation, Production

Annotations: "RTL becomes stable"; "Only small gate level changes and ECOs"; "Time for critical bugs in System Environment to be removed"

Source: Cadence, IBS

Software is Key to verification

[Figure: the project timeline with the software stack overlaid on the integration levels.]

Integration levels: IP, Sub-System, System on Chip, SoC in System

Software layers:
– Bare metal SW
– OS & Drivers (Linux, Android)
– Middleware (Graphics, Audio)
– Applications (Basic to Angry Birds)

Phases: Idea to spec, Spec, IP Qualification, RTL-Design & IP Integration & Verification, Netlist to GDSII, Fab, Post silicon Validation, Production

Annotations: "RTL becomes stable"; "Only small gate level changes and ECOs"; "Time for critical bugs in System Environment to be removed"; "SW Development on Chip"; "May hold final tape out if bug too critical"

Source: Cadence, IBS

Platforms - There is No ‘One-Fits-All’

SDK / OS Sim

•Highest speed

•Earliest in the flow

•Ignore hardware

Virtual Platform

•Almost at speed

•Less accurate (or slower)

•Before RTL

•Great to debug (but less detail)

•Easy replication

RTL Simulation

•KHz range

•Accurate

•Excellent HW debug

•Little SW execution

Acceleration / Emulation

•MHz Range

•RTL accurate

•After RTL is available

•Good to debug with full detail

•Expensive to replicate

FPGA Prototype

•10’s of MHz

•RTL accurate

•After stable RTL is available

•OK to debug

•More expensive than software to replicate

Prototyping Board

•Real time speed

•Fully accurate

•Post Silicon

•Difficult to debug

•Sometimes hard to replicate
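
The availability notes on this slide can be read as a lookup: at a given point in the project, which engines can already be used? The sketch below encodes the slide's wording in a small table. The `Stage` enum, the `Engine` record, and the function name are illustrative assumptions, not part of any tool. The slide gives no availability note for RTL simulation, so it is assumed here to need only first RTL.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    """Hypothetical project milestones, ordered from earliest to latest."""
    CONCEPT = 0         # nothing but the software environment exists
    PRE_RTL = 1         # architecture known, RTL not yet written
    RTL_AVAILABLE = 2   # first RTL is available
    RTL_STABLE = 3      # RTL stable enough for FPGA builds
    POST_SILICON = 4    # real chips are back


@dataclass(frozen=True)
class Engine:
    name: str
    speed: str              # speed as worded on the slide
    available_from: Stage   # earliest stage at which the engine is usable


ENGINES = [
    Engine("SDK / OS Sim", "highest speed", Stage.CONCEPT),
    Engine("Virtual Platform", "almost at speed", Stage.PRE_RTL),
    # Assumption: RTL simulation needs only first RTL (not stated on the slide).
    Engine("RTL Simulation", "KHz range", Stage.RTL_AVAILABLE),
    Engine("Acceleration / Emulation", "MHz range", Stage.RTL_AVAILABLE),
    Engine("FPGA Prototype", "10's of MHz", Stage.RTL_STABLE),
    Engine("Prototyping Board", "real time speed", Stage.POST_SILICON),
]


def engines_available(stage: Stage) -> List[str]:
    """Return the names of all engines usable at or before the given stage."""
    return [e.name for e in ENGINES if e.available_from <= stage]


if __name__ == "__main__":
    for stage in Stage:
        print(f"{stage.name}: {', '.join(engines_available(stage))}")
```

Keeping the stages ordered makes the "no one-fits-all" point concrete: the faster, easier-to-replicate engines are available early but know little about the hardware, while the RTL-accurate ones arrive later in the flow.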

Choosing the Right Engine

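[Figure: execution engines placed along the path from CONCEPT to PRODUCT, with their primary uses and a rating scale from - to +.]

Engines and primary use:
– Virtual Prototyping (TLM Sim): SW Development
– RTL Simulation (RTL Sim): HW Verification
– Acceleration / Emulation (A&E): HW/SW Verification
– FPGA based Prototyping (FPGA): HW/SW Validation, SW Development
– Chip

Rating criteria: Hardware Debug & Turn-around-time, Early Software Development, System Speed, Software Debug, Hardware Accuracy

Axis: Software to Hardware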

HW/SW Concurrency Gap

[Figure: three SoC development flows compared against the milestones Tapeout, Silicon Samples, and Product Ships. Legend: SW, HW, System.]

Traditional SoC then SW Flow
– HW Development & Verification
– SW Dev and Bringup on Silicon
– System Validation

SW-Enhanced SoC Flow
– HW Development & Verification
– SW Dev on model
– SW Dev and Bringup on real HW design, Silicon
– System Validation

Next Generation SW-Driven SoC Flow
– HW Development & Verification
– Continuous SW Development & Bringup
– Continuous System Validation

Enabled By: Virtual Platform, FPGA Prototype, Emulation

Powered By: Platform Hybrids (Emulation + Virtual Platform + FPGA)

Early SW Execution on Palladium

TLM Virtual Platform – VSP
- Up to 100 MHz
- Early Availability for SW Developers
- Advanced SW Debug
- Fast SW Turnaround Time

Emulation – Palladium® XP I/II
- Up to 4 MHz
- From early-RTL to full-SoC Validation
- Advanced HW Debug
- Fast HW Turnaround Time

Hybrid Solution with SW Integrator
- Boot Complex OS at 48 MHz
- Speed up SW-Driven tests 1-10X over emulation
- Early Availability for SW Developers
- Advanced HW + SW Debug
- Fast HW and SW Turnaround Time

Palladium/VSP Hybrid Solution

Architected for SW Performance
− High-speed virtual platform
− Asynchronous HW/SW Execution with Interrupt driven sync
− High-Speed Multi-Domain Memory Coherency

Designed to integrate HW and SW flows
− Does not require changes to HW or SW stacks
− Virtual connections into SW Engineer’s environments
− Seamless hybrid execution for both HW and SW users

Proven Methodology, Unique Expertise
− Cross-platform and design integration expertise
− Exclusive hybrid methodology delivers performance and repeatability
− Proven during successful application to SW-rich SoCs

[Figure: the two execution engines, VSP and Palladium, joined by SW Integrator. VSP side: Virtualized CPU Sub-system, VSP Virtual Models (UART, eMMC, USB), Customer Virtual Models, Integration APIs. SW Integrator: CPU Bridges (AMBA®, interrupts, resets) and Smart Memory. Palladium® side (Customer Design): GPU, IP blocks, Memory Controller, RTL Fabric, DDR, AVIP.]

Hybrid Example

Validate SoC + OS at 5-10 MHz on PXP
– High-performance memory coherency

Execute SW at 100 MHz
– With standard or custom processor models

Shorten SoC Debug
– System Messages
– HW / SW Debuggers

Plug and Play Integration with RTL
– SoC-specific transactors and RTL I/F

[Figure: hybrid partition between VSP and Palladium® XP. Component color key: Customer RTL, RTL, TLM, Mem I/F. VSP side: Fast Processor Model (A15 x 4, A7 x 2), UARTs, Timers, eMMC, TLM Memory, Smart DDR, Interrupt Manager, Reset Manager, Reconfigurable Interconnect. SW Integrator: TLM / RTL Bridge carrying AXI4 or ACE-Lite, interrupts, and resets. Palladium® XP side: CPU Sub-system RTL I/F, SoC Interconnect Fabric, DDR3, Memory Controller, Smart DDR MMP model, GPU, Display, DSI, CSI, SATA, USB3, System Boot, Peripheral Fabric, USB2, Ethernet, INTC, Timer, UART, AVIP.]

Hybrid Performance Compared to an All-RTL in Emulation Configuration

Target Application
- Large, compute intensive SoCs

Target Users
- HW-Dependent SW engineers
- System validation engineers

Accuracy
- SW: Delivers programmers-view accuracy
- HW: Full accuracy except for timing between virtual CPU and SoC fabric
- Memory: in fast mode, memory transactions are performed back-door. Thus, hybrid models not recommended for power or performance estimation

Metric                        All RTL in Palladium*   Hybrid**     Increase   Effective SW exec. speed
Linux boot (minutes)          30                      0.5          60X        48 MHz
Android boot (minutes)        900                     15           60X        48 MHz
Windows RT boot (minutes)     1800                    30           60X        48 MHz
512x512 2D test (minutes)***  30                      2            15X        11 MHz
# Emulation gates used        70 million              40 million   0.6X

* 70 million gate application processor, all blocks in Palladium®

** Virtualized CPU sub-system with register model of L1 & L2 caches. All other SoC blocks in Palladium.

*** Includes Linux boot, data preparation, image processing by HW engine and result checking. 1.3 million memory transactions.

All boot numbers are full production images. Linux includes all drivers. Android and Win RT with SW rendering
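
The "Increase" column follows directly from the two time columns. As a quick check, the short sketch below recomputes it from the numbers in the table; the variable and function names are illustrative only.

```python
# (metric, all-RTL time in minutes, hybrid time in minutes), copied from the table
RUN_TIMES_MIN = [
    ("Linux boot", 30, 0.5),
    ("Android boot", 900, 15),
    ("Windows RT boot", 1800, 30),
    ("512x512 2D test", 30, 2),
]


def speedup(all_rtl_minutes: float, hybrid_minutes: float) -> float:
    """Speedup of the hybrid configuration over the all-RTL configuration."""
    return all_rtl_minutes / hybrid_minutes


# Ratio of emulation gates used: hybrid vs. all-RTL (40 million vs. 70 million)
GATE_RATIO = 40e6 / 70e6

if __name__ == "__main__":
    for metric, all_rtl, hybrid in RUN_TIMES_MIN:
        print(f"{metric}: {speedup(all_rtl, hybrid):.0f}X")
    print(f"Emulation gates used: {GATE_RATIO:.1f}X")
```

The three OS boots all come out at 60X, while the 2D test reaches only 15X. This fits the footnote: that test includes image processing by a hardware engine that stays in emulation, so less of its run time is spent in the virtualized CPU.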

Fast Models from ARM

SoC Simulation Views

[Figure: simulation views plotted by abstraction level (detailed to abstract) against simulation speed (slow to fast).]

Cycle Accurate (CA): 1-20 KIPS
Approximately Timed (AT): 50-200 KIPS
Loosely Timed (LT): 50-200 MIPS

Programmer’s View
• SW Development
• SW Profiling
• Architecture Compliance

System Performance View
• High-level performance analysis

Component Performance View
• Architecture exploration
• Performance analysis
• Benchmarking

System Validation View
• HW/SW Co-Verification

Component Validation View
• HW Validation
• Driver Development
• HW/SW Co-Verification
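
To see what these speed ranges mean for software work, the sketch below converts them into wall-clock time for a fixed instruction count. It assumes the usual pairing of the slowest range with the most detailed level (CA: 1-20 KIPS, AT: 50-200 KIPS, LT: 50-200 MIPS). The workload of one billion instructions is a made-up round number, not a measurement from the slides.

```python
from typing import Dict, Tuple

# (slowest, fastest) simulation speed in instructions per second, per level.
# Pairing of ranges to levels is an assumption based on the slow-to-fast axis.
RATES_IPS: Dict[str, Tuple[float, float]] = {
    "Cycle Accurate (CA)": (1e3, 20e3),
    "Approximately Timed (AT)": (50e3, 200e3),
    "Loosely Timed (LT)": (50e6, 200e6),
}

# Hypothetical workload size, chosen only for illustration.
WORKLOAD_INSTRUCTIONS = 1_000_000_000


def time_range_seconds(level: str, instructions: int = WORKLOAD_INSTRUCTIONS) -> Tuple[float, float]:
    """Return (best case, worst case) wall-clock seconds for the workload."""
    slowest, fastest = RATES_IPS[level]
    return instructions / fastest, instructions / slowest


if __name__ == "__main__":
    for level in RATES_IPS:
        best, worst = time_range_seconds(level)
        print(f"{level}: {best:,.0f} s to {worst:,.0f} s")
```

Under these assumptions the same workload takes seconds on a loosely timed model and hours to days on a cycle-accurate one, which is why the programmer's view is the one used for software development.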

Model Requirements for SW Development

Simulation Speed
– Model must be fast enough for software developers

Model Fidelity
– Models must be complete and faithful to the target implementation

Solution Flexibility
– Enable broad range of applications with varied requirements

Fixed Virtual Platforms

Foundation Platform for ARMv8
– Simple, entry level platform for Linux application developers
– Debug via GDB server

“ARM® Versatile™ Express” (VE) FVP
– Versatile Express memory map
– Debugger and Trace API

“Base” FVP
– VE + power management, system control

Software alignment with ARM, Linaro

[Figure: Fixed Virtual Platform built from ARM® Cortex®-A57 Fast Model, ARM® Cortex®-A53 Fast Model, CCI/CCN Fast Model, GIC Fast Model, SMMU Fast Model, and peripheral models, with Virtual I/O handling I/O accesses.]

Virtual Platforms Based on Fast Models

ARM Fast Models
– De-facto standard for virtual prototyping of ARM-based SoCs
– Fast, Flexible, Fidelity
– Accurate software representation of the CPUs and fabric IP
– Open APIs to software debuggers and EDA tools

[Figure: Virtual Platform built from ARM® Cortex®-A57 Fast Model, ARM® Cortex®-A53 Fast Model, CCI/CCN Fast Model, GIC Fast Model, SMMU Fast Model, a peripheral model, and custom IP models, with Virtual I/O handling I/O accesses.]

Hybrid Virtual Platforms

ARM Fast Models
– Open APIs to software debuggers and EDA tools
– Standardized TLM 2.0 bridges for AMBA
– Mix and match models at different levels of abstraction
– Concurrent debug of hardware and software

[Figure: Custom Virtual Platform built from ARM® Cortex®-A57 Fast Model, ARM® Cortex®-A53 Fast Model, CCI/CCN Fast Model, GIC Fast Model, SMMU Fast Model, a peripheral model, and custom IP models, with Virtual I/O handling I/O accesses and AMBA transactions connecting to HW emulators.]

Use Case 1: No PV Model Available

Graphics & Video difficult to model in PV

ARM® Mali™ cores on Emulator

Compute Sub-System on ARM Fast Models

[Figure: ARM reference SoC block diagram, with a legend marking each block as Transactor, Virtual Platform, or Emulation Content. Blocks: dual core Cortex-A57 and quad core Cortex-A53 (ACE), CoreLink™ CCI-400, GIC-400, Mali Graphics, Mali V500 Video, Displays, MMU-500, ADB-400, NIC-400 (configurable: AXI4/AXI3/AHB/APB), TZC-400, DMC-400 / 3rd party DMC with DDR3/2 and LPDDR2/3 PHYs, I/O coherent devices, non-coherent devices, other slaves, and debug/trace (ETM, STM, Trace Bus, DAP, TMC, TPIU, SWD/JTAG). Links: ACE, ACE-Lite, ACE-Lite + DVM, AXI4.]

Use Case 2: Component Analysis

Analyse traffic in CCI-400

CCI-400 on emulator

Upstream components in ARM Fast Models

Downstream in Fast Models or stubbed

[Figure: ARM reference SoC block diagram, with a legend marking each block as Transactor, Virtual Platform, or Emulation Content. Blocks: dual core Cortex-A57 and quad core Cortex-A53 (ACE), CoreLink™ CCI-400, GIC-400, Mali Graphics, Mali V500 Video, Displays, MMU-500, ADB-400, NIC-400 (configurable: AXI4/AXI3/AHB/APB), TZC-400, DMC-400 / 3rd party DMC with DDR3/2 and LPDDR2/3 PHYs, I/O coherent devices, non-coherent devices, other slaves, and debug/trace (ETM, STM, Trace Bus, DAP, TMC, TPIU, SWD/JTAG). Links: ACE, ACE-Lite, ACE-Lite + DVM, AXI4.]

Use Case 3: Verification speed-up

Use Fast Processor Models to speed simulation

Complete System in Emulator other than CPU

[Figure: ARM reference SoC block diagram, with a legend marking each block as Transactor, Virtual Platform, or Emulation Content. Blocks: dual core Cortex-A57 and quad core Cortex-A53 (ACE), CoreLink™ CCI-400, GIC-400, Mali Graphics, Mali V500 Video, Displays, MMU-500, ADB-400, NIC-400 (configurable: AXI4/AXI3/AHB/APB), TZC-400, DMC-400 / 3rd party DMC with DDR3/2 and LPDDR2/3 PHYs, I/O coherent devices, non-coherent devices, other slaves, and debug/trace (ETM, STM, Trace Bus, DAP, TMC, TPIU, SWD/JTAG). Links: ACE, ACE-Lite, ACE-Lite + DVM, AXI4.]

Typical Virtual Platform Flow

Start with example Fast Model project or FVP
– Sufficient for OS boot and initial software development

Develop virtual platform features to support software program
– Incrementally build out platform capabilities
– Easy deployment as new features completed

Export Fast Model subsystem to SystemC to extend platform and integrate with commercial EDA/ESL tools
– e.g. VSP, Palladium

Majority of software features and integration completed ahead of silicon
– Fast platform execution – software teams efficient
– Advanced debug and visibility capabilities

Final optimisation and tuning against silicon
– Power/performance tuning often the final stages of the project

ARM Example

Hybrid Palladium / VSP System

• ARM SysBench RTL instance as provided
• Able to boot Linux and run GPU benchmarks on Palladium

[Figure: columns VSP, SW Integrator, Palladium. RTL Design in Palladium (SysBench instance): ARM CPU in a CPU wrapper, GIC, GPU, Periph., IP blocks, Memory Controller, DDR, RTL Fabric.]

Hybrid Palladium / VSP System (AQUA)

• Virtual-only system
• Able to boot Linux image to prompt
• No specialized IP or DUT
• May comprise virtualized components such as virtual Ethernet, SD card, USB
• Full visibility and debuggability

[Figure: columns VSP, SW Integrator, Palladium. VSP side: Virtualized CPU, Simple Memory, Peripherals, GIC.]

Hybrid Palladium / VSP System

Hybridization of SysBench RTL instance
– Wrappers for CPU and GIC
– Replace DDR memory

[Figure: columns VSP, SW Integrator, Palladium. RTL Design in Palladium (SysBench instance): CPU wrapper, GIC, SmartMem, GPU, Periph., IP blocks, Memory Controller, RTL Fabric.]

Hybrid Palladium / VSP System

Full hybrid system
– AMBA® ACE bridge connects virtual core to RTL
– Interrupts are forwarded to virtual GIC
– Smart memory provides coherent memory between RTL and VSP

Via mapping on the virtual side, Peripherals and IP in the RTL can be shadowed

[Figure: columns VSP, SW Integrator, Palladium. VSP side: Virtualized CPU, Smart Memory, Peripherals, GIC. SW Integrator: CPU Bridges (AMBA, IRQs, resets), Coherency. RTL Design in Palladium (SysBench instance): CPU wrapper, SmartMem, GIC, GPU, Periph., IP blocks, Memory Controller, RTL Fabric.]

What was Measured?

Wall clock time for:
– Linux boot
– Driver loading & setup
– Benchmark execution for 1st frame
– Benchmark execution for all frames

Platform: Palladium only, Palladium/VSP Hybrid (6 domains)
– Measured with frame dumping active and inactive

Results: 50X speedup in Linux Boot

[Figure: two bar charts, each with series for the Samurai, Egypt, and Taiji benchmarks. Left chart (axis 0 to 350): "Linux boot + driver load" and "Boot to 1st frame". Right chart (axis 0 to 14): "GPU test (all Frames)" and "Boot to test end".]

Industry Examples

PALLADIUM/VSP ARM V8 TEGRA HYBRID PROJECT #2

- Pre-silicon Android Validation

- Open GL Graphics Testing

Vikramjeet Singh

Sr System SW Manager

Mobile Devices

Pre-silicon goals for SW

Co-develop and Co-verify pre-silicon HW designs

High quality SW on silicon arrival

No silly bugs

Focus on performance and power optimizations

Cut down time to production

Bring-up production software stack

Android, OpenGL..

Eliminate interface bugs

Early visibility into “eco-system” readiness

NVIDIA Validation with Palladium/VSP Hybrid

[Figure: validation flow from test generation to the hybrid platform.]

Test Gen3:
– Directed HW Arch Tests
– Directed Kernel Tests
– Kernel Use cases
– NVidia OGL Tests
– OS boot end to end
– OS use cases

Test & Metrics DB

Software stack: BootROM, BootLoader, Kernel, Custom Nvidia SW, Kernel Test Cases, OGL Tests, OS Use Cases

Platform: Arch. MODS, Sim Front, SA Host fiber, VSP Hybrid

OS Validation

Staged 64b’ development

64b’ Kernel with 32b’ User space

Fix interface issues (no silly bugs)

Demo capabilities to partners

32b’

64b’

Android

Boot Times

Kernel = 2 mins

Android = 90 mins

10x faster than PD

SW Validation Results

Eliminated reliance on other pre-silicon platforms

SW problems found prior to Silicon return

SW race conditions

Code completeness

64b’ <> 32b’ interface bugs

After silicon return

Contributed to smoother bring-up

SW Ready to demo product at SOL

Fewer bugs resulted in focused effort to tune perf/W

Using Palladium-VSP Hybrid to Accelerate SW Development

Moshe Berkovich

June-2014

X200

Challenges in pre-silicon SW development

• Complex SoCs need massive SW development
• Meeting TTM relies on stable working SW
• HW/SW Co-sim is critical for verification

SW development requirements:
1. High frequency platform
2. Debug capabilities
− JTAG/UART interfaces for SW debug
− Full waveform visibility
− Activity logs (like ARM’s tarmac)
3. Fast bring-up
4. Quick turnaround cycle
5. Accuracy

Confidential © Cambridge Silicon Radio Limited 2014

Hybrid Platform Exploration

Smart DDR
– Minimize DDR access penalty

TLM units often accessed by CPU

[Figure: hybrid partition of the SoC. Blocks: CPU, DDR, ROM, UART, TIMERS, DISPLAY, GPU, and DDR on the SoC Interconnect Fabric. Component color key: RTL, TLM, Mem I/F.]

Results

• Runtime comparisons
− Palladium compiled at 1.5 MHz, CAKE 1X

[Figure: bar chart of run time on PXP vs. HYBRID, axis 0 to 4000.]

Test                      PXP    HYBRID   Speedup
Decompressed Linux boot   1350   16       x84
Video test                1086   104      x10
Compressed Linux boot     3670   18       x200
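
The speedup labels can be cross-checked against the bar values. The sketch below assumes the PXP and HYBRID values pair up by position in the order they appear in the chart (1350 with 16, 1086 with 104, 3670 with 18). That pairing is an assumption, although the resulting ratios do line up with the x84, x10, and x200 labels. The chart does not state its time unit, which cancels out in the ratio anyway.

```python
# Bar values in the order they appear in the chart; pairing by position is assumed.
PXP_RUNTIME = [1350, 1086, 3670]
HYBRID_RUNTIME = [16, 104, 18]


def speedups(baseline, accelerated):
    """Element-wise ratio of baseline run time to accelerated run time."""
    return [b / a for b, a in zip(baseline, accelerated)]


RATIOS = speedups(PXP_RUNTIME, HYBRID_RUNTIME)

if __name__ == "__main__":
    for pxp, hybrid, ratio in zip(PXP_RUNTIME, HYBRID_RUNTIME, RATIOS):
        print(f"{pxp} -> {hybrid}: x{ratio:.1f}")
```

The smallest gain belongs to the pair where the hybrid run time stays above 100. That would be consistent with a workload dominated by hardware that remains in emulation, where a faster CPU model helps least.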

NVIDIA Results

Android & AnTuTu on PXP

Multiple customers booting Android on PXP
– Brought up OS before tapeout
– Validated SW with full SoC, including GPU rendered desktop
– Ran test applications

Observed Android boot times
– In-Circuit configuration: 13 hours
– Hybrid configuration: 1 hour

Several customers have run AnTuTu on PXP
– Characterized SoC performance
– Optimized SW stack
– AnTuTu run time: 24 hours in ICE configuration
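
One way to read the Android boot times is in terms of turnaround: how many complete boots fit into a day of emulator time. The sketch below works that out from the two observed figures (13 hours in-circuit, 1 hour hybrid). The 24-hour window and the function name are illustrative assumptions.

```python
# Observed Android boot times from the slide, in hours
BOOT_HOURS = {
    "In-Circuit": 13,
    "Hybrid": 1,
}


def full_boots_per_window(boot_hours: int, window_hours: int = 24) -> int:
    """Number of complete boots that fit back to back in the time window."""
    return window_hours // boot_hours


SPEEDUP = BOOT_HOURS["In-Circuit"] / BOOT_HOURS["Hybrid"]

if __name__ == "__main__":
    for config, hours in BOOT_HOURS.items():
        print(f"{config}: {full_boots_per_window(hours)} full boot(s) per 24 h")
    print(f"Hybrid speedup over In-Circuit: {SPEEDUP:.0f}x")
```

On these numbers the hybrid configuration moves a team from roughly one boot attempt per day to one per hour, which is the practical meaning of reaching the point of interest sooner.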

Outlook

Outlook: Enhancements

Further Virtualization
– PCI-e for Hybrid configurations

Hybrid Assembly
– Hybrid Model Library
– Graphical User Interface

Smart Memory
– User configuration
– Cache Coherency

Embedded Software Debug

Record/Playback at Virtual/RTL boundary
– To support replay on standalone virtual platform
– Swap between RTL and Virtual
