
Zynq Ultrascale+ Architecture

Stephanie Soldavini and Andrew Ramsey

CMPE-550

Dec 2017

Soldavini, Ramsey (CMPE-550) Zynq Ultrascale+ Architecture Dec 2017 1 / 17

Agenda

Heterogeneous Computing

Zynq Ultrascale+

- History
- Architecture
- Applications


Problem: Flexibility/Performance Trade Off

[Figure: classes of computing elements arranged from Software to Hardware. Axis labels: Programmability/Flexibility and Performance; a further label reads "Specialization, Development cost/time, Performance/Chip Area/Watt (Computational Efficiency)".]

General Purpose Processors (GPPs)

Application-Specific Processors (ASPs)

Co-Processors

Application Specific Integrated Circuits (ASICs)

Configurable Hardware (e.g. FPGAs)

Selection Factors:

- Type and complexity of computational algorithms (general purpose vs. specialized)
- Desired level of flexibility
- Performance
- Development cost
- System cost
- Power requirements
- Real-time constraints


Solution: Use some of each in a single system


Heterogeneous Computing

Combine the use of different devices, for example:

- Hardware accelerator used to speed up one function in a program
- Offload matrix calculations to a GPU
- Cloud system with GPP, GPU, and/or FPGA resources

Allows for each part of a task to run on the device it is best suited for
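
As a toy illustration of matching each part of a task to the device that suits it best, the sketch below picks the lowest-cost device per task. The task names, the cost figures, and the helper `assign_devices` are all hypothetical, chosen only for illustration; they are not measurements from any real system.

```python
# Hypothetical per-device cost estimates in arbitrary time units.
# These numbers are made up for illustration and are not measured data.
COSTS = {
    "control_flow": {"GPP": 1.0, "GPU": 8.0, "FPGA": 6.0},
    "matrix_multiply": {"GPP": 10.0, "GPU": 1.0, "FPGA": 2.0},
    "bit_level_filter": {"GPP": 12.0, "GPU": 4.0, "FPGA": 1.0},
}


def assign_devices(costs):
    """Map each task to the device with the lowest estimated cost."""
    return {
        task: min(per_device, key=per_device.get)
        for task, per_device in costs.items()
    }


if __name__ == "__main__":
    for task, device in assign_devices(COSTS).items():
        print(f"{task} -> {device}")
```

A real system would also have to account for the cost of moving data between devices, which this sketch ignores.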


Zynq Ultrascale+ History

Made by Xilinx

“Microheterogeneous”

- Integrates GPP, GPU, FPGA, Co-Proc, & ASIC in one SoC
- Increases speed by reducing off-chip data transfer

Predecessors

- Kintex-UltraScale and Virtex-UltraScale (20/16nm FPGA fabric)
- Zynq-7000 (Dual-core ARM Cortex-A9 & 28nm FPGA fabric)


General Architecture

Processing System (PS)

Application Processing Unit (APU)

64-bit quad-core or dual-core ARM Cortex-A53

Real-time Processing Unit (RPU)

32-bit dual-core ARM Cortex-R5

Graphics Processing Unit (GPU)

ARM Mali-400

On-Chip Memory (OCM)

256 kB RAM with Error-Correcting Codes (ECC)

Programmable Logic (PL)

16nm FinFET+ programmable logic

Configurable Logic Blocks (CLB)

36 kb Block RAMs

UltraRAM

DSP Blocks


[Figure: device diagram with regions labeled Processing System (PS), Programmable Logic (PL), and Interconnects & I/O]


Application Processing Unit (APU)

64-bit quad-core or dual-core ARM Cortex-A53

Up to 1.5 GHz

ARMv8-A Architecture

- 64-bit mode: A64 instruction set
- 32-bit mode: A32/T32 instruction set

Single/double precision floating point unit (FPU)

Cache

- IL1: 32 kB 2-way set-assoc with parity (independent for each CPU)
- DL1: 32 kB 4-way set-assoc with ECC (independent for each CPU)
- L2: 1 MB 16-way set-assoc with ECC (shared between CPUs)
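
The cache sizes and associativities above fix the number of sets once a line size is chosen. The following is a minimal sketch of that arithmetic, assuming a 64-byte cache line (the line size is not given on the slide) and using a hypothetical helper named `cache_geometry`.

```python
LINE_BYTES = 64  # assumption: cache line size, not stated on the slide


def cache_geometry(size_bytes, ways, line_bytes=LINE_BYTES):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    sets = size / (ways * line size); sizes are assumed to be powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    return sets, offset_bits, index_bits


# (size in bytes, associativity) as listed for the APU caches
APU_CACHES = {
    "IL1": (32 * 1024, 2),
    "DL1": (32 * 1024, 4),
    "L2": (1024 * 1024, 16),
}

if __name__ == "__main__":
    for name, (size, ways) in APU_CACHES.items():
        sets, offset_bits, index_bits = cache_geometry(size, ways)
        print(f"{name}: {sets} sets, {offset_bits} offset bits, {index_bits} index bits")
```

Under the 64-byte assumption this gives 256 sets for IL1, 128 sets for DL1, and 1024 sets for L2.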


Real-time Processing Unit (RPU)

32-bit dual-core ARM Cortex-R5

Up to 600 MHz

ARMv7-R Architecture: A32/T32 instruction set

Single/double precision FPU

Caches/Tightly Coupled Memory (TCM)

- L1: 32 kB 4-way set-assoc with ECC (independent for each CPU)
- TCM: 128 kB (independent, but can be combined into one 256 kB TCM)


Graphics Processing Unit (GPU)

ARM Mali-400

Up to 667 MHz

One geometry processor

Two pixel processors

Supports OpenGL ES 1.1 & 2.0, OpenVG 1.1

Advanced anti-aliasing support

Cache: L2: 64 kB


Programmable Logic (PL)

16nm FinFET+ programmable logic

Configurable Logic Blocks (CLB)

- Look Up Tables (LUT)
- Flip flops (FF)
- Cascadable adders

36 kb Block RAMs

- True dual-port
- Up to 72 bits wide
- Configurable as dual 18 kb

UltraRAM

- 288 kb
- 72 bits wide
- ECC

DSP Blocks

- 27×18 signed multiply
- 48-bit adder/accumulator
- 27-bit pre-adder
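
The memory capacities and widths above imply a word depth for each block. A small sketch of that arithmetic follows; it assumes "kb" means 1024 bits, and `depth` is a hypothetical helper, not part of any Xilinx tool.

```python
BITS_PER_KB = 1024  # assumption: the slide's "kb" means 1024 bits


def depth(capacity_kb, width_bits):
    """Number of addressable words for a memory of the given capacity and width."""
    return capacity_kb * BITS_PER_KB // width_bits


if __name__ == "__main__":
    print("36 kb block RAM at 72 bits wide:", depth(36, 72), "words")
    print("288 kb UltraRAM at 72 bits wide:", depth(288, 72), "words")
```

At 72 bits wide, a 36 kb block RAM holds 512 words and a 288 kb UltraRAM holds 4096 words.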

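
The DSP block features listed for the PL (27-bit pre-adder, 27×18 signed multiply, 48-bit adder/accumulator) can be combined into a multiply-accumulate step. The sketch below is a simplified behavioral model built only from those stated widths. The function names are hypothetical, and the model ignores pipelining and every other operating mode of the real block.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the two's-complement range of the given width."""
    value &= (1 << bits) - 1
    # If the sign bit is set, reinterpret the value as negative.
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_mac(acc, a, d, b):
    """One step of acc + (a + d) * b using the widths listed for the DSP block."""
    pre = wrap_signed(a + d, 27)            # 27-bit pre-adder
    product = pre * wrap_signed(b, 18)      # 27x18 signed multiply
    return wrap_signed(acc + product, 48)   # 48-bit adder/accumulator


if __name__ == "__main__":
    # Accumulate (a + d) * b over a few sample triples.
    acc = 0
    for a, d, b in [(3, 4, 5), (-1, 0, 2)]:
        acc = dsp_mac(acc, a, d, b)
    print(acc)
```

The pre-adder is useful when two samples share one coefficient, as in a symmetric filter, because the pair costs a single multiply.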

Vivado Design Suite

Bright green shows configurable components


Vivado Design Suite

Customize components, for instance the DDR controller


Applications

Data Center: Networked Storage/Service Platform [2]

Multimedia, video encoding/decoding [1]

Particle physics [4]

Automotive driver assistance, driver information, and infotainment.

LTE radio and baseband.

Medical diagnostics and imaging.

Video and night vision equipment.

Wireless radio.

Single-chip computer.


Application: Data Center

A sample configuration used for a networked storage platform

4.5X performance speed-up & 20X power reduction over x86 implementations
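
If both figures are taken to describe the same workload at the same time (an assumption, since the slide does not say so), they can be combined into a performance-per-watt gain and an energy-per-task ratio. A minimal sketch, with `combined_gain` as a hypothetical helper:

```python
def combined_gain(speedup, power_reduction):
    """Combine a speed-up and a power reduction measured on the same workload.

    Returns (performance-per-watt gain, energy-per-task ratio).
    Energy per task is power times time, so both factors divide it.
    """
    perf_per_watt = speedup * power_reduction
    energy_ratio = 1.0 / perf_per_watt
    return perf_per_watt, energy_ratio


if __name__ == "__main__":
    gain, energy = combined_gain(4.5, 20.0)
    print(f"performance per watt: {gain:.0f}x")
    print(f"energy per task: {energy:.4f} of the x86 baseline")
```

Under that assumption, the quoted figures correspond to a 90X gain in performance per watt.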


Questions?


References

[1] Gosain, Y. and A. Gupta. 2017. “Xilinx Advanced Multimedia Solutions with Video Codec/Graphics Engines,” Zynq UltraScale+ MPSoC. Xilinx, October 23. https://www.xilinx.com/support/documentation/white_papers/wp497-multimedia.pdf

[2] Hansen, L. 2016. “Unleash the Unparalleled Power and Flexibility of Zynq UltraScale+ MPSoCs,” Zynq UltraScale+ MPSoC. Xilinx, June 15. https://www.xilinx.com/support/documentation/white_papers/wp470-ultrascale-plus-power-flexibility.pdf

[3] Shaaban, M. “Basics of Computer Design.” Lecture, CMPE-550, Rochester, NY, August 29, 2017.

[4] Stamen, R. “The Development of the Global Feature eXtractor (gFEX) for the ATLAS Level 1 Calorimeter Trigger at the LHC.” Presented at TWEPP 2017, Santa Cruz, CA, 2017.

[5] Xilinx, “Overview,” Zynq UltraScale+ MPSoC Data Sheet, July 2017. https://www.xilinx.com/support/documentation/data_sheets/ds891-zynq-ultrascale-plus-overview.pdf

[6] Xilinx, “Zynq UltraScale+ Device,” Technical Reference Manual, November 2017. https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
