
Post on 07-Oct-2020


Arm Processor Technology Update and Roadmap

© 2016 Cavium, Inc. – Confidential and Proprietary Information

ARM Processor Technology Update and Roadmap

§ Cavium: Giri Chukkapalli is a Distinguished Engineer in the Data Center Group (DCG)

This talk presents an introduction to the ARM architecture for HPC deployment and the rationale for the design point of the ThunderX2 core and SoC in the single-thread performance versus throughput space. The focus is on sustained performance per TCO and ease of exploiting the spectral parallelism of HPC applications. Preliminary experience with porting, running, and analyzing the performance of HPC applications will be discussed.

ThunderX2 in HPC

© 2017 Cavium, Inc. – Confidential and Proprietary Information

Cavium Corporate Overview

Multi-Core MIPS, ARM Processors, Security, SDN Switch and Server/Storage Connectivity: ~$10B TAM

Mobile Infrastructure, Enterprise Data Center and Cloud, Service Provider Cloud

Most Widely Used

✓ Over 90B shipped in 25 yrs

✓ Outships x86 by 20X per year

Licensing Model

✓ Anyone can build

✓ Innovate & Optimize for targeted applications

ARM Servers & ARM for HPC

§ ARM = Choice & path to more optimized solutions

§ March to Exascale opening door for new ISA

• Massive parallelism requires SW changes

• ARM HPC projects active worldwide

• HPC has large open source component

• Thriving ARM ecosystem for HPC

Cavium’s Proven Leadership in Silicon Design

Highest performance, most widely supported, dual socket ARMv8 servers in production

2 core to 48 core, variety of price points, TDP

Common SW architecture

THE CPU company for Infrastructure

Multicore CPU Expert

Performance 2S Config

High Perf Custom Cores

Complete Portfolio

ARMv8 architectural licensee

Power, Perf, Area Optimized

#1 in Security & Wireless Infrastructure, #2 in Embedded

OPTIMIZING ARM64 SERVERS FOR HPC & CLOUD DATA CENTER

§ Multithreaded, fully out-of-order, high-performance ARMv8 custom cores

§ Single and dual socket support

§ Highest memory bandwidth & capacity

§ Server class virtualization

§ Server class RAS

§ Extensive power management

§ Rich IO configurations

§ Core and socket level performance competitive with next gen incumbent server CPUs

§ Comprehensive hardware and software ecosystem

ARM Leadership – ThunderX2 FIRSTS for ARM Processors

→ World’s Highest Performance Xeon Class ARM Server – 2nd generation product from Cavium

Differentiation

[Chart legend: Incumbent Server CPU, ThunderX2]

Cores: Higher core count delivers higher throughput

Total Threads: Higher thread count = larger number of vCPUs

Memory Bandwidth: More memory bandwidth for memory intensive workloads

Memory Capacity: More memory capacity for in-memory workloads

PCIe Lanes: Rich IO connectivity options, direct attach to NVMe devices

Thriving HPC Ecosystem

Standards Based Sys Management & FW

Industry Leading Operating Systems

Linux Enterprise SLE12

Optimized Compilers & Dev Environments

Debuggers, Profilers & Cluster Mgt

Open Source & Community Focus

ThunderX Momentum in HPC Continues to Grow…

[Chart: Memory Bandwidth, Integer, Floating Point, Vectors. Values shown: 1.0, 2.2X, 2.5X, 3X, 4X]

2-4X better HPC performance

Significant HPC Engagements

Early Press Announcements

Server platforms at World’s premier HPC Labs

Early Performance for HPC Applications

ThunderX2 Delivers Compelling Memory Bandwidth

Details: ThunderX2 CPU, Linux kernel 4.8.0-32-generic (4k pages), Stream compiled with GCC version 5.4.0-6ubuntu1~16.04.4 at -O3

[Chart: Stream Scaling. Y axis: % of peak bandwidth (0 to 100). X axis: % of CPU load. Series: copy, scale, add, triad]

Highest memory bandwidth enables memory bound applications to scale better
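As a rough illustration of what the chart measures, the sketch below writes out the four STREAM kernels (copy, scale, add, triad) and converts a timed run into a percentage of peak bandwidth. It assumes the usual STREAM counting convention of 8-byte elements, with two arrays touched per iteration for copy and scale and three for add and triad. The function names and the `peak_bytes_per_s` parameter are hypothetical and not taken from the slides. Pure Python is far too slow to measure real memory bandwidth, so this only shows the arithmetic.

```python
# Arrays read or written per loop iteration under the usual STREAM
# counting convention (assumption: 8-byte double-precision elements).
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def run_kernel(name, a, b, c, scalar=3.0):
    """Run one STREAM-style kernel in place over equally sized lists."""
    n = len(a)
    if name == "copy":
        for i in range(n):
            c[i] = a[i]
    elif name == "scale":
        for i in range(n):
            b[i] = scalar * c[i]
    elif name == "add":
        for i in range(n):
            c[i] = a[i] + b[i]
    elif name == "triad":
        for i in range(n):
            a[i] = b[i] + scalar * c[i]
    else:
        raise ValueError(f"unknown kernel: {name}")


def bytes_moved(name, n, elem_size=8):
    """Bytes counted for one pass of the kernel over n elements."""
    return ARRAYS_TOUCHED[name] * elem_size * n


def percent_of_peak(name, n, seconds, peak_bytes_per_s):
    """Achieved bandwidth as a percentage of a caller-supplied peak."""
    achieved = bytes_moved(name, n) / seconds
    return 100.0 * achieved / peak_bytes_per_s
```

For example, a triad pass over 1000 elements counts 24000 bytes; dividing by the measured time and by the platform's peak bandwidth gives the value plotted on the Y axis.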

OpenBLAS DGEMM

Details: ThunderX2 CPU, Linux kernel 4.8.0-32-generic (4k pages), OpenBLAS compiled with GCC version 5.4.0-6ubuntu1~16.04.4 at -O3

Efficient cores capable of achieving close to theoretical peak performance
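The comparison against theoretical peak can be sketched as follows. A GEMM of size m x n x k is conventionally counted as 2·m·n·k floating-point operations, and theoretical peak is taken here as cores x clock x flops per cycle per core. All parameter values are supplied by the caller; none of the names or numbers below come from the slides, and the ThunderX2 figures are deliberately not filled in.

```python
def gemm_flops(m, n, k):
    """Conventional flop count for C = A * B with A (m x k) and B (k x n)."""
    return 2 * m * n * k


def theoretical_peak_flops(cores, clock_hz, flops_per_cycle_per_core):
    """Peak flop rate under the simple cores x clock x width model."""
    return cores * clock_hz * flops_per_cycle_per_core


def gemm_percent_of_peak(m, n, k, seconds, cores, clock_hz,
                         flops_per_cycle_per_core):
    """Achieved GEMM rate as a percentage of the modeled peak."""
    achieved = gemm_flops(m, n, k) / seconds
    peak = theoretical_peak_flops(cores, clock_hz, flops_per_cycle_per_core)
    return 100.0 * achieved / peak
```

With made-up inputs (4 cores at 2 GHz, 8 flops per cycle per core, a 1000 x 1000 x 1000 multiply finishing in 0.125 s), the function reports 25% of peak. The same arithmetic applies to SGEMM with the single-precision flops-per-cycle figure.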

OpenBLAS SGEMM

Details: ThunderX2 CPU, Linux kernel 4.8.0-32-generic (4k pages), OpenBLAS compiled with GCC version 5.4.0-6ubuntu1~16.04.4 at -O3

Efficient cores capable of achieving close to theoretical peak performance

ThunderX2 Delivers Best-In-Class HPL Performance

Details: ThunderX2 CPU, Linux kernel 4.8.0-32-generic (4k pages), HPL compiled with GCC version 5.4.0-6ubuntu1~16.04.4 (defaults), mpich v3.2, no openmp, single socket test, process grid for each test case based on number of cores in test

Efficient cores capable of achieving close to theoretical peak performance

[Chart: HPL scaling, % of peak Gflops. Y axis: % of peak performance (0 to 100). X axis: % of system load (3, 13, 19, 25, 28, 38, 50, 63, 75, 78, 94, 100)]
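The test details say the HPL process grid was chosen per test case from the number of cores, but not how. One common heuristic, shown here purely as an assumption, is to pick the most nearly square P x Q factorization with P no larger than Q. The function name `process_grid` is hypothetical.

```python
import math


def process_grid(num_procs):
    """Return (P, Q) with P * Q == num_procs, P <= Q, and P as large as possible."""
    if num_procs < 1:
        raise ValueError("num_procs must be positive")
    p = math.isqrt(num_procs)
    # Walk down from floor(sqrt(n)) to the nearest divisor; 1 always divides.
    while num_procs % p != 0:
        p -= 1
    return p, num_procs // p
```

For 28 MPI ranks this yields a 4 x 7 grid, and for a prime count such as 7 it falls back to 1 x 7. The grids actually used for the results above may have been chosen differently.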

ThunderX2 Performance Scaling on Real Applications

High memory throughput benefits simple simulations

Large high-performance core count combined with high memory throughput benefits complex simulations