1
Brand-New
Vector Supercomputer
NEC Corporation IT Platform Division
Shintaro MOMOSE
SC13
New Product
2 2 © NEC Corporation, 2013
NEC Released A Brand-New
Vector Supercomputer, SX-ACE Just Now.
Vector Supercomputer for
Memory Bandwidth Intensive Applications
Sustained performance
Usability
Productivity
3
Concept
4 4 © NEC Corporation, 2013 4 4
SX History and Technical Evolutions P
erf
orm
an
ce
1990 2000 2010
SX-2
SX-3
SX-4
SX-5
SX-6
SX-8/8R
SX-9
Bipolar
Water Cooling
Multi node
CMOS
Air Cooling
1 Chip
Vector Processor
3D node
module
Multi-core
All in One Chip
ECO
SX-7
100GF
Processor
NEC has always provided the high sustained
performance by Vector Super-Computer SX series.
ES
ES2
Distributed Parallelization
(MPI-SX)
Support
Over 100 nodes
Cluster
Support >1000 nodes
ECO
Auto Vectorization
Compiler
MPI support
to Multilane IXS
Automatic Parallelization
Function &
SUPER-UX
1.0E-01
1.0E+00
1.0E+01
1.0E+02
1.0E+03
1.0E+04
1.0E+05
1.0E+06
1.0E+07
1.0E+08
1.0E+09
1.0E+10
2000 2002 2004 2006 2008 2010 2012 2014 2016 2018 2020
Linpack [TF]Lipack ave. [TF]# of cores# of cores ave.# of nodes# of nodes ave.Core performance [GF]Core perfromance ave. [GF]Frequency [GHz]Frequency ave. [GHz]
Growing of LINPAC performance has been provided by system enlarging User mast spend their time to extract massively parallelism Smaller # of cores with big cores can reduce the difficulty
Frequency [GHz] 115%/year
Exa Flops
Nearly
constant
Increasing
Trend of TOP500 (1st
~ 10th
system)
5 © NEC Corporation, 2013
big
co
re
few
er
co
res
Required Byte/Flop in Real Applications
According to Japanese Government (MEXT) working group report of wide variety of strategic segment applications, diverse characteristics are observed.
MEXT: Ministry of Education, Culture, Sports, Science & Technology
B/F requirement from each application is differ greatly. Only one architecture cannot cover all application areas.
0.001 0.01 0.1 1 10 100 1000
0.0001
0.001
0.01
0.1
1
10
Required memory capacity [PB] Re
qu
ire
d m
em
ory
ba
nd
wid
th [
Byte
/Flo
p]
Calculation intensive
memory bandwidth intensive
Reference: “Report on Strategic Direction/Development of HPC
in Japan”, March 2012
Structural analysis Fluid dynamics
MD, Weather Cosmo physics Particle physics
Quantum chemistry Nuclear physics
6 © NEC Corporation, 2013
scalar
CPUs
7 7 © NEC Corporation, 2013
Concepts of SX-ACE
Big Core
Providing higher sustained performance
The highest single core performance: 64GF
The largest single core memory bandwidth: 64~256GB/s
Inherit SX-DNA
Low Power Consumption
Higher power efficiency
Small Installation Space
To reduce floor space cost
1
10
1
5
compared to SX-9
Compared to SX-9
Providing Big Core
The SX-ACE inherits SX-DNA and overwhelm other CPUs with its
world’s No.1 CPU core performance and word’s No.1 memory
bandwidth.
SX-ACE
Scalar A
Scalar B
Scalar C
64GF
30
24
15
64GB/s
16
6
5
2~4x 4~13x
Single core comparison
peak performance
2~4x performance
compared to competitors
memory bandwidth
4~13x performance
compared to competitors
8 © NEC Corporation, 2013
9
Architecture
CPU Architecture (Big Core, Large memory bandwidth)
10 © NEC Corporation, 2013
core core core
RCU
MC
MC
MC
MC
MC
MC
MC
MC
MC
MC
MC
MC
MC
MC
MC
MC
crossbar
ADB (Assignable Data Buffer)
SPU VPU
256GB/s
256GB/s
256GB/s
256GB/s
8GB/s x2
8GB/s x2
Memory(DDR3)
Interconnect
Architecture Vector
CORE
Performance 64GFlops
ADB size 1MB
ADB bandwidth 256GB/s
Memory bandwidth 64GB/s~
256GB/s
Memory Byte/Flop 1.0 ~ 4.0
CPU
Cores 4
Performance 256GFlops
Memory bandwidth 256GB/s
Byte/Flop 1.0
Vector Processing Unit
Scalar Processing Unit
Remote access Control Unit
Memory controller
11
Node card
11cm
37cm
11 © NEC Corporation, 2013
All-in-one Processor
Memory Controller
Very large memory BW
256GB/s BW control
World’s largest BW 256GB/s
SX-ACE CPU
memory
Powerful core
Network controller
I/O Controller
World’s fastest CPU core
64GF x 4cores
1MB ADB/core
8GB/s/direction, Fat-tree
Connection to storage device, Ethernet
4 powerful cores and each controller (memory, network, I/O) are integrated in one-CPU = Power saving
Compact card design = Space saving
Performance: 256GF
Memory bandwidth: 256GB/s
Node Card
12 (c) NEC Corporation, 2012
37cm
11cm
CPU 4 cores
256GF
256GB/s
Memory 4GB x 16DIMMs
DDR3 2000MHz
13
System Configuration
13 © NEC Corporation, 2013
Each node is connected with 2 stages full Fat-tree network
Implementing global communication function into HW
SW1 #00
no
de
#000
no
de
#015
SW1 #01
no
de
#016
no
de
#031
SW1 #31
no
de
#496
no
de
#511
SW2 #00 SW2 #15
16 links
16 links
32 links
512 nodes, 2048 cores, 131TFlops, 1B/F
8GB/s /direction
•Fat-tree •HW function of global communication
node
14
Configuration
14 (c) NEC Corporation, 2013
Node Card
1CPU, 256GF, 256GB/s
2-Node Module
2 nodes = 2 CPUs
16-Node Cage
8 modules = 16 nodes = 16 CPUs
Rack
64 nodes = 16TF, 16TB/s
System
Rack Specifications
16TF, 16TB/s, 64 CPUs
0.75m x 1.5m x 2.0m
30KW
16-Node Cage x4
4 cages = 32 modules = 64 nodes = 64CPUs
co
ola
nt
pip
e (
inle
t)
co
ola
nt
pip
e (
ou
tle
t)
power junction
rack manager
OS disk
2 nodes 2 nodes
2 nodes 2 nodes
node manager
2 nodes 2 nodes
2 nodes 2 nodes
2 nodes 2 nodes
2 nodes 2 nodes
node manager
2 nodes 2 nodes
2 nodes 2 nodes
2 nodes 2 nodes
2 nodes 2 nodes
node manager
2 nodes 2 nodes
2 nodes 2 nodes
2 nodes 2 nodes
2 nodes 2 nodes
node manager
2 nodes 2 nodes
2 nodes 2 nodes
network switch
42U
15 (c) NEC Corporation, 2013
Detail of Rack Implementation
16-node cage
2-node module
16 16 © NEC Corporation, 2013
Comparison with same performance (131TF)
Providing 5x smaller space and 10x lower power consumption compared to SX-9 by power saving design and compact implementation.
Downsizing and Power Saving
power 1/10
space 1/5
12m
24m
8m
7m
131TF
288m2
2.4MW
131TF
56m2
0.24MW
SX-9 SX-ACE
25m swimming pool size Meeting room size
80 nodes 512 nodes
17 (c) NEC Corporation, 2013
NEC’s Exhibitor Forum is Today !
Exhibitor Forum Nov. 19th (Tue), 15:30 – 16:00 Room 501/502 NEC’s Brand-New Vector Supercomputer and HPC Roadmap
©NEC Corporation, 2012 Confidential 18
Top Related