Uploaded by eurotechhpc
What is 3D Torus
The switchless interconnection topology
A demanding future for HPC
• Supercomputers are asked to process demanding computational
loads (processing and data)
• Processor power is paramount, but a key aspect of parallel computers
is the communication network that interconnects the computing nodes
• Together with speed, HPC systems are increasingly asked to be more
available
• One additional challenge with large systems is scalability, that is, the ability
to add nodes to a cluster while affecting performance and reliability
as little as possible
• It is also paramount for future machines to consume less energy
Conceptual difference
[Figure: switched Infiniband network vs. switchless Torus network]
3D Torus topology
• Connecting nodes using a 3D
Torus configuration means that
each node in a cluster is
connected to the adjacent ones via
short cabling
• The signal is routed directly from
one node to the other with no need
for switches. 3D means that the
communication takes place in 6
different “directions”: X+, X-, Y+,
Y-, Z+, Z-
• In practical terms, each node can
be connected to 6 other nodes: in
this way, the graph of the
connections resembles a
three-dimensional matrix
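The six-direction addressing described above can be sketched in a few lines of Python. This is only an illustration, not Aurora code: the function name `torus_neighbors`, the coordinate tuples and the 4 x 4 x 4 size are assumptions made for the example. The modulo operation provides the wrap-around that closes each axis of the grid into a ring, which is what makes the grid a torus.

```python
def torus_neighbors(coord, dims):
    """Return the six neighbors of `coord` in a 3D torus of size `dims`.

    Hypothetical helper for illustration: nodes are addressed by (x, y, z)
    and each axis wraps around, so edge nodes also have six neighbors.
    """
    x, y, z = coord
    nx, ny, nz = dims
    return {
        "X+": ((x + 1) % nx, y, z),
        "X-": ((x - 1) % nx, y, z),
        "Y+": (x, (y + 1) % ny, z),
        "Y-": (x, (y - 1) % ny, z),
        "Z+": (x, y, (z + 1) % nz),
        "Z-": (x, y, (z - 1) % nz),
    }


# Assumed example size: a 4 x 4 x 4 torus (64 nodes).
neighbors = torus_neighbors((0, 0, 0), (4, 4, 4))
print(neighbors["X+"])  # (1, 0, 0)
print(neighbors["X-"])  # (3, 0, 0): wraps around to the far side
```

Even a corner node such as (0, 0, 0) has six distinct neighbors, so no node needs a switch to reach its adjacent nodes.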
3D Torus topology
• Such a configuration allows the addition of nodes to a system without
degrading performance.
• Each new node is joined as an extension of the grid, linked to it with no
extensive cabling or switching
• Scaling linearly, with little or no performance loss, is strictly true for
those problems that rely heavily on nearest-neighbor communication
• Adding a node to a large system takes much less
work and carries fewer potential problems
• Because the connections between nodes are short and direct, the latency of
the links is very low
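One way to see why the wiring grows linearly is to count the links. The sketch below is illustrative only: `count_links` is a hypothetical helper, and it assumes every dimension has at least 3 nodes so that the X+ and X- neighbors of a node are distinct. It enumerates the links of a torus and shows that there are always three per node, however large the machine becomes.

```python
from itertools import product


def count_links(dims):
    """Count distinct node-to-node links in a 3D torus.

    Illustrative assumption: every dimension has at least 3 nodes.
    """
    nx, ny, nz = dims
    links = set()
    for x, y, z in product(range(nx), range(ny), range(nz)):
        node = (x, y, z)
        # Following only the "+" directions visits every link exactly once.
        plus_neighbors = (
            ((x + 1) % nx, y, z),
            (x, (y + 1) % ny, z),
            (x, y, (z + 1) % nz),
        )
        for nbr in plus_neighbors:
            links.add(frozenset((node, nbr)))
    return len(links)


for dims in [(4, 4, 4), (4, 4, 5), (8, 8, 8)]:
    nodes = dims[0] * dims[1] * dims[2]
    print(dims, nodes, count_links(dims))
# (4, 4, 4) 64 192
# (4, 4, 5) 80 240
# (8, 8, 8) 512 1536
```

Growing the 4 x 4 x 4 system by one layer of 16 nodes adds 48 links and leaves every node with six connections. In this model, no central component has to be resized.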
3D Torus advantages
• High speed and low latency
• Linear scalability. Switchless
configuration that avoids bottlenecks
and allows hardware cost reduction
• Improved MTBF (mean time between failures)
• Regular and hidden wiring, leading
to less cabling
• Lower energy usage for
communication
• Good match between physical
communication channels and local
pattern algorithms
• Less energy consumed
3D Torus applications
• The 3D Torus delivers its maximum
performance on a subset of problems
that is specific but rather
large.
• These are local pattern problems, which
typically model systems
whose behavior depends on
adjacent systems.
Example: Lattice QCD
• Computer simulation of Lattice QCD
(the theory of strong interactions, e.g.
inside protons) is one of the great
challenges for massively parallel
supercomputers and requires a
communication network with high
bandwidth and low latency.
• The equations governing Lattice QCD
describe local interactions (each degree
of freedom interacts with its nearest
neighbors). This results in a well
balanced computational task in which
each degree of freedom (the value of a
field on a space-time point) obeys the
same equations, which are coupled to a
small number of other degrees of
freedom residing nearby.
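The local coupling described above can be illustrated with a toy update rule. The sketch below is not Lattice QCD: it assumes a plain scalar field on a small periodic 3D lattice and replaces each site with the average of its six nearest neighbors. The function name `relax_step` and the 3 x 3 x 3 size are hypothetical. What it shares with the real problem is the access pattern: every site applies the same rule and reads only adjacent sites.

```python
from itertools import product


def relax_step(field, dims):
    """Replace each site by the mean of its six nearest neighbors.

    Toy stand-in for a local update rule, with periodic boundaries.
    """
    nx, ny, nz = dims
    new_field = {}
    for x, y, z in product(range(nx), range(ny), range(nz)):
        total = (
            field[((x + 1) % nx, y, z)] + field[((x - 1) % nx, y, z)]
            + field[(x, (y + 1) % ny, z)] + field[(x, (y - 1) % ny, z)]
            + field[(x, y, (z + 1) % nz)] + field[(x, y, (z - 1) % nz)]
        )
        new_field[(x, y, z)] = total / 6.0
    return new_field


dims = (3, 3, 3)  # assumed toy lattice size
field = {site: 0.0 for site in product(*(range(n) for n in dims))}
field[(0, 0, 0)] = 6.0  # a single excited site

field = relax_step(field, dims)
print(field[(1, 0, 0)])  # 1.0: a direct neighbor of the excited site
print(field[(1, 1, 1)])  # 0.0: not adjacent, so not yet affected
```

If the lattice is split into blocks and each block is assigned to one torus node, only the sites on a block surface need data from another node, and that node is always a direct torus neighbor.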
Example: Fluid Dynamics
• Fluid dynamics in the turbulent regime
shares the same advantage: it can be
“easily” mapped onto a supercomputer in
the formulation known as
Lattice Boltzmann Methods
(LB).
• This is a scientific field which is both
intriguing from the point of view of
fundamental science and relevant to
many technological applications.
Additional applications
• Many Monte Carlo simulations and
embarrassingly parallel problems can
exploit the full performance advantage
of the 3D Torus architecture
• Problems that require all-to-all
dialogue between nodes may benefit
less from the full performance of the 3D
torus interconnection
• However, regardless of the type
of application and problem, the 3D
torus still offers the major
advantages of scalability and
serviceability
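A rough way to see why all-to-all traffic benefits less is to count hops. The sketch below assumes shortest-path routing along each axis, which is an assumption of the example and not a statement about Aurora's routing. `torus_hops` and `mean_hops` are hypothetical helpers. The mean is taken from one node to every node in the system, itself included.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Minimum number of link traversals between nodes a and b."""
    hops = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        hops += min(d, n - d)  # go whichever way around the ring is shorter
    return hops


def mean_hops(dims):
    """Mean hop count from node (0, 0, 0) to every node, itself included."""
    origin = (0, 0, 0)
    nodes = list(product(*(range(n) for n in dims)))
    return sum(torus_hops(origin, node, dims) for node in nodes) / len(nodes)


print(torus_hops((0, 0, 0), (3, 0, 0), (4, 4, 4)))  # 1: wrap-around neighbor
print(mean_hops((4, 4, 4)))  # 3.0
print(mean_hops((8, 8, 8)))  # 6.0
```

Nearest-neighbor traffic always takes one hop, whatever the system size. In this model, the average distance for all-to-all traffic grows with the side of the torus.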
Eurotech Aurora 3D Torus
Aurora Torus peculiarities
• Unified network architecture:
– the 3D Torus coexists with an Infiniband
network.
– Both local and global MPI calls can be
processed efficiently
– Dedicated synchronization network
– Gigabit Ethernet
• FPGA-driven Torus, based on
the work of Aurora Science researchers
who acquired experience with Janus
and QPACE
• Full duplex communication links
– Allowing sub-tori to be created as subdomains
• Cable lengths kept very short thanks
to smart backplane design
Aurora 3D Torus Network
• Aurora Science implementation
– Based on FTNW (Pivanti, Schifano, Simma)
• http://sourceforge.net/projects/ftnw
– GPL licensed
– Optimized for nearest-neighbor communication
– Proven technology in LQCD communities
• Extoll implementation on Aurora
– Licensed
– Optimized for all-to-all communication for a wide range of
applications
– Future interconnect paving the way to exascale computing
Aurora FPGA: 3D Torus network processor
[Diagram: two CPUs, each attached to the FPGA over PCIe 2.0 x8 (40 Gbps); the FPGA drives six torus links (X+, X-, Y+, Y-, Z+, Z-), each through its own PHY and each labeled 4x, 10 Gbps]
Aurora S– 3D Torus network
Aurora systems
AURORA, a high-density, highly efficient family
of supercomputers
48U
One Aurora Rack
256 nodes, 512 CPUs, 3072 cores,
100 TFLOPS @ 100kW
Entirely liquid cooled
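The rack figures above imply a few per-unit ratios. The short calculation below derives them from the quoted numbers only. The variable names are illustrative, and the 100 TFLOPS and 100 kW values are taken at face value as stated.

```python
# Figures quoted for one Aurora rack.
nodes, cpus, cores = 256, 512, 3072
peak_tflops, power_kw = 100, 100

cpus_per_node = cpus // nodes
cores_per_cpu = cores // cpus
# Convert TFLOPS to GFLOPS and kW to W before dividing.
gflops_per_watt = (peak_tflops * 1000) / (power_kw * 1000)

print(cpus_per_node)    # 2
print(cores_per_cpu)    # 6
print(gflops_per_watt)  # 1.0
```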
Aurora identity card
High computational power
Liquid cooling
Energy efficiency
Reliability and availability
Scalability
Unified network architecture
Compatibility
Unified Network Architecture
• 3D Torus
– Ultra low latency
– High bandwidth
– Nearest neighbor
– Unlimited scalability
– Regular, massive, local patterns
• Infiniband
– Very low latency
– High bandwidth
– Switched network
– Multiple services
– Irregular, long distance patterns (Molecular Dynamics)
– Storage (SAN)
– Monitoring (IPMI)
• Synch
– Very fast channel
– Global commands
– Subdomain management
– Low/high level synch
– Net processor synch
– Thread synch
– Global clock
– System coordination
– Debugging
Eurotech HPC Principles
High performance. We want our customers to run their simulations and
applications as quickly as the world's latest technologies allow.
Energy efficiency and Green. We build products that allow our customers to save
on energy bills and improve sustainability.
Availability. Intelligent design, quality, support readiness and preventive
maintenance increase the availability of our HPC systems during their
lifetime.
Scalability. We want our solutions to scale linearly and our customers to grow
according to their needs and budget availability.
Cost effectiveness. We concentrate much of our effort on delivering advanced
technology at competitive prices and on allowing our customers to reduce the total
cost of ownership.
Versatility and compatibility. We designed our products to tackle different
problems in the most effective way possible.
www.eurotech.com/aurora