• Blue Gene is an ambitious project to expand the horizons of supercomputing, with the ultimate goal of creating a system that can perform one quadrillion calculations per second, or one *petaflop.
• A massively parallel supercomputer using thousands of embedded PowerPC processors and supporting a large memory space.
• Uses standard compilers and a message-passing environment.
• Blue Gene is an IBM project aimed at designing supercomputers that can reach operating speeds in the PFLOPS (petaFLOPS) range with low power consumption.
• Not to be confused with Deep Blue, the earlier IBM machine that was the first computer to beat a reigning world chess champion.
*A petaflop is a measure of a computer's processing speed and can be expressed as a thousand trillion floating point operations per second.
In December 1999, IBM announced plans to build a massively parallel computer to be applied to the study of protein folding.
Major areas of investigation included:
• Using this novel platform to meet scientific goals
• Making parallel machines more usable
• Achieving performance targets at reasonable cost through a novel machine architecture
[Figure: Linpack Top 500 supercomputers]
Four Blue Gene projects: BlueGene/L, BlueGene/C, BlueGene/P and BlueGene/Q.
BlueGene/L is the first computer in the Blue Gene series, designed to deliver the most performance per kilowatt of power consumed. The 16-rack system, with each rack holding 1024 compute nodes, achieved a LINPACK performance of 70.72 *TFLOPS.
Theoretical peak performance of the full 65,536-node system: about 360 TFLOPS.
*TFLOPS: a teraflop is a measure of processing speed equal to one trillion (10^12) floating-point operations per second.
Trading the speed of processors for lower power consumption.
Two processors per node, with two working modes: coprocessor mode, where one processor handles computation and the other handles communication, and virtual node mode, where both processors run application code, each using half of the node's memory.
System-on-a-chip design: all node components are embedded on one chip, except for the 512 MB of external DRAM.
Can be scaled up to 65,536 compute or I/O nodes, with 131,072 processors
Each node is a single ASIC with associated DRAM memory chips
Each ASIC has two 700 MHz IBM PowerPC 440 processors
PowerPC processors: low-frequency, low-power embedded processors, superior to today's high-frequency, high-power microprocessors in performance per watt by a factor of 2 or more
Double-pipeline, double-precision Floating Point Unit
A cache subsystem with built-in DRAM controller
Node CPUs are not cache coherent with one another
FPUs and CPUs are designed for low power consumption
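The peak figure can be checked with a back-of-the-envelope calculation from the node counts and clock speed above. This sketch assumes each PowerPC core completes 4 floating-point operations per cycle (two fused multiply-adds through the double FPU); that per-cycle figure is an assumption, not something stated on these slides.

```python
# Back-of-the-envelope peak performance for Blue Gene/L.
# ASSUMPTION: 4 floating-point operations per core per cycle
# (the double FPU issuing two fused multiply-adds each cycle).
FLOPS_PER_CYCLE = 4
CLOCK_HZ = 700e6          # 700 MHz PowerPC cores
CORES_PER_NODE = 2        # two processors per ASIC
NODES_PER_RACK = 1024


def peak_tflops(nodes: int) -> float:
    """Theoretical peak in TFLOPS for a given number of compute nodes."""
    return nodes * CORES_PER_NODE * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12


full_system = peak_tflops(65_536)                 # fully scaled machine
sixteen_racks = peak_tflops(16 * NODES_PER_RACK)  # the 16-rack system
linpack_efficiency = 70.72 / sixteen_racks        # measured LINPACK / peak

print(f"65,536 nodes: {full_system:.1f} TFLOPS peak")
print(f"16 racks:     {sixteen_racks:.2f} TFLOPS peak")
print(f"LINPACK efficiency of 16 racks: {linpack_efficiency:.1%}")
```

Under this assumption the full machine comes out at about 367 TFLOPS, consistent with the quoted figure of roughly 360 TFLOPS, and the 70.72 TFLOPS LINPACK result corresponds to about 77% of the 16-rack peak of 91.75 TFLOPS.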
[Figure: Blue Gene/L rack of 1024 nodes]
System Overview
Each node is attached to 3 main parallel communication networks
3D torus network - point-to-point communication between compute nodes
Collective network - collective and global communication
Ethernet network - I/O and management (such as access to any node for configuration, booting and diagnostics)
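In a 3D torus each node links to six nearest neighbors, one in each direction along x, y and z, and the links wrap around at the edges. The sketch below computes a node's neighbors and the minimum hop count between two nodes; the 8 × 8 × 16 dimensions are an illustrative assumption chosen to give 1024 nodes, not a configuration stated on these slides.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbors of `node` in a 3D torus of size `dims`.

    Coordinates wrap around, so a node at x = 0 is linked to x = dims[0] - 1.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of link hops between a and b, using the wrap-around links."""
    hops = 0
    for axis in range(3):
        d = abs(a[axis] - b[axis])
        hops += min(d, dims[axis] - d)  # go the short way around each ring
    return hops


DIMS = (8, 8, 16)  # illustrative only: 8 x 8 x 16 = 1024 nodes
print(torus_neighbors((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (7, 4, 15), DIMS))
```

Because of the wrap-around links, the node at (7, 4, 15) is only 6 hops from (0, 0, 0), rather than the 26 hops a mesh without wrap-around would need.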
System software supports efficient execution of parallel applications
Compiler support for the *DFPU (C, C++, Fortran)
Compute nodes use a minimal operating system called the “BlueGene/L compute node kernel”:
A lightweight, single-user operating system
Supports execution of a single dual-threaded application compute process
The kernel provides a single, static virtual address space to the running process
Because only one process runs, no context switching is required
* DFPU - Double Floating Point Unit
To allow multiple programs to run concurrently, a Blue Gene/L system can be partitioned into electronically isolated sets of nodes.
The number of nodes in a partition must be a positive integer power of 2.
To run a program, a partition is reserved; no other program can use it until the current program has finished.
With so many nodes, component failures are inevitable. The system can electrically isolate faulty hardware so that the machine continues to run.
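The partitioning rules can be illustrated with a simplified allocator. This is a hypothetical sketch, not the Blue Gene/L control system: it treats the machine as a flat list of node IDs (real partitions are 3D blocks), requires power-of-two partition sizes, and skips any aligned block that contains a faulty or already-reserved node. The class and method names are invented for illustration.

```python
from typing import Optional, Set


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...: the partition-size rule."""
    return n > 0 and (n & (n - 1)) == 0


class PartitionAllocator:
    """Hypothetical 1D model of Blue Gene/L partitioning (illustration only)."""

    def __init__(self, total_nodes: int, faulty: Set[int]):
        self.total_nodes = total_nodes
        self.faulty = set(faulty)      # nodes isolated after hardware failures
        self.reserved: Set[int] = set()

    def allocate(self, size: int) -> Optional[range]:
        """Reserve an aligned block of `size` nodes, or return None if none is free."""
        if not is_power_of_two(size):
            raise ValueError(f"partition size {size} is not a power of 2")
        for start in range(0, self.total_nodes - size + 1, size):
            block = range(start, start + size)
            # Usable only if no node is faulty or held by another program.
            if not any(n in self.faulty or n in self.reserved for n in block):
                self.reserved.update(block)
                return block
        return None

    def release(self, block: range) -> None:
        """Free the partition once its program has finished."""
        self.reserved.difference_update(block)


alloc = PartitionAllocator(total_nodes=64, faulty={5})
a = alloc.allocate(16)  # nodes 0-15 contain faulty node 5, so 16-31 is used
b = alloc.allocate(32)  # 32-63
c = alloc.allocate(8)   # 0-7 contain node 5; 8-15 is free
print(a, b, c)
```

Aligning blocks on multiples of their size keeps partitions from overlapping, and a single faulty node only takes its own block out of service while the rest of the machine stays usable.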
Parallel programming model:
Message passing is supported through an implementation of MPI
Only a subset of POSIX calls is supported
Green threads are also used to simulate local concurrency
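Green threads are scheduled in user space by the program's runtime rather than by the operating system kernel. A minimal sketch of the idea uses Python generators as cooperatively scheduled tasks on a single OS thread; the round-robin scheduler and task names are illustrative assumptions, not the Blue Gene/L runtime.

```python
from collections import deque
from typing import Deque, Generator, List

Task = Generator[None, None, None]


def worker(name: str, steps: int, log: List[str]) -> Task:
    """A green thread: does one unit of work, then yields control voluntarily."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # cooperative switch point; no OS thread switch is involved


def run_round_robin(tasks: List[Task]) -> None:
    """Run tasks in turn until every generator is exhausted."""
    ready: Deque[Task] = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it from the ready queue
        ready.append(task)


log: List[str] = []
run_round_robin([worker("a", 3, log), worker("b", 2, log)])
print(log)
```

The log interleaves as a0, b0, a1, b1, a2: each task runs until it yields, and the scheduler never preempts it, which is why this style of concurrency fits a kernel that does no context switching.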
BlueGene/C was renamed Cyclops64: a massively parallel supercomputer-on-a-chip with a cellular architecture. A cellular architecture gives the programmer the ability to run large numbers of concurrent threads within a single processor.
Each 64-bit Cyclops64 chip is designed to run at 500 MHz and contain 80 processors.
Each processor has two thread units and a floating-point unit. Five processors share a 32 kB instruction cache.
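The per-chip totals follow directly from these counts. A small sketch, assuming exactly 80 processors per chip and that every group of five processors has its own 32 kB instruction cache:

```python
# Per-chip resource counts for Cyclops64, derived from the figures above.
PROCESSORS_PER_CHIP = 80
THREAD_UNITS_PER_PROCESSOR = 2
PROCESSORS_PER_ICACHE = 5   # five processors share one instruction cache
ICACHE_KB = 32

hardware_threads = PROCESSORS_PER_CHIP * THREAD_UNITS_PER_PROCESSOR        # 160
icaches = PROCESSORS_PER_CHIP // PROCESSORS_PER_ICACHE                    # 16
total_icache_kb = icaches * ICACHE_KB                                      # 512
threads_per_icache = PROCESSORS_PER_ICACHE * THREAD_UNITS_PER_PROCESSOR   # 10
print(hardware_threads, icaches, total_icache_kb, threads_per_icache)
```

So one chip offers 160 hardware thread units, with each instruction cache feeding 10 of them.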
BlueGene/P is architecturally similar to BlueGene/L and was expected to operate at around one petaflop. Launched in 2008. Unlike in BlueGene/L, the cores are cache coherent, so the chip can operate as a 4-way symmetric multiprocessor.
The on-chip memory subsystem consists of small private L2 caches, a central shared 8 MB cache and dual DDR2 memory controllers.
BlueGene/Q is the third and last known supercomputer in the Blue Gene series. It was expected to reach 20 petaflops in 2012 and is an enhancement of the BlueGene/L and BlueGene/P architecture.