Sumit Kumar Bose, Unisys; Scott Brock, Unisys; Ronald Leaton Skeoch, Unisys
LCSE – Unisys Collaboration
Paul Woodward, LCSE
Unisys Briefing, June 7, 2005
Page 1
LCSE – Unisys Collaboration
Paul Woodward, LCSE
Unisys Briefing, June 7, 2005
Page 2
Unisys Donation, March 2003:
Unisys donated a 32-processor ES7000 to the LCSE & one to MSI.
Microsoft donated software: DataCenter 2003 and SQL Server.
Intel donated chips.
Page 3
Unisys was initiating an HPC program.
LCSE could demonstrate power of the ES7000 on scientific problems using the Windows OS.
LCSE could explore possibility of supporting graphics applications on this machine.
Page 4
Performance Study for Computational Fluid Dynamics:
LCSE codes ported to ES7000.
Computational kernel performance measured, with excellent results.
Parallel performance study identified issues that were addressed successfully with Unisys assistance.
Page 5
This is the best performance per CPU that we have obtained anywhere to date.
To achieve this, we did not compromise our code implementation strategy – we can still do completely out-of-core computations on problems of any size.
We worked with Dave Johnson of Unisys to pin our processes down to their CPUs, while we allowed the data read and written to come from and go to any place in the machine.
We are now working to get both 16-CPU partitions computing this efficiently together. Unisys now offers a larger shared-memory configuration that solves this problem, but our approach would still be needed to get multiple such machines to work together.
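The process-pinning idea above can be sketched in code. The ES7000 work used Windows affinity facilities; this is a minimal sketch of the same idea using the Linux analogue, `os.sched_setaffinity` (an assumption of a Linux host, shown only to illustrate the technique):

```python
import os

# Minimal sketch: pin the current process to one CPU so the OS scheduler
# cannot migrate it. The ES7000 work described above used Windows
# facilities; os.sched_setaffinity is the Linux-only analogue.
def pin_to_cpu(cpu_id):
    allowed = os.sched_getaffinity(0)   # CPUs this process may run on
    if cpu_id not in allowed:
        raise ValueError(f"CPU {cpu_id} not in allowed set {allowed}")
    os.sched_setaffinity(0, {cpu_id})   # restrict to the single CPU
    return os.sched_getaffinity(0)      # now just {cpu_id}
```

Pinning the compute processes while letting I/O buffers live anywhere in the machine is exactly the split described above: compute locality is fixed, data placement stays flexible.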
Page 6
Bottom Line:
LCSE performance figures are triple those achieved by the NCSA job mix, or by applications represented at the Natl. Academy “Future of Supercomputing” meeting.
Focus on running small jobs fast exploits the unique SMP advantage.
Also, SHMOD allows out-of-core, billion-cell simulation.
Page 7
ES-7000 doing Billion-Cell Simulations & Many Smaller Ones
Large memory a great advantage.
Many fast attached disks.
Highly reliable system.
We like Windows.
Serves as central hub of the LCSE.
White papers for Unisys & acknowledgements in scientific papers.
Page 8
First large-scale multifluid PPM simulation.
Billion-cell simulation underway.
New interface tracking method implemented for Los Alamos.
Page 9
A parameter study of turbulent shear layer flows was made possible by the ES7000.
Page 10
Plans:
Integrate ES7000 as data analysis and central control engine of newly funded prototype system.
Explore possibility of greater Unisys participation in a proposal next January for a full-up system.
Page 11
New NSF Major Research Instrumentation Project:
$300,000 for 1-year prototyping.
Goal is truly interactive visualization of 2 TB data set on PowerWall at full resolution.
Prototype will handle only 1 panel.
Plan January proposal for 10 panels.
Data replication to avoid contention, SATA disks, Infiniband networking.
Page 12
Motivation:
Move from data presentation to data exploration.
Generate PowerWall movies under interactive user control (just roll the mouse wheel and travel).
Pipeline the data analysis and visualization process, so that it no longer takes days for each step, but is instead immediate.
Page 13
Motivation for the Motivation:
Need to do this because of the data explosion implied by national supercomputing installations.
Largest machines at NSF centers can easily generate 5 to 10 TB/day (60-120 MB/sec) of useful fluid flow simulation data.
LambdaRail 10 Gbit/s connection can bring this directly to LCSE.
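The quoted rates are easy to check with back-of-envelope arithmetic (assuming decimal units, 1 TB = 10¹² B and 1 MB = 10⁶ B):

```python
# Back-of-envelope check of the slide's data rates
# (assuming decimal units: 1 TB = 1e12 B, 1 MB = 1e6 B).
SECONDS_PER_DAY = 86400

def tb_per_day_to_mb_per_s(tb_per_day):
    """Convert a sustained daily output volume to an average rate."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6

low = tb_per_day_to_mb_per_s(5)    # ~58 MB/s
high = tb_per_day_to_mb_per_s(10)  # ~116 MB/s, matching the quoted 60-120 MB/s
link = 10e9 / 8 / 1e6              # 10 Gbit/s LambdaRail link = 1250 MB/s
```

Even the high end of the simulation output rate uses under a tenth of the 10 Gbit/s link's capacity, so bringing the data directly to the LCSE is comfortably feasible.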
Page 14
Two Modes of Interactive Use:
Pre-computed bricks of bytes are replicated on the disks at each node and user travels through this 4-D data volume in virtual vehicle.
Upon button click, raw data snapshot drawn into large shared memory, and user travels through this 3-D volume, looking at any desired variable, in virtual vehicle.
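As an illustration of Mode A's data layout, the sketch below locates one pre-computed brick of bytes in a flat file. The brick size, row-major ordering, and flat-file layout here are illustrative assumptions, not the LCSE's actual format:

```python
# Hypothetical sketch: byte offset of one brick in a flat file of
# pre-computed bricks of bytes. Brick size, ordering, and file layout
# are assumptions for illustration, not the actual LCSE format.
BRICK_EDGE = 128                 # assumed voxels per brick edge
BRICK_BYTES = BRICK_EDGE ** 3    # "bricks of bytes": 1 byte per voxel

def brick_offset(bx, by, bz, t, bricks_per_edge):
    """Offset of brick (bx, by, bz) at time step t, row-major order."""
    bricks_per_step = bricks_per_edge ** 3   # bricks in one snapshot
    index = (t * bricks_per_step
             + bz * bricks_per_edge ** 2
             + by * bricks_per_edge
             + bx)
    return index * BRICK_BYTES
```

With data replicated at every node, each renderer can seek directly to the bricks its view frustum needs, with no cross-node contention.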
Page 15
Mode A Requirement (full-up system):
2 TB replicated at each node.
Local disk system streams data into graphics card at 400 MB/sec.
80 graphics engines each render at 400 MVoxels/sec to 10 PWall panels.
Peak rendering rate of 32 GVoxel/s produces 2 frames/sec of 8.6 Gvoxel/frame on 10-panel PWall.
(Prototype system does only 1 panel).
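The aggregate figures above follow directly; a quick arithmetic sketch:

```python
# Arithmetic behind the Mode A figures above.
engines = 80
per_engine = 400e6                  # 400 MVoxels/sec per graphics engine
aggregate = engines * per_engine    # peak of 32 GVoxels/sec
frame_voxels = 2048 ** 3            # one full-resolution frame ~ 8.6 Gvoxels
```

At the quoted 2 frames/sec of 8.6 Gvoxels each, sustained throughput is about 17 GVoxels/sec, roughly half the 32 GVoxels/sec peak rendering rate.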
Page 16
Mode B Requirement (full-up system):
SMP memory holds raw data snapshot of 6×2×2048³ B = 96 GB
SMP memory holds 32-bit single variable array of 4×2048³ B = 32 GB
SMP processes data at 80 Gflop/s.
IB4X streams 400 MB/sec to each node simultaneously from SMP.
80 graphics engines render at 400 MVoxels/sec to 10 PWall panels.
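The Mode B memory sizes check out in binary GB (GiB), assuming the 6×2 factor means 6 variables at 2 bytes per cell:

```python
# Checking the Mode B memory sizes above (GB here means GiB = 2**30 B).
# Assumption: 6 x 2 x 2048^3 is 6 variables at 2 bytes per cell.
cells = 2048 ** 3
snapshot = 6 * 2 * cells      # raw data snapshot
single_var = 4 * cells        # one 32-bit (4-byte) variable
GIB = 2 ** 30
# snapshot / GIB == 96.0, single_var / GIB == 32.0
```

Both sizes divide evenly because 2048³ = 2³³, so the factors of 12 and 4 give exactly 96 GiB and 32 GiB.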
Page 17
New NSF Major Research Instrumentation Project:
ES7000 will be the data analysis end of the data processing pipeline.
Goal is interactive visualization directly from raw data, rather than from pregenerated voxel bricks.
Much more I/O intensive.
Large memory shared among many CPUs allows rapid voxel generation.
Page 18
System now under construction in the LCSE.
Dell PC nodes can act as intelligent storage servers and also as image generation engines.
Dell 670n: dual 3.6 GHz Xeon EM64T, 8 GB DDR2 SDRAM.
Page 19
Key issues for Unisys:
SATA disks directly attached, or just through IB from PC servers?
PCI Express x16 for high-end graphics engines, or put these into PC nodes & use PCI-e for IB__X?
Infiniband network integrating with other machines and their storage.
Bigger shared memory & more CPUs than present 16.
Page 20
Potential Unisys Role:
Memory network 10X faster than cluster network, so can work from raw data, which is 10X larger.
Then can see any quantity on demand.
Entirely new capability for interactive data exploration.
Unisys SMP would need to drive 80 rendering engines & 960 disks, either directly or in PC nodes on IB switch.
Page 21
Opportunity (keeping options open for multiple possible suppliers):
NSF encourages us to go back for ~1 M$ after proof of concept.
Schedule gives Unisys time to integrate any essential new technologies – PCI Express x16, Infiniband, SATA.
We can be a testbed, working with Unisys.
Major opportunity on horizon.
Page 22
Prototyping Effort Now:
Proposed Linux cluster to NSF; have 12 Dell nodes, each with:
Dual P4 Xeon @ 3.6 GHz
8 GB memory
nVidia Quadro 4400 graphics card
12 Seagate 400 GB SATA disks
3Ware 12-channel SATA controller
Infiniband 4X (Topspin) HCA
10 IB4X links to Unisys ES7000
Page 23
Near-Term Goals for ES7000:
Infiniband drivers.
Get all 32 CPUs cooperating over IB.
Improve performance of A3D data analysis application.
Integrate with Linux cluster: we are fine with Windows; government sponsors insist on Linux.
Pipeline data from A3D on ES7000 to HVR on PCs for raw data rendering.
Page 24
Middle-Term Goals for ES7000:
3Ware SATA controller drivers?
Attach SATA drives directly?
Measure I/O performance.
Experiment, in preparation for January NSF proposal, with IB on more recent Unisys model?
Potential to drive Nvidia graphics?
Experiment with resource sharing and on-demand (preemptive) visualization.
Page 25
We have not settled on the scaled-up architecture:
Scale up of present system is possible.
IB cluster of Dell nodes is possible.
SMP cluster of Unisys nodes is possible.
Time is short, so options other than the first two are handicapped.
Other vendors unlikely.
Page 26
Things that now seem definite:
SATA disks (Seagate partnership under negotiation; buying 200 now).
Programmable graphics engine(s) on PCI Express X16 (nVidia, perhaps with SLI, or perhaps even IBM Cell).
Infiniband 4X (12X in full-up system?).
Linux (our sponsors are determined).
Intel CPUs.
Page 27
Our Guess at a Best Fit Role for Unisys:
Scale up present system with IB__X.
Dell PC nodes act as storage servers.
Dell PC nodes host programmable graphics engines that cooperatively render images to PowerWall display.
Unisys SMP provides large shared memory and 80 Gflop/s processing power to enable interactive visualization from raw data.