My general research area: System-level support for parallel and distributed computing
•User: run my_prog
•System: load my_prog on nodes, run it, collect results
Efficiency: make the program run as fast as possible; use system resources (CPU, RAM, disk) efficiently
User-level abstractions that make the system easy to use
[Diagram: my_prog distributed across cluster nodes CPU 1 … CPU n, each with its own memory and disk]
Nswap: Network RAM for cluster computers
Cluster computer: multiple inexpensive, “independent” machines connected by a network, running system software that makes them look and act like a single parallel computer
[Diagram: cluster nodes connected by a network, unified by cluster system software]
General Purpose Cluster
Multiple parallel programs share the cluster
•Assigned to some, possibly overlapping, machines
•Share network, memory, CPU, and disk resources
Program workload changes over time
•New programs may enter the system
•Existing programs may complete and leave
==> Imbalances in RAM and CPU usage across nodes: some nodes don't have enough RAM, some have unused RAM
[Diagram: programs P1, P2, P3 assigned to overlapping sets of cluster nodes]
What happens when a node doesn't have enough RAM?
When there is more data than fits into memory, the OS moves data to/from disk as needed (swapping)
Time to access data:
•CPU: ~0.000000005 secs
•RAM: ~100 x slower than CPU
•Disk: ~10 million x slower
-> Swapping is really, really slow
[Diagram: a single node with CPU, memory, and disk]
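The latency gap above can be made concrete with a little arithmetic. This is a rough model using only the slide's ratios (CPU access ~5 ns, RAM ~100x slower, disk ~10,000,000x slower); the page count is an illustrative choice, not a measurement:

```python
# Rough latency model from the slide's numbers. Illustrative only.
CPU_ACCESS = 5e-9                       # seconds, ~0.000000005 s
RAM_ACCESS = CPU_ACCESS * 100           # ~100x slower than CPU
DISK_ACCESS = CPU_ACCESS * 10_000_000   # ~10 million x slower

# Touching 1,000 pages' worth of data from RAM vs. from disk (swapping):
pages = 1_000
print(f"RAM:  {pages * RAM_ACCESS:.4f} s")
print(f"Disk: {pages * DISK_ACCESS:.1f} s")
print(f"Disk is {DISK_ACCESS / RAM_ACCESS:,.0f}x slower than RAM")
```

With these ratios, the same 1,000 accesses take a fraction of a millisecond from RAM but nearly a minute from disk, which is why "swapping is really, really slow."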
Network Swapping in a Cluster
Bypass the disk and swap pages of RAM to remote idle memory in the cluster
[Diagram: Node 1's CPU/RAM swapping pages over the network to idle RAM on Node 2 and Node 3, bypassing the local disk]
• Network Swapping: expand Node 1’s memory using idle RAM of other cluster nodes rather than local disk
Why Nswap?
Swapping to Disk: 3561.66 seconds
Nswap: 105.50 seconds (a speed-up of 33.8)
The network is much faster than disk, so swapping data over the network to remote RAM is much faster than swapping to local disk
Nswap Architecture
Divided into two parts that run on each node:
1) Nswap client: device driver for the network swap “device”
•The OS makes swap-in & swap-out requests to it
2) Nswap server: manages part of RAM for caching remotely swapped pages (the Nswap Cache)
[Diagram: on each of Node A and Node B, an Nswap client in OS space and an Nswap server with its Nswap Cache in user space, connected by the Nswap Communication Layer over the network; a swap-out page flows from A's client to B's server]
This summer
Answer questions about policies for growing/shrinking the RAM available to Nswap, and implement solution(s):
•How do we know when idle RAM is available?
•Can we predict when idle RAM will be available for a long enough time to make it useful for Nswap?
•How much idle RAM should we take for Nswap?
•How much RAM should we “give back” to the system when it needs it?
Investigate incorporating Flash memory into the memory hierarchy and using it with Nswap to speed up swap-ins
System-level support for computation involving massive amounts of globally dispersed data (cluster computing on steroids)
•Internet-scale distributed/parallel computing
•Caching, prefetching, programming interface?
How Pages Move Around the Cluster
Swap Out: Node A (not enough RAM) sends a page to Node B (has idle RAM)
Swap In: Node A fetches the page back from Node B
Migrate from node to node with changes in workload: when Node B needs more RAM, it migrates A's page to Node C
[Diagram: pages flowing A → B on swap-out, B → A on swap-in, and A → B → C on migration]
Reliable Network RAM
Automatically restore remotely swapped page data lost in a node crash
How: We need redundancy
•Extra space to store redundant info
•Avoid using the slow disk
•Use idle RAM in the cluster to store redundant data
•Minimize use of idle RAM for redundant data
•Extra computation to compute redundant data
•Minimize extra computation overhead
Soln 1: Mirroring
When a page is swapped out, send it to be stored in the idle RAM of 2 nodes
•If the first node fails, we can fetch a copy from the second node
+ Easy to implement
- Requires twice as much idle RAM space for the same amount of data
- Requires twice as much network bandwidth
  - two page sends across the network vs. one
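The mirroring idea can be sketched in a few lines. This is a toy model, not Nswap's real implementation (which is a Linux kernel driver): the `Server` class and function names here are illustrative stand-ins for remote Nswap servers and their caches.

```python
# Toy sketch of mirrored swap-out: every page is sent to the idle RAM of
# two different servers, so a single node crash loses no data.
class Server:
    def __init__(self, name):
        self.name = name
        self.cache = {}                  # page_id -> page data ("Nswap cache")

    def store(self, page_id, data):
        self.cache[page_id] = data

def mirrored_swap_out(page_id, data, servers):
    """Two page sends across the network: 2x RAM and 2x bandwidth cost."""
    primary, backup = servers[0], servers[1]
    primary.store(page_id, data)
    backup.store(page_id, data)

def swap_in(page_id, servers):
    """Fetch the page from the first live server that still holds it."""
    for s in servers:
        if page_id in s.cache:
            return s.cache[page_id]
    raise KeyError(page_id)

b, c = Server("B"), Server("C")
mirrored_swap_out(42, b"page-data", [b, c])
b.cache.clear()                          # simulate node B crashing
print(swap_in(42, [b, c]))               # still recoverable from node C
```

The double `store` call makes the trade-off visible: recovery is trivial, but every swap-out pays for two sends and two cached copies.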
Soln 2: Centralized Parity
Encode redundant info for a set of pages across different nodes in a single parity page
If we lose data, we can recover it using the parity page and the other data pages in the set
[Diagram: data pages XORed bit-by-bit into a parity page; a lost page is recovered by XORing the parity page with the remaining data pages]
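The recovery step works because XOR is its own inverse. A minimal sketch with toy 4-byte "pages" (real pages are 4 KB; these values are made up for illustration):

```python
# XOR parity in miniature: the parity page is the bytewise XOR of all data
# pages in the set, so any single lost page can be rebuilt from the parity
# page and the surviving pages.
def xor_pages(pages):
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, byte in enumerate(page):
            out[i] ^= byte
    return bytes(out)

pages = [b"\x41\x00\x10\xff", b"\x02\x30\x00\x01", b"\x11\x11\x11\x11"]
parity = xor_pages(pages)

# The node holding pages[1] crashes; recover from parity + survivors:
recovered = xor_pages([parity, pages[0], pages[2]])
assert recovered == pages[1]
```

One parity page protects a whole set of pages, which is why this needs far less idle RAM than mirroring.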
Centralized Parity (cont.)
A single dedicated cluster node is the parity server
•Stores all parity pages
•Implements page recovery on a crash
Parity Logging: regular nodes compute a parity page locally as they swap out pages; only when the parity page is full is it sent to the parity server
•One extra page send to the parity server for every N page swap-outs (vs. 2 sends on every swap-out for mirroring)
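Parity logging amounts to XOR-accumulating locally and flushing once per group. The sketch below is hypothetical (the class name and the `sent` list standing in for network sends to the parity server are illustrative, not Nswap's API):

```python
# Toy sketch of parity logging: each swapped-out page is XORed into a
# local parity accumulator; the parity page is sent to the parity server
# only once every N swap-outs.
PAGE_SIZE = 8
N = 4                                   # pages per parity group

class ParityLogger:
    def __init__(self):
        self.parity = bytearray(PAGE_SIZE)
        self.count = 0
        self.sent = []                  # stands in for sends to the parity server

    def on_swap_out(self, page):
        for i, byte in enumerate(page):
            self.parity[i] ^= byte      # fold the page into the local parity
        self.count += 1
        if self.count == N:             # parity page full: one network send
            self.sent.append(bytes(self.parity))
            self.parity = bytearray(PAGE_SIZE)
            self.count = 0

log = ParityLogger()
for k in range(8):                      # 8 swap-outs ...
    log.on_swap_out(bytes([k]) * PAGE_SIZE)
print(len(log.sent))                    # ... cost only 2 parity sends
```

Eight swap-outs cost two extra sends here, versus eight extra sends under mirroring.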
Soln 3: Decentralized Parity
No dedicated parity server: parity pages are distributed across cluster nodes
[Diagram: sets of data pages and their XOR parity pages spread across different cluster nodes]
Centralized vs. Decentralized
Results
Future Work
Acknowledgements
Sequential Programming
Designed to run on computers with one processor (CPU)
•The CPU knows how to do a small number of simple things (instructions)
A sequential program is an ordered set of instructions the CPU executes to solve a larger problem
(ex) Compute 3^4:
1. Multiply 3 and 3
2. Multiply the result and 3
3. Multiply the result and 3
4. Print out the result
[Diagram: a single node with CPU, memory, and disk]
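The four steps above, written as code:

```python
# The slide's sequential computation of 3^4, one instruction at a time:
result = 3 * 3        # step 1: multiply 3 and 3
result = result * 3   # step 2: multiply the result and 3
result = result * 3   # step 3: multiply the result and 3
print(result)         # step 4: print out the result -> 81
```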
Sequential Algorithm
For each time step do:
  For each grid element X do:
    compute X's new value:
    X = f(old X, neighbor 1, neighbor 2, …)
[Diagram: a grid of elements; element x is updated from its neighbors]
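A minimal sketch of this grid update in code. The slide leaves f abstract; averaging an element with its neighbors is an illustrative choice (a simple diffusion-style update), not the simulation's actual f:

```python
# One time step of the sequential grid algorithm: every element's new
# value is f(old value, neighbors). Here f is a plain average.
def step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]        # write new values separately
    for i in range(rows):
        for j in range(cols):
            neighbors = [grid[x][y]
                         for x, y in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= x < rows and 0 <= y < cols]
            new[i][j] = (grid[i][j] + sum(neighbors)) / (1 + len(neighbors))
    return new

grid = [[0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0],
        [0.0, 0.0, 0.0]]
print(step(grid)[1][1])   # 1.8: the hot center spreads toward its neighbors
```

Note that new values are written into a separate grid so every element is computed from old values, matching the "old X, old neighbors" form on the slide.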
The Earth Simulator
Japan Agency for Marine-Earth Science and Technology
How a Computer Executes a Program
1. The OS loads program code & data from disk into RAM
2. The OS loads the CPU with the first instruction to run
3. The CPU starts executing instructions one at a time
4. The OS may need to move data to/from RAM & disk as the program runs
[Diagram: CPU, memory (RAM), and disk, with steps 1 and 2 marked]
How Fast is this?
CPU speed determines the max number of instructions it can execute
•Upper bound: 1 clock cycle ≈ 1 instruction
•1 GHz clock: ~1 billion instructions/sec
The max is never achieved:
•When the CPU needs to access RAM, it takes ~100 cycles
•If the OS needs to bring in more data from disk (RAM is fixed-size; not all program data can fit), it takes ~1,000,000 cycles
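A back-of-the-envelope calculation shows how quickly memory stalls eat the 1-billion-instructions/sec peak. The 1-in-10 RAM-access ratio below is an illustrative assumption, not a number from the slide:

```python
# 1 GHz CPU: ~1 instruction per cycle at peak, but a RAM access costs
# ~100 cycles. Suppose 1 instruction in 10 must go to RAM:
CYCLES_PER_SEC = 1_000_000_000
avg_cycles = (9 * 1 + 1 * 100) / 10   # 10.9 cycles per instruction, average
print(f"{CYCLES_PER_SEC / avg_cycles:,.0f} instructions/sec")  # far below 1 billion
```

Even a modest miss rate drops throughput by more than 10x; a disk access at ~1,000,000 cycles is catastrophically worse.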
Fast desktop machine (speaker's note: this is the idea, but check these numbers!)
1 GigaHertz processor
•Takes ~0.000000005 seconds to access data
2 GigaBytes of memory
•2^31 bytes
•Takes ~0.000001 seconds to access data
80 GB of disk space
•Takes ~0.01 seconds to access data: 1 million times slower than if the data is on the CPU
Requirements of Simulation
Petabytes of data
•1 petabyte is 2^50 bytes (1,125,899,906,842,624 bytes)
Billions of computations at each time step
We need help:
•A single computer cannot do one time step in real time
•Need a supercomputer
•Lots of processors running the simulation program in parallel
•Lots of memory space
•Lots of disk storage space
Parallel Programming
Divide the data and computation into several pieces and let several processors simultaneously compute their piece
[Diagram: a data array (3.6, 1.2, 2.3, 2.6, …) partitioned among Processor 1 through Processor n]
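The divide-and-combine idea in miniature: each "processor" (here just a loop iteration, not a real parallel worker) sums its own slice of the data, and combining the partial sums gives the full answer. The first four values come from the slide's diagram; the rest are made up to fill out the example:

```python
# Split the data into pieces, let each "processor" sum its own piece,
# then combine the partial results.
data = [3.6, 1.2, 2.3, 2.6, 5.0, 0.4, 1.1, 2.2]
n_procs = 4
chunk = len(data) // n_procs

# Each of the 4 "processors" computes the sum of its own slice:
partials = [sum(data[p * chunk:(p + 1) * chunk]) for p in range(n_procs)]
total = sum(partials)                 # combine: same answer as sum(data)
print(total)
```

With real processors the four partial sums run at the same time, so the work takes roughly 1/4 of the sequential time plus the cost of combining.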
Supercomputers of the 90's
Massively parallel
•1,000's of processors
Custom, state of the art
•Hardware
•Operating system
•Specialized programming languages and programming tools
Fastest Computer*
[Chart: GFlops/sec (log scale, 1 to 1,000,000) of the world's fastest computer, 1990–2005: Cray Y-MP, TMC CM-5, Intel Paragon, ASCI Blue, ASCI White, Earth Simulator, Blue Gene]
A computation that took 1 year in 1980 took 16 minutes in 1995 and 27 seconds in 2000
*(www.top500.org & Jack Dongarra)
Fastest Computer*
[Chart: GFlops/sec (log scale, 1 to 10,000), 1990–2000: Cray Y-MP, TMC CM-2, TMC CM-5, Intel Paragon, ASCI Blue, ASCI White]
*(www.top500.org & Jack Dongarra)
Fastest Computer*
[Chart: GFlops/sec (linear scale, 0 to 300,000), 2000–2005: ASCI White, Earth Simulator, Blue Gene]
*(www.top500.org & Jack Dongarra)
Problems with Supercomputers of the 90's
Expensive
Time to delivery: ~2 years
Out of date soon
Cluster: Supercomputer of the 00's
A massively parallel supercomputer built from a network of unimpressive PCs
•Each node is off-the-shelf hardware running an off-the-shelf OS
[Diagram: PCs connected by a network]
Are Clusters Good?
+ Inexpensive
•Parallel computing for the masses
+ Easy to upgrade
•Individual components can be easily replaced
•Off-the-shelf parts, HW and SW
•Can constantly and cheaply build a faster parallel computer
- Using off-the-shelf parts
•Lag time between the latest advances and availability outside the research lab
•Using parts that are not designed for parallel systems
Currently 7 of the world's 10 fastest computers are clusters
System-level Support for Clusters
Implement the view of a single large parallel machine on top of separate machines
•A single, big, shared memory on top of n small, separate ones
•A single, big, shared disk on top of n small, separate ones
[Diagram: cluster nodes connected by a network]
Nswap: Network Swapping
Implements a view of a single, large, shared memory on top of cluster nodes' individual RAM (physical memory)
•When one cluster node needs more memory space than it has, Nswap enables it to use the idle remote RAM of other cluster node(s) to increase its “memory” space
Traditional Memory Management
The OS moves parts (pages) of running programs in/out of RAM
•RAM: limited-size, expensive, fast storage
•Disk: larger, inexpensive, slow (1,000,000 x slower) storage
•Swap: virtual memory that is really on disk; expands memory using the disk
[Diagram: processor, RAM holding Program 1 and Program 2 pages, and disk holding files and a swap area]
Network Swapping in a Cluster
Swap pages to remote idle memory in the cluster
• Network Swapping: expand memory using the RAM of other cluster nodes
[Diagram: Node 1's processor/RAM swapping pages over the network to idle RAM on Node 2 and Node 3, instead of to its local disk of files and swap]
Nswap Goals
Transparency
•Processes don't need to do anything special to use Nswap
Efficiency and Scalability
•Point-to-point model (rather than a central server)
•Don't require complete state info to make swapping decisions
Adaptability
•Adjusts to local processing needs on each node
•Grow/shrink the portion of a node's RAM used for remote swapping as its memory use changes
Nswap Architecture
Divided into two parts that run on each node:
1) Nswap client: device driver for the network swap “device”
•The OS makes swap-in & swap-out requests to it
2) Nswap server: manages part of RAM for caching remotely swapped pages (the Nswap Cache)
[Diagram: on each of Node A and Node B, an Nswap client in OS space and an Nswap server with its Nswap Cache in user space, connected by the Nswap Communication Layer over the network; a swap-out page flows from A's client to B's server]
How Pages Move Around the Cluster
Swap Out: Node A (client) sends a page to Node B (server)
Swap In: Client A fetches the page back from Server B
Migrate from server to server: when Server B is full, it migrates A's page to Server C
[Diagram: Client A ↔ Server B on swap-out/swap-in, and Server B → Server C on migration]
Complications
Simultaneous conflicting operations
•Ex: migration and swap-in for the same page
Garbage pages in the system
•When a program terminates, we need to remove its remotely swapped pages from the servers
Node failure
•Can lose remotely swapped page data
Currently, our project…
Implemented on a Linux cluster of 8 nodes connected by a switched 100 Mb/sec Ethernet network
•All nodes have a faster disk than network:
•Disk: up to 500 Mb/sec
•Network: up to 100 Mb/sec
-> We expect Nswap to be slower than swapping to disk
Experiments
Workload 1: sequential reads & writes to a large chunk of memory
•Best case for swapping to disk
Workload 2: random reads & writes to memory
•Disk arm seeks within the swap partition
Workload 3: Workload 1 + file I/O
•Disk arm seeks between the swap and file partitions
Workload 4: Workload 2 + file I/O
Workload Execution Times
•Nswap is faster than swapping to the (much faster) disk for workloads 2, 3, and 4
[Bar chart: execution times (0–800 seconds) for WL1–WL4, comparing Disk (500 Mb/sec) vs. Nswap (100 Mb/sec)]
Nswap on Faster Networks
Measured on disk, 10 Mb/s, and 100 Mb/s; calculated speed-up values for 1,000 & 10,000 Mb/s (speed-ups in parentheses are relative to 10 Mb/s)

| Workload | Disk | 10 Mb/s | 100 Mb/s | 1,000 Mb/s | 10,000 Mb/s |
|---|---|---|---|---|---|
| (1) | 12.27 | 306.69 | 56.8 (speedup 5.4) | 28.9 (10.6) | 26.3 (11.6) |
| (2) | 266.79 | 847.74 | 153.5 (5.5) | 77.3 (10.9) | 70.3 (12.1) |
| (4) | 6265.39 | 9605.91 | 1733.9 (5.54) | 866.2 (11.1) | 786.7 (12.2) |
Conclusions
Nswap: scalable, adaptable, transparent network swapping for Linux clusters
Results show Nswap is:
•comparable to swapping to disk on a slow network
•much faster than disk on faster networks
•Based on network vs. disk speed trends, Nswap will be even better in the future
Acknowledgements
Students: Sean Finney '03, Matti Klock '03, Kuzman Ganchev '03, Gabe Rosenkoetter '02, Michael Spiegel '03, Rafael Hinojosa '01
Michener Fellowship for Second Semester Leave Support
More information, results, details: EuroPar'03 paper, CCSCNE'03 poster, http://www.cs.swarthmore.edu/~newhall/