A Data Diffusion Approach to Large Scale Scientific Exploration

Ioan Raicu
Distributed Systems Laboratory, Computer Science Department
University of Chicago
Joint work with:
Yong Zhao: Microsoft
Ian Foster: Univ. of Chicago (CS & CI), Argonne National Laboratory (MCS)
Alex Szalay: The Johns Hopkins Univ., Dept. of Physics and Astronomy
Funded by:
NSF: TeraGrid
DOE: Advanced Scientific Computing Research
NASA: Ames Research Center
2007 Microsoft eScience Workshop at RENCI, October 21st, 2007
10/22/2007 A Data Diffusion Approach to Large Scale Scientific Exploration 2
Data Diffusion Example: AstroPortal Stacking Service
• Purpose
  – On-demand “stacks” of random locations within ~10TB dataset
• Challenge
  – Rapid access to 10-10K “random” files
  – Time-varying load
• Solution
  – Dynamic acquisition of compute, storage
[Figure: image cutouts from the Sloan (SDSS) dataset are summed (“stacked”) into a single result, delivered via a Web page or Web Service]
Falkon: a Fast and Light-weight tasK executiON framework
• Goal: enable the rapid and efficient execution of many independent jobs on large compute clusters
• Combines three techniques:
  – multi-level scheduling, to treat resource provisioning separately from task dispatch
  – a streamlined task dispatcher that achieves order-of-magnitude higher task dispatch rates than conventional schedulers
  – data diffusion with a data-aware scheduler, to leverage co-located computational and storage resources
Execution Model
• Dispatch Policy
  – first-available, max-compute-util, max-cache-hit*
• Caching Policy
  – random, FIFO, LRU, LFU
• Replay Policy
• Resource Acquisition Policy
  – one-at-a-time, additive, exponential, all-at-once, optimal*
• Resource Release Policy
  – distributed, centralized*
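To make the caching policies concrete, here is a minimal sketch of the LRU option: the executor evicts the least-recently-used file when its local cache fills up. This is an illustrative implementation, not code from Falkon itself; the class and method names are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Track cached file names; evict the least-recently-used on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()  # insertion order doubles as recency order

    def access(self, name):
        """Return True on a cache hit and refresh the file's recency."""
        if name in self.files:
            self.files.move_to_end(name)
            return True
        return False

    def insert(self, name):
        """Cache a newly fetched file, evicting the LRU entry if needed."""
        if name in self.files:
            self.files.move_to_end(name)
            return
        if len(self.files) >= self.capacity:
            self.files.popitem(last=False)  # drop least recently used
        self.files[name] = True
```

Swapping `popitem(last=False)` for a random key, a FIFO queue, or an access-count minimum would yield the random, FIFO, and LFU variants.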
Falkon 2-Tier Architecture
• Tier 1: Dispatcher
  – GT4 Web Service accepting task submissions from clients and sending them to available executors
• Tier 2: Executor
  – Runs tasks on local resources
• Provisioner
  – Static and dynamic resource provisioning

[Figure: clients submit over Web Services to the Dispatcher, which dispatches to Executors 1..n running on Compute Resources 1..m; the Provisioner allocates the compute resources]
Dispatch Throughput

[Figure: throughput (tasks/sec, 0-500) vs. number of executors (0-256) for WS Calls (no security), Falkon (no security), and Falkon (GSISecureConversation)]
Execution Efficiency

[Figure: efficiency (0.90-1.00) vs. number of executors (0-256) for sleep tasks of 1, 2, 4, 8, 16, 32, and 64 seconds]
Resource Provisioning
0. provisioner registration
1. task(s) submit
2. resource allocation to GRAM
3. resource allocation to LRM
4. executor registration
5. notification for work
6. pick up task(s)
7. deliver task(s) results
8. notification for task result
9. pick up task(s) results
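The acquisition policies named in the execution model (one-at-a-time, additive, exponential, all-at-once) govern how many nodes the provisioner requests at steps 2-3. A plausible sketch of that decision, under my reading of the policy names (the function and its exact growth rules are assumptions, not Falkon code):

```python
def nodes_to_request(policy, needed, last_request, max_nodes):
    """How many nodes to ask GRAM/the LRM for in one provisioning round.

    needed: queued tasks not yet covered by allocated executors.
    last_request: size of the previous allocation request (0 initially).
    """
    if needed <= 0:
        return 0
    if policy == "one-at-a-time":
        req = 1                          # one node per round
    elif policy == "additive":
        req = last_request + 1           # grow by a fixed increment
    elif policy == "exponential":
        req = max(1, last_request * 2)   # double each round
    elif policy == "all-at-once":
        req = needed                     # request everything outstanding
    else:
        raise ValueError(policy)
    return min(req, needed, max_nodes)
```

The trade-off: aggressive policies (all-at-once) minimize queue time but may hold idle nodes; conservative ones (one-at-a-time) improve utilization at the cost of more allocation rounds.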
Dynamic Resource Provisioning

                        GRAM+PBS  Falkon-15  Falkon-60  Falkon-120  Falkon-180  Falkon-∞  Ideal (32 nodes)
Queue Time (sec)           611.1       87.3       83.9        74.7        44.4      43.5              42.2
Execution Time (sec)        56.5       17.9       17.9        17.9        17.9      17.9              17.8
Execution Time %            8.5%      17.0%      17.6%       19.3%       28.7%     29.2%             29.7%

                        GRAM+PBS  Falkon-15  Falkon-60  Falkon-120  Falkon-180  Falkon-∞  Ideal (32 nodes)
Time to complete (sec)      4904       1754       1680        1507        1484      1276              1260
Resource Utilization         30%        89%        75%         65%         59%       44%              100%
Execution Efficiency         26%        72%        75%         84%         85%       99%              100%
Resource Allocations        1000         11          9           7           6         0                 0

• End-to-end execution time: 1260 sec in the ideal case; 4904 sec (GRAM+PBS) down to 1276 sec (Falkon-∞)
• Average task queue time: 42.2 sec in the ideal case; 611 sec (GRAM+PBS) down to 43.5 sec (Falkon-∞)
• Trade-off: resource utilization for execution efficiency
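The per-task "Execution Time %" row appears consistent with execution time divided by total per-task time (queue + execution); that formula is my inference from the numbers, not stated on the slide. A one-line check:

```python
def execution_time_fraction(exec_time, queue_time):
    """Fraction of a task's lifetime spent computing rather than queued."""
    return exec_time / (exec_time + queue_time)

# Ideal case: 17.8 / (17.8 + 42.2) ~ 29.7%, matching the table.
```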
Data Diffusion
• Data caching at executor local disk
  – RANDOM, FIFO, LRU, LFU
• Avoids the need for a shared file system
• Theoretical linear scalability with compute resources
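The core of data diffusion at an executor can be sketched in a few lines: on a cache hit, serve the file from local disk; on a miss, pull it from the shared file system (e.g. GPFS) and cache it, so data gradually "diffuses" to where tasks run. A self-contained, hypothetical sketch (the function name and the simple set-based cache are mine):

```python
import os
import shutil

def diffuse_read(name, cache_dir, shared_dir, cached):
    """Return a local path for `name`, fetching from the shared FS on a miss.

    cached: set of file names already present in cache_dir.
    """
    local = os.path.join(cache_dir, name)
    if name not in cached:
        # cache miss: copy the file from the shared file system once
        shutil.copy(os.path.join(shared_dir, name), local)
        cached.add(name)
    return local
```

After the first access, repeated reads never touch the shared file system, which is why aggregate throughput can scale linearly with the number of nodes.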
[Figure: throughput (Mb/s, 0-1100) vs. file size (1 B - 1 GB) for local read, local read+write, GPFS read, and GPFS read+write]
Shared File System Performance

[Figure: GPFS read throughput (Mb/s, 0-4000) vs. file size (1 B - 1 GB) for 1, 2, 4, 8, 16, 32, and 64 nodes]

[Figure: GPFS read+write throughput (Mb/s, 0-4000) vs. file size (1 B - 1 GB) for 1, 2, 4, 8, 16, 32, and 64 nodes]
Local Disk Theoretical Performance

[Figure: aggregate local-disk read throughput (Mb/s, 0-70000) vs. file size (1 B - 1 GB) for 1, 2, 4, 8, 16, 32, and 64 nodes]

[Figure: aggregate local-disk read+write throughput (Mb/s, 0-70000) vs. file size (1 B - 1 GB) for 1, 2, 4, 8, 16, 32, and 64 nodes]
Data Diffusion: Micro-Benchmarks
1. Model (local disk): local disk performance model
2. Model (shared file system): shared file system (GPFS) performance model
3. Falkon (next-available policy): load balancing across available executors, operates on the shared file system
4. Falkon (next-available policy) + Wrapper: same as (3), but tasks execute through a wrapper that creates a temp scratch directory on the shared file system, makes symbolic links, executes the task, and removes the temp scratch directory and symbolic links
5. Falkon (first-available policy – 0% locality): load balancing across available executors with data caching; 0% data locality, with data read from the shared file system
6. Falkon (first-available policy – 100% locality): same as (5), but with the workload from (5) repeated four times; the caches are first populated (not as part of the timed experiment)
7. Falkon (max-compute-util policy – 0% locality): same as (5), but uses the data-aware scheduler to send tasks to the executors that have the most data cached; the workload has 0% data locality
8. Falkon (max-compute-util policy – 100% locality): same as (7), but with the workload repeated four times
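The data-aware dispatch in configurations (7)-(8) can be illustrated with a small sketch: score each executor by how many of a task's input files it already caches, and either insist on the best-placed executor (max-cache-hit) or fall back to any idle one (max-compute-util). This is a hypothetical simplification of Falkon's scheduler, with names of my choosing:

```python
def pick_executor(task_files, executors, policy="max-cache-hit"):
    """Choose an executor for a task.

    executors: dict of executor id -> (set of cached file names, busy flag).
    """
    def hits(eid):
        cached, _busy = executors[eid]
        return len(set(task_files) & cached)

    if policy == "max-compute-util":
        # keep CPUs busy: prefer idle executors, then break ties on cache hits
        idle = [e for e, (_, busy) in executors.items() if not busy]
        candidates = idle or list(executors)
    else:  # max-cache-hit: maximize locality even if it means waiting
        candidates = list(executors)
    return max(candidates, key=hits)
```

The two policies embody the trade-off measured later: max-cache-hit maximizes the cache hit rate, while max-compute-util trades some locality for utilization.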
Micro-Benchmarks Results: 1-64 Nodes

[Figure: read throughput (Mb/s, 0-70000) vs. number of nodes (1-64) for configurations 1-8]

[Figure: read+write throughput (Mb/s, 0-70000) vs. number of nodes (1-64) for configurations 1-8]
Micro-Benchmarks Results: 1 Node

[Figure: read throughput (Mb/s, 0-1400) vs. file size (1 B - 1 GB) for configurations 1-8, on 1 node]

[Figure: read+write throughput (Mb/s, 0-1400) vs. file size (1 B - 1 GB) for configurations 1-8, on 1 node]
Micro-Benchmarks Results: 8 Nodes

[Figure: read throughput (Mb/s, 0-10000) vs. file size (1 B - 1 GB) for configurations 1-8, on 8 nodes]

[Figure: read+write throughput (Mb/s, 0-10000) vs. file size (1 B - 1 GB) for configurations 1-8, on 8 nodes]
Micro-Benchmarks Results: 64 Nodes

[Figure: read throughput (Mb/s, 0-70000) vs. file size (1 B - 1 GB) for configurations 1-8, on 64 nodes]

[Figure: read+write throughput (Mb/s, 0-70000) vs. file size (1 B - 1 GB) for configurations 1-8, on 64 nodes]
Task Dispatch Throughput: 64 Nodes, 1 Byte files on GPFS

[Figure: dispatch throughput (tasks/sec) for no data access and configurations 3-8, in that order.
Read: 487, 485, 21, 378, 381, 379, 459.
Read+Write: 487, 438, 20, 375, 373, 377, 457.]
Dispatch Policies

[Figure: cache hit rate (local disk) vs. number of nodes (1, 2, 4, 8, 16, 32, 64).
first-available: 100%, 100%, 63%, 31%, 16%, 8%, 4%.
max-compute-util: 100%, 100%, 99%, 99%, 99%, 97%, 97%.
max-cache-hit: 100% at all scales.]
Applications via Swift

[Figure: Swift architecture — SwiftScript is compiled into an abstract computation; the execution engine (Karajan with the Swift runtime) performs scheduling, status reporting, and provenance collection via the Virtual Data Catalog; provisioning is handled by the Falkon resource provisioner, with execution on virtual nodes or Amazon EC2]
Application                                       #Tasks/workflow   #Stages
ATLAS: High Energy Physics Event Simulation       500K              1
fMRI DBIC: AIRSN Image Processing                 100s              12
FOAM: Ocean/Atmosphere Model                      2000              3
GADU: Genomics                                    40K               4
HNL: fMRI Aphasia Study                           500               4
NVO/NASA: Photorealistic Montage/Morphology       1000s             16
QuarkNet/I2U2: Physics Science Education          10s               3 ~ 6
RadCAD: Radiology Classifier Training             1000s             5
SIDGrid: EEG Wavelet Processing, Gaze Analysis    100s              20
SDSS: Coadd, Cluster Search                       40K, 500K         2, 8
SDSS: Stacking, AstroPortal                       10Ks ~ 100Ks      2 ~ 4
MolDyn: Molecular Dynamics                        1Ks ~ 20Ks        8
Applications
• Medicine:
  – fMRI (Falkon vs. GRAM)
• Astronomy:
  – Montage (Falkon vs. GRAM vs. MPI)
  – Stacking (Falkon with Data Diffusion vs. Falkon using the shared file system)
Applications: fMRI

[Figure: time (sec) vs. input data size (120, 240, 360, 480 volumes).
GRAM: 1239, 2510, 3683, 4808.
GRAM/Clustering: 456, 866, 992, 1123.
Falkon: 120, 327, 546, 678.]

• GRAM vs. Falkon: 85%~90% lower run time
• GRAM/Clustering vs. Falkon: 40%~74% lower run time
Applications: Montage

[Figure: time (sec, 0-3500) per component (mProject, mDiff/Fit, mBackground, mAdd(sub), mAdd, total) for GRAM/Clustering, MPI, and Falkon]

• GRAM/Clustering vs. Falkon: 57% lower application run time
• MPI* vs. Falkon: 4% higher application run time
• * MPI should be the lower bound
Applications: Stacking
• SDSS DR5 Dataset
  – 320M+ objects
  – 1.5M+ files
  – 3.3 TB compressed / 9 TB uncompressed

[Figure: object count per file across the ~1.5M files of the SDSS DR5 dataset]
Stacking Workloads
• Uniform sample from 320M objects
• Zipf distribution of object popularity
  – alpha = 0.1, 1.0, 2.0

Stacking Size   Object            Working Set      Working Set        Locality
(# of objects)  Distribution      Size (# files)   Size (# objects)   (accesses per file)
10000           ZIPF - alpha 0.1  1915             1924               5.22
10000           ZIPF - alpha 1.0  2755             2819               3.63
10000           ZIPF - alpha 2.0  6110             6259               1.64
10000           UNIFORM           9771             10000              1.02
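A workload of this shape is easy to generate: under Zipf popularity the weight of file i is proportional to 1/i^alpha, so small alpha approaches uniform access while large alpha concentrates accesses on a few hot files, raising the accesses-per-file locality. A sketch (function names are mine, not from the AstroPortal code):

```python
import random

def zipf_workload(num_files, num_accesses, alpha, seed=0):
    """Draw file accesses with Zipf(alpha) popularity."""
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) ** alpha for i in range(num_files)]
    return rng.choices(range(num_files), weights=weights, k=num_accesses)

def locality(accesses):
    """Accesses per distinct file, as in the workload table."""
    return len(accesses) / len(set(accesses))
```

Higher alpha yields a smaller working set and higher locality, which is exactly why the cache-friendly workloads (alpha 0.1, i.e. small working set) benefit most from data diffusion.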
Stacking Performance Evaluation

[Figure: time (sec, 0-450) by original dataset location (Web Server, Shared File System (WAN), Shared File System (LAN), Data Diffusion) for ZIPF alpha 0.1, 1.0, 2.0 and UNIFORM workloads]
Object Size Effect

[Figure: time (sec, 0-350) vs. object size (10x10, 100x100, 500x500) for Shared File System (LAN) and Data Diffusion]
Data Diffusion Effect

[Figure: time (sec, 0-140) for 1st and 2nd runs — Data Diffusion and Shared File System (LAN), each under UNIFORM and ZIPF 0.1, 1.0, 2.0 workloads]
10K*2 Stacking Workload

[Figure: throughput (cutouts/sec, 0-1000) over time (0-120 sec) — instantaneous and average throughput for 10K objects, UNIFORM, 1st and 2nd runs]
100K Stacking Workload

[Figure: throughput (cutouts/sec, 0-250) over time (0-700 sec) — instantaneous and average throughput for 100K objects, UNIFORM]
Future Work
• Explore data pre-fetching policies
• Update Swift to use data diffusion
• 3-Tier Architecture (essential on BG/P)
• Extend the provisioner to support the Virtual Workspace Service (opens the door to EC2)
• Globus Incubator Project
• Alternate languages and technologies
More Information
• Web: http://people.cs.uchicago.edu/~iraicu/research/Falkon/
• Related Projects:
  – Swift: http://www.ci.uchicago.edu/swift/index.php
  – AstroPortal: http://people.cs.uchicago.edu/~iraicu/research/AstroPortal/index.htm
• Data Diffusion Collaborators:
  – Ioan Raicu, Computer Science Dept., The University of Chicago
  – Yong Zhao, Microsoft
  – Ian Foster, Math and Computer Science Div., Argonne National Laboratory & Computer Science Dept., The University of Chicago
  – Alex Szalay, Department of Physics and Astronomy, The Johns Hopkins University
• Other Falkon Collaborators:
  – Catalin Dumitrescu, Computer Science Dept., The University of Chicago
  – Mike Wilde, Computation Institute, University of Chicago & Argonne National Laboratory
• Funding:
  – NASA: Ames Research Center, Graduate Student Research Program (GSRP)
  – DOE: Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Dept. of Energy
  – NSF: TeraGrid
More Information: Related Documents
• Ioan Raicu, Yong Zhao, Catalin Dumitrescu, Ian Foster, Mike Wilde. “Falkon: a Fast and Light-weight tasK executiON framework”, to appear at IEEE/ACM SuperComputing 2007.
• Ioan Raicu, Catalin Dumitrescu, Ian Foster. “Dynamic Resource Provisioning in Grid Environments”, to appear at TeraGrid Conference 2007.
• Yong Zhao, Mihael Hategan, Ben Clifford, Ian Foster, Gregor von Laszewski, Ioan Raicu, Tiberiu Stef-Praun, Mike Wilde. “Swift: Fast, Reliable, Loosely Coupled Parallel Computation”, to appear at IEEE Workshop on Scientific Workflows 2007.
• Yong Zhao, Mihael Hategan, Ioan Raicu, Mike Wilde, Ian Foster. “Swift: a Parallel Programming Tool for Large Scale Scientific Computations”, under review at Scientific Programming Journal, Special Issue on Dynamic Computational Workflows: Discovery, Optimization, and Scheduling.
• Ioan Raicu, Ian Foster, Alex Szalay. “Harnessing Grid Resources to Enable the Dynamic Analysis of Large Astronomy Datasets”, poster presentation, IEEE/ACM SuperComputing 2006.
• Ioan Raicu, Ian Foster, Alex Szalay, Gabriela Turcu. “AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis”, TeraGrid Conference 2006, June 2006.
• Alex Szalay, Julian Bunn, Jim Gray, Ian Foster, Ioan Raicu. “The Importance of Data Locality in Distributed Computing Applications”, NSF Workflow Workshop 2006.