Plank
![Page 1: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/1.jpg)
The Storage Fabric of the Grid: The Network Storage Stack
James S. Plank
Director: Logistical Computing and Internetworking (LoCI) Laboratory
Department of Computer Science
University of Tennessee
Cluster and Computational Grids for Scientific Computing: September 12, 2002, Le Chateau de Faverges de la Tour, France
![Page 2: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/2.jpg)
Grid Research & The Fabric Layer
[Figure: layers labeled Application, Middleware, and Resources (the “Fabric” Layer)]
![Page 3: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/3.jpg)
What is the Fabric Layer?
• Networking: TCP/IP
• Storage: Files in a file system
• Computation: Processes managed by an OS
Most Grid research accepts these as givens. (Examples: MPI, GridFTP)
![Page 5: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/5.jpg)
LoCI’s Research Agenda
Redefine the fabric layer based on End-to-End Principles
Three layered stacks, each listed bottom to top:
• Communication: Data / Link / Physical, Network, Transport, Application
• Storage: Access / Physical, IBP Depot, exNode, LoRS, Application
• Computation: Access / Physical, IBP NFU, exProc, LoRS, Application
![Page 6: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/6.jpg)
What Should This Get You?
• Scalability
• Flexibility
• Fault-tolerance
• Composability
I.e., better Grids
![Page 7: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/7.jpg)
LoCI Lab Personnel
Directors: Jim Plank, Micah Beck
Exec Director: Terry Moore
Grad Students: Erika Fuentes, Sharmila Kancherla, Xiang Li, Linzhen Xuan
Research Staff: Scott Atchley, Alexander Bassi, Ying Ding, Hunter Hagewood, Jeremy Millar, Stephen Soltesz, Yong Zheng
Undergrad Students: Isaac Charles, Rebecca Collins, Kent Galbraith, Dustin Parr
![Page 8: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/8.jpg)
Collaborators
• Jack Dongarra (UT - NetSolve, Linear Algebra)
• Rich Wolski (UCSB - Network Weather Service)
• Fran Berman (UCSD/NPACI - Scheduling)
• Henri Casanova (UCSD/NPACI - Scheduling)
• Laurent Lefèvre (INRIA/ENS - Multicast, Active Networking)
![Page 9: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/9.jpg)
The Network Storage Stack
Stack layers, top to bottom:
Applications
Logistical File System
Logistical Tools
L-Bone | exNode
IBP
Local Access
Physical
• A Fundamental Organizing Principle
• Like the IP Stack
• Each level encapsulates details from the lower levels, while still exposing details to higher levels
![Page 11: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/11.jpg)
The Network Storage Stack
The L-Bone: Resource discovery & proximity queries
IBP (Internet Backplane Protocol): Allocating and managing network storage
The exNode: A data structure for aggregation
LoRS: The Logistical Runtime System: Aggregation tools and methodologies
![Page 12: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/12.jpg)
IBP: The Internet Backplane Protocol
Low-level primitives and software for:
• Managing and using state in the network.
• Inserting storage in the network so that:
– Applications may use it advantageously.
– Storage owners do not lose control of their resources.
– The whole system is truly scalable and fault-tolerant.
![Page 13: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/13.jpg)
The Byte Array: IBP’s Unit of Storage
• You can think of it as a “buffer”.
• You can think of it as a “file”.
• Append-only semantics.
• Transience built in.
![Page 14: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/14.jpg)
The IBP Client API
• Can be used by anyone* who can talk to the server.
• Seven procedure calls in three categories:
– Allocation (1)
– Data transfer (4)
– Management (2)
• * not really, but close...
![Page 15: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/15.jpg)
Client API: Allocation
• IBP_allocate(char *host, int maxsize, IBP_attributes attr)
• Like a network malloc()
• Returns a trio of capabilities.
– Read / Write / Manage
– ASCII Strings (obfuscated)
• No user-defined file names:
– Big flat name space.
– No registration required to pass capabilities.
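The allocation model above can be illustrated with a small in-memory simulation. This is only a sketch in Python, not the real IBP client library: `ToyDepot`, its methods, and the `toy-ibp://` capability format are all hypothetical names invented here. It shows the two ideas from the slide: an allocation yields three separate capabilities, and those capabilities are unguessable strings rather than user-chosen file names.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ByteArray:
    """One allocation on a depot: a bounded buffer (toy model)."""
    max_size: int
    data: bytearray = field(default_factory=bytearray)


class ToyDepot:
    """In-memory stand-in for an IBP depot (hypothetical, not the real server)."""

    def __init__(self, host: str):
        self.host = host
        self._by_cap = {}  # capability string -> (ByteArray, right it grants)

    def allocate(self, max_size: int) -> dict:
        """Reserve a byte array and return read/write/manage capabilities."""
        array = ByteArray(max_size)
        caps = {}
        for right in ("read", "write", "manage"):
            # Random token: there is no user-defined name to register or guess.
            cap = f"toy-ibp://{self.host}/{secrets.token_hex(16)}/{right}"
            self._by_cap[cap] = (array, right)
            caps[right] = cap
        return caps

    def resolve(self, cap: str, needed_right: str) -> ByteArray:
        """Look up a capability; reject it if it grants a different right."""
        array, right = self._by_cap[cap]
        if right != needed_right:
            raise PermissionError(f"capability grants {right!r}, not {needed_right!r}")
        return array


depot = ToyDepot("depot.example.org")
caps = depot.allocate(max_size=1024)
```

Because each capability is just a string, handing `caps["read"]` to another party is enough to share read access; nothing needs to be registered anywhere.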
![Page 16: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/16.jpg)
Allocation Attributes
• Time-Limited or Permanent
• Soft or Hard
• Read/Write semantics:
– Byte Array
– Pipe
– Circular Queue
![Page 17: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/17.jpg)
Client API: Data Transfer
2-party:
• IBP_store(write-cap, bytes, size, ...)
• IBP_deliver(read-cap, pointer, size, ...)
3-party:
• IBP_copy(read-cap, write-cap, size, ...)
N-party/other things:
• IBP_mcopy(...)
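The difference between the 2-party, 3-party, and N-party calls can be sketched with a toy append-only buffer. This is a Python simulation under stated assumptions: the function names mirror the slide, but the signatures (plain objects instead of capabilities, explicit offsets) are simplifications made up for this example.

```python
class ByteArray:
    """Append-only buffer with a fixed maximum size (toy model)."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._data = bytearray()

    def append(self, payload: bytes) -> int:
        if len(self._data) + len(payload) > self.max_size:
            raise ValueError("allocation is full")
        self._data += payload  # writes only ever extend the array
        return len(payload)

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self._data[offset:offset + size])


def store(dest: ByteArray, payload: bytes) -> int:
    """2-party: the client pushes bytes to one depot allocation."""
    return dest.append(payload)


def deliver(src: ByteArray, offset: int, size: int) -> bytes:
    """2-party: the client pulls bytes from one depot allocation."""
    return src.read(offset, size)


def copy(src: ByteArray, dest: ByteArray, offset: int, size: int) -> int:
    """3-party: models a depot-to-depot transfer requested by a client."""
    return dest.append(src.read(offset, size))


def mcopy(src: ByteArray, dests: list, offset: int, size: int) -> list:
    """N-party: one source allocation copied to many destinations."""
    return [copy(src, dest, offset, size) for dest in dests]


a = ByteArray(16)
store(a, b"hello ")
store(a, b"grid")          # second store appends after the first
b, c = ByteArray(16), ByteArray(16)
copy(a, b, 0, 5)
mcopy(a, [b, c], 6, 4)
```

In this toy everything lives in one process; the point of the 3-party call in a real deployment is that the data moves between depots without passing through the requesting client.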
![Page 18: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/18.jpg)
IBP Client API: Management
• IBP_manage()/IBP_status()
• Allows for resizing byte arrays.
• Allows for extending/shortening the time limit on time-limited allocations.
• Manages reference counts on the read/write capabilities.
• State probing.
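The management operations listed above can be modeled as commands applied to an allocation record. The following Python sketch is illustrative only: the command names (`resize`, `set_duration`, `incr_read`, and so on) and the returned dictionary are assumptions for this example, not the actual IBP_manage() command set. Time is passed in explicitly so the behavior is deterministic.

```python
class Allocation:
    """Toy time-limited allocation; times are plain seconds supplied by the caller."""

    def __init__(self, max_size: int, duration: int, now: int):
        self.max_size = max_size
        self.expires_at = now + duration
        self.refs = {"read": 1, "write": 1}

    def expired(self, now: int) -> bool:
        return now >= self.expires_at


def manage(alloc: Allocation, command: str, value=None, *, now: int) -> dict:
    """Apply one management command and return the allocation's current state."""
    if alloc.expired(now):
        raise LookupError("allocation has expired")
    if command == "resize":
        alloc.max_size = value
    elif command == "set_duration":
        alloc.expires_at = now + value          # extend or shorten the time limit
    elif command in ("incr_read", "incr_write"):
        alloc.refs[command[5:]] += 1            # "incr_read" -> "read"
    elif command in ("decr_read", "decr_write"):
        alloc.refs[command[5:]] -= 1
    elif command != "probe":                    # "probe" only reports state
        raise ValueError(f"unknown command: {command}")
    return {"max_size": alloc.max_size, "expires_at": alloc.expires_at, **alloc.refs}


alloc = Allocation(max_size=100, duration=60, now=0)
state = manage(alloc, "set_duration", 120, now=30)   # new limit: 30 + 120
manage(alloc, "resize", 200, now=40)
probe = manage(alloc, "incr_read", now=40)
```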
![Page 19: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/19.jpg)
IBP Servers
• Daemons that serve local disk or memory.
• Root access not required.
• Can specify sliding time limits or revokability.
• Encourages resource sharing.
![Page 20: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/20.jpg)
Typical IBP usage scenario
![Page 21: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/21.jpg)
Logistical Networking Strategies
[Figure: four strategies (#1 to #4) for moving data from a Sender to a Receiver across the network, using one or more intermediate IBP depots]
![Page 22: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/22.jpg)
XSufferage on MCell/APST
• University of Tennessee, Knoxville: NetSolve + IBP
• University of California, San Diego: GRAM + GASS
• Tokyo Institute of Technology: NetSolve + NFS, NetSolve + IBP
• APST Daemon, APST Client
[(NetSolve+IBP) + (GRAM+GASS) + (NetSolve+NFS)] + NWS
![Page 23: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/23.jpg)
MCell/APST Experimental Results
Experimental Setting:
MCell simulation with 1,200 tasks:
• composed of 6 Monte-Carlo Simulations
• input files: 1, 20, 100 MB
4 scenarios: Initially
(a) all input files are only in Japan
(b) 100MB files staged in California
(c) in addition, one 100MB file staged in Tennessee
(d) all input files replicated everywhere
[Plot legend: workqueue, XSufferage]
Scheduling heuristics match data and tasks in appropriate locations
- Automatic staging with IBP effective
- Improved overall performance
![Page 24: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/24.jpg)
The Network Storage Stack
The L-Bone: Resource Discovery & Proximity queries
IBP: Allocating and managing network storage (like a network malloc)
The exNode: A data structure for aggregation
LoRS: The Logistical Runtime System: Aggregation tools and methodologies
![Page 25: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/25.jpg)
The Logistical Backbone (L-Bone)
• LDAP-based storage resource discovery.
• Query by capacity, network proximity, geographical proximity, stability, etc.
• Periodic monitoring of depots.
• Uses the Network Weather Service (NWS) for live measurements and forecasting.
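A query of the kind described above (filter by capacity and stability, rank by proximity) might look like the following Python sketch. It does not use LDAP or NWS; the `DepotRecord` fields, the host names, and every number are made up for illustration, and forecast bandwidth stands in for "network proximity".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepotRecord:
    host: str
    free_mb: int
    bandwidth_mbps: float   # forecast bandwidth to the client (hypothetical values)
    uptime: float           # fraction of monitoring probes answered


def find_depots(records, min_free_mb, count, min_uptime=0.0):
    """Return up to `count` depots with enough space, best forecast bandwidth first."""
    eligible = [
        r for r in records
        if r.free_mb >= min_free_mb and r.uptime >= min_uptime
    ]
    eligible.sort(key=lambda r: r.bandwidth_mbps, reverse=True)
    return eligible[:count]


records = [
    DepotRecord("depot-a.example.org", free_mb=500, bandwidth_mbps=80.0, uptime=0.99),
    DepotRecord("depot-b.example.org", free_mb=50, bandwidth_mbps=95.0, uptime=0.99),
    DepotRecord("depot-c.example.org", free_mb=900, bandwidth_mbps=12.0, uptime=0.60),
    DepotRecord("depot-d.example.org", free_mb=300, bandwidth_mbps=40.0, uptime=0.97),
]
chosen = find_depots(records, min_free_mb=100, count=2, min_uptime=0.9)
```

Here depot-b is fastest but too small, and depot-c has space but fails the stability threshold, so the query returns depot-a and depot-d.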
![Page 26: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/26.jpg)
Snapshot: August, 2002
Approximately 1.6 TB of publicly accessible storage (Scaling to a petabyte someday…)
![Page 28: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/28.jpg)
The exNode
• The Network “File” Pointer.
• Analogous to the Unix inode.
• Map byte-extents to IBP buffers (or other allocations).
• XML-based data structure/serialization.
• Allows for replication, flexible decomposition of data.
• Also allows for “end-to-end services.”
• Arbitrary metadata.
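The idea of mapping byte extents to allocations, with XML serialization, can be sketched as follows. This is a Python illustration only: the element and attribute names (`exnode`, `mapping`, `offset`, `length`, `read`) are invented for this example and are not the actual exNode schema, and the capability strings are placeholders.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    offset: int      # first byte of the file covered by this allocation
    length: int      # number of bytes covered
    read_cap: str    # placeholder for an IBP read capability


def covering(mappings, offset, length):
    """Mappings whose extent fully contains [offset, offset + length)."""
    end = offset + length
    return [m for m in mappings if m.offset <= offset and end <= m.offset + m.length]


def to_xml(name, mappings):
    root = ET.Element("exnode", name=name)
    for m in mappings:
        ET.SubElement(root, "mapping", offset=str(m.offset),
                      length=str(m.length), read=m.read_cap)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    root = ET.fromstring(text)
    return [Mapping(int(e.get("offset")), int(e.get("length")), e.get("read"))
            for e in root.findall("mapping")]


# A 300-byte file: one whole copy, plus a second copy split into two halves.
mappings = [
    Mapping(0, 300, "cap-whole"),
    Mapping(0, 150, "cap-first-half"),
    Mapping(150, 150, "cap-second-half"),
]
restored = from_xml(to_xml("example-file", mappings))
```

Overlapping mappings are what make replication possible: any byte range covered by more than one mapping can be fetched from more than one allocation.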
![Page 29: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/29.jpg)
The exNode (XML-based)
[Figure: three example exNodes (A, B, C) mapping byte offsets 0, 100, 200, 300 onto allocations at IBP depots across the network]
![Page 31: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/31.jpg)
Logistical Runtime System
• Aggregation for:
– Capacity
– Performance (striping)
– More performance (caching)
– Reliability (replication)
– More reliability (ECC)
– Logistical purposes (routing)
![Page 32: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/32.jpg)
Logistical Runtime System
• Basic Primitives:
– Upload: Create a network file from local data
– Download: Get bytes from a network file.
– Augment: Add more replicas to a network file.
– Trim: Remove replicas from a network file.
– Stat: Get information about the network file.
– Refresh: Alter the time limits of the IBP buffers.
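Three of these primitives (upload, download, trim) can be sketched with dictionaries standing in for depots. This Python simulation is an assumption-laden toy, not the LoRS tools: the round-robin placement, the tuple layout of the exNode, and the `down` set used to simulate failed depots are all choices made for this example.

```python
def upload(data, depots, block_size, copies):
    """Stripe blocks round-robin over depots, keeping `copies` replicas of each.

    Assumes copies <= len(depots) so that replicas land on distinct depots.
    Returns a toy exNode: a list of (offset, length, [depot names]).
    """
    names = list(depots)
    exnode = []
    for i, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        replicas = [names[(i + c) % len(names)] for c in range(copies)]
        for name in replicas:
            depots[name][offset] = block
        exnode.append((offset, len(block), replicas))
    return exnode


def download(exnode, depots, down=frozenset()):
    """Rebuild the file, falling back to another replica when a depot is down."""
    out = bytearray()
    for offset, length, replicas in exnode:
        for name in replicas:
            if name not in down:
                out += depots[name][offset]
                break
        else:
            raise OSError(f"no live replica for bytes {offset}..{offset + length - 1}")
    return bytes(out)


def trim(exnode, down):
    """Drop replicas that live on depots known to be dead."""
    return [(o, n, [name for name in reps if name not in down])
            for o, n, reps in exnode]


data = bytes(range(10))
depots = {"utk": {}, "ucsb": {}, "ucsd": {}}
exnode = upload(data, depots, block_size=4, copies=2)
```

With two replicas per block, losing any single depot still leaves every block reachable; losing two depots that hold both copies of a block makes the download fail.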
![Page 33: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/33.jpg)
Upload
![Page 34: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/34.jpg)
Augment to Tennessee
![Page 35: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/35.jpg)
Augment to Santa Barbara
![Page 36: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/36.jpg)
Stat (ls)
![Page 37: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/37.jpg)
Failures do happen.
![Page 38: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/38.jpg)
Download
![Page 39: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/39.jpg)
Trimming (dead capability removal)
![Page 40: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/40.jpg)
End-To-End Services:
• MD5 Checksums stored per exNode block to detect corruption.
• Encryption is a per-block option.
• Compression is a per-block option.
• Parity/Coding is in the design.
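Per-block checksums and optional compression can be sketched with Python's standard `hashlib` and `zlib` modules. The `pack`/`unpack` helpers and the metadata dictionary are hypothetical; in particular, computing the MD5 over the stored (possibly compressed) bytes is an assumption of this example, not something the slide specifies.

```python
import hashlib
import zlib


def pack(block, compress=False):
    """Prepare one block for storage and return (payload, metadata)."""
    payload = zlib.compress(block) if compress else block
    meta = {
        "compressed": compress,
        # Checksum of the bytes as stored, kept alongside the block's mapping.
        "md5": hashlib.md5(payload).hexdigest(),
    }
    return payload, meta


def unpack(payload, meta):
    """Verify the stored bytes end to end, then undo any compression."""
    if hashlib.md5(payload).hexdigest() != meta["md5"]:
        raise ValueError("checksum mismatch: block is corrupt")
    return zlib.decompress(payload) if meta["compressed"] else payload


block = b"logistical networking " * 20
payload, meta = pack(block, compress=True)
corrupted = bytes([payload[0] ^ 0xFF]) + payload[1:]   # flip bits in the first byte
```

The check happens at the client, after the bytes have crossed the network and left the depot, which is what makes it an end-to-end service rather than a depot feature.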
![Page 41: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/41.jpg)
Parity / Coding
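[Figure: exNode with coding. Data blocks and coding blocks sit in IBP buffers across the network; one coding block is shown as the sum of three data blocks, the other as a weighted sum with coefficients 1, 2, and 3]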
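The simplest case of the coding shown on this slide, a single coding block that is the plain sum of the data blocks, can be sketched with bytewise XOR as the sum. This Python example is a generic parity illustration, not the LoCI implementation; the weighted second coding block would need finite-field arithmetic and is not attempted here. Blocks are assumed to have equal length.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(blocks, parity):
    """Rebuild the single missing block (marked None) from the survivors and parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [b for b in blocks if b is not None]
    repaired = list(blocks)
    # XOR of the parity with all surviving blocks cancels them out,
    # leaving the missing block.
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


a, b, c = b"grid", b"ibp!", b"lors"
parity = xor_blocks([a, b, c])
repaired = recover([a, None, c], parity)
```

Compared with full replication, one parity block protects three data blocks against a single loss at a third of the storage cost of a second copy.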
![Page 42: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/42.jpg)
Scalability
• No bottlenecks
• Really hard problems left unsolved, but for the most part, the lower levels shouldn’t need changing.
– Naming
– Good scheduling
– Consistency / File System semantics
– Computation
![Page 43: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/43.jpg)
Status
• IBP/L-Bone/exNode/Tools all supported.
• Apps: Mail, IBP-ster, Video IBP-ster, IBPvo -- demo at SC-02
• Other institutions (see L-Bone)
![Page 44: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/44.jpg)
What’s Coming Up?
• More nodes on the L-Bone
• More collaboration with applications groups
• Research on performance and scheduling
• Logistical File System
• A Computation Stack
• Code / Information at loci.cs.utk.edu
![Page 46: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/46.jpg)
Replication: Experiment #1
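[Figure: map of depot sites: UCSB, UCSD, TAMU, UTK, UNC, Harvard, Turin (IT), Stuttgart (DE)]
[Figure: a 3 MB file (offsets 0 to 3 MB) stored as 21 allocations, labeled: UTK 2, UTK 5, UTK 6, UTK 3, UTK 4, UTK 1, UCSB 1, UCSB 2, UCSB 3, UCSD 1, UCSD 3, Harvard, UNC, UTK 5, UTK 2, UTK 5, UTK 6, UTK 3, UCSB 1, UCSB 2, UCSB 3]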
![Page 47: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/47.jpg)
Replication: Experiment #1
[Figure: three bar charts of Fragment Availability (%) per depot site, on a 0 to 100 scale. Values as labeled on the bars:]
• Depot Availability at UTK (860 download attempts, 100% success): UTK 99.85, UCSD 99.71, UCSB 95.31, Harvard 59.77, UNC 99.88
• Depot Availability at UCSD (857 download attempts, 100% success): UTK 96.27, UCSD 98.60, UCSB 88.60, Harvard 57.29, UNC 97.20
• Depot Availability at Harvard (751 download attempts, 100% success): UTK 99.87, UCSD 99.80, UCSB 90.47, Harvard 57.45, UNC 99.87
![Page 48: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/48.jpg)
Most Frequent Download Path
[Figure panels: From UTK, From Harvard, From UCSD]
![Page 49: Plank](https://reader036.fdocuments.us/reader036/viewer/2022081502/555d82b5d8b42a3a3b8b4d39/html5/thumbnails/49.jpg)
Replication: Experiment #2
• Deleted 12 of the 21 IBP allocations
• Downloaded from UTK
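[Figure: the 3 MB file (offsets 0 to 3 MB) and its 21 allocation labels]
Labels shown with a percentage (9 of 21):
• UTK 5: 100%
• UTK 4: 99.84%
• UCSB 1: 93.88%
• UCSD 3: 100%
• Harvard: 48.24%
• UTK 5: 99.78%
• UTK 5: 100%
• UTK 6: 100%
• UCSB 2: 94.69%
Labels shown without a percentage (12 of 21): UTK 2, UTK 6, UTK 3, UTK 1, UCSB 2, UCSB 3, UCSD 1, UNC, UTK 2, UTK 3, UCSB 1, UCSB 3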
1,225 Attempts
93.88% Success
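As a rough way to reason about numbers like these, the chance that a download succeeds can be estimated from per-replica availabilities. The Python sketch below assumes replica failures are independent, which real depots sharing a site or network link will not satisfy, and the availabilities in the example are hypothetical rather than taken from the experiment. The last line only converts the reported rate into an implied count of successful attempts.

```python
import math


def download_success_probability(extents):
    """Probability that every byte range has at least one live replica.

    `extents` is a list of lists; each inner list holds the availabilities
    (between 0 and 1) of the replicas covering one byte range.
    Assumes replicas fail independently.
    """
    probability = 1.0
    for replicas in extents:
        all_down = math.prod(1.0 - a for a in replicas)
        probability *= 1.0 - all_down
    return probability


# Hypothetical file: one range with two replicas, one range with a single copy.
example = download_success_probability([[0.90, 0.50], [0.99]])

# 93.88% of 1,225 attempts corresponds to about this many successes.
implied_successes = round(1225 * 0.9388)
```

The single-copy range dominates the result: once a byte range is down to one replica, the whole download can be no more available than that one depot.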