Part Four: The LSC DataGrid

Page 1: Part Four: The LSC DataGrid
Page 2: Part Four: The LSC DataGrid

Part Four: The LSC DataGrid

Page 3: Part Four: The LSC DataGrid

Part Four: LSC DataGrid

• A: Data Replication

• B: What is the LSC DataGrid?

• C: The LSCDataFind tool

Page 4: Part Four: The LSC DataGrid

A: Data Replication

Page 5: Part Four: The LSC DataGrid

General Principle

Not all pipes are created equal. Neither are all storage locations.

Page 6: Part Four: The LSC DataGrid

Data Requirements

• Catalog 10^8 files and their locations
  • What files are where (possibly at more than one place)
  • Across multiple sites within a Grid
• No single point of failure
  • No central catalog/server

Page 7: Part Four: The LSC DataGrid

Data Replication Services: Concepts

• Abstract logical file name (LFN) from physical filename (PFN)

• Maintain a local replica catalog (LRC) mapping from LFNs to PFNs only for local files.

• Maintain a replica location index (RLI) mapping LFNs to other sites’ LRCs for files that aren’t local.
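The three concepts above can be sketched as plain data structures. This is only an illustration, not the Globus RLS API: the `Site` class and its field names are hypothetical, and the file names and URLs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One site's catalogs (illustrative only, not the Globus RLS API)."""
    name: str
    # LRC: logical file name (LFN) -> physical file names (PFNs) held locally
    lrc: dict = field(default_factory=dict)
    # RLI: LFN -> names of other sites whose LRC has a mapping for it
    rli: dict = field(default_factory=dict)


site_a = Site("A")
# A local file gets an LFN -> PFN entry in the LRC.
site_a.lrc["file1"] = ["gsiftp://serverA/file1"]
# A file held elsewhere gets only a pointer to the remote site's LRC.
site_a.rli["file3"] = {"B"}
```

Note that the RLI never stores a PFN. It only records which remote LRC to ask.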

Page 8: Part Four: The LSC DataGrid

Replica Location Service

Site A: RLS server at rls://serverA:39281, holding file1 and file2

• LRC
  • file1 → gsiftp://serverA/file1
  • file2 → gsiftp://serverA/file2
• RLI
  • file3 → rls://serverB/file3
  • file4 → rls://serverB/file4

Site B: RLS server at rls://serverB:39281, holding file3 and file4

• LRC
  • file3 → gsiftp://serverB/file3
  • file4 → gsiftp://serverB/file4
• RLI
  • file1 → rls://serverA/file1
  • file2 → rls://serverA/file2

Page 9: Part Four: The LSC DataGrid

RLS: Replica Location Service

• Globus RLS

• Each RLS server usually runs two catalogs:
  • LRC: Local Replica Catalog
    • Catalog of the files you have (LFNs) and their mappings to URLs (PFNs)
  • RLI: Replica Location Index
    • Catalog of the files (LFNs) that other LRCs in your data grid know about

Page 10: Part Four: The LSC DataGrid

A Site’s LRC

• Each site has an LRC with mappings of LFNs to PFNs
  • Usually contains the “local” mappings
  • Records where files are located at the site

• Example: UWM might have this mapping in its LRC:

H-R-792845521-16.gwf → gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf

Page 11: Part Four: The LSC DataGrid

LRCs Inform Each Other

LRC catalog at each site tells remote RLIs what LFNs it has mappings for.

• Example: UWM tells Caltech it has a mapping for H-R-792845521-16.gwf

• So Caltech's RLI has the mapping H-R-792845521-16.gwf → LRC at Milwaukee
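A minimal sketch of this update step, assuming each catalog is an in-memory dictionary. The `publish` function is hypothetical and stands in for whatever update protocol the RLS servers actually use.

```python
def publish(site_name, lrc, remote_rlis):
    """Tell each remote RLI which LFNs this site's LRC has mappings for."""
    for rli in remote_rlis:
        for lfn in lrc:
            # Only the LFN and the owning site are sent, never the PFN.
            rli.setdefault(lfn, set()).add(site_name)


uwm_lrc = {
    "H-R-792845521-16.gwf": [
        "gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf"
    ],
}
caltech_rli = {}

publish("UWM", uwm_lrc, [caltech_rli])
```

After the call, `caltech_rli` points `H-R-792845521-16.gwf` at UWM, while the URL itself stays in UWM's LRC.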

Page 12: Part Four: The LSC DataGrid

How it Works (Under the Hood)

Ask your local LRC: “Do you know about file X?”

• If yes, you can ask your local LRC for the corresponding URL (PFN).
• If no:
  • Ask your local RLI: “Who do I ask about X?”
  • It will answer, “The RLS server at Site Y.”
  • Ask the LRC at Site Y: “Do you know about file X?”
  • It will return the PFN.
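The lookup steps above can be sketched as follows, using the two-site layout from the Replica Location Service slide. The `SITES` dictionary and the `find_pfns` function are hypothetical; a real client would query RLS servers over the network instead of reading dictionaries.

```python
# Illustrative catalogs for two sites, A and B.
SITES = {
    "A": {
        "lrc": {"file1": ["gsiftp://serverA/file1"],
                "file2": ["gsiftp://serverA/file2"]},
        "rli": {"file3": ["B"], "file4": ["B"]},
    },
    "B": {
        "lrc": {"file3": ["gsiftp://serverB/file3"],
                "file4": ["gsiftp://serverB/file4"]},
        "rli": {"file1": ["A"], "file2": ["A"]},
    },
}


def find_pfns(lfn, local_site, sites=SITES):
    """Resolve an LFN to PFNs: local LRC first, then the LRCs named by the RLI."""
    here = sites[local_site]
    # Step 1: ask the local LRC.
    if lfn in here["lrc"]:
        return here["lrc"][lfn]
    # Step 2: ask the local RLI which sites to ask, then ask their LRCs.
    for remote in here["rli"].get(lfn, []):
        pfns = sites[remote]["lrc"].get(lfn)
        if pfns:
            return pfns
    # Nobody in the grid has a mapping for this LFN.
    return []
```

For example, `find_pfns("file3", "A")` misses in A's LRC, follows A's RLI to site B, and returns the URL held in B's LRC.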

Page 13: Part Four: The LSC DataGrid

SRB: Storage Resource Broker

• http://www.sdsc.edu/srb/

• Distributed data management solution

• Supports management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections

• Provides rich set of APIs available to higher-level applications

• Provides a management layer on top of a wide variety of storage systems.

Page 14: Part Four: The LSC DataGrid

SRB

• SRB can be thought of as a:
  • Distributed file system
  • Datagrid management system
  • Digital Library system
  • Semantic Web

Page 15: Part Four: The LSC DataGrid

SRB as Data Grid Management

• Transparent replication

• Archiving, caching, syncs, and backups

• Heterogeneous storage

• Container and aggregated data movement

• Bulk data ingestion

• Third-party copy & move

Page 16: Part Four: The LSC DataGrid

LDR: Lightweight Data Replicator

• http://www.lsc-group.phys.uwm.edu/LDR

• Replicates datasets within a data grid
  • High-speed data transfers with Globus GridFTP
  • Globus RLS catalogs stored in a MySQL backend
  • Metadata stored in a MySQL backend
  • Uses GSI for security

Page 17: Part Four: The LSC DataGrid

LDR

• Collections of files to be replicated are defined by the LDR administrator as an SQL query

• Priority queue for scheduling replication
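A rough sketch of these two ideas together, not LDR's actual code. It uses Python's built-in `sqlite3` in place of MySQL so that it runs standalone. The table layout, the column names, the second and third file names, and the "newest data first" priority rule are all assumptions made for illustration.

```python
import heapq
import sqlite3

# Hypothetical metadata table; LDR's real schema is not shown in the slides.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE metadata "
    "(lfn TEXT, site TEXT, frame_type TEXT, gps_start INTEGER)"
)
db.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("H-R-792845521-16.gwf", "H", "R", 792845521),
        ("H-R-792845537-16.gwf", "H", "R", 792845537),
        ("L-R-792845521-16.gwf", "L", "R", 792845521),
    ],
)

# The collection to replicate is just an SQL query written by the administrator.
COLLECTION = (
    "SELECT lfn, gps_start FROM metadata "
    "WHERE site = 'H' AND frame_type = 'R'"
)

# Priority queue of pending transfers. heapq is a min-heap, so the GPS start
# time is negated to pop the newest file first (an assumed policy).
queue = []
for lfn, gps_start in db.execute(COLLECTION):
    heapq.heappush(queue, (-gps_start, lfn))

order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
```

Here `order` lists the two matching `H` files with the later GPS start time first; the `L` file is not part of the collection and is never queued.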

Page 18: Part Four: The LSC DataGrid

B: What is the LSC DataGrid?

Page 19: Part Four: The LSC DataGrid

What is the LSC DataGrid?

• A collection of LSC computational and storage resources…

• … linked through Grid middleware…

• … into a uniform LSC data analysis environment.

Page 20: Part Four: The LSC DataGrid

LSC DataGrid Sites

• Tier 1: Caltech
• Tier 2: UWM and PSU
• Tier 3: UT-Brownsville and Salish Kootenai College (SKC)
• Linux clusters at GEO sites Birmingham, Cardiff and the Albert Einstein Institute (AEI)
• LDAS instances at Caltech, MIT, PSU, and UWM

Page 21: Part Four: The LSC DataGrid

Monitoring the LSC DataGrid

http://watchtower.phys.uwm.edu/ganglia-webfrontend/

Page 22: Part Four: The LSC DataGrid

Lab 4: LSCDataFind

Page 23: Part Four: The LSC DataGrid

Lab 4: LSCDataFind

• In this lab, you’ll:
  • Verify your DataFind configuration
  • Find observatories
  • Find data types
  • Find actual data (wow!)
  • Refine a search
  • Retrieve data you’ve found

Page 24: Part Four: The LSC DataGrid

Credits

• NSF disclaimer

• Portions of this presentation were adapted from the following sources:
  • GriPhyN Grid Summer Workshop
  • NEESgrid Sysadmin Workshop