Part Four: The LSC DataGrid

Post on 02-Feb-2016



Part Four: LSC DataGrid

• A: Data Replication

• B: What is the LSC DataGrid?

• C: The LSCDataFind tool

A: Data Replication

General Principle

Not all pipes are created equal. Neither are all storage locations.

Data Requirements

• Catalog 10^8 files and their locations
  • What files are where (possibly at more than one place)
  • Across multiple sites within a Grid
• No single point of failure
  • No central catalog/server

Data Replication Services: Concepts

• Abstract the logical file name (LFN) from the physical file name (PFN)

• Maintain a local replica catalog (LRC) mapping from LFNs to PFNs only for local files.

• Maintain a replica location index (RLI) mapping LFNs to other sites’ LRCs for files that aren’t local.

Replica Location Service

Site A: RLS server rls://serverA:39281, holding file1 and file2
• LRC
  • file1 → gsiftp://serverA/file1
  • file2 → gsiftp://serverA/file2
• RLI
  • file3 → rls://serverB/file3
  • file4 → rls://serverB/file4

Site B: RLS server rls://serverB:39281, holding file3 and file4
• LRC
  • file3 → gsiftp://serverB/file3
  • file4 → gsiftp://serverB/file4
• RLI
  • file1 → rls://serverA/file1
  • file2 → rls://serverA/file2

RLS: Replica Location Service

• Globus RLS

• Each RLS server usually runs two catalogs:
  • LRC: Local Replica Catalog
    • Catalog of the files you have (LFNs) and their mappings to URL(s) or PFNs
  • RLI: Replica Location Index
    • Catalog of the files (LFNs) that other LRCs in your data grid know about

A Site’s LRC

• Each site has an LRC with mappings of LFNs to PFNs
  • Usually contains the “local” mappings
  • Where files are located at the site

• Example: UWM might have this mapping in its LRC:

H-R-792845521-16.gwf → gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf

LRCs Inform Each Other

The LRC at each site tells remote RLIs which LFNs it has mappings for.

• Example: UWM tells Caltech it has a mapping for H-R-792845521-16.gwf

• So the Caltech RLI has the mapping

H-R-792845521-16.gwf → LRC at Milwaukee
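
The exchange between UWM and Caltech can be sketched as a toy in-memory model. This is not the Globus RLS API: the `Site` class and its `publish_to` method are hypothetical names invented for illustration, and a real RLS server exchanges this information over the network. The point it shows is that only LFNs travel to the remote RLI, while the PFNs stay in the local LRC.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for one RLS server (an LRC plus an RLI)."""
    name: str
    # LRC: LFN -> list of PFNs for files held at this site
    lrc: dict[str, list[str]] = field(default_factory=dict)
    # RLI: LFN -> names of the remote sites whose LRC has a mapping
    rli: dict[str, set[str]] = field(default_factory=dict)

    def publish_to(self, remote: Site) -> None:
        """Tell a remote RLI which LFNs this site's LRC has mappings for."""
        for lfn in self.lrc:
            remote.rli.setdefault(lfn, set()).add(self.name)


uwm = Site("UWM")
caltech = Site("Caltech")

lfn = "H-R-792845521-16.gwf"
uwm.lrc[lfn] = ["gsiftp://dataserver.phys.uwm.edu/LIGO/" + lfn]

uwm.publish_to(caltech)
print(caltech.rli[lfn])  # the Caltech RLI now points at the UWM LRC
```

After the call, Caltech knows which site to ask about the file but still holds no PFN for it.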

How it Works (Under the Hood)

Ask your local LRC: “Do you know about file X?”
• If yes, you can ask your local LRC for the corresponding URL (PFN).
• If no,
  • Ask your local RLI: “Who do I ask about X?”
  • It will answer, “The RLS server at Site Y.”
  • Ask the LRC at Site Y, “Do you know about file X?”
  • It will return the PFN.
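
The lookup steps above can be sketched as a short function. This is a minimal illustration, not the Globus RLS client API: the `resolve` function and the nested-dictionary layout are assumptions, and the data reuses the two-site example (site A and site B) from the Replica Location Service slide.

```python
def resolve(lfn, local_site, sites):
    """Return the PFNs for an LFN: local LRC first, then RLI, then remote LRC."""
    local = sites[local_site]
    # Step 1: ask the local LRC, "Do you know about file X?"
    if lfn in local["lrc"]:
        return local["lrc"][lfn]
    # Step 2: ask the local RLI, "Who do I ask about X?"
    remote_site = local["rli"].get(lfn)
    if remote_site is None:
        raise KeyError(f"no known replica of {lfn}")
    # Step 3: ask the LRC at that site for the PFN.
    return sites[remote_site]["lrc"][lfn]


# Hypothetical in-memory stand-in for the two RLS servers.
sites = {
    "A": {
        "lrc": {"file1": ["gsiftp://serverA/file1"],
                "file2": ["gsiftp://serverA/file2"]},
        "rli": {"file3": "B", "file4": "B"},
    },
    "B": {
        "lrc": {"file3": ["gsiftp://serverB/file3"],
                "file4": ["gsiftp://serverB/file4"]},
        "rli": {"file1": "A", "file2": "A"},
    },
}

print(resolve("file1", "A", sites))  # answered by the local LRC
print(resolve("file3", "A", sites))  # answered via the RLI and site B's LRC
```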

SRB: Storage Resource Broker

• http://www.sdsc.edu/srb/

• Distributed data management solution

• Supports management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections

• Provides rich set of APIs available to higher-level applications

• Provides a management layer on top of a wide variety of storage systems.

SRB

• SRB can be thought of as a:
  • Distributed file system
  • Datagrid management system
  • Digital Library system
  • Semantic Web

SRB as Data Grid Management

• Transparent replication

• Archiving, caching, synchs, and backups

• Heterogeneous storage

• Container and aggregated data movement

• Bulk data ingestion

• Third-party copy & move

LDR: Lightweight Data Replicator

• http://www.lsc-group.phys.uwm.edu/LDR

• Replicates datasets within a data grid
  • High-speed data transfers with Globus GridFTP
  • Globus RLS catalogs stored in a MySQL backend
  • Metadata stored in a MySQL backend
  • Uses GSI for security

LDR

• Collections of files to be replicated are defined by the LDR administrator as SQL queries

• Priority queue for scheduling replication
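
A rough sketch of how these two ideas fit together follows. It is not LDR's actual code or schema: an in-memory SQLite database stands in for the MySQL backend, and the `metadata` table, its `data_type` column, the `example-*` file names, and the per-collection priority numbers are all hypothetical. Each collection is a SQL query over the metadata, and the files it selects are pushed onto a priority queue that decides the replication order.

```python
import heapq
import sqlite3

# In-memory SQLite stands in for the MySQL backend; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metadata (lfn TEXT PRIMARY KEY, data_type TEXT)")
db.executemany(
    "INSERT INTO metadata VALUES (?, ?)",
    [
        ("H-R-792845521-16.gwf", "raw"),
        ("example-raw-2.gwf", "raw"),          # made-up file name
        ("example-reduced-1.gwf", "reduced"),  # made-up file name
    ],
)

# Each collection: (priority, name, SQL query). Lower number = replicate sooner.
collections = [
    (2, "reduced", "SELECT lfn FROM metadata WHERE data_type = 'reduced'"),
    (1, "raw", "SELECT lfn FROM metadata WHERE data_type = 'raw'"),
]

# Fill the priority queue with every file selected by every collection.
queue = []
for priority, name, query in collections:
    for (lfn,) in db.execute(query):
        heapq.heappush(queue, (priority, lfn))

# Pop files in priority order; a real replicator would start a transfer here.
order = []
while queue:
    priority, lfn = heapq.heappop(queue)
    order.append(lfn)

print(order)
```

Files from the "raw" collection come out first because that collection was given the smaller priority number; ties are broken by file name.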

B: What is the LSC DataGrid?

What is the LSC DataGrid?

• A collection of LSC computational and storage resources…

• … linked through Grid middleware…

• … into a uniform LSC data analysis environment.

LSC DataGrid Sites

• Tier 1: Caltech
• Tier 2: UWM and PSU
• Tier 3: UT-Brownsville and Salish Kootenai College (SKC)
• Linux clusters at GEO sites Birmingham, Cardiff and the Albert Einstein Institute (AEI)
• LDAS instances at Caltech, MIT, PSU, and UWM

Monitoring the LSC DataGrid

http://watchtower.phys.uwm.edu/ganglia-webfrontend/

Lab 4: LSCDataFind


• In this lab, you’ll:
  • Verify your DataFind configuration
  • Find observatories
  • Find data types
  • Find actual data (wow!)
  • Refine a search
  • Retrieve data you’ve found

Credits

• NSF disclaimer

• Portions of this presentation were adapted from the following sources:
  • GriPhyN Grid Summer Workshop
  • NEESgrid Sysadmin Workshop