Data Grid Interactions with Firewalls Michael Wan Reagan Moore {mwan,moore}@sdsc

Post on 15-Jan-2016


Data Grid Interactions with Firewalls

Michael Wan, Reagan Moore

{mwan,moore}@sdsc.edu

http://www.npaci.edu/DICE/SRB/

SDSC/UCSD/NPACI

A Quick Overview of SRB Data Grid

• Federated server system
  – Single client signOn
  – Access to all resources in the federation
  – Data grid owns all files

• Context management
  – MCAT server: metadata catalog
  – Uses a traditional DBMS

• Four logical name spaces
  – Logical resource name (operations on sets of resources)
  – Distinguished user name space
  – Logical file name space
  – Metadata attribute name space (state information)

Federated Servers and Resources

[Figure: Federated Data Grids. Data Grid 1, Data Grid 2, and Data Grid 3, with metadata catalogs MCAT1, MCAT2, and MCAT3 and servers Server1.1, Server1.2, Server2.1, Server2.2, and Server3.1.]

Types of Data Loss Risks

• Media corruption

• Vendor systemic failure

• Operational error

• Malicious user

• Natural disaster

• Solutions - replication, firewalls, federation

National Archives Persistent Archive

• NARA (MCAT): Principal copy stored at NARA with complete metadata catalog

• U Md (MCAT): Replicated copy at U Md for improved access, load balancing, and disaster recovery

• SDSC (MCAT): Deep archive at SDSC, no user access, but complete copy

BIRN Virtual Data Grid (Source: Mark Ellisman)

• Defines a Distributed Data Handling System

• Integrates Storage Resources in the BIRN network

• Integrates Access to Data, to Computational and Visualization Resources

• Acts as a Virtual Platform for Knowledge-based Data Integration Activities

• Provides a Uniform Interface to Users

Worldwide Universities Network

David De Roure, University of Southampton

dder@ecs.soton.ac.uk

http://www.ecs.soton.ac.uk/~dder

• Implement data grid linking academic universities

• Support collaborative research and education
  – HASTAC: Humanities, Arts, Science and Technology Advanced Collaboratory
  – Geo-referenced social science data collections
  – Earth Science data collections

• Provide data grid registry to promote federation of international data grids

Foundation of the WUN Grid

• SDSC
• Manchester
• Southampton
• White Rose
• NCSA
• A functioning, general purpose international Grid
• A hub for federating other data grids

[Figure: Manchester-SDSC mirror]

Authentication

• User authenticates to a data grid server
  – GSI or challenge response
  – Access controls map constraints between user distinguished names and logical file names

• Data grid server authenticates to remote data grid server

• Remote data grid server authenticates to remote storage repository under data grid ID
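The challenge-response option mentioned above can be illustrated with a generic sketch. This is not SRB's wire protocol: the use of HMAC-SHA256, the function names, and the shared secret are all assumptions made for illustration. It shows only the general idea that the client proves knowledge of a secret without sending it.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: generate a fresh random challenge for this login attempt."""
    return secrets.token_bytes(16)


def respond(challenge: bytes, shared_secret: bytes) -> str:
    """Client side: answer the challenge using the secret, without revealing it."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).hexdigest()


def verify(challenge: bytes, response: str, shared_secret: bytes) -> bool:
    """Server side: recompute the expected answer and compare in constant time."""
    expected = hmac.new(shared_secret, challenge, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)


secret = b"example-password"  # hypothetical shared secret
challenge = make_challenge()
ok = verify(challenge, respond(challenge, secret), secret)
bad = verify(challenge, respond(challenge, b"wrong-password"), secret)
```

Because the challenge is random per attempt, a captured response cannot simply be replayed against a later challenge.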

Firewall Interactions

• Client behind a firewall
  – Client initiated parallel I/O
  – Client initiated bulk file load

• Server behind a firewall
  – Paired servers inside and outside the firewall
  – Server inside the firewall only responds to messages from the outside server
  – Server initiated parallel I/O

• Federated data grids
  – Need to add metadata to forward messages from a paired front-end server to the back-end server
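The paired-server arrangement can be sketched as a small model: a front-end server outside the firewall forwards requests, and the server inside the firewall ignores anything that does not come from its paired front end. The class names, message fields, and host names below are hypothetical and only illustrate the rule stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str   # host that delivered the message
    origin: str   # client that originally issued the request
    payload: str


class InsideServer:
    """Server behind the firewall: only answers its paired front-end server."""

    def __init__(self, name, paired_front_end):
        self.name = name
        self.paired_front_end = paired_front_end

    def handle(self, msg):
        if msg.sender != self.paired_front_end:
            return None  # dropped: not from the paired outside server
        return f"{self.name} handled {msg.payload!r} for {msg.origin}"


class FrontEndServer:
    """Server outside the firewall: forwards requests, keeping the origin as metadata."""

    def __init__(self, name, back_end):
        self.name = name
        self.back_end = back_end

    def handle(self, msg):
        forwarded = Message(sender=self.name, origin=msg.origin, payload=msg.payload)
        return self.back_end.handle(forwarded)


inside = InsideServer("inside.example", paired_front_end="front.example")
front = FrontEndServer("front.example", back_end=inside)
request = Message(sender="client.example", origin="client.example", payload="Sput")

direct = inside.handle(request)   # rejected: the client is not the paired server
via_front = front.handle(request) # accepted: relayed by the paired front end
```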

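Client behind firewall

[Figure: Sput from a client behind a firewall, in steps 1 to 6. Components: client (Sput), SRB server1 with SRB agent, SRB server2 with SRB agent, MCAT, storage resource (R). Client calls: srbObjCreate, srbObjWrite. MCAT functions: 1. Logical-to-physical mapping, 2. Identification of replicas, 3. Access and audit control. Other labels: peer-to-peer request, server(s) spawning, data transfer.]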

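Client Initiated Parallel I/O

[Figure: Sput -M, in steps 1 to 8. Components: client (Sput -M), SRB server1 with SRB agent, SRB server2 with SRB agent, MCAT, storage resource (R). Client call: srbObjPut. MCAT functions: 1. Logical-to-physical mapping, 2. Identification of replicas, 3. Access and audit control. Other labels: return socket address, port and cookie; connect to server; data transfer.]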

Client Initiated - Third Party Data Transfer

[Figure: Scp, in steps 1 to 6. Components: client (Scp), SRB server, SRB server1, and SRB server2 with SRB agents, MCAT, storage resources (R). Client call: srbObjCopy. Other labels: dataPut with socket address, port and cookie; connect to server2; data transfer.]

Client Initiated - Bulk Load Operation

[Figure: Sput -b, in steps 1 to 6. Components: client (Sput -b), SRB server1 with SRB agent, SRB server2 with SRB agent, MCAT, storage resource (R). MCAT functions: 1. Logical-to-physical mapping, 2. Identification of replicas, 3. Access and audit control. Other labels: query resource; return resource location; bulk data transfer thread; 8 Mb buffer; bulk registration threads; bulk register; store data in a temp file; unfold temp file.]

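Server behind firewall

[Figure: Sput to a server behind a firewall, in steps 1 to 6. Components: client (Sput), SRB server1 with SRB agent, SRB server2 with SRB agent, MCAT, storage resource (R). Client calls: srbObjCreate, srbObjWrite. MCAT functions: 1. Logical-to-physical mapping, 2. Identification of replicas, 3. Access and audit control. Other labels: peer-to-peer request, server(s) spawning, data transfer.]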

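Server Initiated Parallel I/O

[Figure: Sput -m, in steps 1 to 6. Components: client (Sput -m), SRB server1 with SRB agent, SRB server2 with SRB agent, MCAT, storage resource (R). Client call: srbObjPut plus socket address, port and cookie. MCAT functions: 1. Logical-to-physical mapping, 2. Identification of replicas, 3. Access and audit control. Other labels: peer-to-peer request, connect to client, data transfer.]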

Federated Data Grids

Automating redirection to a server in front of a firewall

[Figure: a client and Data Grid 1, Data Grid 2, and Data Grid 3, with metadata catalogs MCAT1, MCAT2, and MCAT3 and servers Server1.1, Server1.2, Server2.1, Server2.2, and Server3.1.]

Container - Archival of Small files

• Performance issues with storing/retrieving large number of small files to/from tape

• Container design
  – Physical grouping of small files
  – Implemented with a Logical Resource
    • A pool of Cache Resources for the front-end resource
    • An Archival Resource for the back-end resource
  – Read/Write I/O is always done on the Cache Resource and synced to the Archival Resource
    • Stage to cache if a cache copy does not exist
    • The entire container is moved between cache and archival and written to tape
    • Bulk operation with container - faster
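The "physical grouping of small files" idea can be sketched as a single byte stream plus an index of offsets. This is an illustrative model only, not SRB's container format; the class and method names are hypothetical.

```python
import io


class Container:
    """Packs many small files into one byte stream with an offset index."""

    def __init__(self):
        self._data = io.BytesIO()
        self._index = {}  # logical name -> (offset, length)

    def add(self, name: str, content: bytes) -> None:
        """Append a small file to the end of the container and record where it sits."""
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(content)
        self._index[name] = (offset, len(content))

    def read(self, name: str) -> bytes:
        """Read one file back using its recorded offset and length."""
        offset, length = self._index[name]
        self._data.seek(offset)
        return self._data.read(length)

    def size(self) -> int:
        """Total size of the container, i.e. the unit moved between cache and archive."""
        return len(self._data.getvalue())


cont = Container()
cont.add("a.txt", b"hello")
cont.add("b.txt", b"world!")
```

Moving one large object to tape instead of many small ones is what avoids the per-file overhead described in the first bullet.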

Examples of using container

• Make a container with name “myCont”
  – Smkcont -S cont-sdsc myCont

• Put a file into “myCont”
  – Sput -c myCont myLocalSrcFile mySRBTargFile

• Bulk load a local directory into “myCont”
  – Sbload -c myCont myLocalSrcDir mySRBTargColl

• Sync “myCont” to archival and purge the cache copy
  – Ssyncont -d myCont

• Download a file stored in “myCont”
  – Sget mySRBsrcFile myLocalTargFile

• Slscont - list existing containers and contents

Summary of Data Transfer modes

• Serial - default mode

• Parallel - for large files

• Bulk - for large number of small files

• Container - Archiving small files (to tapes).

• Container + bulk - faster archival of small files
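As a rough illustration of how the modes above relate to each other, the sketch below picks a mode from the number and size of files. The thresholds and the function name are hypothetical; the slides give no numeric cutoffs for "large file" or "large number of small files".

```python
LARGE_FILE_BYTES = 64 * 1024 * 1024  # hypothetical cutoff for a "large" file
MANY_FILES = 100                     # hypothetical cutoff for a "large number" of files


def choose_mode(file_sizes, archive_to_tape=False):
    """Return the transfer mode from the summary that fits a batch of files."""
    if any(size >= LARGE_FILE_BYTES for size in file_sizes):
        return "parallel"
    many = len(file_sizes) >= MANY_FILES
    if archive_to_tape:
        return "container + bulk" if many else "container"
    return "bulk" if many else "serial"
```

For example, `choose_mode([1024] * 500)` selects bulk mode, while a single small file falls back to the serial default.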

Types of Data Transfer

• Local to SRB - Sput, Srsync

• SRB to Local - Sget, Srsync

• SRB to SRB - Scp, Sreplicate, Sbkupsrb, Srsync
  – Third party transfer
    • Server to Server data transfer, client not involved
    • Parallel I/O

Other useful Data Management Scommands

• Srsync, Schksum
  – Data synchronization using checksum values
  – Similar to UNIX's rsync

• Sreplicate, Sbkupsrb
  – Generate multiple copies of data using replicas
  – Replica - multiple copies of the same file
    • Same Logical Path Name - e.g., /home/srb.sdsc/foo
    • Replicas on different resources
    • Each replica has a different replNum
    • Most recently modified flag

Commands Using Checksum

• Registering checksum values into MCAT
  – At the time of upload
    • Sput -k - compute checksum of local source file and register with MCAT
    • Sput -K - checksum verification mode
      – After upload, compute checksum by reading back the uploaded file
      – Compare with the checksum generated locally
  – Existing SRB files
    • Schksum - compute and register checksum if it does not already exist
    • Srsync - if the checksum does not exist
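The verification mode described for Sput -K (checksum the local file, upload, read the uploaded copy back, compare) can be sketched as follows. A local file copy stands in for the upload, and SHA-256 is an assumption; the slides do not say which checksum algorithm SRB uses.

```python
import hashlib
import os
import shutil
import tempfile


def file_checksum(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()  # algorithm is an assumption, not SRB's documented choice
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_and_verify(src, dst):
    """Copy src to dst, then re-read dst and compare checksums before 'registering'."""
    local_sum = file_checksum(src)
    shutil.copyfile(src, dst)            # stand-in for the actual upload
    if file_checksum(dst) != local_sum:  # read back the uploaded copy
        raise IOError("checksum mismatch after upload")
    return local_sum                     # value that would be registered in the catalog


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "localFile")
    dst = os.path.join(tmp, "srbFile")
    with open(src, "wb") as f:
        f.write(b"example payload")
    registered = upload_and_verify(src, dst)
```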

Srsync command

• Synchronize the data
  – from a local copy to SRB
    • Srsync myLocalFile s:mySrbFile
  – from an SRB copy to a local file system
    • Srsync s:mySrbFile myLocalFile
  – between two SRB paths
    • Srsync s:mySrbFile1 s:mySrbFile2

• Similar to rsync
  – Compare the checksum values of source and target
  – Upload/download source to target if
    • target does not exist or the checksums differ
  – Save checksum values to MCAT

Srsync command (cont)

• Some Srsync options
  – -r --- recursively synchronize a directory/collection
  – -s --- use size instead of checksum value for determining synchronization
    • Faster - no checksum computation
    • Less accurate
  – -m, -M --- parallel I/O
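The synchronization rule (transfer if the target is missing or differs) and the trade-off of the -s option can be sketched in a few lines. The function names are hypothetical and SHA-256 is an assumed checksum; in-memory byte strings stand in for files.

```python
import hashlib
from typing import Optional


def checksum(data: bytes) -> str:
    """Checksum of a file's content (algorithm assumed for illustration)."""
    return hashlib.sha256(data).hexdigest()


def needs_transfer(source: bytes, target: Optional[bytes], use_size: bool = False) -> bool:
    """Decide whether the source must be copied to the target."""
    if target is None:
        return True                          # target does not exist
    if use_size:
        return len(source) != len(target)    # like -s: faster, but less accurate
    return checksum(source) != checksum(target)
```

Two files of equal length but different content show why size comparison is less accurate: `needs_transfer(b"abc", b"abd", use_size=True)` reports no transfer needed, while the checksum comparison detects the difference.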

Sreplicate, Sbkupsrb commands

• Generate multiple copies of data using replica

• Sreplicate - Generate a new replica each time

• Sbkupsrb
  – Backs up the SRB data/collection to the specified backupResource with a replica
  – If an up-to-date replica already exists in the backupResource, nothing will be done
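The difference between the two commands can be modelled with a small replica list: a replicate operation always adds a replica, while a backup operation is a no-op when an up-to-date replica already sits on the backup resource. The class names, the resource name "backup-rsrc", and numbering replicas from 0 are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    repl_num: int
    resource: str
    up_to_date: bool = True


@dataclass
class DataObject:
    logical_path: str
    replicas: list = field(default_factory=list)

    def _next_repl_num(self) -> int:
        return max((r.repl_num for r in self.replicas), default=-1) + 1

    def replicate(self, resource: str) -> Replica:
        """Sreplicate-like behaviour: always create a new replica."""
        replica = Replica(self._next_repl_num(), resource)
        self.replicas.append(replica)
        return replica

    def backup(self, backup_resource: str):
        """Sbkupsrb-like behaviour: skip if an up-to-date replica is already there."""
        for r in self.replicas:
            if r.resource == backup_resource and r.up_to_date:
                return None
        return self.replicate(backup_resource)


obj = DataObject("/home/srb.sdsc/foo")
obj.replicate("unix-sdsc")
first_backup = obj.backup("backup-rsrc")   # creates a replica on the backup resource
second_backup = obj.backup("backup-rsrc")  # nothing to do, returns None
```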

Data and Resource Virtualisation

• Data and Collections Organisation
  – File Logical Name space
    • UNIX-like directories (collections) and files (data)
    • Mapping of logical name to physical attributes - host address, physical path
    • UNIX-like API and utilities for making collections (mkdir) and data creation (creat)

• Virtualisation of Resources
  – Mapping of a logical resource name to physical attributes: Resource Location, Type
  – Clients use a single logical name to reference a resource
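A minimal sketch of the two mappings described above: a logical resource name resolves to a host and storage root, and a logical file name resolves to a physical location on that resource. The class and method names are hypothetical, and the physical path layout is deliberately simplified compared with a real vault path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    host: str
    path: str


class Catalog:
    """Toy metadata catalog: logical names in, physical attributes out."""

    def __init__(self):
        self._resources = {}  # logical resource name -> (host, storage root)
        self._files = {}      # logical path -> PhysicalLocation

    def register_resource(self, name, host, root):
        self._resources[name] = (host, root)

    def create(self, logical_path, resource):
        """Record a new file on a resource, addressed only by logical names."""
        host, root = self._resources[resource]
        location = PhysicalLocation(host, root.rstrip("/") + logical_path)
        self._files[logical_path] = location
        return location

    def resolve(self, logical_path):
        """Logical-to-physical mapping for an existing file."""
        return self._files[logical_path]


catalog = Catalog()
catalog.register_resource("unix-sdsc", "srb.sdsc.edu", "/misc/srb/srb/SRBVault")
catalog.create("/home/srb.sdsc/foo", "unix-sdsc")
location = catalog.resolve("/home/srb.sdsc/foo")
```

The client only ever supplies "unix-sdsc" and "/home/srb.sdsc/foo"; the host and physical path stay inside the catalog.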

Listing Resources

• SgetR – List Configured Resources

  SgetR
  --------------------------- RESULTS ------------------------------
  rsrc_name: unix-sdsc
  netprefix: srb.sdsc.edu:NULL:NULL
  rsrc_typ_name: unix file system
  default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
  phy_default_path: /misc/srb/srb/SRBVault/?USER.?DOMAIN/?SPLITPATH/TEST.?PATH?DATANAME.?RANDOM.?TIMESEC
  phy_rsrc_name: unix-sdsc
  rsrc_typ_name: unix file system
  rsrc_class_name: permanent
  user_name: srb
  domain_desc: sdsc
  zone_id: sdscdemo
  -----------------------------------------------------------------
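For scripting around this listing, the "key: value" lines can be parsed into a dictionary. The sketch below assumes one field per line, as in the listing, and uses a shortened sample; the function name is hypothetical and the interpretation of the first netprefix component as the host name is an assumption.

```python
SAMPLE = """\
--------------------------- RESULTS ------------------------------
rsrc_name: unix-sdsc
netprefix: srb.sdsc.edu:NULL:NULL
rsrc_typ_name: unix file system
rsrc_class_name: permanent
zone_id: sdscdemo
-----------------------------------------------------------------
"""


def parse_resource_record(text):
    """Turn 'key: value' lines into a dict, skipping separator lines."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-"):
            continue
        # Split on the first ': ' only, so values that contain colons stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            record[key] = value
    return record


record = parse_resource_record(SAMPLE)
host = record["netprefix"].split(":")[0]  # assumed to be the resource's host name
```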

Serial Mode Data Transfer

• Simple to Implement and Use
  – Unix-like API – srbObjCreate, srbObjWrite

• Performance Issue
  – 2 hops data transfer
  – Single data stream
  – One file at a time – overhead relatively high for small files
    • MCAT interaction – query and registration
    • Small buffer transfer

• Large files – Single Hop, multiple data streams

• Small files – Single Hop, multiple files at a time

Upload a File to a SRB Resource

• Sput -S unix-sdsc localFile srbFile
  – Default data transfer mode – serial

• Sls -l srbFile
  – srb 0 unix-sdsc 2764364 2004-08-21-18.19 % srbFile

Small files Data Transfer (Bulk operation)

• Upload/download large number of small files
  – One file at a time – relatively high overhead
    • MCAT interaction, small buffer transfer
    • <= 0.5 sec/file for LAN, > 1 sec/file for WAN

• Bulk Operation
  – Bulk data transfer
    • Transfer multiple files in a single large buffer (8 Mb)
  – Bulk Registration
    • Register large number of files (1,000) in a single call
  – Multiple threads for transfer and registration
  – Single Hop
  – 3-10 times speedup
  – All or nothing type operation
  – Specify -b in Sput/Sget
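The two batching ideas above, packing many files into one large transfer buffer and registering files in groups, can be sketched as follows. The function names are hypothetical, "8 Mb" is read here as 8 megabytes, and the demo uses a smaller buffer so it runs quickly.

```python
BUFFER_SIZE = 8 * 1024 * 1024  # transfer buffer from the slide, read as 8 megabytes
REGISTER_BATCH = 1000          # files registered per catalog call, as on the slide


def pack_into_buffers(files, buffer_size=BUFFER_SIZE):
    """Group (name, content) pairs into buffers of at most buffer_size bytes.

    A single file larger than buffer_size ends up alone in its own group.
    """
    buffers, current, used = [], [], 0
    for name, content in files:
        if current and used + len(content) > buffer_size:
            buffers.append(current)
            current, used = [], 0
        current.append((name, content))
        used += len(content)
    if current:
        buffers.append(current)
    return buffers


def registration_batches(names, batch_size=REGISTER_BATCH):
    """Split file names into groups that would each be registered in one call."""
    return [names[i:i + batch_size] for i in range(0, len(names), batch_size)]


# 2,500 files of 1 KiB each, packed into 1 MiB buffers for the demo.
files = [(f"file{i}", b"x" * 1024) for i in range(2500)]
buffers = pack_into_buffers(files, buffer_size=1024 * 1024)
batches = registration_batches([name for name, _ in files])
```

Here 2,500 per-file transfers and 2,500 catalog calls shrink to three buffers and three registration calls, which is where the per-file overhead is saved.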

Parallel Mode Data Transfer

• For large file transfer
  – Multiple data streams
  – Single hop data transfer

• Two sub-modes
  – Server initiated
  – Client initiated (for clients behind firewall)

• Up to 5 times speed up for WAN

• Two simple API calls – srbObjPut and srbObjGet

• Use -m (Server initiated), -M (Client initiated) options

• Available to all Scommands involving data transfer
  – As an option – Sput, Sget, Srsync
  – Automatic – Sreplicate, Scp, Sbkupsrb, SsyncD, Ssyncont
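The client-initiated sub-mode can be illustrated with a loopback sketch: the server hands back an address, port, and cookie, and the client then opens several outbound connections, each carrying the cookie and one stripe of the file. This is not SRB's protocol; the message framing (cookie line, offset line, then data) and all function names are assumptions. The point is that every connection is opened by the client, which is what lets it work from behind a firewall.

```python
import secrets
import socket
import threading


def start_server(total_size, n_streams):
    """Server side: listen on a port and return (addr, port, cookie) for the client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(n_streams)
    addr, port = listener.getsockname()
    cookie = secrets.token_hex(8).encode()
    received = bytearray(total_size)

    def handle(conn):
        with conn, conn.makefile("rb") as stream:
            if stream.readline().strip() != cookie:
                return  # unknown cookie: ignore this connection
            offset = int(stream.readline())
            data = stream.read()  # read until the client closes the stream
            received[offset:offset + len(data)] = data

    def accept_loop():
        workers = []
        for _ in range(n_streams):
            conn, _peer = listener.accept()
            worker = threading.Thread(target=handle, args=(conn,))
            worker.start()
            workers.append(worker)
        for worker in workers:
            worker.join()
        listener.close()

    thread = threading.Thread(target=accept_loop)
    thread.start()
    return addr, port, cookie, received, thread


def send_stripe(addr, port, cookie, offset, chunk):
    """Client side: connect out to the server and send one stripe of the file."""
    with socket.create_connection((addr, port)) as sock:
        sock.sendall(cookie + b"\n" + str(offset).encode() + b"\n" + chunk)


def parallel_put(data, n_streams=4):
    """Send data over n_streams client-initiated connections; return what arrived."""
    addr, port, cookie, received, server = start_server(len(data), n_streams)
    stripe = -(-len(data) // n_streams)  # ceiling division
    senders = [
        threading.Thread(
            target=send_stripe,
            args=(addr, port, cookie, i * stripe, data[i * stripe:(i + 1) * stripe]),
        )
        for i in range(n_streams)
    ]
    for sender in senders:
        sender.start()
    for sender in senders:
        sender.join()
    server.join()
    return bytes(received)


payload = bytes(range(256)) * 1000
result = parallel_put(payload, n_streams=4)
```

In the server-initiated sub-mode the roles of the connecting side are reversed: the client would send its own address, port, and cookie, and the server would connect back to it.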