Adapting federated cyberinfrastructure for shared data collection facilities in structural biology
-
Upload
ian-stokes-rees -
Category
Technology
-
view
749 -
download
0
description
Transcript of Adapting federated cyberinfrastructure for shared data collection facilities in structural biology
Adapting federated cyberinfrastructure for shared data collection
facilities in structural biology
GlobusWorld 2012
Ian Stokes-‐Rees [email protected] @ijstokesHarvard Medical School
J. Synchrotron Rad. (2012) 19doi:10.1107/S0909049512009776
Online 6 April 2012
[email protected] Overview - Ian Stokes-Rees
Structural Biology:Study of Protein Structure and Function
1mm
10nm
400m
• Shared scientiRic data collection facility• Data intensive (10-‐100 GB/day)
[email protected] Overview - Ian Stokes-Rees
Cryo Electron Microscopy
• Previously, 1-10,000 images, managed by hand• Now, robotic systems collect millions of hi-res images• estimate 250,000 CPU-hours to reconstruct model• 20-50 TB of data in most extreme case
[email protected] Overview - Ian Stokes-Rees
Molecular Dynamics Simulations
1 fs time step1ns snapshot1 us simulation1e6 steps1000 frames10 MB / frame10 GB / sim20 CPU-years3 months (wall-clock)
[email protected] Overview - Ian Stokes-Rees
SBGrid Consortium
Rice UniversityE. NikonowiczY. ShamooY.J. Tao
CalTechP. BjorkmanW. ClemonsG. JensenD. Rees
StanfordA. BrungerK. GarciaT. Jardetzky
UCSFJJ MirandaY. Cheng
UC DavisH. Stahlberg
UCSDT. NakagawaH. Viadiu
WesternUM. Swairjo
U. WashingtonT. Gonen
Washington U. School of Med.T. EllenbergerD. Fremont
VanderbiltCenter for Structural Biology
Rosalind FranklinD. Harrison
A. LeschzinerK. MillerA. RaoT. RapoportM. SamsoP. SlizT. SpringerG. VerdineG. WagnerL. WalenskyS.WalkerT.WalzJ. WangS. Wong
N. Beglova S. BlacklowB. ChenJ. ChouJ. ClardyM. EckB. FurieR. GaudetM. GrantS.C. Harrison J. HogleD. JeruzalmiD. KahneT. Kirchhausen
Harvard and Affiliates
NE-CATR. OswaldC. ParrishH. Sondermann
R. CerioneB. CraneS. EalickM. JinA. Ke
Cornell U.
Brandeis U.N. Grigorieff
Tufts U.K. Heldwein
UMass MedicalW. Royer
NIHM. Mayer
U. MarylandE. Toth
K. ReinischJ. SchlessingerF. SigworthF. Zhou
T. BoggonD. BraddockY. HaE. Lolis
Yale U.
C. SandersB. SpillerM. Stone
M. Waterman
W. ChazinB. EichmanM. EgliB. LacyM. Ohi
Columbia U.Q. Fan
Rockefeller U.R. MacKinnon
Thomas JeffersonJ. Williams
Not Pictured: University of Toronto: L. Howell, E. Pai, F. Sicheri; NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan
[email protected] Overview - Ian Stokes-Rees
Boston Life Sciences Hub
• Biomedical researchers• Government agencies• Life sciences• Universities• Hospitals
Tufts
University
School of
Medicin
e
Data Access
Globus Online: High Performance Reliable 3rd Party File Transfer
portal
cluster
desktop laptop
lab fileserver
data collectionfacility
GUMSDN to user mapping
VOMSVO membership
CertiRicate Authorityroot of trust
Globus OnlineRile transfer service
datacollec(on
6+monthstaging+storage
10+TB+per+group+permanent+archive
/stage/sliz
/stage/murphy
/data/murphy
/data/deacon
Sliz+lab
~andy~sue
~sco?
NEBCAT+beamline+at+APS
SBGrid'Science'PortalGlobus'Online
Harvard "Sco?"+fromthe+Sliz+lab
VOMS
/data/sliz
/embarg/2010/
/embarg/2011/
/public/2009/
10+PBpublic+archive
general+public
WWW
Tier'1 Tier'2 Tier'3
Local accounts withinlab infrastructure
Shared (lab level) accounts at facility
User can directly access lab or facility data from laptop
Public access available to archived data through web interface
Embargo policy to hold deposited data for agreed time
Tiered storage
VO management
Architecture✦ SBGrid
• manages all user account creation and credential mgmt• hosts MyProxy, VOMS, GridFTP, and user interfaces
✦ Facility• knows about lab groups
• e.g. “Harrison”, “Sliz”
• delegates knowledge of group membership to SBGrid VOMS• facility can poll VOMS for list of current members
• uses X.509 for user identiRication• deploys GridFTP server
✦ Lab group• designates group manager that adds/removes individuals• deploys GridFTP server or Globus Connect client
✦ Individual• username/password to access facility and lab storage• Globus Connect for personal GridFTP server to laptop• Globus Online web interface to “drive” transfers
Objective
✦ Easy to use high performance data mgmt environment
✦ Fast and reliable Rile transfer• facility-‐to-‐lab, facility-‐to-‐individual, lab-‐to-‐individual
✦ Reduced administrative overhead✦ Better data curation✦ Release data to public after embargo period
Challenges
✦ Access control• visibility• policies
✦ Provenance• data origin• history
✦ Meta-‐data• attributes• searching
Grid Computing Today
Capabilities
✦ Server-‐to-‐server interaction✦ Federated aggregation of CPU power✦ Predictable data patterns✦ Command-‐line access from specialized servers✦ Web-‐based access through “Science Gateways”• pre-‐deRined computing workRlows• e.g. SBGrid Science Portal
Weaknesses
✦ Federated user identity system• X.509 digital certiRicates and associated apparatus
✦ Ease of use for end users✦ Data management tools✦ Support for collaborations
Opportunities
✦ “Last mile” challenge• to the desktop• to the laptop
✦ UniRied identity management• centralized set of credentials for each person
✦ Empower collaborations to self-‐manage✦ Shift of focus from “compute” to “data”• for users• for facilities where data is the main challenge
User Credentials
✦ Analogy to a passport:• Application form• Sponsor’s attestation• Consular services
• veriRication of application, sponsor, and accompanying identiRication and eligibility documents
• Passport issuing ofRice
✦ Portable, digital passport• Rixed and secure user identiRiers
• name, email, home institution• signed by widely trusted issuer• time limited• ISO standard
X.509 Digital CertiRicates
Addressing CertiRicate ProblemsU1U1U1
!"#$"%&'%()*"+',"!&'
-.'.)"*&'/.' 012*%2!' 3%"!'
!"&$!*'&!4,5(*)'*$67"!''
*289:'4)"*&%'
!";("<'!"#$"%&'
;"!(9:'$%"!'"=()(7(=(&:'
,2*>!6'"=()(7(=(&:'
411!2;"',"!&'
*289:'4;4(=47(=(&:'
)"*"!4&"',"!&'5":'14(!'
%()*',"!&'
!"&!(";"',"!&'
"?12!&'%()*"+',"!&'5":'14(!'
U2a
U1
S1
R1
R2
time
!"#$%&'(#!")*# *+,(-,.# /-0.#
(,123#4%&'(#
5,(6.'8'9'7':3#
4++.,;0#&0&90.-<'+=#8.,>+-=#4(%#.,70-#
(,123#
;0.'23#>-0.#07'8'9'7':3#
+.0-0(:#50.:#:,#.0?>0-:#&0&90.-<'+#93#@A#
4%%#@A#:,#!")*#
.0?>0-:#!")*#$B#
.0:>.(#!")*#$B# 4%%#$B#:,#+.,C3#50.:#
.0?>0-:#!"#8.,>+-#4(%#.,70-# U2b
S2
V1
V2
time
VO (Group) Membership Registration
!"#$%&'()' *+",-"#' .-/#'
0/#123'/&14151&1$3'
6",7#8'/&14151&1$3'
4/,/#%$/'6/#$'9/3'+%1#'
-/$'#/$#1/0%&'-/#1%&',:85/#'
;<=*'
#/>:/-$'+"#$%&'%66":,$'
0/#176%?",'/8%1&'-/,$'
/8%1&'0/#17/@'
AB)!'
;<')@81,'
,"?23'%4/,$-'
6#/%$/'&"6%&'%66$'
-14,'6/#$'
!"#$%&'(%)*+%',,#""%-#.#$'/#.%$#"*!$,#"%
-/$';<'#14C$-'
,"?23'%0%1&%51&1$3'
+"#$%&'&"41,'
#/>:/-$'-14,/@'6/#$'
#/$:#,'$#%691,4',:85/#'
%++#"0/'6/#$'
%66":,$'#/%@3',"?76%?",'
#/>:/-$'-14,/@'6/#?76%$/'
#/$:#,'-14,/@'6/#?76%$/' D'E'+%1#'-14,/@'6/#$'1,$"'!F(*GDH'7&/'H'E'6#/%$/'&"6%&'+#"I3'6/#$'
=3!#"I3'
#/41-$/#'+#"I3'6/#$'J1$C'=3!#"I3'
!"#$%&'(%)*+%',,#""%+#0%1*$/'2%
U1
U2*
S1*A1a
A1btime
Process and Design Improvements✦ Single web-‐form application• includes e-‐mail veriRicationn
✦ Centralized and connected credential management• FreeIPA LDAP -‐ user directory and credential store• VOMS -‐ lab, institution, and collaboration afRiliations• MyProxy -‐ X.509 credential store
✦ Overlap administrative roles• system admin• registration agent for certiRicate authority (approve X.509
request)• VO administrator to register group afRiliations
✦ Automation
“Last Mile” andEase of Use
Grid computing at your desk✦ Platforms• Windows• OS X
✦ Web interfaces• SBGrid Science Portal
✦ One-‐stop-‐shop for account management• Centralized services• Automated processes• Administrator intervention• Support
SBGridSciencePortal
laptop
desktop lab fileserver
facility fileserver
Ryan, a postdoc in the Frank Lab at Columbia
Access NRAMM facilities securely and transfer data back to home institute
/data/columbia/frank
/nfs/data/rsmith
/Users/Ryan
Ryan applies for an account at the SBGrid Science Portal
automated X.509application
automated Globus Online application
veriRication of lab membership
request accessto NRAMMfacility
using credential held by SBGrid
check SBGrid for Ryan’s group membership
in Frank Lab, so grant access to Riles
use Globus Online to managetransfer from NRAMM back to lab
initiate transfer at NRAMM
transfer data to lab
notify user of completion
Architecture Diagrams
Service Architecture
DOEGrids CA@Lawrence Berkley Labs
UC San Diego
Apache
GridSite
Django
Sage Math
R-Studio
SBGrid Science Portal @ Harvard Medical School
MyProxy@NCSA, UIUC
Gratia Acct'ing@FermiLab
FreeIPA
LDAP
VOMS
GUMS
GACL
ID mgmt
glideinWMS factory Open Science Grid
fileserver
SQLDB
scp
GridFTP
data
SRM
WebDAV
cluster
Condor
Cycle Server
VDT
Globus
computation
data
computations
interfaces
User
shell CLI
GUMSGUMSGridFTP +
Hadoop
GlobusOnline@Argonne
glideinWMS
Monitoring@Indiana
Ganglia
Nagios
monitoring
RSV
pacct
[email protected] Overview - Ian Stokes-Rees
Acknowledgements✦ Piotr Sliz✦ SBGrid Science Portal• Daniel O’Donovan, Meghan Porter-‐Mahoney
✦ SBGrid System Administrators• Ian Levesque, Peter Doherty, Steve Jahl
✦ Facility Collaborators• Frank Murphy (NE-‐CAT/APS)• Ashley Deacon (JCSG/SLAC)
✦ Globus Online Team• Steve Tueke, Ian Foster, Rachana
Ananthakrishnan, Raj Kettimuthu
✦ Ruth Pordes• Director of OSG, for championing SBGrid