![Page 1: Globus as a platform for research data management · Globus as a platform for research data management Vas Vasiliadis University of Chicago. vas@uchicago.edu. Best Practices in Data](https://reader033.fdocuments.us/reader033/viewer/2022042412/5f2c25afd72ea75f4612b1cd/html5/thumbnails/1.jpg)
Globus as a platform for research data management
Vas Vasiliadis, University of Chicago
Best Practices in Data Infrastructure, May 17, 2016
Globus delivers…
Big data transfer, sharing, publication, and discovery…
…directly from your own storage systems…
…via software-as-a-service
Globus as SaaS
1. Researcher initiates transfer request; or requested automatically by script, science gateway
2. Globus transfers files reliably, securely
3. Researcher selects files to share, selects user or group, and sets access permissions
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
6. Researcher assembles data set; describes it using metadata (Dublin Core and domain-specific)
7. Curator reviews and approves; data set published on campus or other system
8. Peers, collaborators search and discover datasets; transfer and share using Globus
[Diagram labels: Instrument, Compute Facility, Personal Computer, Publication Repository]
Transfer, Share, Publish, Discover
• SaaS: Web access; low operational costs
• Use storage system of your choice
• Access using your existing credentials
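As a rough illustration of the metadata step in the workflow above, the following Python sketch describes a data set using the 15 elements of the Dublin Core Metadata Element Set plus free-form domain-specific fields. The `describe_dataset` helper, the `domain` key, and the sample values are hypothetical; this is not the Globus publication API.

```python
# Hypothetical sketch: not the Globus publication API.
# The 15 element names of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def describe_dataset(core, domain=None):
    """Return a metadata record, rejecting unknown Dublin Core elements."""
    unknown = set(core) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {"dublin_core": dict(core), "domain": dict(domain or {})}


record = describe_dataset(
    {"title": "Example imaging run", "creator": "A. Researcher", "date": "2016-05-17"},
    domain={"beamline": "example-beamline"},  # made-up domain-specific field
)
```

Keeping the core elements separate from the domain-specific ones lets a curator validate the shared vocabulary while leaving each discipline free to add its own fields.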
Globus as bridging technology to…
• Supercomputing resources: NCSA, NERSC, XSEDE
• Campus HPC facilities
• Clouds: Jetstream, AWS, Google
• Instruments
• Lab clusters, servers, laptops, etc.
Scaling up analysis
Move datasets to campus HPC, supercomputer, national facility
Move results to (…)
Bridging to instruments: APS
Dynamic imaging: >200 TB per dataset
Courtesy of Francesco De Carlo, Argonne National Laboratory (2016)
APS DMagic
• Simple commands to automate the majority of beamline data management tasks
• Toolbox supports APS Imaging Group; can be easily adapted to any APS beamline
• Given an experiment date, retrieves users from APS scheduling system and automatically sends e-mail with link to the data
• Monitors a directory and copies any new files to a personal or remote server endpoint
• Data can be shared directly from the beamline machine or from a Globus server endpoint
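The directory-monitoring behavior described in the list above can be sketched with the Python standard library. This is not DMagic's actual code: `sync_new_files` and its arguments are hypothetical, and a plain local copy stands in for a transfer to a Globus endpoint.

```python
import shutil
from pathlib import Path


def sync_new_files(watch_dir, dest_dir, seen):
    """Copy files in watch_dir that are not yet in `seen` to dest_dir.

    Returns the names copied in this pass and records them in `seen`.
    """
    watch, dest = Path(watch_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(watch.iterdir()):
        if path.is_file() and path.name not in seen:
            shutil.copy2(path, dest / path.name)  # stand-in for a transfer
            seen.add(path.name)
            copied.append(path.name)
    return copied
```

Calling `sync_new_files` repeatedly with the same `seen` set (for example in a loop with a short sleep) gives a simple polling monitor; a production tool would also need to handle files that are still being written.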
Data Distribution: NGS
EC2
Ad Hoc Sharing: NIH
helix.nih.gov
Repositories: Compute Canada
National Research Data Repository (Phase 1)
[Diagram labels: Globus Publication, Archivematica, Metadata, Index, Compute Canada Cloud, Regional Repository, Institutional Repository; CC Storage with Globus Connect (shown three times)]
Courtesy of Todd Trann, Compute Canada, 2016
NRDP Features
• Federated Storage Model: Storage and repositories are distributed, and owned and operated by organizations / institutions
• National Data Discovery: Single search to discover data, regardless of location
• Suitable for broad range of data types
• Archivematica: preservation packages
• Automatic geographic data replication
Adapted from Todd Trann, Compute Canada, 2016
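To make the "single search, regardless of location" idea concrete, here is a minimal Python sketch of a central index built over several independently operated repositories. The `build_index` and `search` helpers and the repository names are hypothetical; they do not describe how the Compute Canada system is actually implemented.

```python
def build_index(repositories):
    """Merge per-repository metadata into one searchable index.

    `repositories` maps a repository name to a list of dataset titles.
    The data itself stays where it is; only metadata is indexed.
    """
    return [
        {"title": title, "repository": name}
        for name, titles in repositories.items()
        for title in titles
    ]


def search(index, term):
    """Case-insensitive substring search over all indexed titles."""
    term = term.lower()
    return [entry for entry in index if term in entry["title"].lower()]


index = build_index({
    "institutional-a": ["Arctic ice cores", "Soil samples"],  # made-up holdings
    "regional-b": ["Ice sheet imagery"],
})
hits = search(index, "ice")
```

Each hit carries the name of the repository that holds the data, so a user can discover a dataset centrally and then fetch it from wherever it is stored.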
Globus serves as…
A platform for building science gateways, portals and other web applications in support of research and education
Globus as PaaS
[Diagram labels: Globus APIs, Globus Connect, … Globus Toolkit]
• Identity/Authentication, Group Management
• Data Publication & Discovery
• File Sharing
• File Transfer & Replication
Enable existing institutional ID systems to be used in external web applications
Integrate file transfer and sharing capabilities into scientific web apps, portals, gateways, etc.
Data Archive: NCAR
Serving a global community
• 17+ PB virtual processing
• 45,000+ custom orders, 4,000 users, 380 TB served in 2014
Courtesy of Thomas Cram, NCAR (2014)
Fully automated delivery via portal using Globus PaaS
PaaS enabled automated workflow
• User logs in w/NCAR or other campus identity
• Selected dataset copied to staging area (shared endpoint)
• Read permission granted to user to access shared endpoint
• User receives email with link to access files
• ACLs deleted after five days
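The grant-then-expire pattern in the workflow above can be sketched as a toy model in Python. The `StagingArea` class, its method names, and the sample user and path are hypothetical; a real portal would make these changes through the Globus service rather than an in-memory dictionary.

```python
from datetime import datetime, timedelta

ACL_LIFETIME = timedelta(days=5)  # "ACLs deleted after five days"


class StagingArea:
    """Toy model of a shared endpoint with expiring read permissions."""

    def __init__(self):
        self.acls = {}  # (user, path) -> time the grant was created

    def grant_read(self, user, path, now):
        self.acls[(user, path)] = now

    def can_read(self, user, path):
        return (user, path) in self.acls

    def purge_expired(self, now):
        """Delete grants that are at least ACL_LIFETIME old; return them."""
        expired = [k for k, t in self.acls.items() if now - t >= ACL_LIFETIME]
        for key in expired:
            del self.acls[key]
        return expired


area = StagingArea()
t0 = datetime(2016, 5, 17)
area.grant_read("user@example.edu", "/staging/order-1/", t0)
area.purge_expired(t0 + timedelta(days=4))  # grant is still present
area.purge_expired(t0 + timedelta(days=5))  # grant is removed
```

Running the purge as a periodic job keeps the staging area from accumulating stale permissions without any action from the user.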
Analysis portal: Sanger
Compute Access: OSG
Data “dropbox”: BBFC
Studios upload movies for rating
• Authenticate to BBFC IdP; issued unique ID
• Automatically provision “dropbox”, set ACLs
• Auto activate shared endpoint using SSO
• Initiate transfer
/distributor/paramount/32534
/distributor/wb/65346
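The provisioning step can be illustrated with a short Python sketch that derives a per-upload path in the `/distributor/<name>/<id>` layout shown above and records an access rule for it. The `provision_dropbox` function, the identity string format, and the `"rw"` permission are assumptions made for illustration; the slides do not say how BBFC implements this.

```python
import posixpath


def provision_dropbox(acls, distributor, unique_id):
    """Create the dropbox path for a distributor and grant it access.

    `acls` maps a path to the set of (identity, permission) pairs on it.
    The identity format and the "rw" permission are illustrative only.
    """
    if not distributor.isalnum():
        # Reject names such as "../x" that could escape the dropbox tree.
        raise ValueError("distributor name must be alphanumeric")
    path = posixpath.join("/distributor", distributor, str(unique_id))
    acls.setdefault(path, set()).add((f"{distributor}:{unique_id}", "rw"))
    return path


acls = {}
path = provision_dropbox(acls, "paramount", 32534)
```

Because every upload gets its own directory and its own rule, one studio cannot see or overwrite another studio's submission.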
Globus today…
5 major services
13 national labs use Globus
160 PB transferred
10,000+ active endpoints
27 billion files processed
~450 active daily users
40,000 registered users
99.9% uptime
50+ institutional subscribers
1 PB largest single transfer to date
3 months longest continuously managed transfer
130+ federated campus identities
Thank you to our sponsors!
U.S. Department of Energy
Users, usage continue steady growth…
[Chart: Active Users; y-axis: Number of Users, 0 to 3000]
…but freemium gap is widening
[Chart: Active Endpoints, Free vs. Subscribed; y-axis: Number of Endpoints, 0 to 3000]
Globus Subscriptions
• Globus Provider Plan
  – Shared endpoints
  – Data publication
  – Peer-to-peer transfer/sharing
  – Management console
  – Usage reports
  – Priority support
  – Application integration
• Branded Web Site
• Alternate Identity Provider (InCommon is standard)
• Premium Storage Connectors (S3, HPSS, Spectra, Google Drive coming soon)
globus.org/provider-plans
We hope you will join us…
• Sign up and transfer files: globus.org/login
• Create endpoints: globus.org/globus-connect-server
• Documentation: docs.globus.org
• Need help? support.globus.org
• Subscribe to help us make Globus self-sustaining: globus.org/provider-plans
• Follow us: @globusonline