Transcript of 2013 06-21-computing-for-light-sources
Globus Online for Managing Tomography Data at APS
Rachana Ananthakrishnan, Francesco De Carlo
Argonne National Lab
We started with reliable, secure, high-performance file transfer …
Data Source → Data Destination
1. User initiates transfer request
2. Globus Online moves and syncs files
3. Globus Online notifies user
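The three-step flow above can be sketched as a small simulation; `run_transfer`, `sync_needed`, and the file names are hypothetical illustrations of the transfer-and-sync logic, not the Globus Online API.

```python
# Hypothetical sketch of the three-step transfer flow
# (illustration only; not the actual Globus Online API).

def sync_needed(src_files, dst_files):
    """Return names of source files missing or stale at the destination."""
    return [name for name, mtime in src_files.items()
            if dst_files.get(name, -1) < mtime]

def run_transfer(src_files, dst_files, notify):
    # Step 1: user initiates the transfer request (this call).
    # Step 2: the service moves and syncs files, copying only what differs.
    moved = sync_needed(src_files, dst_files)
    for name in moved:
        dst_files[name] = src_files[name]
    # Step 3: the service notifies the user when the transfer completes.
    notify(f"transfer complete: {len(moved)} file(s) updated")
    return moved

messages = []
src = {"scan_001.h5": 100, "scan_002.h5": 200}   # name -> modification time
dst = {"scan_001.h5": 100}                       # scan_001 already in sync
run_transfer(src, dst, messages.append)
```

The sync step is the key design choice: only files that are missing or stale at the destination are moved, so repeated transfers of a large dataset are cheap.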
… and then made it simple to share big data off existing storage systems
Data Source
1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
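The in-place sharing model above can be sketched as a permission table kept alongside files that never move; `SharedStore` and the paths are hypothetical, not the real service.

```python
# Hypothetical sketch of in-place sharing: permissions are recorded
# against paths on existing storage, and no data is copied to the cloud.

class SharedStore:
    def __init__(self, files):
        self.files = files      # path -> contents (data stays in place)
        self.acl = {}           # path -> set of users/groups with access

    def share(self, path, grantee):
        # Step 1: the owner selects a file and a user/group, sets permission.
        # Step 2: the service only tracks the grant; the file is not moved.
        self.acl.setdefault(path, set()).add(grantee)

    def access(self, user, path):
        # Step 3: the grantee reads the shared file from the source system.
        if user in self.acl.get(path, set()):
            return self.files[path]
        raise PermissionError(f"{user} may not read {path}")

store = SharedStore({"/aps/tomo/scan_001.h5": b"raw projections"})
store.share("/aps/tomo/scan_001.h5", "userB")
data = store.access("userB", "/aps/tomo/scan_001.h5")
```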
Transforming data acquisition: Current
• Experimental parameters optimized manually
• Collected data combined with visual inspection to confirm optimal conditions
• Data reconstructed and sent to users via external drive
• User team starts data reduction at home institution
Transforming data acquisition: Envisaged
• Experimental parameters optimized automatically
• Collected data available to optimization programs
• Data are automatically reconstructed, reduced, and shared with local and remote participants
• User team leaves the APS with reduced data
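The envisaged flow chains acquisition into reconstruction, reduction, and sharing with no manual hand-off. A minimal sketch, in which every stage function is a placeholder rather than real APS software:

```python
# Hypothetical sketch of the envisaged automated pipeline; each stage
# is a placeholder standing in for real reconstruction/reduction code.

def reconstruct(raw):
    return {"recon": f"recon({raw})"}

def reduce_data(recon):
    return {"reduced": f"reduced({recon['recon']})"}

def share(reduced, participants):
    # Make the reduced product visible to local and remote participants.
    return {user: reduced for user in participants}

def automated_pipeline(raw, participants):
    # Data flow automatically: reconstruct -> reduce -> share,
    # so the user team leaves the facility with reduced data in hand.
    recon = reconstruct(raw)
    reduced = reduce_data(recon)
    return share(reduced, participants)

result = automated_pipeline("scan_001", ["local_user", "remote_user"])
```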
Globus Online as enabler
Facility data acquisition → Globus Online transfer service → Reduced data → Analysis/Sharing
(Globus Online sharing service; Globus Online dataset service*)
* In development
Credit: Kerstin Kleese-van Dam
Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL
Looking at how researchers use data
• A single research question often requires the integration of many data elements that are:
  – In different locations
  – In different formats (Excel, text, CDF, HDF, …)
  – Described in different ways
• Best grouping can vary during investigation
  – Longitudinal, vertical, cross-cutting
• But always needs to be operated on as a unit
  – Share, annotate, process, copy, archive, …
How do we manage data today?
• Often, a curious mix of ad hoc methods
  – Organize in directories using file and directory naming conventions
  – Capture status in README files, spreadsheets, notebooks
  – Even PowerPoint!
• Time-consuming, complex, error prone
Why can’t we manage our data like we manage our pictures and music?
Introducing the dataset
• Group data based on use, not location
  – Logical grouping to organize, reorganize, search, and describe usage
• Tag with characteristics that reflect content …
  – Capture as much existing information as we can
• … or to reflect current status in investigation
  – Stage of processing, provenance, validation, …
• Share data sets for collaboration
  – Control access to data and metadata
• Operate on datasets as units
  – Copy, export, analyze, tag, archive, …
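The dataset abstraction above can be sketched as a logical grouping by use rather than location; the `Dataset` class, its method names, and the example URLs are illustrative assumptions, not the service's interface.

```python
# Hypothetical sketch of the "dataset" abstraction: group by use,
# tag with content/status metadata, share, operate on as a unit.

class Dataset:
    def __init__(self, name):
        self.name = name
        self.members = []     # references to data left in place (URLs/paths)
        self.tags = {}        # characteristic or status -> value
        self.readers = set()  # collaborators granted access

    def add(self, location):
        self.members.append(location)      # group without moving the data

    def tag(self, name, value):
        self.tags[name] = value            # content or investigation status

    def share(self, user):
        self.readers.add(user)             # access control on the grouping

    def copy(self):
        # Operate on the dataset as a unit (here: duplicate the grouping).
        dup = Dataset(self.name + "-copy")
        dup.members = list(self.members)
        dup.tags = dict(self.tags)
        return dup

ds = Dataset("tomo-2013-06")
ds.add("gridftp://source.example/data/scan_001.h5")     # one location/format
ds.add("https://remote.example/data/render_001.vti")    # another
ds.tag("stage", "reconstructed")
ds.share("erin")
```

Because members are references, copying or sharing the dataset touches only metadata; the underlying files stay on their original storage systems.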
Expanding Globus Online services
• Ingest and publication
  – Imagine a DropBox that not only replicates, but also extracts metadata, catalogs, converts
• Cataloging
  – Virtual views of data based on user-defined and/or automatically extracted metadata
• Integration with computation
  – Associate computational procedures, orchestrate applications, catalog results, record provenance
Builds on catalog as a service
Approach:
• Hosted user-defined catalogs
• Based on tag model: <subject, name, value>
• Optional schema constraints
• Integrated with other Globus services
Three REST APIs:
• /query/ : retrieve subjects
• /tags/ : create, delete, retrieve tags
• /tagdef/ : create, delete, retrieve tag definitions
Builds on USC Tagfiler project (C. Kesselman et al.)
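A minimal in-memory sketch of the tag model and the three operation groups, with the REST routes modeled as methods; the `Catalog` class and its names are assumptions for illustration, not the actual service API.

```python
# Hypothetical in-memory model of the catalog's <subject, name, value>
# tag triples and its three API groups (/tagdef/, /tags/, /query/).

class Catalog:
    def __init__(self):
        self.tagdefs = {}   # tag name -> value type (optional schema)
        self.tags = []      # (subject, name, value) triples

    # /tagdef/ : create, delete, retrieve tag definitions
    def define_tag(self, name, value_type):
        self.tagdefs[name] = value_type

    # /tags/ : create, delete, retrieve tags
    def add_tag(self, subject, name, value):
        if name not in self.tagdefs:
            raise KeyError(f"undefined tag: {name}")
        self.tags.append((subject, name, value))

    # /query/ : retrieve subjects matching a tag constraint
    def query(self, name, value):
        return sorted({s for s, n, v in self.tags
                       if n == name and v == value})

cat = Catalog()
cat.define_tag("beamline", str)
cat.add_tag("scan_001", "beamline", "2-BM")
cat.add_tag("scan_002", "beamline", "32-ID-C")
cat.add_tag("scan_003", "beamline", "2-BM")
subjects = cat.query("beamline", "2-BM")   # subjects tagged 2-BM
```

Requiring a tag definition before tagging mirrors the "optional schema constraints" point: catalogs can enforce as much or as little structure as their owners want.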
Exemplar: APS Beamlines 32-ID & 2-BM
X-ray imaging, tomography, ~few µm to 30 nm resolution
Currently can generate up to 100 TB per day
< 1 GB/s data rate; ~3-5 GB/s in 5-10 years
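As a sanity check on the figures above, the peak 100 TB/day implies a sustained rate of roughly 1.2 GB/s, the same order as the quoted < 1 GB/s typical rate (peak days exceed the typical rate):

```python
# Sustained rate implied by the 100 TB/day peak figure,
# using decimal units (1 TB = 1e12 bytes, 1 GB = 1e9 bytes).
TB = 1e12
GB = 1e9
SECONDS_PER_DAY = 86_400

rate_gb_s = 100 * TB / SECONDS_PER_DAY / GB   # ~1.16 GB/s sustained
```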
Beamline 2-BM (~1.5 µm resolution; up to 100 fps, 2K x 2K, 16 bits; 11 GB raw data):
  Storage → Image processing (normalization, etc.) → Tomographic reconstruction → Visual inspection → Selection
Beamline 32-ID-C (20-50 nm resolution; 1,500 fps, 2K x 2K, 16 bits, 1 min readout; 11 GB raw data):
  Image processing (alignment, etc.) → Tomographic reconstruction → Visual inspection → Selection
Selected data from both beamlines → Multi-scale image fusion → Visual inspection
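The per-scan numbers above are back-of-envelope consistent, assuming 2K means 2048 pixels and decimal GB (1e9 bytes): a 16-bit 2K x 2K frame is ~8.4 MB, so 11 GB holds roughly 1,300 projections, which the 1,500 fps detector acquires in under a second.

```python
# Back-of-envelope check on the per-scan figures, assuming
# 2K = 2048 pixels per side and decimal GB (1e9 bytes).
width = height = 2048
bytes_per_pixel = 2                              # 16-bit pixels

frame_bytes = width * height * bytes_per_pixel   # ~8.4 MB per frame
frames_in_11gb = 11e9 / frame_bytes              # ~1,300 projections
seconds_at_1500fps = frames_in_11gb / 1500       # sub-second acquisition
```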
Multi-scale 3D imaging data fusion at APS
Argonne Collaborations
• APS Imaging Group
• APS Software Service Group
• Mathematics & Computer Science / Computation Institute
Activities: instrument & data collection; data management services; multi-scale image fusion; system integration (Tao of Fusion LDRD, Infrastructure LDRD)
Results: Google Earth-style zoom-in data navigation
Timelines
• July: alpha service available
• August: pilot with two groups at APS
• Fall of this year: pilot with a few other groups at APS; early beta
Thank You
• Interested in working with us on the dataset service? Email: [email protected]
• Contact: [email protected]
• Website: www.globusonline.org