1/29/2008 SAC Workshop - HSC
Data Analysis System of HSC
HSC-ANA
Hisanori Furusawa
Subaru Telescope, NAOJ
for the HSC development team
Contents
1. Concept and Goals for HSC-ANA
2. Our approaches for development of HSC-ANA
3. Plan for Development
1. HSC-ANA Concept and Goals
HSC Data Rate
• Suprime-Cam data rate = 160 MB/shot
• HSC (2 deg) = 176 CCDs = 2.8 GB/shot
• Data format is TBD
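The 2.8 GB/shot figure can be reproduced by scaling the Suprime-Cam volume per CCD. The sketch below assumes the 10-CCD Suprime-Cam mosaic quoted on the Suprime-Cam slide, the same data volume per CCD for HSC, and 1 GB = 1000 MB; the function name is illustrative only.

```python
# Figures quoted on the slides.
SUPRIME_CAM_MB_PER_SHOT = 160
SUPRIME_CAM_CCDS = 10
HSC_CCDS = 176


def hsc_shot_size_gb(n_ccds=HSC_CCDS):
    """Scale the Suprime-Cam per-CCD volume to an HSC-sized mosaic.

    Assumes each HSC CCD produces as much data as a Suprime-Cam CCD.
    """
    mb_per_ccd = SUPRIME_CAM_MB_PER_SHOT / SUPRIME_CAM_CCDS  # 16 MB per CCD
    return n_ccds * mb_per_ccd / 1000.0


print(f"{hsc_shot_size_gb():.1f} GB/shot")
```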
Difficulty in Quality Control
Even for the current observations, quality control is difficult.
Problems: it is difficult for observers
• to make their observing plan
• to trace their analysis
• to evaluate their results
• to correct or update their results
[Diagram: Data → Analysis → Scientific Results]
• Many trials and errors by human (possibly subjective) interactions
• For more massive HSC data: ? → Objective Automated Data Analysis System
HSC-ANA Goals
1. Maximizing Science Outputs
• Provide guaranteed calibrated data
• Immediate release of best-effort catalogs upon users' requests at any time
2. Quality Control of Key Survey Data
• Achieves uniform data quality in long-term survey programs
• On-site quality assurance tool (seeing, transparency, etc.) → Database/FITS headers
• Traceable analysis by appropriate Database
3. Useful Archive Data
• Frame selection by quality information attached to data frames
Framework Middleware
HSC-ANA Team in Japan
H. Aihara, T. Uchida (U-Tokyo)
H. Furusawa, S. Miyazaki, Y. Komiyama (NAOJ)
M. Tanaka, Y. Yasu, S. Yamagata, R. Itoh, S., N. Katayama (KEK - high energy accelerator research organization)
2. Our Approaches
HSC-ANA Framework
[Diagram: HSC-ANA framework]
• Components: Data Retriever, Control, Analysis, User Interface, DB, House Keeping, Watchdog; Master (scalable)
• Data source: HSC archive (or OBCP) or STARS
• Flows: command flow; data flow (image, status, catalog)
• Network: Gigabit Ethernet?
• Operators or users
• Input: data frames, analysis configurations
• Output: results (images, catalogs, plots, etc.), status
System Components
1. Analysis Part
• Analysis programs
• Error handling → robust system
2. Quality Assurance Tool (QA)
• Extracting and registering quality information
3. Framework
• Interfacing analysis tasks
• Database, process distribution
4. Other Components
• Data retrieval mechanism from archive
• U/Is
• Watchdog
Component 1. Analysis Part – main pipelines
• 1st-stage analysis
– 1st a: Data reduction, removing instrumental characteristics (bias subtraction, sky subtraction)
– 1st b: Mosaicking
• 2nd-stage analysis
– Photometric/astrometric calibrations
– Object extraction and catalog creation
Developing analysis tasks by dividing the entire procedure
Pipeline 1st a (CCD-by-CCD)
・ Overscan
・ Flat making
・ Flat fielding
・ Distortion correction
・ PSF match
・ Catalog for each chip
Pipeline 1st b (Pointing-by-Pointing)
・ Mosaicking
・ Sky sub
・ Stacking
・ Deep catalog
Pipeline 1st stage
In simulation server
• Preprocessing
• Distributing CCD data
• Database registration
To exploit the capacity of the computing system and to reduce the bottleneck of pipelines:
Distribute data & processes
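A minimal sketch of the idea of distributing CCD-by-CCD work before the pointing-level stage. It uses Python's standard thread pool as a stand-in for the real process distribution, which the slides do not specify; `reduce_ccd`, `run_stage_1a`, and `run_stage_1b` are hypothetical names and the reduction itself is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_ccd(ccd_id):
    """Placeholder for the per-CCD 1st-a steps (overscan, flat fielding, ...)."""
    return {"ccd": ccd_id, "status": "done"}


def run_stage_1a(ccd_ids, max_workers=4):
    """Run the CCD-by-CCD stage in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(reduce_ccd, ccd_ids))


def run_stage_1b(ccd_results):
    """Placeholder for the pointing-level stage (mosaicking, stacking).

    It only starts once every CCD of the pointing has been reduced.
    """
    failed = [r["ccd"] for r in ccd_results if r["status"] != "done"]
    if failed:
        raise RuntimeError(f"CCDs not reduced: {failed}")
    return {"mosaic_of": [r["ccd"] for r in ccd_results]}


results = run_stage_1a(range(10))      # e.g. the 10 Suprime-Cam CCDs
mosaic = run_stage_1b(results)
```

In a real system the workers would be separate analysis servers rather than threads; the point here is only the split between independent per-CCD tasks and the stage that has to wait for all of them.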
Component 2. Quality Assurance (QA) Tool
• Obtains quality information for each data frame on a semi-real-time basis → DB and FITS header
[Figure labels: Transparency (Skyprobe @ CFHT); Seeing (@ UKIRT); Achieved S/N; Quick coadded images; Quick-look; Observing plan]
- Uniform survey data
- Service/Queue obs.
- Time variation analysis
Component 3. Framework
• Provides interfacing for analysis commands: constructing, executing, and monitoring pipelines
• Communicates with Databases
• Downloads requested data frames
• Distributes processes
• Provides User Interfaces to each system component
Organizes the whole system
3. Our Development Plan
Prototyping: Zero-version HSC-ANA System
1. Implements a minimal pipeline for S-Cam data
• 1a, 1b, and 2nd-stage analysis + on-site QA tools provided to observers
2. Figures out bottlenecks & potential problems
• Large inputs of archive Suprime-Cam data
• Science = critical evaluation
3. Evaluates framework middleware
• R&D Chain Management (RCM), BASF+ROOT (@KEK)
Evaluation through autumn 2008 → Development of full HSC-ANA system
RCM Pipelines Management
Framework of Analysis Pipelines
[Diagram labels]
• https, XML-U/I
• Record histories and search
• XML-DB: annotation, command log
• DB request, pipeline, search targets
• Database, Backup/Restore
• Workflow
• Load distribution: control server, optimal-assignments file, analysis servers with status (normal, medium, busy, disk full), priority
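The slide does not describe how RCM computes its assignments. The sketch below is only one plausible reading of the diagram labels: servers reporting a full disk are skipped, less loaded servers are preferred, and higher-priority tasks are placed first. The status ranking, the round-robin placement, and all names are assumptions.

```python
# Hypothetical ranking of the server states shown in the diagram.
STATUS_RANK = {"normal": 0, "medium": 1, "busy": 2}


def assign_tasks(tasks, servers):
    """Assign tasks to analysis servers.

    tasks:   list of (task_name, priority); a larger priority is placed first.
    servers: dict of server_name -> status string.
    Servers whose status is not in STATUS_RANK (e.g. "diskfull") are excluded.
    Returns a dict of task_name -> server_name.
    """
    usable = sorted(
        (name for name, status in servers.items() if status in STATUS_RANK),
        key=lambda name: (STATUS_RANK[servers[name]], name),
    )
    if not usable:
        raise RuntimeError("no usable analysis server")
    ordered = sorted(tasks, key=lambda task: -task[1])
    # Round-robin over the usable servers, least loaded first.
    return {name: usable[i % len(usable)] for i, (name, _) in enumerate(ordered)}


servers = {"ana1": "busy", "ana2": "normal", "ana3": "diskfull", "ana4": "medium"}
tasks = [("flat", 1), ("overscan", 5), ("coadd", 3)]
print(assign_tasks(tasks, servers))
```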
Login To RCM System
Data accessibility can be controlled by accounts and groups
Workflow of HSC-ANA
Overscan Subtraction
Coadding
Pipeline Execution
Keyword list recorded in DB
Configuration of parameters
Frame search by keywords
Viewing Results
XML-based Summary Viewer - hierarchical structure
Thumbnails of processed images
DS9 is invoked if needed
Some evaluation results and statistical information for processed data will be added
Retry or proceed
Prototyping Quality Assurance Tools for Suprime-Cam
• Application to the Suprime-Cam observations
• As a part of the HSC-ANA (2nd-stage analysis and QA)
• Important project for the observatory: a high-priority project (as a SS)
Functions in SC QA
A) Quick-look & Quality-check Assistance
B) Observation Planning
C) Quality Assessment for Long-term Data
D) Quality Control for Survey Project Data
A. Quick-Look & Quality-Check Assistance
Overview of QL&QC Assist
1. Quick Reduction
   1. Bias sub
   2. (Flatfielding)
   3. Coarse astrometry
2. Statistics
   1. Seeing
   2. Focusing
   3. Read noise level
   4. Background level
3. Photometry & Transmission
   1. Standard / ref. stars
   2. Relative transmission
4. Depth (limiting mag)
   1. Image co-adding
   2. Noise statistics
5. Observing Logs
[Diagram labels: Suprime-Cam, OBCP (DAQ), QA Servers, DB, ANA, A-LAN machine]
Roadmap
[Gantt chart: 2008 (months 1-12) to 2009 (months 1-5)]
Prototyping HSC-ANA
• 1st-stage analysis
• 2nd-stage analysis
• Evaluation with massive data input
On-site quality assurance tool
• Prototype implementation
• Evaluation with real data
Full HSC-ANA system development
Summary
1. For massive HSC data, an objective automated data analysis system is needed
2. Conceptual design of the HSC-ANA is underway
3. Prototyping of the HSC-ANA based on the RCM middleware is ongoing and will be evaluated this year
• QA system developed and tested this year
4. Inputs from observers and the science community
• Consultation/collaboration with experienced domestic and international groups
Thank you.
RCM Workflow (= pipelines)
User Login
Pipeline 1a
Entry in the Database
Pipeline 1b
Pipeline 2
Search / Monitor / Check
Notes:
• Task (chip-by-chip, shell-by-shell, etc.): parallel processing can be done in the RCM framework
• Algorithm TBD: parallel processing architecture will be discussed and developed
• Calibrations and catalog making: under development
[Arrow labels between steps: Move on]
A. Quick-Look & Quality-Check Assist
1. Quick reduction of data frames (FITS validation, bias sub, flatfielding, medium-precision astrometry for photometric and stacking analysis)
2. Statistical values ingested into the Database (seeing FWHM, focusing, noise level in overscan, sky background level)
Statistical values
1. Seeing
2. Focusing
3. Read noise level
4. Background level
[Diagram labels: Automated focusing; DB; Quality check and assessment; Parameters in the next stages]
A. Quick Look & Quality Check Assist
3. Photometry & Relative Sky Transmission
• Photometric analysis of standard stars → registered in Database
• Relative photometry among frames during the night
[Plot: Attenuation (mag) for Shot 1, Shot 2, Shot 3; zeropoints if available]
QA Database
Trace same objects → time-to-time variation
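Tracing the same objects from shot to shot can be illustrated with a small sketch. It assumes instrumental magnitudes are already measured per object and uses the median offset as the attenuation estimate; the slides do not state which estimator the QA tool uses, and the function names are hypothetical.

```python
from statistics import median


def relative_attenuation(reference_mags, shot_mags):
    """Median magnitude offset of a shot relative to a reference shot.

    Both arguments map object_id -> instrumental magnitude.
    A positive value means the objects look fainter (more attenuation).
    """
    common = reference_mags.keys() & shot_mags.keys()
    if not common:
        raise ValueError("no objects in common between the two shots")
    return median(shot_mags[k] - reference_mags[k] for k in common)


def transmission_ratio(attenuation_mag):
    """Convert an attenuation in magnitudes into a flux ratio."""
    return 10 ** (-0.4 * attenuation_mag)


shot1 = {"a": 18.0, "b": 19.5, "c": 20.0}
shot2 = {"a": 18.25, "b": 19.75, "c": 20.25, "d": 21.0}
att = relative_attenuation(shot1, shot2)
print(att, transmission_ratio(att))
```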
A. Quick Look & Quality Check Assist
4. Estimating Limiting Magnitude
• A particular area of the images that meets the user's query is co-added to estimate the depth attained up to that time
• Suggests necessary exposure times and an observing plan to achieve the target depth
User input (query):
1. Filters
2. Seeing
3. Transmittance
4. Background level
5. Coords, fields
6. Magnitudes
Analysis server (with DB): mosaic, stacking → sky noise stat. → limiting mag.
Output to on-site users:
Target: 26.2 mag, S/N: 3.0
--------------------------
Now: 25.5 mag, S/N: 3.0
--------------------------
Exptime to be done: 2500 sec
Recommended Plan: 630 sec x 4 shots
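How the remaining exposure time might follow from the achieved and target depths can be sketched with the usual sky-background-limited scaling, where S/N grows as the square root of exposure time, so the limiting magnitude deepens by 1.25 log10 of the exposure-time ratio. This scaling is an assumption, not something stated on the slide, and the 950 sec of accumulated exposure used below is a hypothetical value chosen because it roughly reproduces the slide's numbers.

```python
import math


def remaining_exptime(current_depth, target_depth, exptime_done):
    """Extra exposure time (sec) needed to reach target_depth.

    Assumes background-limited imaging at fixed S/N:
        depth gain = 1.25 * log10(t_total / t_done)
    """
    gain = target_depth - current_depth
    total = exptime_done * 10 ** (gain / 1.25)
    return max(0.0, total - exptime_done)


def split_into_shots(remaining, shot_length):
    """Number of shots of shot_length seconds that cover the remaining time."""
    return math.ceil(remaining / shot_length)


todo = remaining_exptime(25.5, 26.2, exptime_done=950.0)  # close to 2500 sec
print(round(todo), "sec in", split_into_shots(todo, 630), "shots of 630 sec")
```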
A. Quick-Look & Quality-Check Assist
5. Observing Log
• Obtains information on the data frames from the FITS header and other environmental status
• Adds users' comments and stores the logs in the database for each frame or shot

HST       NAME       EXP-ID        OBJECT    FILTER01  EXPTIME  …  SKY    SEEING  TRANSP  RONOISE  WEATHER
05:57:58  object000  SUPE00555290  DOMEFLAT  W-J-B     10.0        12142  N/A     N/A     10.5     Clear
18:43:52  bias000    SUPE00555300  BIAS      W-J-B     0.0         0      N/A     N/A     11.0     Clear
19:52:14  object001  SUPE00555310  SA107     W-J-B     5.0         8205   N/A     1.00    10.2     Clear
20:00:05  object002  SUPE00999980  SXDS_1    W-J-B     900.0       4873   0.75    0.97    10.8     Clear
20:17:10  object003  SUPE00999990  SXDS_1    W-J-B     900.0       4911   0.73    0.95    10.5     Clear

[Diagram labels: User → Submit → DB; QA servers; Analysis pipelines; Obs. planning, quality assessment; Survey data quality control]
Legend: QA will add / Already being provided
B. Observation Planning
Procedure of Observation Planning
• Quality, depth check
• Target depth, filter, field
• Generate obs. plan
• Editing by observers
• Generate Obs. Proc Script (OPE)
B. Observation Planning
1. Assists planning observations based on the sky transmission, limiting mag achieved, target visibility, etc.

Step 1. Check achieved depth etc.
Target: 26.2 mag, S/N: 3.0
--------------------------
Now: 25.5 mag, S/N: 3.0
--------------------------
Exptime to be done: 2500 sec
Recommended Plan: 630 sec x 4 shots

Step 2. Generate observing plan
HST    OBJECT FILTER                             (AZ,EL)
------------------------------------------------------------------------------
23:54  STD:PG1633 in W-S-Z+  3(sec) x 1(shot)    (-70, 31)  - 2 min
23:56  ==>W-J-B                                  (-70, 46)  - 5 min
24:01  SXDS_1 in W-J-B  630(sec) x 4(shot)       (+15, 55)  - 46 min
24:47  Slew                                                 - 2 min
24:49  SXDS_2 in W-J-B  600(sec) x 5(shot)       (+82, 48)  - 55 min
------------------------------------------------------------------------------
B. Observation Planning
2. Generate Observing Procedure Scripts based on the observing plan and users' inputs

Step 1. Automatic generation of obs. plan
23:54  STD:PG1633 in W-S-Z+  3(sec) x 1(shot)    - 2 min
23:56  ==>W-J-B                                  - 5 min
24:01  SXDS_1 in W-J-B  630(sec) x 4(shot)       - 46 min
24:47  Slew                                      - 2 min
24:49  SXDS_2 in W-J-B  600(sec) x 5(shot)       - 55 min
→ Submit
Step 2. Inputs and editing by observers
→ Submit
Step 3. OCS Proc Script
Co-working with science community
• Analysis procedure linked to science objectives
• Output data format, catalog format
• Acceptable uncertainties in photometry, astrometry, & object parameters
[Diagram]
• Science Community: Science Objectives → Survey Design
• HSC-ANA development: Target Data Products → Analysis Procedure, Algorithms
• Satisfactory Result
Development Hardware Environment
1st Setup: Simulation server; CNT server; WEB server; DB server
2nd Setup: Simulation server; WEB server, CNT server, DB server
Machine specifications shown:
• CPU: Intel Xeon dual-core 1.8 GHz x 2, MEM: 2 GB, HD: 500 GB
• CPU: Intel Xeon quad-core 2.6 GHz, MEM: 2 GB, HD: 500 GB
• CPU: AMD dual-core Opteron 2.8 GHz x 2, MEM: 16 GB, HD: 250 GB
Locations: KEK/Hilo, KEK
Wide-field Imager – Suprime-Cam
• 10 CCDs: MIT/LL 2,048 x 4,096
• Rate = 160 MB/shot → good for test data
• FoV = 34' x 27'
8.2 m Subaru Telescope
Strong capability of wide-field & deep imaging: AΩ = 13.17, e.g., Megacam (9.59), SDSS (22.99), HSC (162)
Standard Reduction Procedure
For each chip:
1. Subtraction of bias (based on overscan region)
2. Making flat frames (objects, domeflat, twilight)
3. Flatfielding
4. Masking or removing cosmic rays, bad pixels
5. Distortion correction based on a formula
6. Equalization of PSF among frames
7. Sky background subtraction
Mosaicking:
• Pattern matching
• Determination of offset (dX, dY, dtheta) and flux scaling
• CoAdd, stacking
Remarks:
1. Works well for most extragalactic objects
2. Not optimized for a large data input or very wide-field surveys
3. Critical parts are handled by users (frame selection, result check, calibration)
Error Handling
[Diagram]
• Analysis pipeline: task1 → check1 → task2 → check2 → task3 → check3
• Control Server: invokes processes
• Synchronous check / asynchronous check
• Database: maintains analysis histories
• Retry
• Failure → alert notification, U/I
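The task/check/retry pattern in the diagram can be sketched as below. This is a minimal illustration of synchronous checks only, with an in-memory list standing in for the analysis history kept in the database; the slide marks error handling as still to be implemented, so the retry policy and every name here are assumptions.

```python
def run_pipeline(steps, max_retries=2, notify=print):
    """Run (name, task, check) steps in order with a check after each task.

    task()        returns a result.
    check(result) returns True when the result is acceptable.
    A failed step is retried up to max_retries times; if it still fails,
    an alert is sent through notify and the pipeline stops.
    Returns the history as a list of (name, attempt, ok) tuples.
    """
    history = []
    for name, task, check in steps:
        for attempt in range(1, max_retries + 2):
            try:
                ok = bool(check(task()))
            except Exception:
                ok = False  # a crashing task counts as a failed attempt
            history.append((name, attempt, ok))
            if ok:
                break
        else:
            # Every attempt failed: raise an alert and stop the pipeline.
            notify(f"ALERT: {name} failed after {attempt} attempts")
            break
    return history
```

A caller could pass `notify=alerts.append` to collect alerts in a list instead of printing them, and write the returned history to the database.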
Components and Status

Component              Status  Notes
1st-a analysis         △~○     Dedicated tools
1st-b analysis         △       Discussing tools and algorithms
2nd-stage analysis     ×~△     Implementing for the prototyping system on priority; catalog format under discussion
On-site QA             △       Under development on priority
Database structure     △       Discussing meta-data formats
Error handling         ×       To be implemented
Framework              △       Evaluating a middleware
U/Is                   △       Provided by the framework; quick look TBD
Archive and retrieval  ×~△     To be consulted with sys admin
Distributed Processing
To exploit the capacity of the computing system and reduce the bottleneck of pipelines
Software Layers
• Pursuing the possibility of sharing technologies between upcoming big projects in NAOJ and KEK
[Layer diagram labels: RCM; LSF/NQS; BASF, dBASF]
Goals with Quality Assurance (QA)
1. Assists observers in quick-looking at data
• Reasonable quality evaluation of each data frame on a semi-real-time, automatic basis
2. Outputs results available to observers for observation planning (# of shots, exptime, bands)
3. Searches for and retrieves necessary archived images and meta-data by connecting to the Database
• Performs quality assessment (zeropoint, flatfield)
4. Provides quality-controlled data products in long-lasting surveys (S/N per pointing, filters)
Controls quality of data and makes observations more efficient
A. Quick Look & Quality Check Assist
• Assists quality checking through interactive operations with FITS viewers (Zview, ds9)
1. Photometry/Transmit.
2. Stacking
3. Achieved depth
C. Quality Assessment for Long-term Dataset
• Inspects time-to-time variation of characteristics or particular parameters of existing data
• Monitors the system health and secures uniform quality of data products
[Diagram labels: ANA; QA Servers; DB; Sends query, requests assessment; Results; Stores quality meta-data for old data frames]
C. Quality Assessment
• Target Analyses (TBD)
1. Search and retrieval of particular archived data
2. Time-to-time variation of
- Flat patterns
- Readout noise
- System throughputs and response functions
3. Obtain stacked images and achieved depths for a particular range of frames
D. Survey Quality Control
• Working together with QC and Observation Planning, keeps track of achievements in a survey project for multiple pointings, filters, etc.
• Provides efficient operations of gigantic programs and service/queue observations
[Diagram labels: DB; QA Servers]
Survey Targets:
1. Field
2. Filter
3. Depth
4. Area
Summary output of achievement
Estimated exposures to be done → input to Observation Planning
D. Survey Quality Control
• An example of the outputs from the QA system
– An achievement summary of a certain virtual survey
[Diagram labels: DB; QA server]
Target example:
1. SXDS
2. W-J-B
3. 28.3 mag AB (3 sigma)
4. 5 FoVs