John Cunniffe Dunsink Observatory Dublin Institute for Advanced Studies Evert Meurs (Dunsink...

John Cunniffe

Dunsink Observatory

Dublin Institute for Advanced Studies

Evert Meurs (Dunsink Observatory)

Aaron Golden (NUI Galway)

Aus VO 18/11/03

Efficient X-ray Data Mining

2

Once you make doing science with your VO service easy,

everyone will want to use your server.

Analogous to oversubscribed observatory time

- how do users successfully ‘compete’ for query time?

- query modelling in a proposal?

Need data simulators/previewers to run the query on, and/or a data subset for a test run.

3

Future X-ray missions

Current missions - XMM/Chandra/RXTE

- download data (typ. few GB/pointing)

- processed on local machine

XEUS, Constellation-X, Astro-E2, etc

- very large data sets (few 100 GB/pointing)

- online data processing

Proposed framework involves users submitting web-based requests for processing pipelines

- derived data products very important: source catalogues, images, spectra, lightcurves, etc

4

Efficient X-ray Data Mining

Efficient - Don’t want to reprocess the data archive unless really needed

– maximise use of metadata

X-ray - Data processing pipelines more complicated (than e.g. optical)

- Treatment of faint sources/sky background statistically complex

- Instrument response complex

(not exclusive to X-ray)

Data Mining - Interested in the sources found in the data but also in the context (i.e. why we found them in that selection)

Not simply interested in finding objects through cone searches and stopping there.

5

Science Use Case

Interested in variable/transient X-ray objects

short-term: e.g. flare stars (~1 dataset)

long-term: e.g. variability of normal/active galaxies (multi-dataset)

Current approach:

• use http-get scripts to HEASARC - create cross-correlated source catalogues

• where known objects are not present in a catalogue

– retrieve original dataset & calculate upper flux limit (expensive)

N.B. if the source catalogue was generated from the whole data archive then we may need to re-analyse a significant fraction of it.
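The upper flux limit step above can be illustrated with a minimal sketch. It assumes a classical Poisson upper limit on the source counts in a detection cell, given the observed counts and an expected background, and a caller-supplied energy conversion factor. The function names, the 95% default confidence, and the `ecf` parameter are illustrative assumptions, not the method used in the talk.

```python
import math

def poisson_cdf(n_obs: int, mean: float) -> float:
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total

def upper_limit_counts(n_obs: int, background: float,
                       confidence: float = 0.95) -> float:
    """Source mean s at which P(N <= n_obs | s + background) = 1 - confidence.

    Found by bisection; returns ~0 if the background alone already
    makes n_obs or fewer counts that unlikely.
    """
    tail = 1.0 - confidence
    lo, hi = 0.0, 10.0
    # Grow the bracket until the upper end is excluded.
    while poisson_cdf(n_obs, hi + background) > tail:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + background) > tail:
            lo = mid
        else:
            hi = mid
    return hi

def upper_limit_flux(n_obs: int, background: float, exposure_s: float,
                     ecf: float, confidence: float = 0.95) -> float:
    """Counts limit -> count rate -> flux.

    ecf is a hypothetical energy conversion factor (flux per count/s);
    in practice it depends on NH and the assumed source spectrum.
    """
    return upper_limit_counts(n_obs, background, confidence) / exposure_s * ecf
```

For zero observed counts and zero background the 95% limit is the familiar value of about 3 counts. The sketch is only intended for the small count numbers typical of faint or undetected sources.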

To understand space density/flaring rate/etc of populations in the catalogues we need to know the volume of space covered by archive:

area coverage (RA, dec)

temporal coverage (t1,t2,...,ti)

spectral (Energy)

flux limit
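As a rough illustration of how area coverage and flux limit combine into a volume, the sketch below sums the volume in which a source of a given luminosity would be detectable over a set of sky cells. It assumes Euclidean space, no absorption, and no time dependence. The cell representation and function names are hypothetical.

```python
import math

CM_PER_MPC = 3.0857e24  # centimetres per megaparsec

def max_distance_mpc(luminosity_erg_s: float, flux_limit_cgs: float) -> float:
    """Distance at which a source of this luminosity drops to the flux limit."""
    d_cm = math.sqrt(luminosity_erg_s / (4.0 * math.pi * flux_limit_cgs))
    return d_cm / CM_PER_MPC

def survey_volume_mpc3(cells, luminosity_erg_s: float) -> float:
    """Total searchable volume in Mpc^3.

    cells: iterable of (solid_angle_sr, flux_limit_cgs) pairs, one per
    sky cell of the coverage/sensitivity model. Each cell contributes a
    cone of volume (solid_angle / 3) * d_max^3.
    """
    total = 0.0
    for solid_angle_sr, flux_limit_cgs in cells:
        d = max_distance_mpc(luminosity_erg_s, flux_limit_cgs)
        total += solid_angle_sr / 3.0 * d ** 3
    return total
```

Because the volume scales as the flux limit to the power -3/2, a cell that is four times deeper contributes eight times the volume, which is why a non-uniform sensitivity map matters for space densities.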

6

ROSAT All-Sky Survey

Duration: 1990 June - 1991 Jan

E = 0.1 - 2.4 keV

RASS-BSC (Bright Source Catalogue)

RASS-FSC (Faint Source Catalogue)

Selection Criteria       BSC              FSC
Count Rate               > 0.05/sec (BSC)
Probability (MaxLik)     15 (~5)          7 (~3)
N(photons)               15               6
Accepted Sources         18,811           105,924

NB: Catalogues have non-uniform sky coverage & sensitivity.
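The selection criteria in the table can be written down in a machine-readable form, sketched below. The sketch assumes that the likelihood and photon-number values are inclusive minimum thresholds, and that the count-rate cut applies to the BSC only, as the table appears to indicate. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SelectionCriteria:
    name: str
    min_likelihood: float                    # maximum-likelihood detection value
    min_photons: int                         # source photons
    min_count_rate: Optional[float] = None   # counts/s; None = no cut listed

# Values taken from the table; their interpretation is an assumption.
RASS_BSC = SelectionCriteria("RASS-BSC", 15, 15, 0.05)
RASS_FSC = SelectionCriteria("RASS-FSC", 7, 6)

def passes(criteria: SelectionCriteria, likelihood: float,
           photons: int, count_rate: float) -> bool:
    """True if a detection satisfies every listed threshold."""
    if likelihood < criteria.min_likelihood:
        return False
    if photons < criteria.min_photons:
        return False
    if criteria.min_count_rate is not None and count_rate <= criteria.min_count_rate:
        return False
    return True
```

A detection with likelihood 10, 10 photons and 0.02 counts/s would fail the BSC cuts but pass the FSC cuts under these assumptions.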

7

Regions with different sensitivity included in same source catalogues.

c.f. XMM-Serendipitous Source Cat (created from pointed mode observations with different exposure times & instrument modes)

Need a good coverage/sensitivity model of the data archive to understand volume of space contained in source catalogue.

[Figure: 66 binned image of RASS data set, showing survey depth]

8

Model Method 1: Upper Limit predictor

Combine:

Instrument model (ARFs, PSFs, modes, ...)

Exposure time .... (0-30 ksec)

NH information, .…

… source spectral model, ....

create a high resolution flux limit map of the RASS sky …. ….. in progress.
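A heavily simplified sketch of such a predictor is given below. It assumes the flux limit in a sky cell is set only by a minimum number of source photons, the exposure time, and an energy conversion factor that folds in the instrument response, NH, and the assumed spectrum. Background and PSF effects are ignored, and all names and the example conversion factor are hypothetical.

```python
import math

def predicted_flux_limit(exposure_s: float, min_photons: float,
                         ecf: float) -> float:
    """Flux at which a source would yield min_photons in this exposure.

    ecf: hypothetical energy conversion factor (flux per count/s) for
    the cell. Cells with no exposure have no sensitivity.
    """
    if exposure_s <= 0.0:
        return math.inf
    return min_photons / exposure_s * ecf

def flux_limit_map(exposure_map, min_photons: float, ecf_map):
    """Apply the predictor cell by cell to 2-D lists of equal shape."""
    return [
        [predicted_flux_limit(t, min_photons, e) for t, e in zip(row_t, row_e)]
        for row_t, row_e in zip(exposure_map, ecf_map)
    ]
```

With a 500 s exposure, a 15-photon threshold and an illustrative conversion factor of 1e-11, the predicted limit is 3e-13 in the corresponding flux units.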

9

Model Method 2: Upper limit flux tabulation

Reprocess the data archive and determine the upper limit statistics from the photon data directly

… combine with ….

NH information, .…

… source spectral model, ....

create a high resolution flux limit map of the RASS sky …. ….. in progress.
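The tabulation approach can be sketched as binning the photon list into sky cells and then applying an upper-limit statistic to each cell. The sketch below assumes simple RA/Dec cells (not equal-area) and leaves the statistic as a caller-supplied function. All names are hypothetical.

```python
import math
from collections import Counter

def cell_index(ra_deg: float, dec_deg: float, cell_deg: float):
    """Integer (column, row) of the RA/Dec cell containing a position."""
    col = int(math.floor((ra_deg % 360.0) / cell_deg))
    row = int(math.floor((dec_deg + 90.0) / cell_deg))
    return (col, row)

def tabulate_counts(photons, cell_deg: float) -> Counter:
    """Count photons per cell; photons is an iterable of (ra_deg, dec_deg)."""
    return Counter(cell_index(ra, dec, cell_deg) for ra, dec in photons)

def upper_limit_map(counts, exposure_s: float, limit_fn):
    """Count-rate upper limit per cell.

    limit_fn maps observed counts to an upper limit on counts; the
    choice of statistic is left to the caller.
    """
    return {cell: limit_fn(n) / exposure_s for cell, n in counts.items()}
```

Cells with no photons do not appear in the counter, so a full map would also need to iterate over the empty cells, which is where upper limits are most often required.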

10

Results in a sensitivity map of the RASS sky

- adds usefulness to the source catalogue

Doing this with RASS is straightforward (though not quick) as the total data archive is a few 10s of GB.

Doing it for future observatories will have to be done on the archive curator’s server

11

The role of Archive/Source Catalogue Metadata

Data Archive

X-ray photon lists/ancillary instrument data

Computationally expensive to reprocess

Selection Criteria

Source Catalogue

How should contents (not parameters) of a source catalogue best be described in the metadata?

- why are the sources in it - in it?

- describe the selection criteria

12

Flux limit maps, limiting magnitude calculators, observation simulators …..

VO Data Model?

“These are an integral part of the sensitivity/coverage description”

Enhance the metadata (face larger metadata)

Theory?

“This is really telescope simulation”

Build separate model/simulator

13

Other wavebands

Similar challenges in other wavebands.

Complex coverage and sensitivity descriptions plus catalogue selection criteria.

How many brown dwarfs are there?

In general, how much data description should go in the metadata and how much should be left in secondary resources?

14

Final Questions.

How big (Kbytes) should data archive metadata be?

– Should it include preview data (e.g. ‘large’ FITS files)?

– Should selection criteria be described in the metadata (or simply a reference to the original publication)?

– Provide partially reduced or preview data as externally held addendum to the metadata?

• Much bigger than standard metadata

• Much smaller than whole archive

– What other tools are needed to allow astronomers to

• assess usefulness of,

• justify to Time Allocation Committees

large proposals/queries in a VO context?
