Organizing Data: A Challenge or just a Bureaucratic Task?
João Luiz Kohl Moreira
Observatório Nacional - MCT
Oct-2012
João Luiz Kohl Moreira Organizing Data
Area of Interest
Software for astronomy
Astronomical Data Analysis Software and Systems (ADASS) - Conferences since 1991.
Is it science?
What is science?
Popper:
1 Collect data (observing)
2 Taxonomy
  1 Organize data
  2 Associate pieces of data with each other
3 Model & Theory
Myself
Doing science means finding paradoxes
Where is it being done?
USA
NOAO: National Optical Astronomy Observatory - Tucson, AZ
Manage KPO and CTIO and develop an integrated analysis facility: IRAF - Image Reduction and Analysis Facility
STScI: Space Telescope Science Institute
Operate scientific data from the Hubble Telescope and develop an integrated analysis facility: STSDAS - Space Telescope Science Data Analysis System
Illinois, Princeton, Caltech, SAO ...
CANADA: CDC
EUROPE
Strasbourg
Munich
UK
Virtual Observatory: Who does it?
IVOA: International Virtual Observatory Alliance: 19 members
Brazil: BRAVO
Data Handling
Data:
Data generated every day in astronomy is measured in terabytes
Moore’s law for astronomy: data volume and data rate double every 1.5 yr
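To make the doubling rule concrete, here is a minimal sketch of the arithmetic it implies: after t years the volume has grown by a factor of 2^(t/1.5). The function names and the 1 TB starting volume are illustrative assumptions, not figures from the talk.

```python
# Sketch of the "astronomical Moore's law" arithmetic: growth = 2 ** (t / T),
# where T = 1.5 yr is the doubling time quoted on the slide.
DOUBLING_TIME_YR = 1.5


def growth_factor(years: float, doubling_time: float = DOUBLING_TIME_YR) -> float:
    """Return the factor by which data volume grows after `years`."""
    return 2.0 ** (years / doubling_time)


def projected_volume(initial_tb: float, years: float) -> float:
    """Project a data volume (in TB) forward by `years`.

    `initial_tb` is a hypothetical starting volume chosen by the caller.
    """
    return initial_tb * growth_factor(years)


if __name__ == "__main__":
    # 1 TB is an arbitrary starting point used only for illustration.
    for years in (1.5, 3.0, 6.0, 15.0):
        print(f"after {years:4.1f} yr: x{growth_factor(years):.0f}, "
              f"{projected_volume(1.0, years):.0f} TB")
```

Fifteen years at this rate is ten doublings, so a factor of 2^10 = 1024: an archive sized for today's data is three orders of magnitude too small a decade and a half later.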
Astronomical Moore’s Law
Data Analysis
Who does it?
What for?
Data mining
V.O.
V.O. Operation Schema
Raw data
Registry Centers
Custom Apps
V.O.
CosmoBook
Some Facilities (to be) available on CosmoBook
1 2DPHOT (La Barbera et al., 2008)
2 Solar V.O. (A. Valio, U. Mackenzie, SP)
3 IRAF Work-flow (CEFET, Rio)
4 GMA V.O. (F. Ferrari, FURG)
Social Network
2DPHOT
La Barbera, F. et al. (2008), PASP 120, 681-702
Data administration
“Astronomer” data −→ DATABASE −→ Custom
Astronomers organize data according to their specific rules (conceptual rules)
Database is organized according to specific rules (structured data rules)
Two problems
DATABASE −→ Custom
Astronomical data −→ DATABASE
Why is it problematic?
Astronomers are concerned about science, the comprehension of the universe
Database designers are concerned about the integrity of data
Data −→ Custom
Database structured tables
Database views
Views
Database tables:

Bands
idBand  Value
1       V
2       B
3       r
4       i

Mags
idObj   idBand  Value
17711   1       12.7
17728   1       13.9
17711   3       16.5
17711   4       19.2
17728   3       17.1
17711   2       16.9

Astronomical view:

idObj   mV     mB     mr     mi
17711   12.7   16.9   16.5   19.2
17728   13.9          17.1
...     ...    ...    ...    ...
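The mapping from the structured Bands and Mags tables to the one-row-per-object astronomical layout can be sketched as a database view. The example below is only an illustration using Python's standard sqlite3 module: the view name AstroMags and the choice of SQLite are assumptions, while the table names, columns, and values are the ones shown on the slide.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create the two structured tables from the slide plus a pivoting view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Bands (
            idBand INTEGER PRIMARY KEY,
            Value  TEXT NOT NULL
        );
        CREATE TABLE Mags (
            idObj  INTEGER NOT NULL,
            idBand INTEGER NOT NULL REFERENCES Bands(idBand),
            Value  REAL NOT NULL,
            PRIMARY KEY (idObj, idBand)
        );
        INSERT INTO Bands VALUES (1, 'V'), (2, 'B'), (3, 'r'), (4, 'i');
        INSERT INTO Mags VALUES
            (17711, 1, 12.7), (17728, 1, 13.9), (17711, 3, 16.5),
            (17711, 4, 19.2), (17728, 3, 17.1), (17711, 2, 16.9);

        -- Hypothetical view name: one row per object, one column per band.
        -- MAX() ignores NULLs, so each CASE picks the single matching value
        -- and leaves NULL where an object has no measurement in that band.
        CREATE VIEW AstroMags AS
        SELECT m.idObj,
               MAX(CASE WHEN b.Value = 'V' THEN m.Value END) AS mV,
               MAX(CASE WHEN b.Value = 'B' THEN m.Value END) AS mB,
               MAX(CASE WHEN b.Value = 'r' THEN m.Value END) AS mr,
               MAX(CASE WHEN b.Value = 'i' THEN m.Value END) AS mi
        FROM Mags AS m
        JOIN Bands AS b ON b.idBand = m.idBand
        GROUP BY m.idObj;
        """
    )
    return conn


def fetch_astronomical_view(conn: sqlite3.Connection) -> list:
    """Return the rows an astronomer would see, ordered by object id."""
    query = "SELECT idObj, mV, mB, mr, mi FROM AstroMags ORDER BY idObj"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in fetch_astronomical_view(build_demo_db()):
        print(row)
```

Object 17728 has no B or i measurement, so those columns come back as NULL (None in Python), matching the blank cells in the astronomical table. The structured tables stay normalized for the database designer while the view presents the familiar wide layout.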
Querying database
Tell me, oracle...
A query language
SQL (Structured Query Language) - trying to be intuitive
SNINFE: SQL is Not IRAF, Not FORTRAN Even...
NVO Advanced SkyQuery
Challenge: Create an effective way for 'real' astronomers to query the V.O. easily
Dilemma
Easier −→ fewer options −→ less enlightening
Data −→ DATABASE
Database load process
Database structured tables
Astronomical tables
?
To take into account
1 Data in astronomical tables are not static; astronomers routinely change the data in their tables
2 The load process is therefore frequent
3 Each astronomical dataset needs its own load program
4 Lifetimes would be wasted writing load programs
Semiology: Schema E.R.C.
“Rélation Expression - Contenu” (Expression - Content relation)
Roland Barthes (1964), Elements of Semiology, Ed. Hill and Wang
Semiology: Schema E.R.C.
Expression Content
Relation
Result: Meta-language
A Control File (YAML)
Data Structure (YAML)
Data file (Tabular)
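The idea of replacing one hand-written load program per dataset with a single generic loader driven by a description of the data can be sketched as follows. This is not the talk's control-file format, which is not reproduced in this transcript: every key name in CONTROL, the "-" missing-value marker, and the function load_table are hypothetical. The dictionary stands in for what a parsed YAML control file might contain, and the sample rows reuse magnitudes from the Views slide.

```python
import sqlite3

# Hypothetical control description. In practice this dict could be the result
# of parsing a YAML control file; all key names here are assumptions.
CONTROL = {
    "table": "catalog",
    "delimiter": None,  # None means "split on any run of whitespace"
    "comment": "#",
    "columns": [
        {"name": "idObj", "type": "INTEGER"},
        {"name": "mV", "type": "REAL"},
        {"name": "mB", "type": "REAL"},
    ],
}

# Hypothetical tabular data file content; "-" marks a missing value.
DATA = """\
# idObj  mV    mB
17711    12.7  16.9
17728    13.9  -
"""

CASTS = {"INTEGER": int, "REAL": float, "TEXT": str}


def load_table(conn, control, text, missing="-"):
    """Create the table described by `control` and load `text` into it.

    Table and column names are interpolated into SQL, so the control
    description must come from a trusted source in this sketch.
    """
    cols = control["columns"]
    names = ", ".join(c["name"] for c in cols)
    decl = ", ".join(f'{c["name"]} {c["type"]}' for c in cols)
    marks = ", ".join("?" for _ in cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS {control["table"]} ({decl})')

    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith(control["comment"]):
            continue  # skip blank lines and comment/header lines
        fields = line.split(control["delimiter"])
        if len(fields) != len(cols):
            raise ValueError(f"line {lineno}: expected {len(cols)} fields")
        rows.append(tuple(
            None if f == missing else CASTS[c["type"]](f)
            for f, c in zip(fields, cols)
        ))
    conn.executemany(
        f'INSERT INTO {control["table"]} ({names}) VALUES ({marks})', rows
    )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_table(conn, CONTROL, DATA), "rows loaded")
    print(conn.execute("SELECT * FROM catalog ORDER BY idObj").fetchall())
```

With this arrangement, a change to an astronomer's table means editing the description rather than rewriting a program, which is the point of treating the control file as a meta-language.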
Scenario (ideal) for CosmoBook
User:
creates his own database (based on some available standards);
generates the syntax for the load program to read data files and store data in a structured way;
retrieves his data from the database
The End
“This is Apollo 13, signing off”
James Lovell - Commander