Organizing Data: A Challenge or just a Bureaucratic Task? João Luiz Kohl Moreira Observatório Nacional - MCT Oct-2012 João Luiz Kohl Moreira Organizing Data

Page 1: Organizing Data: A Challenge or just a Bureaucratic Task?extranet.on.br/jlkm/Lecture_OrganizingData.pdf · Organizing Data: A Challenge or just a Bureaucratic Task? João Luiz Kohl

Organizing Data: A Challenge or just a Bureaucratic Task?

João Luiz Kohl Moreira

Observatório Nacional - MCT

Oct-2012

João Luiz Kohl Moreira Organizing Data

Area of Interest

Software for astronomy

Astronomical Data Analysis Software and Systems (ADASS) - Conferences since 1991.

Is it science?

What is science?

Popper:

1 Collect data (observing)
2 Taxonomy
   1 Organize data
   2 Associate pieces of data with each other
3 Model & Theory

Myself

Doing science means finding paradoxes

Where is it being done?

USA

  NOAO: National Optical Astronomy Observatory - Tucson, AZ
    Manage KPNO and CTIO
    Develop an integrated analysis facility: IRAF - Image Reduction and Analysis Facility
  STScI: Space Telescope Science Institute
    Operate scientific data from the Hubble Telescope
    Develop an integrated analysis facility: STSDAS - Space Telescope Science Data Analysis System
  Illinois, Princeton, Caltech, SAO ...

CANADA: CDC

EUROPE

  Strasbourg
  Munich
  UK

Virtual Observatory: Who does it?

IVOA: International Virtual Observatory Alliance: 19 members

Brazil: BRAVO

Data Handling

Data:

Data generated daily in astronomy is measured in terabytes
Moore’s law for astronomy: data volume and data rate double every 1.5 yr
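The doubling rule stated above implies exponential growth, which is easy to underestimate. The following is a minimal sketch, assuming a constant doubling time of 1.5 yr as quoted on the slide; the function name `growth_factor` and the sample intervals are illustrative and do not come from the lecture.

```python
def growth_factor(years: float, doubling_time: float = 1.5) -> float:
    """Factor by which data volume grows after `years`,
    assuming it doubles every `doubling_time` years."""
    return 2.0 ** (years / doubling_time)


if __name__ == "__main__":
    # Sample intervals chosen for illustration only.
    for years in (1.5, 3.0, 15.0):
        print(f"after {years:4.1f} yr: x{growth_factor(years):.0f}")
```

Under this assumption, fifteen years correspond to ten doublings, that is, roughly a thousandfold increase in data volume.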

Astronomical Moore’s Law

Data Analysis

Who does it? What for?

Data mining

V.O.

V.O. Operation Schema

Raw data

Registry Centers

Custom Apps

V.O.

CosmoBook

Some Facilities (to be) available on CosmoBook

1 2DPHOT (La Barbera et al., 2008)
2 Solar V.O. (A. Valio, U. Mackenzie, SP)
3 IRAF Work-flow (CEFET, Rio)
4 GMA V.O. (F. Ferrari, FURG)

Social Network

2DPHOT

La Barbera, F. et al (2008) PASP 120, 681-702

Data administration

“Astronomer” data → DATABASE → Custom

Astronomers organize data according to their specific rules (conceptual rules)
Database is organized according to specific rules (structured data rules)

Two problems

DATABASE → Custom

Astronomical data → DATABASE

Why is it problematic?

Astronomers are concerned with science, the comprehension of the universe

Database designers are concerned with the integrity of data

Data → Custom

Database structured tables

Database views

Views

Database tables

Bands

idBands  Value
1        V
2        B
3        r
4        i

Mags

idObj  idBand  Value
17711  1       12.7
17728  1       13.9
17711  3       16.5
17711  4       19.2
17728  3       17.1
17711  2       16.9

Astronomical view

idObj  mV    mB    mr    mi
17711  12.7  16.9  16.5  19.2
17728  13.9        17.1
...    ...   ...   ...   ...
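The Bands and Mags tables and the wide "Astronomical" table on this slide map naturally onto a database view. The following is a minimal sketch using Python's built-in `sqlite3` module. The slide does not show a view definition, so the SQL here (a pivot via conditional aggregation) is an assumption about one way to obtain the wide table, not the author's implementation. Table and column names follow the slide; the choice of SQLite is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bands (idBands INTEGER PRIMARY KEY, Value TEXT);
CREATE TABLE Mags  (idObj INTEGER, idBand INTEGER, Value REAL);

INSERT INTO Bands VALUES (1, 'V'), (2, 'B'), (3, 'r'), (4, 'i');
INSERT INTO Mags VALUES
  (17711, 1, 12.7), (17728, 1, 13.9), (17711, 3, 16.5),
  (17711, 4, 19.2), (17728, 3, 17.1), (17711, 2, 16.9);

-- One row per object, one column per band (NULL where no measurement exists).
CREATE VIEW Astronomical AS
SELECT m.idObj,
       MAX(CASE WHEN b.Value = 'V' THEN m.Value END) AS mV,
       MAX(CASE WHEN b.Value = 'B' THEN m.Value END) AS mB,
       MAX(CASE WHEN b.Value = 'r' THEN m.Value END) AS mr,
       MAX(CASE WHEN b.Value = 'i' THEN m.Value END) AS mi
FROM Mags AS m JOIN Bands AS b ON b.idBands = m.idBand
GROUP BY m.idObj;
""")

# The astronomer queries the view as if it were an ordinary wide table.
rows = conn.execute("SELECT * FROM Astronomical ORDER BY idObj").fetchall()
for row in rows:
    print(row)
```

The normalized tables keep the integrity guarantees the database designer wants, while the view presents the layout the astronomer expects. Bands missing for an object simply appear as `None`.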

Querying database

Tell me, oracle...

A query language

SQL (Structured Query Language) - trying to be intuitive

SNINFE: SQL is Not IRAF, Not FORTRAN Even...
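The contrast behind "SQL is Not IRAF, Not FORTRAN Even" can be illustrated by computing the same quantity in two styles. This is a sketch under stated assumptions: the long-format magnitude table, the sample values, and the B - V colour query are illustrative and are not taken from the lecture. The procedural version spells out how to find matching rows, while the SQL version only states which rows are wanted.

```python
import sqlite3

# Long-format magnitudes: (object id, band name, magnitude). Sample values only.
MAGS = [
    (17711, "V", 12.7), (17711, "B", 16.9),
    (17728, "V", 13.9), (17728, "r", 17.1),
]


def color_loop(mags):
    """Procedural style: explicit nested loops over the rows."""
    result = {}
    for obj_b, band_b, mag_b in mags:
        if band_b != "B":
            continue
        for obj_v, band_v, mag_v in mags:
            if band_v == "V" and obj_v == obj_b:
                result[obj_b] = mag_b - mag_v
    return result


def color_sql(mags):
    """Declarative style: a self-join; the engine decides how to execute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Mags (idObj INTEGER, band TEXT, mag REAL)")
    conn.executemany("INSERT INTO Mags VALUES (?, ?, ?)", mags)
    query = """
        SELECT b.idObj, b.mag - v.mag
        FROM Mags AS b JOIN Mags AS v ON v.idObj = b.idObj
        WHERE b.band = 'B' AND v.band = 'V'
    """
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    print(color_loop(MAGS))
    print(color_sql(MAGS))
```

Both functions return a colour only for objects that have both a B and a V measurement. The SQL form is shorter, but it asks the astronomer to think in joins rather than loops.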

NVO Advanced SkyQuery

Challenge: Create an effective way for 'real' astronomers to easily query V.O.

Dilemma

Easier → fewer options → less enlightening

Data → DATABASE

Database load process

Database structured tables

Astronomical tables

?

To take into account

1 Data in astronomical tables are not static; astronomers routinely change the data in their tables
2 The load process is therefore frequent
3 Each astronomical dataset needs a specific load program
4 Lives would be wasted writing load programs

Semiology: Schema E.R.C.

“Relation Expression - Contenu” (Expression - Relation - Content)

Roland Barthes (1964), Elements of Semiology, Ed. Hill and Wang

Expression Content

Relation

Result: Meta-language

A Control File (YAML)

Data Structure (YAML)

Data file (Tabular)
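The three slides above (control file, data structure, tabular data file) suggest a single generic loader configured by a declarative description, instead of one hand-written load program per dataset. The slides' actual YAML content is not reproduced in this transcript, so everything in the sketch below is hypothetical: the keys `table`, `delimiter` and `columns`, the sample data, the missing-value marker, and the function `load_table`. The description is written as a Python dict, as it might look after parsing a YAML control file.

```python
import csv
import io
import sqlite3

# Hypothetical description, e.g. the result of parsing a YAML control file.
CONTROL = {
    "table": "Astronomical",
    "delimiter": " ",
    "columns": [  # (column name, SQL type), in file order
        ("idObj", "INTEGER"),
        ("mV", "REAL"),
        ("mB", "REAL"),
    ],
}

# Hypothetical tabular data file; "-" marks a missing value.
DATA = "17711 12.7 16.9\n17728 13.9 -\n"

CASTS = {"INTEGER": int, "REAL": float, "TEXT": str}


def load_table(conn, control, stream, missing="-"):
    """Create the table named in `control` and load rows from `stream`.

    Assumes a trusted description (names are interpolated into SQL) and
    one field per described column on every non-blank line.
    """
    cols = control["columns"]
    ddl = ", ".join(f"{name} {sqltype}" for name, sqltype in cols)
    conn.execute(f"CREATE TABLE {control['table']} ({ddl})")
    marks = ", ".join("?" for _ in cols)
    count = 0
    for fields in csv.reader(stream, delimiter=control["delimiter"]):
        if not fields:
            continue  # skip blank lines
        row = [
            None if text == missing else CASTS[sqltype](text)
            for text, (_, sqltype) in zip(fields, cols)
        ]
        conn.execute(f"INSERT INTO {control['table']} VALUES ({marks})", row)
        count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    n = load_table(conn, CONTROL, io.StringIO(DATA))
    print(n, conn.execute("SELECT * FROM Astronomical").fetchall())
```

With this design, a new dataset needs only a new description, not new code, which addresses the concern that each dataset would otherwise require its own load program.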

Scenario (ideal) for CosmoBook

User:

creates his own database (based on some available standards);
generates the syntax for the load program to read data files and store data in a structured way;
retrieves his data from the database

The End

“This is Apollo 13, signing off”

James Lovell - Commander
