VOTable: Tabular Data for Virtual Observatory
François Ochsenbein, Roy Williams
Clive Davenhall, Daniel Durand, Pierre Fernique, Robert Hanisch, David Giaretta, Tom McGlynn, Alex Szalay, Andreas Wicenec
The Context
Need of exchanging data in tabular form:
• Coming from a wide variety of data servers and archives (VO context)
• Must include the associated metadata in order to be interpretable by applications
• Must deal with potentially millions of records
• Existence of FITS
VOTable History
• Astrores at CDS/ESO (June 1999)
• XSIL at Caltech (June 2000)
• October 2001: first discussions
• December 2001: VOTable 0.1
• January 2002: Interoperability meeting Strasbourg
• 15 April 2002: VOTable 1.0
http://cdsweb.u-strasbg.fr/doc/VOTable/
VOTable archives & discussion groups:
http://archives.us-vo.org/VOTable/
Why XML?
• includes in a single document the data and their associated metadata (descriptive data)
• has been in common use for about 3 years
• can be interpreted by readily available parsers and tools
• can be visualized (XSL)
• can be encapsulated in messages
A “classical” XML Document
<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/…...dtd">
<RESOURCE name="myResource">
  <OBSERVER>William Herschel</OBSERVER>
  <SOURCE id="mySource">
    <STAR-NAME>Procyon</STAR-NAME>
    <POSITION equinox="J2000" epoch="J2000">
      <RA unit="deg">114.827</RA>
      <Dec unit="deg">+05.227</Dec>
    </POSITION>
    <COUNTS>
      <COUNT>4</COUNT>
      <COUNT>5</COUNT>
      <COUNT>3</COUNT>
    </COUNTS>
  </SOURCE>
  …..
</RESOURCE>
Problems of “classical” XML Documents
Each data element is <tagged>, meaning:
• Huge overheads in terms of volume, required resources, and processing time: not suited to multi-million-row tables
• Need to introduce new elements (tags) for each new parameter, or to cross-match a potentially large set of namespaces
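The volume overhead can be made concrete with a rough count. The following is only an illustrative sketch: it compares one record written with per-parameter tags against the same values written as a generic table row, and the helper name overhead is hypothetical.

```python
# One record with a dedicated tag per parameter (values from the Procyon example).
tagged = (
    '<SOURCE><STAR-NAME>Procyon</STAR-NAME>'
    '<POSITION><RA unit="deg">114.827</RA>'
    '<Dec unit="deg">5.227</Dec></POSITION></SOURCE>'
)
# The same values as a generic row; names and units are declared once elsewhere.
row = '<TR><TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD></TR>'
# The actual data carried by either form.
payload = 'Procyon' + '114.827' + '5.227'


def overhead(record: str, data: str) -> float:
    """Return the number of markup characters per data character."""
    return (len(record) - len(data)) / len(data)


print(f"tagged record: {overhead(tagged, payload):.1f} markup chars per data char")
print(f"table row:     {overhead(row, payload):.1f} markup chars per data char")
```

The row form still carries markup, but its cost no longer grows with the length of parameter names or with attributes repeated on every record.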
The VOTable way
VOTables follow the classical tabular presentation, where each column is assumed to be homogeneous in terms of its associated metadata. A VOTable document contains:
• The metadata part (data description), essentially a set of <FIELD> and <PARAM> specifications
• The data part (serialisation), which may be in XML, FITS or binary
<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd">
<VOTABLE version="1.0">
  <DEFINITIONS>
    <COOSYS ID="myJ2000" equinox="2000." epoch="2000." system="eq_FK5"/>
  </DEFINITIONS>
  <RESOURCE>
    <PARAM name="Observer" datatype="char" arraysize="*" value="William Herschel">
      <DESCRIPTION>This parameter is designed to store the observer's name</DESCRIPTION>
    </PARAM>
    <TABLE name="Stars">
      <DESCRIPTION>Some bright stars</DESCRIPTION>
      <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
      <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
          <TR>
            <TD>Vega</TD><TD>279.234</TD><TD>38.782</TD>
            <TD>8 7 8 6 8 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
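The Counts column above declares arraysize="2x3x*", and its two cells hold 12 and 6 values. A minimal sketch of how a reader might resolve the open dimension from the number of values in a cell follows; the function name resolve_shape is hypothetical, and the sketch assumes that only the last dimension may be "*".

```python
from math import prod


def resolve_shape(arraysize: str, n_values: int) -> tuple:
    """Turn an arraysize such as '2x3x*' into a concrete shape for one cell."""
    dims = arraysize.split("x")
    # Assumption of this sketch: "*" may only appear as the last dimension.
    if "*" in dims[:-1]:
        raise ValueError("only the last dimension may be variable")
    fixed = [int(d) for d in dims if d != "*"]
    block = prod(fixed)  # number of values in one fixed-size block
    if dims[-1] != "*":
        if n_values != block:
            raise ValueError(f"expected {block} values, got {n_values}")
        return tuple(fixed)
    if n_values % block != 0:
        raise ValueError(f"{n_values} values do not fill whole blocks of {block}")
    return (*fixed, n_values // block)


procyon = "4 5 3 4 3 2 1 2 3 3 5 6".split()
vega = "8 7 8 6 8 6".split()
print(resolve_shape("2x3x*", len(procyon)))
print(resolve_shape("2x3x*", len(vega)))
```

The sketch deliberately stops at the shape and makes no claim about the order in which the values fill the array.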
<RESOURCE>
  <PARAM …/> …
  <TABLE>
    <FIELD …/> …
    <DATA>
      <TABLEDATA> <TR> <TD>… </TR> … </TABLEDATA>
      <FITS extnum="n"> <STREAM …/> </FITS>
      <BINARY> <STREAM …/> </BINARY>
    </DATA>
  </TABLE>
</RESOURCE>
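Of the three serialisations, TABLEDATA can be read with any generic XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of the Stars example; the function name read_table is hypothetical, and only the float and double datatypes are converted to numbers.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (scalar columns only).
VOTABLE = """<VOTABLE version="1.0">
 <RESOURCE>
  <TABLE name="Stars">
   <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
   <FIELD name="RA" ucd="POS_EQ_RA" unit="deg" datatype="float"/>
   <FIELD name="Dec" ucd="POS_EQ_DEC" unit="deg" datatype="float"/>
   <DATA><TABLEDATA>
    <TR><TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD></TR>
    <TR><TD>Vega</TD><TD>279.234</TD><TD>38.782</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""


def read_table(xml_text):
    """Return (fields, rows): FIELD attributes and one dict per TR."""
    table = ET.fromstring(xml_text).find("./RESOURCE/TABLE")
    # The metadata is read once, from the FIELD elements.
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    rows = []
    for tr in table.iter("TR"):
        cells = [(td.text or "").strip() for td in tr.findall("TD")]
        record = {}
        # The i-th TD belongs to the i-th FIELD.
        for field, cell in zip(fields, cells):
            is_number = field["datatype"] in ("float", "double")
            record[field["name"]] = float(cell) if is_number else cell
        rows.append(record)
    return fields, rows


fields, rows = read_table(VOTABLE)
for row in rows:
    print(row)
```

The column descriptions are parsed once and then applied to every row, which is the point of separating the metadata part from the data part.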
The <FIELD> and <PARAM>
Describe the metadata attached to columns (<FIELD>) or to the resource (<PARAM>)

name        column label
unit        standardized unit
datatype    computer type
width       character representation
precision   character representation
arraysize   repetition factor
ucd         standardized parameter category
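As an illustration of the two character-representation attributes, here is a small sketch that renders a value using width="7" and precision="F3" as in the RA and Dec columns of the example. It assumes that "F3" means three digits after the decimal point; the function name format_cell is hypothetical and other precision styles are not handled.

```python
def format_cell(value: float, width: int, precision: str) -> str:
    """Render a number in a fixed-width cell, e.g. width=7, precision='F3'."""
    # Assumption: 'F<n>' means n digits after the decimal point.
    if not precision.startswith("F"):
        raise ValueError("this sketch only handles F-style precision")
    decimals = int(precision[1:])
    return f"{value:{width}.{decimals}f}"


print(repr(format_cell(114.827, 7, "F3")))
print(repr(format_cell(5.227, 7, "F3")))
```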
The UCDs
Unified Content Descriptor
Categorisation of the parameters listed in the table
• Interpretation of the table contents
• Decide whether values can be compared
• Data mining
S. Derrière's talk on Friday
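One way to picture "decide whether values can be compared" is to pair columns from two tables by their UCD rather than by their label. This is a sketch only: the second table and its column names (RAJ2000, DEJ2000, Vmag) are invented for illustration, and comparable_columns is a hypothetical helper.

```python
def comparable_columns(fields_a, fields_b):
    """Pair up columns from two tables that carry the same UCD."""
    by_ucd = {f["ucd"]: f["name"] for f in fields_b if "ucd" in f}
    return [
        (f["name"], by_ucd[f["ucd"]])
        for f in fields_a
        if f.get("ucd") in by_ucd
    ]


# FIELD names and UCDs from the Stars example.
stars = [
    {"name": "Star-Name", "ucd": "ID_MAIN"},
    {"name": "RA", "ucd": "POS_EQ_RA"},
    {"name": "Dec", "ucd": "POS_EQ_DEC"},
    {"name": "Counts", "ucd": "NUMBER"},
]
# Hypothetical second catalogue with different column labels.
other = [
    {"name": "RAJ2000", "ucd": "POS_EQ_RA"},
    {"name": "DEJ2000", "ucd": "POS_EQ_DEC"},
    {"name": "Vmag"},  # no UCD, so it cannot be matched
]

pairs = comparable_columns(stars, other)
print(pairs)
```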
datatype          Meaning             FITS   Bytes
"boolean"         Logical             L      1
"bit"             Bit                 X      *
"unsignedByte"    Byte (0 to 255)     B      1
"short"           Short Integer       I      2
"int"             Integer             J      4
"long"            Long integer        K      8
"char"            ASCII Character     A      1
"unicodeChar"     Unicode Character          2
"float"           Floating point      E      4
"double"          Double              D      8
"floatComplex"    Float Complex       C      8
"doubleComplex"   Double Complex      M      16
FITS Compatibility
VOTable was designed to be compatible with existing FITS data tables
• Compatible data types
• FITS keywords are represented as <FIELD> attributes, e.g. width, precision, arraysize
• Arrays and variable-length arrays
• <DATA> may link to existing FITS data sets
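The Bytes column of the datatype table can be cross-checked against fixed-size binary layouts. The sketch below maps each datatype to a format code of Python's struct module; the mapping itself is an assumption made for illustration (complex values taken as two floats or two doubles, unicodeChar as one 16-bit unit), and "bit" is left out because its size is variable.

```python
import struct

# Bytes column of the datatype table ("bit" omitted: variable size).
TABLE_BYTES = {
    "boolean": 1, "unsignedByte": 1, "short": 2, "int": 4, "long": 8,
    "char": 1, "unicodeChar": 2, "float": 4, "double": 8,
    "floatComplex": 8, "doubleComplex": 16,
}

# Assumed struct codes, chosen for this sketch only.
STRUCT_CODES = {
    "boolean": "c", "unsignedByte": "B", "short": "h", "int": "i", "long": "q",
    "char": "c", "unicodeChar": "H", "float": "f", "double": "d",
    "floatComplex": "ff", "doubleComplex": "dd",
}


def row_size(datatypes):
    """Size in bytes of one record made of the given scalar datatypes."""
    # ">" selects standard sizes with no padding between items.
    fmt = ">" + "".join(STRUCT_CODES[d] for d in datatypes)
    return struct.calcsize(fmt)


# Every datatype should come out at the size listed in the table.
mismatches = [d for d in TABLE_BYTES if row_size([d]) != TABLE_BYTES[d]]
print("mismatches:", mismatches)
print("bytes for (float, float, int):", row_size(["float", "float", "int"]))
```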
Data Serialization
FITS or BINARY data may be embedded in the document, or remote; compression/encoding may be applied.
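To give an idea of what an embedded binary stream involves, here is a round-trip sketch. Everything specific in it is an assumption made for illustration: the rows hold only RA and Dec as big-endian 4-byte floats, and base64 is used as the text encoding. The names encode_rows and decode_rows are hypothetical.

```python
import base64
import struct

# Assumed record layout: RA and Dec as big-endian 4-byte floats.
ROW = struct.Struct(">ff")


def encode_rows(rows):
    """Pack (ra, dec) pairs into bytes and encode them as base64 text."""
    raw = b"".join(ROW.pack(ra, dec) for ra, dec in rows)
    return base64.b64encode(raw).decode("ascii")


def decode_rows(text):
    """Reverse of encode_rows: base64 text back to a list of (ra, dec)."""
    raw = base64.b64decode(text)
    return [ROW.unpack_from(raw, offset) for offset in range(0, len(raw), ROW.size)]


stream = encode_rows([(114.827, 5.227), (279.234, 38.782)])
decoded = decode_rows(stream)
print(stream)
print(decoded)
```

Because the values are stored as 4-byte floats, they come back only to single precision, which is one reason the datatype of each column matters.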
Existing tools and Servers
• Several databases are delivering VOTables: HEASARC, IPAC, NOAO, NRAO, VizieR, SIMBAD (cone search: >50 services)
• VOTable parsers in Perl, Java, C (different types of parsers for different applications)
• VOTable validators
• Basic XSLT translators (XML to HTML)
DTD or XML-Schema
• The VOTable rules exist both as a DTD (Document Type Definition) and in the XML Schema language (heavily used in developing Web Services applications)
VOTable appendices
Astrores had two features not implemented in VOTables:
1. The LINK conventions, describing how to get the correlated data (explanations, images, spectra…) based on substitution of the column contents
…
<FIELD name="FileName" datatype="char" …/>
…
<LINK href="http://server/getFile?${FileName}" …/>
…
<TR> … <TD>photo/procyon.dat</TD> … </TR>
<TR> … <TD>photo/vega.dat</TD> … </TR>
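The substitution can be sketched in a few lines: each ${name} in the href is replaced by the content of the column with that name in the current row. The function name expand_link is hypothetical, and no URL escaping is applied in this sketch.

```python
import re


def expand_link(href: str, row: dict) -> str:
    """Replace every ${column} in href with that column's value in row."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: str(row[m.group(1)]), href)


href = "http://server/getFile?${FileName}"
for row in ({"FileName": "photo/procyon.dat"}, {"FileName": "photo/vega.dat"}):
    print(expand_link(href, row))
```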
VOTable appendices (2)
2. The Query Mechanism using conventions similar to the HTML <FORM> for retrieving the data from user-supplied constraints
<PARAM name="Observer" datatype="char" arraysize="*" />
<TABLE name="Stars">
  <DESCRIPTION>Some bright stars</DESCRIPTION>
  <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
  <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
  <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
  <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
  <LINK type="query" action="http://server-node/getResult?" />
</TABLE>
Toward more generic WSDL-like solutions?
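By analogy with an HTML form submitted with GET, the query could be built by appending name=value pairs to the action URL. This is a sketch of that analogy only: it assumes the PARAM and FIELD names are used as keys, the constraint syntax (">100") is invented for illustration, and build_query is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_query(action: str, constraints: dict) -> str:
    """Append user-supplied constraints to the LINK action, form-style."""
    return action + urlencode(constraints)


action = "http://server-node/getResult?"
# Keys are assumed to be the PARAM and FIELD names declared in the table.
print(build_query(action, {"Observer": "William Herschel", "RA": ">100"}))
```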
Conclusions
• Just version 1.0 … more to come
• Comments? Proposals?
Join the discussion group