PyCon 2007

Post on 06-Jun-2015

Presentation given at PyCon 2007 in Dallas, TX, about pyDAP.

Transcript of PyCon 2007

Accessing and serving scientific datasets with Python

Dr. Rob De Almeida

The Data Access Protocol

● De facto standard for distributing scientific data on the internet, used by the oceanography, meteorology and climate communities

● Simple HTTP-based protocol with XDR encoding for data transmission

● Supports complex dataset structures
  ● Model output, satellite images, in-situ data, etc.

Protocol details

● A dataset has different URLs describing it:
  ● http://server/dataset
  ● http://server/dataset.dds (structure)
  ● http://server/dataset.das (attributes)
  ● http://server/dataset.dods (data)

● The client (usually) retrieves metadata from the DDS/DAS responses and downloads data from the DODS response as needed

A simple example

● Dataset with a list “a” of integers from 0 to 9

● Let's also add a few attributes: author, history

● What is the representation of metadata and data?

Dataset Descriptor Structure

Dataset {
    Int32 a[a = 10];
} test;

Dataset Attribute Structure

Attributes {
    a {
        String author "Rob De Almeida";
        String history "Created for PyCon 2007";
    }
}
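Both metadata responses above are plain text. As a minimal sketch, assuming hypothetical helpers `build_dds()` and `build_das()` (not pyDAP functions) that only cover this one-array example, the two documents can be generated like this:

```python
def build_dds(dataset, name, size, dtype="Int32"):
    """Return a DDS for a dataset holding a single 1-D array.

    The array's dimension gets the same name as the array, as in "a[a = 10]".
    """
    return (
        "Dataset {\n"
        f"    {dtype} {name}[{name} = {size}];\n"
        f"}} {dataset};\n"
    )


def build_das(name, attributes):
    """Return a DAS attaching String attributes to variable `name`.

    Values are assumed not to contain double quotes (no escaping is done).
    """
    lines = ["Attributes {", f"    {name} {{"]
    for key, value in attributes.items():
        lines.append(f'        String {key} "{value}";')
    lines += ["    }", "}"]
    return "\n".join(lines) + "\n"


dds = build_dds("test", "a", 10)
das = build_das("a", {"author": "Rob De Almeida",
                      "history": "Created for PyCon 2007"})
print(dds)
print(das)
```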

DODS response

Dataset {
    Int32 a[a = 10];
} test;
Data:
\x00\x00\x00\x0a\x00\x00\x00\x0a
\x00\x00\x00\x00\x00\x00\x00\x01
\x00\x00\x00\x02\x00\x00\x00\x03
\x00\x00\x00\x04\x00\x00\x00\x05
\x00\x00\x00\x06\x00\x00\x00\x07
\x00\x00\x00\x08\x00\x00\x00\x09
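The data section is XDR: every Int32 is a 4-byte big-endian integer, and the element count (10, i.e. `0x0a`) appears twice before the values. A minimal sketch with the standard `struct` module that reproduces those bytes; the helper names are hypothetical, and pyDAP itself used its own numpy/array-based xdrlib:

```python
import struct


def encode_int32_array(values):
    """Encode a 1-D Int32 array like the data section of the DODS response above."""
    n = len(values)
    # The element count is written twice, matching the first 8 bytes shown above.
    header = struct.pack(">ii", n, n)
    # Each element is a big-endian ("network order") 32-bit signed integer.
    body = struct.pack(f">{n}i", *values)
    return header + body


def decode_int32_array(data):
    """Inverse of encode_int32_array."""
    n, repeated = struct.unpack_from(">ii", data, 0)
    if n != repeated:
        raise ValueError("the two length prefixes disagree")
    return list(struct.unpack_from(f">{n}i", data, 8))


payload = encode_int32_array(list(range(10)))
print(payload[:8].hex())            # 0000000a0000000a
print(decode_int32_array(payload))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```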

Using pyDAP as a client

● The client retrieves and parses the metadata (DAS/DDS), building a dataset object with all the variables that can be introspected

● Data is downloaded on the fly when required

● Uses httplib2 and a custom-made xdrlib based on numpy or array
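"Downloaded on the fly" means a variable object holds only metadata until its values are actually needed. A minimal sketch of that idea, assuming a hypothetical `LazyVariable` class and an injected `fetch` callable standing in for the HTTP request plus XDR decoding; unlike this sketch, which downloads the whole variable once and caches it, the pyDAP session below requests only the slice being indexed:

```python
class LazyVariable:
    """Hypothetical proxy that downloads a variable's values only on first access."""

    def __init__(self, url, fetch):
        self.url = url        # e.g. a .dods URL with a constraint expression
        self._fetch = fetch   # callable(url) -> list; injected so no network code is needed
        self._data = None     # nothing downloaded yet

    @property
    def data(self):
        if self._data is None:          # first access triggers the request
            self._data = self._fetch(self.url)
        return self._data               # later accesses reuse the cached values

    def __getitem__(self, index):
        return self.data[index]


requested = []


def fake_fetch(url):
    """Stand-in for an HTTP GET plus XDR decoding; records each URL requested."""
    requested.append(url)
    return [366.0, 1096.485, 1826.97]


var = LazyVariable("http://test.pydap.org/coads.nc.dods?TIME", fake_fetch)
print(len(requested))  # 0
print(var[0])          # 366.0
print(var[-1])         # 1826.97
print(len(requested))  # 1
```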

Example usage

>>> from dap.client import open
>>> dataset = open('http://test.pydap.org/coads.nc', verbose=True)
http://test.pydap.org/coads.nc.dds
http://test.pydap.org/coads.nc.das
>>> print dataset.keys()
['UWND', 'WSPD', 'SST', 'VWND', 'SLP', 'AIRT', 'SPEH', 'COADSX', 'COADSY', 'TIME']

Introspecting the dataset

>>> time = dataset['TIME']
>>> print time.type, time.shape, time.dimensions
Float64 (12,) ('TIME',)
>>> print time.units
hour since 0000-01-01 00:00:00
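Type, shape and dimensions come from parsing the DDS. pyDAP's parser handles the full DDS grammar; the hypothetical `parse_declaration()` below is only a regex sketch for flat declarations with named dimensions, such as `Int32 a[a = 10];` from the simple example. The `TIME` declaration used in the example call is an assumption inferred from the values printed above, since the coads.nc DDS is not shown:

```python
import re

# One flat declaration: type, name, then zero or more "[dim = size]" groups.
DECL = re.compile(r"^\s*(\w+)\s+(\w+)((?:\s*\[\s*\w+\s*=\s*\d+\s*\])*)\s*;\s*$")
DIM = re.compile(r"\[\s*(\w+)\s*=\s*(\d+)\s*\]")


def parse_declaration(line):
    """Return (type, name, shape, dimensions) for a flat DDS declaration."""
    m = DECL.match(line)
    if m is None:
        raise ValueError(f"not a flat DDS declaration: {line!r}")
    dtype, name, dims = m.groups()
    pairs = DIM.findall(dims)
    shape = tuple(int(size) for _, size in pairs)
    dimensions = tuple(dim for dim, _ in pairs)
    return dtype, name, shape, dimensions


print(parse_declaration("Float64 TIME[TIME = 12];"))
# ('Float64', 'TIME', (12,), ('TIME',))
```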

Retrieving data

>>> print time[:]
http://test.pydap.org/coads.nc.dods?TIME[0:1:11]
[ 366. 1096.485 1826.97 2557.455 3287.94 4018.425 4748.91 5479.395 6209.88 6940.365 7670.85 8401.335]
>>> print time[0]
http://test.pydap.org/coads.nc.dods?TIME[0:1:0]
[ 366.]
>>> print time[-2:]
http://test.pydap.org/coads.nc.dods?TIME[10:1:11]
[ 7670.85 8401.335]
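The verbose log shows Python indices turned into DAP hyperslabs of the form `[start:stride:stop]`, where `stop` is inclusive (Python's is exclusive). A sketch of that translation as a hypothetical `hyperslab()` helper, not pyDAP's own code, reproducing the three constraints above:

```python
def hyperslab(index, length):
    """Translate an int or slice over `length` elements into a DAP [start:stride:stop]."""
    if isinstance(index, int):
        if index < 0:
            index += length               # Python-style negative index
        if not 0 <= index < length:
            raise IndexError(index)
        return f"[{index}:1:{index}]"
    start, stop, step = index.indices(length)
    if step < 1:
        raise ValueError("DAP strides must be positive")
    if start >= stop:
        raise ValueError("empty selection")
    # DAP's stop is inclusive: use the last index the slice actually selects.
    last = start + ((stop - 1 - start) // step) * step
    return f"[{start}:{step}:{last}]"


n = 12
print("TIME" + hyperslab(slice(None), n))      # TIME[0:1:11]
print("TIME" + hyperslab(0, n))                # TIME[0:1:0]
print("TIME" + hyperslab(slice(-2, None), n))  # TIME[10:1:11]
```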

Working with sequential data

Dataset {
    Sequence {
        Int32 id;
        Float64 lat;
        Float64 lon;
    } test;
} test%2Ecsv;

http://test.pydap.org/test.csv.dds

Retrieving data

>>> from dap.client import open
>>> dataset = open('http://test.pydap.org/test.csv', verbose=True)
http://test.pydap.org/test.csv.dds
http://test.pydap.org/test.csv.das
>>> seq = dataset['test']
>>> print seq['lat'][:]
http://test.pydap.org/test.csv.dods?test.lat
[10.1, 10.199999999999999, 10.300000000000001, 10.4, 10.5]

Iterating over sequential data

>>> for struct in seq:
...     print struct['lat'].data, struct['lon'].data
...
http://test.pydap.org/test.csv.dods?test.id
http://test.pydap.org/test.csv.dods?test.lat
http://test.pydap.org/test.csv.dods?test.lon
10.1 103.0
10.2 93.0
10.3 83.0
10.4 73.0
10.5 63.0
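The log shows one request per field (`test.id`, `test.lat`, `test.lon`) before any record is printed, so the records are presumably assembled from per-field columns. A minimal sketch of that assembly, using a hypothetical `iterate_sequence()` helper and the values from the slides; pyDAP's internals may work differently:

```python
def iterate_sequence(columns, names):
    """Yield one record (dict) per row, given one equal-length column per field."""
    # Each column would come from its own request, e.g. ...test.csv.dods?test.lat
    for row in zip(*columns):
        yield dict(zip(names, row))


ids = [1, 2, 3, 4, 5]
lats = [10.1, 10.2, 10.3, 10.4, 10.5]
lons = [103.0, 93.0, 83.0, 73.0, 63.0]

for record in iterate_sequence([ids, lats, lons], ["id", "lat", "lon"]):
    print(record["lat"], record["lon"])
```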

Filtering sequences (sure way)

>>> fseq = seq.filter('%s<100' % seq.lon.id)
>>> for struct in fseq:
...     print struct['lat'].data, struct['lon'].data
...
http://test.pydap.org/test.csv.dods?test.id&test.lon<100
http://test.pydap.org/test.csv.dods?test.lat&test.lon<100
http://test.pydap.org/test.csv.dods?test.lon&test.lon<100
10.2 93.0
10.3 83.0
10.4 73.0
10.5 63.0
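The filter is applied on the server: each per-field request carries the projection (`test.id`) plus a selection clause joined with `&`. A sketch of how such URLs can be built, using a hypothetical `field_urls()` helper; the slides show `<` unescaped, though a real client may need to percent-encode such characters:

```python
def field_urls(base, sequence, fields, selection=None):
    """Build one .dods URL per field, appending an optional selection clause."""
    urls = []
    for field in fields:
        constraint = f"{sequence}.{field}"   # projection, e.g. "test.lat"
        if selection:
            constraint += "&" + selection     # selection, e.g. "test.lon<100"
        urls.append(f"{base}.dods?{constraint}")
    return urls


for url in field_urls("http://test.pydap.org/test.csv", "test",
                      ["id", "lat", "lon"], selection="test.lon<100"):
    print(url)
```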

Filtering sequences (fun way!)

>>> fseq = (struct for struct in seq if struct['lon'] < 100)
>>> for struct in fseq:
...     print struct['lat'].data, struct['lon'].data
...
http://test.pydap.org/test.csv.dods?test.id&test.lon<100
http://test.pydap.org/test.csv.dods?test.lat&test.lon<100
http://test.pydap.org/test.csv.dods?test.lon&test.lon<100
10.2 93.0
10.3 83.0
10.4 73.0
10.5 63.0
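The slides do not explain how the generator expression becomes the same server-side selection. One plausible building block, shown here as a hypothetical sketch and not as pyDAP's actual mechanism, is operator overloading: comparing a field object with a value returns a DAP selection string instead of a boolean. Turning a whole generator expression into a request needs more machinery than this; the sketch covers only the comparison part:

```python
class FieldRef:
    """Hypothetical stand-in for a sequence field: comparisons build DAP selections."""

    def __init__(self, qualified_name):
        self.name = qualified_name  # e.g. "test.lon"

    def _clause(self, op, value):
        # String literals are double-quoted in DAP selections; numbers go in as-is.
        literal = f'"{value}"' if isinstance(value, str) else str(value)
        return f"{self.name}{op}{literal}"

    def __lt__(self, value):
        return self._clause("<", value)

    def __le__(self, value):
        return self._clause("<=", value)

    def __gt__(self, value):
        return self._clause(">", value)

    def __ge__(self, value):
        return self._clause(">=", value)

    def __eq__(self, value):
        return self._clause("=", value)

    def __ne__(self, value):
        return self._clause("!=", value)


lon = FieldRef("test.lon")
print(lon < 100)  # test.lon<100
```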

Server

● pyDAP comes with a WSGI app that works as a DAP server

● Server is just a thin layer between plugins that handle data formats (netCDF, HDF5, SQL, etc.) and responses (DAS, DDS, DODS, HTML, KML, WMS, etc.)

● Can be deployed with a Paste Script template:
  ● paster create -t dap_server myserver
  ● paster serve myserver/server.ini
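The "thin layer" boils down to dispatching on the URL suffix: the part before the last dot names the dataset (handled by a format plugin), the suffix names the response. A minimal WSGI sketch of that dispatch, assuming a hypothetical `make_app()` and a toy handler dictionary; it is not pyDAP's server code:

```python
def make_app(handlers):
    """Hypothetical WSGI app routing /<dataset>.<ext> to handlers[ext](dataset)."""

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "").lstrip("/")
        dataset, _, ext = path.rpartition(".")  # "file.nc.dds" -> ("file.nc", ".", "dds")
        handler = handlers.get(ext)
        if not dataset or handler is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"unknown response type\n"]
        body = handler(dataset).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


# Toy response that only echoes the dataset name; a real one would render the
# dataset object built by a format plugin.
app = make_app({"dds": lambda name: f"Dataset {{\n}} {name};\n"})

statuses = []
body = b"".join(app({"PATH_INFO": "/file.nc.dds"},
                    lambda status, headers: statuses.append(status)))
print(statuses[0])  # 200 OK
# To serve it: wsgiref.simple_server.make_server("localhost", 8080, app).serve_forever()
```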

Plugins and responses


http://localhost:8080/file.nc.das

Plugins

● Convert data from different formats to pyDAP types

● Plugins for netCDF, CSV, Matlab 4/5, HDF5, GrADS grib, GDAL, DB API 2, grib2

● EasyInstall (entry point dap.plugin):
  ● easy_install dap.plugins.netcdf
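Entry points let the server find installed plugins without hard-coding them. In 2007 this went through setuptools' `pkg_resources` (e.g. `iter_entry_points`); on current Python the standard-library equivalent is `importlib.metadata`. A sketch assuming only the group name `dap.plugin` from the slide; whether anything is registered under it depends on what is installed, and `find_plugins()` is a hypothetical helper:

```python
from importlib.metadata import entry_points


def find_plugins(group="dap.plugin"):
    """Map entry-point names in `group` to EntryPoint objects (not yet imported)."""
    try:
        eps = entry_points(group=group)       # Python 3.10+
    except TypeError:
        eps = entry_points().get(group, [])   # Python 3.8/3.9 dict-style API
    return {ep.name: ep for ep in eps}


plugins = find_plugins()
for name, ep in plugins.items():
    # ep.load() would import the plugin; deferring it keeps startup cheap.
    print(name, ep.value)
```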

Responses

● Convert from pyDAP types to something else

● “Official” responses: DAS, DDS, DODS
  ● Generate data and metadata from the dataset created by the plugins

● Extra responses can be installed using EasyInstall (entry point dap.response)

ASCII response

Dataset {
    Sequence {
        Int32 id;
        Float64 lat;
        Float64 lon;
    } test;
} test%2Ecsv;
---------------------------------------------
test.id, test.lat, test.lon
1, 10.1, 103
2, 10.2, 93
3, 10.3, 83
4, 10.4, 73
5, 10.5, 63

http://test.pydap.org/test.csv.ascii
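The data block after the dashed separator is simple comma-separated text. A sketch with a hypothetical `ascii_response()` helper that renders only that block (the DDS header is omitted). Using the `g` format reproduces `103` for `103.0`, but it keeps only six significant digits; that formatting choice is an assumption:

```python
def ascii_response(sequence, fields, rows):
    """Render sequence rows as the comma-separated body of an ASCII response."""
    lines = [", ".join(f"{sequence}.{field}" for field in fields)]  # header row
    for row in rows:
        lines.append(", ".join(format(value, "g") for value in row))
    return "\n".join(lines) + "\n"


rows = [(1, 10.1, 103.0), (2, 10.2, 93.0), (3, 10.3, 83.0),
        (4, 10.4, 73.0), (5, 10.5, 63.0)]
print(ascii_response("test", ["id", "lat", "lon"], rows))
```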

HTML response

● Generates an HTML form to download data

● Redirects user to ASCII response

● Useful for users without a DAP client

Example HTML response

JSON response

{"test%2Ecsv": {"attributes": {"filename": "test.csv"}, "type": "Dataset",

"test": {"attributes": {}, "type": "Sequence", "id": {"attributes": {}, "type": "Int32"}, "lat": {"attributes": {}, "type": "Float64"}, "lon": {"attributes": {}, "type": "Float64"}}}}

http://test.pydap.org/test.csv.json

JSON response with data

{"test%2Ecsv": {"attributes": {"filename": "test.csv"}, "type": "Dataset",

"test": {"attributes": {}, "type": "Sequence", "data": [[1, 10.1, 103.0], [2, 10.2, 93.0], [3, 10.3, 83.0], [4, 10.4, 73.0], [5, 10.5, 63.0]], "id": {"attributes": {}, "type": "Int32"}, "lat": {"attributes": {}, "type": "Float64"}, "lon": {"attributes": {}, "type": "Float64"}}}}

http://test.pydap.org/test.csv.json?output_data=1
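Both JSON documents mirror the DDS tree, with attributes on every node, and the second adds a `data` list of rows. A sketch with a hypothetical `json_response()` helper producing the same structure (key order may differ from the slides):

```python
import json


def json_response(dataset, attributes, sequence, fields, rows=None):
    """Build a JSON document shaped like the responses above.

    `fields` is a list of (name, DAP type) pairs; rows are included only when
    given, mirroring the ?output_data=1 variant.
    """
    seq = {"attributes": {}, "type": "Sequence"}
    if rows is not None:
        seq["data"] = [list(row) for row in rows]
    for name, dap_type in fields:
        seq[name] = {"attributes": {}, "type": dap_type}
    doc = {dataset: {"attributes": attributes, "type": "Dataset", sequence: seq}}
    return json.dumps(doc)


fields = [("id", "Int32"), ("lat", "Float64"), ("lon", "Float64")]
rows = [(1, 10.1, 103.0), (2, 10.2, 93.0)]
print(json_response("test%2Ecsv", {"filename": "test.csv"}, "test", fields, rows))
```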

WMS response

● Returns maps (images) from requested variables and regions

● Works with geo-referenced grids and sequences

● Layers can be composed together

● Data can be constrained:
  ● /coads.nc.wms?SST // annual mean
  ● /coads.nc.wms?SST[0] // January

WMS example request

http://localhost:8080/netcdf/coads.nc.wms?LAYERS=SST&WIDTH=512
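The request is an ordinary query string, so it can be assembled with `urllib.parse.urlencode`. In the hypothetical `wms_url()` below, `LAYERS` and `WIDTH` come from the example request; other standard WMS GetMap parameters (such as `HEIGHT` or `BBOX`) can be passed as keyword arguments, though which ones pyDAP's WMS response honours is not stated in the slides:

```python
from urllib.parse import urlencode


def wms_url(base, layer, width, **extra):
    """Build a .wms request URL; keyword arguments become extra query parameters."""
    params = {"LAYERS": layer, "WIDTH": width}
    params.update(extra)  # insertion order is preserved in the query string
    return f"{base}.wms?{urlencode(params)}"


print(wms_url("http://localhost:8080/netcdf/coads.nc", "SST", 512))
# http://localhost:8080/netcdf/coads.nc.wms?LAYERS=SST&WIDTH=512
```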

KML response

● Generates an XML file in the Keyhole Markup Language (KML) that points to the WMS response

● Nice and simple interface for quickly visualizing data
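The KML file itself is not shown. A minimal sketch of the idea, assuming a single `GroundOverlay` whose `Icon/href` is a WMS request and a whole-globe `LatLonBox`; these element names are standard KML, but the document pyDAP generated may differ (and likely used an older KML namespace than the 2.2 one here). `kml_overlay()` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def q(tag):
    """Qualify a tag name with the KML namespace for ElementTree."""
    return f"{{{KML_NS}}}{tag}"


def kml_overlay(name, wms_href, north, south, east, west):
    """Return a KML document with one GroundOverlay whose image is a WMS URL."""
    ET.register_namespace("", KML_NS)  # serialize without an "ns0:" prefix
    kml = ET.Element(q("kml"))
    overlay = ET.SubElement(kml, q("GroundOverlay"))
    ET.SubElement(overlay, q("name")).text = name
    icon = ET.SubElement(overlay, q("Icon"))
    ET.SubElement(icon, q("href")).text = wms_href  # "&" is escaped on output
    box = ET.SubElement(overlay, q("LatLonBox"))
    for tag, value in (("north", north), ("south", south),
                       ("east", east), ("west", west)):
        ET.SubElement(box, q(tag)).text = str(value)
    return ET.tostring(kml, encoding="unicode")


href = "http://localhost:8080/netcdf/coads.nc.wms?LAYERS=SST&WIDTH=512"
print(kml_overlay("SST", href, 90, -90, 180, -180))
```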

Future

● pyDAP 2.3 almost ready:
  ● Dapper compliance
  ● Faster XDR encoding/decoding
  ● Initial support for DDX response and parser

● Build a rich web interface (AJAX) based on JSON + WMS + KML responses

● Not only for pyDAP, but also for other OPeNDAP servers, using pyDAP as a proxy

Acknowledgments

● OPeNDAP for all the support

● PSF for the financial support to be here

● Everybody who submitted bugs (bonus points for submitting patches!)