Central Arizona Phoenix LTER Center for Environmental Studies Arizona State University Database...

19
Center for Environmental Studies Central Arizona Phoenix LTER Database Design Peter McCartney (CAP) RDIFS Training Workshop Sevilleta LTER October 28-30, 2002

Transcript of Central Arizona Phoenix LTER Center for Environmental Studies Arizona State University Database...

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Database Design

Peter McCartney (CAP)

RDIFS Training WorkshopSevilleta LTER

October 28-30, 2002

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Two Perspectives on Design

• Field researcher– Goal is to facilitate entry and analysis of

what they observed– Tend emphasize information that

discriminates between cases– Data collection dictates design

• Data analyst– goal is to retrieve what they are looking for– Tend to emphasize information that

aggregates cases– Data use dictates design

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Two Interfaces

• Field forms– Organized by sampling events– Hierarchical

• Analysis and reporting– Matrix (“flat file”) structure– Aggregated or transformed values

( sums, z-scores, etc)– Encoded or decoded

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

The Data Cycle

RDBMSStorage

ILSDFWEWFDSF FWEFSDFWE DIFJDISLDIJFS

SDFSDFSDFSDIMLMKM FSMDSID MLSDIFMSLDFI

SDFSDFSL LSIDFSLDFM JIJIJIJLSDIFS

Sdfsdfsdfs dfsdfs 8989

sdfsdfsdsdfsd 90

sdfsdsd 4004

dfsdfsdfsdfsdfs dfsd

QueryAnalysis

QA/QCData Entry

Denormalized Normalized

Site, Type,Sample, date,taxon1A, urb, 1,2/2/99, 1451A, urb, 1, 2/2/99, 1231A, ag, 2, 2/3/99, 145…..1..A

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Normalization

• Information does not repeat• Every row can be uniquely

referenced• Attributes are independent• Attributes convey only one piece of

information• Each piece of information is

expressed in only one attribute

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Denormalization

• You can take a good thing too far• Large numbers of tables with small

amount of data will degrade performance and be difficult to read

• Denormalize to avoid excessive links

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Data Modeling

• Concentrate on logical design – independent of vendor

• Entity-Relationship (ER) diagram – Relational databases– Tables=entities, columns=attributes,

keys=relationships, code=procedures

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Entities - Tables

• Two dimensional matrix – fields and observations

• Should contain discrete attributes of information that repeat together as a group – use protocol as guide

• The majority of ecological datasets will have at least three entities– Site description (occurs once for each site)– Sample description (occurs each time a site

is sampled)– Observations (occurs once for each

observation made during a sample event)

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Attributes

• Data types– storage capacity– Precision– Performance

• Avoid dependencies– Do not make the meaning of one

attribute conditional on another• Avoid nesting

– Do not restrict the domain of one attribute based on the value of another

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Attribute properties

• Indexes– Primary Key– Alternate Key– Foreign key– Inversion key

• Null/not null• Default value• Assignment

– Identity– Production rules

• Check Constraints

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Attribute Domains

• Measurement scale• Units• Storage types• Domains

– Ranges– Enumerations– Rules

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Relationships (join conditions)

• Cardinality– Parent occurrences– Child occurrences

• properties– Propagating

• Enforcement– Keys– Triggers

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Some design considerations

• Use related tables to define domains if list is long and may grow, constraints if list is short

• Use alternate keys to enforce complicated nested identifiers, hidden primary keys for linking

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Spatial data

• Most ecological data have some spatial context– Some are just poorly referenced– Some gain spatial context through

relationships

• GIS systems– Powerful for storing and analyzing

spatial geometry – Not so good for managing data

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Spatial Data Entities

• Vector GIS – Attribute table with all the properties

of Tables– Plus some association with a

geometric representation– Geometry may be simple or complex

• Raster GIS– Cell attribute with similar properties

• View, Stored Procedures– Return tabular entities

Center for Environmental Studies

Arizona State University

Central Arizona PhoenixLTER

Strategies for integrating spatial data

• Use a GIS rather than a database– Advisable only if attribute data are minimal

and flat

• Hybrid GIS/relational database– Store geometries (points, lines, polygons) in

a GIS. – Use a key field to relate geometries to

tables in your database

• Use a spatial data server– Stores gis geometries in binary fields in your

database– Uses special software to access spatial and

non-spatial data in the same connection