What is a database? An organized collection of data. This can be in an electronic, paper, or other format.
Types of databases
Operational - constantly changing because entries are dynamic. Examples are customer purchase and inventory control databases
Analytical - once data are collected, they remain static. This is typical of scientific databases
Legacy - also known as an inherited database; created by someone else
Created - my own term for a database you create
Derived - a database you create by importing another database
Flat file databases
This is commonly the way we first view “databases”. Spreadsheets, word processing documents, or simple ASCII files are common examples
Order ID | Order Date   | Ship Date   | Sales Rep | Customer                       | Item 1            | Quantity | Item 2            | Quantity | ….
1        | 10 May, 2003 | 11 May 2003 | Jim       | MSU                            | Plankton Splitter | 1        | Ekman Dredge      | 1        | ….
2        | May 11, 2003 | 11 May 2003 | Jim       | Michigan State                 | Ekman Dredge      | 2        | Plankton Splitter | 2        | ….
3        | 5/12/2003    | 11 May 2003 | Bill, Jim | M.S.U.                         | Plankton net      | 3        |                   |          | ….
4        | 5/12/03      | 11 May 2003 | Jim, Bill | That other school in Ann Arbor | Zooplankton net   | 1        |                   |          | ….
This example shows a lot of problems. For example:
-Very constrained - only two items allowed per order
-Lacks ability to search easily (e.g., finding a specific item ordered is difficult and not always robust)
-Lacks database integrity. For example, MSU is not represented consistently
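The search and integrity problems can be made concrete with a minimal Python sketch. The rows mirror the flat-file example; the column names (item_1, qty_1, and so on) and the helper total_ordered are hypothetical names chosen for illustration, not part of the original slides.

```python
# Rows mirror the flat-file order example; column names are illustrative assumptions.
orders = [
    {"order_id": 1, "customer": "MSU",
     "item_1": "Plankton Splitter", "qty_1": 1, "item_2": "Ekman Dredge", "qty_2": 1},
    {"order_id": 2, "customer": "Michigan State",
     "item_1": "Ekman Dredge", "qty_1": 2, "item_2": "Plankton Splitter", "qty_2": 2},
    {"order_id": 3, "customer": "M.S.U.",
     "item_1": "Plankton net", "qty_1": 3, "item_2": None, "qty_2": None},
    {"order_id": 4, "customer": "That other school in Ann Arbor",
     "item_1": "Zooplankton net", "qty_1": 1, "item_2": None, "qty_2": None},
]


def total_ordered(rows, item):
    """Sum the quantity ordered of one item.

    Every item column has to be checked, because the item can sit in any slot.
    """
    total = 0
    for row in rows:
        for item_col, qty_col in (("item_1", "qty_1"), ("item_2", "qty_2")):
            if row[item_col] == item:
                total += row[qty_col]
    return total


ekman_total = total_ordered(orders, "Ekman Dredge")  # 1 + 2 = 3

# Orders 1 to 3 name the same customer three different ways, so a naive
# "distinct customers" count treats them as three separate customers.
msu_spellings = {row["customer"] for row in orders if row["order_id"] in (1, 2, 3)}
```

Adding a third item per order would mean adding two more columns and editing the search loop, which is the "very constrained" problem in practice.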
The first and most critical concept is that of a relational database, where the data are stored in multiple tables when necessary
Associated with this is the key idea that the data may be stored in a different format than how we view the data
We will get back to these ideas again (probably more often than you would like!)
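As a rough sketch of both ideas, the order data can be stored in three tables and viewed as a single report, using Python's built-in sqlite3 module. The table names, column names, the order_report view, and the ISO date format are all assumptions made for illustration; the slides do not prescribe a schema for the order example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE       -- one spelling per customer
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    order_date  TEXT NOT NULL,             -- one date format (ISO) for every row
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
CREATE TABLE order_items (                 -- one row per item, so no two-item limit
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    item        TEXT NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, item)
);
-- The view is how we look at the data; the three tables are how it is stored.
CREATE VIEW order_report AS
    SELECT o.order_id, o.order_date, c.name AS customer, i.item, i.quantity
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN order_items i ON i.order_id = o.order_id;
""")

conn.execute("INSERT INTO customers VALUES (1, 'Michigan State')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "2003-05-10", 1), (2, "2003-05-11", 1)])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
                 [(1, "Plankton Splitter", 1), (1, "Ekman Dredge", 1),
                  (2, "Ekman Dredge", 2), (2, "Plankton Splitter", 2)])

report = conn.execute(
    "SELECT * FROM order_report ORDER BY order_id, item").fetchall()

# Finding an item is now a single condition on a single column.
ekman_total = conn.execute(
    "SELECT SUM(quantity) FROM order_items WHERE item = 'Ekman Dredge'"
).fetchone()[0]
```

The customer name is stored once, so it cannot drift between "MSU", "Michigan State", and "M.S.U." from one order to the next.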
Overview of Database Design Process
1. Goals and objectives for database
2. Analyze current database
3. Create data structure
4. Establish table relationships
5. Define business rules
6. Establish views
7. Review data integrity
One of the key points is that this is an iterative process – you may need to go back to earlier steps if you find problems
Example of Database Design Process
-Introduction to example data set
1. Goals and objectives
Goal is to be able to determine the catch and size distribution of individual fish species at specific sites or groups of sites in our research program. We also want to be able to describe habitat conditions at these sites and relate them to the fish catches
Objectives:
1. To be able to compute catch per effort for each species at individual sites, and for the above barrier sites and for the below barrier sites as a group
2. To be able to compute mean size for each species at individual sites, and for the above barrier sites and below barrier sites
3. ...
2. Analyze current database
In this case, we have data sheets already filled in, so we will use this to analyze our current (paper) database
Begin by describing how data are collected. During this process, focus on units of observation (entities) or sampling events, and descriptions or measurements.
Create list of all variables (attributes), entities and events
Associate every variable with one or more entity or event
[Figure: sampling layout. Labels: water flow; barrier; segments 1, 2, and 3 on each side of the barrier. Within a site: Transect 1 (width, depth, 50 substrate particles), Transect 2, Transect 3.]
Variables: Stream name; Fish species caught; Fish length; Sample date; Position (Above or Below Barrier); Treatment or Reference Stream; Segment ID number (=site); Length of segment; Crew members; Conductivity; Water Temperature; Weather Conditions; Water Conditions; Transect width; Transect depth; Transect ID number; Particle size
Entities or Events: Shocking; Habitat
Refinements
Variables: Stream name; Fish species caught (Common name, scientific name, family); Fish length; Sample date (Year, Month, Day); Position (Above or Below Barrier); Treatment or Reference Stream; Segment ID number (=site); Length of segment; Crew members (always three); Conductivity; Water Temperature; Weather Conditions (Cloud Cover, Precipitation); Water Conditions (Water color, Water height); Transect width; Transect depth; Transect ID number; Particle size
Entities or Events: Shocking; Habitat; Streams; Transects; Substrate
From this preliminary set of entities and descriptors, develop preliminary list of tables and fields
TABLES - contain information on a particular class of entities or events
FIELDS - describe the attributes of entities or events
RECORDS - contain the information or data on an individual entity or event
Characteristics of a “Good” Field
• It represents a characteristic of the subject of the table
• It contains only a single value (e.g., if a course had two instructors, the instructor field should not contain both names). This is in contrast to MULTIVALUED FIELDS.
• It cannot be broken down into smaller components. This is in contrast to MULTIPART FIELDS (e.g., a field holding a person's entire address can be broken down into street address, city, state, and zip code).
• It does not contain a calculated value. Fields which are determined by values in other fields are CALCULATED FIELDS.
• The field is unique within the database unless it is needed to link tables
• The field retains all its characteristics if it appears in more than one table
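A small Python sketch of the three kinds of problem fields, under stated assumptions: the instructor names, the address, and the fish lengths are made-up values, and the helpers split_multivalued and split_address are hypothetical names. The address splitter assumes a simple four-part, comma-separated format, which real addresses often do not follow.

```python
def split_multivalued(value, sep=";"):
    """Turn one multivalued field into a list of single values."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def split_address(address):
    """Break a multipart address into its components.

    Assumes exactly four comma-separated parts (an illustrative simplification).
    """
    street, city, state, zip_code = [part.strip() for part in address.split(",")]
    return {"street": street, "city": city, "state": state, "zip_code": zip_code}


# Multivalued field: two instructors stored in one field.
instructors = split_multivalued("Smith; Jones")

# Multipart field: an entire address stored in one field.
address = split_address("123 Main St, East Lansing, MI, 48824")

# Calculated field: total catch is determined by the individual fish records,
# so it is computed when needed instead of being stored alongside them.
fish_lengths_mm = [112, 98, 130]
total_catch = len(fish_lengths_mm)
```

Once split, each instructor would become its own record in a linked table, and each address component its own field.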
Characteristics of a “Good” Table
• Each table refers to a single class of entities or unit of observation or event
• There is a way to uniquely identify each entry in a table. This is called the PRIMARY KEY.
• It does not contain multipart, multivalued, or calculated fields.
• It does not contain unnecessary fields, or unnecessary redundant data
• It contains all of the fields necessary to link it to other tables you want to link (or relate) it to
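The primary key and linking-field rules can be sketched with Python's built-in sqlite3 module. The table and column names, the stream name 'Example Creek', and the helper try_insert are assumptions for illustration. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces links only when enabled
conn.executescript("""
CREATE TABLE stream (
    stream_id   INTEGER PRIMARY KEY,      -- uniquely identifies each stream
    stream_name TEXT NOT NULL
);
CREATE TABLE sampling_event (
    sampling_event_id INTEGER PRIMARY KEY,
    stream_id   INTEGER NOT NULL REFERENCES stream(stream_id),  -- linking field
    sample_date TEXT NOT NULL
);
""")
conn.execute("INSERT INTO stream VALUES (1, 'Example Creek')")


def try_insert(sql, params):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# A second stream with an existing primary key is refused.
dup = try_insert("INSERT INTO stream VALUES (?, ?)", (1, "Another Creek"))

# A sampling event pointing at a stream that does not exist is refused.
orphan = try_insert("INSERT INTO sampling_event VALUES (?, ?, ?)",
                    (10, 99, "2003-05-10"))

# A sampling event linked to an existing stream is accepted.
valid = try_insert("INSERT INTO sampling_event VALUES (?, ?, ?)",
                   (10, 1, "2003-05-10"))
```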
Stream Table: Stream ID; Stream Name; Barrier or Reference
Shocking Event Table: Stream ID; Position (above/below); Segment; Date; Crew; Segment Length; Conductivity; Water Temperature; Weather; Water Conditions
Habitat Transect Table: Stream ID; Transect number; Width; Depth; ???Substrate???
Fish Table: Stream ID; Position (above/below); Fish name; Length; Total Catch
First Cut at Developing Tables
Stream Table: Stream ID; Stream Name; Barrier or Reference
Shocking Event Table: Stream ID; Sampling Event ID; Position (above/below); Segment; Date; Crew; Segment Length; Conductivity; Water Temperature; Weather; Water Conditions
Habitat Transect Table: Stream ID; Sampling Event ID; Transect number; Width; Depth; ???Substrate???
Fish Table: Stream ID; Sampling Event ID; Position (above/below); Fish name; Fish species code; Length; Total Catch
Refinements to Tables
Substrate Table: Sampling Event ID; Transect number; Particle ID; Particle size code
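To check a table design against the objectives, it can help to load a few rows and try a summary query. The sketch below uses Python's sqlite3 module on a cut-down version of the stream, shocking event, and fish tables. Several things here are assumptions and not the slides' design: the column names, the species codes (BKT, BNT), the stream name, and every data value are invented; the fish table gets a hypothetical fish_id key and one row per fish; and Stream ID, Position, and Total Catch are left out of the fish table because they can be recovered through the sampling event or by counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stream (
    stream_id   INTEGER PRIMARY KEY,
    stream_name TEXT NOT NULL
);
CREATE TABLE shocking_event (
    sampling_event_id INTEGER PRIMARY KEY,
    stream_id   INTEGER NOT NULL REFERENCES stream(stream_id),
    position    TEXT NOT NULL CHECK (position IN ('above', 'below')),
    segment     INTEGER NOT NULL,
    sample_date TEXT NOT NULL
);
CREATE TABLE fish (                        -- one row per fish measured
    fish_id           INTEGER PRIMARY KEY,
    sampling_event_id INTEGER NOT NULL
                      REFERENCES shocking_event(sampling_event_id),
    species_code      TEXT NOT NULL,
    length_mm         REAL NOT NULL
);
""")

conn.execute("INSERT INTO stream VALUES (1, 'Example Creek')")
conn.executemany(
    "INSERT INTO shocking_event VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "above", 1, "2003-05-10"), (2, 1, "below", 1, "2003-05-10")])
conn.executemany(
    "INSERT INTO fish (sampling_event_id, species_code, length_mm) VALUES (?, ?, ?)",
    [(1, "BKT", 100.0), (1, "BKT", 120.0), (2, "BKT", 90.0), (2, "BNT", 200.0)])

# Catch and mean length per species, grouped above vs. below the barrier.
summary = conn.execute("""
    SELECT e.position, f.species_code,
           COUNT(*) AS catch, AVG(f.length_mm) AS mean_length_mm
    FROM fish f
    JOIN shocking_event e USING (sampling_event_id)
    GROUP BY e.position, f.species_code
    ORDER BY e.position, f.species_code
""").fetchall()
```

Catch per effort would additionally need an agreed measure of effort (segment length is one candidate already in the shocking event table), which this sketch leaves open.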
Another example: Deer habitat use in SE Michigan
Habitat patches: size; cover type
Deer characteristics: Deer ID; age; sex
Telemetry observation: Year; Month; Day; Time; Deer ID; Habitat patch (or lat/lon ?)
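One way to see how these three entities link together is a short Python sketch with dataclasses. It assumes each telemetry observation records a habitat patch and not a lat/lon position, and it introduces a hypothetical patch_id so that patches can be referenced; the units and all values are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HabitatPatch:
    patch_id: int      # hypothetical key, not listed on the slide
    size_ha: float     # unit is an assumption
    cover_type: str


@dataclass(frozen=True)
class Deer:
    deer_id: int
    age: int
    sex: str


@dataclass(frozen=True)
class TelemetryObservation:
    year: int
    month: int
    day: int
    time: str
    deer_id: int       # links to Deer
    patch_id: int      # links to HabitatPatch


# Index each entity by its key so observations can be resolved by ID.
patches = {p.patch_id: p for p in [HabitatPatch(1, 12.5, "forest"),
                                   HabitatPatch(2, 4.0, "cropland")]}
deer = {d.deer_id: d for d in [Deer(101, 3, "F")]}
observations = [
    TelemetryObservation(2003, 5, 10, "06:30", 101, 1),
    TelemetryObservation(2003, 5, 10, "18:30", 101, 2),
    TelemetryObservation(2003, 5, 11, "06:30", 101, 1),
]

# Habitat use: number of observations per cover type, resolved through patch_id.
use_by_cover = Counter(patches[obs.patch_id].cover_type for obs in observations)
```

If observations stored lat/lon instead, assigning each one to a patch would become a spatial lookup and not a simple key match.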
Homework
• Develop list of tables and fields for your database project
• With a partner, go over your list to determine if each table and field meets the criteria for being “good”