11/28/2000Information Organization and Retrieval Introduction to Databases and Database Design...

54
11/28/2000 Information Organization and Retrieval Introduction to Databases and Database Design University of California, Berkeley School of Information Management and Systems SIMS 202: Information Organization and Retrieval
  • date post

    20-Dec-2015
  • Category

    Documents

  • view

    213
  • download

    0

Transcript of 11/28/2000Information Organization and Retrieval Introduction to Databases and Database Design...

11/28/2000 Information Organization and Retrieval

Introduction to Databases and Database Design

University of California, Berkeley

School of Information Management and Systems

SIMS 202: Information Organization and Retrieval

11/28/2000 Information Organization and Retrieval

Review

• Thesaurus Design and Construction

11/28/2000 Information Organization and Retrieval

Today

• Databases and Files

• Database design

• Database life cycle

• Data Models

11/28/2000 Information Organization and Retrieval

What is a Database?

11/28/2000 Information Organization and Retrieval

Files and Databases• File: A collection of records or documents

dealing with one organization, person, area or subject. (Rowley)– Manual (paper) files– Computer files

• Database: A collection of similar records with relationships between the records. (Rowley)– bibliographic, statistical, business data, images, etc.

11/28/2000 Information Organization and Retrieval

Database

• A Database is a collection of stored operational data used by the application systems of some particular enterprise. (C.J. Date)– Paper “Databases”

• Still contain a large portion of the world’s knowledge

– File-Based Data Processing Systems• Early batch processing of (primarily) business data

– Database Management Systems (DBMS)

11/28/2000 Information Organization and Retrieval

Why DBMS?

• History– 50’s and 60’s all applications were custom built

for particular needs– File based– Many similar/duplicative applications dealing

with collections of business data– Early DBMS were extensions of programming

languages– 1970 - E.F. Codd and the Relational Model– 1979 - Ashton-Tate & first Microcomputer

DBMS

11/28/2000 Information Organization and Retrieval

File Based Systems

Naughty

NiceJust what asked for

CoalEstimation

DeliveryList

Application File

ToysAddresses

Toys

11/28/2000 Information Organization and Retrieval

From File Systems to DBMS

• Problems with File Processing systems– Inconsistent Data– Inflexibility– Limited Data Sharing– Poor enforcement of standards– Excessive program maintenance

11/28/2000 Information Organization and Retrieval

DBMS Benefits

• Minimal Data Redundancy• Consistency of Data• Integration of Data• Sharing of Data• Ease of Application Development• Uniform Security, Privacy, and Integrity

Controls• Data Accessibility and Responsiveness• Data Independence• Reduced Program Maintenance

11/28/2000 Information Organization and Retrieval

Database Environment

CASE Tools

DBMS

UserInterface

ApplicationPrograms

Repository Database

11/28/2000 Information Organization and Retrieval

Database Components

DBMS===============

Design toolsTable CreationForm CreationQuery CreationReport Creation

Procedural language

compiler (4GL)=============

Run timeForm processorQuery processor

Report WriterLanguage Run time

UserInterface

Applications

ApplicationProgramsDatabase

Database contains:User’s DataMetadataIndexesApplication Metadata

Kroenke, DatabaseProcessing

11/28/2000 Information Organization and Retrieval

Terms and Concepts• Database:

– A collection of similar records with relationships between the records. (Rowley)

– A Database is a collection of stored operational data used by the application systems of some particular enterprise. (C.J. Date)

11/28/2000 Information Organization and Retrieval

Terms and Concepts• Enterprise

– Organization• Entity

– Person, Place, Thing, Event, Concept...• Attributes

– Data elements (facts) about some entity– Also sometimes called fields or items or domains

• Data values– instances of a particular attribute for a particular

entity

11/28/2000 Information Organization and Retrieval

Terms and Concepts

• Records– The set of values for all attributes of a particular

entity– AKA “tuples” or “rows” in relational DBMS

• File– Collection of records – Usually a physical file on OS– May also be a “logical file” like a “Relation” or

“Table” in relational DBMS

11/28/2000 Information Organization and Retrieval

Terms and Concepts

• Key– an attribute or set of attributes used to identify

or locate records in a file

• Primary Key– an attribute or set of attributes that uniquely

identifies each record in a file

11/28/2000 Information Organization and Retrieval

Terms and Concepts• Data Independence

– Physical representation and location of data and the use of that data are separated

• The application doesn’t need to know how or where the database has stored the data, but just how to ask for it.

• Moving a database from one DBMS to another should not have a material effect on application program

• Recoding, adding fields, etc. in the database should not affect applications

11/28/2000 Information Organization and Retrieval

Terms and Concepts

• Models– (1) Levels or views of the Database

• Conceptual, logical, physical

– (2) DBMS types• Relational, Hierarchic, Network, Object-Oriented,

Object-Relational

11/28/2000 Information Organization and Retrieval

Models (1)

ConceptualModel

LogicalModel

External Model

Conceptual requirements

Conceptual requirements

Conceptual requirements

Conceptual requirements

Application 1

Application 1

Application 2 Application 3 Application 4

Application 2

Application 3

Application 4

External Model

External Model

External Model

Internal Model

11/28/2000 Information Organization and Retrieval

Terms and Concepts

• Metadata– Data about data

• In DBMS means all of the characteristics describing the attributes of an entity, E.G.:

– name of attribute– data type of attribute– size of the attribute– format or special characteristics

– Characteristics of files or relations• name, content, notes, etc.

11/28/2000 Information Organization and Retrieval

Terms and Concepts

• Data Dictionary– AKA repository– The place where all metadata for a particular

database is stored– may also include information on relationships

between files or tables in a particular database

11/28/2000 Information Organization and Retrieval

Terms and Concepts• Data Administration

– Responsibility for the overall management of data resources within an organization

• Database Administration– Responsibility for physical database design and technical

issues in database management

• Data Steward– Responsibility for some subset of the organization’s data,

and all of the interactions (applications, user access, etc.) for that data

11/28/2000 Information Organization and Retrieval

Terms and Concepts

• DA– Data adminstrator - person responsible for the

Data Administration function in an organization– Sometimes may be the CIO -- Chief

Information Officer

• DBA– Database Administrator - person responsible for

the Database Administration Function

11/28/2000 Information Organization and Retrieval

Database System Life Cycle

Growth,Change, &

Maintenance6

Operations5

Integration4

Design1

Conversion3

PhysicalCreation

2

11/28/2000 Information Organization and Retrieval

Design• Determination of the needs of the

organization

• Development of the Conceptual Model of the database– Typically using Entity-Relationship

diagramming techniques

• Construction of a Data Dictionary

• Development of the Logical Model

11/28/2000 Information Organization and Retrieval

Physical Creation• Development of the Physical Model of the

Database– data formats and types– determination of indexes, etc.

• Load a prototype database and test

• Determine and implement security, privacy and access controls

• Determine and implement integrity constraints

11/28/2000 Information Organization and Retrieval

Conversion

• Convert existing data sets and applications to use the new database– May need programs, conversion utilities to

convert old data to new formats.

11/28/2000 Information Organization and Retrieval

Integration

• Overlaps with Phase 3

• Integration of converted applications and new applications into the new database

11/28/2000 Information Organization and Retrieval

Operations

• All applications run full-scale

• Privacy, security, access control must be in place.

• Recovery and Backup procedures must be established and used

11/28/2000 Information Organization and Retrieval

Growth, Change & Maintenance

• Change is a way of life– Applications, data requirements, reports, etc.

will all change as new needs and requirements are found

– The Database and applications and will need to be modified to meet the needs of changes

11/28/2000 Information Organization and Retrieval

Another View of the Life Cycle

Operations5

Conversion3

PhysicalCreation

2Growth, Change

6

Integration4

Design1

11/28/2000 Information Organization and Retrieval

Database Models

• Models(2): DBMS types– Hierarchical– Network– Relational– Object-Oriented

11/28/2000 Information Organization and Retrieval

Database Data Models

• Hierarchical Model– Similar to data structures in programming

languages.Books

(id, title)

Publisher SubjectsAuthors

(first, last)

11/28/2000 Information Organization and Retrieval

Database Data Models

• Network Model– Provides for single entries of data and

navigational “links” through chains of data.

Subjects Books

Authors

Publishers

11/28/2000 Information Organization and Retrieval

Database Data Models

• Relational Model– Provides a conceptually simple model for data

as relations (typically considered “tables”) with all data visible.

Book ID Title pubid Author id1 Introductio 2 12 The history 4 23 New stuff ab 3 34 Another title 2 45 And yet more 1 5

pubid pubname1 Harper2 Addison3 Oxford4 Que

Authorid Author name1 Smith2 Wynar3 Jones4 Duncan5 Applegate

Subid Subject1 cataloging2 history3 stuff

Book ID Subid1 22 13 34 24 3

11/28/2000 Information Organization and Retrieval

Database Data Models

• Object Oriented Data Model– Encapsulates data and operations as “Objects”

Books(id, title)

Publisher SubjectsAuthors

(first, last)

11/28/2000 Information Organization and Retrieval

Database Design Process

ConceptualModel

LogicalModel

External Model

Conceptual requirements

Conceptual requirements

Conceptual requirements

Conceptual requirements

Application 1

Application 1

Application 2 Application 3 Application 4

Application 2

Application 3

Application 4

External Model

External Model

External Model

Internal Model

11/28/2000 Information Organization and Retrieval

Database Design Process• Conceptual Requirements

– Systems Analysis Process• Examine all of the information sources used in existing

applications

• Identify the characteristics of each data element– numeric

– text

– date/time

– etc.

• Examine the tasks carried out using the information

• Examine results or reports created using the information

11/28/2000 Information Organization and Retrieval

Database Design Process

• Conceptual Model– Merge the collective needs of all applications

– Determine what Entities are being used• Some object about which information is to maintained

– What are the Attributes of those entities?• Properties or characteristics of the entity

• What attributes uniquely identify the entity

– What are the Relationships between entities• How the entities interact with each other?

11/28/2000 Information Organization and Retrieval

Database Design Process

• Logical Model– How is each entity and relationship represented

in the Data Model of the DBMS• Hierarchic?

• Network?

• Relational?

• Object-Oriented?

11/28/2000 Information Organization and Retrieval

Database Design Process

• Internal Model– Choices of index file structure– Choices of data storage formats– Choices of disk layout

11/28/2000 Information Organization and Retrieval

Database Design Process

• External Model– User views of the integrated database – Making the old (or updated) applications work

with the new database design

11/28/2000 Information Organization and Retrieval

Developing a Conceptual Model• Overall view of the database that integrates all the

needed information discovered during the requirements analysis.

• Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details.

11/28/2000 Information Organization and Retrieval

Entity

• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information– Persons (e.g.: customers in a business,

employees, authors)– Things (e.g.: purchase orders, meetings, parts,

companies)

Employee

11/28/2000 Information Organization and Retrieval

Attributes

• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it. (This is the Metadata for the entities.)

Employee

Last

Middle

First

Name SSN

Age

Birthdate

Projects

11/28/2000 Information Organization and Retrieval

Relationships

• Relationships are the associations between entities. They can involve one or more entities and belong to particular relationship types

11/28/2000 Information Organization and Retrieval

Relationships

ClassAttendsStudent

PartSuppliesproject

partsSupplier

Project

11/28/2000 Information Organization and Retrieval

Types of Relationships

• Concerned only with cardinality of relationship

TruckAssignedEmployee

ProjectAssignedEmployee

ProjectAssignedEmployee

1 1

n

n

1

m

Chen ER notation

11/28/2000 Information Organization and Retrieval

Other Notations

TruckAssignedEmployee

ProjectAssignedEmployee

ProjectAssignedEmployee

“Crow’s Foot”

11/28/2000 Information Organization and Retrieval

Other Notations

TruckAssignedEmployee

ProjectAssignedEmployee

ProjectAssignedEmployee

IDEFIX Notation

11/28/2000 Information Organization and Retrieval

More Complex Relationships

ProjectAssignedEmployee 4(2-10) 1

ProjectEvaluationEmployee

Manager

1/n/n

1/1/1

n/n/1

ManagesEmployee

SSN ProjectDate

Manages

Is Managed By

1

n

11/28/2000 Information Organization and Retrieval

Weak Entities

• Owe existence entirely to another entity

Order-lineContainsOrder

Invoice #

Part#

Rep#

QuantityInvoice#

11/28/2000 Information Organization and Retrieval

Supertype and Subtype Entities

ClerkIs one ofSales-rep

Invoice

Other

Employee

Sold

Manages

11/28/2000 Information Organization and Retrieval

Many to Many Relationships

Employee

ProjectIsAssigned

ProjectAssignment

Assigned

SSN

Proj#

SSN

Proj#Hours