11/28/2000Information Organization and Retrieval Introduction to Databases and Database Design...
-
date post
20-Dec-2015 -
Category
Documents
-
view
213 -
download
0
Transcript of 11/28/2000Information Organization and Retrieval Introduction to Databases and Database Design...
11/28/2000 Information Organization and Retrieval
Introduction to Databases and Database Design
University of California, Berkeley
School of Information Management and Systems
SIMS 202: Information Organization and Retrieval
11/28/2000 Information Organization and Retrieval
Today
• Databases and Files
• Database design
• Database life cycle
• Data Models
11/28/2000 Information Organization and Retrieval
Files and Databases• File: A collection of records or documents
dealing with one organization, person, area or subject. (Rowley)– Manual (paper) files– Computer files
• Database: A collection of similar records with relationships between the records. (Rowley)– bibliographic, statistical, business data, images, etc.
11/28/2000 Information Organization and Retrieval
Database
• A Database is a collection of stored operational data used by the application systems of some particular enterprise. (C.J. Date)– Paper “Databases”
• Still contain a large portion of the world’s knowledge
– File-Based Data Processing Systems• Early batch processing of (primarily) business data
– Database Management Systems (DBMS)
11/28/2000 Information Organization and Retrieval
Why DBMS?
• History– 50’s and 60’s all applications were custom built
for particular needs– File based– Many similar/duplicative applications dealing
with collections of business data– Early DBMS were extensions of programming
languages– 1970 - E.F. Codd and the Relational Model– 1979 - Ashton-Tate & first Microcomputer
DBMS
11/28/2000 Information Organization and Retrieval
File Based Systems
Naughty
NiceJust what asked for
CoalEstimation
DeliveryList
Application File
ToysAddresses
Toys
11/28/2000 Information Organization and Retrieval
From File Systems to DBMS
• Problems with File Processing systems– Inconsistent Data– Inflexibility– Limited Data Sharing– Poor enforcement of standards– Excessive program maintenance
11/28/2000 Information Organization and Retrieval
DBMS Benefits
• Minimal Data Redundancy• Consistency of Data• Integration of Data• Sharing of Data• Ease of Application Development• Uniform Security, Privacy, and Integrity
Controls• Data Accessibility and Responsiveness• Data Independence• Reduced Program Maintenance
11/28/2000 Information Organization and Retrieval
Database Environment
CASE Tools
DBMS
UserInterface
ApplicationPrograms
Repository Database
11/28/2000 Information Organization and Retrieval
Database Components
DBMS===============
Design toolsTable CreationForm CreationQuery CreationReport Creation
Procedural language
compiler (4GL)=============
Run timeForm processorQuery processor
Report WriterLanguage Run time
UserInterface
Applications
ApplicationProgramsDatabase
Database contains:User’s DataMetadataIndexesApplication Metadata
Kroenke, DatabaseProcessing
11/28/2000 Information Organization and Retrieval
Terms and Concepts• Database:
– A collection of similar records with relationships between the records. (Rowley)
– A Database is a collection of stored operational data used by the application systems of some particular enterprise. (C.J. Date)
11/28/2000 Information Organization and Retrieval
Terms and Concepts• Enterprise
– Organization• Entity
– Person, Place, Thing, Event, Concept...• Attributes
– Data elements (facts) about some entity– Also sometimes called fields or items or domains
• Data values– instances of a particular attribute for a particular
entity
11/28/2000 Information Organization and Retrieval
Terms and Concepts
• Records– The set of values for all attributes of a particular
entity– AKA “tuples” or “rows” in relational DBMS
• File– Collection of records – Usually a physical file on OS– May also be a “logical file” like a “Relation” or
“Table” in relational DBMS
11/28/2000 Information Organization and Retrieval
Terms and Concepts
• Key– an attribute or set of attributes used to identify
or locate records in a file
• Primary Key– an attribute or set of attributes that uniquely
identifies each record in a file
11/28/2000 Information Organization and Retrieval
Terms and Concepts• Data Independence
– Physical representation and location of data and the use of that data are separated
• The application doesn’t need to know how or where the database has stored the data, but just how to ask for it.
• Moving a database from one DBMS to another should not have a material effect on application program
• Recoding, adding fields, etc. in the database should not affect applications
11/28/2000 Information Organization and Retrieval
Terms and Concepts
• Models– (1) Levels or views of the Database
• Conceptual, logical, physical
– (2) DBMS types• Relational, Hierarchic, Network, Object-Oriented,
Object-Relational
11/28/2000 Information Organization and Retrieval
Models (1)
ConceptualModel
LogicalModel
External Model
Conceptual requirements
Conceptual requirements
Conceptual requirements
Conceptual requirements
Application 1
Application 1
Application 2 Application 3 Application 4
Application 2
Application 3
Application 4
External Model
External Model
External Model
Internal Model
11/28/2000 Information Organization and Retrieval
Terms and Concepts
• Metadata– Data about data
• In DBMS means all of the characteristics describing the attributes of an entity, E.G.:
– name of attribute– data type of attribute– size of the attribute– format or special characteristics
– Characteristics of files or relations• name, content, notes, etc.
11/28/2000 Information Organization and Retrieval
Terms and Concepts
• Data Dictionary– AKA repository– The place where all metadata for a particular
database is stored– may also include information on relationships
between files or tables in a particular database
11/28/2000 Information Organization and Retrieval
Terms and Concepts• Data Administration
– Responsibility for the overall management of data resources within an organization
• Database Administration– Responsibility for physical database design and technical
issues in database management
• Data Steward– Responsibility for some subset of the organization’s data,
and all of the interactions (applications, user access, etc.) for that data
11/28/2000 Information Organization and Retrieval
Terms and Concepts
• DA– Data adminstrator - person responsible for the
Data Administration function in an organization– Sometimes may be the CIO -- Chief
Information Officer
• DBA– Database Administrator - person responsible for
the Database Administration Function
11/28/2000 Information Organization and Retrieval
Database System Life Cycle
Growth,Change, &
Maintenance6
Operations5
Integration4
Design1
Conversion3
PhysicalCreation
2
11/28/2000 Information Organization and Retrieval
Design• Determination of the needs of the
organization
• Development of the Conceptual Model of the database– Typically using Entity-Relationship
diagramming techniques
• Construction of a Data Dictionary
• Development of the Logical Model
11/28/2000 Information Organization and Retrieval
Physical Creation• Development of the Physical Model of the
Database– data formats and types– determination of indexes, etc.
• Load a prototype database and test
• Determine and implement security, privacy and access controls
• Determine and implement integrity constraints
11/28/2000 Information Organization and Retrieval
Conversion
• Convert existing data sets and applications to use the new database– May need programs, conversion utilities to
convert old data to new formats.
11/28/2000 Information Organization and Retrieval
Integration
• Overlaps with Phase 3
• Integration of converted applications and new applications into the new database
11/28/2000 Information Organization and Retrieval
Operations
• All applications run full-scale
• Privacy, security, access control must be in place.
• Recovery and Backup procedures must be established and used
11/28/2000 Information Organization and Retrieval
Growth, Change & Maintenance
• Change is a way of life– Applications, data requirements, reports, etc.
will all change as new needs and requirements are found
– The Database and applications and will need to be modified to meet the needs of changes
11/28/2000 Information Organization and Retrieval
Another View of the Life Cycle
Operations5
Conversion3
PhysicalCreation
2Growth, Change
6
Integration4
Design1
11/28/2000 Information Organization and Retrieval
Database Models
• Models(2): DBMS types– Hierarchical– Network– Relational– Object-Oriented
11/28/2000 Information Organization and Retrieval
Database Data Models
• Hierarchical Model– Similar to data structures in programming
languages.Books
(id, title)
Publisher SubjectsAuthors
(first, last)
11/28/2000 Information Organization and Retrieval
Database Data Models
• Network Model– Provides for single entries of data and
navigational “links” through chains of data.
Subjects Books
Authors
Publishers
11/28/2000 Information Organization and Retrieval
Database Data Models
• Relational Model– Provides a conceptually simple model for data
as relations (typically considered “tables”) with all data visible.
Book ID Title pubid Author id1 Introductio 2 12 The history 4 23 New stuff ab 3 34 Another title 2 45 And yet more 1 5
pubid pubname1 Harper2 Addison3 Oxford4 Que
Authorid Author name1 Smith2 Wynar3 Jones4 Duncan5 Applegate
Subid Subject1 cataloging2 history3 stuff
Book ID Subid1 22 13 34 24 3
11/28/2000 Information Organization and Retrieval
Database Data Models
• Object Oriented Data Model– Encapsulates data and operations as “Objects”
Books(id, title)
Publisher SubjectsAuthors
(first, last)
11/28/2000 Information Organization and Retrieval
Database Design Process
ConceptualModel
LogicalModel
External Model
Conceptual requirements
Conceptual requirements
Conceptual requirements
Conceptual requirements
Application 1
Application 1
Application 2 Application 3 Application 4
Application 2
Application 3
Application 4
External Model
External Model
External Model
Internal Model
11/28/2000 Information Organization and Retrieval
Database Design Process• Conceptual Requirements
– Systems Analysis Process• Examine all of the information sources used in existing
applications
• Identify the characteristics of each data element– numeric
– text
– date/time
– etc.
• Examine the tasks carried out using the information
• Examine results or reports created using the information
11/28/2000 Information Organization and Retrieval
Database Design Process
• Conceptual Model– Merge the collective needs of all applications
– Determine what Entities are being used• Some object about which information is to maintained
– What are the Attributes of those entities?• Properties or characteristics of the entity
• What attributes uniquely identify the entity
– What are the Relationships between entities• How the entities interact with each other?
11/28/2000 Information Organization and Retrieval
Database Design Process
• Logical Model– How is each entity and relationship represented
in the Data Model of the DBMS• Hierarchic?
• Network?
• Relational?
• Object-Oriented?
11/28/2000 Information Organization and Retrieval
Database Design Process
• Internal Model– Choices of index file structure– Choices of data storage formats– Choices of disk layout
11/28/2000 Information Organization and Retrieval
Database Design Process
• External Model– User views of the integrated database – Making the old (or updated) applications work
with the new database design
11/28/2000 Information Organization and Retrieval
Developing a Conceptual Model• Overall view of the database that integrates all the
needed information discovered during the requirements analysis.
• Elements of the Conceptual Model are represented by diagrams, Entity-Relationship or ER Diagrams, that show the meanings and relationships of those elements independent of any particular database systems or implementation details.
11/28/2000 Information Organization and Retrieval
Entity
• An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information– Persons (e.g.: customers in a business,
employees, authors)– Things (e.g.: purchase orders, meetings, parts,
companies)
Employee
11/28/2000 Information Organization and Retrieval
Attributes
• Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it. (This is the Metadata for the entities.)
Employee
Last
Middle
First
Name SSN
Age
Birthdate
Projects
11/28/2000 Information Organization and Retrieval
Relationships
• Relationships are the associations between entities. They can involve one or more entities and belong to particular relationship types
11/28/2000 Information Organization and Retrieval
Relationships
ClassAttendsStudent
PartSuppliesproject
partsSupplier
Project
11/28/2000 Information Organization and Retrieval
Types of Relationships
• Concerned only with cardinality of relationship
TruckAssignedEmployee
ProjectAssignedEmployee
ProjectAssignedEmployee
1 1
n
n
1
m
Chen ER notation
11/28/2000 Information Organization and Retrieval
Other Notations
TruckAssignedEmployee
ProjectAssignedEmployee
ProjectAssignedEmployee
“Crow’s Foot”
11/28/2000 Information Organization and Retrieval
Other Notations
TruckAssignedEmployee
ProjectAssignedEmployee
ProjectAssignedEmployee
IDEFIX Notation
11/28/2000 Information Organization and Retrieval
More Complex Relationships
ProjectAssignedEmployee 4(2-10) 1
ProjectEvaluationEmployee
Manager
1/n/n
1/1/1
n/n/1
ManagesEmployee
SSN ProjectDate
Manages
Is Managed By
1
n
11/28/2000 Information Organization and Retrieval
Weak Entities
• Owe existence entirely to another entity
Order-lineContainsOrder
Invoice #
Part#
Rep#
QuantityInvoice#
11/28/2000 Information Organization and Retrieval
Supertype and Subtype Entities
ClerkIs one ofSales-rep
Invoice
Other
Employee
Sold
Manages