C6 Databases. 2 Problems with traditional file environments Data Redundancy and Inconsistency:...

C6 Databases


Problems with traditional file environments

• Data Redundancy and Inconsistency:
  – Data redundancy: The presence of duplicate data in multiple data files so that the same data are stored in more than one place or location
  – Data inconsistency: The same attribute may have different values in different files
• Program-Data Dependence:
  – The coupling of data stored in files and the specific programs required to update and maintain those files such that changes in programs require changes to the data and vice versa
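To make redundancy and inconsistency concrete, here is a minimal sketch. It assumes two hypothetical departmental files (`sales_file` and `billing_file`, invented for this illustration) that each keep their own copy of customer data; the helper `find_inconsistencies` is likewise made up and simply reports attributes whose copies disagree.

```python
# Two hypothetical departmental files, each holding its own copy of the
# same customer data (redundancy). All names and values are invented.
sales_file = {
    "C001": {"name": "Ada Lovelace", "phone": "555-0100"},
    "C002": {"name": "Alan Turing", "phone": "555-0111"},
}
billing_file = {
    "C001": {"name": "Ada Lovelace", "phone": "555-0199"},  # updated here only
    "C002": {"name": "Alan Turing", "phone": "555-0111"},
}


def find_inconsistencies(file_a, file_b):
    """Return (customer_id, attribute, value_a, value_b) where copies disagree."""
    conflicts = []
    for customer_id in sorted(file_a.keys() & file_b.keys()):
        for attribute, value_a in file_a[customer_id].items():
            value_b = file_b[customer_id].get(attribute)
            if value_a != value_b:
                conflicts.append((customer_id, attribute, value_a, value_b))
    return conflicts


conflicts = find_inconsistencies(sales_file, billing_file)
print(conflicts)
```

Because the phone number was changed in only one of the two copies, the same attribute now has two different values, which is the inconsistency the slide describes.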


More problems

• Lack of flexibility: A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it cannot deliver ad-hoc reports or respond to unanticipated information requirements in a timely fashion
• Poor security: Management may have no knowledge of who is accessing or making changes to the organization’s data
• Lack of data sharing and availability: Information cannot flow freely across different functional areas or different parts of the organization


Types of databases

• Relational
• Hierarchical and Network
• Object-oriented

The focus of this lecture is on relational databases.


The Database Approach to Data Management

• Relational DBMS
  – Represents data as two-dimensional tables called relations
  – Relates data across tables based on a common data element
  – Examples: Access, DB2, Oracle, MS SQL Server


The Database Approach to Data Management


Data hierarchy

[Figure: data hierarchy diagram; surviving label: "High Level"]


Important ideas

In a database:
• A group of values for the set of fields makes a record (tuple, row)
• A group of records makes a table (file)
• A group of tables (files) makes a database
• A field name serves to label each column of each table


Types of fields

Fields can contain:
• Strings (text characters)
• Numeric values
• Sometimes very specific formats (e.g. Date)


Types of operations in a relational database

• Select: Creates a subset of rows that meet specific criteria
• Join: Combines relational tables to provide users with information
  – Requires a field in common between the tables being joined
• Project: Creates a subset consisting of certain columns of the table
  – Results in a new, smaller table
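The three operations can be sketched with Python's built-in `sqlite3` module. The `supplier` and `part` tables, their columns, and the sample rows are assumptions made up for this illustration; only the select, project, and join ideas come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,
        part_name   TEXT,
        unit_price  REAL,
        supplier_id INTEGER
    );
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Bolt Co');
    INSERT INTO part VALUES (10, 'Door latch', 22.0, 1),
                            (11, 'Hinge',       4.5, 2),
                            (12, 'Door seal',   6.0, 1);
""")

# Select: keep only the rows that meet a criterion (all columns retained).
selected = conn.execute(
    "SELECT * FROM part WHERE unit_price > 5 ORDER BY part_id"
).fetchall()

# Project: keep only certain columns (all rows retained).
projected = conn.execute(
    "SELECT part_name, unit_price FROM part ORDER BY part_id"
).fetchall()

# Join: combine the two tables through their common field, supplier_id.
joined = conn.execute("""
    SELECT part.part_name, supplier.name
    FROM part
    JOIN supplier ON part.supplier_id = supplier.supplier_id
    ORDER BY part.part_id
""").fetchall()

print(selected)
print(projected)
print(joined)
```

In practice the three are usually combined in a single query: a `WHERE` clause selects rows, the column list projects, and `JOIN ... ON` links tables through the shared field.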


The Database Approach to Data Management


Summary on db operations

• Selections are related to choosing table rows.
• Projections are related to choosing table columns.
• Joins are related to choosing records that have a common value in a field shared by two tables.


Designing a database

• Conceptual design: Abstract model of the database from a business perspective
• Physical design: How data are actually structured on physical storage media
• Entity-relationship diagram: Methodology for documenting databases, illustrating relationships between database entities
• Normalization: Process of creating small, stable data structures from complex groups of data
• Primary keys: Each table requires a unique identifier (a field or a set of fields)
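Normalization and primary keys can be illustrated with a small sketch using Python's built-in `sqlite3` module. The flat `flat_orders` list and the `supplier` / `order_line` tables are hypothetical examples, not taken from the lecture. The sketch splits repeated supplier names into their own table and then shows the database rejecting a duplicate primary key.

```python
import sqlite3

# Unnormalized input: the supplier name is repeated on every order line.
flat_orders = [
    (1, "Acme", "Door latch"),
    (2, "Acme", "Door seal"),
    (3, "Bolt Co", "Hinge"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE order_line (
        order_id    INTEGER PRIMARY KEY,
        part_name   TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
    );
""")

for order_id, supplier_name, part_name in flat_orders:
    # Store each supplier once; repeats are skipped by the UNIQUE constraint.
    conn.execute("INSERT OR IGNORE INTO supplier (name) VALUES (?)",
                 (supplier_name,))
    (supplier_id,) = conn.execute(
        "SELECT supplier_id FROM supplier WHERE name = ?", (supplier_name,)
    ).fetchone()
    conn.execute("INSERT INTO order_line VALUES (?, ?, ?)",
                 (order_id, part_name, supplier_id))

supplier_count = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]

# The primary key must be unique: reusing order_id 1 is rejected.
try:
    conn.execute("INSERT INTO order_line VALUES (1, 'Duplicate', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(supplier_count, duplicate_rejected)
```

After the split, a supplier's name lives in exactly one row, so renaming a supplier is a single update rather than one per order line.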


Database Management Systems

• Data definition language: Specifies the content and structure of the database and defines each data element
• Data manipulation language: Used to process data in a database; permits users to extract data
• Data dictionary: Stores definitions of data elements and data characteristics; can indicate usage and ownership
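As a rough illustration of the three components, the sketch below uses Python's built-in `sqlite3` module with a hypothetical `employee` table. `CREATE TABLE` stands in for the data definition language, `INSERT` / `UPDATE` / `SELECT` for the data manipulation language, and SQLite's `PRAGMA table_info` is used as a simple stand-in for a data dictionary lookup. A full DBMS data dictionary typically records far more, such as usage and ownership.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language (DDL): declare structure and data elements.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        hired  TEXT
    )
""")

# Data manipulation language (DML): add, change, and extract data.
conn.execute("INSERT INTO employee VALUES (1, 'Ada', '2020-01-15')")
conn.execute("INSERT INTO employee VALUES (2, 'Grace', '2021-06-01')")
conn.execute("UPDATE employee SET name = 'Grace H.' WHERE emp_id = 2")
names = [row[0] for row in
         conn.execute("SELECT name FROM employee ORDER BY emp_id")]

# Data dictionary (simplified): ask the DBMS to describe the table.
# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2])
           for row in conn.execute("PRAGMA table_info(employee)")]

print(names)
print(columns)
```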


The Database Approach to Data Management

• Distributed database:
  – A database that is stored in more than one physical location
  – Reduces the vulnerability of a single, massive central site
  – Increases service and responsiveness to local users
  – Can often run on smaller, less expensive computers
  – Depends on high-quality telecommunications lines


Multidimensional data analysis

• Also called Online Analytical Processing (OLAP)
• Supports manipulation and analysis of large volumes of data from multiple dimensions/perspectives
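The idea of viewing the same data from several dimensions can be sketched in a few lines of plain Python. The `sales` records and the `roll_up` helper are hypothetical and invented for this illustration. Real OLAP tools work on far larger volumes and typically precompute such aggregates.

```python
from collections import defaultdict

# Hypothetical fact records: (product, region, quarter, units sold).
sales = [
    ("Nuts",  "East", "Q1", 50),
    ("Nuts",  "West", "Q1", 30),
    ("Bolts", "East", "Q1", 20),
    ("Nuts",  "East", "Q2", 70),
    ("Bolts", "West", "Q2", 40),
]

# Position of each dimension inside a fact record.
DIMENSIONS = {"product": 0, "region": 1, "quarter": 2}


def roll_up(facts, *dims):
    """Total units sold, grouped by the named dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)


# The same facts seen from different perspectives.
by_product = roll_up(sales, "product")
by_region = roll_up(sales, "region")
by_product_region = roll_up(sales, "product", "region")

print(by_product)
print(by_region)
print(by_product_region)
```

Each call answers a different question (sales by product, by region, or by product within region) without changing the underlying records.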


Example of OLAP


Data Warehouse

• A massive database that stores current and historical data
• Data are standardized into a common data model
• Consolidated across entire enterprise for management analysis and decision making


Example of a data warehouse


Data mining

• Tools for analyzing large pools of data
• Find hidden patterns and infer rules to predict trends


Managing Data Resources

• Establishing an information policy
  – Specifies the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information
• Data administration is responsible for the specific policies and procedures through which data are managed
• Data governance
• Database administration


Managing Data Resources

• Ensuring Data Quality
• Data quality audit
  – Structured survey of the accuracy and completeness of data in an information system
• Data cleansing
  – Consists of activities for detecting and correcting errors in the data held in an information system
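A toy version of an audit followed by cleansing might look like the sketch below. The `records` list, the two checks (missing e-mail, duplicate id), and the cleansing rules are all assumptions chosen for illustration. A real audit would survey many more aspects of accuracy and completeness.

```python
# Hypothetical customer records containing typical quality problems.
records = [
    {"id": 1, "email": "ada@example.com",   "state": "ny"},
    {"id": 2, "email": "",                  "state": "CA"},
    {"id": 3, "email": "grace@example.com", "state": " tx "},
    {"id": 3, "email": "grace@example.com", "state": "TX"},  # repeated id
]


def audit(rows):
    """Count rows, missing e-mail addresses, and repeated ids."""
    seen, duplicates, missing = set(), 0, 0
    for row in rows:
        if not row["email"]:
            missing += 1
        if row["id"] in seen:
            duplicates += 1
        seen.add(row["id"])
    return {"rows": len(rows), "missing_email": missing,
            "duplicate_id": duplicates}


def cleanse(rows):
    """Normalize state codes and drop later rows that repeat an id."""
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        cleaned.append({**row, "state": row["state"].strip().upper()})
    return cleaned


report_before = audit(records)
cleaned = cleanse(records)
report_after = audit(cleaned)

print(report_before)
print(report_after)
```

Note that cleansing here fixes formatting and duplicates but cannot invent the missing e-mail address; the second audit still reports it, which is why audits and cleansing are usually repeated rather than run once.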