
Overview

Thanks to Dr. Raj and Dr. Liu for sharing course materials; also S. Miner from Gordon College

Database Concepts

Course Overview

• Broad introduction to database management systems (DBMS) and the design, implementation, and applications of databases.

• Topics:
– An overview of DBMS architectures
– Concepts and implementations of the relational model
– SQL
– Database design and modeling techniques
– As time allows, others, such as transaction management, indexing…

Who are you?

• Your name, program/year, hometown
• Why are you taking this course?
• Anything else?

Data Management

• What are some data management problems?
• Previous solutions:
– Pen and paper
– Rolodex cards

Image Credit: ConcordSupplies.com

Data Management

• Why not a simple text file?
• An Excel spreadsheet?
• How about a more sophisticated structure:
– Linked list
– Binary search tree (BST)
– Hash table
– Indexed file

• What are the problems with each of the structures mentioned above?

Problems

• Integrity constraints buried in application logic – hard to add to or change
• Atomicity problems – what happens when the system crashes during an important operation?
• Concurrency issues – when multiple users work with the same data at the same time
• Security issues – how to give someone access to some, but not all, of the data
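The atomicity problem can be made concrete with a minimal sketch using Python's standard `sqlite3` module. The `account` table, the balances, the `transfer` helper, and the simulated crash (a raised `RuntimeError`) are assumptions made for illustration; they are not part of the course materials.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (id integer primary key, balance integer)")
conn.executemany("insert into account values (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount, crash=False):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "update account set balance = balance - ? where id = ?", (amount, src))
            if crash:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute(
                "update account set balance = balance + ? where id = ?", (amount, dst))
    except RuntimeError:
        pass  # the partial debit has already been rolled back

transfer(conn, 1, 2, 30, crash=True)
balances = dict(conn.execute("select id, balance from account"))
print(balances)
```

Because both updates sit inside one transaction, the interrupted transfer leaves neither account changed; without that grouping, the debit would survive while the credit is lost.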

Other Issues

• Text file gets big – searches become slow
• Customer information changes
• Duplication
• Relationships between data elements
• “Must-exist” data and “must-not-access” data
• Ad hoc analysis and retrieval

Why can’t we just use files?

• What do you think?

File Systems for Data: Drawbacks

• Data redundancy and inconsistency
– Multiple file formats, information duplication
• Difficulty in accessing data
– Must write a new program to carry out each new task
• Integrity issues
– Integrity constraints embedded in programs
• Data updates
– May leave database in inconsistent state

File Systems for Data: Drawbacks

• Concurrent access by multiple users
– Needed for performance, but can lead to inconsistency
• Security problems
– Who gets to see the data and modify it
• Update inconsistencies if multiple copies
• Data may be in > 1 file, in > 1 location
• Different implementations and structures; supporting them requires people who understand all of them
• …
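A small sketch of the redundancy and update-inconsistency drawbacks, assuming two hypothetical CSV files (`billing.csv` and `shipping.csv`) that each keep their own copy of a customer's address. The file names, layout, customer, and helper functions are invented for illustration.

```python
import csv
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
billing = workdir / "billing.csv"    # hypothetical file owned by a billing program
shipping = workdir / "shipping.csv"  # hypothetical file owned by a shipping program

# The same customer address is stored redundantly in both files.
for path in (billing, shipping):
    with path.open("w", newline="") as f:
        csv.writer(f).writerows([["customer", "address"], ["Ann", "13 Elm"]])

def update_address(path, customer, new_address):
    """Rewrite one file with the customer's new address."""
    with path.open(newline="") as f:
        rows = list(csv.reader(f))
    for row in rows[1:]:  # skip the header row
        if row[0] == customer:
            row[1] = new_address
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)

def address_in(path, customer):
    """Look up the customer's address in one file."""
    with path.open(newline="") as f:
        return next(row[1] for row in csv.reader(f) if row[0] == customer)

# Only the billing program learns about the move; nothing forces the other copy to change.
update_address(billing, "Ann", "7 Oak")
inconsistent = address_in(billing, "Ann") != address_in(shipping, "Ann")
print(address_in(billing, "Ann"), address_in(shipping, "Ann"), inconsistent)
```

Each program would need to know about every other copy to keep them in step, which is the kind of coordination a DBMS is meant to take over.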

Database Management System (DBMS)

• Decouples applications from the files on the file system
• Programs go through the database to access data stored in the underlying files
• Manages very large amounts of data
– Collection of interrelated data about a particular enterprise
• Access to very large amounts of data
– Efficient, concurrent, secure, and atomic access (for example, two people withdrawing money from the same account at different ATMs)
• Usable, convenient, and efficient environment
• Provides a set of programs for data access
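The ATM scenario can be sketched as follows. This is a simplified illustration, not real concurrency: the two withdrawals run one after the other on a single connection, and the `account` table, the amounts, and the `withdraw` helper are assumptions. The point is that the balance check and the debit are a single SQL statement, so the DBMS never lets a second withdrawal see a stale balance between the check and the update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (id integer primary key, balance integer)")
conn.execute("insert into account values (1, 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Debit the account only if funds suffice; check and update are one statement."""
    with conn:
        cur = conn.execute(
            "update account set balance = balance - ? where id = ? and balance >= ?",
            (amount, account_id, amount),
        )
    return cur.rowcount == 1  # True if the withdrawal was applied

atm_a = withdraw(conn, 1, 80)  # first ATM
atm_b = withdraw(conn, 1, 80)  # second ATM, same account
final_balance = conn.execute("select balance from account where id = 1").fetchone()[0]
print(atm_a, atm_b, final_balance)
```

A file-based program that reads the balance, checks it in application code, and then writes the new value would allow both withdrawals if the two reads happened before either write.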

DBMS Applications

• Banking
– All transactions
• Airlines
– Reservations, schedules
• Universities
– Registration, grades
• Payroll Management
– Manage employees, pay, taxes, and so on

• …

DBMS Time Line

• Pre-1960
– Transition from punched card and tape
• 1960s
– From file management to databases
– Hierarchical Data Model
– Multi-user access with network
• 1970s
– Codd (IBM) Relational Model
– Chen introduced Entity Relationship Model
– Query languages developed (SQL)

DBMS Time Line (continued)

• 1980s
– Client/Server DBs: Oracle, DB2
– PC databases: dBase, Paradox, and more
– SQL standard for definition and manipulation
• 1990s
– Web-based information delivery
– Trends: Expert/object/distributed/XML DBMS
• 2000s
– Add scale to 1990s
– Data integration, mining, and privacy

• Current Problem: Big Data

DBMS Users (Loosely Based on Interaction)

• Application programmers
– Interact with system through Data Manipulation Language (DML) calls
• Naïve users
– Invoke one of the previously written “permanent” application programs
• Sophisticated users
– Form requests in a database query language
• Specialized users
– Write specialized database applications that do not fit into traditional data processing framework

Data Dictionary

• In addition to storing data, the DBMS also stores metadata – data about the data – in a data dictionary
• A standard name for each data item that applications use to access it
• Where the data item is stored (which file and where in the file)
• Security constraints – rules about who is allowed to access which data can be applied at the data item level; these are enforced by the DBMS
• Integrity constraints – which values are valid for data items; enforced by the DBMS
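As a rough illustration of a data dictionary, the sketch below uses SQLite through Python's `sqlite3` module: SQLite keeps table definitions in its `sqlite_master` catalog and reports column metadata through `pragma table_info`. The `check (age >= 0)` rule and the sample row are assumptions added to show an integrity constraint being enforced by the DBMS rather than by application code; other DBMSs expose their dictionaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table dog (
        tag  integer primary key,
        name char(10) not null,
        age  integer check (age >= 0)
    )
""")

# Metadata: SQLite records each table's definition in its sqlite_master catalog.
catalog = conn.execute("select name from sqlite_master where type = 'table'").fetchall()
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("pragma table_info(dog)")]
print(catalog, columns)

# Integrity constraint: a negative age is rejected by the DBMS itself.
try:
    conn.execute("insert into dog values (12, 'Fido', -1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```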

Abstraction

• Physical – where the data is actually stored (files)
• Logical (conceptual) – describes the data and the relationships among the data
• View – targeted end-user interfaces to the database that highlight some data, hide others, and may include virtual fields computed from the data
• Data independence – changes at one abstraction layer should not impact other layers
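The view level can be sketched with a SQL view, run here through Python's `sqlite3` module. The view name `dog_summary`, the computed column `is_senior`, its age threshold, and the sample rows are hypothetical choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dog (tag integer, name char(10), age integer, mom integer)")
conn.executemany(
    "insert into dog values (?, ?, ?, ?)",
    [(12, "Fido", 2, 9), (9, "Fifi", 8, 7), (7, "Fiona", 12, None), (13, "Frisky", 1, 9)],
)

# The view hides tag and mom, and adds a virtual column computed from age.
conn.execute("""
    create view dog_summary as
    select name, age, age >= 8 as is_senior
    from dog
""")
summary = conn.execute("select name, is_senior from dog_summary order by name").fetchall()
print(summary)
```

Users of `dog_summary` never see how `dog` is stored, so the underlying table could gain columns without affecting them, which is a small instance of data independence.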

Database Administrator (DBA)

• Coordinates all enterprise DBMS activities
– Understands data resources and needs
• Database administrator's duties
– Schema and physical definition, organization, and modification
– Storage structure and access method
– Specifying integrity constraints and security
– Granting access
– Monitoring performance and maintenance
• DBA role
– Varies from organization to organization

High-level View of DBMS Components

• Storage management
– Data organization on secondary storage for processing queries efficiently
• Query processing component
– Efficient query execution, algorithms for implementing relational operators
• Transaction management
– Concurrency control; failure and recovery
• Application interface
– APIs, query tools, administration tools

Application Architectures

• Two-tier architecture
– Client programs communicate with DBMS server
• Three-tier architecture
– Client programs communicate with middleware server
– Middleware server communicates with DBMS server

Relational DBMS

• Relational database
– Consists of a set of relations (tables)
• Languages
– Programming languages
– DBMS languages: Data Definition Language (DDL) and Data Manipulation Language (DML)
• Database application programming

A Sample Relational Database

Dog

tag  name    age  mom
12   Fido    2    09
09   Fifi    8    07
07   Fiona   12   null
13   Frisky  1    09

DogHouse

address  size   color
13 Elm   big    red
4 Ash    small  white
7 Oak    big    green

HouseToDog

address  tag
13 Elm   12
4 Ash    09
7 Oak    07
null     13
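A minimal sketch of the sample database, loaded into an in-memory SQLite database through Python's `sqlite3` module and queried with a join. The column types for `DogHouse` and `HouseToDog` are assumptions, and tags are stored as plain integers (so 09 becomes 9); the join query itself is an illustration, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Dog (tag integer, name char(10), age integer, mom integer);
    create table DogHouse (address char(10), size char(5), color char(5));
    create table HouseToDog (address char(10), tag integer);

    insert into Dog values (12, 'Fido', 2, 9), (9, 'Fifi', 8, 7),
                           (7, 'Fiona', 12, null), (13, 'Frisky', 1, 9);
    insert into DogHouse values ('13 Elm', 'big', 'red'), ('4 Ash', 'small', 'white'),
                                ('7 Oak', 'big', 'green');
    insert into HouseToDog values ('13 Elm', 12), ('4 Ash', 9), ('7 Oak', 7), (null, 13);
""")

# Follow tag -> address -> house to find the color of each dog's house.
rows = conn.execute("""
    select d.name, h.color
    from Dog d
    join HouseToDog hd on hd.tag = d.tag
    left join DogHouse h on h.address = hd.address
    order by d.tag
""").fetchall()
print(rows)
```

Frisky's address is null, so the left join reports no house color for that dog instead of dropping the row.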

Data Definition Language (DDL)

• Notation to specify database schema
– Includes storage space, usage, indexes, keys

create table dog (
    tag integer,
    name char(10),
    age integer,
    mom integer
)
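The slide's `create table` statement declares no keys or indexes. The sketch below shows one way they might be added, run through Python's `sqlite3` module. The primary key on `tag`, the foreign key on `mom`, the index name `dog_name_idx`, the `try_insert` helper, and the dog named Rex are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    create table dog (
        tag  integer primary key,            -- key: each tag identifies one dog
        name char(10),
        age  integer,
        mom  integer references dog (tag)    -- key: mom must be an existing dog
    );
    create index dog_name_idx on dog (name); -- index to speed up lookups by name
""")
conn.execute("insert into dog values (7, 'Fiona', 12, null)")
conn.execute("insert into dog values (9, 'Fifi', 8, 7)")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a key constraint rejects it."""
    try:
        conn.execute("insert into dog values (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_tag = try_insert((9, "Rex", 3, 7))   # tag 9 already exists
unknown_mom = try_insert((12, "Fido", 2, 99))  # no dog has tag 99
valid_row = try_insert((13, "Frisky", 1, 9))
print(duplicate_tag, unknown_mom, valid_row)
```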

Data Manipulation Language (DML)

• Language for accessing and manipulating data
– DMLs also known as query languages
– SQL, the most widely used query language
• Two classes of languages
– Procedural or low-level: user specifies what data is required and how to get those data
– Nonprocedural or high-level: user specifies what data is required without specifying how to get the data
• SQL is predominantly non-procedural
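The difference between the two classes can be sketched by answering the same question twice in Python with `sqlite3`: once with an explicit loop that spells out how to find the rows, and once with a SQL query that only states what is wanted. The sample rows and the age threshold are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dog (tag integer, name char(10), age integer, mom integer)")
conn.executemany(
    "insert into dog values (?, ?, ?, ?)",
    [(12, "Fido", 2, 9), (9, "Fifi", 8, 7), (7, "Fiona", 12, None), (13, "Frisky", 1, 9)],
)

# Procedural style: the program spells out how to find the data (scan, test, collect, sort).
procedural = []
for tag, name, age, mom in conn.execute("select * from dog"):
    if age > 3:
        procedural.append(name)
procedural.sort()

# Nonprocedural style: the query states what is wanted; the DBMS chooses how to get it.
declarative = [row[0] for row in conn.execute(
    "select name from dog where age > 3 order by name")]

print(procedural, declarative)
```

Both produce the same names, but only the second leaves the DBMS free to pick an access strategy, such as using an index if one exists.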

Data Manipulation is CRUD

• Create

• Retrieve

• Update

• Delete

insert into dog (tag, name, age) values (12, 'Lady', 5)

select name from dog where age > 3

update dog set age = age + 1 where tag = 12

delete from dog where age > 1
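The four statements can be tried in order against an in-memory SQLite database using Python's `sqlite3` module. The table definition follows the DDL slide; the pre-existing Frisky row is an assumption added so that the final delete leaves something behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table dog (tag integer, name char(10), age integer, mom integer)")
conn.execute("insert into dog values (13, 'Frisky', 1, 9)")  # assumed pre-existing row

# Create
conn.execute("insert into dog (tag, name, age) values (12, 'Lady', 5)")
# Retrieve
older = conn.execute("select name from dog where age > 3").fetchall()
# Update
conn.execute("update dog set age = age + 1 where tag = 12")
lady_age = conn.execute("select age from dog where tag = 12").fetchone()[0]
# Delete
conn.execute("delete from dog where age > 1")
remaining = conn.execute("select name from dog").fetchall()

print(older, lady_age, remaining)
```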

DBMS Programming

• Application programs access DBMS via
– Language extensions to allow embedded SQL
– Application program interfaces (APIs), such as ODBC/JDBC, that allow SQL queries to be sent to the DBMS
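ODBC and JDBC are not shown here; as a stand-in, the sketch below uses Python's `sqlite3` module, which plays a similar role of sending SQL text to the DBMS through an API. The function `dogs_older_than` and the sample rows are hypothetical.

```python
import sqlite3

def dogs_older_than(conn, min_age):
    """Send a parameterized SQL query through the API and return matching names."""
    cur = conn.execute(
        "select name from dog where age > ? order by name",  # ? is bound by the driver
        (min_age,),
    )
    return [name for (name,) in cur]

conn = sqlite3.connect(":memory:")
conn.execute("create table dog (tag integer, name char(10), age integer, mom integer)")
conn.executemany(
    "insert into dog values (?, ?, ?, ?)",
    [(12, "Fido", 2, 9), (9, "Fifi", 8, 7), (7, "Fiona", 12, None)],
)
names = dogs_older_than(conn, 3)
print(names)
```

Passing `min_age` as a bound parameter, rather than pasting it into the SQL string, keeps the value as data and avoids SQL injection.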

Drawbacks

• The additional DBMS software layer comes with some costs
• Each application incurs overhead by going through the database to access its data
• Applications cannot optimize access to data stored in the files on their own
• Designers and programmers need more (albeit standardized) knowledge of how a DBMS works
• Additional layer can lead to increased complexity (at least in the short term)
• Database and file systems are not “either / or” solutions, more like “both / and”

Recap - Concerns

• Data redundancy
• Data inconsistency
• Security constraints
• Integrity constraints
• Concurrency constraints

Problems

• Data redundancy – wasted space
• Update issues – every copy of the data needs to be modified
• Data inconsistency – sometimes not every copy is modified
• Data access issues (getting to just the right data)
• “There’s no program for that.”
• Data isolation – merging data from disparate sources together