Database Management Systems CSE 590DB Introduction March 30, 1998.
Staff
Instructor: Alon Levy
  Sieg, Room 310, [email protected]
  Office hours: by appointment
TA: Rachel Pottinger
  Sieg 223, [email protected]
  Office hours: TBA
Mailing list: cse590db@cs
Web page: http://www.cs.washington.edu/education/courses/590db/98sp/
Purpose and Format
Purpose: Foundations of database management systems. Introduction to current research issues in databases.
Format:
• Lectures introducing the main topics
• Student presentations of selected research papers
• Projects (for 3-credit takers)
Textbooks (none required)
• Database Management Systems (Ramakrishnan)
• Foundations of Databases (Abiteboul, Hull & Vianu)
• Fundamentals of Database Systems (Elmasri and Navathe)
• Database Systems (Silberschatz, Korth and Sudarshan)
• Data and Knowledge based Systems (volumes I, II) (Ullman)
• Readings in Database Systems (Stonebraker)
• Proceedings of SIGMOD, VLDB, PODS conferences.
Real Prerequisites
• Operating systems
• Data structures and algorithms
• Distributed systems
• Complexity theory
• Mathematical Logic
• Knowledge Representation
• User interface design
• Programming languages
• Artificial Intelligence (Search)
• Greek, Hebrew, French
Why Use a DBMS?
All programs manipulate data, so why use a database?
• Large amounts of data (gigabytes)
• Data is very structured
• Persistent data
• Valuable data
• Performance requirements
• Concurrent access to the data
• Restricted access to data
Functionality of a DBMS
• Persistent storage management
• Multiple abstraction levels of the data (in particular, provides a logical view).
• High level query and data manipulation language
• Efficient query processing
• Transaction management
• Resiliency: recovery from crashes.
• Interface with programming languages
Persistent Storage
Becomes a hard problem because of the interaction with the other levels of the DBMS:
• What are we storing?
• Efficient indexing
• Special issues due to resiliency requirements
• Exploit “semantic” knowledge
Issue: interaction with the operating system. Should we rely on the OS?
Levels of Abstraction
External Schema 1    External Schema 2
Conceptual Schema
Physical Schema
Disk
• Conceptual schema: tables and their attributes.
• Physical schema: files, indexes, hash tables.
• External schema: views of the different applications, classes of users.
System catalog: the component of the database that manages the metadata about the different levels of abstraction.
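As a rough illustration of the three levels, the sketch below uses Python's standard sqlite3 module. The table, index, and view names are hypothetical (loosely modeled on the Enroll relation used in the query example of these slides), and SQLite's sqlite_master table stands in for the system catalog.

```python
import sqlite3

# Hypothetical schema for illustration; names are not from the slides
# except for the Enroll relation.
conn = sqlite3.connect(":memory:")

# Conceptual schema: a table and its attributes.
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")

# Physical schema: an index chosen to speed up lookups by course.
conn.execute("CREATE INDEX enroll_by_course ON Enroll (course)")

# External schema: a view tailored to one class of users.
conn.execute(
    "CREATE VIEW WinterRoster AS "
    "SELECT name, course FROM Enroll WHERE quarter = 'Winter, 1998'"
)

# System catalog: SQLite records metadata about all three in sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
for kind, name in catalog:
    print(kind, name)
```

Dropping or adding the index changes only the physical level; queries against Enroll or WinterRoster keep working unchanged, which is the point of separating the levels.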
The Relational Model
Student    Course    Quarter
Charles    CS 444    Fall, 1997
Dan        CS 142    Winter, 1998
…          …         …
Data is organized into tables with attributes. Rows in the tables are tuples.
The power of simplicity!
Logical Model Issues
• What data model should we use? Relational, object-oriented, object-relational, deductive database model, semi-structured.
• How do we design a good conceptual schema? (normal forms, index selection)
• Are we really providing an abstraction?
• How does this abstraction interact with the programming language? (the impedance mismatch)
Querying a Database
Find all the students who have taken CSE444 in Winter, 1998.
S(tructured) Q(uery) L(anguage):
    select E.name
    from Enroll E
    where E.course = 'CSE444' and E.quarter = 'Winter, 1998'
SQL also provides update facilities.
SQL: an acquired taste (try datalog first).
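To try the query end to end, here is a minimal sketch using Python's standard sqlite3 module. The Enroll table layout follows the query above; the sample rows are invented purely for illustration and are not course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")

# Illustrative rows only; invented for this sketch.
conn.executemany(
    "INSERT INTO Enroll VALUES (?, ?, ?)",
    [
        ("Charles", "CSE444", "Fall, 1997"),
        ("Dan", "CSE142", "Winter, 1998"),
        ("Eve", "CSE444", "Winter, 1998"),
    ],
)

# The query from the slide, with the constants passed as parameters.
query = """
    SELECT E.name
    FROM Enroll E
    WHERE E.course = ? AND E.quarter = ?
"""
names = [row[0] for row in conn.execute(query, ("CSE444", "Winter, 1998"))]
print(names)  # ['Eve']
```

Note that the query says nothing about how to find the matching rows; that choice is left to the system.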
Issues in Query Languages
• Does it provide the appropriate functionality? SQL books get thicker and thicker.
• Expressive power of a query language.
• Ease of use (query by example).
• Declarativity.
• Provide guidance in writing “good” queries?
Query Optimization
A query is a declarative specification of “what” you want.
A query execution plan is an imperative program to produce the answer.
Query optimization: produce an efficient query execution plan.
Issues: large search space of plans, cost estimation, semantic transformations
Real goal: avoid the bad plans.
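The idea of searching a plan space with a cost estimate can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the table names, the row counts, the join selectivities, and the toy cost model (total size of intermediate results over left-deep join orders). Real optimizers use far richer models and smarter search.

```python
from itertools import permutations

# Hypothetical table sizes (rows) and join selectivities; not from the slides.
CARD = {"Student": 1_000, "Enroll": 10_000, "Course": 100}
SEL = {
    frozenset({"Student", "Enroll"}): 1 / 1_000,
    frozenset({"Enroll", "Course"}): 1 / 100,
}


def plan_cost(order):
    """Toy cost of a left-deep join order: total rows of all intermediate results."""
    joined = {order[0]}
    rows = CARD[order[0]]
    cost = 0
    for table in order[1:]:
        rows *= CARD[table]
        # Apply every join predicate linking the new table to those already joined.
        for other in joined:
            rows *= SEL.get(frozenset({table, other}), 1.0)
        joined.add(table)
        cost += rows
    return cost


# Exhaustively enumerate the (tiny) search space and rank plans by estimated cost.
plans = sorted(permutations(CARD), key=plan_cost)
best, worst = plans[0], plans[-1]
print("best: ", best, plan_cost(best))
print("worst:", worst, plan_cost(worst))
```

Under these made-up numbers, the orders that start by joining Student with Course (which share no join predicate, so the first step is a cross product) cost several times more than the rest. Several orders tie for best, which fits the slide's remark that the practical goal is avoiding the bad plans.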
Transaction Processing and Recovery
For efficient use of resources, we want concurrent access to data.
Systems sometimes crash.
A “real” database guarantees ACID:
• Atomicity: all or nothing of a transaction.
• Consistency: always leave the DB consistent.
• Isolation: every transaction runs as if it’s the only one in the system.
• Durability: if committed, then we really mean it.
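A small sketch of atomicity and consistency, using Python's standard sqlite3 module. The Account table, the owners, the balances, and the transfer helper are all hypothetical and exist only for this example; the CHECK constraint plays the role of the consistency rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account ("
    "owner TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction: all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Account SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Account SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice; rolled back
balances = dict(conn.execute("SELECT owner, balance FROM Account"))
print(ok, failed, balances)
```

In the failed transfer, bob's credit is applied first and then undone when alice's debit violates the constraint, so no partial update is left behind.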
Database Industry
Relational databases are a great success of theoretical ideas.
“Big 3” DBMS companies are among the largest software companies in the world.
IBM (with DB2) and Microsoft (SQL Server, Microsoft Access) are also important players.
$20B industry.
Moving to warehousing, decision support.
Why Use a DBMS?
Data independence and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration.
Concurrent access and recovery from crashes.
DBMS Development
Issues in scaleup:
• Indexing and storing large amounts of data.
• Algorithms: sorting, joins.
“Novel” issues:
• Modeling data (models, constraints, schema design).
• Query languages.
• Optimization: from a declarative specification to an efficient program.
Course (Rough) Outline
Data models and their associated query languages:
• Relational: SQL, datalog, relational algebra
• Object-oriented: OQL
• Object-relational: novel features in SQL3.
• Semi-structured: languages for querying graphs.
Storage (very briefly)
Query optimization: foundations and current limitations.