Advanced Topics: Indexes & Transactions
Indexes
cs3431
Why Indexes
With or without indexes, the query answer should be the same
Indexes are needed for efficiency and fast access of data
SELECT *
FROM Student
WHERE sNumber = 76544357;
Assume we have 10,000 students
Without index, we check all 10,000 students
With index, we can reach that student directly
Direct Access vs. Sequential Access
SELECT *
FROM Student
WHERE sNumber = 76544357;
Without index, we check all 10,000 students
(sequential access)
With index, we can reach that student directly
(direct access)
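The difference between the two access paths can be sketched in a few lines of Python. This is only an illustration under assumptions: the 10,000 rows are made up, and a dictionary stands in for the index (a real DBMS index is a sorted or tree-structured file, not a Python dict).

```python
# Hypothetical in-memory stand-in for the Student table: 10,000 rows.
students = [{"sNumber": 76540000 + i, "sName": f"student{i}"} for i in range(10_000)]

def sequential_lookup(rows, s_number):
    """Sequential access: check rows one by one; returns (row, rows_examined)."""
    for examined, row in enumerate(rows, start=1):
        if row["sNumber"] == s_number:
            return row, examined
    return None, len(rows)

# Assumed "index": maps each sNumber to the position of its row in the table.
index = {row["sNumber"]: pos for pos, row in enumerate(students)}

def direct_lookup(rows, idx, s_number):
    """Direct access: ask the index for the position, then fetch that one row."""
    pos = idx.get(s_number)
    return None if pos is None else rows[pos]

row_seq, examined = sequential_lookup(students, 76544357)
row_dir = direct_lookup(students, index, 76544357)
```

Both lookups return the same row, which is the point of the slide: the index changes how much work is done, not the answer.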
What is an Index
An index is an auxiliary file that makes it more efficient to search for a record in the data file
The index is usually specified on one field of the file, although it could be specified on several fields
The index is stored separately from the base table
Each table may have multiple indexes
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
3 Matt 320FL 2
Student
Can create an index on sNumber
Can create a second index on sName
Example: Index on sNumber
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
100 Matt 320FL 2
10 Matt … …
4 John … ..
3 Dave … ..
Student
Index on sNumber: 1, 2, 3, 4, 10, 100
Index file is always sorted
Index size is much smaller than the table size
Now any query (equality or range) on sNumber can be efficiently answered (Binary search on the index)
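A minimal sketch of how a sorted index answers equality and range queries by binary search. The layout is an assumption made for illustration: the index is a sorted list of (key, row position) pairs, and only the sNumber and sName fields of the slide's table are kept.

```python
from bisect import bisect_left, bisect_right

# Rows of the Student table in storage order (fields elided on the slide are left out).
table = [
    (1, "Dave"), (2, "Greg"), (100, "Matt"),
    (10, "Matt"), (4, "John"), (3, "Dave"),
]

# Index on sNumber: sorted (key, row position) pairs, kept apart from the table.
index = sorted((row[0], pos) for pos, row in enumerate(table))
keys = [key for key, _ in index]

def find_equal(value):
    """Equality query: binary search the index, then follow the position."""
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return table[index[i][1]]
    return None

def find_range(low, high):
    """Range query low <= sNumber <= high, answered from the sorted index."""
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return [table[pos] for _, pos in index[start:end]]
```

A range query only needs two binary searches, because all qualifying keys sit next to each other in the sorted index.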
Example: Index on sName
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
100 Matt 320FL 2
10 Matt … …
4 John … ..
3 Dave … ..
Student
Index on sName: Dave, Dave, Greg, John, Matt, Matt
Duplicate values have duplicate entries in the index
Now any query (equality or range) on sName can be efficiently answered (Binary search on the index)
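With duplicates, an equality query returns a run of adjacent index entries rather than a single one. A small sketch, again assuming the index is a sorted list of (key, row position) pairs:

```python
from bisect import bisect_left, bisect_right

# Rows of the Student table in storage order (sNumber, sName only).
table = [
    (1, "Dave"), (2, "Greg"), (100, "Matt"),
    (10, "Matt"), (4, "John"), (3, "Dave"),
]

# Index on sName: one entry per row, so duplicate names appear more than once.
name_index = sorted((row[1], pos) for pos, row in enumerate(table))
names = [name for name, _ in name_index]

def find_by_name(name):
    """Return every row whose sName equals name."""
    first = bisect_left(names, name)    # first entry with this name
    last = bisect_right(names, name)    # one past the last entry with this name
    return [table[pos] for _, pos in name_index[first:last]]
```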
Creating an Index
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
100 Matt 320FL 2
10 Matt … …
4 John … ..
3 Dave … ..
Student
Create Index <name> On <tablename>(<colNames>);
Create Index sNumberIndex On Student(sNumber);
Create Index sNameIndex On Student(sName);
The DB system knows:
1- how to create the index
2- when and how to use it
Multiple Predicates
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
100 Matt 320FL 2
10 Matt 50WA …
4 John 50WA ..
3 Dave 200LA ..
Student
Create Index addressIndex On Student(address);
SELECT *
FROM Student
WHERE address = '320FL'
AND sName = 'Dave';
1- The best the DBMS can do is to use addressIndex to find the tuples with address = '320FL'
2- From those tuples, check sName = 'Dave'
Multi-Column Indexes
Columns X, Y are frequently queried together (with AND)
Each column has many duplicates
Then, consider creating a multi-column index on X, Y
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
100 Matt 320FL 2
10 Matt 50WA …
4 John 50WA ..
3 Dave 200LA ..
Create Index nameAdd On Student(sName, address);
SELECT *
FROM Student
WHERE address = '320FL'
AND sName = 'Dave';
Directly returns this record only
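The gain from a multi-column index can be seen by counting how many rows each strategy has to fetch. The sketch below is an assumption-laden illustration: dictionaries stand in for both indexes, and only the three fields needed for the query are kept.

```python
from collections import defaultdict

# (sNumber, sName, address) rows; fields elided on the slide are left out.
table = [
    (1, "Dave", "320FL"), (2, "Greg", "320FL"), (100, "Matt", "320FL"),
    (10, "Matt", "50WA"), (4, "John", "50WA"), (3, "Dave", "200LA"),
]

address_index = defaultdict(list)     # address -> row positions
name_add_index = defaultdict(list)    # (sName, address) -> row positions
for pos, (_, name, address) in enumerate(table):
    address_index[address].append(pos)
    name_add_index[(name, address)].append(pos)

def query_with_address_index(name, address):
    """Use the single-column index, then check sName on each fetched row."""
    candidates = address_index.get(address, [])
    matches = [table[p] for p in candidates if table[p][1] == name]
    return matches, len(candidates)

def query_with_multi_column_index(name, address):
    """Use the (sName, address) index: only matching rows are fetched."""
    positions = name_add_index.get((name, address), [])
    return [table[p] for p in positions], len(positions)
```

For sName = 'Dave' and address = '320FL', the single-column index fetches three rows to keep one, while the multi-column index fetches only the matching row.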
Using an Index
DBMS automatically figures out which index to use based on the query
SELECT *
FROM Student
WHERE sNumber = 76544357;
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
100 Matt 320FL 2
10 Matt … …
4 John … ..
3 Dave … ..
Student
Create Index sNumberIndex On Student(sNumber);
Create Index sNameIndex On Student(sName);
Automatically uses sNumberIndex
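One way to watch a DBMS pick an index is to ask it for its query plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in DBMS (an assumption; the course system may differ) and loads only the three fully specified rows from the slide. The exact plan wording depends on the SQLite version, so the code only collects the plan text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (sNumber INTEGER, sName TEXT, address TEXT, pNum INTEGER)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(1, "Dave", "320FL", 1), (2, "Greg", "320FL", 1), (100, "Matt", "320FL", 2)],
)
conn.execute("CREATE INDEX sNumberIndex ON Student(sNumber)")
conn.execute("CREATE INDEX sNameIndex ON Student(sName)")

def plan_for(sql):
    """Return the textual steps of SQLite's plan for a query."""
    # The fourth column of each EXPLAIN QUERY PLAN row is the description.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_by_number = plan_for("SELECT * FROM Student WHERE sNumber = 2")
plan_by_name = plan_for("SELECT * FROM Student WHERE sName = 'Greg'")
print(plan_by_number)
print(plan_by_name)
```

The query itself never names an index; the plan for the sNumber predicate mentions sNumberIndex and the plan for the sName predicate mentions sNameIndex.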
How Do Indexes Work?
Types of Indexes
Primary vs. Secondary
Single-Level vs. Multi-Level (Tree Structure)
Clustered vs. Non-Clustered
Primary vs. Secondary Indexes
An index on the primary key of a relation is called a primary index (only one per table)
An index on any other column is called a secondary index (can be many)
In a primary index, all values are unique
In secondary indexes, values may have duplicates
SSN sNumber sName address pNum
11111 1 Dave 320FL 1
22222 2 Greg 320FL 1
33333 100 Matt 320FL 2
44444 10 Matt … …
55555 4 John … ..
66666 3 Dave … ..
Student
Index on SSN is a Primary Index
Index on sNumber is a Secondary Index
Index on sName is a Secondary Index
Single-Level Indexes
Index is a one-level sorted list
Given a value v to query:
1- Perform a binary search in the index to find it (fast)
2- Follow the link to reach the actual record
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
100 Matt 320FL 2
10 Matt … …
4 John … ..
3 Dave … ..
Student
Index on sNumber: 1, 2, 3, 4, 10, 100
Multi-Level Index
Build index on top of the index (can go multiple levels)
When searching for value v: find the largest entry ≤ v, and follow its pointer
sNumber sName address pNum
1 Dave 320FL 1
2 Greg 320FL 1
100 Matt 320FL 2
10 Matt … …
4 John … ..
3 Dave … ..
Student
Index on sNumber (two levels)
Upper level: 1, 4
Lower level: 1, 2, 3, 4, 10, 100
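The search rule (find the largest entry ≤ v, follow its pointer) can be sketched with two levels. The block size of three keys is an assumption chosen so that the upper level comes out as 1 and 4, matching the slide.

```python
from bisect import bisect_right

# Lower level: the sorted sNumber keys, split into blocks of three (assumed size).
blocks = [[1, 2, 3], [4, 10, 100]]
# Upper level: the first key of each block, i.e. 1 and 4.
upper_level = [block[0] for block in blocks]

def search(v):
    """Return (block number, slot) of v in the lower level, or None if absent."""
    # Upper level: find the largest entry <= v and follow its pointer.
    b = bisect_right(upper_level, v) - 1
    if b < 0:
        return None                   # v is smaller than every indexed key
    block = blocks[b]
    # Lower level: the largest entry <= v must be v itself for a hit.
    s = bisect_right(block, v) - 1
    if s >= 0 and block[s] == v:
        return (b, s)
    return None
```

Each level narrows the search to one block of the level below, so only one small block per level is examined.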
Clustered vs. Non-Clustered
Assume there is an index X on column C
If the records in the table are stored sorted based on C, then X is a clustered index
Otherwise, X is a non-clustered index
A primary index is typically a clustered index
SSN sNumber sName address
11111 1 Dave 320FL
22222 2 Greg 320FL
33333 100 Matt 320FL
44444 10 Matt …
55555 4 John …
66666 3 Dave …
Student (records stored sorted by SSN)
Index on SSN: 11111, 22222, 33333, 44444, 55555, 66666 (Clustered index)
Index on sNumber: 1, 2, 3, 4, 10, 100 (Non-Clustered index)
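The definition can be checked mechanically: an index on a column is clustered exactly when the stored records are already in that column's order. A small sketch using the SSN and sNumber values from the table above; the helper names are hypothetical.

```python
# Records in their physical storage order (SSN and sNumber only).
student = [
    {"SSN": 11111, "sNumber": 1}, {"SSN": 22222, "sNumber": 2},
    {"SSN": 33333, "sNumber": 100}, {"SSN": 44444, "sNumber": 10},
    {"SSN": 55555, "sNumber": 4}, {"SSN": 66666, "sNumber": 3},
]

def is_clustered(rows, column):
    """True if the rows are stored in nondecreasing order of the column."""
    values = [row[column] for row in rows]
    return all(a <= b for a, b in zip(values, values[1:]))

def positions_in_key_order(rows, column, low, high):
    """Storage positions visited by a range scan that follows the index order."""
    hits = [(row[column], pos) for pos, row in enumerate(rows)
            if low <= row[column] <= high]
    return [pos for _, pos in sorted(hits)]
```

A range scan through the clustered index visits neighbouring positions, while the same scan through the non-clustered index jumps around the file.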
Index Maintenance
Indexes are used in queries
But they need to be maintained when data change (insert, update, delete)
DBMS automatically handles the index maintenance
1- When inserting new records, the indexed field values are added to the index
2- When deleting records, their values are deleted from the index
3- When updating an indexed value, the old value is deleted from the index and the new value is inserted
There is a cost for maintaining an index, but its benefit usually outweighs that cost if the index is used often
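The three maintenance rules can be sketched with a toy table that keeps one sorted index in step with its rows. The class and its row ids are hypothetical; a real DBMS does this internally on its own index structures.

```python
from bisect import bisect_left, insort

class IndexedTable:
    """Toy table with one sorted index; rows are dicts stored by row id."""

    def __init__(self, column):
        self.column = column
        self.rows = {}       # row id -> row
        self.index = []      # sorted (key, row id) pairs
        self.next_id = 0

    def insert(self, row):
        rid = self.next_id
        self.next_id += 1
        self.rows[rid] = dict(row)
        insort(self.index, (row[self.column], rid))    # add the key to the index
        return rid

    def delete(self, rid):
        row = self.rows.pop(rid)
        self._remove_entry(row[self.column], rid)      # drop the key from the index

    def update(self, rid, new_value):
        old_value = self.rows[rid][self.column]
        self._remove_entry(old_value, rid)             # delete the old value
        insort(self.index, (new_value, rid))           # insert the new value
        self.rows[rid][self.column] = new_value

    def _remove_entry(self, key, rid):
        i = bisect_left(self.index, (key, rid))
        assert self.index[i] == (key, rid), "index out of step with the table"
        del self.index[i]

t = IndexedTable("sNumber")
a = t.insert({"sNumber": 10, "sName": "Matt"})
b = t.insert({"sNumber": 4, "sName": "John"})
c = t.insert({"sNumber": 100, "sName": "Matt"})
t.update(a, 3)    # sNumber 10 becomes 3
t.delete(c)       # the row with sNumber 100 is removed
```

Every insert, delete, and update does extra work on the index, which is the maintenance cost the slide refers to.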
Summary of Indexes
Indexes are auxiliary structures for efficient searching and querying
Query answer is the same with or without index
What to index depends on which columns are frequently queried (in Where clause)
Main operations
Create Index <name> On <tablename>(<colNames>);
Drop Index <name>;
Transactions
What is a Transaction
A set of operations on a database that are treated as one unit
Execute All or None
Transactions have semantics at the application level
1- Want to reserve two seats in a flight
2- Transfer money from account A to account B
…
What if two users are reserving the same flight seat at the same time?
Transactions solve these problems
Transactions
By default, each SQL statement is a transaction
Can change the default behavior
SQL> Start transaction;
SQL> Insert …
SQL> Update …
SQL> Delete …
SQL> Select …
SQL> Commit | Rollback;
Commit: end the transaction successfully
Rollback: cancel the transaction
All of these statements are now one unit (either all succeed or all fail)
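A minimal sketch of the all-or-none behavior, using SQLite through Python's sqlite3 module as a stand-in DBMS. The account table, its CHECK constraint, and the transfer helper are assumptions made for the example; note that SQLite spells the opening statement BEGIN rather than Start transaction.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN, so the
# transaction boundaries below are exactly the ones written out.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")

def transfer(src, dst, amount):
    """Move amount from src to dst as one unit; returns True if committed."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")    # also undoes the first update
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

ok_small = transfer("A", "B", 30)     # both updates succeed and are committed
ok_large = transfer("A", "B", 500)    # second update violates the CHECK
```

In the failed transfer the credit to B had already been applied inside the transaction; the rollback removes it, so no money appears from nowhere.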
Transaction Properties
Four main properties
Atomicity: A transaction is one atomic unit
Consistency: A transaction ensures the DB is consistent
Isolation: A transaction executes as if no other transaction were executing simultaneously
Durability: Changes made by a transaction must persist
ACID: Atomicity, Consistency, Isolation, Durability
ACID properties are enforced by the DBMS
Consistency Issue
Many users may update the data at the same time
How to ensure the result is consistent?
Two updates run at the same time:
Update T Set x = x + 2;
Update T Set x = x * 3;
x (before)   x (after)
2            12
3            15
4            14
10           32
100          302
Wrong, Inconsistent data (the first two rows got + 2 then * 3, the last three rows got * 3 then + 2)
What is the right answer?
Serial Order of Transactions
Given N concurrent transactions T1, T2, …, TN
Serial order is any permutation of these transactions (N! possible orders)
T1, T2, T3, …, TN
T2, T3, T1, …, TN
…
DBMS will ensure that the end result from executing the N transactions (concurrently) matches one of the serial order executions
That is called Serializability
As if transactions are executed in serial order
Serializable Execution
Given N concurrent transactions T1, T2, …, TN
DBMS will execute them concurrently (at the same time)
But, the final effect matches one of the serial order executions
Update T Set x = x + 2;
Update T Set x = x * 3;
x (before)   + 2 then * 3   * 3 then + 2
2            12             8
3            15             11
4            18             14
10           36             32
100          306            302
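The two acceptable results can be computed by running the updates in every serial order. This is a sketch that compares final states only; the function names are hypothetical and it says nothing about how a DBMS actually schedules the transactions.

```python
from itertools import permutations

def add_two(x):
    return x + 2        # Update T Set x = x + 2

def times_three(x):
    return x * 3        # Update T Set x = x * 3

def serial_outcomes(initial, transactions):
    """Final table state for every serial order of the transactions."""
    outcomes = set()
    for order in permutations(transactions):
        state = list(initial)
        for txn in order:
            state = [txn(x) for x in state]    # each update touches every row
        outcomes.add(tuple(state))
    return outcomes

initial = [2, 3, 4, 10, 100]
allowed = serial_outcomes(initial, [add_two, times_three])

def matches_a_serial_order(final):
    """True if the final state equals the result of some serial order."""
    return tuple(final) in allowed
```

The mixed result 12, 15, 14, 32, 302 matches neither serial order, which is why it counts as inconsistent.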
Isolation Levels
Read Uncommitted
Read Committed
Repeatable Read
Serializable
From top to bottom, the level gets stronger & avoids more problems
That is the default in DBMS
1- READ UNCOMMITTED
(time runs downward)
Session 1: BEGIN TRANSACTION
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   returns red
Session 1: update cust set color='blue' where id=500;
Session 2: select color from cust where id=500;   returns blue   Dirty read (bad)
Session 1: COMMIT
Session 2: select color from cust where id=500;   returns blue   NonRepeatable read (bad)
Session 2: COMMIT
2- READ COMMITTED
(time runs downward)
Session 1: BEGIN TRANSACTION
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   returns red
Session 1: update cust set color='blue' where id=500;
Session 2: select color from cust where id=500;   returns red   Dirty Read Solved
Session 1: COMMIT
Session 2: select color from cust where id=500;   returns blue   NonRepeatable read (bad)
Session 2: COMMIT
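The difference between the first two levels can be replayed with a toy model. This is an assumption-heavy sketch, not how a DBMS implements isolation: a row holds one committed value plus at most one uncommitted change, and the isolation level only decides which of the two a reader sees.

```python
class Row:
    """Toy row: a committed value plus an optional uncommitted change."""

    def __init__(self, value):
        self.committed = value
        self.pending = None      # written by a transaction that has not committed

    def write(self, value):
        self.pending = value

    def commit(self):
        if self.pending is not None:
            self.committed, self.pending = self.pending, None

    def read(self, level):
        if level == "READ UNCOMMITTED" and self.pending is not None:
            return self.pending      # dirty read: sees uncommitted data
        return self.committed        # otherwise only committed data is visible

def run_timeline(level):
    """Replay the update timeline; returns Session 2's three reads."""
    row = Row("red")
    reads = [row.read(level)]        # Session 2: first select
    row.write("blue")                # Session 1: update, not yet committed
    reads.append(row.read(level))    # Session 2: second select
    row.commit()                     # Session 1: commit
    reads.append(row.read(level))    # Session 2: third select
    return reads
```

Under READ UNCOMMITTED the second read already returns blue; under READ COMMITTED it still returns red, but the third read changes, which is the non-repeatable read that the stronger levels remove.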
2- READ COMMITTED
(time runs downward)
Session 1: BEGIN TRANSACTION
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   returns red
Session 1: delete cust where id=500;
Session 2: select color from cust where id=500;   returns red
Session 1: COMMIT
Session 2: select color from cust where id=500;   returns no row   Phantom (bad)
Session 2: COMMIT
3- REPEATABLE READ
(time runs downward)
Session 1: BEGIN TRANSACTION
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   returns red
Session 1: update cust set color='blue' where id=500;
Session 2: select color from cust where id=500;   returns red
Session 1: COMMIT
Session 2: select color from cust where id=500;   returns red   NonRepeatable Read Solved
Session 2: COMMIT
3- REPEATABLE READ
(time runs downward)
Session 1: BEGIN TRANSACTION
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   returns red
Session 1: delete cust where id=500;
Session 2: select color from cust where id=500;   returns red
Session 1: COMMIT
Session 2: select color from cust where id=500;   returns red   Phantom (For Delete) Solved
Session 2: COMMIT
3- REPEATABLE READ
(time runs downward)
Session 1: BEGIN TRANSACTION
Session 2: BEGIN TRANSACTION
Session 2: select id from cust where color='blue';   returns no row
Session 1: Insert into cust(id, color) values (500, 'blue');
Session 2: select id from cust where color='blue';   returns no row
Session 1: COMMIT
Session 2: select id from cust where color='blue';   returns 500   Phantom Insert (bad)
Session 2: COMMIT
4- SERIALIZABLE
(time runs downward)
Session 1: BEGIN TRANSACTION
Session 2: BEGIN TRANSACTION
Session 2: select id from cust where color='blue';   returns no row
Session 1: Insert into cust(id, color) values (500, 'blue');
Session 2: select id from cust where color='blue';   returns no row
Session 1: COMMIT
Session 2: select id from cust where color='blue';   returns no row   Phantom Solved
Session 2: COMMIT
Summary of Transactions
Unit of work in DBMS
Either executed All or None
Ensures consistency among many concurrent transactions
Ensures persistent data once committed (using recovery techniques)
Main ACID properties: Atomicity, Consistency, Isolation, Durability
END !!!
Final Exam
Dec. 13, 8:15am to 9:30am (75 mins)
Closed book, open sheet
Answer in the same exam sheet
Material Included
ERD
SQL (Select, Insert, Update, Delete)
Views, Triggers, Assertions
Cursors, Stored Procedures/Functions
Material Excluded
Relational Model & Algebra
Normalization Theory
ODBC/JDBC
Indexes and Transactions
Friday's Lecture (Revision + short Quiz)