Advanced Topics: Indexes & Transactions

cs3431

Advanced Topics: Indexes & Transactions

Instructor: Mohamed Eltabakh [email protected]

Indexes

cs3431

Why Indexes With or without indexes, the query answer should be the

same

Indexes are needed for efficiency and fast access of data

cs3431

SELECT *

FROM Student

WHERE sNumber = 76544357;

Assume we have 10,000 students

Without index, we check all 10,000 students

With index, we can reach that student directly

Direct Access vs. Sequential Access

cs3431

SELECT *

FROM Student

WHERE sNumber = 76544357;

Without index, we check all 10,000 students

(sequential access)

With index, we can reach that student directly

(direct access)

What is an Index A index is an auxiliary file that makes it more efficient to search for

a record in the data file

The index is usually specified on one field of the file Although it could be specified on several fields

The index is stored separately from the base table

Each table may have multiple indexes

cs3431

sNumber sName address pNum

1 Dave 320FL 1

2 Greg 320FL 1

3 Matt 320FL 2

Student Can create an index on sNumber

Can create a second index on sName

Example: Index on sNumber


1 Dave 320FL 1

2 Greg 320FL 1

100 Matt 320FL 2

10 Matt … …

4 John … ..

3 Dave … ..

Student

1

2

3

4

10

100

Index on sNumber

Index file is always sorted

Index size is much smaller than the table size

Now any query (equality or range) on sNumber can be efficiently answered (Binary search on the index)

Example: Index on sName


1 Dave 320FL 1

2 Greg 320FL 1

100 Matt 320FL 2

10 Matt … …

4 John … ..

3 Dave … ..

Student

Dave

Dave

Greg

John

Matt

Matt

Index on sName

Duplicates values have duplicate entries in the index

Now any query (equality or range) on sName can be efficiently answered (Binary search on the index)

Creating an Index


1 Dave 320FL 1

2 Greg 320FL 1

100 Matt 320FL 2

10 Matt … …

4 John … ..

3 Dave … ..

Student

Create Index <name> On <tablename>(<colNames>);

Create Index sNumberIndex On Student(sNumber);

Create Index sNameIndex On Student(SName);

DB System knows how to:

1- create the index2- when and how to use it

DB System knows how to:

1- create the index2- when and how to use it

Multiple Predicates

cs3431


1 Dave 320FL 1

2 Greg 320FL 1

100 Matt 320FL 2

10 Matt 50WA …

4 John 50WA ..

3 Dave 200LA ..

Student

Create Index addessIndex On Student(address);

SELECT *

FROM Student

WHERE address = ‘320FL’

AND sName = ‘Dave’;

1- The best the DBMS can do is using addressIndex ‘320FL’2- From those tuples, check sName = ‘Dave’1- The best the DBMS can do is using addressIndex ‘320FL’2- From those tuples, check sName = ‘Dave’

Multi-Column Indexes Columns X, Y are frequently queried together (with AND)

Each column has many duplicates

Then, consider creating a multi-column index on X, Y


1 Dave 320FL 1

2 Greg 320FL 1

100 Matt 320FL 2

10 Matt 50WA …

4 John 50WA ..

3 Dave 200LA ..

Create Index nameAdd On Student(sName, address);

SELECT *

FROM Student

WHERE address = ‘320FL’

AND sName = ‘Dave’;

Directly returns this record only

Using an Index DBMS automatically figures out which index to use based

on the query

cs3431

SELECT *

FROM Student

WHERE sNumber = 76544357; sNumber sName address pNum

1 Dave 320FL 1

2 Greg 320FL 1

100 Matt 320FL 2

10 Matt … …

4 John … ..

3 Dave … ..

Student

Create Index sNumberIndex On Student(sNumber);

Create Index sNameIndex On Student(SName);

Automatically uses SNumberIndex

How Do Indexes Work?

cs3431

Types of Indexes

Primary vs. Secondary

Single-Level vs. Multi-Level (Tree Structure)

Clustered vs. Non-Clustered

cs3431

Primary vs. Secondary Indexes Index on the primary key of a relation is called primary index (only one)

Index on any other column is called secondary index (can be many)

In primary index, all values are unique

In secondary indexes, values may have duplicates

SSN sNumber sName address

pNum

11111 1 Dave 320FL 1

22222 2 Greg 320FL 1

33333 100 Matt 320FL 2

44444 10 Matt … …

55555 4 John … ..

66666 3 Dave … ..

StudentIndex on SSN is a Primary Index

Index on sNumber is a Secondary Index

Index on sName is a Secondary Index

Single-Level Indexes Index is one-level sorted list

Given a value v to query Perform a binary search in the index to find it (Fast) Follow the link to reach the actual record


1 Dave 320FL 1

2 Greg 320FL 1

100 Matt 320FL 2

10 Matt … …

4 John … ..

3 Dave … ..

Student

1

2

3

4

10

100

Index on sNumber

Multi-Level Index Build index on top of the index (can go multiple levels)

When searching for value v: Find the largest entry ≤ v, and follow its pointer

cs3431


1 Dave 320FL 1

2 Greg 320FL 1

100 Matt 320FL 2

10 Matt … …

4 John … ..

3 Dave … ..

Student

1

2

3

4

10

100

Index on sNumber

1

4

1st level

2nd level

Clustered vs. Non-Clustered

Assume there is index X on column CIf the records in the table are stored sorted based on C

X Clustered index Otherwise, X Non-Clustered index

Primary index is a clustered index

SSN sNumber sName address

11111 1 Dave 320FL

22222 2 Greg 320FL

33333 100 Matt 320FL

44444 10 Matt …

55555 4 John …

66666 3 Dave …

Student

11111

22222

33333

44444

55555

66666

1

2

3

4

10

100

Clustered index Non-Clustered index

Index Maintenance Indexes are used in queries

But, need to be maintained when data change Insert, update, delete

DBMS automatically handles the index maintenance When insert new records the indexed field is added to the index When delete records their values are deleted from the index When update an indexed value delete the old value from index &

insert the new value

There is a cost for maintaining an index, however its benefit is usually more (if used a lot)

cs3431

Summary of Indexes Indexes are auxiliary structures for efficient searching and

querying

Query answer is the same with or without index

What to index depends on which columns are frequently queried (in Where clause)

Main operations

cs3431

Create Index <name> On <tablename>(<colNames>);

Drop Index <name>;

Transactions

cs3431

What is a Transaction A set of operations on a database that are treated as one unit

Execute All or None

Transactions have semantics at the application level Want to reserve two seats in a flight Transfer money from account A to account B …

What if two users are reserving the same flight seat at the same time???

Transactions solve these problems

Transactions

By default, each SQL statement is a transaction

Can change the default behavior

SQL > Start transaction;

SQL > Insert ….

SQL > Update …

SQL > Delete ..

SQL > Select …

SQL> Commit | Rollback;

End transaction successfully

Cancel the transaction

All of these statements are now one unit(either all succeed all fail)

Transaction Properties Four main properties

Atomicity – A transaction if one atomic unit Consistency – A transaction ensures DB is consistent Isolation – A transaction is considered as if no other transaction was executing simultaneously Durability – Changes made by a transaction must persist

ACID: Atomicity, Consistency, Isolation, Durability

ACID properties are enforced by the DBMS

cs3431

Consistency Issue

Many users may update the data at the same time How to ensure the result is consistent

Update T

Set x = x + 2;

Update T

Set x = x * 3;

x

2

3

4

10

100

1 2

3

x

12

15

14

32

302

Wrong, Inconsistent data

Wrong, Inconsistent data

What is the right answer???

What is the right answer???

Serial Order of Transactions Given N concurrent transactions T1, T2, …TN

Serial order is any permutation of these transactions (N!) T1, T2, T3, …TN T2, T3, T1, …, TN …

DBMS will ensure that the end-result from executing the N transactions (concurrently) matches one of the serial order execution That is called Serializability As if transactions are executed in serial order

cs3431

Serializable Execution Given N concurrent transactions T1, T2, …TN

DBMS will execute them concurrently (at the same time) But, the final effect matches one of the serial order executions

Update T

Set x = x + 2;

Update T

Set x = x * 3;

x

2

3

4

10

100

x

12

15

18

36

306

x

8

11

14

32

302

Isolation Levels

Read Uncommitted

Read Committed

Repeatable Read

Serializable

cs3431

Gets stronger & avoids problems

That is the default in DBMS

That is the default in DBMS

1- READ UNCOMMITTED

Session 1

-------BEGIN TRANSACTION----- update cust set color='blue' where id=500;

-----------COMMIT------------

Session 2-------BEGIN TRANSACTION----- select color from cust where id=500; color ------red

select color from cust where id=500; color ----- blue

select color from cust where id=500; color ----- blue -----------COMMIT------------

||||V

Time

Dirty read(bad)

NonRepeatable read (bad)

2- READ COMMITTED

Session 1


-----------COMMIT------------


select color from cust where id=500; color ----- red

select color from cust where id=500; color ----- blue -----------COMMIT------------

||||V

Time

NonRepeatable read (bad)

Dirty Read SolvedDirty Read Solved

2- READ COMMITTED

Session 1

-------BEGIN TRANSACTION----- delete cust where id=500;

-----------COMMIT------------



select color from cust where id=500; color ----- -----------COMMIT------------

||||V

Time

Phantom (bad)

3- REPEATABLE READ

Session 1


-----------COMMIT------------



select color from cust where id=500; color ----- red-----------COMMIT------------

||||V

Time

NonRepeatable Read SolvedNonRepeatable Read Solved

3- REPEATABLE READ

Session 1

-------BEGIN TRANSACTION----- delete cust where id=500;

-----------COMMIT------------



select color from cust where id=500; color ----- red-----------COMMIT------------

||||V

Time

Phantom (For Delete) SolvedPhantom (For Delete) Solved

3- REPEATABLE READ

Session 1

-------BEGIN TRANSACTION----- Insert into cust(id, color) values (500, ‘blue’);

-----------COMMIT------------

Session 2-------BEGIN TRANSACTION----- select id from cust where color=‘blue’; id --

select id from cust where color=‘blue’; id--

select id from cust where color=‘blue’; id-- 500-----------COMMIT------------

||||V

Time

Phantom Insert (bad)

4- SERIALIZABLE

Session 1

-------BEGIN TRANSACTION----- Insert into cust(id, color) values (500, ‘blue’);

-----------COMMIT------------

Session 2-------BEGIN TRANSACTION----- select id from cust where color=‘blue’; id --



-----------COMMIT------------

||||V

Time

Phantom SolvedPhantom Solved

Summary of Transactions Unit of work in DBMS

Either executed All or None

Ensures consistency among many concurrent transactions

Ensures persistent data once committed (using recovery techniques)

Main ACID properties Atomicity, Consistency, Isolation, Durability

cs3431

END !!!

cs3431

Final Exam Dec. 13, at 8:15am – 9:30am (75 mins) Closed book, open sheet Answer in the same exam sheet

Material Included ERD SQL (Select, Insert, Update, Delete) Views, Triggers, Assertions Cursors, Stored Procedures/Functions

Material Excluded Relational Model & Algebra Normalization Theory ODBC/JDBC Indexes and Transactions

Friday’s Lecture (Revision + short Quiz)

Friday’s Lecture (Revision + short Quiz)

Advanced Topics: Indexes & Transactions

Documents

Transcript of Advanced Topics: Indexes & Transactions