Optimized Transaction Time Versioning Inside a Database Engine Intern: Feifei Li, Boston University Mentor: David Lomet, MSR

Page 1:

Optimized Transaction Time Versioning Inside a Database Engine

Intern: Feifei Li, Boston University

Mentor: David Lomet, MSR

Page 2:

Transaction Time Support

Provide access to prior states of a database:
- Auditing the database
- Querying the historical data
- Mining the pattern of changes to a database

General approaches:
- Build it outside the database engine
- Build it inside the database engine

Page 3:

Overview of a Versioned Database

[Figure: a data page with a page header and a dynamic slot array (slots 0 and 1). The page holds record A with versions A.1 and A.0, and record B with versions B.2, B.1, and B.0.]

Page 4:

Key Challenges

Timestamping:
- Eager timestamping vs. lazy timestamping
- Each record takes the commit timestamp of its transaction
- Recovery of timestamping information when the system crashes

Indexing both current versions and historical versions simultaneously:
- Storage utilization
- Query efficiency

Page 5:

Talk Outline
- Even “lazier” timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database


Page 7:

Lazy Timestamping

When do we timestamp the records affected by a transaction?
- Maintaining a list of updated records and timestamping them when the transaction commits may lead to additional I/Os
- Instead, timestamp records when they are later accessed by other queries, updates, page reads, and page writes

Where do we get the timestamping information?

Page 8:

Volatile timestamp table (VTT) and Persistent timestamp table (PTT)

[Figure: transaction 23 begins, inserts record A, inserts record B, and commits. Both records are stored with Timestamp = TID.23. The VTT is in main memory with columns (TID, Ttime, Refcnt); the PTT is on disk with columns (TID, Ttime).]

Step                    VTT entry (TID, Ttime, Refcnt)    PTT entry (TID, Ttime)
Transaction 23 begins   (23, NA, 0)                       none
Insert record A         (23, NA, 1)                       none
Insert record B         (23, NA, 2)                       none
Transaction commits     (23, 178432, 2)                   (23, 178432)

1. Ensure that we can recover the timestamping information if the system crashes (the VTT is gone!)

Page 9:

Timestamping the Record

[Figure: continuing the previous example. Transaction 45 begins, inserts record C (Timestamp = TID.45), and commits with Ttime 923121; its VTT entry goes from (45, NA, 0) to (45, NA, 1) to (45, 923121, 1), and (45, 923121) appears in the PTT. An update to record A replaces its Timestamp = TID.23 with 178432, and the Refcnt of VTT entry 23 drops from 2 to 1. An update to record D, stored with Timestamp = TID.88, replaces it with 342234 using the entry (88, 342234).]
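The VTT/PTT bookkeeping on the two slides above can be sketched in a few lines of Python. This is only an illustrative model, not the engine's implementation: the class and method names (`TimestampTables`, `write`, `touch`) are hypothetical, records are plain objects rather than page slots, and the PTT write at commit follows the basic scheme shown so far.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class VttEntry:
    ttime: Optional[int] = None  # commit timestamp; None ("NA") until commit
    refcnt: int = 0              # records that still carry this TID


@dataclass
class Record:
    key: str
    stamp: Union[str, int]       # "TID.<n>" before timestamping, Ttime after


class TimestampTables:
    """Hypothetical model of the volatile (VTT) and persistent (PTT) tables."""

    def __init__(self) -> None:
        self.vtt: Dict[int, VttEntry] = {}  # main memory, lost on a crash
        self.ptt: Dict[int, int] = {}       # on disk, survives a crash

    def begin(self, tid: int) -> None:
        self.vtt[tid] = VttEntry()

    def write(self, tid: int, key: str) -> Record:
        # A new record version carries the TID instead of a timestamp.
        self.vtt[tid].refcnt += 1
        return Record(key, f"TID.{tid}")

    def commit(self, tid: int, ttime: int) -> None:
        self.vtt[tid].ttime = ttime
        self.ptt[tid] = ttime  # persistent copy for crash recovery

    def touch(self, rec: Record) -> Record:
        """Lazily replace a TID with its commit timestamp on a later access."""
        if isinstance(rec.stamp, int):
            return rec                       # already timestamped
        tid = int(rec.stamp.split(".")[1])
        entry = self.vtt.get(tid)
        if entry is not None:
            if entry.ttime is None:
                return rec                   # transaction has not committed yet
            rec.stamp = entry.ttime
            entry.refcnt -= 1
        elif tid in self.ptt:
            rec.stamp = self.ptt[tid]        # VTT entry is gone: fall back to PTT
        return rec
```

Replaying the slide's example, two inserts by transaction 23 leave `refcnt` at 2, and touching record A after the commit sets its stamp to 178432 and drops `refcnt` to 1. If the VTT is emptied to simulate a crash, `touch` still resolves record B through the PTT.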

Page 10:

The Checkpointing Process

[Figure: a log timeline with LSN(P), LSN(U), and EOL (end of log) marked at the (k-2)th, (k-1)th, and kth checkpoints, followed by the (k+1)th checkpoint.]

- Before LSN(P): all the log records have been removed from the log, and it is impossible to recover information earlier than LSN(P).
- Between LSN(P) and LSN(U): the dirty pages have all been flushed to disk prior to the current checkpoint.
- Between LSN(U) and EOL: the current checkpoint may not have finished yet, and log records with LSNs in this range are not guaranteed to be stable yet.

Page 11:

Garbage Collection

[Figure: the log timeline with LSN(P), LSN(U), and EOL at checkpoints k-2, k-1, and k. Entries are shown with their rcz_lsn at different positions on the timeline.]

- RefCnt != 0: PTT: Do Nothing; VTT: Keep
- RefCnt == 0: either PTT: Delete Entry; VTT: Drop Entry, or PTT: Do Nothing; VTT: Keep, depending on where rcz_lsn falls on the timeline

At the end of the current checkpoint (EOL) interval, update: LSN(P) = LSN(U), then LSN(U) = EOL
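The garbage-collection cases above can be written as a small decision function. The sketch below rests on two assumptions that the transcript does not state outright: that rcz_lsn is the LSN at which an entry's RefCnt reached zero, and that an entry becomes deletable once rcz_lsn falls before LSN(U), since pages dirtied before LSN(U) have already been flushed. The function names are hypothetical.

```python
from typing import Optional, Tuple


def gc_action(refcnt: int, rcz_lsn: Optional[int], lsn_u: int) -> str:
    """Decide what to do with one timestamp-table entry at a checkpoint.

    Assumed rule: delete only when no record still references the entry
    AND the point where its RefCnt hit zero lies before LSN(U).
    """
    if refcnt != 0 or rcz_lsn is None:
        return "keep"      # PTT: do nothing, VTT: keep
    if rcz_lsn < lsn_u:
        return "delete"    # PTT: delete entry, VTT: drop entry
    return "keep"          # timestamped pages may not be on disk yet


def advance_checkpoint(lsn_p: int, lsn_u: int, eol: int) -> Tuple[int, int]:
    """End-of-checkpoint update: the new LSN(P) is the old LSN(U), the new LSN(U) is EOL."""
    return lsn_u, eol
```

Returning both new values from `advance_checkpoint` at once avoids the ordering pitfall of assigning LSN(U) = EOL first and then copying that new value into LSN(P).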

Page 12:

Let’s Be Even Lazier

- Don’t write an entry to the PTT when a transaction commits
- Piggyback the timestamping information on the commit log record so that we can still recover it if necessary
- Batch-update entries from the VTT to the PTT at the checkpoint

Why is this better?
- A batch update in one transaction is faster than writing to the PTT on a per-transaction basis
- Many entries have their Refcnt down to zero by the time of the checkpoint, so fewer writes to the PTT are needed
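The second argument above can be illustrated with a toy count of PTT writes under the two policies. This is a deliberately simplified model, not the measured system: transaction i commits at step i, `zero_step[i]` is the hypothetical step at which its Refcnt reaches zero (or `None` if it never does), a checkpoint runs after every `checkpoint_every` commits, and the LSN(P)/LSN(U) stability windows are ignored.

```python
from typing import List, Optional, Tuple


def ptt_writes(zero_step: List[Optional[int]],
               checkpoint_every: int) -> Tuple[int, int]:
    """Return (writes with per-commit PTT inserts, writes with batched inserts)."""
    per_commit = len(zero_step)  # one PTT write for every committed transaction
    batched = 0
    for commit_step, zero in enumerate(zero_step):
        # First checkpoint that runs after this transaction commits.
        next_ckpt = (commit_step // checkpoint_every + 1) * checkpoint_every
        # Only entries that are still referenced at that checkpoint are written.
        if zero is None or zero >= next_ckpt:
            batched += 1
    return per_commit, batched
```

For example, `ptt_writes([1, 2, None, 10, 5, 5], 3)` gives `(6, 2)`: four of the six entries have been fully timestamped before their first checkpoint, so the batched policy never writes them to the PTT.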

Page 13:

The New Story

[Figure: transaction 23 begins, inserts record A (Timestamp = TID.23), and commits; its VTT entry goes from (23, NA, 0) to (23, NA, 1) to (23, 178432, 1), and no PTT entry is written at commit. Transaction 76 begins, inserts record B (Timestamp = TID.76), and commits; its VTT entry goes from (76, NA, 0) to (76, NA, 1) to (76, 287544, 1). An update to A sets record A's Timestamp to 178432, and the Refcnt of VTT entry 23 drops to 0. At the checkpoint, the entry (76, 287544) appears in the PTT.]

Page 14:

Be Careful When Updating the VTT and PTT at the Checkpoint

[Figure: the log timeline with LSN(P), LSN(U), and EOL at checkpoints k-2, k-1, and k, for entries with RefCnt != 0. Depending on where the entry's cmt_lsn falls on the timeline, the action is either PTT: Insert Entry; VTT: Keep, or PTT: Do nothing; VTT: Keep.]

At the end of the current checkpoint (EOL) interval, update: LSN(P) = LSN(U), then LSN(U) = EOL

Page 15:

Be Careful When Updating the VTT and PTT at the Checkpoint

[Figure: the same log timeline, for entries with RefCnt == 0. Depending on where the entry's cmt_lsn and rcz_lsn fall on the timeline, the action is one of:]

- PTT: Do Nothing; VTT: Keep
- PTT: Do Nothing; VTT: Drop Entry
- Insert Entry into PTT
- Delete Entry from PTT and drop it from VTT

At the end of the current checkpoint (EOL) interval, update: LSN(P) = LSN(U), then LSN(U) = EOL

Page 16:

Improvement

- Each record is 200 bytes
- The database is initialized with 5,000 records
- The generated workload contains up to 10,000 transactions
- Each transaction is an insert or an update (to a record newly inserted by another transaction)
- One checkpoint every 500 transactions

Cost metrics:
- Execution time
- Number of writes to the PTT
- Number of batched updates

Page 17:

Execution Time

Audit Mode: Always keep everything in PTT

Page 18:

Number of Writes to PTT

Page 19:

Batched Update Analysis

Page 20:

Talk Outline
- Even “lazier” timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database

Page 21:

Time Split B (TSB) Tree

- Indexes both the current version pages and the historical version pages simultaneously
- Time split: create a new page and push the historical records in the current page into the new page
- Key split: proceed as in a normal B+ tree key split

When should we do a time split and a key split?
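A time split can be sketched as a partition of a page's versions by lifetime. The sketch below follows the usual description of a TSB-tree time split rather than anything stated on the slide: each version is assumed to carry a start time and an end time (`None` while it is current), and a version whose lifetime spans the split time lands in both pages. The `Version` type and `time_split` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    key: str
    start: int            # transaction time at which this version was created
    end: Optional[int]    # time at which it was superseded; None if current


def time_split(page: List[Version],
               split_time: int) -> Tuple[List[Version], List[Version]]:
    """Split a full page at split_time into (historical page, current page)."""
    # The historical page receives every version that existed before split_time.
    historical = [v for v in page if v.start < split_time]
    # The current page keeps every version still valid at or after split_time.
    current = [v for v in page if v.end is None or v.end > split_time]
    return historical, current
```

With versions A.0, A.1, B.0, B.1, and B.2 on a page and a split time later than all of them, the historical page receives all five versions while the current page keeps only A.1 and B.2. Those two versions are therefore replicated in both pages.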

Page 22:

What Happens Now

[Figure: the current page holds versions A.1, A.0, B.2, B.1, and B.0. Inserting record C finds the page full, so a time split is performed: a historical page takes A.1, A.0, B.2, B.1, and B.0, and record C is inserted into the current page.]

What if the current page exceeds the key split threshold?

Page 23:

Why Do We Need a Key Split Threshold?

Waiting until the page is full before doing the key split leads to too many time splits and hence many replicas in the historical versions.

What is the best value for the key split threshold?
- Too high: overall utilization drops
- Too low: current version utilization is reduced
- Find a balance

Page 24:

Could We Do Better?

When the key split immediately follows the time split:
- It leads to two pages with utilization 0.5 * thresh_ksplit
- If the new pages are not filled up quickly, storage is wasted for no good reason

A fix:
- Defer the key split until the next time that the page requires a key split
- At that point, perform the key split as if it had been performed on the previous occasion

Page 25:

Deferring the Key Split

[Figure: the current page holds A.1, A.0, B.2, B.1, and B.0. After the time split there is a historical page and a current page; later versions A.0’, B.1’, D, D.1, and D.0 appear as the example proceeds.]

- Insert C but the page is full.
- What if the current page exceeds the key split threshold? We still insert the record.
- Update D: the page is full again.
- Now we key split if, last time, the page had already satisfied the key split requirement.
- We use the key split value from the last occasion when a key split should have happened.
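The deferral policy walked through above amounts to a one-bit state machine per page. The following sketch is an interpretation under stated assumptions: every page-full event starts with a time split, the key split requirement is "current utilization after the time split exceeds the threshold", and the page remembers the split key that would have been used. `PageState`, `on_page_full`, and the action strings are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageState:
    # Split key remembered from the occasion when a key split was due but deferred.
    pending_split_key: Optional[str] = None


def on_page_full(state: PageState, current_utilization: float,
                 threshold: float, split_key: str,
                 defer: bool = True) -> List[str]:
    """Return the actions taken when a page overflows."""
    actions = ["time_split"]  # assumed: a time split always comes first
    if state.pending_split_key is not None:
        # A key split was already due last time: perform it now, with the old key.
        actions.append(f"key_split@{state.pending_split_key}")
        state.pending_split_key = None
    elif current_utilization > threshold:
        if defer:
            state.pending_split_key = split_key   # remember it, split later
        else:
            actions.append(f"key_split@{split_key}")  # no-defer policy
    return actions
```

Under the deferred policy, the first overflow above the threshold yields only a time split and records the key; the next overflow performs the key split with that recorded key, whatever the utilization is at that point.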

Page 26:

Analytical Result

We can show the following:

[Two equations whose operators did not survive extraction. The first expresses the maximum SVCU under the deferred policy, SVCU_defer^max, in terms of SVCU_no-defer^max, in, up, and cr. The second expresses the average, SVCU_defer^avg, in terms of SVCU_no-defer^avg, in, up, cr, and ln 2.]

Here in is the insertion ratio, up is the update ratio, and cr is the compression ratio.

Page 27:

The Goal of Our Design

To ensure that, for any particular version, the version utilization is kept above a specified threshold value.

Page 28:

Experiment

- 50,000 transactions
- Each transaction inserts or updates a record
- The insert/update ratio in the workload is varied
- Each record is 200 bytes
- The delta-compression technique is used to compress the historical versions (as they share many common bits with the newer version)

Page 29:

Single Version Current Utilization (SVCU)

[Figure: "SVCU: Deferred-Key-Split vs. No Deferred-Key-Split". x-axis: Percent of Update (0 to 1); y-axis: SVCU (0.35 to 0.75). Series: uncompressed defer; uncompressed no defer; analytical uncompressed; CR=90% defer; CR=90% no defer; analytical CR=90%.]

Page 30:

Multi-Version Utilization (MVU)

[Figure: chart titled "SVCU: Defer Key Split vs. No Defer Key Split". x-axis: Percent of Update (0 to 1); y-axis: MVU (0.3 to 3.3). Series: uncompressed defer; uncompressed no defer; CR=50% defer; CR=50% no defer; CR=90% defer; CR=90% no defer.]

Page 31:

Talk Outline
- Even “lazier” timestamping
- Deferred-key-split policy in the TSB tree
- Auditing the database

Page 32:

Auditing a Database

- Transaction versioning support enables checking any prior state of a database
- Store the user ID in the PTT for each transaction entry
- Any change to the database is traceable
- The user ID is taken from the current session that a transaction belongs to

Page 33:

Conclusion

- Transaction versioning support inside a database engine is one step closer to being even more practical
- What other interesting applications will become possible now with transaction versioning support?

Page 34:

Thanks!