
CPSC-608 Database Systems

Fall 2008

Instructor: Jianer Chen
Office: HRBB 309B
Phone: 845-4259
Email: chen@cs.tamu.edu

Notes #6

Optimizing Disk Access

Optimizing Disk Access

(By reducing the seek time and rotational delay via Operating Systems and Disk Controllers)

[Figure: seek time (Time) vs. cylinders traveled (1 to N); the time grows from about x for a 1-cylinder move to roughly 3x to 5x for an N-cylinder move.]

Average seek time = 20 ms
Shortest seek time = 5 ms

Average rotational delay = 8 ms

Dealing with many random accesses (disk scheduling)

• Suppose that we have a large (dynamic) sequence of disk read/write tasks on blocks randomly distributed over the disk.

• How do we order the tasks so that the total time is minimized?

• Elevator algorithm

Elevator algorithm

The disk head makes sweeps across the disk, stops at a cylinder if some task reads/writes a block in that cylinder, and reverses direction when no read/write task (at that moment) lies ahead.

Intuitively good, in particular when there is a large number of tasks reading/writing blocks uniformly distributed over the disk.

Works in a real-time manner; a precise analysis is difficult.
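As a concrete illustration (not from the slides), here is a minimal Python sketch of the elevator policy, assuming each pending task is identified only by its cylinder number and that the set of tasks is fixed for one sweep; the real scheduler also accepts new tasks while sweeping:

```python
def elevator_order(pending, head, direction=+1):
    """Serve pending disk requests (given as cylinder numbers) in elevator
    (SCAN) order: sweep in the current direction, stopping at every cylinder
    with a request, then reverse and sweep back.
    pending   -- cylinder numbers with outstanding read/write tasks
    head      -- cylinder the disk head currently sits on
    direction -- +1 to sweep toward higher cylinders, -1 toward lower ones
    """
    ahead = sorted(c for c in pending if (c - head) * direction >= 0)
    behind = sorted(c for c in pending if (c - head) * direction < 0)
    if direction > 0:
        return ahead + behind[::-1]     # sweep up, then back down
    return ahead[::-1] + behind         # sweep down, then back up

# Head at cylinder 50 moving up: serve 60, 75, 90, then 30, 10 on the way back.
print(elevator_order([10, 60, 30, 90, 75], head=50))   # [60, 75, 90, 30, 10]
```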

Dealing with a long sequence of data on disk

• Data in consecutive cylinders
• Larger buffer
• Pre-fetch / double buffering
• Disk arrays
• Mirrored disks

Example. Sorting on disk (again)

• A relation R of 10M tuples takes 100K blocks.
• Main memory can store 6400 blocks.
• A disk block read/write: 40 ms
  (seek time = 31 ms, rotational delay = 8 ms, transfer time = 1 ms).
• Two-phase Multiway Sorting on randomly distributed blocks takes about 4.5 hours.
• Also assume that a track holds 500 blocks, and that traversing one cylinder takes 5 ms.
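A quick sanity check of the 4.5-hour figure (a sketch, not from the slides): two-phase multiway sorting reads and writes every block once in each phase, so roughly 4 * 100,000 block I/Os, each paying the full 40 ms when the blocks are randomly placed:

```python
blocks = 100_000
io_ms = 31 + 8 + 1              # seek + rotational delay + transfer = 40 ms
total_ms = 4 * blocks * io_ms   # phase 1 read+write, phase 2 read+write
print(total_ms / 1000 / 3600)   # about 4.4 hours
```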

Data in consecutive cylinders

In phase 1, suppose that the input relation is stored on consecutive cylinders.

We can then read/write 6400 consecutive blocks at a time between main memory and disk.

Phase 1 read/write: 2 * (100K/6400) * (31 + 8 + 12*5 + 6400*1) ≈ 208,000 ms, i.e., under 4 minutes (saving about 2 hours).
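The same arithmetic in code (the 16 is the ceiling of 100K/6400, i.e., the number of 6400-block transfers in each direction):

```python
transfers = -(-100_000 // 6400)               # ceil(100K / 6400) = 16
per_transfer_ms = 31 + 8 + 12 * 5 + 6400 * 1  # seek + rotation + 12 cylinder moves + 6400 block transfers
print(2 * transfers * per_transfer_ms)        # about 208,000 ms, under 4 minutes
```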

Larger Buffer

In phase 2, we have 16 sorted sublists; each takes one block in main memory, leaving 6384 blocks free.

If we use all these 6384 blocks as an output buffer, and write it to disk only when it is full:

Phase 2 writing: (100K/6384) * (31 + 8 + 12*5 + 6384*1) ≈ 104,000 ms, i.e., under 2 minutes (saving about 1 hour).
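The corresponding check for the phase-2 writes with the 6384-block output buffer (again just a sketch of the slide's arithmetic):

```python
buf = 6384
flushes = -(-100_000 // buf)               # ceil(100K / 6384) = 16 buffer flushes
per_flush_ms = 31 + 8 + 12 * 5 + buf * 1   # one seek/rotation plus 6384 block transfers
print(flushes * per_flush_ms)              # about 104,000 ms, under 2 minutes
```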

However

The reading in phase 2 seems harder to improve: blocks are fetched from the 16 sublists in an essentially random order.

Double Buffering

For applications where the read/write pattern is predictable.

Have a program:
» Process B1
» Process B2
» Process B3

...

Single Buffer Solution

1. Read B1 into Buffer
2. Process data in Buffer
3. Read B2 into Buffer
4. Process data in Buffer
...

Let P = time to process one block
    R = time to read in one block
    n = # blocks

Single buffer time = n(P+R)

Double Buffering

Memory: two buffers. Disk: blocks A B C D E F G.

[Figure (animation): block A is read into the first buffer; while A is being processed, block B is read into the second buffer; when A is done, processing moves on to B while block C is read into the freed buffer, and so on.]

If P ≥ R:

• Double buffering time = R + nP
• Single buffering time = n(R + P)

P = processing time per block
R = I/O time per block
n = # blocks
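A minimal Python sketch of the idea, assuming hypothetical read_block() and process() functions standing in for the real I/O and computation; while one buffer is being processed, a helper thread fills the other:

```python
import threading

def run_double_buffered(blocks, read_block, process):
    """Process blocks with two buffers: overlap the read of block i+1 with
    the processing of block i. With P >= R the total time approaches R + nP;
    a single buffer would need n(R + P)."""
    if not blocks:
        return
    current = read_block(blocks[0])      # the first read cannot be overlapped
    for nxt in blocks[1:]:
        slot = {}
        reader = threading.Thread(
            target=lambda b=nxt: slot.update(data=read_block(b)))
        reader.start()                   # read the next block ...
        process(current)                 # ... while processing the current one
        reader.join()
        current = slot["data"]
    process(current)                     # last block: nothing left to prefetch
```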

Disk Arrays

Taking advantage of the fact that disk reads/writes can be done in parallel between a single CPU and multiple disks.

[Figure: several disks that appear logically as one disk.]

Would not help if the interesting blocks are all on the same disk.

Mirrored Disks

Duplicating disks so that multiple reads of the same disk can be done in parallel.

[Figure: blocks A and B each stored on two disks.]

Writing is more expensive (but not by much).

Disk Failures

• Partial vs. Total
• Intermittent vs. Permanent

Coping with Disk Failures

• Detection
  – Checksum

• Correction
  – Redundancy

At what level do we cope?

• Operating System Level (Stable Storage): each logical block is kept as two physical copies, Copy A and Copy B.

• Database System Level (Log File): a log, the current DB, and yesterday's DB.

Intermittent Failure Detection (Checksums)

• Idea: add n parity bits for every m data bits
  – Ex.: m = 8, n = 1

• Block A: 01101000 : 1 (odd # of 1's)
• Block B: 11101110 : 0 (even # of 1's)

• But suppose Block A instead contains
  Block A': 01000000 : 1 (also has an odd # of 1's),
  so the error goes undetected: a single parity bit gives only a 50% chance of detection.

• More parity bits decrease the probability of an undetected failure to 1/2^n (with n ≤ m independent parity bits).
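For concreteness, a small Python sketch (assumptions mine) of the m = 8, n = 1 scheme: the parity bit is the number of 1's in the data modulo 2, so a corruption that preserves the parity of the data goes undetected:

```python
def parity_bit(data_bits):
    # parity bit = (# of 1's in the data) mod 2, as in the slide's example
    return data_bits.count("1") % 2

block_a = "01101000"
print(parity_bit(block_a))                  # 1  (three 1's: odd)

corrupted = "01000000"                      # also has an odd number of 1's
print(parity_bit(corrupted) == parity_bit(block_a))   # True -> error undetected
```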

Disk Crash (Disk Arrays)

• RAIDs (Redundant Arrays of Inexpensive Drives)

[Figure: several disks that appear logically as one disk.]

Disk Arrays

• RAID Level 1 (Mirroring)
  – Keep an exact copy of the data on redundant disks.

[Figure: data disks holding blocks A and B, mirrored on redundant disks holding copies of A and B.]

Disk Arrays

• RAID Level 4
  – Keep only one redundant disk.
  – Entire parity blocks on the redundant disk.

[Figure: data disks holding blocks A, B, C and a redundant disk holding the parity block P.]

Parity Blocks & Modulo-2 Sums

• Have an array of 3 data disks
  – Disk 1, block 1: 11110000
  – Disk 2, block 1: 10101010
  – Disk 3, block 1: 00111000

• ... and 1 parity disk
  – Disk 4, block 1: 01100010

Note:
- The sum over each column is always an even # of 1's.
- The mod-2 sum can recover any missing single row (e.g., a logical block).

Using Mod-2 Sums for Error Recovery

– Suppose we have:
  – Disk 1, block 1: 11110000
  – Disk 2, block 1: ????????
  – Disk 3, block 1: 00111000
  – Disk 4, block 1: 01100010 (parity)

– The mod-2 sum for block 1 over disks 1, 3, 4 recovers
  Disk 2, block 1: 10101010
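Both the parity block and the recovery step are just bitwise modulo-2 sums; a short Python sketch using the slide's numbers:

```python
def mod2_sum(*blocks):
    # bitwise modulo-2 sum (XOR) of equal-length bit strings
    return "".join(str(sum(int(b[i]) for b in blocks) % 2)
                   for i in range(len(blocks[0])))

d1, d2, d3 = "11110000", "10101010", "00111000"
parity = mod2_sum(d1, d2, d3)
print(parity)                    # 01100010 -- the block on the parity disk

# If disk 2 is lost, XOR-ing the surviving blocks with the parity block
# reproduces its contents.
print(mod2_sum(d1, d3, parity))  # 10101010
```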

Disk Arrays

• RAID Level 5 (Striping)
  – Like level 4, but with balanced read & write load.

[Figure: blocks A, B, C, D striped across the disks, with a parity partition on each disk.]

Disk Arrays

• RAID Level 6 (error-correcting code)
  – More powerful: can recover from more than one disk crash.