Improve Run Generation Overlap input,output, and internal CPU work. Reduce the number of runs...
Date posted: 22-Dec-2015
Improve Run Generation
• Overlap input, output, and internal CPU work.
• Reduce the number of runs (equivalently, increase average run length).
[Figure: DISK, MEMORY, DISK]
Internal Quick Sort
6 2 8 5 11 10 4 1 9 7 3
Use 6 as the pivot (median of 3).
Input first, middle, and last blocks first.
In-place partitioning.
Input blocks from the ends toward the middle.
Sort left and right groups recursively.
Can begin output as soon as the leftmost block is ready.
4 2 3 5 1 6 10 11 9 7 8
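The partitioning step above can be sketched in Python. This is a minimal in-memory illustration, not the slides' own code: the function names `median_of_three`, `partition`, and `quick_sort` are assumptions, and the block-by-block disk input described above is not modeled. The scans move from the two ends toward the middle, which is what lets blocks be read in that order.

```python
def median_of_three(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    candidates = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return candidates[1][1]


def partition(a, lo, hi):
    """Partition a[lo..hi] in place around a median-of-3 pivot.

    Returns the final index of the pivot. Elements to its left are
    <= pivot and elements to its right are >= pivot.
    """
    p = median_of_three(a, lo, hi)
    a[lo], a[p] = a[p], a[lo]        # move the pivot to the left end
    pivot = a[lo]
    i, j = lo, hi + 1
    while True:
        i += 1
        while i <= hi and a[i] < pivot:   # scan right from the left end
            i += 1
        j -= 1
        while a[j] > pivot:               # scan left from the right end
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[lo], a[j] = a[j], a[lo]        # place the pivot between the groups
    return j


def quick_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place by recursive partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        j = partition(a, lo, hi)
        quick_sort(a, lo, j - 1)     # left group
        quick_sort(a, j + 1, hi)     # right group


data = [6, 2, 8, 5, 11, 10, 4, 1, 9, 7, 3]
partition(data, 0, len(data) - 1)
print(data)  # [4, 2, 3, 5, 1, 6, 10, 11, 9, 7, 8]
```

With this particular scan order, one partitioning pass on the example reproduces the arrangement shown above, with 6 in its final position.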
Steady State Operation
[Figure: three overlapped activities: read from disk, write to disk, run generation]
• Synchronization is done when the active input buffer gets empty (the active output buffer will be full at this time).
New Strategy
• Use 2 input and 2 output buffers.
• Rest of memory is used for a min loser tree.
[Figure: memory layout with Input 0, Input 1, Output 0, Output 1, and the Loser Tree]
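For readers unfamiliar with the structure, here is a minimal sketch of a min loser tree in Python. The class name `LoserTree` and its methods are hypothetical and only illustrate the idea: each internal node stores the loser of the match played there, the overall winner is kept separately, and replacing the winner's key needs one replay along a single leaf-to-root path.

```python
class LoserTree:
    """Min loser tree over k players (illustrative sketch).

    tree[0] holds the index of the overall winner (smallest key);
    tree[1..k-1] hold the index of the loser of each internal match.
    Player i sits at conceptual leaf position k + i.
    """

    def __init__(self, keys):
        self.k = len(keys)
        self.keys = list(keys)
        self.tree = [0] * self.k
        # Play all matches bottom-up, remembering winners temporarily.
        winner = [0] * (2 * self.k)
        for i in range(self.k):
            winner[self.k + i] = i
        for node in range(self.k - 1, 0, -1):
            left, right = winner[2 * node], winner[2 * node + 1]
            if self.keys[left] <= self.keys[right]:
                winner[node], self.tree[node] = left, right
            else:
                winner[node], self.tree[node] = right, left
        self.tree[0] = winner[1]

    def winner(self):
        """Index of the player holding the smallest key."""
        return self.tree[0]

    def replace_winner(self, key):
        """Give the current winner a new key and replay its path."""
        w = self.tree[0]
        self.keys[w] = key
        node = (w + self.k) // 2
        while node >= 1:
            # The stored loser beats the climbing player: they swap roles.
            if self.keys[self.tree[node]] < self.keys[w]:
                self.tree[node], w = w, self.tree[node]
            node //= 2
        self.tree[0] = w


# Usage: repeatedly output the winner and retire it with +infinity.
values = [4, 3, 6, 8, 5, 7, 3, 6]
t = LoserTree(values)
out = []
for _ in values:
    out.append(t.keys[t.winner()])
    t.replace_winner(float("inf"))
print(out)  # [3, 3, 4, 5, 6, 6, 7, 8]
```

Each replay compares the new key only against the losers stored on one path, so it costs O(log k) comparisons.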
Generate Run 1
[Figure sequence: a min loser tree begins generating run 1, with input buffers I0 and I1 and output buffers O0 and O1. Record sequence shown on the slides: 4 3 6 8 5 7 3 6 9 4 5 2 5 8]
Continue With Run 1
[Figure sequence: successive frames continue run 1 as records move from the input buffers through the loser tree into the output buffers]
Buffer Reduction
• We may reduce the number of buffers to 3.
• At any time, 1 is used to read into, 1 to write from, and the 3rd both feeds the loser tree and is filled from the tree.
• The 3 physical buffers rotate through these three roles.
• Let k be the number of external nodes in the loser tree.
• Run size >= k.
• Sorted input => 1 run.
• Reverse of sorted input => n/k runs.
• Average run size is ~2k.
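The run-size claims above can be checked with a small simulation. The sketch below is an assumption-laden illustration: it uses Python's `heapq` as a stand-in for the loser tree (a different structure with the same winner-selection role), ignores buffering, and the name `generate_runs` is hypothetical. Each record is tagged with a run number; a newly read record smaller than the record just output cannot join the current run and is tagged for the next one.

```python
import heapq
import random
from itertools import islice

_DONE = object()  # sentinel marking exhausted input


def generate_runs(records, k):
    """Split records into sorted runs using k in-memory slots."""
    it = iter(records)
    # Fill the k slots; every initial record belongs to run 1.
    heap = [(1, x) for x in islice(it, k)]
    heapq.heapify(heap)
    runs = []
    while heap:
        run_no, x = heapq.heappop(heap)   # smallest record of the oldest run
        if run_no > len(runs):
            runs.append([])               # first record of a new run
        runs[-1].append(x)
        nxt = next(it, _DONE)
        if nxt is not _DONE:
            # A record smaller than the one just output must wait for the next run.
            tag = run_no if nxt >= x else run_no + 1
            heapq.heappush(heap, (tag, nxt))
    return runs


n, k = 10_000, 50
print(len(generate_runs(range(n), k)))           # sorted input: 1 run
print(len(generate_runs(range(n, 0, -1), k)))    # reversed input: n/k = 200 runs

rng = random.Random(0)
runs = generate_runs([rng.random() for _ in range(n)], k)
print(sum(map(len, runs)) / len(runs))           # typically close to 2k for random input
```

Every run except possibly the last has at least k records, because a run starts only once all k slots hold records tagged for it.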
• Memory capacity = m records.
• Run size using fill memory, sort, and output run scheme = m.
• Use loser tree scheme. Assume block size is b records. Need memory for 4 buffers (4b records). Loser tree k = m – 4b. Average run size = 2k = 2(m – 4b). 2k >= m when m >= 8b.
• Total internal processing time using fill memory, sort, and output run scheme = O((n/m) m log m) = O(n log m).
• Total internal processing time using loser tree = O(n log k).
• Loser tree scheme generates runs that differ in their lengths.
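As a worked check of the run-size comparison above (k = m - 4b, average run 2k versus m), the following sketch evaluates both schemes for a few memory sizes. The function name `run_sizes` and the sample values of m and b are illustrative assumptions, not figures from the slides.

```python
def run_sizes(m, b):
    """Return (fill-sort-output run size, average loser tree run size).

    m is the memory capacity in records and b the block size in records.
    The loser tree scheme reserves 4 buffers of b records each.
    """
    k = m - 4 * b          # records left for the loser tree
    return m, 2 * k


for m in (700, 800, 1000):
    plain, loser = run_sizes(m, b=100)
    print(m, plain, loser, loser >= plain)
# 700 700 600 False
# 800 800 800 True
# 1000 1000 1200 True
```

The break-even point falls at m = 8b, since 2(m - 4b) >= m rearranges to m >= 8b.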