Improve Run Generation

  • 7/27/2019 Improve Run Generation

    1/30

    Improve Run Generation

    Overlap input, output, and internal CPU work.

    Reduce the number of runs (equivalently, increase the average run length).

    [Figure: records flow from DISK into MEMORY and back to DISK.]

    Internal Quick Sort

    Input: 6 2 8 5 11 10 4 1 9 7 3

    Use 6 as the pivot (median of 3: the first, middle, and last keys are 6, 10, and 3).

    Input the first, middle, and last blocks first.

    Partition in place.

    Input blocks from the ends toward the middle.

    After partitioning: 4 2 3 5 1 6 10 11 9 7 8

    Sort the left and right groups recursively.

    Output can begin as soon as the leftmost block is ready.
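
    The partition step on this slide can be sketched as follows. This is a minimal in-memory sketch that ignores block I/O; the function names (`median_of_three_index`, `partition`, `quick_sort`) are illustrative and not taken from the slides. The partition scans from both ends toward the middle, which is the access pattern that lets blocks be read from the ends inward.

```python
def median_of_three_index(a, lo, hi):
    """Index of the median of a[lo], a[mid], a[hi]."""
    mid = (lo + hi) // 2
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]


def partition(a, lo, hi):
    """Partition a[lo..hi] in place around a median-of-3 pivot.

    Returns the final position of the pivot.
    """
    p = median_of_three_index(a, lo, hi)
    a[lo], a[p] = a[p], a[lo]          # move the pivot to the left end
    pivot = a[lo]
    i, j = lo, hi + 1
    while True:
        i += 1
        while i <= hi and a[i] < pivot:   # scan right for a key >= pivot
            i += 1
        j -= 1
        while a[j] > pivot:               # scan left for a key <= pivot
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[lo], a[j] = a[j], a[lo]          # place the pivot between the groups
    return j


def quick_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place by recursive partitioning."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        j = partition(a, lo, hi)
        quick_sort(a, lo, j - 1)
        quick_sort(a, j + 1, hi)


keys = [6, 2, 8, 5, 11, 10, 4, 1, 9, 7, 3]
pivot_pos = partition(keys, 0, len(keys) - 1)
print(keys)       # [4, 2, 3, 5, 1, 6, 10, 11, 9, 7, 8]
```

    With the slide's input, one call to `partition` reproduces the arrangement shown after partitioning, with the pivot 6 at index 5.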

    Alternative Internal Sort Scheme

    [Figure: DISK, memory areas B1, B2, and B3, DISK.]

    Partition memory into 3 areas; each may be more than 1 block in size.

    Steady State Operation

    Reading from disk, writing to disk, and run generation proceed concurrently.

    Synchronization is done when the current input area becomes full (the current output area will be empty at this time).
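
    One way to picture the three-area scheme in steady state is a pipeline in which the areas swap roles at each synchronization point. The sketch below simulates this sequentially, without real I/O or threads. The rotation order (an area is read, then sorted, then written) and the names `pipeline_runs` and `area_size` are assumptions made for illustration; the slides only say that memory is split into three areas.

```python
def pipeline_runs(records, area_size):
    """Simulate three memory areas rotating through read, sort, and write.

    Returns (runs, timeline). Each timeline entry is a tuple
    (reading, sorting, writing) of booleans for one period between
    synchronization points.
    """
    chunks = [records[i:i + area_size]
              for i in range(0, len(records), area_size)]
    source = iter(chunks)
    read_area = sort_area = write_area = None
    runs, timeline = [], []
    while True:
        # Synchronization point: every area takes on its next role.
        write_area, sort_area, read_area = (
            sort_area, read_area, next(source, None))
        if read_area is None and sort_area is None and write_area is None:
            break
        if sort_area is not None:
            sort_area = sorted(sort_area)   # internal CPU work
        if write_area is not None:
            runs.append(write_area)         # stands in for the disk write
        timeline.append((read_area is not None,
                         sort_area is not None,
                         write_area is not None))
    return runs, timeline


runs, timeline = pipeline_runs([6, 2, 8, 5, 11, 10, 4, 1, 9, 7, 3], 4)
print(runs)      # [[2, 5, 6, 8], [1, 4, 10, 11], [3, 7, 9]]
```

    In this simulation all three activities overlap only in the middle periods of `timeline`; the first two periods fill the pipeline and the last two drain it.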

    New Strategy

    Use 2 input buffers and 2 output buffers.

    The rest of memory is used for a min loser tree.

    Actually, 3 buffers are adequate.

    [Figure: DISK, then MEMORY holding Input 0, Input 1, the loser tree, Output 0, and Output 1, then DISK.]
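
    For readers unfamiliar with the structure, here is a minimal sketch of a min loser tree. The class and method names (`LoserTree`, `winner`, `replace_winner`) are illustrative and not from the slides, and the array layout (leaves at positions k..2k-1 of an implicit binary tree) is one common choice among several. It assumes at least one player.

```python
class LoserTree:
    """Min loser tree over k >= 1 players (illustrative sketch)."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.k = len(self.keys)
        # tree[0] is the overall winner; tree[1..k-1] store match losers.
        self.tree = [0] * self.k
        # winners[i] is the player winning the subtree rooted at node i;
        # the leaf for player p sits at position k + p.
        winners = [0] * self.k + list(range(self.k))
        for node in range(self.k - 1, 0, -1):
            a, b = winners[2 * node], winners[2 * node + 1]
            if self.keys[b] < self.keys[a]:
                a, b = b, a                 # a wins (smaller key), b loses
            winners[node], self.tree[node] = a, b
        self.tree[0] = winners[1]

    def winner(self):
        """Index of the player holding the smallest key."""
        return self.tree[0]

    def replace_winner(self, key):
        """Give the current winner a new key and replay its matches."""
        player = self.tree[0]
        self.keys[player] = key
        node = (player + self.k) // 2       # parent of the player's leaf
        while node >= 1:
            # Only the stored loser needs to be examined at each level.
            if self.keys[self.tree[node]] < self.keys[player]:
                self.tree[node], player = player, self.tree[node]
            node //= 2
        self.tree[0] = player


tree = LoserTree([4, 3, 6, 8, 1, 5, 7, 3])
smallest_first = []
for _ in range(8):
    smallest_first.append(tree.keys[tree.winner()])
    tree.replace_winner(float("inf"))       # retire the winner
print(smallest_first)   # [1, 3, 3, 4, 5, 6, 7, 8]
```

    Replaying touches one node per level, so each replacement costs O(log k) comparisons, and only the winner's key may be replaced with this replay rule.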

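    Steady State Operation

    Reading from disk, writing to disk, and run generation proceed concurrently.

    Synchronization is done when the active input buffer becomes empty (the active output buffer will be full at this time, because each record taken from the input buffer sends one record to the output buffer).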

    Initialize

    Input: 4 3 6 8 1 5 7 3 2 6 9 4 5 2 5 8

    [Figure sequence: the loser tree is filled from the input and initialized step by step; the output buffers O0 and O1 and the input buffers I0 and I1 are shown alongside the tree.]

    Generate Run 1

    [Figure sequence: records leave the loser tree for the output buffers O0 and O1 while replacement records are taken from the input buffers I0 and I1.]

    Continue With Run 1

    [Figure sequence: run generation continues in the same way, with records moving from the loser tree to O0 and O1 and new records arriving through I0 and I1.]

    Let k be the number of external nodes in the loser tree.

    Run size >= k.

    Sorted input => 1 run.

    Reverse of sorted input => n/k runs.

    Average run size is ~2k.
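
    The run-length properties above can be checked with a small experiment. The sketch below generates runs by replacement selection, tagging each record with a run number. It uses Python's `heapq` as a stand-in for the loser tree, so it illustrates the run boundaries rather than the slides' data structure; `generate_runs` and the choice k = 4 for the slide input are assumptions made for illustration.

```python
import heapq
import itertools
import random

_END = object()   # sentinel: the input is exhausted


def generate_runs(records, k):
    """Split records into sorted runs, keeping at most k records in memory."""
    it = iter(records)
    # Every record initially loaded belongs to run 1.
    heap = [(1, key) for key in itertools.islice(it, k)]
    heapq.heapify(heap)
    runs = []
    while heap:
        run, key = heapq.heappop(heap)     # smallest key of the earliest run
        if run > len(runs):
            runs.append([])                # a new run starts here
        runs[-1].append(key)
        nxt = next(it, _END)
        if nxt is not _END:
            # A record smaller than the one just output cannot join the
            # current run, so it is held back for the next run.
            heapq.heappush(heap, (run if nxt >= key else run + 1, nxt))
    return runs


slide_input = [4, 3, 6, 8, 1, 5, 7, 3, 2, 6, 9, 4, 5, 2, 5, 8]
slide_runs = generate_runs(slide_input, 4)

sorted_runs = generate_runs(range(1, 21), 5)        # sorted input
reverse_runs = generate_runs(range(20, 0, -1), 5)   # reverse of sorted input

rng = random.Random(0)
n, k = 20000, 50
random_runs = generate_runs([rng.random() for _ in range(n)], k)
average_length = n / len(random_runs)

print(slide_runs)
print(len(sorted_runs), len(reverse_runs), average_length)
```

    With k = 4 the slide input splits into the runs 3 4 5 6 7 8, then 1 2 3 4 5 5 6 8 9, then 2. Sorted input gives a single run, the reversed input of 20 records with k = 5 gives 20/5 = 4 runs, and for random input the measured average run length should come out near 2k.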

    Memory capacity = m records.

    Run size using the fill memory, sort, and output run scheme = m.

    Use the loser tree scheme.

    Assume the block size is b records.

    Need memory for 4 buffers (4b records).

    Loser tree k = m - 4b.

    Average run size = 2k = 2(m - 4b).

    2k >= m when m >= 8b.

    Assume b = 100.

    m      600    1000    5000    10000
    k      200     600    4600     9600
    2k     400    1200    9200    19200
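
    The table follows directly from k = m - 4b. The break-even condition comes from rearranging 2(m - 4b) >= m to m >= 8b, which is 800 records when b = 100. The short sketch below recomputes the rows; the function name `loser_tree_capacity` is illustrative.

```python
def loser_tree_capacity(m, b, buffers=4):
    """Records left for the loser tree after reserving the I/O buffers."""
    return m - buffers * b


b = 100
rows = []
for m in (600, 1000, 5000, 10000):
    k = loser_tree_capacity(m, b)
    rows.append((m, k, 2 * k))
    # The loser tree scheme matches or beats run size m once m >= 8b.
    print(f"m={m:6d}  k={k:5d}  2k={2 * k:6d}  2k >= m: {2 * k >= m}")
```

    Only the first column (m = 600, below 8b = 800) has an average run size 2k smaller than m.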

    Total internal processing time using the fill memory, sort, and output run scheme = O((n/m) m log m) = O(n log m).

    Total internal processing time using the loser tree = O(n log k).

    The loser tree scheme generates runs that differ in their lengths.

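    Merging Runs Of Different Length

    Run lengths: 4 3 6 9

    [Figure: two merge trees over these runs. In one, 4 and 3 merge into 7, 6 and 9 merge into 15, and 7 and 15 merge into 22. In the other, 4 and 3 merge into 7, 7 and 6 merge into 13, and 13 and 9 merge into 22.]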
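
    To compare the two merge trees, one can total the lengths of all merged runs, since each 2-way merge handles every record of both inputs once. The sketch below assumes that cost measure, which the slide does not state explicitly. The greedy rule in `optimal_merge_cost` (always merge the two shortest runs, as in Huffman coding) is a standard technique named here for illustration and is not taken from the slide.

```python
import heapq


def tree_cost(tree):
    """Return (run_length, total_cost) for a nested-tuple merge tree.

    A leaf is an int run length; an internal node is a pair of subtrees.
    """
    if isinstance(tree, int):
        return tree, 0
    left_len, left_cost = tree_cost(tree[0])
    right_len, right_cost = tree_cost(tree[1])
    merged = left_len + right_len
    return merged, left_cost + right_cost + merged


def optimal_merge_cost(lengths):
    """Cost of repeatedly merging the two shortest runs (Huffman-style)."""
    heap = list(lengths)
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        cost += merged
        heapq.heappush(heap, merged)
    return cost


balanced = tree_cost(((4, 3), (6, 9)))     # (22, 44): 7 + 15 + 22
skewed = tree_cost((((4, 3), 6), 9))       # (22, 42): 7 + 13 + 22
print(balanced, skewed, optimal_merge_cost([4, 3, 6, 9]))
```

    Under this measure the second tree costs 42 against 44 for the first, and the greedy rule also arrives at 42 for these four runs.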