Sorting


Page 1: Sorting

1

Sorting

• We have already seen two efficient ways to sort:

Page 2: Sorting

2

A kind of “insertion” sort

• Insert the elements into a red-black tree one by one

• Traverse the tree in in-order and collect the keys

• Takes O(n log n) time
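The two bullets above can be sketched as follows, with a plain (unbalanced) binary search tree standing in for the red-black tree; the red-black balancing is what actually guarantees the O(log n) cost per insertion:

```python
# Tree sort sketch: insert the elements one by one, then collect them
# with an in-order traversal. NOTE: a plain BST is used here instead of
# the slides' red-black tree, so the O(n log n) bound is not guaranteed.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def tree_sort(a):
    root = None
    for x in a:                  # insert the elements one by one
        root = insert(root, x)
    out = []
    def inorder(v):              # in-order traversal yields sorted keys
        if v is not None:
            inorder(v.left)
            out.append(v.key)
            inorder(v.right)
    inorder(root)
    return out
```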

Page 3: Sorting

3

Heapsort (Williams, Floyd, 1964)

• Put the elements in an array

• Make the array into a heap

• Do a deletemin and put the deleted element at the last position of the array
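The three steps above can be sketched with the standard library's binary min-heap; heapq stores the heap in a plain list, as in the slides, but the slides reuse the freed last array slot while this sketch collects the deleted elements into a second list:

```python
import heapq

# Heapsort sketch using heapq (a binary min-heap in a Python list).
def heapsort(a):
    h = list(a)           # put the elements in an array
    heapq.heapify(h)      # make the array into a heap, in linear time
    out = []
    while h:
        out.append(heapq.heappop(h))   # deletemin
    return out
```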

Page 4: Sorting

4

Put the elements in the heap

[The heap shown as a tree]

Q: 79 65 26 24 19 15 29 23 33 40 7

Page 5: Sorting

5

Make the elements into a heap

[The heap shown as a tree]

Q: 79 65 26 24 19 15 29 23 33 40 7

Page 6: Sorting

6

Make the elements into a heap

[The heap shown as a tree]

Q: 79 65 26 24 19 15 29 23 33 40 7

Heapify-down(Q,4)

Page 7: Sorting

7

[The heap shown as a tree]

Q: 79 65 26 24 7 15 29 23 33 40 19

Heapify-down(Q,4)

Page 8: Sorting

8

[The heap shown as a tree]

Q: 79 65 26 24 7 15 29 23 33 40 19

Heapify-down(Q,3)

Page 9: Sorting

9

[The heap shown as a tree]

Q: 79 65 26 23 7 15 29 24 33 40 19

Heapify-down(Q,3)

Page 10: Sorting

10

[The heap shown as a tree]

Q: 79 65 26 23 7 15 29 24 33 40 19

Heapify-down(Q,2)

Page 11: Sorting

11

[The heap shown as a tree]

Q: 79 65 15 23 7 26 29 24 33 40 19

Heapify-down(Q,2)

Page 12: Sorting

12

[The heap shown as a tree]

Q: 79 65 15 23 7 26 29 24 33 40 19

Heapify-down(Q,1)

Page 13: Sorting

13

[The heap shown as a tree]

Q: 79 7 15 23 65 26 29 24 33 40 19

Heapify-down(Q,1)

Page 14: Sorting

14

[The heap shown as a tree]

Q: 79 7 15 23 19 26 29 24 33 40 65

Heapify-down(Q,1)

Page 15: Sorting

15

[The heap shown as a tree]

Q: 79 7 15 23 19 26 29 24 33 40 65

Heapify-down(Q,0)

Page 16: Sorting

16

[The heap shown as a tree]

Q: 7 79 15 23 19 26 29 24 33 40 65

Heapify-down(Q,0)

Page 17: Sorting

17

[The heap shown as a tree]

Q: 7 19 15 23 79 26 29 24 33 40 65

Heapify-down(Q,0)

Page 18: Sorting

18

[The heap shown as a tree]

Q: 7 19 15 23 40 26 29 24 33 79 65

Heapify-down(Q,0)

Page 19: Sorting

20

Summary

• We can build the heap in linear time (we already did this analysis)

• We still have to deletemin the elements one by one in order to sort; that will take O(n log n)

Page 20: Sorting

21

Quicksort (Hoare 1961)

Page 21: Sorting

22

quicksort

Input: an array A[p..r]

Quicksort(A, p, r)
  if p < r
    then q ← Partition(A, p, r)   // q is the position of the pivot element
         Quicksort(A, p, q-1)
         Quicksort(A, q+1, r)

Page 22: Sorting

23

[Partition animation on A[p..r] = 2 8 7 1 3 5 6 4, pivot x = 4; i and j scan the array]

2 8 7 1 3 5 6 4   (j advances; only the i and j markers move)

2 1 7 8 3 5 6 4   (after exchanging A[i] ↔ A[j])

Page 23: Sorting

24

2 1 7 8 3 5 6 4   (continued)

2 1 3 8 7 5 6 4   (after exchanging A[i] ↔ A[j])

2 1 3 4 7 5 6 8   (final exchange A[i+1] ↔ A[r] puts the pivot in place)

Page 24: Sorting

25

[Array A with markers p and r] 2 8 7 1 3 5 6 4

Partition(A, p, r)
  x ← A[r]
  i ← p-1
  for j ← p to r-1
    do if A[j] ≤ x
         then i ← i+1
              exchange A[i] ↔ A[j]
  exchange A[i+1] ↔ A[r]
  return i+1
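A direct Python transcription of the slides' Partition and Quicksort pseudocode (a Lomuto-style partition with A[r] as the pivot):

```python
# Quicksort with the slides' Partition; sorts A[p..r] in place.
def partition(A, p, r):
    x = A[r]                          # the pivot
    i = p - 1
    for j in range(p, r):             # j = p, ..., r-1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]   # exchange A[i] <-> A[j]
    A[i + 1], A[r] = A[r], A[i + 1]   # move the pivot into place
    return i + 1                      # the position of the pivot

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

A = [2, 8, 7, 1, 3, 5, 6, 4]          # the example from the slides
quicksort(A, 0, len(A) - 1)           # A becomes [1, 2, 3, 4, 5, 6, 7, 8]
```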

Page 25: Sorting

26

Analysis

• Running time is proportional to the number of comparisons

• Each pair of elements is compared at most once ⇒ O(n²)

• In fact, for every n there is an input of size n on which quicksort makes cn² comparisons ⇒ the worst case is Ω(n²)

Page 26: Sorting

27

But

• Assume that the split is even in each iteration

Page 27: Sorting

28

T(n) = 2T(n/2) + bn

How do we solve linear recurrences like this? (read Chapter 4)
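One way to check the solution numerically (a sketch, with b = 3 chosen as an arbitrary constant for the linear term): for powers of two, the recursion-tree argument that follows predicts T(n) = b·n·(log₂ n + 1) exactly.

```python
from functools import lru_cache
from math import log2

b = 3  # arbitrary constant in T(n) = 2T(n/2) + bn, T(1) = b

@lru_cache(maxsize=None)
def T(n):
    return b if n == 1 else 2 * T(n // 2) + b * n

# b*n comparisons per level, log2(n) + 1 levels:
for k in range(11):
    n = 2 ** k
    assert T(n) == b * n * (log2(n) + 1)
```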

Page 28: Sorting

29

Recurrence tree

[Recursion tree: root bn, with children T(n/2) and T(n/2)]

Page 29: Sorting

30

Recurrence tree

[Recursion tree: root bn; second level bn/2 and bn/2; below them T(n/4) T(n/4) T(n/4) T(n/4)]

Page 30: Sorting

31

Recurrence tree

[The same recursion tree, of depth log n]

In every level we do bn comparisons, so the total number of comparisons is O(n log n)

Page 31: Sorting

34

Observations

• We can’t guarantee good splits

• But intuitively on random inputs we will get good splits

Page 32: Sorting

35

Randomized quicksort

• Use randomized-partition rather than partition

Randomized-partition(A, p, r)
  i ← random(p, r)
  exchange A[r] ↔ A[i]
  return Partition(A, p, r)
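In Python, with the deterministic Partition from the earlier slides repeated so the sketch is self-contained:

```python
import random

# The slides' randomized-partition: pick a uniformly random pivot
# position, swap it to the end, then partition as before.
def partition(A, p, r):
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    i = random.randint(p, r)          # random(p, r), endpoints inclusive
    A[r], A[i] = A[i], A[r]           # exchange A[r] <-> A[i]
    return partition(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```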

Page 33: Sorting

36

• On the same input we will get a different running time in each run!

• Look at the average of all these running times for one particular input

Page 34: Sorting

37

Expected # of comparisons

Let X be the # of comparisons

This is a random variable

Want to know E(X)

Page 35: Sorting

38

Expected # of comparisons

Let z1, z2, ..., zn be the elements in sorted order

Let Xij = 1 if zi is compared to zj and 0 otherwise

So,

$X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$

Page 36: Sorting

39

$E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]$

by linearity of expectation

$= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr\{z_i \text{ is compared to } z_j\}$

Page 37: Sorting

40

$E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]$

by linearity of expectation

$= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 1 \cdot \Pr\{z_i \text{ is compared to } z_j\}$

Page 38: Sorting

41

Consider $Z_{ij} \equiv \{z_i, z_{i+1}, \ldots, z_j\}$

Claim: $z_i$ and $z_j$ are compared ⟺ either $z_i$ or $z_j$ is the first pivot chosen from $Z_{ij}$

Proof: 3 cases, by which element of $Z_{ij}$ is chosen as a pivot first:

– $z_i$ first: $z_i$ is compared to $z_j$ on this partition, and never again.

– $z_j$ first: the same.

– some $z_k$ with $i < k < j$ first: $z_i$ and $z_j$ are not compared on this partition; the partition separates them, so no future partition uses both.

Page 39: Sorting

42

Pr{$z_i$ is compared to $z_j$}

= Pr{$z_i$ or $z_j$ is first pivot chosen from $Z_{ij}$}   (just explained)

= Pr{$z_i$ is first pivot chosen from $Z_{ij}$} + Pr{$z_j$ is first pivot chosen from $Z_{ij}$}   (mutually exclusive possibilities)

= 1/(j-i+1) + 1/(j-i+1) = 2/(j-i+1)

Page 40: Sorting

43

$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}$

$= \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k}$   Simplify with a change of variable, k = j-i+1.

$\le \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k}$   Simplify and overestimate, by adding terms.

$= \sum_{i=1}^{n-1} O(\lg n) = O(n \lg n)$
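A numeric sanity check of this derivation (not in the slides): the exact double sum stays below the 2n·Hₙ overestimate used in the last step.

```python
# Exact expected number of comparisons, sum over i < j of 2/(j-i+1),
# versus the harmonic-sum overestimate 2n*H_n from the derivation.
def expected_comparisons(n):
    return sum(2.0 / (j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

n = 500
assert expected_comparisons(n) <= 2 * n * harmonic(n)
```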

Page 41: Sorting

44

Lower bound for sorting in the comparison model

Page 42: Sorting

45

A lower bound

• Comparison model: we assume that the only operations from which we deduce order among keys are comparisons

• Then we prove that we need Ω(n log n) comparisons in the worst case

Page 43: Sorting

46

Model the algorithm as a decision tree

[Decision-tree diagram for sorting three elements: each internal node compares two elements, and each root-to-leaf path ends in one of the 3! = 6 possible orderings]

Page 44: Sorting

47

Important Observations

• Every algorithm can be represented as a (binary) tree like this

• Each path corresponds to a run on some input

• The worst case # of comparisons corresponds to the longest path

Page 45: Sorting

48

The lower bound

Let d be the length of the longest path

#leaves ≤ 2^d

n! ≤ #leaves

⇒ log₂(n!) ≤ d
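A numeric illustration (not in the slides): log₂(n!) can be evaluated via the log-gamma function, and Stirling's bound ln(n!) ≥ n ln n − n shows that d ≥ log₂(n!) = Ω(n log n).

```python
from math import e, lgamma, log, log2

# log2(n!) without computing n! itself: lgamma(n+1) = ln(n!).
def log2_factorial(n):
    return lgamma(n + 1) / log(2)

n = 1_000_000
# ln(n!) >= n ln n - n, hence log2(n!) >= n log2 n - n log2 e
assert log2_factorial(n) >= n * log2(n) - n * log2(e)
```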

Page 46: Sorting

49

Lower Bound for Sorting

• Any sorting algorithm based on comparisons between elements requires Ω(N log N) comparisons.

Page 47: Sorting

50

Beating the lower bound

• We can beat the lower bound if we deduce order relations between keys by operations other than comparisons

Examples:
• Counting sort
• Radix sort
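A minimal counting-sort sketch (assuming nonnegative integer keys below a known bound k): it indexes by key value instead of comparing keys, so the Ω(n log n) comparison bound does not apply, and it runs in O(n + k) time.

```python
# Counting sort for integers in [0, k): no key comparisons at all.
def counting_sort(a, k):
    count = [0] * k
    for x in a:                 # tally each key value
        count[x] += 1
    out = []
    for v in range(k):          # emit each value count[v] times
        out.extend([v] * count[v])
    return out
```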

Page 48: Sorting

51

Linear time sorting

• Or assume something about the input: random, “almost sorted”

Page 49: Sorting

52

Sorting an almost sorted input

• Suppose we know that the input is “almost” sorted

• Let I be the number of “inversions” in the input: The number of pairs ai,aj such that i<j and ai>aj

Page 50: Sorting

53

Example

1, 4, 5, 8, 3   I = 3

8, 7, 5, 3, 1   I = 10
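The examples can be checked against the definition directly:

```python
# Brute-force inversion count: pairs i < j with a[i] > a[j].
# O(n^2), fine for verifying small examples.
def inversions(a):
    n = len(a)
    return sum(1 for i in range(n)
                 for j in range(i + 1, n) if a[i] > a[j])

assert inversions([1, 4, 5, 8, 3]) == 3
assert inversions([8, 7, 5, 3, 1]) == 10
```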

Page 51: Sorting

54

• Think of “insertion sort” using a list

• When we insert the next item ak, how deep does it get into the list?

• As deep as the number of inversions ai, ak with i < k; let's call this number Ik
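The depths Iₖ can be read off with a small simulation (a sketch, not from the slides; Python's list.insert costs O(n) per step, so this only illustrates the quantities Iₖ, not the finger-tree running time):

```python
import bisect

# Insertion sort that records, for each item, how deep it goes when
# inserted from the END of the sorted prefix: exactly I_k, the number
# of earlier items larger than it.
def insertion_sort_with_depths(a):
    out, depths = [], []
    for x in a:
        pos = bisect.bisect_right(out, x)
        depths.append(len(out) - pos)   # I_k = # earlier items > x
        out.insert(pos, x)
    return out, depths
```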

Page 52: Sorting

55

Analysis

The running time is:

$\sum_{j=1}^{n} (I_j + 1) = I + n$

Page 53: Sorting

56

Thoughts

• When I=Ω(n2) the running time is Ω(n2)

• But we would like it to be O(n log n) for any input, and faster when I is small

Page 54: Sorting

57

Finger red black trees

Page 55: Sorting

58

Finger tree

Take a regular search tree and reverse the direction of the pointers on the rightmost spine

We go up from the last leaf until we find the subtree containing the item, and then we descend into it

Page 56: Sorting

59

Finger trees

Say we search for a position at distance d from the end

Then we go up to height O(log d), so searching for the dth position from the end takes O(log d) time

Insertions and deletions still take O(log n) worst-case time, but O(log d) amortized time

Page 57: Sorting

60

Back to sorting

• Suppose we implement the insertion sort using a finger search tree

• When we insert item k then d = O(Ik), and it takes O(log Ik) time

Page 58: Sorting

61

Analysis

The running time is:

$O\left(\sum_{j=1}^{n} \log(I_j) + n\right)$

Since $\sum_j I_j = I$, this is at most

$O\left(n \log\frac{I}{n} + n\right)$

Page 59: Sorting

62

Selection

Find the kth element

Page 60: Sorting

63

Randomized selection

Randomized-select(A, p, r, k)
  if p = r then return A[p]
  q ← Randomized-partition(A, p, r)
  j ← q - p + 1
  if j = k then return A[q]
  else if k < j then return Randomized-select(A, p, q-1, k)
  else return Randomized-select(A, q+1, r, k-j)
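A direct Python transcription, with randomized-partition (from the quicksort slides) inlined so the sketch is self-contained; k is 1-based:

```python
import random

def randomized_partition(A, p, r):
    i = random.randint(p, r)          # random(p, r), endpoints inclusive
    A[r], A[i] = A[i], A[r]
    x, s = A[r], p - 1                # Lomuto partition around x = A[r]
    for j in range(p, r):
        if A[j] <= x:
            s += 1
            A[s], A[j] = A[j], A[s]
    A[s + 1], A[r] = A[r], A[s + 1]
    return s + 1

def randomized_select(A, p, r, k):
    if p == r:
        return A[p]
    q = randomized_partition(A, p, r)
    j = q - p + 1                     # rank of the pivot within A[p..r]
    if k == j:
        return A[q]
    if k < j:
        return randomized_select(A, p, q - 1, k)
    return randomized_select(A, q + 1, r, k - j)
```

Note that the recursion keeps only one of the two sides, which is why the expected time is linear rather than O(n log n).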

Page 61: Sorting

65

Expected running time

With probability 1/n, A[p..q] contains exactly k elements, for each k = 1, 2, …, n

$E(T(n)) \le \frac{1}{n}\sum_{k=1}^{n} E(T(\max(k-1,\, n-k))) + O(n)$

Page 62: Sorting

66

Assume n is even

$E(T(n)) \le \frac{1}{n}\sum_{k=1}^{n} E(T(\max(k-1,\, n-k))) + O(n)$

$= O(n) + \frac{2}{n}\left( E\!\left(T\!\left(\tfrac{n}{2}\right)\right) + E\!\left(T\!\left(\tfrac{n}{2}+1\right)\right) + \cdots + E(T(n-1)) \right)$

Page 63: Sorting

67

In general

$E(T(n)) \le \frac{1}{n}\sum_{k=1}^{n} E(T(\max(k-1,\, n-k))) + O(n)$

$\le O(n) + \frac{2}{n}\sum_{k=n/2}^{n-1} E(T(k))$

Page 64: Sorting

68

Solve by “substitution”

$E(T(n)) \le O(n) + \frac{2}{n}\sum_{k=n/2}^{n-1} E(T(k))$

Assume T(k) ≤ ck for k < n, and prove T(n) ≤ cn

$E(T(n)) \le an + \frac{2}{n}\sum_{k=n/2}^{n-1} ck$

Page 65: Sorting

69

Solve by “substitution”

$E(T(n)) \le an + \frac{2}{n}\sum_{k=n/2}^{n-1} ck$

$= an + \frac{2c}{n}\left(\sum_{k=1}^{n-1} k \;-\; \sum_{k=1}^{n/2-1} k\right)$

$= an + \frac{2c}{n}\left(\frac{(n-1)n}{2} - \frac{(n/2-1)(n/2)}{2}\right)$

Page 66: Sorting

70

$E(T(n)) \le an + \frac{2c}{n}\left(\frac{(n-1)n}{2} - \frac{(n/2-1)(n/2)}{2}\right)$

$= an + c(n-1) - \frac{c}{2}\left(\frac{n}{2}-1\right)$

$\le an + \frac{3}{4}cn$

$= cn - \left(\frac{cn}{4} - an\right)$

$\le cn$   Choose c ≥ 4a

Page 67: Sorting

71

Selection in linear worst case time

Blum, Floyd, Pratt, Rivest, and Tarjan (1973)

Page 68: Sorting

72

5-tuples

[Figure: one group of 5 elements drawn as a column: 6 2 9 5 1]

Page 69: Sorting

73

Sort the tuples

[Figure: the column sorted: 9 6 5 2 1]

Page 70: Sorting

74

Recursively find the median of the medians

[Figure: all columns sorted; the middle row holds the group medians]

Page 71: Sorting

75

Recursively find the median of the medians

[Figure: sorted columns; the row of medians is 7 10 1 3 2 11]

Page 72: Sorting

76

Recursively find the median of the medians

9

6

5

2

1

7 10 1 3 2 11

Page 73: Sorting

78

Partition around the median of the medians

[Figure: the array partitioned around the median of the medians, 5 in the running example]

Continue recursively with the side that contains the kth element
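The whole scheme can be sketched compactly (using extra lists for clarity, where the slides partition in place; k is 1-based):

```python
# Blum-Floyd-Pratt-Rivest-Tarjan selection: sort groups of 5, take the
# median of the group medians as pivot, recurse on one side only.
def select(a, k):
    if len(a) <= 5:
        return sorted(a)[k - 1]
    # sort the 5-tuples and collect their medians
    groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
    medians = [g[len(g) // 2] for g in groups]
    # recursively find the median of the medians
    pivot = select(medians, (len(medians) + 1) // 2)
    # partition around it and continue with the side holding the kth element
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    if k <= len(lo):
        return select(lo, k)
    if k > len(a) - len(hi):
        return select(hi, k - (len(a) - len(hi)))
    return pivot                       # the pivot itself is the answer
```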

Page 74: Sorting

79

Neither side can be large

[Figure: the array partitioned around the median of the medians; each side contains ≤ ¾n elements]

Page 75: Sorting

80

The reason

[Figure: the groups drawn as sorted columns with the medians row 7 10 1 3 2 11; in half of the columns, three of the five elements are ≤ the median of medians, so at least about a quarter of the input falls on that side]

Page 76: Sorting

81

The reason

[Figure: the same diagram, highlighting the symmetric set of elements ≥ the median of medians; hence neither side of the partition exceeds ¾n]

Page 77: Sorting

82

Analysis

$T(n) \le O(n) + T\!\left(\frac{3}{4}n\right) + T\!\left(\frac{1}{5}n\right)$

$\Rightarrow T(n) = O(n)$

Page 78: Sorting

83

Order statistics, a dynamic version

rank and select

Page 79: Sorting

84

The dictionary ADT

• Insert(x, D)
• Delete(x, D)
• Find(x, D): returns a pointer to x if x ∊ D, and a pointer to the successor or predecessor of x if x is not in D

Page 80: Sorting

85

Suppose we want to add to the dictionary ADT

• Select(k, D): returns the kth element in the dictionary: an element x such that k-1 elements are smaller than x

Page 81: Sorting

86

Select(5,D)

[Red-black tree whose leaves, left to right, hold 4 19 20 21 26 34 67 70 73 77 89 90]

Page 82: Sorting

87

Select(5,D)

[The same tree, with the 5th leaf, 26, highlighted]

Page 83: Sorting

88

[The same red-black tree]

Can we still use a red-black tree?

Page 84: Sorting

89

For each node v store # of leaves in the subtree of v

[The tree annotated with subtree sizes: 22 at the root, then internal nodes of sizes 12, 8, 4, 4, 4, and lowest internal nodes of size 2]

Page 85: Sorting

90

Select(7,T)

[The size-annotated tree; Select(7,T) starts at the root, of size 22]

Page 86: Sorting

91

Select(7,T)

[The size-annotated tree; after skipping a left subtree of size 4, the recursion continues as Select(3, ·)]

Page 87: Sorting

92

Select(7,T)

[The size-annotated tree; the recursion continues as Select(3, ·) in the chosen subtree]

Page 88: Sorting

93

Select(7,T)

[The size-annotated tree; the recursion reaches Select(1, ·) at the bottom, arriving at the 7th leaf]

Page 89: Sorting

94

Select(i,T)

Select(i, T): Select(i, root(T))

Select(k, v):
  if k = 1 then return v.left
  if k = 2 then return v.right
  if k ≤ (v.left).size
    then return Select(k, v.left)
    else return Select(k − (v.left).size, v.right)

O(log n) worst case time
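The same idea, sketched on an ordinary BST augmented with subtree sizes (the slides keep the keys at the leaves, hence their k = 1 / k = 2 base cases; storing keys at internal nodes gives the shorter code below):

```python
# A size-augmented BST and Select on it; select(k, v) returns the kth
# smallest key (1-based) in the subtree rooted at v.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def build(keys):                 # balanced tree over already-sorted keys
    if not keys:
        return None
    m = len(keys) // 2
    return Node(keys[m], build(keys[:m]), build(keys[m + 1:]))

def select(k, v):
    while True:
        left_size = v.left.size if v.left else 0
        if k <= left_size:
            v = v.left           # the kth key is in the left subtree
        elif k == left_size + 1:
            return v.key         # v itself is the kth key
        else:
            k -= left_size + 1   # skip the left subtree and v
            v = v.right
```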

Page 90: Sorting

95

Rank(x,T)

• Return the index of x in T

Page 91: Sorting

96

Rank(x,T)

[Figure: element x marked in the tree; Rank needs to return 9]

Page 92: Sorting

97

[The size-annotated tree, with the search path to x marked]

Sum up the sizes of the subtrees to the left of the path

Page 93: Sorting

98

Rank(x,T)

• Write the p-code
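One possible answer to the exercise, sketched on an ordinary size-augmented BST (the slides' tree keeps keys at the leaves, but the idea, summing the sizes of the subtrees to the left of the search path, is the same):

```python
# rank(x, v) returns the 1-based index of key x among the keys in the
# subtree rooted at v, accumulating left-subtree sizes along the path.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def build(keys):                 # balanced tree over already-sorted keys
    if not keys:
        return None
    m = len(keys) // 2
    return Node(keys[m], build(keys[:m]), build(keys[m + 1:]))

def rank(x, v):
    r = 0
    while v is not None:
        left_size = v.left.size if v.left else 0
        if x < v.key:
            v = v.left           # nothing falls to the left of the path here
        elif x > v.key:
            r += left_size + 1   # the left subtree and v itself precede x
            v = v.right
        else:
            return r + left_size + 1
    raise KeyError(x)            # x is not in the tree
```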

Page 94: Sorting

99

Insertion and deletions

• Consider insertion, deletion is similar

Page 95: Sorting

100

Insert

[Figure: subtree sizes 12, 8, 4, 2 along the path to the insertion point]

Page 96: Sorting

101

Insert (cont)

[Figure: after the insertion the sizes along the path become 13, 9, 5, 3, and the new internal node has size 2]

Page 97: Sorting

102

Easy to maintain through rotations

[Figure: a rotation between node x and its parent y, with subtrees A, B, C; the sizes are recomputed bottom-up]

size(x) ← size(B) + size(C)

size(y) ← size(A) + size(x)

Page 98: Sorting

103

Summary

• Insertion, deletion, and the other dictionary operations still take O(log n) time