Sorting
1
Sorting
• We have already seen two efficient ways to sort:
2
A kind of “insertion” sort
• Insert the elements into a red-black tree one by one
• Traverse the tree in-order and collect the keys
• Takes O(n log n) time
3
Heapsort (Williams, Floyd, 1964)
• Put the elements in an array
• Make the array into a heap
• Do a deletemin and put the deleted element at the last position of the array
4
Put the elements in the heap
Q = 79 65 26 24 19 15 29 23 33 40 7
[Figure: Q viewed as a binary tree, Q[0] at the root.]
5
Make the elements into a heap
[Figure: the same array; we call Heapify-down on every internal node, from the last one up to the root.]
6
Heapify-down(Q,4)
7
After Heapify-down(Q,4): Q = 79 65 26 24 7 15 29 23 33 40 19
8
Heapify-down(Q,3)
9
After Heapify-down(Q,3): Q = 79 65 26 23 7 15 29 24 33 40 19
10
Heapify-down(Q,2)
11
After Heapify-down(Q,2): Q = 79 65 15 23 7 26 29 24 33 40 19
12
Heapify-down(Q,1)
13
After the first swap: Q = 79 7 15 23 65 26 29 24 33 40 19
14
After Heapify-down(Q,1): Q = 79 7 15 23 19 26 29 24 33 40 65
15
Heapify-down(Q,0)
16
After the first swap: Q = 7 79 15 23 19 26 29 24 33 40 65
17
Then: Q = 7 19 15 23 79 26 29 24 33 40 65
18
After Heapify-down(Q,0): Q = 7 19 15 23 40 26 29 24 33 79 65 — the array is now a min-heap
20
Summary
• We can build the heap in linear time (we already did this analysis)
• We still have to deletemin the elements one by one in order to sort; that takes O(n log n)
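The steps above can be sketched as runnable code. This is a minimal min-heap version matching the slides (the function names are mine, not the slides'), so the sorted output comes out in descending order; reverse it, or use a max-heap, for ascending order.

```python
# In-place heapsort sketch: build a min-heap with heapify-down in O(n),
# then do n deletemins, each one placing the deleted minimum at the
# last position of the shrinking array.

def heapify_down(Q, i, size):
    """Sift Q[i] down until the min-heap property holds within Q[0:size]."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < size and Q[left] < Q[smallest]:
            smallest = left
        if right < size and Q[right] < Q[smallest]:
            smallest = right
        if smallest == i:
            return
        Q[i], Q[smallest] = Q[smallest], Q[i]
        i = smallest

def heapsort(Q):
    n = len(Q)
    # Build the heap in linear time: heapify-down every internal node.
    for i in range(n // 2 - 1, -1, -1):
        heapify_down(Q, i, n)
    # n deletemins, O(log n) each: swap the min into the last slot.
    for size in range(n - 1, 0, -1):
        Q[0], Q[size] = Q[size], Q[0]
        heapify_down(Q, 0, size)
    return Q  # descending order

print(heapsort([79, 65, 26, 24, 19, 15, 29, 23, 33, 40, 7]))
```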
21
Quicksort (Hoare 1961)
22
quicksort
Input: an array A[p..r]
Quicksort(A, p, r)
  if p < r
    then q ← Partition(A, p, r)   // q is the position of the pivot element
         Quicksort(A, p, q−1)
         Quicksort(A, q+1, r)
23
Partition trace on A[p..r] = 2 8 7 1 3 5 6 4, pivot x = A[r] = 4; as i and j sweep the array, elements ≤ 4 are swapped into the prefix:
2 8 7 1 3 5 6 4
2 1 7 8 3 5 6 4
2 1 3 8 7 5 6 4
2 1 3 4 7 5 6 8    (the final exchange puts the pivot at its position q)
25
Example input: A[p..r] = 2 8 7 1 3 5 6 4
Partition(A, p, r)
  x ← A[r]
  i ← p−1
  for j ← p to r−1
    do if A[j] ≤ x
       then i ← i+1
            exchange A[i] ↔ A[j]
  exchange A[i+1] ↔ A[r]
  return i+1
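The pseudocode above runs directly once translated; here is a sketch in Python, with 0-based indices instead of the slides' 1-based ones.

```python
# Quicksort with the Lomuto-style Partition from the slides: the pivot is
# A[r]; i marks the end of the "<= pivot" prefix as j scans the array.

def partition(A, p, r):
    x = A[r]                      # the pivot
    i = p - 1
    for j in range(p, r):         # j = p, ..., r-1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1                  # the pivot's final position q

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

A = [2, 8, 7, 1, 3, 5, 6, 4]      # the slides' example input
quicksort(A, 0, len(A) - 1)
print(A)  # → [1, 2, 3, 4, 5, 6, 7, 8]
```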
26
Analysis
• Running time is proportional to the number of comparisons
• Each pair is compared at most once, so the running time is O(n²)
• In fact, for each n there is an input of size n on which quicksort makes cn² comparisons, so the worst case is Ω(n²)
27
But
• Assume that the split is even in each iteration
28
T(n) = 2T(n/2) + bn
How do we solve linear recurrences like this ? (read Chapter 4)
29
Recurrence tree
[Figure: the root costs bn and has two children T(n/2).]
30
Recurrence tree
[Figure: one more level expanded: bn at the root, bn/2 at each of the two children, four grandchildren T(n/4).]
31
Recurrence tree
[Figure: the full tree has depth log n; every level again costs bn/2 + bn/2 = bn.]
In every level we do bn comparisons, so the total number of comparisons is O(n log n)
34
Observations
• We can’t guarantee good splits
• But intuitively on random inputs we will get good splits
35
Randomized quicksort
• Use Randomized-partition rather than Partition
Randomized-partition(A, p, r)
  i ← random(p, r)
  exchange A[r] ↔ A[i]
  return Partition(A, p, r)
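A minimal runnable sketch of randomized quicksort, assuming the Lomuto partition from the earlier slides (repeated here so the snippet is self-contained; the names are mine):

```python
import random

# Randomized-partition as on the slide: move a uniformly random element
# into A[r], then run the deterministic Partition on it.

def partition(A, p, r):
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    i = random.randint(p, r)      # random(p, r): uniform over [p, r]
    A[r], A[i] = A[i], A[r]
    return partition(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```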
36
• On the same input we will get a different running time in each run!
• Look at the average, over all these runs, of the running time on one particular input
37
Expected # of comparisons
Let X be the # of comparisons
This is a random variable
We want to know E(X)
38
Expected # of comparisons
Let z1, z2, ....., zn be the elements in sorted order
Let Xij = 1 if zi is compared to zj, and 0 otherwise
So,
X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Xij
39
E(X) = E( Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Xij ) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E(Xij)
by linearity of expectation
= Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Pr{zi is compared to zj}
40
E(X) = E( Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Xij ) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E(Xij)
by linearity of expectation
= Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 1 · Pr{zi is compared to zj}
since Xij is a 0/1 indicator
41
Consider Zij ≡ zi, zi+1, ......., zj
Claim: zi and zj are compared ⟺ either zi or zj is the first pivot chosen from Zij
Proof, 3 cases for the first pivot chosen:
– zi (or zj) is chosen first from Zij: zi and zj are compared on this partition, and never again.
– the pivot is chosen outside {zi, …, zj}: all of Zij goes to the same side, and the situation stays the same.
– some zk with i < k < j is chosen first: zi and zj are not compared on this partition; the partition separates them, so no future partition involves both.
42
Pr{zi is compared to zj}
= Pr{zi or zj is the first pivot chosen from Zij}    (just explained)
= Pr{zi is first pivot chosen from Zij} + Pr{zj is first pivot chosen from Zij}    (mutually exclusive possibilities)
= 1/(j−i+1) + 1/(j−i+1) = 2/(j−i+1)
43
E(X) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j−i+1)
= Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 2/k    (simplify with a change of variable, k = j−i+1)
≤ Σ_{i=1}^{n−1} Σ_{k=1}^{n} 2/k    (simplify and overestimate, by adding terms)
= Σ_{i=1}^{n−1} O(lg n)
= O(n lg n)
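The bound just derived can be checked numerically. This small script (mine, not from the slides) computes the exact expected number of comparisons Σ 2/(j−i+1) over all pairs i < j and compares it against a 2n ln n overestimate:

```python
from math import log

# Exact expected number of comparisons of randomized quicksort,
# summed straight from the per-pair probabilities 2/(j-i+1).

def expected_comparisons(n):
    return sum(2.0 / (j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

for n in (10, 100, 1000):
    # The exact value stays below the 2n ln n overestimate.
    print(n, expected_comparisons(n), 2 * n * log(n))
```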
44
Lower bound for sorting in the comparison model
45
A lower bound
• Comparison model: we assume that the only operations from which we deduce order among keys are comparisons
• Then we prove that we need Ω(n log n) comparisons in the worst case
46
Model the algorithm as a decision tree
[Figure: the decision tree for sorting three elements: each internal node compares two of them, and each of the 3! = 6 leaves is one ordering of 1, 2, 3.]
47
Important Observations
• Every algorithm can be represented as a (binary) tree like this
• Each path corresponds to a run on some input
• The worst case # of comparisons corresponds to the longest path
48
The lower bound
Let d be the length of the longest path
n! ≤ #leaves ≤ 2^d
so log2(n!) ≤ d
49
Lower Bound for Sorting
• Any sorting algorithm based on comparisons between elements requires Ω(N log N) comparisons.
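A quick numeric illustration (not from the slides) that log2(n!), the depth forced by the counting argument, indeed grows like n log2 n:

```python
import math

# Compare log2(n!) against n*log2(n) for a few sizes: the two grow
# at the same rate up to lower-order terms, which is the Omega(n log n)
# lower bound on the decision-tree depth.

for n in (8, 64, 1024):
    lg_fact = math.log2(math.factorial(n))
    print(n, round(lg_fact), round(n * math.log2(n)))
```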
50
Beating the lower bound
• We can beat the lower bound if we can deduce order relations between keys not only by comparisons
Examples:
• Count sort
• Radix sort
51
Linear time sorting
• Or assume something about the input: random, “almost sorted”
52
Sorting an almost sorted input
• Suppose we know that the input is “almost” sorted
• Let I be the number of “inversions” in the input: The number of pairs ai,aj such that i<j and ai>aj
53
Example
1, 4, 5, 8, 3    I = 3
8, 7, 5, 3, 1    I = 10
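Counting inversions straight from the definition is a two-line check; the helper below is mine, not from the slides:

```python
# Count inversions by the definition: pairs i < j with a[i] > a[j].
# Quadratic, but fine as a checker for small examples.

def inversions(a):
    return sum(1
               for i in range(len(a))
               for j in range(i + 1, len(a))
               if a[i] > a[j])

print(inversions([1, 4, 5, 8, 3]))   # → 3, as in the example
print(inversions([8, 7, 5, 3, 1]))   # → 10
```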
54
• Think of “insertion sort” using a list
• When we insert the next item ak, how deep does it get into the list?
• Exactly as deep as the number of inversions ai, ak with i < k; let's call this Ik
55
Analysis
The running time is:
Σ_{j=1}^{n} (Ij + 1) = I + n
56
Thoughts
• When I = Ω(n²) the running time is Ω(n²)
• But we would like it to be O(n log n) for any input, and faster when I is small
57
Finger red black trees
58
Finger tree
Take a regular search tree and reverse the direction of the pointers on the rightmost spine
We go up from the last leaf until we find the subtree containing the item, and then we descend into it
59
Finger trees
Say we search for a position at distance d from the end
Then we go up to height O(log(d)), so the search for the dth position takes O(log(d)) time
Insertions and deletions still take O(log n) worst case time, but only O(log(d)) amortized time
60
Back to sorting
• Suppose we implement the insertion sort using a finger search tree
• When we insert item k, then d = O(Ik) and the insertion takes O(log(Ik)) time
61
Analysis
The running time is:
Σ_{j=1}^{n} O(log(Ij) + 1)
Since Σ_j Ij = I and log is concave, this is at most
O(n log(I/n) + n)
62
Selection
Find the kth element
63
Randomized selection
Randomized-select(A, p, r, k)
  if p = r then return A[p]
  q ← randomized-partition(A, p, r)
  j ← q − p + 1
  if j = k then return A[q]
  else if k < j then return Randomized-select(A, p, q−1, k)
  else return Randomized-select(A, q+1, r, k−j)
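The pseudocode can be sketched as runnable Python; k is 1-based (k = 1 returns the minimum), and the randomized Lomuto partition is inlined so the example is self-contained (names are mine):

```python
import random

# Randomized-select: partition around a random pivot, then recurse only
# into the side that contains the k-th smallest element.

def randomized_select(A, p, r, k):
    if p == r:
        return A[p]
    s = random.randint(p, r)          # random pivot, moved into A[r]
    A[r], A[s] = A[s], A[r]
    x, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    q = i + 1                         # the pivot's position
    j = q - p + 1                     # number of elements in A[p..q]
    if k == j:
        return A[q]
    if k < j:
        return randomized_select(A, p, q - 1, k)
    return randomized_select(A, q + 1, r, k - j)

A = [3, 1, 4, 1, 5, 9, 2, 6]
print(randomized_select(A, 0, len(A) - 1, 3))  # → 2 (3rd smallest)
```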
65
Expected running time
With probability 1/n, A[p..q] contains exactly k elements, for k = 1, 2, …, n
E(T(n)) ≤ O(n) + (1/n) Σ_{k=1}^{n} E(T(max(k−1, n−k)))
66
Assume n is even
E(T(n)) ≤ O(n) + (1/n) Σ_{k=1}^{n} E(T(max(k−1, n−k)))
Each of E(T(n/2)), E(T(n/2+1)), ……, E(T(n−1)) appears exactly twice in the sum, so
E(T(n)) ≤ O(n) + (2/n) [ E(T(n/2)) + E(T(n/2+1)) + …… + E(T(n−1)) ]
67
In general
E(T(n)) ≤ O(n) + (1/n) Σ_{k=1}^{n} E(T(max(k−1, n−k)))
≤ O(n) + (2/n) Σ_{k=⌊n/2⌋}^{n−1} E(T(k))
68
Solve by “substitution”
E(T(n)) ≤ O(n) + (2/n) Σ_{k=⌊n/2⌋}^{n−1} E(T(k))
Assume T(k) ≤ ck for k < n, and prove T(n) ≤ cn:
E(T(n)) ≤ an + (2/n) Σ_{k=⌊n/2⌋}^{n−1} ck
69
Solve by “substitution”
E(T(n)) ≤ an + (2c/n) Σ_{k=n/2}^{n−1} k
= an + (2c/n) [ Σ_{k=1}^{n−1} k − Σ_{k=1}^{n/2−1} k ]
= an + (2c/n) [ n(n−1)/2 − (n/2−1)(n/2)/2 ]
70
E(T(n)) ≤ an + (2c/n) [ n(n−1)/2 − (n/2−1)(n/2)/2 ]
= an + (c/n) [ n(n−1) − (n/2−1)(n/2) ]
= an + (c/n) [ 3n²/4 − n/2 ]
≤ an + 3cn/4
= cn − (cn/4 − an)
≤ cn    choose c ≥ 4a
71
Selection in linear worst case time
Blum, Floyd, Pratt, Rivest, and Tarjan (1973)
72
5-tuples
[Figure: the input array divided into columns of 5 elements, e.g. the column 6, 2, 9, 5, 1.]
73
Sort the tuples
[Figure: each 5-tuple sorted, e.g. the column becomes 9, 6, 5, 2, 1.]
74
Recursively find the median of the medians
[Figure: the middle row, e.g. 7, 10, 1, 3, 2, 11, contains the medians of the columns; the median of the medians is found by recursing on this row.]
78
Partition around the median of the medians
[Figure: the array partitioned around the median-of-medians pivot.]
Continue recursively with the side that contains the kth element
79
Neither side can be large
[Figure: each side of the pivot contains at most ¾n elements.]
80
The reason
[Figure: half of the 5-tuples have a median ≥ the median of the medians, and each of those tuples contributes at least 3 elements ≥ it.]
81
The reason
[Figure: symmetrically, half of the tuples contribute at least 3 elements ≤ the median of the medians; hence neither side of the partition can contain more than ¾n elements.]
82
Analysis
T(n) ≤ O(n) + T(¾n) + T(n/5)
⟹ T(n) = O(n)
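A compact sketch of the scheme (my names and list-based partition; the slides give only the outline of 5-tuples, median of medians, and recursing into one side):

```python
# Blum-Floyd-Pratt-Rivest-Tarjan linear-time selection.

def select(a, k):
    """Return the k-th smallest element of a (k is 1-based)."""
    if len(a) <= 5:
        return sorted(a)[k - 1]
    # Medians of the 5-tuples.
    medians = [sorted(t)[len(t) // 2]
               for t in (a[i:i + 5] for i in range(0, len(a), 5))]
    # Recursively find the median of the medians.
    pivot = select(medians, (len(medians) + 1) // 2)
    # Partition around it (three-way, to handle duplicates safely).
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    if k <= len(less):
        return select(less, k)
    if k <= len(less) + len(equal):
        return pivot
    return select([x for x in a if x > pivot], k - len(less) - len(equal))

print(select(list(range(100, 0, -1)), 10))  # → 10
```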
83
Order statistics, a dynamic version
rank and select
84
The dictionary ADT
• Insert(x,D)
• Delete(x,D)
• Find(x,D): Returns a pointer to x if x ∊ D, and a pointer to the successor or predecessor of x if x is not in D
85
Suppose we want to add to the dictionary ADT
• Select(k,D): Returns the kth element in the dictionary: an element x such that k−1 elements are smaller than x
86
Select(5,D)
[Figure: a search tree over the keys 4, 19, 20, 21, 26, 34, 67, 70, 73, 77, 89, 90; Select(5,D) should return the 5th smallest key.]
Can we still use a red-black tree ?
89
For each node v store # of leaves in the subtree of v
[Figure: the same tree, with every node labeled by the number of leaves in its subtree (2, 4, 8, 12, 22, ...).]
90
Select(7,T)
[Figure: Select(7,T) walks down the size-augmented tree. The left subtree has 4 leaves, so the search continues with Select(3, ·) in the right child, then Select(1, ·), until it reaches the 7th leaf.]
94
Select(i,T)
Select(i,T): Select(i, root(T))
Select(k,v):
  if k = 1 then return v.left
  if k = 2 then return v.right
  if k ≤ (v.left).size
    then return Select(k, v.left)
    else return Select(k − (v.left).size, v.right)
O(log n) worst case time
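A minimal sketch of the idea, assuming keys at the leaves and no balancing (the red-black machinery is omitted; names are mine): every node stores the number of leaves below it, and Select descends by comparing k with the left subtree's size.

```python
# Size-augmented tree with the elements at the leaves, as in the slides.

class Node:
    def __init__(self, left=None, right=None, key=None):
        self.left, self.right, self.key = left, right, key
        # A leaf counts 1; an internal node sums its children's sizes.
        self.size = 1 if key is not None else left.size + right.size

def build(keys):
    """Build a balanced tree over the sorted keys."""
    if len(keys) == 1:
        return Node(key=keys[0])
    mid = len(keys) // 2
    return Node(build(keys[:mid]), build(keys[mid:]))

def select(v, k):
    if v.key is not None:             # reached the k-th leaf
        return v.key
    if k <= v.left.size:
        return select(v.left, k)
    return select(v.right, k - v.left.size)

T = build([4, 19, 20, 21, 26, 34, 67, 70, 73, 77, 89, 90])
print(select(T, 7))  # → 67
```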
95
Rank(x,T)
• Return the index of x in T
96
Rank(x,T)
[Figure: element x sits at the 9th position, so Rank(x,T) must return 9.]
97
[Figure: the size-augmented tree, with the path from the root to x highlighted.]
Sum up the sizes of the subtrees to the left of the path
98
Rank(x,T)
• Write the p-code
99
Insertion and deletions
• Consider insertion, deletion is similar
100
Insert
[Figure: the subtree sizes 4, 2, 8, 12 along the search path before the insertion.]
101
Insert (cont)
[Figure: after the insertion, every size on the search path increases by one: 5, 3, 9, 13, 2.]
102
Easy to maintain through rotations
[Figure: a rotation between a node x and its child y, exchanging the subtrees A, B, C. Only the sizes of x and y change:]
size(x) ← size(B) + size(C)
size(y) ← size(A) + size(x)
103
Summary
• Insertions, deletions, and the other dictionary operations still take O(log n) time