Randomized Algorithms for Selection and Sorting Prepared by John Reif, Ph.D. Analysis of Algorithms.
Randomized Algorithms for Selection and Sorting
Prepared by
John Reif, Ph.D.
Analysis of Algorithms
Randomized Algorithms for Selection and Sorting
a) Randomized Sampling
b) Selection by Randomized Sampling
c) Sorting by Randomized Splitting: Quicksort and Multisample Sorts
Readings
• Main Reading Selections:
• CLR, Chapter 9
Comparison Problems
• Input: set X of N distinct keys, with a total ordering < over X
• Problems:
1) For each key x ∈ X,
rank(x,X) = |{x' ∈ X | x' < x}| + 1
2) For each index i ∈ {1, …, N},
select(i,X) = the key x ∈ X where i = rank(x,X)
3) Sort(X) = (x1, x2, …, xN), where xi = select(i,X)
Randomized Comparison Tree Model
Algorithm Samplerank_s(x,X)
begin
  Let S be a random sample of X − {x} of size s
  output 1 + (N/s)·[rank(x,S) − 1]
end
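The sampling estimator above can be sketched in Python. This is a minimal illustration, not the slides' code; the function name and the assumption of distinct keys are ours.

```python
import random

def samplerank(x, X, s):
    """Estimate rank(x, X) from a random sample of X - {x} of size s.

    A sketch of the slides' Samplerank_s; keys assumed distinct.
    """
    N = len(X)
    S = random.sample([y for y in X if y != x], s)
    rank_in_S = sum(1 for y in S if y < x) + 1   # rank(x, S)
    return 1 + (N / s) * (rank_in_S - 1)
```

Averaged over many runs the estimate concentrates near the true rank, which is what Lemma 1 below formalizes.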
Algorithm Samplerank_s(x,X) (cont'd)
• Lemma 1
– The expected value of samplerank_s(x,X) is rank(x,X)
Algorithm Samplerank_s(x,X) (cont'd)
Proof
Let k = rank(x,X), and let S be a random sample of X of size s.
For a random y ∈ X,
Prob(y < x) = (k−1)/N
Hence E[rank(x,S)] = 1 + s·(k−1)/N
Solving for k, we get
rank(x,X) = k = 1 + (N/s)·E[rank(x,S) − 1] = E[samplerank_s(x,X)]
More Precise Bounds on Randomized Sampling
• Let S be a random sample of X of size s
• Let r_i = rank(select(i,S), X)
• Lemma 2
Prob( |r_i − i·N/(s+1)| > c·sqrt(α·log N)·N/sqrt(s) ) < N^(−α)
More Precise Bounds on Randomized Sampling (cont'd)
• Proof
We can bound r_i by a Beta distribution, implying
mean(r_i) = i·N/(s+1)
Var(r_i) = i·(s−i+1)·N^2 / ((s+1)^2·(s+2))
• Weak bounds follow from the Chebyshev inequality
• The tighter bounds follow from Chernoff bounds
Subdivision by Random Sampling
• Let S be a random sample of X of size s
• Let k1, k2, …, ks be the elements of S in sorted order
• These elements subdivide X into s+1 subsets:
X_1 = {x ∈ X | x ≤ k_1}
X_2 = {x ∈ X | k_1 < x ≤ k_2}
X_3 = {x ∈ X | k_2 < x ≤ k_3}
…
X_(s+1) = {x ∈ X | x > k_s}
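The subdivision step can be sketched as follows (our own helper, assuming distinct keys; the bucket boundaries follow the definitions above):

```python
import bisect
import random

def subdivide(X, s):
    """Split X into s+1 buckets around a sorted random sample of size s,
    a sketch of the slides' subdivision (keys assumed distinct)."""
    ks = sorted(random.sample(X, s))       # k_1 < k_2 < ... < k_s
    buckets = [[] for _ in range(s + 1)]
    for x in X:
        # bisect_left counts sample keys < x, which is the bucket index:
        # x <= k_1 -> bucket 0, k_j < x <= k_(j+1) -> bucket j, x > k_s -> bucket s
        buckets[bisect.bisect_left(ks, x)].append(x)
    return ks, buckets
```

Lemma 3 below says these buckets are all small with high probability.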
Subdivision by Random Sampling (cont’d)
• How even are these subdivisions?
Subdivision by Random Sampling (cont’d)
• Lemma 3
If a random sample S ⊆ X is of size s and X is of size N,
then S divides X into subsets each of size at most α·(N/s)·ln N,
with probability at least 1 − (N−1)·N^(−α)
Subdivision by Random Sampling (cont’d)
Proof
• The number of (s+1)-partitions of X is
C(N−1, s) ~ (N−1)^s / s!
• The number of partitions of X with one block of size v is
C(N−v−1, s) ~ (N−v−1)^s / s!
• So the probability of a random (s+1)-partition having a block of size v is
~ ( (N−v−1)/(N−1) )^s ~ ( 1 − v/(N−1) )^s
~ e^(−sv/N) = N^(−α) if v = α·(N/s)·ln N
since (1 − 1/Y)^Y ≤ e^(−1) for Y = (N−1)/v
Randomized Algorithms for Selection
• "canonical selection algorithm"
• Algorithm canselect(i,X)
input: set X of N keys, index i ∈ {1, …, N}
[0] if N = 1 then output X
[1] select a bracket B of X, so that select(i,X) ∈ B with high prob.
[2] let i1 be the number of keys less than every element of B
[3] output canselect(i − i1, B)
• B is found by random sampling
Hoare's Selection Algorithm
• Algorithm Hselect(i,X) where 1 ≤ i ≤ N
begin
  if X = {x} then output x
  else choose a random splitter k ∈ X
    let B = {x ∈ X | x < k}
    if |B| ≥ i then output Hselect(i,B)
    else output Hselect(i − |B|, X − B)
end
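A runnable sketch of Hselect (quickselect). One small deviation from the pseudocode above: we treat the splitter itself as a third case so the recursion is guaranteed to shrink.

```python
import random

def hselect(i, X):
    """Hoare's randomized selection: return the key of rank i in X.

    A sketch of the slides' Hselect (distinct keys, 1 <= i <= |X|);
    the splitter is handled separately so the recursion always shrinks.
    """
    if len(X) == 1:
        return X[0]
    k = random.choice(X)                  # random splitter
    B = [x for x in X if x < k]           # keys below the splitter
    if len(B) >= i:
        return hselect(i, B)
    if len(B) == i - 1:
        return k                          # the splitter itself has rank i
    return hselect(i - len(B) - 1, [x for x in X if x > k])
```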
Hoare's Selection Algorithm (cont'd)
• Sequential time bound T(i,N) has mean
T(i,N) = N + (1/N)·[ Σ_{j=1..i} T(i−j, N−j) + Σ_{j=i+1..N} T(i, j−1) ]
       = 2N + min(i, N−i) + o(N)
Hoare's Selection Algorithm (cont'd)
• With a random splitter k ∈ X, Hselect(i,X) has two cases: recurse below or above k
Hoare's Selection Algorithm (cont'd)
• Inefficient: each recursive call requires N comparisons, but only
reduces the average problem size to N/2
Improved Randomized Selection
• By Floyd and Rivest
• Algorithm FRselect(i,X)
begin
  if X = {x} then output x
  else choose k1, k2 ∈ X such that k1 < k2
    let r1 = rank(k1, X), r2 = rank(k2, X)
    if r1 > i then FRselect(i, {x ∈ X | x < k1})
    else if r2 > i then FRselect(i − r1, {x ∈ X | k1 ≤ x ≤ k2})
    else FRselect(i − r2, {x ∈ X | x > k2})
end
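A sketch of the Floyd–Rivest idea, filling in how k1 and k2 are chosen (the sample size and offset follow the slides; the base case, the clamping of sample ranks, and the tuning constant d are our own implementation choices):

```python
import math
import random

def frselect(i, X, d=1.0):
    """Floyd-Rivest style selection: bracket select(i,X) between two
    sample order statistics k1 <= k2, then recurse on a thin stratum.

    Sketch only: s = N^(2/3) log N and delta = d*sqrt(s log N) follow
    the slides; base case and constant d are implementation choices.
    """
    N = len(X)
    if N <= 64:
        return sorted(X)[i - 1]
    s = min(N - 1, int(N ** (2 / 3) * math.log(N)))
    S = sorted(random.sample(X, s))
    t = i * (s + 1) / (N + 1)             # expected sample rank of the answer
    delta = d * math.sqrt(s * math.log(N))
    lo = max(1, min(s, int(t - delta)))
    hi = max(1, min(s, int(t + delta) + 1))
    k1, k2 = S[lo - 1], S[hi - 1]
    below = [x for x in X if x < k1]
    mid = [x for x in X if k1 <= x <= k2]
    r1, r2 = len(below), len(below) + len(mid)
    if i <= r1:
        return frselect(i, below, d)
    if i <= r2:
        return frselect(i - r1, mid, d)
    return frselect(i - r2, [x for x in X if x > k2], d)
```

With high probability the answer lands in the thin middle stratum, so each level of recursion works on o(N) keys.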
Choosing k1, k2
• We must choose k1, k2 so that with high likelihood,
k1 ≤ select(i,X) ≤ k2
Choosing k1, k2 (cont’d)
• Choose a random sample S ⊆ X of size s
• Define
k1 = select( i·(s+1)/(N+1) − Δ, S )
k2 = select( i·(s+1)/(N+1) + Δ, S )
where Δ = d·sqrt(s·log N), d = constant
Choosing k1, k2 (cont’d)
• Lemma 2 implies
Prob(r1 > i) < N^(−α)
and
Prob(r2 < i) < N^(−α)
where r1 = rank(k1, X), r2 = rank(k2, X)
Expected Time Bound
T(i,N) ≤ N + 2·T(−,s)
  + Prob(r1 > i)·T(i, r1)
  + Prob(i > r2)·T(i − r2, N − r2)
  + Prob(r1 ≤ i ≤ r2)·T(i − r1, r2 − r1)
≤ N + 2·T(−,s) + 2·N^(1−α) + T( −, 2Δ·(N+1)/(s+1) )
≤ N + min(i, N−i) + o(N)
if we set s = N^(2/3)·log N, so that T(−,s) = o(N)
Small Cost of Recursions
• Note
• With prob 1 − 2·N^(−α), each recursive call costs only O(s) = o(N),
rather than N as in the previous algorithm
Randomized Sorting Algorithms
• "canonical sorting algorithm"
• Algorithm cansort(X)
begin
  if X = {x} then output X
  else choose a random sample S of X of size s
    sort S
    S subdivides X into s+1 subsets X1, X2, …, X(s+1)
    output cansort(X1)·cansort(X2)· … ·cansort(X(s+1))
end
Using Random Sampling to Aid Sorting
• Problem:– Must subdivide X into subsets of nearly
equal size to minimize number of comparisons
– Solution: random sampling!
Hoare's Randomized Sorting Algorithm
• Uses sample size s = 1
• Algorithm quicksort(X)
begin
  if |X| = 1 then output X
  else choose a random splitter k ∈ X
  output quicksort({x ∈ X | x < k})·(k)·quicksort({x ∈ X | x > k})
end
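The pseudocode above translates almost line for line into Python (a sketch, assuming distinct keys):

```python
import random

def quicksort(X):
    """Hoare's randomized quicksort, as on the slide (distinct keys):
    split around a random splitter and recurse on both sides."""
    if len(X) <= 1:
        return list(X)
    k = random.choice(X)
    return (quicksort([x for x in X if x < k])
            + [k]
            + quicksort([x for x in X if x > k]))
```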
Expected Time Cost of Hoare's Sort
• Inefficient!
• Better to divide the problem size by 1/2 with high likelihood
T(N) ≤ N − 1 + (1/N)·Σ_{i=1..N} ( T(i−1) + T(N−i) )
     ≤ 2·N·log N
Improved Sort using Sample Splitters
• A better choice of splitter: k = an approximate median of X, found by
selection within a random sample
• Algorithm samplesort_s(X)
begin
  if |X| = 1 then output X
  else choose a random subset S of X of size s = N/log N
    k = select(⌈s/2⌉, S)  (costs o(N) time)
    output samplesort_s({x ∈ X | x < k})·(k)·samplesort_s({x ∈ X | x > k})
end
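A sketch of sample sort. For simplicity the sample median is found here by sorting S (O(s log s) work), whereas the slides use an o(N)-time selection on S; the base-case cutoff is our own choice.

```python
import math
import random

def samplesort(X):
    """Sample sort from the slides: split around the median of a random
    sample of size ~ N/log N, so both halves are near N/2 w.h.p.

    Sketch: sample median found by sorting S, not by o(N) selection.
    """
    N = len(X)
    if N <= 16:
        return sorted(X)
    s = max(1, int(N / math.log(N)))
    S = random.sample(X, s)
    k = sorted(S)[s // 2]        # sample median: a near-median splitter of X
    return (samplesort([x for x in X if x < k])
            + [k]
            + samplesort([x for x in X if x > k]))
```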
Randomized Approximant of Median
• By Lemma 2, rank(k,X) is very nearly N/2, i.e. k is a near-median of X:
Prob( |rank(k,X) − N/2| > d·sqrt(α·N·log N) ) < N^(−α)
Improved Expected Time Bounds
T(N) ≤ 2·T(N/2) + N − 1
⟹ T(N) ≤ N·log N + o(N·log N) ~ log(N!)
• This is optimal for comparison trees, since log(N!) comparisons are required!
Open Problems in Selection and Sorting
1) Improve Randomized Algorithms to exactly match lower bounds on number of comparisons
2) Can we derandomize these algorithms, i.e., give deterministic algorithms with the same bounds?