Randomized Algorithms
Consider the task of sorting N objects (e.g. numbers).
A possible randomized algorithm works as follows.
1. Pick one of the objects at random, and pretend that it is
   the median of the set.
   Compare all (N-1) others with it, thus dividing
   the others into two sets of size A and B.
   Certainly A+B = N-1; if we got lucky,
   A ~= N/2.
   More typically (in some sense), A ~= N/3 and B ~= 2N/3,
   or vice versa.
2. Repeat step 1 on each of the two sets, recursing until every
   set has size 1 or 0. (A code sketch of these two steps is
   given below.)
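To make the procedure concrete, here is a minimal sketch in Python
(my own illustration, not part of the original notes; the function
name and the mutable comparison counter are arbitrary choices). It
picks a pseudo-median at random, partitions the remaining items
around it, recurses on the two parts, and counts the comparisons
used.

    import random

    def randomized_sort(items, counter):
        # Sort `items` by the pseudo-median procedure above, adding
        # the number of comparisons used to counter[0].
        if len(items) <= 1:
            return list(items)
        pivot = random.choice(items)      # step 1: random pseudo-median
        rest = list(items)
        rest.remove(pivot)                # compare the (N-1) others with it
        counter[0] += len(rest)
        smaller = [x for x in rest if x <= pivot]   # the set of size A
        larger = [x for x in rest if x > pivot]     # the set of size B
        # step 2: repeat on each of the two sets
        return (randomized_sort(smaller, counter) + [pivot]
                + randomized_sort(larger, counter))

    counter = [0]
    data = [random.random() for _ in range(1000)]
    assert randomized_sort(data, counter) == sorted(data)
    print("comparisons used:", counter[0])   # grows like 2 N ln N for large N

Running it several times gives different counts, since the shape of
the tree depends on the random choices of pseudo-median.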
If we get lucky at every step (i.e., we hit the true median every time)
then the number of
comparisons of objects with each other will be
pretty accurately
Cmin = N log2 N.
Failing to hit the median means that our binary tree will be
lopsided, and more comparisons will be needed. How many comparisons?
We still expect something that scales as N ln N,
but the base of the logarithm will be different.
The "2" in the "log2" in
Cmin(N) = N log2 N
is there because the perfect median choice makes a balanced
binary tree, and log2 N is the depth of that
tree. The intuition is:
When we make an unbalanced tree, the "2" changes to a number smaller
than 2; as we'll see, it turns out to be roughly 1.6.
C_randomized ~= N log1.6 N
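For reference, changing the base of the logarithm is the same thing
as changing the constant in front of N ln N:
$$ N \log_b N = \frac{N \ln N}{\ln b} , \qquad \mbox{e.g.}\quad N \log_2 N \simeq 1.44\, N \ln N . $$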
There are two rather different ways to prove this result.
Recurrence relation proof
Let the average cost, in comparisons, of sorting N items by
the randomized algorithm be T(N).
Then, considering the first step, in which the pseudo-median drawn at
random splits the set into fractions u and (1-u), with u uniformly
distributed between 0 and 1,
T(N) ~= N + < T(uN) > + < T((1-u)N) >
where < T((1-u)N) > denotes the average of T((1-u)N),
and we've neglected the distinction between N and N-1.
We need to solve this recurrence relation, with boundary condition
T(1)=T(0)=0.
Let's guess that
T(N) = a N ln N
and solve for a.
Then, substituting T(N) = a N ln N on the right-hand side and using
ln(uN) = ln u + ln N,
T(N) ~= N + T(N) - a N < H_2^(e)(u) > ,
where H_2^(e)(u) = - u ln u - (1-u) ln(1-u) is the binary entropy
measured in nats.
For consistency this gives
1 = a < H_2^(e)(u) > .
Carrying out the integral we find the neat result:
a = 2.
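Explicitly,
$$ \langle H_2^{(e)}(u) \rangle = \int_0^1 \bigl[ -u \ln u - (1-u)\ln(1-u) \bigr] \, du
   = 2 \int_0^1 (-u \ln u) \, du = 2 \cdot \tfrac{1}{4} = \tfrac{1}{2} , $$
so the consistency condition reads 1 = a/2, which indeed forces a = 2.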
So the base of the logarithm - the number formerly known as `1.6' -
is in fact sqrt(e) ~= 1.65.
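As a numerical cross-check (my own addition, not in the original
notes): averaging over the N equally likely ranks of the
pseudo-median gives the exact discrete counterpart of the recurrence,
T(N) = (N-1) + (2/N) [ T(0) + T(1) + ... + T(N-1) ],
which is easy to iterate and compare with 2 N ln N:

    import math

    def expected_comparisons(n_max):
        # Iterate T(N) = (N-1) + (2/N) * sum_{k<N} T(k)
        # for N = 0 .. n_max.
        T = [0.0] * (n_max + 1)
        running_sum = 0.0              # sum of T(0) .. T(N-1)
        for n in range(1, n_max + 1):
            T[n] = (n - 1) + 2.0 * running_sum / n
            running_sum += T[n]
        return T

    T = expected_comparisons(10**5)
    for n in (10**2, 10**3, 10**4, 10**5):
        print(n, round(T[n]), round(2 * n * math.log(n)),
              T[n] / (2 * n * math.log(n)))

The ratio in the last column creeps towards 1 only slowly, because
the leading correction to 2 N ln N is of order N.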
`Means add' proof
Another neat derivation of the cost of the randomized sort
algorithm, which gives little insight into the correct answer,
but generates it quite easily,
runs as follows.
Imagine that the numbers are already sorted, and
consider two numbers whose ranks in the list of size $N$ are $m$ and $n$, with $m < n$.
When we apply the algorithm, what is the chance $c_{mn}$ that $m$ will, at some
point along the way, get compared with $n$?
Well, if any of the $n-m-1$ numbers between $m$ and $n$ gets selected
before both $m$ and $n$, then the subsequent comparisons will put $m$
and $n$ into separate sets, so that they never get compared with each
other.
On the other hand, if one of $m$ and $n$ is selected before any of those $n-m-1$ numbers,
then $m$ and $n$ will instantly be compared with each other in the
next comparison step.
Thus
$$ c_{mn} = \frac{2}{n-m+1} \; ; $$
and the expected number of comparisons
is the sum over all pairs,
$$ C(N) = \sum_{\{m,n :\, m<n\}} \frac{2}{n-m+1} \, . $$
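Carrying out the sum (grouping the pairs by their separation
d = n - m, of which there are N - d at each value of d),
$$ C(N) = \sum_{d=1}^{N-1} (N-d)\, \frac{2}{d+1}
   \simeq 2N \sum_{d=1}^{N-1} \frac{1}{d+1} \simeq 2 N \ln N , $$
in agreement with the recurrence-relation proof.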
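One can also check the pair-comparison probability
$c_{mn} = 2/(n-m+1)$ directly by simulation; here is a quick sketch
(the values of N, m, n and the number of trials are arbitrary choices
of mine):

    import random

    def compared_pairs(items, pairs):
        # Run the randomized pseudo-median procedure on `items`
        # (distinct numbers), recording every pair that is
        # directly compared.
        if len(items) <= 1:
            return
        pivot = random.choice(items)
        rest = [x for x in items if x != pivot]
        for x in rest:
            pairs.add((min(x, pivot), max(x, pivot)))
        compared_pairs([x for x in rest if x < pivot], pairs)
        compared_pairs([x for x in rest if x > pivot], pairs)

    N, trials = 30, 20000
    m, n = 5, 12                 # ranks of the two tracked numbers
    hits = 0
    for _ in range(trials):
        pairs = set()
        compared_pairs(list(range(N)), pairs)
        hits += (m, n) in pairs
    print(hits / trials, "vs", 2 / (n - m + 1))  # should agree to within a percent or two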