Saturday, May 29, 2010

When linear search is faster than binary

It is often stated that a binary search algorithm performs better than linear search. Indeed, let's assume that the goal is to find the index of the largest integer in the sorted zero-based array A that is less than or equal to the given integer X. For simplicity, let's also assume that the size of array is N = 2K, that for all possible inputs A[0] <= X and that one can also put a sentinel value after the end, so that X < A[N] for all possible inputs X. E.g., if it is known that X always fits in 16 bits, one can put 65536 as the sentinel.

The assumptions above are true for the entropy decoder found in the Monkey's Audio codec. In that case, N = 64, X is unsigned and always fits in 16 bits, A[0] = 0, A[64] = 65536.

This is linear search:


A[N] = sentinel;
index = 0;
while (A[index + 1] <= X)
    ++index;


This is binary search:


int bit = N >> 1;  /* N is a power of two */
index = 0;
while (bit != 0) {
    if (A[index + bit] <= X)
        index += bit;
    bit >>= 1;
}


If the size of array to be searched is N, then the average case complexity is commonly stated to be O(log(N)) for binary search and O(N) for linear search. So, for the Monkey's Audio case presented above, one would expect, on average, 32 loop iterations for the linear search algorithm and 6 iterations for the binary search.

However, there is one key fact that makes the estimate above invalid. Such simple averaging is valid only if each result appears with the same probability. For Monkey's Audio entropy decoder, this is not the case.

In fact, out of the 64 possible results, "0" appears with 31% probability, and the total probability of the first 6 results is 87%. So, it is quite natural that linear search is faster than binary in this case, because it often stops very early.

No comments: