Fast Array-Searching Algorithm

December 4th, 2006

A while back I tried to solve a simple problem, first pick a completely random integer, then what’s the fastest way to guess it? That is to say, using the fewest amount of chances. You only have one advantage, when you guess, you are told if the number you guessed is <= or > the number you a searching for.

The answer is pretty simple. Let’s say the number is 481, all you need to start with is a really big number that is greater than what you’re looking for, let’s say 1000. Cut it in half, is it still greater than the number? Yes, so now you know that the number is >= 0 and < 500. Cut it in half again, and you’ll find it’s <= the number you’re searching for. Now you know that the number is >= 250 and < 500. Continue with this like so…

250 >= x < 500
Guess: 325 (it's <=)
325 >= x < 500
Guess: 412 (it's <=)
412 >= x < 500
Guess: 456 (it's <=)
456 >= x < 500
Guess: 489 (it's >)
478 >= x < 489
Guess: 483 (it's >)
478 >= x < 483
Guess: 480 (it's <=)
480 >= x < 483
Guess: 481 (it's <=)
481 >= x < 483
Guess: 482 (it's >)
481 >= x < 482

At the end, you’ll notice there’s only one number that’s >= 481 and < 482, the number you’re looking for.

In case you didn’t notice several numbers above were rounded. This always bothered me as it seemed untrustworthy, inheriting possible problems from floating point number math. Then I realized this issue could be overstepped and made much faster through the use of binary math. With a binary math technique you’d only work with multiples of two, so there are never rounding issues. First you need to find the highest multiple of two that is above the number you are searching for, in this case, we’re trying to find 1683.

Guess: 00000000 00000001 (1) (true)
Guess: 00000000 00000010 (2) (true)
Guess: 00000000 00000100 (4) (true)
Guess: 00000000 00001000 (8) (true)
Guess: 00000000 00010000 (16) (true)
Guess: 00000000 00100000 (32) (true)
Guess: 00000000 01000000 (64) (true)
Guess: 00000000 10000000 (128) (true)
Guess: 00000001 00000000 (256) (true)
Guess: 00000010 00000000 (512) (true)
Guess: 00000100 00000000 (1024) (true)
Guess: 00001000 00000000 (2048) (false)
Answer: 2048

Now we know the first multiple of two that is higher than the number is 2048. Now finding the number between the current minimum/maximum is much simpler through the use of binary math.

00000100 00000000 => 1024 (start)
00000110 00000000 => (1024 | 512) == 1536 (true)
00000111 00000000 => (1536 | 256) == 1792 (false)
00000110 10000000 => (1536 | 128) == 1664 (true)
00000110 11000000 => (1664 | 64) == 1728 (false)
00000110 10100000 => (1664 | 32) == 1696 (false)
00000110 10010000 => (1664 | 16) == 1680 (true)
00000110 10011000 => (1680 | 8) == 1688 (false)
00000110 10010100 => (1680 | 4) == 1684 (false)
00000110 10010010 => (1680 | 2) == 1682 (true)
00000110 10010011 => (1682 | 1) == 1683 (true)
Answer: 1683

You’ll notice every time the result is true, the bit that is currently being tested stays in the result, otherwise it returns to zero.

Next is a simple example of how to use this technique. The goal of this code is to insert a number into an Array in numerical order. That is to say, each number in the Array should be greater than (or equal to) the numbers that precede it. Also keep in mind even if there is only one entry (length == 1) there are two possible places to put the new number, before and after the lone entry.

function addEntry(entry:Number):void {
var cBit:uint = 1;
while (cBit < arr.length) {
cBit <<= 1;
}
var pos:uint = 0;

do {
if ((pos | cBit) <= arr.length) {
if (entry >= arr[(pos | cBit)-1]) {
pos
|= cBit;
}
}

} while ((cBit >>>= 1) !== 0);
arr.splice(pos,0,entry);
}

That’s the simplified version, get a more optimized and complete example here. The great thing about this method is that it can be used to keep (or find) just about anything sorted. I’d also like to note that while I did actually think this up on my own, I’m definitely not the first. Not by a long shot. The real name for it appears to be Binary Search Algorithim.

2 Responses to “Fast Array-Searching Algorithm”

  1. marcus Says:

    Could you also suggest what could be the running time for such an algorithm? Thanks.

  2. Max Says:

    marcus: O(2*s), where 2^s is the highest integer greater than or equal to the number being tested. In other words, if it’s done in 32-bit math, since no number can be higher than 2^32-1, then the maximum amount of iterations is 2*32, or 64.

    Then again that’s assuming you do the guessing stage, this can also be done with O(s) as a constant by setting the “guess” as the maximum possible value which would be good for systems where integers commonly go over 2^16-1. In 32-bit math that would make the amount of iterations always equal to 32, one for every bit.

Leave a Reply