Boyer Moore Horspool Video Search Tutorial


When looking for a string of characters in a text we rely today on the functions offered by our programming language but how would you find a channel if you did not have access to these functions? How are its functions effective?

The Naïve / Brute force approach

The first approach that can come to mind is the "naïve" approach of dragging the search keyword along our string until you find a match.
We begin by comparing the first letter of our word with the first letter of our text.

v
I search the word "word" in this chain of character a bit long
this string of character

If it corresponds we move to the next letter, if it does not match we shift the word a notch and we start again.

  v
I search the word "word" in this chain of character a bit long
_This character string

and so on until we reach the end of our chain or until we reach the desired word

I search the word "word" in this chain of character a bit long
_______________________________this string of characters

In this situation all the letters match and we can consider that the word has been found.

Complexity

To understand the complexity of this algorithm it is necessary to analyze the best and the worst case.

The best case occurs when the search string begins with a letter that is not present in the string. Then there will be m - n + 1 iterations (with m the size of the search string and n the size of the text in which one looks).

I search the word "word" in this chain of character a bit long
grafikart

The worst of cases occurs when the search string and text uses the same letters.

aaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaab

In this situation we will have an iteration number equal to n ^ (m - n + 1). However in a realistic case we will be closer to the first case (better) than this one.

Boyer-Moore-Horspool

The Boyer Moore Horspool Algorithm (or simply Horspool) is a simplification of the Boyer-Moore algorithm that will be based on a jump table. The purpose of this algorithm is to limit the number of iterations by skipping several steps in the checks. Take our case:

                                                v
I search the word "word" in this chain of character a bit long
this string of character

The algorithm will start by comparing the letters but from right to left. From the beginning we have a non-correspondence because the letter e does not match the character ". In the same way the character " is nowhere to be found in our search string (so it's useless to do tests by dragging a single character because we know in advance that it will not be suitable). We will then skip as many steps as we have characters in the search string.

                                                                                                  v
I search the word "word" in this chain of character a bit long
_________________________this string of characters

We find again a non-correspondence but this time with a r. r being present in our chain we will shift our word in relation to the position of the last r of our word.

                                                                                                    v
I search the word "word" in this chain of character a bit long
__________________________this string of characters

And we continue (this time with the at)

                                                                                                    v
I search the word "word" in this chain of character a bit long
_______________________________this string of characters

And we just found a match in just 4 iterations (against 31 previously)!

Jump table

For this algorithm to work, you must first be able to create a jump table. This table makes it possible to know the length of the jump to be made according to the first letter verified.

If you wish you can see on this site the course of research Boyer Moore Horspool.

For further

If you want to go further do not hesitate to go to Wikipedia to discover other algorithms.