# rabin karp

🏠

Write a function that finds instances of a substring in a string in linear time.

## Solution

### Hashing the substring

Creating a hash works a bit like fingerprinting: it enables condensing a long and complex piece of data, like a string, into a simpler one, like an integer.

The hash for pyth is 2053428.

Hashing a string works letter by letter:

• 1/ multiply the previous result by the base (26, as there are 26 letters in the ascii lowercase alphabet)
• 2/ add the ascii value of the current letter
• 3/ go to step 1 for the next letter

This does not ensure that the hash 2053428 can only represent pyth since hashes can collide. But the probability that it represents pyth is very high.

We create a hash, because comparing substrings is now only a matter of comparing two integers. It also enables us to roll the hash down the string in constant time.

Likewise, we'll pre-compute the hash for algo.

### Rolling the hash

Let's replace our letters with variables a, b, c and d.

We can also replace our base with $\beta$, and re-organise our equation a little.

We can use this rearranged formula, $a \beta^{3} + b \beta^{2} + c \beta + d$, to remove the left-most letter, and add a new letter to the right.

Here's an implementation, with pre-computed hashes:

1def rabin_karp(s, t):
2    s_n, t_n = len(s), len(t)
3    rolling = reduce(lambda h, c: h * 26 + ord(c), s[:t_n], 0)
4    target = reduce(lambda h, c: h * 26 + ord(c), t, 0)
5    for i in range(t_n, s_n):
6        if rolling == target and s[i - t_n : i] == t:
7            return i - t_n
8        rolling = (rolling - (ord(s[i - t_n]) * (26 ** (t_n - 1)))) * 26 + ord(s[i])
9    return s_n - t_n if rolling == target and s[-t_n:] == t else -1


 1def rabin_karp(s, t):
2    s_n, t_n = len(s), len(t)
3    # pre-compute our rolling hash for the string
4    rolling = reduce(lambda h, c: h * 26 + ord(c), s[:t_n], 0)
5    # pre-compute our target hash
6    target = reduce(lambda h, c: h * 26 + ord(c), t, 0)
7    # iterate from the end of the target to the end of the string
8
9    for i in range(t_n, s_n):
10        # if the rolling hash matches our target hash
11        # we still need to check for hash collisions
12        # by doing a string comparison
13        if rolling == target and s[i - t_n : i] == t:
14            return i - t_n
15        # a is the left-most character in the rolling hash
16        a = ord(s[i - t_n])
17        # here we multiply it by B^(n-1)
18        aBnm1 = a * (26 ** (t_n - 1))
19        # remove leftmost character from rolling hash
20        rolling -= aBnm1
21        # multiply all characters by the base (distributive property)
22        rolling *= 26
23        # add the new character to the rolling hash
24        rolling += ord(s[i])
25
26    # check the last iteration
27    return s_n - t_n if rolling == target and s[-t_n:] == t else -1


Here's another implementation, that doesn't require hashes to be pre-computed, or to check the last iteration separately:

 1def rabin_karp(s, t):
2    s_n, t_n, H, T, s_pow = len(s), len(t), 0, 0, 26 ** (len(t) - 1)
3    if len(t) <= len(s) and len(t):
4        for i in range(s_n):
5            if i < t_n:
6                H, T = H * 26 + ord(s[i]), T * 26 + ord(t[i])
7            else:
8                H = (H - (ord(s[i - t_n]) * s_pow)) * 26 + ord(s[i])
9            if H == T and s[i - t_n + 1 : i + 1] == t:
10                return i - t_n + 1
11    return -1 if len(t) else 0