KMP Algorithm

Leetcode 28:

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Introduction

Suppose text is abbbbbbbbbbc, and pattern is abbbc, to find the index of the first match, we start matching on index 0, 1, 2, …. The time complexity if O(mn), where m is the length of text and n is the length of pattern.

Brute Force Approach

Why KMP

Observation: When there is a mismatch at index 4, and all characters match for previous indexes, where do we start the next match?

  • In the pattern string, the first character a is not contained in index 1 ~ 3.
  • We have checked that index 0 ~ 3 is matched (same) for text and pattern string.
  • As a result, a will not appear in the index 1 ~ 3 at the text string, and we don’t need to re-match at index 1.

Bad next match:

Bad Next Match

We can re-match at the index 4, because there is no a for index 1 ~ 3 at text string.

Correct Next Match

Another Example

Let’s see another example:

Another Example Start

There is a mismatch at index 9:

Mismatch at Index 9
  • The previous pattern: abcxxab matches the text, and mismatches occurs at index 9.
  • For the string before the mismatch index: abcxxab, there is a repeated pattern ab: ab occurs at the first two index and the last 2 index.
  • If we rematch at the index of the mismatch, we might lose a chance to get a full-match because the first two indexes of the pattern string, ab, also matches the ab at index 7 and 8 in the text string.
  • We rematch at the index 2 of the pattern string.
Rematch Strategy

How do we know where to start re-match?

Observe the previous example, we find that:

Explanation
  • pattern[0:7] matches text[2:9], the index 9 does not match.
  • In pattern shring, ab is repeated in abcxxab.
  • As a result, pattern[0:1] matches pattern[-2:0], which also matches text[7:8].
  • We rematch for pattern[2] and text[9].

Construct “next” Array

We need an array to indicate where to start re-matching if a mismatch occurs.

Next Array
private int[] getNext(char[] strArr) {
    int[] next = new int[strArr.length];
    int i = 0;
    int j = 1;
    while (j < strArr.length) {
        if (strArr[i] == strArr[j]) {
            next[j] = i+1;
            i++;
            j++;
        } else {
            if (i == 0) {
                next[j] = i;
                j++;
            } else {
                i = next[i-1];
            }
        }
    }
    return next;
}

Match

public int strStr(String haystack, String needle) {
    char[] haystackArr = haystack.toCharArray();
    char[] needleArr = needle.toCharArray();
    int[] next = getNext(needleArr);
    int i = 0;
    int j = 0;
    if (haystack.length() < needle.length()) return -1;
    while (i < haystack.length()) {
        if (haystackArr[i] == needleArr[j]) {
            if (j == needle.length()-1) return i - j;
            i++;
            j++;
        } else {
            if (j == 0) {
                i++;
            } else {
                j = next[j-1];
            }
        }
    }
    return -1;
}

KMP Algorithm
http://example.com/2024/07/07/Leetcode/kmp/
Author
Songlin Zhao
Posted on
July 7, 2024
Licensed under