KMP Algorithm
Leetcode 28:
Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
Introduction
Suppose text is abbbbbbbbbbc, and pattern is abbbc, to find the index of the first match, we start matching on index 0, 1, 2, …. The time complexity if O(mn), where m is the length of text and n is the length of pattern.
Why KMP
Observation: When there is a mismatch at index 4, and all characters match for previous indexes, where do we start the next match?
- In the pattern string, the first character
ais not contained in index 1 ~ 3. - We have checked that index 0 ~ 3 is matched (same) for text and pattern string.
- As a result,
awill not appear in the index 1 ~ 3 at the text string, and we don’t need to re-match at index 1.
Bad next match:
We can re-match at the index 4, because there is no a for index 1 ~ 3 at text string.
Another Example
Let’s see another example:
There is a mismatch at index 9:
- The previous pattern:
abcxxabmatches the text, and mismatches occurs at index 9. - For the string before the mismatch index:
abcxxab, there is a repeated patternab:aboccurs at the first two index and the last 2 index. - If we rematch at the index of the mismatch, we might lose a chance to get a full-match because the first two indexes of the pattern string,
ab, also matches theabat index 7 and 8 in the text string. - We rematch at the index 2 of the pattern string.
How do we know where to start re-match?
Observe the previous example, we find that:
pattern[0:7]matchestext[2:9], the index 9 does not match.- In pattern shring,
abis repeated inabcxxab. - As a result,
pattern[0:1]matchespattern[-2:0], which also matchestext[7:8]. - We rematch for
pattern[2]andtext[9].
Construct “next” Array
We need an array to indicate where to start re-matching if a mismatch occurs.
private int[] getNext(char[] strArr) {
int[] next = new int[strArr.length];
int i = 0;
int j = 1;
while (j < strArr.length) {
if (strArr[i] == strArr[j]) {
next[j] = i+1;
i++;
j++;
} else {
if (i == 0) {
next[j] = i;
j++;
} else {
i = next[i-1];
}
}
}
return next;
}Match
public int strStr(String haystack, String needle) {
char[] haystackArr = haystack.toCharArray();
char[] needleArr = needle.toCharArray();
int[] next = getNext(needleArr);
int i = 0;
int j = 0;
if (haystack.length() < needle.length()) return -1;
while (i < haystack.length()) {
if (haystackArr[i] == needleArr[j]) {
if (j == needle.length()-1) return i - j;
i++;
j++;
} else {
if (j == 0) {
i++;
} else {
j = next[j-1];
}
}
}
return -1;
}