Jes2ica.IO

coding, trading, reading

  1. 1. TL;DR
  2. 2. Preprocessing
  3. 3. Search
  4. 4. More info

TL;DR

In a word, KMP records the suffix substring which is a prefix of the string with max length at position i.

Preprocessing

  • KMP algorithm does preproceses pattern[] and constructs an auxiliary lps[] of size m (same as size of pattern) which is used to skip characters while matching.
  • name lps indicates longest proper prefix which is also suffix.
    • A proper prefix is prefix with whole string not allowed.
    • For example, prefixes of “ABC” are “”, “A”, “AB” and “ABC”. Proper prefixes are “”, “A” and “AB”. Suffixes of the string are “”, “C”, “BC” and “ABC”.
    • lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i].
    • For the pattern “ababaca”, lps=[0, 0, 1, 2, 3, 0, 1].
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
public int[] computeLps(String pattern) {
int n = pattern.length();
int[] lps = new int[n];
for (int i = 1; i < n; i++) {
// Start by assuming we're extending the previous LPS
int j = lps[i - 1];
while (j > 0 && pattern.charAt(j) != pattern.charAt(i)) {
j = lps[j - 1];
}
if (pattern.charAt(j) == pattern.charAt(i)) {
lps[i] = j + 1;
} else {
lps[i] = 0;
}
}
return lps;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
public int search(String pattern, String text) {
int[] lps = computeLps(pattern);
int j = 0; // Number of chars matched in pattern.
for (int i = 0; i < text.length(); i++) {
while (j > 0 && text.charAt(i) != pattern.charAt(j)) {
// Fall back in the pattern.
j = lps[j - 1]; // Strickly decreasing.
}
if (text.charAt(i) == pattern.charAt(j)) {
// Next char matched, increment position.
j++;
if (j == pattern.length()) {
return i - (j - 1);
}
}
}
return -1; // Not found.
}

More info

This article was last updated on days ago, and the information described in the article may have changed.