String Pattern Matching

Reposted article, 2015-03-19 19:14. Category: Algorithms and Data Structures / Algorithms

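Key Points

Pattern matching is a basic operation on strings in data structures: given a substring, find all substrings of some string that are identical to it.

Let P be the given substring and T the string to search. Finding every substring of T equal to P is called the pattern matching problem. P is called the pattern and T the target. If T contains one or more substrings equal to P, the position of each in T is reported and the match succeeds; otherwise the match fails.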
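To make the problem statement concrete, here is a minimal sketch that reports every position of P in T. It leans on the standard library's String.indexOf rather than either algorithm discussed in this article, and the class and method names (AllOccurrences, findAll) are made up for illustration. Treating an empty pattern as having no matches is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class AllOccurrences {
    // Returns every start index at which pattern occurs in target,
    // including overlapping occurrences.
    static List<Integer> findAll(String target, String pattern) {
        List<Integer> positions = new ArrayList<>();
        if (pattern.isEmpty()) {
            return positions; // assumption: an empty pattern reports no matches
        }
        int from = target.indexOf(pattern);
        while (from != -1) {
            positions.add(from);
            // Resume one character later so overlapping matches are not skipped.
            from = target.indexOf(pattern, from + 1);
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll("ababcabcacbab", "ab")); // prints [0, 2, 5, 11]
    }
}
```

An empty result corresponds to a failed match in the terminology above.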

The code in this article was written and tested by the author, and is given in both Java and C++.

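Brute-Force Algorithm (BF)

The brute-force algorithm is abbreviated as BF. (Think of it as the "boyfriend" algorithm: simple and blunt.)

Idea

The idea behind the BF algorithm is:

Start by comparing the first character of the target T with the first character of the pattern P.

If they are equal, continue comparing the following characters one by one; otherwise, restart the comparison from the second character of the target against the first character of the pattern.

Continue until every character of the pattern equals, in order, a run of consecutive characters in the target, in which case the match succeeds; if no such run is found, the match fails.

The original article illustrates these steps with a figure, which is not reproduced here.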

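Performance

Assume the pattern has length m and the target has length n.

In the worst case, every round of comparison fails on its last character: each round makes at most m comparisons, and there are at most n-m+1 rounds.

The total number of comparisons is therefore at most m(n-m+1), so the BF algorithm runs in O(mn) time.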
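The bound m(n-m+1) can be checked on a small worst-case input. The sketch below is not the article's bfMatch: it is a hypothetical counter (BfComparisonCount, countComparisons) that tries every alignment from 0 to n-m and counts character comparisons, without stopping at the first match.

```java
class BfComparisonCount {
    // Counts the character comparisons made when every alignment
    // k = 0 .. n-m of the pattern against the target is tried.
    static int countComparisons(String target, String pattern) {
        int n = target.length();
        int m = pattern.length();
        int comparisons = 0;
        for (int k = 0; k + m <= n; k++) {
            for (int j = 0; j < m; j++) {
                comparisons++;
                if (target.charAt(k + j) != pattern.charAt(j)) {
                    break; // this round fails; move on to the next alignment
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String target = "aaaaa" + "aaaaa"; // n = 10
        String pattern = "aab";            // m = 3, every round fails on its last character
        System.out.println(countComparisons(target, pattern)); // prints 24, i.e. 3 * (10 - 3 + 1)
    }
}
```

Each of the 8 rounds matches "aa" and then fails on "b", which is exactly the worst case described above.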

The BF algorithm backtracks in the target after every failed round, which hurts efficiency, so it is rarely used in practice.

Code

Java version

public class BFMatch {

    // Returns the index of the first occurrence of pattern in target, or -1 if there is none
    static int bfMatch(String target, String pattern) {
        int pos = -1;
        // i: cursor in target, j: cursor in pattern, k: start of the current attempt
        int i = 0, j = 0, k = 0;

        // Scan target until a substring matching pattern is found
        while (-1 == pos && i < target.length()) {

            // Compare target and pattern character by character, stopping at the first difference.
            // The bound on i keeps charAt from reading past the end of target.
            while (i < target.length() && j < pattern.length()
                    && target.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            }

            if (j >= pattern.length()) { // the whole pattern was scanned, so target contains it
                pos = k;
            } else { // otherwise restart the comparison from the next character of target
                j = 0;
                k++;
                i = k;
            }
        }

        return pos;
    }

    public static void print(String target, String pattern, int index) {
        if (-1 != index) {
            System.out.format("[%s] is in the Pos = %d of [%s]\n", pattern, index, target);
        } else {
            System.out.format("[%s] is not in the [%s]\n", pattern, target);
        }
    }

    public static void main(String[] args) {
        String target = "Hello World";
        String pattern = "llo";
        String pattern2 = "Woe";

        int index = bfMatch(target, pattern);
        int index2 = bfMatch(target, pattern2);
        print(target, pattern, index);
        print(target, pattern2, index2);
    }

}

Java implementation of the BF algorithm

C++ version

#include <iostream>
#include <string>

using namespace std;

// Returns the index of the first occurrence of pattern in target, or -1 if there is none
int bfMatch(const string& target, const string& pattern) {
    int pos = -1;
    // i: cursor in target, j: cursor in pattern, k: start of the current attempt
    int i = 0, j = 0, k = 0;
    const int n = (int)target.length();
    const int m = (int)pattern.length();

    // Scan target until a substring matching pattern is found
    while (-1 == pos && i < n) {

        // Compare target and pattern character by character, stopping at the first difference
        while (i < n && j < m && target[i] == pattern[j]) {
            i++;
            j++;
        }

        if (j >= m) { // the whole pattern was scanned, so target contains it
            pos = k;
        } else { // otherwise restart the comparison from the next character of target
            j = 0;
            k++;
            i = k;
        }
    }

    return pos;
}

void print(const string& target, const string& pattern, int index) {
    if (-1 != index) {
        cout << "[" << pattern << "] is in the Pos = " << index << " of [" << target << "]" << endl;
    } else {
        cout << "[" << pattern << "] is not in the [" << target << "]" << endl;
    }
}

int main()
{
    string target = "Hello World";
    string pattern = "llo";
    string pattern2 = "Woe";

    int index = bfMatch(target, pattern);
    int index2 = bfMatch(target, pattern2);
    print(target, pattern, index);
    print(target, pattern2, index2);
    return 0;
}

C++ implementation of the BF algorithm

Output

[llo] is in the Pos = 2 of [Hello World]
[Woe] is not in the [Hello World]

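KMP Algorithm

The Knuth-Morris-Pratt algorithm (KMP for short) was proposed jointly by D. E. Knuth, J. H. Morris, and V. R. Pratt. It improves on the BF algorithm by eliminating backtracking in the target string during matching.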

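Idea

In the BF algorithm, when the pattern is compared against a substring of the target and the match fails partway, the target pointer backtracks to the start of that substring and the pattern shifts one position to the right.

But the characters just compared are already known, so going back over them wastes work: many of the shifted alignments can be ruled out without comparing again.

The idea of KMP is to use this known information to skip the positions that have already been compared and keep moving the pattern forward, which improves efficiency.

KMP therefore has two key parts:

(1) Compute the jump information, which we call the partial match table.

(2) Shift the pattern to the indicated position and resume matching.

First, let's see how to obtain the partial match table.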

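To determine where j should resume after a mismatch, we introduce the next[] array: next[j] is the length of the longest proper suffix of P[0...j-1] that is also a prefix of P[0...j-1].

This next array is the partial match table.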

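The next[] array is defined as follows:

next[j] = -1, if j = 0
next[j] = max{ k | 0 < k < j and P[0...k-1] = P[j-k...j-1] }, if such a k exists
next[j] = 0, otherwise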

For the pattern P = "abcac" from the BF example, the definition of next[j] gives the following table:

j	0	1	2	3	4
P[j]	a	b	c	a	c
next[j]	-1	0	0	0	1
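The table can be double-checked by evaluating the definition of next[j] directly, trying each candidate length from longest to shortest. This is a deliberately naive sketch, not an efficient table construction; the names NextByDefinition and nextByDefinition are hypothetical.

```java
import java.util.Arrays;

class NextByDefinition {
    // next[0] = -1; for j > 0, next[j] is the largest k < j such that
    // P[0..k-1] equals P[j-k..j-1], or 0 when no such k > 0 exists.
    static int[] nextByDefinition(String p) {
        int[] next = new int[p.length()];
        for (int j = 0; j < p.length(); j++) {
            if (j == 0) {
                next[j] = -1;
                continue;
            }
            int best = 0;
            for (int k = j - 1; k > 0; k--) {
                if (p.substring(0, k).equals(p.substring(j - k, j))) {
                    best = k;
                    break; // k is scanned downward, so the first hit is the longest
                }
            }
            next[j] = best;
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nextByDefinition("abcac"))); // prints [-1, 0, 0, 0, 1]
    }
}
```

For j = 4 the prefix "abca" has "a" as both a prefix and a suffix, which is where the single 1 in the table comes from.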

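With the partial match table in hand, the pattern can be shifted to the indicated position.

When a mismatch occurs during matching:

If next[j] >= 0, the target pointer i stays where it is, and the pattern pointer j moves to position next[j] to continue matching;

If next[j] = -1, move i one position to the right, set j to 0, and continue comparing.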
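The two rules can be watched in action by logging each mismatch. The sketch below builds the table with the same loop as the article's getNext and adds a hypothetical traceMismatches method that records where each mismatch happens and where j jumps. It assumes a non-empty pattern.

```java
import java.util.ArrayList;
import java.util.List;

class KmpTrace {
    // Partial match table, built as in the article's getNext (pattern must be non-empty).
    static int[] getNext(String p) {
        int[] next = new int[p.length()];
        next[0] = -1;
        int j = 0, k = -1;
        while (j < p.length() - 1) {
            if (k == -1 || p.charAt(j) == p.charAt(k)) {
                j++;
                k++;
                next[j] = k;
            } else {
                k = next[k];
            }
        }
        return next;
    }

    // Runs KMP and records one line per mismatch, then the final result.
    static List<String> traceMismatches(String t, String p) {
        List<String> log = new ArrayList<>();
        int[] next = getNext(p);
        int i = 0, j = 0;
        while (i < t.length() && j < p.length()) {
            if (j == -1 || t.charAt(i) == p.charAt(j)) {
                i++; // j == -1 means: advance i and restart the pattern at 0
                j++;
            } else {
                log.add("i=" + i + " j=" + j + " -> j=" + next[j]);
                j = next[j]; // i does not move back
            }
        }
        log.add(j >= p.length() ? "match at " + (i - p.length()) : "no match");
        return log;
    }

    public static void main(String[] args) {
        for (String line : traceMismatches("ababcabcacbab", "abcac")) {
            System.out.println(line);
        }
        // prints:
        // i=2 j=2 -> j=0
        // i=6 j=4 -> j=1
        // match at 5
    }
}
```

In both mismatches i keeps its value; only j changes. At i=6 the pattern resumes at j=1 because the "a" at position 5 of the target is already known to match P[0].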

These points are easier to follow alongside the diagram in the original article, which is not reproduced here.

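Performance

Assume the pattern has length m and the target has length n.

Computing the next array takes O(m) time. During matching, the index into the target T never backtracks, so the number of comparisons is O(n).

The total time complexity of the KMP algorithm is therefore O(n+m).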
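The linear behaviour can be observed by counting character comparisons on an input that is a worst case for BF. The sketch below reuses the article's table construction; the class KmpComparisonCount and the helpers countComparisons and repeat are hypothetical names, and a non-empty pattern is assumed. With n = 20 and m = 5, the BF bound m(n-m+1) is 80.

```java
class KmpComparisonCount {
    // Partial match table, built as in the article's getNext (pattern must be non-empty).
    static int[] getNext(String p) {
        int[] next = new int[p.length()];
        next[0] = -1;
        int j = 0, k = -1;
        while (j < p.length() - 1) {
            if (k == -1 || p.charAt(j) == p.charAt(k)) {
                j++;
                k++;
                next[j] = k;
            } else {
                k = next[k];
            }
        }
        return next;
    }

    // Counts the character comparisons KMP makes while looking for the first match.
    static int countComparisons(String t, String p) {
        int[] next = getNext(p);
        int i = 0, j = 0, comparisons = 0;
        while (i < t.length() && j < p.length()) {
            if (j == -1) { // no character is compared in this step
                i++;
                j++;
                continue;
            }
            comparisons++;
            if (t.charAt(i) == p.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j];
            }
        }
        return comparisons;
    }

    // Helper: a string made of n copies of c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder();
        for (int x = 0; x < n; x++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String target = repeat('a', 20); // n = 20
        String pattern = "aaaab";        // m = 5
        System.out.println(countComparisons(target, pattern)); // prints 36
    }
}
```

After the first four matches, each remaining target position costs two comparisons (one mismatch against "b", one match after the jump), so the count grows with n rather than with m times n.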

Code

Java version

public class KMPMatch {

    // Compute the partial match table (pattern must be non-empty)
    public static int[] getNext(String pattern) {
        int j = 0, k = -1;
        int[] next = new int[pattern.length()];
        next[0] = -1;
        while (j < pattern.length() - 1) {
            if (-1 == k || pattern.charAt(j) == pattern.charAt(k)) {
                j++;
                k++;
                next[j] = k;
            } else {
                k = next[k];
            }
        }

        return next;
    }

    // KMP algorithm: returns the index of the first match, or -1 if there is none
    static int kmpMatch(String target, String pattern) {
        if (pattern.isEmpty()) {
            return 0; // an empty pattern matches at position 0; this also keeps getNext off an empty array
        }

        int i = 0, j = 0, index = 0;
        int[] next = getNext(pattern); // compute the partial match table

        while (i < target.length() && j < pattern.length()) {
            if (-1 == j || target.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j]; // on a mismatch, jump to the position given by the table
            }
        }

        if (j >= pattern.length())
            index = i - pattern.length(); // match succeeded: index of the first character of the match
        else
            index = -1; // match failed

        return index;
    }

    // Print a whole sequence, for example the partial match table
    public static void printAll(int[] list) {
        for (int value : list) {
            System.out.print(value + "\t");
        }
        System.out.println();
    }

    public static void main(String[] args) {
        String target = "ababcabcacbab";
        String pattern = "abcac";
        int index = kmpMatch(target, pattern);
        System.out.format("[%s] is in the pos = %d of [%s]%n", pattern, index, target);
    }

}

Java implementation of the KMP algorithm

C++ version

#include <iostream>
#include <string>
#include <vector>

using namespace std;

// Compute the partial match table.
// It is returned as a local vector: a global array named next can clash with
// std::next under "using namespace std", and a fixed size would cap the pattern length.
vector<int> getNext(const string& pattern) {
    vector<int> next(pattern.length(), 0);
    if (next.empty()) {
        return next;
    }
    int j = 0, k = -1;
    next[0] = -1;
    while (j < (int)pattern.length() - 1) {
        if (-1 == k || pattern[j] == pattern[k]) {
            j++;
            k++;
            next[j] = k;
        } else {
            k = next[k];
        }
    }
    return next;
}

// KMP algorithm: returns the index of the first match, or -1 if there is none
int kmpMatch(const string& target, const string& pattern) {
    if (pattern.empty()) {
        return 0; // an empty pattern matches at position 0
    }

    int i = 0, j = 0, index = 0;
    const int n = (int)target.length();
    const int m = (int)pattern.length();
    vector<int> next = getNext(pattern); // compute the partial match table

    while (i < n && j < m) {
        if (-1 == j || target[i] == pattern[j]) {
            i++;
            j++;
        } else {
            j = next[j]; // on a mismatch, jump to the position given by the table
        }
    }

    if (j >= m)
        index = i - m; // match succeeded: index of the first character of the match
    else
        index = -1; // match failed

    return index;
}

void print(const string& target, const string& pattern, int index) {
    if (-1 != index) {
        cout << "[" << pattern << "] is in the Pos = " << index << " of [" << target << "]" << endl;
    } else {
        cout << "[" << pattern << "] is not in the [" << target << "]" << endl;
    }
}

int main()
{
    string target = "ababcabcacbab";
    string pattern = "abcac";
    int index = kmpMatch(target, pattern);
    print(target, pattern, index);
    return 0;
}

C++ implementation of the KMP algorithm

Output

[abcac] is in the pos = 5 of [ababcabcacbab]

