Introduction

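String matching usually comes in two flavors. The first is regular-expression matching, where the pattern contains special characters. For example, '*' follows a character in the pattern and means that this character may appear any number of times (including zero).

The second is wildcard matching, which is what we use when searching for files in an operating system. In "*.pdf", for instance, '*' no longer denotes a repetition count; it is a wildcard that matches any string of any characters, of any length. So "*.pdf" means: find all PDF files.

Algorithm problems often ask you to simulate this kind of matching. To keep the implementation feasible in the time available, they cut the set of wildcards or regex special characters down to just a few. Even so, these problems count as fairly hard.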

Regular expression matching

The example problem:

Regular Expression Matching

http://basicalgos.blogspot.com/2012/03/10-regular-expression-matching.html

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

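I ran into this problem in a Facebook interview.

There are two special characters to handle, '*' and '.'. The second is easy; the first is the troublesome one.

Because '*' can stand for any number of repetitions, when *(p+1) == '*' we have two options: skip the character before the '*' together with the '*' itself (p += 2), or, as long as *s == *p or *p == '.', consume any number of such characters from s. Handling '*' therefore splits into several subproblems, one for each number of characters of s consumed, and I use a recursive function to solve them. My code at the time was not this concise; this is the version after I revised it: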

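bool isMatch(const char *s, const char *p){
    // An exhausted pattern matches only an exhausted string.
    if(*p == '\0')
        return *s == '\0';

    if(*(p+1) == '*'){
        // Let "x*" consume 0, 1, 2, ... matching characters of s,
        // trying the rest of the pattern after each choice.
        while(*s != '\0' && (*p == *s || *p == '.')){
            if(isMatch(s, p+2))
                return true;
            ++s;
        }
        // *s no longer matches *p (or s is exhausted): skip "x*" entirely.
        return isMatch(s, p+2);
    }

    // Ordinary character or '.': must match exactly one character of s.
    if(*s != '\0' && (*s == *p || *p == '.'))
        return isMatch(s+1, p+1);

    return false;
}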
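
The recursion above can revisit the same pair of positions in s and p many times. As an alternative, which is not the approach described in this post, the same two rules can be tabulated bottom-up. This is a minimal sketch under a few assumptions: it takes std::string instead of C strings, and the name isMatchDP is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bottom-up table for the same '.' / '*' rules.
// dp[i][j] is true when the suffix s[i..] matches the pattern suffix p[j..].
bool isMatchDP(const std::string &s, const std::string &p) {
    const std::size_t n = s.size(), m = p.size();
    std::vector<std::vector<bool>> dp(n + 1, std::vector<bool>(m + 1, false));
    dp[n][m] = true;  // empty string against empty pattern

    for (std::size_t i = n + 1; i-- > 0;) {      // i = n, n-1, ..., 0
        for (std::size_t j = m; j-- > 0;) {      // j = m-1, ..., 0
            // Does the current pattern character match the current string character?
            const bool first = i < n && (p[j] == s[i] || p[j] == '.');
            if (j + 1 < m && p[j + 1] == '*') {
                // Either skip "x*" entirely, or consume one character and stay on "x*".
                dp[i][j] = dp[i][j + 2] || (first && dp[i + 1][j]);
            } else {
                dp[i][j] = first && dp[i + 1][j + 1];
            }
        }
    }
    return dp[0][0];
}
```

Each table cell is filled once, so the work is proportional to the product of the two lengths, at the cost of the extra table.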

Wildcard matching

We take a LeetCode problem as the example.

Wildcard Matching

Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

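There are two wildcards: '?' and '*'.

Because '*' can match any string, the problem again splits into subproblems. My first idea was to handle them recursively on meeting a '*', just as in the previous problem.

Code: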

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        if(*s == '\0'){
            if(*p == '\0') return true;
            if(*p != '*') return false;
        }
        if(*p == '?') return isMatch(++s, ++p);
        else if(*p == '*'){
            while(*(++p) == '*');
            for(; *s != '\0'; ++s){
                if(isMatch(s, p)) return true;
            }
            return isMatch(s, p);
        }else{
            if(*p == *s) return isMatch(++s, ++p);
            return false;
        }
        return false;
    }
};

But this approach exceeds the time limit.

To save time, I traded space for time and used rec[][] to record the comparison results.

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        int lens = 0, lenp = 0;
        for(const char *s1 = s; *s1 != '\0'; ++s1) ++lens;
        for(const char *p1 = p; *p1 != '\0'; ++p1) ++lenp;
        // rec[i][j] caches the result for s+i against p+j:
        // -1 = not computed yet, 0 = no match, 1 = match.
        rec = new int*[lens+1];
        for(int i = 0; i <= lens; ++i){
            rec[i] = new int[lenp+1];
            for(int j = 0; j <= lenp; ++j){
                rec[i][j] = -1;
            }
        }
        bool result = isMatchCore(s, s, p, p);
        for(int i = 0; i <= lens; ++i) delete[] rec[i];
        delete[] rec;
        return result;
    }
private:
    int** rec;
    bool isMatchCore(const char *oris, const char *s, const char *orip, const char *p) {
        if(*s == '\0'){
            if(*p == '\0') return true;
            if(*p != '*') return false;
        }
        int &memo = rec[s-oris][p-orip];
        if(memo >= 0) return memo == 1;
        bool matched = false;
        if(*p == '?') matched = isMatchCore(oris, s+1, orip, p+1);
        else if(*p == '*'){
            const char *next = p;
            while(*(++next) == '*');    // collapse consecutive '*'
            // Try the rest of the pattern against every suffix of s,
            // including the empty one.
            for(const char *t = s; ; ++t){
                if(isMatchCore(oris, t, orip, next)){ matched = true; break; }
                if(*t == '\0') break;
            }
        }else{
            matched = (*p == *s) && isMatchCore(oris, s+1, orip, p+1);
        }
        memo = matched;    // store the result so this state is solved only once
        return matched;
    }
};

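It still timed out.

The reason is that even with memoized recursion, each '*' in p still requires considering every possible way to match what follows it. Take p = "c*ab*c" and s = "cddabbac": on reaching the first '*', the recursion has to try the rest of p, "ab*c", against every suffix of the rest of s, "ddabbac". That is: "ab*c" against "ddabbac", "ab*c" against "dabbac", "ab*c" against "abbac", ..., "ab*c" against "c", and "ab*c" against the empty string.

The same happens at the second '*'. Every '*' means that the rest of p has to be matched against all suffixes of the rest of s.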
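
To make this cost visible, the plain recursion from the first attempt can be instrumented with a call counter. This is a sketch for illustration only: the names naiveMatch, countCalls and g_calls are hypothetical, and the counter measures function entries, not running time.

```cpp
#include <cassert>
#include <string>

// Hypothetical instrumentation: the same recursion as the first wildcard
// attempt, plus a counter of how many times the function is entered.
static long long g_calls = 0;

bool naiveMatch(const char *s, const char *p) {
    ++g_calls;
    if (*s == '\0') {
        if (*p == '\0') return true;
        if (*p != '*') return false;
    }
    if (*p == '?') return naiveMatch(s + 1, p + 1);
    if (*p == '*') {
        while (*(++p) == '*') {}            // collapse consecutive '*'
        for (; *s != '\0'; ++s) {           // try every non-empty suffix of s
            if (naiveMatch(s, p)) return true;
        }
        return naiveMatch(s, p);            // and finally the empty suffix
    }
    return *p == *s && naiveMatch(s + 1, p + 1);
}

// Number of calls needed to decide whether s matches p.
long long countCalls(const std::string &s, const std::string &p) {
    g_calls = 0;
    naiveMatch(s.c_str(), p.c_str());
    return g_calls;
}
```

Calling countCalls with a pattern such as "*a*a*a*b" and strings made only of 'a' shows the count rising quickly as the string gets longer, because every '*' restarts the search over all remaining suffixes.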

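On closer thought, though, when p contains more than one '*' we do not need to try every suffix like this.

Again take p = "c*ab*c" and s = "cddabbac".

For p = "c*ab*c", any s that it matches must look like "c....ab.....c", where each ellipsis stands for zero or more characters. The only troublesome part is the "ab" in the middle of p, which has to be matched by an "ab" in s. So as long as an "ab" exists in the middle of s, everything after it can be left to the following '*'.

In other words, as we compare p and s character by character and reach the first '*' in p, all we really need to do is keep searching the rest of s for a part that matches "ab".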
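
The observation can be sketched directly for the simplified case where the pattern contains only literal characters and '*' (no '?'). That restriction is an assumption made here to keep the example short, and matchBySegments is a made-up name: the pattern is split on '*', the first piece must be a prefix of s, the last piece must be a suffix, and each middle piece is located at its leftmost possible position.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch for patterns made only of literal characters and '*' (no '?').
bool matchBySegments(const std::string &s, const std::string &p) {
    // Split p on '*' into literal segments. Empty segments are kept so that
    // a leading or trailing '*' shows up as an empty first or last segment.
    std::vector<std::string> segs(1);
    for (char c : p) {
        if (c == '*') segs.emplace_back();
        else segs.back().push_back(c);
    }
    if (segs.size() == 1) return s == p;  // no '*' at all: plain comparison

    const std::string &head = segs.front();
    const std::string &tail = segs.back();
    if (s.size() < head.size() + tail.size()) return false;
    if (s.compare(0, head.size(), head) != 0) return false;
    if (s.compare(s.size() - tail.size(), tail.size(), tail) != 0) return false;

    // Middle segments: take the leftmost occurrence of each, in order,
    // inside the window that remains between head and tail.
    std::size_t pos = head.size();
    const std::size_t limit = s.size() - tail.size();
    for (std::size_t k = 1; k + 1 < segs.size(); ++k) {
        const std::size_t found = s.find(segs[k], pos);
        if (found == std::string::npos || found + segs[k].size() > limit) return false;
        pos = found + segs[k].size();
    }
    return true;
}
```

For p = "c*ab*c" the segments are "c", "ab" and "c", so the only search performed is for the first "ab" after the leading 'c'. Taking the leftmost occurrence is safe because it leaves the most room for whatever follows.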

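Put differently, when we meet a '*' we record the current positions in p and s, calling them presp (the position just after the '*') and press, and then keep comparing character by character, advancing p and s together. If we find *p != *s, we backtrack: p = presp, s = ++press. This continues until we reach the end or meet the next '*'. Meeting the next '*' means the "ab" part is settled and the rest is left to the second '*'. If p and s both reach the end, we return true. If we reach the end without meeting a new '*', a mismatch remains, and press has also reached the end, we return false.

Compared with the recursion above, the main differences are:

On meeting a '*', we only consider the subproblem up to the next '*', not the subproblem all the way to the end. This avoids computing a large number of subproblems.

By recording presp and press and backtracking to them, we avoid recursion altogether.

Code: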

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        const char *presp = NULL, *press = NULL;    //previous starting comparison place after * in s and p.
        bool startFound = false;
        while(*s != '\0'){
            if(*p == '?'){++s; ++p;}
            else if(*p == '*'){
                presp = ++p;
                press = s;
                startFound = true;
            }else{
                if(*p == *s){
                    ++p;
                    ++s;
                }else if(startFound){
                    p = presp;
                    s = (++press);
                }else return false;
            }
        }
        while(*p == '*') ++p;
        return *p == '\0';
    }
};
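
To watch the backtracking on the running example, the loop above can be instrumented with a counter. This is a sketch for illustration: isMatchTraced and its backtracks parameter are hypothetical additions, and a null presp stands in for the startFound flag.

```cpp
#include <cassert>

// Same loop as above, with a hypothetical out-parameter that counts how many
// times the scan falls back to the position recorded at the last '*'.
bool isMatchTraced(const char *s, const char *p, int *backtracks) {
    const char *presp = nullptr;  // position in p just after the last '*'
    const char *press = nullptr;  // position in s where that '*' currently ends
    *backtracks = 0;
    while (*s != '\0') {
        if (*p == '?') { ++s; ++p; }
        else if (*p == '*') { presp = ++p; press = s; }
        else if (*p == *s) { ++p; ++s; }
        else if (presp != nullptr) {
            // Mismatch: let the last '*' absorb one more character of s.
            p = presp;
            s = ++press;
            ++*backtracks;
        }
        else return false;
    }
    while (*p == '*') ++p;  // trailing '*' can match the empty string
    return *p == '\0';
}
```

With s = "cddabbac" and p = "c*ab*c", the scan backtracks twice after the first '*' (skipping "dd" before finding "ab") and twice after the second '*' (skipping "ba" before the final 'c'), four times in total, and never revisits the first '*' once the second has been reached.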
