Subsequence problems, e.g. longest common subsequence and [LeetCode] Distinct Subsequences (counting subsequences)

Reposted article. 2014-07-29 11:38. Categories: Data Structures & Algorithms / Algorithm / LeetCode

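Introduction

A subsequence differs from a substring (or a contiguous subset) in that its elements do not have to be contiguous in the original sequence; they only have to keep their relative order.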
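
To make the distinction concrete, here is a minimal sketch of the two checks. The helper names isSubsequence and isSubstring are hypothetical and only serve as illustration.

```cpp
#include <cstddef>
#include <string>

// Returns true if `sub` can be obtained from `seq` by deleting zero or more
// characters without reordering the rest (hypothetical helper).
bool isSubsequence(const std::string& sub, const std::string& seq) {
    std::size_t k = 0;                       // next character of sub to match
    for (std::size_t i = 0; i < seq.size() && k < sub.size(); ++i)
        if (seq[i] == sub[k]) ++k;           // greedy match, gaps are allowed
    return k == sub.size();
}

// Returns true if `sub` occurs in `seq` as one contiguous block.
bool isSubstring(const std::string& sub, const std::string& seq) {
    return seq.find(sub) != std::string::npos;
}
```

With these, "ACE" is a subsequence of "ABCDE" but not a substring of it, while "BCD" is both.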

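Most subsequence problems call for dynamic programming (DP), so finding the state transition is the key.

This post collects two common subsequence problems and their solutions.

Example 1: Longest common subsequence

Let's first review the longest common substring, which is also solved with DP. For strings s1 and s2, let m[i][j] be the length of the longest common substring that ends exactly at s1[i] and at s2[j]. Because a common substring must be contiguous, the state transition is m[i][j] = (s1[i] == s2[j] ? m[i-1][j-1] + 1 : 0). Computing m[i][j] needs only m[i-1][j-1] and nothing earlier, so there is no need for a full two-dimensional array m[s1.length()][s2.length()]: a one-dimensional array holding the previous row, plus a temporary for m[i-1][j-1], is enough.

Code: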

#include <cstring>
#include <vector>

// Returns the longest common substring of s1 and s2 as a new[]-allocated
// C string (the caller must delete[] it), or NULL if there is none.
char *LCString(const char* s1, const char* s2){
    if(NULL == s1 || NULL == s2)
        return NULL;
    int size1 = (int)strlen(s1), size2 = (int)strlen(s2);

    int maxlen = 0, maxend = 0, i = 0, j = 0;
    std::vector<int> m2(size2, 0);   // m2[j] holds m[i-1][j] from the previous row
    for(i = 0; i < size1; ++i){
        int tmpPre = 0;              // m[i-1][j-1]; 0 when j == 0
        for(j = 0; j < size2; ++j){
            // A mismatch resets the run length to 0.
            int len = (s1[i] == s2[j] ? tmpPre + 1 : 0);
            if(len > maxlen) { maxlen = len; maxend = i;}
            tmpPre = m2[j];          // save m[i-1][j] before overwriting it
            m2[j] = len;
        }
    }

    if(maxlen > 0){ // extract the longest common substring
        char* lcs = new char[maxlen + 1];
        for(i = 0; i < maxlen; ++i)
            lcs[maxlen - i - 1] = s1[maxend - i];
        lcs[maxlen] = '\0';
        return lcs;
    }
    return NULL;
}

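So how do we find the longest common subsequence?

First, if we store lengths in m[][], extracting the longest subsequence at the end takes more work, because a subsequence is not contiguous. It is more work, but still doable.

Next, let m[i][j] again denote the length of the longest common subsequence of the prefix of s1 ending at s1[i] and the prefix of s2 ending at s2[j].

Then:

If s1[i] == s2[j], m[i][j] = m[i-1][j-1] + 1;

If s1[i] != s2[j], m[i][j] = Max(m[i-1][j], m[i][j-1]).

Here computing m[i][j] may need any of m[i-1][j], m[i][j-1], and m[i-1][j-1], so let's play it safe and use a two-dimensional array.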

#include <cstddef>

// Returns one longest common subsequence of list1 and list2 as a new[]-allocated
// array (the caller must delete[] it). T must support operator==.
// If outLen is given, the length of the result is written to *outLen.
template <typename T> T* Lcseq(T* list1, int size1, T* list2, int size2, int* outLen = NULL){
    if(outLen) *outLen = 0;
    if(NULL == list1 || NULL == list2 || size1 <= 0 || size2 <= 0)
        return NULL;
    int** m = new int*[size1];
    int i = 0, j = 0;
    int max = 0, maxi = 0, maxj = 0;
    for(; i < size1; i++){
        m[i] = new int[size2];
        for(j = 0; j < size2; j++){
            if(i == 0 && j == 0) m[0][0] = (list1[0] == list2[0] ? 1 : 0);
            else if(i == 0) m[i][j] = (list1[i] == list2[j] ? 1 : m[i][j-1]);
            else if(j == 0) m[i][j] = (list1[i] == list2[j] ? 1 : m[i-1][j]);
            else m[i][j] = (list1[i] == list2[j] ? m[i-1][j-1] + 1 : (m[i][j-1] > m[i-1][j] ? m[i][j-1] : m[i-1][j]));
            if(m[i][j] > max){
                max = m[i][j];
                maxi = i;
                maxj = j;
            }
        }
    }

    // Extract the longest common subsequence by walking back from (maxi, maxj).
    int p1 = maxi, p2 = maxj, p = max;
    T* sub = new T[max];
    while(p1 >= 0 && p2 >= 0){
        if(list1[p1] == list2[p2]){
            sub[--p] = list1[p1];   // matched element: take it and move diagonally
            p1--;
            p2--;
        }
        else{
            if(p1 == 0) p2--;
            else if(p2 == 0) p1--;
            else{
                // Move toward the neighbour with the larger LCS length.
                if(m[p1-1][p2] < m[p1][p2-1]) p2--;
                else p1--;
            }
        }
    }
    for(i = 0; i < size1; i++) delete[] m[i];   // release the DP table
    delete[] m;
    if(outLen) *outLen = max;
    return sub;
}
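
When only the length is needed and the subsequence itself does not have to be recovered, the full table is not required: each row depends only on the row above it. The following is a minimal sketch of that variation, not part of the original solution. It assumes std::string inputs, the function name lcsLength is hypothetical, and index 0 stands for the empty prefix.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Length of the longest common subsequence of a and b, using two rows.
// prev is row i-1 of the table, cur is row i; column 0 is the empty prefix.
int lcsLength(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1])
                cur[j] = prev[j - 1] + 1;                 // extend the diagonal
            else
                cur[j] = std::max(prev[j], cur[j - 1]);   // drop one character
        }
        std::swap(prev, cur);   // the finished row becomes the previous row
    }
    return prev[b.size()];
}
```

The trade-off is that the traceback used above to extract the subsequence is no longer possible, since the earlier rows are discarded.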

Example 2: Counting subsequences (LeetCode)

Distinct Subsequences

Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Here is an example:
S = "rabbbit", T = "rabbit"

Return 3.

class Solution {
public:
    int numDistinct(string S, string T) {
    }
};
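
To see where the 3 in the example comes from, the matches can be listed explicitly. The brute-force sketch below is for illustration only and far too slow for real inputs. The names collectMatches and allMatches are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects every way to pick positions in S, in increasing order, that spell out T.
// `picked` holds the positions chosen so far; `si` is the first position still allowed.
void collectMatches(const std::string& S, const std::string& T, std::size_t si,
                    std::vector<std::size_t>& picked,
                    std::vector<std::vector<std::size_t> >& out) {
    if (picked.size() == T.size()) { out.push_back(picked); return; }
    for (std::size_t i = si; i < S.size(); ++i) {
        if (S[i] == T[picked.size()]) {
            picked.push_back(i);
            collectMatches(S, T, i + 1, picked, out);
            picked.pop_back();   // backtrack and try a later position
        }
    }
}

// Returns one list of positions in S per distinct occurrence of T.
std::vector<std::vector<std::size_t> > allMatches(const std::string& S, const std::string& T) {
    std::vector<std::vector<std::size_t> > out;
    std::vector<std::size_t> picked;
    collectMatches(S, T, 0, picked, out);
    return out;
}
```

For S = "rabbbit" and T = "rabbit" this yields three position lists, (0,1,2,3,5,6), (0,1,2,4,5,6) and (0,1,3,4,5,6). Each one skips a different one of the three b's.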

Without DP, memoized recursion also works, but it is slower and the recursion needs extra stack space.

#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int numDistinct(string S, string T) {
        if(T.length() == 0) return 1;
        if(S.length() == 0) return 0;

        // rec[p1][p2] caches the answer for the suffixes S[p1..] and T[p2..]; -1 means not computed yet.
        rec.assign(S.length(), vector<int>(T.length(), -1));
        return numDistinctCore(S, T, 0, 0);
    }

    // Counts the ways T[p2..] occurs as a subsequence of S[p1..].
    int numDistinctCore(const string& S, const string& T, size_t p1, size_t p2) {
        if(p2 == T.length()) return 1;                        // all of T matched
        if((T.length()-p2) > (S.length()-p1)) return 0;       // not enough of S left
        if(rec[p1][p2] >= 0) return rec[p1][p2];
        int sum = 0;
        // Try every position in S that can match T[p2].
        for(size_t i = p1; i < S.length(); ++i){
            if(S[i] == T[p2])
                sum += numDistinctCore(S, T, i+1, p2+1);
        }
        rec[p1][p2] = sum;
        return sum;
    }
private:
    vector<vector<int> > rec;
};

The original submission was accepted (AC) in 388 ms.

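With DP, we again let rec[i][j] be the number of times the prefix of T ending at T[j] occurs as a subsequence of the prefix of S ending at S[i].

Since the prefix ending at S[i] contains the prefix ending at S[i-1], rec[i][j] is at least rec[i-1][j]. In addition, if S[i] == T[j], S[i] can be matched with T[j], and the number of subsequences in that case is rec[i-1][j-1].

rec[i][j] = rec[i-1][j] + (S[i] == T[j] ? rec[i-1][j-1] : 0).

The code shifts both indices by one, so that row 0 and column 0 stand for the empty prefix and rec[i][0] = 1 for every i.

Code: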

#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int numDistinct(string S, string T) {
        int slen = S.length(), tlen = T.length();
        if(slen < tlen) return 0;
        // rec[i][j]: ways the first j characters of T occur in the first i characters of S.
        vector<vector<int> > rec(slen + 1, vector<int>(tlen + 1, 0));
        int i, j;
        // The empty T occurs exactly once in every prefix of S.
        for(i = 0; i <= slen; ++i) rec[i][0] = 1;

        for(i = 1; i <= slen; ++i){
            for(j = 1; j <= tlen; ++j){
                rec[i][j] = (rec[i-1][j] + (S[i-1] == T[j-1] ? rec[i-1][j-1] : 0));
            }
        }

        return rec[slen][tlen];
    }
};

Accepted in 52 ms, a big improvement.
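
To watch the recurrence at work, it helps to look at the whole table for a tiny input. The sketch below returns the table instead of just the last cell. The name countTable is hypothetical, and long long is used here only as an assumption to keep intermediate counts from overflowing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds the full (|S|+1) x (|T|+1) table so individual cells can be inspected.
// rec[i][j] = number of ways the first j characters of T occur as a
// subsequence of the first i characters of S.
std::vector<std::vector<long long> > countTable(const std::string& S, const std::string& T) {
    std::vector<std::vector<long long> > rec(
        S.size() + 1, std::vector<long long>(T.size() + 1, 0));
    for (std::size_t i = 0; i <= S.size(); ++i)
        rec[i][0] = 1;   // the empty T occurs once in every prefix of S
    for (std::size_t i = 1; i <= S.size(); ++i)
        for (std::size_t j = 1; j <= T.size(); ++j)
            rec[i][j] = rec[i - 1][j] + (S[i - 1] == T[j - 1] ? rec[i - 1][j - 1] : 0);
    return rec;
}
```

For S = "bbb" and T = "bb" the rows come out as [1,0,0], [1,1,0], [1,2,1], [1,3,3]. The last cell is 3, the number of ways to choose two of the three b's.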

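The solution above uses a two-dimensional array. Later I came across 小磊哥's solution to this problem. It iterates j from the end of T, so rec[i][j] either stays equal to rec[i-1][j], i.e. unchanged, or has rec[i-1][j-1] added. Because j runs from the end toward the front, rec[i-1][j-1] has not been overwritten yet when it is read. This removes the need for the two-dimensional array; a one-dimensional array does the job. match[j] is the number of subsequences matching the prefix of T ending at T[j].

Code: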

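#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int numDistinct(string S, string T) {
        if(S.size() < T.size()) return 0;
        // match[j]: ways the first j characters of T occur in the part of S seen so far.
        vector<int> match(T.size() + 1, 0);
        match[0] = 1;   // the empty prefix of T always matches once
        for(size_t i = 1; i <= S.size(); ++i)
            // Go right to left so match[j-1] still holds the value from row i-1.
            for(size_t j = T.size(); j >= 1; --j)
                if(S[i-1] == T[j-1])
                    match[j] += match[j-1];
        return match[T.size()];
    }
};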

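This again uses the idea from the previous post: iterating from back to front so that values are not overwritten before they are read.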
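
The effect of the iteration order is easy to check on a small input. The sketch below runs the same one-dimensional update in either direction. The name countWithOrder and its backward flag are hypothetical, and the forward branch is deliberately wrong to show what goes missing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One-dimensional counting DP. With backward == false, j runs left to right,
// which is deliberately wrong: match[j-1] has already been updated for the
// current character of S, so that character gets used more than once.
long long countWithOrder(const std::string& S, const std::string& T, bool backward) {
    std::vector<long long> match(T.size() + 1, 0);
    match[0] = 1;
    for (std::size_t i = 0; i < S.size(); ++i) {
        if (backward) {
            for (std::size_t j = T.size(); j >= 1; --j)
                if (S[i] == T[j - 1]) match[j] += match[j - 1];
        } else {
            for (std::size_t j = 1; j <= T.size(); ++j)
                if (S[i] == T[j - 1]) match[j] += match[j - 1];
        }
    }
    return match[T.size()];
}
```

For S = "aa" and T = "aa" the backward order gives the correct count of 1, while the forward order gives 3.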

Accepted in 16 ms. Impressive, to say the least.
