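Problem

Given strings S and T, count the number of distinct subsequences of S that are equal to T.

A subsequence of a string is a new string formed from the original by deleting some letters (possibly none) without changing the relative order of the remaining letters. For example, ACE is a subsequence of ABCDE, but AEC is not.

Here is an example:

S = "rabbbit", T = "rabbit"

The return value should be 3.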
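To make "distinct" concrete: two subsequences count as different when they keep different positions of S, even if the resulting strings are identical. The following brute-force sketch checks the example by trying every subset of positions. The name countByBruteForce is a hypothetical helper, not part of the original solution, and the approach is exponential, so it is only meant for very short inputs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): counts the sets of positions
// of S whose letters, read in order, spell T. Assumes S.size() < 32.
int countByBruteForce(const std::string& S, const std::string& T)
{
    int count = 0;
    const unsigned int limit = 1u << S.size();
    for (unsigned int mask = 0; mask < limit; ++mask)
    {
        // Bit i of mask decides whether S[i] is kept or deleted.
        std::string kept;
        for (std::size_t i = 0; i < S.size(); ++i)
        {
            if (mask & (1u << i))
            {
                kept += S[i];
            }
        }
        if (kept == T)
        {
            ++count;
        }
    }
    return count;
}
```

For S = "rabbbit" and T = "rabbit" this returns 3, one for each of the three b's that can be deleted.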

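Initial approach

To count the subsequences, we first need a way to find a subsequence of S that equals T. If T is a subsequence of S, every letter of T must occur in S, which can be checked by iterating over the letters of T. Because the order must be preserved, each letter must be found in S at a position after the one used for the previous letter. This gives the following algorithm: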

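Loop over the letters of T
    If the current letter occurs in S at a position greater than the position in S of the previous letter
        Continue the loop
    Otherwise
        End the loop
If the loop reached the end of T, T is a subsequence of S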
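The check described above can be sketched in C++ as follows. This is a minimal illustration, not the author's code: isSubsequence is a hypothetical name, and it always takes the earliest usable occurrence of each letter, which is enough to decide whether at least one match exists.

```cpp
#include <string>

// Hypothetical helper: returns true when T can be obtained from S by deleting letters.
bool isSubsequence(const std::string& S, const std::string& T)
{
    std::string::size_type searchFrom = 0; // first index of S that may still be used
    for (char letter : T)
    {
        const std::string::size_type found = S.find(letter, searchFrom);
        if (found == std::string::npos)
        {
            return false; // letter is missing, or occurs only before searchFrom
        }
        searchFrom = found + 1; // the next letter must come strictly later
    }
    return true;
}
```

With the example from the problem statement, isSubsequence("ABCDE", "ACE") is true and isSubsequence("ABCDE", "AEC") is false.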

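The algorithm above works for deciding whether a single subsequence T exists in S, but it breaks down when a letter of T occurs several times in S. Multiple occurrences mean that the search branches. Take S: doggy, T: dog. When we reach the letter g, we can pick either of the two g's in S, so there are two branches. Branching suggests recursion: replace the loop over T with a recursive call that advances the position in T by 1 each time and terminates when it passes the end of T. With this change, every time the termination condition is reached one subsequence has been found, which is exactly what we need in order to count them.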

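FindSubsequence(T, position in T of the letter to find, position in S of the previous letter)
    If the position in T of the letter to find is past the last letter of T
        Increase the subsequence count by 1
        Return
    If the current letter occurs in S
        Loop over all positions of that letter in S
            If the current position <= the position in S of the previous letter
                Continue the loop
            FindSubsequence(T, position in T of the letter to find + 1, current position)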

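In the pseudocode above, checking whether the current letter occurs in S and fetching its positions in S is done very often. For the implementation we should therefore use a structure with fast lookup, such as an associative container (for example map). An array indexed by letter also works and is even faster to look up, but it has to account for upper and lower case, non-English letters and so on. The letter serves as the key of the associative container, and a sequence container holding the positions (for example vector) serves as the value. Before the actual computation we iterate over S once to build this container, after which S itself is no longer needed. The resulting code is: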

#include <map>
#include <string>
#include <vector>

class Solution {
    public:
        int numDistinct(std::string S, std::string T)
        {
            // If T is at least as long as S, it can only match when the two are equal.
            if(T.size() >= S.size())
            {
                return S == T ? 1 : 0;
            }

            positionInfo_.clear();
            count_ = 0;

            // Record, for every letter of S, the ascending list of positions where it occurs.
            for(int i = 0; i < static_cast<int>(S.size()); ++i)
            {
                positionInfo_[S[i]].push_back(i);
            }

            FindDistinct(T, 0, -1);

            return count_;
        }

    private:

        // pos: index in T of the letter to match next.
        // previousPosInS: position in S chosen for the previous letter (-1 if none).
        void FindDistinct(const std::string& T, int pos, int previousPosInS)
        {
            // Every letter of T has been matched: one more subsequence found.
            // The signed comparison avoids the unsigned wrap-around of T.size() - 1.
            if(pos >= static_cast<int>(T.size()))
            {
                ++count_;
                return;
            }

            const auto iter = positionInfo_.find(T[pos]);

            // The letter does not occur in S at all, so this branch yields nothing.
            if(iter == positionInfo_.end())
            {
                return;
            }

            for(auto posIter = iter->second.begin(); posIter != iter->second.end(); ++posIter)
            {
                // Skip occurrences at or before the previously chosen position.
                if(*posIter <= previousPosInS)
                {
                    continue;
                }

                FindDistinct(T, pos + 1, *posIter);
            }
        }

        std::map<char, std::vector<int>> positionInfo_;

        int count_;
    };


After submission, Judge Small passed without trouble, but Judge Large timed out.
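As an aside on the lookup structure: the array indexed by letter, mentioned earlier as an alternative to map, could look roughly like the sketch below. PositionTable and buildPositionTable are hypothetical names introduced for illustration. The sketch assumes one slot per possible unsigned char value, which sidesteps the question of upper case, lower case and non-English bytes at the cost of 256 mostly empty vectors.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical alias: one list of positions per possible unsigned char value.
using PositionTable = std::array<std::vector<int>, 256>;

// Hypothetical helper: builds the table of positions for every byte of S.
PositionTable buildPositionTable(const std::string& S)
{
    PositionTable table; // every vector starts out empty
    for (int i = 0; i < static_cast<int>(S.size()); ++i)
    {
        // Cast to unsigned char so that negative char values cannot index out of range.
        table[static_cast<unsigned char>(S[i])].push_back(i);
    }
    return table;
}
```

A letter that never occurs in S simply has an empty vector, so no separate "is it present" check is needed before looping over its positions.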

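Optimization

From earlier problems we already have some experience with optimizing recursive computations: it comes down to caching results so that the same work is not repeated in different branches of the recursion. Let us trace the recursion with the S and T from the example.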
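One way to see the recursion for the example is to record every choice it makes. The sketch below is a hypothetical tracing helper, not part of the original solution: it appends one line per chosen letter, indented by the position in T, in the form letter@position-in-S, and a line "match" whenever the end of T is reached. It scans S directly instead of using the position table, which is an assumption made to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tracing helper: logs which position of S is chosen for T[pos].
void traceChoices(const std::string& S, const std::string& T,
                  std::size_t pos, int previousPosInS,
                  std::vector<std::string>& lines)
{
    const std::string indent(2 * pos, ' ');
    if (pos == T.size())
    {
        lines.push_back(indent + "match");
        return;
    }
    // Try every occurrence of T[pos] located after the previously chosen position.
    for (int i = previousPosInS + 1; i < static_cast<int>(S.size()); ++i)
    {
        if (S[i] == T[pos])
        {
            lines.push_back(indent + T[pos] + "@" + std::to_string(i));
            traceChoices(S, T, pos + 1, i, lines);
        }
    }
}
```

Calling traceChoices("rabbbit", "rabbit", 0, -1, lines) produces three "match" lines, and the suffix i@5, t@6 is explored three separate times, once under each pair of chosen b's.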

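Writing b1, b2 and b3 for the three b's of S, the recursion repeats work from pos 4 of T onward: once the branch rab1b2 has been explored, we already know that only 1 subsequence remains after taking i. It therefore looks as if the pos in T could serve as the cache key, recording count[4] = 1 while exploring rab1b2. Later, when the rab1b3 branch reaches the pos 3 level, it can look up the count 1 in the table without recursing into the pos 4 level, and nothing seems wrong. The hidden error shows up when we return to rab1 at pos 2, where we record count[3] = 2. Next we process rab2 at the pos 2 level, and the table lookup gives count[3] = 2 subsequences. This is clearly wrong: continuing the recursion from rab2 does not yield two subsequences.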
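The failure can be reproduced with a deliberately flawed sketch that caches by the position in T alone. The names countFromPosOnly and countWithPosOnlyCache are hypothetical and exist only to illustrate the mistake described above.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Deliberately WRONG sketch: the cache key ignores which position of S was chosen.
int countFromPosOnly(const std::string& S, const std::string& T,
                     std::size_t pos, int previousPosInS,
                     std::map<std::size_t, int>& cache)
{
    if (pos == T.size())
    {
        return 1;
    }
    const auto hit = cache.find(pos);
    if (hit != cache.end())
    {
        return hit->second; // reused even though previousPosInS may differ
    }
    int count = 0;
    for (int i = previousPosInS + 1; i < static_cast<int>(S.size()); ++i)
    {
        if (S[i] == T[pos])
        {
            count += countFromPosOnly(S, T, pos + 1, i, cache);
        }
    }
    cache[pos] = count;
    return count;
}

// Hypothetical wrapper that starts the flawed recursion with an empty cache.
int countWithPosOnlyCache(const std::string& S, const std::string& T)
{
    std::map<std::size_t, int> cache;
    return countFromPosOnly(S, T, 0, -1, cache);
}
```

For S = "rabbbit" and T = "rabbit" this returns 6 instead of 3: the value 2 cached for pos 3 under rab1 is reused for rab2 and rab3, where the true counts are 1 and 0.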

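Analyzing the cause, we find that the number of subsequences from a given point depends not only on the current position in T but also on the position in S of the letter chosen at that point. So after finishing the rab1 recursion the cache entry should be count[2, 2] = 2 (the second 2 is the position of the first b in S). When rab2 is processed, the key [2, 3] is not in the cache, and the recursion yields the correct value 1. The count[4] = 1 entry mentioned earlier becomes count[4, 5] = 1 (5 is the position of i in S) and does not affect the result.

Now we can write the code. Because results have to be cached, the original needs a small change: the number of subsequences is no longer accumulated in a member variable but passed back as a return value. Only then can intermediate results from different branches of the recursion be cached. For the cache itself, a std::map<std::pair<int, int>, int> is sufficient. The finished code is: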

#include <map>
#include <string>
#include <utility>
#include <vector>

class SolutionV3 {
    public:
        int numDistinct(std::string S, std::string T)
        {
            // If T is at least as long as S, it can only match when the two are equal.
            if(T.size() >= S.size())
            {
                return S == T ? 1 : 0;
            }

            positionInfo_.clear();
            cachedResult_.clear();

            // Record, for every letter of S, the ascending list of positions where it occurs.
            for(int i = 0; i < static_cast<int>(S.size()); ++i)
            {
                positionInfo_[S[i]].push_back(i);
            }

            return FindDistinct(T, 0, -1);
        }

    private:

        // Cache key: (position in T, position in S chosen for that letter).
        typedef std::pair<int, int> CacheKey;

        // Returns the number of ways to match T[pos..] using positions of S after posInS.
        int FindDistinct(const std::string& T, int pos, int posInS)
        {
            // The signed comparison avoids the unsigned wrap-around of T.size() - 1.
            if(pos >= static_cast<int>(T.size()))
            {
                return 1;
            }

            int count = 0;

            const auto iter = positionInfo_.find(T[pos]);

            // The letter does not occur in S at all, so nothing can be matched.
            if(iter == positionInfo_.end())
            {
                return 0;
            }

            for(auto posIter = iter->second.begin(); posIter != iter->second.end(); ++posIter)
            {
                if(*posIter <= posInS)
                {
                    continue;
                }

                const CacheKey cacheKey(pos, *posIter);

                // Reuse the result if this (pos, position in S) pair was already explored.
                const auto cached = cachedResult_.find(cacheKey);
                if(cached != cachedResult_.end())
                {
                    count += cached->second;
                    continue;
                }

                const int result = FindDistinct(T, pos + 1, *posIter);
                cachedResult_[cacheKey] = result;
                count += result;
            }

            return count;
        }

        std::map<char, std::vector<int>> positionInfo_;

        std::map<CacheKey, int> cachedResult_;
    };


This version passes Judge Large.
