Given a non-empty string, encode the string such that its encoded length is the shortest.
The encoding rule is: k[encoded_string], where the encoded_string inside the square brackets is repeated exactly k times.
Note:
- k will be a positive integer, and the encoded string will not be empty or have extra spaces.
- You may assume that the input string contains only lowercase English letters. The string's length is at most 160.
- If an encoding process does not make the string shorter, then do not encode it. If there are several solutions, returning any of them is fine.
Example 1:
Input: "aaa" Output: "aaa" Explanation: There is no way to encode it such that it is shorter than the input string, so we do not encode it.
Example 2:
Input: "aaaaa" Output: "5[a]" Explanation: "5[a]" is shorter than "aaaaa" by 1 character.
Example 3:
Input: "aaaaaaaaaa" Output: "10[a]" Explanation: "a9[a]" or "9[a]a" are also valid solutions, both of them have the same length = 5, which is the same as "10[a]".
Example 4:
Input: "aabcaabcd" Output: "2[aabc]d" Explanation: "aabc" occurs twice, so one answer can be "2[aabc]d".
Example 5:
Input: "abbbabbbcabbbabbbc" Output: "2[2[abbb]c]" Explanation: "abbbabbbc" occurs twice, but "abbbabbbc" can also be encoded to "2[abbb]c", so one answer can be "2[2[abbb]c]".
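To check a candidate answer such as "2[2[abbb]c]" from Example 5, it helps to expand it back according to the k[encoded_string] rule and compare the result with the input. The following is a minimal C++ sketch that is not part of the original solution; the names decode and decodeFrom are hypothetical, and it assumes well-formed input containing only lowercase letters, digits, and square brackets.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): expands e starting at
// index i until an unmatched ']' or the end of input, following the
// k[encoded_string] rule. Assumes well-formed input (letters, digits, brackets).
std::string decodeFrom(const std::string& e, std::size_t& i) {
    std::string out;
    while (i < e.size() && e[i] != ']') {
        if (std::isdigit(static_cast<unsigned char>(e[i]))) {
            // Read the (possibly multi-digit) repeat count k.
            int k = 0;
            while (i < e.size() && std::isdigit(static_cast<unsigned char>(e[i]))) {
                k = k * 10 + (e[i] - '0');
                ++i;
            }
            assert(i < e.size() && e[i] == '[');
            ++i;  // skip '['
            std::string inner = decodeFrom(e, i);  // may itself contain k[...]
            assert(i < e.size() && e[i] == ']');
            ++i;  // skip ']'
            for (int r = 0; r < k; ++r) out += inner;
        } else {
            out += e[i++];  // plain letter, copied as-is
        }
    }
    return out;
}

// Hypothetical entry point: decodes a whole encoded string.
std::string decode(const std::string& e) {
    std::size_t i = 0;
    return decodeFrom(e, i);
}
```

For example, decode("2[2[abbb]c]") returns "abbbabbbcabbbabbbc" and decode("a9[a]") returns ten a's, matching Examples 5 and 3.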
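This problem asks us to compress a string by wrapping repeated substrings in square brackets and prefixing them with the number of repetitions, and it is a fairly hard one. I only figured out how to do it after reading a post by an expert online: it should be solved with DP. We build a two-dimensional DP array where dp[i][j] holds the shortest encoded form of the substring of s in the range [i, j] (if the encoded form is not shorter than the substring, we keep the substring itself). If s has length n, the final answer is stored in dp[0][n-1].

We then iterate over all substrings of s. For any substring [i, j], we split it at every intermediate position k into two parts, compare the total length of dp[i][k] plus dp[k+1][j] with the length of dp[i][j], and assign the shorter string to dp[i][j]. Next, we take the substring t of s in the range [i, j] and try to collapse it. To do this, we append another copy of t to t and search t + t for the second starting position of t (searching from index 1). If that position is less than the length of t, then t consists of a repeated substring. For example, with t = "abab", t + t = "abababab", and the second occurrence of t starts at position 2, which is less than the length of t (4), so t is a repetition and the repeat count is t.size() / pos = 2. We then put the repeated part inside square brackets. Note that we should not put the raw substring inside the brackets; instead we take the corresponding string from dp, because the repeated part may itself already have a shorter encoded form, as in Example 5. As another example, if t = "abc", then t + t = "abcabc", and the second occurrence of t starts at position 3, which equals the length of t, so t contains no repetition and replace is just t. Finally, we compare the length of replace with that of dp[i][j] and assign the shorter one to dp[i][j].

The time complexity is O(n^3) (O(n^2) substrings, each tried at O(n) split points) and the space complexity is O(n^2) (one DP entry per substring). The code is as follows: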
Solution 1:
class Solution {
public:
    string encode(string s) {
        int n = s.size();
        // dp[i][j]: shortest encoding of s[i..j]
        vector<vector<string>> dp(n, vector<string>(n, ""));
        for (int step = 1; step <= n; ++step) {
            for (int i = 0; i + step - 1 < n; ++i) {
                int j = i + step - 1;
                dp[i][j] = s.substr(i, step);
                // Try every split point k: dp[i][k] + dp[k + 1][j]
                for (int k = i; k < j; ++k) {
                    string left = dp[i][k], right = dp[k + 1][j];
                    if (left.size() + right.size() < dp[i][j].size()) {
                        dp[i][j] = left + right;
                    }
                }
                // Check whether t is a repetition of a shorter unit
                string t = s.substr(i, j - i + 1), replace = "";
                auto pos = (t + t).find(t, 1);
                if (pos >= t.size()) replace = t;
                else replace = to_string(t.size() / pos) + '[' + dp[i][i + pos - 1] + ']';
                if (replace.size() < dp[i][j].size()) dp[i][j] = replace;
            }
        }
        return dp[0][n - 1];
    }
};
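The repetition check used in the code above can be tried on its own. The sketch below isolates the (t + t).find(t, 1) test; the names smallestRepeatUnit and repeatCount are hypothetical helpers introduced only for illustration, and t is assumed to be non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of the repetition test (helper names are hypothetical).
// Returns the length of the shortest unit u such that t is u repeated
// t.size() / u.size() times; returns t.size() when no shorter unit exists.
std::size_t smallestRepeatUnit(const std::string& t) {
    assert(!t.empty());
    // t always occurs in t + t at index t.size(), so find never returns npos;
    // a match at an earlier index p means t equals its own rotation by p.
    return (t + t).find(t, 1);
}

// Number of copies of the shortest unit that make up t (1 if none).
std::size_t repeatCount(const std::string& t) {
    return t.size() / smallestRepeatUnit(t);
}
```

With these, smallestRepeatUnit("abab") is 2 and repeatCount("abab") is 2, while smallestRepeatUnit("abc") is 3, matching the two examples in the explanation. The trick works because if t reappears in t + t at an offset p smaller than t.size(), t equals its rotation by p, and the smallest such p divides t.size().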
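Based on a comment from reader iffalse, we can optimize the method above. If t is itself a repetition, there is no need to check left.size() + right.size() < dp[i][j].size() at all. For example, if t is abcabcabcabcabc, the result is necessarily 5[abc], so there is no need to consider 3[abc]+abcabc or abcabc+3[abc]. For a string that is itself a repetition, the shortest encoding is always n[REPEATED] (or the string itself when that is not shorter), never some left + right. So the k loop should be moved after the t and replace code, and skipped when t is a repetition. This does improve efficiency somewhat. The code is as follows: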
Solution 2:
class Solution {
public:
    string encode(string s) {
        int n = s.size();
        // dp[i][j]: shortest encoding of s[i..j]
        vector<vector<string>> dp(n, vector<string>(n, ""));
        for (int step = 1; step <= n; ++step) {
            for (int i = 0; i + step - 1 < n; ++i) {
                int j = i + step - 1;
                dp[i][j] = s.substr(i, step);
                string t = s.substr(i, j - i + 1), replace = "";
                auto pos = (t + t).find(t, 1);
                // If t is a repetition, its best form is cnt[unit] or t itself,
                // so the split loop below is skipped.
                if (pos < t.size()) {
                    replace = to_string(t.size() / pos) + "[" + dp[i][i + pos - 1] + "]";
                    if (replace.size() < dp[i][j].size()) dp[i][j] = replace;
                    continue;
                }
                // Otherwise, try every split point k.
                for (int k = i; k < j; ++k) {
                    string left = dp[i][k], right = dp[k + 1][j];
                    if (left.size() + right.size() < dp[i][j].size()) {
                        dp[i][j] = left + right;
                    }
                }
            }
        }
        return dp[0][n - 1];
    }
};
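The optimization relies on the claim that skipping the split loop for repeated substrings never produces a longer result. One way to gain some confidence in that claim is to compare both variants exhaustively on short inputs. The sketch below folds the two solutions above into one hypothetical function, encodeDP, with a flag; because it only compares lengths, the order in which candidates are tried does not matter. The alphabet {a, b} and the length bound are arbitrary assumptions, and passing such a check is evidence, not a proof.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical function combining both solutions: when skipSplits is true,
// the split loop is skipped for substrings that are repetitions (Solution 2);
// otherwise every split is also tried (same candidates as Solution 1).
std::string encodeDP(const std::string& s, bool skipSplits) {
    int n = static_cast<int>(s.size());
    if (n == 0) return "";
    std::vector<std::vector<std::string>> dp(n, std::vector<std::string>(n));
    for (int step = 1; step <= n; ++step) {
        for (int i = 0; i + step - 1 < n; ++i) {
            int j = i + step - 1;
            std::string t = s.substr(i, step);
            dp[i][j] = t;
            std::size_t pos = (t + t).find(t, 1);
            if (pos < t.size()) {
                // t is a repetition of s[i .. i + pos - 1]
                std::string rep = std::to_string(t.size() / pos) + "[" +
                                  dp[i][i + pos - 1] + "]";
                if (rep.size() < dp[i][j].size()) dp[i][j] = rep;
                if (skipSplits) continue;
            }
            for (int k = i; k < j; ++k) {
                if (dp[i][k].size() + dp[k + 1][j].size() < dp[i][j].size())
                    dp[i][j] = dp[i][k] + dp[k + 1][j];
            }
        }
    }
    return dp[0][n - 1];
}

// Hypothetical exhaustive check over all strings on {a, b} with length
// 1..maxLen: true if both variants always give encodings of equal length.
bool sameLengthsUpTo(int maxLen) {
    assert(maxLen >= 1 && maxLen < 20);  // keep 2^maxLen small
    for (int len = 1; len <= maxLen; ++len) {
        for (int mask = 0; mask < (1 << len); ++mask) {
            std::string s;
            for (int b = 0; b < len; ++b) s += ((mask >> b) & 1) ? 'b' : 'a';
            if (encodeDP(s, true).size() != encodeDP(s, false).size()) return false;
        }
    }
    return true;
}
```

For instance, encodeDP("abbbabbbcabbbabbbc", true) returns "2[2[abbb]c]" and encodeDP("aabcaabcd", true) returns "2[aabc]d", matching Examples 5 and 4.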
Similar problems:
References:
https://leetcode.com/problems/encode-string-with-shortest-length/