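1. Principle

Compression

The LZ78 compression procedure is simple. The compressor maintains a dynamic dictionary (Dictionary) that records the index and content of every phrase seen so far. Compression falls into three cases: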

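If the current character c is not in the dictionary, it is encoded as (0, c).
If the current character c is in the dictionary, find the longest match against the dictionary and encode it as (prefixIndex, lastChar), where prefixIndex is the dictionary index of the longest matching prefix and lastChar is the first character after that match.
If the input ends while the remaining characters still match a dictionary entry, there is no following character, so the final pair is encoded as (prefixIndex, ).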
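The three cases can also be written as a single pass over the input. The following is a minimal sketch, not the implementation given later in this article; the function name `lz78_encode` and the use of a plain `dict` for the dictionary are assumptions made for illustration.

```python
def lz78_encode(text):
    """Encode text as (prefix_index, char) pairs; index 0 means 'no prefix'."""
    dictionary = {}  # phrase -> 1-based index
    pairs = []
    phrase = ""      # longest dictionary match found so far
    for ch in text:
        if phrase + ch in dictionary:
            # Still inside a known phrase: keep extending the match.
            phrase += ch
        else:
            # Unknown character (prefix index 0) or end of the longest match:
            # emit the pair and record the new phrase.
            pairs.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit its index with no character.
        pairs.append((dictionary[phrase], ""))
    return pairs


print(lz78_encode("BABAABRRRA"))
```

Because every dictionary phrase is an earlier phrase plus one character, extending the match one character at a time finds the same longest match as a full dictionary search would.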

If the compression process above is still a little unclear, here are three examples. Example 1: the string “ABBCBCABABCAABCAAB” is compressed as follows:

1. A is not in the Dictionary; insert it.
2. B is not in the Dictionary; insert it.
3. B is in the Dictionary. BC is not in the Dictionary; insert it.
4. B is in the Dictionary. BC is in the Dictionary. BCA is not in the Dictionary; insert it.
5. B is in the Dictionary. BA is not in the Dictionary; insert it.
6. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is not in the Dictionary; insert it.
7. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is in the Dictionary. BCAAB is not in the Dictionary; insert it.

Example 2: the string “BABAABRRRA” is compressed as follows:

1. B is not in the Dictionary; insert it.
2. A is not in the Dictionary; insert it.
3. B is in the Dictionary. BA is not in the Dictionary; insert it.
4. A is in the Dictionary. AB is not in the Dictionary; insert it.
5. R is not in the Dictionary; insert it.
6. R is in the Dictionary. RR is not in the Dictionary; insert it.
7. A is in the Dictionary and it is the last input character; output a pair containing its index: (2, )

Example 3: the string “AAAAAAAAA” is compressed as follows:

1. A is not in the Dictionary; insert it.
2. A is in the Dictionary. AA is not in the Dictionary; insert it.
3. A is in the Dictionary. AA is in the Dictionary. AAA is not in the Dictionary; insert it.
4. A is in the Dictionary. AA is in the Dictionary. AAA is in the Dictionary and it is the last pattern; output a pair containing its index: (3, )
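One way to double-check the three walkthroughs is to list the phrases each string is split into. The helper below is a hypothetical sketch (the name `lz78_phrases` is not from the original article); it only tracks which phrases have been seen, not their indices.

```python
def lz78_phrases(text):
    """Split text into the phrases that LZ78 parsing produces, in order."""
    seen = set()
    phrases = []
    phrase = ""
    for ch in text:
        phrase += ch
        if phrase not in seen:
            # First time this phrase appears: it becomes a dictionary entry.
            seen.add(phrase)
            phrases.append(phrase)
            phrase = ""
    if phrase:
        # Trailing phrase that was already in the dictionary.
        phrases.append(phrase)
    return phrases


for example in ("ABBCBCABABCAABCAAB", "BABAABRRRA", "AAAAAAAAA"):
    print(example, "->", lz78_phrases(example))
```

Joining each phrase list reproduces the input string, and the last phrase of the second and third examples is the one that is emitted as an index with no trailing character.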

Decompression

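Decompression rebuilds the same dynamic dictionary that was used during compression from the encoded pairs, then concatenates the phrases referenced by index to form the decoded string. To make this concrete, take the encoded sequence from Example 1, (0, A) (0, B) (2, C) (3, A) (2, A) (4, A) (6, B), and decode it step by step: each pair (index, c) decodes to the dictionary entry at index (empty when index is 0) followed by c, and the decoded phrase is then appended to the dictionary as the next entry.

Concatenating the decoded phrases in order gives the decompressed string “ABBCBCABABCAABCAAB”.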
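The decoding steps for this sequence can be traced with a short sketch. The function name `lz78_decode_trace` is an assumption for illustration; it returns the rebuilt dictionary alongside the decoded text so that each step can be inspected.

```python
def lz78_decode_trace(pairs):
    """Decode (index, char) pairs; return the text and the rebuilt dictionary."""
    dictionary = {}  # 1-based index -> phrase
    pieces = []
    for index, ch in pairs:
        # Index 0 means the phrase has no prefix.
        prefix = dictionary[index] if index else ""
        phrase = prefix + ch
        dictionary[len(dictionary) + 1] = phrase
        pieces.append(phrase)
    return "".join(pieces), dictionary


pairs = [(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A"), (4, "A"), (6, "B")]
text, dictionary = lz78_decode_trace(pairs)
for index, phrase in dictionary.items():
    print(index, phrase)  # entries 1..7: A, B, BC, BCA, BA, BCAA, BCAAB
print(text)
```

The decoder never needs the dictionary to be transmitted: entry n is always defined by pairs 1 through n, all of which have been read by the time entry n is referenced.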

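The LZ family of compression algorithms

The LZ-family compression algorithms are all variants of LZ77 or LZ78, with optimizations built on top of them.

LZ77 variants: LZSS, LZR, LZB, LZH;
LZ78 variants: LZW, LZC, LZT, LZMW, LZJ, LZFG.

Of these, LZSS and LZW are the best known in their respective camps. LZSS is Storer and Szymanski's [2] refinement of LZ77: it adds a minimum match length, and when the longest match is shorter than that limit, the character is output uncompressed while the sliding window still moves right by one character. Google's open-source Snappy compression library broadly follows the LZSS encoding scheme, with some engineering optimizations on top.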
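The minimum-match rule can be sketched as follows. This is a simplified illustration rather than the LZSS bit format: the names `lzss_tokens` and `lzss_expand`, the `MIN_MATCH` and `WINDOW` values, and the representation of tokens as Python strings and tuples are all assumptions. A real encoder would also emit a flag bit to tell literals from references, which is not modeled here.

```python
MIN_MATCH = 3   # assumed minimum match length
WINDOW = 4096   # assumed sliding-window size


def lzss_tokens(data, min_match=MIN_MATCH, window=WINDOW):
    """Split data into literals (str) and (offset, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the window for the longest match.
        for start in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[start + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - start
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            # Match too short: output the character as-is and move one step.
            tokens.append(data[i])
            i += 1
    return tokens


def lzss_expand(tokens):
    """Rebuild the original string from the tokens."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):
                out.append(out[-offset])  # copy one char at a time; overlap is allowed
        else:
            out.append(token)
    return "".join(out)


tokens = lzss_tokens("AAAAAAAAA")
print(tokens)  # ['A', (1, 8)]
print(lzss_expand(tokens) == "AAAAAAAAA")
```

The threshold exists because a back-reference costs more to store than one or two literal characters, so short matches are cheaper to emit uncompressed.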

2. Implementation

An implementation of the LZ78 algorithm in Python 3.5:

# -*- coding: utf-8 -*-
# A simplified implementation of LZ78 algorithm
# @Time : 2017/1/13
# @Author : rain


def compress(message):
    # tree_dict maps each phrase seen so far to its 1-based index
    tree_dict, m_len, i = {}, len(message), 0
    while i < m_len:
        # case I: the current character is not in the dictionary
        if message[i] not in tree_dict:
            yield (0, message[i])
            tree_dict[message[i]] = len(tree_dict) + 1
            i += 1
        # case III: the last character is already in the dictionary
        elif i == m_len - 1:
            yield (tree_dict[message[i]], '')
            i += 1
        else:
            for j in range(i + 1, m_len):
                # case II: message[i:j] is the longest match, message[j] follows it
                if message[i:j + 1] not in tree_dict:
                    yield (tree_dict[message[i:j]], message[j])
                    tree_dict[message[i:j + 1]] = len(tree_dict) + 1
                    i = j + 1
                    break
                # case III: the match runs to the end of the message
                elif j == m_len - 1:
                    yield (tree_dict[message[i:j + 1]], '')
                    i = j + 1


def uncompress(packed):
    # tree_dict maps each 1-based index back to its phrase
    unpacked, tree_dict = '', {}
    for index, ch in packed:
        if index == 0:
            unpacked += ch
            tree_dict[len(tree_dict) + 1] = ch
        else:
            term = tree_dict[index] + ch
            unpacked += term
            tree_dict[len(tree_dict) + 1] = term
    return unpacked


if __name__ == '__main__':
    messages = ['ABBCBCABABCAABCAAB', 'BABAABRRRA', 'AAAAAAAAA']
    for m in messages:
        pack = compress(m)
        unpack = uncompress(pack)
        print(unpack == m)
