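Recently I needed a compression library for my graduation project. The requirement was fast compression and decompression; the compression ratio did not have to be especially good. After some searching I found that Google's snappy library fits well, and it is open source and written in C++. So I tried it out, and these are my notes. The copyright of any Google source code quoted below belongs to Google; I reproduce it only for study and discussion. The article is organized as follows: a brief introduction to Snappy, then installation, then how to use it with a practical example, then a performance comparison with bzip2 and gzip, and finally some open questions about using it.
(1) Brief introduction
Download it from the official site, http://code.google.com/p/snappy/. The Project Home page carries the following description, which I imagine has been quoted and translated in many places. I quote it here and then restate it in my own words.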
Snappy is a compression/decompression library.
It does not aim for maximum compression,
or compatibility with any other compression library;
instead, it aims for very high speeds and reasonable compression.
For instance, compared to the fastest mode of zlib,
Snappy is an order of magnitude faster for most inputs,
but the resulting compressed files are anywhere from 20% to 100% bigger.
On a single core of a Core i7 processor in 64-bit mode,
Snappy compresses at about 250 MB/sec or more and
decompresses at about 500 MB/sec or more.
Snappy is widely used inside Google, in everything from BigTable
and MapReduce to our internal RPC systems.
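Restated: Snappy is a compression/decompression library. It does not aim for the maximum compression ratio, or for compatibility with other compression libraries; it aims for very high speed at a reasonable compression ratio. For example, for most inputs Snappy is an order of magnitude faster than the fastest mode of zlib, but the compressed files are 20% to 100% bigger than zlib's. On a single core of a Core i7 in 64-bit mode, Snappy compresses at about 250 MB/s or more and decompresses at about 500 MB/s or more.
Snappy is widely used inside Google, from BigTable and MapReduce to the company's internal RPC systems.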
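(2) Installation
The installation steps are as follows.
Download snappy-1.0.5.tar.gz. Snappy installs the same way as any traditional source package, and the INSTALL file in the unpacked tree has detailed instructions.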
gunzip snappy-1.0.5.tar.gz
tar xf snappy-1.0.5.tar
cd snappy-1.0.5
./configure
make
make install
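After installation, the shared and static libraries are in /usr/local/lib, and the headers needed for programming are in /usr/local/include. Note that I had to cp the library files to /usr/lib; otherwise, even with -L/usr/local/lib on the link line, the program failed at run time with:
./main: error while loading shared libraries: libsnappy.so.1: cannot open shared object file: No such file or directory
Of course, this comes down to my LD_LIBRARY_PATH environment variable. -L only affects link time, while the dynamic loader uses its own search path at run time, so adding /usr/local/lib to LD_LIBRARY_PATH should work in place of copying the files.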
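(3) Using snappy
The README in the unpacked tree describes a simple way to use the library. All of the library's identifiers live in the snappy namespace. C++ code needs #include <snappy.h>, and C code needs #include <snappy-c.h>. Snappy is fairly simple to use, at least compared with the bzip2 library. All function interfaces are exposed in these two headers, which contain detailed usage notes and short examples in plain, easy-to-read English. They are excerpted below (copyright remains with the holders named in each file):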
snappy.h
// Copyright 2005 and onwards Google Inc.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
//     * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//     * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
//     * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// A light-weight compression algorithm. It is designed for speed of
// compression and decompression, rather than for the utmost in space
// savings.
//
// For getting better compression ratios when you are compressing data
// with long repeated sequences or compressing data that is similar to
// other data, while still compressing fast, you might look at first
// using BMDiff and then compressing the output of BMDiff with
// Snappy.

#ifndef UTIL_SNAPPY_SNAPPY_H__
#define UTIL_SNAPPY_SNAPPY_H__

#include <stddef.h>
#include <string>

#include "snappy-stubs-public.h"

namespace snappy {
  class Source;
  class Sink;

  // ------------------------------------------------------------------------
  // Generic compression/decompression routines.
  // ------------------------------------------------------------------------

  // Compress the bytes read from "*source" and append to "*sink". Return the
  // number of bytes written.
  size_t Compress(Source* source, Sink* sink);

  bool GetUncompressedLength(Source* source, uint32* result);

  // ------------------------------------------------------------------------
  // Higher-level string based routines (should be sufficient for most users)
  // ------------------------------------------------------------------------

  // Sets "*output" to the compressed version of "input[0,input_length-1]".
  // Original contents of *output are lost.
  //
  // REQUIRES: "input[]" is not an alias of "*output".
  size_t Compress(const char* input, size_t input_length, string* output);

  // Decompresses "compressed[0,compressed_length-1]" to "*uncompressed".
  // Original contents of "*uncompressed" are lost.
  //
  // REQUIRES: "compressed[]" is not an alias of "*uncompressed".
  //
  // returns false if the message is corrupted and could not be decompressed
  bool Uncompress(const char* compressed, size_t compressed_length,
                  string* uncompressed);

  // ------------------------------------------------------------------------
  // Lower-level character array based routines. May be useful for
  // efficiency reasons in certain circumstances.
  // ------------------------------------------------------------------------

  // REQUIRES: "compressed" must point to an area of memory that is at
  // least "MaxCompressedLength(input_length)" bytes in length.
  //
  // Takes the data stored in "input[0..input_length]" and stores
  // it in the array pointed to by "compressed".
  //
  // "*compressed_length" is set to the length of the compressed output.
  //
  // Example:
  //    char* output = new char[snappy::MaxCompressedLength(input_length)];
  //    size_t output_length;
  //    RawCompress(input, input_length, output, &output_length);
  //    ... Process(output, output_length) ...
  //    delete [] output;
  void RawCompress(const char* input,
                   size_t input_length,
                   char* compressed,
                   size_t* compressed_length);

  // Given data in "compressed[0..compressed_length-1]" generated by
  // calling the Snappy::Compress routine, this routine
  // stores the uncompressed data to
  //    uncompressed[0..GetUncompressedLength(compressed)-1]
  // returns false if the message is corrupted and could not be decrypted
  bool RawUncompress(const char* compressed, size_t compressed_length,
                     char* uncompressed);

  // Given data from the byte source 'compressed' generated by calling
  // the Snappy::Compress routine, this routine stores the uncompressed
  // data to
  //    uncompressed[0..GetUncompressedLength(compressed,compressed_length)-1]
  // returns false if the message is corrupted and could not be decrypted
  bool RawUncompress(Source* compressed, char* uncompressed);

  // Returns the maximal size of the compressed representation of
  // input data that is "source_bytes" bytes in length;
  size_t MaxCompressedLength(size_t source_bytes);

  // REQUIRES: "compressed[]" was produced by RawCompress() or Compress()
  // Returns true and stores the length of the uncompressed data in
  // *result normally. Returns false on parsing error.
  // This operation takes O(1) time.
  bool GetUncompressedLength(const char* compressed, size_t compressed_length,
                             size_t* result);

  // Returns true iff the contents of "compressed[]" can be uncompressed
  // successfully. Does not return the uncompressed data. Takes
  // time proportional to compressed_length, but is usually at least
  // a factor of four faster than actual decompression.
  bool IsValidCompressedBuffer(const char* compressed,
                               size_t compressed_length);

  // *** DO NOT CHANGE THE VALUE OF kBlockSize ***
  //
  // New Compression code chops up the input into blocks of at most
  // the following size. This ensures that back-references in the
  // output never cross kBlockSize block boundaries. This can be
  // helpful in implementing blocked decompression. However the
  // decompression code should not rely on this guarantee since older
  // compression code may not obey it.
  static const int kBlockLog = 15;
  static const size_t kBlockSize = 1 << kBlockLog;

  static const int kMaxHashTableBits = 14;
  static const size_t kMaxHashTableSize = 1 << kMaxHashTableBits;

}  // end namespace snappy

#endif  // UTIL_SNAPPY_SNAPPY_H__
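The header says that GetUncompressedLength takes O(1) time. As far as I understand the Snappy format description, this is because a compressed stream begins with the uncompressed length stored as a little-endian base-128 varint, so only the first few bytes ever need to be read. Below is a minimal, self-contained sketch of that encoding. EncodeVarint32 and DecodeVarint32 are hypothetical helper names of my own, not functions exported by the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of snappy): encode a 32-bit length as a
// little-endian base-128 varint. Each byte carries 7 payload bits; the
// high bit is set on every byte except the last one.
std::vector<unsigned char> EncodeVarint32(uint32_t value) {
  std::vector<unsigned char> out;
  while (value >= 0x80) {
    out.push_back(static_cast<unsigned char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<unsigned char>(value));
  return out;
}

// Hypothetical helper (not part of snappy): decode the varint at the
// start of a buffer. It inspects at most 5 bytes no matter how long the
// buffer is, which is why the length lookup is constant time.
// Returns false on truncated or malformed input.
bool DecodeVarint32(const unsigned char* data, size_t size, uint32_t* result) {
  assert(result != nullptr);
  uint32_t value = 0;
  for (size_t i = 0; i < size && i < 5; ++i) {
    const unsigned char byte = data[i];
    // The fifth byte may only carry the top 4 bits of a 32-bit value.
    if (i == 4 && byte > 0x0F) return false;
    value |= static_cast<uint32_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) {
      *result = value;
      return true;
    }
  }
  return false;  // ran out of bytes before the terminating byte
}
```

With this scheme a length of 64 occupies one byte (0x40), and 2097150 occupies three bytes (0xFE 0xFF 0x7F). In real code, call GetUncompressedLength rather than parsing the stream yourself.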
snappy-c.h
/*
 * Copyright 2011 Martin Gieseking <martin.gieseking@uos.de>.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met:
 *
 *     * Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above
 * copyright notice, this list of conditions and the following disclaimer
 * in the documentation and/or other materials provided with the
 * distribution.
 *     * Neither the name of Google Inc. nor the names of its
 * contributors may be used to endorse or promote products derived from
 * this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Plain C interface (a wrapper around the C++ implementation).
 */

#ifndef UTIL_SNAPPY_OPENSOURCE_SNAPPY_C_H_
#define UTIL_SNAPPY_OPENSOURCE_SNAPPY_C_H_

#ifdef __cplusplus
extern "C" {
#endif

#include <stddef.h>

/*
 * Return values; see the documentation for each function to know
 * what each can return.
 */
typedef enum {
  SNAPPY_OK = 0,
  SNAPPY_INVALID_INPUT = 1,
  SNAPPY_BUFFER_TOO_SMALL = 2,
} snappy_status;

/*
 * Takes the data stored in "input[0..input_length-1]" and stores
 * it in the array pointed to by "compressed".
 *
 * <compressed_length> signals the space available in "compressed".
 * If it is not at least equal to "snappy_max_compressed_length(input_length)",
 * SNAPPY_BUFFER_TOO_SMALL is returned. After successful compression,
 * <compressed_length> contains the true length of the compressed output,
 * and SNAPPY_OK is returned.
 *
 * Example:
 *   size_t output_length = snappy_max_compressed_length(input_length);
 *   char* output = (char*)malloc(output_length);
 *   if (snappy_compress(input, input_length, output, &output_length)
 *       == SNAPPY_OK) {
 *     ... Process(output, output_length) ...
 *   }
 *   free(output);
 */
snappy_status snappy_compress(const char* input,
                              size_t input_length,
                              char* compressed,
                              size_t* compressed_length);

/*
 * Given data in "compressed[0..compressed_length-1]" generated by
 * calling the snappy_compress routine, this routine stores
 * the uncompressed data to
 *   uncompressed[0..uncompressed_length-1].
 * Returns failure (a value not equal to SNAPPY_OK) if the message
 * is corrupted and could not be decrypted.
 *
 * <uncompressed_length> signals the space available in "uncompressed".
 * If it is not at least equal to the value returned by
 * snappy_uncompressed_length for this stream, SNAPPY_BUFFER_TOO_SMALL
 * is returned. After successful decompression, <uncompressed_length>
 * contains the true length of the decompressed output.
 *
 * Example:
 *   size_t output_length;
 *   if (snappy_uncompressed_length(input, input_length, &output_length)
 *       != SNAPPY_OK) {
 *     ... fail ...
 *   }
 *   char* output = (char*)malloc(output_length);
 *   if (snappy_uncompress(input, input_length, output, &output_length)
 *       == SNAPPY_OK) {
 *     ... Process(output, output_length) ...
 *   }
 *   free(output);
 */
snappy_status snappy_uncompress(const char* compressed,
                                size_t compressed_length,
                                char* uncompressed,
                                size_t* uncompressed_length);

/*
 * Returns the maximal size of the compressed representation of
 * input data that is "source_length" bytes in length.
 */
size_t snappy_max_compressed_length(size_t source_length);

/*
 * REQUIRES: "compressed[]" was produced by snappy_compress()
 * Returns SNAPPY_OK and stores the length of the uncompressed data in
 * *result normally. Returns SNAPPY_INVALID_INPUT on parsing error.
 * This operation takes O(1) time.
 */
snappy_status snappy_uncompressed_length(const char* compressed,
                                         size_t compressed_length,
                                         size_t* result);

/*
 * Check if the contents of "compressed[]" can be uncompressed successfully.
 * Does not return the uncompressed data; if so, returns SNAPPY_OK,
 * or if not, returns SNAPPY_INVALID_INPUT.
 * Takes time proportional to compressed_length, but is usually at least a
 * factor of four faster than actual decompression.
 */
snappy_status snappy_validate_compressed_buffer(const char* compressed,
                                                size_t compressed_length);

#ifdef __cplusplus
}  // extern "C"
#endif

#endif  /* UTIL_SNAPPY_OPENSOURCE_SNAPPY_C_H_ */
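Both headers expect the caller to size the output buffer with MaxCompressedLength (snappy_max_compressed_length in C) before compressing. As far as I remember from reading snappy.cc, that bound is computed as 32 + n + n/6. I am quoting the constant from memory, so treat it as an assumption and always call the library function in real code. The sketch below only illustrates how much headroom such a bound implies. WorstCaseCompressedLength and WorstCaseOverhead are hypothetical names of my own.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for snappy::MaxCompressedLength, ASSUMING the
// bound is 32 + n + n/6. Real code should call the library function.
constexpr std::size_t WorstCaseCompressedLength(std::size_t source_bytes) {
  return 32 + source_bytes + source_bytes / 6;
}

// Extra bytes the caller must reserve on top of the input size.
constexpr std::size_t WorstCaseOverhead(std::size_t source_bytes) {
  return WorstCaseCompressedLength(source_bytes) - source_bytes;
}

// Even an empty input needs a non-empty output buffer under this bound.
static_assert(WorstCaseCompressedLength(0) == 32,
              "fixed part of the assumed bound");
```

Under this assumption the buffer for incompressible data is roughly one sixth larger than the input, which is why the examples in the headers allocate with the maximum length first and only afterwards read back the true compressed length.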