When learning Linux you typically pick up a handful of compression tools: gzip, bzip2, zip, and xz, along with their matching decompression tools. For usage details and a comparison of their compression ratios and times, see the earlier post: Linux中歸檔壓縮工具學習 (a study of Linux archiving and compression tools).

So what is pigz? In short: gzip with parallel compression. By default pigz runs one compression thread per online logical CPU; if that count cannot be detected, it falls back to 8 threads. You can also set the thread count explicitly with -p. Be aware that its CPU usage is correspondingly high.

Enough talk; let's test it.
```shell
$ yum install pigz
```
```shell
$ pigz --help
Usage: pigz [options] [files ...]
  will compress files in place, adding the suffix '.gz'. If no files are
  specified, stdin will be compressed to stdout. pigz does what gzip does,
  but spreads the work over multiple processors and cores when compressing.
Options:
  -0 to -9, -11        Compression level (11 is much slower, a few % better)
  --fast, --best       Compression levels 1 and 9 respectively
  -b, --blocksize mmm  Set compression block size to mmmK (default 128K)
  -c, --stdout         Write all processed output to stdout (won't delete)
  -d, --decompress     Decompress the compressed input
  -f, --force          Force overwrite, compress .gz, links, and to terminal
  -F  --first          Do iterations first, before block split for -11
  -h, --help           Display a help screen and quit
  -i, --independent    Compress blocks independently for damage recovery
  -I, --iterations n   Number of iterations for -11 optimization
  -k, --keep           Do not delete original file after processing
  -K, --zip            Compress to PKWare zip (.zip) single entry format
  -l, --list           List the contents of the compressed input
  -L, --license        Display the pigz license and quit
  -M, --maxsplits n    Maximum number of split blocks for -11
  -n, --no-name        Do not store or restore file name in/from header
  -N, --name           Store/restore file name and mod time in/from header
  -O  --oneblock       Do not split into smaller blocks for -11
  -p, --processes n    Allow up to n compression threads (default is the
                       number of online processors, or 8 if unknown)
  -q, --quiet          Print no messages, even on error
  -r, --recursive      Process the contents of all subdirectories
  -R, --rsyncable      Input-determined block locations for rsync
  -S, --suffix .sss    Use suffix .sss instead of .gz (for compression)
  -t, --test           Test the integrity of the compressed input
  -T, --no-time        Do not store or restore mod time in/from header
  -v, --verbose        Provide more verbose output
  -V  --version        Show the version of pigz
  -z, --zlib           Compress to zlib (.zz) instead of gzip format
  --                   All arguments after "--" are treated as files
```
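A couple of common invocations built from the options above (the file names are just illustrative):

```shell
# Compress with 8 threads, keeping the original file (-k):
pigz -p 8 -k bigfile.log

# Decompress to stdout without touching the archive:
pigz -dc bigfile.log.gz > bigfile.log
```

Like gzip, pigz deletes the original file after compressing unless -k or -c is given.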
Size of the original directory:

```shell
$ du -sh /tmp/hadoop
2.3G    /tmp/hadoop
```
Compressing with gzip (1 thread):

```shell
# Compression time:
$ time tar -zvcf hadoop.tar.gz /tmp/hadoop
real    0m49.935s
user    0m46.205s
sys     0m3.449s
# Compressed size:
$ du -sh hadoop.tar.gz
410M    hadoop.tar.gz
```
Decompressing the gzip archive:

```shell
$ time tar xf hadoop.tar.gz
real    0m17.226s
user    0m14.647s
sys     0m4.957s
```
Compressing with pigz (4 threads):

```shell
# Compression time:
$ time tar -cf - /tmp/hadoop | pigz -p 4 > hadoop.tgz
real    0m13.596s
user    0m48.181s
sys     0m2.045s
# Compressed size:
$ du -sh hadoop.tgz
411M    hadoop.tgz
```
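If your GNU tar supports --use-compress-program (-I), the pipe can be folded into tar itself; it's worth verifying that the short -I option exists on older tar builds:

```shell
# Have tar invoke pigz directly instead of piping by hand:
tar -I 'pigz -p 4' -cf hadoop.tgz /tmp/hadoop
```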
Decompressing the pigz archive:

```shell
$ time pigz -p 4 -d hadoop.tgz
real    0m17.508s
user    0m12.973s
sys     0m5.037s
```
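Note that this only turns hadoop.tgz back into hadoop.tar; a separate tar step is still needed to extract the files. The two steps can be combined in one pipeline (pigz cannot truly parallelize decompression, but it does use extra threads for reading, writing, and checksumming):

```shell
# Decompress and extract in a single pass:
pigz -p 4 -dc hadoop.tgz | tar -xf -
```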
As you can see, pigz cut the compression time by more than two thirds compared with gzip, at the cost of several times the CPU consumption. And this was only a 4-core virtual machine, where pigz pushed CPU usage to essentially 100%. So pigz is a great fit for workloads where compression speed matters and a short burst of high CPU usage is acceptable.

Of course, pigz does not keep getting faster as you add threads; it hits a point of diminishing returns. One comparison published online found that 8 threads were 41.2% faster than 4, 16 threads 27.9% faster than 8, and 32 threads only 3% faster than 16. The speedup clearly shrinks as the thread count grows. You can run further tests yourself.
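To reproduce those scaling numbers on your own hardware, a loop like this is enough (the directory path is the one from the test above):

```shell
# Time pigz at increasing thread counts on the same input:
for n in 1 2 4 8 16; do
    echo "== $n threads =="
    time tar -cf - /tmp/hadoop | pigz -p "$n" > /dev/null
done
```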
Reposted from:
Linux命令:pigz多線程壓縮工具 – 運維那點事
http://www.ywnds.com/?p=10332
Reference:
tar+pigz+ssh實現大數據壓縮傳輸 - 夏天公子 - 博客園
https://www.cnblogs.com/chenglee/p/7161274.html
