When learning Linux you typically pick up a handful of compression tools: gzip, bzip2, zip, and xz, along with their matching decompression tools. For usage details and a comparison of their compression ratios and times, see the earlier post: Linux中歸檔壓縮工具學習 (a study of Linux archiving and compression tools).

So what is pigz? In short: gzip with parallel compression. By default pigz runs one compression thread per online logical CPU; if that count cannot be detected, it falls back to 8 threads. You can also set the thread count explicitly with -p. Be aware that its CPU usage is correspondingly high.

Enough talk; let's test it.
```shell
$ yum install pigz
```
```shell
$ pigz --help
Usage: pigz [options] [files ...]
  will compress files in place, adding the suffix '.gz'. If no files are
  specified, stdin will be compressed to stdout. pigz does what gzip does,
  but spreads the work over multiple processors and cores when compressing.
Options:
  -0 to -9, -11        Compression level (11 is much slower, a few % better)
  --fast, --best       Compression levels 1 and 9 respectively
  -b, --blocksize mmm  Set compression block size to mmmK (default 128K)
  -c, --stdout         Write all processed output to stdout (won't delete)
  -d, --decompress     Decompress the compressed input
  -f, --force          Force overwrite, compress .gz, links, and to terminal
  -F  --first          Do iterations first, before block split for -11
  -h, --help           Display a help screen and quit
  -i, --independent    Compress blocks independently for damage recovery
  -I, --iterations n   Number of iterations for -11 optimization
  -k, --keep           Do not delete original file after processing
  -K, --zip            Compress to PKWare zip (.zip) single entry format
  -l, --list           List the contents of the compressed input
  -L, --license        Display the pigz license and quit
  -M, --maxsplits n    Maximum number of split blocks for -11
  -n, --no-name        Do not store or restore file name in/from header
  -N, --name           Store/restore file name and mod time in/from header
  -O  --oneblock       Do not split into smaller blocks for -11
  -p, --processes n    Allow up to n compression threads (default is the
                       number of online processors, or 8 if unknown)
  -q, --quiet          Print no messages, even on error
  -r, --recursive      Process the contents of all subdirectories
  -R, --rsyncable      Input-determined block locations for rsync
  -S, --suffix .sss    Use suffix .sss instead of .gz (for compression)
  -t, --test           Test the integrity of the compressed input
  -T, --no-time        Do not store or restore mod time in/from header
  -v, --verbose        Provide more verbose output
  -V  --version        Show the version of pigz
  -z, --zlib           Compress to zlib (.zz) instead of gzip format
  --                   All arguments after "--" are treated as files
```
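A couple of common invocations built from the options above (the file names are just illustrative):

```shell
# Compress with 8 threads, keeping the original file (-k):
pigz -p 8 -k bigfile.log

# Decompress to stdout without touching the archive:
pigz -dc bigfile.log.gz > bigfile.log
```

Like gzip, pigz deletes the original file after compressing unless -k or -c is given.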
Size of the original directory:

```shell
$ du -sh /tmp/hadoop
2.3G    /tmp/hadoop
```
Compressing with gzip (1 thread):

```shell
# Compression time:
$ time tar -zvcf hadoop.tar.gz /tmp/hadoop
real    0m49.935s
user    0m46.205s
sys     0m3.449s
# Compressed size:
$ du -sh hadoop.tar.gz
410M    hadoop.tar.gz
```
Decompressing the gzip archive:

```shell
$ time tar xf hadoop.tar.gz
real    0m17.226s
user    0m14.647s
sys     0m4.957s
```
Compressing with pigz (4 threads):

```shell
# Compression time:
$ time tar -cf - /tmp/hadoop | pigz -p 4 > hadoop.tgz
real    0m13.596s
user    0m48.181s
sys     0m2.045s
# Compressed size:
$ du -sh hadoop.tgz
411M    hadoop.tgz
```
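If your GNU tar supports --use-compress-program (-I), the pipe can be folded into tar itself; it's worth verifying that the short -I option exists on older tar builds:

```shell
# Have tar invoke pigz directly instead of piping by hand:
tar -I 'pigz -p 4' -cf hadoop.tgz /tmp/hadoop
```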
Decompressing the pigz archive:

```shell
$ time pigz -p 4 -d hadoop.tgz
real    0m17.508s
user    0m12.973s
sys     0m5.037s
```
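Note that this only turns hadoop.tgz back into hadoop.tar; a separate tar step is still needed to extract the files. The two steps can be combined in one pipeline (pigz cannot truly parallelize decompression, but it does use extra threads for reading, writing, and checksumming):

```shell
# Decompress and extract in a single pass:
pigz -p 4 -dc hadoop.tgz | tar -xf -
```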
As you can see, pigz cut the compression time by more than two thirds compared with gzip, at the cost of several times the CPU consumption. And this was only a 4-core virtual machine, where pigz pushed CPU usage to essentially 100%. So pigz is a great fit for workloads where compression speed matters and a short burst of high CPU usage is acceptable.

Of course, pigz does not keep getting faster as you add threads; it hits a point of diminishing returns. One comparison published online found that 8 threads were 41.2% faster than 4, 16 threads 27.9% faster than 8, and 32 threads only 3% faster than 16. The speedup clearly shrinks as the thread count grows. You can run further tests yourself.
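To reproduce those scaling numbers on your own hardware, a loop like this is enough (the directory path is the one from the test above):

```shell
# Time pigz at increasing thread counts on the same input:
for n in 1 2 4 8 16; do
    echo "== $n threads =="
    time tar -cf - /tmp/hadoop | pigz -p "$n" > /dev/null
done
```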
Reposted from:
Linux命令:pigz多線程壓縮工具 – 運維那點事
http://www.ywnds.com/?p=10332
Reference:
tar+pigz+ssh實現大數據壓縮傳輸 - 夏天公子 - 博客園
https://www.cnblogs.com/chenglee/p/7161274.html
