HBase can only use snappy if the underlying Hadoop supports it, so snappy has to be added from the bottom up, starting with Hadoop.
Also, once snappy is set up, it is worth running a stress test to be safe and see how the cluster behaves, both in storage compression ratio and in performance; for the performance test report, click here.
Install the Snappy native library:
Download snappy:
hadoop@hadoop1$ wget https://src.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.4.tar.gz/sha512/\
873f655713611f4bdfc13ab2a6d09245681f427fbd4f6a7a880a49b8c526875dbdd623e203905450268f542be24a2dc9dae50e6acc1516af1d2ffff3f96553da/\
snappy-1.1.4.tar.gz
Build and install snappy:
hadoop@hadoop1$ mkdir -p /tmp/snappy
hadoop@hadoop1$ tar zxvf snappy-1.1.4.tar.gz -C /tmp/snappy
hadoop@hadoop1$ cd /tmp/snappy/snappy-1.1.4
hadoop@hadoop1$ ./autogen.sh
hadoop@hadoop1$ ./configure
hadoop@hadoop1$ make
hadoop@hadoop1$ sudo make install
The build installs to /usr/local/lib by default; copy the libraries to /usr/lib64 as well:
hadoop@hadoop1$ sudo cp -dr /usr/local/lib/* /usr/lib64
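As an extra precaution (my addition, not strictly required if /usr/lib64 is already on the default search path), refresh the dynamic linker cache so the copied libraries are picked up immediately:

hadoop@hadoop1$ sudo ldconfig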
Install hadoop-snappy
Install the dependencies hadoop-snappy needs:
hadoop@hadoop1$ sudo apt-get install pkg-config libtool automake maven -y
Download and package hadoop-snappy:
hadoop@hadoop1$ git clone https://github.com/electrum/hadoop-snappy.git
hadoop@hadoop1$ cd hadoop-snappy && mvn package
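If the build succeeds, the jar and the native-library tarball referenced in the steps below should both be sitting under target/; a quick sanity check:

hadoop@hadoop1$ ls target/ | grep hadoop-snappy-0.0.1-SNAPSHOT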
Configure snappy in Hadoop
Add the snappy native libraries to the $HADOOP_HOME/lib/native/ directory:
hadoop@hadoop1$ cp -dr /usr/local/lib/* /opt/hadoop-3.1.3/lib/native
Copy hadoop-snappy-0.0.1-SNAPSHOT.jar into $HADOOP_HOME/lib, and copy snappy's native libraries into the $HADOOP_HOME/lib/native/ directory:
hadoop@hadoop1$ cp -r /home/hadoop/snappy/hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT.jar $HADOOP_HOME/lib
hadoop@hadoop1$ cp /home/hadoop/snappy/hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT-tar/hadoop-snappy-0.0.1-SNAPSHOT/lib/native/Linux-amd64-64/* $HADOOP_HOME/lib/native/
Add the following to hadoop-env.sh:
export LD_LIBRARY_PATH=/usr/local/hadoop/hadoop-3.1.3/lib/native:/usr/local/lib/
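After restarting Hadoop, you can check that the native snappy library actually loads, using Hadoop's built-in checknative command (a verification step I'm adding here; the library path in the output will vary by machine). The output should contain a line reporting snappy: true:

hadoop@hadoop1$ hadoop checknative -a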
Add the following to core-site.xml:
<!-- 開啟壓縮 --> <property> <name>io.compression.codecs</name> <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value> </property> <property> <name>io.compression.codec.lzo.class</name <value>org.apache.hadoop.io.compress.SnappyCodec</value> </property>
Add the following to mapred-site.xml:
<!-- 這個參數設為true啟用壓縮 --> <property> <name>mapreduce.output.fileoutputformat.compress</name> <value>true</value> </property> <property> <name>mapreduce.map.output.compress</name> <value>true</value> </property> <!-- 使用編解碼器 --> <property> <name>mapreduce.output.fileoutputformat.compress.codec</name> <value>org.apache.hadoop.io.compress.SnappyCodec</value> </property>
That completes the snappy configuration. The command below verifies it (/input is a directory on HDFS; just drop a few text files into it. The output directory must not already exist, or the job will fail):
hadoop@hadoop1$ hadoop jar /usr/local/hadoop/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output5
If the job succeeds, look at the files under the output directory:
hadoop@hadoop1:~$ hadoop fs -ls /output5
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2020-08-02 06:11 /output5/_SUCCESS
-rw-r--r--   2 hadoop supergroup       6994 2020-08-02 06:11 /output5/part-r-00000.snappy
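To double-check that the .snappy file really decompresses back into readable word counts, hadoop fs -text can be used, since it decompresses transparently through the codecs configured above (an extra check, not in the original steps):

hadoop@hadoop1:~$ hadoop fs -text /output5/part-r-00000.snappy | head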
For comparison, the same /input run without snappy produced the result below: 6994 bytes (snappy) versus 23635 bytes (uncompressed), so the compression effect is quite noticeable.
hadoop@hadoop1:~$ hadoop fs -ls /output4
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2020-08-02 06:06 /output4/_SUCCESS
-rw-r--r--   2 hadoop supergroup      23635 2020-08-02 06:06 /output4/part-r-00000
Configure snappy in HBase
Copy hadoop-snappy-0.0.1-SNAPSHOT.jar into $HBASE_HOME/lib, and symlink $HADOOP_HOME/lib/native into $HBASE_HOME/lib/native/ (if the native directory doesn't exist, just create it):
hadoop@hadoop1$ cp /home/hadoop/hadoop-snappy/target/hadoop-snappy-0.0.1-SNAPSHOT.jar $HBASE_HOME/lib
hadoop@hadoop1$ ln -s /opt/hadoop-3.1.3/lib/native /opt/hbase-2.2.4/lib/native/Linux-amd64-64
Add the following to hbase-env.sh:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/hadoop-3.1.3/lib/native/:/usr/local/lib
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:/opt/hbase-2.2.4/lib/native/Linux-amd64-64/:/usr/local/lib/
export CLASSPATH=$CLASSPATH:$HBASE_LIBRARY_PATH
Add the following to hbase-site.xml (with hbase.regionserver.codecs set, a regionserver checks at startup that the listed codecs can be loaded and aborts if snappy is unavailable, so a misconfigured node fails fast):
<property> <name>hbase.regionserver.codecs</name> <value>snappy</value> </property>
Next, verify that snappy works:
hadoop@hadoop1$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///home/hadoop/ouput snappy
If it returns output like the following, it succeeded:
hadoop@hadoop1:~$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///home/hadoop/ouput snappy
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase-2.2.4/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-08-02 10:01:34,858 INFO  [main] metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
2020-08-02 10:01:34,921 INFO  [main] compress.CodecPool: Got brand-new compressor [.snappy]
2020-08-02 10:01:34,924 INFO  [main] compress.CodecPool: Got brand-new compressor [.snappy]
2020-08-02 10:01:34,983 INFO  [main] compress.CodecPool: Got brand-new decompressor [.snappy]
SUCCESS
Now go into the hbase shell and create a table with snappy enabled. (One point worth stressing: specify multiple regions at creation time. A table created with the default single region will funnel the entire stress test onto whichever regionserver hosts that region, concentrating the load on one machine, so the results say nothing about the performance of the cluster.)
hbase(main):004:0> create 'snappy-test', {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit' },{ NAME => 'data', COMPRESSION => 'snappy'} Created table snappy-test Took 1.2345 seconds => Hbase::Table - snappy-test hbase(main):005:0> put 'snappy-test', '001', 'data:addr', 'beijing' Took 0.0078 seconds hbase(main):006:0> put 'snappy-test', '001', 'data:comp', 'baidu' Took 0.0036 seconds hbase(main):007:0> describe 'snappy-test' Table snappy-test is ENABLED snappy-test COLUMN FAMILIES DESCRIPTION {NAME => 'data', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false' , DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'snappy', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} 1 row(s) QUOTAS 0 row(s) Took 0.0963 seconds hbase(main):008:0>
As the create command above shows, calling this a "snappy-compressed table" is not quite accurate: snappy is a compression option assigned to the 'data' column family, not an attribute of the 'snappy-test' table itself. The COMPRESSION attribute that desc 'snappy-test' reports therefore belongs to the column family, and in a table with several column families you can choose per family whether to enable snappy.
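For the same reason, snappy can also be switched on for an existing table after the fact by altering the column family; a minimal sketch (my addition, with 'my-table' as a hypothetical table name). In HBase 2.x the alter is applied online, and existing HFiles are only rewritten with the new codec on the next major compaction:

hbase(main):001:0> alter 'my-table', {NAME => 'data', COMPRESSION => 'SNAPPY'}
hbase(main):002:0> major_compact 'my-table'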
As shown below, I created another snappy table, this time with two column families; you can compress both, neither, or just one of them, and any combination is fine:
hbase(main):010:0> create 'snappy-test3', {NUMREGIONS => 10, SPLITALGO => 'HexStringSplit' }, {NAME => 'data', COMPRESSION => 'snappy'}, {NAME=> 'data1'} Created table snappy-test3 Took 2.3475 seconds => Hbase::Table - snappy-test3 hbase(main):012:0> desc 'snappy-test3' Table snappy-test3 is ENABLED snappy-test3 COLUMN FAMILIES DESCRIPTION {NAME => 'data', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPL ICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} {NAME => 'data1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REP LICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} 2 row(s) QUOTAS 0 row(s) Took 0.2907 seconds hbase(main):013:0>
That completes the snappy installation for Hadoop and HBase. If you hit any problems or want to discuss anything, feel free to comment or contact me; I'm online every day and happy to talk.