Hadoop 3.1.2 + HBase 2.2.0: Setting Up LZO Compression
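Preface: while configuring HBase to use LZO, I searched through many articles online. Most of them are dated: they target old versions, and they generally rely on hadoop-gpl-compression, an old dependency that has since been superseded by hadoop-lzo. I hope this guide helps anyone configuring LZO for Hadoop and HBase.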
I. Install the LZO library
1. Download the latest LZO library from: http://www.oberhumer.com/opensource/lzo/download/
2. Extract the LZO archive:
tar -zxvf lzo-2.10.tar.gz
3. Enter the extracted lzo directory and run ./configure with shared libraries enabled and an install prefix:
cd lzo-2.10
./configure --enable-shared --prefix=/usr/local/hadoop/lzo
4. Compile with make, then install with make install:
make && make install
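A quick sanity check after make install can catch a half-finished build before hadoop-lzo is compiled against it. This is a minimal sketch, assuming the --prefix=/usr/local/hadoop/lzo used above and the usual LZO 2.x layout (headers under include/lzo/, shared library named liblzo2.so*); the function name check_lzo_prefix is hypothetical.

```shell
# Hypothetical helper: verify that an LZO 2.x install prefix contains a header
# and the shared library that hadoop-lzo's native build compiles and links against.
check_lzo_prefix() {
  local prefix="${1:-/usr/local/hadoop/lzo}"
  local ok=0
  # lzo1x.h is one of the LZO headers installed under include/lzo/
  if [ ! -f "$prefix/include/lzo/lzo1x.h" ]; then
    echo "missing header: $prefix/include/lzo/lzo1x.h" >&2
    ok=1
  fi
  # --enable-shared should have produced liblzo2.so (plus versioned names)
  if ! ls "$prefix"/lib/liblzo2.so* >/dev/null 2>&1; then
    echo "missing shared library: $prefix/lib/liblzo2.so*" >&2
    ok=1
  fi
  return "$ok"
}
```

Usage: check_lzo_prefix /usr/local/hadoop/lzo && echo "LZO install looks complete"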
If the LZO library is not installed, creating an HBase table whose compression is set to LZO fails with:
ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.RuntimeException: native-lzo library not available Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks
    at org.apache.hadoop.hbase.master.HMaster.warnOrThrowExceptionForFailure(HMaster.java:2314)
    at org.apache.hadoop.hbase.master.HMaster.sanityCheckTableDescriptor(HMaster.java:2156)
    at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:2048)
    at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:651)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.RuntimeException: native-lzo library not available
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:103)
    at org.apache.hadoop.hbase.master.HMaster.checkCompression(HMaster.java:2384)
    at org.apache.hadoop.hbase.master.HMaster.checkCompression(HMaster.java:2377)
    at org.apache.hadoop.hbase.master.HMaster.sanityCheckTableDescriptor(HMaster.java:2154)
    ... 7 more
Caused by: java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:135)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
    at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getCompressor(Compression.java:355)
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:98)
    ... 10 more
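For reference, this error is raised when the master sanity-checks a table definition such as the one below. The sketch pipes an HBase shell create command into hbase shell -n (the non-interactive mode of the HBase 2.x shell); the helper name and the table/column-family names are hypothetical placeholders.

```shell
# Hypothetical helper: create a table whose column family uses LZO compression.
# Requires the `hbase` launcher on PATH; table/family names are placeholders.
create_lzo_table() {
  local table="$1" family="$2"
  # COMPRESSION => 'LZO' is what makes the master test the LZO codec
  printf "create '%s', {NAME => '%s', COMPRESSION => 'LZO'}\n" "$table" "$family" \
    | hbase shell -n
}
```

Usage: create_lzo_table test_lzo cf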
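5. Without --prefix, make install puts the library files in /usr/local/lib by default; with the --prefix used above they are in /usr/local/hadoop/lzo/lib. Copy them into /usr/lib, or create symlinks in /usr/lib:
cd /usr/lib
ln -s /usr/local/hadoop/lzo/lib/* .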
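Symlinks created with a wildcard silently dangle if the source directory is wrong. A minimal check, assuming GNU or BSD find: with -L, find follows links, so -type l only matches links whose target cannot be resolved. The function name is hypothetical.

```shell
# Hypothetical helper: list dangling liblzo2* symlinks in a directory.
# With -L, find follows links; anything still reported as type l is broken.
find_broken_lzo_links() {
  local dir="${1:-/usr/lib}"
  find -L "$dir" -maxdepth 1 -type l -name 'liblzo2*'
}
```

Usage: [ -z "$(find_broken_lzo_links /usr/lib)" ] && echo "all liblzo2 links resolve"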
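6. Install lzop:
wget http://www.lzop.org/download/lzop-1.04.tar.gz
tar -zxvf lzop-1.04.tar.gz
cd lzop-1.04
./configure --enable-shared --prefix=/usr/local/hadoop/lzop
make && make install
If configure cannot find the LZO headers (because LZO was installed under a custom prefix), pass their location explicitly, for example: ./configure --prefix=/usr/local/hadoop/lzop CPPFLAGS=-I/usr/local/hadoop/lzo/include LDFLAGS=-L/usr/local/hadoop/lzo/lib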
7. Copy lzop into /usr/bin/, or create a symlink:
ln -s /usr/local/hadoop/lzop/bin/lzop /usr/bin/lzop
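To confirm the lzop binary and the LZO library work together, a compress/test/decompress round trip is enough. This sketch relies on lzop's gzip-style flags (-c writes to stdout, -t tests an archive, -d decompresses); the function name is hypothetical, and it returns 2 when lzop is not on PATH.

```shell
# Hypothetical helper: compress a file with lzop, test the archive, decompress
# it to stdout and compare with the original. Returns 2 if lzop is missing.
lzop_roundtrip() {
  local src="$1" work rc
  command -v lzop >/dev/null 2>&1 || { echo "lzop not found on PATH" >&2; return 2; }
  work=$(mktemp -d) || return 1
  # -c writes the compressed stream to stdout, leaving the source untouched
  lzop -c "$src" > "$work/data.lzo" || { rm -rf "$work"; return 1; }
  # -t checks the integrity of the .lzo archive
  lzop -t "$work/data.lzo" || { rm -rf "$work"; return 1; }
  # -d -c decompresses to stdout; cmp exits 0 only if the bytes are identical
  lzop -d -c "$work/data.lzo" | cmp -s - "$src"
  rc=$?
  rm -rf "$work"
  return "$rc"
}
```

Usage: lzop_roundtrip /etc/hosts && echo "lzop round trip OK"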
II. Install hadoop-lzo
1. Download hadoop-lzo as a zip archive:
wget https://github.com/twitter/hadoop-lzo/archive/master.zip
To use git instead, clone the repository at https://github.com/twitter/hadoop-lzo
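2. Build hadoop-lzo from source. If Maven is not installed, set up a Maven environment first. Unzip master.zip (it extracts to hadoop-lzo-master), enter hadoop-lzo-master, change the Hadoop version in pom.xml, and build with Maven: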
unzip master.zip
cd hadoop-lzo-master
vim pom.xml
Set hadoop.current.version to your own Hadoop version (3.1.2 here):
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hadoop.current.version>3.1.2</hadoop.current.version>
<hadoop.old.version>1.0.4</hadoop.old.version>
</properties>
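Instead of editing pom.xml in vim, the version can be set non-interactively, which is handy when scripting the build. A minimal sketch assuming GNU sed (BSD/macOS sed needs -i '' instead of -i); the function name is hypothetical.

```shell
# Hypothetical helper: set <hadoop.current.version> in pom.xml without an editor.
# Uses GNU sed's in-place -i; only that one element is rewritten.
set_hadoop_version() {
  local version="$1" pom="${2:-pom.xml}"
  sed -i "s|<hadoop.current.version>[^<]*</hadoop.current.version>|<hadoop.current.version>${version}</hadoop.current.version>|" "$pom"
}
```

Usage, from inside hadoop-lzo-master: set_hadoop_version 3.1.2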
3. In the hadoop-lzo-master directory, run the following commands to build hadoop-lzo:
export CFLAGS=-m64
export CXXFLAGS=-m64
export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include  # matches the LZO install prefix
export LIBRARY_PATH=/usr/local/hadoop/lzo/lib        # matches the LZO install prefix
mvn clean package -Dmaven.test.skip=true
4. After packaging, enter target/native/Linux-amd64-64, copy libgplcompression* into Hadoop's native library directory, then return to hadoop-lzo-master and copy hadoop-lzo-xxx.jar into Hadoop's common directory on every node:
cd target/native/Linux-amd64-64
tar -cBf - -C lib . | tar -xBvf - -C ~
cp ~/libgplcompression* $HADOOP_HOME/lib/native/
cd ../../..
cp target/hadoop-lzo-0.4.18-SNAPSHOT.jar $HADOOP_HOME/share/hadoop/common/
In the resulting libgplcompression* files, libgplcompression.so and libgplcompression.so.0 are symbolic links pointing to libgplcompression.so.0.0.0.
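A small check that all three names are usable in the target directory. Note that GNU cp follows symbolic links in its source arguments when not copying recursively, so after the cp above the .so and .so.0 names may be plain copies rather than links; test -f follows links, so the sketch accepts either layout. The function name and default directory are assumptions.

```shell
# Hypothetical helper: check that every libgplcompression name resolves to a
# real file. test -f follows symlinks, so links and dereferenced copies both pass.
check_gpl_libs() {
  local dir="${1:-$HADOOP_HOME/lib/native}"
  local name missing=0
  for name in libgplcompression.so libgplcompression.so.0 libgplcompression.so.0.0.0; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing or dangling: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```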
Sync the libgplcompression* files and target/hadoop-lzo-xxx-SNAPSHOT.jar generated above to the corresponding directories ($HADOOP_HOME/lib/native/ and $HADOOP_HOME/share/hadoop/common/) on every machine in the cluster.
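Copying to every node by hand is error-prone. A minimal sketch of the sync, assuming passwordless SSH, rsync installed on every node, and the same $HADOOP_HOME path on all machines; the host names and both function names are placeholders. Setting DRY_RUN=1 only prints the commands.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: push the native libs and the hadoop-lzo jar to each host.
sync_lzo_files() {
  local jar="$1"; shift
  local host
  for host in "$@"; do
    # rsync -a preserves symlinks such as libgplcompression.so -> .so.0
    run_or_echo rsync -a "$HADOOP_HOME"/lib/native/libgplcompression* "$host:$HADOOP_HOME/lib/native/" || return 1
    run_or_echo rsync -a "$jar" "$host:$HADOOP_HOME/share/hadoop/common/" || return 1
  done
}
```

Usage: DRY_RUN=1 sync_lzo_files target/hadoop-lzo-0.4.18-SNAPSHOT.jar node1 node2 node3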
III. Configure Hadoop environment variables
1. In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, set:
export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib
# Extra Java CLASSPATH elements. Optional.
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:${HADOOP_HOME}/share/hadoop/common"
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
2. Add the following to $HADOOP_HOME/etc/hadoop/core-site.xml:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.DefaultCodec,
    com.hadoop.compression.lzo.LzoCodec,
    com.hadoop.compression.lzo.LzopCodec,
    org.apache.hadoop.io.compress.BZip2Codec
  </value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
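Before restarting the cluster, a quick check that the edited core-site.xml really lists the LZO codec can save a restart cycle. This is naive text parsing, not a real XML parser: it assumes a single io.compression.codecs property and no XML comments inside its value. Both function names are hypothetical.

```shell
# Hypothetical helper: print the codec classes listed under io.compression.codecs
# in a Hadoop *-site.xml, one per line, with surrounding whitespace trimmed.
list_codecs() {
  local file="$1"
  # Join the file into one line so a multi-line <value> can be matched
  { tr -d '\n' < "$file"; echo; } \
    | sed -n 's|.*<name>io\.compression\.codecs</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p' \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -v '^$'
}

# Hypothetical helper: succeed if com.hadoop.compression.lzo.LzoCodec is listed
has_lzo_codec() {
  list_codecs "$1" | grep -qx 'com\.hadoop\.compression\.lzo\.LzoCodec'
}
```

Usage: has_lzo_codec "$HADOOP_HOME/etc/hadoop/core-site.xml" || echo "LzoCodec missing from io.compression.codecs"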
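Without this configuration (or if the hadoop-lzo jar is not on HBase's classpath), creating an HBase table with LZO compression fails with: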
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:103)
    at org.apache.hadoop.hbase.master.HMaster.checkCompression(HMaster.java:2384)
    at org.apache.hadoop.hbase.master.HMaster.checkCompression(HMaster.java:2377)
    at org.apache.hadoop.hbase.master.HMaster.sanityCheckTableDescriptor(HMaster.java:2154)
    ... 7 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
    at org.apache.hadoop.hbase.io.compress.Compression$Algorithm$1.buildCodec(Compression.java:128)
    at org.apache.hadoop.hbase.io.compress.Compression$Algorithm$1.getCodec(Compression.java:114)
    at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getCompressor(Compression.java:353)
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:98)
    ... 10 more
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.hadoop.hbase.io.compress.Compression$Algorithm$1.buildCodec(Compression.java:124)
    ... 13 more
3. Add the following to $HADOOP_HOME/etc/hadoop/mapred-site.xml:
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
  <name>mapred.child.env</name>
  <value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>
</property>
In Hadoop 3, mapred.compress.map.output and mapred.map.output.compression.codec are deprecated aliases of mapreduce.map.output.compress and mapreduce.map.output.compress.codec; they still work but log deprecation warnings.
Sync all of the modified configuration files to every machine in the cluster and restart the Hadoop cluster; LZO can then be used in Hadoop.
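One way to confirm that Hadoop now reads LZO data, sketched under assumptions: lzop is on PATH, hdfs dfs -put accepts - to read from stdin, and hadoop fs -text picks the decompression codec from the file extension, which LzopCodec maps to .lzo. The function name and the default HDFS path are hypothetical.

```shell
# Hypothetical helper: compress a local file with lzop, upload it to HDFS and
# read it back through `hadoop fs -text`, which should decompress it via LzopCodec.
# The default HDFS path /tmp/lzo-check.lzo is an assumption; -f overwrites it.
hdfs_lzo_roundtrip() {
  local src="$1" dest="${2:-/tmp/lzo-check.lzo}"
  # -c sends the compressed stream to stdout; "-" makes put read from stdin
  lzop -c "$src" | hdfs dfs -put -f - "$dest" || return 1
  # cmp exits 0 only if the decompressed text matches the original bytes
  hadoop fs -text "$dest" | cmp -s - "$src"
}
```

Usage: hdfs_lzo_roundtrip /etc/hosts && echo "Hadoop can read .lzo files"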
IV. Configure LZO in HBase
1. Copy hadoop-lzo-xxx.jar into $HBASE_HOME/lib:
cp target/hadoop-lzo-0.4.18-SNAPSHOT.jar $HBASE_HOME/lib
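2. Create a native directory under $HBASE_HOME/lib, and inside it create a symlink named Linux-amd64-64 that points to Hadoop's native library directory (/opt/hadoop/lib/native here):
mkdir -p $HBASE_HOME/lib/native
cd $HBASE_HOME/lib/native
ln -s /opt/hadoop/lib/native Linux-amd64-64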
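A broken link here only shows up later as "native-lzo library not available", so it is worth checking right away. A minimal sketch; the function name and default path are assumptions.

```shell
# Hypothetical helper: confirm $HBASE_HOME/lib/native/Linux-amd64-64 is a symlink
# that resolves to a directory actually containing libgplcompression.so.
check_hbase_native_link() {
  local link="${1:-$HBASE_HOME/lib/native/Linux-amd64-64}"
  [ -L "$link" ] || { echo "$link is not a symlink" >&2; return 1; }
  [ -d "$link" ] || { echo "$link is dangling" >&2; return 1; }
  [ -f "$link/libgplcompression.so" ] || { echo "libgplcompression.so not found via $link" >&2; return 1; }
}
```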
3. Add the following to $HBASE_HOME/conf/hbase-env.sh:
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/Linux-amd64-64/:/usr/local/lib/
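4. Add the following to $HBASE_HOME/conf/hbase-site.xml. With this setting, each RegionServer checks the listed codecs at startup and aborts if one of them is unavailable: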
<property>
  <name>hbase.regionserver.codecs</name>
  <value>lzo</value>
</property>
5. Start HBase; everything works normally.
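HBase ships a CompressionTest utility that the hbase launcher can run to smoke-test a codec before any table uses it; the HBase reference guide shows it invoked as hbase org.apache.hadoop.hbase.util.CompressionTest <path> <codec>. This sketch assumes it prints SUCCESS when the codec works and that the target path must not exist yet; the helper name and the /tmp path are hypothetical.

```shell
# Hypothetical helper: smoke-test a codec with HBase's CompressionTest utility.
# The file:// path is a throwaway location that must not exist beforehand.
hbase_codec_ok() {
  local codec="${1:-lzo}"
  local path="file:///tmp/hbase-compression-test-$$"
  # Succeed only if the tool prints a line that is exactly SUCCESS
  hbase org.apache.hadoop.hbase.util.CompressionTest "$path" "$codec" 2>/dev/null \
    | grep -q '^SUCCESS$'
}
```

Usage: hbase_codec_ok lzo && echo "LZO usable by HBase on this node"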
Note on hadoop-gpl-compression:
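hadoop-lzo-xxx is the successor of hadoop-gpl-compression-xxx, which was originally hosted on Google Code at http://code.google.com/p/hadoop-gpl-compression/. Because of licensing issues it was later moved to GitHub and became today's hadoop-lzo: https://github.com/kevinweil/hadoop-lzo. Most online guides to LZO compression in Hadoop are based on hadoop-gpl-compression, but that project dates from 2009 and is no longer fully compatible with current Hadoop versions, which causes problems. I ran into several of them myself and hope this write-up helps others avoid them.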
With hadoop-gpl-compression-xxx.jar, HBase fails at startup with the following errors:
2019-09-03 11:36:22,771 INFO [main] lzo.GPLNativeCodeLoader: Loaded native gpl library
2019-09-03 11:36:22,866 WARN [main] lzo.LzoCompressor: java.lang.NoSuchFieldError: lzoCompressLevelFunc
2019-09-03 11:36:22,866 ERROR [main] lzo.LzoCodec: Failed to load/initialize native-lzo library
2019-09-03 11:36:23,169 WARN [main] util.CompressionTest: Can't instantiate codec: lzo
org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.RuntimeException: native-lzo library not available
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:103)
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:69)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkCodecs(HRegionServer.java:834)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:565)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:506)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3180)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3198)
Caused by: java.lang.RuntimeException: native-lzo library not available
    at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:135)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
    at org.apache.hadoop.hbase.io.compress.Compression$Algorithm.getCompressor(Compression.java:355)
    at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:98)
    ... 14 more
2019-09-03 11:36:23,183 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.io.IOException: Compression codec lzo not supported, aborting RS construction
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkCodecs(HRegionServer.java:835)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:565)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:506)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3180)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3198)
2019-09-03 11:36:23,184 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3187)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3198)
Caused by: java.io.IOException: Compression codec lzo not supported, aborting RS construction
    at org.apache.hadoop.hbase.regionserver.HRegionServer.checkCodecs(HRegionServer.java:835)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:565)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:506)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:3180)
    ... 5 more
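Before swapping the jars, it helps to find every copy of hadoop-gpl-compression that could still end up on a classpath. A minimal sketch; the function name is hypothetical, and the directories to scan are whatever you pass in.

```shell
# Hypothetical helper: list leftover hadoop-gpl-compression jars in the given
# directories. Missing directories are skipped; empty output means none found.
find_gpl_compression_jars() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] && find "$dir" -maxdepth 1 -name 'hadoop-gpl-compression*.jar'
  done
  return 0
}
```

Usage: find_gpl_compression_jars "$HBASE_HOME/lib" "$HADOOP_HOME/share/hadoop/common"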
After removing hadoop-gpl-compression-xxx.jar and replacing it with hadoop-lzo-xxx.jar, HBase starts normally:
2019-09-03 14:57:43,755 INFO [main] lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
2019-09-03 14:57:43,758 INFO [main] lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 5dbdddb8cfb544e58b4e0b9664b9d1b66657faf5]
2019-09-03 14:57:43,983 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2019-09-03 14:57:44,088 INFO [main] compress.CodecPool: Got brand-new compressor [.lzo_deflate]
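The failing and the working startups differ in one LzoCodec message each, so a log can be classified mechanically. This sketch looks only at the most recent of the two messages (a log may contain an old failure followed by a successful restart); the function name is hypothetical, and which log file to pass (for example one under $HBASE_HOME/logs/) is up to you.

```shell
# Hypothetical helper: print "ok", "failed" or "unknown" based on the last
# hadoop-lzo initialization message found in an HBase log file.
lzo_status_from_log() {
  local log="$1" last
  last=$(grep -E 'Successfully loaded & initialized native-lzo library|Failed to load/initialize native-lzo library' "$log" | tail -n 1)
  case "$last" in
    *Successfully*) echo ok ;;
    *Failed*) echo failed ;;
    *) echo unknown ;;
  esac
}
```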