Environment:
ubuntu
hadoop-2.6.0
hive-1.1.0
1. Install the LZO development library:

sudo apt-get install liblzo2-dev

# Check where the package's files were installed:
hadoop@idex140:~/modules/hadoop-2.6.0$ dpkg -L liblzo2-2
/.
/usr
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/liblzo2.so.2.0.0
/usr/share
/usr/share/doc
/usr/share/doc/liblzo2-2
/usr/share/doc/liblzo2-2/THANKS
/usr/share/doc/liblzo2-2/AUTHORS
/usr/share/doc/liblzo2-2/changelog.Debian.gz
/usr/share/doc/liblzo2-2/copyright
/usr/share/doc/liblzo2-2/LZO.TXT.gz
/usr/lib/x86_64-linux-gnu/liblzo2.so.2
2. Download the LZO source:

wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.09.tar.gz
3. Build and install LZO:

tar -xzvf lzo-2.09.tar.gz
cd lzo-2.09
export CFLAGS=-m64        # for a 64-bit OS
./configure --enable-shared --prefix=/usr/local/lzo-2.09
make && sudo make install
4. Fetch the hadoop-lzo source that is built in step 6 (presumably from the upstream repository):

git clone https://github.com/twitter/hadoop-lzo.git
5. Install the lzop command-line tool:

sudo apt-get install lzop
6. Build hadoop-lzo (first edit hadoop.version in its pom.xml to match your Hadoop version):

hadoop@master:~/hadoop-lzo$ C_INCLUDE_PATH=/usr/local/lzo-2.09/include/ \
> LIBRARY_PATH=/usr/local/lzo-2.09/lib/ \
> CXXFLAGS=-m64 \
> mvn clean package
7. Copy the freshly built native libraries into Hadoop's native directory:

tar -cBf - -C target/native/Linux-amd64-64/lib . | tar -xBvf - -C ~/modules/hadoop-2.6.0/lib/native/
8. Copy the hadoop-lzo jar onto Hadoop's classpath:

cp ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar ${HADOOP_HOME}/share/hadoop/common/lib/
source /etc/profile
9. Repeat the steps above on the other nodes:

scp lzo-2.09.tar.gz hadoop-slave1:/home/hadoop/
scp lzo-2.09.tar.gz hadoop-slave2:/home/hadoop/

# On each slave, extract the tarball, then build and install as in step 3:
./configure --enable-shared --prefix=/usr/local/lzo-2.09
make && sudo make install
sudo apt-get install liblzo2-dev
sudo apt-get install lzop

# Back on the master, push the native libraries and the jar to both slaves:
scp -r libgpl* hadoop-slave1:/home/hadoop/modules/hadoop-2.6.0/lib/native/
scp -r libgpl* hadoop-slave2:/home/hadoop/modules/hadoop-2.6.0/lib/native/
scp $HADOOP_LZO_HOME/target/hadoop-lzo-0.4.20-SNAPSHOT.jar hadoop-slave1:$HADOOP_HOME/share/hadoop/common/lib/
scp $HADOOP_LZO_HOME/target/hadoop-lzo-0.4.20-SNAPSHOT.jar hadoop-slave2:$HADOOP_HOME/share/hadoop/common/lib/
source /etc/profile
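The per-node copies above can be collapsed into a loop; a sketch, assuming the same two slave hostnames and paths used in this guide. With DRYRUN=echo (the default here) it only prints the commands; set DRYRUN to empty to actually copy:

```shell
# Loop form of the per-node sync. DRYRUN=echo prints the commands for review;
# run with DRYRUN= to execute them for real.
DRYRUN=${DRYRUN-echo}
for host in hadoop-slave1 hadoop-slave2; do
  $DRYRUN scp lzo-2.09.tar.gz "$host:/home/hadoop/"
  $DRYRUN scp -r libgpl* "$host:/home/hadoop/modules/hadoop-2.6.0/lib/native/"
  $DRYRUN scp "$HADOOP_LZO_HOME/target/hadoop-lzo-0.4.20-SNAPSHOT.jar" \
    "$host:$HADOOP_HOME/share/hadoop/common/lib/"
done
```

The dry-run default makes it easy to confirm the destination paths before touching the cluster.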
10. Update the Hadoop configuration files
(1) Append the following to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:

# add lzo environment variables
export LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib
(2) Edit core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,
         org.apache.hadoop.io.compress.DefaultCodec,
         com.hadoop.compression.lzo.LzoCodec,
         com.hadoop.compression.lzo.LzopCodec,
         org.apache.hadoop.io.compress.BZip2Codec,
         org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
(3) Edit mapred-site.xml:

<property>
  <name>mapred.child.env</name>
  <value>LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib</value>
</property>
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
PS:
Compressing intermediate (map) output:

| Where set | Property (current name) | Default | Deprecated name |
|---|---|---|---|
| Hadoop job | mapreduce.map.output.compress | false | mapred.compress.map.output |
| Hadoop job | mapreduce.map.output.compress.codec | org.apache.hadoop.io.compress.DefaultCodec | mapred.map.output.compression.codec |
| Hive job | hive.exec.compress.intermediate | false | |
Compressing final job output:

| Where set | Property (current name) | Default | Deprecated name |
|---|---|---|---|
| Hadoop job | mapreduce.output.fileoutputformat.compress | false | mapred.output.compress |
| Hadoop job | mapreduce.output.fileoutputformat.compress.type | RECORD | mapred.output.compression.type |
| Hadoop job | mapreduce.output.fileoutputformat.compress.codec | org.apache.hadoop.io.compress.DefaultCodec | mapred.output.compression.codec |
| Hive job | hive.exec.compress.output | false | |
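These switches can also be toggled per session from the Hive CLI rather than in the XML files; a sketch with illustrative values:

```sql
-- Compress intermediate map output for this Hive session
SET hive.exec.compress.intermediate=true;
SET mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec;
-- Compress final query output as .lzo files
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;
```

Session-level SETs override the cluster-wide defaults in mapred-site.xml, which is convenient for testing before committing a configuration change.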
11. Create a Hive test table that stores LZO-compressed data
CREATE TABLE rawdata (
  appkey string,
  uid string,
  uidtype string
)
COMMENT 'This is the staging of raw data'
PARTITIONED BY (day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
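One caveat the steps above leave implicit: a plain .lzo file is not splittable by MapReduce until a companion .index file is built next to it. A sketch using hadoop-lzo's distributed indexer (the partition path is hypothetical; with RUN=echo, the default here, the command is only printed — set RUN to empty to execute it on your cluster):

```shell
# Build .index files so MapReduce can split the .lzo files in a partition.
# RUN=echo prints the command for review; run with RUN= to execute.
RUN=${RUN-echo}
$RUN hadoop jar "$HADOOP_HOME/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar" \
  com.hadoop.compression.lzo.DistributedLzoIndexer \
  /user/hive/warehouse/rawdata/day=20150101
```

Without the index, each .lzo file is processed by a single mapper, which defeats much of the purpose of using a splittable input format.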