Enabling LZO compression in Hadoop and Hive


Environment:

   ubuntu

   hadoop-2.6.0 

   hive-1.1.0

 
1  Install the LZO development package (it pulls in the runtime library liblzo2-2)
sudo apt-get install liblzo2-dev
 
hadoop@idex140:~/modules/hadoop-2.6.0$ dpkg -L liblzo2-2   # list where the package's files were installed
/.
/usr
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/liblzo2.so.2.0.0
/usr/share
/usr/share/doc
/usr/share/doc/liblzo2-2
/usr/share/doc/liblzo2-2/THANKS
/usr/share/doc/liblzo2-2/AUTHORS
/usr/share/doc/liblzo2-2/changelog.Debian.gz
/usr/share/doc/liblzo2-2/copyright
/usr/share/doc/liblzo2-2/LZO.TXT.gz
/usr/lib/x86_64-linux-gnu/liblzo2.so.2

 

2  Download the LZO 2.09 source
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.09.tar.gz

 

3  Build LZO as a shared library and install it under /usr/local/lzo-2.09
tar -xzvf lzo-2.09.tar.gz  
cd lzo-2.09
export CFLAGS=-m64   # build for a 64-bit operating system
./configure --enable-shared --prefix /usr/local/lzo-2.09
make && sudo make install
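Before moving on, it can help to confirm that the build really produced headers and a shared library under the install prefix, because the hadoop-lzo build later points C_INCLUDE_PATH and LIBRARY_PATH at it. The function below is a minimal sketch: check_lzo_prefix is a hypothetical helper name, and the expected file names (include/lzo/lzo1x.h, lib/liblzo2.so*) are assumptions about a standard LZO 2.x install.

```shell
# Hypothetical helper: check that an LZO build installed its headers and a
# shared library under PREFIX (defaults to the prefix used in this guide).
# The expected file names are assumptions about a standard LZO 2.x install.
check_lzo_prefix() {
  local prefix="${1:-/usr/local/lzo-2.09}" status=0
  # LZO 2.x puts its public headers in include/lzo/
  if [ ! -f "$prefix/include/lzo/lzo1x.h" ]; then
    echo "missing header: $prefix/include/lzo/lzo1x.h" >&2
    status=1
  fi
  # --enable-shared should leave liblzo2.so* in lib/
  if ! ls "$prefix"/lib/liblzo2.so* >/dev/null 2>&1; then
    echo "missing shared library: $prefix/lib/liblzo2.so*" >&2
    status=1
  fi
  return "$status"
}
```

For example: `check_lzo_prefix /usr/local/lzo-2.09 && echo "LZO install looks complete"`.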

 

5  Install the lzop command-line tool
sudo apt-get install lzop

 

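6  Build hadoop-lzo from its source tree (here ~/hadoop-lzo). Before building, change hadoop.version in pom.xml to the Hadoop version you are running.
hadoop@master:~/hadoop-lzo$ C_INCLUDE_PATH=/usr/local/lzo-2.09/include/ \
   LIBRARY_PATH=/usr/local/lzo-2.09/lib/ \
   CXXFLAGS=-m64 \
   mvn clean package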
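The hadoop.version change mentioned above can be scripted with sed so the build is repeatable. This is a sketch under assumptions: set_pom_property is a hypothetical helper, it only handles a property written on one line as <name>value</name>, the value must not contain `|`, `&` or `\`, and it relies on GNU sed's `-i` (as shipped with Ubuntu). Check the exact property name in your own pom.xml first; hadoop.version is the name used in this text, and other hadoop-lzo releases may call it something else.

```shell
# Hypothetical helper: set <NAME>VALUE</NAME> in a Maven pom.xml with GNU sed.
# Assumes the property sits on one line and VALUE contains no '|', '&' or '\'.
set_pom_property() {
  local pom="$1" name="$2" value="$3" re_name
  # escape the dots so they match literally in the grep/sed patterns
  re_name=$(printf '%s' "$name" | sed 's/\./\\./g')
  if ! grep -q "<$re_name>" "$pom"; then
    echo "property <$name> not found in $pom" >&2
    return 1
  fi
  # replace only the text between the opening and closing tag
  sed -i "s|<$re_name>[^<]*</$re_name>|<$name>$value</$name>|" "$pom"
}
```

For example, `set_pom_property pom.xml hadoop.version 2.6.0`, then run the mvn command above.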

 

7  Copy the native libraries built by hadoop-lzo into Hadoop's native library directory

tar -cBf - -C target/native/Linux-amd64-64/lib . | tar -xBvf - -C ~/modules/hadoop-2.6.0/lib/native/

  

8  Copy the hadoop-lzo jar into Hadoop's common lib directory, then reload the profile

cp ${HADOOP_LZO_HOME}/target/hadoop-lzo-0.4.20-SNAPSHOT.jar  ${HADOOP_HOME}/share/hadoop/common/lib/
source /etc/profile
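A quick sanity check after copying: the native library (libgplcompression.so*, copied with tar above) and the hadoop-lzo jar (copied with cp) should both now sit under HADOOP_HOME. The function below is a hypothetical helper, check_hadoop_lzo_install, and it assumes the Hadoop 2.6 directory layout used in this text.

```shell
# Hypothetical helper: verify the hadoop-lzo native library and jar were
# copied under HADOOP_HOME (Hadoop 2.6 layout assumed).
check_hadoop_lzo_install() {
  local hadoop_home="${1:-$HADOOP_HOME}" status=0
  if ! ls "$hadoop_home"/lib/native/libgplcompression.so* >/dev/null 2>&1; then
    echo "no libgplcompression.so* in $hadoop_home/lib/native" >&2
    status=1
  fi
  if ! ls "$hadoop_home"/share/hadoop/common/lib/hadoop-lzo-*.jar >/dev/null 2>&1; then
    echo "no hadoop-lzo-*.jar in $hadoop_home/share/hadoop/common/lib" >&2
    status=1
  fi
  return "$status"
}
```

Run it as `check_hadoop_lzo_install "$HADOOP_HOME"` on the master, and later on each worker.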

 

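9  Repeat the operations above on the other nodes

 # copy the LZO source tarball to each worker
 scp lzo-2.09.tar.gz  hadoop-slave1:/home/hadoop/
 scp lzo-2.09.tar.gz  hadoop-slave2:/home/hadoop/

 # on each worker: unpack, build and install LZO as in step 3
 tar -xzvf lzo-2.09.tar.gz && cd lzo-2.09
 export CFLAGS=-m64
 ./configure --enable-shared --prefix /usr/local/lzo-2.09
 make && sudo make install

 sudo apt-get install liblzo2-dev
 sudo apt-get install lzop

 # on the master, from $HADOOP_HOME/lib/native: copy the hadoop-lzo native libraries
 scp -r libgpl* hadoop-slave1:/home/hadoop/modules/hadoop-2.6.0/lib/native/
 scp -r libgpl* hadoop-slave2:/home/hadoop/modules/hadoop-2.6.0/lib/native/

 # copy the hadoop-lzo jar to each worker
 scp   $HADOOP_LZO_HOME/target/hadoop-lzo-0.4.20-SNAPSHOT.jar hadoop-slave1:$HADOOP_HOME/share/hadoop/common/lib/
 scp   $HADOOP_LZO_HOME/target/hadoop-lzo-0.4.20-SNAPSHOT.jar hadoop-slave2:$HADOOP_HOME/share/hadoop/common/lib/
 source /etc/profile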
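The per-host scp lines above are easy to get wrong (a host repeated, a variable name mistyped). A loop over the worker names avoids that. This sketch only covers copying the hadoop-lzo artifacts, not building LZO on each worker; run, sync_lzo_to_workers and the DRY_RUN switch are hypothetical names, and the host names and paths are the ones used in this text.

```shell
# Hypothetical helpers: copy the hadoop-lzo artifacts to every worker in a loop.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# usage: sync_lzo_to_workers JAR NATIVE_DIR REMOTE_HADOOP_HOME HOST...
sync_lzo_to_workers() {
  local jar="$1" native_dir="$2" remote_home="$3" host
  shift 3
  for host in "$@"; do
    # native libraries go to lib/native, the jar to share/hadoop/common/lib
    run scp "$native_dir"/libgplcompression.so* "$host:$remote_home/lib/native/"
    run scp "$jar" "$host:$remote_home/share/hadoop/common/lib/"
  done
}
```

Preview first with `DRY_RUN=1 sync_lzo_to_workers "$HADOOP_LZO_HOME/target/hadoop-lzo-0.4.20-SNAPSHOT.jar" "$HADOOP_HOME/lib/native" /home/hadoop/modules/hadoop-2.6.0 hadoop-slave1 hadoop-slave2`, then run the same command without `DRY_RUN=1`.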

 

10 Update the Hadoop configuration files

   (1) Append the following to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
# add lzo environment variables
export LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib
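If the setup is re-run, appending blindly duplicates the export line. A minimal idempotent variant, assuming the same library path as above; add_lzo_env is a hypothetical helper name:

```shell
# Hypothetical helper: append the LZO library path export to hadoop-env.sh
# only if the exact line is not already present.
add_lzo_env() {
  local env_file="$1" lzo_lib="${2:-/usr/local/lzo-2.09/lib}" line
  line="export LD_LIBRARY_PATH=$lzo_lib"
  if ! grep -qxF "$line" "$env_file" 2>/dev/null; then
    printf '# add lzo environment variables\n%s\n' "$line" >> "$env_file"
  fi
}
```

For example: `add_lzo_env "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"`.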

   (2) Edit core-site.xml:

 
      <property>
        <name>io.compression.codecs</name>
        <value>org.apache.hadoop.io.compress.GzipCodec,
          org.apache.hadoop.io.compress.DefaultCodec,
          com.hadoop.compression.lzo.LzoCodec,
          com.hadoop.compression.lzo.LzopCodec,
          org.apache.hadoop.io.compress.BZip2Codec,
          org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>
      <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
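Because the io.compression.codecs value above is split over several lines, it is worth checking that each codec class name survives intact. The sketch below removes all whitespace from the value and then looks for an exact comma-delimited match, so line breaks and indentation inside the XML value do not matter for the check. has_codec is a hypothetical helper; on a running cluster the value could come from `hdfs getconf -confKey io.compression.codecs`.

```shell
# Hypothetical helper: succeed if CODEC is one of the comma-separated class
# names in LIST. All whitespace is removed first, so line breaks and
# indentation inside the XML value do not matter.
has_codec() {
  local list wanted="$2"
  list=$(printf '%s' "$1" | tr -d '[:space:]')
  # surround with commas so LzoCodec does not match LzopCodec
  case ",$list," in
    *",$wanted,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `has_codec "$(hdfs getconf -confKey io.compression.codecs)" com.hadoop.compression.lzo.LzopCodec && echo "LzopCodec registered"`.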
   (3) Edit mapred-site.xml:
 
      <property>
        <name>mapred.child.env</name>
        <value>LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib</value>
      </property>
      <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
      </property>
      <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress.type</name>
       <value>BLOCK</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress</name>
       <value>false</value>
      </property>
      <property>
       <name>mapreduce.output.fileoutputformat.compress.codec</name>
       <value>org.apache.hadoop.io.compress.DefaultCodec</value>
      </property>
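The mapred.child.env value is a comma-separated list of NAME=VALUE pairs. Hadoop 2.x appears to split these entries on `,` and `=` without trimming them, so a stray space before `=` or at the end of the value ends up inside the variable name or the path. A small lint such as the hypothetical check_child_env below can catch that before the file is deployed; the accepted pattern (a shell-style variable name, then `=`, then a value with no leading or trailing blank) is an assumption of this sketch.

```shell
# Hypothetical lint: every comma-separated entry must look like NAME=VALUE
# with no blank before '=' and no leading/trailing blank in VALUE.
# The body runs in a subshell so the IFS and noglob changes stay local.
check_child_env() (
  status=0
  IFS=','
  set -f
  for pair in $1; do
    if ! printf '%s\n' "$pair" | grep -Eq '^[A-Za-z_][A-Za-z0-9_]*=[^[:space:]](.*[^[:space:]])?$'; then
      echo "suspicious entry: '$pair'" >&2
      status=1
    fi
  done
  [ "$status" -eq 0 ]
)
```

For example, `check_child_env 'LD_LIBRARY_PATH=/usr/local/lzo-2.09/lib'` succeeds, while `check_child_env 'LD_LIBRARY_PATH =/usr/local/lzo-2.09/lib '` fails and reports the entry.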

 PS:

       Intermediate (map output) compression

Scope       Property (current name)                           Default                                       Deprecated name
Hadoop job  mapreduce.map.output.compress                     false                                         mapred.compress.map.output
Hadoop job  mapreduce.map.output.compress.codec               org.apache.hadoop.io.compress.DefaultCodec    mapred.map.output.compression.codec
Hive job    hive.exec.compress.intermediate                   false                                         (none)

       Final output compression

Scope       Property (current name)                           Default                                       Deprecated name
Hadoop job  mapreduce.output.fileoutputformat.compress        false                                         mapred.output.compress
Hadoop job  mapreduce.output.fileoutputformat.compress.type   RECORD                                        mapred.output.compression.type
Hadoop job  mapreduce.output.fileoutputformat.compress.codec  org.apache.hadoop.io.compress.DefaultCodec    mapred.output.compression.codec
Hive job    hive.exec.compress.output                         false                                         (none)
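Older examples and scripts often still use the deprecated names from the tables above. A lookup covering just those five pairs might look like this; new_prop_name is a hypothetical helper, and any name not in the tables is printed unchanged.

```shell
# Hypothetical lookup: map the deprecated property names listed in the
# tables to their current names; anything else is printed unchanged.
new_prop_name() {
  case "$1" in
    mapred.compress.map.output)          echo mapreduce.map.output.compress ;;
    mapred.map.output.compression.codec) echo mapreduce.map.output.compress.codec ;;
    mapred.output.compress)              echo mapreduce.output.fileoutputformat.compress ;;
    mapred.output.compression.type)      echo mapreduce.output.fileoutputformat.compress.type ;;
    mapred.output.compression.codec)     echo mapreduce.output.fileoutputformat.compress.codec ;;
    *)                                   echo "$1" ;;
  esac
}
```

For example, `new_prop_name mapred.output.compress` prints `mapreduce.output.fileoutputformat.compress`.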
 
11  Create a Hive test table that stores LZO-compressed data
 
    CREATE TABLE rawdata(
      appkey string, uid string, uidtype string                            
    )                
    COMMENT 'This is the staging of raw data'
    PARTITIONED BY (day INT)
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY '\t' 
    STORED AS INPUTFORMAT 
      'com.hadoop.mapred.DeprecatedLzoTextInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; 
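The table above reads its data through DeprecatedLzoTextInputFormat, which expects lzop-compressed files, and a large .lzo file is only split across mappers once it has an index. One way to fill a partition is sketched below: compress a local TSV with lzop, load it with Hive, then index the partition directory with hadoop-lzo's LzoIndexer class. The warehouse path (Hive's default /user/hive/warehouse), the partition value and the jar path are assumptions for illustration; load_lzo_partition, run, WAREHOUSE and DRY_RUN are hypothetical names. If your hadoop-lzo build only indexes individual files, pass the .lzo file path instead of the directory.

```shell
# Hypothetical helpers: compress a local TSV with lzop, load it into one
# partition of rawdata, then index it so the .lzo file can be split.
# WAREHOUSE defaults to Hive's default warehouse directory (an assumption
# about your setup); DRY_RUN=1 prints the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# usage: load_lzo_partition LOCAL_TSV DAY HADOOP_LZO_JAR
load_lzo_partition() {
  local tsv="$1" day="$2" lzo_jar="$3"
  local part_dir="${WAREHOUSE:-/user/hive/warehouse}/rawdata/day=$day"
  # stop at the first failing step
  run lzop -f -o "$tsv.lzo" "$tsv" &&
    run hive -e "LOAD DATA LOCAL INPATH '$tsv.lzo' INTO TABLE rawdata PARTITION (day=$day);" &&
    run hadoop jar "$lzo_jar" com.hadoop.compression.lzo.LzoIndexer "$part_dir"
}
```

Preview with `DRY_RUN=1 load_lzo_partition /tmp/rawdata.tsv 20150101 "$HADOOP_HOME/share/hadoop/common/lib/hadoop-lzo-0.4.20-SNAPSHOT.jar"`, then drop `DRY_RUN=1` to execute.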
 

