Impala 1.2.4 Installation and Configuration Guide

Notes before installation:

1. For security, we use cup, the account already used by Hive, to start, stop, and otherwise operate Impala, rather than creating a separate impala account. This choice drives the directory-permission changes and user settings adjusted later in this document.

2. For performance, the impala-state-store and impala-catalog services are installed on the Hadoop cluster's namenode, and impala-server and impala-shell on each datanode; we do not run impala-server on the namenode.

3. Install the Impala packages as root, then change the ownership of the relevant files to the cup account.

4. Starting and stopping the Impala services requires an account with root privileges.

5. The installation steps follow the official documentation:

http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/Installing-and-Using-Impala.html

Installing the Impala Packages

Download the required packages, picking the version that matches your distribution (we run CDH4.2.1, so we chose Impala 1.2.4):

http://archive.cloudera.com/impala/redhat/6/x86_64/impala/

 

Install the following packages, in order, on the namenode of the Hadoop cluster:

rpm -ivh ./bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm

rpm -ivh ./impala-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-state-store-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-server-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-catalog-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-udf-devel-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-shell-1.2.4-1.p0.420.el6.x86_64.rpm

Note: the Impala installation depends on bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm. This package is missing from the 1.2.4 directory on the official site; download it from the 1.2.3 (or another version's) directory.

 

Install the following packages, in order, on each of the datanodes:

rpm -ivh ./bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm

rpm -ivh ./impala-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-server-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-catalog-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-udf-devel-1.2.4-1.p0.420.el6.x86_64.rpm

rpm -ivh ./impala-shell-1.2.4-1.p0.420.el6.x86_64.rpm

 

Check the Impala paths created by the installation:

[root@cup-slave-11 cup]# find / -name impala

/etc/alternatives/impala

/etc/impala

/etc/default/impala

/var/log/impala

/var/lib/alternatives/impala

/var/lib/impala

/var/run/impala

/usr/lib/impala

Configuring Impala

Add the following properties to hdfs-site.xml:

<property>

    <name>dfs.client.read.shortcircuit</name>

    <value>true</value>

</property>

<property>

    <name>dfs.domain.socket.path</name>

    <value>/var/run/hadoop-hdfs/dn._PORT</value>

</property>

<property>

  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>

  <value>true</value>

</property>

<property>

   <name>dfs.client.use.legacy.blockreader.local</name>

   <value>false</value>

</property>

<property>

   <name>dfs.datanode.data.dir.perm</name>

   <value>750</value>

</property>

<property>

   <name>dfs.block.local-path-access.user</name>

   <value>cup</value>

</property>

<property>

   <name>dfs.client.file-block-storage-locations.timeout</name>

   <value>3000</value>

</property>
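Note: these properties take effect only after the HDFS datanodes reload their configuration, so restart HDFS (at minimum every datanode) before starting the Impala services.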

 

Add the configuration files:

The impalad configuration directory is set by the IMPALA_CONF_DIR environment variable and defaults to /etc/impala/conf. Copy your configured hive-site.xml, core-site.xml, hdfs-site.xml, and hbase-site.xml files into /etc/impala/conf.
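For example, assuming the source configuration directories below (the locations are assumptions, not something the original specifies; adjust the paths to your own layout):

cp $HADOOP_HOME/etc/hadoop/core-site.xml /etc/impala/conf/
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml /etc/impala/conf/
cp $HIVE_HOME/conf/hive-site.xml /etc/impala/conf/
cp $HBASE_HOME/conf/hbase-site.xml /etc/impala/conf/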

 

Copy the Impala shared libraries into Hadoop's native lib directory (skip this step if the files are already there):

cp /usr/lib/impala/lib/*.so* $HADOOP_HOME/lib/native/

 

Replace the datanucleus jars under /usr/lib/impala/lib with the corresponding files from $HIVE_HOME/lib, renaming them to match the original names in /usr/lib/impala/lib. Otherwise impala-state-store and impala-catalog fail to start; see Exceptions 3 and 5. A sketch of the swap follows.
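Back up the originals first; the version placeholders in the comments are illustrative, so use the file names you actually find in the two directories:

cd /usr/lib/impala/lib
mkdir -p bak && mv datanucleus-*.jar bak/
cp $HIVE_HOME/lib/datanucleus-*.jar .
# Rename each copied jar to the name Impala originally shipped with, e.g.:
# mv datanucleus-core-<hive-version>.jar datanucleus-core-<impala-version>.jar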

 

Copy the mysql-connector-java.jar file from $HADOOP_HOME/lib to /usr/share/java, because Impala's catalogd needs it there. The MySQL driver jar must be named exactly mysql-connector-java.jar; the /usr/bin/catalogd startup script shows why, and a copy command is sketched after the listing:

[root@cup-slave-11 native]# more /usr/bin/catalogd

#!/bin/bash

 

export IMPALA_BIN=${IMPALA_BIN:-/usr/lib/impala/sbin}

export IMPALA_HOME=${IMPALA_HOME:-/usr/lib/impala}

export HIVE_HOME=${HIVE_HOME:-/usr/lib/hive}

export HBASE_HOME=${HBASE_HOME:-/usr/lib/hbase}

export IMPALA_CONF_DIR=${IMPALA_CONF_DIR:-/etc/impala/conf}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/impala/conf}

export HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/impala/conf}

export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/impala/conf}

export LIBHDFS_OPTS=${LIBHDFS_OPTS:--Djava.library.path=/usr/lib/impala/lib}

export MYSQL_CONNECTOR_JAR=${MYSQL_CONNECTOR_JAR:-/usr/share/java/mysql-connector-java.jar}
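If the jar under $HADOOP_HOME/lib carries a version suffix, rename it during the copy so the script finds the exact name it expects (the suffixed name below is hypothetical):

cp $HADOOP_HOME/lib/mysql-connector-java.jar /usr/share/java/
# Or, if the source jar is version-suffixed:
# cp $HADOOP_HOME/lib/mysql-connector-java-<version>.jar /usr/share/java/mysql-connector-java.jar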

 

Edit the Impala settings in /etc/default/impala to match your environment:

[root@cup-master-1 ~]# vi /etc/default/impala

 

IMPALA_STATE_STORE_HOST=10.204.193.10

IMPALA_STATE_STORE_PORT=24000

IMPALA_BACKEND_PORT=22000

IMPALA_LOG_DIR=/var/log/impala

 

IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} "

IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"

IMPALA_SERVER_ARGS=" \

    -log_dir=${IMPALA_LOG_DIR} \

    -state_store_port=${IMPALA_STATE_STORE_PORT} \

    -use_statestore \

    -state_store_host=${IMPALA_STATE_STORE_HOST} \

    -be_port=${IMPALA_BACKEND_PORT}"

 

ENABLE_CORE_DUMPS=false

 

# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib

MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar

IMPALA_BIN=/usr/lib/impala/sbin

IMPALA_HOME=/usr/lib/impala

HIVE_HOME=/home/cup/hive-0.10.0-cdh4.2.1

HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1

IMPALA_CONF_DIR=/etc/impala/conf

HADOOP_CONF_DIR=/etc/impala/conf

HIVE_CONF_DIR=/etc/impala/conf

HBASE_CONF_DIR=/etc/impala/conf

 

Edit the init scripts /etc/init.d/impala-state-store, /etc/init.d/impala-server, and /etc/init.d/impala-catalog to match your environment. Two user-related places need changing in each script: the SVC_USER line and the owner/group flags of the install -d command (both default to impala); a sed sketch follows the excerpt below:

DAEMON="catalogd"

DESC="Impala Catalog Server"

EXEC_PATH="/usr/bin/catalogd"

SVC_USER="cup"  ### Editor's note: the default here is impala

DAEMON_FLAGS="${IMPALA_CATALOG_ARGS}"

CONF_DIR="/etc/impala/conf"

PIDFILE="/var/run/impala/catalogd-impala.pid"

LOCKDIR="/var/lock/subsys"

LOCKFILE="$LOCKDIR/catalogd"

 

install -d -m 0755 -o cup -g cup /var/run/impala 1>/dev/null 2>&1 || :

[ -d "$LOCKDIR" ] || install -d -m 0755 $LOCKDIR 1>/dev/null 2>&1 || :

 

 

Create the Impala directory on HDFS:

hadoop dfs -mkdir /user/impala
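Since Impala runs as the cup account in this setup, it is probably also worth handing the new directory to that user (an extra step we assume from the user substitution above, not one the original lists):

hadoop dfs -chown cup /user/impala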

 

Create /var/run/hadoop-hdfs on every node, because the dfs.domain.socket.path property in hdfs-site.xml points to this directory:

[root@cup-slave-11 impala]# mkdir /var/run/hadoop-hdfs

 

Give ownership of the /var/run/hadoop-hdfs and /var/log/impala directories to the cup user and group; otherwise impala-server fails to start with Exception 4:

chown -R cup:cup /var/log/impala

chown -R cup:cup /var/run/hadoop-hdfs

Starting the Impala Services

Start the state-store service on the namenode:

sudo service impala-state-store start

Start the catalog service on the namenode:

sudo service impala-catalog start

Start the impala-server service on each datanode:

sudo service impala-server start

Stop the state-store service on the namenode:

sudo service impala-state-store stop

Stop the catalog service on the namenode:

sudo service impala-catalog stop

Stop the impala-server service on each datanode:

sudo service impala-server stop
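Because impala-server must be started on every datanode, a small loop over the datanode host names saves typing; the host list below is illustrative, so substitute your own:

for host in cup-slave-11 cup-slave-12; do
    ssh root@$host 'service impala-server start'
done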

 

Note: occasionally a service starts with no visible error yet is not actually running. Check the logs under /var/log/impala for entries containing "error" and investigate anything you find.
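A quick scan of the logs (a sketch; adjust the path if you changed IMPALA_LOG_DIR):

grep -ril error /var/log/impala/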

Verifying That Impala Works

Check that the Impala processes are running on the namenode:

[cup@cup-master-1 ~]$ ps -ef|grep impala

cup       5522 45968  0 08:58 pts/25   00:00:00 grep impala

cup       8292     1  0 Mar27 ?        00:01:06 /usr/lib/impala/sbin/statestored -log_dir=/var/log/impala -state_store_port=24000

 

Check that the impala-server process is running on each datanode:

[cup@cup-slave-11 ~]$ ps -ef|grep impala

cup       15630  15599  0 09:24 pts/0    00:00:00 grep impala

cup      112216      1  0 Mar27 ?        00:01:15 /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala -state_store_port=24000 -use_statestore -state_store_host=10.204.193.10 -be_port=22000

 

Browse to the Impala web UIs to confirm the daemons are up: the state-store UI on the namenode listens on port 25010 by default, and each impalad UI on the datanodes listens on port 25000. (Screenshots omitted.)

 

Run SQL statements from a node where impala-shell is installed:

[cup@cup-slave-11 ~]$ impala-shell

Starting Impala Shell without Kerberos authentication

Connected to cup-slave-11:21000

Server version: impalad version 1.2.4 RELEASE (build ac29ae09d66c1244fe2ceb293083723226e66c1a)

Welcome to the Impala shell. Press TAB twice to see a list of available commands.

 

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

 

(Shell build version: Impala Shell v1.2.4 (ac29ae0) built on Wed Mar  5 07:05:40 PST 2014)

[cup-slave-11:21000] > show databases;

Query: show databases

+---------+

| name    |

+---------+

| cloudup |

| default |

| xhyt    |

+---------+

Returned 3 row(s) in 0.01s

[cup-slave-11:21000] > use cloudup;

Query: use cloudup

[cup-slave-11:21000] > select * from url_read_typ_rel limit 5;

Query: select * from url_read_typ_rel limit 5

+----------------------+---------+---------+---------+---------+--------+-----+

| urlhash              | rtidlv1 | rtyplv1 | rtidlv2 | rtyplv2 | isttim | url |

+----------------------+---------+---------+---------+---------+--------+-----+

| 2160609062987073557  | 3       | 股票    | NULL    |         | NULL   |     |

| 8059679893178527423  | 3       | 股票    | NULL    |         | NULL   |     |

| -404610021015528651  | 2       | 房產    | NULL    |         | NULL   |     |

| -6322366252916938780 | 5       | 教育    | NULL    |         | NULL   |     |

| -6821513749785855580 | 12      | 游戲    | NULL    |         | NULL   |     |

+----------------------+---------+---------+---------+---------+--------+-----+

Returned 5 row(s) in 0.61s

 

Common exceptions:

Exception 1:

Starting or stopping the state-store prints an error:

[root@cup-master-1 ~]# service impala-state-store start

/etc/init.d/impala-state-store: line 35: /etc/default/hadoop: No such file or directory

Starting Impala State Store Server:[  OK  ]

Resolution:

Several of the Impala init scripts source /etc/default/hadoop, which does not exist in our environment. The message has no real effect and can be ignored.

Exception 2:

Starting the impala-server service reports errors (the logs are under /var/log/impala):

ERROR: short-circuit local reads is disabled because

  - Impala cannot read or execute the parent directory of dfs.domain.socket.path

  - dfs.client.read.shortcircuit is not enabled.

ERROR: block location tracking is not properly enabled because

  - dfs.client.file-block-storage-locations.timeout is too low. It should be at least 3000.

Resolution:

Make sure all of the following properties are set in hdfs-site.xml:

dfs.client.read.shortcircuit
dfs.domain.socket.path
dfs.datanode.hdfs-blocks-metadata.enabled
dfs.client.use.legacy.blockreader.local
dfs.datanode.data.dir.perm
dfs.block.local-path-access.user
dfs.client.file-block-storage-locations.timeout

Exception 3:

Starting impala-state-store reports:

java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory

        at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:51)

        at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:41)

/* Editor's note: several lines omitted here */

Caused by: javax.jdo.JDOFatalUserException: Class datanucleus.jdo.JDOPersistenceManagerFactory was not found.

NestedThrowables:

java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory

Caused by: java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory

        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

        at java.security.AccessController.doPrivileged(Native Method)

        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1155)

Resolution:

The datanucleus jars under /usr/lib/impala/lib do not match the versions under $HIVE_HOME/lib. Copy the datanucleus files from $HIVE_HOME/lib into /usr/lib/impala/lib and rename them to the original /usr/lib/impala/lib file names (some configuration references the exact names).

Exception 4:

If /var/run/hadoop-hdfs and /var/log/impala are not owned by the user that runs Impala, startup reports:

[root@cup-slave-11 impala]# service impala-server start

/etc/init.d/impala-server: line 35: /etc/default/hadoop: No such file or directory

Starting Impala Server:[  OK  ]

/bin/bash: /var/log/impala/impala-server.log: Permission denied

Resolution:

Give ownership of /var/run/hadoop-hdfs and /var/log/impala to the cup user and group, and make sure the user and group settings in /etc/init.d/impala-state-store, /etc/init.d/impala-server, and /etc/init.d/impala-catalog are set to cup.

Exception 5:

Starting the impala-catalog service reports:

E0327 16:02:46.283989 45718 Log4JLogger.java:115] Bundle "org.datanucleus.api.jdo" requires "org.datanucleus" version "3.2.0.m4" but the resolved bundle has version "3.2.1" which is outside the expected range.

Resolution:

As the message describes, rename the datanucleus-api-jdo-3.2.1.jar file under /usr/lib/impala/lib to datanucleus-api-jdo-3.2.0.m4.jar.
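The rename as a single command:

mv /usr/lib/impala/lib/datanucleus-api-jdo-3.2.1.jar /usr/lib/impala/lib/datanucleus-api-jdo-3.2.0.m4.jar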

 

----end

Original post: http://www.cnblogs.com/chenz/articles/3629698.html

Author: chenzheng

Contact: vinkeychen@gmail.com

