Impala 1.2.4 Installation Manual
Pre-installation notes:
1. For security reasons, we use cup, the account already used by Hive, to start, stop, and otherwise operate Impala, rather than creating a separate impala account. This choice drives several directory-ownership adjustments and user parameters in the configuration files described later.
2. For performance reasons, the impala-state-store and impala-catalog services are installed on the Hadoop cluster's namenode, while the impala-server and impala-shell services are installed on each datanode; impala-server is not run on the namenode.
3. Install the Impala packages as root, then change the owner of the relevant files to the cup account.
4. Starting and stopping the Impala services requires an account with root privileges.
5. The installation steps follow the official documentation:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/Installing-and-Using-Impala.html
Installing the Impala Packages
Download the required packages, choosing the version that matches your cluster (we run CDH 4.2.1, so we chose Impala 1.2.4):
http://archive.cloudera.com/impala/redhat/6/x86_64/impala/
Install the following packages, in order, on the Hadoop cluster's namenode:
rpm -ivh ./bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm
rpm -ivh ./impala-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-state-store-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-server-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-catalog-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-udf-devel-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-shell-1.2.4-1.p0.420.el6.x86_64.rpm
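To confirm the packages registered correctly, a quick sanity check (not part of the official procedure) is to list them:

rpm -qa | grep -E 'impala|bigtop-utils'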
Note: Impala depends on the package bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm. This package is missing from the 1.2.4 directory on the official site; download it from the 1.2.3 (or another version's) directory instead.
Install the following packages, in order, on each of the other datanodes:
rpm -ivh ./bigtop-utils-0.4+300-1.cdh4.0.1.p0.1.el6.noarch.rpm
rpm -ivh ./impala-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-server-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-catalog-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-udf-devel-1.2.4-1.p0.420.el6.x86_64.rpm
rpm -ivh ./impala-shell-1.2.4-1.p0.420.el6.x86_64.rpm
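With more than a handful of datanodes, a simple ssh loop saves typing. The sketch below assumes the rpms have already been copied to /opt/impala-rpms on every node; the hostnames and that path are placeholders for your own environment:

for host in cup-slave-11 cup-slave-12 cup-slave-13; do
  # install the datanode package set in one rpm transaction per host
  ssh root@$host 'cd /opt/impala-rpms && \
    rpm -ivh ./bigtop-utils-*.rpm ./impala-1.2.4*.rpm ./impala-server-*.rpm \
             ./impala-catalog-*.rpm ./impala-udf-devel-*.rpm ./impala-shell-*.rpm'
done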
Check where Impala was installed:
[root@cup-slave-11 cup]# find / -name impala
/etc/alternatives/impala
/etc/impala
/etc/default/impala
/var/log/impala
/var/lib/alternatives/impala
/var/lib/impala
/var/run/impala
/usr/lib/impala
Impala Configuration
Add the following to hdfs-site.xml:
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.use.legacy.blockreader.local</name>
  <value>false</value>
</property>
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>750</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>cup</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.timeout</name>
  <value>3000</value>
</property>
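These settings must be present on every datanode, and the datanodes must be restarted before they take effect. A minimal way to push the file out, assuming HADOOP_CONF_DIR points at the same path on all nodes and reusing the placeholder host list from above:

for host in cup-slave-11 cup-slave-12 cup-slave-13; do
  # copy the updated hdfs-site.xml to each datanode's Hadoop conf directory
  scp $HADOOP_CONF_DIR/hdfs-site.xml root@$host:$HADOOP_CONF_DIR/
done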
Add the configuration files:
The impalad configuration directory is given by the IMPALA_CONF_DIR environment variable and defaults to /etc/impala/conf. Copy your configured hive-site.xml, core-site.xml, hdfs-site.xml, and hbase-site.xml files into /etc/impala/conf.
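For example (a sketch; the source paths assume your configs live under $HADOOP_CONF_DIR, $HIVE_CONF_DIR, and $HBASE_CONF_DIR):

cp $HADOOP_CONF_DIR/core-site.xml $HADOOP_CONF_DIR/hdfs-site.xml /etc/impala/conf/
cp $HIVE_CONF_DIR/hive-site.xml /etc/impala/conf/
cp $HBASE_CONF_DIR/hbase-site.xml /etc/impala/conf/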
Copy the Impala .so files into Hadoop's native library directory (skip this step if the target directory already contains them):
cp /usr/lib/impala/lib/*.so* $HADOOP_HOME/lib/native/
Replace the files whose names contain "datanucleus" in /usr/lib/impala/lib with the corresponding files from $HIVE_HOME/lib, renaming them to match the original names in /usr/lib/impala/lib; otherwise impala-state-store and impala-catalog fail to start, as described under Exceptions 3 and 5.
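The replacement for one of these jars might look like the following. OLD and NEW are placeholders for the version strings you actually find in the two directories; repeat the pattern for each datanucleus jar:

cd /usr/lib/impala/lib
# keep a backup of impala's original jar
mv datanucleus-core-OLD.jar datanucleus-core-OLD.jar.orig
# install hive's jar under the name impala expects (some config files reference it)
cp $HIVE_HOME/lib/datanucleus-core-NEW.jar ./datanucleus-core-OLD.jar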
Copy the mysql-connector-java.jar file from $HADOOP_HOME/lib into the /usr/share/java directory, because Impala's catalogd uses it there (note that the MySQL driver jar must be named exactly mysql-connector-java.jar), as the catalogd launcher script shows:
[root@cup-slave-11 native]# more /usr/bin/catalogd
#!/bin/bash
export IMPALA_BIN=${IMPALA_BIN:-/usr/lib/impala/sbin}
export IMPALA_HOME=${IMPALA_HOME:-/usr/lib/impala}
export HIVE_HOME=${HIVE_HOME:-/usr/lib/hive}
export HBASE_HOME=${HBASE_HOME:-/usr/lib/hbase}
export IMPALA_CONF_DIR=${IMPALA_CONF_DIR:-/etc/impala/conf}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/impala/conf}
export HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/impala/conf}
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/etc/impala/conf}
export LIBHDFS_OPTS=${LIBHDFS_OPTS:--Djava.library.path=/usr/lib/impala/lib}
export MYSQL_CONNECTOR_JAR=${MYSQL_CONNECTOR_JAR:-/usr/share/java/mysql-connector-java.jar}
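The copy itself (the mkdir is just a precaution in case /usr/share/java does not exist yet):

mkdir -p /usr/share/java
cp $HADOOP_HOME/lib/mysql-connector-java.jar /usr/share/java/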
Modify the Impala configuration for your environment:
[root@cup-master-1 ~]# vi /etc/default/impala
IMPALA_STATE_STORE_HOST=10.204.193.10
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala

IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} "
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT}"

ENABLE_CORE_DUMPS=false

# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
IMPALA_BIN=/usr/lib/impala/sbin
IMPALA_HOME=/usr/lib/impala
HIVE_HOME=/home/cup/hive-0.10.0-cdh4.2.1
HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1
IMPALA_CONF_DIR=/etc/impala/conf
HADOOP_CONF_DIR=/etc/impala/conf
HIVE_CONF_DIR=/etc/impala/conf
HBASE_CONF_DIR=/etc/impala/conf
Adjust the init scripts /etc/init.d/impala-state-store, /etc/init.d/impala-server, and /etc/init.d/impala-catalog for your environment; two user-related places must change in each (the impala-catalog script is shown here):
DAEMON="catalogd" DESC="Impala Catalog Server" EXEC_PATH="/usr/bin/catalogd" SVC_USER="cup" ###編者注:這里默認是impala DAEMON_FLAGS="${IMPALA_CATALOG_ARGS}" CONF_DIR="/etc/impala/conf" PIDFILE="/var/run/impala/catalogd-impala.pid" LOCKDIR="/var/lock/subsys" LOCKFILE="$LOCKDIR/catalogd"
install -d -m 0755 -o cup -g cup /var/run/impala 1>/dev/null 2>&1 || : [ -d "$LOCKDIR" ] || install -d -m 0755 $LOCKDIR 1>/dev/null 2>&1 || :
|
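A quick way to make both changes in all three scripts at once. This sketch assumes the stock scripts contain exactly SVC_USER="impala" and "-o impala -g impala"; inspect them first, as the exact text may differ between releases:

# switch the service user from impala to cup
sed -i 's/^SVC_USER="impala"/SVC_USER="cup"/' \
  /etc/init.d/impala-state-store /etc/init.d/impala-server /etc/init.d/impala-catalog
# switch the owner/group of /var/run/impala created at startup
sed -i 's/-o impala -g impala/-o cup -g cup/' \
  /etc/init.d/impala-state-store /etc/init.d/impala-server /etc/init.d/impala-catalog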
Create the impala directory on HDFS:
hadoop dfs -mkdir /user/impala
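Since the services run as cup in this setup, you will likely also want that account to own the directory (an assumption based on this cluster's user choice, not a step from the official procedure):

hadoop dfs -chown cup /user/impala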
Create /var/run/hadoop-hdfs on every node, because the dfs.domain.socket.path parameter in hdfs-site.xml points into this directory:
[root@cup-slave-11 impala]# mkdir /var/run/hadoop-hdfs
Give ownership of /var/run/hadoop-hdfs and /var/log/impala to the cup account and cup group; otherwise impala-server fails to start with Exception 4:
chown -R cup:cup /var/log/impala
chown -R cup:cup /var/run/hadoop-hdfs
Starting the Impala Services
Start the state-store service on the namenode:
sudo service impala-state-store start
Start the catalog service on the namenode:
sudo service impala-catalog start
Start the impala-server service on each datanode:
sudo service impala-server start
Stop the state-store service on the namenode:
sudo service impala-state-store stop
Stop the catalog service on the namenode:
sudo service impala-catalog stop
Stop the impala-server service on each datanode:
sudo service impala-server stop
Note: occasionally a service start reports no obvious error yet still fails. Check /var/log/impala for log entries containing "error"; if any appear, investigate further.
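A quick, case-insensitive scan for such entries (just a convenience sketch):

grep -ri "error" /var/log/impala | head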
Verifying That Impala Works
Check that the state-store process is running on the namenode:
[cup@cup-master-1 ~]$ ps -ef|grep impala
cup   5522 45968  0 08:58 pts/25   00:00:00 grep impala
cup   8292     1  0 Mar27 ?        00:01:06 /usr/lib/impala/sbin/statestored -log_dir=/var/log/impala -state_store_port=24000
Check that the impala-server process is running on each datanode:
[cup@cup-slave-11 ~]$ ps -ef|grep impala
cup  15630  15599  0 09:24 pts/0    00:00:00 grep impala
cup 112216      1  0 Mar27 ?        00:01:15 /usr/lib/impala/sbin/impalad -log_dir=/var/log/impala -state_store_port=24000 -use_statestore -state_store_host=10.204.193.10 -be_port=22000
Open the state-store web UI on the namenode (default port 25010).
Open the impalad web UI on each datanode (default port 25000).
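If no browser can reach the machines, a shell-level reachability check works too. The hostnames below are the ones used in this cluster; substitute your own:

# print the HTTP status code of each web UI (200 means it is up)
curl -s -o /dev/null -w "%{http_code}\n" http://cup-master-1:25010/
curl -s -o /dev/null -w "%{http_code}\n" http://cup-slave-11:25000/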
Run SQL statements from a node with impala-shell installed:
[cup@cup-slave-11 ~]$ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to cup-slave-11:21000
Server version: impalad version 1.2.4 RELEASE (build ac29ae09d66c1244fe2ceb293083723226e66c1a)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

(Shell build version: Impala Shell v1.2.4 (ac29ae0) built on Wed Mar 5 07:05:40 PST 2014)
[cup-slave-11:21000] > show databases;
Query: show databases
+---------+
| name    |
+---------+
| cloudup |
| default |
| xhyt    |
+---------+
Returned 3 row(s) in 0.01s
[cup-slave-11:21000] > use cloudup;
Query: use cloudup
[cup-slave-11:21000] > select * from url_read_typ_rel limit 5;
Query: select * from url_read_typ_rel limit 5
+----------------------+---------+---------+---------+---------+--------+-----+
| urlhash              | rtidlv1 | rtyplv1 | rtidlv2 | rtyplv2 | isttim | url |
+----------------------+---------+---------+---------+---------+--------+-----+
| 2160609062987073557  | 3       | 股票    | NULL    |         | NULL   |     |
| 8059679893178527423  | 3       | 股票    | NULL    |         | NULL   |     |
| -404610021015528651  | 2       | 房產    | NULL    |         | NULL   |     |
| -6322366252916938780 | 5       | 教育    | NULL    |         | NULL   |     |
| -6821513749785855580 | 12      | 游戲    | NULL    |         | NULL   |     |
+----------------------+---------+---------+---------+---------+--------+-----+
Returned 5 row(s) in 0.61s
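By default impala-shell connects to the impalad on the local host; to query through a specific node instead, pass its address with the -i option:

impala-shell -i cup-slave-11:21000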
Common exceptions:
Exception 1:
Starting or stopping the state-store reports:
[root@cup-master-1 ~]# service impala-state-store start
/etc/init.d/impala-state-store: line 35: /etc/default/hadoop: No such file or directory
Starting Impala State Store Server:[  OK  ]
Solution:
Several of the Impala start scripts source /etc/default/hadoop, but our installation does not have that file. The message has no practical impact and can be ignored.
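If you want to silence the message anyway, an empty file is enough (optional; purely cosmetic):

touch /etc/default/hadoop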
Exception 2:
Starting impala-server reports errors (the logs are under /var/log/impala):
ERROR: short-circuit local reads is disabled because
  - Impala cannot read or execute the parent directory of dfs.domain.socket.path
  - dfs.client.read.shortcircuit is not enabled.
ERROR: block location tracking is not properly enabled because
  - dfs.client.file-block-storage-locations.timeout is too low. It should be at least 3000.
Solution:
Make sure the following parameters are set in hdfs-site.xml (a quick check is sketched after this list):
dfs.client.read.shortcircuit
dfs.domain.socket.path
dfs.datanode.hdfs-blocks-metadata.enabled
dfs.client.use.legacy.blockreader.local
dfs.datanode.data.dir.perm
dfs.block.local-path-access.user
dfs.client.file-block-storage-locations.timeout
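A one-liner to confirm that the copy under /etc/impala/conf actually contains these properties (a convenience sketch):

grep -E 'dfs\.(client|domain|datanode|block)' /etc/impala/conf/hdfs-site.xml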
Exception 3:
Starting impala-state-store reports:
java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory
    at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:51)
    at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:41)
    /* editor's note: several lines omitted */
Caused by: javax.jdo.JDOFatalUserException: Class datanucleus.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory
Caused by: java.lang.ClassNotFoundException: org.datanucleus.jdo.JDOPersistenceManagerFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1155)
Solution:
The datanucleus jars under /usr/lib/impala/lib do not match the versions under $HIVE_HOME/lib. Replace the datanucleus files in /usr/lib/impala/lib with those from $HIVE_HOME/lib and rename them to the original names used in /usr/lib/impala/lib (some configuration files reference those file names), as sketched earlier in the configuration section.
Exception 4:
If /var/run/hadoop-hdfs and /var/log/impala are not owned by the user that runs Impala, startup reports:
[root@cup-slave-11 impala]# service impala-server start
/etc/init.d/impala-server: line 35: /etc/default/hadoop: No such file or directory
Starting Impala Server:[  OK  ]
/bin/bash: /var/log/impala/impala-server.log: Permission denied
Solution:
Give ownership of /var/run/hadoop-hdfs and /var/log/impala to the cup account and cup group, and make sure the user and group configured in /etc/init.d/impala-state-store, /etc/init.d/impala-server, and /etc/init.d/impala-catalog are set to cup.
Exception 5:
Starting impala-catalog reports:
E0327 16:02:46.283989 45718 Log4JLogger.java:115] Bundle "org.datanucleus.api.jdo" requires "org.datanucleus" version "3.2.0.m4" but the resolved bundle has version "3.2.1" which is outside the expected range.
Solution:
Following the error description, rename datanucleus-api-jdo-3.2.1.jar in /usr/lib/impala/lib to datanucleus-api-jdo-3.2.0.m4.jar; this resolves the problem.
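The rename as a single command (paths taken directly from the description above):

mv /usr/lib/impala/lib/datanucleus-api-jdo-3.2.1.jar /usr/lib/impala/lib/datanucleus-api-jdo-3.2.0.m4.jar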
----end
Link to this article: http://www.cnblogs.com/chenz/articles/3629698.html
Author: chenzheng
Contact: vinkeychen@gmail.com