Preliminary preparation (version compatibility):
Hadoop 2.x is faster and includes features, such as short-circuit reads, which will help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes that will improve your overall HBase experience. HBase 0.98 deprecates use of Hadoop 1.x, and HBase 1.0 will not support Hadoop 1.x.
Use the following legend to interpret this table:
S = supported and tested,
X = not supported,
NT = it should run, but not tested enough.
|                       | HBase-0.92.x | HBase-0.94.x | HBase-0.96.x | HBase-0.98.x[a] | HBase-1.0.x[b] |
|-----------------------|--------------|--------------|--------------|-----------------|----------------|
| Hadoop-0.20.205       | S            | X            | X            | X               | X              |
| Hadoop-0.22.x         | S            | X            | X            | X               | X              |
| Hadoop-1.0.0-1.0.2[c] | X            | X            | X            | X               | X              |
| Hadoop-1.0.3+         | S            | S            | S            | X               | X              |
| Hadoop-1.1.x          | NT           | S            | S            | X               | X              |
| Hadoop-0.23.x         | X            | S            | NT           | X               | X              |
| Hadoop-2.0.x-alpha    | X            | NT           | X            | X               | X              |
| Hadoop-2.1.0-beta     | X            | NT           | S            | X               | X              |
| Hadoop-2.2.0          | X            | NT [d]       | S            | S               | NT             |
| Hadoop-2.3.x          | X            | NT           | S            | S               | NT             |
| Hadoop-2.4.x          | X            | NT           | S            | S               | S              |
| Hadoop-2.5.x          | X            | NT           | S            | S               | S              |
For details, see: https://hbase.apache.org/book/configuration.html#hadoop
Hive/Hadoop version compatibility:
6 June, 2014: release 0.13.1 available
This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y
21 April, 2014: release 0.13.0 available
This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y
15 October, 2013: release 0.12.0 available
This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y
15 May, 2013: release 0.11.0 available
This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y
March, 2013: HCatalog merges into Hive
For details, see: http://hive.apache.org/downloads.html
Versions selected:
Hadoop: hadoop-2.2.0.tar.gz
HBase : hbase-0.98.4-hadoop2-bin.tar.gz
JDK: jdk-7u65-linux-i586.gz
Linux environment: CentOS-6.5-x86_64
Hive: apache-hive-0.13.1-bin.tar.gz
Zookeeper: zookeeper-3.4.6.tar.gz
Five nodes, with roles assigned as follows:
| Role   | IP address    | NameNode | DataNode | SecondaryNameNode | ResourceManager | NodeManager | HMaster | HRegionServer | ZooKeeper | Hive |
|--------|---------------|----------|----------|-------------------|-----------------|-------------|---------|---------------|-----------|------|
| master | 192.168.1.94  | Y        |          | Y                 | Y               |             |         |               |           |      |
| slave1 | 192.168.1.105 |          | Y        |                   |                 | Y           | Y       | Y             |           | Y    |
| slave2 | 192.168.1.95  |          | Y        |                   |                 | Y           |         | Y             | Y         |      |
| slave3 | 192.168.1.112 |          | Y        |                   |                 | Y           |         | Y             | Y         |      |
| slave4 | 192.168.1.111 |          | Y        |                   |                 | Y           |         | Y             | Y         |      |
1. Preparing the environment for the Hadoop installation
1.1 On every node, create a user named admin with the password "password"
#useradd -d /home/admin admin
Set the password:
#echo "password" | passwd --stdin admin
1.2 On every node, edit /etc/hosts (requires root). Note that the configuration files below refer to the master node as "master", so that name must also resolve (for example, add it as an alias for 192.168.1.94):
192.168.1.94 centos94
192.168.1.105 centos105
192.168.1.95 centos95
192.168.1.112 centos112
192.168.1.111 centos111
1.3 Copy hadoop-2.2.0.tar.gz, hbase-0.98.4-hadoop2-bin.tar.gz and jdk-7u65-linux-i586.gz to /home/admin
Install the JDK (as the admin user):
$tar -zxvf jdk-7u65-linux-i586.gz
Remove the OpenJDK packages that ship with CentOS.
Check which Java packages are installed:
#rpm -qa |grep java
The output looks like this:
tzdata-java-2013g-1.el6.noarch
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.i686
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.i686
Uninstall them:
yum -y remove java java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.i686
yum -y remove java java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.i686
yum -y remove java tzdata-java-2013g-1.el6.noarch
or:
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.i686
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.i686
rpm -e --nodeps tzdata-java-2013g-1.el6.noarch
1.4 Configure environment variables
Edit /etc/profile (JAVA_HOME must point to wherever the JDK was actually unpacked):
export JAVA_HOME=/usr/local/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
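A quick check that the settings are picked up (a minimal sanity test; assumes the JDK really is at the JAVA_HOME path above):
source /etc/profile
echo $JAVA_HOME
java -version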
1.5 Disable the firewall and SELinux (requires root)
service iptables status
service iptables stop
chkconfig iptables off
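To disable SELinux as well (the change in the config file takes full effect after a reboot), a common approach is:
setenforce 0
vi /etc/selinux/config   (set SELINUX=disabled)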
1.6 Set a static IP address (requires root)
vi /etc/sysconfig/network-scripts/ifcfg-eth0
IPADDR=192.168.1.94 (change according to each machine's IP)
GATEWAY=192.168.1.255
NETMASK=255.255.255.0
vi /etc/sysconfig/network
HOSTNAME=centos94 (differs per node and must match the name used in /etc/hosts)
1.7 Passwordless SSH between nodes (as the admin user)
(If SSH still prompts for a password after these steps, fix the permissions:
chmod 700 .ssh
chmod 600 .ssh/*)
On every node run:
ssh-keygen -t rsa -P ""
and press Enter through all prompts.
Copy id_rsa.pub from the .ssh directory of centos95/centos105/centos111/centos112 into the .ssh directory on centos94:
scp id_rsa.pub admin@centos94:/home/admin/.ssh/id_rsa.pub.centos95
scp id_rsa.pub admin@centos94:/home/admin/.ssh/id_rsa.pub.centos105
scp id_rsa.pub admin@centos94:/home/admin/.ssh/id_rsa.pub.centos112
scp id_rsa.pub admin@centos94:/home/admin/.ssh/id_rsa.pub.centos111
On centos94, in the .ssh directory, merge id_rsa.pub, id_rsa.pub.centos95, id_rsa.pub.centos105, id_rsa.pub.centos111 and id_rsa.pub.centos112 into a single authorized_keys file, as sketched below.
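A minimal sketch of the merge, assuming all five public-key files are already in /home/admin/.ssh on centos94:
cd /home/admin/.ssh
cat id_rsa.pub id_rsa.pub.centos95 id_rsa.pub.centos105 id_rsa.pub.centos111 id_rsa.pub.centos112 > authorized_keys
chmod 600 authorized_keys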
Copy the merged authorized_keys to the .ssh directory of each slave node:
scp authorized_keys admin@centos95:/home/admin/.ssh/
scp authorized_keys admin@centos105:/home/admin/.ssh/
scp authorized_keys admin@centos111:/home/admin/.ssh/
scp authorized_keys admin@centos112:/home/admin/.ssh/
2. Installing and configuring Hadoop (as the admin user)
(If the daemons still fail to start after installation, check the ownership of the extracted Hadoop and JDK directories; if they are not owned by admin, change the owner, e.g.: chown -R admin:admin hadoop-2.2.0)
2.1 Unpack hadoop-2.2.0.tar.gz into /home/admin
Add to /etc/profile:
export HADOOP_HOME=/home/admin/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
The configuration files live in $HADOOP_HOME/etc/hadoop; the core-site.xml, yarn-site.xml, hdfs-site.xml and mapred-site.xml found there are empty. You can copy the corresponding *-default.xml files from under $HADOOP_HOME/share/doc/hadoop into etc/hadoop as a starting point and modify them from there:
cd $HADOOP_HOME
cp ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml ./etc/hadoop/core-site.xml
cp ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml ./etc/hadoop/hdfs-site.xml
cp ./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ./etc/hadoop/yarn-site.xml
cp ./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml ./etc/hadoop/mapred-site.xml
Next, make the necessary changes to these defaults; otherwise the cluster will not start successfully.
2.2 Configure the hadoop-env.sh file
vi /home/admin/hadoop-2.2.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_65
2.3 Configure the core-site.xml file
vi /home/admin/hadoop-2.2.0/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/admin/hadoop-2.2.0/tmp</value>
</property>
</configuration>
2.4 Configure the hdfs-site.xml file
vi /home/admin/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/admin/hadoop-2.2.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/admin/hadoop-2.2.0/dfs/data</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
<description>The address and the base port where the dfs namenode web ui will listen on.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
<description>The secondary namenode http server address and port.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication (the number of copies kept). The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
</property>
</configuration>
2.5 Configure the yarn-site.xml file
vi /home/admin/hadoop-2.2.0/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
<!-- This property must be set; otherwise the NodeManager nodes do not show up on the web UI at port 8088. -->
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
2.6 Configure the mapred-site.xml file
cp /home/admin/hadoop-2.2.0/etc/hadoop/mapred-site.xml.template /home/admin/hadoop-2.2.0/etc/hadoop/mapred-site.xml
vi /home/admin/hadoop-2.2.0/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
3. Starting the Hadoop cluster
(If the cluster is formatted more than once, the tmp and dfs directories on every node must be deleted first, otherwise the DataNodes will not start; it is best to format the cluster only once.)
3.1 Format the NameNode
bin/hdfs namenode -format
3.2 Start the daemons:
sbin/start-dfs.sh
sbin/start-yarn.sh
3.3 To browse the MapReduce job history, start the JobHistory Server separately:
sbin/mr-jobhistory-daemon.sh start historyserver
Note: on later restarts do not run the format command again; just start the daemons. A quick sanity check is shown below.
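A simple way to verify the cluster is up (run on the master; the web UI ports are the ones configured above):
jps   (should list NameNode, SecondaryNameNode and ResourceManager on the master, DataNode and NodeManager on the slaves)
hdfs dfsadmin -report   (lists the DataNodes that have registered)
Then browse http://master:50070 (HDFS) and http://master:8088 (YARN).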
4. Installing Hive
Hive stores its metadata in an RDBMS. There are three ways to connect to the metastore database:
1. Embedded mode: metadata is kept in the embedded Derby database; this is generally used for unit testing and allows only one session at a time.
2. Local (multi-user) mode: install MySQL locally and keep the metadata in MySQL.
3. Remote mode: the metadata is kept in a remote MySQL database.
4.1 Because the HWI (Hive Web Interface) feature depends on Ant, install Ant first
Download apache-ant-1.9.4-bin.tar.gz
Unpack apache-ant-1.9.4-bin.tar.gz into /opt and rename the directory to ant
Add ANT_HOME and PATH to /etc/profile and reload the file:
vi /etc/profile
export ANT_HOME=/opt/ant
export PATH=$ANT_HOME/bin:$PATH
source /etc/profile
Check that Ant is installed correctly:
ant -version
4.2 Installing MySQL
Download the MySQL package:
MySQL-server-5.6.20-1.el7.x86_64.rpm
rpm -ivh MySQL-server-5.6.20-1.el7.x86_64.rpm
Initialize MySQL and set the root password:
# /usr/bin/mysql_install_db
# service mysql start
# cat /root/.mysql_secret   # view the generated root password
# The random password set for the root user at Wed Dec 11 23:32:50 2014 (local time): qKTaFZnl
# mysql -uroot -pqKTaFZnl
mysql> SET PASSWORD = PASSWORD('123456');   # set the password to 123456
mysql> exit
# mysql -uroot -p123456
Create a new user:
mysql> create user 'admin'@'%' identified by 'password';
Grant the new user admin privileges so that it can log in both locally and remotely.
Note: to the left of @ is the user name; to the right is the host part (a domain name, an IP address, or %). % means the user may connect from any host.
mysql> grant all privileges on *.* to 'admin'@'%' identified by 'password';
mysql> select user,host,password from mysql.user;
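To verify that remote access works, you can log in from another node (a simple check; assumes the MySQL server runs on slave1, 192.168.1.105, as in the role table above):
mysql -h 192.168.1.105 -uadmin -ppassword -e "select 1;"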
Enable MySQL at boot:
# chkconfig mysql on
# chkconfig --list | grep mysql
Check MySQL's default storage engine:
mysql> show engines;
+------------+---------+------------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                    | Transactions | XA   | Savepoints |
+------------+---------+------------------------------------------------------------+--------------+------+------------+
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                      | NO           | NO   | NO         |
| CSV        | YES     | CSV storage engine                                         | NO           | NO   | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance     | NO           | NO   | NO         |
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys | YES          | YES  | YES        |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables  | NO           | NO   | NO         |
+------------+---------+------------------------------------------------------------+--------------+------+------------+
5 rows in set (0.00 sec)
The output shows that the default engine is MyISAM, which does not support transactions.
It can also be checked like this:
mysql> show variables like 'storage_engine';
+----------------+--------+
| Variable_name | Value |
+----------------+--------+
| storage_engine | MyISAM |
+----------------+--------+
1 row in set (0.00 sec)
Change the default engine to InnoDB.
Stop MySQL:
mysql> exit;
# service mysqld stop
Edit /etc/my.cnf and add the following under [mysqld]:
default-storage-engine=InnoDB
After the change, my.cnf looks like:
[root@bogon etc]# more my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
default-storage-engine=InnoDB
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
Start MySQL:
# service mysqld start
Starting mysqld: [ OK ]
Check the default storage engine again:
[root@bogon etc]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show variables like 'storage_engine';
+----------------+--------+
| Variable_name | Value |
+----------------+--------+
| storage_engine | InnoDB |
+----------------+--------+
1 row in set (0.00 sec)
Copy the MySQL JDBC driver mysql-connector-java-5.1.12.jar into Hive's lib directory.
Copy tools.jar from the JDK's lib directory into Hive's lib directory, for example:
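A minimal sketch (the source paths are assumptions; adjust them to where the driver jar was downloaded and where the JDK lives):
cp /home/admin/mysql-connector-java-5.1.12.jar /home/admin/apache-hive-0.13.1-bin/lib/
cp $JAVA_HOME/lib/tools.jar /home/admin/apache-hive-0.13.1-bin/lib/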
4.3 Hive version used
apache-hive-0.13.1-bin.tar.gz (for Hive/Hadoop version compatibility, see http://hive.apache.org/downloads.html)
4.4 Download Hive, place it in /home/admin and unpack it:
tar -zxvf apache-hive-0.13.1-bin.tar.gz
4.5 Set environment variables
vi /etc/profile
export HIVE_HOME=/home/admin/apache-hive-0.13.1-bin
export PATH=$HIVE_HOME/bin:$PATH
Configure Hive:
(1) Edit /home/admin/apache-hive-0.13.1-bin/conf/hive-env.sh
export JAVA_HOME=/home/admin/jdk1.7.0_65
export HIVE_HOME=/home/admin/apache-hive-0.13.1-bin
export HADOOP_HOME=/home/admin/hadoop-2.2.0
(2) Create hive-site.xml from hive-default.xml
cp /home/admin/apache-hive-0.13.1-bin/conf/hive-default.xml /home/admin/apache-hive-0.13.1-bin/conf/hive-site.xml
(3) Configure hive-site.xml. The main settings are:
hive.metastore.warehouse.dir: the data directory (on HDFS); default /user/hive/warehouse
hive.exec.scratchdir: the temporary file directory (on HDFS); default /tmp/hive-${user.name}
cd /home/admin/apache-hive-0.13.1-bin/conf
cp hive-default.xml ./hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-exec-log4j.properties.template ./hive-exec-log4j.properties
cp hive-log4j.properties.template ./hive-log4j.properties
The configured hive-site.xml is shown below:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.105:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>admin</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://master:9000/hive/warehouse</value>
</property>
<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/lib/hive-hwi-0.12.0.war</value>
<!-- Copy hive-hwi-0.12.0.war from the 0.12.0 release into the lib directory of 0.13.1, or build the 0.13.1 war yourself. -->
</property>
</configuration>
4.6 Start Hive
Run /home/admin/apache-hive-0.13.1-bin/bin/hive. A quick smoke test is shown below.
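A minimal smoke test in the Hive CLI (standard HiveQL; the table name is only an example):
hive> create table smoke_test (id int, name string);
hive> show tables;
hive> drop table smoke_test;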
4.7 Start the Hive Web Interface (HWI)
Run /home/admin/apache-hive-0.13.1-bin/bin/hive --service hwi
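If HWI starts without errors, it should be reachable in a browser at the port configured in hive-site.xml above, e.g. http://192.168.1.105:9999/hwi (assuming HWI is started on slave1, where Hive runs; /hwi is the default context path).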
5. Installing ZooKeeper
Unpack zookeeper-3.4.6.tar.gz into /home/admin
Add ZOOKEEPER_HOME and PATH to /etc/profile, for example:
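A sketch of the /etc/profile additions, matching the install path above:
export ZOOKEEPER_HOME=/home/admin/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin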
Configure zoo.cfg:
Copy zoo_sample.cfg in the conf directory to zoo.cfg
cp zoo_sample.cfg ./zoo.cfg
vi zoo.cfg
dataDir=/home/admin/zookeeper-3.4.6/data
server.1=centos95:2888:3888
server.2=centos111:2888:3888
server.3=centos112:2888:3888
Then distribute the configured ZooKeeper to /home/admin/zookeeper-3.4.6 on the server.1/2/3 machines, and on each node create a myid file in its dataDir (/home/admin/zookeeper-3.4.6/data). The file contains the node's number, i.e. the digit after the '.' in server.1/2/3; it must be a value between 1 and 255.
echo 1 > myid (run in the dataDir directory)
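Concretely, based on the server.N entries above, each command is run on the corresponding node in its dataDir:
echo 1 > /home/admin/zookeeper-3.4.6/data/myid   (on centos95, server.1)
echo 2 > /home/admin/zookeeper-3.4.6/data/myid   (on centos111, server.2)
echo 3 > /home/admin/zookeeper-3.4.6/data/myid   (on centos112, server.3)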
Start ZooKeeper
On server.1/2/3, start ZooKeeper:
$ ~/zookeeper-3.4.6/bin/zkServer.sh start
Test that all three nodes accept connections:
$ ~/zookeeper-3.4.6/bin/zkCli.sh -server centos95:2181
$ ~/zookeeper-3.4.6/bin/zkCli.sh -server centos111:2181
$ ~/zookeeper-3.4.6/bin/zkCli.sh -server centos112:2181
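You can also check each node's role in the ensemble; one node should report "leader" and the other two "follower":
$ ~/zookeeper-3.4.6/bin/zkServer.sh status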
6. Installing HBase
Configure /etc/profile:
HBASE_HOME=/home/admin/hbase-0.98.4-hadoop2
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME HADOOP_HOME HBASE_HOME PATH
Unpack the archive into /home/admin, then edit conf/hbase-env.sh and add at the beginning:
export JAVA_HOME=/home/admin/jdk1.7.0_65
export HBASE_LOG_DIR=/home/admin/hbase-0.98.4-hadoop2/logs
export HBASE_CLASSPATH=/home/admin/hbase-0.98.4-hadoop2/conf:/home/admin/hadoop-2.2.0/etc/hadoop
export HBASE_MANAGES_ZK=false
配置${HBASE_HOME}/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.tmp.dir</name>
<value>/home/admin/var/hbase</value>
</property>
<property >
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property >
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>centos95,centos111,centos112</value>
</property>
<property>
<name>hbase.master.maxclockskew</name>
<value>180000</value>
</property>
</configuration>
A note on hbase.master.info.bindAddress: its default value is 0.0.0.0. If you change it to the host name or IP of a specific node and then run start-hbase.sh on a different node, HBase will fail to start. start-hbase.sh launches the HMaster on the node where it is run, but the configured bind address belongs to another host and cannot be bound on the current machine, so the HMaster service never comes up.
Configure the list of slave nodes
Normally the whole cluster is started with the start-hbase.sh script. Reading that script shows that it starts the master, ZooKeeper and the RegionServers on the target nodes based on the configuration files; the RegionServer list comes from ${HBASE_HOME}/conf/regionservers, one node per line. Add every RegionServer host name or IP to this file, as in the sketch below.
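Based on the role table at the top of this document, the regionservers file would contain (an assumption; adjust to your own layout):
centos105
centos95
centos112
centos111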
Start the HBase cluster
Run:
start-hbase.sh
This command can be run on any node, but note that the node where it is run automatically becomes the master (unlike ZooKeeper, HBase's configuration files provide no option to designate the master). If you need one or more backup masters, start them separately on other nodes with hbase-daemon.sh start master.
The commands for starting individual services are:
Start a master:
hbase-daemon.sh start master
Start a regionserver:
hbase-daemon.sh start regionserver
Once all services are up, open:
http://master:60010
and check the status of every node. If all nodes are reachable, HBase is working; if the page cannot be opened or nodes are missing, examine the logs to find the cause. A quick shell-level check is sketched below.
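A quick functional check from the shell (standard HBase shell commands; the table name is only an example):
$ hbase shell
hbase> status
hbase> create 'smoke_test', 'cf'
hbase> list
hbase> disable 'smoke_test'
hbase> drop 'smoke_test'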