Overview
I have been learning Kylin recently, which of course requires a working environment. The official description of Kylin's dependencies is as follows:
Kylin depends on a Hadoop cluster to process large data sets. You need to prepare a Hadoop cluster with HDFS, YARN, MapReduce, Hive, HBase, ZooKeeper and the other services configured for Kylin to run on. Kylin can be started on any node of the Hadoop cluster; for convenience you can run it on the master node, but for better stability we recommend deploying Kylin on a clean Hadoop client node where the Hive, HBase and HDFS command-line tools are already installed, the client configuration (core-site.xml, hive-site.xml, hbase-site.xml and so on) is properly set up, and it stays in sync with the other nodes. The Linux account that runs Kylin must have permission to access the Hadoop cluster, including creating/writing HDFS directories, Hive tables and HBase tables, and submitting MapReduce jobs.
Software requirements
Hadoop: 2.7+, 3.1+ (since v2.5)
Hive: 0.13 - 1.2.1+
HBase: 1.1+, 2.0 (since v2.5)
Spark (可選) 2.3.0+
Kafka (可選) 1.0.0+ (since v2.5)
JDK: 1.8+ (since v2.5)
OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+
I knew the installation requirements, but as a complete beginner I wasn't familiar with Hadoop and the rest, so I learned as I went from material found online and hit plenty of pitfalls along the way. Many of the deployment guides out there either scatter the steps all over the place or skip them entirely, dropping a link or telling you to go search for it yourself. After working through many of those pits I finally got a fully distributed hadoop+mysql+hive+hbase+zookeeper+kylin deployment running, but for day-to-day study and testing my computer cannot handle that many virtual machines, so I wrote this pseudo-distributed deployment guide for fellow Kylin beginners like me.
Environment:
There are currently two test environments. The detailed walkthrough below uses a CentOS 7 system; the planned configuration for it is listed below. The other test environment is a 64-bit Red Hat 6 system. The installation steps are largely the same; where the MySQL installation differs, both methods are written out separately. Many pitfalls were hit during installation and all of them have been resolved, so they are not listed one by one; by following the steps below you can complete the installation successfully on CentOS 7 or 64-bit Red Hat 6.
I. Installing CentOS 7
Open VMware and create a new virtual machine to install 64-bit CentOS 7.
When creation finishes, start the virtual machine and choose the first boot option.
Select Continue.
Wait for the dependency check to finish, then click Date & Time to set the time.
Next click Software Selection to choose the installation mode; the minimal install is used here.
Click Done, wait for the dependency check to finish again, then set up disk partitioning.
Choose to configure partitioning now.
After clicking Done you reach the partitioning screen; select standard partitioning and use the + button to add partitions.
Finish laying out the partitions.
Click Done and confirm.
Next open the network settings.
Set the hostname and click Apply, then choose Configure to set the network IP.
Finally click Done and start the installation.
On this screen you can also set the root password; then just wait for the installation to finish. That completes the virtual machine installation; next we configure Linux and install the software.
1. Linux network configuration
(1) Since CentOS 7 was installed in minimal mode, first fix the network so that Windows can connect with Xshell. Edit /etc/sysconfig/network-scripts/ifcfg-ens33 with the following content:
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=e8df3ff3-cf86-42cd-b48a-0d43fe85d8a6
DEVICE=ens33
ONBOOT="yes"
IPADDR=192.168.1.66
PREFIX=24
IPV6_PRIVACY=no
(2) Restart the network
[root@hadoop ~]# service network restart
Restarting network (via systemctl):  [  OK  ]
After the restart you can check the network with:
[root@hadoop ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:0d:f1:ca brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.66/24 brd 192.168.1.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::d458:8497:adb:7f01/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
(3) Next, disable the firewall
[root@hadoop ~]# systemctl disable firewalld
[root@hadoop ~]# systemctl stop firewalld
(4) Disable SELinux
[root@hadoop ~]# setenforce 0
[root@hadoop ~]# vi /etc/selinux/config
[root@hadoop ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
Reboot:
[root@hadoop ~]# reboot
You can check whether SELinux is enabled with:
sestatus
getenforce
(5) Edit /etc/hosts and add the following entries
[root@hadoop ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.66 hadoop
2. Installing Java
(1) First check whether the system already ships with OpenJDK:
[root@hadoop ~]# rpm -qa | grep java
[root@hadoop ~]# rpm -qa | grep jdk
[root@hadoop ~]# rpm -qa | grep gcj
Nothing is installed here. If something shows up, it must be removed; an example of removing it:
Uninstall the bundled OpenJDK by removing every package found by the three commands above, one by one:
[root@master ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.99-2.6.5.1.0.1.el6.x86_64
[root@master ~]# rpm -e --nodeps tzdata-java-2016c-1.el6.noarch
[root@master ~]# rpm -e java-1.6.0-openjdk-1.6.0.38-1.13.10.4.el6.x86_64
[root@master ~]# rpm -e java-1.7.0-openjdk-1.7.0.99-2.6.5.1.0.1.el6.x86_64
After uninstalling, run the checks again.
(2) Now install and configure Java
Create the installation directory:
[root@hadoop ~]# mkdir -p /usr/java
Upload the JDK archive to this directory:
[root@hadoop ~]# cd /usr/java/
[root@hadoop java]# ls
jdk-8u151-linux-x64 (1).tar.gz
Extract it:
[root@hadoop java]# tar -zxvf jdk-8u151-linux-x64\ \(1\).tar.gz
[root@hadoop java]# rm -rf jdk-8u151-linux-x64\ \(1\).tar.gz
[root@hadoop java]# ls
jdk1.8.0_151
Edit /etc/profile, add the following JDK environment variables, then save and exit:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
Apply the environment variables:
[root@master java]# source /etc/profile
Check that the installation works:
[root@hadoop java]# java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
3. Configuring passwordless SSH login
(1) Run ssh-keygen -t rsa to generate a key pair; leave the passphrase empty and press Enter through all prompts. A .ssh directory is created under /root.
[root@hadoop ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:+Xxqh8qa2AguQPY4aNJci6YiUWS822NtcLRK/9Kopp8 root@hadoop1
The key's randomart image is:
+---[RSA 2048]----+
|        .        |
|     + .         |
|    o . . .      |
|   oo + o .      |
|++o* B S         |
|=+*.* + o        |
|++o. o + o..     |
|=. ..=ooo oo.    |
|o.o+E.+ooo..     |
+----[SHA256]-----+
[root@hadoop ~]# cd .ssh/
[root@hadoop .ssh]# ls
id_rsa  id_rsa.pub  known_hosts
Merge the public key into the authorized_keys file: on the hadoop server, go to /root/.ssh and append it:
[root@hadoop .ssh]# cat id_rsa.pub>> authorized_keys
Test with the following commands:
ssh localhost
ssh hadoop
ssh 192.168.1.66
4. Installing Hadoop 2.7
(1) Download link:
http://archive.apache.org/dist/hadoop/core/hadoop-2.7.6/
(2) Extract:
[root@hadoop ~]# cd /hadoop/
[root@hadoop hadoop]# ls
hadoop-2.7.6 (1).tar.gz
[root@hadoop hadoop]# tar -zxvf hadoop-2.7.6\ \(1\).tar.gz
[root@hadoop hadoop]# ls
hadoop-2.7.6  hadoop-2.7.6 (1).tar.gz
[root@hadoop hadoop]# rm -rf *gz
[root@hadoop hadoop]# mv hadoop-2.7.6/* .
(3) Under /hadoop, create the data directories tmp, hdfs, hdfs/data and hdfs/name
[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# mkdir tmp
[root@hadoop hadoop]# mkdir hdfs
[root@hadoop hadoop]# mkdir hdfs/data
[root@hadoop hadoop]# mkdir hdfs/name
(4) Configure core-site.xml under /hadoop/etc/hadoop
[root@hadoop hadoop]# vi etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.66:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131702</value>
    </property>
</configuration>
(5) Configure /hadoop/etc/hadoop/hdfs-site.xml
[root@hadoop hadoop]# vi etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.1.66:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
(6) Copy etc/hadoop/mapred-site.xml.template to etc/hadoop/mapred-site.xml, then edit it:
[root@hadoop hadoop]# cd etc/hadoop/
[root@hadoop hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@hadoop hadoop]# vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.1.66:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.1.66:19888</value>
    </property>
</configuration>
(7) Configure etc/hadoop/yarn-site.xml
[root@hadoop1 hadoop]# vi yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop</value>
    </property>
</configuration>
(8) Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under /hadoop/etc/hadoop/; if it is not set, the daemons will not start
[root@hadoop hadoop]# pwd
/hadoop/etc/hadoop
[root@hadoop hadoop]# vi hadoop-env.sh
Change the "export JAVA_HOME" line to: export JAVA_HOME=/usr/java/jdk1.8.0_151
Add:
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
[root@hadoop hadoop]# vi yarn-env.sh
Change the "export JAVA_HOME" line to: export JAVA_HOME=/usr/java/jdk1.8.0_151
Configure the slaves file:
[root@hadoop hadoop]# cat slaves
localhost
(9) Configure the Hadoop environment variables
[root@hadoop ~]# vi /etc/profile
Add the following:
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@hadoop ~]# source /etc/profile
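Before formatting HDFS it is worth confirming that the new variables took effect; a minimal check:

```
# The hadoop command should now resolve from PATH and report the 2.7.6 build that was just unpacked.
which hadoop
hadoop version
```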
(10) Start Hadoop
[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# bin/hdfs namenode -format
......
19/03/04 17:18:00 INFO namenode.FSImage: Allocated new BlockPoolId: BP-774693564-192.168.1.66-1551691079972
19/03/04 17:18:00 INFO common.Storage: Storage directory /hadoop/hdfs/name has been successfully formatted.
19/03/04 17:18:00 INFO namenode.FSImageFormatProtobuf: Saving image file /hadoop/hdfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
19/03/04 17:18:00 INFO namenode.FSImageFormatProtobuf: Image file /hadoop/hdfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
19/03/04 17:18:00 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
19/03/04 17:18:00 INFO util.ExitUtil: Exiting with status 0
19/03/04 17:18:00 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/192.168.1.66
************************************************************/
Start everything with sbin/start-all.sh, or start HDFS and YARN separately with sbin/start-dfs.sh and sbin/start-yarn.sh:
[root@hadoop hadoop]# sbin/start-dfs.sh
[root@hadoop hadoop]# sbin/start-yarn.sh
To stop everything, run sbin/stop-all.sh.
Run jps to see the running processes:
[root@hadoop hadoop]# jps
10581 ResourceManager
10102 NameNode
10376 SecondaryNameNode
10201 DataNode
10683 NodeManager
11007 Jps
(11) Start the JobHistory server
mr-jobhistory-daemon.sh start historyserver
[root@hadoop hadoop]# jps
33376 NameNode
33857 ResourceManager
33506 DataNode
33682 SecondaryNameNode
33960 NodeManager
34319 JobHistoryServer
34367 Jps
(12) Verification
1) Open http://192.168.1.66:8088/ in a browser (YARN ResourceManager UI)
2) Open http://192.168.1.66:50070/ in a browser (HDFS NameNode UI)
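Beyond the web UIs, a quick command-line smoke test confirms HDFS accepts writes and reads (a minimal sketch; the file and directory names are arbitrary):

```
# Write a small local file into HDFS, read it back, then clean up.
echo "hello hdfs" > /tmp/smoke.txt
hdfs dfs -mkdir -p /tmp/smoketest
hdfs dfs -put /tmp/smoke.txt /tmp/smoketest/
hdfs dfs -cat /tmp/smoketest/smoke.txt
hdfs dfs -rm -r /tmp/smoketest
```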
5. Installing MySQL
Download the packages matching your own OS version. Download link:
https://dev.mysql.com/downloads/mysql/5.7.html#downloads
Here I downloaded the packages for my current test environment, 64-bit CentOS 7; the other test environment, 10.1.197.241, runs Red Hat 6. Download the rpm packages that match each system, otherwise installing an incompatible rpm fails with an error like the one below (for example, installing the CentOS 7 MySQL build on Red Hat 6):
[root@s197240 hadoop]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
error: Failed dependencies:
        libc.so.6(GLIBC_2.14)(64bit) is needed by mysql-community-libs-5.7.18-1.el7.x86_64
1) Check for and remove mariadb-libs
CentOS ships with MariaDB; remove it before installing MySQL.
[root@hadoop hadoop]# rpm -qa|grep mariadb
mariadb-libs-5.5.60-1.el7_5.x86_64
[root@hadoop hadoop]# rpm -e mariadb-libs-5.5.60-1.el7_5.x86_64 --nodeps
[root@hadoop hadoop]# rpm -qa|grep mariadb
On Red Hat 6, if a bundled MySQL package is present, remove it as well.
Find the installed mysql packages with:
[root@s197240 hadoop]# rpm -qa |grep mysql
mysql-community-common-5.7.18-1.el7.x86_64
and remove them with:
[root@s197240 hadoop]# rpm -e --allmatches --nodeps mysql-community-common-5.7.18-1.el7.x86_64
2) Upload and extract the installation bundle
Download link:
https://dev.mysql.com/downloads/file/?id=469456
[root@hadoop mysql]# pwd
/usr/local/mysql
[root@hadoop mysql]# ls
mysql-5.7.18-1.el7.x86_64.rpm-bundle.tar
[root@hadoop mysql]# tar -xvf mysql-5.7.18-1.el7.x86_64.rpm-bundle.tar
mysql-community-server-5.7.18-1.el7.x86_64.rpm
mysql-community-embedded-devel-5.7.18-1.el7.x86_64.rpm
mysql-community-devel-5.7.18-1.el7.x86_64.rpm
mysql-community-client-5.7.18-1.el7.x86_64.rpm
mysql-community-common-5.7.18-1.el7.x86_64.rpm
mysql-community-embedded-5.7.18-1.el7.x86_64.rpm
mysql-community-embedded-compat-5.7.18-1.el7.x86_64.rpm
mysql-community-libs-5.7.18-1.el7.x86_64.rpm
mysql-community-server-minimal-5.7.18-1.el7.x86_64.rpm
mysql-community-test-5.7.18-1.el7.x86_64.rpm
mysql-community-minimal-debuginfo-5.7.18-1.el7.x86_64.rpm
mysql-community-libs-compat-5.7.18-1.el7.x86_64.rpm
3) Install the MySQL server
Installing mysql-server requires the following packages:
mysql-community-common-5.7.17-1.el7.x86_64.rpm
mysql-community-libs-5.7.17-1.el7.x86_64.rpm (depends on common)
mysql-community-client-5.7.17-1.el7.x86_64.rpm (depends on libs)
mysql-community-server-5.7.17-1.el7.x86_64.rpm (depends on common and client)
Installing the four packages above also needs libaio and net-tools; with a yum repository configured, install them with:
yum -y install libaio
yum -y install net-tools
Install mysql-server in the order common -> libs -> client -> server; if you deviate from this order, rpm will complain about the missing dependencies.
[root@hadoop mysql]# rpm -ivh mysql-community-common-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-common-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-common-5.7.18-1.e################################# [100%]
[root@hadoop mysql]# rpm -ivh mysql-community-libs-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-libs-5.7.18-1.el7################################# [100%]
[root@hadoop mysql]# rpm -ivh mysql-community-client-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-client-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-client-5.7.18-1.e################################# [100%]
[root@hadoop mysql]# rpm -ivh mysql-community-server-5.7.18-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.18-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-server-5.7.18-1.e################################# [100%]
4) Initialize MySQL
[root@hadoop mysql]# mysqld --initialize
By default MySQL keeps its data under /var/lib.
5) Change the owner and group of the MySQL data directory
[root@hadoop mysql]# chown mysql:mysql /var/lib/mysql -R
6) Start the MySQL service
[root@hadoop mysql]# cd /var/lib/mysql
[root@hadoop mysql]# systemctl start mysqld.service
[root@hadoop ~]# cd /var/log/
[root@hadoop log]# grep 'password' mysqld.log
2019-02-26T04:33:06.989818Z 1 [Note] A temporary password is generated for root@localhost: mxeV&htW-3VC
Change the root password. Newer MySQL versions will not execute any statement after the first login until the password has been changed.
[root@hadoop log]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.18

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
Change the password:
mysql> set password=password('oracle');
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on *.* to root@'%' identified by 'oracle' with grant option;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
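An optional quick check that the new password and the remote grant actually work (a sketch; substitute your own host and password):

```
# Log in locally with the new password and run a single statement.
mysql -u root -poracle -e "select version();"
# From another machine that has a mysql client, the remote grant can be tested the same way:
# mysql -h 192.168.1.66 -u root -poracle -e "show databases;"
```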
If the system is Red Hat 6, starting the MySQL service looks like this instead:
[root@s197240 mysql]# /etc/rc.d/init.d/mysqld start
Starting mysqld:                                           [  OK  ]
[root@s197240 mysql]# ls /etc/rc.d/init.d/mysqld -l
-rwxr-xr-x 1 root root 7157 Dec 21 19:29 /etc/rc.d/init.d/mysqld
[root@s197240 mysql]# chkconfig mysqld on
[root@s197240 mysql]# chmod 755 /etc/rc.d/init.d/mysqld
[root@s197240 mysql]# service mysqld start
Starting mysqld:                                           [  OK  ]
[root@s197240 mysql]# service mysqld status
mysqld (pid  28861) is running...
Once MySQL is running, the remaining steps are exactly the same as those following "systemctl start mysqld.service" above.
6. Installing Hive
Download link:
http://archive.apache.org/dist/hive/hive-2.3.2/
(1) Upload and extract
[root@hadoop ~]# mkdir /hadoop/hive
[root@hadoop ~]# cd /hadoop/hive/
[root@hadoop hive]# ls
apache-hive-2.3.3-bin.tar.gz
[root@hadoop hive]# tar -zxvf apache-hive-2.3.3-bin.tar.gz
(2) Configure environment variables
[root@hadoop hive]# vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive/
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
# After editing the file, run the following command to apply the changes:
[root@hadoop hive]# source /etc/profile
(3) Point Hive at Hadoop HDFS
hive-site.xml configuration
Go into $HIVE_HOME/conf and copy hive-default.xml.template to a new file named hive-site.xml:
[root@hadoop hive]# cd $HIVE_HOME/conf
[root@hadoop conf]# cp hive-default.xml.template hive-site.xml
Create the HDFS directories that Hive expects, because hive-site.xml contains the following setting:
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
Create the /user/hive/warehouse directory with the hadoop command:
[root@hadoop1 ~]# $HADOOP_HOME/bin/hadoop dfs -mkdir -p /user/hive/warehouse
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
# Grant read/write permissions on the new directory
[root@hadoop1 ~]# cd $HIVE_HOME
[root@hadoop1 hive]# cd conf/
[root@hadoop1 conf]# sh $HADOOP_HOME/bin/hdfs dfs -chmod 777 /user/hive/warehouse
# Check the permissions after the change
[root@hadoop1 conf]# sh $HADOOP_HOME/bin/hdfs dfs -ls /user/hive
Found 1 items
drwxrwxrwx   - root supergroup          0 2019-02-26 14:15 /user/hive/warehouse
# Use the hadoop command to create the /tmp/hive directory
[root@hadoop1 conf]# $HADOOP_HOME/bin/hdfs dfs -mkdir -p /tmp/hive
# Grant read/write permissions on /tmp/hive
[root@hadoop1 conf]# $HADOOP_HOME/bin/hdfs dfs -chmod 777 /tmp/hive
# Check the directories just created
[root@hadoop1 conf]# $HADOOP_HOME/bin/hdfs dfs -ls /tmp
Found 1 items
drwxrwxrwx   - root supergroup          0 2019-02-26 14:17 /tmp/hive
Replace ${system:java.io.tmpdir} in hive-site.xml with a temporary directory for Hive; here I use $HIVE_HOME/tmp. If the directory does not exist, create it manually and grant it read/write permissions.
[root@hadoop1 conf]# cd $HIVE_HOME
[root@hadoop1 hive]# mkdir tmp
Then edit hive-site.xml: replace every occurrence of ${system:java.io.tmpdir} with /hadoop/hive/tmp, and every occurrence of ${system:user.name} with root. One way to do this in bulk is sketched below.
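A minimal sketch of doing the replacements with sed, assuming the paths above; back up hive-site.xml first in case the expressions need adjusting:

```
cd $HIVE_HOME/conf
cp hive-site.xml hive-site.xml.bak
# Replace the templated temp-dir and user-name placeholders throughout the file.
sed -i 's#${system:java.io.tmpdir}#/hadoop/hive/tmp#g' hive-site.xml
sed -i 's#${system:user.name}#root#g' hive-site.xml
```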
(4) Configure MySQL access
Upload the MySQL JDBC driver jar into Hive's lib directory:
[root@hadoop lib]# pwd
/usr/local/hive/lib
[root@hadoop1 lib]# ls |grep mysql
mysql-connector-java-5.1.47.jar
(5) Edit the database-related settings in hive-site.xml
Search for javax.jdo.option.ConnectionURL and change its value to the MySQL connection address:
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
</property>
Search for javax.jdo.option.ConnectionDriverName and change its value to the MySQL driver class:
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
Search for javax.jdo.option.ConnectionUserName and change its value to the MySQL login user:
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
</property>
Search for javax.jdo.option.ConnectionPassword and change its value to the MySQL login password:
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>oracle</value>
    <description>password to use against metastore database</description>
</property>
Search for hive.metastore.schema.verification and change its value to false:
<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
</property>
Create hive-env.sh in the $HIVE_HOME/conf directory:
[root@hadoop1 conf]# cd $HIVE_HOME/conf
[root@hadoop1 conf]# cp hive-env.sh.template hive-env.sh
# Open hive-env.sh and add the following
[root@hadoop1 conf]# vim hive-env.sh
export HADOOP_HOME=/hadoop/
export HIVE_CONF_DIR=/hadoop/hive/conf
export HIVE_AUX_JARS_PATH=/hadoop/hive/lib
(6) Initialize the metastore schema in MySQL
[root@apollo conf]# cd $HIVE_HOME/bin
# Initialize the metastore schema:
[root@hadoop1 bin]# schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
The output above means the initialization succeeded; check it in MySQL:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| metastore          |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

mysql> use metastore
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_metastore       |
+---------------------------+
| AUX_TABLE                 |
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_COMPACTIONS     |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| KEY_CONSTRAINTS           |
| MASTER_KEYS               |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
| WRITE_SET                 |
+---------------------------+
57 rows in set (0.01 sec)
(7) Start Hive
Start the metastore service:
nohup hive --service metastore >> ~/metastore.log 2>&1 &
Start HiveServer2, which is needed for JDBC connections:
nohup hive --service hiveserver2 >> ~/hiveserver2.log 2>&1 &
Check the metastore and HiveServer2 ports:
[root@hadoop bin]# netstat -lnp|grep 9083
tcp        0      0 0.0.0.0:9083            0.0.0.0:*               LISTEN      11918/java
[root@hadoop bin]# netstat -lnp|grep 10000
tcp        0      0 0.0.0.0:10000           0.0.0.0:*               LISTEN      12011/java
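With HiveServer2 listening on port 10000, the JDBC path can also be exercised with beeline, which ships with Hive (a hedged sketch; if Hadoop rejects the login with an impersonation error, either add hadoop.proxyuser.root.hosts/groups to core-site.xml or set hive.server2.enable.doAs to false in hive-site.xml):

```
# Connect over JDBC as the root OS user and run a trivial statement.
$HIVE_HOME/bin/beeline -u jdbc:hive2://192.168.1.66:10000 -n root -e "show databases;"
```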
Test Hive:
[root@hadoop1 bin]# ./hive
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0_151/bin:/usr/java/jdk1.8.0_151/bin:/hadoop//bin:/hadoop//sbin:/root/bin:/usr/java/jdk1.8.0_151/bin:/usr/java/jdk1.8.0_151/bin:/hadoop//bin:/hadoop//sbin:/hadoop/hive/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show functions;
OK
!
!=
$sum0
%
...
hive> desc function sum;
OK
sum(x) - Returns the sum of a set of numbers
Time taken: 0.183 seconds, Fetched: 1 row(s)
hive> create database sbux;
OK
Time taken: 0.236 seconds
hive> use sbux;
OK
Time taken: 0.033 seconds
hive> create table student(id int, name string) row format delimited fields terminated by '\t';
OK
Time taken: 0.909 seconds
hive> desc student;
OK
id                      int
name                    string
Time taken: 0.121 seconds, Fetched: 2 row(s)

Create a data file under $HIVE_HOME:
# go to $HIVE_HOME
[root@apollo hive]# cd $HIVE_HOME
# create the file student.dat
[root@apollo hive]# touch student.dat
# add the following content to the file (fields are tab-separated)
[root@apollo hive]# vim student.dat
001     david
002     fab
003     kaishen
004     josen
005     arvin
006     wada
007     weda
008     banana
009     arnold
010     simon
011     scott

Load the data:
hive> load data local inpath '/hadoop/hive/student.dat' into table sbux.student;
Loading data to table sbux.student
OK
Time taken: 8.641 seconds
hive> use sbux;
OK
Time taken: 0.052 seconds
hive> select * from student;
OK
1       david
2       fab
3       kaishen
4       josen
5       arvin
6       wada
7       weda
8       banana
9       arnold
10      simon
11      scott
NULL    NULL
Time taken: 2.217 seconds, Fetched: 12 row(s)
(8) View the newly written HDFS data in the web UIs
Check it on the Hadoop NameNode UI.
Check it in the Hive metastore database in MySQL:
[root@hadoop1 bin]# mysql -u root -p
Enter password:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| metastore          |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

mysql> use metastore;
Database changed
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
|      1 |  1551178545 |     6 |                0 | root  |         0 |     1 | student  | MANAGED_TABLE | NULL               | NULL               |                    |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+--------------------+
1 row in set (0.00 sec)
7. Installing ZooKeeper
Upload and extract:
[root@hadoop ~]# cd /hadoop/
[root@hadoop hadoop]# pwd
/hadoop
[root@hadoop hadoop]# mkdir zookeeper
[root@hadoop hadoop]# cd zookeeper/
[root@hadoop zookeeper]# tar -zxvf zookeeper-3.4.6.tar.gz
......
[root@hadoop zookeeper]# ls
zookeeper-3.4.6  zookeeper-3.4.6.tar.gz
[root@hadoop zookeeper]# rm -rf *gz
[root@hadoop zookeeper]# mv zookeeper-3.4.6/* .
[root@hadoop zookeeper]# ls
bin  build.xml  CHANGES.txt  conf  contrib  dist-maven  docs  ivysettings.xml  ivy.xml  lib  LICENSE.txt  NOTICE.txt  README_packaging.txt  README.txt  recipes  src  zookeeper-3.4.6  zookeeper-3.4.6.jar  zookeeper-3.4.6.jar.asc  zookeeper-3.4.6.jar.md5  zookeeper-3.4.6.jar.sha1
Edit the configuration file.
Create the snapshot/data directory: mkdir -p /hadoop/zookeeper/dataDir
Note: if dataLogDir is not configured, the transaction logs are also written into dataDir, which seriously hurts ZooKeeper performance, because under high throughput too many transaction logs and snapshots pile up in the same place.
[root@hadoop zookeeper]# cd conf/
[root@hadoop conf]# mv zoo_sample.cfg zoo.cfg
[root@hadoop conf]# cat /hadoop/zookeeper/conf/zoo.cfg |grep -v ^#|grep -v ^$
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/hadoop/zookeeper/dataDir
dataLogDir=/hadoop/zookeeper/dataLogDir
clientPort=2181
server.1=192.168.1.66:2887:3887
In the configured dataDir directory, create a myid file containing a single number that identifies this host; the number must match the X of the server.X entry configured in conf/zoo.cfg:
[root@hadoop conf]# echo "1" > /hadoop/zookeeper/dataDir/myid
Start ZooKeeper:
[root@hadoop zookeeper]# cd bin/
[root@hadoop bin]# ./zkServer.sh start
JMX enabled by default
Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@hadoop bin]# ./zkServer.sh status
JMX enabled by default
Using config: /hadoop/zookeeper/bin/../conf/zoo.cfg
Mode: standalone
[root@hadoop bin]# ./zkCli.sh -server localhost:2181
Connecting to localhost:2181
2019-03-12 11:47:29,355 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2019-03-12 11:47:29,360 [myid:] - INFO [main:Environment@100] - Client environment:host.name=hadoop
2019-03-12 11:47:29,361 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_151
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_151/jre
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/hadoop/zookeeper/bin/../build/classes:/hadoop/zookeeper/bin/../build/lib/*.jar:/hadoop/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/hadoop/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/hadoop/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/hadoop/zookeeper/bin/../lib/log4j-1.2.16.jar:/hadoop/zookeeper/bin/../lib/jline-0.9.94.jar:/hadoop/zookeeper/bin/../zookeeper-3.4.6.jar:/hadoop/zookeeper/bin/../src/java/lib/*.jar:/hadoop/zookeeper/bin/../conf:.:/usr/java/jdk1.8.0_151/lib/dt.jar:/usr/java/jdk1.8.0_151/lib/tools.jar
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2019-03-12 11:47:29,364 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.0-957.el7.x86_64
2019-03-12 11:47:29,365 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2019-03-12 11:47:29,365 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2019-03-12 11:47:29,365 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/hadoop/zookeeper/bin
2019-03-12 11:47:29,366 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
Welcome to ZooKeeper!
2019-03-12 11:47:29,402 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2019-03-12 11:47:29,494 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@852] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2019-03-12 11:47:29,519 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1696ffeb12f0000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0]

[root@hadoop bin]# jps
12467 QuorumPeerMain
11060 JobHistoryServer
10581 ResourceManager
12085 RunJar
10102 NameNode
12534 Jps
10376 SecondaryNameNode
10201 DataNode
11994 RunJar
10683 NodeManager
ZooKeeper is up and running normally.
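An optional extra check that the server answers client requests (run from /hadoop/zookeeper/bin; the second line only works if a netcat package is installed):

```
# List the root znodes non-interactively, then query server status via the four-letter 'stat' command.
./zkCli.sh -server 127.0.0.1:2181 ls /
echo stat | nc 127.0.0.1 2181
```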
8. Installing Kafka
Upload and extract:
[root@hadoop bin]# cd /hadoop/
[root@hadoop hadoop]# mkdir kafka
[root@hadoop hadoop]# cd kafka/
[root@hadoop kafka]# ls
kafka_2.11-1.1.1.tgz
[root@hadoop kafka]# tar zxf kafka_2.11-1.1.1.tgz
[root@hadoop kafka]# mv kafka_2.11-1.1.1/* .
[root@hadoop kafka]# ls
bin  config  kafka_2.11-1.1.1  kafka_2.11-1.1.1.tgz  libs  LICENSE  NOTICE  site-docs
[root@hadoop kafka]# rm -rf *tgz
[root@hadoop kafka]# ls
bin  config  kafka_2.11-1.1.1  libs  LICENSE  NOTICE  site-docs
Edit the configuration file:
[root@hadoop kafka]# cd config/
[root@hadoop config]# ls
connect-console-sink.properties    connect-file-sink.properties    connect-standalone.properties  producer.properties    zookeeper.properties
connect-console-source.properties  connect-file-source.properties  consumer.properties            server.properties
connect-distributed.properties     connect-log4j.properties        log4j.properties               tools-log4j.properties
[root@hadoop config]# vim server.properties
The effective configuration is:
[root@hadoop config]# cat server.properties |grep -v ^#|grep -v ^$
broker.id=0
listeners=PLAINTEXT://192.168.1.66:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/hadoop/kafka/logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.1.66:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
delete.topic.enable=true
# if this parameter is not set, a delete operation only marks the topic for deletion
Start Kafka:
[root@hadoop kafka]# nohup bin/kafka-server-start.sh config/server.properties&
Check the nohup output file for errors; if there are none, Kafka started fine.
To verify Kafka, and to make day-to-day operation easier, first write a few helper scripts:
# Consume data from a given topic
[root@hadoop kafka]# cat console.sh
#!/bin/bash
read -p "input topic:" name
bin/kafka-console-consumer.sh --zookeeper 192.168.1.66:2181 --topic $name --from-beginning

# List all current topics
[root@hadoop kafka]# cat list.sh
#!/bin/bash
bin/kafka-topics.sh -describe -zookeeper 192.168.1.66:2181

# Produce data to a given topic
[root@hadoop kafka]# cat productcmd.sh
#!/bin/bash
read -p "input topic:" name
bin/kafka-console-producer.sh --broker-list 192.168.1.66:9092 --topic $name

# Start Kafka
[root@hadoop kafka]# cat startkafka.sh
#!/bin/bash
nohup bin/kafka-server-start.sh config/server.properties&

# Stop Kafka
[root@hadoop kafka]# cat stopkafka.sh
#!/bin/bash
bin/kafka-server-stop.sh
sleep 6
jps

# Create a topic
[root@hadoop kafka]# cat create.sh
read -p "input topic:" name
bin/kafka-topics.sh --create --zookeeper 192.168.1.66:2181 --replication-factor 1 --partitions 1 --topic $name
Now verify that Kafka actually works.
In session 1, create a topic:
[root@hadoop kafka]# ./create.sh
input topic:test
Created topic "test".
List the topic that was just created:
[root@hadoop kafka]# ./list.sh
Topic:test      PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: test     Partition: 0    Leader: 0       Replicas: 0     Isr: 0
In session 1, produce data to the test topic:
[root@hadoop kafka]# ./productcmd.sh
input topic:test
>test
>
In session 2, consume data from the test topic:
[root@hadoop kafka]# ./console.sh
input topic:test
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
test
Producing and consuming both work.
Add the Kafka and ZooKeeper environment variables to /etc/profile and source it so they take effect:
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
9. Installing HBase
Download link:
http://archive.apache.org/dist/hbase/
(1) Create the installation directory, then upload and extract:
[root@hadoop hbase]# tar -zxvf hbase-1.4.9-bin.tar.gz
[root@hadoop hbase]# ls
hbase-1.4.9  hbase-1.4.9-bin.tar.gz
[root@hadoop hbase]# rm -rf *gz
[root@hadoop hbase]# mv hbase-1.4.9/* .
[root@hadoop hbase]# pwd
/hadoop/hbase
[root@hadoop hbase]# ls
bin  CHANGES.txt  conf  docs  hbase-1.4.9  hbase-webapps  LEGAL  lib  LICENSE.txt  NOTICE.txt  README.txt
(2) Configure environment variables; mine are as follows:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib
Detailed configuration
Set HBASE_MANAGES_ZK to false in conf/hbase-env.sh:
[root@hadoop kafka]# cd /hadoop/hbase/
[root@hadoop hbase]# ls
bin  CHANGES.txt  conf  docs  hbase-1.4.9  hbase-webapps  LEGAL  lib  LICENSE.txt  NOTICE.txt  README.txt
Edit hbase-env.sh and add the following:
[root@hadoop hbase]# vim conf/hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/hadoop/
export HBASE_HOME=/hadoop/hbase/
export HBASE_MANAGES_ZK=false
Edit the hbase-site.xml configuration file.
In this file you can give HBase a temporary directory; here /root/hbase/tmp is used, so create the directories first:
mkdir /root/hbase
mkdir /root/hbase/tmp
mkdir /root/hbase/pids
Add the following inside the <configuration> element:
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://192.168.1.66:9000/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/hadoop/zookeeper/dataDir</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>192.168.1.66</value>
        <description>the pos of zk</description>
    </property>
    <!-- Must be true; otherwise HBase keeps using its bundled ZooKeeper, which conflicts with the external ZooKeeper already running and HBase fails to start -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- Location of the HBase master -->
    <property>
        <name>hbase.master</name>
        <value>192.168.1.66:60000</value>
    </property>
</configuration>
[root@hadoop hbase]# cat conf/regionservers
192.168.1.66
[root@hadoop hbase]# cp /hadoop/zookeeper/conf/zoo.cfg /hadoop/hbase/conf/
Start HBase:
[root@hadoop bin]# ./start-hbase.sh
running master, logging to /hadoop/hbase//logs/hbase-root-master-hadoop.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
: running regionserver, logging to /hadoop/hbase//logs/hbase-root-regionserver-hadoop.out
: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
# Check the HBase processes - HMaster and HRegionServer are now up
[root@hadoop bin]# jps
12449 QuorumPeerMain
13094 Kafka
10376 SecondaryNameNode
12046 RunJar
11952 RunJar
11060 JobHistoryServer
10581 ResourceManager
10102 NameNode
10201 DataNode
10683 NodeManager
15263 HMaster
15391 HRegionServer
15679 Jps
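As an optional smoke test (a sketch; the table name is arbitrary), a short HBase shell session confirms the RegionServer can serve writes and reads on top of HDFS and the external ZooKeeper:

```
# Create a throwaway table, write one cell, read it back, then drop the table.
hbase shell <<'EOF'
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:msg', 'hello hbase'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'
EOF
```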
10. Installing Kylin
Download link:
http://kylin.apache.org/cn/download/
(1) Upload and extract
[root@hadoop kylin]# pwd
/hadoop/kylin
[root@hadoop kylin]# ls
apache-kylin-2.4.0-bin-hbase1x.tar.gz
[root@hadoop kylin]# tar -zxvf apache-kylin-2.4.0-bin-hbase1x.tar.gz
[root@hadoop kylin]# rm -rf apache-kylin-2.4.0-bin-hbase1x.tar.gz
[root@hadoop kylin]# mv apache-kylin-2.4.0-bin-hbase1x/* .
[root@hadoop kylin]# ls
apache-kylin-2.4.0-bin-hbase1x  bin  commit_SHA1  conf  lib  sample_cube  spark  tomcat  tool
(2) Configure environment variables
/etc/profile now contains:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/hadoop/
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export HIVE_HOME=/hadoop/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HCAT_HOME=$HIVE_HOME/hcatalog
export HIVE_DEPENDENCY=/hadoop/hive/conf:/hadoop/hive/lib/*:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-server-extensions-2.3.3.jar:/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-streaming-2.3.3.jar:/hadoop/hive/lib/hive-exec-2.3.3.jar
export HBASE_HOME=/hadoop/hbase/
export ZOOKEEPER_HOME=/hadoop/zookeeper
export KAFKA_HOME=/hadoop/kafka
export KYLIN_HOME=/hadoop/kylin/
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HCAT_HOME/bin:$HBASE_HOME/bin:$ZOOKEEPER_HOME:$KAFKA_HOME:$KYLIN_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${HIVE_HOME}/lib:$HBASE_HOME/lib:$KYLIN_HOME/lib
[root@hadoop kylin]# source /etc/profile
(3) Edit kylin.properties
[root@hadoop kylin]# vim conf/kylin.properties
kylin.rest.timezone=GMT+8
kylin.rest.servers=192.168.1.66:7070
kylin.job.jar=/hadoop/kylin/lib/kylin-job-2.4.0.jar
kylin.coprocessor.local.jar=/hadoop/kylin/lib/kylin-coprocessor-2.4.0.jar
kylin.server.mode=all
(4) Edit kylin_hive_conf.xml
[root@hadoop kylin]# vim conf/kylin_hive_conf.xml
<property>
    <name>hive.exec.compress.output</name>
    <value>false</value>
    <description>Enable compress</description>
</property>
(5) Edit tomcat/conf/server.xml
[root@hadoop kylin]# vim tomcat/conf/server.xml
Comment out the following connector:
<!--
<Connector port="7443" protocol="org.apache.coyote.http11.Http11Protocol"
           maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
           keystoreFile="conf/.keystore" keystorePass="changeit"
           clientAuth="false" sslProtocol="TLS" />
-->
(6) Edit bin/kylin.sh
#additionally add tomcat libs to HBASE_CLASSPATH_PREFIX
export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:${HBASE_CLASSPATH_PREFIX}
(7) Start Kylin
[root@hadoop kylin]# cd bin/
[root@hadoop bin]# pwd
/hadoop/kylin/bin
[root@hadoop bin]# ./check-env.sh
Retrieving hadoop conf dir...
KYLIN_HOME is set to /hadoop/kylin
[root@hadoop bin]# ./kylin.sh start
Retrieving hadoop conf dir...
KYLIN_HOME is set to /hadoop/kylin
Retrieving hive dependency...
......
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /hadoop/kylin/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin
[root@hadoop bin]# jps
13216 HMaster
10376 SecondaryNameNode
12011 RunJar
11918 RunJar
13070 HQuorumPeer
11060 JobHistoryServer
10581 ResourceManager
31381 RunJar
10102 NameNode
13462 HRegionServer
10201 DataNode
10683 NodeManager
31677 Jps
At this point the installation is complete and you can access Kylin at http://192.168.1.66:7070/kylin. As for the official cube and streaming-cube examples, because of the length of this article I covered them in a separate write-up for reference:
hadoop+kylin installation and the official cube / streaming cube example walkthrough
(8) Initial verification and usage
Test creating a project and loading tables from the Hive database.
Open http://192.168.1.66:7070/kylin/login
Default credentials: ADMIN/KYLIN
From the top menu bar go to the Model page, then click Manage Projects.
Click the + Project button to add a new project.
In the top menu bar click Model, then click the Data Source tab on the left; it lists all the tables loaded into Kylin. Click the Load Table button.
Enter the table names and click Sync to submit the request.
You can then see the imported table structure.
2) Run the official sample cube:
[root@hadoop bin]# pwd
/hadoop/kylin/bin
[root@hadoop bin]# ./sample.sh
Retrieving hadoop conf dir...
......
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect
The last two messages above mean that the Hive tables used by the sample have been created. Now restart Kylin or reload the metadata.
Refresh the page again.
Select the second cube, kylin_sales_cube.
Choose Build and pick any date after 2012.
Then switch to the Monitor page.
Wait for the cube build to finish.
Run an SQL query.
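For example, on the Insight tab a query along these lines (the commonly used sample-cube query; adjust it if your build range differs) should return results once the cube is ready:

```sql
-- Total sales and distinct sellers per day from the sample fact table.
SELECT part_dt,
       SUM(price)                AS total_sold,
       COUNT(DISTINCT seller_id) AS sellers
FROM kylin_sales
GROUP BY part_dt
ORDER BY part_dt;
```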
Finally, write start and stop scripts for the whole environment to make day-to-day operation easier.
The stop script:
[root@hadoop hadoop]# cat stop.sh
#!/bin/bash
echo -e "\n========Start stop kylin========\n"
$KYLIN_HOME/bin/kylin.sh stop
sleep 5
echo -e "\n========Start stop hbase========\n"
$HBASE_HOME/bin/stop-hbase.sh
sleep 5
echo -e "\n========Start stop kafka========\n"
$KAFKA_HOME/bin/kafka-server-stop.sh $KAFKA_HOME/config/server.properties
sleep 3
echo -e "\n========Start stop zookeeper========\n"
$ZOOKEEPER_HOME/bin/zkServer.sh stop
sleep 3
echo -e "\n========Start stop jobhistory========\n"
mr-jobhistory-daemon.sh stop historyserver
sleep 3
echo -e "\n========Start stop yarn========\n"
stop-yarn.sh
sleep 5
echo -e "\n========Start stop dfs========\n"
stop-dfs.sh
sleep 5
echo -e "\n========Start stop prot========\n"
`lsof -i:9083|awk 'NR>=2{print "kill -9 "$2}'|sh`
`lsof -i:10000|awk 'NR>=2{print "kill -9 "$2}'|sh`
sleep 2
echo -e "\n========Check process========\n"
jps
The start script:
[root@hadoop hadoop]# cat start.sh
#!/bin/bash
echo -e "\n========Start run dfs========\n"
start-dfs.sh
sleep 5
echo -e "\n========Start run yarn========\n"
start-yarn.sh
sleep 3
echo -e "\n========Start run jobhistory========\n"
mr-jobhistory-daemon.sh start historyserver
sleep 2
echo -e "\n========Start run metastore========\n"
nohup hive --service metastore >> ~/metastore.log 2>&1 &
sleep 10
echo -e "\n========Start run hiveserver2========\n"
nohup hive --service hiveserver2 >> ~/hiveserver2.log 2>&1 &
sleep 10
echo -e "\n========Check Port========\n"
netstat -lnp|grep 9083
sleep 5
netstat -lnp|grep 10000
sleep 2
echo -e "\n========Start run zookeeper========\n"
$ZOOKEEPER_HOME/bin/zkServer.sh start
sleep 5
echo -e "\n========Start run kafka========\n"
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
sleep 5
echo -e "\n========Start run hbase========\n"
$HBASE_HOME/bin/start-hbase.sh
sleep 5
echo -e "\n========Check process========\n"
jps
sleep 1
echo -e "\n========Start run kylin========\n"
$KYLIN_HOME/bin/kylin.sh start
11. Installing Scala
$ cd /home/tom
$ tar -xzvf scala-2.10.6.tgz
Add the environment variables at the end of /etc/profile:
export SCALA_HOME=/home/tom//scala-2.10.6
export PATH=$SCALA_HOME/bin:$PATH
Save and reload /etc/profile:
source /etc/profile
Check that it works:
scala -version
12. Installing Spark
$ cd /home/tom
$ tar -xzvf spark-1.6.0-bin-hadoop2.6.tgz
$ mv spark-1.6.0-bin-hadoop2.6 spark-1.6.0
$ sudo vim /etc/profile
Add the environment variables at the end of /etc/profile:
export SPARK_HOME=/home/tom/spark-1.6.0
export PATH=$SPARK_HOME/bin:$PATH
Save and reload /etc/profile:
source /etc/profile
In the conf directory, copy spark-env.sh.template to spark-env.sh:
$ cp spark-env.sh.template spark-env.sh
$ vi spark-env.sh
Add the following to spark-env.sh:
export JAVA_HOME=/home/tom/jdk1.8.0_73/
export SCALA_HOME=/home/tom//scala-2.10.6
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_MEMORY=4G
Start:
$SPARK_HOME/sbin/start-all.sh
Stop:
$SPARK_HOME/sbin/stop-all.sh
Test whether Spark is installed correctly:
$SPARK_HOME/bin/run-example SparkPi
Check the web UI by opening http://localhost:8080 in a browser.
View the cluster environment: http://master:8080/ loads normally.
Launch spark-shell: running $ spark-shell starts the interactive shell normally.
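As a quick non-interactive check (a sketch; it uses Spark's default settings unless spark-env points elsewhere), a one-line job can be piped into spark-shell and should print 100 among the REPL output:

```
# Count a small local collection; a printed 100 indicates Spark is working.
echo 'println(sc.parallelize(1 to 100).count())' | $SPARK_HOME/bin/spark-shell
```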
View jobs and other information: http://master:4040/jobs/ loads normally.
13. Installing Flink
(1) Installation
Download page on the Flink website: https://flink.apache.org/downloads.html
Choose version 1.6.3.
Download:
wget http://mirrors.hust.edu.cn/apache/flink/flink-1.7.1/flink-1.7.1-bin-hadoop26-scala_2.11.tgz
Extract:
tar -zxvf flink-1.6.3-bin-hadoop26-scala_2.11.tgz
mv flink-1.6.3 flink
Check the local hostname.
Go into the flink directory and edit conf/flink-conf.yaml:
vim conf/flink-conf.yaml
Also edit conf/masters; after the change the content follows the pattern in the sketch below.
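The screenshots of the edited files are not reproduced here; for a single-node setup the two files usually end up looking roughly like this sketch (the host and the 8081 REST port are illustrative defaults, not values taken from the original post):

```
# conf/flink-conf.yaml -- only the lines that typically need attention
jobmanager.rpc.address: 192.168.186.129
jobmanager.rpc.port: 6123
rest.port: 8081

# conf/masters
192.168.186.129:8081
```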
Start single-node Flink:
bin/start-cluster.sh
The startup output is printed to the console.
Check whether it started successfully.
View the dashboard: http://192.168.186.129:808
(2) Official example demo
1. In one terminal, run:
nc -lk 8000
2. In a second terminal, run the WordCount example that ships with Flink:
bin/flink run examples/streaming/SocketWindowWordCount.jar --port 8000
3. Send data from the first terminal.
4. The test results are written to log/flink-root-taskexecutor-0-woniu.out.
5. The job information can also be seen in the Dashboard.