OS: CentOS 7
Remote connection tool: MobaXterm
I. Virtual Machine
Virtual machine setup
Download and install VMware Workstation, and download the CentOS 7 ISO
Create a new virtual machine
Next
Choose "Install the operating system later", then Next
Select the guest operating system, then Next
Change the name and location, then Next
Next
Finish
Right-click the new virtual machine, open Virtual Machine Settings, and point CD/DVD at the ISO image file
Power on the virtual machine
Select a language
Continue
Click "Installation Destination"
Click "Done"
Software Selection: keep "Minimal Install"
Begin the installation
Set the root password
zh**j**123
Reboot when the installation completes
On the host machine, open Network Connections
Open the properties of VMnet8, then Internet Protocol Version 4
Note the IP address and subnet mask
In VMware choose Edit > Virtual Network Editor, select VMnet8, and uncheck "Use local DHCP service to distribute IP addresses to VMs"
Click NAT Settings and note the gateway IP
Virtual Machine > Settings > Network Adapter: set the network connection to Custom and choose VMnet8
Log into the system
Go to the /etc/sysconfig/network-scripts directory and edit ifcfg-ens33
vi /etc/sysconfig/network-scripts/ifcfg-ens33
Modify the configuration as follows
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=aae5b9e2-96b2-416f-a009-f8e0c041edca
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.147.8
NETMASK=255.255.255.0
GATEWAY=192.168.147.2
DNS=192.168.147.2
DNS1=8.8.8.8
BOOTPROTO=static
sets the interface's boot protocol to static addressing.
ONBOOT=yes
brings the interface up at boot
and lets it be managed through the systemd service manager, systemctl.
Restart the network service
systemctl restart network
Test
[root@localhost network-scripts]# ping www.baidu.com
PING www.wshifen.com (104.193.88.77) 56(84) bytes of data.
64 bytes from 104.193.88.77 (104.193.88.77): icmp_seq=2 ttl=128 time=256 ms
64 bytes from 104.193.88.77 (104.193.88.77): icmp_seq=3 ttl=128 time=321 ms
Clone two more hosts, named bigdata2 and bigdata3, with the IPs 192.168.147.9 and 192.168.147.10
Next
Next
Next
II. Alibaba Cloud
2.1 Alibaba Cloud preparation
1. Three ECS instances
2. If needed, purchase elastic public IPs and bind them
3. If needed, purchase cloud disks
Mounting a data disk
The second cloud disk purchased on Alibaba Cloud is not mounted automatically; it has to be mounted by hand.
(1) Check the SSD cloud disk
sudo fdisk -l
You can see that the system has recognized the SSD as /dev/vdb
(2) Format the cloud disk
sudo mkfs.ext4 /dev/vdb
(3) Mount it
sudo mount /dev/vdb /opt
This mounts the cloud disk on the /opt directory.
(4) Configure mounting at boot
Edit /etc/fstab and append at the end:
/dev/vdb /opt ext4 defaults 0 0
Running df -hl then shows the second disk mounted successfully.
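A quick way to check the new fstab entry without rebooting (a minimal sketch; it assumes the disk can be unmounted safely for a moment):
umount /opt
mount -a      # mounts everything listed in /etc/fstab; an error here means the entry is wrong
df -h /opt    # the data disk should appear mounted on /opt again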
If the system disk that is already in use runs out of space, expand it:
yum install cloud-utils-growpart
growpart /dev/vda 1
resize2fs /dev/vda1
III. Preparation
Disable the firewall
CentOS 7 uses firewalld by default, not iptables
systemctl stop firewalld.service
systemctl mask firewalld.service
Disable SELinux (on all nodes)
vim /etc/selinux/config
Set SELINUX=disabled
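The config file change only applies after a reboot; to also turn SELinux off for the current session, the following optional commands can be used:
setenforce 0    # switch SELinux to permissive mode immediately
getenforce      # should now report Permissive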
Change the hostnames
Name the nodes node01, node02, and node03
Using node01 as an example:
[root@node01 ~]# hostnamectl set-hostname node01
[root@node01 ~]# cat /etc/hostname
node01
The hostname is changed; log in again and it takes effect.
Edit the /etc/hosts file:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.147.8 node01
192.168.147.9 node02
192.168.147.10 node03
Configure passwordless SSH login
Generate the private and public keys
ssh-keygen -t rsa
Copy the public key to every machine that should be reachable without a password
ssh-copy-id node01
ssh-copy-id node02
ssh-copy-id node03
Write a few useful scripts
xsync, built on rsync
#!/bin/sh
# Get the number of arguments; exit immediately if there are none
pcount=$#
if((pcount==0)); then
  echo no args...;
  exit;
fi
# Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname
# Get the absolute path of the parent directory
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir
# Get the current user name
user=`whoami`
# Loop over the hosts
for((host=1; host<=3; host++)); do
  echo $pdir/$fname $user@slave$host:$pdir
  echo ==================slave$host==================
  rsync -rvl $pdir/$fname $user@slave$host:$pdir
done
# Note: "slave" here corresponds to the author's hostnames and must be adjusted to yours.
# The bounds of the for loop are likewise determined by your own host numbering.
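Assuming the script is saved somewhere on the PATH, for example as /usr/local/bin/xsync, and made executable (chmod +x /usr/local/bin/xsync), a file or directory can then be pushed to the other hosts with a single command:
xsync /etc/hosts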
xcall.sh
#! /bin/bash
for host in node01 node02 node03
do
  echo ------------ $host -------------------
  ssh $host "$*"
done
Before running the scripts above, append the environment variables from /etc/profile to ~/.bashrc, otherwise commands run over ssh will fail:
[root@node01 bigdata]# cat /etc/profile >> ~/.bashrc
[root@node02 bigdata]# cat /etc/profile >> ~/.bashrc
[root@node03 bigdata]# cat /etc/profile >> ~/.bashrc
Create the /bigdata directory
JDK setup
Download the JDK. Here we use JDK 8: https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html
An Oracle account and password are required; they can be found with a web search
Upload the JDK to the /bigdata directory on every node
Extract it
tar -zxvf jdk-8u241-linux-x64.tar.gz
If the file owner and group are not root, change them; the relevant commands are below.
Linux defines separate file access permissions for the file owner, for users in the owner's group, and for other users.
1. chgrp: change a file's group
Syntax:
chgrp [-R] group file
2. chown: change a file's owner, and optionally its group at the same time
Syntax:
chown [-R] owner file
chown [-R] owner:group file
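For example, assuming the JDK was extracted under /bigdata as above, ownership could be handed back to root recursively like this:
chown -R root:root /bigdata/jdk1.8.0_241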
Create a symbolic link
ln -s /root/bigdata/jdk1.8.0_241/ /usr/local/jdk
Configure the environment variables
vi /etc/profile
Append at the end
export JAVA_HOME=/usr/local/jdk
export PATH=$PATH:${JAVA_HOME}/bin
Reload the profile
source /etc/profile
Check the Java version
[root@node03 bigdata]# java -version
java version "1.8.0_241"
Java(TM) SE Runtime Environment (build 1.8.0_241-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)
The installation succeeded
Install MySQL
Install Maven
http://maven.apache.org/download.cgi
Download and extract it
tar -zxvf apache-maven-3.6.1-bin.tar.gz
Create a symbolic link
ln -s /bigdata/apache-maven-3.6.1 /usr/local/maven
Add to /etc/profile:
export M2_HOME=/usr/local/maven
export PATH=$PATH:$M2_HOME/bin
Install Git
yum install git
IV. Cloudera Manager 6.3.1 Installation
JDK location
JAVA_HOME must be /usr/java/java-version
Install the third-party dependencies on all three nodes
yum install bind-utils psmisc cyrus-sasl-plain cyrus-sasl-gssapi fuse portmap fuse-libs /lib/lsb/init-functions httpd mod_ssl openssl-devel python-psycopg2 MySQL-python libxslt
Configure the repository
Version 6.3.1
RHEL 7 Compatible | https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/ | cloudera-manager.repo |
Download the cloudera-manager.repo file and put it in the /etc/yum.repos.d/ directory on the Cloudera Manager Server node
[root@node01 ~]# cat /etc/yum.repos.d/cloudera-manager.repo
[cloudera-manager]
name=Cloudera Manager 6.3.1
baseurl=https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/
gpgkey=https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/RPM-GPG-KEY-cloudera
gpgcheck=1
enabled=1
autorefresh=0
Install Cloudera Manager Server
yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
If that is too slow, the rpm packages can be downloaded from https://archive.cloudera.com/cm6/6.3.1/redhat7/yum/RPMS/x86_64/, uploaded to the server, and installed directly:
rpm -ivh cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm
After installation:
[root@node01 cm]# ll /opt/cloudera/
total 16
drwxr-xr-x 27 cloudera-scm cloudera-scm 4096 Mar  3 19:36 cm
drwxr-xr-x  8 root         root         4096 Mar  3 19:36 cm-agent
drwxr-xr-x  2 cloudera-scm cloudera-scm 4096 Sep 25 16:34 csd
drwxr-xr-x  2 cloudera-scm cloudera-scm 4096 Sep 25 16:34 parcel-repo
On all nodes, point the agent at the server in /etc/cloudera-scm-agent/config.ini:
server_host=node01
Configure the databases
Install MySQL
Change the password and configure the privileges
Move the storage engine log files
Move the old InnoDB log files /var/lib/mysql/ib_logfile0 and /var/lib/mysql/ib_logfile1 out of /var/lib/mysql/ to a location of your choice as a backup
[root@node01 ~]# mv /var/lib/mysql/ib_logfile0 /bigdata
[root@node01 ~]# mv /var/lib/mysql/ib_logfile1 /bigdata
Update the my.cnf file
By default it is /etc/my.cnf
[root@node01 etc]# mv my.cnf my.cnf.bak
[root@node01 etc]# vi my.cnf
Cloudera's recommended configuration:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0

key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1

max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

#log_bin should be on a disk with enough free space.
#Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your
#system and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log

#In later versions of MySQL, if you enable the binary log and do not set
#a server_id, MySQL will not start. The server_id must be unique within
#the replicating group.
server_id=1

binlog_format = mixed

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

sql_mode=STRICT_ALL_TABLES
Make sure MySQL starts at boot
systemctl enable mysqld
Start MySQL
systemctl start mysqld
Install the JDBC driver
Download it
wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
Extract it
tar zxvf mysql-connector-java-5.1.46.tar.gz
Copy the driver into the /usr/share/java/ directory and rename it; create the directory if it does not exist
[root@node01 etc]# mkdir -p /usr/share/java/
[root@node01 etc]# cd mysql-connector-java-5.1.46
[root@node01 mysql-connector-java-5.1.46]# cp mysql-connector-java-5.1.46-bin.jar /usr/share/java/mysql-connector-java.jar
Configure MySQL databases for the CM components
The following components each need their own database: Cloudera Manager Server, Oozie Server, Sqoop Server, Activity Monitor, Reports Manager, Hive Metastore Server, Hue Server, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server
Service | Database | User |
---|---|---|
Cloudera Manager Server | scm | scm |
Activity Monitor | amon | amon |
Reports Manager | rman | rman |
Hue | hue | hue |
Hive Metastore Server | metastore | hive |
Sentry Server | sentry | sentry |
Cloudera Navigator Audit Server | nav | nav |
Cloudera Navigator Metadata Server | navms | navms |
Oozie | oozie | oozie |
Log into MySQL and enter the password
mysql -u root -p
Create databases for each service deployed in the cluster using the following commands. You can use any value you want for the <database>, <user>, and <password> parameters. The Databases for Cloudera Software table, below lists the default names provided in the Cloudera Manager configuration settings, but you are not required to use them.
Configure all databases to use the utf8 character set.
Include the character set for each database when you run the CREATE DATABASE statements described below.
Create a database for every service deployed in the cluster; all of the databases use the utf8 character set
CREATE DATABASE <database> DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Grant privileges
GRANT ALL ON <database>.* TO '<user>'@'%' IDENTIFIED BY '<password>';
Example
mysql> CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE hive DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.01 sec)
mysql> CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.01 sec)
mysql> CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> GRANT ALL ON metastore.* TO 'metastore'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY '@Zhaojie123';
Query OK, 0 rows affected, 1 warning (0.01 sec)
flush privileges;
Record the values you enter for database names, usernames, and passwords. The Cloudera Manager installation wizard requires this information to correctly connect to these databases.
Create the Cloudera Manager database
Create it with the script that ships with CM
/opt/cloudera/cm/schema/scm_prepare_database.sh <databaseType> <databaseName> <databaseUser>
Example
[root@node01 cm]# /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm
Enter SCM password:
JAVA_HOME=/usr/local/jdk
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/local/jdk/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/opt/cloudera/cm/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
Tue Mar 03 19:46:36 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-03-03 19:46:36,866 [main] INFO com.cloudera.enterprise.dbutil.DbCommandExecutor - Successfully connected to database.
All done, your SCM database is configured correctly!
On the master node
vim /etc/cloudera-scm-server/db.properties
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=node01
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.setupType=EXTERNAL
com.cloudera.cmf.db.password=@Z
Prepare the parcels: copy the CDH files onto the master node
[root@node01 parcel-repo]# pwd
/opt/cloudera/parcel-repo
[root@node01 parcel-repo]# ll
total 2035084
-rw-r--r-- 1 root root 2083878000 Mar  3 21:27 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
-rw-r--r-- 1 root root         40 Mar  3 21:15 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1
-rw-r--r-- 1 root root      33887 Mar  3 21:15 manifest.json
[root@node01 parcel-repo]# mv CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha
[root@node01 parcel-repo]# ll
total 2035084
-rw-r--r-- 1 root root 2083878000 Mar  3 21:27 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
-rw-r--r-- 1 root root         40 Mar  3 21:15 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha
-rw-r--r-- 1 root root      33887 Mar  3 21:15 manifest.json
Start the services
On the master node
systemctl start cloudera-scm-server
systemctl start cloudera-scm-agent
On the worker nodes
systemctl start cloudera-scm-agent
Open ip:7180 in a browser and log in; both the username and the password are admin
Continue
Accept the license agreement, continue
Select the edition, continue
The cluster installation welcome page opens
Continue, name the cluster,
Continue, select the hosts to manage
Select the CDH version
Cluster installation
If it is slow, the parcels can be downloaded from https://archive.cloudera.com/cdh6/6.3.2/parcels/
Inspect the network and the hosts
Keep clicking continue
For services, select only HDFS, YARN, and ZooKeeper for now
Assign the roles
Continue until the installation completes
Configure Hadoop LZO support
The difference between LzoCodec and LzopCodec
The differences between the two compression codecs:
1. LzoCodec is faster than LzopCodec; LzopCodec adds information such as a bytes signature and header for compatibility with the lzop tool.
2. With LzoCodec as the reduce output codec, the result files get the extension ".lzo_deflate" and cannot be read by lzop; with LzopCodec as the reduce output codec, the files get the extension ".lzo" and can be read by lzop.
3. LzoCodec output (.lzo_deflate files) cannot be indexed by the lzo index job "DistributedLzoIndexer".
4. ".lzo_deflate" files cannot be used as MapReduce input, whereas ".lzo" files can.
In short: use LzoCodec for intermediate map output and LzopCodec for reduce output. Also note that org.apache.hadoop.io.compress.LzoCodec and com.hadoop.compression.lzo.LzoCodec are functionally identical; both ship in the source package and both produce lzo_deflate files. A sketch of the corresponding settings follows.
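A hedged sketch of what these choices look like as MapReduce settings (the property names are the standard Hadoop 2+ keys; on CDH the same values are normally set through the Cloudera Manager UI rather than by editing mapred-site.xml directly):
<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzopCodec</value>
</property>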
Installing LZO from an online parcel
Download address (replace 6.x.y with the matching version):
CDH6: https://archive.cloudera.com/gplextras6/6.x.y/parcels/
CDH5: https://archive.cloudera.com/gplextras5/parcels/5.x.y/
1. In CDH's Parcel configuration, under "Remote Parcel Repository URLs", click "+" and add the address:
CDH6: https://archive.cloudera.com/gplextras6/6.0.1/parcels/
CDH5: http://archive.cloudera.com/gplextras/parcels/latest/
Other, offline approaches:
download the parcel and place it in the /opt/cloudera/parcel-repo directory,
or
set up httpd, change the parcel URL, and then install remotely as above.
2. Back on the parcel list, after a few seconds a new GPLEXTRAS (CDH6) or HADOOP_LZO (CDH5) entry appears;
Download -- Distribute -- Activate.
3. After installing LZO, open the HDFS configuration, find "Compression Codecs", click "+",
and add:
com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
4. In the YARN configuration, find "MR Application Classpath" (mapreduce.application.classpath)
and add:
/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/*
5. Restart and redeploy the stale configuration
Add Sqoop
Continue
Spark installation
Add a service and choose Spark
After the service is added, configure the nodes
All three nodes need this configuration
Go to the directory
cd /opt/cloudera/parcels/CDH/lib/spark/conf
Add the JAVA path
vi spark-env.sh
Append at the end
export JAVA_HOME=/usr/local/jdk
Create a slaves file
and add the worker nodes
node02
node03
Delete the soft link work
rm -r work
Change the port to avoid a conflict with YARN
vi spark-defaults.conf
spark.shuffle.service.port=7337 can be changed to 7338
On startup the following appears:
[root@node01 sbin]# ./start-all.sh
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/spark) overrides detected (/opt/cloudera/parcels/CDH/lib/spark).
WARNING: Running start-master.sh from user-defined location.
/opt/cloudera/parcels/CDH/lib/spark/bin/load-spark-env.sh: line 77: /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/spark/bin/start-master.sh: No such file or directory
WARNING: User-defined SPARK_HOME (/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/spark) overrides detected (/opt/cloudera/parcels/CDH/lib/spark).
WARNING: Running start-slaves.sh from user-defined location.
/opt/cloudera/parcels/CDH/lib/spark/bin/load-spark-env.sh: line 77: /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/spark/bin/start-slaves.sh: No such file or directory
Copy the relevant files from the sbin directory into the bin directory:
[root@node01 bin]# xsync start-slave.sh
[root@node01 bin]# xsync start-master.sh
Startup now succeeds
Checking with jps, node01 has a Master and node02 and node03 have Workers
Enter the shell:
[root@node01 bin]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/03/04 13:22:07 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Spark context Web UI available at http://node01:4040
Spark context available as 'sc' (master = yarn, app id = application_1583295431127_0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.3.1
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_241)
Type in expressions to have them evaluated.
Type :help for more information.

scala> var h =1
h: Int = 1

scala> h + 3
res1: Int = 4

scala> :quit
Changes made in the web UI persist; changes made directly in the files are reverted when CDH restarts.
Flink installation
A Flink build compiled by the author:
Link: https://pan.baidu.com/s/1lIqeBtNpj0wR-Q8KAEAIsg
Extraction code: 89wi
1. Environment
JDK 1.8, CentOS 7.6, Maven 3.2.5, Scala 2.12
2. Source and CDH versions
Flink 1.10.0, CDH 6.3.1 (Hadoop 3.0.0)
Recompiling Flink
Edit the Maven settings file
vi settings.xml
Configure the Maven mirrors
<mirrors>
    <mirror>
        <id>alimaven</id>
        <mirrorOf>central</mirrorOf>
        <name>aliyun maven</name>
        <url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
    </mirror>
    <mirror>
        <id>alimaven</id>
        <name>aliyun maven</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
        <id>central</id>
        <name>Maven Repository Switchboard</name>
        <url>http://repo1.maven.org/maven2/</url>
        <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
        <id>repo2</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>http://repo2.maven.org/maven2/</url>
    </mirror>
    <mirror>
        <id>ibiblio</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>http://mirrors.ibiblio.org/pub/mirrors/maven2/</url>
    </mirror>
    <mirror>
        <id>jboss-public-repository-group</id>
        <mirrorOf>central</mirrorOf>
        <name>JBoss Public Repository Group</name>
        <url>http://repository.jboss.org/nexus/content/groups/public</url>
    </mirror>
    <mirror>
        <id>google-maven-central</id>
        <name>Google Maven Central</name>
        <url>https://maven-central.storage.googleapis.com</url>
        <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
        <id>maven.net.cn</id>
        <name>oneof the central mirrors in china</name>
        <url>http://maven.net.cn/content/groups/public/</url>
        <mirrorOf>central</mirrorOf>
    </mirror>
</mirrors>
Download the flink-shaded source that Flink depends on
Different Flink versions use different flink-shaded versions; the 1.10 release uses 10.0
https://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-shaded-10.0/flink-shaded-10.0-src.tgz
After extracting it, add the following to pom.xml, inside the <profiles> tag:
<profile>
    <id>vendor-repos</id>
    <activation>
        <property>
            <name>vendor-repos</name>
        </property>
    </activation>
    <!-- Add vendor maven repositories -->
    <repositories>
        <!-- Cloudera -->
        <repository>
            <id>cloudera-releases</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <!-- Hortonworks -->
        <repository>
            <id>HDPReleases</id>
            <name>HDP Releases</name>
            <url>https://repo.hortonworks.com/content/repositories/releases/</url>
            <snapshots><enabled>false</enabled></snapshots>
            <releases><enabled>true</enabled></releases>
        </repository>
        <repository>
            <id>HortonworksJettyHadoop</id>
            <name>HDP Jetty</name>
            <url>https://repo.hortonworks.com/content/repositories/jetty-hadoop</url>
            <snapshots><enabled>false</enabled></snapshots>
            <releases><enabled>true</enabled></releases>
        </repository>
        <!-- MapR -->
        <repository>
            <id>mapr-releases</id>
            <url>https://repository.mapr.com/maven/</url>
            <snapshots><enabled>false</enabled></snapshots>
            <releases><enabled>true</enabled></releases>
        </repository>
    </repositories>
</profile>
Run the following command in the flink-shaded directory to build it
mvn -T2C clean install -DskipTests -Pvendor-repos -Dhadoop.version=3.0.0-cdh6.3.1 -Dscala-2.12 -Drat.skip=true
Download the Flink source from https://mirrors.aliyun.com/apache/flink/flink-1.10.0/
Extract it, go into the directory, and edit the file
[root@node02 ~]# cd /bigdata/
[root@node02 bigdata]# cd flink
[root@node02 flink]# cd flink-1.10.0
[root@node02 flink-1.10.0]# cd flink-runtime-web/
[root@node02 flink-runtime-web]# ll
total 24
-rw-r--r-- 1 501 games 8726 Mar  7 23:31 pom.xml
-rw-r--r-- 1 501 games 3505 Feb  8 02:18 README.md
drwxr-xr-x 4 501 games 4096 Feb  8 02:18 src
drwxr-xr-x 3 501 games 4096 Mar  7 23:19 web-dashboard
[root@node02 flink-runtime-web]# vi pom.xml
Add the domestic download mirrors, otherwise the build is very likely to fail:
<execution>
    <id>install node and npm</id>
    <goals>
        <goal>install-node-and-npm</goal>
    </goals>
    <configuration>
        <nodeDownloadRoot>http://npm.taobao.org/mirrors/node/</nodeDownloadRoot>
        <npmDownloadRoot>http://npm.taobao.org/mirrors/npm/</npmDownloadRoot>
        <nodeVersion>v10.9.0</nodeVersion>
    </configuration>
</execution>
Run the following command in the extracted Flink source directory to build the Flink source
mvn clean install -DskipTests -Dfast -Drat.skip=true -Dhadoop.version=3.0.0-cdh6.3.1 -Pvendor-repos -Dinclude-hadoop -Dscala-2.12 -T2C
Then simply take the flink-1.10.0 binary package from
the directory:
flink-1.10.0/flink-dist/target/flink-1.10.0-bin
Flink on YARN mode
Configure the environment variables on all three nodes
export HADOOP_HOME=/opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Then source the profile
If Spark is also installed on the machines, its worker port 8081 conflicts with Flink's web port, so change it
On one node, go into the conf directory under the Flink directory and edit the configuration file
vi flink-conf.yaml
Set
rest.port: 8082
and also add or modify the following in the same file:
high-availability: zookeeper
high-availability.storageDir: hdfs://node01:8020/flink_yarn_ha
high-availability.zookeeper.path.root: /flink-yarn
high-availability.zookeeper.quorum: node01:2181,node02:2181,node03:2181
yarn.application-attempts: 10
Distribute Flink to all nodes
xsync flink-1.10.0
Create a directory on HDFS
On node01, run the following to create the HDFS directory
hdfs dfs -mkdir -p /flink_yarn_ha
Create a test file
vim wordcount.txt
with the following content
hello world
flink hadoop
hive spark
Create a directory on HDFS and upload the file
hdfs dfs -mkdir -p /flink_input
hdfs dfs -put wordcount.txt /flink_input
Test
[root@node01 flink-1.10.0]# bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar -input hdfs://node01:8020/flink_input -output hdfs://node01:8020/out_result1/out_count.txt -yn 2 -yjm 1024 -ytm 1024
View the output
hdfs dfs -cat hdfs://node01:8020/out_result1/out_count.txt
Kafka
Download from http://archive.cloudera.com/kafka/parcels/4.0.0/
Distribute and activate it
Add the service; assign the broker role to all three nodes, nothing else needs to be configured
The Java Heap Size of Broker can be adjusted
Create a topic
/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/kafka-topics --zookeeper node01:2181,node02:2181,node03:2181 --create --replication-factor 1 --partitions 1 --topic test
List the topics
/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/kafka-topics --zookeeper node01:2181 --list
Produce messages
/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/kafka-console-producer --broker-list node01:9092 --topic test
Consume messages
/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/kafka-console-consumer --bootstrap-server node01:9092 --topic test
V. Native Installation
https://archive.apache.org/dist/
Hadoop 2.8.5
HBase 2.1.8
Flume
Sqoop
Kafka
Storm
spark 2.4.6
Flink
Zookeeper
https://www.cnblogs.com/aidata/p/12441506.html#_label1_2
Three nodes
Cluster plan
Deploy ZooKeeper on node01, node02, and node03.
Extract and install
(1) Extract the ZooKeeper package into the /opt/module/ directory
[root@hadoop101 software]$ tar -zxvf zookeeper-3.4.10.tar.gz -C /opt/module/
(2) Sync the contents of the /opt/module/zookeeper-3.4.10 directory to hadoop103 and hadoop104
[root@hadoop101 module]$ xsync zookeeper-3.4.10/
Configure the server number
(1) Create a zkData directory under /opt/module/zookeeper-3.4.10/
[root@hadoop101 zookeeper-3.4.10]$ mkdir -p zkData
(2) Create a file named myid in the /opt/module/zookeeper-3.4.10/zkData directory
[root@hadoop101 zkData]$ touch myid
Be sure to create the myid file in Linux; creating it in Notepad++ can easily produce encoding problems
(3) Edit the myid file
[root@hadoop101 zkData]$ vi myid
Add the number that corresponds to this server in the file: 1
(4) Copy the configured zookeeper to the other machines
[root@hadoop101 zkData]$ xsync myid
and change the content of myid to 2 and 3 on hadoop102 and hadoop103 respectively
Configure the zoo.cfg file
(1) Rename zoo_sample.cfg in the /opt/module/zookeeper-3.4.10/conf directory to zoo.cfg
[root@hadoop101 conf]$ mv zoo_sample.cfg zoo.cfg
(2) Open zoo.cfg
[root@hadoop101 conf]$ vim zoo.cfg
Change the data directory path
dataDir=/opt/module/zookeeper-3.4.10/zkData
Add the following configuration
#######################cluster##########################
server.1=hadoop101:2888:3888
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
(3) Sync the zoo.cfg configuration file
[root@hadoop101 conf]$ xsync zoo.cfg
(4) Explanation of the configuration parameters
server.A=B:C:D
A is a number that says which server this is;
in cluster mode a file named myid is placed in the dataDir directory, and it contains nothing but the value of A. When ZooKeeper starts it reads this file and compares the value with the configuration in zoo.cfg to work out which server it is.
B is the server's address;
C is the port this server uses to exchange information with the cluster Leader;
D is the port used to elect a new Leader if the current Leader dies; the servers talk to each other over this port during the election.
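For example, with the zoo.cfg above, server.2=hadoop102:2888:3888 means: this entry describes server number 2 (whose myid file under dataDir contains 2), reachable at hadoop102, using port 2888 to talk to the Leader and port 3888 for leader election.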
Cluster operations
(1) Start ZooKeeper on each node
[root@hadoop101 zookeeper-3.4.10]$ bin/zkServer.sh start
[root@hadoop102 zookeeper-3.4.10]$ bin/zkServer.sh start
[root@hadoop103 zookeeper-3.4.10]$ bin/zkServer.sh start
(2) Check the status
[root@hadoop101 zookeeper-3.4.10]# bin/zkServer.sh status
JMX enabled by default
Using config: /opt/module/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop102 zookeeper-3.4.10]# bin/zkServer.sh status
JMX enabled by default
Using config: /opt/module/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader
[root@hadoop103 zookeeper-3.4.5]# bin/zkServer.sh status
JMX enabled by default
Using config: /opt/module/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
The id must be unique within the cluster, and its value should be between 1 and 255.
Common service commands
1. Start the ZK service: bin/zkServer.sh start
2. Check the ZK service status: bin/zkServer.sh status
3. Stop the ZK service: bin/zkServer.sh stop
4. Restart the ZK service: bin/zkServer.sh restart
5. Connect to the server: zkCli.sh -server 127.0.0.1:2181
Cluster monitoring
If the following error appears:
[myid:1] - WARN [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumCnxManager@685] - Cannot open channel to 3 at election address k8s-node3/10.0.2.15:17888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:606)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:656)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:713)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:741)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:910)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1229)
then in zoo.cfg replace the hostname of the local node with 0.0.0.0. For example on hadoop101:
server.1=0.0.0.0:2888:3888
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
The other nodes are handled the same way:
the entry for the local node uses 0.0.0.0 in place of its hostname
How have defined the ip of the local server in each node? If you have given the public ip, then the listener would have failed to connect to the port. You must specify 0.0.0.0 for the current node
server.1=0.0.0.0:2888:3888
server.2=192.168.10.10:2888:3888 server.3=192.168.2.1:2888:3888
This change must be performed at the other nodes too.
Installation script
#! /bin/bash
echo "====================installing zookeeper==============================="
echo "====================downloading zookeeper==============================="
#wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.5.8/apache-zookeeper-3.5.8-bin.tar.gz
#tar -zxvf apache-zookeeper-3.5.8-bin.tar.gz
#xsync apache-zookeeper-3.5.8-bin/
# loop over the hosts; myid has to match the server.N entries below, so it starts at 1
i=1
for host in node01 node02 node03; do
echo ==================$host==================
ssh $host "mkdir -p /bigdata/apache-zookeeper-3.5.8-bin/zkData"
ssh $host "touch /bigdata/apache-zookeeper-3.5.8-bin/zkData/myid"
ssh $host "echo $i > /bigdata/apache-zookeeper-3.5.8-bin/zkData/myid"
ssh $host "cp /bigdata/apache-zookeeper-3.5.8-bin/conf/zoo_sample.cfg /bigdata/apache-zookeeper-3.5.8-bin/conf/zoo.cfg"
ssh $host 'sed -i "s#^dataDir=.*#dataDir=/bigdata/apache-zookeeper-3.5.8-bin/zkData#" /bigdata/apache-zookeeper-3.5.8-bin/conf/zoo.cfg'
ssh $host 'echo "server.1=node01:2888:3888" >> /bigdata/apache-zookeeper-3.5.8-bin/conf/zoo.cfg'
ssh $host 'echo "server.2=node02:2888:3888" >> /bigdata/apache-zookeeper-3.5.8-bin/conf/zoo.cfg'
ssh $host 'echo "server.3=node03:2888:3888" >> /bigdata/apache-zookeeper-3.5.8-bin/conf/zoo.cfg'
let 'i+=1'
done
Startup script
#!/bin/sh
# loop over the nodes
for((host=1; host<=3; host++)); do
echo ==================k8s-node$host==================
ssh root@k8s-node$host "source /etc/profile;/opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh start"
done
Change the hostnames and the directory to your own
Stop all nodes
#!/bin/sh
# loop over the nodes
for((host=1; host<=3; host++)); do
echo ==================k8s-node$host==================
ssh root@k8s-node$host "source /etc/profile;/opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh stop"
done
Check the status of all nodes
#!/bin/sh
# loop over the nodes
for((host=1; host<=3; host++)); do
echo ==================k8s-node$host==================
ssh root@k8s-node$host "source /etc/profile;/opt/module/apache-zookeeper-3.5.7-bin/bin/zkServer.sh status"
done
Combined into a single script
#! /bin/bash
case $1 in
"start"){
    for host in node01 node02 node03; do
        ssh $host "/bigdata/apache-zookeeper-3.5.8-bin/bin/zkServer.sh start"
    done
};;
"stop"){
    for host in node01 node02 node03; do
        ssh $host "/bigdata/apache-zookeeper-3.5.8-bin/bin/zkServer.sh stop"
    done
};;
"status"){
    for host in node01 node02 node03; do
        ssh $host "/bigdata/apache-zookeeper-3.5.8-bin/bin/zkServer.sh status"
    done
};;
esac
mysql
Hadoop
Configure HDFS
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- The hdfs nameservice namespace is ns -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns</value>
    </property>
    <!-- Hadoop temp directory; the default /tmp/{$user} is unsafe because it is wiped on every reboot -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/hdpdata/</value>
        <description>The hdpdata directory has to be created manually</description>
    </property>
    <!-- ZooKeeper addresses -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>node01:2181,node02:2181,node03:2181</value>
        <description>ZooKeeper addresses, separated by commas</description>
    </property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- NameNode HA configuration -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns</value>
        <description>The hdfs nameservice is ns; it must match core-site.xml</description>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns</name>
        <value>nn1,nn2</value>
        <description>The ns nameservice has two NameNodes with the logical names nn1 and nn2 (any names will do)</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns.nn1</name>
        <value>node01:9000</value>
        <description>RPC address of nn1</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns.nn1</name>
        <value>node01:50070</value>
        <description>HTTP address of nn1</description>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns.nn2</name>
        <value>node02:9000</value>
        <description>RPC address of nn2</description>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns.nn2</name>
        <value>node02:50070</value>
        <description>HTTP address of nn2</description>
    </property>
    <!-- JournalNode configuration -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node01:8485;node02:8485;node03:8485/ns</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/hadoop/journaldata</value>
        <description>Local directory where the JournalNode keeps its data</description>
    </property>
    <!-- Automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
        <description>Enable automatic NameNode failover</description>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        <description>Failover implementation, using the built-in zkfc</description>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
        <description>Fencing methods, one per line; sshfence runs first and, if it fails, shell(/bin/true) runs and simply returns 0 (success)</description>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
        <description>The sshfence method needs passwordless ssh</description>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
        <description>Timeout for the sshfence method</description>
    </property>
    <!-- DFS file settings -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>The default block replication is 3; a test environment can use 1, but production should always use at least 3 replicas</description>
    </property>
    <property>
        <name>dfs.block.size</name>
        <value>134217728</value>
        <description>Block size of 128M</description>
    </property>
</configuration>
Configure YARN
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Run MapReduce on YARN</description>
    </property>
    <!-- JobHistory server configuration -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node02:10020</value>
        <description>History server port</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node02:19888</value>
        <description>History server web UI port</description>
    </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Shared logical cluster id for the HA ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-ha</value>
    </property>
    <!-- Logical names of the ResourceManagers; any names will do -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <!-- Addresses of the ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node01</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>${yarn.resourcemanager.hostname.rm1}:8088</value>
        <description>HTTP port</description>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node02</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>${yarn.resourcemanager.hostname.rm2}:8088</value>
    </property>
    <!-- ZooKeeper quorum -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node01:2181,node02:2181,node03:2181</value>
    </property>
    <!-- The auxiliary service on the NodeManager must be mapreduce_shuffle for MapReduce jobs to run -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- HDFS directory for aggregated logs -->
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/data/hadoop/yarn-logs</value>
    </property>
    <!-- Keep logs for 3 days, in seconds -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>259200</value>
    </property>
</configuration>
Create the hdpdata directory under /usr/local/hadoop
cd /usr/local/hadoop
mkdir hdpdata
Edit the slaves file under /usr/local/hadoop/etc/hadoop
to set the hostnames of the nodes where the datanode and nodemanager processes start
Add the node hostnames to the slaves file:
node02
node03
Copy the hadoop directory to every node
Starting the cluster
(follow the order strictly)
Start the journalnodes (run this on node01, node02, and node03)
/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
Check with jps: node01, node02, and node03 each have a new JournalNode process
Format HDFS
On node01 run:
hdfs namenode -format
After a successful format, a dfs directory is created under the path set by hadoop.tmp.dir in core-site.xml; copy that directory to the same path on node02
scp -r hdpdata root@node02:/usr/local/hadoop
On node01, format ZKFC
hdfs zkfc -formatZK
If it succeeds, the log contains
INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/ns in ZK
Start HDFS on node01
sbin/start-dfs.sh
Start YARN on node02
sbin/start-yarn.sh
On node01, start a standalone ResourceManager as a standby
sbin/yarn-daemon.sh start resourcemanager
Start the JobHistoryServer on node02
sbin/mr-jobhistory-daemon.sh start historyserver
After it starts, node02 has an additional JobHistoryServer process
The Hadoop installation and startup are complete
HDFS HTTP addresses
NameNode (active):http://node01:50070
NameNode (standby):http://node02:50070
ResourceManager HTTP address
ResourceManager: http://node02:8088
JobHistory HTTP address
JobHistoryServer: http://node02:19888
Cluster verification
To verify that HDFS works and that HA failover works, first upload a file to HDFS
hadoop fs -put /usr/local/hadoop/README.txt /
On the active node, manually stop the active namenode
sbin/hadoop-daemon.sh stop namenode
Check through HTTP port 50070 whether the standby namenode's state has switched to active
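The NameNode states can also be checked from the command line (a small sketch; nn1 and nn2 are the logical NameNode names defined in hdfs-site.xml):
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2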
Manually start the namenode that was stopped in the previous step
sbin/hadoop-daemon.sh start namenode
Verify ResourceManager HA
Manually stop the ResourceManager on node02
sbin/yarn-daemon.sh stop resourcemanager
Access node01's ResourceManager through HTTP port 8088 and check its state
Manually start node02's ResourceManager again
sbin/yarn-daemon.sh start resourcemanager
Installation script
#! /bin/bash
tar -zxvf /bigdata/downloads/hadoop-2.8.5.tar.gz -C /bigdata
\cp /bigdata/downloads/yarn-site.xml /usr/local/hadoop/etc/hadoop/
\cp /bigdata/downloads/mapred-site.xml /usr/local/hadoop/etc/hadoop/
\cp /bigdata/downloads/hdfs-site.xml /usr/local/hadoop/etc/hadoop/
\cp /bigdata/downloads/core-site.xml /usr/local/hadoop/etc/hadoop/
cat /dev/null > /usr/local/hadoop/etc/hadoop/slaves
echo "node02" >> /usr/local/hadoop/etc/hadoop/slaves
echo "node03" >> /usr/local/hadoop/etc/hadoop/slaves
xsync /bigdata/hadoop-2.8.5
# append the environment variables
echo 'export HADOOP_HOME=/usr/local/hadoop' >> /etc/profile
echo 'export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop' >> /etc/profile
echo 'export YARN_HOME=$HADOOP_HOME' >> /etc/profile
echo 'export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop' >> /etc/profile
echo 'export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin' >> /etc/profile
xsync /etc/profile
# loop over the nodes
i=0
for host in node01 node02 node03; do
echo ==================$host==================
# create the soft link
#ssh $host "ln -s /bigdata/hadoop-2.8.5 /usr/local/hadoop"
# make the environment variables take effect
ssh $host "source /etc/profile"
done
Format the cluster and start it for the first time
#! /bin/bash
for host in node01 node02 node03; do
echo ==================$host==================
# start the journalnodes
ssh $host "/usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode"
done
/usr/local/hadoop/bin/hdfs namenode -format
scp -r /usr/local/hadoop/hdpdata root@node02:/usr/local/hadoop
/usr/local/hadoop/bin/hdfs zkfc -formatZK
/usr/local/hadoop/sbin/start-dfs.sh
ssh node02 "/usr/local/hadoop/sbin/start-yarn.sh"
/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
ssh node02 "/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver"
Hive
Here the author's MySQL runs in Docker; configure hive-site.xml according to your actual setup
1. Create the HDFS data warehouse directory
hadoop fs -mkdir -p /user/hive/warehouse
2. Give all users write permission on the warehouse directory
hadoop fs -chmod a+w /user/hive/warehouse
3. Open up the permissions on the HDFS /tmp directory
hadoop fs -chmod -R 777 /tmp
5. Extract the Hive package into the /bigdata installation directory
tar -zxvf apache-hive-1.2.2-bin.tar.gz -C /bigdata
6. Create a symbolic link
ln -s /bigdata/apache-hive-1.2.2-bin /usr/local/hive
7. Set the environment variables
vim /etc/profile
Add the following:
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:${HIVE_HOME}/bin
8. Re-source the file so the environment variables take effect
source /etc/profile
9. Upload the hive-site.xml configuration file into the hive/conf directory and add the MySQL connection information used for storing the metastore
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.10.100:3307/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive1234</value>
    </property>
</configuration>
10. Copy the MySQL driver jar into the ${HIVE_HOME}/lib directory
11. Log into MySQL and create the hive user
Log into MySQL: mysql -u root -p
Create the user: create user 'hive'@'%' identified by 'hive1234';
Check the user table to confirm the user was created: select user,host from mysql.user;
Grant privileges to the user: grant all privileges on *.* to 'hive'@'%';
Flush the privileges: flush privileges;
12. Start Hive
/usr/local/hive/bin/hive
Script
MySQL is already configured
hiveInstall.sh
#! /bin/bash
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod a+w /user/hive/warehouse
hadoop fs -chmod -R 777 /tmp
tar -zxvf /bigdata/apache-hive-2.3.6-bin.tar.gz -C /bigdata
ln -s /bigdata/apache-hive-2.3.6-bin /usr/local/hive
echo 'export HIVE_HOME=/usr/local/hive' >> /etc/profile
echo 'export PATH=$PATH:${HIVE_HOME}/bin' >> /etc/profile
source /etc/profile
\cp /bigdata/downloads/hive-site.xml /usr/local/hive/conf/
\cp /bigdata/downloads/mysql-connector-java-5.1.47.jar /usr/local/hive/lib
If a script sets environment variables, run it with source or with .
. hiveInstall.sh
or
source hiveInstall.sh
Otherwise, running it as
./hiveInstall.sh
executes it through a subshell,
so the source /etc/profile inside it only takes effect in that subshell; when the script finishes the subshell exits, and back in the current shell the environment variables have not taken effect
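A minimal sketch of the difference (setvar.sh is a hypothetical throwaway script, not part of the installation):
printf '#!/bin/bash\nexport FOO=bar\n' > setvar.sh && chmod +x setvar.sh
./setvar.sh;      echo "FOO=$FOO"   # prints an empty value: FOO was set only inside the subshell
source setvar.sh; echo "FOO=$FOO"   # prints FOO=bar: the variable now exists in the current shell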
Initialize Hive; this creates the metastore tables in MySQL
schematool -dbType mysql -initSchema
Start Hive
/usr/local/hive/bin/hive
https://www.cnblogs.com/aidata/p/11571111.html#_label3
HBase
In the conf directory:
Configure hbase-env.sh
Set the JDK path: export JAVA_HOME=/usr/local/jdk
Use the external zookeeper: export HBASE_MANAGES_ZK=false
Configure hbase-site.xml
<configuration>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/usr/local/zookeeper/data</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://node02:9000/user/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node01:2181,node02:2181,node03:2181</value>
    </property>
</configuration>
Configure regionservers
node02
node03
Create a new file backup-masters containing
node02
Go into lib and copy the jar from client-facing-thirdparty into the lib directory:
cp client-facing-thirdparty/htrace-core-3.1.0-incubating.jar .
Installation script
#! /bin/bash
tar -zxvf /bigdata/downloads/hbase-2.1.8-bin.tar.gz -C /bigdata
# loop over the nodes
for host in node01 node02 node03; do
echo ==================$host==================
# create the soft link
ssh $host "ln -s /bigdata/hbase-2.1.8 /usr/local/hbase"
done
# overwrite the configuration file
\cp /bigdata/downloads/hbase-site.xml /usr/local/hbase/conf
# configure regionservers
cat /dev/null > /usr/local/hbase/conf/regionservers
echo "node02" >> /usr/local/hbase/conf/regionservers
echo "node03" >> /usr/local/hbase/conf/regionservers
# create backup-masters
touch /usr/local/hbase/conf/backup-masters
echo "node02" >> /usr/local/hbase/conf/backup-masters
\cp /usr/local/hbase/lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar /usr/local/hbase/lib
xsync /bigdata/hbase-2.1.8
Start it
In the bin directory
./start-hbase.sh
./hbase shell
Kafka
1. Cluster plan
Deploy on three machines: node01, node02, and node03
2. Download the Kafka package
Download it from http://kafka.apache.org/downloads and pick the kafka_2.11-0.10.2.1.tgz release
3. Install Kafka
Upload the package to one of the machines, node01, and extract it into the /bigdata directory
tar -zxvf kafka_2.11-0.10.2.1.tgz
Create a symbolic link
ln -s /bigdata/kafka_2.11-0.10.2.1 /usr/local/kafka
4. Add it to the environment variables: vim /etc/profile
Add:
export KAFKA_HOME=/usr/local/kafka
export PATH=$PATH:${KAFKA_HOME}/bin
Refresh the environment variables: source /etc/profile
5. Edit the configuration file
cd /usr/local/kafka/config
vim server.properties
6. Create a kafka-logs directory under /usr/local/kafka
mkdir /usr/local/kafka/kafka-logs
7. Use scp to copy the configured Kafka installation to node02 and node03
scp -r /bigdata/kafka_2.11-0.10.2.1 root@node02:/bigdata/
scp -r /bigdata/kafka_2.11-0.10.2.1 root@node03:/bigdata/
8. Modify server.properties on node02 and node03; the full file is shown further below
8.1 Changes on node02
broker.id=1
host.name=node02
8.2 Changes on node03
broker.id=2
host.name=node03
9. Start Kafka on node01, node02, and node03
cd /usr/local/kafka
With the -daemon option, Kafka starts as a daemon
bin/kafka-server-start.sh -daemon config/server.properties
10. Log directory
By default the logs go into the logs folder under the Kafka installation path
server.properties
############################# Server Basics #############################
#Each broker's id must be unique; different brokers need different ids
broker.id=0
#Listening port
port=9092
#Host address
host.name=node01
#Allow topic deletion
delete.topic.enable=true
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
#Data storage path; the default is under /tmp and needs to be changed
log.dirs=/usr/local/kafka/kafka-logs
#Default number of partitions for new topics
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
#Data retention time, 7 days by default, in hours
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
#zookeeper addresses, comma separated
zookeeper.connect=node01:2181,node02:2181,node03:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
To connect to the Kafka cluster from elsewhere on the LAN, for example from IDEA on Windows talking to Kafka inside the VMs, add
listeners=PLAINTEXT://192.168.10.108:9092
advertised.listeners=PLAINTEXT://192.168.10.108:9092
For access over the public internet, further settings are needed
listeners is the address Kafka actually binds to
advertised.listeners is the listener list exposed to clients; if it is not set, listeners is used. The broker publishes its listener information to ZooKeeper
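A hedged sketch for a broker that has to be reached through a public address (both addresses are placeholders, not values from this cluster):
listeners=PLAINTEXT://0.0.0.0:9092                  # the address the broker actually binds to
advertised.listeners=PLAINTEXT://<public-ip>:9092   # the address published to ZooKeeper and handed out to clients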
Start Kafka on each of the three nodes
bin/kafka-server-start.sh -daemon config/server.properties
Create a topic
bin/kafka-topics.sh --create --zookeeper node01:2181 --topic topic1 --replication-factor 2 --partitions 2
Describe the topic
bin/kafka-topics.sh --describe --zookeeper node01:2181 --topic topic1
List the topics that already exist in Kafka
bin/kafka-topics.sh --list --zookeeper node01:2181
Delete a topic:
bin/kafka-topics.sh --delete --zookeeper node01:2181 --topic topic1
Add partitions
bin/kafka-topics.sh --alter --zookeeper node01:2181 --topic topic1 --partitions 3
Producer
bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic topic1
Consumer
bin/kafka-console-consumer.sh --bootstrap-server node01:9092 --from-beginning --topic topic1
Installation script
#! /bin/bash
tar -zxvf /bigdata/downloads/kafka_2.12-2.2.1.tgz -C /bigdata
# loop over the nodes
for host in node01 node02 node03; do
echo ==================$host==================
# create the soft link
ssh $host "ln -s /bigdata/kafka_2.12-2.2.1 /usr/local/kafka"
ssh $host 'echo "export KAFKA_HOME=/usr/local/kafka" >> /etc/profile'
ssh $host "echo 'export PATH=\$PATH:\${KAFKA_HOME}/bin' >> /etc/profile"
#ssh $host 'source /etc/profile' # has no effect
done
## overwrite the configuration file
\cp /bigdata/downloads/server.properties /usr/local/kafka/config
#
mkdir -p /usr/local/kafka/kafka-logs
#xsync /bigdata/kafka_2.12-2.2.1
## loop over the nodes
m=0
for host in node01 node02 node03; do
echo ==================$host==================
ssh $host "sed -i s#^broker.id=.*#broker.id="$m"# /usr/local/kafka/config/server.properties"
ssh $host "sed -i s#^host.name=.*#host.name=node0"`expr $m + 1`"# /usr/local/kafka/config/server.properties"
let 'm+=1'
done
Flume
Download it
Extract it
flume-env.sh
export JAVA_HOME=/usr/local/jdk
Sqoop
Spark
- Download or upload the Spark archive on every node, extract and install it, and create a symbolic link
- Configure the spark-env.sh file under the Spark installation directory on every node
- Configure slaves
- Configure spark-defaults.conf
- Configure the environment variables on every node
spark-env.sh
[root@node01 conf]# mv spark-env.sh.template spark-env.sh
[root@node01 conf]# vi spark-env.sh
Add:
export JAVA_HOME=/usr/local/jdk
#export SCALA_HOME=/software/scala-2.11.8
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
# Memory for the Spark history daemon
#export SPARK_DAEMON_MEMORY=512m
# The next line is Spark's high-availability setting. For master HA it must be present on the masters;
# for worker HA, on the workers; configuring it on every node is recommended.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node01:2181,node02:2181,node03:2181 -Dspark.deploy.zookeeper.dir=/spark"
# Once Spark HA is enabled, the next line must stay commented out (it can no longer be used;
# the HA master nodes are passed later with --master when submitting applications)
#export SPARK_MASTER_IP=master01
#export SPARK_WORKER_MEMORY=1500m
#export SPARK_EXECUTOR_MEMORY=100m
-Dspark.deploy.recoveryMode=ZOOKEEPER means the whole cluster state is maintained and recovered through zookeeper, i.e. zookeeper provides Spark's HA. When the active Master dies, the standby Master reads the full cluster state from zookeeper and recovers the state of all Workers, Drivers, and Applications before taking over;
-Dspark.deploy.zookeeper.url=potter2:2181,potter3:2181,potter4:2181,potter5:2181 lists every machine that runs zookeeper and could become the (active) master (the original author used 4 machines, so 4 are listed);
-Dspark.deploy.zookeeper.dir=/spark
-Dspark.deploy.zookeeper.dir is where Spark's metadata is kept, including the state of running jobs;
zookeeper keeps all of the cluster state: all Worker information, all Application information, and all Driver information.
slaves
[root@node03 conf]# mv slaves.template slaves
[root@node03 conf]# vi slaves
Remove localhost and add all three nodes:
node01
node02
node03
Configure the environment variables
vi /etc/profile
Add
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
Configure spark-defaults.conf
Spark defaults to local mode
Change the following entry:
spark.master spark://node01:7077,node02:7077,node03:7077
All of the steps above have to be done on every node
Startup
Start zookeeper
Start hadoop
On one node run
/usr/local/spark/sbin/start-all.sh
On the other two nodes, start a master separately to get high availability
/usr/local/spark/sbin/start-master.sh
The spark-shell command starts the shell
Web UIs
node01:8080
node02:8080
node03:8080
If 8080 is taken, Spark adds 1 to the port number by default
Installation script
#! /bin/bash
tar -zxvf /bigdata/downloads/spark-2.4.6-bin-hadoop2.7.tgz -C /bigdata
# loop over the nodes
for host in node01 node02 node03; do
echo ==================$host==================
# create the soft link
ssh $host "ln -s /bigdata/spark-2.4.6-bin-hadoop2.7 /usr/local/spark"
ssh $host "echo 'export SPARK_HOME=/usr/local/spark' >> /etc/profile"
ssh $host "echo 'export PATH=\$PATH:\$SPARK_HOME/bin' >> /etc/profile"
done
mv /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
echo "export JAVA_HOME=/usr/local/jdk" >> /usr/local/spark/conf/spark-env.sh
echo "export HADOOP_HOME=/usr/local/hadoop" >> /usr/local/spark/conf/spark-env.sh
echo "export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop" >> /usr/local/spark/conf/spark-env.sh
echo 'export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node01:2181,node02:2181,node03:2181 -Dspark.deploy.zookeeper.dir=/spark"' >> /usr/local/spark/conf/spark-env.sh
mv /usr/local/spark/conf/slaves.template /usr/local/spark/conf/slaves
cat /dev/null > /usr/local/spark/conf/slaves
echo "node01" >> /usr/local/spark/conf/slaves
echo "node02" >> /usr/local/spark/conf/slaves
echo "node03" >> /usr/local/spark/conf/slaves
mv /usr/local/spark/conf/spark-defaults.conf.template /usr/local/spark/conf/spark-defaults.conf
echo "spark.master spark://node01:7077,node02:7077,node03:7077" >> /usr/local/spark/conf/spark-defaults.conf
xsync /bigdata/spark-2.4.6-bin-hadoop2.7
https://www.cnblogs.com/aidata/p/11453991.html#_label0
Flink
下載 https://flink.apache.org/downloads.html
flink-1.10.1-bin-scala_2.12
flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
Extract it
[root@node01 software]# tar -zxvf flink-1.10.1-bin-scala_2.12.tgz -C /bigdata/application/
Configure the environment variables and create a symbolic link
ln -s /bigdata/flink-1.10.1 /usr/local/flink
Put the hadoop jar from the download page, flink-shaded-hadoop-2-uber-2.8.3-10.0.jar, into the lib directory
Edit flink-conf.yaml
jobmanager.rpc.address: set it to the address of your master node
taskmanager.heap.mb: total memory available to each TaskManager
taskmanager.numberOfTaskSlots: number of CPUs available on each machine
parallelism.default: default parallelism for each job
taskmanager.tmp.dirs: temporary directory
jobmanager.heap.mb: maximum memory the JVM can allocate on each node
jobmanager.rpc.port: 6123
jobmanager.web.port: 8081
#==============================================================================
# Common
#==============================================================================

# The external address of the host on which the JobManager runs and can be
# reached by the TaskManagers and any clients which want to connect. This setting
# is only used in Standalone mode and may be overwritten on the JobManager side
# by specifying the --host <hostname> parameter of the bin/jobmanager.sh executable.
# In high availability mode, if you use the bin/start-cluster.sh script and setup
# the conf/masters file, this will be taken care of automatically. Yarn/Mesos
# automatically configure the host name based on the hostname of the node where the
# JobManager runs.

jobmanager.rpc.address: node03

# The RPC port where the JobManager is reachable.
jobmanager.rpc.port: 6123

# The heap size for the JobManager JVM
jobmanager.heap.size: 1024m

# The heap size for the TaskManager JVM
taskmanager.heap.size: 1024m

# The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline.
taskmanager.numberOfTaskSlots: 2

# The parallelism used for programs that did not specify and other parallelism.
parallelism.default: 2

# The default file system scheme and authority.
# By default file paths without scheme are interpreted relative to the local
# root file system 'file:///'. Use this to override the default and interpret
# relative paths relative to a different file system,
# for example 'hdfs://mynamenode:12345'
fs.default-scheme: hdfs://ns/

#==============================================================================
# High Availability
#==============================================================================

# The high-availability mode. Possible options are 'NONE' or 'zookeeper'.
high-availability: zookeeper

# The path where metadata for master recovery is persisted. While ZooKeeper stores
# the small ground truth for checkpoint and leader election, this location stores
# the larger objects, like persisted dataflow graphs.
# Must be a durable file system that is accessible from all nodes
# (like HDFS, S3, Ceph, nfs, ...)
high-availability.storageDir: hdfs://ns/flink/ha/

# The list of ZooKeeper quorum peers that coordinate the high-availability
# setup. This must be a list of the form:
# "host1:clientPort,host2:clientPort,..." (default clientPort: 2181)
high-availability.zookeeper.quorum: node01:2181,node02:2181,node03:2181
high-availability.zookeeper.path.root: /flink

# ACL options are based on https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
# It can be either "creator" (ZOO_CREATE_ALL_ACL) or "open" (ZOO_OPEN_ACL_UNSAFE)
# The default value is "open" and it can be changed to "creator" if ZK security is enabled
# high-availability.zookeeper.client.acl: open

#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================

# The backend that will be used to store operator state checkpoints if
# checkpointing is enabled.
# Supported backends are 'jobmanager', 'filesystem', 'rocksdb', or the
# <class-name-of-factory>.
state.backend: filesystem

# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
state.checkpoints.dir: hdfs://ns/flink-checkpoints

# Default target directory for savepoints, optional.
state.savepoints.dir: hdfs://ns/flink-checkpoints

# Flag to enable/disable incremental checkpoints for backends that
# support incremental checkpoints (like the RocksDB state backend).
# state.backend.incremental: false

#==============================================================================
# Rest & web frontend
#==============================================================================

# The port to which the REST client connects to. If rest.bind-port has
# not been specified, then the server will bind to this port as well.
# rest.port: 8081

# The address to which the REST client will connect to
#rest.address: 0.0.0.0

# Port range for the REST and web server to bind to.
#rest.bind-port: 8080-8090

# The address that the REST & web server binds to
#rest.bind-address: 0.0.0.0

# Flag to specify whether job submission is enabled from the web-based
# runtime monitor. Uncomment to disable.
web.submit.enable: true

#==============================================================================
# Advanced
#==============================================================================

# Override the directories for temporary files. If not specified, the
# system-specific Java temporary directory (java.io.tmpdir property) is taken.
# For framework setups on Yarn or Mesos, Flink will automatically pick up the
# containers' temp directories without any need for configuration.
# Add a delimited list for multiple directories, using the system directory
# delimiter (colon ':' on unix) or a comma, e.g.:
# /data1/tmp:/data2/tmp:/data3/tmp
# Note: Each directory entry is read from and written to by a different I/O
# thread. You can include the same directory multiple times in order to create
# multiple I/O threads against that directory. This is for example relevant for
# high-throughput RAIDs.
# io.tmp.dirs: /tmp

# Specify whether TaskManager's managed memory should be allocated when starting
# up (true) or when memory is requested.
# We recommend to set this value to 'true' only in setups for pure batch
# processing (DataSet API). Streaming setups currently do not use the TaskManager's
# managed memory: The 'rocksdb' state backend uses RocksDB's own memory management,
# while the 'memory' and 'filesystem' backends explicitly keep data as objects
# to save on serialization cost.
# taskmanager.memory.preallocate: false

# The classloading resolve order. Possible values are 'child-first' (Flink's default)
# and 'parent-first' (Java's default).
# Child first classloading allows users to use different dependency/library
# versions in their application than those in the classpath. Switching back
# to 'parent-first' may help with debugging dependency issues.
# classloader.resolve-order: child-first

# The amount of memory going to the network stack. These numbers usually need
# no tuning. Adjusting them may be necessary in case of an "Insufficient number
# of network buffers" error. The default min is 64MB, the default max is 1GB.
# taskmanager.network.memory.fraction: 0.1
# taskmanager.network.memory.min: 64mb
# taskmanager.network.memory.max: 1gb

#==============================================================================
# Flink Cluster Security Configuration
#==============================================================================

# Kerberos authentication for various components - Hadoop, ZooKeeper, and connectors -
# may be enabled in four steps:
# 1. configure the local krb5.conf file
# 2. provide Kerberos credentials (either a keytab or a ticket cache w/ kinit)
# 3. make the credentials available to various JAAS login contexts
# 4. configure the connector to use JAAS/SASL

# The below configure how Kerberos credentials are provided. A keytab will be used instead of
# a ticket cache if the keytab path and principal are set.
# security.kerberos.login.use-ticket-cache: true
# security.kerberos.login.keytab: /path/to/kerberos/keytab
# security.kerberos.login.principal: flink-user

# The configuration below defines which JAAS login contexts
# security.kerberos.login.contexts: Client,KafkaClient

#==============================================================================
# ZK Security Configuration
#==============================================================================

# Below configurations are applicable if ZK ensemble is configured for security

# Override below configuration to provide custom ZK service name if configured
# zookeeper.sasl.service-name: zookeeper

# The configuration below must match one of the values set in "security.kerberos.login.contexts"
# zookeeper.sasl.login-context-name: Client

#==============================================================================
# HistoryServer
#==============================================================================

# The HistoryServer is started and stopped via bin/historyserver.sh (start|stop)

# Directory to upload completed jobs to. Add this directory to the list of
# monitored directories of the HistoryServer as well (see below).
#jobmanager.archive.fs.dir: hdfs:///completed-jobs/

# The address under which the web-based HistoryServer listens.
#historyserver.web.address: 0.0.0.0

# The port under which the web-based HistoryServer listens.
historyserver.web.port: 8082

# Comma separated list of directories to monitor for completed jobs.
#historyserver.archive.fs.dir: hdfs:///completed-jobs/

# Interval in milliseconds for refreshing the monitored directories.
#historyserver.archive.fs.refresh-interval: 10000

yarn.application-attempts: 10
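The hdfs://ns/ paths above only work if Flink can resolve the HDFS nameservice, i.e. it can see the Hadoop client configuration (core-site.xml/hdfs-site.xml) in addition to the shaded Hadoop jar that is copied in later. A minimal sketch, assuming Hadoop is installed under /usr/local/hadoop (adjust to your actual install path):
# point Flink at the Hadoop client configuration so the "ns" nameservice resolves
echo 'export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop' >> /etc/profile
source /etc/profile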
Edit the masters file
node03:8086
node01:8086
Edit the slaves file
node01
node02
node03
Edit the zoo.cfg file
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
# dataDir=/tmp/zookeeper
# The port at which the clients will connect
clientPort=2181
# ZooKeeper quorum peers
server.1=node01:2888:3888
server.2=node02:2888:3888
server.3=node03:2888:3888
# server.2=host:peer-port:leader-port
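This zoo.cfg is only consulted if you let Flink manage its own ZooKeeper quorum; the HA settings above can just as well point to an external ZooKeeper cluster. A minimal sketch, assuming the helper scripts shipped in the Flink bin directory are used:
# start the bundled ZooKeeper quorum on the peers listed in conf/zoo.cfg
$FLINK_HOME/bin/start-zookeeper-quorum.sh
# stop it again when it is no longer needed
$FLINK_HOME/bin/stop-zookeeper-quorum.sh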
Copy the configured Flink directory to every node, set the environment variables, and create the symlink.
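The installation script below automates this; as a manual sketch for a single node, assuming the same paths used throughout this document:
ln -s /bigdata/flink-1.10.1 /usr/local/flink
echo 'export FLINK_HOME=/usr/local/flink' >> /etc/profile
echo 'export PATH=$PATH:$FLINK_HOME/bin' >> /etc/profile
source /etc/profile
# distribute the whole directory to the other nodes with the xsync script from earlier
xsync /bigdata/flink-1.10.1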
Start the cluster
Run start-cluster.sh from the bin directory.
Visit node03:8086 to open the Flink web UI.
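Once the web UI is reachable, a quick smoke test is to submit one of the example jobs bundled with the distribution (path assumed from the standard Flink layout):
# submit the bundled streaming WordCount example to the running cluster
flink run $FLINK_HOME/examples/streaming/WordCount.jar
# list jobs to confirm it ran
flink list -a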
Installation script
#! /bin/bash
tar -zxvf /bigdata/downloads/flink-1.10.1-bin-scala_2.12.tgz -C /bigdata
# loop over all nodes
for host in node01 node02 node03; do
echo ================== $host ==================
# create the symlink
ssh $host "ln -s /bigdata/flink-1.10.1 /usr/local/flink"
ssh $host "echo 'export FLINK_HOME=/usr/local/flink' >> /etc/profile"
ssh $host "echo 'export PATH=\$PATH:\$FLINK_HOME/bin' >> /etc/profile"
done
# copy the shaded Hadoop jar and the prepared flink-conf.yaml into the local installation
\cp /bigdata/downloads/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar /usr/local/flink/lib
\cp /bigdata/downloads/flink-conf.yaml /usr/local/flink/conf
# rewrite the masters and slaves files
cat /dev/null > /usr/local/flink/conf/masters
cat /dev/null > /usr/local/flink/conf/slaves
echo "node01" >> /usr/local/flink/conf/slaves
echo "node02" >> /usr/local/flink/conf/slaves
echo "node03" >> /usr/local/flink/conf/slaves
echo "node03:8086" >> /usr/local/flink/conf/masters
echo "node01:8086" >> /usr/local/flink/conf/masters
\cp /bigdata/downloads/zoo.cfg /usr/local/flink/conf
xsync /bigdata/flink-1.10.1
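A possible way to run it, assuming the script above is saved as install-flink.sh on the node that holds the downloads:
chmod +x install-flink.sh
./install-flink.sh
# reload the environment in the current shell (or log in again), then start the cluster
source /etc/profile
start-cluster.sh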
ClickHouse
Download the RPM packages from http://repo.red-soft.biz/repos/clickhouse/stable/el7/
The packages have been downloaded into the downloads directory.
# dependencies that may be needed
rpm -ivh downloads/libtool-ltdl-2.4.2-21.el7_2.x86_64.rpm
rpm -ivh downloads/unixODBC-2.3.1-11.el7.x86_64.rpm
yum install libicu.x86_64
rpm -ivh downloads/clickhouse-server-common-1.1.54236-4.el7.x86_64.rpm
# install the server
rpm -ivh downloads/clickhouse-server-1.1.54236-4.el7.x86_64.rpm
rpm -ivh downloads/clickhouse-debuginfo-1.1.54236-4.el7.x86_64.rpm
rpm -ivh downloads/clickhouse-client-1.1.54236-4.el7.x86_64.rpm
rpm -ivh downloads/clickhouse-compressor-1.1.54236-4.el7.x86_64.rpm
# clickhouse-server configuration directory
cd /etc/clickhouse-server/
In config.xml, set the appropriate listen address (<listen_host>).
Allow remote connections:
<!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
<!-- <listen_host>::</listen_host> -->
<listen_host>0.0.0.0</listen_host>
The TCP port can be changed if needed:
<tcp_port>9006</tcp_port>
In users.xml, configure the allowed client IP addresses (<networks><ip>).
Allow connections from any address:
<networks incl="networks" replace="replace">
    <ip>::/0</ip>
</networks>
Start the server
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Connect with the client
clickhouse-client --host=192.168.10.108 --port=9006
Simple operations
show tables;
select 1;
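A slightly fuller check can be run non-interactively from the shell; it creates a throwaway table, inserts a row, and reads it back (host, port, and table name are only illustrative):
clickhouse-client --host=192.168.10.108 --port=9006 --query="CREATE TABLE IF NOT EXISTS test_tbl (id UInt32, name String) ENGINE = TinyLog"
clickhouse-client --host=192.168.10.108 --port=9006 --query="INSERT INTO test_tbl VALUES (1, 'hello')"
clickhouse-client --host=192.168.10.108 --port=9006 --query="SELECT * FROM test_tbl"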
Stop the ClickHouse server: find the process first, then kill it.
ps -aux|grep clickhouse-server
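Once the PID is known, send it a termination signal; pgrep is used here as a shortcut for the grep above:
# send SIGTERM to the running clickhouse-server process
kill $(pgrep -f clickhouse-server)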
Start the server in the background with nohup
nohup clickhouse-server --config-file=/etc/clickhouse-server/config.xml >null 2>&1 &