Build a cluster test environment with Docker on a Tencent Cloud host.
Environment
1. Operating system: CentOS 7.2 64-bit
Network settings

| hostname | IP |
|---|---|
| cluster-master | 172.18.0.2 |
| cluster-slave1 | 172.18.0.3 |
| cluster-slave2 | 172.18.0.4 |
| cluster-slave3 | 172.18.0.5 |
Docker installation
curl -sSL https://get.daocloud.io/docker | sh
##switch to a mirror
###see http://www.jianshu.com/p/34d3b4568059 for reference
curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://67e93489.m.daocloud.io
##enable start on boot
systemctl enable docker
systemctl start docker
Pull the CentOS image
docker pull daocloud.io/library/centos:latest
Use docker images to view the downloaded image.
Create the containers
The cluster layout requires fixed IPs, so before creating any containers, create a Docker subnet that allows static addressing:
docker network create --subnet=172.18.0.0/16 netgroup
Once the subnet exists, containers with fixed IPs can be created:
#cluster-master
#-p publishes container ports to the host; used later to reach the web UIs
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-master -h cluster-master -p 18088:18088 -p 9870:9870 --net netgroup --ip 172.18.0.2 daocloud.io/library/centos /usr/sbin/init
#cluster-slaves
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave1 -h cluster-slave1 --net netgroup --ip 172.18.0.3 daocloud.io/library/centos /usr/sbin/init
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave2 -h cluster-slave2 --net netgroup --ip 172.18.0.4 daocloud.io/library/centos /usr/sbin/init
docker run -d --privileged -ti -v /sys/fs/cgroup:/sys/fs/cgroup --name cluster-slave3 -h cluster-slave3 --net netgroup --ip 172.18.0.5 daocloud.io/library/centos /usr/sbin/init
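The three slave invocations differ only in hostname and IP, so they can be generated in a loop. A sketch that echoes the commands as a dry run first; drop the leading echo to actually create the containers:

```shell
# Generate the three cluster-slaveN run commands; slaveN gets IP 172.18.0.(N+2).
# The leading "echo" makes this a dry run; remove it to create the containers.
for i in 1 2 3; do
  echo docker run -d --privileged -ti \
    -v /sys/fs/cgroup:/sys/fs/cgroup \
    --name "cluster-slave$i" -h "cluster-slave$i" \
    --net netgroup --ip "172.18.0.$((i + 2))" \
    daocloud.io/library/centos /usr/sbin/init
done
```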
Start a console and enter the docker container:
docker exec -it cluster-master /bin/bash
Install OpenSSH and set up passwordless login
1. cluster-master
Install:
#cluster-master additionally needs its ssh configuration file modified (see below)
#install openssh
[root@cluster-master /]# yum -y install openssh openssh-server openssh-clients
[root@cluster-master /]# systemctl start sshd
####have ssh accept new host keys automatically
####on the master, make ssh add hosts to known_hosts without prompting
[root@cluster-master /]# vi /etc/ssh/ssh_config
#change the original StrictHostKeyChecking ask
#to StrictHostKeyChecking no
#and save
[root@cluster-master /]# systemctl restart sshd
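The manual vi edit can also be scripted with sed. A sketch run against a temporary copy here; on the master the target file would be /etc/ssh/ssh_config:

```shell
# Work on a temp copy for illustration; point cfg at /etc/ssh/ssh_config
# on the master to apply the change for real.
cfg=$(mktemp)
printf '#   StrictHostKeyChecking ask\n' > "$cfg"
# Replace the (possibly commented-out) setting with an explicit "no".
sed -i 's/^#*[[:space:]]*StrictHostKeyChecking.*/StrictHostKeyChecking no/' "$cfg"
grep 'StrictHostKeyChecking' "$cfg"
```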
2. Install OpenSSH on each of the slaves
#install openssh
[root@cluster-slave1 /]# yum -y install openssh openssh-server openssh-clients
[root@cluster-slave1 /]# systemctl start sshd
3. Distribute the cluster-master public key
On the master, run
ssh-keygen -t rsa
and press Enter through all the prompts. This creates the ~/.ssh directory containing id_rsa (the private key) and id_rsa.pub (the public key); then redirect id_rsa.pub into the file authorized_keys:
ssh-keygen -t rsa
#press Enter at every prompt
[root@cluster-master /]# cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
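The prompts can also be skipped entirely with -N (empty passphrase) and -f (key path). A sketch writing into a temporary directory rather than ~/.ssh:

```shell
# Non-interactive key generation: -N "" sets an empty passphrase, -q is quiet.
# Replace $keydir with ~/.ssh when doing this on the master for real.
keydir=$(mktemp -d)
ssh-keygen -t rsa -N "" -q -f "$keydir/id_rsa"
cat "$keydir/id_rsa.pub" > "$keydir/authorized_keys"
chmod 600 "$keydir/authorized_keys"   # sshd requires strict permissions
```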
Once the file exists, distribute the public key to the cluster's slave hosts with scp:
[root@cluster-master /]# ssh root@cluster-slave1 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave1:~/.ssh
[root@cluster-master /]# ssh root@cluster-slave2 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave2:~/.ssh
[root@cluster-master /]# ssh root@cluster-slave3 'mkdir ~/.ssh'
[root@cluster-master /]# scp ~/.ssh/authorized_keys root@cluster-slave3:~/.ssh
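The three mkdir/scp pairs can likewise be driven by a loop over the slave hostnames. Echoed here as a dry run; remove the leading echo to execute, assuming the hostnames resolve:

```shell
# Print the distribution commands for each slave; drop "echo" to run them.
for h in cluster-slave1 cluster-slave2 cluster-slave3; do
  echo ssh "root@$h" "mkdir -p ~/.ssh"
  echo scp "$HOME/.ssh/authorized_keys" "root@$h:~/.ssh"
done
```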
After distribution, test (ssh root@cluster-slave1) that login no longer prompts for a password.
Ansible installation
[root@cluster-master /]# yum -y install epel-release
[root@cluster-master /]# yum -y install ansible
#ansible's configuration now lives under /etc/ansible
Next, edit ansible's hosts (inventory) file:
vi /etc/ansible/hosts
[cluster]
cluster-master
cluster-slave1
cluster-slave2
cluster-slave3
[master]
cluster-master
[slaves]
cluster-slave1
cluster-slave2
cluster-slave3
Configure the containers' hosts file
Docker rewrites /etc/hosts when a container starts, so direct edits are lost on restart. To have each container pick up the cluster hosts after a restart, rewrite /etc/hosts from a login script instead: append the following to ~/.bashrc
:>/etc/hosts
cat >>/etc/hosts<<EOF
127.0.0.1 localhost
172.18.0.2 cluster-master
172.18.0.3 cluster-slave1
172.18.0.4 cluster-slave2
172.18.0.5 cluster-slave3
EOF
source ~/.bashrc
makes the configuration take effect; you can see that /etc/hosts now has the required content:
[root@cluster-master ansible]# cat /etc/hosts
127.0.0.1 localhost
172.18.0.2 cluster-master
172.18.0.3 cluster-slave1
172.18.0.4 cluster-slave2
172.18.0.5 cluster-slave3
Distribute .bashrc to the cluster slaves with ansible:
ansible cluster -m copy -a "src=~/.bashrc dest=~/"
Software environment configuration
Download JDK 1.8 and extract it into the /opt directory.
Download Hadoop 3 into /opt, extract the archive, and create a symlink:
tar -xzvf hadoop-3.2.0.tar.gz
ln -s hadoop-3.2.0 hadoop
Configure the java and hadoop environment variables
Edit the ~/.bashrc file:
# hadoop
export HADOOP_HOME=/opt/hadoop-3.2.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
#java
export JAVA_HOME=/opt/jdk8
export PATH=$JAVA_HOME/bin:$PATH
Apply the file:
source ~/.bashrc
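The PATH assembly from ~/.bashrc can be sanity-checked in a scratch shell before the full install, replaying the exports above (paths as assumed in this guide):

```shell
# Replay the ~/.bashrc exports and confirm the tool directories lead PATH.
HADOOP_HOME=/opt/hadoop-3.2.0
JAVA_HOME=/opt/jdk8
PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH"
echo "$PATH" | cut -d: -f1-3
# → /opt/hadoop-3.2.0/bin:/opt/hadoop-3.2.0/sbin:/opt/jdk8/bin
```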
Configure the files hadoop needs at runtime
cd $HADOOP_HOME/etc/hadoop/
1. Edit core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<!-- file system properties -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://cluster-master:9000</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>4320</value>
</property>
</configuration>
2. Edit hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>staff</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
3. Edit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>cluster-master:9001</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>cluster-master:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>cluster-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>cluster-master:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/jobhistory/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/jobhistory/done_intermediate</value>
</property>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
</configuration>
4. Edit yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>cluster-master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>cluster-master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>cluster-master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>cluster-master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>cluster-master:18141</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>cluster-master:18088</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>86400</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
</configuration>
Package hadoop and distribute it to the slaves (run in /opt, so both the hadoop symlink and the release directory are included):
tar -cvf hadoop-dis.tar hadoop hadoop-3.2.0
Use ansible-playbook to distribute .bashrc and hadoop-dis.tar to the slave hosts:
---
- hosts: cluster
  tasks:
    - name: copy .bashrc to slaves
      copy: src=~/.bashrc dest=~/
      notify:
        - exec source
    - name: copy hadoop-dis.tar to slaves
      unarchive: src=/opt/hadoop-dis.tar dest=/opt
  handlers:
    - name: exec source
      shell: source ~/.bashrc
Save the YAML above as hadoop-dis.yaml and run it:
ansible-playbook hadoop-dis.yaml
hadoop-dis.tar is unpacked automatically into /opt on each slave host.
Starting Hadoop
Format the namenode:
hadoop namenode -format
If the output contains wording like "successfully formatted", the namenode was formatted successfully.
Start the cluster:
cd $HADOOP_HOME/sbin
start-all.sh
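One caveat not shown above: when the start scripts run as root (as in these containers), Hadoop 3 aborts with errors like "Attempting to operate on hdfs namenode as root" unless the operating users are declared. A sketch of the extra exports, assuming everything runs as root; append them to ~/.bashrc before running start-all.sh:

```shell
# Declare which user runs each daemon; the Hadoop 3 start scripts require
# these variables when invoked as root.
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```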
After startup, use the jps command to check whether the daemons came up.
Note:
In practice the datanode service on the slave nodes failed to start; inspecting the slaves' directory trees showed that
the directories named in the configuration files had not been created, for example in core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
and in hdfs-site.xml:
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data</value>
</property>
Create these directories manually on every node, then delete them on the master together with the logs folder under $HADOOP_HOME, and re-format the namenode:
hadoop namenode -format
Then start the cluster services again:
start-all.sh
The datanode service should now be visible on the slave nodes.
Verify the services
Visit
http://host:18088
http://host:9870
to check whether the services are up (these are the two ports published with -p when the cluster-master container was created: the YARN web UI and the HDFS NameNode UI respectively).