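MapReduce basic programming model: split one large task into many small tasks, then aggregate their results.
MapReduce runs in two phases. The map phase splits the work; the reduce phase aggregates the results.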
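The two phases can be illustrated with a minimal word-count sketch in plain Python. This is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and only mimic what the framework does on a single machine.

```python
from collections import defaultdict


def map_phase(line):
    """Map: split one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map on every line, group by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello mapreduce"]))
```

Each input line is a small task for the map step, and the reduce step merges the partial counts per word.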
Hadoop environment installation
Installation:
1. Extract: tar -zxvf hadoop-2.7.3.tar.gz -C /root/training/
2. Set the environment variables: vi ~/.bash_profile
HADOOP_HOME=/root/training/hadoop-2.7.3
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
Apply the environment variables: source ~/.bash_profile
Section 1: Hadoop directory structure
Section 2: Hadoop local mode
1. Characteristics: there is no HDFS, so this mode can only be used to test MapReduce programs
2. Edit hadoop-env.sh (run echo $JAVA_HOME to find the JDK installation path, xx, then replace export JAVA_HOME=${JAVA_HOME} with export JAVA_HOME=xx)
Edit line 25: export JAVA_HOME=/usr/java/jdk8u202-b08 (to show line numbers in vi, press Esc and then type :set number)
3. Demo: $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
Command: hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount ~/data/hadoop/input/test.txt ~/data/hadoop/output/wc
Log: 19/09/16 10:45:00 INFO mapreduce.Job: map 100% reduce 100%
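View the results:
cd ~/data/hadoop/output/wc
ls
(ls shows two entries: part-r-00000 holds the result set, and _SUCCESS marks the status of the run)
more part-r-00000
Note: MapReduce sorts records by key by default, so the words in part-r-00000 appear in sorted order.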
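To check the default ordering on a result file, the following sketch parses the key<TAB>value lines that wordcount writes and tests whether the keys are in byte order. The helper names and the sample contents are hypothetical; only the tab-separated layout comes from the default text output format.

```python
def parse_output(text):
    """Parse 'key<TAB>value' lines, the layout wordcount writes to part-r-00000."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            key, value = line.split("\t")
            rows.append((key, int(value)))
    return rows


def keys_are_sorted(rows):
    """Check that keys appear in byte order (uppercase sorts before lowercase)."""
    keys = [key.encode("utf-8") for key, _ in rows]
    return keys == sorted(keys)


# Hypothetical file contents, used only for illustration.
SAMPLE = "Hadoop\t1\nhello\t2\nworld\t1\n"

if __name__ == "__main__":
    rows = parse_output(SAMPLE)
    print(rows, keys_are_sorted(rows))
```

To try it on a real run, read part-r-00000 into a string and pass it to parse_output.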
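Section 3: Hadoop pseudo-distributed mode
1. Characteristics: provides all Hadoop functionality by simulating a distributed environment on a single machine
(1) HDFS: master node: NameNode; data node: DataNode
(2) Yarn: the container that runs MapReduce programs
Master node: ResourceManager
Worker node: NodeManager
2. Steps: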
(1)hdfs-site.xml
<!-- HDFS replication factor -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Whether to check permissions -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
(2)core-site.xml
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.88.11:9000</value>
</property>
<!-- Base directory under which HDFS (NameNode and DataNode) stores its data -->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/training/hadoop-2.7.3/tmp</value>
</property>
(3) mapred-site.xml
<!-- Framework that MapReduce runs on -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
(4) yarn-site.xml
<!-- Address of the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.88.11</value>
</property>
<!-- How the NodeManager runs tasks -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
(5) Format the NameNode
hdfs namenode -format
Log: Storage directory /root/training/hadoop-2.7.3/tmp/dfs/name has been successfully formatted.
(6) Start: start-all.sh
(*) HDFS: stores the data
(*) Yarn: runs the computation
(7) Access: (*) Command line
(*) Java API
(*) Web Console:
HDFS:http://192.168.88.11:50070
Yarn:http://192.168.88.11:8088
At this point the services should be reachable from outside.
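The four configuration files in the steps above share the same property layout, so a quick way to double-check what was written is to load the name/value pairs into a dictionary. The sketch below assumes the properties sit inside the usual <configuration> root element; read_properties and SAMPLE are hypothetical names, and the sample repeats the core-site.xml values from step (2).

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Same values as the core-site.xml step, wrapped in <configuration>.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.88.11:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/training/hadoop-2.7.3/tmp</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in read_properties(SAMPLE).items():
        print(name, "=", value)
```

For a real file, read its text first (for example from $HADOOP_HOME/etc/hadoop/core-site.xml) and pass it to read_properties.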
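Troubleshooting: the web console page cannot be reached at http://ip:port
Original source: https://blog.csdn.net/hanwenshan123/article/details/78717782
Problem 1: hdfs-site.xml configuration
Check the Java processes with the jps command; the Hadoop processes are running normally. (jps is a small tool shipped with the JDK that lists the current Java processes; the name can be read as an abbreviation of Java Virtual Machine Process Status Tool.)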
[root@node4 ~]# jps
25059 SecondaryNameNode
25347 ResourceManager
25556 NodeManager
24805 DataNode
29269 Jps
24633 NameNode
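Check the listening ports with the netstat command. The Local Address column shows only 127.0.0.1 or 0.0.0.0. Services bound to 127.0.0.1 accept connections only from the local machine and cannot serve external clients (0.0.0.0, by contrast, listens on all interfaces). This is the key to the problem.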
[root@node4 ~]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:43759 0.0.0.0:* LISTEN 24805/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 24633/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 12782/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 2325/master
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 24805/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 24805/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 24805/java
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 24633/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 25059/java
tcp6 0 0 :::22 :::* LISTEN 12782/sshd
tcp6 0 0 127.0.0.1:8088 :::* LISTEN 25347/java
tcp6 0 0 ::1:25 :::* LISTEN 2325/master
tcp6 0 0 :::13562 :::* LISTEN 25556/java
tcp6 0 0 :::43451 :::* LISTEN 25556/java
tcp6 0 0 127.0.0.1:8030 :::* LISTEN 25347/java
tcp6 0 0 127.0.0.1:8031 :::* LISTEN 25347/java
tcp6 0 0 127.0.0.1:8032 :::* LISTEN 25347/java
tcp6 0 0 127.0.0.1:8033 :::* LISTEN 25347/java
tcp6 0 0 :::8040 :::* LISTEN 25556/java
tcp6 0 0 :::8042 :::* LISTEN 25556/java
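Scanning the Local Address column by eye is error-prone, so here is a rough sketch that picks out the listeners bound to a loopback address (127.0.0.1 or ::1) from netstat -ntlp output. The function name loopback_listeners is hypothetical, the parsing assumes the seven-column layout shown above, and the sample lines are copied from that output.

```python
import ipaddress


def loopback_listeners(netstat_text):
    """Return (address, port, program) for LISTEN sockets bound to a loopback address."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        host, _, port = fields[3].rpartition(":")
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            continue  # skip anything that is not a plain IP address
        if address.is_loopback:
            found.append((host, int(port), fields[6]))
    return found


# A few lines taken from the netstat output above.
SAMPLE = """tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 24633/java
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 24633/java
tcp6 0 0 127.0.0.1:8088 :::* LISTEN 25347/java
tcp6 0 0 :::8042 :::* LISTEN 25556/java"""

if __name__ == "__main__":
    for entry in loopback_listeners(SAMPLE):
        print(entry)
```

Any port this reports is reachable only from the machine itself until its bind address is changed.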
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml and add
<property>
<name>dfs.namenode.http-address</name>
<value>node4:50070</value>
</property>
Or add
<property>
<name>dfs.namenode.http-address</name>
<value>192.168.88.11:50070</value>
</property>
Check again with netstat -ntlp
[root@node4 ~]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:43759 0.0.0.0:* LISTEN 24805/java
tcp 0 0 10.60.8.28:50070 0.0.0.0:* LISTEN 24633/java
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 12782/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 2325/master
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 24805/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 24805/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 24805/java
tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN 24633/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 25059/java
tcp6 0 0 :::22 :::* LISTEN 12782/sshd
tcp6 0 0 127.0.0.1:8088 :::* LISTEN 25347/java
tcp6 0 0 ::1:25 :::* LISTEN 2325/master
tcp6 0 0 :::13562 :::* LISTEN 25556/java
tcp6 0 0 :::43451 :::* LISTEN 25556/java
tcp6 0 0 127.0.0.1:8030 :::* LISTEN 25347/java
tcp6 0 0 127.0.0.1:8031 :::* LISTEN 25347/java
tcp6 0 0 127.0.0.1:8032 :::* LISTEN 25347/java
tcp6 0 0 127.0.0.1:8033 :::* LISTEN 25347/java
tcp6 0 0 :::8040 :::* LISTEN 25556/java
tcp6 0 0 :::8042 :::* LISTEN 25556/java
Problem 2: SELinux
In principle port 50070 should now be reachable, but it still is not. Checking SELinux shows that its status is enabled.
- Check the SELinux status
[root@node4 ~]# /usr/sbin/sestatus -v
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28
Process contexts:
Current context: unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Init context: system_u:system_r:init_t:s0
/usr/sbin/sshd system_u:system_r:sshd_t:s0-s0:c0.c1023
File contexts:
Controlling terminal: unconfined_u:object_r:user_devpts_t:s0
/etc/passwd system_u:object_r:passwd_file_t:s0
/etc/shadow system_u:object_r:shadow_t:s0
/bin/bash system_u:object_r:shell_exec_t:s0
/bin/login system_u:object_r:login_exec_t:s0
/bin/sh system_u:object_r:bin_t:s0 -> system_u:object_r:shell_exec_t:s0
/sbin/agetty system_u:object_r:getty_exec_t:s0
/sbin/init system_u:object_r:bin_t:s0 -> system_u:object_r:init_exec_t:s0
/usr/sbin/sshd system_u:object_r:sshd_exec_t:s0
Edit /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled, reboot the server, and try again. SELinux status after the change:
[root@node4 ~]# /usr/sbin/sestatus -v
SELinux status: disabled
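The edit to /etc/selinux/config is a one-line change, and it can be scripted. The sketch below rewrites only the SELINUX= line in a piece of config text and leaves SELINUXTYPE= and comments alone. The function name set_selinux_mode is hypothetical and the sample text is an assumed, shortened config; enforcing, permissive and disabled are the accepted values.

```python
import re

VALID_MODES = {"enforcing", "permissive", "disabled"}


def set_selinux_mode(config_text, mode="disabled"):
    """Return config_text with its SELINUX= line set to the given mode."""
    if mode not in VALID_MODES:
        raise ValueError("unknown SELinux mode: " + mode)
    # (?m) makes ^ and $ match per line; 'SELINUXTYPE=' does not match 'SELINUX='.
    new_text, count = re.subn(r"(?m)^SELINUX=.*$", "SELINUX=" + mode, config_text)
    if count == 0:
        raise ValueError("no SELINUX= line found")
    return new_text


# Assumed, shortened example of the config file contents.
SAMPLE = """# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
"""

if __name__ == "__main__":
    print(set_selinux_mode(SAMPLE))
```

Writing the result back to /etc/selinux/config requires root, and the change takes effect after a reboot, as described above.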
Problem 3: firewall (opening the port)
After disabling SELinux the page still cannot be reached, so check the firewall settings next
[root@node4 sbin]# firewall-cmd --state
running
[root@node4 sbin]# firewall-cmd --get-service
RH-Satellite-6 amanda-client amanda-k5-client bacula bacula-client bitcoin bitcoin-rpc bitcoin-testnet bitcoin-testnet-rpc ceph ceph-mon cfengine condor-collector ctdb dhcp dhcpv6 dhcpv6-client dns docker-registry dropbox-lansync elasticsearch freeipa-ldap freeipa-ldaps freeipa-replication freeipa-trust ftp ganglia-client ganglia-master high-availability http https imap imaps ipp ipp-client ipsec iscsi-target kadmin kerberos kibana klogin kpasswd kshell ldap ldaps libvirt libvirt-tls managesieve mdns mosh mountd ms-wbt mssql mysql nfs nrpe ntp openvpn ovirt-imageio ovirt-storageconsole ovirt-vmconsole pmcd pmproxy pmwebapi pmwebapis pop3 pop3s postgresql privoxy proxy-dhcp ptp pulseaudio puppetmaster quassel radius rpc-bind rsh rsyncd samba samba-client sane sip sips smtp smtp-submission smtps snmp snmptrap spideroak-lansync squid ssh synergy syslog syslog-tls telnet tftp tftp-client tinc tor-socks transmission-client vdsm vnc-server wbem-https xmpp-bosh xmpp-client xmpp-local xmpp-server
Add port 50070 to the allowed ports and reload the firewall
[root@node4 sbin]# firewall-cmd --zone=public --add-port=50070/tcp --permanent
success
[root@node4 sbin]# firewall-cmd --reload
success
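Once the port is open in the firewall, a quick TCP connection test from another machine shows whether the web console is reachable before trying a browser. This is a minimal sketch; port_is_open is a hypothetical helper, and a successful connect only proves that something is listening, not that the page itself works.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For the setup in these notes, calling port_is_open("192.168.88.11", 50070) and port_is_open("192.168.88.11", 8088) from a client machine covers the HDFS and Yarn consoles.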
Problem 4: the Yarn page on port 8088 cannot be reached
Edit yarn-site.xml and add the following inside <configuration></configuration>:
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.88.11:8088</value>
</property>
Fully distributed (cluster) mode
1. Copy the entire Hadoop installation directory to the other two machines
scp -r /home/xxxx/hadoop XXX@hadoop02:/home/xxxx/
scp -r /home/xxxx/hadoop XXX@hadoop03:/home/xxxx/
2. On the master, edit the slaves file so that it lists the hostnames of the worker nodes:
hadoop1
hadoop2
hadoop3
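With more than a couple of workers, both steps are easier to generate from one host list. The sketch below builds the slaves file contents and the matching scp argument lists. The helper names and all paths, user names and hostnames in the example are hypothetical placeholders; nothing is copied or written unless the caller runs the commands.

```python
def slaves_file(workers):
    """Return the slaves file contents: one worker hostname per line."""
    return "".join(host + "\n" for host in workers)


def scp_commands(source_dir, user, workers, dest_parent):
    """Build one 'scp -r' argument list per worker (nothing is executed here)."""
    return [
        ["scp", "-r", source_dir, f"{user}@{host}:{dest_parent}"]
        for host in workers
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    workers = ["hadoop02", "hadoop03"]
    print(slaves_file(workers), end="")
    for command in scp_commands("/home/hduser/hadoop", "hduser", workers, "/home/hduser/"):
        print(" ".join(command))
```

Each argument list can be passed to subprocess.run once the values match the real cluster.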