Configuring a Hadoop Cluster on Ubuntu 18.04


Server preparation

A typical minimal Hadoop cluster is 3 servers: one master running the NameNode, and two slaves running DataNodes. The OS is Ubuntu 18.04 Server; the installation itself is omitted here. The disks use LVM with the XFS filesystem, and to avoid wasting space everything except a 1 GB /boot partition is allocated to /.

Server plan

192.168.1.148 vm148 -- master: NameNode, ResourceManager
192.168.1.149 vm149 -- slave: DataNode, NodeManager
192.168.1.150 vm150 -- slave: DataNode, NodeManager

Note: here is the first pitfall. Hostnames must not contain an underscore (_), or the DataNode will fail to create its socket and will not start.

Post-install upgrade

sudo apt update
sudo apt upgrade

Add a regular user

A restricted user to run Hadoop under; I habitually use tomcat as the user name. Use adduser rather than useradd here, because the latter, when run without options, does not always create the home directory.

sudo adduser tomcat
# answer the prompts

If these are virtual machines, this is a good point to save the current state as a template.

Set hostname and hosts

# view current hostname
sudo hostnamectl status
# set
sudo hostnamectl set-hostname vm148

# add entries to hosts
sudo vi /etc/hosts
# add the following lines
192.168.1.148  vm148
192.168.1.149  vm149
192.168.1.150  vm150

Set the hostnames to vm148, vm149 and vm150 respectively. Log in after a reboot to check that the names took effect, and ping each host from the others to confirm they resolve.

Set up passwordless SSH between the tomcat accounts

# generate id_rsa and id_rsa.pub
ssh-keygen 
cd .ssh/
# create authorized_keys
mv id_rsa.pub authorized_keys
# note: permissions must be 600
chmod 600 authorized_keys 
# rename this server's private key to id_rsa_mine
mv id_rsa id_rsa_mine
# edit config
vi config
# add the following
Host vm149
IdentityFile ~/.ssh/id_rsa_mine
User tomcat

Host vm150
IdentityFile ~/.ssh/id_rsa_mine
User tomcat

Host vm148
IdentityFile ~/.ssh/id_rsa_mine
User tomcat

# on the master, also add the following; it is used when starting the SecondaryNameNode
Host 0.0.0.0
IdentityFile ~/.ssh/id_rsa_mine
User tomcat

Merge the contents of every server's authorized_keys so that, in the end, the file is identical on all servers.
Once this is done, try ssh tomcat@[hostname] from every machine to every other, both to confirm that login works and to accept the new-host-key prompt ahead of time, so it does not appear when the services start.
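One way to do the merge, sketched under the assumption that password login still works and that the host list matches the plan above (the loop and temporary file names are illustrative, not from the original setup):

```shell
# Gather every host's authorized_keys, de-duplicate, and push the merged
# file back out to all hosts. Run once, as tomcat, from any of the servers.
for h in vm148 vm149 vm150; do
    ssh tomcat@"$h" cat .ssh/authorized_keys
done | sort -u > /tmp/merged_keys

for h in vm148 vm149 vm150; do
    scp /tmp/merged_keys tomcat@"$h":.ssh/authorized_keys
    ssh tomcat@"$h" chmod 600 .ssh/authorized_keys
done
```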

Firewall (ufw)

For a first-time setup it is best to disable the firewall entirely, so that no service fails to start because of it. Once everything is configured and working, re-enable ufw with rules for the actual ports in use.

sudo ufw disable
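When you later re-enable the firewall, the rules might look roughly like this (a sketch based on the ports used elsewhere in this article; adjust the source subnet and port list to your environment):

```shell
# Allow cluster traffic from the local subnet, then turn ufw back on.
sudo ufw allow from 192.168.1.0/24 to any port 9000  proto tcp  # HDFS fs.defaultFS
sudo ufw allow from 192.168.1.0/24 to any port 50070 proto tcp  # NameNode web UI
sudo ufw allow from 192.168.1.0/24 to any port 8088  proto tcp  # YARN web UI
sudo ufw enable
```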

Install the JDK

Unpack the JDK to /opt/jdk and create a latest symlink. The resulting layout:

$ ll /opt/jdk/
total 0
drwxr-xr-x 7 root root 245 Oct  6 13:58 jdk1.8.0_192/
lrwxrwxrwx 1 root root  12 Jan 18 05:49 latest -> jdk1.8.0_192/
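This layout can be created roughly as follows (the tarball name is an assumption; substitute your actual download):

```shell
sudo mkdir -p /opt/jdk
sudo tar xzf jdk-8u192-linux-x64.tar.gz -C /opt/jdk
# relative symlink, matching the listing above
sudo ln -s jdk1.8.0_192 /opt/jdk/latest
```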

Symlink jps into /usr/bin:

cd /usr/bin
sudo ln -s /opt/jdk/latest/bin/jps jps

Install and configure Hadoop

Unpack Hadoop to /opt/hadoop and create a latest symlink. The resulting layout:

$ ll /opt/hadoop/
total 0
drwxr-xr-x 9 root root 149 Nov 13 15:15 hadoop-2.9.2/
lrwxrwxrwx 1 root root  12 Jan 18 10:26 latest -> hadoop-2.9.2/

Edit the config file etc/hadoop/hadoop-env.sh

Two variables need changing:

# The java implementation to use.
export JAVA_HOME=/opt/jdk/latest

# Where log files are stored.  $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/home/tomcat/run/hadoop/logs

Edit the config file etc/hadoop/yarn-env.sh

Two variables need changing:

# some Java parameters
export JAVA_HOME=/opt/jdk/latest

# default log directory & file
export YARN_LOG_DIR=/home/tomcat/run/yarn/logs

Edit the config file etc/hadoop/slaves

Replace its contents with the two slave hostnames:

vm149
vm150

Edit the config file etc/hadoop/core-site.xml

Add the following. For the full list of options see share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/tomcat/run/hadoop</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://vm148:9000</value>
  </property>
</configuration>

Edit the config file etc/hadoop/hdfs-site.xml

Add the following:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

Edit the config file etc/hadoop/mapred-site.xml

Add the following:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Edit the config file etc/hadoop/yarn-site.xml

Add the following. For the full list of options see share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

<configuration>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>vm148</value>
  </property>    
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Copy the configured Hadoop tree, with the same directory layout, to the other two servers.
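A sketch of the copy step, assuming /opt/hadoop on the slaves is writable by tomcat (otherwise stage through a temporary directory and move it into place with sudo):

```shell
# Replicate the configured tree (including the latest symlink) to both slaves.
for h in vm149 vm150; do
    rsync -a /opt/hadoop/ tomcat@"$h":/opt/hadoop/
done
```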

Start Hadoop

Before the first start, the NameNode must be formatted. On the master:

/opt/hadoop/latest/bin/hdfs namenode -format

Then start HDFS:

/opt/hadoop/latest/sbin/start-dfs.sh

Then start YARN:

/opt/hadoop/latest/sbin/start-yarn.sh

After each step, use jps to check that the services started correctly. On the master, a healthy start shows these processes:

tomcat@vm148:/opt$ jps
3173 SecondaryNameNode
3495 ResourceManager
4583 Jps
2906 NameNode

And on a slave server:

tomcat@vm149:~/run$ jps
3074 NodeManager
2691 DataNode
3591 Jps


Web access

Once the services are up, the web UI is available at http://vm148:50070/

Service ports

On the master:

21, FTP (purpose unclear)

8030, YARN ResourceManager scheduler
8031, YARN ResourceManager resource tracker
8032, YARN ResourceManager
8033, YARN ResourceManager admin
8088, YARN ResourceManager web UI
8090, YARN ResourceManager web UI (HTTPS)

9000, HDFS (fs.defaultFS)

50070, NameNode web UI
50090, SecondaryNameNode web UI

On the slaves (DataNodes):

50075, DataNode web UI
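To see which ports the daemons actually bind on a given host, something like this works (the exact output will vary):

```shell
# List listening TCP sockets belonging to the Java daemons.
sudo ss -tlnp | grep java
```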

Running the WordCount example

The example code comes from: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

First compile the Java source into classes and build the jar. JAVA_HOME is already configured inside Hadoop, and PATH is not needed in this environment; only tools.jar has to be added to the classpath:

export HADOOP_CLASSPATH=/opt/jdk/latest/lib/tools.jar
/opt/hadoop/latest/bin/hadoop com.sun.tools.javac.Main WordCount.java 
/opt/jdk/latest/bin/jar cf wc.jar WordCount*.class

Then upload the two input files to HDFS:

/opt/hadoop/latest/bin/hadoop fs -put file01 /workspace/input/
/opt/hadoop/latest/bin/hadoop fs -ls /workspace/input
/opt/hadoop/latest/bin/hadoop fs -put file02 /workspace/input/
/opt/hadoop/latest/bin/hadoop fs -cat /workspace/input/file01
/opt/hadoop/latest/bin/hadoop fs -cat /workspace/input/file02

I hit a pitfall here at first: I put the files under /tmp/ and used /tmp as the input directory, but at run time YARN stores its staging information under /tmp/hadoop-yarn/staging, and the job threw an exception. The lesson: do not put job files under /tmp.

Run the job:

/opt/hadoop/latest/bin/hadoop jar wc.jar WordCount /workspace/input /workspace/output

The last argument is the output path; this path must not exist before the job runs, otherwise the job fails with an error.

The full run and its output:

tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop jar wc.jar WordCount /workspace/input /workspace/output
19/01/30 08:24:55 INFO client.RMProxy: Connecting to ResourceManager at vm148/192.168.1.148:8032
19/01/30 08:24:55 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/30 08:24:56 INFO input.FileInputFormat: Total input files to process : 2
19/01/30 08:24:56 INFO mapreduce.JobSubmitter: number of splits:2
19/01/30 08:24:56 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/30 08:24:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547812325179_0004
19/01/30 08:24:56 INFO impl.YarnClientImpl: Submitted application application_1547812325179_0004
19/01/30 08:24:56 INFO mapreduce.Job: The url to track the job: http://vm148:8088/proxy/application_1547812325179_0004/
19/01/30 08:24:56 INFO mapreduce.Job: Running job: job_1547812325179_0004
19/01/30 08:25:03 INFO mapreduce.Job: Job job_1547812325179_0004 running in uber mode : false
19/01/30 08:25:03 INFO mapreduce.Job:  map 0% reduce 0%
19/01/30 08:25:10 INFO mapreduce.Job:  map 100% reduce 0%
19/01/30 08:25:18 INFO mapreduce.Job:  map 100% reduce 100%
19/01/30 08:25:18 INFO mapreduce.Job: Job job_1547812325179_0004 completed successfully
19/01/30 08:25:18 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=97
		FILE: Number of bytes written=594622
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=266
		HDFS: Number of bytes written=38
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=10309
		Total time spent by all reduces in occupied slots (ms)=3850
		Total time spent by all map tasks (ms)=10309
		Total time spent by all reduce tasks (ms)=3850
		Total vcore-milliseconds taken by all map tasks=10309
		Total vcore-milliseconds taken by all reduce tasks=3850
		Total megabyte-milliseconds taken by all map tasks=10556416
		Total megabyte-milliseconds taken by all reduce tasks=3942400
	Map-Reduce Framework
		Map input records=2
		Map output records=10
		Map output bytes=96
		Map output materialized bytes=103
		Input split bytes=210
		Combine input records=10
		Combine output records=8
		Reduce input groups=5
		Reduce shuffle bytes=103
		Reduce input records=8
		Reduce output records=5
		Spilled Records=16
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=379
		CPU time spent (ms)=2090
		Physical memory (bytes) snapshot=778280960
		Virtual memory (bytes) snapshot=5914849280
		Total committed heap usage (bytes)=507510784
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=56
	File Output Format Counters 
		Bytes Written=38
tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop fs -ls /workspace/output
Found 2 items
-rw-r--r--   2 tomcat supergroup          0 2019-01-30 08:25 /workspace/output/_SUCCESS
-rw-r--r--   2 tomcat supergroup         38 2019-01-30 08:25 /workspace/output/part-r-00000
tomcat@vm148:~$ /opt/hadoop/latest/bin/hadoop fs -cat /workspace/output/part-r-00000
Day	2
Good	2
Hadoop	2
Hello	2
World	2

A simple MapReduce example

The input looks like this: each line is a log record containing a user, an IP and a timestamp. The task is to count the occurrences of each (user + IP) pair.

1571	76	738	legnd	166.111.8.133	870876781
1572	121	697	kuoc	202.116.65.16	870909489
1573	121	697	kuoc	202.116.65.16	870910644
1574	121	739	maerick		870926284

Code: pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
		 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
		 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.rockbb</groupId>
	<artifactId>hdtask</artifactId>
	<packaging>jar</packaging>
	<version>1.0-SNAPSHOT</version>

	<name>HD Task</name>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>

	<dependencies>
		<dependency>
	      <groupId>junit</groupId>
	      <artifactId>junit</artifactId>
	      <version>4.8.2</version>
	      <scope>test</scope>
	    </dependency>

	    <dependency>
		    <groupId>org.apache.hadoop</groupId>
		    <artifactId>hadoop-common</artifactId>
		    <version>2.4.1</version>
	    </dependency>
     
	    <dependency>
		    <groupId>org.apache.hadoop</groupId>
		    <artifactId>hadoop-hdfs</artifactId>
		    <version>2.4.1</version>
	    </dependency>

	    <dependency>
		    <groupId>org.apache.hadoop</groupId>
		    <artifactId>hadoop-mapreduce-client-core</artifactId>
		    <version>2.4.1</version>
	    </dependency>

	</dependencies>

	<build>
		<pluginManagement>
			<plugins>
				<plugin>
					<groupId>org.apache.maven.plugins</groupId>
					<artifactId>maven-compiler-plugin</artifactId>
					<version>3.3</version>
					<configuration>
						<source>1.8</source>
						<target>1.8</target>
						<encoding>UTF-8</encoding>
					</configuration>
				</plugin>
				<plugin>
					<groupId>org.apache.maven.plugins</groupId>
					<artifactId>maven-resources-plugin</artifactId>
					<configuration>
						<encoding>UTF-8</encoding>
					</configuration>
				</plugin>
			</plugins>
		</pluginManagement>
	</build>
</project>

Code: DataBean.java

package com.rockbb.hdtask;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class DataBean implements Writable {
    private String nameIp;
    private long count;

    public DataBean() {
    }

    public DataBean(String nameIp, long count) {
        this.nameIp = nameIp;
        this.count = count;
    }

    public String getNameIp() {
        return nameIp;
    }

    public void setNameIp(String nameIp) {
        this.nameIp = nameIp;
    }

    public long getCount() {
        return count;
    }

    public void setCount(long count) {
        this.count = count;
    }

    /**
     * Important: this is used for the final output.
     */
    @Override
    public String toString() {
        return this.nameIp + "\t" + this.count;
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(nameIp);
        dataOutput.writeLong(count);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.nameIp = dataInput.readUTF();
        this.count = dataInput.readLong();
    }
}

Code: IpCount.java

package com.rockbb.hdtask;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class IpCount {
    public static class IpMapper extends Mapper<LongWritable, Text, Text, DataBean> {
        @Override
        public void map(LongWritable keyIn, Text valueIn, Context context) throws IOException, InterruptedException {
            String line = valueIn.toString();
            String[] fields = line.split("\t");
            String keyOut = fields[3] + '-' + fields[4];
            long valueOut = 1;
            DataBean bean = new DataBean(keyOut, valueOut);
            context.write(new Text(keyOut), bean);
        }
    }

    public static class IpReducer extends Reducer<Text, DataBean, Text, DataBean> {
        @Override
        public void reduce(Text keyIn, Iterable<DataBean> valuesIn, Context context) throws IOException, InterruptedException {
            long total = 0;
            for (DataBean bean : valuesIn) {
                total += bean.getCount();
            }
            DataBean bean = new DataBean("", total);
            context.write(keyIn, bean);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(IpCount.class);
        job.setMapperClass(IpMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DataBean.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        job.setReducerClass(IpReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DataBean.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Run command:

/opt/hadoop/latest/bin/hadoop jar hdtask.jar com.rockbb.hdtask.IpCount /workspace/input/ /workspace/output3

The data file is 2.3 GB. With the default block size of 128 MB, submitting the job produced 19 map tasks and one reduce task. The job's command-line output:

19/01/31 10:08:01 INFO client.RMProxy: Connecting to ResourceManager at vm148/192.168.31.148:8032
19/01/31 10:08:02 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/01/31 10:08:02 INFO input.FileInputFormat: Total input files to process : 1
19/01/31 10:08:02 INFO mapreduce.JobSubmitter: number of splits:19
19/01/31 10:08:02 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/31 10:08:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547812325179_0008
19/01/31 10:08:03 INFO impl.YarnClientImpl: Submitted application application_1547812325179_0008
19/01/31 10:08:03 INFO mapreduce.Job: The url to track the job: http://vm148:8088/proxy/application_1547812325179_0008/
19/01/31 10:08:03 INFO mapreduce.Job: Running job: job_1547812325179_0008
19/01/31 10:08:13 INFO mapreduce.Job: Job job_1547812325179_0008 running in uber mode : false
19/01/31 10:08:13 INFO mapreduce.Job:  map 0% reduce 0%
19/01/31 10:08:41 INFO mapreduce.Job:  map 11% reduce 0%
19/01/31 10:08:45 INFO mapreduce.Job:  map 21% reduce 0%
19/01/31 10:08:47 INFO mapreduce.Job:  map 23% reduce 0%
19/01/31 10:08:51 INFO mapreduce.Job:  map 28% reduce 0%
19/01/31 10:08:53 INFO mapreduce.Job:  map 30% reduce 0%
19/01/31 10:08:57 INFO mapreduce.Job:  map 31% reduce 0%
19/01/31 10:08:59 INFO mapreduce.Job:  map 38% reduce 0%
19/01/31 10:09:09 INFO mapreduce.Job:  map 39% reduce 0%
19/01/31 10:09:10 INFO mapreduce.Job:  map 40% reduce 0%
19/01/31 10:09:11 INFO mapreduce.Job:  map 44% reduce 0%
19/01/31 10:09:14 INFO mapreduce.Job:  map 46% reduce 0%
19/01/31 10:09:16 INFO mapreduce.Job:  map 48% reduce 0%
19/01/31 10:09:17 INFO mapreduce.Job:  map 49% reduce 0%
19/01/31 10:09:22 INFO mapreduce.Job:  map 55% reduce 0%
19/01/31 10:09:24 INFO mapreduce.Job:  map 56% reduce 0%
19/01/31 10:09:28 INFO mapreduce.Job:  map 61% reduce 0%
19/01/31 10:09:40 INFO mapreduce.Job:  map 64% reduce 0%
19/01/31 10:09:42 INFO mapreduce.Job:  map 64% reduce 7%
19/01/31 10:09:46 INFO mapreduce.Job:  map 66% reduce 7%
19/01/31 10:09:48 INFO mapreduce.Job:  map 68% reduce 9%
19/01/31 10:09:52 INFO mapreduce.Job:  map 71% reduce 9%
19/01/31 10:09:54 INFO mapreduce.Job:  map 71% reduce 12%
19/01/31 10:09:58 INFO mapreduce.Job:  map 73% reduce 12%
19/01/31 10:09:59 INFO mapreduce.Job:  map 74% reduce 12%
19/01/31 10:10:01 INFO mapreduce.Job:  map 75% reduce 12%
19/01/31 10:10:04 INFO mapreduce.Job:  map 80% reduce 12%
19/01/31 10:10:06 INFO mapreduce.Job:  map 81% reduce 12%
19/01/31 10:10:10 INFO mapreduce.Job:  map 85% reduce 12%
19/01/31 10:10:12 INFO mapreduce.Job:  map 86% reduce 12%
19/01/31 10:10:13 INFO mapreduce.Job:  map 87% reduce 12%
19/01/31 10:10:15 INFO mapreduce.Job:  map 88% reduce 12%
19/01/31 10:10:18 INFO mapreduce.Job:  map 88% reduce 16%
19/01/31 10:10:22 INFO mapreduce.Job:  map 90% reduce 16%
19/01/31 10:10:23 INFO mapreduce.Job:  map 91% reduce 16%
19/01/31 10:10:24 INFO mapreduce.Job:  map 91% reduce 18%
19/01/31 10:10:25 INFO mapreduce.Job:  map 92% reduce 18%
19/01/31 10:10:29 INFO mapreduce.Job:  map 93% reduce 18%
19/01/31 10:10:31 INFO mapreduce.Job:  map 93% reduce 21%
19/01/31 10:10:32 INFO mapreduce.Job:  map 94% reduce 21%
19/01/31 10:10:34 INFO mapreduce.Job:  map 96% reduce 21%
19/01/31 10:10:35 INFO mapreduce.Job:  map 97% reduce 21%
19/01/31 10:10:37 INFO mapreduce.Job:  map 98% reduce 23%
19/01/31 10:10:38 INFO mapreduce.Job:  map 99% reduce 23%
19/01/31 10:10:41 INFO mapreduce.Job:  map 100% reduce 23%
19/01/31 10:10:43 INFO mapreduce.Job:  map 100% reduce 30%
19/01/31 10:10:49 INFO mapreduce.Job:  map 100% reduce 33%
19/01/31 10:11:25 INFO mapreduce.Job:  map 100% reduce 67%
19/01/31 10:11:31 INFO mapreduce.Job:  map 100% reduce 70%
19/01/31 10:11:37 INFO mapreduce.Job:  map 100% reduce 74%
19/01/31 10:11:43 INFO mapreduce.Job:  map 100% reduce 78%
19/01/31 10:11:49 INFO mapreduce.Job:  map 100% reduce 83%
19/01/31 10:11:55 INFO mapreduce.Job:  map 100% reduce 86%
19/01/31 10:12:01 INFO mapreduce.Job:  map 100% reduce 89%
19/01/31 10:12:07 INFO mapreduce.Job:  map 100% reduce 93%
19/01/31 10:12:13 INFO mapreduce.Job:  map 100% reduce 97%
19/01/31 10:12:18 INFO mapreduce.Job:  map 100% reduce 100%
19/01/31 10:12:19 INFO mapreduce.Job: Job job_1547812325179_0008 completed successfully
19/01/31 10:12:19 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=6635434217
		FILE: Number of bytes written=9269615741
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2551940756
		HDFS: Number of bytes written=134288980
		HDFS: Number of read operations=60
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=3
		Launched map tasks=22
		Launched reduce tasks=1
		Data-local map tasks=22
		Total time spent by all maps in occupied slots (ms)=1737403
		Total time spent by all reduces in occupied slots (ms)=178563
		Total time spent by all map tasks (ms)=1737403
		Total time spent by all reduce tasks (ms)=178563
		Total vcore-milliseconds taken by all map tasks=1737403
		Total vcore-milliseconds taken by all reduce tasks=178563
		Total megabyte-milliseconds taken by all map tasks=1779100672
		Total megabyte-milliseconds taken by all reduce tasks=182848512
	Map-Reduce Framework
		Map input records=49458230
		Map output records=49458230
		Map output bytes=2531297616
		Map output materialized bytes=2630214190
		Input split bytes=2052
		Combine input records=0
		Combine output records=0
		Reduce input groups=5453085
		Reduce shuffle bytes=2630214190
		Reduce input records=49458230
		Reduce output records=5453085
		Spilled Records=174185483
		Shuffled Maps =19
		Failed Shuffles=0
		Merged Map outputs=19
		GC time elapsed (ms)=9585
		CPU time spent (ms)=389790
		Physical memory (bytes) snapshot=5763260416
		Virtual memory (bytes) snapshot=39333715968
		Total committed heap usage (bytes)=4077912064
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=2551938704
	File Output Format Counters 
		Bytes Written=134288980

  

