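Preface

I. The Big Data Full Stack

The previous two posts covered HDFS and MapReduce; this part looks at how the pieces around them relate to one another.

It also discusses whether k8s is worth learning.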

Ref: [Distributed ML] Yi WANG's talk

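II. Key Topics

Container Technology and Kubernetes

Goto: 30,000 Containers: Zhihu's Container Platform Practice on Kubernetes

Goto: How to Learn and Understand Kubernetes?

Goto: Choosing K8S Is Right, but Using It Badly Is Your Fault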

Yarn Resource Management

I. Key Concepts

ResourceManager
ApplicationMaster
NodeManager
Container
JobHistoryServer
Timeline Server

JobHistoryServer

Run the following command on every node to keep a record of MapReduce applications (that is, to log job information).

mr-jobhistory-daemon.sh start historyserver

Timeline Server

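Stores log-service data for third-party frameworks (such as Spark) and records information at a finer granularity, for example:

which queue a task ran in;

which user a task was configured to run as.

II. Startup Flow

Ref: Hands-On Hadoop Case Studies, Part 11: Running a Map Reduce Program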

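In a real production environment, MapReduce programs should be submitted to a Yarn cluster and run in distributed fashion; only then does MapReduce deliver the benefit of distributed parallel computation.

A MapReduce program is submitted to Yarn and executed as follows:

1. In the client code, set the resources the MapReduce program needs at run time: the Mapper class, the Reducer class, the path of the program Jar, the Job name, the split information of the Job's input data, and the parameters set in the Configuration. All of these are submitted together to the Job resource submission path on HDFS designated by Yarn;

2. The client asks the Resource Manager in Yarn for a resource container (Container) in which to run the MRAppMaster process from the Jar;

Assign an application id, check whether the output already exists, input --> splits (one split corresponds to one map task)

3. Yarn assigns the job of providing the Container to a Node Manager node that has idle resources; after accepting it, the Node Manager creates the resource container (the so-called Container);

"Resource description" of the container to allocate ---> an idle Node Manager node ---> start a container

4. The client sends the Node Manager that created the container the shell command that launches the MRAppMaster process, and MRAppMaster starts;

5. Once started, MRAppMaster reads the job configuration and program resources, requests N resource containers from the Resource Manager to launch a number of Map Task and Reduce Task processes, and monitors their running state;

6. When all Map Task and Reduce Task processes of the Job have finished, all processes of the Job are deregistered, and Yarn destroys the Containers and reclaims the computing resources.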
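The note under step 2 says that one split corresponds to one map task. A minimal sketch of that relationship follows; it is a simplified model, not Hadoop's actual splitting code. The class `SplitPlanner`, the method `planSplits`, and the 300 MB / 128 MB sizes are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of input splitting: one split per map task.
// Real Hadoop input formats apply additional rules that are not modelled here.
class SplitPlanner {

    /** A contiguous byte range of the input that one map task would process. */
    static final class Split {
        final long offset;
        final long length;

        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Cuts a file of fileSize bytes into consecutive splits of at most splitSize bytes. */
    static List<Split> planSplits(long fileSize, long splitSize) {
        if (fileSize < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and splitSize > 0");
        }
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += splitSize) {
            // The last split may be shorter than splitSize.
            splits.add(new Split(offset, Math.min(splitSize, fileSize - offset)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed sizes for illustration: a 300 MB input and 128 MB splits.
        List<Split> splits = planSplits(300 * mb, 128 * mb);
        System.out.println("map tasks to request: " + splits.size());
        for (Split s : splits) {
            System.out.println("offset=" + s.offset / mb + "MB length=" + s.length / mb + "MB");
        }
    }
}
```

With the assumed sizes, the sketch produces three splits (128 MB, 128 MB and 44 MB), so under this simplified model MRAppMaster would ask for three map task containers in step 5.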

III. Yarn Schedulers

FIFO Scheduler

Capacity Scheduler

Fair Scheduler

Create a new capacity-scheduler.xml and copy it to the other nodes as well.

<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>hdp,spark</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.hdp.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.spark.capacity</name>
    <value>50</value>
  </property>
</configuration>
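Each capacity value in this file is a percentage of the parent queue, so a leaf queue's share of the whole cluster is the product of the percentages along its path. The sketch below works that arithmetic out for the queues configured above. The class `QueueCapacity` and its methods are hypothetical helpers written for this illustration and are not part of any Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derives each queue's share of the cluster from the
// per-parent percentages in capacity-scheduler.xml.
class QueueCapacity {

    /** Relative capacities (percent of the parent queue), copied from the config. */
    static Map<String, Double> relativeCapacities() {
        Map<String, Double> relative = new LinkedHashMap<>();
        relative.put("root.prod", 40.0);
        relative.put("root.dev", 60.0);
        relative.put("root.dev.hdp", 50.0);
        relative.put("root.dev.spark", 50.0);
        return relative;
    }

    /** Multiplies the relative capacities along the path from root down to queuePath. */
    static double absoluteCapacity(String queuePath, Map<String, Double> relative) {
        String[] parts = queuePath.split("\\.");
        StringBuilder prefix = new StringBuilder(parts[0]); // starts at "root"
        double share = 100.0; // root owns the whole cluster
        for (int i = 1; i < parts.length; i++) {
            prefix.append('.').append(parts[i]);
            Double percent = relative.get(prefix.toString());
            if (percent == null) {
                throw new IllegalArgumentException("unknown queue: " + prefix);
            }
            share = share * percent / 100.0;
        }
        return share;
    }

    public static void main(String[] args) {
        Map<String, Double> relative = relativeCapacities();
        for (String queue : relative.keySet()) {
            System.out.println(queue + " -> " + absoluteCapacity(queue, relative) + "% of the cluster");
        }
    }
}
```

For this configuration prod is guaranteed 40% of the cluster, while hdp and spark each get 50% of dev's 60%, that is 30% of the cluster each. The maximum-capacity of 75 on dev is not part of this calculation; it only limits how far dev may grow beyond its guaranteed 60% when other queues are idle.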

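Add the following code to the MR program:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Submit the job to the "hdp" leaf queue defined in capacity-scheduler.xml
Configuration configuration = new Configuration();
configuration.set("mapreduce.job.queuename", "hdp");
Job job = Job.getInstance(configuration, WordCountMain.class.getSimpleName());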

The running MR job can then be viewed in the Cluster UI.

Kubernetes

Ref: Big Data: Google Replaces YARN with Kubernetes to Schedule Apache Spark

Ref: Running Spark on Kubernetes

Ref: Running Spark on YARN

The Kubernetes scheduler is currently experimental. In future versions, there may be behavioral changes around configuration, container images and entrypoints. - 2019/10/28

That being the case, we will leave it aside for now.

End.
