Cloudera Manager 5.9 and CDH 5.9 Offline Installation Guide, Plus My Personal Pitfalls and Fixes


The CDH cluster at work was set up long ago, and I had always wanted to install one myself just to play with. I recently built a desktop, so my laptop got retired; together with the older ones, that made three. I bought CPUs and RAM online to upgrade the laptops and put together a little cluster of my own.

Honestly, I'd love to go scavenging and drag home an 8-core/16-thread box with 64 GB of RAM. (grin)

Configuration and role assignment of the 3 laptops (the two 8 GB laptops appear twice: the first pair of rows is the 5-node trial layout, the second pair is the final 3-node layout):

Host CPU              Host RAM   VMs   CPU per VM            Roles (RAM)

2 cores / 2 threads   4 GB       1     2 cores / 2 threads   nexus, yum, ntp, svn

2 cores / 4 threads   8 GB       2     2 cores / 4 threads   master (4 GB), node01 (2 GB)

2 cores / 4 threads   8 GB       3     2 cores / 4 threads   node02, node03, node04 (2 GB each)

2 cores / 4 threads   8 GB       1     2 cores / 4 threads   master (6 GB)

2 cores / 4 threads   8 GB       2     2 cores / 4 threads   node01, node02 (3 GB each)

All VM network adapters use bridged mode so that everything sits on the same subnet.

I had wanted to carve two VMs out of the dual-core/dual-thread laptop as well, but it is simply too old, so it serves the cluster as a general utility server instead.

During the trial install I originally wanted to set up 7 nodes with 2 GB each, but 2 GB for the master is nowhere near enough: it was unbearably slow, with only 20-odd MB of free memory left. So give the master at least 4 GB; for me that meant I could only run 5 nodes.

Even with the master at 4 GB, choosing to install every service is not enough: the final wizard has 10 steps, and step 8 fails with an error. The veterans online put this error down to insufficient memory, so if you only give the master 4 GB, install fewer services; with 6 GB I was able to install everything.

Also, according to hyj, MySQL goes on the master, and with less than 4 GB unknown errors may occur.

CDH feels much more memory-hungry than plain Apache Hadoop, which makes sense given the management and monitoring services it has to run.

 

I consulted a lot of tutorials during the install, mainly the one written by hyj: http://www.aboutyun.com/thread-9086-1-1.html. Thanks to hyj and the other folks for their write-ups.

Since I installed 5.9 while hyj installed 5.0, there are a few differences; this post is meant as a supplement and update to hyj's tutorial, with my own lessons learned (pitfalls and fixes) mixed in.

PS: make good use of your VM snapshots.

All commands in this article are run as root; if you are not using root, set up sudo.

 

0. Preparing the installation files

Cloudera Manager 5.9: http://archive-primary.cloudera.com/cm5/cm/5/cloudera-manager-el6-cm5.9.0_x86_64.tar.gz

CDH 5.9 parcel: http://archive-primary.cloudera.com/cdh5/parcels/5.9.0.23/CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel

CDH 5.9 sha file: http://archive-primary.cloudera.com/cdh5/parcels/5.9.0.23/CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel.sha1

manifest file: http://archive-primary.cloudera.com/cdh5/parcels/5.9.0.23/manifest.json

1. Preparing the virtual machines

The OS is a CentOS 6.8 Minimal install (which, as it turned out, is a pitfall in itself~). The routine setup, and the bits a minimal install doesn't ship with, such as IP addresses, hosts, passwordless SSH login, scp, sudo, disabling the firewall, yum, and NTP time sync, are not covered here (see hyj's article). They are the foundation for installing CDH, so please make sure they are all in place.
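
Just as a rough sketch of what that groundwork looks like (my own addition, not from hyj's article; the addresses follow the examples used later in this post, and the "ntpserver" hostname is made up, so adjust everything to your own environment):

# /etc/hosts on every node (example addresses)
cat >> /etc/hosts << 'EOF'
192.168.2.100 master
192.168.2.101 node01
192.168.2.102 node02
EOF

# firewall off (CentOS 6)
service iptables stop
chkconfig iptables off

# one-shot time sync against the utility server ("ntpserver" is a placeholder)
yum install -y ntp
ntpdate ntpserver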

I created the VMs as full clones, which leaves the clone's networking broken; the fix is as follows:

1. Change the hostname

vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=node02

2. Change the IP address

vi /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=static
IPADDR=192.168.2.102
PREFIX=24
GATEWAY=192.168.2.99

3. Remove the stale NIC rule

vi /etc/udev/rules.d/70-persistent-net.rules

You will see two PCI device entries; delete the eth0 line, then change "eth1" to "eth0" in the remaining eth1 line.

4. Reboot

reboot

 

2. Handy shell scripts

Unless stated otherwise, everything in this article is done on the master.

0. Node list file (named nodes; do not list master in it, or the scp script below will misbehave, see 2-2)

vi nodes

node01
node02
node03
node04

1. Batch passwordless-SSH setup (needs the nodes file in the same directory; the script relies on expect, so install it first: yum install -y expect)

#!/bin/bash
PASSWORD=hadoop

auto_ssh_copy_id() {
    expect -c "set timeout -1;
        spawn ssh-copy-id $1;
        expect {
            *(yes/no)* {send -- yes\r;exp_continue;}
            *assword:* {send -- $2\r;exp_continue;}
            eof        {exit 0;}
        }";
}

cat nodes | while read host
do
{
    auto_ssh_copy_id $host $PASSWORD
}&wait
done

2. Batch scp (needs the nodes file in the same directory); this is the scp.sh used in the commands below

The nodes file must not contain master, otherwise master would scp to itself; when copying a directory it would keep creating the directory inside itself and copying again until the path becomes too long to create, and only then move on to node01 and the rest.

#!/bin/bash
cat nodes | while read host
do
{
    scp -r $1 $host:$2
}&wait
done

3. Batch ssh (needs the nodes file in the same directory); this is the ssh.sh used in the commands below

#!/bin/bash
cat nodes | while read host
do
{
    ssh $host $1
}&wait
done
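
After saving the three scripts (the article never names the first one, so I'll call it ssh_copy_id.sh; the other two are the scp.sh and ssh.sh referenced below), remember to make them executable and give them a quick test:

chmod +x ssh_copy_id.sh scp.sh ssh.sh

# every node listed in the nodes file should answer with its hostname
./ssh.sh "hostname"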

 

3. Cloudera-recommended settings

During the trial install Cloudera raised a few warnings, as shown in the screenshot below:

Being a coder with a touch of OCD, I naturally had to eliminate even the yellow exclamation marks, so I set all of this up before installing CM/CDH.

1. Set the swappiness

vi /etc/sysctl.conf
Append at the end:
vm.swappiness=10

Copy /etc/sysctl.conf to every node with the scp batch script

./scp.sh /etc/sysctl.conf /etc/

Apply it with the ssh batch script

./ssh.sh "sysctl -p"

2. Disable transparent hugepage compaction

I tried setting only defrag, but a few nodes still seemed to warn, so I just set both.

vi /etc/rc.local
Append at the end (so it persists across reboots):
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Copy to every node

./scp.sh /etc/rc.local /etc/

Apply

reboot
or
./ssh.sh "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
./ssh.sh "echo never > /sys/kernel/mm/transparent_hugepage/defrag"

3. Disable SELinux

Cloudera Enterprise is supported on SELinux-enabled platforms, but since we are not paying for Enterprise, let's just turn it off.

Documentation: http://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#xd_583c10bfdbd326ba--5a52cca-1476e7473cd--7f8d

Original wording: Cloudera Enterprise is supported on platforms with Security-Enhanced Linux (SELinux) enabled. However, Cloudera does not support use of SELinux with Cloudera Navigator. Cloudera is not responsible for policy support nor policy enforcement. If you experience issues with SELinux, contact your OS provider.

vi /etc/selinux/config

SELINUX=disabled

Copy to every node

./scp.sh /etc/selinux/config /etc/selinux/

Reboot

reboot
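
After the reboot, getenforce should report Disabled everywhere (my addition):

getenforce
./ssh.sh "getenforce"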

 

4. Installing Java (the layout is personal preference)

1. Extract and create a symlink

tar -zxvf jdk-8u112-linux-x64.tar.gz -C /opt/program/
ln -s /opt/program/jdk1.8.0_112/ /opt/java

2. Set environment variables

vi /etc/profile
Append at the end:
export JAVA_HOME=/opt/java
export PATH=$JAVA_HOME/bin:$PATH

3. Copy to every node

./scp.sh /opt/program/jdk1.8.0_112/ /opt/program/jdk1.8.0_112/
./scp.sh /etc/profile /etc/

4. Apply

./ssh.sh "ln -s /opt/program/jdk1.8.0_112/ /opt/java"
./ssh.sh "source /etc/profile" (this one has no effect over ssh; run it manually on each node, or simply log in again so /etc/profile is sourced)

5. Set the system-wide variable (important; I skipped this during the trial install and CM failed because it could not find JAVA_HOME)

echo "JAVA_HOME=/opt/java" >> /etc/environment

 

5. Installing MySQL (for CM)

1. Install MySQL with yum

yum install -y mysql mysql-server mysql-devel 

2. Start it with the system

chkconfig mysqld on

3. Start MySQL

service mysqld start

4. Set the root password

mysql
USE mysql; 
UPDATE user SET Password=PASSWORD('hdp') WHERE user='root'; 
FLUSH PRIVILEGES; 
exit;

5. Allow remote login

mysql -u root -p
(enter the password: hdp)
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'hdp' WITH GRANT OPTION;

6. Create the databases CM's services need

Create them as needed when installing the cluster; see section 7, step 13.

-- hive database
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- oozie database
create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- hue database
create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

 

6. Installing CM

1. Extract into /opt; it cannot go anywhere else, because the CDH parcel source is expected at /opt/cloudera/parcel-repo by default, whereas CM itself can be installed wherever you like

tar -zxvf cloudera-manager-el6-cm5.9.0_x86_64.tar.gz -C /opt/
mv /opt/cm-5.9.0/ /opt/program/
ln -s /opt/program/cm-5.9.0/ /opt/cm

2. Move CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel and CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel.sha1 into /opt/cloudera/parcel-repo

  so that CM can find them directly during the install.

mv CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel.sha1 /opt/cloudera/parcel-repo/

3. Rename CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel.sha1 to CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel.sha (drop the trailing 1)

  This is critical. I missed it during the trial install, and version 5.9 never showed up when installing CDH.

  Older versions apparently also needed manifest.json, using the information inside it to generate the *.sha; newer versions let you download the sha directly, so manifest.json is no longer strictly required, though you can compare the two if you want to be thorough.

  The logs show that without manifest.json it tries to download one, and errors out if there is no internet access; that does not block the CDH install, but you might as well mv it into place.

mv manifest.json /opt/cloudera/parcel-repo/
cd /opt/cloudera/parcel-repo/
mv CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel.sha1 CDH-5.9.0-1.cdh5.9.0.p0.23-el6.parcel.sha

4. Set server_host in the agent config

vi /opt/cm/etc/cloudera-scm-agent/config.ini

server_host=master

5. Put the MySQL JDBC driver into CM's lib directory

JDBC driver download: http://dev.mysql.com/downloads/connector/j/

Whether you grab the gz or the zip makes no difference; what matters is the jar inside.

Extract mysql-connector-java-5.1.40-bin.jar and upload it to the cluster.

mv mysql-connector-java-5.1.40-bin.jar /opt/cm/share/cmf/lib/

6. Create CM's database

/opt/cm/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -phdp --scm-host localhost scm scm scm
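
If the script runs cleanly, it writes the connection settings into the server's db.properties; a quick look confirms the scm database and scm user were wired up (the path below assumes this tarball layout and the /opt/cm symlink):

cat /opt/cm/etc/cloudera-scm-server/db.properties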

7. Create the cloudera-scm user on every node

useradd --system --home=/opt/cm/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
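
The command above only covers the master; I would repeat it on the other nodes through the ssh.sh helper (my addition; the optional --comment field is dropped here to avoid quoting headaches with the simple wrapper):

./ssh.sh "useradd --system --home=/opt/cm/run/cloudera-scm-server --no-create-home --shell=/bin/false cloudera-scm"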

At this point you could in fact start the server on the master and install from there, but the other nodes have no CM and would end up downloading it via yum over the network. My setup uses an internal network and the other nodes cannot reach the internet, so instead I copy CM to every node for a fully offline install.

8. Copy CM to every node

./scp.sh /opt/program/cm-5.9.0/ /opt/program/cm-5.9.0/
./ssh.sh "ln -s /opt/program/cm-5.9.0/ /opt/cm"

Watch the screen scroll by and wait~

PS: I set up a cheap secondary router as a switch for the cluster, but over Wi-Fi. Watching the scp, throughput peaked at only about 2 Mb/s and mostly sat around 1.2 Mb/s, so the copy was painfully slow; that is probably down to the router plus the laptops' old wireless cards (the laptops are fairly old and so are their wireless chipsets). I did not try wired, but it should be much better, since 100 Mb Ethernet has been standard for ages. In other words, you cannot run big data over Wi-Fi; small data is fine though, and who needs a distributed cluster for small data anyway?

So if you want big-data processing to be fast, CPU, memory, network, and I/O are still the big-ticket hardware items. Since I cannot afford the big-ticket items (no money, basically), I will tinker with the small stuff instead, like refactoring and serialization, and try those out on this cluster next time.

The copy took almost an hour; snapshot the VMs as soon as it is done.

9. Start the CM server service on the master

/opt/cm/etc/init.d/cloudera-scm-server start

10. Start the CM agent service on every node you want as a worker

/opt/cm/etc/init.d/cloudera-scm-agent start

Then you can start installing CDH at http://master:7180/. Because the server has only just started, it takes a few minutes before the web page comes up. Unless your machine is fast, in which case pretend I said nothing 0.0

Once the page was reachable, the master had only about 300 MB of free memory left.
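
If the page refuses to come up, the server and agent logs are the first place I would look (paths assume this tarball layout), and the UI is ready once something is listening on 7180:

tail -f /opt/cm/log/cloudera-scm-server/cloudera-scm-server.log

# on a worker node
tail -f /opt/cm/log/cloudera-scm-agent/cloudera-scm-agent.log

# on the master
netstat -ntlp | grep 7180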

 

7. Installing CDH (most screenshots were taken with the 5-node setup; the last few are with 3 nodes. If an image is hard to read, right-click it and choose "open in a new tab")

1. Log in as admin (password admin)

2. Tick the box, then Continue

3. Choose an edition as needed; I chose the free one

4. Continue

5. Because we already started the agents on the nodes, simply click "Currently Managed Hosts". If only the master had CM and the nodes did not, you could search for new hosts here, e.g. 192.168.2.[100-104], but then the worker nodes would end up installing via yum over the network.

6. Tick everything, then Continue

7. Choose the version, then Continue

8. The installation starts; wait it out

9. Once it is done, Continue

10. All green ticks, perfect!

You may hit all sorts of problems here, but for each one Cloudera gives a corresponding suggestion, along with the occasional rather annoying yellow exclamation mark. For example:

It insisted my JDK versions were inconsistent even though they all came out of the same tar.gz. The details showed one JAVA_HOME as /opt/java and another as /opt/java/; one extra slash and it considers them different. I rolled back to a snapshot, redid the step, and the warning was gone. This kind of thing can safely be ignored.

So if something goes wrong here, read the hints carefully and check whether you missed or misconfigured anything.

11. Cluster setup: choose what you need

12. Role assignment: assign as you see fit

13. Create the MySQL databases and test the connections (create only what you need; for instance, if you did not pick oozie, there is no need for an oozie database)

-- hive database
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- oozie database
create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- hue database
create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

At this point hue cannot connect and reports: Unexpected error. Unable to verify database connection.

That is because of my minimal CentOS install; a package is missing: yum install -y python-lxml

This one cost me a long time; I finally found the fix in a reply under http://www.cnblogs.com/jasondan/p/4011153.html. Thanks to that poster.
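
Strictly speaking it is probably only needed on the host running the Hue server, but since every node can reach the internal yum repo, installing it everywhere is the lazy, safe option (my addition):

yum install -y python-lxml
./ssh.sh "yum install -y python-lxml"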

14. Cluster setup

15. An error: the logs say perl is not installed; install it on every node

yum install -y perl
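
Again, the worker nodes have no internet but can use the internal yum repo, so one pass with the helper covers them all (my addition):

./ssh.sh "yum install -y perl"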

16. Then came the part that puzzled me for a long time: as shown below, JAVA_HOME could not be found, but only for Spark; hdfs, yarn and the rest were fine, so it was clearly set up correctly.

I tried two things and the problem went away; I do not know which of them actually did it, or whether they depend on each other.

Attempt one:

cat /etc/environment 

Then click "Retry" in the web UI and see. If that does not work:

Attempt two:

find / -type f -name "*.sh" | xargs grep "as ALT_NAME"

That locates /opt/cm/lib64/cmf/service/client/deploy-cc.sh

Add these lines to it directly:

JAVA_HOME=/opt/java
export JAVA_HOME=/opt/java

scp it to every node

./scp.sh /opt/cm/lib64/cmf/service/client/deploy-cc.sh /opt/cm/lib64/cmf/service/client/

Run this once more

cat /etc/environment 

then click "Retry" again and it goes through

17. Continuing the install, starting hive fails; the logs show the MySQL JDBC driver is missing, so cp one over and continue

cp mysql-connector-java-5.1.40-bin.jar /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hive/lib/

18. Installing oozie fails too, JDBC again; cp another one over

cp mysql-connector-java-5.1.40-bin.jar /var/lib/oozie/

19. With the master at 4 GB, I hit an error at the oozie step (forgot to take a screenshot); the consensus online is that it is out of memory. The screenshot below is with the master at 6 GB.

20. Finally, it is installed 0.0

8. Testing: computing Pi with Spark

Because of permissions, switch to the hdfs user first; CDH already created it during the install.

su hdfs

spark-submit \
    --master yarn-client \
    --class org.apache.spark.examples.SparkPi \
    --driver-memory 512m \
    --executor-memory 512m \
    --executor-cores 2 \
    /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar \
    10
17/01/12 23:28:30 INFO spark.SparkContext: Running Spark version 1.6.0
17/01/12 23:28:33 INFO spark.SecurityManager: Changing view acls to: hdfs
17/01/12 23:28:33 INFO spark.SecurityManager: Changing modify acls to: hdfs
17/01/12 23:28:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
17/01/12 23:28:34 INFO util.Utils: Successfully started service 'sparkDriver' on port 38078.
17/01/12 23:28:36 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/01/12 23:28:37 INFO Remoting: Starting remoting
17/01/12 23:28:37 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.2.100:34306]
17/01/12 23:28:37 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.2.100:34306]
17/01/12 23:28:37 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 34306.
17/01/12 23:28:37 INFO spark.SparkEnv: Registering MapOutputTracker
17/01/12 23:28:37 INFO spark.SparkEnv: Registering BlockManagerMaster
17/01/12 23:28:37 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-d9897e9d-bdd0-424a-acdb-b636ba57cd04
17/01/12 23:28:37 INFO storage.MemoryStore: MemoryStore started with capacity 265.1 MB
17/01/12 23:28:38 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/01/12 23:28:39 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/01/12 23:28:39 INFO ui.SparkUI: Started SparkUI at http://192.168.2.100:4040
17/01/12 23:28:39 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark/examples/lib/spark-examples-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar at spark://192.168.2.100:38078/jars/spark-examples-1.6.0-cdh5.9.0-hadoop2.6.0-cdh5.9.0.jar with timestamp 1484234919364
17/01/12 23:28:40 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.2.100:8032
17/01/12 23:28:41 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers
17/01/12 23:28:42 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (1024 MB per container)
17/01/12 23:28:42 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
17/01/12 23:28:42 INFO yarn.Client: Setting up container launch context for our AM
17/01/12 23:28:42 INFO yarn.Client: Setting up the launch environment for our AM container
17/01/12 23:28:42 INFO yarn.Client: Preparing resources for our AM container
17/01/12 23:28:44 INFO yarn.Client: Uploading resource file:/tmp/spark-40750070-91a7-4a5b-ae27-1cfd733d0be8/__spark_conf__3134783970337565626.zip -> hdfs://master:8020/user/hdfs/.sparkStaging/application_1484232210824_0004/__spark_conf__3134783970337565626.zip
17/01/12 23:28:45 INFO spark.SecurityManager: Changing view acls to: hdfs
17/01/12 23:28:45 INFO spark.SecurityManager: Changing modify acls to: hdfs
17/01/12 23:28:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
17/01/12 23:28:45 INFO yarn.Client: Submitting application 4 to ResourceManager
17/01/12 23:28:46 INFO impl.YarnClientImpl: Submitted application application_1484232210824_0004
17/01/12 23:28:47 INFO yarn.Client: Application report for application_1484232210824_0004 (state: ACCEPTED)
17/01/12 23:28:47 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.users.hdfs
         start time: 1484234925930
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1484232210824_0004/
         user: hdfs
17/01/12 23:28:48 INFO yarn.Client: Application report for application_1484232210824_0004 (state: ACCEPTED)
17/01/12 23:28:49 INFO yarn.Client: Application report for application_1484232210824_0004 (state: ACCEPTED)
17/01/12 23:28:50 INFO yarn.Client: Application report for application_1484232210824_0004 (state: ACCEPTED)
17/01/12 23:28:51 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
17/01/12 23:28:51 INFO yarn.Client: Application report for application_1484232210824_0004 (state: ACCEPTED)
17/01/12 23:28:51 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> master, PROXY_URI_BASES -> http://master:8088/proxy/application_1484232210824_0004), /proxy/application_1484232210824_0004
17/01/12 23:28:51 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
17/01/12 23:28:52 INFO yarn.Client: Application report for application_1484232210824_0004 (state: RUNNING)
17/01/12 23:28:52 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.2.101
         ApplicationMaster RPC port: 0
         queue: root.users.hdfs
         start time: 1484234925930
         final status: UNDEFINED
         tracking URL: http://master:8088/proxy/application_1484232210824_0004/
         user: hdfs
17/01/12 23:28:52 INFO cluster.YarnClientSchedulerBackend: Application application_1484232210824_0004 has started running.
17/01/12 23:28:52 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42832.
17/01/12 23:28:52 INFO netty.NettyBlockTransferService: Server created on 42832
17/01/12 23:28:52 INFO storage.BlockManager: external shuffle service port = 7337
17/01/12 23:28:52 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/01/12 23:28:52 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.2.100:42832 with 265.1 MB RAM, BlockManagerId(driver, 192.168.2.100, 42832)
17/01/12 23:28:52 INFO storage.BlockManagerMaster: Registered BlockManager
17/01/12 23:28:53 INFO scheduler.EventLoggingListener: Logging events to hdfs://master:8020/user/spark/applicationHistory/application_1484232210824_0004
17/01/12 23:28:53 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/01/12 23:28:55 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
17/01/12 23:28:55 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 10 output partitions
17/01/12 23:28:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:36)
17/01/12 23:28:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/01/12 23:28:55 INFO scheduler.DAGScheduler: Missing parents: List()
17/01/12 23:28:55 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
17/01/12 23:28:57 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/01/12 23:28:57 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1904.0 B, free 1904.0 B)
17/01/12 23:28:58 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 3.0 KB)
17/01/12 23:28:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.100:42832 (size: 1202.0 B, free: 265.1 MB)
17/01/12 23:28:58 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/01/12 23:28:58 INFO spark.ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 3)
17/01/12 23:28:58 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
17/01/12 23:28:58 INFO cluster.YarnScheduler: Adding task set 0.0 with 10 tasks
17/01/12 23:28:59 INFO spark.ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 5)
17/01/12 23:29:09 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (node02:49339) with ID 1
17/01/12 23:29:09 INFO storage.BlockManagerMasterEndpoint: Registering block manager node02:37527 with 265.1 MB RAM, BlockManagerId(1, node02, 37527)
17/01/12 23:29:10 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
17/01/12 23:29:11 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, node02, executor 1, partition 0,PROCESS_LOCAL, 2071 bytes)
17/01/12 23:29:11 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, node02, executor 1, partition 1,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:16 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on node02:37527 (size: 1202.0 B, free: 265.1 MB)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, node02, executor 1, partition 2,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, node02, executor 1, partition 3,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, node02, executor 1, partition 4,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, node02, executor 1, partition 5,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, node02, executor 1, partition 6,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 6402 ms on node02 (executor 1) (1/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 250 ms on node02 (executor 1) (2/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 308 ms on node02 (executor 1) (3/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6855 ms on node02 (executor 1) (4/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, node02, executor 1, partition 7,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 266 ms on node02 (executor 1) (5/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 190 ms on node02 (executor 1) (6/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, node02, executor 1, partition 8,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 156 ms on node02 (executor 1) (7/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, node02, executor 1, partition 9,PROCESS_LOCAL, 2073 bytes)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 149 ms on node02 (executor 1) (8/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 131 ms on node02 (executor 1) (9/10)
17/01/12 23:29:17 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 128 ms on node02 (executor 1) (10/10)
17/01/12 23:29:17 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 19.406 s
17/01/12 23:29:17 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/01/12 23:29:17 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 22.241001 s
Pi is roughly 3.142676
17/01/12 23:29:18 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.2.100:4040
17/01/12 23:29:18 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
17/01/12 23:29:18 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
17/01/12 23:29:18 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
17/01/12 23:29:19 INFO cluster.YarnClientSchedulerBackend: Stopped
17/01/12 23:29:19 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/01/12 23:29:19 INFO storage.MemoryStore: MemoryStore cleared
17/01/12 23:29:19 INFO storage.BlockManager: BlockManager stopped
17/01/12 23:29:19 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/01/12 23:29:19 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/01/12 23:29:19 INFO spark.SparkContext: Successfully stopped SparkContext
17/01/12 23:29:20 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/01/12 23:29:20 INFO util.ShutdownHookManager: Shutdown hook called
17/01/12 23:29:20 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/01/12 23:29:20 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-40750070-91a7-4a5b-ae27-1cfd733d0be8

As shown above, the result comes out: Pi is roughly 3.142676

And that's a wrap.

