Spark on K8S Deployment Notes

time: 2020-1-3

This article is based on an Alibaba Cloud ACK managed Kubernetes cluster and is divided into the following parts:

  • Installing spark-operator on ACK
  • A Spark word count job reading from and writing to OSS
  • Installing the Spark History Server on ACK

Installing the Spark operator

Preparing the kubectl and Helm clients

  • Configure a kubectl client on a local or intranet machine.
  • Install Helm.

When using the CloudShell provided by Alibaba Cloud for these steps, files are not persisted by default and connections tend to time out, which can make the spark-operator installation fail; reinstalling then requires manually deleting the operator's various leftover resources.

Installing Helm:

mkdir -pv helm && cd helm
wget https://storage.googleapis.com/kubernetes-helm/helm-v2.9.1-linux-amd64.tar.gz
tar xf helm-v2.9.1-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin
rm -rf linux-amd64
 # Check the version; the server (Tiller) version is not shown because it has not been installed yet
helm version

Installing the spark operator

helm install incubator/sparkoperator \
--namespace spark-operator \
--set sparkJobNamespace=default \
--set operatorImageName=registry-vpc.us-east-1.aliyuncs.com/eci_open/spark-operator \
--set operatorVersion=v1beta2-1.0.1-2.4.4 \
--set enableWebhook=true \
--set ingressUrlFormat="\{\{\$appName\}\}.<ACK-test-domain>" \
--set enableBatchScheduler=true

Note:

  • operatorImageName: the region here must be changed to the region your Kubernetes cluster is in. The default Google-hosted image cannot be pulled here, so the image provided by Alibaba Cloud is used instead; the registry-vpc prefix means the image is downloaded from the registry over the internal network.
  • ingressUrlFormat: Alibaba Cloud Kubernetes clusters come with a test domain; replace it with your own.

Once installation finishes, we need to manually create a ServiceAccount so that subsequently submitted Spark jobs have permission to create the driver and executor pods, ConfigMaps, and other resources.

The following creates the default:spark ServiceAccount and binds the relevant permissions.
Create spark-rbac.yaml and run kubectl apply -f spark-rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: spark-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io

Spark word count reading and writing OSS

This breaks down into the following steps:

  • Prepare the JARs that OSS support depends on
  • Prepare a core-site.xml that supports the OSS file system
  • Build a Spark container image that can read and write OSS
  • Prepare the word count job

Preparing the OSS dependency JARs

Reference: https://help.aliyun.com/document_detail/146237.html?spm=a2c4g.11186623.2.16.4dce2e14IGuHEv
The following commands can be run directly to download the JARs that OSS support depends on:

wget -O hadoop-oss-hdp-2.6.1.0-129.tar.gz "http://gosspublic.alicdn.com/hadoop-spark/hadoop-oss-hdp-2.6.1.0-129.tar.gz?spm=a2c4g.11186623.2.11.54b56c18VGGAzb&file=hadoop-oss-hdp-2.6.1.0-129.tar.gz"

tar -xzvf hadoop-oss-hdp-2.6.1.0-129.tar.gz

hadoop-oss-hdp-2.6.1.0-129/
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ram-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-core-3.4.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ecs-4.2.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-sts-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-sdk-oss-3.4.1.jar
hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar

Preparing core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- OSS configuration -->
    <property>
        <name>fs.oss.impl</name>
        <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
    </property>
    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>{TEMPORARY_AK_ID}</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>{TEMPORARY_AK_SECRET}</value>
    </property>
    <property>
        <name>fs.oss.buffer.dir</name>
        <value>/tmp/oss</value>
    </property>
    <property>
        <name>fs.oss.connection.secure.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>fs.oss.connection.maximum</name>
        <value>2048</value>
    </property>
</configuration>
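If you maintain several such configurations, a file like the one above can be generated rather than hand-edited. A minimal sketch in Python (the `render_core_site` helper is hypothetical, not part of any Hadoop tooling; property names are the ones used above):

```python
import xml.etree.ElementTree as ET

def render_core_site(props: dict) -> str:
    """Render a Hadoop core-site.xml <configuration> body from a {name: value} dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

xml_body = render_core_site({
    "fs.oss.impl": "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem",
    "fs.oss.endpoint": "oss-cn-hangzhou-internal.aliyuncs.com",
})
```

The returned string can be written to conf/core-site.xml before building the image.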

Building an image that can read and write OSS

Download and extract the Spark distribution

wget http://apache.communilink.net/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop2.7.tgz
tar -xzvf spark-3.0.0-preview-bin-hadoop2.7.tgz

Building and publishing the image

Before building, you need a Docker registry; this can be Docker Hub or the remote registry service provided by Alibaba Cloud.
Here we use Alibaba Cloud's Container Registry service.

  1. Log in to the registry with Docker:

docker login --username=lanrish@1416336129779449 registry.us-east-1.aliyuncs.com

Note:

  • It is best to log in with sudo-free Docker; if you log in via sudo docker login, the current (non-root) user cannot build images afterwards.
  • registry.us-east-1.aliyuncs.com depends on the region you chose, and is reached over the public network by default. If the Kubernetes cluster and the registry service are created in the same region (i.e. share the same VPC), you can append -vpc after registry, i.e. registry-vpc.us-east-1.aliyuncs.com, so that the cluster loads container images quickly over the internal network.

  2. Build the Spark image: enter the extracted Spark directory: cd spark-3.0.0-preview-bin-hadoop2.7
  3. Copy the OSS dependency JARs into the jars directory.
  4. Put the OSS-enabled core-site.xml into the conf directory.
  5. Modify kubernetes/dockerfiles/spark/Dockerfile as follows. The key changes are the lines that create /opt/hadoop/conf, COPY conf/core-site.xml into it, and set ENV HADOOP_CONF_DIR, so that Spark automatically loads core-site.xml via the HADOOP_CONF_DIR environment variable. The reason for this roundabout approach instead of a ConfigMap is a bug in Spark 3.0; see: https://www.jianshu.com/p/d051aa95b241
FROM openjdk:8-jdk-slim

ARG spark_uid=185

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/examples && \
    mkdir -p /opt/spark/work-dir && \
    mkdir -p /opt/hadoop/conf && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/*

COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data
COPY conf/core-site.xml /opt/hadoop/conf
ENV SPARK_HOME /opt/spark
ENV HADOOP_HOME /opt/hadoop
ENV HADOOP_CONF_DIR /opt/hadoop/conf
WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
  6. Build and publish the image:

# Build the image
./bin/docker-image-tool.sh -r registry.us-east-1.aliyuncs.com/engineplus -t 3.0.0 build
# Publish the image
docker push registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0

If additional dependencies need to be baked into the image, use the following approach instead:
from the current Spark directory spark-3.0.0-preview-bin-hadoop2.7, build a custom image via the Dockerfile:

docker build -t registry.us-east-1.aliyuncs.com/spark:3.0.0 -f kubernetes/dockerfiles/spark/Dockerfile .

The custom dependencies can be defined in kubernetes/dockerfiles/spark/Dockerfile.

Preparing the word count job

The word count job can be cloned from: https://github.com/i-mine/spark_k8s_wordcount
After downloading, simply run mvn clean package
to get the word count JAR: target/spark_k8s_wordcount-1.0-SNAPSHOT.jar
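For reference, the job itself is an ordinary word count. Its core logic, sketched here in plain Python rather than the Scala/Spark original (an illustration only; the actual job reads its input from and writes its output to OSS):

```python
from collections import Counter

def word_count(lines):
    """Split each line on whitespace and count occurrences of each word,
    mirroring the classic flatMap -> map -> reduceByKey pipeline."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)

result = word_count(["hello spark", "hello oss"])
```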

1. Submitting with spark-submit

Note: with this submission method, a local JAR can be uploaded, but the local submission environment must already have the Hadoop OSS configuration in place.

bin/spark-submit \
--master k8s://https://192.168.17.175:6443 \
--deploy-mode cluster \
--name com.mobvista.dataplatform.WordCount \
--class com.mobvista.dataplatform.WordCount \
--conf spark.kubernetes.file.upload.path=oss://mob-emr-test/lei.du/tmp \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss \
/home/hadoop/dulei/spark-3.0.0-preview2-bin-hadoop2.7/spark_k8s_wordcount-1.0-SNAPSHOT.jar

2. Submitting with the spark operator

Note: with this submission method, the JARs the job depends on must either already exist in the image or be fetched remotely; local JARs cannot be uploaded to the Spark job automatically, so they must be uploaded to OSS or S3 manually, and the Spark image must already contain the OSS or S3 access configuration and dependency JARs.
Write a spark operator word-count.yaml; this approach requires the JAR to be baked into the image in advance or uploaded to cloud storage.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: wordcount
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss"
  imagePullPolicy: IfNotPresent
  mainClass: com.mobvista.dataplatform.WordCount
  mainApplicationFile: "oss://mob-emr-test/lei.du/lib/spark_k8s_wordcount-1.0-SNAPSHOT.jar"
  sparkVersion: "3.0.0"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 2
    onFailureRetryInterval: 5
    onSubmissionFailureRetries: 2
    onSubmissionFailureRetryInterval: 10
  timeToLiveSeconds: 3600
  sparkConf:
    "spark.kubernetes.allocation.batch.size": "10"
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "oss://mob-emr-test/lei.du/tmp/logs"
  hadoopConfigMap: oss-hadoop-dir
  driver:
    cores: 1
    memory: "1024m"
    labels:
      version: 3.0.0
      spark-app: spark-wordcount
      role: driver
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "1024m"
    labels:
      version: 3.0.0
      role: executor
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
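When many similar jobs are submitted, a manifest like the one above can also be generated programmatically instead of hand-edited. A minimal sketch (the `spark_application` helper is hypothetical; the field names follow the v1beta2 spec above, and only a subset of fields is filled in):

```python
def spark_application(name, image, main_class, jar, spark_version="3.0.0"):
    """Build a minimal SparkApplication manifest as a plain dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            "image": image,
            "imagePullPolicy": "IfNotPresent",
            "mainClass": main_class,
            "mainApplicationFile": jar,
            "sparkVersion": spark_version,
            "driver": {"cores": 1, "memory": "1024m", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": 1, "memory": "1024m"},
        },
    }

manifest = spark_application(
    "wordcount",
    "registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss",
    "com.mobvista.dataplatform.WordCount",
    "oss://mob-emr-test/lei.du/lib/spark_k8s_wordcount-1.0-SNAPSHOT.jar",
)
```

The dict can then be serialized to YAML (or JSON, which kubectl also accepts) and piped to kubectl apply -f -.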

While the job is running, we can use the ingress URL to access the Web UI and check the job's status, but it can no longer be viewed once the job has finished:

$ kubectl describe sparkapplication
Name:         wordcount
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"annotations":{},"name":"wordcount","namespace":"defaul...
API Version:  sparkoperator.k8s.io/v1beta2
Kind:         SparkApplication
Metadata:
  Creation Timestamp:  2020-01-03T08:18:58Z
  Generation:          2
  Resource Version:    53192098
  Self Link:           /apis/sparkoperator.k8s.io/v1beta2/namespaces/default/sparkapplications/wordcount
  UID:                 b0b1ff99-2e01-11ea-bf95-7e8505108e63
Spec:
  Driver:
    Annotations:
      k8s.aliyun.com/eci-image-cache:  true
    Cores:  1
    Labels:
      Role:         driver
      Spark - App:  spark-wordcount
      Version:      3.0.0
    Memory:           1024m
    Service Account:  spark
  Executor:
    Annotations:
      k8s.aliyun.com/eci-image-cache:  true
    Cores:      1
    Instances:  1
    Labels:
      Role:     executor
      Version:  3.0.0
    Memory:  1024m
  Image:                  registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss-wordcount
  Image Pull Policy:      IfNotPresent
  Main Application File:  /opt/spark/jars/spark_k8s_wordcount-1.0-SNAPSHOT.jar
  Main Class:             WordCount
  Mode:                   cluster
  Restart Policy:
    On Failure Retries:                    2
    On Failure Retry Interval:             5
    On Submission Failure Retries:         2
    On Submission Failure Retry Interval:  10
    Type:                                  OnFailure
  Spark Conf:
    spark.kubernetes.allocation.batch.size:  10
  Spark Version:         3.0.0
  Time To Live Seconds:  3600
  Type:                  Scala
Status:
  Application State:
    Error Message:  driver pod failed with ExitCode: 1, Reason: Error
    State:          FAILED
  Driver Info:
    Pod Name:                wordcount-driver
    Web UI Address:          172.21.14.219:4040
    Web UI Ingress Address:  wordcount.cac1e2ca4865f4164b9ce6dd46c769d59.us-east-1.alicontainer.com
    Web UI Ingress Name:     wordcount-ui-ingress
    Web UI Port:             4040
    Web UI Service Name:     wordcount-ui-svc
  Execution Attempts:            3
  Last Submission Attempt Time:  2020-01-03T08:21:51Z
  Spark Application Id:          spark-4c66cd4e3e094571844bbc355a1b6a16
  Submission Attempts:           1
  Submission ID:                 e4ce0cb8-7719-4c6f-ade1-4c13e137de77
  Termination Time:              2020-01-03T08:22:01Z
Events:
  Type     Reason                               Age                    From            Message
  ----     ------                               ----                   ----            -------
  Normal   SparkApplicationAdded                7m20s                  spark-operator  SparkApplication wordcount was added, enqueuing it for submission
  Warning  SparkApplicationFailed               6m20s                  spark-operator  SparkApplication wordcount failed: driver pod failed with ExitCode: 101, Reason: Error
  Normal   SparkApplicationSpecUpdateProcessed  5m43s                  spark-operator  Successfully processed spec update for SparkApplication wordcount
  Warning  SparkDriverFailed                    4m47s (x5 over 7m10s)  spark-operator  Driver wordcount-driver failed
  Warning  SparkApplicationPendingRerun         4m32s (x5 over 7m2s)   spark-operator  SparkApplication wordcount is pending rerun
  Normal   SparkApplicationSubmitted            4m27s (x6 over 7m16s)  spark-operator  SparkApplication wordcount was submitted successfully
  Normal   SparkDriverRunning                   4m24s (x6 over 7m14s)  spark-operator  Driver wordcount-driver is running
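The same status can also be checked programmatically by parsing the JSON form of the resource (e.g. kubectl get sparkapplication wordcount -o json). A sketch, assuming the v1beta2 status field names shown above (the sample payload here is illustrative, not real cluster output):

```python
import json

def app_status(app: dict):
    """Extract the application state and Web UI ingress address
    from a SparkApplication object's status block."""
    status = app.get("status", {})
    state = status.get("applicationState", {}).get("state", "UNKNOWN")
    ingress = status.get("driverInfo", {}).get("webUIIngressAddress", "")
    return state, ingress

# Illustrative payload mirroring the describe output above
sample = json.loads("""{
  "status": {
    "applicationState": {"state": "FAILED"},
    "driverInfo": {"webUIIngressAddress": "wordcount.example.alicontainer.com"}
  }
}""")
state, ingress = app_status(sample)
```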

Installing the Spark History Server on K8S

Here we use the Spark History Server provided as a Helm chart.
GitHub: https://github.com/SnappyDataInc/spark-on-k8s/tree/master/charts/spark-hs?spm=5176.2020520152.0.0.2d5916ddP2xqfh
For convenience, install it directly from the Alibaba Cloud app marketplace.
App page: https://cs.console.aliyun.com/#/k8s/catalog/detail/incubator_ack-spark-history-server

Before creating it, fill in the OSS-related configuration, then create it.

After installation, look up the Kubernetes Service to get the Spark History Server's access address.

Once it is created, add two configuration entries when submitting jobs:

 "spark.eventLog.enabled": "true"
 "spark.eventLog.dir": "oss://mob-emr-test/lei.du/tmp/logs"

This way the submitted jobs' event logs are stored in OSS.


