Spark on K8S: Deployment Details
time: 2020-1-3
This article is based on an Alibaba Cloud ACK managed K8S cluster and is divided into the following parts:
- Installing spark-operator on ACK
- A Spark wordcount job that reads and writes OSS
- Installing the Spark history server on ACK
Installing the Spark operator
Prepare the kubectl and Helm clients
- Configure a kubectl client on a local or intranet machine.
- Install Helm.
When working from the CloudShell provided by Aliyun, files are not persisted by default and the connection easily times out, which makes the spark operator installation fail; reinstalling then requires manually deleting the operator's various leftover resources.
To install Helm:
mkdir -pv helm && cd helm
wget https://storage.googleapis.com/kubernetes-helm/helm-v2.9.1-linux-amd64.tar.gz
tar xf helm-v2.9.1-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin
rm -rf linux-amd64
# Check the version; the server version is not shown because the server side has not been installed yet
helm version
Install the spark operator
helm install incubator/sparkoperator \
--namespace spark-operator \
--set sparkJobNamespace=default \
--set operatorImageName=registry-vpc.us-east-1.aliyuncs.com/eci_open/spark-operator \
--set operatorVersion=v1beta2-1.0.1-2.4.4 \
--set enableWebhook=true \
--set ingressUrlFormat='{{$appName}}.<ACK test domain>' \
--set enableBatchScheduler=true
Note:
- operatorImageName: change the region here to the one your K8S cluster is in. The default Google-hosted image cannot be pulled, so we use the image provided by Aliyun; the registry-vpc prefix means the image is pulled from the registry over the internal network.
- ingressUrlFormat: Alibaba Cloud K8S clusters come with a test domain; replace it with your own.
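The ingressUrlFormat value is a Go-template-style pattern in which {{$appName}} is substituted with the SparkApplication's name to form the Web UI ingress host. A toy sketch of that substitution, purely for intuition (Python; the domain below is a made-up placeholder):

```python
def render_ingress_url(fmt: str, app_name: str) -> str:
    # The operator renders a Go template; a plain string replace
    # is enough to illustrate what the pattern produces.
    return fmt.replace("{{$appName}}", app_name)

# A job named "wordcount" gets a per-application ingress host:
print(render_ingress_url("{{$appName}}.c1234.us-east-1.alicontainer.com", "wordcount"))
# wordcount.c1234.us-east-1.alicontainer.com
```

This matches the per-job Web UI Ingress Address that shows up later in kubectl describe output.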
Once the installation is done, we need to manually create a service account so that the Spark jobs submitted later have permission to create the driver/executor pods, ConfigMaps, and other resources.
The following creates the default:spark service account and binds the required permissions.
Create spark-rbac.yaml and run kubectl apply -f spark-rbac.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
name: spark
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: default
name: spark-role
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["*"]
- apiGroups: [""]
resources: ["services"]
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: spark-role-binding
namespace: default
subjects:
- kind: ServiceAccount
name: spark
namespace: default
roleRef:
kind: Role
name: spark-role
apiGroup: rbac.authorization.k8s.io
Spark wordcount reading and writing OSS
This involves the following steps:
- Prepare the jar dependencies for OSS
- Prepare a core-site.xml that supports the OSS file system
- Build a Spark container image that can read and write OSS
- Prepare the wordcount job
Prepare the jar dependencies for OSS
Reference: https://help.aliyun.com/document_detail/146237.html?spm=a2c4g.11186623.2.16.4dce2e14IGuHEv
The following can be run directly to download the OSS dependency jars:
wget -O hadoop-oss-hdp-2.6.1.0-129.tar.gz "http://gosspublic.alicdn.com/hadoop-spark/hadoop-oss-hdp-2.6.1.0-129.tar.gz?spm=a2c4g.11186623.2.11.54b56c18VGGAzb&file=hadoop-oss-hdp-2.6.1.0-129.tar.gz"
tar -xvf hadoop-oss-hdp-2.6.1.0-129.tar.gz
hadoop-oss-hdp-2.6.1.0-129/
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ram-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-core-3.4.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ecs-4.2.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-sts-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-sdk-oss-3.4.1.jar
hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar
Prepare core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- OSS configuration -->
<property>
<name>fs.oss.impl</name>
<value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
</property>
<property>
<name>fs.oss.endpoint</name>
<value>oss-cn-hangzhou-internal.aliyuncs.com</value>
</property>
<property>
<name>fs.oss.accessKeyId</name>
<value>{temporary AccessKey ID}</value>
</property>
<property>
<name>fs.oss.accessKeySecret</name>
<value>{temporary AccessKey Secret}</value>
</property>
<property>
<name>fs.oss.buffer.dir</name>
<value>/tmp/oss</value>
</property>
<property>
<name>fs.oss.connection.secure.enabled</name>
<value>false</value>
</property>
<property>
<name>fs.oss.connection.maximum</name>
<value>2048</value>
</property>
</configuration>
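A mistake in core-site.xml only surfaces at job runtime, so it can be worth sanity-checking the file up front. A minimal sketch in Python (the helper names and the required-key list are this example's assumptions, derived from the properties above):

```python
import io
import xml.etree.ElementTree as ET

# Keys the OSS file system cannot work without (per the core-site.xml above).
REQUIRED_KEYS = ("fs.oss.impl", "fs.oss.endpoint",
                 "fs.oss.accessKeyId", "fs.oss.accessKeySecret")

def load_hadoop_conf(source):
    """Parse a Hadoop-style configuration file (path or file object)
    into a dict of property name -> value."""
    root = ET.parse(source).getroot()
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.iter("property")}

def missing_keys(conf):
    """Return the required OSS keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not conf.get(k)]

# Demo on an inline, deliberately incomplete config: the access keys are missing.
sample = io.StringIO("""<configuration>
  <property><name>fs.oss.impl</name>
    <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value></property>
  <property><name>fs.oss.endpoint</name>
    <value>oss-cn-hangzhou-internal.aliyuncs.com</value></property>
</configuration>""")
print(missing_keys(load_hadoop_conf(sample)))
# ['fs.oss.accessKeyId', 'fs.oss.accessKeySecret']
```

Running this against conf/core-site.xml before baking the image catches an empty AccessKey placeholder early.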
Build an image that can read and write OSS
Download and unpack the Spark distribution:
wget http://apache.communilink.net/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop2.7.tgz
tar -xzvf spark-3.0.0-preview-bin-hadoop2.7.tgz
Build and publish the image
Before building, you need a docker registry; it can be Docker Hub or the remote registry service provided by Aliyun. Here we use Aliyun's Container Registry.
- Log in to the registry with docker:
docker login --username=lanrish@1416336129779449 registry.us-east-1.aliyuncs.com
Note:
- Log in with docker configured for sudo-less use; if you log in with sudo docker login, the current user will not be able to build images afterwards.
- registry.us-east-1.aliyuncs.com depends on the region you chose and is accessed over the public network by default. If you create the K8S cluster and the registry in the same region (i.e. on the same VPC), you can append -vpc to registry, i.e. registry-vpc.us-east-1.aliyuncs.com, so that K8S pulls container images quickly over the internal network.
- Build the Spark image
Enter the unpacked Spark directory: cd spark-3.0.0-preview-bin-hadoop2.7
- Copy the OSS dependency jars into the jars directory.
- Put the OSS-enabled core-site.xml into the conf directory.
- Modify kubernetes/dockerfiles/spark/Dockerfile
The modified file is shown below; the key changes are creating /opt/hadoop/conf, the COPY conf/core-site.xml /opt/hadoop/conf line, and the HADOOP_HOME / HADOOP_CONF_DIR environment variables, which let Spark load core-site.xml automatically via HADOOP_CONF_DIR. We go to this trouble instead of using a ConfigMap because Spark 3.0 currently has a bug there; see: https://www.jianshu.com/p/d051aa95b241
FROM openjdk:8-jdk-slim

ARG spark_uid=185

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/examples && \
    mkdir -p /opt/spark/work-dir && \
    mkdir -p /opt/hadoop/conf && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/*

COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data
COPY conf/core-site.xml /opt/hadoop/conf

ENV SPARK_HOME /opt/spark
ENV HADOOP_HOME /opt/hadoop
ENV HADOOP_CONF_DIR /opt/hadoop/conf

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
- Build and push the image
# Build the image
./bin/docker-image-tool.sh -r registry.us-east-1.aliyuncs.com/engineplus -t 3.0.0 build
# Push the image
docker push registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0
If extra dependencies need to be installed in the image, use the following approach instead:
build a custom image via a Dockerfile from the current Spark directory spark-3.0.0-preview-bin-hadoop2.7:
docker build -t registry.us-east-1.aliyuncs.com/spark:3.0.0 -f kubernetes/dockerfiles/spark/Dockerfile .
Define your custom dependencies in kubernetes/dockerfiles/spark/Dockerfile.
Prepare the wordcount job
The wordcount job can be cloned from: https://github.com/i-mine/spark_k8s_wordcount
After downloading, just run mvn clean package
to get the wordcount jar: target/spark_k8s_wordcount-1.0-SNAPSHOT.jar
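The job itself is a standard Scala wordcount. For intuition, here is the same flatMap → reduceByKey logic expressed as plain Python (the function name and sample lines are illustrative, not taken from the repo):

```python
from collections import Counter

def word_count(lines):
    """Split each line on whitespace and count every word,
    mirroring the classic flatMap -> map -> reduceByKey pipeline."""
    counter = Counter()
    for line in lines:
        counter.update(line.split())
    return dict(counter)

if __name__ == "__main__":
    sample = ["hello spark", "hello k8s", "spark on k8s"]
    print(word_count(sample))
    # {'hello': 2, 'spark': 2, 'k8s': 2, 'on': 1}
```

In the Spark version the input lines come from OSS (via the core-site.xml configuration baked into the image) and the counts are written back to OSS.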
1. Submitting with spark-submit
Note: this submission method can upload a local jar, but the local submission environment must already have the Hadoop OSS configuration in place.
bin/spark-submit \
--master k8s://https://192.168.17.175:6443 \
--deploy-mode cluster \
--name com.mobvista.dataplatform.WordCount \
--class com.mobvista.dataplatform.WordCount \
--conf spark.kubernetes.file.upload.path=oss://mob-emr-test/lei.du/tmp \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss \
/home/hadoop/dulei/spark-3.0.0-preview2-bin-hadoop2.7/spark_k8s_wordcount-1.0-SNAPSHOT.jar
2. Submitting with the spark operator
Note: with this method, the jars a Spark job depends on must either already be in the image or be reachable remotely; local jars cannot be uploaded automatically, so you have to upload them to OSS or S3 yourself, and the Spark image must already contain the OSS/S3 access configuration and dependency jars.
Write the spark operator word-count.yaml; this requires the jar to be baked into the image beforehand or uploaded to cloud storage:
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: wordcount
namespace: default
spec:
type: Scala
mode: cluster
image: "registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss"
imagePullPolicy: IfNotPresent
mainClass: com.mobvista.dataplatform.WordCount
mainApplicationFile: "oss://mob-emr-test/lei.du/lib/spark_k8s_wordcount-1.0-SNAPSHOT.jar"
sparkVersion: "3.0.0"
restartPolicy:
type: OnFailure
onFailureRetries: 2
onFailureRetryInterval: 5
onSubmissionFailureRetries: 2
onSubmissionFailureRetryInterval: 10
timeToLiveSeconds: 3600
sparkConf:
"spark.kubernetes.allocation.batch.size": "10"
"spark.eventLog.enabled": "true"
"spark.eventLog.dir": "oss://mob-emr-test/lei.du/tmp/logs"
hadoopConfigMap: oss-hadoop-dir
driver:
cores: 1
memory: "1024m"
labels:
version: 3.0.0
spark-app: spark-wordcount
role: driver
annotations:
k8s.aliyun.com/eci-image-cache: "true"
serviceAccount: spark
executor:
cores: 1
instances: 1
memory: "1024m"
labels:
version: 3.0.0
role: executor
annotations:
k8s.aliyun.com/eci-image-cache: "true"
While the job is running we can fetch the ingress-url and use it to open the Web UI and check the job's status; once the job finishes, the UI is no longer accessible:
$ kubectl describe sparkapplication
Name:         wordcount
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"annotations":{},"name":"wordcount","namespace":"defaul...
API Version:  sparkoperator.k8s.io/v1beta2
Kind:         SparkApplication
Metadata:
  Creation Timestamp:  2020-01-03T08:18:58Z
  Generation:          2
  Resource Version:    53192098
  Self Link:           /apis/sparkoperator.k8s.io/v1beta2/namespaces/default/sparkapplications/wordcount
  UID:                 b0b1ff99-2e01-11ea-bf95-7e8505108e63
Spec:
  Driver:
    Annotations:
      k8s.aliyun.com/eci-image-cache:  true
    Cores:  1
    Labels:
      Role:         driver
      Spark - App:  spark-wordcount
      Version:      3.0.0
    Memory:           1024m
    Service Account:  spark
  Executor:
    Annotations:
      k8s.aliyun.com/eci-image-cache:  true
    Cores:      1
    Instances:  1
    Labels:
      Role:     executor
      Version:  3.0.0
    Memory:  1024m
  Image:                  registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss-wordcount
  Image Pull Policy:      IfNotPresent
  Main Application File:  /opt/spark/jars/spark_k8s_wordcount-1.0-SNAPSHOT.jar
  Main Class:             WordCount
  Mode:                   cluster
  Restart Policy:
    On Failure Retries:                   2
    On Failure Retry Interval:            5
    On Submission Failure Retries:        2
    On Submission Failure Retry Interval: 10
    Type:                                 OnFailure
  Spark Conf:
    spark.kubernetes.allocation.batch.size:  10
  Spark Version:         3.0.0
  Time To Live Seconds:  3600
  Type:                  Scala
Status:
  Application State:
    Error Message:  driver pod failed with ExitCode: 1, Reason: Error
    State:          FAILED
  Driver Info:
    Pod Name:                wordcount-driver
    Web UI Address:          172.21.14.219:4040
    Web UI Ingress Address:  wordcount.cac1e2ca4865f4164b9ce6dd46c769d59.us-east-1.alicontainer.com
    Web UI Ingress Name:     wordcount-ui-ingress
    Web UI Port:             4040
    Web UI Service Name:     wordcount-ui-svc
  Execution Attempts:            3
  Last Submission Attempt Time:  2020-01-03T08:21:51Z
  Spark Application Id:          spark-4c66cd4e3e094571844bbc355a1b6a16
  Submission Attempts:           1
  Submission ID:                 e4ce0cb8-7719-4c6f-ade1-4c13e137de77
  Termination Time:              2020-01-03T08:22:01Z
Events:
  Type     Reason                               Age                    From            Message
  ----     ------                               ----                   ----            -------
  Normal   SparkApplicationAdded                7m20s                  spark-operator  SparkApplication wordcount was added, enqueuing it for submission
  Warning  SparkApplicationFailed               6m20s                  spark-operator  SparkApplication wordcount failed: driver pod failed with ExitCode: 101, Reason: Error
  Normal   SparkApplicationSpecUpdateProcessed  5m43s                  spark-operator  Successfully processed spec update for SparkApplication wordcount
  Warning  SparkDriverFailed                    4m47s (x5 over 7m10s)  spark-operator  Driver wordcount-driver failed
  Warning  SparkApplicationPendingRerun         4m32s (x5 over 7m2s)   spark-operator  SparkApplication wordcount is pending rerun
  Normal   SparkApplicationSubmitted            4m27s (x6 over 7m16s)  spark-operator  SparkApplication wordcount was submitted successfully
  Normal   SparkDriverRunning                   4m24s (x6 over 7m14s)  spark-operator  Driver wordcount-driver is running
Installing the Spark History Server on K8S
Here we use the Spark History Server provided as a Helm chart.
GitHub: https://github.com/SnappyDataInc/spark-on-k8s/tree/master/charts/spark-hs?spm=5176.2020520152.0.0.2d5916ddP2xqfh
For convenience, install it directly from Aliyun's app marketplace:
App page: https://cs.console.aliyun.com/#/k8s/catalog/detail/incubator_ack-spark-history-server
Before creating it, fill in the OSS-related configuration, then create it:

After installation, inspect the corresponding K8S service to get the Spark history server's access address.

Once it is created, add two configuration entries when submitting jobs:
"spark.eventLog.enabled": "true"
"spark.eventLog.dir": "oss://mob-emr-test/lei.du/tmp/logs"
The submitted jobs' event logs will then be stored in OSS, where the history server can read them.
