Spark on K8S - Operator




Version Requirements

The Spark project itself does not ship a Spark Operator; the one in common use is developed by Google.

This Operator uses the same Spark on K8S mechanism as official Spark, just wrapped in an extra layer, so a Spark job can be declared the same way as any other K8S application (for example, declaring a Service). Like other K8S applications, it also gets automatic restart, retry on failure, updates, mounting of config files, and similar features.

The downside is that you are constrained by the Operator's own version and implementation.

With the official Spark on K8S cluster mode, you must implement your own pod for submitting Spark jobs (serving a role similar to the operator's).

With the official Spark on K8S client mode, no extra pod or operator is needed, but every Spark job has to configure the ports its driver uses to communicate with its executors.
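A minimal client-mode sketch of that port configuration, assuming the driver runs in a pod whose IP the executors can reach (the master URL, image name, and port numbers are placeholders, not values from this article):

# client mode: the driver runs in the submitting process, so executors
# must be able to connect back to it on fixed, reachable ports
spark-submit \
    --master k8s://https://<api-server>:6443 \
    --deploy-mode client \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.driver.host=$(hostname -i) \
    --conf spark.driver.port=29413 \
    --conf spark.blockManager.port=29414 \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar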

For the Helm and K8S version requirements, see the official chart:

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/charts/spark-operator-chart

Prerequisites

Helm >= 3
Kubernetes >= 1.13

Operator and Spark Versions

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/#version-matrix

Once started, the Spark Operator is itself a pod. When kubectl apply -f test.yml is run, it reads the config file's contents and then invokes spark-submit to launch the Spark job. A given operator version is therefore tied to a specific Spark version; in principle an operator could bundle several Spark versions and let test.yml pick one, but the current implementation does not appear to do that.
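For the SparkApplication example later in this article, the command the operator ends up running looks roughly like the following (an illustrative sketch; the exact flags the operator generates may differ):

spark-submit \
    --master k8s://https://<api-server>:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.container.image=gcr.io/spark-operator/spark:v3.1.1 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar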

Start minikube

sudo minikube start --driver=none \
                    --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers \
                    --kubernetes-version="v1.16.3"

This provides a lightweight K8S environment.
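To confirm the cluster is up (standard minikube/kubectl checks, not from the original article):

minikube status
kubectl get nodes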

Install Helm

Download:

wget https://get.helm.sh/helm-v3.0.2-linux-amd64.tar.gz
tar zxvf helm-v3.0.2-linux-amd64.tar.gz
sudo cp ./linux-amd64/helm /usr/local/bin/

Check the helm command:

> helm version
version.BuildInfo{Version:"v3.0.2", GitCommit:"19e47ee3283ae98139d98460de796c1be1e3975f", GitTreeState:"clean", GoVersion:"go1.13.5"}

Add common repositories:

helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add aliyuncs https://apphub.aliyuncs.com

Search for a chart:

helm search repo tomcat

Output:

NAME            CHART VERSION   APP VERSION     DESCRIPTION
aliyuncs/tomcat 6.2.3           9.0.31          Chart for Apache Tomcat
bitnami/tomcat  9.5.3           10.0.12         Chart for Apache Tomcat

The Google-hosted repositories may not be accessible, which is why mirrors such as the Aliyun one above are useful.

Install spark-operator with Helm

Add the repository:

helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator

Install the spark operator:

helm install my-release spark-operator/spark-operator

Output:

NAME: my-release
LAST DEPLOYED: Fri Nov  5 11:53:04 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

To uninstall:

helm uninstall my-release

# there may be leftover resources to delete
kubectl delete serviceaccounts my-release-spark-operator
kubectl delete clusterrole my-release-spark-operator
kubectl delete clusterrolebindings my-release-spark-operator

After a successful start, you should see a spark operator pod running.

The spark operator image may fail to pull due to access restrictions, leaving the operator pod stuck in an error state:

        message: Back-off pulling image "gcr.io/spark-operator/spark-operator:latest"
        reason: ImagePullBackOff

You can fetch the image some other way and then change its tag with docker tag 20144a306214 gcr.io/spark-operator/spark-operator:latest.
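One possible workaround is pulling from a registry you can reach and then retagging (the mirror path below is hypothetical; substitute your own):

# pull the image from a reachable mirror (hypothetical path)
docker pull <your-mirror>/spark-operator/spark-operator:latest

# retag so K8S finds it under the name the chart expects
docker tag <your-mirror>/spark-operator/spark-operator:latest \
    gcr.io/spark-operator/spark-operator:latest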

When the spark operator is created through Helm, a Spark service account is created automatically for requesting and operating on pods. You can also create the service account yourself:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: spark
  name: spark-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: spark
subjects:
- kind: ServiceAccount
  name: spark
  namespace: spark
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io

When submitting a Spark application, this service account must be specified (the serviceAccount field under driver in the example below).

Submit a Spark Job

The config file looks like this:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

The image here must contain the Spark code to execute; local: refers to a file inside the image.

Although sparkVersion is configurable here, judging from the operator's code it does not appear to be used, so the operator presumably always invokes its single bundled version of spark-submit.

Launch the Spark job:

kubectl apply -f spark-test.yaml

If the launch succeeds, you should see the corresponding driver pod and executor pod running.
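Some ways to check on the run (sparkapplications is the CRD installed by the operator; by default the driver pod is named <app-name>-driver):

# list SparkApplication objects and their states
kubectl get sparkapplications

# detailed status and events for this application
kubectl describe sparkapplication spark-pi

# the pods themselves, and the driver log
kubectl get pods
kubectl logs spark-pi-driver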

Scheduling Mechanism

The spark operator supports cron-style scheduling; just change the kind to ScheduledSparkApplication:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled
  namespace: default
spec:
  schedule: "@every 5m"
  concurrencyPolicy: Allow
  template:
    type: Scala
    mode: cluster
    image: "gcr.io/spark-operator/spark:v3.1.1"
    imagePullPolicy: Always
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
    sparkVersion: "3.1.1"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      coreLimit: "1200m"
      memory: "512m"
      labels:
        version: 3.1.1
      serviceAccount: spark
    executor:
      cores: 1
      instances: 1
      memory: "512m"
      labels:
        version: 3.1.1

The schedule here can also be a standard cron expression such as

"*/10 * * * *"

The minimum granularity is one minute.
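To inspect the schedule and the runs it spawns (scheduledsparkapplications is the CRD used above; each run appears as an ordinary SparkApplication):

# the schedule object itself
kubectl get scheduledsparkapplications

# the SparkApplication objects generated by each run
kubectl get sparkapplications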

Metrics

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md#enable-metric-exporting-to-prometheus

The spark operator has a metrics endpoint that can expose a set of metrics (success count, failure count, running count, and so on) to Prometheus.
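A quick way to look at the endpoint by hand (the deployment name assumes the my-release Helm release from above; port 10254 and path /metrics are the defaults, as in the deployment further below):

# forward the operator's metrics port to localhost
kubectl port-forward deployment/my-release-spark-operator 10254:10254

# in another shell, fetch the Prometheus-format metrics
curl http://localhost:10254/metrics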

Helm enables metrics by default when installing the operator; to turn them off:

helm install my-release spark-operator/spark-operator --namespace spark-operator --set metrics.enable=false

You can modify the spark-operator deployment to change the metrics path, port, and so on:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparkoperator
  namespace: spark-operator
  labels:
    app.kubernetes.io/name: sparkoperator
    app.kubernetes.io/version: v1beta2-1.3.0-3.1.1
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: sparkoperator
      app.kubernetes.io/version: v1beta2-1.3.0-3.1.1
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "10254"
        prometheus.io/path: "/metrics"
      labels:
        app.kubernetes.io/name: sparkoperator
        app.kubernetes.io/version: v1beta2-1.3.0-3.1.1
    spec:
      serviceAccountName: sparkoperator
      containers:
      - name: sparkoperator
        image: gcr.io/spark-operator/spark-operator:v1beta2-1.3.0-3.1.1
        imagePullPolicy: Always
        ports:
          - containerPort: 10254
        args:
        - -logtostderr
        - -enable-metrics=true
        - -metrics-labels=app_type

You can also configure whether a Spark app exposes metrics:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1-gcs-prometheus"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  arguments:
    - "100000"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
  monitoring:
    exposeDriverMetrics: true
    exposeExecutorMetrics: true
    prometheus:
      jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
      port: 8090

If the spark operator restarts, these metrics are reset.



