1. Thanos Architecture in Detail
1.1 What is Thanos?
Thanos is one of the high-availability solutions for Prometheus. It integrates seamlessly with Prometheus and adds a number of advanced features, meeting the requirements of long-term storage, unlimited scalability, a global query view, and non-intrusiveness.
1.2 Thanos Architecture
The architecture diagram contains several of Thanos's core components, though not all of them. A brief introduction to the components shown:
- Thanos Sidecar: connects to Prometheus, exposes its data to Thanos Query for querying, and/or uploads it to object storage for long-term retention.
- Thanos Query: implements the Prometheus API and provides the global query view; it aggregates the data coming from the Store APIs and returns the result to the querying client (e.g. Grafana).
- Thanos Store Gateway: exposes the data held in object storage to Thanos Query.
- Thanos Ruler: evaluates monitoring data for alerting, and can also compute new metrics from it; the new data is exposed to Thanos Query and/or uploaded to object storage for long-term retention.
- Thanos Compact: compacts and downsamples the data in object storage, speeding up queries over large time ranges.
- Thanos Receiver: receives data from Prometheus's remote-write WAL, exposes it, and/or uploads it to cloud storage.
1.3 Architecture Walkthrough
Query and Sidecar
First of all, monitoring data can no longer be queried from Prometheus directly, because there are many Prometheus instances and each one only knows about the data it scrapes itself.
Thanos Query implements the Prometheus HTTP API and can "understand" PromQL. Clients that query monitoring data therefore no longer query Prometheus itself; they query Thanos Query instead. Thanos Query fans the query out to the downstream components that hold the data, then aggregates and deduplicates the results before returning them to the client. This is what makes querying a distributed Prometheus setup possible.
So how does Thanos Query reach the data scattered downstream? Thanos abstracts this into an internal gRPC interface called the Store API: other components expose their data to Thanos Query through it. This also lets Thanos Query itself be deployed completely stateless, giving it high availability and easy horizontal scaling.
Where can this scattered data come from?
First, Prometheus stores the data it scrapes on local disk. To use the data scattered across these disks directly, we can deploy a Sidecar next to each Prometheus instance. The Sidecar implements the Thanos Store API; when Thanos Query queries it, the Sidecar reads the monitoring data from the Prometheus instance it is paired with and returns it to Thanos Query.
Because Thanos Query can aggregate and deduplicate data, high availability becomes easy: deploy multiple replicas of the same Prometheus (each with its own Sidecar) and let Thanos Query fetch data from all the Sidecars. Even if one Prometheus instance is down for a while, aggregation and deduplication still yield a complete result.
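To make the deduplication concrete, here is a minimal sketch (the label names and values are illustrative, not taken from this setup) of how two replicas of the same Prometheus shard are labelled and how Thanos Query is told which label identifies a replica. Note that prometheus-operator, used later in this post, injects a prometheus_replica external label automatically, which is why the Query deployment in section 2.4 uses --query.replica-label=prometheus_replica.

# prometheus.yml on each replica of the same shard (illustrative values)
global:
  external_labels:
    cluster: devops-idc-cluster
    replica: replica-a        # the only label that differs between the two replicas
# Thanos Query is then started with --query.replica-label=replica, so series that
# differ only in the "replica" label are merged into a single deduplicated series.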
However, disk space is finite, so Prometheus can only hold a limited amount of data. It is usually configured with a retention time (15 days by default) or a maximum data size, and old data is continuously deleted to keep the disk from filling up. As a result we cannot look at older monitoring data, which sometimes makes troubleshooting and statistical analysis harder.
For data that needs to be kept long term but is accessed infrequently, the ideal place is object storage.
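The Sidecar (and the other Thanos components that touch the bucket) read the object storage configuration from a small YAML file passed via --objstore.config-file. A minimal S3-style sketch, with placeholder bucket name, endpoint and credentials:

# objstore.yml (placeholder values)
type: S3
config:
  bucket: "thanos-metrics"
  endpoint: "s3.example.com:9000"
  access_key: "<ACCESS_KEY>"
  secret_key: "<SECRET_KEY>"
  insecure: true              # set to false when the endpoint is served over TLS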
Store Gateway
How do we query the monitoring data that has been uploaded to object storage? In theory Thanos Query could read object storage directly, but that would make its logic much heavier. As we saw above, Thanos abstracts the Store API, and any component that implements it can serve as a data source for Thanos Query. Thanos Store Gateway implements the Store API and exposes the data in object storage to Thanos Query. Internally, Store Gateway also optimizes data retrieval: it caches the TSDB index and optimizes requests to object storage (fetching all the data it needs with as few requests as possible).
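As a rough sketch (the image path and directories are assumptions, not part of this deployment), a Store Gateway container is just the thanos binary run in store mode against the same bucket configuration the Sidecar uploads with:

containers:
- name: thanos-store
  image: harbor.zhrx.com/monitoring/thanos:v0.20.0   # or quay.io/thanos/thanos:v0.20.0
  args:
  - store
  - --data-dir=/var/thanos/store                     # local cache for index data
  - --objstore.config-file=/etc/thanos/objstore.yml
  - --grpc-address=0.0.0.0:10901                     # Store API consumed by Thanos Query
  - --http-address=0.0.0.0:10902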
This gives us long-term storage of monitoring data. Since object storage is effectively unlimited in capacity, we can in theory keep data for any length of time; historical monitoring data becomes queryable, which helps with troubleshooting and statistical analysis.
Ruler
There is one remaining issue: Prometheus does more than store and query scraped data; it can also be configured with rules (for example, the rule file sketched after this list):
- Recording rules continuously compute new metrics from existing ones and store them, so later queries can use the precomputed metrics directly, reducing query-time computation and speeding up queries.
- Alerting rules continuously evaluate whether an alert threshold has been reached and, when it has, notify Alertmanager to fire the alert.
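A minimal rule file illustrating both kinds of rules (the metric names and threshold are illustrative):

groups:
- name: example.rules
  rules:
  # recording rule: precompute per-instance CPU utilisation
  - record: instance:node_cpu_utilisation:rate5m
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  # alerting rule: fire when the precomputed metric stays above 90% for 10 minutes
  - alert: HighCpuUsage
    expr: instance:node_cpu_utilisation:rate5m > 0.9
    for: 10m
    labels:
      severity: warning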
Because we deploy Prometheus in a distributed fashion, no single instance holds the complete data set; related data may be spread across several instances, and a single Prometheus cannot see the global view. In that situation we cannot rely on Prometheus itself for this work.
This is where Thanos Ruler comes in. It queries Thanos Query to obtain the global data, computes new metrics from the configured rules and stores them, exposes that data to Thanos Query via the Store API, and can also upload it to object storage for long-term retention (data uploaded this way is likewise exposed to Thanos Query through Thanos Store Gateway).
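A sketch of a Thanos Ruler container wired up this way (the service names and paths are assumptions for illustration):

containers:
- name: thanos-rule
  image: harbor.zhrx.com/monitoring/thanos:v0.20.0
  args:
  - rule
  - --data-dir=/var/thanos/rule
  - --rule-file=/etc/thanos/rules/*.yaml                          # rule groups as sketched above
  - --query=thanos-query.monitoring.svc.cluster.local:9090        # evaluate against the global view
  - --alertmanagers.url=http://alertmanager-main.monitoring.svc.cluster.local:9093
  - --objstore.config-file=/etc/thanos/objstore.yml               # optional: upload results for long-term storage
  - --grpc-address=0.0.0.0:10901                                  # Store API consumed by Thanos Query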
It may look like Thanos Query and Thanos Ruler query each other, but there is no conflict: Thanos Ruler provides Thanos Query with the newly computed metrics, while Thanos Query provides Thanos Ruler with the global raw metrics it needs to compute them.
At this point Thanos's core capabilities are in place: a global query view, high availability, and long-term data retention, all while remaining fully compatible with Prometheus.
What else can we optimize?
Compact
With long-term storage we can now query monitoring data over large time ranges, but when the range is very large the amount of data queried is also very large, which makes queries very slow.
When looking at a large time range we usually do not need fine-grained data; a rough trend is enough. This is where Thanos Compact helps: it reads data from object storage, compacts and downsamples it, and uploads the result back to object storage. Queries over large time ranges can then read only the compacted, downsampled data, drastically reducing the amount of data read and speeding up the query.
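A sketch of a Compact container (the retention values are illustrative; note that only one Compact instance should run against a given bucket):

containers:
- name: thanos-compact
  image: harbor.zhrx.com/monitoring/thanos:v0.20.0
  args:
  - compact
  - --wait                                          # keep running and compact/downsample continuously
  - --data-dir=/var/thanos/compact
  - --objstore.config-file=/etc/thanos/objstore.yml
  - --retention.resolution-raw=30d                  # keep raw samples for 30 days
  - --retention.resolution-5m=180d                  # keep 5m-downsampled data for 180 days
  - --retention.resolution-1h=1y                    # keep 1h-downsampled data for 1 year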
1.4 Sidecar Mode and Receiver Mode
What does the Receiver do? Why is it needed, and how does it differ from the Sidecar?
Both can upload data to object storage for long-term retention; the difference lies in where the most recent data is stored.
Since uploads cannot happen in real time, in Sidecar mode the most recent data stays on the Prometheus host, and Query fetches it by calling the Store API of every Sidecar. This creates a problem: if there are very many Sidecars, or the Sidecars are far away from Query, every query has to call all of them, which consumes a lot of resources and is slow, while most of the time what we look at is precisely the most recent data.
Thanos Receiver was introduced to solve this. It implements Prometheus's remote write API, so every Prometheus instance can push its data to the Receiver in real time and the latest data is centralized. Thanos Query then no longer needs to query every Sidecar for recent data; it simply queries Thanos Receiver.
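On the Prometheus side, pointing at a Receiver is a single remote_write entry; the service name below is a placeholder, and 19291 is the Receiver's default remote-write port:

# prometheus.yml (placeholder service name)
remote_write:
- url: http://thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive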
In addition, Thanos Receiver uploads data to object storage for long-term retention, and that data is again exposed to Thanos Query through Thanos Store Gateway.
You might ask: at large scale, won't the Receiver come under heavy load and become a performance bottleneck? The designers of course considered this: the Receiver implements consistent hashing and supports clustered deployment, so even at large scale it does not become a bottleneck.
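The Receiver replicas learn about each other from a hashrings file (JSON) passed via --receive.hashrings-file; incoming series are distributed across the listed endpoints by consistent hashing. A sketch with placeholder endpoints:

[
  {
    "hashring": "default",
    "endpoints": [
      "thanos-receive-0.thanos-receive.monitoring.svc.cluster.local:10901",
      "thanos-receive-1.thanos-receive.monitoring.svc.cluster.local:10901",
      "thanos-receive-2.thanos-receive.monitoring.svc.cluster.local:10901"
    ]
  }
]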
2. Deploying Thanos
Thanos supports cloud-native deployment and makes full use of Kubernetes's scheduling and dynamic scaling capabilities. According to the official documentation, there are currently three ways to deploy Thanos on Kubernetes:
- prometheus-operator: once prometheus-operator is installed in the cluster, Thanos can be deployed by creating CRD objects;
- community-contributed Helm charts: there are many variants, all aiming to deploy Thanos with a single helm command;
- kube-thanos: the official open-source project, containing jsonnet templates and YAML examples for deploying Thanos on Kubernetes.
This post deploys Thanos via prometheus-operator.
2.1 Architecture and Environment
root@deploy:~# cat /etc/issue
Ubuntu 20.04.3 LTS \n \l

192.168.1.100  deploy         # node used to deploy and manage the k8s clusters
192.168.1.101  devops-master  # cluster version v1.18.9
192.168.1.102  devops-node1
192.168.1.103  devops-node2
192.168.1.110  test-master    # cluster version v1.18.9
192.168.1.111  test-node1
192.168.1.112  test-node2
192.168.1.200  nfs-server
For setting up the Kubernetes clusters, see: https://www.cnblogs.com/zhrx/p/15884118.html
2.2 Deploying the NFS Server
root@nfs-server:~# apt install nfs-server nfs-common -y
root@nfs-server:~# vim /etc/exports
# /etc/exports: the access control list for filesystems which may be exported
#               to NFS clients.  See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)
# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)
#
/data *(rw,sync,no_root_squash)
root@nfs-server:~# showmount -e
Export list for nfs-server:
/data *
root@nfs-server:~# systemctl start nfs-server.service
2.2.1 Creating the NFS Storage Class
Apply the following in both clusters.
rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-provisioner-runner
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get", "create", "list", "watch", "update"]
  - apiGroups: ["extensions"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["nfs-provisioner"]
    verbs: ["use"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccount: nfs-provisioner
      containers:
        - name: nfs-client-provisioner
          image: registry.cn-hangzhou.aliyuncs.com/open-ali/nfs-client-provisioner
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: zhrx/nfs
            - name: NFS_SERVER
              value: 192.168.1.200
            - name: NFS_PATH
              value: /data
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.1.200
            path: /data
class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zhrx-nfs-storage
provisioner: zhrx/nfs
reclaimPolicy: Retain
Create the storage class:
kubectl apply -f rbac.yaml
kubectl apply -f deployment.yaml
kubectl apply -f class.yaml
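Optionally, a throwaway PVC can confirm that dynamic provisioning works before moving on (the claim name and size below are arbitrary, not part of the original setup):

# test-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: zhrx-nfs-storage
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
# kubectl apply -f test-claim.yaml && kubectl get pvc test-claim   # should reach Bound; delete it afterwards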
2.3 Deploying Prometheus and the thanos-sidecar Container
Download prometheus-operator: https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.5.0.tar.gz
root@deploy:~/manifest/prometheus-operator# tar xf kube-prometheus-0.5.tar.gz
root@deploy:~/manifest/prometheus-operator# cd kube-prometheus-0.5.0/manifests
By default the manifests point at the upstream images. The best approach is to pull each image locally and push it to your own Harbor registry for later deployments; if your network is fine you can also deploy directly. Here I have already pulled the images, pushed them to my own Harbor registry, and changed the manifests to use my registry paths.
Deploy the CRD-related resources:
root@deploy:~/manifest/prometheus-operator# cd kube-prometheus-0.5.0/manifests/setup/
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests/setup# ls
0namespace-namespace.yaml                                        prometheus-operator-0prometheusruleCustomResourceDefinition.yaml   prometheus-operator-clusterRoleBinding.yaml
prometheus-operator-0alertmanagerCustomResourceDefinition.yaml   prometheus-operator-0servicemonitorCustomResourceDefinition.yaml   prometheus-operator-deployment.yaml
prometheus-operator-0podmonitorCustomResourceDefinition.yaml     prometheus-operator-0thanosrulerCustomResourceDefinition.yaml      prometheus-operator-service.yaml
prometheus-operator-0prometheusCustomResourceDefinition.yaml     prometheus-operator-clusterRole.yaml                                prometheus-operator-serviceAccount.yaml
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests/setup# k-devops apply -f .    # devops cluster
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests/setup# k-test apply -f .    # test cluster
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
Deploy the Prometheus-related pods
Edit prometheus-prometheus.yaml to add the thanos-sidecar container and a PVC template.
Note: when deploying to a different environment, change the externalLabels values.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  image: harbor.zhrx.com/monitoring/prometheus:v2.15.2
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  externalLabels:
    env: devops                    # change this label when deploying to a different environment
    cluster: devops-idc-cluster    # change this label when deploying to a different environment
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.15.2
  storage:                         # add a PVC template; the storage class points at NFS
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
        storageClassName: zhrx-nfs-storage
  thanos:                          # add the thanos-sidecar container
    baseImage: harbor.zhrx.com/monitoring/thanos
    version: v0.20.0
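For the test cluster, only the externalLabels block needs to differ. Hypothetical values (not taken from the original setup) could look like:

  externalLabels:
    env: test                    # hypothetical value for the test environment
    cluster: test-idc-cluster    # hypothetical value for the test environment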
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests# k-devops apply -f ./
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests# vim prometheus-prometheus.yaml
root@deploy:~/manifest/prometheus-operator/kube-prometheus-0.5.0/manifests# k-test apply -f ./
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
Verify
# verify the thanos-sidecar container
root@deploy:~# k-devops describe pod prometheus-k8s-0 -n monitoring
.............
  thanos-sidecar:
    Container ID:  docker://7c8b3442ba8f81a5e5828c02e8e4f08b80c416375aea3adab407e9c341ed9f1b
    Image:         harbor.zhrx.com/monitoring/thanos:v0.20.0
    Image ID:      docker-pullable://harbor.zhrx.com/monitoring/thanos@sha256:8bcb077ca3c7d14fe242457d15dd3d98860255c21a673930645891138167d196
    Ports:         10902/TCP, 10901/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      sidecar
      --prometheus.url=http://localhost:9090/
      --tsdb.path=/prometheus
      --grpc-address=[$(POD_IP)]:10901
      --http-address=[$(POD_IP)]:10902
    State:          Running
      Started:      Fri, 25 Mar 2022 15:42:09 +0800
    Ready:          True
    Restart Count:  0
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /prometheus from prometheus-k8s-db (rw,path="prometheus-db")
      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-9h89g (ro)
.............
Expose the thanos-sidecar port
root@deploy:~/manifest/prometheus-operator# vim thanos-sidecar-nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s-nodeport
  namespace: monitoring
spec:
  ports:
  - port: 10901
    targetPort: 10901
    nodePort: 30901
  selector:
    app: prometheus
    prometheus: k8s
  type: NodePort
root@deploy:~/manifest/prometheus-operator# k-devops apply -f thanos-sidecar-nodeport.yaml
service/prometheus-k8s-nodeport created
root@deploy:~/manifest/prometheus-operator# k-test apply -f thanos-sidecar-nodeport.yaml
service/prometheus-k8s-nodeport created
root@deploy:~/manifest/prometheus-operator#
root@deploy:~/manifest/prometheus-operator# k-devops get svc -n monitoring | grep prometheus-k8s-nodeport
prometheus-k8s-nodeport   NodePort   10.68.17.73   <none>   10901:30901/TCP   25s
2.4 Deploying the thanos-query Component
Here I deployed the thanos-query component into the devops cluster.
root@deploy:~/manifest/prometheus-operator# vim thanos-query.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
  labels:
    app: thanos-query
spec:
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
      - name: thanos
        image: harbor.zhrx.com/monitoring/thanos:v0.20.0
        args:
        - query
        - --log.level=debug
        - --query.replica-label=prometheus_replica   # the replica label configured by prometheus-operator is prometheus_replica
        # Discover local store APIs using DNS SRV.
        - --store=192.168.1.101:30901
        - --store=192.168.1.110:30901
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
          initialDelaySeconds: 10
        readinessProbe:
          httpGet:
            path: /-/healthy
            port: http
          initialDelaySeconds: 15
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-query
  namespace: monitoring
  labels:
    app: thanos-query
spec:
  ports:
  - port: 9090
    targetPort: http
    name: http
    nodePort: 30909
  selector:
    app: thanos-query
  type: NodePort
root@deploy:~/manifest/prometheus-operator# k-devops apply -f thanos-query.yaml
deployment.apps/thanos-query created
service/thanos-query created
root@deploy:~/manifest/prometheus-operator# k-devops get pod -n monitoring | grep query
thanos-query-f9bc76679-jp297   1/1   Running   0   34s
Access thanos-query at <node IP>:30909.
You can see that thanos-query has discovered the thanos-sidecar endpoints of both the devops and test clusters, so metrics from both clusters can now be queried.
Querying returns metric data from both clusters.
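To consume this global view from Grafana, the datasource only needs to point at the thanos-query service instead of an individual Prometheus, since thanos-query speaks the Prometheus HTTP API. A provisioning sketch (the datasource name and URL are assumptions, not part of the original setup):

# grafana datasource provisioning (sketch)
apiVersion: 1
datasources:
- name: Thanos
  type: prometheus             # thanos-query is queried exactly like a Prometheus server
  access: proxy
  url: http://thanos-query.monitoring.svc.cluster.local:9090
  isDefault: true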