1. Architecture diagram

2. Installing Prometheus
2.1 Installation options
- Binary install # generally used for physical machines
- Container install
- helm install # this and the two options below are for Kubernetes (a helm sketch follows this list)
- prometheus operator
- kube-prometheus stack # a full project stack bundling the prometheus operator, a highly available prometheus, a highly available alertmanager, node exporter for host monitoring, grafana, and more
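For reference, a hedged sketch of the helm route (repo URL and chart name as published by the prometheus-community project; the release name "monitoring" is arbitrary):

# Add the community chart repo and install the whole stack in one step
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

The rest of this article uses the manifest-based kube-prometheus route instead.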
2.2 Install with kube-prometheus stack. The compatibility table in the project README shows which Kubernetes versions each release supports; if your Kubernetes version is recent, the latest release will usually work.
https://github.com/prometheus-operator/kube-prometheus/

2.3 Download the matching release
git clone -b release-0.7 https://github.com/prometheus-operator/kube-prometheus.git
2.4 Install the CRDs (custom resource definitions)
# cd kube-prometheus/manifests/
# kubectl create -f setup/
2.5 Check the operator's status
# kubectl get pod -n monitoring | grep operator
prometheus-operator-7649c7454f-pkbbl   2/2   Running   0   3m
2.6 Adjust the alertmanager replica count as needed; it defaults to 3 for high availability
# vim alertmanager-alertmanager.yaml
  replicas: 1
2.7 Adjust the prometheus replica count as needed; it defaults to 2
# vim prometheus-prometheus.yaml
  replicas: 1
2.8 Change the images; the defaults may not be directly pullable, and substitutes can be found on Docker Hub (a bulk-replace sketch follows the listing below)
# cat kube-state-metrics-deployment.yaml | grep image
        image: quay.io/coreos/kube-state-metrics:v1.9.7
        image: quay.io/brancz/kube-rbac-proxy:v0.8.0
        image: quay.io/brancz/kube-rbac-proxy:v0.8.0
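One hedged way to swap every quay.io reference in the manifests at once; <mirror-registry> is a placeholder for a registry you can actually pull from:

# Rewrite the registry prefix in all manifests under the current directory
grep -rl 'quay.io' . | xargs sed -i 's#quay.io#<mirror-registry>#g'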
2.9 Create the Prometheus cluster
# kubectl create -f .
2.10 Change the prometheus and grafana web Services to the NodePort access type. Because no PVC is configured, the data is not persistent; in production both components need persistent storage (a sketch follows the commands below).
# kubectl edit svc -n monitoring prometheus-k8s
  ports: ...
  type: NodePort    # add the type field at the bottom of the spec
# kubectl edit svc -n monitoring grafana
  type: NodePort
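As noted above, production should persist the TSDB. A minimal sketch of what that could look like on the Prometheus custom resource; the storage class name is a placeholder, while storage.volumeClaimTemplate is the CRD field for this:

# kubectl edit prometheus k8s -n monitoring
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: <your-storage-class>   # placeholder: use a class that exists in your cluster
        resources:
          requests:
            storage: 50Gi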
2.11 Once that is configured, the web UIs can be reached via any node IP plus the NodePort
# kubectl get svc -n monitoring | egrep "grafana|prometheus-k8s"
grafana          NodePort   10.107.73.70     <none>   3000:32351/TCP   1d
prometheus-k8s   NodePort   10.101.129.206   <none>   9090:32021/TCP   1d
3. What is a ServiceMonitor
Binary, container, and helm installs load their configuration from prometheus.yml.
prometheus operator and kube-prometheus stack discover and monitor targets through ServiceMonitors. A ServiceMonitor is a way of collecting data by way of a Service:
- prometheus-operator can automatically identify Services carrying certain labels via a ServiceMonitor and collect data from those Services.
- ServiceMonitors themselves are also discovered automatically by prometheus-operator.
# kubectl get servicemonitor -n monitoring node-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
# kubectl get svc -n monitoring -l app.kubernetes.io/name=node-exporter
NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
node-exporter   ClusterIP   None         <none>        9100/TCP   8d
# kubectl get ep -n monitoring node-exporter -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/version: v1.0.1
    service.kubernetes.io/headless: ""
  name: node-exporter
  namespace: monitoring
  selfLink: /api/v1/namespaces/monitoring/endpoints/node-exporter
subsets:
- addresses:
  - ip: 192.168.0.21
    nodeName: k8s-master
    targetRef:
      kind: Pod
      name: node-exporter-96jmq
      namespace: monitoring
      resourceVersion: "8821390"
      uid: ba368321-1c3f-483a-a747-1e1c7b709b65
  - ip: 192.168.0.25
    nodeName: k8s-node1
    targetRef:
      kind: Pod
      name: node-exporter-qqzl2
      namespace: monitoring
      resourceVersion: "8821365"
      uid: 5daf9ff7-c120-4fcc-8412-da243c1224ce
  ports:
  - name: https
    port: 9100
    protocol: TCP
Configuration walkthrough
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    app: etcd-k8s
spec:
  jobLabel: etcd-k8s
  endpoints:
  - interval: 30s
    port: etcd-port    # metrics port, i.e. Service.spec.ports.name
    scheme: https      # protocol of the metrics endpoint, http or https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-ssl/etcd-ca.pem    # certificate paths (inside the prometheus pod)
      certFile: /etc/prometheus/secrets/etcd-ssl/etcd.pem
      keyFile: /etc/prometheus/secrets/etcd-ssl/etcd-key.pem
      insecureSkipVerify: true    # disable certificate verification
  selector:
    matchLabels:
      app: etcd-k8s    # label on the target Service
  namespaceSelector:
    matchNames:
    - kube-system      # namespace of the target Service
This matches Services in the kube-system namespace carrying the label app=etcd-k8s; jobLabel names the Service label whose value becomes the job name. Because the serverName may not match the certificate etcd issued, insecureSkipVerify=true is added so the server certificate is not verified.
Prometheus monitoring flow: the Operator watches ServiceMonitor objects; each ServiceMonitor selects Services by label, and Prometheus scrapes the Pods behind those Services through their Endpoints.

4. Monitoring a cloud native application: etcd
4.1 Test etcd's metrics endpoint locally; this cluster was installed with kubeadm
# grep -E "key-file|cert-file" /etc/kubernetes/manifests/etcd.yaml
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
# curl -s --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key https://192.168.0.21:2379/metrics -k | tail -3
promhttp_metric_handler_requests_total{code="200"} 2
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
4.2 Create a Service and Endpoints for etcd
# cat etcd.yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app: etcd-k8s
  name: etcd-k8s
  namespace: kube-system
subsets:
- addresses:       # host IPs of the etcd nodes, one entry per node
  - ip: 192.168.0.21
  ports:
  - name: etcd-port
    port: 2379     # etcd port
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: etcd-k8s
  name: etcd-k8s
  namespace: kube-system
spec:
  ports:
  - name: etcd-port
    port: 2379
    protocol: TCP
    targetPort: 2379
  type: ClusterIP
# kubectl create -f etcd.yaml
# kubectl get svc -n kube-system -l app=etcd-k8s    # find the Service IP, then rerun the earlier test against it
NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
etcd-k8s   ClusterIP   10.110.151.13   <none>        2379/TCP   74s
# curl -s --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key https://10.110.151.13:2379/metrics -k | tail -3
promhttp_metric_handler_requests_total{code="200"} 5
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
4.3 Put the etcd certificates into a Secret that prometheus will mount. Since it is prometheus that queries etcd, the Secret must live in the same namespace as prometheus.
kubectl create secret generic etcd-ssl \
  --from-file=/etc/kubernetes/pki/etcd/server.crt \
  --from-file=/etc/kubernetes/pki/etcd/server.key \
  --from-file=/etc/kubernetes/pki/etcd/ca.crt \
  -n monitoring
4.4 Mount the Secret into the prometheus pods
# kubectl edit prometheus k8s -n monitoring
  replicas: 2
  secrets:
  - etcd-ssl    # add the secret name; after saving, the prometheus pods restart
# kubectl get pod -n monitoring | grep prometheus-k8s
prometheus-k8s-0   2/2   Running   1   46s
prometheus-k8s-1   2/2   Running   1   54s
# kubectl exec -it prometheus-k8s-0 -n monitoring -- sh    # verify the mount succeeded
/prometheus $ ls /etc/prometheus/secrets/etcd-ssl/
ca.crt  server.crt  server.key
4.5 Create a ServiceMonitor to load the Service's scrape configuration into Prometheus
# cat etcd-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    app: etcd-k8s
spec:
  jobLabel: app
  endpoints:
  - interval: 30s
    port: etcd-port    # the port name shown by kubectl get svc -n kube-system etcd-k8s -o yaml
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-ssl/ca.crt
      certFile: /etc/prometheus/secrets/etcd-ssl/server.crt
      keyFile: /etc/prometheus/secrets/etcd-ssl/server.key
      insecureSkipVerify: true    # disable certificate verification
  selector:
    matchLabels:
      app: etcd-k8s    # must match the label on the Service
  namespaceSelector:
    matchNames:
    - kube-system      # must match the Service's namespace
# kubectl create -f etcd-servicemonitor.yaml
This matches Services in the kube-system namespace with the label app=etcd-k8s; jobLabel selects the Service label used for the job name. Because the serverName may not match the certificate etcd issued, insecureSkipVerify=true is added so the server certificate is not verified.
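To confirm the target is actually being scraped, two hedged checks to run in the Prometheus UI (the job name etcd-k8s follows from jobLabel: app plus the Service's app label; etcd_server_has_leader is one of etcd's standard metrics):

# Run in the Prometheus UI (Graph tab); each etcd member should return a series
up{job="etcd-k8s"}
etcd_server_has_leader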
4.6 Check in the web UI

4.7 Import a grafana dashboard
https://grafana.com/grafana/dashboards/3070

5. Monitoring a non-cloud-native application with an exporter
Our MySQL instance is not deployed inside Kubernetes; here we use prometheus to monitor MySQL outside the cluster.
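The DATA_SOURCE_NAME in the Deployment below assumes an exporter account already exists on the MySQL server. A hedged sketch of creating it, with the privileges the mysqld_exporter documentation calls for and the password taken from the DSN:

# Run on the MySQL host (192.168.0.247); user and password must match DATA_SOURCE_NAME below
mysql -uroot -p -e "CREATE USER 'exporter'@'%' IDENTIFIED BY 'childe12#'; GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%'; FLUSH PRIVILEGES;"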
5.1 Create a mysql-exporter Deployment to collect MySQL metrics
# cat mysql-exporter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-exporter-deployment
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql-exporter
  template:
    metadata:
      labels:
        app: mysql-exporter
    spec:
      containers:
      - name: mysql-exporter
        imagePullPolicy: IfNotPresent
        image: prom/mysqld-exporter
        env:
        - name: DATA_SOURCE_NAME
          value: "exporter:childe12#@(192.168.0.247:3306)/"
        ports:
        - containerPort: 9104
        resources:
          requests:
            cpu: 500m
            memory: 1024Mi
          limits:
            cpu: 1000m
            memory: 2048Mi
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-exporter
  namespace: monitoring
  labels:
    app: mysql-exporter
spec:
  type: ClusterIP
  selector:
    app: mysql-exporter
  ports:
  - name: mysql
    port: 9104
    targetPort: 9104
    protocol: TCP
# kubectl create -f mysql-exporter.yaml
# kubectl get svc -n monitoring | grep mysql-exporter
mysql-exporter   ClusterIP   10.102.205.21   <none>   9104/TCP   98m
# curl -s 10.102.205.21:9104/metrics | tail -1    # MySQL metrics reachable through the Service IP means the exporter works
promhttp_metric_handler_requests_total{code="503"} 0
5.2 Create the ServiceMonitor
# cat mysql-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql-exporter
  namespace: monitoring
  labels:
    app: mysql-exporter
spec:
  jobLabel: mysql-monitoring
  endpoints:
  - interval: 30s
    port: mysql    # the port name in the Service
    scheme: http
  selector:
    matchLabels:
      app: mysql-exporter    # must match the label on the Service
  namespaceSelector:
    matchNames:
    - monitoring    # must match the Service's namespace
# kubectl create -f mysql-servicemonitor.yaml
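Once the target appears, a quick hedged sanity check in the Prometheus UI (mysql_up is a standard mysqld_exporter metric):

# 1 means the exporter can reach MySQL; 0 usually means bad credentials or an unreachable host
mysql_up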
5.3 View the data in prometheus

5.4 Troubleshooting when monitoring fails (a command-line walk-through follows this list)
- Confirm the ServiceMonitor was created successfully
- Confirm the ServiceMonitor's labels match correctly
- Confirm the corresponding configuration was generated in Prometheus
- Confirm the ServiceMonitor actually matches a Service (mine didn't match the labels at first, which took a long time to track down)
- Confirm the /metrics endpoint is reachable through the Service
- Confirm the Service's port and scheme agree with those in the ServiceMonitor
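A hedged command-line version of that checklist, reusing the names from the MySQL example above:

# 1-2. Does the ServiceMonitor exist, and which labels/namespace does it select?
kubectl get servicemonitor -n monitoring mysql-exporter -o yaml
# 4. Does any Service actually carry those labels?
kubectl get svc -n monitoring -l app=mysql-exporter
# 5-6. Is /metrics reachable through the Service, on the named port and scheme?
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- curl -s http://mysql-exporter.monitoring:9104/metrics
# 3. Did the generated job reach Prometheus? Check Status -> Configuration in the web UI.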

6. Adding targets with a static configuration file
# Create an empty config file and a Secret from it first
touch prometheus-additional.yaml
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
# kubectl describe secret additional-configs -n monitoring
Name:         additional-configs
Namespace:    monitoring
Labels:       <none>
Annotations:  <none>
Type:  Opaque
Data
====
prometheus-additional.yaml:  0 bytes
# Point the Prometheus custom resource at the Secret
# kubectl edit prometheus -n monitoring k8s
spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-configs
    optional: true
# Edit the configuration file
# cat prometheus-additional.yaml
- job_name: 'node'
  static_configs:
  - targets: ['192.168.0.26:9100']
# Hot-reload by replacing the Secret
# kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml --dry-run=client -o yaml | kubectl replace -f - -n monitoring
# Verify the configuration
# kubectl get secret -n monitoring additional-configs -oyaml
apiVersion: v1
data:
  prometheus-additional.yaml: LSBqb2JfbmFtZTogJ25vZGUnCiAgc3RhdGljX2NvbmZpZ3M6CiAgLSB0YXJnZXRzOiBbJzE5Mi4xNjguMC4yNjo5MTAwJ10K
kind: Secret
# echo "LSBqb2JfbmFtZTogJ25vZGUnCiAgc3RhdGljX2NvbmZpZ3M6CiAgLSB0YXJnZXRzOiBbJzE5Mi4xNjguMC4yNjo5MTAwJ10K" | base64 -d
- job_name: 'node'
  static_configs:
  - targets: ['192.168.0.26:9100']
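After the hot reload, the new job should appear under Status -> Targets in the web UI. A hedged check against the HTTP API, using the prometheus-k8s NodePort found in section 2.11 (<node-ip> is a placeholder for any node's IP):

# Confirm the 'node' job shows up among the active targets
curl -s http://<node-ip>:32021/api/v1/targets | grep '"job":"node"'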
