Manually Deploying Prometheus on Kubernetes

Reposted article, 2019-07-03 15:18. Category: K8s / k8s component deployment

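Introduction

Prometheus is an open-source system monitoring and alerting toolkit originally built at SoundCloud. It is now a standalone open-source project, and in 2016 it joined the CNCF as the second hosted project after Kubernetes.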

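Features

Compared with traditional monitoring tools, Prometheus has the following main characteristics:

A multi-dimensional data model, with time series identified by a metric name and key/value pairs
A flexible query language
No dependence on distributed storage; each server relies only on its local disk
Time series are collected by pulling over HTTP
Pushing time series is also supported
Targets are found through service discovery or static configuration
Support for multiple kinds of graphs and dashboards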
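To make the data model more concrete, here is a minimal sketch of how a time series is identified by a metric name plus a set of labels. The function name series_id and the example metric are made up for illustration; this is not Prometheus code, only the usual name{key="value"} notation rendered in Python.

```python
# Illustrative sketch only: models a series identity as metric name + labels.

def series_id(name, labels):
    """Render a series identifier in the usual name{key="value",...} notation."""
    if not labels:
        return name
    # Sort label names so the same label set always yields the same identifier.
    pairs = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{pairs}}}"


if __name__ == "__main__":
    # http_requests_total and its labels are made-up example names.
    print(series_id("http_requests_total", {"method": "GET", "status": "200"}))
    print(series_id("up", {}))
```

Two samples belong to the same series only if both the metric name and every label match, which is why changing a single label value produces a new series.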

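Components

Prometheus consists of several components, many of which are optional:

Prometheus Server: scrapes metrics and stores the time series data
exporter: exposes metrics for scrape jobs to collect
pushgateway: a gateway that metrics can be pushed to
alertmanager: the component that handles alerts
adhoc: used for querying data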
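As a rough illustration of what an exporter does, the sketch below serves a couple of values on /metrics using only the Python standard library. The metric names, the METRICS dictionary and the serve helper are hypothetical; real exporters normally use an official client library and emit HELP/TYPE metadata as well.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory values; a real exporter reads them from the monitored system.
METRICS = {"demo_jobs_processed_total": 42, "demo_queue_length": 3}


def render_metrics(metrics):
    """Render one 'name value' line per metric (simplified text exposition format)."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given port (8000 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() and adding the address as a target in scrape_configs would let Prometheus pull these values on every scrape.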

Most Prometheus components are written in Go, so they are easy to build and deploy as static binaries.

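Architecture

The diagram below, from the official Prometheus documentation, shows the architecture and some of the related ecosystem components:

[Figure: Prometheus architecture]

The overall flow is fairly simple. Prometheus scrapes metrics directly from its targets, or indirectly through an intermediate Pushgateway that short-lived jobs push to. It stores all scraped data locally and runs rules over that data to produce aggregated series or alerts. Grafana or other tools can then be used to visualize the data.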

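Installation

Because Prometheus is written in Go, installing it is very simple: download the binary and run it. Go to https://prometheus.io/download and pick the release that matches your platform.

Prometheus is started from a YAML configuration file. When running the binary directly, we can start it with the following command: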

$ ./prometheus --config.file=prometheus.yml

A basic prometheus.yml looks like this:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

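This configuration file contains three sections: global, rule_files and scrape_configs.

The global section controls the global settings of the Prometheus Server:

scrape_interval: how often Prometheus scrapes metrics from its targets. It is set to 15s here, overriding the built-in default of 1m
evaluation_interval: how often rules are evaluated. Prometheus uses rules to create new time series or to raise alerts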
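The values 15s, 1m or 24h used throughout this article are duration strings. The sketch below converts a single-unit duration into seconds, which can help when reasoning about how often scrapes and rule evaluations happen. The helper parse_duration is written for this article and only handles one number followed by one unit; it is not how Prometheus itself parses durations.

```python
import re

# Seconds per unit for the duration suffixes used in Prometheus configuration.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
_DURATION_RE = re.compile(r"^(\d+)(ms|s|m|h|d|w|y)$")


def parse_duration(text):
    """Convert a single-unit duration such as '15s' or '24h' into seconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    interval = parse_duration("15s")
    # Number of scrapes per target per hour at this interval.
    print(3600 // interval)
```

With scrape_interval set to 15s, each target is scraped 240 times per hour, four times as often as with the 1m default.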

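The rule_files section specifies where rule files are located. Prometheus loads rules from these locations and uses them to generate new time series or alerts. We have not configured any rules yet.

The scrape_configs section controls which resources Prometheus monitors. Because Prometheus exposes its own monitoring data over HTTP, it can also monitor its own health. The default configuration contains a single job named prometheus, which collects the time series of the Prometheus server itself. This job has one statically configured target: port 9090 on localhost. By default Prometheus scrapes the /metrics path of each target, so this job collects metrics from http://localhost:9090/metrics. The resulting time series describe the state and performance of the Prometheus server itself. Any other resources we want to monitor can be added under this section.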
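To see what a scrape of /metrics returns, the following sketch fetches the endpoint and parses the plain-text response into tuples. It is a deliberately simplified parser written for this article: it ignores optional timestamps and does not handle commas or escaped quotes inside label values. The names parse_metrics and scrape are assumptions, not part of any Prometheus library.

```python
import urllib.request


def parse_metrics(text):
    """Parse simplified exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        series, _, value = line.rpartition(" ")
        name, labels = series, {}
        if "{" in series:
            name, _, label_part = series.partition("{")
            for pair in label_part.rstrip("}").split(","):
                if pair:
                    key, _, raw = pair.partition("=")
                    labels[key.strip()] = raw.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


def scrape(url="http://localhost:9090/metrics"):
    """Fetch a /metrics endpoint once and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Prometheus performs essentially this request against every configured target once per scrape_interval and attaches the job and instance labels to each sample.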

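Since we want to run Prometheus inside Kubernetes, we simply run it from the Docker image.

To keep things manageable, we install all resource objects in the kube-ops namespace. If the namespace does not exist yet, create it first.

To make the configuration file easy to manage, we store prometheus.yml in a ConfigMap: (prometheus-cm.yaml)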

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-ops
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

For now we only configure monitoring of Prometheus itself. Then we create the resource object:

$ kubectl create -f prometheus-cm.yaml
configmap "prometheus-config" created

The configuration is now in place. Later, when new resources need to be monitored, we only have to update this ConfigMap. Next we create the Prometheus Pod through a Deployment: (prometheus-deploy.yaml)

apiVersion: extensions/v1beta1  # removed in Kubernetes 1.16; newer clusters need apps/v1 plus spec.selector
kind: Deployment
metadata:
  name: prometheus
  namespace: kube-ops
  labels:
    app: prometheus
spec:
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: prom/prometheus:v2.4.3
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        - "--web.enable-admin-api"  # enables the admin HTTP API, which includes operations such as deleting time series
        - "--web.enable-lifecycle"  # enables hot reload: an HTTP POST to localhost:9090/-/reload takes effect immediately
        ports:
        - containerPort: 9090
          protocol: TCP
          name: http
        volumeMounts:
        - mountPath: "/prometheus"
          subPath: prometheus
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
      securityContext:
        runAsUser: 0
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus
      - configMap:
          name: prometheus-config
        name: config-volume

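When starting the program, besides pointing it at prometheus.yml, we use storage.tsdb.path to set where the TSDB data is stored and storage.tsdb.retention to set how long data is kept. The web.enable-admin-api flag enables access to the admin API. The web.enable-lifecycle flag is very important: it enables hot reloading. With it, once prometheus.yml has been updated, sending an HTTP POST request to localhost:9090/-/reload makes the new configuration take effect immediately, so be sure to include this flag.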
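The reload described above is an HTTP POST to the /-/reload endpoint, which only works when --web.enable-lifecycle is set. A minimal sketch using the Python standard library follows; the helper names build_reload_request and reload_config are made up for this example, and the default base URL assumes Prometheus is reachable on localhost:9090.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the HTTP POST request that asks Prometheus to reload its configuration."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body is enough; the endpoint only cares about the method and path.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url="http://localhost:9090"):
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=10) as response:
        return response.status
```

A plain GET against this endpoint is rejected, which is a common reason for a reload that appears to do nothing. If the new configuration is invalid, Prometheus answers with an error status and urlopen raises HTTPError.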

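We mount the ConfigMap that holds prometheus.yml into the Pod as a volume. When the ConfigMap is updated, the file inside the Pod is updated as well (after a short sync delay), and once we send the reload request described above, the new Prometheus configuration takes effect. In addition, to persist the time series data, we bind the data directory to a PVC, so that PVC has to be created beforehand: (prometheus-volume.yaml)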

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 10.151.30.57
    path: /data/k8s

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus
  namespace: kube-ops
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Here we simply use NFS as the storage backend and create the PV and PVC objects:

$ kubectl create -f prometheus-volume.yaml

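Beyond the points above, we also need to configure RBAC, because Prometheus has to query the Kubernetes API for information about the cluster. For that we create a ServiceAccount named prometheus and bind it to the permissions it needs: (prometheus-rbac.yaml)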

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-ops
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-ops

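The resources we need to read can exist in any namespace, so we use a ClusterRole rather than a namespaced Role. It is worth noting the nonResourceURLs attribute in the rules: it grants permission on non-resource endpoints such as /metrics, which is something we have rarely needed before. Then we create the resource objects: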
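To illustrate how the two kinds of rules differ, the sketch below mirrors the ClusterRole rules as Python dictionaries and checks a request against them. It is a heavily simplified model written for this article (no wildcards, no resourceNames, no aggregation); the function is_allowed is hypothetical and is not how the Kubernetes authorizer is implemented.

```python
# Rules mirrored from the ClusterRole in the text, as plain Python dicts.
RULES = [
    {"apiGroups": [""],
     "resources": ["nodes", "services", "endpoints", "pods", "nodes/proxy"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["configmaps", "nodes/metrics"], "verbs": ["get"]},
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]


def is_allowed(rules, verb, resource=None, api_group="", url=None):
    """Return True if any rule grants `verb` on the resource or non-resource URL."""
    for rule in rules:
        if verb not in rule["verbs"]:
            continue
        if url is not None:
            # Non-resource requests are matched only against nonResourceURLs.
            if url in rule.get("nonResourceURLs", []):
                return True
        elif api_group in rule.get("apiGroups", []) and resource in rule.get("resources", []):
            return True
    return False


if __name__ == "__main__":
    print(is_allowed(RULES, "list", resource="pods"))
    print(is_allowed(RULES, "list", resource="configmaps"))
    print(is_allowed(RULES, "get", url="/metrics"))
```

Under this model, listing pods and fetching /metrics are permitted, while listing configmaps is not, because the second rule grants only get.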

$ kubectl create -f prometheus-rbac.yaml
serviceaccount "prometheus" created
clusterrole.rbac.authorization.k8s.io "prometheus" created
clusterrolebinding.rbac.authorization.k8s.io "prometheus" created

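One more thing to watch out for: the Deployment must include a securityContext with runAsUser set to 0. Current Prometheus images run as the nobody user, and without this setting that user cannot write to the mounted data directory, which leads to permission denied errors like the following: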

level=error ts=2018-10-22T14:34:58.632016274Z caller=main.go:617 err="opening storage failed: lock DB directory: open /data/lock: permission denied"

Now we can create the Prometheus Deployment:

$ kubectl create -f prometheus-deploy.yaml
deployment.extensions "prometheus" created
$ kubectl get pods -n kube-ops
NAME                          READY     STATUS    RESTARTS   AGE
prometheus-6dd775cbff-zb69l   1/1       Running   0          20m
$ kubectl logs -f prometheus-6dd775cbff-zb69l -n kube-ops
......
level=info ts=2018-10-22T14:44:40.535385503Z caller=main.go:523 msg="Server is ready to receive web requests."

Once the Pod is running, we also need a Service object so that the Prometheus web UI can be reached from outside the cluster: (prometheus-svc.yaml)

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-ops
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: http

To make testing easy we create a NodePort Service here. We could of course create an Ingress object instead and access the UI through a domain name:

$ kubectl create -f prometheus-svc.yaml
service "prometheus" created
$ kubectl get svc -n kube-ops
NAME         TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
prometheus   NodePort   10.111.118.104   <none>        9090:30987/TCP   24s

We can now reach the Prometheus web UI at http://<any-node-IP>:30987.

[Figure: Prometheus web UI]
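In the PORT(S) column of the kubectl output, the number before the colon is the Service port and the number after it is the port opened on every node. The small sketch below builds the web UI address from that value. The helper node_port_url and the node IP in the usage line are placeholders for illustration.

```python
def node_port_url(node_ip, ports_field):
    """Build the web UI URL from a PORT(S) value such as '9090:30987/TCP'."""
    mapping, _, _protocol = ports_field.partition("/")
    _service_port, _, node_port = mapping.partition(":")
    if not node_port.isdigit():
        raise ValueError(f"no node port in {ports_field!r}")
    return f"http://{node_ip}:{int(node_port)}"


if __name__ == "__main__":
    # 192.168.1.10 is a placeholder; substitute the IP of any node in the cluster.
    print(node_port_url("192.168.1.10", "9090:30987/TCP"))
```

Because no nodePort is fixed in the Service manifest, Kubernetes picks one from its NodePort range, so the value 30987 will differ between clusters.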

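For consistency, Prometheus stores and displays all data in UTC. That is why the dashboard shows a warning when first opened, and why we need to select our current time when running queries. We can then look at the targets currently being monitored:

[Figure: Prometheus targets]

We have not configured any alerting rules yet, so the Alerts menu is empty for now. After a short while we can open the Graph menu and look at the metrics scraped from Prometheus itself. The entries under "- insert metrics at cursor -" are the metrics collected so far:

[Figure: Prometheus metrics]

For example, select the scrape_duration_seconds metric and click Execute. If no data comes back, switch to the Graph tab, adjust the time to the current moment, and execute again. You should then see a chart similar to the one below:

[Figure: Prometheus graph]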

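Besides looking at the raw collected metrics directly, we can also use the powerful PromQL. PromQL is the ad hoc query language Prometheus provides for aggregating and presenting data: whatever you want to look up, find the matching function and apply it to your metrics.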
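The same PromQL expressions can be sent to the Prometheus HTTP API at /api/v1/query. The sketch below only builds the request URL; the helper name build_query_url and the base address are assumptions for illustration. Passing the evaluation time as a Unix timestamp avoids the UTC versus local time confusion mentioned earlier.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_query_url(base_url, expr, at=None):
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    params = {"query": expr}
    if at is not None:
        # A Unix timestamp is timezone independent, so no UTC conversion is needed.
        params["time"] = f"{at.timestamp():.3f}"
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


if __name__ == "__main__":
    # 10.0.0.1:30987 is a placeholder for <any-node-IP>:<NodePort>.
    moment = datetime(2018, 10, 22, 14, 44, 40, tzinfo=timezone.utc)
    print(build_query_url("http://10.0.0.1:30987", "scrape_duration_seconds", moment))
```

Fetching the resulting URL returns JSON whose data.result field holds one entry per matching series.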
