1. scrape_configs parameter overview
# Default global configuration
global:
  scrape_interval: 15s      # scrape every 15s; the default is 1m
  evaluation_interval: 15s  # evaluate rules every 15s; the default is 1m
  scrape_timeout: 10s       # scrape timeout; the default is 10s
  external_labels:          # labels attached when talking to external systems, e.g. remote storage or federation
    prometheus: monitoring/k8s           # e.g. the prometheus-operator configuration
    prometheus_replica: prometheus-k8s-1

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093  # Alertmanager service address, e.g. 127.0.0.1:9093
  alert_relabel_configs:  # rewrite alert labels before alerts are sent to Alertmanager
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop

# Once rule files are loaded, they are evaluated every evaluation_interval (15s here); multiple rule files are allowed
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"

# scrape_configs holds the scrape configuration and contains at least one job
scrape_configs:
# Prometheus self-monitoring; scraped series get the label job=<job_name>
- job_name: 'prometheus'
  # the default metrics path is /metrics, e.g. localhost:9090/metrics
  # the default scheme is http
  static_configs:
  - targets: ['localhost:9090']

# Remote write/read (optional), e.g. to ship monitoring data to InfluxDB; by default data stays local.
# Note: remote_write/remote_read take a list of endpoints with a url field, not a bare address.
remote_write:
- url: http://127.0.0.1:8090  # the endpoint path depends on the remote-storage adapter
remote_read:
- url: http://127.0.0.1:8090
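The interval fields above are Prometheus duration strings ("15s", "1m"). A minimal sketch of how such strings map to seconds (`duration_seconds` is a hypothetical helper for illustration; Prometheus itself accepts richer forms such as "1h30m"):

```python
import re

# Unit multipliers for simple Prometheus-style durations.
UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}

def duration_seconds(d):
    # Hypothetical parser: one integer plus one unit, e.g. "15s" or "1m".
    m = re.fullmatch(r"(\d+)(ms|s|m|h|d)", d)
    if not m:
        raise ValueError(f"bad duration: {d}")
    return int(m.group(1)) * UNITS[m.group(2)]

print(duration_seconds("15s"))  # scrape_interval above -> 15
print(duration_seconds("1m"))   # the default interval -> 60
```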
2. scrape_configs configuration examples
The most commonly edited part of the Prometheus configuration is scrape_configs, e.g. to add new scrape jobs or to change the address or frequency of existing ones. The simplest configuration is:

scrape_configs:
- job_name: prometheus
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090

A full job configuration looks like this (with the settings recommended by prometheus-operator):

# the job name appears as a label on the scraped metrics, e.g. job=node-exporter on node-exporter data
job_name: node-exporter
# scrape interval: 30s
scrape_interval: 30s
# scrape timeout: 10s
scrape_timeout: 10s
# path to scrape on the target
metrics_path: /metrics
# scheme: http or https
scheme: https
# optional URL parameters for the scrape
params:
  name: demo
# how to resolve conflicts between custom labels and labels already present in the scraped data;
# by default (false) the conflicting scraped label is renamed to exported_<name>
honor_labels: false
# credentials for targets that require authentication
basic_auth:
  username: admin
  password: admin
  password_file: /etc/pwd
# bearer token, or a file containing it (OAuth 2.0-style auth)
bearer_token: kferkhjktdgjwkgkrwg
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
# TLS settings, e.g. skipping verification or pointing at certificate files
tls_config:
  # insecure_skip_verify: true
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  server_name: kubernetes
  insecure_skip_verify: false
# proxy address
proxy_url: 127.9.9.0:9999
# service discovery blocks, one per mechanism:
azure_sd_configs:       # Azure
consul_sd_configs:      # Consul
dns_sd_configs:         # DNS
ec2_sd_configs:         # EC2
openstack_sd_configs:   # OpenStack
file_sd_configs:        # file-based
gce_sd_configs:         # GCE
marathon_sd_configs:    # Marathon
nerve_sd_configs:       # AirBnB Nerve
serverset_sd_configs:   # ZooKeeper Serverset
triton_sd_configs:      # Triton
# Kubernetes service discovery
kubernetes_sd_configs:
- role: endpoints
  namespaces:
    names:
    - monitoring
# static target configuration, e.g. to attach specific labels
static_configs:
- targets: ['localhost:9090', 'localhost:9191']
  labels:
    my: label
    your: label
# relabel_configs rewrite label values from the target's metadata before Prometheus scrapes it,
# e.g. rewriting the raw __meta_kubernetes_namespace into a concise namespace label
relabel_configs:
- source_labels: [__meta_kubernetes_namespace]
  separator: ;
  regex: (.*)
  target_label: namespace
  replacement: $1
  action: replace
- source_labels: [__meta_kubernetes_service_name]
  separator: ;
  regex: (.*)
  target_label: service
  replacement: $1
  action: replace
- source_labels: [__meta_kubernetes_pod_name]
  separator: ;
  regex: (.*)
  target_label: pod
  replacement: $1
  action: replace
- source_labels: [__meta_kubernetes_service_name]
  separator: ;
  regex: (.*)
  target_label: job
  replacement: ${1}
  action: replace
- separator: ;
  regex: (.*)
  target_label: endpoint
  replacement: web
  action: replace
# metric relabeling, e.g. to drop unneeded metrics
metric_relabel_configs:
- source_labels: [__name__]
  separator: ;
  regex: etcd_(debugging|disk|request|server).*
  replacement: $1
  action: drop
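The effect of the metric_relabel_configs drop rule above can be checked with an ordinary regular expression, since Prometheus anchors relabel regexes at both ends. A small sketch showing which metric names the pattern etcd_(debugging|disk|request|server).* matches (the sample metric list is made up for illustration):

```python
import re

# Prometheus anchors relabel regexes, i.e. the pattern must match the whole __name__.
pattern = re.compile(r"^etcd_(debugging|disk|request|server).*$")

metrics = [
    "etcd_debugging_store_reads_total",      # dropped
    "etcd_disk_wal_fsync_duration_seconds",  # dropped
    "etcd_server_has_leader",                # dropped
    "etcd_network_peer_sent_bytes_total",    # kept (etcd_network is not in the group)
    "node_cpu_seconds_total",                # kept
]

# action: drop removes matching series; only non-matching names survive.
kept = [m for m in metrics if not pattern.match(m)]
print(kept)  # ['etcd_network_peer_sent_bytes_total', 'node_cpu_seconds_total']
```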
3. Common use cases
1. Collecting information about every node in the cluster, grouped by zone or region.
When scraping node data with the Kubernetes role: node, the node's zone is available in the __meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_zone label, derived from the failure-domain.beta.kubernetes.io/zone label put on the node when the cluster was created (visible via kubectl describe node). A new label can then be defined via relabel_configs:

relabel_configs:
- source_labels: ["__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_zone"]
  regex: "(.*)"
  replacement: $1
  action: replace
  target_label: "zone"

Afterwards, node{zone="XX"} can be used to filter directly by zone.

2. Filtering metrics, or splitting monitoring by function (development, operations).
People in different roles (development, testing, operations) may only care about part of the monitoring data, and may each run their own Prometheus Server for the metrics they need. Unneeded data should be filtered out to avoid wasting resources, with a configuration like:

metric_relabel_configs:
- source_labels: [__name__]
  separator: ;
  regex: etcd_(debugging|disk|request|server).*
  replacement: $1
  action: drop

action: drop discards metrics matching the condition instead of ingesting them.

3. Building a Prometheus federation to manage monitoring instances per IDC (region).
With multiple regions, each containing many nodes or clusters, the standard federation setup can be used: each region runs its own Prometheus server scraping its own data, and a central server then scrapes all regions, displays everything in one place, and groups it by region. Configuration:

scrape_configs:
- job_name: 'federate'
  scrape_interval: 15s
  honor_labels: true
  metrics_path: '/federate'
  params:
    'match[]':
    - '{job="prometheus"}'
    - '{__name__=~"job:.*"}'
    - '{__name__=~"node.*"}'
  static_configs:
  - targets:
    - '192.168.77.11:9090'
    - '192.168.77.12:9090'
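The federation scrape above effectively issues an HTTP GET with repeated match[] parameters against each regional server's /federate endpoint. A sketch of the URL the central server would request, built with the standard library (the address is the first target from the example):

```python
from urllib.parse import urlencode

# The match[] selectors from the federation job's params section.
matchers = ['{job="prometheus"}', '{__name__=~"job:.*"}', '{__name__=~"node.*"}']

# urlencode with a list of pairs produces one match[] parameter per selector.
query = urlencode([("match[]", m) for m in matchers])
url = f"http://192.168.77.11:9090/federate?{query}"
print(url)
```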
4. Service discovery
[root@VM_0_14_centos prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 172.18.0.1:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "alert_rules/rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
      labels:
        idc: bj

  - job_name: 'harbor_server'
    file_sd_configs:   #### file-based discovery
    - files:
      - /opt/prometheus/file_sd_configs/harbor_monitor.json
      refresh_interval: 10s

  - job_name: 'container'
    static_configs:
    - targets: ['172.18.0.1:8080']

The target file referenced by file_sd_configs:

[root@VM_0_14_centos prometheus]# cat /opt/prometheus/file_sd_configs/harbor_monitor.json
[
  {
    "targets": ["172.19.0.14:9100","124.156.173.164:9100"]
  }
]
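Because Prometheus re-reads file_sd target files on every refresh_interval, files like harbor_monitor.json are usually generated by a script rather than edited by hand. A minimal sketch writing an equivalent file (targets taken from the example above; the output path under the temp directory is arbitrary):

```python
import json
import os
import tempfile

# file_sd format: a JSON array of target groups, each with "targets" and optional "labels".
groups = [{"targets": ["172.19.0.14:9100", "124.156.173.164:9100"]}]

path = os.path.join(tempfile.gettempdir(), "harbor_monitor.json")
with open(path, "w") as f:
    json.dump(groups, f, indent=2)  # Prometheus picks the file up on the next refresh

print(path)
```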