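Prometheus officially supports several high-availability setups:
- HA: two Prometheus instances scrape exactly the same data, with a load balancer in front of them.
- HA + remote storage: in addition to the basic multi-replica Prometheus, data is also written to remote storage via Remote Write, which solves the problem of persistent storage.
- Federation: targets are partitioned by function, so different shards scrape different data and a Global node stores it centrally. This addresses the scale of the monitoring data.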
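In the federation setup, the Global node typically pulls from each shard's `/federate` endpoint and selects series with one or more `match[]` selectors. Below is a minimal sketch of building those URLs. The shard addresses and selectors (`shard-node:9090`, `{job="node"}`, and so on) are hypothetical.

```python
from urllib.parse import urlencode


def federate_url(shard: str, selectors: list[str]) -> str:
    """Build the /federate URL a Global Prometheus would scrape from one shard.

    Each selector becomes its own match[] query parameter.
    """
    query = urlencode([("match[]", s) for s in selectors])
    return f"http://{shard}/federate?{query}"


# Hypothetical functional shards: one scrapes node metrics, the other app metrics.
SHARDS = {
    "shard-node:9090": ['{job="node"}'],
    "shard-app:9090": ['{job="app"}', '{__name__=~"http_.*"}'],
}

urls = {shard: federate_url(shard, sels) for shard, sels in SHARDS.items()}
```

In a real deployment the Global Prometheus would express the same request as a scrape job with `metrics_path: /federate` and the selectors under `params`, usually together with `honor_labels: true`.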
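Thanos runs in Sidecar mode by default; Receiver mode is the alternative. Its components:
- Thanos Query: merges the data collected from the Prometheus pods and provides a query interface to clients.
- Thanos Sidecar: wraps the data of the Prometheus container and exposes it to Thanos Query. With --objstore.config-file it also uploads TSDB blocks to object storage.
- Prometheus container: scrapes the metrics and serves them to the Thanos Sidecar through the Remote Read API.
- Thanos Store Gateway: exposes the data in object storage to Thanos Query.
- Thanos Compact: compacts and downsamples the data in object storage, which speeds up queries over long time ranges.
- Thanos Ruler: evaluates rules and fires alerts on the monitoring data. It can also compute new series, which it serves to Thanos Query and/or uploads to object storage for long-term retention.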
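Thanos Query fans each request out to the StoreAPI endpoints it knows about, which here are the sidecars and the Store Gateway. Each endpoint advertises the time range it can serve. The sketch below is simplified and shows only that time-range filtering. The endpoint addresses come from the --store flags in the commands further down. The time ranges are made up: the sidecars are assumed to hold the last 48 hours in local Prometheus, and the Store Gateway is assumed to serve uploaded blocks older than 2 hours.

```python
from dataclasses import dataclass

HOUR_MS = 3600 * 1000


@dataclass
class StoreEndpoint:
    address: str
    min_time: int  # oldest sample the endpoint can serve (ms since epoch)
    max_time: int  # newest sample the endpoint can serve (ms since epoch)


def select_stores(stores, start, end):
    """Return the addresses whose [min_time, max_time] overlaps the query window [start, end]."""
    return [s.address for s in stores if s.min_time <= end and s.max_time >= start]


NOW = 1000 * HOUR_MS  # hypothetical "current time"

STORES = [
    # Sidecars: data still in the local Prometheus TSDB (hypothetical 48h retention).
    StoreEndpoint("172.16.10.10:10901", NOW - 48 * HOUR_MS, NOW),
    StoreEndpoint("172.16.10.11:10901", NOW - 48 * HOUR_MS, NOW),
    # Store Gateway: blocks already in object storage (hypothetically older than 2h).
    StoreEndpoint("127.0.0.1:19090", 0, NOW - 2 * HOUR_MS),
]

last_hour = select_stores(STORES, NOW - HOUR_MS, NOW)
three_days_ago = select_stores(STORES, NOW - 72 * HOUR_MS, NOW - 60 * HOUR_MS)
last_week = select_stores(STORES, NOW - 168 * HOUR_MS, NOW)
```

With these ranges, a query over the last hour only reaches the sidecars. A window from three days ago only reaches the Store Gateway. A week-long query fans out to all three endpoints.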
Thanos primary node (172.16.10.10):
thanos compact --data-dir ./thanos/comp --http-address 0.0.0.0:19192 --objstore.config-file ./bucket_config.yaml
thanos store --data-dir ./thanos/store --objstore.config-file ./bucket_config.yaml --http-address 0.0.0.0:19191 --grpc-address 0.0.0.0:19090
thanos query --http-address 0.0.0.0:8080 --grpc-address 0.0.0.0:8081 --query.replica-label slave --store 172.16.10.11:10901 --store 172.16.10.10:10901 --store 127.0.0.1:19090
thanos sidecar --tsdb.path /data/ --prometheus.url http://localhost:9090 --objstore.config-file ./bucket_config.yaml --shipper.upload-compacted
Thanos standby node (172.16.10.11):
thanos sidecar --tsdb.path /data/ --prometheus.url http://localhost:9090 --objstore.config-file ./bucket_config.yaml --shipper.upload-compacted
thanos query --http-address 0.0.0.0:8080 --grpc-address 0.0.0.0:8081 --query.replica-label slave --store 172.16.10.10:10901 --store 172.16.10.11:10901 --store 172.16.10.10:19090
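Both Query instances pass --query.replica-label slave, and each Prometheus sets its own slave external label. Query can therefore treat series that differ only in that label as copies of each other. Real Thanos deduplication is penalty-based and handles replicas that scrape at slightly different timestamps. The sketch below is deliberately simplified: it assumes aligned timestamps and just merges the copies. The series and the replica value "01" are hypothetical.

```python
from collections import defaultdict


def dedup(series, replica_label="slave"):
    """Simplified replica deduplication, NOT Thanos's penalty-based algorithm.

    series: list of (labels dict, [(timestamp, value), ...]) tuples.
    Series whose labels differ only in `replica_label` are treated as replicas:
    the replica label is dropped and their samples are merged, keeping the first
    value seen for each timestamp.
    """
    merged = defaultdict(dict)
    for labels, samples in series:
        key = frozenset((k, v) for k, v in labels.items() if k != replica_label)
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Hypothetical `up` series from the two Prometheus replicas ("01" is an assumed
# value for the other node; "02" is the one set in the config below).
replica_01 = ({"__name__": "up", "job": "prometheus", "slave": "01"}, [(0, 1.0), (15, 1.0)])
replica_02 = ({"__name__": "up", "job": "prometheus", "slave": "02"}, [(15, 1.0), (30, 1.0)])

result = dedup([replica_01, replica_02])
```

The two replicas collapse into one series without the slave label. Each replica fills in the timestamps the other one is missing.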
Note: bucket_config.yaml is the object storage (cloud storage) configuration.
Prometheus configuration:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    slave: "02" # must differ between the two replicas; Thanos Query deduplicates on it via --query.replica-label slave

# Alertmanager configuration
#alerting:
#  alertmanagers:
#  - static_configs:
#    - targets: ['127.0.0.1:9093']
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "172.16.10.10:9093"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/data/alert/etc/*.rule"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'Node_Exporter'
    consul_sd_configs:
      - server: '172.16.10.10:8500'
    relabel_configs:
      - source_labels: ["__meta_consul_service_address"]
        regex: "(.*)"
        replacement: $1
        action: replace
        target_label: "address"
      - source_labels: ["__meta_consul_service"]
        regex: "(.*)"
        replacement: $1
        action: replace
        target_label: "hostname"
      - source_labels: ["__meta_consul_service_address"]
        regex: "10.2.*"
        action: drop
      - source_labels: ["__meta_consul_service_address"]
        regex: "10.3.*"
        action: drop
      - source_labels: ["__meta_consul_service_address"]
        regex: "10.7.*"
        action: drop
      - source_labels: ["__meta_consul_tags"]
        regex: ".*測試環境.*"
        action: drop
      - source_labels: ["__meta_consul_tags"]
        regex: ".*安全組.*"
        action: drop
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $1
        action: replace
        target_label: "department"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $2
        action: replace
        target_label: "group"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(prod|dev|pre|test),"
        replacement: $2
        action: replace
        target_label: "env"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $3
        action: replace
        target_label: "application"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $4
        action: replace
        target_label: "type"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $5
        action: replace
        target_label: "dc"
      - source_labels: ["__meta_consul_tags"]
        regex: ",(.*),(.*),(.*),(.*),(.*),(.*),"
        replacement: $6
        action: replace
        target_label: "appCode"
      - source_labels: ["__meta_consul_service_id"]
        regex: "(.*)"
        replacement: $1
        action: replace
        target_label: "id"

  - job_name: kubetake
    kubernetes_sd_configs:
      - role: node
        api_server: 'https://172.16.10.11:6443'
        tls_config:
          ca_file: /data/alert/etc/ca.crt
        bearer_token_file: /data/alert/etc/token
    relabel_configs:
      - action: labelmap
        regex: (.+)
      - target_label: __address__
        source_labels: [__meta_kubernetes_node_address_InternalIP]
        regex: (.+)
        replacement: ${1}:9100
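The replace and drop rules in the Node_Exporter job can be checked offline before Prometheus is reloaded. Below is a minimal sketch. It assumes that Python's `re` behaves like Prometheus's RE2 for these simple patterns, and the Consul tags and addresses are made up. Prometheus anchors relabel regexes at both ends and wraps `__meta_consul_tags` in the tag separator (`,` by default). The sketch therefore runs `re.fullmatch` on a comma-wrapped tag string.

```python
import re

SIX_TAGS = ",(.*),(.*),(.*),(.*),(.*),(.*),"

# (regex, replacement, target_label) for the replace rules on __meta_consul_tags,
# copied from the Node_Exporter job. Each rule has one source label, so the regex
# sees the raw label value.
TAG_RULES = [
    (SIX_TAGS, "$1", "department"),
    (SIX_TAGS, "$2", "group"),
    (",(.*),(prod|dev|pre|test),", "$2", "env"),
    (SIX_TAGS, "$3", "application"),
    (SIX_TAGS, "$4", "type"),
    (SIX_TAGS, "$5", "dc"),
    (SIX_TAGS, "$6", "appCode"),
]

# (input, regex) for the drop rules.
DROP_RULES = [
    ("address", "10.2.*"),
    ("address", "10.3.*"),
    ("address", "10.7.*"),
    ("tags", ".*測試環境.*"),
    ("tags", ".*安全組.*"),
]


def expand(match, template):
    """Expand $N / ${N} references (enough for the templates in this config)."""
    def group(ref):
        idx = int(ref.group(1))
        return (match.group(idx) or "") if idx <= match.re.groups else ""
    return re.sub(r"\$\{?(\d+)\}?", group, template)


def should_drop(address, tags):
    """True if any drop rule's regex matches the whole input value (anchored)."""
    values = {"address": address, "tags": tags}
    return any(re.fullmatch(rx, values[src]) for src, rx in DROP_RULES)


def derive_labels(tags):
    """Apply the replace rules in order; a rule whose regex does not match is a no-op."""
    labels = {}
    for regex, replacement, target in TAG_RULES:
        m = re.fullmatch(regex, tags)
        if m:
            value = expand(m, replacement)
            if value:
                labels[target] = value
    return labels


six = derive_labels(",ops,payment,order-api,vm,dc1,A001,")  # hypothetical tags
three = derive_labels(",ops,payment,prod,")  # hypothetical tags
node_address = expand(re.fullmatch("(.+)", "172.16.10.21"), "${1}:9100")
dropped_20 = should_drop("10.20.1.5", ",ops,")
```

With six tags, the rules fill in department through appCode, but env stays unset. This is because the anchored `,(.*),(prod|dev|pre|test),` only matches when the environment is the last tag. The three-tag string gets env and nothing else. The last line shows that the unescaped `.` in `10.2.*` also drops addresses such as 10.20.1.5. If that is not intended, a stricter pattern such as `10\.2\..*` (single-quoted in YAML) would avoid it.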