Scenario:
As monitoring data grows, a single Prometheus can no longer keep up with the scrape volume; even with 100 GB+ of memory it eventually OOMs.
Approach:
1. Reduce the amount of data Prometheus keeps in memory by shipping blocks to long-term TSDB/object storage;
2. Split the workload into multiple Prometheus instances by business module. To aggregate data across the Prometheus instances, use Thanos Query (see the example right after this list).
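For example, once the stack below is running, a single PromQL query sent to Thanos Query (port 19192) fans out to both Prometheus instances; the results are told apart by the external label defined in step 3:
node_uname_info                        # returns series from both monitor="danny-ecs" and monitor="prometheus2"
node_uname_info{monitor="danny-ecs"}   # restrict the result to the first instance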
Assumptions for this Thanos setup:
1. Docker and docker-compose are already installed (this walkthrough deploys everything via docker-compose).
2. Two Prometheus instances are used to verify that Thanos works.
Installation steps:
1. Create the directories:
# storage paths for the two Prometheus instances
mkdir -p /home/dockerdata/prometheus
mkdir -p /home/dockerdata/prometheus2
# paths for minio (object storage) and the docker-compose files
mkdir -p /home/dockerfile/thanos
mkdir -p /home/minio/data
2. minio object-store configuration (at /home/dockerfile/thanos/bucket_config.yaml):
type: S3
config:
  bucket: "thanos"
  endpoint: 'minio:9000'
  access_key: "danny"
  insecure: true # true = plain HTTP, false = HTTPS
  signature_version2: false
  encrypt_sse: false
  secret_key: "xxxxxxxx" # the S3 secret key; must be at least 8 characters long
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    insecure_skip_verify: false
  trace:
    enable: false
  part_size: 134217728
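Once minio is up and the thanos bucket has been created (steps 6-8), this object-store config can be sanity-checked from the command line. A sketch using the Thanos image itself; the network name thanos_default is an assumption (docker-compose derives it from the /home/dockerfile/thanos directory name):
docker run --rm --network=thanos_default \
  -v /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml \
  quay.io/thanos/thanos:v0.13.0-rc.2 \
  tools bucket inspect --objstore.config-file=/bucket_config.yaml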
3. Prometheus configuration files (at /home/dockerfile/thanos/prometheus.yml and /home/dockerfile/thanos/prometheus2.yml). The two files differ mainly in their mapped port and external labels. Defining external_labels is mandatory: it is what lets Thanos tell apart identical metrics coming from different sources.
prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    monitor: 'danny-ecs'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      #- targets: ['localhost:9090']
      - targets: ['node-exporter:9100']
prometheus2.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    monitor: 'prometheus2'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      #- targets: ['localhost:9091']
      - targets: ['node-exporter:9100']
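Because the compose file in step 5 starts Prometheus with --web.enable-lifecycle, later edits to these two files can be applied without restarting the containers:
curl -X POST http://localhost:9090/-/reload   # reload prometheus1
curl -X POST http://localhost:9091/-/reload   # reload prometheus2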
4. Dockerfile (used in the reference setup's docker-compose; probably not strictly required here, since the compose file below pulls prebuilt Thanos images from quay.io. Located at /home/dockerfile/thanos):
FROM quay.io/prometheus/busybox:latest
LABEL maintainer="danny"
COPY /thanos_tmp_for_docker /bin/thanos
ENTRYPOINT [ "/bin/thanos" ]
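If you do want to build it, a minimal sketch (the tag my-thanos is arbitrary, and a thanos binary must first be placed at /home/dockerfile/thanos/thanos_tmp_for_docker):
cd /home/dockerfile/thanos
docker build -t my-thanos .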
5. docker-compose file (almost everything lives in here):
version: '2'
services:
  prometheus1:
    container_name: prometheus1
    image: prom/prometheus
    ports:
      - 9090:9090
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
      - /home/dockerdata/prometheus:/prometheus
      - /home/dockerfile/thanos/prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - --web.enable-lifecycle
      - --web.enable-admin-api
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
      - --storage.tsdb.min-block-duration=30m # small just to not wait hours to test :)
      - --storage.tsdb.max-block-duration=30m # small just to not wait hours to test :)
    depends_on:
      - minio
  sidecar1:
    container_name: sidecar1
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
      - /home/dockerdata/prometheus:/var/prometheus
      - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
      - sidecar
      - --tsdb.path=/var/prometheus
      - --prometheus.url=http://prometheus1:9090
      - --objstore.config-file=/bucket_config.yaml
      - --http-address=0.0.0.0:19191
      - --grpc-address=0.0.0.0:19090
    depends_on:
      - minio
      - prometheus1
  prometheus2:
    container_name: prometheus2
    image: prom/prometheus
    ports:
      - 9091:9090
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
      - /home/dockerdata/prometheus2:/prometheus
      - /home/dockerfile/thanos/prometheus2.yml:/etc/prometheus/prometheus.yml
    command:
      - --web.enable-lifecycle
      - --web.enable-admin-api
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
      - --storage.tsdb.min-block-duration=30m
      - --storage.tsdb.max-block-duration=30m
    depends_on:
      - minio
  sidecar2:
    container_name: sidecar2
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
      - /home/dockerdata/prometheus2:/var/prometheus
      - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
      - sidecar
      - --tsdb.path=/var/prometheus
      - --prometheus.url=http://prometheus2:9090
      - --objstore.config-file=/bucket_config.yaml
      - --http-address=0.0.0.0:19191
      - --grpc-address=0.0.0.0:19090
    depends_on:
      - minio
      - prometheus2
  grafana:
    container_name: grafana
    image: grafana/grafana
    ports:
      - "3000:3000"
  # to search on old metrics
  storer:
    container_name: storer
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    volumes:
      - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
      - store
      - --data-dir=/var/thanos/store
      - --objstore.config-file=/bucket_config.yaml
      - --http-address=0.0.0.0:19191
      - --grpc-address=0.0.0.0:19090
    depends_on:
      - minio
  # downsample metrics on the bucket
  compactor:
    container_name: compactor
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    volumes:
      - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
      - compact
      - --data-dir=/var/thanos/compact
      - --objstore.config-file=/bucket_config.yaml
      - --http-address=0.0.0.0:19191
      - --wait
    depends_on:
      - minio
  # querier component which can be scaled
  querier:
    container_name: querier
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    labels:
      - "traefik.enable=true"
      - "traefik.port=19192"
      - "traefik.frontend.rule=PathPrefix:/"
    ports:
      - "19192:19192"
    command:
      - query
      - --http-address=0.0.0.0:19192
      - --store=sidecar1:19090
      - --store=sidecar2:19090
      - --store=storer:19090
      - --query.replica-label=replica
  minio:
    image: minio/minio:latest
    container_name: minio
    ports:
      - 9000:9000
    volumes:
      - "/home/minio/data:/data"
    environment:
      MINIO_ACCESS_KEY: "danny"
      MINIO_SECRET_KEY: "xxxxxxxx" # at least 8 characters
    command: server /data
    restart: always
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - '9100:9100'
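Before bringing the stack up (step 6), the compose file can be syntax-checked; docker-compose config -q prints nothing when the file is valid:
cd /home/dockerfile/thanos
docker-compose config -q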
6. Start the stack. On the first `up`, storer will fail to start and needs to be restarted once on its own:
cd /home/dockerfile/thanos
docker-compose up -d
docker-compose up -d storer
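If storer keeps failing, its logs usually show why (for example, the thanos bucket not existing yet; see step 8):
docker-compose logs --tail=20 storer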
7. Verify:
docker ps -a
All containers should now be up and running.
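Any container that crashed instead of staying up can be spotted quickly:
docker ps -a --filter status=exited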
8. Verify that Prometheus block data is uploaded to minio.
Open http://ip:9000/minio/login
The credentials are the ones defined in bucket_config.yaml.
Create the bucket: thanos
If everything is running correctly, block data will be uploaded to the thanos bucket (which proves the sidecar component works):
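The bucket contents can also be checked from the command line with the minio client; a sketch, assuming a recent mc release is installed locally (myminio is an arbitrary alias, credentials as in bucket_config.yaml):
mc alias set myminio http://ip:9000 danny xxxxxxxx
mc ls --recursive myminio/thanos   # uploaded blocks show up as ULID-named directories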
9. Verify that the query and store components work.
Open the Query page (it is almost identical to the Prometheus UI)
and enter the metric: node_uname_info
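The same check works against Thanos Query's Prometheus-compatible HTTP API:
curl 'http://ip:19192/api/v1/query?query=node_uname_info'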
Deployment reference: https://www.cnblogs.com/rongfengliang/p/11319933.html
Thanos architecture reference: http://www.dockone.io/article/10035