Prometheus clustering solution: Thanos


Scenario:

As monitoring data grows, a single Prometheus instance can no longer keep up with collecting and holding it; even with 100 GB+ of memory it will eventually run out of memory (OOM).

 

Approach:

1. Reduce the amount of data Prometheus keeps resident in memory by persisting it to TSDB blocks and object storage.

2. Split the workload across multiple Prometheus instances by business domain, each storing its own slice of the data; when data from several Prometheus instances needs to be aggregated, use the Thanos Query component.

 

Prerequisites for this Thanos setup:

1. Docker and docker-compose are already installed (this example deploys everything with docker-compose; a quick check is sketched after this list).

2. Two Prometheus instances are used to verify that Thanos works.
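A quick sanity check of the prerequisites (a sketch; version numbers will differ):

docker --version
docker-compose --version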

 

Installation steps:

1. Create the directories.

# storage paths for the two Prometheus instances

mkdir -p /home/dockerdata/prometheus   

mkdir -p /home/dockerdata/prometheus2

 

# paths for MinIO (object storage) and the docker-compose files

mkdir -p /home/dockerfile/thanos
mkdir -p /home/minio/data
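Note: the prom/prometheus image runs as the unprivileged nobody user (UID 65534), so if the Prometheus containers later report permission errors writing to /prometheus, the bind-mounted data directories may need to be handed over to that user, for example:

chown -R 65534:65534 /home/dockerdata/prometheus /home/dockerdata/prometheus2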

 

 

2. MinIO object-store configuration for Thanos (located at /home/dockerfile/thanos/bucket_config.yaml)

type: S3
config:
  bucket: "thanos"
  endpoint: 'minio:9000'
  access_key: "danny"
  insecure: true  # true = plain HTTP; set to false to use HTTPS
  signature_version2: false
  encrypt_sse: false
  secret_key: "xxxxxxxx" # S3 secret key; must be at least 8 characters long
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    insecure_skip_verify: false
  trace:
    enable: false
  part_size: 134217728

 

3. Prometheus configuration files (located at /home/dockerfile/thanos/prometheus.yml and /home/dockerfile/thanos/prometheus2.yml). The two instances differ mainly in their published port (set in docker-compose below) and in external_labels; external_labels must be defined so that identical metrics can be traced back to the Prometheus instance that scraped them. Both files can be syntax-checked with promtool, as sketched after the second file.

prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    monitor: 'danny-ecs'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    #- targets: ['localhost:9090']
    - targets: ['node-exporter:9100']

 

prometheus2.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    monitor: 'prometheus2'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    #- targets: ['localhost:9091']
    - targets: ['node-exporter:9100']
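Both files can be checked with promtool from the same Prometheus image before anything is started; a sketch, assuming the paths created above:

docker run --rm --entrypoint promtool \
  -v /home/dockerfile/thanos/prometheus.yml:/prometheus.yml \
  prom/prometheus check config /prometheus.yml
docker run --rm --entrypoint promtool \
  -v /home/dockerfile/thanos/prometheus2.yml:/prometheus2.yml \
  prom/prometheus check config /prometheus2.yml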

 

4. Dockerfile (located at /home/dockerfile/thanos). The upstream example builds a Thanos image from it, but the docker-compose file below pulls the published quay.io image, so this file is not strictly required here; a build sketch follows anyway.

FROM quay.io/prometheus/busybox:latest
LABEL maintainer="danny"

COPY /thanos_tmp_for_docker /bin/thanos

ENTRYPOINT [ "/bin/thanos" ]
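If you do want to build a local Thanos image from this Dockerfile, the Thanos binary must first be placed at thanos_tmp_for_docker in the build context; a sketch (the image tag is illustrative):

cd /home/dockerfile/thanos
# copy a downloaded thanos binary to ./thanos_tmp_for_docker first
docker build -t thanos-local:v0.13.0 .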

 

5. docker-compose file (nearly everything is defined here)

version: '2'
services:
  prometheus1:
    container_name: prometheus1
    image: prom/prometheus
    ports: 
      - 9090:9090
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
    - /home/dockerdata/prometheus:/prometheus
    - /home/dockerfile/thanos/prometheus.yml:/etc/prometheus/prometheus.yml  
    command:
      - --web.enable-lifecycle
      - --web.enable-admin-api
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
      - --storage.tsdb.min-block-duration=30m # small just to not wait hours to test :)
      - --storage.tsdb.max-block-duration=30m # small just to not wait hours to test :)
    depends_on:
    - minio

  sidecar1:
    container_name: sidecar1
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
    - /home/dockerdata/prometheus:/var/prometheus
    - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
    - sidecar
    - --tsdb.path=/var/prometheus
    - --prometheus.url=http://prometheus1:9090
    - --objstore.config-file=/bucket_config.yaml
    - --http-address=0.0.0.0:19191
    - --grpc-address=0.0.0.0:19090
    depends_on:
    - minio
    - prometheus1

  prometheus2:
    container_name: prometheus2
    image: prom/prometheus
    ports:
      - 9091:9090
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
    - /home/dockerdata/prometheus2:/prometheus
    - /home/dockerfile/thanos/prometheus2.yml:/etc/prometheus/prometheus.yml
    command:
      - --web.enable-lifecycle
      - --web.enable-admin-api
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --web.console.libraries=/usr/share/prometheus/console_libraries
      - --web.console.templates=/usr/share/prometheus/consoles
      - --storage.tsdb.min-block-duration=30m 
      - --storage.tsdb.max-block-duration=30m 
    depends_on:
    - minio

  sidecar2:
    container_name: sidecar2
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"
    volumes:
    - /home/dockerdata/prometheus2:/var/prometheus
    - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
    - sidecar
    - --tsdb.path=/var/prometheus
    - --prometheus.url=http://prometheus2:9090
    - --objstore.config-file=/bucket_config.yaml
    - --http-address=0.0.0.0:19191
    - --grpc-address=0.0.0.0:19090
    depends_on:
    - minio
    - prometheus2

  grafana:
    container_name: grafana
    image: grafana/grafana
    ports: 
    - "3000:3000"

  # to search on old metrics
  storer:
    container_name: storer
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    volumes:
    - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
    - store
    - --data-dir=/var/thanos/store
    - --objstore.config-file=bucket_config.yaml
    - --http-address=0.0.0.0:19191
    - --grpc-address=0.0.0.0:19090
    depends_on:
    - minio  

  # downsample metrics on the bucket
  compactor:
    container_name: compactor
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    volumes:
    - /home/dockerfile/thanos/bucket_config.yaml:/bucket_config.yaml
    command:
    - compact
    - --data-dir=/var/thanos/compact
    - --objstore.config-file=bucket_config.yaml
    - --http-address=0.0.0.0:19191
    - --wait
    depends_on:
    - minio 

  # querier component which can be scaled
  querier:
    container_name: querier
    image: quay.io/thanos/thanos:v0.13.0-rc.2
    labels:
    - "traefik.enable=true"
    - "traefik.port=19192"
    - "traefik.frontend.rule=PathPrefix:/"
    ports: 
    - "19192:19192"
    command:
    - query
    - --http-address=0.0.0.0:19192
    - --store=sidecar1:19090
    - --store=sidecar2:19090
    - --store=storer:19090
    - --query.replica-label=replica

  minio:
    image: minio/minio:latest
    container_name: minio
    ports:
      - 9000:9000
    volumes: 
      - "/home/minio/data:/data"
    environment:
      MINIO_ACCESS_KEY: "danny"
      MINIO_SECRET_KEY: "xxxxxxxx" # at least 8 characters; must match secret_key in bucket_config.yaml
    command: server /data
    restart: always
    logging:
      driver: "json-file"
      options:
        max-size: "5m"
        max-file: "3"

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - '9100:9100'
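The file can be rendered and validated before starting anything (run from the directory that holds docker-compose.yml):

cd /home/dockerfile/thanos
docker-compose config

One design note: --query.replica-label=replica only deduplicates series whose sources share a replica external label; the two Prometheus instances here carry different monitor labels, so the querier treats them as independent sources and simply merges their results.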

 

6. Start everything. On the first up, the storer container fails to start and has to be brought up again separately.

cd /home/dockerfile/thanos
docker-compose up -d
docker-compose up -d storer
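The first failure is usually because MinIO (or the thanos bucket) is not ready yet when storer first comes up; depends_on only waits for the container to be created, not for the service inside it. The logs confirm what happened:

docker-compose logs --tail=50 storer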

 

7. Verification

docker ps -a

All containers should show as up and running.
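Beyond docker ps, the HTTP endpoints published to the host can be probed directly (replace localhost with the host IP when checking remotely):

curl -s http://localhost:9090/-/healthy     # prometheus1
curl -s http://localhost:9091/-/healthy     # prometheus2
curl -s http://localhost:19192/-/healthy    # thanos query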

 

8. Verify that Prometheus block data has been uploaded to MinIO

Open http://ip:9000/minio/login

Log in with the access_key and secret_key defined in bucket_config.yaml.

Create a bucket named thanos.

 

If everything is running correctly, block data is uploaded into the thanos bucket (which confirms the sidecar component is working); this can also be checked from the command line:
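A hedged sketch for checking from the CLI: with MinIO in single-node filesystem mode the bucket is simply a directory under the mounted data path, and the Thanos CLI inside a running sidecar can list blocks through the S3 API (run the second command from /home/dockerfile/thanos):

ls /home/minio/data/thanos/
docker-compose exec sidecar1 thanos tools bucket ls --objstore.config-file=/bucket_config.yaml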

 

9. Verify that the query and store components are working.

Open the Query UI (it looks almost identical to the Prometheus UI):

http://ip:19192

Query the metric node_uname_info; results from both Prometheus instances should appear, distinguished by their monitor external label.
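The same check can be made against the Query HTTP API, which is Prometheus-compatible; filtering on the external label isolates one source (a sketch; replace ip with the host address):

curl -sG 'http://ip:19192/api/v1/query' --data-urlencode 'query=node_uname_info{monitor="prometheus2"}'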

 

Deployment reference: https://www.cnblogs.com/rongfengliang/p/11319933.html

Thanos architecture reference: http://www.dockone.io/article/10035

