Prometheus + Grafana + AlertManager: A Usage Summary


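Prometheus is an open-source monitoring and alerting system with a built-in time-series database. Grafana is commonly used alongside it to visualize the data.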

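1. Monitoring System Architecture

1.1 Core Components

  • Prometheus Server: scrapes and stores time-series data, and also provides querying and alert rule configuration management.
  • exporters: metric collectors, for example node_exporter for machine metrics and the MongoDB exporter for MongoDB information.
  • alertmanager: manages alert notifications.
  • Grafana: renders monitoring data as charts.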
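
To make the exporter role concrete, here is a minimal sketch of what an exporter does: it serves current metric values as plain text at /metrics so that Prometheus Server can scrape them. It uses only the JDK's built-in HTTP server. The class name ExporterSketch, the metric demo_temperature_celsius, and port 9101 are invented for illustration; real exporters such as node_exporter are far more complete.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy exporter; not part of Prometheus or node_exporter.
final class ExporterSketch {

    // Renders gauges in the Prometheus text exposition format:
    // a "# TYPE" line followed by "<name> <value>" for each metric.
    static String render(Map<String, Double> gauges) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Double> e : gauges.entrySet()) {
            out.append("# TYPE ").append(e.getKey()).append(" gauge\n");
            out.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Double> gauges = new LinkedHashMap<>();
        gauges.put("demo_temperature_celsius", 21.5); // made-up sample value

        // Port 9101 is an arbitrary choice for this sketch.
        HttpServer server = HttpServer.create(new InetSocketAddress(9101), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render(gauges).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start(); // keeps serving until the process is stopped
    }
}
```

Pointing a scrape job at this process would make the sample value queryable in Prometheus, in the same way node-exporter is scraped on port 9100.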

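2. Installing the Base Components

Because this setup is for learning and experimentation, the environment is installed quickly with Docker.

2.1 Installing Node Exporter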

  • docker-compose-node-export.yml

    version: '3'
    services:
      node-exporter:
        image: prom/node-exporter
        container_name: node-exporter
        hostname: node-exporter
        restart: always
        ports:
          - "9100:9100"
    

2.2 Installing Alertmanager

  • docker-compose-alertmanager.yml

    version: '3'
    services:
      alertmanager:
        image: prom/alertmanager
        container_name: alertmanager
        hostname: alertmanager
        restart: always
        volumes:
          - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
        ports:
          - "9093:9093"
    
  • alertmanager.yml

    global:
      smtp_smarthost: 'smtp.qq.com:25'          # QQ Mail SMTP server
      smtp_from: '793272861@qq.com'             # sender address
      smtp_auth_username: '793272861@qq.com'    # SMTP username, i.e. your email address
      smtp_auth_password: '****************'    # SMTP password for the sender account
      smtp_require_tls: false                   # do not require TLS
     
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: live-monitoring
    
    receivers:
    - name: 'live-monitoring'
      email_configs:
      - to: '793272861@qq.com'                  # recipient address
    

2.3 Installing Prometheus

  • docker-compose-prometheus.yml

    version: '3'
    services:
      prometheus:
        image: prom/prometheus
        container_name: prometheus
        hostname: prometheus
        restart: always
        volumes:
          - /data/docker_file/prometheus/data:/prometheus
          - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
        ports:
          - "9090:9090"
    
  • prometheus.yml

    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).
     
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ['alertmanager:9093']
     
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"
     
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    # Configure scrape jobs that periodically pull monitoring data
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['prometheus:9090']
      - job_name: 'node-exporter'
        scrape_interval: 5s
        static_configs:
          - targets: ['node-exporter:9100']
    
  • Prometheus service discovery mechanism

  • Access: http://localhost:9090/

2.4 Installing Grafana

  • docker-compose-grafana.yml

    version: '3'
    services:
      grafana:
        image: grafana/grafana
        container_name: grafana
        hostname: grafana
        restart: always
        environment:
          - GF_SECURITY_ADMIN_PASSWORD=admin
        volumes:
          - /data/docker_file/grafana/data:/var/lib/grafana
          - /data/docker_file/grafana/log:/var/log/grafana
        ports:
          - "3000:3000"
    
  • Add the data source (Prometheus)

  • Access: http://localhost:3000/ (default username: admin, password: admin)

2.5 Docker Compose Script

version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
networks:
  monitor:
    driver: bridge
 

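3. Configuring Grafana Dashboards

Grafana pulls data from Prometheus with PromQL queries and renders it in panels. A Grafana dashboard is made up of individual panels.

3.1 Downloading Grafana Dashboard Files

Ready-made Grafana dashboard files can be downloaded from the official site and imported into our Grafana instance for immediate use.

Recommended Grafana dashboards

Importing a Grafana dashboard

3.2 Adding and Modifying Grafana Panels (Extension)

The official Spring Boot 2.1 Statistics dashboard has no panels for outbound requests to third-party services. Using it as an example, we add a Client Request Count panel and a Client Response Time panel for those requests.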

Client Request Count

irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])

Note: the meter in the application must be named http.client.requests.
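
As a rough illustration of what irate computes in the query above, the sketch below derives a per-second rate from the last two samples of a counter. The class and method names are hypothetical and the reset handling is a simplification; this is not Prometheus code.

```java
// Hypothetical helper illustrating the arithmetic behind irate(); not Prometheus code.
final class IrateSketch {

    // Per-second rate from the last two counter samples in the range window.
    // Timestamps are in seconds. If the counter went down, it is treated as
    // having reset to zero, so the increase is the latest value itself.
    static double irate(double prevValue, double prevTime, double lastValue, double lastTime) {
        double increase = lastValue < prevValue ? lastValue : lastValue - prevValue;
        return increase / (lastTime - prevTime);
    }

    public static void main(String[] args) {
        // 30 more requests over a 15-second scrape interval
        System.out.println(irate(100, 0, 130, 15)); // prints 2.0
    }
}
```

Because only the last two samples are used, the panel reacts quickly to spikes, at the cost of a noisier line than an average over the whole 5m window.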

Client Response Time

irate(http_client_requests_seconds_sum{instance="$instance", application="$application",uri!~".*actuator.*"}[5m]) / irate(http_client_requests_seconds_count{instance="$instance", application="$application",uri!~".*actuator.*"}[5m])
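
The division in this query can be read as "seconds spent divided by requests made" over the same interval, which is the average response time. A minimal sketch of that arithmetic follows, with hypothetical names and sample values:

```java
// Hypothetical helper showing why rate(sum) / rate(count) is an average latency.
final class AvgLatencySketch {

    // Average seconds per request between two scrapes. Both rates share the
    // same time interval, so it cancels out and only the increases remain.
    static double averageSeconds(double sum0, double count0, double sum1, double count1) {
        double requests = count1 - count0;
        if (requests == 0) {
            return Double.NaN; // no traffic: 0 / 0 has no meaningful average
        }
        return (sum1 - sum0) / requests;
    }

    public static void main(String[] args) {
        // _sum grew by 2 seconds while _count grew by 10 requests
        System.out.println(averageSeconds(10.0, 100, 12.0, 110));
    }
}
```

With no traffic in the window the result is NaN, which Grafana typically renders as a gap in the graph.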

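4. Integrating Micrometer with Spring Boot

Metrics (indicators, measurements)

Micrometer provides vendor-neutral interfaces for timers, gauges, counters, distribution summaries, and long task timers. It has a dimensional data model which, when paired with a dimensional monitoring system, gives efficient access to a particular named metric and lets you drill down across its dimensions.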
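
To illustrate what a dimensional data model means, here is a small JDK-only sketch in which each unique combination of name and tags is its own series, and a total can be taken across tags. TaggedCounterSketch is a hypothetical stand-in written for this explanation; it is not the Micrometer API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a dimensional counter; this is not the Micrometer API.
final class TaggedCounterSketch {
    private final Map<String, LongAdder> series = new ConcurrentHashMap<>();

    // Builds a key such as "http.client.requests{method=GET, status=200}".
    // TreeMap sorts the tags so the same tag set always yields the same key.
    private static String key(String name, Map<String, String> tags) {
        return name + new TreeMap<>(tags);
    }

    // One series per unique (name, tags) combination.
    void increment(String name, Map<String, String> tags) {
        series.computeIfAbsent(key(name, tags), k -> new LongAdder()).increment();
    }

    // Value of one specific series.
    long count(String name, Map<String, String> tags) {
        LongAdder adder = series.get(key(name, tags));
        return adder == null ? 0 : adder.sum();
    }

    // Aggregates across all tag combinations of one metric name.
    long total(String name) {
        return series.entrySet().stream()
                .filter(e -> e.getKey().startsWith(name + "{"))
                .mapToLong(e -> e.getValue().sum())
                .sum();
    }

    public static void main(String[] args) {
        TaggedCounterSketch counters = new TaggedCounterSketch();
        counters.increment("http.client.requests", Map.of("method", "GET", "status", "200"));
        counters.increment("http.client.requests", Map.of("method", "POST", "status", "500"));
        System.out.println(counters.total("http.client.requests")); // prints 2
    }
}
```

Tags such as application or uri play the same role in the Grafana queries of section 3.2, where they appear as label filters.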

4.1 Adding Dependencies

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

4.2 Enabling the Prometheus Endpoint

spring:
  application:
    name: spring-boot-node

management:
  metrics:
    # 1. Add global tags, which can later be used as variables when querying data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'
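
One way to check that the endpoint is exposed is to fetch it and parse the returned text. The sketch below assumes the application listens on localhost:8080 (Spring Boot's default port, not stated in this article) and that sample lines carry no trailing timestamps. ActuatorScrapeCheck is a hypothetical helper name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for eyeballing the /actuator/prometheus output.
final class ActuatorScrapeCheck {

    // Converts a sample value, including the "+Inf" / "-Inf" spellings
    // that Double.parseDouble does not accept.
    static double parseValue(String text) {
        switch (text) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(text); // also accepts "NaN"
        }
    }

    // Parses "<series> <value>" lines, skipping blank lines and "#" comment lines.
    // Assumes there is no timestamp after the value.
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int split = trimmed.lastIndexOf(' ');
            if (split < 0) {
                continue; // not a "<series> <value>" line
            }
            samples.put(trimmed.substring(0, split), parseValue(trimmed.substring(split + 1)));
        }
        return samples;
    }

    public static void main(String[] args) throws Exception {
        // Assumed address; adjust host and port to your application.
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:8080/actuator/prometheus"))
                .build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
        parse(body).forEach((series, value) -> System.out.println(series + " = " + value));
    }
}
```

If the global tag from step 1 is active, every series printed should carry an application label with the value of spring.application.name.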

4.3 Monitoring Third-Party Requests

OkHttpMetricsEventListener makes it easy to monitor the requests made by an OkHttp client.

Configuring the OkHttp client event listener

@Bean("okHttpClient")
public OkHttpClient okHttpClient(ConnectionPool connectionPool) {
    return new OkHttpClient().newBuilder().connectionPool(connectionPool)
            .connectTimeout(5, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            .eventListener(eventListener())
            .build();
}

/**
* Event listener: OkHttpMetricsEventListener.
* metricsProperties.getWeb().getClient().getRequestsMetricName() returns 'http.client.requests',
* which is used as the meter name.
* @return the event listener that records OkHttp request metrics
*/
private EventListener eventListener(){
    return OkHttpMetricsEventListener.builder(
    meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
    .build();
}

How it works: OkHttpMetricsEventListener.java

public class OkHttpMetricsEventListener extends EventListener {

    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";

    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // Record the metric when the request completes
            time(state);
        }
    }

    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // Record the metric when the request completes
            time(state);
        }
    }

    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN" :
            (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));

        // Define tags, which can be used as labels or variables in Prometheus and Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
            "method", state.request != null ? state.request.method() : "UNKNOWN",
            "uri", uri,
            "status", getStatusMessage(state.response, state.exception),
            "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));

        // Register the timer and record the sample; Prometheus can then pull the data from the /actuator/prometheus endpoint exposed by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
            .tags(tags)
            .description("Timer of OkHttp operation")
            .register(registry)
            .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }

}
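
The excerpt above relies on a per-call state map: the state is removed when the call ends, so whichever callback fires first (callFailed or responseHeadersEnd) records the timing and a later one finds nothing to record. The sketch below reproduces that pattern with JDK types only. CallTimingSketch and its method names are hypothetical, and how the real listener stores its start time is not shown in the excerpt.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the remove-then-record pattern; not Micrometer code.
final class CallTimingSketch {
    private final Map<Object, Long> startTimes = new ConcurrentHashMap<>();

    // Remembers when a call started.
    void callStart(Object call, long nowNanos) {
        startTimes.put(call, nowNanos);
    }

    // Returns the elapsed nanoseconds, or -1 if the call is unknown or was
    // already finished. remove() guarantees each call is timed at most once.
    long callEnd(Object call, long nowNanos) {
        Long start = startTimes.remove(call);
        return start == null ? -1 : nowNanos - start;
    }

    public static void main(String[] args) {
        CallTimingSketch timing = new CallTimingSketch();
        Object call = new Object();
        timing.callStart(call, System.nanoTime());
        System.out.println("elapsed ns: " + timing.callEnd(call, System.nanoTime()));
        System.out.println("second end: " + timing.callEnd(call, System.nanoTime())); // prints -1
    }
}
```

Using a monotonic clock such as System.nanoTime() keeps the measured duration unaffected by wall-clock adjustments, which matches the monotonicTime() call in the excerpt.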

4.4 Spring Boot Integration Example


