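Prometheus is an open-source monitoring and alerting system with a built-in time-series database. It is usually paired with Grafana, which presents the data in a more polished way.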
1. Monitoring System Architecture
1.1 Core Components
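- Prometheus Server: scrapes and stores time-series data, and also provides querying and alert rule configuration management.
- Exporters: data collectors, for example node_exporter for host (machine) metrics, MongoDB exporter for MongoDB metrics, and so on.
- Alertmanager: manages alert notifications.
- Grafana: renders the monitoring data as charts.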
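Exporters and Prometheus Server cooperate through a pull model: an exporter is just an HTTP server that renders its current values in Prometheus's text exposition format, and Prometheus Server fetches that page on every scrape. The minimal sketch below uses only the JDK's built-in `com.sun.net.httpserver` package; the metric name `demo_scrapes_total` and port 9400 are hypothetical, chosen for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Minimal exporter sketch: serves one counter at /metrics in the Prometheus text format. */
final class MiniExporter {
    // Hypothetical counter: how many times the metrics page has been served
    private static final AtomicLong SCRAPES = new AtomicLong();

    /** Renders the metrics page: a HELP line, a TYPE line and one sample. */
    static String render(long count) {
        return "# HELP demo_scrapes_total Hypothetical counter of served scrapes.\n"
                + "# TYPE demo_scrapes_total counter\n"
                + "demo_scrapes_total " + count + "\n";
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(9400), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render(SCRAPES.incrementAndGet()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start(); // a scrape job would list "<host>:9400" as its target
        System.out.println("Serving http://localhost:9400/metrics");
    }
}
```

Real exporters such as node_exporter follow the same idea, only with many more metrics.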
2. Installing the Basic Components
Since this setup is for learning and experimentation, Docker is used here to bring up the environment quickly.
2.1 Installing Node Exporter
docker-compose-node-export.yml

```yaml
version: '3'
services:
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
```
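Once the container is up, `http://localhost:9100/metrics` returns plain-text sample lines of the form `name{label="value",...} value`, for example `node_cpu_seconds_total{cpu="0",mode="idle"} 12345.5`. The simplified parser below shows how such a line breaks down into metric name, labels and value. It assumes label values contain no escaped quotes or braces (the full format allows them), so treat it as an illustration rather than a complete parser.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified parser for one sample line of the Prometheus text exposition format. */
final class ExpositionLine {
    // metric name, optional {labels}, value, optional integer timestamp
    private static final Pattern LINE = Pattern.compile(
            "^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{(.*)\\})?\\s+(\\S+)(?:\\s+-?\\d+)?$");
    // key="value" pairs; assumes no escaped quotes inside values
    private static final Pattern LABEL = Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"([^\"]*)\"");

    final String name;
    final Map<String, String> labels;
    final double value;

    private ExpositionLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = Collections.unmodifiableMap(labels);
        this.value = value;
    }

    /** Returns null for blank lines and # HELP / # TYPE comment lines. */
    static ExpositionLine parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        Matcher m = LINE.matcher(trimmed);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized sample line: " + line);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        if (m.group(2) != null) {
            Matcher lm = LABEL.matcher(m.group(2));
            while (lm.find()) {
                labels.put(lm.group(1), lm.group(2));
            }
        }
        return new ExpositionLine(m.group(1), labels, parseValue(m.group(3)));
    }

    // The format spells infinities as +Inf / -Inf, which Double.parseDouble rejects
    private static double parseValue(String raw) {
        switch (raw) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(raw); // also accepts "NaN"
        }
    }
}
```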
2.2 Installing Alertmanager
docker-compose-alertmanager.yml

```yaml
version: '3'
services:
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
```
alertmanager.yml

```yaml
global:
  smtp_smarthost: 'smtp.qq.com:25'          # QQ Mail SMTP server
  smtp_from: '793272861@qq.com'             # sender address
  smtp_auth_username: '793272861@qq.com'    # SMTP username, i.e. your email address
  smtp_auth_password: '****************'    # sender's email password
  smtp_require_tls: false                   # do not require TLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: live-monitoring
receivers:
  - name: 'live-monitoring'
    email_configs:
      - to: '793272861@qq.com'              # recipient address
```
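A quick way to check the email settings without waiting for a real rule to fire is to push a hand-made alert into Alertmanager's v2 HTTP API (`POST /api/v2/alerts`). This sketch assumes Alertmanager is reachable at `http://localhost:9093` (the port mapped above); the label and annotation values are hypothetical. Because the route groups by `alertname` with `group_wait: 10s`, an email to the `live-monitoring` receiver should arrive roughly ten seconds later if SMTP is configured correctly.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Pushes one hand-made alert to Alertmanager's v2 API to test the email receiver. */
final class AlertmanagerTestAlert {
    // Minimal JSON string escaping (backslash and double quote only); enough for plain label values
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String toJsonObject(Map<String, String> map) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        map.forEach((k, v) -> json.add("\"" + escape(k) + "\":\"" + escape(v) + "\""));
        return json.toString();
    }

    /** The v2 API expects a JSON array of alerts; labels are required, annotations optional. */
    static String buildPayload(Map<String, String> labels, Map<String, String> annotations) {
        return "[{\"labels\":" + toJsonObject(labels)
                + ",\"annotations\":" + toJsonObject(annotations) + "}]";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("alertname", "ManualEmailTest"); // hypothetical name; the route groups on this label
        labels.put("severity", "warning");           // hypothetical label
        String body = buildPayload(labels, Map.of("summary", "Manual test alert from Java"));

        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:9093/api/v2/alerts"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```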
2.3 Installing Prometheus
docker-compose-prometheus.yml

```yaml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
```
prometheus.yml

```yaml
# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape jobs: each target is polled periodically to pull its metrics.
# Here: Prometheus itself and Node Exporter.
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
```

The hostnames alertmanager, prometheus and node-exporter resolve only when the containers share a Docker network, as in the combined script in 2.5; each separate compose file above gets its own default network.
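After Prometheus starts, each entry under `scrape_configs` becomes a target, and the built-in `up` metric is 1 for targets whose last scrape succeeded. The sketch below runs an instant query for `up` against Prometheus's HTTP API (`/api/v1/query`); it assumes the server is reachable at `http://localhost:9090` and simply prints the raw JSON instead of parsing it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Runs an instant PromQL query against Prometheus's HTTP API and prints the JSON response. */
final class PrometheusQuery {
    /** Builds <base>/api/v1/query?query=..., URL-encoding the PromQL expression. */
    static URI queryUri(String baseUrl, String promql) {
        return URI.create(baseUrl + "/api/v1/query?query="
                + URLEncoder.encode(promql, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed address: the 9090 port mapped in docker-compose-prometheus.yml
        URI uri = queryUri("http://localhost:9090", "up{job=\"node-exporter\"}");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(HttpRequest.newBuilder(uri).GET().build(), HttpResponse.BodyHandlers.ofString());
        // A healthy target appears in data.result with the value "1"
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```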
2.4 Installing Grafana
docker-compose-grafana.yml

```yaml
version: '3'
services:
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
```
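Add a data source (Prometheus): open http://localhost:3000/ and log in with the default username admin and password admin (the password is set by GF_SECURITY_ADMIN_PASSWORD above), then add Prometheus as a data source.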
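The data source can also be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. In this sketch the Grafana address `http://localhost:3000`, the admin/admin credentials and the Prometheus URL `http://prometheus:9090` are assumptions. That URL is resolved by the Grafana server itself (proxy access), so it only works when both containers share a Docker network, as in the combined script in 2.5.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Registers Prometheus as a Grafana data source via the HTTP API. */
final class GrafanaDataSource {
    /** HTTP Basic auth header value for the Grafana admin user. */
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** JSON body; "proxy" access means the Grafana server, not the browser, calls Prometheus.
     *  Assumes name and URL need no JSON escaping. */
    static String payload(String name, String prometheusUrl) {
        return "{\"name\":\"" + name + "\",\"type\":\"prometheus\",\"url\":\"" + prometheusUrl
                + "\",\"access\":\"proxy\",\"isDefault\":true}";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:3000/api/datasources"))
                .header("Authorization", basicAuth("admin", "admin"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload("Prometheus", "http://prometheus:9090")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```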
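2.5 Docker Compose Script
The combined file starts all four services and attaches them to a shared `monitor` bridge network, so they can reach each other by hostname: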
```yaml
version: '3'
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    hostname: prometheus
    restart: always
    volumes:
      - /data/docker_file/prometheus/data:/prometheus
      - /data/docker_file/prometheus/conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      - monitor
  alertmanager:
    image: prom/alertmanager
    container_name: alertmanager
    hostname: alertmanager
    restart: always
    volumes:
      - /data/docker_file/monitor/conf/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    networks:
      - monitor
  grafana:
    image: grafana/grafana
    container_name: grafana
    hostname: grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - /data/docker_file/grafana/data:/var/lib/grafana
      - /data/docker_file/grafana/log:/var/log/grafana
    ports:
      - "3000:3000"
    networks:
      - monitor
  node-exporter:
    image: prom/node-exporter
    container_name: node-exporter
    hostname: node-exporter
    restart: always
    ports:
      - "9100:9100"
    networks:
      - monitor
networks:
  monitor:
    driver: bridge
```
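Once the stack is running, each service exposes a cheap HTTP endpoint that answers with a 2xx status when it is up: `/-/healthy` on Prometheus and Alertmanager, `/api/health` on Grafana, and `/metrics` on Node Exporter. The sketch polls them with the JDK HTTP client; the `localhost` addresses assume the host port mappings in the script above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Polls the health endpoint of each service in the compose stack. */
final class StackHealthCheck {
    // Addresses assume the host port mappings from the compose script
    static final Map<String, String> ENDPOINTS = new LinkedHashMap<>();
    static {
        ENDPOINTS.put("prometheus", "http://localhost:9090/-/healthy");
        ENDPOINTS.put("alertmanager", "http://localhost:9093/-/healthy");
        ENDPOINTS.put("grafana", "http://localhost:3000/api/health");
        ENDPOINTS.put("node-exporter", "http://localhost:9100/metrics");
    }

    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    /** Returns "UP", "DOWN (<status>)" or "UNREACHABLE (<reason>)"; never throws checked exceptions. */
    static String check(HttpClient client, String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(3))
                .GET()
                .build();
        try {
            int status = client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
            return isSuccess(status) ? "UP" : "DOWN (" + status + ")";
        } catch (IOException e) {
            return "UNREACHABLE (" + e.getClass().getSimpleName() + ")";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "UNREACHABLE (interrupted)";
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        ENDPOINTS.forEach((service, url) -> System.out.println(service + ": " + check(client, url)));
    }
}
```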
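3. Configuring Grafana Dashboards
Grafana pulls data from Prometheus with PromQL queries and renders it in panels; a Grafana dashboard is made up of a set of panels.
3.1 Downloading Grafana Dashboard Files
Ready-made dashboard files can be downloaded from the official Grafana website and imported into our Grafana instance for immediate use.
Recommended Grafana dashboards: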
- JVM (Micrometer)
- Spring Boot 2.1 Statistics
- 主機基礎監控(cpu,內存,磁盤,網絡) (basic host monitoring: CPU, memory, disk, network)
- Node Exporter for Prometheus Dashboard CN
- Druid Connection Pool Dashboard
Importing a Grafana dashboard: on Grafana's Import page, upload the downloaded JSON file or enter the dashboard ID.
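3.2 Adding and Modifying Grafana Panels (Extension)
The ready-made Spring Boot 2.1 Statistics dashboard has no panels for outbound (third-party) HTTP requests. Using it as an example, we add two panels: Client Request Count and Client Response Time.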
Client Request Count
```promql
irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
```
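Note: the meter in the application must be named http.client.requests. Micrometer's Prometheus registry exports this timer as http_client_requests_seconds_count, http_client_requests_seconds_sum and so on, which is what the query above selects.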
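`irate` looks only at the last two samples inside the range window: it divides their difference by the time between them, and if the counter went down (for example after the application restarted) it treats the later value as the whole increase. The stdlib sketch below reproduces that arithmetic for one series in simplified form; it is an illustration, not Prometheus's code, and the sample values are hypothetical.

```java
/** Simplified irate(): per-second rate from the last two samples of a counter. */
final class IrateSketch {
    /**
     * @param timestampsMs ascending sample timestamps in milliseconds, all inside the [5m] window
     * @param values       counter values at those timestamps
     * @return per-second rate, or NaN when fewer than two usable samples exist
     */
    static double irate(long[] timestampsMs, double[] values) {
        if (timestampsMs.length != values.length) {
            throw new IllegalArgumentException("timestamps and values differ in length");
        }
        int n = values.length;
        if (n < 2) {
            return Double.NaN; // PromQL returns no result for this series
        }
        double last = values[n - 1];
        double previous = values[n - 2];
        // A decrease means the counter was reset, so the whole last value counts as the increase
        double increase = last < previous ? last : last - previous;
        double seconds = (timestampsMs[n - 1] - timestampsMs[n - 2]) / 1000.0;
        return seconds > 0 ? increase / seconds : Double.NaN;
    }

    public static void main(String[] args) {
        // Hypothetical http_client_requests_seconds_count samples scraped every 15 s
        long[] t = {0, 15_000, 30_000};
        double[] v = {100, 130, 160};
        System.out.println(irate(t, v)); // prints 2.0 (30 requests over 15 s)
    }
}
```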
Client Response Time
```promql
irate(http_client_requests_seconds_sum{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
  / irate(http_client_requests_seconds_count{instance="$instance", application="$application", uri!~".*actuator.*"}[5m])
```
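The division works because a Micrometer timer exports two counters: `_sum` (total seconds spent) and `_count` (number of completed calls). Both `irate` terms come from the same pair of scrapes, so the time interval cancels and the panel shows the average duration of the requests completed between those scrapes, Δsum / Δcount. A worked example with hypothetical numbers:

```java
/** Average request duration between two scrapes, from a timer's _sum and _count counters. */
final class AverageLatency {
    static double averageSeconds(double sumBefore, double sumAfter, double countBefore, double countAfter) {
        double requests = countAfter - countBefore;
        // No completed requests in the interval: the PromQL division is 0/0 = NaN as well
        return requests > 0 ? (sumAfter - sumBefore) / requests : Double.NaN;
    }

    public static void main(String[] args) {
        // Hypothetical scrapes: 40 more requests took 6 more seconds in total, i.e. 0.15 s each
        double avg = averageSeconds(12.0, 18.0, 200, 240);
        System.out.printf("average = %.0f ms%n", avg * 1000);
    }
}
```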
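4. Integrating Micrometer with Spring Boot
Metrics
Micrometer provides a vendor-neutral interface for timers, gauges, counters, distribution summaries and long task timers. It uses a dimensional data model: combined with a dimensional monitoring system, it gives efficient access to a particular named metric and lets you drill down across its dimensions.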
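"Dimensional" means a series is identified by its name plus a set of key/value tags, and a query can aggregate over any subset of those tags. The toy registry below illustrates that idea with JDK classes only; it is not Micrometer's implementation, and the tag names `uri` and `status` are chosen to resemble the HTTP client metrics used later.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

/** Toy dimensional data model: a counter series is identified by name + tags. Not Micrometer's code. */
final class ToyDimensionalRegistry {
    /** Series identity: metric name plus a sorted copy of its tags, so tag order does not matter. */
    private static final class Id {
        final String name;
        final Map<String, String> tags;

        Id(String name, Map<String, String> tags) {
            this.name = name;
            this.tags = new TreeMap<>(tags);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Id)) {
                return false;
            }
            Id other = (Id) o;
            return name.equals(other.name) && tags.equals(other.tags);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, tags);
        }
    }

    private final Map<Id, DoubleAdder> counters = new ConcurrentHashMap<>();

    /** Increments the counter for exactly this name + tag combination, creating it on first use. */
    void increment(String name, Map<String, String> tags, double amount) {
        counters.computeIfAbsent(new Id(name, tags), id -> new DoubleAdder()).add(amount);
    }

    /** Drill-down: sums every series of this name whose tags contain all the given key/value pairs. */
    double sumWhere(String name, Map<String, String> filter) {
        return counters.entrySet().stream()
                .filter(e -> e.getKey().name.equals(name))
                .filter(e -> e.getKey().tags.entrySet().containsAll(filter.entrySet()))
                .mapToDouble(e -> e.getValue().sum())
                .sum();
    }
}
```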
4.1 Adding Dependencies
```xml
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>${micrometer.version}</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
4.2 Enabling the Prometheus Endpoint
```yaml
spring:
  application:
    name: spring-boot-node
management:
  metrics:
    # 1. Add global tags; they can later be used as variables when querying data
    tags:
      application: ${spring.application.name}
  endpoints:
    web:
      exposure:
        # 2. Expose the prometheus endpoint
        include: 'health,prometheus'
```
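With both settings in place, the application serves Prometheus-format text at `/actuator/prometheus`, and every series carries the `application` tag. (For the dashboards to see it, Prometheus also needs a scrape job whose `metrics_path` is `/actuator/prometheus` and whose target is the application.) The sketch below fetches that page and keeps only the lines of one metric family for one application; it assumes the app listens on Spring Boot's default port 8080 and uses a plain substring match on the tag, which is enough for a quick check but is not a real parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

/** Fetches /actuator/prometheus and keeps the samples of one metric family for one application. */
final class ActuatorScrapeFilter {
    static List<String> samples(String body, String metricPrefix, String application) {
        String tag = "application=\"" + application + "\"";
        return body.lines()
                .filter(line -> !line.startsWith("#"))         // skip HELP/TYPE comment lines
                .filter(line -> line.startsWith(metricPrefix)) // e.g. http_client_requests_seconds
                .filter(line -> line.contains(tag))            // global tag from management.metrics.tags
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/actuator/prometheus")).GET().build();
        String body = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString()).body();
        samples(body, "http_client_requests_seconds", "spring-boot-node").forEach(System.out::println);
    }
}
```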
4.3 Monitoring Third-Party Requests
OkHttpMetricsEventListener makes it easy to monitor the requests sent by an OkHttp client.
Configure the OkHttp client's event listener:
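```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.okhttp3.OkHttpMetricsEventListener;
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.EventListener;
import okhttp3.OkHttpClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.autoconfigure.metrics.MetricsProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class OkHttpClientConfig {
    @Autowired private MeterRegistry meterRegistry;
    @Autowired private MetricsProperties metricsProperties; // bound from management.metrics.*

    @Bean("okHttpClient")
    public OkHttpClient okHttpClient(ConnectionPool connectionPool) { // ConnectionPool bean defined elsewhere
        return new OkHttpClient.Builder().connectionPool(connectionPool)
                .connectTimeout(5, TimeUnit.SECONDS)
                .readTimeout(10, TimeUnit.SECONDS)
                .eventListener(eventListener())
                .build();
    }

    /**
     * Event listener OkHttpMetricsEventListener.
     * metricsProperties.getWeb().getClient().getRequestsMetricName() defaults to "http.client.requests",
     * which becomes the meter (metric) name.
     */
    private EventListener eventListener() {
        return OkHttpMetricsEventListener.builder(
                meterRegistry, metricsProperties.getWeb().getClient().getRequestsMetricName())
                .build();
    }
}
```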
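In the listener source that follows, the `URI_PATTERN` constant names a request header whose value is used for the `uri` tag. Whatever ends up in that tag should be a template rather than the raw path: every distinct tag value creates a separate time series, so IDs embedded in paths can inflate the number of series. The hypothetical helper below collapses numeric and UUID path segments into `{id}`; how to wire it into the listener (for example through a custom URI mapper on the builder, if your Micrometer version offers one) is an assumption to check against the Micrometer Javadoc.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: replaces ID-like path segments with "{id}" to keep the uri tag low-cardinality. */
final class UriTemplates {
    // Purely numeric segments, or canonical 8-4-4-4-12 hex UUIDs
    private static final Pattern ID_SEGMENT = Pattern.compile(
            "\\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    static String normalize(String path) {
        String[] segments = path.split("/", -1); // -1 keeps empty leading/trailing segments
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(ID_SEGMENT.matcher(segments[i]).matches() ? "{id}" : segments[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("/users/42/orders/7")); // prints /users/{id}/orders/{id}
    }
}
```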
How it works: OkHttpMetricsEventListener.java (abridged)
```java
// Abridged: the fields used below (callState, extraTags, urlMapper, registry, requestsMetricName),
// the callStart callback that stores each call's start time, and getStatusMessage are omitted.
public class OkHttpMetricsEventListener extends EventListener {
    /**
     * Header name for URI patterns which will be used for tag values.
     */
    public static final String URI_PATTERN = "URI_PATTERN";

    @Override
    public void callFailed(Call call, IOException e) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.exception = e;
            // The call has completed (with an error): record its metrics
            time(state);
        }
    }

    @Override
    public void responseHeadersEnd(Call call, Response response) {
        CallState state = callState.remove(call);
        if (state != null) {
            state.response = response;
            // The call has completed: record its metrics
            time(state);
        }
    }

    private void time(CallState state) {
        String uri = state.response == null ? "UNKNOWN" :
                (state.response.code() == 404 || state.response.code() == 301 ? "NOT_FOUND" : urlMapper.apply(state.request));
        // Tags (dimensions) that can be used as variables in Prometheus and Grafana
        Iterable<Tag> tags = Tags.concat(extraTags, Tags.of(
                "method", state.request != null ? state.request.method() : "UNKNOWN",
                "uri", uri,
                "status", getStatusMessage(state.response, state.exception),
                "host", state.request != null ? state.request.url().host() : "UNKNOWN"
        ));
        // Register the timer and record the duration; Prometheus can then pull it from the
        // /actuator/prometheus endpoint exposed by Spring Boot Actuator
        Timer.builder(this.requestsMetricName)
                .tags(tags)
                .description("Timer of OkHttp operation")
                .register(registry)
                .record(registry.config().clock().monotonicTime() - state.startTime, TimeUnit.NANOSECONDS);
    }
}
```
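The core mechanism is start/stop bookkeeping: a start time is stored per call when it begins, and whichever of `responseHeadersEnd` or `callFailed` fires first removes that entry and records the elapsed time, so each call is timed at most once. The JDK-only sketch below reproduces that pattern and the `uri` rule from `time()` under hypothetical names: calls are identified by a string ID, results go into a list instead of a Micrometer `Timer`, and the failure status `IO_ERROR` is an illustrative label. It is not Micrometer's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** JDK-only sketch of the listener's start/stop bookkeeping; all names here are hypothetical. */
final class CallTimingSketch {
    /** One finished measurement: tag values plus elapsed nanoseconds. */
    static final class Sample {
        final String uri;
        final String status;
        final long nanos;

        Sample(String uri, String status, long nanos) {
            this.uri = uri;
            this.status = status;
            this.nanos = nanos;
        }
    }

    private final Map<String, Long> startTimes = new ConcurrentHashMap<>(); // callId -> start time
    private final List<Sample> recorded = new ArrayList<>();
    private final LongSupplier clock; // monotonic nanoseconds; injectable so tests control time

    CallTimingSketch(LongSupplier clock) {
        this.clock = clock;
    }

    void callStart(String callId) {
        startTimes.put(callId, clock.getAsLong());
    }

    /** Success path: same uri rule as time() in the excerpt (404 and 301 become NOT_FOUND). */
    void responseHeadersEnd(String callId, int code, String uriPattern) {
        Long start = startTimes.remove(callId); // remove() guarantees each call is timed at most once
        if (start != null) {
            String uri = (code == 404 || code == 301) ? "NOT_FOUND" : uriPattern;
            record(new Sample(uri, String.valueOf(code), clock.getAsLong() - start));
        }
    }

    /** Failure path: there is no response, so the uri is UNKNOWN. */
    void callFailed(String callId) {
        Long start = startTimes.remove(callId);
        if (start != null) {
            record(new Sample("UNKNOWN", "IO_ERROR", clock.getAsLong() - start));
        }
    }

    private synchronized void record(Sample sample) {
        recorded.add(sample);
    }

    synchronized List<Sample> recorded() {
        return new ArrayList<>(recorded);
    }
}
```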