Prometheus (7): Monitoring a Spring Boot Docker Container

Reposted article, originally published 2020-03-20 17:00. Tags: Prometheus, Load Balance

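Main steps

1. Use the Java client library provided by Prometheus to generate the business metrics we care about in the Spring Boot project.
2. Package the Spring Boot project as a Docker image.
3. Deploy the Docker container to a Docker Swarm cluster.
4. Edit the file referenced by Prometheus's file_sd_configs and append the deployed service.
5. Observe the business metrics in Grafana.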

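This article focuses on monitoring business metrics inside a container. For monitoring the containers themselves and for setting up the environment, see [Monitoring Docker Swarm with Prometheus](Prometheus 監控Docker Swarm.md).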

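Prometheus concepts

Data model

All data in Prometheus is time series data. Samples that share the same metric name and the same set of labels belong to one observed quantity. Prometheus stores the values of each such quantity at successive points in time, and this sequence is called a time series. The value at each point in time is called a Sample and consists of a float64 value and a timestamp with millisecond precision.

Each series is uniquely identified by its metric name and its labels. The metric name states which characteristic of the system is being measured (for example, `http_requests_total` is the time series of the total number of requests the system has received). Each key/value pair on a metric is called a label and adds one dimension; for example, `http_requests_total{code="200"}` is the time series of successful requests.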
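
The identity rule above can be sketched in plain Java: two samples land in the same series only when both the metric name and the full label set match. This is a hypothetical in-memory model for illustration (the names `SeriesStore`, `seriesKey`, and `append` are assumptions), not how Prometheus stores data internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the data model; not Prometheus code.
final class SeriesStore {
    // One sample: a float64 value plus a millisecond timestamp.
    static final class Sample {
        final long timestampMs;
        final double value;

        Sample(long timestampMs, double value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final Map<String, List<Sample>> series = new HashMap<>();

    // Series identity = metric name + labels; sorting the labels makes
    // the key independent of the order in which they were supplied.
    static String seriesKey(String name, Map<String, String> labels) {
        return name + new TreeMap<String, String>(labels).toString();
    }

    // Appends a sample to the series identified by name and labels.
    void append(String name, Map<String, String> labels, long timestampMs, double value) {
        series.computeIfAbsent(seriesKey(name, labels), k -> new ArrayList<>())
              .add(new Sample(timestampMs, value));
    }

    int seriesCount() {
        return series.size();
    }

    int sampleCount(String name, Map<String, String> labels) {
        List<Sample> samples = series.get(seriesKey(name, labels));
        return samples == null ? 0 : samples.size();
    }
}
```

Appending two samples with `code` set to `200` and one with `code` set to `500` under the same metric name yields two series, the first holding two samples.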

External applications look up time series with PromQL, and a query can also apply Prometheus's rich set of functions to compute over the values in a series.

Metric types

The Prometheus client jar groups metrics into the following four types:
- Counter
- Gauge
- Histogram
- Summary
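This classification matters only to users of the client library; the Prometheus server does not treat the types differently. To the server, everything a client returns is a Sample of some time series. For example, `http_requests_total{code="200"} 290` means that at the moment the server scraped the metrics, the total number of successful requests was 290. The payload is plain text, so even without the official client library, any endpoint that returns data in this format can be stored by the Prometheus server and queried through PromQL.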
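
To illustrate that the server only needs well-formed text, here is a minimal plain-Java sketch that renders one sample line without any Prometheus library. The class name `ExpositionLine` and its `format` method are hypothetical, and escaping of special characters in label values is deliberately left out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper: renders a single sample in the plain-text format.
final class ExpositionLine {
    static String format(String name, Map<String, String> labels, double value) {
        // Sort labels so the output is stable; label values are quoted.
        // Assumption: values contain no quotes, backslashes or newlines.
        String labelPart = new TreeMap<String, String>(labels).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        String series = labelPart.isEmpty() ? name : name + "{" + labelPart + "}";
        // The value is written as a floating-point number after one space.
        return series + " " + Double.toString(value);
    }
}
```

With the label `code` set to `200` and a value of 290, `format` returns `http_requests_total{code="200"} 290.0`.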
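1. Counter
  
  A Counter holds a single numeric value. Apart from being reset when the service starts, the value can only be increased, never decreased. It is suited to counting requests, task executions, operations on key business objects, and so on.
2. Gauge
  
  A Gauge also holds a single numeric value, but unlike a Counter it can be increased and decreased arbitrarily. It is typically used for values such as a temperature or the number of jobs currently running.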
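3. Histogram
  
  A Histogram no longer simply adds to or subtracts from a single value. For every observed sample it performs three operations:
  - Using the bucket boundaries given when the Histogram was defined, it assigns the sample to buckets; each bucket holds the number of samples that fell within its range
  - It adds the sample value to a running total (sum)
  - It increments the number of observations (count)
  For example, suppose we use the Prometheus client to define a histogram with `Histogram.build().name("job_execute_time").help("job執行時間時間分布（分）").buckets(1,5,10).register();` in order to track the distribution of job execution times (the help text means "job execution time distribution, in minutes"). The buckets (1, 5, 10) yield four cumulative ranges:
  - <= 1 minute
  - <= 5 minutes
  - <= 10 minutes
  - <= +Inf (every observation)
  The Histogram then exposes the following six time series: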
```
job_execute_time_bucket{le="1.0",} 
job_execute_time_bucket{le="5.0",} 
job_execute_time_bucket{le="10.0",} 
job_execute_time_bucket{le="+Inf",}
job_execute_time_count 
job_execute_time_sum 
```
  If one job takes 5.6 minutes to run, the series become:
```
job_execute_time_bucket{le="1.0",} 0.0
job_execute_time_bucket{le="5.0",} 0.0
job_execute_time_bucket{le="10.0",} 1.0
job_execute_time_bucket{le="+Inf",} 1.0
job_execute_time_count 1.0
job_execute_time_sum  5.6
```
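  The `+Inf` bucket always equals `job_execute_time_count`.
  
  As the output shows, a Histogram does not keep the individual sample values; each bucket is just a counter of how many samples fell at or below its upper bound.
4. Summary
  
  A Summary is similar to a Histogram, but instead of counting samples in predefined buckets it reports quantiles (for example the median or the 90th percentile) that the client computes itself, alongside the same sum and count. For the detailed differences, see "Histograms and summaries" in the Prometheus documentation; my math is not great, and I found it painful to work through.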
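
The three Histogram update steps described above can be sketched in plain Java. This illustrates the semantics only and is not how the client library is implemented; `CumulativeHistogram` is a hypothetical name, and the bounds are assumed to be passed in ascending order.

```java
import java.util.Arrays;

// Hypothetical sketch of cumulative histogram semantics.
final class CumulativeHistogram {
    private final double[] upperBounds; // ascending, with +Inf appended last
    private final long[] bucketCounts;  // cumulative: samples <= upper bound
    private long count;
    private double sum;

    CumulativeHistogram(double... bounds) {
        upperBounds = Arrays.copyOf(bounds, bounds.length + 1);
        upperBounds[bounds.length] = Double.POSITIVE_INFINITY;
        bucketCounts = new long[upperBounds.length];
    }

    void observe(double value) {
        // Step 1: increment every bucket whose upper bound covers the value.
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                bucketCounts[i]++;
            }
        }
        // Step 2: accumulate the sum of observed values.
        sum += value;
        // Step 3: accumulate the number of observations.
        count++;
    }

    long[] buckets() {
        return bucketCounts.clone();
    }

    long count() {
        return count;
    }

    double sum() {
        return sum;
    }
}
```

Calling `observe(5.6)` on `new CumulativeHistogram(1, 5, 10)` leaves the bucket counts at 0, 0, 1, 1 with a count of 1 and a sum of 5.6, which matches the example in the Histogram item.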

Building the Spring Boot project (2.x)

Add the Prometheus-related dependencies to pom.xml:

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
	<groupId>io.micrometer</groupId>
	<artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
	<groupId>io.micrometer</groupId>
	<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<dependency>
	<groupId>io.prometheus</groupId>
	<artifactId>simpleclient</artifactId>
	<version>0.8.1</version>
</dependency>

Enable the Prometheus-related metrics in application.yml:

spring:
  application:
    name: sbprometheus
server:
  port: 8080

management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoint:
    metrics:
      enabled: true
    prometheus:
      enabled: true
  endpoints:
    web:
      exposure:
        include: ["prometheus","health"]

By default, the Prometheus endpoint is /actuator/prometheus.
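
As a quick way to check the endpoint from code, the following sketch builds the scrape URL and fetches it with the JDK's built-in HTTP client (Java 11 or later). The class and method names are hypothetical, and the host and port are whatever your application actually listens on.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for manually checking the actuator endpoint.
final class ScrapeCheck {
    // Builds the default Spring Boot actuator URL for Prometheus metrics.
    static URI scrapeUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/actuator/prometheus");
    }

    // Fetches the plain-text metrics; requires the application to be running.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

For the configuration above, `ScrapeCheck.fetch(ScrapeCheck.scrapeUri("localhost", 8080))` would return the same text that Prometheus scrapes.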

Define the metrics the business needs


/**
 * 
 */
package chengf.falcon.sb.prometheus;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.Histogram;
import io.prometheus.client.Summary;

/**
 * @author: chengaofeng
 * @date: created 2020-03-20 12:04:20
 * @Description: registers the custom business metrics with the Prometheus registry
 * @version V1.0
 */
@Configuration
public class MetricConfig {

	@Autowired
	PrometheusMeterRegistry registry;

	@Value("${spring.application.name}")
	String appName;

	// Counter: total number of operations, partitioned by the className label
	@Bean
	public Counter operatorCount() {
		return Counter.build().name(appName + "_class_operator_count").help("操作總次數").labelNames("className")
				.register(registry.getPrometheusRegistry());
	}

	// Gauge: number of jobs currently running; the value can go up and down
	@Bean
	public Gauge runningJob() {
		return Gauge.build().name(appName + "_running_job_count").help("正在運行的job數")
				.register(registry.getPrometheusRegistry());
	}

	// Histogram: job execution time in minutes, with bucket upper bounds 1, 5 and 10
	@Bean
	public Histogram executeTime() {
		return Histogram.build().name(appName + "_job_execute_time").help("job執行時間時間分布（分）").buckets(1,5,10)
				.register(registry.getPrometheusRegistry());
	}
	
	// Summary: median with 5% allowed error, 90th percentile with 1% allowed error
	@Bean
	public Summary timeQuantile() {
		return Summary.build().name(appName + "_job_execute_time_quantile").help("job執行時間時間分布（分）").quantile(0.5, 0.05).quantile(0.9, 0.01)
				.register(registry.getPrometheusRegistry());
	}

}

Update the metrics in business code (REST URLs simulate the real operations)

/**
 * 
 */
package chengf.falcon.sb.prometheus;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;
import io.prometheus.client.Histogram;
import io.prometheus.client.Summary;

/**
 * @author: chengaofeng
 * @date: created 2020-03-03 19:02:43
 * @Description: REST endpoints that update the metrics, simulating real operations
 * @version V1.0
 */
@RestController
public class MetricController {


	
	@Autowired
	Counter operatorCount;
	
	@Autowired
	Gauge runningJob;
	
	@Autowired
	Histogram executeTime;
	
	@Autowired
	Summary timeQuantile;

	@RequestMapping("/counter/{className}")
	public String counter(@PathVariable String className) {
		operatorCount.labels(className).inc();
		return "COUNTER";
	}
	
	@RequestMapping("/guage/{number}")
	public String guage(@PathVariable int number) {
		runningJob.set(number);
		return "guage";
	}
	
	@RequestMapping("/histogram/{time}")
	public String histogram(@PathVariable double time) {
		executeTime.observe(time);
		return "histogram";
	}
	
	@RequestMapping("/summary/{time}")
	public String summary(@PathVariable double time) {
		timeQuantile.observe(time);
		return "summary";
	}
	
	
}

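Start the Spring Boot project, call the REST endpoints above a few times, then open /actuator/prometheus to inspect the metrics. A sample follows (with the metrics that Spring generates automatically removed):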

# HELP sbprometheus_job_execute_time_quantile job執行時間時間分布（分）
# TYPE sbprometheus_job_execute_time_quantile summary
sbprometheus_job_execute_time_quantile{quantile="0.5",} 5.0
sbprometheus_job_execute_time_quantile{quantile="0.9",} 13.0
sbprometheus_job_execute_time_quantile_count 11.0
sbprometheus_job_execute_time_quantile_sum 120.0

# HELP sbprometheus_job_execute_time job執行時間時間分布（分）
# TYPE sbprometheus_job_execute_time histogram
sbprometheus_job_execute_time_bucket{le="1.0",} 2.0
sbprometheus_job_execute_time_bucket{le="5.0",} 3.0
sbprometheus_job_execute_time_bucket{le="10.0",} 3.0
sbprometheus_job_execute_time_bucket{le="+Inf",} 3.0
sbprometheus_job_execute_time_count 3.0
sbprometheus_job_execute_time_sum 6.5

# HELP sbprometheus_class_operator_count 操作總次數
# TYPE sbprometheus_class_operator_count counter
sbprometheus_class_operator_count{className="transform",} 2.0
sbprometheus_class_operator_count{className="sub",} 1.0

# HELP sbprometheus_running_job_count 正在運行的job數
# TYPE sbprometheus_running_job_count gauge
sbprometheus_running_job_count 10.0
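
To read the histogram lines in the sample above, it helps to convert the cumulative bucket counts back into per-range counts and to derive the mean from the sum and count. The sketch below does this in plain Java; `HistogramReadout` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper for interpreting scraped histogram values.
final class HistogramReadout {
    // Converts cumulative bucket counts (ordered by ascending upper bound)
    // into the number of samples that fell into each individual range.
    static long[] perBucket(long[] cumulative) {
        long[] result = new long[cumulative.length];
        long previous = 0;
        for (int i = 0; i < cumulative.length; i++) {
            result[i] = cumulative[i] - previous;
            previous = cumulative[i];
        }
        return result;
    }

    // Mean observed value: sum divided by count (NaN when nothing was observed).
    static double mean(double sum, long count) {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

For the sample output, `perBucket(new long[] {2, 3, 3, 3})` gives 2, 1, 0, 0: two jobs took at most 1 minute and one took between 1 and 5 minutes. `mean(6.5, 3)` is roughly 2.17 minutes.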

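Building the Docker image

Because the pom of the Spring Boot project includes the spring-boot-maven-plugin, running mvn package produces a standalone executable jar. Building the image therefore only requires starting from an OpenJDK image, copying the jar in, and launching it.

Build (from the project directory):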

$ mvn clean package
$ cd target 
$ cat > Dockerfile <<'EOF'
# Base image: OpenJDK on Alpine
FROM openjdk:8u212-jdk-alpine
# Owning team
MAINTAINER chengf

# Copy the built project jar into the image
ENV TARGET_JAR="sb-prometheus-0.0.1-SNAPSHOT.jar"
COPY ${TARGET_JAR} /usr/src/${TARGET_JAR}


# Working directory
WORKDIR /usr/src
# Entry point: write a start script that launches the jar
RUN echo "java -jar ${TARGET_JAR}" > start.sh \
             && chmod 777 start.sh
CMD ./start.sh

EOF

$ docker build -t sb-prometheus:0.0.1 .

Run the image to check that it works:
```
docker run --rm --name sb-prometheus -p 8080:8080 sb-prometheus:0.0.1
```
- Call some of the REST endpoints we defined, as well as /actuator/prometheus, and check that they respond correctly.

Deploying to the Docker Swarm cluster

Write the stack file

$ cd /opt/k8s/prometheus
$ cat> sb-prom-stack.yml<<EOF
version: "3"
services:
  sbprometheus:
    image: sb-prometheus:0.0.1
    networks:
      - mcsas-network
    deploy:
      restart_policy:
        condition: on-failure
  
networks:
  mcsas-network: 
    external: true
EOF

Start the service

$ docker stack deploy -c sb-prom-stack.yml sbprom

Edit the file configured in Prometheus's `file_sd_configs`

In [Monitoring Docker Swarm with Prometheus](Prometheus 監控Docker Swarm.md), we specified the following section in the Prometheus configuration file:

- job_name: 'springboot'
  metrics_path: /actuator/prometheus
  file_sd_configs:
  - files:
      - /etc/prometheus/service.yaml

So all we need to do is create service.yaml in the mounted directory and add the service we want to monitor:

$ cd /opt/k8s/prometheus/conf
$ cat>service.yaml<<EOF
- targets: ['sbprometheus:8080']
EOF
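
If the list of services grows, the target file can also be generated rather than edited by hand. The sketch below renders one target group in the same YAML shape as the file above; `FileSdTargets` is a hypothetical helper, and it assumes that target addresses contain no single quotes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: renders a file_sd target group as YAML text.
final class FileSdTargets {
    static String render(List<String> targets) {
        // Quote each host:port entry and join them into a YAML flow sequence.
        String joined = targets.stream()
                .map(t -> "'" + t + "'")
                .collect(Collectors.joining(", "));
        return "- targets: [" + joined + "]\n";
    }
}
```

The returned string for a single `sbprometheus:8080` entry is identical to the content written by the shell command above, and it could be written to service.yaml with `java.nio.file.Files.writeString`.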

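After the change, open the Prometheus UI on the port exposed by the Prometheus service and browse the metric list; the custom business metrics now appear, which shows that Prometheus is scraping them.

Because the business container does not publish any port, for demonstration purposes we enter the container and call the REST endpoints with wget to generate some metric data.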


$ docker ps |grep sbprometheus
8dbafd80573b        sb-prometheus:0.0.1                               "/bin/sh -c ./start.…"   44 minutes ago      Up 44 minutes                               sbprom_sbprometheus.1.kuzpe4he7j2iz9i43cwrrxh3x
$ docker exec -it 8dbafd80573b sh
/usr/src # wget -q -O - http://localhost:8080/summary/66
/usr/src # wget -q -O - http://localhost:8080/counter/tranform
/usr/src # wget -q -O - http://localhost:8080/counter/sub
/usr/src # wget -q -O - http://localhost:8080/histogram/1
/usr/src # wget -q -O - http://localhost:8080/histogram/3.4