I. Installing Prometheus
1. Download the binary package
    cd /opt
    wget https://github.com/prometheus/prometheus/releases/download/v2.5.0/prometheus-2.5.0.linux-amd64.tar.gz
    tar zxf prometheus-2.5.0.linux-amd64.tar.gz
2. Configuration
1) Startup flags:
    --version
        Show application version.
    --config.file="prometheus.yml"
        Prometheus configuration file path.
    --web.listen-address="0.0.0.0:9090"
        Address to listen on for UI, API, and telemetry.
    --web.read-timeout=5m
        Maximum duration before timing out read of the request, and closing idle connections.
    --web.max-connections=512
        Maximum number of simultaneous connections.
    --web.external-url=<URL>
        The URL under which Prometheus is externally reachable (for example, if Prometheus is served via a reverse proxy). Used for generating relative and absolute links back to Prometheus itself. If the URL has a path portion, it will be used to prefix all HTTP endpoints served by Prometheus. If omitted, relevant URL components will be derived automatically.
    --web.route-prefix=<path>
        Prefix for the internal routes of web endpoints. Defaults to path of --web.external-url.
    --web.user-assets=<path>
        Path to static asset directory, available at /user.
    --web.enable-lifecycle
        Enable shutdown and reload via HTTP request.
    --web.enable-admin-api
        Enables API endpoints for admin control actions.
    --web.console.templates="consoles"
        Path to the console template directory, available at /consoles.
    --web.console.libraries="console_libraries"
        Path to the console library directory.
    --storage.tsdb.path="data/"
        Base path for metrics storage.
    --storage.tsdb.retention=15d
        How long to retain samples in the storage.
    --storage.tsdb.no-lockfile
        Do not create lockfile in data directory.
    --alertmanager.notification-queue-capacity=10000
        The capacity of the queue for pending alert manager notifications.
    --alertmanager.timeout=10s
        Timeout for sending alerts to Alertmanager.
    --query.lookback-delta=5m
        The delta difference allowed for retrieving metrics during expression evaluations.
    --query.timeout=2m
        Maximum time a query may take before being aborted.
    --query.max-concurrency=20
        Maximum number of queries executed concurrently.
    --log.level=info
        Only log messages with the given severity or above. One of: [debug, info, warn, error]
2) Startup script (systemd unit); logs are redirected to /data0/logs/prometheus.log
    # /usr/lib/systemd/system/prometheus.service
    [Unit]
    Description=Prometheus Server
    Documentation=https://prometheus.io/docs/introduction/overview/
    After=network-online.target

    [Service]
    User=root
    ExecStart=/bin/sh -ce "/opt/prometheus-2.5.0.linux-amd64/prometheus --web.enable-admin-api \
      --config.file=/opt/prometheus-2.5.0.linux-amd64/prometheus.yml \
      --web.external-url=http://localhost:9090 --storage.tsdb.path=/data0/prometheus \
      --storage.tsdb.retention=14d >> /data0/logs/prometheus.log 2>&1 "

    [Install]
    WantedBy=multi-user.target
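Once the unit file is in place, systemd has to reload its unit definitions before the service can be enabled and started. The sketch below is one possible way to do that, not a prescribed procedure. It assumes the unit is named prometheus.service and that the two /data0 directories may not exist yet (the shell redirection in ExecStart fails if /data0/logs is missing). The helper names run and start_prometheus_service are hypothetical.

```shell
# Hypothetical helper: with DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create the log and data directories used by the unit, then
# reload systemd, enable the service at boot, and start it.
start_prometheus_service() {
    run mkdir -p /data0/logs /data0/prometheus &&
        run systemctl daemon-reload &&
        run systemctl enable prometheus &&
        run systemctl start prometheus
}
```

Calling start_prometheus_service with DRY_RUN=1 set only prints the four commands, which is a cheap way to review them before running as root.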
3) Prometheus configuration changes: the Alertmanager address moves into the configuration file, and alert rules now use YAML syntax
    rule_files:
      - "alertrules.yml"
      # - "second_rules.yml"

    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - localhost:9093
4) alertrules.yml (the file name must match the entry under rule_files)
    groups:
      - name: all-instance-health
        interval: 30s # defaults to global interval
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Instance {{ $labels.instance }} down"
              description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
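Indentation mistakes in YAML rule files are easy to make, so it can help to validate the files before restarting Prometheus. The sketch below uses promtool, which ships in the same tarball as the prometheus binary. It assumes the rule file is saved as alertrules.yml next to prometheus.yml (a relative rule_files entry is resolved against the configuration file's directory). The PROM_HOME and PROMTOOL variables and the function name are hypothetical.

```shell
# Assumed install directory, taken from the paths used in this article.
PROM_HOME="${PROM_HOME:-/opt/prometheus-2.5.0.linux-amd64}"

# Hypothetical helper: check the rule file first, then the main config.
# PROMTOOL can be overridden, for example to point at another promtool binary.
validate_prometheus_files() {
    promtool="${PROMTOOL:-$PROM_HOME/promtool}"
    $promtool check rules "$PROM_HOME/alertrules.yml" &&
        $promtool check config "$PROM_HOME/prometheus.yml"
}
```

If both checks pass, restarting the service (for example with systemctl restart prometheus) picks up the new rules.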
II. Installing Alertmanager
1. Download the binary package
    cd /opt
    wget https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
    tar zxvf alertmanager-0.22.2.linux-amd64.tar.gz
2. Start Alertmanager
    nohup /opt/alertmanager-0.22.2.linux-amd64/alertmanager --config.file /opt/alertmanager-0.22.2.linux-amd64/simple.yml --web.external-url http://127.0.0.1:9093 &
III. Integrating Consul
1. Add the following scrape configuration to Prometheus
JVM monitoring
    - job_name: jmx_status
      params:
        module:
          - jmx_status
      scrape_interval: 8s
      scrape_timeout: 8s
      metrics_path: /_/metrics
      consul_sd_configs:
        - server: 127.0.0.1:8500
          tag_separator: ','
          services:
            - jmx_status
      relabel_configs:
        - source_labels: ['__address__']
          target_label: instance
          regex: '(.*):.*'
          replacement: $1
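The relabel rule in this job copies the host part of __address__ into the instance label, dropping the port. The effect can be reproduced outside Prometheus with sed. This is a rough sketch, assuming sed's extended regular expressions behave like Prometheus's regex engine for this simple pattern; instance_from_address is a hypothetical helper name.

```shell
# Emulate the relabel rule. Prometheus anchors the regex at both ends,
# so '(.*):.*' is applied here as '^(.*):.*$'.
# Nothing is printed when the regex does not match, mirroring the fact
# that Prometheus skips the replacement in that case.
instance_from_address() {
    printf '%s\n' "$1" | sed -nE 's/^(.*):.*$/\1/p'
}

instance_from_address "upload01:8080"
```

With the address upload01:8080, the captured group is upload01, which is the value the instance label would receive.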
MySQL monitoring
    - job_name: mysql-hr
      metrics_path: /metrics-hr  # HTTP resource path to scrape metrics from on each target; defaults to /metrics
      basic_auth:  # sets the `Authorization` header on every scrape request
        username: admin  # user name and password for the request URL
        password: admin
      tls_config:
        insecure_skip_verify: true  # TLS settings for scrape requests
      consul_sd_configs:  # list of Consul service discovery configurations
        - server: '127.0.0.1:8500'
          datacenter: dc1
          services: ['mysql:metrics']  # discover targets registered in Consul as mysql:metrics
      relabel_configs:
        - target_label: 'job'  # rewrite the job label
          replacement: 'mysql'  # set it to mysql
        - source_labels: ['__meta_consul_tags']  # all tags attached to the service
          regex: '.*,alias_([-\w:\.]+),.*'
          target_label: 'instance'  # rewrite the instance label
          replacement: '$1'  # $1 is the text matched by the regex's capture group
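The second relabel rule in the MySQL job depends on how Prometheus builds __meta_consul_tags: the service's tags are joined with the tag separator (a comma by default) and the result is wrapped in separators on both sides, for example ,mysql,alias_db01:3306,. The sketch below reproduces the extraction with sed. The sample tag values are made up for illustration, the helper name is hypothetical, and the POSIX class [[:alnum:]_] stands in for \w.

```shell
# Emulate the relabel rule on a __meta_consul_tags value.
# Prometheus anchors the regex, so it is applied as '^...$' here.
# Prints the alias if a tag of the form alias_<name> is present,
# and prints nothing otherwise (the rule would then be skipped).
instance_from_tags() {
    printf '%s\n' "$1" | sed -nE 's/^.*,alias_([-[:alnum:]_:.]+),.*$/\1/p'
}

instance_from_tags ",mysql,alias_db01:3306,"
```

A service registered without an alias_ tag does not match, so its instance label keeps its default value.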
2. Register services with Consul
    hostname
    curl -X PUT -d '{"id": "hostname","name": "jmx_status","address": "hostname","port": 8080,"tags":["hostname:8080"],"checks": [{"http": "http://hostname:8080/","interval": "5s"}]}' http://10.221.13.7:8500/v1/agent/service/register
    curl -X PUT -d '{"id": "upload02","name": "jmx_status","address": "upload02","port": 8080,"tags":["upload02:8080"],"checks": [{"http": "http://upload02:8080/","interval": "5s"}]}' http://10.221.13.7:8500/v1/agent/service/register
    curl -X PUT -d '{"id": "upload01","name": "jmx_status","address": "upload01","port": 8080,"tags":["upload01:8080"],"checks": [{"http": "http://upload01:8080/","interval": "5s"}]}' http://10.221.13.7:8500/v1/agent/service/register
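Because a Consul service ID must be unique per agent, writing each registration by hand invites copy-paste mistakes such as reusing an ID. One way to avoid that is to generate the JSON body from the host name. The following is a minimal sketch, assuming every instance uses the host name as both ID and address and listens on port 8080 unless told otherwise; both function names are hypothetical.

```shell
# Hypothetical helper: print the registration body for one jmx_status instance.
# $1 = host name (used as service ID and address), $2 = port (default 8080).
jmx_service_payload() {
    host="$1"
    port="${2:-8080}"
    printf '{"id": "%s", "name": "jmx_status", "address": "%s", "port": %s, "tags": ["%s:%s"], "checks": [{"http": "http://%s:%s/", "interval": "5s"}]}\n' \
        "$host" "$host" "$port" "$host" "$port" "$host" "$port"
}

# Register every host given as an argument with the Consul agent
# used in this article; curl reads the body from stdin via -d @-.
register_jmx_hosts() {
    for h in "$@"; do
        jmx_service_payload "$h" |
            curl -sS -X PUT -d @- http://10.221.13.7:8500/v1/agent/service/register || return 1
    done
}
```

For example, register_jmx_hosts upload01 upload02 would send two requests, each with its own ID.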
3. Deregister a service from Consul
    curl -X PUT http://10.221.13.7:8500/v1/agent/service/deregister/upload01
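The last path segment of the deregister URL is the service ID chosen at registration time, not the service name. When several instances need to be removed, the call can be wrapped in a loop. This is a sketch under the assumption that the same agent address is used for all IDs; CONSUL_ADDR, run, and consul_deregister are hypothetical names.

```shell
# Hypothetical helper: with DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Deregister each service ID passed as an argument.
# CONSUL_ADDR defaults to the agent address used in this article.
consul_deregister() {
    addr="${CONSUL_ADDR:-http://10.221.13.7:8500}"
    for id in "$@"; do
        run curl -sS -X PUT "$addr/v1/agent/service/deregister/$id" || return 1
    done
}
```

Running consul_deregister upload01 upload02 with DRY_RUN=1 set prints the two requests without contacting the agent.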