Prometheus
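1. Not very beginner-friendly: all configuration has to be written by hand.
2. Offers mature solutions for monitoring Docker and Kubernetes.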
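Prometheus is a monitoring system originally built at SoundCloud. It is an open-source project with a very active developer and user community. In 2016 it joined the Cloud Native Computing Foundation (CNCF) as the second hosted project, after Kubernetes.
https://prometheus.io
https://github.com/prometheus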
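Prometheus features
- Multi-dimensional data model: time series are identified by a metric name and key/value pairs (labels)
- PromQL: a flexible query language that uses this dimensionality to express complex queries
- No dependency on distributed storage; a single server node works on its own
- Time series are collected with a pull model over HTTP
- Pushing time series is supported through the Pushgateway component
- Targets are found through service discovery or static configuration
- Multiple modes of graphing and dashboarding are supported (for example, Grafana)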
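Components overview
- Prometheus Server: scrapes metrics, stores the time series data, and provides a query interface
- Client libraries: libraries for instrumenting application code
- Push Gateway: holds metrics for a short time, mainly for short-lived jobs
- Exporters: collect metrics from existing third-party services and expose them as metrics endpoints
- Alertmanager: handles alerts
- Web UI: a simple web console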
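As a rough illustration of where the Push Gateway fits, a short-lived job can push a sample with a plain HTTP request, and Prometheus scrapes it from the gateway later. The sketch below assumes a Pushgateway reachable at 192.168.10.110:9091, which this article does not deploy; the function names and the metric name are hypothetical.

```shell
#!/bin/bash
# Assumed Pushgateway address; this article does not deploy one.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://192.168.10.110:9091}"

# Build the push endpoint for a job: <gateway>/metrics/job/<job_name>
push_url() {
  printf '%s/metrics/job/%s' "$PUSHGATEWAY_URL" "$1"
}

# Push one sample in the text exposition format ("<name> <value>").
# Arguments: job name, metric name, value.
push_metric() {
  printf '%s %s\n' "$2" "$3" | curl -s --data-binary @- "$(push_url "$1")"
}

# Example call from a batch job (hypothetical metric name):
#   push_metric backup_job backup_duration_seconds 42
```

Prometheus would then need a scrape job pointing at the gateway address, in the same way as the other targets in prometheus.yml.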
Prometheus overview
Concepts:
- Instance: an endpoint that can be scraped is called an instance
- Job: a collection of instances with the same purpose is called a job
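Every scraped series carries `job` and `instance` labels, and Prometheus records an `up` series for each target (1 when the last scrape succeeded, 0 otherwise). A minimal sketch of querying it through the HTTP API follows; the server address is an assumption taken from the machine plan, and both helper names are hypothetical.

```shell
#!/bin/bash
# Assumed Prometheus address (first host in the machine plan).
PROM_URL="${PROM_URL:-http://192.168.10.110:9090}"

# Hypothetical helper: PromQL selector for the `up` series of one job.
up_selector() {
  printf 'up{job="%s"}' "$1"
}

# Hypothetical helper: run an instant query against /api/v1/query.
prom_query() {
  curl -s -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Print the selector that would be sent for the "docker" job.
up_selector docker
echo
```

On a host that can reach Prometheus, `prom_query "$(up_selector docker)"` returns one result per instance in that job.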
Machine plan
OS | IP address | Software
CentOS 7.x | 192.168.10.110 | Docker, Prometheus
CentOS 7.x | 192.168.10.113 | Docker, cAdvisor
Basic system preparation
1. Time synchronization
echo "#time sync by fage at 2020-7-22" >>/var/spool/cron/root && \
echo "*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1" >>/var/spool/cron/root && \
systemctl restart crond.service
2. Disable the firewall and SELinux
systemctl stop firewalld && systemctl disable firewalld && setenforce 0 && \
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
3. Install basic packages
yum install -y lrzsz nmap tcpdump screen tree dos2unix nc iproute net-tools unzip wget vim bash-completion.noarch telnet ntp ntpdate lsof curl
4. Switch to a domestic (Aliyun) package mirror
wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
5. Install Docker
yum install -y yum-utils device-mapper-persistent-data lvm2
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum makecache fast
yum install -y docker-ce-19.03.12
mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": ["https://b9pmyelo.mirror.aliyuncs.com"]
}
EOF
systemctl daemon-reload && systemctl enable docker
systemctl start docker && systemctl status docker
docker info
Deploying with Docker places some requirements on the system: it must run CentOS 7.x or later, with Docker already installed.
Contents of prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: "docker"
    static_configs:
      - targets: ['192.168.10.190:8080']

  - job_name: "Linux"
    static_configs:
      - targets: ['192.168.10.190:9100']
Deploying with Docker
Reference: https://prometheus.io/docs/prometheus/latest/installation/
docker run -d \
  -p 9090:9090 \
  --name prometheus \
  -v /tmp/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
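Because the configuration is hand-written, it can help to validate it before starting the server and to confirm that the server is ready afterwards. The sketch below uses `promtool` from the same prom/prometheus image and the `/-/ready` endpoint. The function names are hypothetical, and the default path and URL simply mirror the `docker run` command above.

```shell
#!/bin/bash
# Validate a config file with promtool from the prom/prometheus image.
# Argument: path to the config on the host (default: /tmp/prometheus.yml).
check_config() {
  local config="${1:-/tmp/prometheus.yml}"
  docker run --rm \
    -v "${config}:/etc/prometheus/prometheus.yml:ro" \
    --entrypoint promtool \
    prom/prometheus check config /etc/prometheus/prometheus.yml
}

# Poll the readiness endpoint until Prometheus answers or attempts run out.
# Arguments: base URL (default: http://localhost:9090), attempts (default: 10).
wait_ready() {
  local url="${1:-http://localhost:9090}"
  local tries="${2:-10}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf "${url}/-/ready" >/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A typical sequence would be `check_config /tmp/prometheus.yml`, then the `docker run` command, then `wait_ready && echo "Prometheus is ready"`.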
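Monitoring a Docker host with Prometheus + Grafana
cAdvisor (Container Advisor) collects resource usage and performance information about running containers.
Grafana is an open-source metrics analytics and visualization system.
Templates are available at: https://grafana.com/grafana/download
Run a container. If the image is not present locally, Docker pulls it from the official registry.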
docker run -d -p 80:80 --name nginx nginx
Metrics to monitor:
CPU, memory, network, and service status
View a container's resource usage (live, continuously updated)
docker stats nginx
View a container's resource usage (single snapshot)
docker stats --no-stream nginx
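When only a few columns matter, `docker stats` accepts a `--format` template. A small sketch follows; the wrapper name is hypothetical and the column selection is just one possible choice.

```shell
#!/bin/bash
# Print a one-off table with name, CPU %, memory usage and network I/O.
# Any arguments (container names) are passed through to docker stats.
container_stats() {
  docker stats --no-stream \
    --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}' "$@"
}

# Example:
#   container_stats nginx
```

Called without arguments, it lists all running containers.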
Deploying cAdvisor with Docker
cAdvisor reads container state through the host directories mounted into its container.
Reference: https://github.com/google/cadvisor
docker run -d \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  --privileged \
  --device=/dev/kmsg \
  google/cadvisor:v0.33.0
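Note that cAdvisor has no mechanism for storing data long term, so its metrics need to be scraped into Prometheus for storage.
The container metrics shown on the cAdvisor web page can also be viewed as plain text at: http://192.168.10.113:8080/metrics
Each entry there is one metric sample. Prometheus parses these samples before storing them in its time series database.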
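To get a feel for that text format, the sample lines of a single metric can be filtered out of the `/metrics` output. The following is a minimal sketch: the function name is hypothetical, and the embedded sample data is made up for illustration rather than taken from a real host.

```shell
#!/bin/bash
# Keep only the sample lines of one metric, skipping "# HELP" / "# TYPE"
# comments and other metrics that merely share the same prefix.
metric_samples() {
  grep -E "^$1(\{| )"
}

# Illustrative input in the Prometheus text format (values are invented).
metric_samples container_memory_usage_bytes <<'EOF'
# HELP container_memory_usage_bytes Current memory usage in bytes.
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{name="nginx"} 1024
container_memory_usage_bytes{name="cadvisor"} 2048
container_memory_max_usage_bytes{name="nginx"} 4096
EOF
```

Against the real endpoint the same filter would be used as `curl -s http://192.168.10.113:8080/metrics | metric_samples container_memory_usage_bytes`.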
Edit the Prometheus configuration file so that Prometheus scrapes the monitored target
cat /tmp/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: "docker"
    static_configs:
      - targets: ['192.168.10.113:8080']
Restart the container
docker ps -a | grep prometheus
docker restart prometheus
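As an alternative to a full restart, Prometheus re-reads its configuration when it receives SIGHUP, and the targets API shows whether the new job is being scraped. The sketch below assumes the container name `prometheus` and the local port 9090 used in this article; the function names are hypothetical.

```shell
#!/bin/bash
# Ask the running container to reload its configuration via SIGHUP.
# Argument: container name (default: prometheus).
reload_prometheus() {
  docker kill --signal=HUP "${1:-prometheus}"
}

# Count scrape targets by health state ("up", "down", "unknown").
# Argument: Prometheus base URL (default: http://localhost:9090).
target_health() {
  curl -s "${1:-http://localhost:9090}/api/v1/targets" \
    | grep -o '"health":"[a-z]*"' | sort | uniq -c
}

# Example:
#   reload_prometheus && sleep 15 && target_health
```

A target that stays `down` usually points at a wrong address or port in `static_configs`, or at a firewall between the two hosts.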
Use Grafana to visualize the data
Deploying Grafana with Docker
docker run -d \
  --name=grafana \
  -p 3000:3000 \
  grafana/grafana
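First open the Grafana address and port in a browser and log in. Default username: admin, password: admin
After logging in, the home page shows a guided setup: add a data source, create a dashboard, manage users, and install plugins.
Dashboards are available at: https://grafana.com/grafana/download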
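The data source can also be added through Grafana's HTTP API instead of the web UI. This is only a sketch: it assumes Grafana runs on 192.168.10.110:3000 with the default admin/admin credentials and that Prometheus is at 192.168.10.110:9090. The data source name and the function names are hypothetical.

```shell
#!/bin/bash
# Assumed addresses; adjust to wherever Grafana and Prometheus actually run.
GRAFANA_URL="${GRAFANA_URL:-http://192.168.10.110:3000}"
PROM_URL="${PROM_URL:-http://192.168.10.110:9090}"

# Build the JSON body for a Prometheus data source.
# Argument: URL of the Prometheus server.
datasource_json() {
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}' "$1"
}

# Create the data source with basic auth (default credentials assumed).
add_datasource() {
  curl -s -u admin:admin \
    -H 'Content-Type: application/json' \
    -X POST "${GRAFANA_URL}/api/datasources" \
    -d "$(datasource_json "$PROM_URL")"
}

# Example:
#   add_datasource
```

Once the default password has been changed, the `-u admin:admin` part has to be updated accordingly.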
Monitoring the Docker host
Download: https://github.com/prometheus/node_exporter/
Run the following script to install node_exporter
cat node_exporter.sh
#!/bin/bash
# Download and unpack node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
tar xf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter
# Register it as a systemd service
cat <<EOF >/usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd --collector.systemd.unit-whitelist=(docker|kubelet|kube-proxy|flanneld).service
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable node_exporter.service
systemctl start node_exporter.service

Run the script:
bash node_exporter.sh
Check that the service is running and the port is listening. node_exporter uses port 9100 by default.
systemctl status node_exporter.service
ps -ef | grep node_exporter
netstat -ntpul | grep 9100
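A listening port does not yet prove that metrics are being served, so one more check is to fetch the endpoint and look for the exporter's own build-info metric. A minimal sketch with a hypothetical function name, defaulting to the local port 9100:

```shell
#!/bin/bash
# Succeed if the endpoint serves node_exporter metrics.
# Argument: base URL of the exporter (default: http://localhost:9100).
exporter_ok() {
  curl -s "${1:-http://localhost:9100}/metrics" \
    | grep -q '^node_exporter_build_info'
}

# Example:
#   exporter_ok && echo "node_exporter is serving metrics"
```

Running the same check from the Prometheus host, as `exporter_ok http://192.168.10.113:9100`, also confirms that no firewall is blocking the scrape.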
Edit the Prometheus configuration file again to add the new scrape job
cat /tmp/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: "docker"
    static_configs:
      - targets: ['192.168.10.113:8080']

  - job_name: "Linux"
    static_configs:
      - targets: ['192.168.10.113:9100']
Restart the service
docker ps | grep prometheus
docker restart prometheus
Import the Linux dashboard template. The template ID is 9276.
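If no graphs appear:
1. No network traffic is shown: the network interface name may not match. It used to be eth0, but may now be ens33 or similar.
2. An incorrect system time also prevents graphs from rendering.
3. The PromQL expression is wrong. For example, if the network panel has no data, run node_network_receive_bytes_total in the Prometheus web UI to check whether it returns any data.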
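For the first and third points, it is useful to see which interface names node_exporter actually reports. The sketch below extracts the `device` label values from the exporter output. The function name is hypothetical and the sample input is invented for illustration.

```shell
#!/bin/bash
# List the distinct network device names found in node_exporter output
# (reads the /metrics text from standard input).
nic_devices() {
  grep '^node_network_receive_bytes_total{' \
    | sed -E 's/.*device="([^"]*)".*/\1/' \
    | sort -u
}

# Illustrative input (values are invented).
nic_devices <<'EOF'
# TYPE node_network_receive_bytes_total counter
node_network_receive_bytes_total{device="ens33"} 123456
node_network_receive_bytes_total{device="lo"} 789
EOF
```

On the monitored host this would be `curl -s http://192.168.10.113:9100/metrics | nic_devices`. If the dashboard filters on a device name that is not in this list, its panels stay empty.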