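To get the runtime state of Docker containers, users can run the docker stats command, which shows statistics for the containers running on the current host, including CPU utilization, memory usage, total network I/O and total disk I/O.
Besides the command line, users can also query detailed monitoring statistics for a container through the HTTP API that Docker provides.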
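A minimal sketch of both approaches, assuming the Docker daemon listens on its default socket /var/run/docker.sock and that curl 7.40 or later (for `--unix-socket`) is installed. The function names are illustrative only:

```bash
#!/usr/bin/env bash
# One-shot table of CPU %, memory, network I/O and block I/O for running containers.
# --no-stream prints a single sample instead of refreshing continuously.
docker_stats_snapshot() {
  docker stats --no-stream \
    --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}'
}

# Raw JSON statistics for one container from the Docker Engine HTTP API.
# stream=false returns a single JSON document instead of a continuous stream.
docker_api_stats() {
  local container="$1"
  curl --silent --unix-socket /var/run/docker.sock \
    "http://localhost/containers/${container}/stats?stream=false"
}
```

For example, `docker_api_stats cadvisor` prints, as JSON, the same counters that docker stats summarizes, with more detail (per-interface network counters, block I/O per device, and so on).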
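cAdvisor is an open-source tool from Google for displaying and analyzing the runtime state of containers. By running cAdvisor on a host, users can easily collect runtime statistics for the containers on that host and view them as charts.
Running cAdvisor locally is also simple; just run the following command: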
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
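Before choosing a host port for --publish, it can help to check whether something already listens on it, which is the situation the next step runs into with port 8080. A minimal sketch, assuming the iproute2 `ss` tool is installed; `port_listed` and `port_in_use` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Reads `ss -ltn` output on stdin and succeeds if any listening socket
# has a local address ending in ":PORT" (e.g. 0.0.0.0:8080 or [::]:8080).
port_listed() {
  awk -v p=":$1" '
    NR > 1 {
      a = $4   # column 4 is "Local Address:Port"
      if (length(a) >= length(p) && substr(a, length(a) - length(p) + 1) == p) found = 1
    }
    END { exit !found }'
}

# Checks the live TCP listening sockets of this host.
port_in_use() {
  ss -ltn | port_listed "$1"
}
```

Usage: `port_in_use 8080 && echo "port 8080 is already taken"`. Splitting the parsing from the `ss` call keeps the matching logic testable with sample text.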
However, port 8080 on the host was already in use, so the command above was changed to the following:
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=9095:9095 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
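After startup, however, the container showed two ports: 8080 and 9095. The reason is that --publish=9095:9095 forwards host port 9095 to port 9095 inside the container, while cAdvisor inside the container still listens on its default port 8080, which the image also exposes.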
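Because --publish takes HOST_PORT:CONTAINER_PORT, a common alternative (not the approach taken below) is to leave cAdvisor on port 8080 inside the container and map host port 9095 onto it. A sketch, assuming host port 9095 is free; `HOST_PORT` and `start_cadvisor` are illustrative names:

```bash
#!/usr/bin/env bash
# Host port is an assumption; override with HOST_PORT=... before calling.
HOST_PORT="${HOST_PORT:-9095}"

# Map host port $HOST_PORT to port 8080, where cAdvisor listens inside the container.
start_cadvisor() {
  docker run \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --publish="${HOST_PORT}:8080" \
    --detach=true \
    --name=cadvisor \
    google/cadvisor:latest
}
```

Since the name cadvisor is already used by the earlier container, remove it first with `docker rm -f cadvisor`, then call `start_cadvisor`; the web UI should then answer on http://localhost:9095.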
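Logging into the container with the steps below and listing the command's options shows a -port parameter, which the official documentation also describes.
In practice, however, this parameter could not be used.
So the Docker deployment was abandoned in favor of running the binary directly.
Inspecting the command options inside the container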
# docker exec -it cadvisor /bin/sh
/ # cd /usr/bin/
/usr/bin # ./cadvisor --help
Usage of ./cadvisor:
-allow_dynamic_housekeeping
Whether to allow the housekeeping interval to be dynamic (default true)
-alsologtostderr
log to standard error as well as files
-application_metrics_count_limit int
Max number of application metrics to store (per container) (default 100)
-boot_id_file string
Comma-separated list of files to check for boot-id. Use the first one that exists. (default "/proc/sys/kernel/random/boot_id")
-bq_account string
Service account email
-bq_credentials_file string
Credential Key file (pem)
-bq_id string
Client ID
-bq_project_id string
Bigquery project ID
-bq_secret string
Client Secret (default "notasecret")
-collector_cert string
Collector's certificate, exposed to endpoints for certificate based authentication.
-collector_key string
Key for the collector's certificate
-container_hints string
location of the container hints file (default "/etc/cadvisor/container_hints.json")
-containerd string
containerd endpoint (default "unix:///var/run/containerd.sock")
-disable_metrics metrics
comma-separated list of metrics to be disabled. Options are 'disk', 'network', 'tcp', 'udp', 'percpu', 'sched', 'process'. Note: tcp and udp are disabled by default due to high CPU usage. (default process,tcp,udp,sched)
-docker string
docker endpoint (default "unix:///var/run/docker.sock")
-docker-tls
use TLS to connect to docker
-docker-tls-ca string
path to trusted CA (default "ca.pem")
-docker-tls-cert string
path to client certificate (default "cert.pem")
-docker-tls-key string
path to private key (default "key.pem")
-docker_env_metadata_whitelist string
a comma-separated list of environment variable keys that needs to be collected for docker containers
-docker_only
Only report docker containers in addition to root stats
-docker_root string
DEPRECATED: docker root is read from docker info (this is a fallback, default: /var/lib/docker) (default "/var/lib/docker")
-enable_load_reader
Whether to enable cpu load reader
-event_storage_age_limit string
Max length of time for which to store events (per type). Value is a comma separated list of key values, where the keys are event types (e.g.: creation, oom) or "default" and the value is a duration. Default is applied to all non-specified event types (default "default=24h")
-event_storage_event_limit string
Max number of events to store (per type). Value is a comma separated list of key values, where the keys are event types (e.g.: creation, oom) or "default" and the value is an integer. Default is applied to all non-specified event types (default "default=100000")
-global_housekeeping_interval duration
Interval between global housekeepings (default 1m0s)
-housekeeping_interval duration
Interval between container housekeepings (default 1s)
-http_auth_file string
HTTP auth file for the web UI
-http_auth_realm string
HTTP auth realm for the web UI (default "localhost")
-http_digest_file string
HTTP digest file for the web UI
-http_digest_realm string
HTTP digest file for the web UI (default "localhost")
-listen_ip string
IP to listen on, defaults to all IPs
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_cadvisor_usage
Whether to log the usage of the cAdvisor container
-log_dir string
If non-empty, write log files in this directory
-log_file string
If non-empty, use this log file
-logtostderr
log to standard error instead of files
-machine_id_file string
Comma-separated list of files to check for machine-id. Use the first one that exists. (default "/etc/machine-id,/var/lib/dbus/machine-id")
-max_housekeeping_interval duration
Largest interval to allow between container housekeepings (default 1m0s)
-max_procs int
max number of CPUs that can be used simultaneously. Less than 1 for default (number of cores).
-mesos_agent string
Mesos agent address (default "127.0.0.1:5051")
-mesos_agent_timeout duration
Mesos agent timeout (default 10s)
-port int
port to listen (default 8080)
-profiling
Enable profiling via web interface host:port/debug/pprof/
-prometheus_endpoint string
Endpoint to expose Prometheus metrics on (default "/metrics")
-skip_headers
If true, avoid header prefixes in the log messages
-stderrthreshold value
logs at or above this threshold go to stderr (default 2)
-storage_driver driver
Storage driver to use. Data is always cached shortly in memory, this controls where data is pushed besides the local cache. Empty means none. Options are: <empty>, bigquery, elasticsearch, influxdb, kafka, redis, statsd, stdout
-storage_driver_buffer_duration duration
Writes in the storage driver will be buffered for this duration, and committed to the non memory backends as a single transaction (default 1m0s)
-storage_driver_db string
database name (default "cadvisor")
-storage_driver_es_enable_sniffer
ElasticSearch uses a sniffing process to find all nodes of your cluster by default, automatically
-storage_driver_es_host string
ElasticSearch host:port (default "http://localhost:9200")
-storage_driver_es_index string
ElasticSearch index name (default "cadvisor")
-storage_driver_es_type string
ElasticSearch type name (default "stats")
-storage_driver_host string
database host:port (default "localhost:8086")
-storage_driver_influxdb_retention_policy string
retention policy
-storage_driver_kafka_broker_list string
kafka broker(s) csv (default "localhost:9092")
-storage_driver_kafka_ssl_ca string
optional certificate authority file for TLS client authentication
-storage_driver_kafka_ssl_cert string
optional certificate file for TLS client authentication
-storage_driver_kafka_ssl_key string
optional key file for TLS client authentication
-storage_driver_kafka_ssl_verify
verify ssl certificate chain (default true)
-storage_driver_kafka_topic string
kafka topic (default "stats")
-storage_driver_password string
database password (default "root")
-storage_driver_secure
use secure connection with database
-storage_driver_table string
table name (default "stats")
-storage_driver_user string
database username (default "root")
-storage_duration duration
How long to keep data stored (Default: 2min). (default 2m0s)
-store_container_labels
convert container labels and environment variables into labels on prometheus metrics for each container. If flag set to false, then only metrics exported are container name, first alias, and image name (default true)
-v value
log level for V logs
-version
print cAdvisor version and exit
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
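Go programs such as cAdvisor print their --help text to standard error, so pulling out a single flag's entry needs stderr redirected before filtering. A sketch, assuming the container is named cadvisor as above; `flag_entry` and `cadvisor_flag_help` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Filters help text (read on stdin) down to one flag and the description line after it.
# Go's flag package prints each flag as "  -name type" followed by an indented description.
flag_entry() {
  grep -E -A1 -- "^[[:space:]]*-$1([[:space:]]|$)"
}

# The help text goes to stderr, hence 2>&1 before the pipe.
cadvisor_flag_help() {
  docker exec cadvisor /usr/bin/cadvisor --help 2>&1 | flag_entry "$1"
}
```

Usage: `cadvisor_flag_help port` prints the -port entry and its default of 8080.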
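Deploying the binary
mkdir -p /home/cadvisor-0.37.0 && cd /home/cadvisor-0.37.0
wget https://github.com/google/cadvisor/releases/download/v0.37.0/cadvisor
chmod +x cadvisor   # the downloaded file is not executable by default
# Plain local run: ./cadvisor -port=8080 &>>/var/log/cadvisor.log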
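Managing cAdvisor as a systemd service
# chown -R prometheus:prometheus /home/cadvisor-0.37.0
# chmod -R 755 /home/cadvisor-0.37.0   # the service user needs execute permission on the binary, otherwise startup fails with: Failed at step EXEC spawning /home/cadvisor-0.37.0/cadvisor: Permission denied (on SELinux-enforcing hosts the file's security context can cause the same error)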
# vim /usr/lib/systemd/system/cadvisor.service
[Unit]
Description=cadvisor
Documentation=https://github.com/google/cadvisor/tree/master/docs
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/home/cadvisor-0.37.0/cadvisor -port 9096
Restart=on-failure
[Install]
WantedBy=multi-user.target
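After saving the unit file, systemd has to reload its configuration before the service can be started. A minimal sketch, assuming the unit name cadvisor.service from the file above and root privileges; the function names are illustrative:

```bash
#!/usr/bin/env bash
# Reload unit files, then start cAdvisor now and enable it at boot.
install_cadvisor_service() {
  systemctl daemon-reload
  systemctl enable --now cadvisor.service
}

# Show the unit state and its most recent log lines,
# e.g. to spot a "Permission denied" failure at the EXEC step.
check_cadvisor_service() {
  systemctl status cadvisor.service --no-pager
  journalctl -u cadvisor.service -n 20 --no-pager
}
```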
Visit http://localhost:9096 to view the running state of the containers on the current host.
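Besides the web UI, cAdvisor serves a health check at /healthz and Prometheus-format metrics at /metrics (the default value of -prometheus_endpoint). A sketch, assuming the service listens on port 9096 as configured above; `CADVISOR_URL` and the function names are illustrative:

```bash
#!/usr/bin/env bash
CADVISOR_URL="${CADVISOR_URL:-http://localhost:9096}"

# Liveness check; --fail makes curl return non-zero on HTTP errors.
cadvisor_healthy() {
  curl --silent --fail "${CADVISOR_URL}/healthz" >/dev/null
}

# Print the samples of one metric family, skipping the # HELP / # TYPE comment lines.
cadvisor_metric() {
  curl --silent "${CADVISOR_URL}/metrics" | grep -E "^$1[{ ]"
}
```

Usage: `cadvisor_healthy && cadvisor_metric container_memory_usage_bytes`.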
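The table below lists some typical monitoring metrics collected by cAdvisor:
Metric | Type | Meaning |
---|---|---|
container_cpu_load_average_10s | gauge | Average CPU load of the container over the last 10 seconds |
container_cpu_usage_seconds_total | counter | Cumulative CPU time used by the container on each CPU core (seconds) |
container_cpu_system_seconds_total | counter | Cumulative system CPU time (seconds) |
container_cpu_user_seconds_total | counter | Cumulative user CPU time (seconds) |
container_fs_usage_bytes | gauge | Filesystem usage inside the container (bytes) |
container_fs_limit_bytes | gauge | Total filesystem capacity available to the container (bytes) |
container_fs_reads_bytes_total | counter | Cumulative amount of data read by the container (bytes) |
container_fs_writes_bytes_total | counter | Cumulative amount of data written by the container (bytes) |
container_memory_max_usage_bytes | gauge | Maximum memory usage of the container (bytes) |
container_memory_usage_bytes | gauge | Current memory usage of the container (bytes) |
container_spec_memory_limit_bytes | gauge | Memory limit of the container (bytes) |
machine_memory_bytes | gauge | Total memory of the current host (bytes) |
container_network_receive_bytes_total | counter | Cumulative amount of data received over the container's network (bytes) |
container_network_transmit_bytes_total | counter | Cumulative amount of data transmitted over the container's network (bytes) |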
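Counters in the table only ever increase, so their raw values say little on their own; what matters is the per-second increase between two samples. A minimal sketch of that arithmetic with a hypothetical `counter_rate` helper (inside Prometheus, PromQL's rate() does this job and additionally extrapolates to the window edges):

```bash
#!/usr/bin/env bash
# Per-second rate of a counter from two samples: (v2 - v1) / (t2 - t1).
# Arguments: first value, second value, first timestamp (s), second timestamp (s).
# For container_cpu_usage_seconds_total the result is CPU cores in use (1.0 = one full core).
counter_rate() {
  awk -v v1="$1" -v v2="$2" -v t1="$3" -v t2="$4" 'BEGIN {
    a = v1 + 0; b = v2 + 0; s = t1 + 0; e = t2 + 0   # force numeric comparison
    if (e <= s) exit 1                               # timestamps must increase
    # A smaller second value means the counter was reset (e.g. container restart);
    # the second value itself is then the increase since the reset.
    delta = (b >= a) ? b - a : b
    printf "%g\n", delta / (e - s)
  }'
}
```

For example, `counter_rate 100 112.5 0 10` prints 1.25: over those 10 seconds the container used about one and a quarter cores' worth of CPU time per second. The PromQL counterpart is rate(container_cpu_usage_seconds_total[1m]).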
Integrating with Prometheus
Edit /etc/prometheus/prometheus.yml and add cAdvisor as a scrape job under scrape_configs:
  - job_name: cadvisor
    static_configs:
      - targets:
          - localhost:9096
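Before restarting, the edited file can be validated with promtool, which ships with Prometheus; it catches mistakes such as broken YAML indentation. A sketch, assuming promtool is on the PATH and the configuration path used above; `PROM_CONFIG` is an illustrative variable:

```bash
#!/usr/bin/env bash
PROM_CONFIG="${PROM_CONFIG:-/etc/prometheus/prometheus.yml}"

# Validate syntax and structure of the configuration before restarting Prometheus.
validate_prometheus_config() {
  promtool check config "$PROM_CONFIG"
}
```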
Restart the Prometheus service and check the result.
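Whether the new job is being scraped can be checked on the Targets page of the Prometheus web UI or through its HTTP API. A sketch, assuming Prometheus listens on its default port 9090 and the job name cadvisor from the configuration above; the up metric is 1 when the last scrape of a target succeeded:

```bash
#!/usr/bin/env bash
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Instant query through the HTTP API; -G with --data-urlencode puts the encoded query in the URL.
prom_query() {
  curl --silent -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# A healthy setup returns a result with value "1" for the localhost:9096 target.
check_cadvisor_target() {
  prom_query 'up{job="cadvisor"}'
}
```

Once the target is up, the same helper can run queries over the metrics in the table, for example `prom_query 'sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[1m]))'` for per-container CPU usage.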