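The Ceph dashboard is a web interface for viewing the status of a running Ceph cluster and configuring its features. Early Ceph releases relied on third-party dashboard components.
There are many monitoring and visualization options for Ceph, such as Grafana and Kraken. Starting with Luminous, however, Ceph provides a native Dashboard that exposes the basic status information of the cluster. The steps below install the dashboard on Mimic (Nautilus); on Nautilus the ceph-mgr-dashboard package must be installed.
Ceph-Dash is a Ceph monitoring panel written in Python for monitoring the running state of Ceph. It also provides a REST API for accessing the status data.
Advantages:
Easy to deploy, lightweight, flexible (features can be custom-developed)
Disadvantages:
Relatively simple functionality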
1) Enable the dashboard plugin
https://docs.ceph.com/en/mimic/mgr/
https://docs.ceph.com/en/latest/mgr/dashboard/
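https://packages.debian.org/unstable/ceph-mgr-dashboard #version 15 has dependencies that must be resolved separately
Ceph mgr is a multi-plugin (modular) component whose modules can be enabled or disabled individually.
Newer versions require installing the dashboard package, and it must be installed on the mgr node; otherwise the following error is reported:
    The following packages have unmet dependencies:
     ceph-mgr-dashboard : Depends: ceph-mgr (= 15.2.13-1~bpo10+1) but it is not going to be installed
    E: Unable to correct problems, you have held broken packages.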
Deploy the dashboard on ceph-mgr01
    root@ceph-mgr01:~# apt-cache madison ceph-mgr-dashboard
    root@ceph-mgr01:~# apt install ceph-mgr-dashboard
View the Ceph module information on the cluster admin node ceph-deploy
    cephadmin@ceph-deploy:~$ ceph mgr module -h    #show help
    cephadmin@ceph-deploy:~$ ceph mgr module ls    #list all modules
    {
        "always_on_modules": [
            "balancer",
            "crash",
            "devicehealth",
            "orchestrator",
            "pg_autoscaler",
            "progress",
            "rbd_support",
            "status",
            "telemetry",
            "volumes"
        ],
        "enabled_modules": [
            "iostat",
            "nfs",
            "restful"
        ],
        "disabled_modules": [            #modules that are not enabled
            {
                "name": "alerts",
                "can_run": true,
                "error_string": "",
                ......
            },
            ......
            {
                "name": "dashboard",     #module name
                "can_run": true,         #whether it can be enabled
                "error_string": ""
                ......
            }
            ......
Enable the dashboard module on the admin node ceph-deploy
    cephadmin@ceph-deploy:~$ ceph mgr module enable dashboard
    cephadmin@ceph-deploy:~$ ceph mgr module ls | less
    {
        "always_on_modules": [
            "balancer",
            "crash",
            "devicehealth",
            "orchestrator",
            "pg_autoscaler",
            "progress",
            "rbd_support",
            "status",
            "telemetry",
            "volumes"
        ],
        "enabled_modules": [
            "dashboard",                 #the dashboard module is now enabled
            "iostat",
            "nfs",
            "restful"
        ],
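Reading the JSON by eye works for a one-off check, but the same check can be scripted. The following is a minimal sketch, assuming the output of `ceph mgr module ls` has the shape shown above (an "enabled_modules" array of plain strings). The function name `module_enabled` is hypothetical, and the sed-based parsing is a shortcut for hosts without a JSON tool, not a general JSON parser.

```shell
# Hypothetical helper: succeed if module $1 is listed in "enabled_modules".
# Expects the JSON printed by `ceph mgr module ls` on stdin.
module_enabled() {
  # Join the pretty-printed JSON into one line, cut out the contents of the
  # "enabled_modules" array, then look for the quoted module name inside it.
  tr -d '\n' \
    | sed -n 's/.*"enabled_modules": *\[\([^]]*\)\].*/\1/p' \
    | grep -q "\"$1\""
}

# Usage on the admin node (not run here):
#   ceph mgr module ls | module_enabled dashboard && echo "dashboard is enabled"
```

Modules in "always_on_modules" are not reported by this check, since it only inspects the "enabled_modules" array.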
The Ceph dashboard is enabled and configured on the mgr node, and SSL can be switched on or off, as follows:
Run these commands on the cluster admin node ceph-deploy
    #disable ssl
    cephadmin@ceph-deploy:~$ ceph config set mgr mgr/dashboard/ssl false
    #set the dashboard listen address; here it is the address of ceph-mgr01
    cephadmin@ceph-deploy:~$ ceph config set mgr mgr/dashboard/ceph-mgr01/server_addr 172.168.32.102
    #set the dashboard listen port to 9009
    cephadmin@ceph-deploy:~$ ceph config set mgr mgr/dashboard/ceph-mgr01/server_port 9009
    #check the cluster status; the first time the dashboard plugin is enabled it takes a while (a few minutes) before it can be verified on the node where it runs
    cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
      cluster:
        id:     c31ea2e3-47f7-4247-9d12-c0bf8f1dfbfb
        health: HEALTH_OK

      services:
        mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03 (age 19h)
        mgr: ceph-mgr01(active, since 2s), standbys: ceph-mgr02
        mds: 2/2 daemons up, 2 standby
        osd: 16 osds: 16 up (since 32h), 16 in (since 32h)
        rgw: 2 daemons active (2 hosts, 1 zones)

      data:
        volumes: 1/1 healthy
        pools:   9 pools, 241 pgs
        objects: 279 objects, 13 KiB
        usage:   664 MiB used, 799 GiB / 800 GiB avail
        pgs:     241 active+clean
    #check the port and process on ceph-mgr01
    COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
    ceph-mgr 24275 ceph   32u  IPv4 358492      0t0  TCP ceph-mgr01:9009 (LISTEN)
If the Ceph cluster reports this error:
    Module 'dashboard' has failed: error('No socket could be created',)
    #check that the mgr service is running normally; restarting the mgr service once may clear it
Verify access to the dashboard
    #create the password file for login
    cephadmin@ceph-deploy:~/ceph-cluster$ touch dashboard-passwd
    #write the password into the password file
    cephadmin@ceph-deploy:~/ceph-cluster$ echo 123456 > dashboard-passwd
    #set up the user ywx and import the password
    cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials ywx -i dashboard-passwd
    ******************************************************************
    ***          WARNING: this command is deprecated.              ***
    *** Please use the ac-user-* related commands to manage users. ***
    ******************************************************************
    Username and password updated
Command format of the dashboard credentials command
    cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials -h
    #command format
     Monitor commands:
     =================
    dashboard set-login-credentials <username>      Set the login credentials. Password read from -i <file>
    #change the dashboard password of ywx to 123456789
    cephadmin@ceph-deploy:~/ceph-cluster$ echo 123456789 > dashboard-passwd
    cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials ywx -i dashboard-passwd
    ******************************************************************
    ***          WARNING: this command is deprecated.              ***
    *** Please use the ac-user-* related commands to manage users. ***
    ******************************************************************
    Username and password updated
    #logging in again succeeds
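The deprecation warning above points to the ac-user-* commands. A minimal sketch of the equivalent steps follows, assuming your release provides `ceph dashboard ac-user-create` and `ceph dashboard ac-user-set-password` with the `-i <file>` password option (check `ceph dashboard -h` on your cluster first). The two wrapper function names are hypothetical and exist only for illustration.

```shell
# Hypothetical wrappers around the ac-user-* commands (assumed to be
# available in the installed release).

# Create a dashboard user with the built-in "administrator" role,
# reading the password from a file instead of the command line.
create_dashboard_admin() {
  user="$1"; pwfile="$2"
  ceph dashboard ac-user-create "$user" -i "$pwfile" administrator
}

# Change the password of an existing dashboard user.
set_dashboard_password() {
  user="$1"; pwfile="$2"
  ceph dashboard ac-user-set-password "$user" -i "$pwfile"
}

# Usage on the admin node (not run here):
#   create_dashboard_admin ywx dashboard-passwd
#   set_dashboard_password ywx dashboard-passwd
```

Keeping the password in a file, as the original commands do, avoids leaving it in the shell history; remove the file once the user has been created.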
Host information
mon information
Pool information
Ceph RBD information
CephFS information
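To access the dashboard over SSL, a signed certificate must be configured. The certificate can be generated with the ceph command or with the openssl command.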
https://docs.ceph.com/en/latest/mgr/dashboard/
Configure SSL for the dashboard
    #ceph self-signed certificate:
    cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard create-self-signed-cert
    Self-signed certificate created
    #enable ssl
    cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl true
    #check the current dashboard status
    cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr services
    {
        "dashboard": "http://172.168.32.102:9009/"
    }
    #restart the mgr service
    root@ceph-mgr01:~# systemctl restart ceph-mgr@ceph-mgr01
    #check the dashboard status again
    cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr services
    {
        "dashboard": "https://172.168.32.102:8443/"
    }
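Note that the URL changes from port 9009 to 8443 once SSL is on: with SSL enabled the dashboard listens on its SSL port (the `ssl_server_port` setting, 8443 by default) rather than on `server_port`. The sketch below pulls the current URL out of the `ceph mgr services` output so that a script can follow such changes. The function name `dashboard_url` is hypothetical, and the sed expression assumes the JSON shape shown above.

```shell
# Hypothetical helper: print the dashboard URL found in the JSON that
# `ceph mgr services` writes, read from stdin.
dashboard_url() {
  # Join the JSON into one line and capture the value of the "dashboard" key.
  tr -d '\n' | sed -n 's/.*"dashboard": *"\([^"]*\)".*/\1/p'
}

# Usage on the admin node (not run here):
#   url="$(ceph mgr services | dashboard_url)"
#   curl -k -I "$url"    # -k because the certificate is self-signed
```

If nothing is printed, the dashboard module is not currently serving, which is worth checking before debugging the browser side.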
2. Monitor the Ceph nodes with Prometheus
    #Prometheus server
    prometheus-2.23.0.linux-amd64.tar.gz
    #node_exporter
    node_exporter-1.0.1.linu
    #Prometheus server
Deploy Prometheus on ceph-deploy
    #download from the official site
    prometheus-2.23.0.linux-amd64.tar.gz
    #deploy prometheus
    mkdir /apps
    cd /apps
    tar xvf prometheus-2.23.0.linux-amd64.tar.gz
    ln -sv /apps/prometheus-2.23.0.linux-amd64 /apps/prometheus
    #write the systemd unit file for prometheus
    cat > /etc/systemd/system/prometheus.service <<EOF
    [Unit]
    Description=Prometheus Server
    Documentation=https://prometheus.io/docs/introduction/overview/
    After=network.target

    [Service]
    Restart=on-failure
    WorkingDirectory=/apps/prometheus/
    ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml

    [Install]
    WantedBy=multi-user.target
    EOF
    #start prometheus
    systemctl daemon-reload
    systemctl restart prometheus
    systemctl enable prometheus
    #verify prometheus
    root@ceph-deploy:/apps# ps -ef|grep prometheus
    root       16366       1  0 23:31 ?        00:00:00 /apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml
    root@ceph-deploy:/apps# ss -antlp|grep prometheus
    LISTEN    0         20480                    *:9090                   *:*        users:(("prometheus",pid=16366,fd=11))
Verify the Prometheus web interface: 172.168.32.101:9090
    #download from the official site
    node_exporter-1.0.1.linux-amd64.tar.gz
    #deploy node_exporter on all ceph-node hosts
    mkdir /apps
    cd /apps/
    tar xvf node_exporter-1.0.1.linux-amd64.tar.gz
    ln -sv /apps/node_exporter-1.0.1.linux-amd64 /apps/node_exporter
    #write the systemd unit file for node_exporter
    cat > /etc/systemd/system/node-exporter.service << EOF
    [Unit]
    Description=Prometheus Node Exporter
    After=network.target

    [Service]
    ExecStart=/apps/node_exporter/node_exporter

    [Install]
    WantedBy=multi-user.target
    EOF
    #start node_exporter
    systemctl daemon-reload
    systemctl restart node-exporter
    systemctl enable node-exporter
    #verify node-exporter
    root@ceph-node01:/apps# ps -ef|grep node_exporter
    root       24798       1  0 23:40 ?        00:00:00 /apps/node_exporter/node_exporter
    root@ceph-node01:/apps# ss -antlp|grep node_exporter
    LISTEN    0         20480                    *:9100                   *:*        users:(("node_exporter",pid=24798,fd=3))
Verify the node_exporter data of ceph-node01: 172.168.32.107:9100
On ceph-deploy, add the ceph-node hosts to the Prometheus configuration
    root@ceph-deploy:/apps/prometheus# pwd
    /apps/prometheus
    root@ceph-deploy:/apps/prometheus# cat prometheus.yml
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
        - targets: ['localhost:9090']

      - job_name: 'ceph-node-data'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.
        #add the ceph-node hosts to monitor
        static_configs:
        - targets: ['172.168.32.107:9100','172.168.32.108:9100','172.168.32.109:9100','172.168.32.110:9100']
    #restart prometheus
    root@ceph-deploy:/apps/prometheus# systemctl restart prometheus.service
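4) Monitor the Ceph services with Prometheus
The Ceph manager includes a built-in prometheus monitoring module that listens on port 9283 of each manager node. Through this port the collected metrics are served to Prometheus over an HTTP interface.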
https://docs.ceph.com/en/mimic/mgr/prometheus/?highlight=prometheus
    #enable the prometheus monitoring module from ceph-deploy; port 9283 is then opened on the mgr
    root@ceph-deploy:/apps/prometheus# ceph mgr module enable prometheus
    #check port 9283 on ceph-mgr01
    root@ceph-mgr01:~# ss -antlp|grep 9283
    LISTEN   0        5          172.168.32.102:9283          0.0.0.0:*      users:(("ceph-mgr",pid=748,fd=35))
Verify the mgr metrics endpoint: 172.168.32.102:9283
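Besides opening the address in a browser, the endpoint can be checked from the command line. The sketch below reads the exporter's text output and reports the `ceph_health_status` metric, which the mgr prometheus module exposes as 0 for HEALTH_OK, 1 for HEALTH_WARN and 2 for HEALTH_ERR. The function names are hypothetical, and the `/metrics` path is assumed to be the default one.

```shell
# Hypothetical helper: print the value of ceph_health_status from
# Prometheus text-format metrics read on stdin.
ceph_health_value() {
  # Comment lines start with "#", so only the sample line has the metric
  # name as its first field.
  awk '$1 == "ceph_health_status" { print $2; exit }'
}

# Hypothetical helper: map the numeric value to the Ceph health name.
health_name() {
  case "$1" in
    0|0.0) echo HEALTH_OK ;;
    1|1.0) echo HEALTH_WARN ;;
    2|2.0) echo HEALTH_ERR ;;
    *)     echo UNKNOWN ;;
  esac
}

# Usage from any host that can reach the mgr (not run here):
#   v="$(curl -s http://172.168.32.102:9283/metrics | ceph_health_value)"
#   health_name "$v"
```

An empty value means the endpoint answered without that metric or did not answer at all, so `health_name` prints UNKNOWN in that case.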
6) Configure Prometheus to scrape the data
    root@ceph-deploy:/apps/prometheus# pwd
    /apps/prometheus
    root@ceph-deploy:/apps/prometheus# cat prometheus.yml
    # my global config
    global:
      scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
      evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
      # scrape_timeout is set to the global default (10s).

    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          # - alertmanager:9093

    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      # - "first_rules.yml"
      # - "second_rules.yml"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
        - targets: ['localhost:9090']

      #the ceph-node hosts to monitor
      - job_name: 'ceph-node-data'
        static_configs:
        - targets: ['172.168.32.107:9100','172.168.32.108:9100','172.168.32.109:9100','172.168.32.110:9100']

      #the cluster metrics endpoint on ceph-mgr01
      - job_name: 'ceph-cluster-data'
        static_configs:
        - targets: ['172.168.32.102:9283']
    #restart prometheus
    root@ceph-deploy:/apps/prometheus# systemctl restart prometheus.service
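Every entry in a `targets` list has to be quoted on both sides; a single missing quote makes the YAML invalid and Prometheus will not start with it. One way to avoid hand-typing the list is to generate it. This is a minimal sketch with a hypothetical function name `render_targets`; it only prints the list and does not edit prometheus.yml.

```shell
# Hypothetical helper: print a YAML flow list of quoted 'host:port' targets.
# $1 is the port, the remaining arguments are the hosts.
render_targets() {
  port="$1"; shift
  out=""
  for host in "$@"; do
    # Prefix a comma only when the list already has an entry.
    out="${out:+$out,}'${host}:${port}'"
  done
  printf "[%s]\n" "$out"
}

# Usage (paste the result after "- targets:" in prometheus.yml):
#   render_targets 9100 172.168.32.107 172.168.32.108 172.168.32.109 172.168.32.110
# The promtool binary shipped in the Prometheus tarball can then validate
# the edited file before the restart:
#   /apps/prometheus/promtool check config /apps/prometheus/prometheus.yml
```

Validating before `systemctl restart prometheus.service` matters because a config error stops the service from coming back up.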
3. Display the monitoring data with Grafana
Grafana displays the Ceph cluster monitoring data and the node data.
1) Install Grafana on ceph-deploy
    #download grafana_7.3.6_amd64.deb
    wget https://mirrors.tuna.tsinghua.edu.cn/grafana/debian/pool/main/g/grafana/grafana_7.3.6_amd64.deb
    #install grafana
    dpkg -i grafana_7.3.6_amd64.deb
    #start grafana
    systemctl start grafana-server
    systemctl enable grafana-server
    #verify grafana
    root@ceph-deploy:/tmp# ss -antlp|grep grafana
    LISTEN    0         20480                    *:3000                   *:*        users:(("grafana-server",pid=17039,fd=8))
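Access Grafana at 172.168.32.101:3000
Default credentials: admin/admin. The first login prompts for a password change (123456 is used here).
3) Configure the data source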
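The data source is normally added through the Grafana web interface. As an alternative, Grafana can load data sources from provisioning files at startup. The sketch below writes such a file; the function name is hypothetical, the URL assumes Prometheus runs on ceph-deploy at 172.168.32.101:9090 as set up earlier, and the target directory is assumed to be the default one used by the Debian package.

```shell
# Hypothetical helper: write a Grafana data source provisioning file
# that points at a Prometheus server.
# $1 = Prometheus URL, $2 = destination file.
write_prometheus_datasource() {
  url="$1"; dest="$2"
  cat > "$dest" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Usage on ceph-deploy (not run here; the path is an assumption):
#   write_prometheus_datasource http://172.168.32.101:9090 \
#     /etc/grafana/provisioning/datasources/prometheus.yaml
#   systemctl restart grafana-server
```

With `access: proxy` the Grafana server itself queries Prometheus, so the browser does not need direct access to port 9090.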
4) Import the dashboard templates
https://grafana.com/grafana/dashboards/5336 #ceph OSD
Import the other templates the same way:
https://grafana.com/grafana/dashboards/5342 #ceph pools
https://grafana.com/grafana/dashboards/7056 #ceph cluster
https://grafana.com/grafana/dashboards/2842