5. Ceph Dashboard and Monitoring


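The Ceph dashboard is a web interface for viewing the status of a running Ceph cluster and configuring its features. Early Ceph releases relied on third-party dashboard components.

There are many options for visualizing Ceph monitoring data, such as Grafana and Kraken. Starting with Luminous, however, Ceph provides a native dashboard that exposes the basic status information of the cluster. The steps below cover installing the dashboard on Mimic and Nautilus; on Nautilus, the ceph-mgr-dashboard package must be installed.

Ceph-Dash is a Ceph monitoring panel written in Python that tracks the running state of a Ceph cluster. It also provides a REST API for accessing the status data.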

http://cephdash.crapworks.de/

Advantages:

Easy to deploy
Lightweight
Flexible (features can be custom-developed)

Disadvantages:

Relatively limited functionality

  

1. The Ceph dashboard

1) Enable the dashboard module

https://docs.ceph.com/en/mimic/mgr/

https://docs.ceph.com/en/latest/mgr/dashboard/

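https://packages.debian.org/unstable/ceph-mgr-dashboard # release 15 has dependencies that must be resolved separately

Ceph mgr is a modular (multi-plugin) component whose modules can be enabled or disabled individually.

Newer releases require installing the dashboard package, and it must be installed on the mgr node. Otherwise apt fails with the following error: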

The following packages have unmet dependencies:
ceph-mgr-dashboard : Depends: ceph-mgr (= 15.2.13-1~bpo10+1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Install the dashboard on ceph-mgr01:

root@ceph-mgr01:~# apt-cache madison ceph-mgr-dashboard
root@ceph-mgr01:~# apt install ceph-mgr-dashboard

On the cluster admin host ceph-deploy, inspect the mgr modules:

cephadmin@ceph-deploy:~$ ceph mgr module -h # show help
cephadmin@ceph-deploy:~$ ceph mgr module ls # list all modules
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "iostat",
        "nfs",
        "restful"
    ],
    "disabled_modules": [   # modules that are not enabled
        {
            "name": "alerts",
            "can_run": true,
            "error_string": "",
         ......
         },
         ......
         {
            "name": "dashboard",  # module name
            "can_run": true,      # whether the module can be enabled
            "error_string": ""
         ......
         }
......

On the admin host ceph-deploy, enable the dashboard module:

cephadmin@ceph-deploy:~$ ceph mgr module enable dashboard
cephadmin@ceph-deploy:~$ ceph mgr module ls | less

{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "dashboard",   # the dashboard module is now enabled
        "iostat",
        "nfs",
        "restful"
    ],
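
The check for "dashboard" under enabled_modules can also be scripted. The following is a minimal sketch in Python, assuming the JSON shape shown in the listing (the inline # annotations in the listing are not part of the real output). The function name is_module_enabled and the shortened SAMPLE string are illustrative, not taken from Ceph.

```python
import json


def is_module_enabled(module_ls_json: str, name: str) -> bool:
    """Return True if `name` is listed as enabled or always-on."""
    data = json.loads(module_ls_json)
    active = set(data.get("enabled_modules", []))
    active |= set(data.get("always_on_modules", []))
    return name in active


# Shortened, hand-written sample in the shape printed by `ceph mgr module ls`.
SAMPLE = """
{
    "always_on_modules": ["balancer", "crash"],
    "enabled_modules": ["dashboard", "iostat", "nfs", "restful"],
    "disabled_modules": [
        {"name": "alerts", "can_run": true, "error_string": ""}
    ]
}
"""

if __name__ == "__main__":
    print(is_module_enabled(SAMPLE, "dashboard"))
    print(is_module_enabled(SAMPLE, "alerts"))
```

On a live cluster the JSON string would come from running the ceph command itself; depending on the release, `--format json` may be needed to get JSON output.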

  

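Note: after the module is enabled, the dashboard is not yet reachable. You must first either disable SSL or enable SSL, and specify the listen address.

The Ceph dashboard is served from the mgr node and can run with SSL on or off, as follows.

Run the following on the cluster admin host ceph-deploy:

# Disable SSL
cephadmin@ceph-deploy:~$ ceph config set mgr mgr/dashboard/ssl false
# Set the dashboard listen address; here it is the address of ceph-mgr01
cephadmin@ceph-deploy:~$ ceph config set mgr mgr/dashboard/ceph-mgr01/server_addr 172.168.32.102
# Set the dashboard listen port to 9009
cephadmin@ceph-deploy:~$ ceph config set mgr mgr/dashboard/ceph-mgr01/server_port 9009
# Check the cluster status. The first time the dashboard module is enabled, wait a few minutes before verifying on the node where it runs.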
cephadmin@ceph-deploy:~/ceph-cluster$ ceph -s
  cluster:
    id:     c31ea2e3-47f7-4247-9d12-c0bf8f1dfbfb
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03 (age 19h)
    mgr: ceph-mgr01(active, since 2s), standbys: ceph-mgr02
    mds: 2/2 daemons up, 2 standby
    osd: 16 osds: 16 up (since 32h), 16 in (since 32h)
    rgw: 2 daemons active (2 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   9 pools, 241 pgs
    objects: 279 objects, 13 KiB
    usage:   664 MiB used, 799 GiB / 800 GiB avail
    pgs:     241 active+clean

# On ceph-mgr01, check the listening port and process (lsof output)
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 24275 ceph   32u  IPv4 358492      0t0  TCP ceph-mgr01:9009 (LISTEN)
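
The per-mgr settings above follow a fixed key pattern, mgr/dashboard/<mgr name>/<option>. When several mgr nodes need the same treatment, the command lines can be generated rather than typed. This is only a sketch: the helper name dashboard_http_commands is hypothetical, and it produces the same three `ceph config set` commands used above without running them.

```python
import shlex
from typing import List


def dashboard_http_commands(mgr_name: str, addr: str, port: int) -> List[str]:
    """Build the `ceph config set` commands for a plain-HTTP dashboard."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    prefix = f"mgr/dashboard/{mgr_name}"
    settings = [
        ("mgr/dashboard/ssl", "false"),        # cluster-wide: SSL off
        (f"{prefix}/server_addr", addr),       # per-mgr listen address
        (f"{prefix}/server_port", str(port)),  # per-mgr listen port
    ]
    return [
        shlex.join(["ceph", "config", "set", "mgr", key, value])
        for key, value in settings
    ]


if __name__ == "__main__":
    for command in dashboard_http_commands("ceph-mgr01", "172.168.32.102", 9009):
        print(command)
```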

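If the cluster reports the following error:

Module 'dashboard' has failed: error('No socket could be created',)
# Check whether the mgr service is running properly; restarting the mgr service may clear the error

Verify access to the dashboard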

2) Set the dashboard login account and password

# Create the password file
cephadmin@ceph-deploy:~/ceph-cluster$ touch dashboard-passwd
# Write the password into the file
cephadmin@ceph-deploy:~/ceph-cluster$ echo 123456 > dashboard-passwd
# Set up the user ywx and import the password
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials ywx -i dashboard-passwd
******************************************************************
***          WARNING: this command is deprecated.              ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
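
Creating the password file with echo leaves it readable by other users under a typical umask and puts the password into the shell history. One alternative is to write the file with owner-only permissions. A minimal sketch in Python, assuming a Unix host; the function name write_password_file is hypothetical, and the trailing newline simply mirrors what echo writes.

```python
import os


def write_password_file(path: str, password: str) -> None:
    """Write `password` to `path`, readable and writable by the owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Enforce the mode even when the file already existed with looser bits.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(password + "\n")


if __name__ == "__main__":
    write_password_file("dashboard-passwd", "example-secret")
    print(oct(os.stat("dashboard-passwd").st_mode & 0o777))
```

The resulting file can be passed to the dashboard command with -i exactly as before.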

  

Log in with the account and password.

Command syntax of set-login-credentials:

cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials -h # command syntax
 Monitor commands: 
 =================
dashboard set-login-credentials <username>          Set the login credentials. Password read from -i <file>


# Change the dashboard password of ywx to 123456789
cephadmin@ceph-deploy:~/ceph-cluster$ echo 123456789 > dashboard-passwd 
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard set-login-credentials ywx -i dashboard-passwd 
******************************************************************
***          WARNING: this command is deprecated.              ***
*** Please use the ac-user-* related commands to manage users. ***
******************************************************************
Username and password updated
# Logging in again with the new password succeeds

  

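3) Cluster information shown in the dashboard

Host information

 

Monitor (mon) information

 

Pool information

Ceph RBD information

CephFS information

Object storage

Monitoring object storage requires a monitoring user to be configured; otherwise the dashboard cannot retrieve the object storage status.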

 

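4) SSL configuration for the dashboard

To access the dashboard over SSL, a signed certificate must be configured. It can be generated with the ceph command or with the openssl command.

https://docs.ceph.com/en/latest/mgr/dashboard/

Configure SSL for the dashboard:

# Self-signed certificate generated by Ceph:
cephadmin@ceph-deploy:~/ceph-cluster$ ceph dashboard create-self-signed-cert
Self-signed certificate created
# Enable SSL
cephadmin@ceph-deploy:~/ceph-cluster$ ceph config set mgr mgr/dashboard/ssl true
# Check the current dashboard endpoint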
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr services
{
    "dashboard": "http://172.168.32.102:9009/"
}
# Restart the mgr service
root@ceph-mgr01:~# systemctl restart ceph-mgr@ceph-mgr01
# Check the dashboard endpoint again
cephadmin@ceph-deploy:~/ceph-cluster$ ceph mgr services
{
    "dashboard": "https://172.168.32.102:8443/"
}
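
Whether the switch to HTTPS took effect can be checked by reading the scheme and port out of the `ceph mgr services` output. A small sketch in Python, assuming the JSON shape shown above; the function name dashboard_endpoint is hypothetical.

```python
import json
from urllib.parse import urlsplit


def dashboard_endpoint(mgr_services_json: str):
    """Return (scheme, host, port) of the dashboard URL, or None if absent."""
    services = json.loads(mgr_services_json)
    url = services.get("dashboard")
    if url is None:
        return None
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.port


if __name__ == "__main__":
    before = '{"dashboard": "http://172.168.32.102:9009/"}'
    after = '{"dashboard": "https://172.168.32.102:8443/"}'
    print(dashboard_endpoint(before))
    print(dashboard_endpoint(after))
```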

  

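Access https://172.168.32.102:8443/. With SSL enabled, the dashboard listens on ssl_server_port (8443 by default) rather than on the server_port set earlier.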

 

2. Monitoring Ceph nodes with Prometheus

https://prometheus.io/

# Prometheus server
prometheus-2.23.0.linux-amd64.tar.gz
# node_exporter
node_exporter-1.0.1.linux-amd64.tar.gz

  

1) Deploy Prometheus

Deploy Prometheus on ceph-deploy:

# Download from the official site
prometheus-2.23.0.linux-amd64.tar.gz

# Deploy Prometheus
mkdir -p /apps
cd /apps
tar xvf prometheus-2.23.0.linux-amd64.tar.gz
ln -sv /apps/prometheus-2.23.0.linux-amd64 /apps/prometheus

# Write the systemd unit file for Prometheus
cat > /etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target
[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
EOF

# Start Prometheus
systemctl daemon-reload
systemctl restart prometheus
systemctl enable prometheus

# Verify Prometheus
root@ceph-deploy:/apps# ps -ef|grep prometheus
root     16366     1  0 23:31 ?        00:00:00 /apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml

root@ceph-deploy:/apps# ss -antlp|grep prometheus
LISTEN   0         20480                     *:9090                   *:*        users:(("prometheus",pid=16366,fd=11))     

  

Access Prometheus:

172.168.32.101:9090

 

 

2) Deploy node_exporter on all ceph-node hosts

# Download from the official site
node_exporter-1.0.1.linux-amd64.tar.gz

# Deploy node_exporter on every ceph-node host
mkdir -p /apps
cd /apps/
tar xvf node_exporter-1.0.1.linux-amd64.tar.gz
ln -sv /apps/node_exporter-1.0.1.linux-amd64 /apps/node_exporter

# Write the systemd unit file for node_exporter
cat > /etc/systemd/system/node-exporter.service << EOF
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
ExecStart=/apps/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
EOF

# Start node_exporter
systemctl daemon-reload
systemctl restart node-exporter
systemctl enable node-exporter

# Verify node_exporter
root@ceph-node01:/apps# ps -ef|grep node_exporter
root       24798       1  0 23:40 ?        00:00:00 /apps/node_exporter/node_exporter

root@ceph-node01:/apps# ss -antlp|grep node_exporter
LISTEN   0         20480                      *:9100                   *:*       users:(("node_exporter",pid=24798,fd=3))

Verify the node_exporter data on ceph-node01: 172.168.32.107:9100

 

3) Configure the Ceph nodes to be scraped by the Prometheus server

On ceph-deploy:

root@ceph-deploy:/apps/prometheus# pwd
/apps/prometheus
root@ceph-deploy:/apps/prometheus# cat prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
   
  - job_name: 'ceph-node-data'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    # Add the ceph-node hosts to be monitored
    static_configs:
    - targets: ['172.168.32.107:9100','172.168.32.108:9100','172.168.32.109:9100','172.168.32.110:9100']

# Restart Prometheus
root@ceph-deploy:/apps/prometheus# systemctl restart prometheus.service 
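
Hand-written target lists are easy to mis-quote, and a single missing quote makes prometheus.yml fail to load. One way to avoid that is to generate the job entry from a plain host list. A minimal sketch in Python using only string formatting; the function name render_static_job is hypothetical, and the indentation follows the prometheus.yml shown above.

```python
from typing import Iterable


def render_static_job(job_name: str, hosts: Iterable[str], port: int) -> str:
    """Render one scrape_configs entry with every target individually quoted."""
    targets = ", ".join(f"'{host}:{port}'" for host in hosts)
    return (
        f"  - job_name: '{job_name}'\n"
        f"    static_configs:\n"
        f"    - targets: [{targets}]\n"
    )


if __name__ == "__main__":
    nodes = ["172.168.32.107", "172.168.32.108", "172.168.32.109", "172.168.32.110"]
    print(render_static_job("ceph-node-data", nodes, 9100), end="")
```

The printed fragment is meant to be pasted under scrape_configs. The promtool binary shipped in the Prometheus tarball can validate the resulting file with `promtool check config prometheus.yml` before the service is restarted.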
  

  

4) Monitor the Ceph services through Prometheus

The Ceph manager ships a built-in prometheus module that listens on port 9283 of every manager node and exposes the collected metrics to Prometheus over HTTP.

https://docs.ceph.com/en/mimic/mgr/prometheus/?highlight=prometheus

# On ceph-deploy, enable the prometheus module; this opens port 9283 on the mgr
root@ceph-deploy:/apps/prometheus# ceph mgr module enable prometheus

# On ceph-mgr01, check port 9283
root@ceph-mgr01:~# ss -antlp|grep 9283
LISTEN   0         5             172.168.32.102:9283             0.0.0.0:*       users:(("ceph-mgr",pid=748,fd=35)) 

  

5) Verify the data on the mgr

172.168.32.102:9283
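
The endpoint serves metrics in the Prometheus text exposition format, one sample per line. To inspect the data outside a browser, the text can be parsed with a few lines of Python. This is a simplified sketch that assumes no timestamps follow the values; the function name parse_metrics is hypothetical and the SAMPLE text is hand-written for illustration, not captured from the cluster above.

```python
def parse_metrics(text: str) -> dict:
    """Map each sample (metric name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


# Illustrative sample only.
SAMPLE = """\
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 0.0
ceph_osd_up{ceph_daemon="osd.0"} 1.0
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```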

 

6) Configure Prometheus to scrape the data

On ceph-deploy:

root@ceph-deploy:/apps/prometheus# pwd
/apps/prometheus

root@ceph-deploy:/apps/prometheus# cat prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
    
  # ceph-node hosts to be monitored
  - job_name: 'ceph-node-data'
    static_configs:
    - targets: ['172.168.32.107:9100','172.168.32.108:9100','172.168.32.109:9100','172.168.32.110:9100']
  # Data collection node ceph-mgr01
  - job_name: 'ceph-cluster-data'
    static_configs:
    - targets: ['172.168.32.102:9283']


# Restart Prometheus
root@ceph-deploy:/apps/prometheus# systemctl restart prometheus.service 

  

7) Verify Prometheus monitoring of Ceph
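
Besides the Targets page in the web UI, the scrape status can be checked through the Prometheus HTTP API: the instant query `up` returns 1 for every target that was scraped successfully and 0 otherwise. A minimal sketch in Python using only the standard library; the function names fetch_up and down_targets are hypothetical, and the base URL is the Prometheus address used in this setup.

```python
import json
import urllib.parse
import urllib.request


def fetch_up(prometheus_url: str) -> dict:
    """Run the instant query `up` against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": "up"})
    url = f"{prometheus_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def down_targets(response: dict) -> list:
    """Return (job, instance) pairs whose `up` sample is 0."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query failed")
    down = []
    for sample in response["data"]["result"]:
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            metric = sample["metric"]
            down.append((metric.get("job"), metric.get("instance")))
    return down


if __name__ == "__main__":
    print(down_targets(fetch_up("http://172.168.32.101:9090")))
```

An empty list means that every configured target, including the ceph-node-data and ceph-cluster-data jobs, is being scraped.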

 

 

 

 

 

 

 

3. Displaying the monitoring data with Grafana

Grafana displays both the Ceph cluster metrics and the node metrics.

1) Install Grafana on ceph-deploy

# Download grafana_7.3.6_amd64.deb
wget https://mirrors.tuna.tsinghua.edu.cn/grafana/debian/pool/main/g/grafana/grafana_7.3.6_amd64.deb
# Install Grafana
dpkg -i grafana_7.3.6_amd64.deb
# Start Grafana
systemctl start grafana-server
systemctl enable grafana-server
# Verify Grafana
root@ceph-deploy:/tmp# ss -antlp|grep grafana
LISTEN   0         20480                     *:3000                   *:*        users:(("grafana-server",pid=17039,fd=8))      

  

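2) Log in to Grafana

172.168.32.101:3000

Default credentials: admin/admin. Grafana asks for a new password on first login (123456 is used here).

 

3) Configure the data source

Add Prometheus as a data source in Grafana.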
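
The data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the web form. The sketch below only builds the JSON body; the function name prometheus_datasource_payload is hypothetical, and the URL is the Prometheus address from the earlier steps.

```python
import json


def prometheus_datasource_payload(name: str, url: str) -> str:
    """Build the JSON body for Grafana's POST /api/datasources endpoint."""
    body = {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # the Grafana server, not the browser, queries Prometheus
        "isDefault": True,
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(prometheus_datasource_payload("Prometheus", "http://172.168.32.101:9090"))
```

The body would be sent to http://172.168.32.101:3000/api/datasources with a Content-Type of application/json and the admin credentials.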

 

4) Import dashboard templates

https://grafana.com/grafana/dashboards/5336 # Ceph OSD

Import the other templates the same way:

https://grafana.com/grafana/dashboards/5342 # Ceph pools

https://grafana.com/grafana/dashboards/7056 # Ceph cluster

https://grafana.com/grafana/dashboards/2842

  

