Clients could no longer write to the cluster; I/O was stuck.
Check the cluster state:
ceph health detail
ceph df
ceph osd df
ceph osd dump | grep full_ratio
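On a cluster that still uses the default settings, the last command typically prints values like the following (a sample for reference only; the actual output depends on your configuration):
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85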
A fix found online:
1. Pause reads and writes on the OSDs
ceph osd pause
2. Tell the mons and OSDs to raise the full ratio
ceph tell mon.* injectargs "--mon-osd-full-ratio 0.96"
ceph tell osd.* injectargs "--mon-osd-full-ratio 0.96"
3. Raise the PG full ratio
ceph pg set_full_ratio 0.96 (before Luminous)
ceph osd set-full-ratio 0.96 (Luminous and later)
4. Unpause the OSDs
ceph osd unpause
5. Delete data to free up space
Preferably delete it through Nova or Glance;
it can also be deleted at the Ceph level (see the example after this list).
6. Restore the original settings
ceph tell mon.* injectargs "--mon-osd-full-ratio 0.95"
ceph tell osd.* injectargs "--mon-osd-full-ratio 0.95"
ceph pg set_full_ratio 0.95 (before Luminous)
ceph osd set-full-ratio 0.95 (Luminous and later)
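For step 5, assuming the cluster backs an OpenStack deployment (which the mention of Nova and Glance suggests), unwanted instances and images can be removed with the standard OpenStack CLI; the IDs below are placeholders:
openstack server list
openstack server delete <instance-id>
openstack image list
openstack image delete <image-id>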
Following the steps above on ceph version 15.2.13 (Octopus), the commands returned errors.
The working fix was eventually found in the official documentation:
https://docs.ceph.com/en/latest/rados/operations/health-checks/#pool-near-full
OSD_FULL
One or more OSDs has exceeded the full threshold and is preventing the cluster from servicing writes.
Utilization by pool can be checked with:
ceph df
The currently defined full ratio can be seen with:
ceph osd dump | grep full_ratio
A short-term workaround to restore write availability is to raise the full threshold by a small amount:
ceph osd set-full-ratio <ratio>
New storage should be added to the cluster by deploying more OSDs or existing data should be deleted in order to free up space.
OSD_BACKFILLFULL
One or more OSDs has exceeded the backfillfull threshold, which will prevent data from being allowed to rebalance to this device. This is an early warning that rebalancing may not be able to complete and that the cluster is approaching full.
OSD_NEARFULL
One or more OSDs has exceeded the nearfull threshold. This is an early warning that the cluster is approaching full.
OSDMAP_FLAGS
One or more cluster flags of interest has been set. These flags include:
- full - the cluster is flagged as full and cannot serve writes
- pauserd, pausewr - paused reads or writes
- noup - OSDs are not allowed to start
- nodown - OSD failure reports are being ignored, such that the monitors will not mark OSDs down
- noin - OSDs that were previously marked out will not be marked back in when they start
- noout - down OSDs will not automatically be marked out after the configured interval
- nobackfill, norecover, norebalance - recovery or data rebalancing is suspended
- noscrub, nodeep_scrub - scrubbing is disabled
- notieragent - cache tiering activity is suspended
With the exception of full, these flags can be set or cleared with:
ceph osd set <flag>
ceph osd unset <flag>
POOL_FULL
One or more pools has reached its quota and is no longer allowing writes.
Pool quotas and utilization can be seen with:
ceph df detail
You can either raise the pool quota with:
ceph osd pool set-quota <poolname> max_objects <num-objects>
ceph osd pool set-quota <poolname> max_bytes <num-bytes>
or delete some existing data to reduce utilization.
Pause reads and writes on the OSDs:
ceph osd pause
Set cluster flags so that other tasks running during recovery do not cause additional problems:
ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph osd set-full-ratio 0.96 (do not raise this too far; if usage reaches the new threshold there is no headroom left to adjust)
ceph osd set-backfillfull-ratio 0.92
ceph osd set-nearfull-ratio 0.9
ceph osd dump | grep full_ratio
After the adjustment, Ceph reports HEALTH_OK.
While the OSDs are temporarily writable again, delete unused image data as soon as possible, or add new disks so that the average utilization drops.
cephadm shell -- ceph orch daemon add osd ceph-mon1:/dev/sdd
cephadm shell -- ceph orch daemon add osd ceph-mon2:/dev/sdd
After the two disks were added, utilization rebalanced automatically and dropped.
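If space also needs to be freed at the Ceph layer, unused RBD images can be listed and removed directly; this is only a sketch, the pool and image names are placeholders, and rbd rm cannot be undone:
rbd ls <pool-name>
rbd du -p <pool-name>
rbd rm <pool-name>/<image-name>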
Once the cluster is back to normal, restore the initial values:
ceph osd set-full-ratio 0.95
ceph osd set-backfillfull-ratio 0.90
ceph osd set-nearfull-ratio 0.85
Finally, unpause the OSDs and clear the cluster flags:
ceph osd unpause
ceph osd unset noout
ceph osd unset noscrub
ceph osd unset nodeep-scrub
If an OSD needs to be removed, do it only once Ceph is healthy again, i.e. HEALTH_OK with no warnings.
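For reference, on a cephadm-managed cluster like this one an OSD is usually drained and removed through the orchestrator; <id> is a placeholder, and the operation should only be started from a HEALTH_OK state:
ceph orch osd rm <id>
ceph orch osd rm status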