After a pool was created in a Ceph cluster, the cluster status changed to HEALTH_WARN, with the details shown below.
Check the cluster information
List the pools
[root@serverc ~]# ceph osd pool ls
images    # only one pool in the cluster
[root@serverc ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1       0.13129 root default
-5       0.04376     host serverc
 2   hdd 0.01459         osd.2        up  1.00000 1.00000   # all 9 OSDs are up and in
 3   hdd 0.01459         osd.3        up  1.00000 1.00000
 7   hdd 0.01459         osd.7        up  1.00000 1.00000
-3       0.04376     host serverd
 0   hdd 0.01459         osd.0        up  1.00000 1.00000
 5   hdd 0.01459         osd.5        up  1.00000 1.00000
 6   hdd 0.01459         osd.6        up  1.00000 1.00000
-7       0.04376     host servere
 1   hdd 0.01459         osd.1        up  1.00000 1.00000
 4   hdd 0.01459         osd.4        up  1.00000 1.00000
 8   hdd 0.01459         osd.8        up  1.00000 1.00000
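Before reproducing the problem, it can also help to pull up the exact health message and the per-pool parameters. A minimal check, using only standard ceph subcommands:

ceph health detail        # full text of every active health warning
ceph osd pool ls detail   # per-pool settings, including pg_num, pgp_num and size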
Reproduce the error
[root@serverc ~]# ceph osd pool create images 64 64
[root@serverc ~]# ceph osd pool application enable images rbd
[root@serverc ~]# ceph -s
cluster:
    id:     04b66834-1126-4870-9f32-d9121f1baccd
    health: HEALTH_WARN
            too few PGs per OSD (21 < min 30)

  services:
    mon: 3 daemons, quorum serverc,serverd,servere
    mgr: servere(active), standbys: serverd, serverc
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   1 pools, 64 pgs
    objects: 8 objects, 12418 kB
    usage:   1005 MB used, 133 GB / 134 GB avail
    pgs:     64 active+clean
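To see where the 21-PGs-per-OSD figure comes from, the pool settings can be queried directly. These are standard `ceph osd pool get` calls; the values in the comments are what this cluster should report, given that the pool was created with 64/64 and uses 3 replicas:

ceph osd pool get images pg_num     # pg_num: 64
ceph osd pool get images pgp_num    # pgp_num: 64
ceph osd pool get images size       # size: 3 (replica count)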
[root@serverc ~]# ceph pg dump
dumped all
version 1334
stamp 2019-03-29 22:21:41.795511
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
1.3f 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.871318 0'0 33:41 [7,1,0] 7 [7,1,0] 7 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.3e 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.867341 0'0 33:41 [4,5,7] 4 [4,5,7] 4 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.3d 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.871213 0'0 33:41 [0,3,1] 0 [0,3,1] 0 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.3c 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.859216 0'0 33:41 [5,7,1] 5 [5,7,1] 5 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.3b 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.870865 0'0 33:41 [0,8,7] 0 [0,8,7] 0 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.3a 2 0 0 0 0 19 17 17 active+clean 2019-03-29 22:17:34.858977 33'17 33:117 [4,6,7] 4 [4,6,7] 4 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.39 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.871027 0'0 33:41 [0,3,4] 0 [0,3,4] 0 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.38 1 0 0 0 0 16 1 1 active+clean 2019-03-29 22:17:34.861985 30'1 33:48 [4,2,5] 4 [4,2,5] 4 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.37 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.861667 0'0 33:41 [6,7,1] 6 [6,7,1] 6 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.36 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.860382 0'0 33:41 [6,3,1] 6 [6,3,1] 6 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.35 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.860407 0'0 33:41 [8,6,2] 8 [8,6,2] 8 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.34 0 0 0 0 0 0 2 2 active+clean 2019-03-29 22:17:34.861874 32'2 33:44 [4,3,0] 4 [4,3,0] 4 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.33 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.860929 0'0 33:41 [4,6,2] 4 [4,6,2] 4 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
1.32 0 0 0 0 0 0 0 0 active+clean 2019-03-29 22:17:34.860589 0'0 33:41 [4,2,6] 4 [4,2,6] 4 0'0 2019-03-29 21:55:07.534833 0'0 2019-03-29 21:55:07.534833
…………
1   8 0 0 0 0 12716137 78 78
sum 8 0 0 0 0 12716137 78 78
OSD_STAT USED  AVAIL  TOTAL  HB_PEERS          PG_SUM PRIMARY_PG_SUM
8        119M  15229M 15348M [0,1,2,3,4,5,6,7] 22     6
7        119M  15229M 15348M [0,1,2,3,4,5,6,8] 22     9
6        119M  15229M 15348M [0,1,2,3,4,5,7,8] 23     5
5        107M  15241M 15348M [0,1,2,3,4,6,7,8] 18     7
4        107M  15241M 15348M [0,1,2,3,5,6,7,8] 18     9
3        107M  15241M 15348M [0,1,2,4,5,6,7,8] 23     6
2        107M  15241M 15348M [0,1,3,4,5,6,7,8] 19     6
1        107M  15241M 15348M [0,2,3,4,5,6,7,8] 24     8
0        107M  15241M 15348M [1,2,3,4,5,6,7,8] 23     8
sum      1005M 133G   134G
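The per-OSD PG count (the PG_SUM column above) can also be read more directly from `ceph osd df`, which prints a PGS column for every OSD. This is a standard subcommand in Luminous and later, shown here only as a convenience:

ceph osd df    # the PGS column should show roughly 18-24 PGs per OSD here, all below 30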
The warning says that the number of PGs per OSD (21) is below the minimum of 30. This is because the pool was created with pg_num and pgp_num set to 64; with a 3-replica configuration spread over 9 OSDs, each OSD ends up with roughly 64 × 3 / 9 ≈ 21 PGs, which is below the minimum of 30 and triggers the warning above. The pg dump confirms this: the PG count (PG_SUM) on every OSD is below 30.
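The same arithmetic can be used to pick a pg_num that clears the warning. A minimal sketch in shell, assuming 9 OSDs, 3 replicas and the default warning threshold of 30 PGs per OSD (mon_pg_warn_min_per_osd); the variable names are only for illustration:

osds=9; replicas=3; pg_num=64
echo $(( pg_num * replicas / osds ))   # 21  -> below the warning threshold of 30

pg_num=128                             # next power of two
echo $(( pg_num * replicas / osds ))   # 42  -> above the threshold, warning clears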
If data is written to and read from the cluster while it is in this state, the cluster can appear stuck and fail to respond to I/O, and it can even lead to large numbers of OSDs being marked down.
Solution
Increase the pool's pg_num
[root@serverc ~]# ceph osd pool set images pg_num 128
set pool 1 pg_num to 128
[root@serverc ~]# ceph -s
cluster:
    id:     04b66834-1126-4870-9f32-d9121f1baccd
    health: HEALTH_WARN
            Reduced data availability: 21 pgs peering
            Degraded data redundancy: 21 pgs unclean
            1 pools have pg_num > pgp_num
            too few PGs per OSD (21 < min 30)

  services:
    mon: 3 daemons, quorum serverc,serverd,servere
    mgr: servere(active), standbys: serverd, serverc
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   1 pools, 128 pgs
    objects: 8 objects, 12418 kB
    usage:   1005 MB used, 133 GB / 134 GB avail
    pgs:     50.000% pgs unknown
             16.406% pgs not active
             64 unknown
             43 active+clean
             21 peering
The "too few PGs per OSD" warning is still shown, and a new warning has appeared: 1 pools have pg_num > pgp_num, because only pg_num has been raised so far.
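The mismatch behind the pg_num > pgp_num warning can be confirmed directly from the pool settings (standard `ceph osd pool get` calls):

ceph osd pool get images pg_num     # now 128
ceph osd pool get images pgp_num    # still 64 until it is raised as well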
Next, raise pgp_num to match
[root@serverc ~]# ceph osd pool set images pgp_num 128
set pool 1 pgp_num to 128
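Before looking at the overall cluster status, it may be worth confirming that both values now match; `ceph osd pool ls detail` prints pg_num and pgp_num for every pool (the grep is only a convenience filter):

ceph osd pool ls detail | grep "'images'"   # should show pg_num 128 pgp_num 128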
Check the status
[root@serverc ~]# ceph -s
cluster:
    id:     04b66834-1126-4870-9f32-d9121f1baccd
    health: HEALTH_WARN
            Reduced data availability: 7 pgs peering
            Degraded data redundancy: 24 pgs unclean, 2 pgs degraded

  services:
    mon: 3 daemons, quorum serverc,serverd,servere
    mgr: servere(active), standbys: serverd, serverc
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   1 pools, 128 pgs
    objects: 8 objects, 12418 kB
    usage:   1005 MB used, 133 GB / 134 GB avail
    pgs:     24.219% pgs not active   # PG states: the data is rebalancing (see part 3 of https://www.cnblogs.com/zyxnhr/p/10616497.html for what each state means)
             97 active+clean
             20 activating
             9  peering
             2  activating+degraded

[root@serverc ~]# ceph -s
  cluster:
    id:     04b66834-1126-4870-9f32-d9121f1baccd
    health: HEALTH_WARN
            Reduced data availability: 7 pgs peering
            Degraded data redundancy: 3/24 objects degraded (12.500%), 33 pgs unclean, 4 pgs degraded

  services:
    mon: 3 daemons, quorum serverc,serverd,servere
    mgr: servere(active), standbys: serverd, serverc
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   1 pools, 128 pgs
    objects: 8 objects, 12418 kB
    usage:   1005 MB used, 133 GB / 134 GB avail
    pgs:     35.938% pgs not active
             3/24 objects degraded (12.500%)
             79 active+clean
             34 activating
             9  peering
             3  activating+degraded
             2  active+clean+snaptrim
             1  active+recovery_wait+degraded

  io:
    recovery: 1 B/s, 0 objects/s

[root@serverc ~]# ceph -s
  cluster:
    id:     04b66834-1126-4870-9f32-d9121f1baccd
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum serverc,serverd,servere
    mgr: servere(active), standbys: serverd, serverc
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   1 pools, 128 pgs
    objects: 8 objects, 12418 kB
    usage:   1050 MB used, 133 GB / 134 GB avail
    pgs:     128 active+clean

  io:
    recovery: 1023 kB/s, 0 keys/s, 0 objects/s

[root@serverc ~]# ceph -s
  cluster:
    id:     04b66834-1126-4870-9f32-d9121f1baccd
    health: HEALTH_OK   # rebalancing finished, the cluster state is back to normal

  services:
    mon: 3 daemons, quorum serverc,serverd,servere
    mgr: servere(active), standbys: serverd, serverc
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   1 pools, 128 pgs
    objects: 8 objects, 12418 kB
    usage:   1016 MB used, 133 GB / 134 GB avail
    pgs:     128 active+clean

  io:
    recovery: 778 kB/s, 0 keys/s, 0 objects/s
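Instead of rerunning ceph -s by hand while the PGs peer, activate and recover, the transition can be followed continuously; both commands below are standard:

watch -n 2 ceph -s    # refresh the status summary every 2 seconds
ceph -w               # or stream cluster events until all 128 PGs are active+clean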
Note: this is a lab environment and the pool holds no data, so changing the PG count has little impact here. In a production environment, however, changing pg_num at this point has a much larger impact: when the PG count changes, data across the whole cluster is rebalanced and migrated, and the more data there is, the longer I/O is affected. See https://www.cnblogs.com/zyxnhr/p/10543814.html for a detailed explanation of the PG state values. In production, if the change must not disturb the business, every aspect needs to be considered in advance, for example when recovery is allowed to run and at what time pg_num/pgp_num is changed; one cautious, stepwise approach is sketched below.
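As one possible way to limit the impact in production, the increase can be applied in small steps, waiting for the cluster to settle between steps. A minimal bash sketch, not taken from the original article; the pool name, step size and target are placeholders, and only standard ceph subcommands (osd pool get/set, health) are used:

#!/bin/bash
# Grow pg_num/pgp_num of a pool gradually instead of in one jump.
pool=images        # placeholder pool name
target=128         # desired pg_num/pgp_num
step=32            # how many PGs to add per iteration

current=$(ceph osd pool get "$pool" pg_num | awk '{print $2}')
while [ "$current" -lt "$target" ]; do
    next=$(( current + step ))
    [ "$next" -gt "$target" ] && next=$target
    ceph osd pool set "$pool" pg_num  "$next"
    ceph osd pool set "$pool" pgp_num "$next"
    # wait until the new PGs are created and the cluster is healthy again
    until ceph health | grep -q HEALTH_OK; do
        sleep 30
    done
    current=$next
done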
References: