Handling PG inconsistency errors in a Ceph cluster



1 scrub errors; Possible data damage: 1 pg inconsistent

 HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 1.7fff is active+clean+scrubbing+deep+inconsistent+repair, acting [184,229]

Error summary

  • Problem PG: 1.7fff
  • OSD IDs: 184, 229
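The PG ID and acting OSD set can be pulled out of the `ceph health detail` output automatically. A minimal sketch — the `health_output` variable below is a captured sample standing in for a live `ceph health detail` call:

```shell
#!/bin/sh
# Sample "ceph health detail" output; on a live cluster use:
#   health_output=$(ceph health detail)
health_output='HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 1.7fff is active+clean+scrubbing+deep+inconsistent+repair, acting [184,229]'

# The PG ID is the 2nd field of the "pg ..." line
pg_id=$(printf '%s\n' "$health_output" | awk '/^[[:space:]]*pg /{print $2}')

# The acting OSDs sit inside the trailing "[...]"
acting=$(printf '%s\n' "$health_output" | sed -n 's/.*acting \[\(.*\)\].*/\1/p')

echo "pg=$pg_id acting=$acting"
```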

Repair steps

  1. Run the standard repair

    ceph pg repair 1.7fff

  2. Check the repair result

    ceph health detail

    HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
    OSD_SCRUB_ERRORS 1 scrub errors
    PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 1.7fff is active+clean+scrubbing+deep+inconsistent+repair, acting [184,229]

    The error is still reported — `ceph pg repair` runs asynchronously, so it takes a while to clear.

  3. Watch the cluster activity

    ceph -w

    2020-09-05 09:13:25.818257 osd.184 [ERR] 1.7fff repair : stat mismatch, got 9855/9856 objects, 0/0 clones, 9855/9856 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 41285080957/41289275261 bytes, 0/0 hit_set_archive bytes.
    2020-09-05 09:13:25.818757 osd.184 [ERR] 1.7fff repair 1 errors, 1 fixed
    2020-09-05 09:13:31.318617 mon.cb-mon-38 [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 1 scrub errors)
    2020-09-05 09:13:31.321338 mon.cb-mon-38 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
    2020-09-05 09:13:31.321983 mon.cb-mon-38 [INF] Cluster is now healthy
    2020-09-05 10:00:00.001158 mon.cb-mon-38 [INF] overall HEALTH_OK
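Instead of watching the `ceph -w` stream by hand, the wait for `HEALTH_OK` can be scripted as a simple polling loop. A hedged sketch — the `ceph` function below is a stub so the example runs standalone; on a real cluster, delete it so the real CLI is used:

```shell
#!/bin/sh
# Stub standing in for the real ceph CLI so this sketch is runnable;
# remove it on a live cluster.
ceph() { echo "HEALTH_OK"; }

# Poll cluster health until HEALTH_OK is reported.
wait_healthy() {
    until ceph health | grep -q HEALTH_OK; do
        sleep 10   # re-check every 10 seconds
    done
    echo "cluster healthy"
}

wait_healthy
```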
    

Other repair approaches

1. Scrub the PG, then run the repair:

ceph pg scrub 1.7fff
ceph pg deep-scrub 1.7fff
ceph pg repair 1.7fff
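When several PGs are flagged inconsistent at once, the same repair can be driven for each of them by parsing `ceph health detail`. A sketch that only prints the commands it would run — pipe the result into `sh` on a live cluster to actually execute them:

```shell
#!/bin/sh
# Emit one "ceph pg repair <pgid>" per inconsistent PG found on stdin.
repair_commands() {
    awk '/^[[:space:]]*pg .*inconsistent/{print "ceph pg repair", $2}'
}

# Demo with the sample line from this incident; on a live cluster:
#   ceph health detail | repair_commands | sh
printf '%s\n' '    pg 1.7fff is active+clean+inconsistent, acting [184,229]' \
  | repair_commands
```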

2. Repair the associated OSDs

ceph osd repair 184
ceph osd repair 229

3. Stop the PG's primary OSD

  • Find the PG's primary OSD
root@manager1:~# ceph pg 1.7fff query|grep primary
            "same_primary_since": 1070,
                "num_objects_missing_on_primary": 0,
            "up_primary": 184,
            "acting_primary": 184
                "same_primary_since": 0,
                    "num_objects_missing_on_primary": 0,
                "up_primary": -1,
                "acting_primary": -1
  • Find the host that OSD lives on

    root@manager1:~# ceph osd tree|grep -B25 184
    -41        218.29431     host cc-d-19
     19   hdd    9.09560         osd.19       up  1.00000 1.00000 
     39   hdd    9.09560         osd.39       up  1.00000 1.00000 
     52   hdd    9.09560         osd.52       up  1.00000 1.00000 
     70   hdd    9.09560         osd.70       up  1.00000 1.00000 
     87   hdd    9.09560         osd.87       up  1.00000 1.00000 
    106   hdd    9.09560         osd.106      up  1.00000 1.00000 
    130   hdd    9.09560         osd.130      up  1.00000 1.00000 
    151   hdd    9.09560         osd.151      up  1.00000 1.00000 
    164   hdd    9.09560         osd.164      up  1.00000 1.00000 
    184   hdd    9.09560         osd.184      up  1.00000 1.00000 
    
  • Stop the corresponding OSD service (data recovery will be slow and will also affect cluster performance)

  systemctl stop ceph-osd@184
  • Once recovery completes, repair the PG again:

    ceph pg repair 1.7fff
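The primary-OSD lookup in step 3 can also be scripted by parsing the `ceph pg <pgid> query` JSON. A minimal sketch — the `pg_query` variable is a trimmed sample of the output shown above, standing in for a live `ceph pg 1.7fff query` call:

```shell
#!/bin/sh
# Trimmed sample of "ceph pg 1.7fff query" output; live version:
#   pg_query=$(ceph pg 1.7fff query)
pg_query='{
    "up_primary": 184,
    "acting_primary": 184
}'

# Take the first "acting_primary" value (nested entries may hold -1).
primary=$(printf '%s\n' "$pg_query" \
  | sed -n 's/.*"acting_primary": *\(-\{0,1\}[0-9][0-9]*\).*/\1/p' | head -n 1)

echo "primary osd: $primary"
```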

