Error when an old Ceph client mounts a newer Ceph cluster


Problem description

Description

When an older rbd or CephFS client mounts a newer Ceph cluster, the mount fails and the kernel reports missing feature bits such as 1000000000000, 200000000000000 or 400000000000000.

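Pain point: the client is built into the Linux kernel, so it is updated far less often than the upstream Ceph server.
Yet if the server is not upgraded, some features stay unavailable and some bugs stay unfixed.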

Error log

sudo mount -t ceph 10.10.1.11:/ /mnt/mycephfs -o name=admin,secretfile=/etc/ceph/admin.key
sudo tail /var/log/messages
Fri May  6 22:31:14 MSK 2016
mount error 5 = Input/output error
May  6 22:31:24 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:31:24 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
May  6 22:31:34 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:31:34 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
May  6 22:31:44 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:31:44 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
May  6 22:31:54 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:31:54 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features
May  6 22:32:04 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 feature set mismatch, my 103b84a842aca < server's 40103b84a842aca, missing 400000000000000
May  6 22:32:04 ceph-admin kernel: libceph: mon0 10.10.1.11:6789 missing required protocol features

As I guessed, I need to switch off "require_feature_tunables5" to remove the error messages.

Can somebody tell me how to do that?

Many thanks in advance.
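The "missing" value in the log is simply the set of feature bits the server requires that the client does not advertise, i.e. server & ~client. As a minimal sketch, assuming bash (for its `16#` hex arithmetic), the hypothetical helper `missing_features` recomputes it from the two masks in the log line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: feature bits the server requires but the kernel client
# does not advertise, i.e. missing = server & ~client.
# Arguments are the two hex masks from the libceph log line
# "feature set mismatch, my <client> < server's <server>, missing <bits>".
missing_features() {
  local client=$((16#$1)) server=$((16#$2))
  printf '%x\n' $(( server & ~client ))
}

# Masks copied from the log above; prints 400000000000000
missing_features 103b84a842aca 40103b84a842aca
```

The result matches the kernel's own "missing" value and can be looked up in the feature table that follows.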

Feature bits and kernel versions

How client capabilities must match what the server requires

CEPH_FEATURE table and kernel versions
You can look up the missing feature in the table below:

For example, missing 2040000 means that CEPH_FEATURE_CRUSH_TUNABLES (40000) and CEPH_FEATURE_CRUSH_TUNABLES2 (2000000) are missing on the kernel client.

R: required, S: supported, -X-: the feature first appears in this kernel version, '.': empty cell (not supported).
HEX is the feature mask (1 << BIT) in hexadecimal, the same form the kernel prints after "missing".

Feature                               BIT  HEX              3.8  3.9  3.10 3.14 3.15 3.18 4.1  4.5  4.6
CEPH_FEATURE_NOSRCADDR                1    2                R    R    R    R    R    R    R    R    R
CEPH_FEATURE_SUBSCRIBE2               4    10               .    .    .    .    .    .    .    .    -R-
CEPH_FEATURE_RECONNECT_SEQ            6    40               .    .    -R-  R    R    R    R    R    R
CEPH_FEATURE_PGID64                   9    200              .    R    R    R    R    R    R    R    R
CEPH_FEATURE_PGPOOL3                  11   800              .    R    R    R    R    R    R    R    R
CEPH_FEATURE_OSDENC                   13   2000             .    R    R    R    R    R    R    R    R
CEPH_FEATURE_CRUSH_TUNABLES           18   40000            S    S    S    S    S    S    S    S    S
CEPH_FEATURE_MSG_AUTH                 23   800000           .    .    .    .    .    -S-  S    S    S
CEPH_FEATURE_CRUSH_TUNABLES2          25   2000000          .    S    S    S    S    S    S    S    S
CEPH_FEATURE_REPLY_CREATE_INODE       27   8000000          .    S    S    S    S    S    S    S    S
CEPH_FEATURE_OSDHASHPSPOOL            30   40000000         .    S    S    S    S    S    S    S    S
CEPH_FEATURE_OSD_CACHEPOOL            35   800000000        .    .    .    -S-  S    S    S    S    S
CEPH_FEATURE_CRUSH_V2                 36   1000000000       .    .    .    -S-  S    S    S    S    S
CEPH_FEATURE_EXPORT_PEER              37   2000000000       .    .    .    -S-  S    S    S    S    S
CEPH_FEATURE_OSD_ERASURE_CODES***     38   4000000000       .    .    .    .    .    .    .    .    .
CEPH_FEATURE_OSDMAP_ENC               39   8000000000       .    .    .    .    -S-  S    S    S    S
CEPH_FEATURE_CRUSH_TUNABLES3          41   20000000000      .    .    .    .    -S-  S    S    S    S
CEPH_FEATURE_OSD_PRIMARY_AFFINITY     41*  20000000000      .    .    .    .    -S-  S    S    S    S
CEPH_FEATURE_CRUSH_V4****             48   1000000000000    .    .    .    .    .    .    -S-  S    S
CEPH_FEATURE_CRUSH_TUNABLES5          58   400000000000000  .    .    .    .    .    .    .    -S-  S
CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING  58*  400000000000000  .    .    .    .    .    .    .    -S-  S
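Going from a missing mask back to feature names means testing each bit and looking it up in the BIT column. The sketch below is a hypothetical bash helper, `decode_missing`; its name map is a hand-copied subset of the table (bits 41 and 58 each carry two features, so both names are shown), and anything not in the map prints as "unknown":

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the bits set in a "missing <hex>" mask.
# Names are a hand-copied subset of the CEPH_FEATURE table (BIT column).
declare -A FEATURE_NAME=(
  [18]="CRUSH_TUNABLES"
  [25]="CRUSH_TUNABLES2"
  [36]="CRUSH_V2"
  [41]="CRUSH_TUNABLES3/OSD_PRIMARY_AFFINITY"
  [48]="CRUSH_V4"
  [58]="CRUSH_TUNABLES5/NEW_OSDOPREPLY_ENCODING"
)

decode_missing() {
  local mask=$((16#$1)) bit
  for (( bit = 0; bit < 64; bit++ )); do
    if (( (mask >> bit) & 1 )); then
      echo "bit $bit ${FEATURE_NAME[$bit]:-unknown}"
    fi
  done
}

decode_missing 2040000          # bit 18 CRUSH_TUNABLES, then bit 25 CRUSH_TUNABLES2
decode_missing 1000000000000    # bit 48 CRUSH_V4
decode_missing 400000000000000  # bit 58 CRUSH_TUNABLES5/NEW_OSDOPREPLY_ENCODING
```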

Solution

Description

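The simplest fix is to upgrade the client, but anyone who hits this problem is usually someone who cannot upgrade the client.
The only alternative is to lower what the server requires.
Taking ceph-nautilus 14.2.9 as an example,
first show the current tunables: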

$ ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "hammer",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "jewel",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 0,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 1,
    "has_v5_rules": 0
}

My client kernel is 3.10, so it reported both 1000000000000 (CRUSH_V4) and 400000000000000 (CRUSH_TUNABLES5).
The mount only succeeded after both require_feature_tunables5 and has_v4_buckets were turned off.
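The reasoning "kernel 3.10, therefore 1000000000000 and 400000000000000" can be made mechanical: a feature only matters if the server requires it and the kernel predates the first version marked -S- for it. A rough sketch, assuming bash 4+ and GNU `sort -V`; the `MIN_KERNEL` map is hand-copied from the table for three CRUSH features only, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: first kernel version marked -S- in the table,
# hand-copied for three CRUSH features only.
declare -A MIN_KERNEL=(
  [CRUSH_TUNABLES3]=3.15
  [CRUSH_V4]=4.1
  [CRUSH_TUNABLES5]=4.5
)

# Succeeds if version $1 >= version $2 (GNU sort -V orders versions).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print, sorted, the features in MIN_KERNEL that kernel $1 lacks.
unsupported_features() {
  local feature
  for feature in "${!MIN_KERNEL[@]}"; do
    version_ge "$1" "${MIN_KERNEL[$feature]}" || echo "$feature"
  done | LC_ALL=C sort
}

unsupported_features 3.10   # CRUSH_TUNABLES3, CRUSH_TUNABLES5, CRUSH_V4
```

For 3.10 it also lists CRUSH_TUNABLES3; the server above does not complain about it because require_feature_tunables3 is 0 in the show-tunables output. Passing `"$(uname -r)"` works too, but distribution kernels may backport features, so treat the result as a rough check.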

Turning off require_feature_tunables5

List the available tunables profiles

$ ceph osd crush tunables --help
.....
osd crush tunables legacy|argonaut|bobtail|firefly|hammer|jewel|optimal|default

Switch to the firefly profile

ceph osd crush tunables firefly
ceph osd crush reweight-all
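After switching profiles it is worth re-reading the tunables to see which of the two flags is still set. A minimal sketch, assuming the one-key-per-line JSON layout shown above (plain grep and sed, no JSON parser); `blocking_flags` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `ceph osd crush show-tunables` JSON on stdin and
# print which of require_feature_tunables5 / has_v4_buckets is still 1.
# Assumes one "key": value pair per line, as in the output shown above.
blocking_flags() {
  grep -E '"(require_feature_tunables5|has_v4_buckets)": 1,?$' \
    | sed -E 's/^[[:space:]]*"([a-z0-9_]+)".*/\1/'
}

# Usage on a cluster node:
#   ceph osd crush show-tunables -f json-pretty | blocking_flags
```

An empty result means neither flag is blocking a 3.10 client any more; in the author's case has_v4_buckets kept showing up, which is what the next step deals with.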

Turning off has_v4_buckets

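Even after trying every profile, has_v4_buckets was still 1. That is because it is not a tunable setting: it is set whenever the CRUSH map contains a straw2 bucket, and straw2 needs CEPH_FEATURE_CRUSH_V4 (1000000000000), which the 3.10 kernel lacks.
Eventually another user found that changing every straw2 bucket in the CRUSH map to straw clears it.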

# Fetch the CRUSH map (binary format)
$ sudo ceph osd getcrushmap -o crushmap.bin
# Decompile it to text
$ crushtool -d crushmap.bin -o crushmap-decompile
# Back it up before editing
$ cp crushmap-decompile crushmap-decompile.bak
# Change every straw2 to straw
$ sed -i "s/straw2/straw/g" crushmap-decompile
# Recompile the CRUSH map
$ crushtool -c crushmap-decompile -o crushmap-compiled
# Inject the new CRUSH map
$ sudo ceph osd setcrushmap -i crushmap-compiled
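The sed one-liner above matches the bare string straw2 on any line. A slightly more defensive variant, sketched here as a hypothetical bash function, rewrites only `alg straw2` lines and refuses to print a map that still contains a straw2 bucket, so a bad edit never reaches `crushtool -c`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a decompiled CRUSH map on stdin and print it with
# every "alg straw2" bucket line changed to "alg straw". Fails (and prints
# nothing on stdout) if a straw2 bucket line survives the substitution.
straw2_to_straw() {
  local converted
  converted=$(sed -E 's/^([[:space:]]*alg[[:space:]]+)straw2[[:space:]]*$/\1straw/')
  if grep -qE '^[[:space:]]*alg[[:space:]]+straw2' <<<"$converted"; then
    echo "straw2 buckets remain; not writing a map" >&2
    return 1
  fi
  printf '%s\n' "$converted"
}

# Usage, with the file names from the steps above (crushmap-straw is hypothetical):
#   straw2_to_straw < crushmap-decompile > crushmap-straw &&
#     crushtool -c crushmap-straw -o crushmap-compiled
```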

References

http://cephnotes.ksperis.com/blog/2014/01/21/feature-set-mismatch-error-on-ceph-kernel-client/
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-May/009634.html
https://ceph.io/planet/ceph-的crush算法-straw/
https://blog.csdn.net/tiankai517/article/details/50221931?locationNum=3&fps=1

