Ceph fails to restart after a machine-room power outage: "Transaction order is cyclic. See system logs for details."


After a power outage in the machine room, Ceph ran into a problem on startup:

[root@node1 my-cluster]# systemctl restart ceph.target
Failed to stop ceph.target: Transaction order is cyclic. See system logs for details.
See system logs and 'systemctl status ceph.target' for details

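How do you fix it? I don't know. After a round of fiddling it recovered by itself, and I still don't know why. I didn't really change anything either.

The steps I went through were as follows:
/var/log/ceph/ceph.log reported OSD timeouts. According to the log, nothing existed on the peer port the OSD was trying to connect to.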

[root@node1 my-cluster]# systemctl restart ceph-osd@0
[root@node1 my-cluster]# systemctl restart ceph-mon@node1

Both commands failed with the same error.
Could the restart interval be too short and be causing the problem? I edited the service file:

vim /etc/systemd/system/ceph-mon.target.wants/ceph-mon\@node1.service

I changed StartLimitInterval to 1min.
I did the same for the other daemons' unit files.
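As a side note on the edit above: a drop-in file is a less invasive way to override one setting than editing the unit under `ceph-mon.target.wants` (which is usually a symlink to the packaged template). The sketch below is only an illustration; the function name `write_start_limit_dropin` and the file name `start-limit.conf` are made up here, and it assumes a systemd version that reads `StartLimitInterval` from the `[Service]` section, as the units on this node appear to. Newer systemd versions use `StartLimitIntervalSec` in `[Unit]` instead.

```shell
# Hypothetical helper: write a drop-in that overrides StartLimitInterval
# for a single unit, leaving the packaged unit file untouched.
#   $1  unit name, e.g. ceph-mon@node1.service
#   $2  interval, e.g. 1min
#   $3  systemd config root (assumed default: /etc/systemd/system)
write_start_limit_dropin() {
    unit=$1
    interval=$2
    root=${3:-/etc/systemd/system}
    dir="$root/$unit.d"

    mkdir -p "$dir" &&
        printf '[Service]\nStartLimitInterval=%s\n' "$interval" > "$dir/start-limit.conf"
}

# Usage on the node (systemd only picks up the change after a reload):
#   write_start_limit_dropin ceph-mon@node1.service 1min
#   systemctl daemon-reload
```

Note that StartLimitInterval only controls start rate limiting. It has no influence on unit ordering, so this change would not be expected to affect a "Transaction order is cyclic" error.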
I tried again, and it still reported "Transaction order is cyclic".
So it was time to investigate properly:

tail -f /var/log/messages
systemctl restart ceph-osd@0

No error showed up in the system log.
I tried once more.

[root@node1 my-cluster]# systemctl restart ceph.target
Failed to stop ceph.target: Transaction order is cyclic. See system logs for details.
See system logs and 'systemctl status ceph.target' for details
[root@node1 my-cluster]# journalctl |tail
5月 18 19:32:01 node1 CROND[20494]: (root) CMD (. /root/.bashrc;. ~/.bash_profile;. /etc/profile;/usr/bin/python /usr/local/yfs/yfsagent.py >/dev/null 2>&1 &)
5月 18 19:32:02 node1 polkitd[1120]: Registered Authentication Agent for unix-process:20594:832875 (system bus name :1.947 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8)
5月 18 19:32:02 node1 systemd[1]: Found ordering cycle on ceph.target/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-osd.target/restart
5月 18 19:32:02 node1 polkitd[1120]: Unregistered Authentication Agent for unix-process:20594:832875 (system bus name :1.947, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8) (disconnected from bus)
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-osd@0.service/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-mon.target/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph.target/restart
5月 18 19:32:02 node1 systemd[1]: Unable to break cycle
5月 18 19:32:02 node1 systemd[1]: Requested transaction contains an unfixable cyclic ordering dependency: Transaction order is cyclic. See system logs for details.
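The systemd lines in this output are interleaved with unrelated polkitd messages, which makes the cycle hard to read. A small filter can pull out just the unit names: the "Found ordering cycle on" line names the starting unit, and each "Found dependency on" line names the next unit in the chain until it returns to the start. This is a minimal sketch; the function name `cycle_units` is made up for illustration, and it assumes the message wording shown in the log above.

```shell
# Hypothetical helper: print the units systemd named in its ordering-cycle
# messages, one per line, in the order they were logged.
# Reads journal text on stdin, so it works with journalctl or a saved file.
cycle_units() {
    sed -n -E 's#.*systemd\[[0-9]+\]: Found (ordering cycle|dependency) on ([^ ]+)/[a-z-]+$#\2#p'
}

# Usage on the affected node (current boot only):
#   journalctl -b --no-pager | cycle_units
```

For the log above this prints ceph.target, ceph-osd.target, ceph-osd@0.service, ceph-mon.target and ceph.target again, which is the loop systemd could not break. The ordering of any one of those units can then be inspected with `systemctl show -p After -p Before <unit>`.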

The log shows that the OSD comes first in the ordering, so I tried:

[root@node1 my-cluster]# systemctl restart ceph-osd@0.service

This time the command no longer reported an error.

All in all, a strange problem.
When debugging a similar problem next time, I suggest starting with the logs:

journalctl -xe
tail -f /var/log/messages
tail -f /var/log/ceph/ceph.log

Other reports about this error (none of them matches my situation exactly):
https://tracker.ceph.com/issues/14839
https://github.com/ceph/ceph/pull/15835
https://github.com/ceph/ceph/pull/15051
https://tracker.ceph.com/issues/19910
https://tracker.ceph.com/issues/21035
https://tracker.ceph.com/issues/21477

