After a power outage in the server room, Ceph ran into problems starting up:
[root@node1 my-cluster]# systemctl restart ceph.target
Failed to stop ceph.target: Transaction order is cyclic. See system logs for details.
See system logs and 'systemctl status ceph.target' for details
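How do you fix this? I don't know. After a round of poking around it recovered on its own, but I don't know why it recovered, and I didn't really change anything.
The steps I went through were as follows:
/var/log/ceph/ceph.log reported OSD timeouts. According to the log, the remote port the OSD was trying to connect to did not exist.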
[root@node1 my-cluster]# systemctl restart ceph-osd@0
[root@node1 my-cluster]# systemctl restart ceph-mon@node1
Both commands reported the same error.
Could the restart interval be too short and be causing the problem? I edited the service file:
vim /etc/systemd/system/ceph-mon.target.wants/ceph-mon\@node1.service
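Change StartLimitInterval to 1min.
The unit files of the other daemons were changed in the same way.
Trying again still gave "Transaction order is cyclic".
So it was time to dig into the problem: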
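The file under ceph-mon.target.wants/ is normally a symlink to the packaged unit, so editing it in place changes the packaged file. A minimal sketch of making the same change with a drop-in override instead is shown here. It assumes a systemd version where StartLimitInterval belongs in the [Service] section; the function name and the drop-in file name start-limit.conf are hypothetical. Either way, systemd only picks up unit file changes after `systemctl daemon-reload`.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that overrides StartLimitInterval
# for a single unit.
#   $1 = systemd config root (normally /etc/systemd/system)
#   $2 = unit name, e.g. ceph-mon@node1.service
#   $3 = interval, e.g. 1min
write_start_limit_dropin() {
    dir="$1/$2.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nStartLimitInterval=%s\n' "$3" > "$dir/start-limit.conf"
}

# On a real node (not run here), followed by a reload:
#   write_start_limit_dropin /etc/systemd/system ceph-mon@node1.service 1min
#   systemctl daemon-reload
```

The drop-in keeps the packaged unit untouched, so a package update does not silently revert the setting.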
tail -f /var/log/messages
systemctl restart ceph-osd@0
Nothing was reported in the messages log.
I tried again.
[root@node1 my-cluster]# systemctl restart ceph.target
Failed to stop ceph.target: Transaction order is cyclic. See system logs for details.
See system logs and 'systemctl status ceph.target' for details
[root@node1 my-cluster]# journalctl |tail
5月 18 19:32:01 node1 CROND[20494]: (root) CMD (. /root/.bashrc;. ~/.bash_profile;. /etc/profile;/usr/bin/python /usr/local/yfs/yfsagent.py >/dev/null 2>&1 &)
5月 18 19:32:02 node1 polkitd[1120]: Registered Authentication Agent for unix-process:20594:832875 (system bus name :1.947 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8)
5月 18 19:32:02 node1 systemd[1]: Found ordering cycle on ceph.target/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-osd.target/restart
5月 18 19:32:02 node1 polkitd[1120]: Unregistered Authentication Agent for unix-process:20594:832875 (system bus name :1.947, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8) (disconnected from bus)
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-osd@0.service/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph-mon.target/restart
5月 18 19:32:02 node1 systemd[1]: Found dependency on ceph.target/restart
5月 18 19:32:02 node1 systemd[1]: Unable to break cycle
5月 18 19:32:02 node1 systemd[1]: Requested transaction contains an unfixable cyclic ordering dependency: Transaction order is cyclic. See system logs for details.
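The journal lines above interleave the cycle report with unrelated polkitd messages. A small sketch for pulling out just the units that systemd names in the cycle, assuming the "Found ordering cycle on" / "Found dependency on" wording shown in this log (other systemd versions may phrase it differently); the function name extract_cycle is hypothetical:

```shell
#!/bin/sh
# Print the units named in systemd's ordering-cycle report, one per line,
# in the order they appear. Reads journal text on stdin and strips the
# trailing "/<job type>" (e.g. "/restart") from each unit.
extract_cycle() {
    sed -n \
        -e 's/.*Found ordering cycle on \([^ ]*\)\/[a-z-]*$/\1/p' \
        -e 's/.*Found dependency on \([^ ]*\)\/[a-z-]*$/\1/p'
}

# Typical use on a live system (not run here):
#   journalctl -b --no-pager | extract_cycle
```

Each unit it prints can then be inspected with `systemctl show -p After -p Before <unit>` to see which ordering dependencies close the loop.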
I noticed that in the start order the OSD comes first, so I restarted it directly:
[root@node1 my-cluster]# systemctl restart ceph-osd@0.service
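This time the command reported no error.
All in all, a strange problem.
Next time a similar problem comes up, I suggest debugging it as follows:
Check the logs: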
journalctl -xe
tail -f /var/log/messages
tail -f /var/log/ceph/ceph.log
Other write-ups on this problem (not the same situation as the one I ran into):
https://tracker.ceph.com/issues/14839
https://github.com/ceph/ceph/pull/15835
https://github.com/ceph/ceph/pull/15051
https://tracker.ceph.com/issues/19910
https://tracker.ceph.com/issues/21035
https://tracker.ceph.com/issues/21477
