Docker fails to start


Kernel 3.10: systemctl start docker blocks and never returns, and the service status stays at "starting".

After a colleague installed Docker on his machine, systemctl start docker hung. The investigation took a few detours, so here is a record of it:

level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
level=info msg="libcontainerd: new containerd process, pid: 46803"
level=warning msg="Docker could not enable SELinux on the host system"
level=info msg="Graph migration to content-addressability took 0.00 seconds"
level=info msg="Loading containers: start."
level=warning msg="Running modprobe nf_nat failed with message: ``, error: exec: \"modprobe\": executable file not found in $PATH"
level=warning msg="Running modprobe xt_conntrack failed with message: ``, error: exec: \"modprobe\": executable file not found in $PATH"
level=info msg="Firewalld running: false"
Error starting daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain: iptables failed: iptables --wait -t nat -N DOCKER: iptables v
Perhaps iptables or your kernel needs to be upgraded.
(exit status 3)
 docker.service: main process exited, code=exited, status=1/FAILURE
 Failed to start Docker Application Container Engine.

 

The log shows that creating the iptables NAT chain failed. Running iptables --list then reported that it could not acquire the xtables lock:

[root@custom-16-126 ~]# iptables --list
Another app is currently holding the xtables lock. Perhaps you want to use the -w option

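This usually means that an earlier iptables process, which took the lock in order to write rules, has not returned. Checking with ps -ef turned up the following: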

[root@custom-16-126 ~]# ps -ef |grep -i iptables
root 14967 14926 0 20:05 ? 00:00:00 /usr/sbin/iptables --wait -t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER

The iptables process had indeed not returned.
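As a minimal sketch of this step (not taken from the original investigation), the helpers below filter a process listing for iptables commands and print the kernel stack of a given PID. The function names find_iptables_procs and dump_kernel_stack are hypothetical, and the optional second argument of dump_kernel_stack exists only so that the lookup can be pointed at a directory other than /proc. Reading /proc/PID/stack normally requires root.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Read "PID ARGS..." lines on stdin (e.g. from `ps -eo pid,args`) and
# print those whose command (first word of ARGS) is an iptables binary.
# Matching on the command name keeps grep/awk themselves out of the result.
find_iptables_procs() {
    awk '{
        n = split($2, parts, "/")
        if (parts[n] ~ /^ip6?tables$/) print
    }'
}

# Print the kernel stack of a process: dump_kernel_stack PID [PROC_ROOT]
dump_kernel_stack() {
    pid=$1
    proc_root=${2:-/proc}
    if [ -r "$proc_root/$pid/stack" ]; then
        cat "$proc_root/$pid/stack"
    else
        echo "cannot read $proc_root/$pid/stack" >&2
        return 1
    fi
}

# Example (run as root):
#   ps -eo pid,args | find_iptables_procs
#   dump_kernel_stack 14967
```

Filtering on the command name rather than the whole line avoids the usual ps | grep artifact where the grep process shows up in its own output.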

Looking at its stack and the corresponding kernel code confirmed that the nat module depends on the matching conntrack module:

int nf_nat_l3proto_register(const struct nf_nat_l3proto *l3proto)
{
    int err;



    err = nf_ct_l3proto_try_module_get(l3proto->l3proto);

Next I looked at why nf_conntrack-2 had not been loaded, and found that it was blacklisted in this environment.
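One way to check for such a blacklist entry is to search the modprobe configuration. The sketch below assumes the entries live in *.conf files under /etc/modprobe.d (other locations such as /usr/lib/modprobe.d may also apply), and the function name list_blacklisted is hypothetical. It matches on a substring because the blacklist line may name the module itself (on 3.10 kernels the nf_conntrack-2 alias should be provided by nf_conntrack_ipv4) rather than the alias.

```shell
#!/bin/sh
# Hypothetical helper: list_blacklisted PATTERN [CONFIG_DIR]
# For every active (uncommented) blacklist directive whose module name
# contains PATTERN, prints the file path, a colon, and the directive.
list_blacklisted() {
    pattern=$1
    conf_dir=${2:-/etc/modprobe.d}
    for f in "$conf_dir"/*.conf; do
        # Skip the unexpanded glob when the directory has no .conf files.
        [ -f "$f" ] || continue
        awk -v pat="$pattern" -v file="$f" '
            $1 == "blacklist" && index($2, pat) > 0 { print file ":" $0 }
        ' "$f"
    done
}

# Example:
#   list_blacklisted conntrack
```

The merged configuration printed by modprobe --showconfig should contain the same blacklist lines, so piping that output through grep is an alternative when the files are spread over several directories.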

Another interesting issue turned up while adding trace points for testing. Consider the following code:

int nf_ct_l3proto_try_module_get(unsigned short l3proto)
{
    int ret;
    struct nf_conntrack_l3proto *p;

retry:  p = nf_ct_l3proto_find_get(l3proto);
    if (p == &nf_conntrack_l3proto_generic) {
        ret = request_module("nf_conntrack-%d", l3proto);
        if (!ret)
            goto retry;

        return -EPROTOTYPE;
    }

    return 0;
}
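
The retry here looks problematic. If the requested nf_conntrack module is blacklisted, request_module() can still return 0 even though nothing was loaded, so the lookup returns the generic l3proto again and the loop never exits. Each pass also submits another work_struct to the workqueue, so a large number of useless work items get executed.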

