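In the init process, main() calls sigchld_handler_init() to set up handling of the SIGCHLD signal. When the SIGCHLD handler is installed, sa_flags includes the SA_NOCLDSTOP flag. With this flag the parent receives the signal only when a child terminates; it does not receive it when a child is stopped or resumed.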
    void sigchld_handler_init() {
        // Create a signalling mechanism for SIGCHLD.
        int s[2];
        if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0, s) == -1) {
            PLOG(FATAL) << "socketpair failed in sigchld_handler_init";
        }

        signal_write_fd = s[0];
        signal_read_fd = s[1];

        // Write to signal_write_fd if we catch SIGCHLD.
        struct sigaction act;
        memset(&act, 0, sizeof(act));
        act.sa_handler = SIGCHLD_handler;
        act.sa_flags = SA_NOCLDSTOP;
        sigaction(SIGCHLD, &act, 0);

        ReapAnyOutstandingChildren();

        register_epoll_handler(signal_read_fd, handle_signal);
    }
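1. The handler registered for SIGCHLD is SIGCHLD_handler.
2. register_epoll_handler() is called so that handle_signal runs whenever signal_read_fd becomes readable.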
init.cpp
    void register_epoll_handler(int fd, void (*fn)()) {
        epoll_event ev;
        ev.events = EPOLLIN;
        ev.data.ptr = reinterpret_cast<void*>(fn);
        if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev) == -1) {
            PLOG(ERROR) << "epoll_ctl failed";
        }
    }
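1. When a child process exits, init receives SIGCHLD and SIGCHLD_handler() runs. That function does nothing more than call write() on signal_write_fd.
2. Because of that write, signal_read_fd becomes readable, so the epoll_wait() in init's main() returns. main() then calls the function that was registered through register_epoll_handler(), i.e. handle_signal():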
    epoll_event ev;
    int nr = TEMP_FAILURE_RETRY(epoll_wait(epoll_fd, &ev, 1, epoll_timeout_ms));
    if (nr == -1) {
        PLOG(ERROR) << "epoll_wait failed";
    } else if (nr == 1) {
        ((void (*)()) ev.data.ptr)();
    }
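The whole signal-to-epoll path described above can be tried out in isolation. The following is a minimal sketch, not init's actual code: the names `on_sigchld`, `drain_and_reap`, and `run_once` are hypothetical, and the reaping is reduced to a plain `waitpid()` loop. It installs a SIGCHLD handler with SA_NOCLDSTOP that only writes one byte to a socketpair, stores a function pointer in `ev.data.ptr`, forks a short-lived child, and dispatches through `epoll_wait()`.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>

namespace sketch {

int signal_write_fd = -1;
int signal_read_fd = -1;
int reaped_children = 0;

// Signal context: restrict ourselves to the async-signal-safe write().
void on_sigchld(int) {
    int saved_errno = errno;
    ssize_t n = write(signal_write_fd, "1", 1);
    (void)n;
    errno = saved_errno;
}

// Main-loop context: drain the wake-up bytes, then reap every zombie.
void drain_and_reap() {
    char buf[32];
    while (read(signal_read_fd, buf, sizeof(buf)) > 0) {
    }
    while (waitpid(-1, nullptr, WNOHANG) > 0) ++reaped_children;
}

// Forks one child that exits immediately and returns how many children
// were reaped through the epoll dispatch, or -1 on a setup failure.
int run_once() {
    reaped_children = 0;
    int s[2];
    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0, s) == -1) return -1;
    signal_write_fd = s[0];
    signal_read_fd = s[1];

    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = on_sigchld;
    act.sa_flags = SA_NOCLDSTOP;  // no SIGCHLD for stopped/continued children
    if (sigaction(SIGCHLD, &act, nullptr) == -1) return -1;

    int epoll_fd = epoll_create1(EPOLL_CLOEXEC);
    if (epoll_fd == -1) return -1;
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.ptr = reinterpret_cast<void*>(drain_and_reap);
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, signal_read_fd, &ev) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) _exit(0);

    epoll_event out{};
    int nr;
    do {  // retry when the signal itself interrupts epoll_wait()
        nr = epoll_wait(epoll_fd, &out, 1, 5000);
    } while (nr == -1 && errno == EINTR);
    if (nr == 1) reinterpret_cast<void (*)()>(out.data.ptr)();

    signal(SIGCHLD, SIG_DFL);  // uninstall before closing the fds
    close(epoll_fd);
    close(s[0]);
    close(s[1]);
    return reaped_children;
}

}  // namespace sketch
```

Calling `sketch::run_once()` from a small `main()` is enough to exercise it. Keeping the handler down to a single `write()` is the point of the design: all real work happens later in the ordinary main loop, where any function may be called safely.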
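handle_signal() first calls read() to drain the pending data and then ends up in ReapOneProcess(). That function asks the kernel for a child that has exited and gets back a siginfo_t whose si_pid field holds the pid of the exited child. (A siginfo_t is filled in by waitid(); waitpid() only reports an integer status.)
The pid is then used to look up the Service object of the exited child in ServiceList;
finally, Reap() is called on that Service object.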
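The lookup-then-reap sequence can be sketched as follows, assuming the siginfo_t is obtained with `waitid()` and the WNOWAIT option so that the zombie (and therefore its pid) stays valid until the lookup is done. `services_by_pid`, `reap_one`, and `spawn_and_reap` are hypothetical names; the map is only a stand-in for ServiceList.

```cpp
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <map>
#include <string>

namespace sketch {

// Hypothetical stand-in for ServiceList: pid -> service name.
std::map<pid_t, std::string> services_by_pid;

// Returns the name of the service whose process was reaped,
// or "" when there was no zombie or the pid was unknown.
std::string reap_one() {
    siginfo_t info = {};
    // WNOWAIT: report the zombie but leave it in place for now.
    if (waitid(P_ALL, 0, &info, WEXITED | WNOHANG | WNOWAIT) != 0) return "";
    pid_t pid = info.si_pid;
    if (pid == 0) return "";  // children exist, but none has exited

    std::string name;
    auto it = services_by_pid.find(pid);
    if (it != services_by_pid.end()) {
        name = it->second;  // this is where a Reap()-style callback would run
        services_by_pid.erase(it);
    }

    waitpid(pid, nullptr, WNOHANG);  // now actually release the zombie
    return name;
}

// Forks a child that exits at once, registers it, and reaps it by pid.
std::string spawn_and_reap(const std::string& name) {
    pid_t pid = fork();
    if (pid == -1) return "";
    if (pid == 0) _exit(7);
    services_by_pid[pid] = name;

    // Block until the child is a zombie, again without reaping it.
    siginfo_t info = {};
    waitid(P_PID, pid, &info, WEXITED | WNOWAIT);
    return reap_one();
}

}  // namespace sketch
```

For example, `sketch::spawn_and_reap("demo")` returns `"demo"`, and a following `sketch::reap_one()` returns an empty string because no child is left.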
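Reap() starts by calling KillProcessGroup(SIGKILL), unless the service is a oneshot service that has not been flagged for restart. KillProcessGroup() ends up in killProcessGroup() in system/core/libprocessgroup/processgroup.cpp. The purpose of this call is to kill every process still left in the exited child's process group, that is, all of the children it spawned.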
    void Service::KillProcessGroup(int signal) {
        // If we've already seen a successful result from killProcessGroup*(), then we have removed
        // the cgroup already and calling these functions a second time will simply result in an error.
        // This is true regardless of which signal was sent.
        // These functions handle their own logging, so no additional logging is needed.
        if (!process_cgroup_empty_) {
            LOG(INFO) << "Sending signal " << signal << " to service '" << name_ << "' (pid " << pid_
                      << ") process group...";
            int r;
            if (signal == SIGTERM) {
                r = killProcessGroupOnce(uid_, pid_, signal);
            } else {
                r = killProcessGroup(uid_, pid_, signal);
            }

            if (r == 0) process_cgroup_empty_ = true;
        }
    }
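As the comment in KillProcessGroup() indicates, the real code operates on the service's cgroup. To illustrate only the idea of "kill the exited process together with everything it spawned", the sketch below swaps in a different and simpler technique: plain POSIX process groups and `kill()` with a negative pid. The function name `kill_group_demo` is hypothetical.

```cpp
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>

namespace sketch {

// Starts a group leader that forks one more process, kills the whole group
// with SIGKILL, and returns the signal that terminated the leader
// (or -1 on failure).
int kill_group_demo() {
    pid_t leader = fork();
    if (leader == -1) return -1;
    if (leader == 0) {
        setpgid(0, 0);      // become the leader of a new process group
        if (fork() == 0) {  // the grandchild inherits that group
            for (;;) pause();
        }
        for (;;) pause();
    }
    // Set the group from the parent side too, so it exists before kill().
    setpgid(leader, leader);

    // A negative pid addresses every process in the group.
    if (kill(-leader, SIGKILL) == -1) return -1;

    int status = 0;
    if (waitpid(leader, &status, 0) != leader) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

}  // namespace sketch
```

`sketch::kill_group_demo()` returns `SIGKILL` when the group was signalled successfully. Process groups can be escaped by a process that calls `setpgid()` itself, which is one reason a cgroup-based approach is more robust.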
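Next, if the service has the SVC_EXEC flag, UnSetExec() is called; it sets is_exec_service_running_ to false and clears the SVC_EXEC flag;
then, unless the service is temporary, disabled, or reset (Reap() returns early in those cases), the SVC_RESTARTING flag is set in flags_;
and finally onrestart_.ExecuteAllCommands() is called.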
    void Service::Reap(const siginfo_t& siginfo) {
        if (!(flags_ & SVC_ONESHOT) || (flags_ & SVC_RESTART)) {
            KillProcessGroup(SIGKILL);
        }

        // Remove any descriptor resources we may have created.
        std::for_each(descriptors_.begin(), descriptors_.end(),
                      std::bind(&DescriptorInfo::Clean, std::placeholders::_1));

        for (const auto& f : reap_callbacks_) {
            f(siginfo);
        }

        if (flags_ & SVC_EXEC) UnSetExec();

        if (flags_ & SVC_TEMPORARY) return;

        pid_ = 0;
        flags_ &= (~SVC_RUNNING);
        start_order_ = 0;

        // Oneshot processes go into the disabled state on exit,
        // except when manually restarted.
        if ((flags_ & SVC_ONESHOT) && !(flags_ & SVC_RESTART)) {
            flags_ |= SVC_DISABLED;
        }

        // Disabled and reset processes do not get restarted automatically.
        if (flags_ & (SVC_DISABLED | SVC_RESET)) {
            NotifyStateChange("stopped");
            return;
        }

        // If we crash > 4 times in 4 minutes, reboot into recovery.
        boot_clock::time_point now = boot_clock::now();
        if ((flags_ & SVC_CRITICAL) && !(flags_ & SVC_RESTART)) {
            if (now < time_crashed_ + 4min) {
                if (++crash_count_ > 4) {
                    LOG(FATAL) << "critical process '" << name_ << "' exited 4 times in 4 minutes";
                }
            } else {
                time_crashed_ = now;
                crash_count_ = 1;
            }
        }

        flags_ &= (~SVC_RESTART);
        flags_ |= SVC_RESTARTING;

        // Execute all onrestart commands for this service.
        onrestart_.ExecuteAllCommands();

        NotifyStateChange("restarting");
        return;
    }
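The crash-counting branch for critical services is easy to misread, so here it is pulled out into a small standalone sketch. `CrashTracker` and `record_crash` are hypothetical names, and `std::chrono::steady_clock` stands in for `boot_clock`; the comparison logic mirrors the listing above.

```cpp
#include <cassert>
#include <chrono>

namespace sketch {

struct CrashTracker {
    using clock = std::chrono::steady_clock;

    clock::time_point time_crashed{};  // start of the current 4-minute window
    int crash_count = 0;

    // Records one crash at `now`. Returns true when the limit is exceeded,
    // i.e. where Reap() would hit LOG(FATAL).
    bool record_crash(clock::time_point now) {
        if (now < time_crashed + std::chrono::minutes(4)) {
            return ++crash_count > 4;
        }
        // Outside the window: start a new one with this crash.
        time_crashed = now;
        crash_count = 1;
        return false;
    }
};

}  // namespace sketch
```

Feeding it crashes at 0, 1, 2, and 3 minutes after the start of a window returns false each time; a further crash at 3.5 minutes returns true. In other words, with `> 4` it is the fifth crash inside the window that trips the limit, and a crash outside the window simply starts a new count.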
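Back in the infinite loop of init's main(), Service::is_exec_service_running() is called. It simply returns is_exec_service_running_, a static bool member of the Service class (note that it is static, so it is shared by all Service objects).
Because is_exec_service_running_ is false at this point (UnSetExec() reset it if the reaped service was an exec service), the if condition holds and RestartProcesses() is called. That function takes each Service object from ServiceList and checks whether its flags_ contains the SVC_RESTARTING flag; if it does, the service's Start() is eventually called to launch the service again.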
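The interplay between Reap() marking a service and the main loop restarting it can be modelled with a few flags. This is a deliberately reduced sketch: the flag names and values (`kDisabled`, `kOneshot`, and so on), the `Service` struct, and `RestartPending` are all hypothetical, and any timing policy the real RestartProcesses() applies is ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Illustrative flag values only; not taken from init's headers.
enum Flags : unsigned {
    kDisabled = 1u << 0,
    kOneshot = 1u << 1,
    kRunning = 1u << 2,
    kRestarting = 1u << 3,
};

struct Service {
    std::string name;
    unsigned flags = 0;
    int starts = 0;

    void Start() {
        flags &= ~kRestarting;
        flags |= kRunning;
        ++starts;
    }

    // Called when the service's process has exited.
    void Reap() {
        flags &= ~kRunning;
        if (flags & kOneshot) {  // oneshot services stay down
            flags |= kDisabled;
            return;
        }
        flags |= kRestarting;  // picked up later by the main loop
    }
};

// Main-loop side: start every service that Reap() marked for restart.
int RestartPending(std::vector<Service>& services) {
    int restarted = 0;
    for (auto& s : services) {
        if (s.flags & kRestarting) {
            s.Start();
            ++restarted;
        }
    }
    return restarted;
}

}  // namespace sketch
```

With one ordinary service and one oneshot service, calling `Reap()` on both and then `RestartPending()` restarts only the ordinary one. Separating "mark for restart" from "actually restart" keeps the signal-driven path short and leaves the restart decision to the main loop.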