參考:
2. Linux下2號進程的kthreadd--Linux進程的管理與調度(七)
本文中代碼內核版本:3.2.0
kthreadd:這種內核線程只有一個,它的作用是管理調度其它的內核線程。這個線程不能關閉。它在內核初始化的時候被創建,會循環運行一個叫做kthreadd的函數,該函數的作用是運行kthread_create_list全局鏈表中維護的kthread。其他任務或代碼想創建內核線程時需要調用kthread_create(或kthread_create_on_node)創建一個kthread,該kthread會被加入到kthread_create_list鏈表中,同時kthread_create會weak up kthreadd_task(即kthreadd)(增鏈表)。kthreadd再執行kthread時會調用老的接口——kernel_thread運行一個名叫“kthread”的內核線程去運行創建的kthread,被執行過的kthread會從kthread_create_list鏈表中刪除(減鏈表),並且kthreadd會不斷調用scheduler 讓出CPU。kthreadd創建的kthread執行完后,會調到kthread_create()執行,之后再執行最初原任務或代碼。
創建
在linux啟動的C階段start_kernel()的最后,rest_init()會開啟兩個進程:kernel_init,kthreadd,之后主線程變成idle線程,init/main.c。
linux下的3個特殊的進程:idle進程(PID=0),init進程(PID=1)和kthreadd(PID=2)。
* idle進程由系統自動創建, 運行在內核態 PID=0
idle進程其pid=0,其前身是系統創建的第一個進程,也是唯一一個沒有通過fork或者kernel_thread產生的進程。完成加載系統后,演變為進程調度、交換。
* init進程由idle通過kernel_thread創建,在內核空間完成初始化后, 加載init程序, 並最終用戶空間運行 PID=1 PPID=0
由0進程創建,完成系統的初始化. 是系統中所有其它用戶進程的祖先進程 。
Linux中的所有進程都是有init進程創建並運行的。首先Linux內核啟動,然后在用戶空間中啟動init進程,再啟動其他系統進程。在系統啟動完成完成后,init將變為守護進程監視系統其他進程。
* kthreadd進程由idle通過kernel_thread創建,並始終運行在內核空間, 負責所有內核線程的調度和管理 PID=2 PPID=0
它的任務就是管理和調度其他內核線程kernel_thread, 會循環執行一個kthreadd的函數,該函數的作用就是運行kthread_create_list全局鏈表中維護的kthread, 當我們調用kthread_create創建的內核線程會被加入到此鏈表中,因此所有的內核線程都是直接或者間接的以kthreadd為父進程。所有的內核線程的PPID都是2。
注:所有的內核線程在大部分時間里都處於阻塞狀態(TASK_INTERRUPTIBLE)只有在系統滿足進程需要的某種資源的情況下才會運行。
/*
* We need to finalize in a non-__init function, or else race conditions
* between the root thread and the init thread may cause start_kernel to
* be reaped by free_initmem before the root thread has proceeded to
* cpu_idle.
*
* gcc-3.4 accidentally inlines this function, so use noinline.
*/
static __initdata DECLARE_COMPLETION(kthreadd_done); static noinline void __init_refok rest_init(void) { int pid; rcu_scheduler_starting(); /* * We need to spawn init first so that it obtains pid 1, however * the init task will end up wanting to create kthreads, which, if * we schedule it before we create kthreadd, will OOPS. */ kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND); numa_default_policy(); pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES); rcu_read_lock(); kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns); rcu_read_unlock(); complete(&kthreadd_done); /* * The boot idle thread must execute schedule() * at least once to get things moving: */ init_idle_bootup_task(current); preempt_enable_no_resched(); schedule(); /* Call into cpu_idle with preempt disabled */ preempt_disable(); cpu_idle(); }
kthreadd任務
函數體定義在kernel/kthread.c中。
static DEFINE_SPINLOCK(kthread_create_lock);
static LIST_HEAD(kthread_create_list);
struct task_struct *kthreadd_task;
struct kthread_create_info { /* Information passed to kthread() from kthreadd. */ int (*threadfn)(void *data); void *data; int node; /* Result passed back to kthread_create() from kthreadd. */ struct task_struct *result; struct completion done; struct list_head list; }; struct kthread { int should_stop; void *data; struct completion exited; };
int kthreadd(void *unused) { struct task_struct *tsk = current; /* Setup a clean context for our children to inherit. */ set_task_comm(tsk, "kthreadd"); ignore_signals(tsk); set_cpus_allowed_ptr(tsk, cpu_all_mask); set_mems_allowed(node_states[N_HIGH_MEMORY]); current->flags |= PF_NOFREEZE | PF_FREEZER_NOSIG; for (;;) { set_current_state(TASK_INTERRUPTIBLE); if (list_empty(&kthread_create_list)) schedule(); __set_current_state(TASK_RUNNING); spin_lock(&kthread_create_lock); while (!list_empty(&kthread_create_list)) { struct kthread_create_info *create; create = list_entry(kthread_create_list.next, struct kthread_create_info, list); list_del_init(&create->list); spin_unlock(&kthread_create_lock); create_kthread(create); spin_lock(&kthread_create_lock); } spin_unlock(&kthread_create_lock); } return 0; }
static void create_kthread(struct kthread_create_info *create) { int pid; #ifdef CONFIG_NUMA current->pref_node_fork = create->node; #endif /* We want our own signal handler (we take no signals by default). */ pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD); if (pid < 0) { create->result = ERR_PTR(pid); complete(&create->done); } }
kthread任務
static int kthread(void *_create) { /* Copy data: it's on kthread's stack */ struct kthread_create_info *create = _create; int (*threadfn)(void *data) = create->threadfn; void *data = create->data; struct kthread self; int ret; self.should_stop = 0; self.data = data; init_completion(&self.exited); current->vfork_done = &self.exited; /* OK, tell user we're spawned, wait for stop or wakeup */ __set_current_state(TASK_UNINTERRUPTIBLE); create->result = current; complete(&create->done); schedule(); ret = -EINTR; if (!self.should_stop) ret = threadfn(data); /* we can't just return, we must preserve "self" on stack */ do_exit(ret); }
/** * kthread_create_on_node - create a kthread. * @threadfn: the function to run until signal_pending(current). * @data: data ptr for @threadfn. * @node: memory node number. * @namefmt: printf-style name for the thread. * * Description: This helper function creates and names a kernel * thread. The thread will be stopped: use wake_up_process() to start * it. See also kthread_run(). * * If thread is going to be bound on a particular cpu, give its node * in @node, to get NUMA affinity for kthread stack, or else give -1. * When woken, the thread will run @threadfn() with @data as its * argument. @threadfn() can either call do_exit() directly if it is a * standalone thread for which no one will call kthread_stop(), or * return when 'kthread_should_stop()' is true (which means * kthread_stop() has been called). The return value should be zero * or a negative error number; it will be passed to kthread_stop(). * * Returns a task_struct or ERR_PTR(-ENOMEM). */ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data), void *data, int node, const char namefmt[], ...) { struct kthread_create_info create; create.threadfn = threadfn; create.data = data; create.node = node; init_completion(&create.done); spin_lock(&kthread_create_lock); list_add_tail(&create.list, &kthread_create_list); spin_unlock(&kthread_create_lock); wake_up_process(kthreadd_task); wait_for_completion(&create.done); if (!IS_ERR(create.result)) { static const struct sched_param param = { .sched_priority = 0 }; va_list args; va_start(args, namefmt); vsnprintf(create.result->comm, sizeof(create.result->comm), namefmt, args); va_end(args); /* * root may have changed our (kthreadd's) priority or CPU mask. * The kernel thread should not inherit these properties. */ sched_setscheduler_nocheck(create.result, SCHED_NORMAL, ¶m); set_cpus_allowed_ptr(create.result, cpu_all_mask); } return create.result; } EXPORT_SYMBOL(kthread_create_on_node);
kernel/kthread.c的頭文件include/linux/kthread.h定義kthread_create():
#define kthread_create(threadfn, data, namefmt, arg...) \
kthread_create_on_node(threadfn, data, -1, namefmt, ##arg)