Docker Internals: Namespace

This article is a repost. 2020-04-09 15:26, Docker

Linux Namespace

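A namespace is the Linux kernel's mechanism for isolating kernel resources. It wraps a global system resource in an abstraction so that processes in different namespaces each appear to have their own independent instance of that resource. Changing a system resource inside one namespace affects only the processes in that namespace and has no effect on processes in other namespaces.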

Isolated resources

Name	Macro	Isolated resources
IPC	CLONE_NEWIPC	System V IPC (semaphores, message queues, shared memory) and POSIX message queues
Network	CLONE_NEWNET	Network devices, stacks, ports, etc.
Mount	CLONE_NEWNS	Mount points
PID	CLONE_NEWPID	Process IDs
User	CLONE_NEWUSER	User and group IDs
UTS	CLONE_NEWUTS	Hostname and NIS domain name
Cgroup	CLONE_NEWCGROUP	Cgroup root directory
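
The macros in the table are bit flags, so several namespaces can be requested in one call by OR-ing them together. The following is a minimal sketch of that idea; the helper names ns_flag and ns_mask are hypothetical and not part of any library, and the short names ("mnt", "uts", ...) are the entry names found under /proc/<pid>/ns.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>

/* Hypothetical helper: map the short name used under /proc/<pid>/ns
 * to its CLONE_NEW* flag. Returns 0 for an unknown name. */
int ns_flag(const char *name)
{
    if (strcmp(name, "ipc") == 0)    return CLONE_NEWIPC;
    if (strcmp(name, "net") == 0)    return CLONE_NEWNET;
    if (strcmp(name, "mnt") == 0)    return CLONE_NEWNS;
    if (strcmp(name, "pid") == 0)    return CLONE_NEWPID;
    if (strcmp(name, "user") == 0)   return CLONE_NEWUSER;
    if (strcmp(name, "uts") == 0)    return CLONE_NEWUTS;
#ifdef CLONE_NEWCGROUP               /* absent from older headers */
    if (strcmp(name, "cgroup") == 0) return CLONE_NEWCGROUP;
#endif
    return 0;
}

/* Hypothetical helper: OR together the flags for a NULL-terminated
 * list of namespace names. Unknown names contribute nothing. */
int ns_mask(const char *const names[])
{
    int mask = 0;
    for (size_t i = 0; names[i] != NULL; i++)
        mask |= ns_flag(names[i]);
    return mask;
}
```

A mask built this way is the kind of value passed as the flags argument of clone() or unshare(); for example, the list { "mnt", "uts", NULL } yields CLONE_NEWNS | CLONE_NEWUTS.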

How namespaces appear

Viewing the namespaces of a process

# List the namespaces of process 18863
ls -l /proc/18863/ns

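As the output shows, each namespace appears as a symbolic link whose target has the form type:[id]. The id can be treated as the namespace's identifier: processes whose links show the same id are in the same namespace and share its global resources.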
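
The same information can be read from a program by calling readlink() on the entries under /proc/<pid>/ns. The following is a minimal sketch; the function names ns_link and print_namespaces are hypothetical, and the list of types covers only the commonly present entries.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the target of /proc/<pid>/ns/<type> into buf,
 * giving a string of the form "uts:[<id>]".
 * Returns 0 on success, -1 on failure. */
int ns_link(pid_t pid, const char *type, char *buf, size_t len)
{
    char path[64];

    if (len == 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, type);

    ssize_t n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';   /* readlink() does not NUL-terminate */
    return 0;
}

/* Hypothetical helper: print the namespace links of one process. */
void print_namespaces(pid_t pid)
{
    static const char *const types[] = { "ipc", "mnt", "net", "pid", "user", "uts" };
    char target[64];

    for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
        if (ns_link(pid, types[i], target, sizeof(target)) == 0)
            printf("%-5s -> %s\n", types[i], target);
    }
}
```

Calling print_namespaces(getpid()) from main prints one line per namespace of the current process. Reading the links of another user's process may fail with a permission error, in which case that entry is simply skipped.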

Functions

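clone(): clone() is a wrapper function defined in the libc library. It sets up the stack of the new lightweight process and invokes the clone system call, which is hidden from the programmer. The sys_clone() service routine that implements the clone system call has no fn and arg parameters. The wrapper stores the fn pointer at a specific position on the child's stack, namely the position where the wrapper's own return address would be stored, and stores the arg pointer right below fn on the child's stack. When the wrapper returns, the CPU fetches the return address from the stack and then executes fn(arg).
setns(): moves the calling process into an existing namespace.
unshare(): moves the calling process into newly created namespaces without creating a new process.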
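
To illustrate unshare(), the sketch below isolates the UTS namespace, since a hostname change is easy to observe. It is only an illustration under stated assumptions: the function name hostname_in_new_uts is hypothetical, the work is done in a forked child so that the caller keeps its own namespace, and the call is expected to fail with a permission error unless the program runs with sufficient privileges (normally root).

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. In a forked child: unshare the UTS namespace, set the
 * hostname to `name`, and send the hostname the child then observes back
 * through a pipe. Returns 0 and fills `out` on success; returns -1 if the
 * child could not unshare or set the hostname (typically for lack of
 * privileges). */
int hostname_in_new_uts(const char *name, char *out, size_t len)
{
    int fds[2];
    if (len == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                      /* child */
        char seen[256] = "";
        close(fds[0]);
        if (unshare(CLONE_NEWUTS) != 0 ||
            sethostname(name, strlen(name)) != 0 ||
            gethostname(seen, sizeof(seen) - 1) != 0)
            _exit(1);
        if (write(fds[1], seen, strlen(seen)) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                       /* parent */
    ssize_t n = read(fds[0], out, len - 1);
    close(fds[0]);
    out[n > 0 ? n : 0] = '\0';

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

When the call succeeds, the child reports the new hostname while gethostname() in the parent still returns the original one, because only the child left the shared UTS namespace.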

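The filesystem seen by processes in a container

If the CLONE_NEWNS flag is enabled, does a process that enters the container see the container's own filesystem?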

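#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for clone() and the CLONE_* flags */
#endif
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];
char* const container_args[] = {
  "/bin/bash",
  NULL
};

/* Entry point of the child: runs inside the new mount namespace. */
int container_main(void* arg)
{
  printf("Inside the container!\n");

  execv(container_args[0], container_args);
  /* execv() only returns on failure. */
  perror("execv");
  return 1;
}

int main()
{
  printf("Host: starting a container!\n");
  /* The stack grows downward, so pass the top of the buffer. */
  int container_pid = clone(container_main, container_stack + STACK_SIZE, CLONE_NEWNS | SIGCHLD, NULL);
  if (container_pid == -1) {
    perror("clone");   /* typically a permission error when not run as root */
    return 1;
  }
  waitpid(container_pid, NULL, 0);
  printf("Container stopped!\n");
  return 0;
}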

Compile and run:

# Compile
gcc -o ns ns.c -D_GNU_SOURCE
# Run (creating a mount namespace requires root privileges)
./ns

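Result:

Running ns shows that we have entered a "container". Suppose the container is meant to have its own /tmp: listing /tmp at this point still shows the host's files. A new mount namespace starts out as a copy of its parent's mount points, so nothing looks different until a mount is actually performed inside it. The next step is therefore to remount the /tmp directory inside the container as tmpfs (a memory-backed filesystem).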

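#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for clone() and the CLONE_* flags */
#endif
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];
char* const container_args[] = {
  "/bin/bash",
  NULL
};

/* Entry point of the child: runs inside the new mount namespace. */
int container_main(void* arg)
{
  printf("Inside the container!\n");

  /* Keep mount events from propagating back to the host; on systems
     where / is a shared mount (the systemd default) the tmpfs mount
     below would otherwise also appear on the host. */
  if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) {
    perror("mount MS_PRIVATE");
    return 1;
  }
  /* Remount /tmp as tmpfs; only this mount namespace sees it. */
  if (mount("none", "/tmp", "tmpfs", 0, "") != 0) {
    perror("mount tmpfs");
    return 1;
  }

  execv(container_args[0], container_args);
  /* execv() only returns on failure. */
  perror("execv");
  return 1;
}

int main()
{
  printf("Host: starting a container!\n");
  /* The stack grows downward, so pass the top of the buffer. */
  int container_pid = clone(container_main, container_stack + STACK_SIZE, CLONE_NEWNS | SIGCHLD, NULL);
  if (container_pid == -1) {
    perror("clone");   /* typically a permission error when not run as root */
    return 1;
  }
  waitpid(container_pid, NULL, 0);
  printf("Container stopped!\n");
  return 0;
}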

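The key change is the call mount("none", "/tmp", "tmpfs", 0, ""), made before the container's shell is started.

After recompiling and running ns again, /tmp inside the container is an empty tmpfs and none of the host's files are visible there. The mount exists only in the container's mount namespace, so within the container /tmp is now a separate filesystem.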
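
One way to confirm that the container process really lives in a different mount namespace is to compare the namespace links of two processes. The sketch below relies on the documented property that two /proc/<pid>/ns/<type> links refer to the same namespace exactly when their device ID and inode number match; the function name same_namespace is hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if both processes are in the same namespace
 * of the given type ("mnt", "uts", ...), 0 if they are in different ones,
 * and -1 on error (for example, no such process or no permission). */
int same_namespace(pid_t a, pid_t b, const char *type)
{
    char path_a[64], path_b[64];
    struct stat st_a, st_b;

    snprintf(path_a, sizeof(path_a), "/proc/%d/ns/%s", (int)a, type);
    snprintf(path_b, sizeof(path_b), "/proc/%d/ns/%s", (int)b, type);

    /* stat() follows the link to the namespace object itself. */
    if (stat(path_a, &st_a) != 0 || stat(path_b, &st_b) != 0)
        return -1;

    return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}
```

Called on the host as same_namespace(getpid(), container_pid, "mnt"), where container_pid is the value returned by clone() in the program above, the function would be expected to return 0, while the same comparison for "uts" would return 1 because only the mount namespace was unshared.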

References

　　https://time.geekbang.org/column/article/17921

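Some of the code in this article was copied directly from the reference above. If this infringes any rights, please let me know and it will be removed.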
