A Deep Dive into System Calls


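Assignment requirements:

  • Pick the system call whose number matches the last two digits of your student ID
  • Trigger that system call with assembly instructions
  • Use gdb to trace how the kernel handles that system call
  • Read and analyze the system call entry code closely: saving the context, restoring the context, and returning from the system call, paying particular attention to how the kernel stack changes along the way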

1. Choosing a system call

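My student ID ends in 31. Looking that number up in syscall_32.tbl shows that system call 31 is stty, and a further search of the system call definitions shows that stty and gtty (number 32) both map to sys_ni_syscall. Both calls are obsolete, which is why sys_ni_syscall is assigned as their service routine.
Figure: the system call table
Figure: stty mapped to sys_ni_syscall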

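Background:
Although system calls 31 and 32 are obsolete, their slots cannot be reassigned to other system calls, because old code may still invoke them. If a slot were reused, a program calling the retired system call would get a result completely different from what it expected, such as a file being opened, which would be very confusing. The "ni" in sys_ni_syscall stands for "not implemented".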


I therefore analyze the entry directly above 31 in the table instead: system call 30, utime.

# The format is:
# <number> <abi> <name> <entry point> <compat entry point>
30	i386	utime			sys_utime32			__ia32_sys_utime32

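utime changes a file's access time and modification time. Its 32-bit entry point is sys_utime32; searching for sys_utime32 leads to its implementation in utimes.c, which does its work by calling do_utimes. The code of do_utimes is: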

/*
 * do_utimes - change times on filename or file descriptor
 * @dfd: open file descriptor, -1 or AT_FDCWD
 * @filename: path name or NULL
 * @times: new times or NULL
 * @flags: zero or more flags (only AT_SYMLINK_NOFOLLOW for the moment)
 *
 * If filename is NULL and dfd refers to an open file, then operate on
 * the file.  Otherwise look up filename, possibly using dfd as a
 * starting point.
 *
 * If times==NULL, set access and modification to current time,
 * must be owner or have write permission.
 * Else, update from *times, must be owner or super user.
 */
long do_utimes(int dfd, const char __user *filename, struct timespec64 *times,
	       int flags)
{
	int error = -EINVAL;

	if (times && (!nsec_valid(times[0].tv_nsec) ||
		      !nsec_valid(times[1].tv_nsec))) {
		goto out;
	}

	if (flags & ~AT_SYMLINK_NOFOLLOW)
		goto out;

	if (filename == NULL && dfd != AT_FDCWD) {
		struct fd f;

		if (flags & AT_SYMLINK_NOFOLLOW)
			goto out;

		f = fdget(dfd);
		error = -EBADF;
		if (!f.file)
			goto out;

		error = utimes_common(&f.file->f_path, times);
		fdput(f);
	} else {
		struct path path;
		int lookup_flags = 0;

		if (!(flags & AT_SYMLINK_NOFOLLOW))
			lookup_flags |= LOOKUP_FOLLOW;
retry:
		error = user_path_at(dfd, filename, lookup_flags, &path);
		if (error)
			goto out;

		error = utimes_common(&path, times);
		path_put(&path);
		if (retry_estale(error, lookup_flags)) {
			lookup_flags |= LOOKUP_REVAL;
			goto retry;
		}
	}

out:
	return error;
}

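2. Triggering the system call (directly, then through assembly)

The following code triggers the utime system call directly, through the C library function: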

#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>
#include <string.h>   /* strcmp */

int main(int argc, char *argv[])
{
    char *pathname;
    struct stat sb;
    struct utimbuf utb;

    if (argc != 2 || strcmp(argv[1], "--help") == 0){
        printf("%s file\n", argv[0]);
        return 1;
    }

    pathname = argv[1];

    // Get the file's current timestamps
    if (stat(pathname, &sb) == -1)
        return 1;

    // Set the modification time to the access time
    utb.actime = sb.st_atime;
    utb.modtime = sb.st_atime;        /* Make modify time same as access time */
    // Call utime
    if (utime(pathname, &utb) == -1)  /* Update file times */
        return 1;

    return 0;
}

Figure: output of running the program

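Next, modify the program to invoke utime from assembly. The assembly instructions load utime's arguments into registers and trap into the kernel through software interrupt 0x80. The kernel's system call entry code (system_call in older kernels) then dispatches to the service routine sys_utime32. Because the kernel is acting on behalf of the user process, this code runs in process context, not interrupt context: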

#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
    char *pathname;
    struct stat sb;
    struct utimbuf utb;

    if (argc != 2 || strcmp(argv[1], "--help") == 0){
        printf("%s file\n", argv[0]);
        return 1;
    }

    pathname = argv[1];

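    // Get the file's current timestamps
    if (stat(pathname, &sb) == -1)
        return 1;

    // Set the modification time to the access time
    utb.actime = sb.st_atime;
    utb.modtime = sb.st_atime;        /* Make modify time same as access time */
    int flag;

    // 32-bit only: build with gcc -m32 -static
    asm volatile(
    "movl %1, %%ebx\n\t"  // EBX = pathname (first argument)
    "movl %2, %%ecx\n\t"  // ECX = &utb (second argument)
    "movl $30, %%eax\n\t" // EAX = system call number (30 = utime on i386)
    "int $0x80\n\t"       // trap into the kernel through software interrupt 0x80
    "movl %%eax, %0\n\t"  // EAX now holds the return value; store it in flag
    :"=m"(flag)
    :"r"(pathname),"r"(&utb)
    :"eax","ebx","ecx","memory"  // registers the asm overwrites; the kernel reads utb from memory
    );

    if (flag < 0)  /* a raw system call returns a negative errno on failure */
        return 1;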

    return 0;
}

Figure: changing the modification time through assembly

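3. Tracing the kernel's handling of the system call with gdb

3.1 gdb environment setup

First start qemu with qemu-system-x86_64 -kernel ../arch/x86/boot/bzImage -initrd rootfs.cpio.gz (adjust the paths). Then copy the compiled executable that triggers utime through assembly into rootfs/home/, and create a file named b.test in the same directory. Finally, repack the root filesystem image with the command below (run it inside rootfs) and restart qemu.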

find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../rootfs.cpio.gz
# Rerun qemu
qemu-system-x86_64 -kernel ../arch/x86/boot/bzImage -initrd rootfs.cpio.gz

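Figure: starting qemu (adjust the paths for your own machine)

Figure: copying the assembly program

Figure: qemu after the restart

Close qemu, then run qemu-system-x86_64 -kernel ./arch/x86/boot/bzImage -initrd ./busybox-1.31.1/rootfs.cpio.gz -S -s -nographic -append "console=ttyS0" in a terminal to start qemu on the terminal's console for debugging (to quit, use killall qemu-system-x86_64). Here -S holds the CPU at startup and -s opens a gdb server on TCP port 1234. Open a second terminal and run the following commands to load vmlinux and connect to the gdb server, then try setting a breakpoint at start_kernel. qemu stops after printing "Booting the kernel":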

gdb
file vmlinux
target remote:1234
b start_kernel
c
....

Possible error and its fix:

  • Error: running file vmlinux may report an error (judging from the fix, gdb declining to auto-load vmlinux-gdb.py)
  • Fix:
vi ~/.gdbinit
================ add the following lines ==============
add-auto-load-safe-path /home/dfx/linux-5.4.34/scripts/gdb/vmlinux-gdb.py
set auto-load safe-path /
python sys.path.append("/home/dfx/linux-5.4.34/scripts/gdb/vmlinux-gdb.py")
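3.2 System call analysis

Compile a.c into a 32-bit executable with gcc a.c -static -m32, then disassemble it with objdump -S a.out > a32.s to see how utime is invoked.

Figure: main calling utime
Figure: utime executing call 0x80ea9f0
utime does not issue the system call instruction itself. It makes a call through address 0x80ea9f0 instead; running x 0x80ea9f0 in gdb shows the value stored at that address:
Figure: the value at address 0x80ea9f0

That indirection makes the 32-bit path harder to follow, so I turn to the 64-bit utime instead. The same steps produce the following 64-bit disassembly (excerpt):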

000000000043f250 <utime>:
  43f250:       b8 84 00 00 00          mov    $0x84,%eax
  43f255:       0f 05                   syscall
  43f257:       48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
  43f25d:       0f 83 4d 52 00 00       jae    4444b0 <__syscall_error>
  43f263:       c3                      retq
  43f264:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  43f26b:       00 00 00 
  43f26e:       66 90                   xchg   %ax,%ax

The code above shows that utime's system call number is 0x84 (132). The 64-bit system call table maps this number to __x64_sys_utime.
Figure: the 64-bit system call table

3.3 Tracing with gdb

Set a breakpoint at __x64_sys_utime, then run the 64-bit program in qemu (remember to repack rootfs first). gdb stops in the relevant code in utimes.c:


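The source around the breakpoint includes do_futimesat, and utimes.c carries the following comment above it:

futimesat(), utimes() and utime() are older versions of utimensat() that are provided for compatibility with traditional C libraries.
On modern architectures, we always use libc wrappers around utimensat() instead.

In other words, utime exists for compatibility with older C libraries, and current code uses utimensat, which is system call 320 in the i386 table (280 in the x86-64 table). Both utime and utimensat end up calling do_utimes().
Figure: utimensat in the system call table
Figure: the body of utimensat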

======================= do_utimes description ==========================

/*
 * do_utimes - change times on filename or file descriptor
 * @dfd: open file descriptor, -1 or AT_FDCWD
 * @filename: path name or NULL
 * @times: new times or NULL
 * @flags: zero or more flags (only AT_SYMLINK_NOFOLLOW for the moment)
 *
 * If filename is NULL and dfd refers to an open file, then operate on
 * the file.  Otherwise look up filename, possibly using dfd as a
 * starting point.
 *
 * If times==NULL, set access and modification to current time,
 * must be owner or have write permission.
 * Else, update from *times, must be owner or super user.
 */

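The two gdb sessions below show the actual trace. The first walks through the overall call flow and watches how the stack changes (using bt); the second steps into some of the functions to look at the details: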

(gdb) b __x64_sys_utime
Note: breakpoints 1, 2, 3, 4, 5 and 6 also set at pc 0xffffffff81206f07.
Breakpoint 8 at 0xffffffff81206f07: file fs/utimes.c, line 204.
(gdb) c
Continuing.

(gdb) bt
#0  __x64_sys_utime (regs=0xffffc900001b7f58) at fs/utimes.c:204
#1  0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:290
#2  0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#3  0x0000000000000000 in ?? ()
(gdb) n

Breakpoint 7, do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:90
90	{
(gdb) n
93		if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
94			      !nsec_valid(times[1].tv_nsec))) {
(gdb) n
93		if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
98		if (flags & ~AT_SYMLINK_NOFOLLOW)
(gdb) n
101		if (filename == NULL && dfd != AT_FDCWD) {
(gdb) n
119				lookup_flags |= LOOKUP_FOLLOW;
(gdb) n
121			error = user_path_at(dfd, filename, lookup_flags, &path);
(gdb) n
122			if (error)
(gdb) n
125			error = utimes_common(&path, times);
(gdb) n
126			path_put(&path);
(gdb) n
127			if (retry_estale(error, lookup_flags)) {
(gdb) n
135	}
(gdb) bt
#0  do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:119
#1  0xffffffff81206f64 in __do_sys_utime (times=<optimized out>, filename=<optimized out>) at fs/utimes.c:215
#2  __se_sys_utime (times=<optimized out>, filename=<optimized out>) at fs/utimes.c:204
#3  __x64_sys_utime (regs=<optimized out>) at fs/utimes.c:204
#4  0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0x4a1024) at arch/x86/entry/common.c:290
#5  0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#6  0x0000000000000000 in ?? ()

(gdb) n
__x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
204	SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) bt
#0  __x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
#1  0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0x0 <fixed_percpu_data>) at arch/x86/entry/common.c:290
#2  0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#3  0x0000000000000000 in ?? ()
(gdb) n
do_syscall_64 (nr=18446612682188144960, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:300
300		syscall_return_slowpath(regs);
(gdb) n
301	}
(gdb) bt
#0  do_syscall_64 (nr=<optimized out>, regs=<optimized out>) at arch/x86/entry/common.c:301
#1  0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#2  0x0000000000000000 in ?? ()
(gdb) n
entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
184		movq	RCX(%rsp), %rcx
(gdb) bt
#0  entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
#1  0x0000000000000000 in ?? ()
(gdb) n
185		movq	RIP(%rsp), %r11
(gdb) n
187		cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
(gdb) n
188		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
205		shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
206		sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
210		cmpq	%rcx, %r11
(gdb) n
211		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
213		cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
(gdb) n
214		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
216		movq	R11(%rsp), %r11
(gdb) n
217		cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
(gdb) n
218		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
238		testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
(gdb) n
239		jnz	swapgs_restore_regs_and_return_to_usermode
(gdb) n
243		cmpq	$__USER_DS, SS(%rsp)		/* SS must match SYSRET */
(gdb) n
244		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
253		POP_REGS pop_rdi=0 skip_r11rcx=1
(gdb) bt
#0  syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
#1  0x0000000000000000 in ?? ()
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:259
259		movq	%rsp, %rdi
(gdb) n
260		movq	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:262
262		pushq	RSP-RDI(%rdi)	/* RSP */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:263
263		pushq	(%rdi)		/* RDI */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:271
271		SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
(gdb) n
273		popq	%rdi
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:274
274		popq	%rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:275
275		USERGS_SYSRET64
(gdb) n
0x000000000043f257 in ?? ()
(gdb) n
Cannot find bounds of current function
(gdb) 


Breakpoint 1, __x64_sys_utime (regs=0xffffc900001b7f58) at fs/utimes.c:204
204	SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) n

Breakpoint 7, do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:90
90	{
(gdb) n
93		if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
94			      !nsec_valid(times[1].tv_nsec))) {
(gdb) n
93		if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
98		if (flags & ~AT_SYMLINK_NOFOLLOW)
(gdb) n
101		if (filename == NULL && dfd != AT_FDCWD) {
(gdb) n
119				lookup_flags |= LOOKUP_FOLLOW;
(gdb) n
121			error = user_path_at(dfd, filename, lookup_flags, &path);
(gdb) n
122			if (error)
(gdb) n
125			error = utimes_common(&path, times);
(gdb) n
126			path_put(&path);
(gdb) n
127			if (retry_estale(error, lookup_flags)) {
(gdb) s
retry_estale (flags=<optimized out>, error=<optimized out>) at ./include/linux/namei.h:91
91		return error == -ESTALE && !(flags & LOOKUP_REVAL);
(gdb) n
do_utimes (dfd=118112576, filename=0x64 <error: Cannot access memory at address 0x64>, times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:135
135	}
(gdb) n
__x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
204	SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) n
do_syscall_64 (nr=18446612682188144960, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:300
300		syscall_return_slowpath(regs);
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:300
300		syscall_return_slowpath(regs);
(gdb) s
get_current () at ./arch/x86/include/asm/current.h:15
15		return this_cpu_read_stable(current_task);
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:256
256		u32 cached_flags = READ_ONCE(ti->flags);
(gdb) n
270		if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS))
(gdb) n
273		local_irq_disable();
(gdb) s
arch_local_irq_disable () at arch/x86/entry/common.c:273
273		local_irq_disable();
(gdb) s
native_irq_disable () at ./arch/x86/include/asm/irqflags.h:49
49		asm volatile("cli": : :"memory");
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:274
274		prepare_exit_to_usermode(regs);
(gdb) n
do_syscall_64 (nr=<optimized out>, regs=<optimized out>) at arch/x86/entry/common.c:300
300		syscall_return_slowpath(regs);
(gdb) n
301	}
(gdb) n
entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
184		movq	RCX(%rsp), %rcx
(gdb) n
185		movq	RIP(%rsp), %r11
(gdb) n
187		cmpq	%rcx, %r11	/* SYSRET requires RCX == RIP */
(gdb) n
188		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
205		shl	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
206		sar	$(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
210		cmpq	%rcx, %r11
(gdb) n
211		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
213		cmpq	$__USER_CS, CS(%rsp)		/* CS must match SYSRET */
(gdb) n
214		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
216		movq	R11(%rsp), %r11
(gdb) n
217		cmpq	%r11, EFLAGS(%rsp)		/* R11 == RFLAGS */
(gdb) n
218		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
238		testq	$(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
(gdb) n
239		jnz	swapgs_restore_regs_and_return_to_usermode
(gdb) n
243		cmpq	$__USER_DS, SS(%rsp)		/* SS must match SYSRET */
(gdb) n
244		jne	swapgs_restore_regs_and_return_to_usermode
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
253		POP_REGS pop_rdi=0 skip_r11rcx=1
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:259
259		movq	%rsp, %rdi
(gdb) n
260		movq	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:262
262		pushq	RSP-RDI(%rdi)	/* RSP */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:263
263		pushq	(%rdi)		/* RDI */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:271
271		SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
(gdb) n
273		popq	%rdi
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:274
274		popq	%rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:275
275		USERGS_SYSRET64
(gdb) n
0x000000000043f257 in ?? ()
(gdb) n
Cannot find bounds of current function
(gdb) 

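4. Summary

The utime system call proceeds roughly as follows (corrections welcome):

  1. The user-space utime function loads the system call number (0x84) into %eax and executes the syscall instruction, which traps into the kernel at entry_SYSCALL_64. The entry code switches to the kernel stack and saves the user registers there as a struct pt_regs; this is the saved context that regs points to in the backtraces.
  2. entry_SYSCALL_64 calls do_syscall_64, which uses the system call number taken from %rax to find the handler in sys_call_table and calls it, here __x64_sys_utime. The handler reads its arguments from the saved registers and does the real work by calling do_utimes. do_utimes operates on an open file when filename is NULL and dfd refers to one, and otherwise looks the path up; if times == NULL it sets the access and modification times to the current time.
  3. Before the system call finishes, do_syscall_64 calls syscall_return_slowpath, which runs prepare_exit_to_usermode. Back in entry_SYSCALL_64, a series of checks decides how to return: if any check fails, a jne branches to swapgs_restore_regs_and_return_to_usermode. In the trace above they all pass, so the registers are restored with POP_REGS and USERGS_SYSRET64 returns to user mode at 0x43f257, the instruction right after syscall.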
