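Assignment requirements:
- Pick the system call whose number matches the last two digits of your student ID
- Trigger that system call with assembly instructions
- Trace the kernel's handling of the system call with gdb
- Read closely how the system call entry saves and restores context and how the system call returns, paying particular attention to how the kernel stack changes during the call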
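1. Choosing a system call
My student ID ends in 31, but the syscall_32.tbl table shows that system call 31 is stty. Searching the system call description file further shows that both it and system call 32, gtty, are mapped to sys_ni_syscall. Further reading confirmed that these two system calls have been retired, which is why their service routine is sys_ni_syscall.
Background:
Even though system calls 31 and 32 have been retired, their slots cannot be reassigned to other system calls, because old code may still use them. If a slot were reused, a program that invoked the retired call would get an effect completely different from what it expected, such as opening a file, which would be very confusing. The "ni" in sys_ni_syscall stands for "not implemented"; the routine simply returns -ENOSYS.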
The rest of this post therefore analyzes the system call just before number 31, namely number 30, utime.
# The format is:
# <number> <abi> <name> <entry point> <compat entry point>
30 i386 utime sys_utime32 __ia32_sys_utime32
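utime changes a file's access time and modification time. Its 32-bit entry point is sys_utime32. Searching for sys_utime32 leads to its implementation in fs/utimes.c, where it does its work by calling do_utimes. The code of do_utimes is as follows: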
/*
 * do_utimes - change times on filename or file descriptor
 * @dfd: open file descriptor, -1 or AT_FDCWD
 * @filename: path name or NULL
 * @times: new times or NULL
 * @flags: zero or more flags (only AT_SYMLINK_NOFOLLOW for the moment)
 *
 * If filename is NULL and dfd refers to an open file, then operate on
 * the file. Otherwise look up filename, possibly using dfd as a
 * starting point.
 *
 * If times==NULL, set access and modification to current time,
 * must be owner or have write permission.
 * Else, update from *times, must be owner or super user.
 */
long do_utimes(int dfd, const char __user *filename, struct timespec64 *times,
	       int flags)
{
	int error = -EINVAL;
	if (times && (!nsec_valid(times[0].tv_nsec) ||
		      !nsec_valid(times[1].tv_nsec))) {
		goto out;
	}
	if (flags & ~AT_SYMLINK_NOFOLLOW)
		goto out;
	if (filename == NULL && dfd != AT_FDCWD) {
		struct fd f;
		if (flags & AT_SYMLINK_NOFOLLOW)
			goto out;
		f = fdget(dfd);
		error = -EBADF;
		if (!f.file)
			goto out;
		error = utimes_common(&f.file->f_path, times);
		fdput(f);
	} else {
		struct path path;
		int lookup_flags = 0;
		if (!(flags & AT_SYMLINK_NOFOLLOW))
			lookup_flags |= LOOKUP_FOLLOW;
retry:
		error = user_path_at(dfd, filename, lookup_flags, &path);
		if (error)
			goto out;
		error = utimes_common(&path, times);
		path_put(&path);
		if (retry_estale(error, lookup_flags)) {
			lookup_flags |= LOOKUP_REVAL;
			goto retry;
		}
	}
out:
	return error;
}
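2. Triggering the system call (direct call + assembly)
The following code triggers the utime system call directly through the C library wrapper: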
#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>
#include <string.h>   /* strcmp */
int main(int argc, char *argv[])
{
    char *pathname;
    struct stat sb;
    struct utimbuf utb;
    if (argc != 2 || strcmp(argv[1], "--help") == 0) {
        printf("%s file\n", argv[0]);
        return 1;
    }
    pathname = argv[1];
    /* Get the file's current timestamps */
    if (stat(pathname, &sb) == -1)
        return 1;
    /* Set the modification time to the access time */
    utb.actime = sb.st_atime;
    utb.modtime = sb.st_atime; /* Make modify time same as access time */
    /* Call utime */
    if (utime(pathname, &utb) == -1) /* Update file times */
        return 1;
    return 0;
}
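Next, modify the program to invoke utime from assembly. This means loading utime's arguments into registers with assembly instructions and trapping into the kernel through software interrupt 0x80. The kernel's system call entry code (named system_call in older kernels) then dispatches to the corresponding service routine, sys_utime32. Because the kernel is running on behalf of the user process, this execution takes place in process context, not interrupt context. The program has to be built as a 32-bit binary (gcc -m32), since int $0x80 uses the 32-bit system call table and 32-bit register arguments: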
#include <sys/stat.h>
#include <utime.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
    char *pathname;
    struct stat sb;
    struct utimbuf utb;
    int flag;
    if (argc != 2 || strcmp(argv[1], "--help") == 0) {
        printf("%s file\n", argv[0]);
        return 1;
    }
    pathname = argv[1];
    /* Get the file's current timestamps */
    if (stat(pathname, &sb) == -1)
        return 1;
    /* Set the modification time to the access time */
    utb.actime = sb.st_atime;
    utb.modtime = sb.st_atime; /* Make modify time same as access time */
    asm volatile(
        "movl $30, %%eax\n\t"   /* eax = system call number (30 = utime) */
        "int $0x80\n\t"         /* trap into the kernel via software interrupt 0x80 */
        "movl %%eax, %0\n\t"    /* eax now holds the return value; store it in flag */
        : "=m"(flag)
        : "b"(pathname), "c"(&utb) /* constraints load ebx = pathname, ecx = &utb */
        : "eax", "memory"          /* eax is overwritten; the kernel reads utb via a pointer */
    );
    if (flag < 0) /* a raw system call returns a negative errno on failure, not -1 */
        return 1;
    return 0;
}
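3. Tracing the kernel's handling of the system call with gdb
3.1 gdb environment setup
First run qemu-system-x86_64 -kernel ../arch/x86/boot/bzImage -initrd rootfs.cpio.gz to start qemu (mind the paths). Then copy the compiled executable that triggers utime from assembly into rootfs/home/, and create a file named b.test in the same directory. Repack the root filesystem image with the following command (run it inside rootfs), then restart qemu.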
find . -print0 | cpio --null -ov --format=newc | gzip -9 > ../rootfs.cpio.gz
# Rerun qemu
qemu-system-x86_64 -kernel ../arch/x86/boot/bzImage -initrd rootfs.cpio.gz
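Close qemu, then in a terminal run qemu-system-x86_64 -kernel ./arch/x86/boot/bzImage -initrd ./busybox-1.31.1/rootfs.cpio.gz -S -s -nographic -append "console=ttyS0" to start qemu for debugging with its console in the shell (quit with killall qemu-system-x86_64). Open a second terminal and run the following commands to load vmlinux and connect to the gdb server, then try setting a breakpoint at start_kernel. qemu stops once it reaches "Booting the kernel":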
gdb
file vmlinux
target remote:1234
b start_kernel
c
....
Possible errors and fixes:
- ERROR: running file vmlinux may report an error.
- Fix: edit ~/.gdbinit (vi ~/.gdbinit) and add the following lines:
add-auto-load-safe-path /home/dfx/linux-5.4.34/scripts/gdb/vmlinux-gdb.py
set auto-load safe-path /
python sys.path.append("/home/dfx/linux-5.4.34/scripts/gdb/vmlinux-gdb.py")
- ERROR: Remote 'g' packet reply is too long
- Fix: https://stackoverflow.com/questions/8662468/remote-g-packet-reply-is-too-long
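3.2 System call analysis
Compile a.c into a 32-bit executable with gcc a.c -static -m32, then disassemble it with objdump -S a.out > a32.s to see how utime is invoked.
The disassembly shows that utime does not issue the trap instruction itself but calls through address 0x80ea9f0. Running x 0x80ea9f0 in gdb shows the value stored at that address.
With no easy way forward there, the analysis switches to the 64-bit utime instead. Repeating the steps above for a 64-bit build gives the following disassembly (excerpt):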
000000000043f250 <utime>:
43f250: b8 84 00 00 00 mov $0x84,%eax
43f255: 0f 05 syscall
43f257: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
43f25d: 0f 83 4d 52 00 00 jae 4444b0 <__syscall_error>
43f263: c3 retq
43f264: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
43f26b: 00 00 00
43f26e: 66 90 xchg %ax,%ax
The code above shows that the system call number of utime is 0x84 (132). The system call table shows that the corresponding function is __x64_sys_utime.
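3.3 Tracing with gdb
Set a breakpoint on __x64_sys_utime, then run the 64-bit program in qemu (remember to repack rootfs first). gdb stops in the relevant code in fs/utimes.c.
The surrounding source includes do_futimesat, although the backtraces below show utime calling do_utimes directly.
fs/utimes.c carries the following comment:
futimesat(), utimes() and utime() are older versions of utimensat() that are provided for compatibility with traditional C libraries.
On modern architectures, we always use libc wrappers around utimensat() instead.
In other words, utime exists for compatibility with older C libraries, and utimensat is the current interface (system call 320 in the i386 table, 280 on x86-64). Both utime and utimensat end up calling do_utimes().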
======================= do_utimes description ==========================
/*
* do_utimes - change times on filename or file descriptor
* @dfd: open file descriptor, -1 or AT_FDCWD
* @filename: path name or NULL
* @times: new times or NULL
* @flags: zero or more flags (only AT_SYMLINK_NOFOLLOW for the moment)
*
* If filename is NULL and dfd refers to an open file, then operate on
* the file. Otherwise look up filename, possibly using dfd as a
* starting point.
*
* If times==NULL, set access and modification to current time,
* must be owner or have write permission.
* Else, update from *times, must be owner or super user.
*/
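The two traces below record the full session. The first walks through the overall call flow while watching the stack with bt; the second steps into some of the functions to look at the details: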
(gdb) b __x64_sys_utime
Note: breakpoints 1, 2, 3, 4, 5 and 6 also set at pc 0xffffffff81206f07.
Breakpoint 8 at 0xffffffff81206f07: file fs/utimes.c, line 204.
(gdb) c
Continuing.
(gdb) bt
#0 __x64_sys_utime (regs=0xffffc900001b7f58) at fs/utimes.c:204
#1 0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:290
#2 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#3 0x0000000000000000 in ?? ()
(gdb) n
Breakpoint 7, do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:90
90 {
(gdb) n
93 if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
94 !nsec_valid(times[1].tv_nsec))) {
(gdb) n
93 if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
98 if (flags & ~AT_SYMLINK_NOFOLLOW)
(gdb) n
101 if (filename == NULL && dfd != AT_FDCWD) {
(gdb) n
119 lookup_flags |= LOOKUP_FOLLOW;
(gdb) n
121 error = user_path_at(dfd, filename, lookup_flags, &path);
(gdb) n
122 if (error)
(gdb) n
125 error = utimes_common(&path, times);
(gdb) n
126 path_put(&path);
(gdb) n
127 if (retry_estale(error, lookup_flags)) {
(gdb) n
135 }
(gdb) bt
#0 do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:119
#1 0xffffffff81206f64 in __do_sys_utime (times=<optimized out>, filename=<optimized out>) at fs/utimes.c:215
#2 __se_sys_utime (times=<optimized out>, filename=<optimized out>) at fs/utimes.c:204
#3 __x64_sys_utime (regs=<optimized out>) at fs/utimes.c:204
#4 0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0x4a1024) at arch/x86/entry/common.c:290
#5 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#6 0x0000000000000000 in ?? ()
(gdb) n
__x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) bt
#0 __x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
#1 0xffffffff81002603 in do_syscall_64 (nr=<optimized out>, regs=0x0 <fixed_percpu_data>) at arch/x86/entry/common.c:290
#2 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#3 0x0000000000000000 in ?? ()
(gdb) n
do_syscall_64 (nr=18446612682188144960, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:300
300 syscall_return_slowpath(regs);
(gdb) n
301 }
(gdb) bt
#0 do_syscall_64 (nr=<optimized out>, regs=<optimized out>) at arch/x86/entry/common.c:301
#1 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175
#2 0x0000000000000000 in ?? ()
(gdb) n
entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
184 movq RCX(%rsp), %rcx
(gdb) bt
#0 entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
#1 0x0000000000000000 in ?? ()
(gdb) n
185 movq RIP(%rsp), %r11
(gdb) n
187 cmpq %rcx, %r11 /* SYSRET requires RCX == RIP */
(gdb) n
188 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
205 shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
206 sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
210 cmpq %rcx, %r11
(gdb) n
211 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
213 cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */
(gdb) n
214 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
216 movq R11(%rsp), %r11
(gdb) n
217 cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */
(gdb) n
218 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
238 testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
(gdb) n
239 jnz swapgs_restore_regs_and_return_to_usermode
(gdb) n
243 cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */
(gdb) n
244 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
253 POP_REGS pop_rdi=0 skip_r11rcx=1
(gdb) bt
#0 syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
#1 0x0000000000000000 in ?? ()
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:259
259 movq %rsp, %rdi
(gdb) n
260 movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:262
262 pushq RSP-RDI(%rdi) /* RSP */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:263
263 pushq (%rdi) /* RDI */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:271
271 SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
(gdb) n
273 popq %rdi
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:274
274 popq %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:275
275 USERGS_SYSRET64
(gdb) n
0x000000000043f257 in ?? ()
(gdb) n
Cannot find bounds of current function
(gdb)
Breakpoint 1, __x64_sys_utime (regs=0xffffc900001b7f58) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) n
Breakpoint 7, do_utimes (dfd=-100, filename=0x4a1024 "./b.test", times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:90
90 {
(gdb) n
93 if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
94 !nsec_valid(times[1].tv_nsec))) {
(gdb) n
93 if (times && (!nsec_valid(times[0].tv_nsec) ||
(gdb) n
98 if (flags & ~AT_SYMLINK_NOFOLLOW)
(gdb) n
101 if (filename == NULL && dfd != AT_FDCWD) {
(gdb) n
119 lookup_flags |= LOOKUP_FOLLOW;
(gdb) n
121 error = user_path_at(dfd, filename, lookup_flags, &path);
(gdb) n
122 if (error)
(gdb) n
125 error = utimes_common(&path, times);
(gdb) n
126 path_put(&path);
(gdb) n
127 if (retry_estale(error, lookup_flags)) {
(gdb) s
retry_estale (flags=<optimized out>, error=<optimized out>) at ./include/linux/namei.h:91
91 return error == -ESTALE && !(flags & LOOKUP_REVAL);
(gdb) n
do_utimes (dfd=118112576, filename=0x64 <error: Cannot access memory at address 0x64>, times=0xffffc900001b7ee0, flags=0) at fs/utimes.c:135
135 }
(gdb) n
__x64_sys_utime (regs=0xffff8880070a4140) at fs/utimes.c:204
204 SYSCALL_DEFINE2(utime, char __user *, filename, struct utimbuf __user *, times)
(gdb) n
do_syscall_64 (nr=18446612682188144960, regs=0xffffc900001b7f58) at arch/x86/entry/common.c:300
300 syscall_return_slowpath(regs);
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:300
300 syscall_return_slowpath(regs);
(gdb) s
get_current () at ./arch/x86/include/asm/current.h:15
15 return this_cpu_read_stable(current_task);
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:256
256 u32 cached_flags = READ_ONCE(ti->flags);
(gdb) n
270 if (unlikely(cached_flags & SYSCALL_EXIT_WORK_FLAGS))
(gdb) n
273 local_irq_disable();
(gdb) s
arch_local_irq_disable () at arch/x86/entry/common.c:273
273 local_irq_disable();
(gdb) s
native_irq_disable () at ./arch/x86/include/asm/irqflags.h:49
49 asm volatile("cli": : :"memory");
(gdb) s
syscall_return_slowpath (regs=<optimized out>) at arch/x86/entry/common.c:274
274 prepare_exit_to_usermode(regs);
(gdb) n
do_syscall_64 (nr=<optimized out>, regs=<optimized out>) at arch/x86/entry/common.c:300
300 syscall_return_slowpath(regs);
(gdb) n
301 }
(gdb) n
entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:184
184 movq RCX(%rsp), %rcx
(gdb) n
185 movq RIP(%rsp), %r11
(gdb) n
187 cmpq %rcx, %r11 /* SYSRET requires RCX == RIP */
(gdb) n
188 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
205 shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
206 sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx
(gdb) n
210 cmpq %rcx, %r11
(gdb) n
211 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
213 cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */
(gdb) n
214 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
216 movq R11(%rsp), %r11
(gdb) n
217 cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */
(gdb) n
218 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
238 testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
(gdb) n
239 jnz swapgs_restore_regs_and_return_to_usermode
(gdb) n
243 cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */
(gdb) n
244 jne swapgs_restore_regs_and_return_to_usermode
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:253
253 POP_REGS pop_rdi=0 skip_r11rcx=1
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:259
259 movq %rsp, %rdi
(gdb) n
260 movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:262
262 pushq RSP-RDI(%rdi) /* RSP */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:263
263 pushq (%rdi) /* RDI */
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:271
271 SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi
(gdb) n
273 popq %rdi
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:274
274 popq %rsp
(gdb) n
syscall_return_via_sysret () at arch/x86/entry/entry_64.S:275
275 USERGS_SYSRET64
(gdb) n
0x000000000043f257 in ?? ()
(gdb) n
Cannot find bounds of current function
(gdb)
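4. Analysis and summary
The rough sequence of the utime system call is as follows (corrections welcome):
- The utime library function loads the system call number 0x84 into %eax and executes syscall, which traps into the kernel at entry_SYSCALL_64. The entry code saves the user registers on the kernel stack as a struct pt_regs (the regs pointer seen in the backtraces) and calls do_syscall_64.
- do_syscall_64 takes the system call number that arrived in %rax, looks up the matching function in the system call table sys_call_table, and calls it with regs. Here that function is __x64_sys_utime, which pulls the arguments out of the saved registers and passes them on as ordinary function arguments.
- __x64_sys_utime does its work mainly by calling do_utimes. With a NULL filename, do_utimes operates on the open file referenced by the file descriptor; otherwise it looks up the path, as it does here for ./b.test. If times == NULL, the access and modification times are set to the current time.
- Before the system call returns, do_syscall_64 calls syscall_return_slowpath, which disables interrupts and calls prepare_exit_to_usermode. Back in entry_SYSCALL_64, a series of cmpq/jne checks decides whether SYSRET can be used; if any check fails, control jumps to swapgs_restore_regs_and_return_to_usermode instead. In this trace every check passes, so syscall_return_via_sysret restores the registers with POP_REGS, moves to the stack at TSS sp0, runs SWITCH_TO_USER_CR3_STACK to switch back to the user page tables where page table isolation is enabled, restores %rdi and %rsp, and returns to user space with USERGS_SYSRET64.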