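While reading the book 《C程序性能優化》, I came across the author's remark that the gcc option -fomit-frame-pointer can improve program performance. I did not quite see why, so I decided to dig into it.
Consider the following simple program: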
    #include <stdio.h>

    int add(int a, int b)
    {
        return a + b;
    }

    int main()
    {
        int sum = 0;
        sum = add(1,2);
        printf("%d\n",sum);
        return 0;
    }
Compiled without -fomit-frame-pointer, the binary disassembles to the following:
    00000000 <add>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   8b 45 0c                mov    0xc(%ebp),%eax
       6:   8b 55 08                mov    0x8(%ebp),%edx
       9:   01 d0                   add    %edx,%eax
       b:   5d                      pop    %ebp
       c:   c3                      ret

    0000000d <main>:
       d:   55                      push   %ebp
       e:   89 e5                   mov    %esp,%ebp
      10:   83 e4 f0                and    $0xfffffff0,%esp
      13:   83 ec 20                sub    $0x20,%esp
      16:   c7 44 24 1c 00 00 00    movl   $0x0,0x1c(%esp)
      1d:   00
      1e:   c7 44 24 04 02 00 00    movl   $0x2,0x4(%esp)
      25:   00
      26:   c7 04 24 01 00 00 00    movl   $0x1,(%esp)
      2d:   e8 fc ff ff ff          call   2e <main+0x21>
      32:   89 44 24 1c             mov    %eax,0x1c(%esp)
      36:   b8 00 00 00 00          mov    $0x0,%eax
      3b:   8b 54 24 1c             mov    0x1c(%esp),%edx
      3f:   89 54 24 04             mov    %edx,0x4(%esp)
      43:   89 04 24                mov    %eax,(%esp)
      46:   e8 fc ff ff ff          call   47 <main+0x3a>
      4b:   b8 00 00 00 00          mov    $0x0,%eax
      50:   c9                      leave
      51:   c3                      ret
With the -fomit-frame-pointer option added, the disassembly becomes:
    00000000 <add>:
       0:   8b 44 24 08             mov    0x8(%esp),%eax
       4:   8b 54 24 04             mov    0x4(%esp),%edx
       8:   01 d0                   add    %edx,%eax
       a:   c3                      ret

    0000000b <main>:
       b:   55                      push   %ebp
       c:   89 e5                   mov    %esp,%ebp
       e:   83 e4 f0                and    $0xfffffff0,%esp
      11:   83 ec 20                sub    $0x20,%esp
      14:   c7 44 24 1c 00 00 00    movl   $0x0,0x1c(%esp)
      1b:   00
      1c:   c7 44 24 04 02 00 00    movl   $0x2,0x4(%esp)
      23:   00
      24:   c7 04 24 01 00 00 00    movl   $0x1,(%esp)
      2b:   e8 fc ff ff ff          call   2c <main+0x21>
      30:   89 44 24 1c             mov    %eax,0x1c(%esp)
      34:   b8 00 00 00 00          mov    $0x0,%eax
      39:   8b 54 24 1c             mov    0x1c(%esp),%edx
      3d:   89 54 24 04             mov    %edx,0x4(%esp)
      41:   89 04 24                mov    %eax,(%esp)
      44:   e8 fc ff ff ff          call   45 <main+0x3a>
      49:   b8 00 00 00 00          mov    $0x0,%eax
      4e:   c9                      leave
      4f:   c3                      ret
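As the listings show, the code compiled with -fomit-frame-pointer is slightly shorter. The main difference is in add: it no longer switches stack frames or saves the caller's frame address, so the push %ebp / mov %esp,%ebp on entry and the pop %ebp on exit are gone. main is identical apart from its addresses; it still sets up %ebp, presumably because it realigns the stack with and $0xfffffff0,%esp and needs %ebp to restore %esp on exit.
Some background: the stack grows from high addresses toward low addresses, while the heap grows from low addresses toward high addresses. On x86 the stack-top register is esp and the stack-base register is ebp (the frame pointer, which holds the base of the current frame), so the value in esp is smaller than the value in ebp. On a function call, the caller places the arguments on the stack and the call instruction then pushes the return address. With a frame pointer, add reads its arguments as 0x8(%ebp) and 0xc(%ebp); without one, it reads them relative to esp, as 0x4(%esp) and 0x8(%esp). The price of -fomit-frame-pointer is that the chain of saved ebp values is no longer kept on the stack, so a tool that relies on that chain cannot trace the sequence of function calls. My guess is that gcc, VS, and similar toolchains record the call sequence this way.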