Background
On Linux, the best place to look for information about a process is the proc filesystem.
A detailed introduction to proc can be found in the kernel documentation, which covers it at length:
yum install -y kernel-doc
cat /usr/share/doc/kernel-doc-3.10.0/Documentation/filesystems/proc.txt
Main contents of the proc documentation
Table of Contents
-----------------
0 Preface
0.1 Introduction/Credits
0.2 Legal Stuff
1 Collecting System Information
1.1 Process-Specific Subdirectories
1.2 Kernel data
1.3 IDE devices in /proc/ide
1.4 Networking info in /proc/net
1.5 SCSI info
1.6 Parallel port info in /proc/parport
1.7 TTY info in /proc/tty
1.8 Miscellaneous kernel statistics in /proc/stat
1.9 Ext4 file system parameters
2 Modifying System Parameters
3 Per-Process Parameters
3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer score
3.2 /proc/<pid>/oom_score - Display current oom-killer score
3.3 /proc/<pid>/io - Display the IO accounting fields
3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
3.5 /proc/<pid>/mountinfo - Information about mounts
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
3.7 /proc/<pid>/task/<tid>/children - Information about task children
3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
4 Configuring procfs
4.1 Mount options
Files related to process memory
maps Memory maps to executables and library files (2.4)
statm Process memory status information
status Process status in human readable form
smaps an extension based on maps, showing the memory consumption of
each mapping and flags associated with it
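All four files live under /proc/<pid>/. As a quick sanity check, the copies for the current shell can be listed directly ($$ expands to the shell's own pid; substitute any pid of interest):
ls -l /proc/$$/maps /proc/$$/smaps /proc/$$/statm /proc/$$/status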
Details
status
This file gives a high-level summary of the process's memory usage.
After a program starts, its memory footprint may include the program image itself, shared memory, mmap'd regions, malloc'd memory, and so on.
VmPeak peak virtual memory size
VmSize total program size
VmLck locked memory size
VmHWM peak resident set size ("high water mark")
VmRSS size of memory portions
VmData size of private data segments
VmStk size of stack segments
VmExe size of text segment
VmLib size of shared library code
VmPTE size of page table entries
VmSwap size of swap usage (the number of referred swapents)
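To look at just these fields for a live process, filtering the status file is enough; a minimal example using the current shell ($$ can be replaced by any pid):
grep ^Vm /proc/$$/status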
statm
Memory statistics in units of pages; the operating system's page size can be obtained with getconf:
getconf PAGE_SIZE
Field Content
size total program size (pages) (same as VmSize in status)
resident size of memory portions (pages) (same as VmRSS in status)
shared number of pages that are shared (i.e. backed by a file)
trs number of pages that are 'code' (not including libs; broken,
includes data segment)
lrs number of pages of library (always 0 on 2.6)
drs number of pages of data/stack (including libs; broken,
includes library text)
dt number of dirty pages (always 0 on 2.6)
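Because the values are in pages, multiplying by the page size converts them to bytes. A small sketch that prints the first three fields in kB, taking the page size from getconf rather than assuming 4096:
awk -v page=$(getconf PAGE_SIZE) '{printf "size=%d kB resident=%d kB shared=%d kB\n", $1*page/1024, $2*page/1024, $3*page/1024}' /proc/$$/statm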
maps
The process's memory mappings of executables, shared library files and other regions.
address perms offset dev inode pathname
08048000-08049000 r-xp 00000000 03:00 8312 /opt/test
08049000-0804a000 rw-p 00001000 03:00 8312 /opt/test
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
a7cb1000-a7cb2000 ---p 00000000 00:00 0
a7cb2000-a7eb2000 rw-p 00000000 00:00 0
a7eb2000-a7eb3000 ---p 00000000 00:00 0
a7eb3000-a7ed5000 rw-p 00000000 00:00 0 [stack:1001]
a7ed5000-a8008000 r-xp 00000000 03:00 4222 /lib/libc.so.6
a8008000-a800a000 r--p 00133000 03:00 4222 /lib/libc.so.6
a800a000-a800b000 rw-p 00135000 03:00 4222 /lib/libc.so.6
a800b000-a800e000 rw-p 00000000 00:00 0
a800e000-a8022000 r-xp 00000000 03:00 14462 /lib/libpthread.so.0
a8022000-a8023000 r--p 00013000 03:00 14462 /lib/libpthread.so.0
a8023000-a8024000 rw-p 00014000 03:00 14462 /lib/libpthread.so.0
a8024000-a8027000 rw-p 00000000 00:00 0
a8027000-a8043000 r-xp 00000000 03:00 8317 /lib/ld-linux.so.2
a8043000-a8044000 r--p 0001b000 03:00 8317 /lib/ld-linux.so.2
a8044000-a8045000 rw-p 0001c000 03:00 8317 /lib/ld-linux.so.2
aff35000-aff4a000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
1. where "address" is the address space in the process that it occupies, "perms"
is a set of permissions:
r = read
w = write
x = execute
s = shared
p = private (copy on write)
2. "offset" is the offset into the mapping,
3. "dev" is the device (major:minor),
4. "inode" is the inode on that device.
0 indicates that no inode is associated with the memory region, as the case would be with BSS (uninitialized data).
5. The "pathname" shows the file associated with this mapping.
If the mapping is not associated with a file:
[heap] = the heap of the program
[stack] = the stack of the main process
[stack:1001] = the stack of the thread with tid 1001
[vdso] = the "virtual dynamic shared object",
the kernel system call handler
or if empty, the mapping is anonymous.
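A couple of quick ways to slice this file for a live process (the current shell is used here; any pid works):
# show just the heap and stack regions
grep -E '\[(heap|stack)' /proc/$$/maps
# count file-backed vs. anonymous/special mappings (no pathname, or a [heap]/[stack]/[vdso] tag)
awk '$6 ~ /^\// {f++} $6 !~ /^\// {a++} END {print f, "file-backed,", a, "anonymous/special"}' /proc/$$/maps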
smaps
Detailed memory consumption of each individual mapping.
08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash
Size: 1084 kB
Rss: 892 kB
Pss: 374 kB
Shared_Clean: 892 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 892 kB
Anonymous: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 374 kB
VmFlags: rd ex mr mw me de
1. the size of the mapping (size),
2. the amount of the mapping that is currently resident in RAM (RSS),
3. the process' proportional share of this mapping (PSS),
4. the number of clean and dirty private pages in the mapping.
Note that even a page which is part of a MAP_SHARED mapping, but has only a single pte mapped,
i.e. is currently used by only one process, is accounted as private and not as shared.
5. "Referenced" indicates the amount of memory currently marked as referenced or accessed.
6. "Anonymous" shows the amount of memory that does not belong to any file.
Even a mapping associated with a file may contain anonymous pages:
when MAP_PRIVATE and a page is modified, the file page is replaced by a private anonymous copy.
7. "Swap" shows how much would-be-anonymous memory is also used, but out on
swap.
8. "VmFlags" field deserves a separate description.
This member represents the kernel flags associated with the particular virtual memory area in two letter encoded manner.
The codes are the following:
rd - readable
wr - writeable
ex - executable
sh - shared
mr - may read
mw - may write
me - may execute
ms - may share
gd - stack segment grows down
pf - pure PFN range
dw - disabled write to the mapped file
lo - pages are locked in memory
io - memory mapped I/O area
sr - sequential read advise provided
rr - random read advise provided
dc - do not copy area on fork
de - do not expand area on remapping
ac - area is accountable
nr - swap space is not reserved for the area
ht - area uses huge tlb pages
nl - non-linear mapping
ar - architecture specific flag
dd - do not include area into core dump
mm - mixed map area
hg - huge page advise flag
nh - no-huge page advise flag
mg - mergable advise flag
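To see which flag combinations actually occur in a process (on kernels new enough to expose VmFlags), the lines can be grouped and counted; a small sketch for the current shell:
grep ^VmFlags /proc/$$/smaps | sort | uniq -c | sort -rn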
Generally speaking, the memory used by a typical service process falls into the following categories:
(1) Anonymous pages in user-mode address spaces, e.g. memory allocated by malloc or by mmap with MAP_ANONYMOUS; when the system runs low on memory, the kernel can swap these pages out;
(2) Mapped pages in user-mode address spaces, covering both file mappings and tmpfs mappings, e.g. an mmap of a regular file and IPC shared memory respectively; when memory is low, the kernel can reclaim these pages, but it may first have to write dirty data back to the file;
(3) Pages in the page cache of disk files, created when a program reads or writes files through ordinary read/write calls; when memory is low, the kernel can reclaim them, again possibly after syncing dirty data back to the file;
(4) Buffer pages, which also belong to the page cache, e.g. produced by reading block device files.
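The system-wide counterparts of these categories can be eyeballed in /proc/meminfo (these field names exist in current kernels):
grep -E '^(AnonPages|Mapped|Cached|Buffers|Shmem)' /proc/meminfo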
Process RSS is all of the physical memory used by the process (file_rss + anon_rss), i.e. Anonymous pages + Mapped pages (including shared memory).
Resident Set Size:
number of pages the process has in real memory.
This is just the pages which count toward text, data, or stack space.
This does not include pages which have not been demand-loaded in,
or which are swapped out.
Obviously, if you add up the RSS of every process, the total can exceed the actual physical memory size, because RSS double-counts some memory; shared memory, for example, is counted again in every process that maps it.
smaps makes it very convenient to eliminate this double counting.
For instance, if several processes have loaded the same library, that memory is split evenly among them; this evenly split shared portion plus the process's private memory is reported as Pss.
Pss: 374 kB
Private memory is accounted for in the Private fields:
Private_Clean: 0 kB
Private_Dirty: 0 kB
Linux has a tool called smem, which gathers its statistics precisely from smaps.
PSS is the sum of the Pss values,
and USS is the sum of the Private values.
yum install -y smem smemstat
smem can report proportional set size (PSS), which is a more meaningful representation of the amount of memory used by libraries and applications in a virtual memory system.
Because large portions of physical memory are typically shared among multiple applications, the standard measure of memory usage known as resident set size (RSS) will significantly overestimate memory usage. PSS instead measures each application's "fair share" of each shared area to give a realistic measure.
Example
smem
PID User Command Swap USS PSS RSS
23716 digoal postgres: postgres postgres 0 4924 5387 7040
The RSS, PSS and USS above equal the sums of the following values, respectively:
# cat /proc/23716/smaps | grep Rss
# cat /proc/23716/smaps | grep Pss
# cat /proc/23716/smaps | grep Private_
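To actually add the values up (they are reported in kB), a single awk pass over smaps works; a small sketch using pid 23716 from the example above (the Private_ pattern picks up both Private_Clean and Private_Dirty):
awk '/^Rss:/ {rss+=$2} /^Pss:/ {pss+=$2} /^Private_/ {uss+=$2} END {print "RSS", rss, "kB  PSS", pss, "kB  USS", uss, "kB"}' /proc/23716/smaps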
Other references
http://hustcat.github.io/memory-usage-in-process-and-cgroup/
http://blog.hellosa.org/2010/02/26/pmap-process-memory.html
First, run ps to see which processes my system is running:
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
...
czbug 1980 0.0 1.7 180472 34416 ? Sl Feb25 0:01 /usr/bin/yakuake
...
I'll use the small program yakuake as the example.
The memory-related columns here are VSZ and RSS.
man ps explains what they mean:
rss RSS resident set size, the non-swapped physical memory that a task has used (in kiloBytes). (alias rssize, rsz).
vsz VSZ virtual memory size of the process in KiB (1024-byte units). Device mappings are currently excluded; this is subject to change. (alias vsize).
Put simply, RSS is the physical memory the process actually occupies, while VSZ is the process's virtual memory size, i.e. all the address space it has mapped, including parts it is not currently using.
The ps output here is actually a bit misleading: if you add up the RSS of all programs, the total will probably exceed your physical memory. Why? Because the RSS column includes shared memory. Let's look at this with pmap.
$ pmap -d 1980
1980: /usr/bin/yakuake
Address Kbytes Mode Offset Device Mapping
00110000 2524 r-x-- 0000000000000000 008:00002 libkio.so.5.3.0
00387000 4 ----- 0000000000277000 008:00002 libkio.so.5.3.0
00388000 32 r---- 0000000000277000 008:00002 libkio.so.5.3.0
00390000 16 rw--- 000000000027f000 008:00002 libkio.so.5.3.0
00394000 444 r-x-- 0000000000000000 008:00002 libQtDBus.so.4.5.2
00403000 4 ----- 000000000006f000 008:00002 libQtDBus.so.4.5.2
00404000 4 r---- 000000000006f000 008:00002 libQtDBus.so.4.5.2
00405000 4 rw--- 0000000000070000 008:00002 libQtDBus.so.4.5.2
00407000 228 r-x-- 0000000000000000 008:00002 libkparts.so.4.3.0
00440000 8 r---- 0000000000039000 008:00002 libkparts.so.4.3.0
00442000 4 rw--- 000000000003b000 008:00002 libkparts.so.4.3.0
00443000 3552 r-x-- 0000000000000000 008:00002 libkdeui.so.5.3.0
007bb000 76 r---- 0000000000377000 008:00002 libkdeui.so.5.3.0
007ce000 24 rw--- 000000000038a000 008:00002 libkdeui.so.5.3.0
007d4000 4 rw--- 0000000000000000 000:00000 [ anon ]
....
mapped: 180472K writeable/private: 19208K shared: 20544K
I have omitted part of the output, which is all similar; the important part is the last line.
Linux loads shared libraries into memory; in pmap's output their names usually look like lib*.so, e.g. libX11.so.6.2.0. This libX11.so.6.2.0 is mapped into the address space of many processes, and ps includes it in the RSS of every one of them, even though it is physically loaded only once. Simply adding up the ps results therefore counts it repeatedly.
In the pmap output, writeable/private: 19208K is the physical memory that yakuake itself really occupies, excluding shared libraries. Here it is only 19208K, whereas ps reported an RSS of 34416K.
While reading up on this I also came across some notes on virtual memory, recorded here as well.
Either of the following commands shows the vm size:
$ cat /proc/<pid>/stat | awk '{print $23 / 1024}'
$ cat /proc/<pid>/status | grep -i vmsize
Normally the value obtained is the same as the VSZ column of ps, with one exception: when looking at the X server.
An example:
$ ps aux|grep /usr/bin/X|grep -v grep | awk '{print $2}' # get the pid of the X server ...
1076
$ cat /proc/1076/stat | awk '{print $23 / 1024}'
139012
$ cat /proc/1076/status | grep -i vmsize
VmSize: 106516 kB
while the VSZ reported by ps is 106516, which agrees with the latter (the value from status).
This is said to be because
VmSize = memory + memory-mapped hardware (e.g. video card memory).
Reference: https://github.com/digoal/blog/blob/master/201606/20160608_01.md