1. On Linux, how to tell whether the machine is multi-core or multi-CPU:
#cat /proc/cpuinfo
If several entries like the following appear, the system is multi-core or multi-CPU:
processor : 0
......
processor : 1
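The count can also be obtained directly from the shell; a minimal sketch (nproc is from GNU coreutils):

```shell
# One "processor" entry per logical CPU in /proc/cpuinfo.
cpus=$(grep -c '^processor' /proc/cpuinfo)
echo "logical CPUs: $cpus"
# nproc reports the CPUs available to the current process.
nproc
```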
2. On Linux, how to view per-CPU usage:
#top -d 1
Then press 1 to show each CPU separately:
Cpu0 : 1.0%us, 3.0%sy, 0.0%ni, 96.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
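The same percentages can be computed by hand from two samples of /proc/stat; a sketch assuming the usual field order (user nice system idle iowait irq softirq steal):

```shell
# Sample per-CPU jiffy counters twice, one second apart.
grep '^cpu[0-9]' /proc/stat > /tmp/stat.1
sleep 1
grep '^cpu[0-9]' /proc/stat > /tmp/stat.2
# busy% = 100 * (1 - delta(idle + iowait) / delta(total)).
report=$(paste /tmp/stat.1 /tmp/stat.2 | awk '{
    t1 = $2+$3+$4+$5+$6+$7+$8+$9; t2 = $13+$14+$15+$16+$17+$18+$19+$20
    i1 = $5+$6;                    i2 = $16+$17
    printf "%s %.1f%% busy\n", $1, 100 * (1 - (i2-i1) / (t2-t1))
}')
printf '%s\n' "$report"
```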
3. How to see which CPU a process is running on:
#top -d 1
Then press f to enter top's Current Fields configuration screen:
Select: j: P = Last used cpu (SMP)
An extra column P then appears, showing which CPU the process last ran on.
Sam found by experiment that the same process runs on different CPU cores at different times; this is the Linux kernel's SMP scheduling at work.
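The same information is available non-interactively: ps has a PSR column showing the processor a task last ran on. A quick sketch using the current shell as the target process:

```shell
# PSR = processor the task last ran on.
ps -o pid,psr,comm -p $$
psr=$(ps -o psr= -p $$ | tr -d ' ')
echo "this shell last ran on CPU $psr"
```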
4. Configuring the Linux kernel to support multiple cores:
The CONFIG_SMP option must be enabled during kernel configuration so that the kernel is SMP-aware.
Processor type and features ---> Symmetric multi-processing support
To check whether the current Linux kernel supports (or is running with) SMP:
#uname -a
5. SMP load balancing in kernel 2.6:
When a task is created on an SMP system, it is placed on the run queue of some CPU. In general there is no way to know in advance whether a task will be short-lived or long-running, so the initial assignment of tasks to CPUs may be suboptimal.
To keep the load balanced across CPUs, tasks can be redistributed: moved from heavily loaded CPUs to lightly loaded ones. The Linux 2.6 scheduler provides this through load balancing: every 200 ms each processor checks whether the CPU loads are unbalanced, and if so performs a task-balancing pass.
A slight downside of this process is that the new CPU's cache is cold for the migrated task (its data must be read back into the cache).
Recall that a CPU cache is local (on-chip) memory that is much faster to access than system memory. If a task has been executing on a CPU, its data sits in that CPU's local cache, which is then called hot. If none of a task's data is in a CPU's local cache, that cache is called cold for the task.
Unfortunately, keeping the CPUs busy means tasks are sometimes migrated to CPUs whose caches are cold for them.
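One way to sidestep the cold-cache penalty is CPU affinity: pinning a task so the balancer never migrates it. Each task's allowed-CPU mask is visible in /proc; the taskset line is illustrative only (it assumes util-linux is installed and uses a placeholder pid):

```shell
# Every task has a CPU affinity mask; by default all CPUs are allowed.
grep '^Cpus_allowed:' /proc/self/status
grep '^Cpus_allowed_list:' /proc/self/status
# To pin a process to CPU 0 (hypothetical pid, requires util-linux):
# taskset -pc 0 <pid>
```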
6. How applications can exploit multiple cores:
Developers can put parallelizable code into threads, which an SMP operating system will schedule to run concurrently.
In addition, Sam's idea for code that must execute sequentially: split it into several nodes, one thread per node, with channels between the nodes, so that the nodes form a pipeline. This too can greatly improve CPU utilization.
=============================
Linux maximum thread limits and querying the current thread count
To check: use ps -fe | grep programname to find the process pid, then ps -Lf <pid> to list that process's threads.
The per-process thread count can be raised by adjusting ulimit -s: threads per process = VIRT limit / stack size. On 32-bit x86 the default VIRT limit is 3G (the 3G+1G address-space split); on 64-bit x86 it is 64G.
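That formula can be sanity-checked with shell arithmetic, using the 32-bit defaults quoted above (3G VIRT limit, 8M default stacks):

```shell
virt_mb=$((3 * 1024))        # 3G user address space on 32-bit x86
stack_mb=8                   # default per-thread stack (ulimit -s = 8192 KB)
threads=$((virt_mb / stack_mb))
echo "$threads"              # upper bound on threads per process
```

Shrinking the stack with ulimit -s raises the bound accordingly.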
1. Query by process ID:
pstree -p <pid>
top -Hp <pid>
2. Summary of system limits:
View the maximum number of processes:
cat /proc/sys/kernel/pid_max
# On my 8G-RAM machine the maximum process count is 32768
View the maximum number of threads:
cat /proc/sys/kernel/threads-max
# On my 8G-RAM machine the maximum thread count is 61036
ulimit -s
# Shows the default thread stack size, normally 8M [8192 KB]
View the per-user process limit, max_user_process:
ulimit -u
31508
/proc/sys/vm/max_map_count
# Theoretical system maximum: 65530
Kernel parameters under /proc/sys/vm explained
[wuyaalan@localhost desktop]$ cd /proc/sys/vm/
[wuyaalan@localhost vm]$ ls
block_dump hugepages_treat_as_movable oom_kill_allocating_task
compact_memory hugetlb_shm_group overcommit_memory
dirty_background_bytes laptop_mode overcommit_ratio
dirty_background_ratio legacy_va_layout page-cluster
dirty_bytes lowmem_reserve_ratio panic_on_oom
dirty_expire_centisecs max_map_count percpu_pagelist_fraction
dirty_ratio min_free_kbytes scan_unevictable_pages
dirty_writeback_centisecs mmap_min_addr stat_interval
drop_caches nr_hugepages swappiness
extfrag_threshold nr_overcommit_hugepages vdso_enabled
extra_free_kbytes nr_pdflush_threads vfs_cache_pressure
highmem_is_dirtyable oom_dump_tasks would_have_oomkilled
As the listing shows, the proc file system exposes a wealth of kernel information, allowing users to tune kernel parameters to improve system performance.
The meaning of some of the parameters listed above is explained below.
1. block_dump
block_dump enables block I/O debugging when set to a nonzero value. If you want to find out which process caused the disk to spin up (see /proc/sys/vm/laptop_mode), you can gather information by setting the flag. When this flag is set, Linux reports all disk read and write operations that take place, and all block dirtyings done to files. This makes it possible to debug why a disk needs to spin up, and to increase battery life even more. The output of block_dump is written to the kernel log and can be retrieved using "dmesg". When you use block_dump and your kernel logging level also includes kernel debugging messages, you probably want to turn off klogd, otherwise the output of block_dump will itself be logged, causing disk activity that is not normally there.
2. dirty_background_ratio
Contains, as a percentage of total system memory, the number of pages at which the pdflush background writeback daemon will start writing out dirty data. Increasing this ratio lets dirty pages stay in memory longer before writeback begins.
3. dirty_expire_centisecs
Defines when dirty data is old enough to be eligible for writeout by the pdflush daemons, expressed in hundredths of a second. Data which has been dirty in memory for longer than this interval will be written out the next time a pdflush daemon wakes up.
4. dirty_ratio
Contains, as a percentage of total system memory, the number of pages at which a process which is generating disk writes will itself start writing out dirty data.
5. dirty_writeback_centisecs
The pdflush writeback daemons periodically wake up and write "old" data out to disk. This tunable expresses the interval between those wakeups, in hundredths of a second. Setting it to zero disables periodic writeback altogether.
6. drop_caches
Writing to this will cause the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free. To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
echo 3 > /proc/sys/vm/drop_caches
As this is a non-destructive operation, and dirty objects are not freeable, the user should run "sync" first in order to make sure all cached objects are freed. This tunable was added in 2.6.16.
7. hugepages_treat_as_movable
When a non-zero value is written to this tunable, future allocations for the huge page pool will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not introduce additional external fragmentation of note, as huge pages are always the largest contiguous block we care about. Huge pages are not movable, so they are not allocated from ZONE_MOVABLE by default. However, as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it can be used to satisfy hugepage allocations even when the system has been running a long time. This allows an administrator to resize the hugepage pool at runtime depending on the size of ZONE_MOVABLE.
8. hugetlb_shm_group
hugetlb_shm_group contains the group id that is allowed to create SysV shared memory segments using hugetlb pages.
9. laptop_mode
laptop_mode is a knob that controls "laptop mode". When the knob is set, any physical disk I/O (that might have caused the hard disk to spin up, see /proc/sys/vm/block_dump) causes Linux to flush all dirty blocks. The result is that after a disk has spun down, it will not be spun up again just to write dirty blocks, because those blocks were already written immediately after the most recent read operation. The value of the knob determines the time between the occurrence of disk I/O and the flush being triggered. A sensible value is 5 seconds. Setting the knob to 0 disables laptop mode.
In laptop mode the kernel uses the I/O subsystem more intelligently and tries to keep the disk in a low-power state. It batches many I/O operations together and performs them in one burst, with inactivity periods of up to 10 minutes by default between bursts, greatly reducing the number of disk spin-ups. To earn such long idle periods, the kernel must do as much I/O as possible during each active period: large amounts of read-ahead are performed, and then all buffers are synced.
10. legacy_va_layout
If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel will use the legacy (2.4) layout for all processes.
11. lowmem_reserve_ratio
Ratio of total pages to free pages for each memory zone.
12. max_map_count
This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries. While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, e.g., up to one or two maps per allocation. The default value is 65536.
13. min_free_kbytes
This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a pages_min value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size.
14. mmap_min_addr
This file indicates the amount of address space which a user process will be restricted from mmaping. Since kernel null dereference bugs could accidentally operate on the information in the first couple of pages of memory, userspace processes should not be allowed to write to them. By default this value is set to 0 and no protections will be enforced by the security module. Setting this value to something like 64k will allow the vast majority of applications to work correctly and provide defense in depth against future potential kernel bugs.
15. nr_hugepages
nr_hugepages configures the number of hugetlb pages reserved for the system.
16. nr_pdflush_threads
The count of currently-running pdflush threads. This is a read-only value.
17. numa_zonelist_order
This sysctl is only for NUMA. 'Where the memory is allocated from' is controlled by zonelists. In the non-NUMA case, the zonelist for GFP_KERNEL is ordered ZONE_NORMAL -> ZONE_DMA, meaning a GFP_KERNEL allocation gets memory from ZONE_DMA only when ZONE_NORMAL is not available. In the NUMA case there are two possible orders. Assume a 2-node NUMA system; the zonelist for Node(0)'s GFP_KERNEL could be:
(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA
Type (A) offers the best locality for processes on Node(0), but ZONE_DMA will be used before ZONE_NORMAL is exhausted. This increases the possibility of out-of-memory (OOM) in ZONE_DMA, because ZONE_DMA tends to be small. Type (B) cannot offer the best locality but is more robust against OOM of the DMA zone. Type (A) is called "Node" order; type (B) is "Zone" order. "Node" order sorts the zonelists by node, then by zone within each node; specify "[Nn]ode" for it. "Zone" order sorts the zonelists by zone type, then by node within each zone; specify "[Zz]one" for it. Specify "[Dd]efault" to request automatic configuration. Autoconfiguration will select "node" order in the following cases:
(1) if the DMA zone does not exist or
(2) if the DMA zone comprises greater than 50% of the available memory or
(3) if any node's DMA zone comprises greater than 60% of its local memory and the amount of local memory is big enough. Otherwise, "zone" order will be selected. Default order is recommended unless this is causing problems for your system/application.
18. overcommit_memory
Controls overcommit of system memory, possibly allowing processes to allocate (but not use) more memory than is actually available.
0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications.
2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap plus a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while attempting to use already-allocated memory but will receive errors on memory allocation as appropriate.
19. overcommit_ratio
Percentage of physical memory size to include in overcommit calculations.
Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)
swapspace = total size of all swap areas
physmem = size of physical memory in system
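For mode 2, the limit formula can be evaluated directly from /proc; a read-only sketch (values in kB, works whatever the current overcommit mode is):

```shell
# CommitLimit = swap + physmem * overcommit_ratio / 100 (kB).
swap_kb=$(awk '/^SwapTotal:/ { print $2 }' /proc/meminfo)
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
ratio=$(cat /proc/sys/vm/overcommit_ratio)
limit_kb=$((swap_kb + mem_kb * ratio / 100))
echo "commit limit: ${limit_kb} kB"
```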
20. page-cluster
page-cluster controls the number of pages written to swap in a single attempt (the swap I/O size). It is a logarithmic value: setting it to zero means "1 page", 1 means "2 pages", 2 means "4 pages", and so on. The default value is three (eight pages at a time). There may be some small benefit in tuning this to a different value if your workload is swap-intensive.
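Since the value is a logarithm, the actual I/O size in pages is 1 shifted left by the setting; a sketch:

```shell
pc=$(cat /proc/sys/vm/page-cluster)
pages=$((1 << pc))
echo "swap I/O size: $pages pages"   # the default of 3 gives 8 pages
```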
21. panic_on_oom
This enables or disables the panic-on-out-of-memory feature. If set to 1, the kernel panics when out-of-memory happens; if set to 0, the kernel kills some rogue process by calling oom_kill(). Usually oom_killer can kill a rogue process and the system will survive. If you want the system to panic rather than kill rogue processes, set this to 1. The default value is 0.
22. percpu_pagelist_fraction
This is the fraction of pages, at most (high mark pcp->high), in each zone that are allocated for each per-cpu page list. The minimum value for this is 8, meaning no more than 1/8th of the pages in each zone may be allocated to any single per_cpu_pagelist. This entry only changes the value of hot per-cpu pagelists. A user can specify a number like 100 to allocate 1/100th of each zone to each per-cpu page list. The batch value of each per-cpu pagelist is also updated as a result: it is set to pcp->high / 4, with an upper limit of (PAGE_SHIFT * 8). The initial value is zero; the kernel does not use this value at boot time to set the high water marks for each per-cpu page list.
23. stat_interval
Configures the VM statistics update interval. The default value is 1. This tunable first appeared in the 2.6.22 kernel.
24. swap_token_timeout
This file contains the valid hold time of the swap-out protection token. The Linux VM has a token-based thrashing control mechanism and uses the token to prevent unnecessary page faults when thrashing. The unit of the value is seconds, and it is useful for tuning thrashing behavior. This tunable was removed in 2.6.20 when the algorithm was improved.
25. swappiness
swappiness sets the kernel's balance between reclaiming pages from the page cache and swapping out process memory. The default value is 60. Increase the value if you want the kernel to swap out more process memory and thus cache more file contents; decrease it if you want the kernel to swap less.
26. vdso_enabled
When this flag is set, the kernel maps a vDSO page into newly created processes and passes its address down to glibc upon exec(). This feature is enabled by default. vDSO is a virtual DSO (dynamic shared object) exposed by the kernel at some address in every process's memory. Its purpose is to speed up system calls. The mapping address used to be fixed (0xffffe000), but starting with 2.6.18 it is randomized (besides the security implications, this also helps debuggers).
27. vfs_cache_pressure
Controls the tendency of the kernel to reclaim the memory used for caching directory and inode objects. At the default value of vfs_cache_pressure = 100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer retaining dentry and inode caches; increasing it beyond 100 makes the kernel prefer to reclaim them.
Hardware memory size is also a limiting factor.
3. Query the thread/process count of a given program (pid 3660 in this example):
pstree -p `ps -e | grep java | awk '{print $1}'` | wc -l
or
pstree -p 3660 | wc -l
ps -p 3660 H
or ps H -p 3660
4. Query the total number of threads/processes currently on the system:
pstree -p | wc -l
1. cat /proc/${pid}/status
# Shows the pid's thread count and related info; PPid = parent pid
2. pstree -p ${pid}
# Shows the pid's (child) processes
3. top -p ${pid}
then press H, or run top -bH -d 3 -p ${pid} directly
top -H
The manual says: -H : Threads toggle
Started with this option, top shows one line per thread; otherwise it shows one line per process.
4. ps xH
The manual says: H Show threads as if they were processes
This lists all existing threads.
5. ps -mp <PID>
The manual says: m Show threads after processes
This shows the threads belonging to a given process.
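The methods above should agree with one another; a sketch cross-checking the Threads: field of /proc/<pid>/status against the per-thread directories in /proc/<pid>/task, using the current shell's pid as a stand-in:

```shell
pid=$$
# One entry in /proc/<pid>/task per thread; status summarizes the count.
t_status=$(awk '/^Threads:/ { print $2 }' /proc/$pid/status)
t_task=$(ls /proc/$pid/task | wc -l)
echo "status says $t_status thread(s); task/ has $t_task entries"
```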