Transparent Huge Pages (THP) are enabled by default in RHEL 6 for all applications. The kernel attempts to allocate hugepages whenever possible and any Linux process will receive 2MB pages if the mmap region is 2MB naturally aligned. The main kernel address space itself is mapped with hugepages, reducing TLB pressure from kernel code. For general information on Hugepages, see: What are Huge Pages and what are the advantages of using them?
The kernel will always attempt to satisfy a memory allocation using hugepages. If no hugepages are available (for example, because no physically contiguous memory can be found), the kernel falls back to regular 4KB pages. THP is also swappable (unlike hugetlbfs): this is achieved by splitting the huge page into regular 4KB pages, which are then swapped out normally.
But to use hugepages effectively, the kernel must find physically contiguous areas of memory big enough to satisfy the request, and also properly aligned. For this, a khugepaged kernel thread has been added. This thread periodically attempts to replace runs of smaller pages currently in use with a hugepage allocation, thus maximizing THP usage.
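For example, how much anonymous memory khugepaged and the page-fault path have managed to back with huge pages can be seen in /proc/meminfo on THP-capable kernels:

    # Anonymous memory currently backed by transparent huge pages
    grep AnonHugePages /proc/meminfo

    # For comparison: statically reserved hugetlbfs pages
    grep HugePages_ /proc/meminfo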
In userland, no modifications to applications are necessary (hence "transparent"), but there are ways to optimize THP usage. For applications that want to use hugepages, posix_memalign() can help ensure that large allocations are aligned to huge page (2MB) boundaries.
Also, THP is only enabled for anonymous memory regions. There are plans to add support for tmpfs and page cache. THP tunables are found in the /sys tree under /sys/kernel/mm/redhat_transparent_hugepage.
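For instance, the current settings can be read straight from that directory (the value in brackets is the active one; note that upstream kernels use /sys/kernel/mm/transparent_hugepage without the redhat_ prefix):

    # Active THP policy -- the bracketed value is the one in effect
    cat /sys/kernel/mm/redhat_transparent_hugepage/enabled

    # Whether the kernel tries to defragment/compact memory to build huge pages
    cat /sys/kernel/mm/redhat_transparent_hugepage/defrag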
Generally speaking, the smallest unit of memory management is the page. A page is 4096 bytes, so 1MB of memory contains 256 pages and 1GB contains 262,144 pages. The CPU tracks these pages through page-table entries, a limited number of which are cached in its built-in memory management hardware (the TLB).
Broadly, there are two ways to increase the amount of memory this hardware can cover:
1. Make the hardware memory management unit larger, i.e. let it hold more page-table entries.
2. Make each page larger.
The first option is not very practical: modern hardware memory management units hold at most a few hundred to a few thousand page-table entries, and an algorithm that had to maintain millions of entries would have to be very different from today's to stay fast. The current workaround is that when a program needs more pages than the hardware unit can map, the operating system falls back to software-managed translation, which slows the program down. Increasing the page size, by contrast, multiplies coverage directly: 512 entries cover only 2MB of memory with 4KB pages, but 1GB with 2MB pages.
Starting with redhat 6 (centos, sl, ol), the operating system supports Huge Pages.
Simply put, Huge Pages are memory pages between 2MB and 1GB in size, intended for managing very large amounts of memory; 1GB pages, for example, are a reasonable fit for a machine with around 1TB of RAM.
THP (Transparent Huge Pages) is an abstraction layer that automates the management of Huge Pages.
The caveat, for now, is that because of how it is implemented, THP can introduce memory locking that hurts performance, especially when the application was not written with huge pages in mind. In short:
The kernel runs a background thread called khugepaged, which continuously scans the memory used by all processes and, where possible, collapses 4K pages into Huge Pages. This work takes various memory locks around the affected allocations, which directly impacts the application's memory-access performance. Moreover, the process is transparent to the application and cannot be controlled from user space, so programs specifically tuned for 4K pages may see seemingly random performance degradation.
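To watch khugepaged at work, it appears as a kernel thread and exposes its scan-rate knobs and counters under the same sysfs tree; the file names below are what a RHEL 6 kernel is expected to provide and may differ on other kernels:

    # khugepaged runs as a kernel thread
    ps -ef | grep '[k]hugepaged'

    # Its tuning knobs and activity counters (scan interval, pages collapsed, ...)
    ls /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/
    cat /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/scan_sleep_millisecs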
A few quick notes:
1. Starting with the RedHat 6, OEL 6, SLES 11 and UEK2 kernels, Transparent HugePages are enabled by default to improve memory-management performance. Transparent HugePages are functionally similar to the HugePages of earlier releases; the main difference is that Transparent HugePages can be configured at run time, with no reboot required for the change to take effect.
2. Transparent Huge Pages are not available on the 32-bit version of RHEL 6.
3. Oracle officially recommends against enabling Transparent HugePages on the RedHat 6, OEL 6, SLES 11 and UEK2 kernels, because Transparent HugePages have known problems:
1. In RAC environments, Transparent HugePages can cause unexpected node reboots and performance problems;
2. In single-instance environments, Transparent HugePages can likewise cause anomalous performance problems;
Transparent HugePages memory is enabled by default with Red Hat Enterprise Linux 6, SUSE Linux Enterprise Server 11, and Oracle Linux 6 with earlier releases of Oracle Linux Unbreakable Enterprise Kernel 2 (UEK2) kernels. Transparent HugePages memory is disabled in later releases of Oracle Linux UEK2 kernels. Transparent HugePages can cause memory allocation delays during runtime. To avoid performance issues, Oracle recommends that you disable Transparent HugePages on all Oracle Database servers. Oracle recommends that you instead use standard HugePages for enhanced performance. Transparent HugePages memory differs from standard HugePages memory because the kernel khugepaged thread allocates memory dynamically during runtime. Standard HugePages memory is pre-allocated at startup, and does not change during runtime.
Starting with RedHat 6, OEL 6, SLES 11 and UEK2 kernels, Transparent HugePages are implemented and enabled (default) in an attempt to improve the memory management. Transparent HugePages are similar to the HugePages that have been available in previous Linux releases. The main difference is that the Transparent HugePages are set up dynamically at run time by the khugepaged thread in kernel while the regular HugePages had to be preallocated at the boot up time. Because Transparent HugePages are known to cause unexpected node reboots and performance problems with RAC, Oracle strongly advises to disable the use of Transparent HugePages. In addition, Transparent Hugepages may cause problems even in a single-instance database environment with unexpected performance problems or delays. As such, Oracle recommends disabling Transparent HugePages on all Database servers running Oracle.
4. Transparent HugePages must also be disabled when installing the Vertica Analytic Database.
RHEL 6 thereby made memory allocation more efficient, and in some scenarios THP brings a clear performance improvement for KVM: http://www.linux-kvm.org/wiki/images/9/9e/2010-forum-thp.pdf.
Hadoop, however, is a memory-intensive workload, and this change appears to have had side effects for it. In theory a compute-heavy Java program should spend most of its CPU time in user space, and Cloudera also officially recommends disabling THP. So, following a few articles, I made the adjustment:
http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/
The effect was obvious: the change was made at around 12:05, after which system-CPU time essentially disappeared, file-cache usage went up, and the machine's load dropped.
Besides changing the runtime parameter by hand, you can also edit the kernel boot parameters in /etc/grub.conf and append "transparent_hugepage=never" (this option only affects /sys/kernel/mm/redhat_transparent_hugepage/enabled).
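Putting the two approaches together, a typical sequence on RHEL 6 looks roughly like this (run as root; the runtime change is lost on reboot unless the boot parameter is added as well, and many guides additionally set the defrag knob to never):

    # Disable THP immediately at runtime (RHEL 6 sysfs paths)
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

    # Make it persistent across reboots: append the option to the kernel line
    # in /etc/grub.conf, e.g.
    #   kernel /vmlinuz-<version> ro root=<root-device> ... transparent_hugepage=never

    # Verify that "never" is now the bracketed (active) value
    cat /sys/kernel/mm/redhat_transparent_hugepage/enabled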