Preface:
Anyone with Oracle operations experience knows that on a server with a lot of memory you normally configure HugePages. Here is why:
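On Linux, memory management is handled by the kswapd process and the page table structure (which holds one record for every process on the system). Linux manages memory with paging. To make full use of physical memory, the kernel uses an LRU algorithm to swap rarely used pages out to virtual memory (swap) at suitable times, and keeps frequently used data in physical memory.

By default a Linux page is 4K, so on a machine with a lot of physical memory the mapping table has a huge number of entries. That slows down CPU lookups and also wastes memory. Since the amount of memory is fixed, the only way to reduce the number of entries is to increase the page size, and that is where HugePages come from: instead of the traditional small pages, memory is managed in large pages of 2M, 4M or 16M (the Linux default huge page size is 2M), so the number of mapping entries drops sharply. If the system has a lot of physical memory (more than 64G), HugePages are recommended.

Notes
1. HugePages use shared memory. They are allocated and reserved while the operating system boots, and they are never swapped out.
2. Because they are never swapped out, memory set aside as HugePages cannot be used by other processes. Size it sensibly to avoid wasting memory.
3. If you add HugePages, add physical memory, add a new instance to the server, or change the SGA settings, recalculate the number of HugePages you need.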
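To put numbers on the "fewer mapping entries" argument, here is a minimal sketch that counts how many pages it takes to cover 64G at 4K and at 2M. The function name pages_for is made up for this illustration, and the count is pages to be mapped, which is only a rough stand-in for the real size of the page tables.

```shell
#!/bin/bash
# Number of pages needed to cover a given amount of memory.
# Both arguments are in kB; the result is rounded up to whole pages.
pages_for() {
    local mem_kb=$1 page_kb=$2
    echo $(( (mem_kb + page_kb - 1) / page_kb ))
}

mem_kb=$(( 64 * 1024 * 1024 ))   # 64G expressed in kB

echo "4K pages: $(pages_for "$mem_kb" 4)"      # 16777216
echo "2M pages: $(pages_for "$mem_kb" 2048)"   # 32768
```

That is a factor of 512 fewer pages to track for the same amount of memory.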
Well, MySQL supports huge pages too, so let's go through how to set them up.
1. First, look at the shared memory segment settings

On CentOS 6 the default maximum shared memory segment size is 64G. If your server has no more than 128G of memory, you can leave it alone.

    # Controls the maximum shared segment size, in bytes
    kernel.shmmax = 68719476736
    # Controls the maximum number of shared memory segments, in pages
    kernel.shmall = 4294967296

A quick preview before we start: PageTables on two servers of the same size, one without huge pages and one with. node-207 is the host that gets configured with huge pages in this post.

    [root@crmdbL-172 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:         32058      29144       2913          0         20      11526
    -/+ buffers/cache:      17597      14460
    Swap:         8191          3       8188
    [root@crmdbL-172 ~]# cat /proc/meminfo | grep PageTables
    PageTables:        44808 kB

    [root@node-207 ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:         32095      28501       3593          0         21       9233
    -/+ buffers/cache:      19246      12848
    Swap:         8095          0       8095
    [root@node-207 ~]# cat /proc/meminfo | grep PageTables
    PageTables:         5372 kB

The difference: 44808 - 5372 = 39436 kB.
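If you want to repeat that comparison on your own hosts, something like the following sketch would do. The helper name pagetables_kb is hypothetical; it reads meminfo-formatted text on stdin, and the two sample lines are just the values from the output above.

```shell
#!/bin/bash
# Print the PageTables value (in kB) from meminfo-formatted input on stdin.
pagetables_kb() {
    awk '/^PageTables:/ {print $2}'
}

# Sample values copied from the two hosts shown in this post.
without_hp=$(printf 'PageTables:        44808 kB\n' | pagetables_kb)
with_hp=$(printf 'PageTables:         5372 kB\n' | pagetables_kb)

echo "saved: $(( without_hp - with_hp )) kB"   # saved: 39436 kB
```

On a live system you would feed it the real file instead: pagetables_kb < /proc/meminfo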
2. Configure MySQL to use huge pages
Now for the actual configuration.

    innodb_buffer_pool_size = 16384M
    innodb_additional_mem_pool_size = 16M

(16384M + 16M) / 2M = 8200 pages. From my experience configuring huge pages for Oracle, the huge page memory has to be larger than this, so I set 8211 pages.

    vim /etc/sysctl.conf

    #### Number of huge pages
    vm.nr_hugepages=8211
    ### ID of the group allowed to use huge pages (taken from the output of: id mysql)
    vm.hugetlb_shm_group=3306

To apply this to the running system, just reload the settings:

    sysctl -p

    vim /etc/security/limits.conf

    * soft nofile 65535
    * soft nproc 65535
    * hard nofile 65535
    * hard nproc 65535
    #* soft core 0
    #* hard rss 10000
    #@student hard nproc 20
    #@faculty soft nproc 20
    #@faculty hard nproc 50
    #ftp hard nproc 0
    #@student - maxlogins 4
    ### Let mysql use HugePages
    @mysql soft memlock unlimited
    @mysql hard memlock unlimited

Oracle is set up the same way for huge pages. (mysqld also needs the large-pages option enabled in my.cnf, otherwise it never attempts a HugeTLB allocation.)

Restart MySQL and check the error log:

    150728 16:37:43 mysqld_safe mysqld from pid file /data/3306/tmp/mysql.pid ended
    150728 16:37:44 mysqld_safe Starting mysqld daemon with databases from /data/3306/data
    2015-07-28 16:37:45 0 [Note] /opt/app/mysql/bin/mysqld (mysqld 5.6.24-log) starting as process 13420 ...
    2015-07-28 16:37:45 13420 [Note] Plugin 'FEDERATED' is disabled.
    2015-07-28 16:37:45 7f56f311d740 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
    2015-07-28 16:37:45 13420 [Note] InnoDB: Using atomics to ref count buffer pool pages
    2015-07-28 16:37:45 13420 [Note] InnoDB: The InnoDB memory heap is disabled
    2015-07-28 16:37:45 13420 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
    2015-07-28 16:37:45 13420 [Note] InnoDB: Memory barrier is not used
    2015-07-28 16:37:45 13420 [Note] InnoDB: Compressed tables use zlib 1.2.3
    2015-07-28 16:37:45 13420 [Note] InnoDB: Using Linux native AIO
    2015-07-28 16:37:45 13420 [Note] InnoDB: Using CPU crc32 instructions
    2015-07-28 16:37:45 13420 [Note] InnoDB: Initializing buffer pool, size = 16.0G
    InnoDB: HugeTLB: Warning: Failed to allocate 2197815296 bytes. errno 12
    InnoDB HugeTLB: Warning: Using conventional memory pool

Two warnings, no less. "Using conventional memory pool" appears because huge pages are preallocated, cannot be taken by other processes, and are never swapped out. The amount configured here was too small: InnoDB asked for more than was reserved, so it fell back to conventional memory. Oracle has had cases like this too: huge pages configured smaller than the SGA are simply wasted, and the system ends up using swap.

Since the warning says there is not enough, I checked the official documentation and learned that the huge page memory has to be larger than (innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + tmp_table_size). What I had configured was clearly too little.

So let me be generous: 9300 pages, i.e. 9300 * 2M = 18600M, about 18.16G. Let's see what the log looks like when huge pages are actually used. Starting mysql again, this time there is no error:

    150728 16:55:33 mysqld_safe mysqld from pid file /data/3306/tmp/mysql.pid ended
    150728 16:56:04 mysqld_safe Starting mysqld daemon with databases from /data/3306/data
    2015-07-28 16:56:05 0 [Note] /opt/app/mysql/bin/mysqld (mysqld 5.6.24-log) starting as process 17256 ...
    2015-07-28 16:56:05 17256 [Note] Plugin 'FEDERATED' is disabled.
    2015-07-28 16:56:05 7fa0048e5740 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
    2015-07-28 16:56:05 17256 [Note] InnoDB: Using atomics to ref count buffer pool pages
    2015-07-28 16:56:05 17256 [Note] InnoDB: The InnoDB memory heap is disabled
    2015-07-28 16:56:05 17256 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
    2015-07-28 16:56:05 17256 [Note] InnoDB: Memory barrier is not used
    2015-07-28 16:56:05 17256 [Note] InnoDB: Compressed tables use zlib 1.2.3
    2015-07-28 16:56:05 17256 [Note] InnoDB: Using Linux native AIO
    2015-07-28 16:56:05 17256 [Note] InnoDB: Using CPU crc32 instructions
    2015-07-28 16:56:05 17256 [Note] InnoDB: Initializing buffer pool, size = 16.0G
    2015-07-28 16:56:06 17256 [Note] InnoDB: Completed initialization of buffer pool
    2015-07-28 16:56:06 17256 [Note] InnoDB: Highest supported file format is Barracuda.
    2015-07-28 16:56:06 17256 [Note] InnoDB: 128 rollback segment(s) are active.
    2015-07-28 16:56:06 17256 [Note] InnoDB: Waiting for purge to start
    2015-07-28 16:56:07 17256 [Note] InnoDB: 5.6.24 started; log sequence number 26564145028
    2015-07-28 16:56:07 17256 [Note] Server hostname (bind-address): '*'; port: 3306
    2015-07-28 16:56:07 17256 [Note] IPv6 is available.
    2015-07-28 16:56:07 17256 [Note] - '::' resolves to '::';
    2015-07-28 16:56:07 17256 [Note] Server socket created on IP: '::'.
    2015-07-28 16:56:07 17256 [Warning] Recovery from master pos 155925988 and file mysql-bin.000025.
    2015-07-28 16:56:07 17256 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
    2015-07-28 16:56:07 17256 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000025' at position 155925988, relay log '/data/3306/logs/relay-bin.000058' position: 4
    2015-07-28 16:56:07 17256 [Note] Slave I/O thread: connected to master 'slave@172.16.117.247:3306',replication started in log 'mysql-bin.000025' at position 155925988
    2015-07-28 16:56:07 17256 [Note] Event Scheduler: Loaded 0 events
    2015-07-28 16:56:07 17256 [Note] /opt/app/mysql/bin/mysqld: ready for connections.
    Version: '5.6.24-log' socket: '/data/3306/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)

    [root@node-207 ~]# cat /proc/meminfo | grep ^HugePages
    HugePages_Total:    9300
    HugePages_Free:     9067
    HugePages_Rsvd:     8178
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

Huge page memory is exclusive, so giving it too much is also a waste. The right thing is to size it with the formula. Recalculating:

    innodb_buffer_pool_size = 16384M
    innodb_additional_mem_pool_size = 16M
    innodb_log_buffer_size = 32M
    tmp_table_size=512M
    max_heap_table_size=512M

16384 + 16 + 32 + 512 = 16944M, and 16944 / 2 = 8472 pages. The huge page memory has to be larger than that, so I set 8476, four pages more. Because the memory is exclusive, any extra pages cannot be used by anything else; as a rule I add between 2 and 5 pages.

Note that the temporary table size here refers to the value of max_heap_table_size, that is, the maximum size allowed for in-memory (MEMORY engine) temporary tables.

Let's check whether the startup log looks normal:

    150728 17:14:23 mysqld_safe Starting mysqld daemon with databases from /data/3306/data
    2015-07-28 17:14:23 0 [Note] /opt/app/mysql/bin/mysqld (mysqld 5.6.24-log) starting as process 18569 ...
    2015-07-28 17:14:23 18569 [Note] Plugin 'FEDERATED' is disabled.
    2015-07-28 17:14:23 7fee7b559740 InnoDB: Warning: Using innodb_additional_mem_pool_size is DEPRECATED. This option may be removed in future releases, together with the option innodb_use_sys_malloc and with the InnoDB's internal memory allocator.
    2015-07-28 17:14:23 18569 [Note] InnoDB: Using atomics to ref count buffer pool pages
    2015-07-28 17:14:23 18569 [Note] InnoDB: The InnoDB memory heap is disabled
    2015-07-28 17:14:23 18569 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
    2015-07-28 17:14:23 18569 [Note] InnoDB: Memory barrier is not used
    2015-07-28 17:14:23 18569 [Note] InnoDB: Compressed tables use zlib 1.2.3
    2015-07-28 17:14:23 18569 [Note] InnoDB: Using Linux native AIO
    2015-07-28 17:14:23 18569 [Note] InnoDB: Using CPU crc32 instructions
    2015-07-28 17:14:23 18569 [Note] InnoDB: Initializing buffer pool, size = 16.0G
    2015-07-28 17:14:24 18569 [Note] InnoDB: Completed initialization of buffer pool
    2015-07-28 17:14:24 18569 [Note] InnoDB: Highest supported file format is Barracuda.
    2015-07-28 17:14:25 18569 [Note] InnoDB: 128 rollback segment(s) are active.
    2015-07-28 17:14:25 18569 [Note] InnoDB: Waiting for purge to start
    2015-07-28 17:14:25 18569 [Note] InnoDB: 5.6.24 started; log sequence number 26585446708
    2015-07-28 17:14:25 18569 [Note] Server hostname (bind-address): '*'; port: 3306
    2015-07-28 17:14:25 18569 [Note] IPv6 is available.
    2015-07-28 17:14:25 18569 [Note] - '::' resolves to '::';
    2015-07-28 17:14:25 18569 [Note] Server socket created on IP: '::'.
    2015-07-28 17:14:25 18569 [Warning] Recovery from master pos 166617263 and file mysql-bin.000025.
    2015-07-28 17:14:25 18569 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
    2015-07-28 17:14:25 18569 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000025' at position 166617263, relay log '/data/3306/logs/relay-bin.000060' position: 4
    2015-07-28 17:14:25 18569 [Note] Slave I/O thread: connected to master 'slave@172.16.117.247:3306',replication started in log 'mysql-bin.000025' at position 166617263
    2015-07-28 17:14:25 18569 [Note] Event Scheduler: Loaded 0 events
    2015-07-28 17:14:25 18569 [Note] /opt/app/mysql/bin/mysqld: ready for connections.
    Version: '5.6.24-log' socket: '/data/3306/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)

OK, very good. Now let's see how much of the huge page memory is in use:

    [root@node-207 ~]# cat /proc/meminfo | grep ^HugePages
    HugePages_Total:    8476
    HugePages_Free:     8202
    HugePages_Rsvd:     8137
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

Only a little has been used so far.
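To make "only a little" concrete, the three counters can be turned into pages actually in use, pages reserved, and pages nobody has claimed. This is a rough sketch: the function name hugepage_summary is invented here, and it relies on HugePages_Rsvd meaning pages that have been promised to a process but not yet touched.

```shell
#!/bin/bash
# Summarise huge page usage from meminfo-formatted input on stdin.
#   in_use     = Total - Free  (pages already backing memory)
#   reserved   = Rsvd          (promised to a process, not yet touched)
#   unreserved = Free - Rsvd   (pages nobody has claimed)
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { fr = $2 }
        /^HugePages_Rsvd:/  { rsvd = $2 }
        END { printf "in_use=%d reserved=%d unreserved=%d\n", total - fr, rsvd, fr - rsvd }
    '
}

# Counters copied from the output right after startup.
printf 'HugePages_Total: 8476\nHugePages_Free: 8202\nHugePages_Rsvd: 8137\n' | hugepage_summary
# in_use=274 reserved=8137 unreserved=65
```

So right after startup mysqld has touched only 274 pages, but it has already reserved most of the rest, which is why the pool cannot be made much smaller. On a live host: hugepage_summary < /proc/meminfo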
Now run a count(primary key) on a big table and look again:

    [root@node-207 ~]# cat /proc/meminfo | grep ^HugePages
    HugePages_Total:    8476
    HugePages_Free:     8123
    HugePages_Rsvd:     8058
    HugePages_Surp:        0
    Hugepagesize:       2048 kB
    [root@node-207 ~]# cat /proc/meminfo | grep ^HugePages
    HugePages_Total:    8476
    HugePages_Free:     7233
    HugePages_Rsvd:     7201
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

See that? Huge pages are being used now: HugePages_Free went from 8123 to 7233.

Now look at InnoDB. The numbers add up to about that much memory in use, so at this point huge pages are configured.

    ---BUFFER POOL 7
    Buffer pool size   131072
    Free buffers       113960
    Database pages     17102
    Old database pages 8571
    Modified db pages  1164
    Pending reads 0
    Pending writes: LRU 0, flush list 0, single page 0
    Pages made young 55, not young 0
    0.13 youngs/s, 0.00 non-youngs/s
    Pages read 17080, created 22, written 1056
    0.80 reads/s, 0.00 creates/s, 3.33 writes/s
    Buffer pool hit rate 974 / 1000, young-making rate 4 / 1000 not 0 / 1000
    Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
    LRU len: 17102, unzip_LRU len: 0
    I/O sum[0]:cur[4], unzip sum[0]:cur[0]
    --------------
    ROW OPERATIONS
    --------------
    0 queries inside InnoDB, 0 queries in queue
    0 read views open inside InnoDB
    Main thread process no. 18569, id 140659839203072, state: sleeping
    Number of rows inserted 12222, updated 10955, deleted 184, read 1819484
    10.53 inserts/s, 7.67 updates/s, 0.40 deletes/s, 8.07 reads/s
    ----------------------------
    END OF INNODB MONITOR OUTPUT
    ============================

    1 row in set (0.00 sec)

References:
https://dev.mysql.com/doc/refman/5.0/en/large-page-support.html

Two user comments from that page follow.

I hope this comment will save several hours and white nights on production launching... After following every how-to and all the documentation on Google to enable huge pages... I must give you this post.
For enabling huge pages with Linux Debian 6.0.5 on Linux 2.6.32-5-amd64 #x86_64 GNU/Linux (64 bits) and MySQL 5.1, you have to add this to your /etc/sysctl.conf:

    # Total of allowed memory
    vm.nr_hugepages = YYYYYY
    # total amount of memory that can be allocated to shared memory, huge pages or not, on the box
    kernel.shmall = XXXXXXXXXX
    # maximum single shared memory segment, which for me was basically innodb_buffer_pool+1%
    kernel.shmmax = XXXXXXXXXX
    # Allowed group
    vm.hugetlb_shm_group = `id -g mysql`

XXXXX is given by this bash shell script:

    ##### SCRIPT START #########
    #!/bin/bash
    # keep 2 GB of memory for the system
    # (I have 68 GB of RAM on this one and 128 GB on the other one)
    keep_for_system=2097152
    # total memory in kB, minus what is kept for the system
    mem=$(free|grep Mem|awk '{print$2}')
    mem=$(echo "$mem-$keep_for_system"|bc)
    totmem=$(echo "$mem*1024"|bc)
    huge=$(grep Hugepagesize /proc/meminfo|awk '{print $2}')
    # use 75% of the remaining memory
    max=$(echo "$totmem*75/100"|bc)
    all=$(echo "$max/$huge"|bc)
    echo "kernel.shmmax = $max"
    echo "kernel.shmall = $all"
    ######### SCRIPT END #########

Check memory usage before reboot with the command:

    cat /proc/meminfo | grep -i huge

Reboot your system and check memory usage again. It works! ;-)

Posted by John Anderson on May 13 2015 11:09am

A bit of a note on the math here: some articles and blogs say that you should add your innodb_buffer_pool_size to your innodb_additional_mem_pool_size, and divide that by your hugetlb page size. Then add a few on to that. Unfortunately, that doesn't seem to be the whole story. For those who want to allocate as little RAM as possible to HugeTLB while still satisfying the requirements outlined in my.cnf, this formula might be a little better. This is after some experimentation led me to put some effort behind finding out why I always had to allocate many more pages than the math suggested.

The real formula should be:

    (innodb_buffer_pool_size in kb + innodb_additional_mem_pool_size in kb + tmp_table_size in kb + innodb_log_buffer_size in kb) / hugetlb size in kb

Then to that, add an additional 11 - 15 pages until MySQL starts. I give my best guess as to why these pages are unaccounted for below.

First, a note on why tmp_table_size is included: I'm not sure if it *should* be tmp_table_size * max_tmp_tables, but MySQL starts and runs with only tmp_table_size included. I think this only applies if default_tmp_storage_engine is InnoDB. If a tmp table needs to be created for a sort or order, and that table is going to be InnoDB in RAM, then hugetlb will need to be used.

Secondly, I noticed in the source code that the InnoDB log buffer uses the 'os_mem_alloc_large' function. So I think that should be included in the calculation as well. In my experimentation, I had 22 pages unaccounted for until I found that, then my unaccounted for pages went down to 11.

As for the pages which don't seem to be accounted for, I think that is the overhead cost of the nature of pages. For instance, if you have an innodb_buffer_pool_size of 256 MB, and you have 8 buffer instances, then you have: (268435456 bytes / 8 instances) = 33554.4 kilobytes to allocate per instance. At 2048 KB per page, that comes to 16.4 pages per buffer. That .4 of a page means an entire page must be allocated, or 17 pages per buffer instead of 16.4. That would account for 8 pages right there. So if one is really picky, declaring buffer sizes that meet the page size exactly would theoretically leave no overhead to absorb.

I don't know why, but MySQL and Google's converter have differing opinions on how to convert megabytes to bytes, and vice versa. So if you want to cut it as close as possible, fill out your my.cnf, start mysql without large-pages, and take note of the values of these 4 variables. Then convert those values into kilobytes for the page count calculation.
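The formula quoted above is easy to get wrong by hand, so here is a small sketch of it as a shell function. The name hugepages_needed and its argument order are invented for this example, the sizes must already be in kB, and the extra-pages margin is whatever you decide on (4 in this post, 11 to 15 in the comment). It is an estimate under those assumptions, not something MySQL itself provides.

```shell
#!/bin/bash
# Estimate vm.nr_hugepages from the four my.cnf values in the quoted formula.
# Arguments (sizes in kB):
#   buffer_pool additional_mem_pool tmp_table log_buffer hugepage_size extra_pages
hugepages_needed() {
    local buffer_pool_kb=$1 additional_pool_kb=$2 tmp_table_kb=$3 log_buffer_kb=$4
    local hugepage_kb=$5 extra_pages=$6
    local total_kb=$(( buffer_pool_kb + additional_pool_kb + tmp_table_kb + log_buffer_kb ))
    # Round up: a partly used huge page still costs a whole page.
    echo $(( (total_kb + hugepage_kb - 1) / hugepage_kb + extra_pages ))
}

# Values from this post: 16384M + 16M + 512M + 32M, 2048 kB pages, 4 extra pages.
hugepages_needed $((16384*1024)) $((16*1024)) $((512*1024)) $((32*1024)) 2048 4   # 8476

# The 256 MB / 8 instances example from the comment, using 1 kB = 1024 bytes:
echo $(( 268435456 / 8 / 1024 / 2048 ))   # 16
```

With binary units throughout, the 256 MB example comes to exactly 16 pages per instance. The 16.4 in the comment appears to come from dividing bytes by 1000 and then by a 2048 kB page, which may be the megabyte conversion mismatch the comment mentions.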