https://discuss.elastic.co/t/memory-usage-of-the-machine-with-es-is-continuously-increasing/23537/7 mentions that slowly rising ES memory may be caused by too many small files (ES itself creates a large number of small files while indexing), which makes the Linux dentry and inode caches grow. You can set vfs_cache_pressure so that the OS reclaims them with higher priority.
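A quick way to check that hypothesis on a suspect node is to watch the dentry and inode slabs directly; a minimal sketch (reading /proc/slabinfo usually requires root):

cat /proc/sys/fs/dentry-state                        # nr_dentry and nr_unused are the first two fields
sudo grep -E '^(dentry|inode_cache)' /proc/slabinfo  # per-cache object counts and sizes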
Of course, it may also be caused by a memory-leak bug in Lucene itself: "Your bug description reminds me of https://issues.apache.org/jira/browse/LUCENE-7657 but this bug is expected to be fixed in 5.5.2."
https://discuss.elastic.co/t/es-vs-lucene-memory/20959
I've read the recommendations for ES_HEAP_SIZE, which basically state to set -Xms and -Xmx to 50% of physical RAM.
It says the rest should be left for Lucene to use (OS filesystem caching).
But I'm confused about how Lucene uses that. Doesn't Lucene run in the same JVM as ES? So they would share the same max heap setting of 50%.
nik9000 Nik Everett Elastic Team Member
Nov '14
Lucene runs in the same JVM as Elasticsearch, but (by default) it mmaps files and then iterates over their content intelligently. That means most of its actual storage is "off heap" (it's a Java buzz-phrase). Anyway, Linux will serve reads from mmapped files from its page cache. That is why you want to leave Linux a whole bunch of unused memory.
Nik
In other words, the official staff recommend leaving some free memory for the file system underneath ES (Lucene) to use as file cache.
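For example (a sketch, assuming a host with 32 GB of RAM and the ES_HEAP_SIZE-era startup scripts), you would give the JVM roughly half and deliberately leave the rest untouched for the OS page cache:

export ES_HEAP_SIZE=16g   # JVM heap = ~50% of physical RAM (equivalent to -Xms16g -Xmx16g);
                          # the remaining ~16 GB stays free for Linux to cache mmapped Lucene files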
pountz Adrien Grand Elastic Team Member
Nov '14
Indeed the behaviour is the same on Windows and Linux: memory that is not used by processes is used by the operating system in order to cache the hottest parts of the file system. The reason why the docs say that the rest should be left to Lucene is that most disk accesses that Elasticsearch performs are done through Lucene.
I used procexp and VMMap to double check, and yes, I think they are file system cache.
Is there any way to control the size of the file system cache? Because right now it easily drives up OS memory consumption, and when it reaches 100% the node fails to respond...
They also saw the memory usage of the ES machine (Java heap + file system cache) reach 100%.
However, that thread does not give a solution.
A similar ES memory problem appears at https://discuss.elastic.co/t/jvm-memory-increase-1-time-more-xmx16g-while-elasticsearch-heap-is-stable/55413/4,
where the cluster was being queried on a regular daily schedule.
Elasticsearch uses not only heap but also out-of-heap memory buffers because of Lucene.
I just read the Lucene blog post and I already know that Lucene/ES starts to use the file system cache (with MMapDirectory).
That's why in my memory graph you can see: Free (in green) + Used memory (in red) + cached memory (the FS cache in blue).
https://discuss.elastic.co/t/memory-usage-of-the-machine-with-es-is-continuously-increasing/23537/2
mentions that ES memory grows by about 200 MB every day, and that restarting ES brings everything back to normal.
Note that when I restart ES it gets cleared (most of it; maybe the OS clears up this cache once it sees that the parent process has been stopped).
When the underlying Lucene engine interacts with a segment, the OS will leverage free system RAM and keep that segment in memory. However, Elasticsearch/Lucene has no way to control OS-level caches.
This is the operating system's cache; ES itself cannot control it.
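To tell whether the growth is really JVM heap or just this OS-level cache, one approach (a sketch; the host and port are the usual defaults, adjust as needed) is to compare the heap reported by the node stats API with the page-cache counters in /proc/meminfo:

curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | grep heap_used_in_bytes   # heap actually used by the ES JVM
grep -E '^(Cached|Buffers):' /proc/meminfo                                          # page cache, where mmapped Lucene segments are served from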
Notes on a high-memory-usage alarm
Reposted from: http://farll.com/2016/10/high-memory-usage-alarm/
A memory-usage alarm on a Linux CentOS server; the conclusion in the end was that a bug in nss-softokn makes the dentry cache soar after curl issues a large number of requests.
It started with an SMS from the cloud platform saying that average memory usage had exceeded the alarm threshold. Logging into the cloud monitoring console showed that, starting two days earlier, the memory-usage curve had been climbing slowly in a very regular pattern.
Running top and pressing M to sort by memory usage showed no particularly memory-hungry process; pressing H to inspect threads also looked normal. The biggest consumer, mysql, used 6.9% of memory. Telnetting into the memcached process and running stats showed no abnormal memory consumption either.
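The same checks can be done non-interactively, which is easier to capture in a log over time; a small sketch:

ps aux --sort=-%mem | head -n 10   # largest memory consumers first
ps -eLf | wc -l                    # rough thread count on the box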
free -m showed that "-/+ buffers/cache" used was very high and only ten-odd percent of memory remained free. So where had the memory gone?
cat /proc/meminfo gives a more detailed breakdown of memory allocation: Slab and SReclaimable were several GB. Slab is the kernel's slab allocator; SReclaimable, as the name suggests, is the reclaimable part of it.
MemTotal:        8058056 kB
MemFree:         3892464 kB
Buffers:          192016 kB
Cached:           873860 kB
SwapCached:            0 kB
Active:          1141088 kB
Inactive:         690580 kB
Active(anon):     765260 kB
Inactive(anon):    22220 kB
Active(file):     375828 kB
Inactive(file):   668360 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                24 kB
Writeback:             0 kB
AnonPages:        765784 kB
Mapped:            58648 kB
Shmem:             21696 kB
Slab:            2261236 kB
SReclaimable:    2236844 kB
SUnreclaim:        24392 kB
KernelStack:        1448 kB
PageTables:         8404 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4029028 kB
Committed_AS:    1500552 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       25952 kB
VmallocChunk:   34359710076 kB
HardwareCorrupted:     0 kB
AnonHugePages:    673792 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        6144 kB
DirectMap2M:     8382464 kB
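The counters of interest can also be pulled out directly instead of scanning the whole dump, for example:

grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo   # total slab vs. reclaimable vs. unreclaimable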
Running slabtop and pressing c to sort by cache size showed dentry (the directory entry cache) far ahead of everything else, followed by inode_cache. The article 「一個Laravel隊列引發的報警」 (an alarm triggered by a Laravel queue) mentions that a Laravel queue can create and read/write a huge number of small files and drive the dentry cache up; an oversized directory (one containing millions of files) can also be a cause.
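slabtop can also be run once, already sorted by cache size, which is handier for a monitoring script (a sketch):

slabtop -o -s c | head -n 15   # -o prints a single snapshot, -s c sorts by cache size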
Since no recent changes had been made to the server configuration or the application code, the first suspicion was that the AliyunUpdate auto-update agent had changed something. But running strace against the Ali* processes one by one showed no large volume of file operations. crond did read and write a small number of session and other temporary files, but nowhere near enough to explain this.
ps aux | grep "Ali"
strace -fp {pid} -e trace=open,stat,close,unlink
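When a process is suspected, one way (a sketch; {pid} is a placeholder) to spot a flood of lookups for files that do not exist is to count ENOENT results over a short window:

timeout 30 strace -f -p {pid} -e trace=open,stat 2>&1 | grep -c ENOENT   # failed open/stat calls in ~30 seconds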
The official support ticket only replied that we could consider upgrading the server's memory, and that if memory pressure was hurting the business in the meantime, the memory held by slab could be released temporarily as follows:
Reclaim the memory held by the dentry cache and inode cache:
# echo 2 > /proc/sys/vm/drop_caches
Once the reclaim is done, restore the default:
# echo 0 > /proc/sys/vm/drop_caches
From the Linux kernel documentation:
To free pagecache: echo 1 > /proc/sys/vm/drop_caches
To free reclaimable slab objects (includes dentries and inodes): echo 2 > /proc/sys/vm/drop_caches
To free slab objects and pagecache: echo 3 > /proc/sys/vm/drop_caches
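Note that drop_caches is a one-shot trigger rather than a persistent switch, and the kernel documentation suggests running sync first so that dirty objects are written back and more of the slab becomes freeable; a minimal sketch:

sync
echo 2 > /proc/sys/vm/drop_caches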
That is only a temporary workaround; we can hardly keep a scheduled job around just to drop the dentry cache. Still, we are getting closer to the essence of the problem: what it looks like is that slab memory is growing faster than it can be reclaimed, starving the rest of the system. We can therefore raise the priority of slab reclaim. Linux exposes this as the vfs_cache_pressure parameter; the default is 100, and values above 100 make the kernel reclaim slab more aggressively (run as root):
echo 10000 > /proc/sys/vm/vfs_cache_pressure
Note that some articles mention the effect of changing vfs_cache_pressure is not always visible immediately: the dentry and inode caches may first climb to a peak, drop sharply, and only then settle into normal oscillation. So let it run for about 24 hours before judging the result.
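If the tuned value does help, it can also be made persistent across reboots (a sketch, assuming /etc/sysctl.conf is how this CentOS box loads sysctls at boot):

sysctl -w vm.vfs_cache_pressure=10000                        # apply immediately
echo 'vm.vfs_cache_pressure = 10000' >> /etc/sysctl.conf     # persist for the next boot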
I do not recommend reclaiming the slab cache via the min_free_kbytes and extra_free_kbytes parameters, as suggested at the end of the Laravel-queue article above; the kernel documentation is quite explicit that vfs_cache_pressure is the knob meant for controlling reclaim of the directory and inode caches:
vfs_cache_pressure
------------------
This percentage value controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure and this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are.
Although the runaway slab memory was now effectively capped and reclaimed in real time, we still had not found the root cause; in principle the default vfs_cache_pressure should strike a reasonable balance between available memory and the slab caches.
Only when we went back to the web service logs for that time window did the reason emerge: starting at that point, a system crontab job had been continuously requesting an external API with curl, and that API is served over https. The culprit is a bug in NSS (Network Security Services), the Mozilla security library bundled with libcurl; NSS is only pulled in when curl requests SSL, i.e. https, resources.
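A simple way to observe the effect (a sketch; the URL stands in for the https API the cron job was hitting) is to watch the dentry counters around a batch of https requests:

cat /proc/sys/fs/dentry-state    # nr_dentry and nr_unused before
for i in $(seq 1 100); do curl -s -o /dev/null https://example.com/api; done
cat /proc/sys/fs/dentry-state    # on an affected nss-softokn, nr_dentry jumps sharply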
The internal mechanism is this: to figure out whether the temporary directory it accesses is local or on a network filesystem, NSS probes hundreds of nonexistent files and times how long that takes, generating a large number of dentry cache entries for those nonexistent files along the way. When the dentry cache produced by the curl requests grows faster than the system can reclaim it, memory usage naturally keeps climbing. An English-language blog post covers this in more detail. Later NSS releases fixed the bug: "NSS now avoids calls to sdb_measureAccess in lib/softoken/sdb.c s_open if NSS_SDB_USE_CACHE is 'yes'." So the final fix is:
Step 1: make sure nss-softokn is a version in which the bug is fixed
Check that the installed version is >= 3.16.0 (per the bug-fix link above, anything newer than nss-softokn-3.14.3-12.el6 is also fine):
yum list nss-softokn
If the version is too old, upgrade it:
sudo yum update -y nss-softokn
Resolving Dependencies
--> Running transaction check
---> Package nss-softokn.x86_64 0:3.14.3-10.el6_5 will be updated
---> Package nss-softokn.x86_64 0:3.14.3-23.3.el6_8 will be an update
--> Processing Dependency: nss-softokn-freebl(x86-64) >= 3.14.3-23.3.el6_8 for package: nss-softokn-3.14.3-23.3.el6_8.x86_64
--> Processing Dependency: libnssutil3.so(NSSUTIL_3.17.1)(64bit) for package: nss-softokn-3.14.3-23.3.el6_8.x86_64
--> Running transaction check
---> Package nss-softokn-freebl.x86_64 0:3.14.3-10.el6_5 will be updated
---> Package nss-softokn-freebl.x86_64 0:3.14.3-23.3.el6_8 will be an update
---> Package nss-util.x86_64 0:3.16.1-1.el6_5 will be updated
---> Package nss-util.x86_64 0:3.21.0-2.el6 will be an update
--> Processing Dependency: nspr >= 4.11.0-1 for package: nss-util-3.21.0-2.el6.x86_64
--> Running transaction check
---> Package nspr.x86_64 0:4.10.6-1.el6_5 will be updated
---> Package nspr.x86_64 0:4.11.0-1.el6 will be an update
--> Finished Dependency Resolution
Step 2: set the variable NSS_SDB_USE_CACHE=yes
Apache:
echo "export NSS_SDB_USE_CACHE=yes" >> /etc/sysconfig/httpd
service httpd restart
PHP:
putenv('NSS_SDB_USE_CACHE=yes');
// Note: setting environment variables from PHP is affected by the safe_mode_allowed_env_vars and safe_mode_protected_env_vars directives when safe_mode is enabled.
Nginx:
fastcgi_param NSS_SDB_USE_CACHE yes;
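Since the offending requests in this case came from a crontab job rather than the web server, the variable also has to reach the cron environment; a sketch (the schedule and script path are placeholders):

# crontab entry: prefix the command so libcurl/NSS sees the variable
0 * * * * NSS_SDB_USE_CACHE=yes /usr/local/bin/fetch-api.sh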
Step 3: restart the web server
Restart Apache, or restart Nginx and php-fpm; otherwise curl_error may report "Problem with the SSL CA cert (path? access rights?)".
After these settings, the dentry cache finally drifted back to normal.