1. Introduction
A single salt-master now manages 2101 minions, and the master log has started to show the following message:
2016-09-09 11:36:22,221 [salt.utils.verify][CRITICAL][10919] The number of accepted minion keys(2101) should be lower than 1/4 of the max open files soft setting(4096). Please consider raising this value.
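If this is not resolved, new nodes can no longer be added. The log reports a max open files value of 4096, which is odd.
ulimit -a shows open files as 65535, so could the problem be in Salt itself?
2. Solving the problem
Searching Baidu and Google turned up an issue in the saltstack repository on GitHub: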
salt-master not recognizing max files increase #5323
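The message comes from the check_max_open_files function in /usr/lib/python2.7/site-packages/salt/utils/verify.py, shown below. The block that writes to /tmp/openfile.txt appears to be a local debugging addition that records the two values being compared: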
def check_max_open_files(opts):
    '''
    Check the number of max allowed open files and adjust if needed
    '''
    mof_c = opts.get('max_open_files', 100000)
    if sys.platform.startswith('win'):
        # Check the Windows API for more detail on this
        # http://msdn.microsoft.com/en-us/library/xt874334(v=vs.71).aspx
        # and the python binding http://timgolden.me.uk/pywin32-docs/win32file.html
        mof_s = mof_h = win32file._getmaxstdio()
    else:
        mof_s, mof_h = resource.getrlimit(resource.RLIMIT_NOFILE)

    accepted_keys_dir = os.path.join(opts.get('pki_dir'), 'minions')
    accepted_count = len(os.listdir(accepted_keys_dir))

    log.debug(
        'This salt-master instance has accepted {0} minion keys.'.format(
            accepted_count
        )
    )

    level = logging.INFO

    if (accepted_count * 4) <= mof_s:
        # We check for the soft value of max open files here because that's the
        # value the user chose to raise to.
        #
        # The number of accepted keys multiplied by four(4) is lower than the
        # soft value, everything should be OK
        return

    msg = (
        'The number of accepted minion keys({0}) should be lower than 1/4 '
        'of the max open files soft setting({1}). '.format(
            accepted_count, mof_s
        )
    )
    with open("/tmp/openfile.txt", "a") as f:
        f.write("mof_s-->%s\n" % mof_s)
        f.write("accepted_count-->%s\n" % accepted_count)

    if accepted_count >= mof_s:
        # This should never occur, it might have already crashed
        msg += 'salt-master will crash pretty soon! '
        level = logging.CRITICAL
    elif (accepted_count * 2) >= mof_s:
        # This is way too low, CRITICAL
        level = logging.CRITICAL
    elif (accepted_count * 3) >= mof_s:
        level = logging.WARNING
        # The accepted count is more than 3 time, WARN
    elif (accepted_count * 4) >= mof_s:
        level = logging.INFO

    if mof_c < mof_h:
        msg += ('According to the system\'s hard limit, there\'s still a '
                'margin of {0} to raise the salt\'s max_open_files '
                'setting. ').format(mof_h - mof_c)

    msg += 'Please consider raising this value.'
    log.log(level=level, msg=msg)
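To see why the message in the log is CRITICAL, the thresholds in check_max_open_files can be applied to the numbers from the log line. The helper below is only a condensed sketch of the level selection in that function; the name classify_open_files_warning is made up for illustration and is not part of Salt.

```python
import logging


def classify_open_files_warning(accepted_count, mof_s):
    """Return the log level check_max_open_files would pick, or None if it logs nothing.

    Hypothetical helper for illustration; it mirrors only the threshold logic.
    """
    if accepted_count * 4 <= mof_s:
        # Fewer than 1/4 of the soft limit: the function returns early.
        return None
    if accepted_count >= mof_s:
        return logging.CRITICAL
    if accepted_count * 2 >= mof_s:
        return logging.CRITICAL
    if accepted_count * 3 >= mof_s:
        return logging.WARNING
    return logging.INFO


# Values from the log line: 2101 accepted keys, soft limit 4096.
# 2101 * 2 = 4202 >= 4096, so the level is CRITICAL.
print(logging.getLevelName(classify_open_files_warning(2101, 4096)))

# With a soft limit of 65535, 2101 * 4 = 8404 <= 65535, so nothing is logged.
print(classify_open_files_warning(2101, 65535))
```

With the soft limit the shell reports (65535), the same 2101 keys would not trigger any message, which is why the 4096 in the log is the real puzzle.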
resource.getrlimit(resource.RLIMIT_NOFILE) returns the soft and hard limits on the number of open files. Running that call on its own gives:
>>> import resource
>>> resource.getrlimit(resource.RLIMIT_NOFILE)
(65535, 65535)
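Strange: why does a standalone call return 65535 while the same call inside Salt returns 4096?
We run several salt-masters in production, and they happen to be on different OS versions. Checking them showed that only CentOS 7 and later are affected, which points to the operating system rather than Salt.
On CentOS 5/6, resource limits are set in /etc/security/limits.conf, either per user (root, a named user) or with * for all users. Files under /etc/security/limits.d/ can be used as well: the system loads limits.conf first, then the files in limits.d in alphabetical order, and later settings override earlier ones.
On CentOS 7/RHEL 7, however, systemd replaces the old SysV init, so the scope of /etc/security/limits.conf is narrower. Its settings apply only to users who log in through PAM; they have no effect on services started by systemd. Limits for login users are still configured through /etc/security/limits.conf and limits.d, as described above.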
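This also explains the two different numbers: resource.getrlimit reports the limit of the process that calls it, and a process inherits its limits from whatever started it (a PAM login shell in one case, systemd in the other). The sketch below illustrates the inheritance with an ordinary child process rather than systemd. The function name and the value 256 are arbitrary choices for the demonstration.

```python
import resource
import subprocess
import sys

# One-liner executed in the child: print the soft RLIMIT_NOFILE it sees.
CHILD_CODE = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"


def soft_limit_seen_by_child(soft_limit):
    """Start a child Python process whose soft RLIMIT_NOFILE is set to soft_limit."""

    def set_soft_limit():
        # Runs in the child between fork() and exec(); the hard limit is kept.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    output = subprocess.check_output(
        [sys.executable, "-c", CHILD_CODE], preexec_fn=set_soft_limit
    )
    return int(output)


# The same getrlimit call gives different answers in parent and child.
print("this process:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
print("child process:", soft_limit_seen_by_child(256))
```

The interactive Python session earlier was a child of a login shell, so it saw the limits.conf value; salt-master is a child of systemd and sees whatever systemd gave it.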
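So how are resource limits configured for systemd services?
Global settings go in /etc/systemd/system.conf and /etc/systemd/user.conf. All .conf files in the two corresponding directories, /etc/systemd/system.conf.d/*.conf and /etc/systemd/user.conf.d/*.conf, are loaded as well. system.conf is used by the system instance and user.conf by user instances. Ordinary services use the settings in system.conf, and settings in system.conf.d/*.conf override system.conf.
Note, however, that changes to /etc/systemd/system.conf take effect only after the system is rebooted.
For a single service, the limit can instead be set in the service's own unit configuration; for open files, the relevant systemd directive is LimitNOFILE in the [Service] section.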
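A minimal sketch of such a per-service override is shown below. It only renders the text of a systemd drop-in file; the drop-in path, the function name render_nofile_dropin, and the value 100000 (borrowed from the max_open_files default visible in check_max_open_files) are assumptions for illustration, not settings taken from the original setup.

```python
# Assumed drop-in location for the salt-master unit (standard systemd layout).
DROPIN_PATH = "/etc/systemd/system/salt-master.service.d/limits.conf"


def render_nofile_dropin(limit):
    """Return drop-in text that sets LimitNOFILE for a single service.

    Hypothetical helper; it builds the text and does not write any file.
    """
    return "[Service]\nLimitNOFILE={0}\n".format(limit)


# 100000 is an illustrative value, not a recommendation.
print("# " + DROPIN_PATH)
print(render_nofile_dropin(100000))
```

Writing the rendered text to the drop-in path requires root, and the value should be chosen so that four times the number of accepted minion keys stays below it.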
Then run the following commands for the change to take effect:
sudo systemctl daemon-reload
sudo systemctl restart salt-master.service
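After the restart it is worth confirming that the running salt-master really has the new limit. On Linux the kernel exposes each process's limits in /proc/<pid>/limits, so a small sketch like the following can read them. The function names are made up for illustration, and the PID has to be looked up separately (for example with systemctl show -p MainPID salt-master.service).

```python
import re


def parse_max_open_files(limits_text):
    """Extract (soft, hard) for 'Max open files' from the text of /proc/<pid>/limits.

    Values are returned as strings because a limit may read 'unlimited'.
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return match.group(1), match.group(2)
    return None


def read_max_open_files(pid):
    """Read the open-files limits of a running process (Linux only)."""
    with open("/proc/{0}/limits".format(pid)) as f:
        return parse_max_open_files(f.read())


# Example with text in the shape of /proc/<pid>/limits:
sample = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max open files            4096                 4096                 files\n"
)
print(parse_max_open_files(sample))
```

Calling read_max_open_files with the master's PID before and after the change shows whether the systemd setting was picked up, independently of what ulimit -a reports in a login shell.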