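Background:
Our project uses a Gateway to split traffic across microservices, that is, to control what share of requests reaches each instance of a given microservice, so the gateway contains many methods that call the Nacos API.
When we deployed to a new environment, the gateway reported the error below. Our servers run on k8s and all use the same image.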
2021-11-23 16:53:54.568 ERROR [***-gateway,,,] 1 --- [ main] com.alibaba.nacos.client.naming : [NA] failed to write cache for dom:DEFAULT_GROUP@@***-****
java.lang.IllegalStateException: failed to create cache dir: /root/nacos/naming/753378b3-d4ad-4f1a-859b-f9d57df33c9f
    at com.alibaba.nacos.client.naming.cache.DiskCache.makeSureCacheDirExists(DiskCache.java:154) ~[nacos-client-1.1.4.jar:na]
    at com.alibaba.nacos.client.naming.cache.DiskCache.write(DiskCache.java:45) ~[nacos-client-1.1.4.jar:na]
    at com.alibaba.nacos.client.naming.core.HostReactor.processServiceJSON(HostReactor.java:184) [nacos-client-1.1.4.jar:na]
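Troubleshooting:
The error is clear: the client tried to write a cache file on the server and failed. Following the stack trace, we found the class that throws the exception in nacos-client-1.1.4.jar: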
package com.alibaba.nacos.client.naming.cache;

public class DiskCache {

    public static void write(ServiceInfo dom, String dir) {
        try {
            makeSureCacheDirExists(dir);

            File file = new File(dir, dom.getKeyEncoded());
            if (!file.exists()) {
                // add another !file.exists() to avoid conflicted creating-new-file from multi-instances
                if (!file.createNewFile() && !file.exists()) {
                    throw new IllegalStateException("failed to create cache file");
                }
            }

            StringBuilder keyContentBuffer = new StringBuilder("");

            String json = dom.getJsonFromServer();
            if (StringUtils.isEmpty(json)) {
                json = JSON.toJSONString(dom);
            }
            keyContentBuffer.append(json);

            //Use the concurrent API to ensure the consistency.
            ConcurrentDiskUtil.writeFileContent(file, keyContentBuffer.toString(), Charset.defaultCharset().toString());
        } catch (Throwable e) {
            NAMING_LOGGER.error("[NA] failed to write cache for dom:" + dom.getName(), e);
        }
    }

    *******

    private static File makeSureCacheDirExists(String dir) {
        File cacheDir = new File(dir);
        if (!cacheDir.exists() && !cacheDir.mkdirs()) {
            throw new IllegalStateException("failed to create cache dir: " + dir);
        }
        return cacheDir;
    }
}
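The write method calls makeSureCacheDirExists, which throws an IllegalStateException when the cache directory does not exist and cannot be created (mkdirs() returns false). write catches the exception and logs it, which produces the "[NA] failed to write cache for dom" line above.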
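To see the failure condition in isolation, the following is a minimal sketch that reuses the same exists()/mkdirs() test but returns a boolean instead of throwing. The class name CacheDirCheck, the method cacheDirAvailable, and the paths used in main are made up for illustration and are not part of nacos-client; the second path assumes a Linux host where /dev/null exists.

```java
import java.io.File;

public class CacheDirCheck {

    // Same condition as DiskCache.makeSureCacheDirExists, inverted:
    // the directory is usable if it already exists or can be created.
    public static boolean cacheDirAvailable(String dir) {
        File cacheDir = new File(dir);
        return cacheDir.exists() || cacheDir.mkdirs();
    }

    public static void main(String[] args) {
        // A new nested path under the temp directory can normally be created.
        String writable = System.getProperty("java.io.tmpdir")
                + "/nacos-check-" + System.nanoTime() + "/naming";
        System.out.println(writable + " -> " + cacheDirAvailable(writable));

        // /dev/null is not a directory, so nothing can be created beneath it
        // and mkdirs() returns false, the case that makes DiskCache throw.
        String blocked = "/dev/null/nacos/naming";
        System.out.println(blocked + " -> " + cacheDirAvailable(blocked));
    }
}
```

Running a check like this inside the container, under the same account as the application and against the path from the log, is one quick way to confirm whether the directory can be created at all.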
Next we followed the call chain to see who calls DiskCache.write. That led to HostReactor, which receives the cache path, cacheDir, through its constructor.
package com.alibaba.nacos.client.naming.core;

public class HostReactor {

    public HostReactor(EventDispatcher eventDispatcher, NamingProxy serverProxy, String cacheDir,
                       boolean loadCacheAtStart, int pollingThreadCount) {
        ......
    }
}
Going one step further back, HostReactor is created when NacosNamingService is instantiated:
package com.alibaba.nacos.client.naming;

@SuppressWarnings("PMD.ServiceOrDaoClassShouldEndWithImplRule")
public class NacosNamingService implements NamingService {

    private HostReactor hostReactor;

    public NacosNamingService(String serverList) {
        Properties properties = new Properties();
        properties.setProperty(PropertyKeyConst.SERVER_ADDR, serverList);
        init(properties);
    }

    public NacosNamingService(Properties properties) {
        init(properties);
    }

    private void init(Properties properties) {
        namespace = InitUtils.initNamespaceForNaming(properties);
        initServerAddr(properties);
        InitUtils.initWebRootContext();
        initCacheDir();
        initLogName(properties);

        eventDispatcher = new EventDispatcher();
        serverProxy = new NamingProxy(namespace, endpoint, serverList);
        serverProxy.setProperties(properties);
        beatReactor = new BeatReactor(serverProxy, initClientBeatThreadCount(properties));
        hostReactor = new HostReactor(eventDispatcher, serverProxy, cacheDir, isLoadCacheAtStart(properties),
            initPollingThreadCount(properties));
    }

    private void initCacheDir() {
        cacheDir = System.getProperty("com.alibaba.nacos.naming.cache.dir");
        if (StringUtils.isEmpty(cacheDir)) {
            cacheDir = System.getProperty("user.home") + "/nacos/naming/" + namespace;
        }
    }

    ......
}
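Every NacosNamingService constructor calls init, and init calls initCacheDir() to assign cacheDir before it constructs HostReactor.
Once you read initCacheDir, the picture is clear. The Nacos cache path is determined in one of two ways:
1. Explicitly, through the property com.alibaba.nacos.naming.cache.dir. The code reads it with System.getProperty, so it has to be available as a JVM system property.
2. Otherwise, by default: the home directory of the user running the process (user.home) + /nacos/naming/ + the namespace. With a process running as root, this gives the /root/nacos/naming/... path seen in the log.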
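The two cases can be sketched as a small pure function. This is only an illustration of the decision made in initCacheDir, with the inputs passed in explicitly; CacheDirResolver and resolve are hypothetical names, and /tmp/nacos-cache is an example value, not a path from the original setup.

```java
public class CacheDirResolver {

    static final String CACHE_DIR_PROPERTY = "com.alibaba.nacos.naming.cache.dir";

    // Mirrors the branch in NacosNamingService.initCacheDir: an explicit value
    // wins; otherwise the path is built from the user's home and the namespace.
    public static String resolve(String override, String userHome, String namespace) {
        if (override == null || override.isEmpty()) {
            return userHome + "/nacos/naming/" + namespace;
        }
        return override;
    }

    public static void main(String[] args) {
        String namespace = "753378b3-d4ad-4f1a-859b-f9d57df33c9f"; // taken from the log

        // No override and a process running as root: the path from the error message.
        System.out.println(resolve(null, "/root", namespace));

        // With an override, the value is used as-is and no namespace is appended.
        System.out.println(resolve("/tmp/nacos-cache", "/root", namespace));

        // What the current JVM would resolve to for this namespace.
        System.out.println(resolve(System.getProperty(CACHE_DIR_PROPERTY),
                System.getProperty("user.home"), namespace));
    }
}
```

Note that an explicit value is used without the namespace suffix, so clients that use different namespaces but share the same override would write into the same directory.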
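Solutions:
1. If root is the only account on the server, ask the ops team to grant write permission on /root/nacos/naming/.
2. Writing under root's directory is usually restricted, so alternatively start the application under a different account and make sure /user/nacos/naming/ (nacos/naming/ under that account's home directory) is writable.
3. Set com.alibaba.nacos.naming.cache.dir so that the cache is written to a directory that is open for writing. Because initCacheDir reads the value through System.getProperty, it must reach the JVM as a system property (for example a -D option at startup); an entry that exists only in the application's yml file is not read by System.getProperty on its own.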
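For option 3, since initCacheDir reads the value with System.getProperty, one possible approach is to set the system property in code before the Nacos client is created. The sketch below assumes it runs at the very start of main, before anything constructs NacosNamingService; NacosCacheDirSetup, useCacheDir, and the directory under java.io.tmpdir are hypothetical names and values chosen for illustration.

```java
import java.io.File;

public class NacosCacheDirSetup {

    static final String CACHE_DIR_PROPERTY = "com.alibaba.nacos.naming.cache.dir";

    // Points the Nacos naming cache at 'dir' unless the property is already set
    // (for example by a -D option), and fails fast if the directory is unusable.
    public static String useCacheDir(String dir) {
        String existing = System.getProperty(CACHE_DIR_PROPERTY);
        if (existing != null && !existing.isEmpty()) {
            return existing;
        }
        File cacheDir = new File(dir);
        if (!cacheDir.exists() && !cacheDir.mkdirs()) {
            throw new IllegalStateException("cache dir cannot be created: " + dir);
        }
        System.setProperty(CACHE_DIR_PROPERTY, cacheDir.getAbsolutePath());
        return cacheDir.getAbsolutePath();
    }

    public static void main(String[] args) {
        // Example location only; pick a directory the runtime user can write to.
        String dir = System.getProperty("java.io.tmpdir") + "/nacos/naming";
        System.out.println("Nacos naming cache dir: " + useCacheDir(dir));
        // The application itself would be started after this point.
    }
}
```

The same effect can be had without code changes by passing the property on the command line, for instance java -Dcom.alibaba.nacos.naming.cache.dir=/data/nacos/naming -jar app.jar, where both the directory and the jar name are placeholders.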
