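Symptom: a large number of ephemeral nodes in the ZooKeeper cluster were never released, which made the cluster respond very slowly.
Analysis:
Inspecting the tree with a tool showed that a large number of nodes created for lock objects had never been released. That was odd, because release should have deleted them. The only option left was to read the source.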
private static final String LOCK_NAME = "lock-";
internals = new LockInternals(client, driver, path, lockName, maxLeases);
this.path = ZKPaths.makePath(path, lockName);
public static String makePath(String parent, String child)
{
    StringBuilder path = new StringBuilder();
    joinPath(path, parent, child);
    return path.toString();
}
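To make the resulting path concrete, here is a minimal sketch of the join step. It is a simplified stand-in written for this note, not Curator's actual `joinPath` code, and the lock path `/locks/order-42` is a made-up example.

```java
// Simplified stand-in for ZKPaths.makePath; NOT Curator's implementation.
final class PathJoinSketch {
    // Joins parent and child with exactly one '/' between them
    // and guarantees a leading '/'.
    static String makePath(String parent, String child) {
        String p = parent.startsWith("/") ? parent : "/" + parent;
        if (p.endsWith("/")) {
            p = p.substring(0, p.length() - 1); // drop trailing slash ("/" becomes "")
        }
        String c = child.startsWith("/") ? child.substring(1) : child;
        return p + "/" + c;
    }

    public static void main(String[] args) {
        // Hypothetical lock path plus the LOCK_NAME constant from the source above.
        System.out.println(makePath("/locks/order-42", "lock-")); // /locks/order-42/lock-
    }
}
```

So every distinct lock path becomes a parent node, and the lock itself is a child whose name starts with `lock-`.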
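This is the relevant logic: when InterProcessMutex acquires a lock, it creates a node of the form '..../name/lock-'. In other words, it creates a parent node with a single child under it. On release, only the child '...../lock-' is deleted; the parent '..../name' is left behind. That is the trap. Once that is understood, the fix is straightforward: on unlock, also delete the '..../name' node.
Why was it designed this way? Let's look at the underlying implementation. InterProcessSemaphore follows the same design, but I still could not see why a child node was needed until I read the implementation of InterProcessReadWriteLock, which has two fields: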
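The leak and the workaround can be reproduced with a toy model. The sketch below uses an in-memory set of paths as a stand-in for the znode tree; the class, its methods, and the `/locks/order-N` paths are all hypothetical and only mimic the create-parent, create-child, delete-child sequence described above.

```java
import java.util.TreeSet;

// Toy in-memory model of the znode tree; it does not talk to ZooKeeper.
final class LockNodeLeakSketch {
    private final TreeSet<String> nodes = new TreeSet<>();
    private int sequence = 0;

    // Mimics acquire: create the parent if missing, then a sequential child.
    String acquire(String lockPath) {
        nodes.add(lockPath);
        String child = String.format("%s/lock-%010d", lockPath, sequence++);
        nodes.add(child);
        return child;
    }

    // Mimics release: only the child is deleted, the parent stays.
    void release(String child) {
        nodes.remove(child);
    }

    // The workaround: after release, delete the parent if it has no children.
    boolean deleteParentIfEmpty(String lockPath) {
        String prefix = lockPath + "/";
        for (String node : nodes) {
            if (node.startsWith(prefix)) {
                return false; // someone else still holds or waits for the lock
            }
        }
        return nodes.remove(lockPath);
    }

    int size() {
        return nodes.size();
    }

    public static void main(String[] args) {
        LockNodeLeakSketch leaky = new LockNodeLeakSketch();
        LockNodeLeakSketch cleaned = new LockNodeLeakSketch();
        for (int i = 0; i < 1000; i++) {
            String path = "/locks/order-" + i;
            leaky.release(leaky.acquire(path));
            cleaned.release(cleaned.acquire(path));
            cleaned.deleteParentIfEmpty(path);
        }
        System.out.println("without cleanup: " + leaky.size() + " nodes left");  // 1000
        System.out.println("with cleanup: " + cleaned.size() + " nodes left");   // 0
    }
}
```

Against a real cluster, the equivalent step would presumably be a `client.delete().forPath(lockPath)` call after `release()`, tolerating `KeeperException.NotEmptyException` (another client still has a child there) and `KeeperException.NoNodeException` (someone else already removed it). Treat that as an assumption to verify against your Curator version.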
private static final String READ_LOCK_NAME = "__READ__";
private static final String WRITE_LOCK_NAME = "__WRIT__";
And here is how the read-write lock is initialized:
public InterProcessReadWriteLock(CuratorFramework client, String basePath, byte[] lockData)
{
    lockData = (lockData == null) ? null : Arrays.copyOf(lockData, lockData.length);
    writeMutex = new InternalInterProcessMutex
    (
        client,
        basePath,
        WRITE_LOCK_NAME,
        lockData,
        1,
        new SortingLockInternalsDriver()
        {
            @Override
            public PredicateResults getsTheLock(CuratorFramework client, List<String> children, String sequenceNodeName, int maxLeases) throws Exception
            {
                return super.getsTheLock(client, children, sequenceNodeName, maxLeases);
            }
        }
    );
    readMutex = new InternalInterProcessMutex
    (
        client,
        basePath,
        READ_LOCK_NAME,
        lockData,
        Integer.MAX_VALUE,
        new SortingLockInternalsDriver()
        {
            @Override
            public PredicateResults getsTheLock(CuratorFramework client, List<String> children, String sequenceNodeName, int maxLeases) throws Exception
            {
                return readLockPredicate(children, sequenceNodeName);
            }
        }
    );
}
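Now it makes sense: a read-write lock needs to create two nodes, a read node and a write node, under the same path. With that, everything is clear.
Earlier I suspected that the author kept the parent node around as a kind of cache. After reading the source, it turns out to be a bug in how Curator stays compatible with older versions. Curator wants the parent to be a 'CONTAINER' node, which the server removes once it has no children. On a ZooKeeper version too old to have that type, a node that was meant to be temporary is stored as a persistent node instead.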
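Why sharing one parent matters can be sketched as follows. This is a simplified model of the idea, not Curator's `readLockPredicate`: it ignores reentrancy and lease counting, and the child names are hypothetical (lock name followed by a 10-digit sequence number).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified model: readers and writers queue up as children of ONE parent.
final class ReadWriteLockSketch {
    static final String READ = "__READ__";
    static final String WRITE = "__WRIT__";

    // Assumes every child name ends with a 10-digit sequence number.
    static long sequenceOf(String node) {
        return Long.parseLong(node.substring(node.length() - 10));
    }

    static List<String> sortedBySequence(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(ReadWriteLockSketch::sequenceOf));
        return sorted;
    }

    // A reader may proceed if no writer is queued ahead of it.
    static boolean readerMayProceed(List<String> children, String ourNode) {
        for (String node : sortedBySequence(children)) {
            if (node.equals(ourNode)) {
                return true;  // reached our own node without meeting a writer
            }
            if (node.contains(WRITE)) {
                return false; // a writer is ahead of us
            }
        }
        return false;         // our node is not among the children
    }

    // A writer may proceed only if it is first among ALL children.
    static boolean writerMayProceed(List<String> children, String ourNode) {
        List<String> sorted = sortedBySequence(children);
        return !sorted.isEmpty() && sorted.get(0).equals(ourNode);
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
            "__READ__0000000000", "__WRIT__0000000001", "__READ__0000000002");
        System.out.println(readerMayProceed(children, "__READ__0000000000")); // true
        System.out.println(readerMayProceed(children, "__READ__0000000002")); // false
        System.out.println(writerMayProceed(children, "__WRIT__0000000001")); // false
    }
}
```

Both predicates need to see readers and writers in a single sorted list, which is only possible when the two kinds of nodes live under the same parent. That is a plausible reason for the parent/child layout, and the plain mutex simply reuses it.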
String fixForNamespace(String path, boolean isSequential)
{
    if ( ensurePathNeeded.get() )
    {
        try
        {
            final CuratorZookeeperClient zookeeperClient = client.getZookeeperClient();
            RetryLoop.callWithRetry
            (
                zookeeperClient,
                new Callable<Object>()
                {
                    @Override
                    public Object call() throws Exception
                    {
                        ZKPaths.mkdirs(zookeeperClient.getZooKeeper(), ZKPaths.makePath("/", namespace), true, client.getAclProvider(), true);
                        return null;
                    }
                }
            );
            ensurePathNeeded.set(false);
        }
        catch ( Exception e )
        {
            client.logError("Ensure path threw exception", e);
        }
    }
    return ZKPaths.fixForNamespace(namespace, path, isSequential);
}
// This is the entry point where the parent node is created
ZKPaths.mkdirs(zookeeperClient.getZooKeeper(), ZKPaths.makePath("/", namespace), true, client.getAclProvider(), true);
zookeeper.create(subPath, new byte[0], acl, getCreateMode(asContainers));
// With asContainers = true, a CONTAINER-type node is requested
return asContainers ? getContainerCreateMode() : CreateMode.PERSISTENT;
// The version-compatibility code is where the trap lies
try
{
    localCreateMode = CreateMode.valueOf("CONTAINER");
}
catch ( IllegalArgumentException ignore )
{
    localCreateMode = NON_CONTAINER_MODE;
    log.warn("The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.");
}
Older versions do not have this type, so it falls back to:
private static final CreateMode NON_CONTAINER_MODE = CreateMode.PERSISTENT;
This is arguably a bug in how Curator handles version compatibility, and we walked straight into it.
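The fallback can be reproduced without ZooKeeper on the classpath. In the sketch below, `OldCreateMode` and `NewCreateMode` are hypothetical stand-ins for the `CreateMode` enum of an older and a newer ZooKeeper client; only the `valueOf` lookup and the `IllegalArgumentException` fallback mirror the Curator code above.

```java
// Stand-alone illustration of the valueOf("CONTAINER") fallback.
final class CreateModeFallbackSketch {
    // Hypothetical stand-in for an older client's CreateMode: no CONTAINER.
    enum OldCreateMode { PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, EPHEMERAL_SEQUENTIAL }

    // Hypothetical stand-in for a newer client's CreateMode: has CONTAINER.
    enum NewCreateMode { PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, EPHEMERAL_SEQUENTIAL, CONTAINER }

    // Looks the constant up by name, so the code compiles against either enum.
    static <E extends Enum<E>> E containerModeOrPersistent(Class<E> type) {
        try {
            return Enum.valueOf(type, "CONTAINER");
        } catch (IllegalArgumentException ignore) {
            // No such constant: silently degrade to a node that is never cleaned up.
            return Enum.valueOf(type, "PERSISTENT");
        }
    }

    public static void main(String[] args) {
        System.out.println(containerModeOrPersistent(OldCreateMode.class)); // PERSISTENT
        System.out.println(containerModeOrPersistent(NewCreateMode.class)); // CONTAINER
    }
}
```

The only trace of the downgrade is a warning in the log, so it is worth searching the client logs for the "doesn't support Container nodes" message when lock parents pile up.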