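Preface
Before writing any concrete logic, it is worth understanding in more depth how zk manages its data and how it logs and persists transactions. There is a lot to cover, so the next few posts all deal with this topic.
In-memory data
zk's data model is a tree of ZNodes. Internally, ZK keeps the whole tree in memory, much like an in-memory database, and periodically writes it to disk.
The in-memory data lives in DataTree, the core of zk's in-memory storage, which is itself a tree structure.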
/**
* This class maintains the tree data structure. It doesn't have any networking
* or client connection code in it so that it can be tested in a stand alone
* way.
* <p>
* The tree maintains two parallel data structures: a hashtable that maps from
* full paths to DataNodes and a tree of DataNodes. All accesses to a path is
* through the hashtable. The tree is traversed only when serializing to disk.
*/
public class DataTree {
private static final Logger LOG = LoggerFactory.getLogger(DataTree.class);
/**
* This hashtable provides a fast lookup to the datanodes. The tree is the
* source of truth and is where all the locking occurs
*/
private final ConcurrentHashMap<String, DataNode> nodes =
new ConcurrentHashMap<String, DataNode>();
private final WatchManager dataWatches = new WatchManager();
private final WatchManager childWatches = new WatchManager();
/** the root of zookeeper tree */
private static final String rootZookeeper = "/";
/** the zookeeper nodes that acts as the management and status node **/
private static final String procZookeeper = Quotas.procZookeeper;
/** this will be the string thats stored as a child of root */
private static final String procChildZookeeper = procZookeeper.substring(1);
/**
* the zookeeper quota node that acts as the quota management node for
* zookeeper
*/
private static final String quotaZookeeper = Quotas.quotaZookeeper;
/** this will be the string thats stored as a child of /zookeeper */
private static final String quotaChildZookeeper = quotaZookeeper
.substring(procZookeeper.length() + 1);
/**
* the path trie that keeps track fo the quota nodes in this datatree
*/
private final PathTrie pTrie = new PathTrie();
/**
* This hashtable lists the paths of the ephemeral nodes of a session.
*/
private final Map<Long, HashSet<String>> ephemerals =
new ConcurrentHashMap<Long, HashSet<String>>();
private final ReferenceCountedACLCache aclCache = new ReferenceCountedACLCache();
...
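As you can see, DataTree is mainly associated with four classes: DataNode, Quotas, PathTrie, and StatsTrack. The sections below go through them one by one.
DataNode
DataNode is the smallest unit of data storage in zookeeper. In DataTree, every DataNode is held in the field private final ConcurrentHashMap<String, DataNode> nodes = new ConcurrentHashMap<String, DataNode>();
The path is the key and the DataNode is the value, so every operation on a znode in zk is ultimately an operation on this map.
Ephemeral nodes additionally get a dedicated map, private final Map<Long, HashSet<String>> ephemerals = new ConcurrentHashMap<Long, HashSet<String>>();
It maps a session id to the paths of that session's ephemeral nodes, which allows quick lookup and bulk cleanup once the session ends.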
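To make the role of the two maps concrete, here is a minimal sketch. MiniDataTree and its methods are hypothetical names invented for illustration, not the real DataTree API; the sketch stores raw bytes instead of DataNode objects and assumes that an owner id of 0 marks a persistent node.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified stand-in for DataTree; not the real ZooKeeper class. */
class MiniDataTree {
    /** Full path -> node data (the real map stores DataNode objects). */
    private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();
    /** Session id -> paths of the ephemeral nodes owned by that session. */
    private final Map<Long, Set<String>> ephemerals = new ConcurrentHashMap<>();

    /** Creates a node; ephemeralOwner == 0 is assumed to mean "persistent". */
    void create(String path, byte[] data, long ephemeralOwner) {
        nodes.put(path, data);
        if (ephemeralOwner != 0) {
            ephemerals
                .computeIfAbsent(ephemeralOwner, id -> ConcurrentHashMap.newKeySet())
                .add(path);
        }
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    /** Removes every ephemeral node owned by the session and returns how many were removed. */
    int closeSession(long sessionId) {
        Set<String> owned = ephemerals.remove(sessionId);
        if (owned == null) {
            return 0;
        }
        for (String path : owned) {
            nodes.remove(path);
        }
        return owned.size();
    }

    public static void main(String[] args) {
        MiniDataTree tree = new MiniDataTree();
        tree.create("/app", "cfg".getBytes(), 0L);        // persistent node
        tree.create("/app/worker-1", new byte[0], 42L);   // ephemeral node of session 42
        int removed = tree.closeSession(42L);
        // prints: removed=1 app=true worker=false
        System.out.println("removed=" + removed
                + " app=" + tree.exists("/app")
                + " worker=" + tree.exists("/app/worker-1"));
    }
}
```

Because the session map already lists the paths, closing a session never has to scan the whole tree.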
The DataNode class itself looks like this:
public class DataNode implements Record {
/** the parent of this datanode */
DataNode parent;
/** the data for this datanode */
byte data[];
/**
* the acl map long for this datanode. the datatree has the map
*/
Long acl;
/**
* the stat for this node that is persisted to disk.
*/
public StatPersisted stat;
/**
* the list of children for this node. note that the list of children string
* does not contain the parent path -- just the last part of the path. This
* should be synchronized on except deserializing (for speed up issues).
*/
private Set<String> children = null;
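As you can see, a DataNode stores three kinds of information: the data content data[], the ACL (kept as a Long key into the ACL map that DataTree maintains), and the node status stat. The data and the stat are what a client gets back from getData. A DataNode also records its parent and its set of children, and provides methods for working with that set.
Adding a child: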
/**
* Method that inserts a child into the children set
*
* @param child
* to be inserted
* @return true if this set did not already contain the specified element
*/
public synchronized boolean addChild(String child) {
if (children == null) {
// let's be conservative on the typical number of children
children = new HashSet<String>(8); // create the set lazily on first insert
}
return children.add(child); // add the child to the set
}
Removing a child:
/**
* Method that removes a child from the children set
*
* @param child
* @return true if this set contained the specified element
*/
public synchronized boolean removeChild(String child) {
if (children == null) {
return false;
}
return children.remove(child); // remove the child from the set
}
Getters and setters:
/**
* convenience method for setting the children for this datanode
*
* @param children
*/
public synchronized void setChildren(HashSet<String> children) {
this.children = children;
}
/**
* convenience methods to get the children
*
* @return the children of this datanode
*/
public synchronized Set<String> getChildren() { // both accessors are synchronized so concurrent requests do not race on the shared children field
if (children == null) {
return EMPTY_SET;
}
return Collections.unmodifiableSet(children);
}
These methods are all simple, and with the comments they should be easy to follow.
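One detail of getChildren is easy to miss: Collections.unmodifiableSet returns a read-only view, not a copy. The sketch below uses a hypothetical ChildrenHolder class that copies only the children-related logic of DataNode (with Collections.emptySet() standing in for EMPTY_SET) to show what a caller can and cannot do with the returned set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical class that mirrors only the children handling of DataNode. */
class ChildrenHolder {
    private Set<String> children = null;

    synchronized boolean addChild(String child) {
        if (children == null) {
            children = new HashSet<>(8);
        }
        return children.add(child);
    }

    synchronized Set<String> getChildren() {
        if (children == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(children);
    }

    public static void main(String[] args) {
        ChildrenHolder node = new ChildrenHolder();
        System.out.println(node.addChild("a")); // true: "a" was not present yet
        System.out.println(node.addChild("a")); // false: already present

        Set<String> view = node.getChildren();
        try {
            view.add("b"); // callers cannot modify the set through the view
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only view");
        }

        node.addChild("c");
        System.out.println(view.contains("c")); // true: the view reflects later changes
    }
}
```

Since the view stays backed by the live set, the synchronized getter protects the field access itself, but a caller that iterates over the result while another thread adds children would still need its own coordination.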
Quotas
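Before reading on, I strongly recommend the article on zk permission management and quotas. A quota is a limit on the number of nodes and on the amount of data under a ZNode. It is only a soft limit: exceeding it produces a warning in the log and does not actually block anything.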
public class Quotas {
/** the zookeeper nodes that acts as the management and status node **/
public static final String procZookeeper = "/zookeeper";
/** the zookeeper quota node that acts as the quota
* management node for zookeeper */
public static final String quotaZookeeper = "/zookeeper/quota";
/**
* the limit node that has the limit of
* a subtree
*/
public static final String limitNode = "zookeeper_limits";
/**
* the stat node that monitors the limit of
* a subtree.
*/
public static final String statNode = "zookeeper_stats";
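The difference between limitNode and statNode: the first holds the limit given to setquota, the second holds the actual current usage. This comes up again in the PathTrie section. One point to note here: every node whose quota was set successfully gets a matching entry in the tree under /zookeeper/quota, and that entry has two children, path + "/zookeeper_limits" and path + "/zookeeper_stats", which correspond to limitNode and statNode above. "Successfully" is conditional: if a parent or a child of the node already has a quota, setting the quota fails.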
public static String quotaPath(String path) {
return quotaZookeeper + path +
"/" + limitNode;//limitnode
}
public static String statPath(String path) {
return quotaZookeeper + path + "/" +
statNode;//statnode
}
The two methods above build the paths of the limit node and the stat node.
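As a quick worked example, the sketch below copies the constants and the two methods into a hypothetical QuotaPathDemo class so it runs without ZooKeeper on the classpath, and prints the paths generated for a znode /app/db.

```java
/** Hypothetical standalone copy of the path helpers from Quotas, for illustration only. */
class QuotaPathDemo {
    static final String QUOTA_ZOOKEEPER = "/zookeeper/quota";
    static final String LIMIT_NODE = "zookeeper_limits";
    static final String STAT_NODE = "zookeeper_stats";

    /** Path of the node that stores the configured limit for the given znode. */
    static String quotaPath(String path) {
        return QUOTA_ZOOKEEPER + path + "/" + LIMIT_NODE;
    }

    /** Path of the node that stores the actual usage for the given znode. */
    static String statPath(String path) {
        return QUOTA_ZOOKEEPER + path + "/" + STAT_NODE;
    }

    public static void main(String[] args) {
        // prints: /zookeeper/quota/app/db/zookeeper_limits
        System.out.println(quotaPath("/app/db"));
        // prints: /zookeeper/quota/app/db/zookeeper_stats
        System.out.println(statPath("/app/db"));
    }
}
```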
PathTrie
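For an introduction to tries, see the article 淺談字典樹 (A Brief Look at Tries). My own short version: when words share a prefix (starting from the first letter), the shared part is stored once, and new nodes are created only for the remainder.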
public class PathTrie {
/**
* the logger for this class
*/
private static final Logger LOG = LoggerFactory.getLogger(PathTrie.class);
/**
* the root node of PathTrie
*/
private final TrieNode rootNode ;
static class TrieNode {
boolean property = false; // whether this node has a quota set
final HashMap<String, TrieNode> children;
TrieNode parent = null;
The structure is simple, a typical tree, with the static nested class TrieNode as its node.
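The following is a minimal sketch of a path trie, assuming one trie node per path component and a boolean flag for "has a quota". MiniPathTrie and longestQuotaPrefix are hypothetical names for illustration; the lookup shows one way such a structure can answer the question "which quota, if any, covers this path?".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified path trie; not the real PathTrie class. */
class MiniPathTrie {
    private static final class Node {
        boolean hasQuota = false;
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    /** Inserts the path component by component and marks the last node. */
    void addPath(String path) {
        Node current = root;
        for (String part : components(path)) {
            current = current.children.computeIfAbsent(part, k -> new Node());
        }
        current.hasQuota = true;
    }

    /** Returns the longest marked prefix of the path, or "/" if no prefix is marked. */
    String longestQuotaPrefix(String path) {
        Node current = root;
        StringBuilder walked = new StringBuilder();
        String best = "/";
        for (String part : components(path)) {
            current = current.children.get(part);
            if (current == null) {
                break; // the trie has no node for the rest of the path
            }
            walked.append('/').append(part);
            if (current.hasQuota) {
                best = walked.toString();
            }
        }
        return best;
    }

    private static String[] components(String path) {
        if (path == null || !path.startsWith("/") || path.length() < 2) {
            throw new IllegalArgumentException("Invalid path " + path);
        }
        return path.substring(1).split("/");
    }

    public static void main(String[] args) {
        MiniPathTrie trie = new MiniPathTrie();
        trie.addPath("/app/db");
        System.out.println(trie.longestQuotaPrefix("/app/db/table1")); // prints: /app/db
        System.out.println(trie.longestQuotaPrefix("/app/cache"));     // prints: /
        System.out.println(trie.longestQuotaPrefix("/app"));           // prints: /
    }
}
```

Note that /app exists as a trie node after the insert but is not marked, so it does not count as a quota path.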
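As mentioned earlier, setting a quota fails if a parent or a child of the node already has one. It is natural to look for the cause in PathTrie first. None of the blog posts I had read mentioned this behavior, so it deserves attention.
The three screenshots that accompanied this point showed that once a node has a quota, setting a quota on its parent or on its children fails.
Looking for the reason, start with addPath: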
public void addPath(String path) {
if (path == null) {
return;
}
String[] pathComponents = path.split("/"); // split the path on "/"
TrieNode parent = rootNode;
String part = null;
if (pathComponents.length <= 1) {
throw new IllegalArgumentException("Invalid path " + path);
}
for (int i=1; i<pathComponents.length; i++) { // walk down one level per path component
part = pathComponents[i];
if (parent.getChild(part) == null) {
parent.addChild(part, new TrieNode(parent)); // no node for this component yet, so create it
}
parent = parent.getChild(part);
}
parent.setProperty(true);
}
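So insertion really does follow the usual trie rules, and nothing here rejects a path. The restriction comes from the place where zk handles the client command, the processCmd method of ZooKeeperMain: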
if (cmd.equals("setquota") && args.length >= 4) {
String option = args[1];
String val = args[2];
path = args[3];
System.err.println("Comment: the parts are " +
"option " + option +
" val " + val +
" path " + path);
if ("-b".equals(option)) {
// we are setting the bytes quota
createQuota(zk, path, Long.parseLong(val), -1); // createQuota is what actually creates the quota nodes after a setquota command
} else if ("-n".equals(option)) {
// we are setting the num quota
createQuota(zk, path, -1L, Integer.parseInt(val));
} else {
usage();
}
}
As shown, setquota calls createQuota, which contains:
// check for more than 2 children --
// if zookeeper_stats and zookeeper_qutoas
// are not the children then this path
// is an ancestor of some path that
// already has quota
String realPath = Quotas.quotaZookeeper + path;
// check whether any child node already has a quota
try {
List<String> children = zk.getChildren(realPath, false);
for (String child: children) {
if (!child.startsWith("zookeeper_")) {
throw new IllegalArgumentException(path + " has child " +
child + " which has a quota");
}
}
} catch(KeeperException.NoNodeException ne) {
// this is fine
}
//check for any parent that has been quota
// check whether any parent node has a quota; the logic is much like the child check
checkIfParentQuota(zk, path);
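This check is the cause of the behavior described earlier: a quota cannot be added when a parent or a child node already has one. An important correction to what I wrote before: although the check is written in client code (ZooKeeperMain), it is not decided by the client alone. Calls such as zk.getChildren have to talk to the server. If neither a parent nor a child has a quota, the client then sends requests to the server to create the nodes, as follows: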
if (zk.exists(quotaPath, false) == null) {
try {
// create sends the request to the server
zk.create(Quotas.procZookeeper, null, Ids.OPEN_ACL_UNSAFE,
CreateMode.PERSISTENT);
zk.create(Quotas.quotaZookeeper, null, Ids.OPEN_ACL_UNSAFE,
CreateMode.PERSISTENT);
} catch(KeeperException.NodeExistsException ne) {
// do nothing
}
}
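The rule enforced by the parent and child checks can be summarized in a small, server-free sketch. It assumes the existing quota paths are available as an in-memory set, which the real client does not have (it asks the server through getChildren). QuotaConflictCheck and its methods are hypothetical names, and treating an existing quota on the same path as an update rather than a conflict is an assumption of this sketch.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model of the ancestor/descendant quota rule. */
class QuotaConflictCheck {

    /** Returns the existing quota path that blocks a quota on path, or null if there is none. */
    static String findConflict(Set<String> quotaPaths, String path) {
        for (String existing : quotaPaths) {
            if (existing.equals(path)) {
                continue; // assumption: same path means updating the quota, not a conflict
            }
            if (isAncestor(existing, path) || isAncestor(path, existing)) {
                return existing;
            }
        }
        return null;
    }

    /** True if ancestor is a strict ancestor of descendant in the znode tree. */
    static boolean isAncestor(String ancestor, String descendant) {
        return descendant.startsWith(ancestor + "/");
    }

    public static void main(String[] args) {
        Set<String> quotaPaths = new TreeSet<>();
        quotaPaths.add("/app/db");

        System.out.println(findConflict(quotaPaths, "/app"));           // prints: /app/db
        System.out.println(findConflict(quotaPaths, "/app/db/table1")); // prints: /app/db
        System.out.println(findConflict(quotaPaths, "/app/cache"));     // prints: null
        System.out.println(findConflict(quotaPaths, "/application"));   // prints: null
    }
}
```

Comparing against ancestor + "/" rather than the bare prefix keeps /app from being mistaken for an ancestor of /application.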
To learn more about adding to and deleting from the trie, see the article Zk數據模型-配額 (Zk Data Model: Quotas).
StatsTrack
StatsTrack simply holds the count and bytes values for a node.
/**
* a class that represents the stats associated with quotas
*/
public class StatsTrack {
private int count;
private long bytes;
private String countStr = "count";
private String byteStr = "bytes";
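StatsTrack is just a plain data class. It holds what is stored in the limit and stat nodes under /zookeeper/quota. The code below creates those nodes: each StatsTrack is converted to a string and written as the node's data, first the limit node with the requested limits and then the stat node with zeros.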
StatsTrack strack = new StatsTrack(null);
strack.setBytes(bytes);
strack.setCount(numNodes);
try {
zk.create(quotaPath, strack.toString().getBytes(),
Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
StatsTrack stats = new StatsTrack(null);
stats.setBytes(0L);
stats.setCount(0);
zk.create(statPath, stats.toString().getBytes(),
Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
}
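To show what ends up inside these nodes and why the limit is only soft, here is a minimal sketch. It assumes a "count=N,bytes=M" text form for the serialized value and treats -1 as "no limit", matching the -1 arguments passed to createQuota above. MiniStatsTrack and exceeds are hypothetical names, not the real StatsTrack API.

```java
/** Hypothetical, simplified counterpart of StatsTrack; assumes the "count=N,bytes=M" text form. */
class MiniStatsTrack {
    private final int count;
    private final long bytes;

    MiniStatsTrack(int count, long bytes) {
        this.count = count;
        this.bytes = bytes;
    }

    /** Parses the text stored in a limit or stat node. */
    static MiniStatsTrack parse(String stats) {
        String[] split = stats.split(",");
        if (split.length != 2) {
            throw new IllegalArgumentException("invalid string " + stats);
        }
        int count = Integer.parseInt(split[0].split("=")[1]);
        long bytes = Long.parseLong(split[1].split("=")[1]);
        return new MiniStatsTrack(count, bytes);
    }

    @Override
    public String toString() {
        return "count=" + count + ",bytes=" + bytes;
    }

    /** True if this usage is above the given limit; -1 in the limit means "not limited". */
    boolean exceeds(MiniStatsTrack limit) {
        boolean countExceeded = limit.count > -1 && count > limit.count;
        boolean bytesExceeded = limit.bytes > -1 && bytes > limit.bytes;
        return countExceeded || bytesExceeded;
    }

    public static void main(String[] args) {
        MiniStatsTrack limit = new MiniStatsTrack(2, -1L);            // like "setquota -n 2"
        MiniStatsTrack usage = MiniStatsTrack.parse("count=3,bytes=10");
        System.out.println(limit);                                    // prints: count=2,bytes=-1
        if (usage.exceeds(limit)) {
            // soft limit: only report it, the write itself is not rejected
            System.out.println("WARN quota exceeded: " + usage + " limit " + limit);
        }
    }
}
```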
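Thoughts:
When there is time, the way quotas work deserves a closer look.
Open questions:
Why is a quota disallowed when a parent or a child node already has one? What is the reasoning behind this design?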