Zookeeper Source Code Reading (2): Data Storage

This article is a repost. Published 2018-09-11 21:45. Tag: Zookeeper

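Preface

Before getting into the concrete request-handling logic, we need a deeper understanding of how zk manages its data and how its transaction logs are written and stored. There is a lot to cover, so the next few posts will all deal with this topic.

In-Memory Data

zk's data model is a tree of ZNodes. Internally, ZK keeps the entire tree in memory, much like an in-memory database, and periodically writes it to disk.

zk's in-memory data lives in DataTree, which is the core of zk's in-memory storage and is itself a tree structure.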

/**
 * This class maintains the tree data structure. It doesn't have any networking
 * or client connection code in it so that it can be tested in a stand alone
 * way.
 * <p>
 * The tree maintains two parallel data structures: a hashtable that maps from
 * full paths to DataNodes and a tree of DataNodes. All accesses to a path is
 * through the hashtable. The tree is traversed only when serializing to disk.
 */
public class DataTree {
    private static final Logger LOG = LoggerFactory.getLogger(DataTree.class);

    /**
     * This hashtable provides a fast lookup to the datanodes. The tree is the
     * source of truth and is where all the locking occurs
     */
    private final ConcurrentHashMap<String, DataNode> nodes =
        new ConcurrentHashMap<String, DataNode>();

    private final WatchManager dataWatches = new WatchManager();

    private final WatchManager childWatches = new WatchManager();

    /** the root of zookeeper tree */
    private static final String rootZookeeper = "/";

    /** the zookeeper nodes that acts as the management and status node **/
    private static final String procZookeeper = Quotas.procZookeeper;

    /** this will be the string thats stored as a child of root */
    private static final String procChildZookeeper = procZookeeper.substring(1);

    /**
     * the zookeeper quota node that acts as the quota management node for
     * zookeeper
     */
    private static final String quotaZookeeper = Quotas.quotaZookeeper;

    /** this will be the string thats stored as a child of /zookeeper */
    private static final String quotaChildZookeeper = quotaZookeeper
            .substring(procZookeeper.length() + 1);

    /**
     * the path trie that keeps track fo the quota nodes in this datatree
     */
    private final PathTrie pTrie = new PathTrie();

    /**
     * This hashtable lists the paths of the ephemeral nodes of a session.
     */
    private final Map<Long, HashSet<String>> ephemerals =
        new ConcurrentHashMap<Long, HashSet<String>>();

    private final ReferenceCountedACLCache aclCache = new ReferenceCountedACLCache();
    ...

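As you can see, DataTree is mainly associated with four classes: DataNode, Quotas, PathTrie and StatsTrack. The following sections go through them one by one.

DataNode

DataNode is the smallest unit of data storage in zookeeper. In DataTree, all DataNodes are stored in a ConcurrentHashMap, declared as private final ConcurrentHashMap<String, DataNode> nodes = new ConcurrentHashMap<String, DataNode>();. Any operation on a znode in zk is, underneath, an operation on this map, where the path is the key and the DataNode is the value.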
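To make the "map plus tree" idea concrete, here is a minimal sketch. MiniDataTree and its Node class are hypothetical stand-ins written for this post, not ZooKeeper's real DataTree or DataNode, and details such as ACLs, stats and watches are left out. It only shows that creating a znode means adding the child name to the parent and putting the full path into the map.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for DataTree/DataNode (not the real classes).
class MiniDataTree {
    static class Node {
        final Node parent;
        byte[] data;
        // Child names only (last path component), as in DataNode.
        final Set<String> children = new HashSet<>();

        Node(Node parent, byte[] data) {
            this.parent = parent;
            this.data = data;
        }
    }

    // Full path -> node. Every lookup goes through this map.
    private final ConcurrentHashMap<String, Node> nodes = new ConcurrentHashMap<>();

    MiniDataTree() {
        nodes.put("/", new Node(null, new byte[0]));
    }

    // Returns false if the path is invalid, the parent is missing, or the node exists.
    boolean create(String path, byte[] data) {
        if (path == null || !path.startsWith("/") || path.length() < 2) {
            return false;
        }
        int lastSlash = path.lastIndexOf('/');
        String parentPath = lastSlash == 0 ? "/" : path.substring(0, lastSlash);
        String childName = path.substring(lastSlash + 1);
        Node parent = nodes.get(parentPath);
        if (parent == null) {
            return false;
        }
        synchronized (parent) {
            if (!parent.children.add(childName)) {
                return false; // child already exists
            }
            nodes.put(path, new Node(parent, data));
        }
        return true;
    }

    // Returns the node's data, or null if the path does not exist.
    byte[] getData(String path) {
        Node n = nodes.get(path);
        return n == null ? null : n.data;
    }

    public static void main(String[] args) {
        MiniDataTree tree = new MiniDataTree();
        System.out.println(tree.create("/app", "a".getBytes()));
        System.out.println(tree.create("/app/config", "c".getBytes()));
        System.out.println(tree.create("/missing/child", new byte[0]));
        System.out.println(new String(tree.getData("/app/config")));
    }
}
```

Reads never walk the tree; they are a single map lookup by full path, which matches the class comment in DataTree that the tree is traversed only when serializing to disk.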

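Ephemeral nodes get special treatment: a dedicated map, private final Map<Long, HashSet<String>> ephemerals = new ConcurrentHashMap<Long, HashSet<String>>();, records their paths per session, which makes them quick to look up and easy to clean up in one pass when the session ends.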
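A small sketch of how such a per-session index can be used. EphemeralIndex, register and closeSession are hypothetical names chosen for illustration; the real DataTree does this bookkeeping inside its own create and session-cleanup code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper that mirrors the shape of DataTree's ephemerals map.
class EphemeralIndex {
    // Session id -> paths of the ephemeral nodes owned by that session.
    private final Map<Long, Set<String>> ephemerals = new ConcurrentHashMap<>();

    // Remember that this session owns an ephemeral node at the given path.
    void register(long sessionId, String path) {
        Set<String> paths = ephemerals.computeIfAbsent(sessionId, id -> new HashSet<>());
        synchronized (paths) {
            paths.add(path);
        }
    }

    // Drop the session's entry and return the paths that now need deleting.
    Set<String> closeSession(long sessionId) {
        Set<String> paths = ephemerals.remove(sessionId);
        if (paths == null) {
            return Collections.emptySet();
        }
        synchronized (paths) {
            return new HashSet<>(paths);
        }
    }

    public static void main(String[] args) {
        EphemeralIndex index = new EphemeralIndex();
        index.register(1L, "/locks/a");
        index.register(1L, "/locks/b");
        System.out.println(index.closeSession(1L).size());
        System.out.println(index.closeSession(1L).isEmpty());
    }
}
```

Without this index, closing a session would require scanning every node to find the ones it owns.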

The code of the DataNode class:

public class DataNode implements Record {
    /** the parent of this datanode */
    DataNode parent;

    /** the data for this datanode */
    byte data[];

    /**
     * the acl map long for this datanode. the datatree has the map
     */
    Long acl;

    /**
     * the stat for this node that is persisted to disk.
     */
    public StatPersisted stat;

    /**
     * the list of children for this node. note that the list of children string
     * does not contain the parent path -- just the last part of the path. This
     * should be synchronized on except deserializing (for speed up issues).
     */
    private Set<String> children = null;

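As shown, a DataNode stores three kinds of information: the data content data[], the ACL (held as a Long that refers into the ACL map kept by DataTree), and the node status stat. The data content and the status are what a client sees when it fetches a node with getData. A DataNode also records its parent and its set of children, and provides operations on that set.

Adding a child: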

/**
 * Method that inserts a child into the children set
 * 
 * @param child
 *            to be inserted
 * @return true if this set did not already contain the specified element
 */
public synchronized boolean addChild(String child) {
    if (children == null) {
        // let's be conservative on the typical number of children
        children = new HashSet<String>(8); // lazy initialization
    }
    return children.add(child); // add to the set
}

Removing a child:

/**
 * Method that removes a child from the children set
 * 
 * @param child
 * @return true if this set contained the specified element
 */
public synchronized boolean removeChild(String child) {
    if (children == null) {
        return false;
    }
    return children.remove(child); // remove the child from the set
}

get/set:

/**
 * convenience method for setting the children for this datanode
 * 
 * @param children
 */
public synchronized void setChildren(HashSet<String> children) {
    this.children = children;
}

/**
 * convenience methods to get the children
 * 
 * @return the children of this datanode
 */
public synchronized Set<String> getChildren() { // getter and setter are both synchronized to avoid races on the shared field across threads
    if (children == null) {
        return EMPTY_SET;
    }

    return Collections.unmodifiableSet(children);
}

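These are all simple methods; together with the comments they should be easy to follow.

Quotas

Before reading on, I strongly recommend reading the article "zk Permission Management and Quotas". A quota is a limit on the number of nodes and the amount of data under a ZNode (exceeding it only produces a warning in the log; nothing is actually blocked).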
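The "warn but do not block" behaviour can be sketched as follows. SoftQuotaCheck, its check method and the message text are all assumptions made up for this illustration; the sketch also assumes that -1 means "no limit", which matches the -1 arguments passed to createQuota later in this post.

```java
// Hypothetical illustration of a soft quota: the caller logs the returned
// message but never rejects the write.
class SoftQuotaCheck {
    // Returns a warning message if a limit is exceeded, or null otherwise.
    // A limit of -1 is treated as "not set" (an assumption of this sketch).
    static String check(String path, long limitCount, long actualCount,
                        long limitBytes, long actualBytes) {
        if (limitCount > -1 && actualCount > limitCount) {
            return "Quota exceeded: " + path + " count=" + actualCount
                    + " limit=" + limitCount;
        }
        if (limitBytes > -1 && actualBytes > limitBytes) {
            return "Quota exceeded: " + path + " bytes=" + actualBytes
                    + " limit=" + limitBytes;
        }
        return null;
    }

    public static void main(String[] args) {
        String warning = check("/app", 2, 3, -1, 100);
        if (warning != null) {
            // Only a log line; the operation itself still succeeds.
            System.err.println(warning);
        }
    }
}
```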

public class Quotas {

    /** the zookeeper nodes that acts as the management and status node **/
    public static final String procZookeeper = "/zookeeper";

    /** the zookeeper quota node that acts as the quota
     * management node for zookeeper */
    public static final String quotaZookeeper = "/zookeeper/quota";

    /**
     * the limit node that has the limit of
     * a subtree
     */
    public static final String limitNode = "zookeeper_limits";

    /**
     * the stat node that monitors the limit of
     * a subtree.
     */
    public static final String statNode = "zookeeper_stats";

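The difference between the limit node and the stat node: one holds the limits given when the quota was set, the other holds the actual usage. More on this in the PathTrie section below. One point to note here: every node on which a quota was successfully set gets a matching subtree under /zookeeper/quota, and each such node has two children there, path + "/zookeeper_limits" and path + "/zookeeper_stats", corresponding to limitNode and statNode above. Note that "successfully" carries a condition: if a parent or a child of the node already has a quota, setting the quota fails.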

public static String quotaPath(String path) {
    return quotaZookeeper + path +
    "/" + limitNode;//limitnode
}

public static String statPath(String path) {
    return quotaZookeeper + path + "/" +
    statNode;//statnode
}

The two methods above build the paths of the limit node and the stat node.
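As a worked example, the standalone sketch below repeats the same string concatenation with the constants from the Quotas class. QuotaPaths is a hypothetical class name, and "/app/config" is just a sample path.

```java
// Standalone copy of the path-building logic, using the constants shown in Quotas.
class QuotaPaths {
    static final String QUOTA_ZOOKEEPER = "/zookeeper/quota";
    static final String LIMIT_NODE = "zookeeper_limits";
    static final String STAT_NODE = "zookeeper_stats";

    // Path of the node that stores the configured limits for the given znode.
    static String quotaPath(String path) {
        return QUOTA_ZOOKEEPER + path + "/" + LIMIT_NODE;
    }

    // Path of the node that stores the actual usage for the given znode.
    static String statPath(String path) {
        return QUOTA_ZOOKEEPER + path + "/" + STAT_NODE;
    }

    public static void main(String[] args) {
        // prints /zookeeper/quota/app/config/zookeeper_limits
        System.out.println(quotaPath("/app/config"));
        // prints /zookeeper/quota/app/config/zookeeper_stats
        System.out.println(statPath("/app/config"));
    }
}
```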

PathTrie

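For an introduction to tries, see the article "A Brief Look at Tries". My own simple understanding: if words share a common prefix (starting from the first letter), that part is stored once and shared, and new nodes are created only for the remainder.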

public class PathTrie {
    /**
     * the logger for this class
     */
    private static final Logger LOG = LoggerFactory.getLogger(PathTrie.class);
    
    /**
     * the root node of PathTrie
     */
    private final TrieNode rootNode ;
    
    static class TrieNode {
        boolean property = false; // whether this node has a quota
        final HashMap<String, TrieNode> children;
        TrieNode parent = null;

The structure is simple, a typical tree, with the static nested class TrieNode as its node.
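A minimal sketch of such a path trie, assuming each path component is one trie level and a flag marks nodes that carry a quota. MiniPathTrie is a hypothetical class written for this post; its findMaxPrefix returns "/" when no marked prefix exists, which is a choice of this sketch and may differ from the real PathTrie.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified path trie (not ZooKeeper's PathTrie).
class MiniPathTrie {
    static class Node {
        boolean hasQuota = false; // plays the role of TrieNode.property
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    // Insert the path component by component, sharing existing prefixes.
    void addPath(String path) {
        String[] parts = path.split("/");
        if (parts.length <= 1) {
            throw new IllegalArgumentException("Invalid path " + path);
        }
        Node cur = root;
        for (int i = 1; i < parts.length; i++) {
            cur = cur.children.computeIfAbsent(parts[i], k -> new Node());
        }
        cur.hasQuota = true;
    }

    // Longest prefix of the path that is marked as having a quota, or "/" if none.
    String findMaxPrefix(String path) {
        String[] parts = path.split("/");
        Node cur = root;
        StringBuilder walked = new StringBuilder();
        String best = "/";
        for (int i = 1; i < parts.length; i++) {
            cur = cur.children.get(parts[i]);
            if (cur == null) {
                break; // the trie has no deeper match
            }
            walked.append('/').append(parts[i]);
            if (cur.hasQuota) {
                best = walked.toString();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MiniPathTrie trie = new MiniPathTrie();
        trie.addPath("/a/b");
        System.out.println(trie.findMaxPrefix("/a/b/c"));
        System.out.println(trie.findMaxPrefix("/x/y"));
    }
}
```

The prefix lookup is what makes a trie a natural fit for quotas: given the path of a node that changed, it finds the quota node (if any) whose counters need updating.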

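As mentioned earlier, setting a quota fails if a parent or a child of the node already has one. It is tempting to assume that PathTrie enforces this, but as the code that follows shows, the check lives elsewhere. The blog posts I had read did not mention this, so it is worth noting.

The three screenshots in the original post showed that once a quota has been set on a node, setting a quota on its parent or on its child fails.

The reason: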

public void addPath(String path) {
    if (path == null) {
        return;
    }
    String[] pathComponents = path.split("/"); // split the path on "/"
    TrieNode parent = rootNode;
    String part = null;
    if (pathComponents.length <= 1) {
        throw new IllegalArgumentException("Invalid path " + path);
    }
    for (int i=1; i<pathComponents.length; i++) { // descend one level at a time
        part = pathComponents[i];
        if (parent.getChild(part) == null) {
            parent.addChild(part, new TrieNode(parent)); // no such child yet, so insert it here
        }
        parent = parent.getChild(part);
    }
    parent.setProperty(true);
}

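This shows that addPath simply inserts according to the usual trie rules, with no parent or child check. The restriction is applied where the zk command-line client handles commands, in the processCMD method of ZookeeperMain: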

if (cmd.equals("setquota") && args.length >= 4) {
    String option = args[1];
    String val = args[2];
    path = args[3];
    System.err.println("Comment: the parts are " +
                       "option " + option +
                       " val " + val +
                       " path " + path);
    if ("-b".equals(option)) {
        // we are setting the bytes quota
        createQuota(zk, path, Long.parseLong(val), -1); // this is what actually creates the quota nodes once setquota is issued
    } else if ("-n".equals(option)) {
        // we are setting the num quota
        createQuota(zk, path, -1L, Integer.parseInt(val));
    } else {
        usage();
    }

}

So setquota calls the createQuota method, which contains:

// check for more than 2 children --
// if zookeeper_stats and zookeeper_qutoas
// are not the children then this path
// is an ancestor of some path that
// already has quota
String realPath = Quotas.quotaZookeeper + path;
// check whether any child node already has a quota
try {
    List<String> children = zk.getChildren(realPath, false);
    for (String child: children) {
        if (!child.startsWith("zookeeper_")) {
            throw new IllegalArgumentException(path + " has child " +
                    child + " which has a quota");
        }
    }
} catch(KeeperException.NoNodeException ne) {
    // this is fine
}

//check for any parent that has been quota
// check whether any parent already has a quota; the logic (see the method body) is similar to the child check
checkIfParentQuota(zk, path);

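This check is what causes the behaviour described above, where a quota cannot be added when a parent or child node already has one. Note that the check is driven by client-side code, but, correcting an earlier version of this post, it is not completed by the client alone: List children = zk.getChildren(realPath, false); is itself a call to the server. If neither a parent nor a child has a quota, the client then sends requests to the server to create the nodes (the code is in ZookeeperMain), as shown below: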

if (zk.exists(quotaPath, false) == null) {
    try {
        // create() sends the request to the server
        zk.create(Quotas.procZookeeper, null, Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
        zk.create(Quotas.quotaZookeeper, null, Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
    } catch(KeeperException.NodeExistsException ne) {
        // do nothing
    }
}
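The parent/child rule from createQuota can be illustrated without a running server. In the sketch below, QuotaConflictCheck and findConflict are hypothetical, and a plain in-memory set of paths stands in for the subtree under /zookeeper/quota; the real code asks the server through zk.getChildren and checkIfParentQuota instead.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, in-memory illustration of the "no quota on ancestors or
// descendants" rule; the real check talks to the ZooKeeper server.
class QuotaConflictCheck {
    // Returns the existing quota path that conflicts with the new one, or null.
    static String findConflict(Set<String> quotaPaths, String path) {
        for (String existing : quotaPaths) {
            if (existing.equals(path)) {
                continue; // same node: not a parent/child conflict
            }
            if (path.startsWith(existing + "/")) {
                return existing; // an ancestor already has a quota
            }
            if (existing.startsWith(path + "/")) {
                return existing; // a descendant already has a quota
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> quotaPaths = new TreeSet<>();
        quotaPaths.add("/app/config");
        System.out.println(findConflict(quotaPaths, "/app"));
        System.out.println(findConflict(quotaPaths, "/app/config/db"));
        System.out.println(findConflict(quotaPaths, "/app/other"));
    }
}
```

Appending "/" before comparing prefixes keeps unrelated paths such as /application from being mistaken for descendants of /app.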

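If you want to know how paths are added to and removed from the trie, see the article "Zk Data Model: Quotas".

StatsTrack

StatsTrack records the actual count and bytes figures for a node.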

/**
 * a class that represents the stats associated with quotas
 */
public class StatsTrack {
    private int count;
    private long bytes;
    private String countStr = "count";
    private String byteStr = "bytes";

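StatsTrack is just a value class; it holds the data kept in a stat node (and, for the configured limits, in a limit node). The code below is what creates these nodes: each StatsTrack is converted to a string and stored as the content of its node.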

StatsTrack strack = new StatsTrack(null);
strack.setBytes(bytes);
strack.setCount(numNodes);
try {
    zk.create(quotaPath, strack.toString().getBytes(),
            Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    StatsTrack stats = new StatsTrack(null);
    stats.setBytes(0L);
    stats.setCount(0);
    zk.create(statPath, stats.toString().getBytes(),
            Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
}
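The string round trip can be sketched as below. MiniStatsTrack is a hypothetical class, and the "count=N,bytes=M" format is an assumption inferred from the countStr and byteStr fields shown above; check the real StatsTrack before relying on it.

```java
// Hypothetical value class; the text format is assumed from the
// countStr/byteStr field names, not copied from ZooKeeper.
class MiniStatsTrack {
    private int count = -1;
    private long bytes = -1L;

    void setCount(int count) { this.count = count; }
    void setBytes(long bytes) { this.bytes = bytes; }
    int getCount() { return count; }
    long getBytes() { return bytes; }

    // Parse a string of the assumed form "count=N,bytes=M".
    static MiniStatsTrack parse(String text) {
        String[] parts = text.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("invalid string " + text);
        }
        MiniStatsTrack track = new MiniStatsTrack();
        track.count = Integer.parseInt(parts[0].split("=")[1]);
        track.bytes = Long.parseLong(parts[1].split("=")[1]);
        return track;
    }

    @Override
    public String toString() {
        return "count=" + count + ",bytes=" + bytes;
    }

    public static void main(String[] args) {
        MiniStatsTrack limits = new MiniStatsTrack();
        limits.setCount(5);
        limits.setBytes(1024L);
        // This string is what would be stored as the node's data.
        String stored = limits.toString();
        MiniStatsTrack readBack = MiniStatsTrack.parse(stored);
        System.out.println(stored + " -> " + readBack.getCount() + ", " + readBack.getBytes());
    }
}
```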

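Thoughts:

When I have time, I want to study how quotas work in more detail.

Points I am still unclear about:

Why is setting a quota disallowed when a parent or child node already has one? What is the reason for this design?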
