Tries

Reposted article · 2018-08-01 10:31 · algorithms-princeton-coursera

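Tries (word search trees)

As with the sorting algorithms for string keys, symbol tables with string keys also admit more efficient implementations, ones that avoid examining the entire key. First, here is the API we want to implement:

(Figure: api)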
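The API figure is not reproduced here. As a rough stand-in, this is a minimal sketch of what such a string symbol table API might look like, based only on the operations used later in this post (put, get, contains, delete). The names `StringST` and `TreeMapStringST` are hypothetical, and the `TreeMap`-backed class exists only to show the contract, not how a trie works.

```java
import java.util.TreeMap;

// Hypothetical interface; method names mirror the ones implemented later in the post.
interface StringST<Value> {
    void put(String key, Value val);   // insert or overwrite a key-value pair
    Value get(String key);             // value paired with key, or null if absent
    void delete(String key);           // remove key and its value
    boolean contains(String key);      // is there a value paired with key?
}

// Reference implementation backed by java.util.TreeMap, only to illustrate the contract.
class TreeMapStringST<Value> implements StringST<Value> {
    private final TreeMap<String, Value> map = new TreeMap<>();

    public void put(String key, Value val) { map.put(key, val); }
    public Value get(String key)           { return map.get(key); }
    public void delete(String key)         { map.remove(key); }
    public boolean contains(String key)    { return get(key) != null; }
}
```

The trie implementations below aim at the same contract while inspecting keys one character at a time.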

R-way Tries

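Tries (from "retrieval", but pronounced "try" to distinguish them from "tree").

Nodes store characters, not keys.
Each node has R children (R is the alphabet size, e.g. 256 for extended ASCII), one for each possible next character.
For convenience, null links are not drawn.

An example:

(Figure: tries)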

Tries: search

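A search naturally has only two outcomes, a hit or a miss:

Search hit: the node reached by the last character of the key has a non-null value.
Search miss: the search reaches a null link, or the node reached by the last character has no value.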
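To make the two kinds of miss concrete, here is a small sketch that classifies a search as a hit, a miss on a null link, or a miss on a node without a value. The class name `TrieSearchDemo` and the `Result` enum are illustrative inventions, and keys are assumed to contain only extended-ASCII characters.

```java
// Minimal R-way trie used only to illustrate the search outcomes.
class TrieSearchDemo {
    static final int R = 256;                 // extended ASCII, as in the text

    static class Node {
        Integer val;                          // null means "no key ends here"
        Node[] next = new Node[R];
    }

    enum Result { HIT, MISS_NULL_LINK, MISS_NO_VALUE }

    Node root = new Node();

    void put(String key, int val) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.val = val;
    }

    Result search(String key) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            x = x.next[key.charAt(d)];                    // follow the link for this character
            if (x == null) return Result.MISS_NULL_LINK;  // ran off the trie
        }
        // All characters matched; the key is present only if a value is stored here.
        return x.val != null ? Result.HIT : Result.MISS_NO_VALUE;
    }

    public static void main(String[] args) {
        TrieSearchDemo t = new TrieSearchDemo();
        t.put("shells", 3);
        t.put("she", 0);
        System.out.println(t.search("she"));    // HIT
        System.out.println(t.search("shell"));  // MISS_NO_VALUE
        System.out.println(t.search("shore"));  // MISS_NULL_LINK
    }
}
```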

Tries: insertion

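To insert a new string into a trie, follow the link matching each of its characters in turn:

On reaching a null link, create a new node.
On reaching the node for the last character of the string, set its value.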
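The two steps above can be sketched iteratively. The `created` counter is an illustrative addition (not part of any API) that shows how keys sharing a prefix reuse existing nodes; keys are assumed to be extended ASCII.

```java
// Iterative insertion into an R-way trie, counting how many nodes each insert creates.
class TrieInsertDemo {
    static final int R = 256;                  // extended ASCII, as in the text

    static class Node {
        Object val;
        Node[] next = new Node[R];
    }

    Node root = new Node();
    int created = 0;                           // nodes created by put (root not counted)

    void put(String key, Object val) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (x.next[c] == null) {           // null link: create a new node
                x.next[c] = new Node();
                created++;
            }
            x = x.next[c];                     // follow the link for this character
        }
        x.val = val;                           // last character: set the value
    }

    public static void main(String[] args) {
        TrieInsertDemo t = new TrieInsertDemo();
        t.put("shells", 3);
        System.out.println(t.created);         // 6: one node per character
        t.put("she", 0);
        System.out.println(t.created);         // 6: "she" lies entirely on an existing path
        t.put("shore", 7);
        System.out.println(t.created);         // 9: only "ore" needs new nodes
    }
}
```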

Tries: Java implementation

public class TriesST<Value> {
    private static final int R = 256;    // extended ASCII
    private Node root = new Node();

    private static class Node {
        private Object val;
        private Node[] next = new Node[R];
    }

    public void put(String key, Value val) {
        root = put(root, key, val, 0);
    }

    private Node put(Node x, String key, Value val, int d) {
        if (x == null) x = new Node();
        if (d == key.length()) {
            x.val = val;
            return x;
        }
        char c = key.charAt(d);
        // use the next character itself as the array index
        x.next[c] = put(x.next[c], key, val, d + 1);
        return x;
    }

    public boolean contains(String key) {
        return get(key) != null;
    }

    public Value get(String key) {
        Node x = get(root, key, 0);
        if (x == null) return null;
        return (Value) x.val;
    }

    private Node get(Node x, String key, int d) {
        if (x == null) return null;
        if (d == key.length()) return x;
        char c = key.charAt(d);
        return get(x.next[c], key, d + 1);
    }
}

Tries: performance

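Search hit: every character of the key has to be examined.
Search miss: the first character alone may be enough to rule the key out; in typical cases only a few characters are examined (sublinear).
Space: every leaf has R null links. Sublinear space is still possible if many short strings share common prefixes.

So overall, a hit is found quickly and a miss is detected even faster, but space is wasted.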
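To get a feel for the wasted space, here is a small sketch that counts the nodes and the null links of an R-way trie. The class and method names are made up for this illustration, and keys are assumed to be extended ASCII.

```java
// Counts nodes and null links in an R-way trie to illustrate the space cost.
class TrieSpaceDemo {
    static final int R = 256;                  // extended ASCII, as in the text

    static class Node {
        Object val;
        Node[] next = new Node[R];
    }

    Node root = new Node();

    void put(String key, Object val) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.val = val;
    }

    // Number of nodes in the subtrie rooted at x.
    static int countNodes(Node x) {
        if (x == null) return 0;
        int n = 1;
        for (Node child : x.next) n += countNodes(child);
        return n;
    }

    // Number of null links in the subtrie rooted at x.
    static long countNullLinks(Node x) {
        if (x == null) return 0;
        long n = 0;
        for (Node child : x.next) {
            if (child == null) n++;
            else n += countNullLinks(child);
        }
        return n;
    }
}
```

Every node except the root is the target of exactly one link, so a trie with N nodes has N × R - (N - 1) null links. With the keys "she", "sells" and "sea" that is 9 nodes and 2296 null links.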

Tries: deletion

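To delete a string from a trie, first find it and set the value of its last node to null; then, on the way back up, recursively remove every node that has a null value and no non-null links. Example:

(Figure: deletion)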
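The post gives no code for deletion, so here is a minimal sketch of the procedure just described, written in the same recursive style as the put method above. The class name `TrieDeleteDemo` and the `isEmpty` helper are illustrative; values are fixed to `Integer` to keep the sketch short, and keys are assumed to be extended ASCII.

```java
// R-way trie with deletion: clear the value, then prune nodes that have become useless.
class TrieDeleteDemo {
    static final int R = 256;                  // extended ASCII, as in the text

    static class Node {
        Integer val;
        Node[] next = new Node[R];
    }

    private Node root;                         // null while the trie is empty

    void put(String key, int val) { root = put(root, key, val, 0); }

    private Node put(Node x, String key, int val, int d) {
        if (x == null) x = new Node();
        if (d == key.length()) { x.val = val; return x; }
        char c = key.charAt(d);
        x.next[c] = put(x.next[c], key, val, d + 1);
        return x;
    }

    Integer get(String key) {
        Node x = root;
        for (int d = 0; x != null && d < key.length(); d++)
            x = x.next[key.charAt(d)];
        return x == null ? null : x.val;
    }

    void delete(String key) { root = delete(root, key, 0); }

    private Node delete(Node x, String key, int d) {
        if (x == null) return null;                   // key is not in the trie
        if (d == key.length()) {
            x.val = null;                             // found the last node: clear its value
        } else {
            char c = key.charAt(d);
            x.next[c] = delete(x.next[c], key, d + 1);
        }
        if (x.val != null) return x;                  // node still ends some key
        for (Node child : x.next)
            if (child != null) return x;              // node is still on the path to some key
        return null;                                  // null value and no links: remove the node
    }

    boolean isEmpty() { return root == null; }
}
```

Returning null from the recursive call is what unlinks a node from its parent, so the pruning needs no separate pass.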

So overall, an R-way trie is worth considering when R is small. When R is large it needs far too much space: with 16-bit Unicode it would be a 65536-way trie.
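A back-of-the-envelope calculation shows why. The sketch below assumes 8 bytes per reference and ignores object and array headers; both are assumptions about the JVM, not figures from the original post.

```java
// Rough size of the link array in a single trie node.
class TrieNodeCost {
    // Assumption: each link is one reference of bytesPerReference bytes; headers are ignored.
    static long linkBytesPerNode(int r, int bytesPerReference) {
        return (long) r * bytesPerReference;
    }

    public static void main(String[] args) {
        System.out.println(linkBytesPerNode(256, 8));    // 2048 bytes, about 2 KB per node
        System.out.println(linkBytesPerNode(65536, 8));  // 524288 bytes, 512 KB per node
    }
}
```

Under these assumptions a single 65536-way node costs as much as 256 extended-ASCII nodes, and most of its links would be null.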

Ternary Search Tries

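Each node of an R-way trie has too many null links and takes up a lot of space, so let's look at ternary search tries. Each TST node has only three links: left, middle and right. Example:

(Figure: tst)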

TST: search

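During a search, if the current key character matches the character stored in the node, follow the middle link and move on to the next key character; if it is smaller, go left; if it is larger, go right. Reaching a null link means the string is not in the TST, and so does ending at the node for the last character when that node has no value.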

Search Hit

(Figure: search-hit)

Search Miss

(Figure: search-miss)
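As a complement to the figures, here is a sketch of the TST search written as a loop, which makes the three-way decision at each node explicit. The class name `TstSearchDemo` is illustrative, values are fixed to `Integer` for brevity, and the empty key is simply not supported in this sketch.

```java
// Ternary search trie with an iterative search.
class TstSearchDemo {
    static class Node {
        char c;                                // character stored in this node
        Integer val;                           // non-null only where a key ends
        Node left, mid, right;
    }

    private Node root;

    void put(String key, int val) { root = put(root, key, val, 0); }

    private Node put(Node x, String key, int val, int d) {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if (c < x.c) x.left = put(x.left, key, val, d);
        else if (c > x.c) x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d + 1);
        else x.val = val;
        return x;
    }

    Integer get(String key) {
        if (key.isEmpty()) return null;        // this sketch does not store the empty key
        Node x = root;
        int d = 0;                             // index of the key character being matched
        while (x != null) {
            char c = key.charAt(d);
            if (c < x.c) x = x.left;                             // smaller: go left, same character
            else if (c > x.c) x = x.right;                       // larger: go right, same character
            else if (d < key.length() - 1) { x = x.mid; d++; }   // match: go down, next character
            else return x.val;                                   // last character matched
        }
        return null;                           // fell off the trie on a null link
    }
}
```

Note that only a match consumes a character of the key; left and right moves compare the same character against a different node.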

TST: construction

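Building a TST (inserting a new string) works like a search, except that a new node is created whenever a null link is reached, and the node reached by the last character of the string is assigned the value.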

TST: Java implementation

public class TST<Value> {
    private Node root;

    private class Node {
        private Value val;
        private char c;
        private Node left, mid, right;
    }

    public void put(String key, Value val) {
        root = put(root, key, val, 0);
    }

    private Node put(Node x, String key, Value val, int d) {
        char c = key.charAt(d);
        if (x == null) {
            x = new Node();
            x.c = c;
        }
        if (c < x.c) x.left = put(x.left, key, val, d);
        else if (c > x.c) x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d + 1);
        else x.val = val;
        return x;
    }

    public boolean contains(String key) {
        return get(key) != null;
    }

    public Value get(String key) {
        Node x = get(root, key, 0);
        if (x == null) return null;
        return x.val;
    }

    private Node get(Node x, String key, int d) {
        if (x == null) return null;
        char c = key.charAt(d);
        if (c < x.c) return get(x.left, key, d);
        else if (c > x.c) return get(x.right, key, d);
        else if (d < key.length() - 1) return get(x.mid, key, d + 1);
        else return x;
    }
}

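A TST's costs are comparable to those of a red-black BST, and its search speed is about the same as that of a hash-based symbol table. It can also be kept balanced with rotations to guarantee worst-case performance. And it can be combined with R-way tries:

(Figure: tst-r2)

This costs a little more space, but in practice it makes the algorithm somewhat faster, because the first two characters quickly decide whether there can be a match at all; the search may end right there as a search miss.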
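One way such a hybrid might be put together is sketched below: an R×R table indexed by the first two characters, each entry holding a TST for the rest of the key. How one- and two-character keys are handled (two extra value arrays) is my own assumption, as are all names; keys are assumed to be non-empty extended ASCII.

```java
// Hybrid sketch: R^2-way branching on the first two characters, then one TST per prefix.
class HybridTstDemo {
    static final int R = 256;                  // assumes extended-ASCII keys

    static class Node { char c; Integer val; Node left, mid, right; }

    private final Integer[] oneCharVals = new Integer[R];      // values of 1-character keys
    private final Integer[] twoCharVals = new Integer[R * R];  // values of 2-character keys
    private final Node[] roots = new Node[R * R];              // TST for each 2-character prefix

    private static int index(String key) {
        return key.charAt(0) * R + key.charAt(1);
    }

    void put(String key, int val) {
        if (key.isEmpty()) throw new IllegalArgumentException("empty key not supported");
        if (key.length() == 1) { oneCharVals[key.charAt(0)] = val; return; }
        int i = index(key);
        if (key.length() == 2) { twoCharVals[i] = val; return; }
        roots[i] = put(roots[i], key, val, 2);   // the TST starts at the third character
    }

    Integer get(String key) {
        if (key.isEmpty()) return null;
        if (key.length() == 1) return oneCharVals[key.charAt(0)];
        int i = index(key);
        if (key.length() == 2) return twoCharVals[i];
        Node x = get(roots[i], key, 2);          // a null root means an immediate miss
        return x == null ? null : x.val;
    }

    private Node put(Node x, String key, int val, int d) {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if (c < x.c) x.left = put(x.left, key, val, d);
        else if (c > x.c) x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d + 1);
        else x.val = val;
        return x;
    }

    private Node get(Node x, String key, int d) {
        if (x == null) return null;
        char c = key.charAt(d);
        if (c < x.c) return get(x.left, key, d);
        else if (c > x.c) return get(x.right, key, d);
        else if (d < key.length() - 1) return get(x.mid, key, d + 1);
        else return x;
    }
}
```

The two array lookups replace the first two levels of TST comparisons, which is where the speedup described above would come from.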

Character-based Operations

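The section closes with a few character-based operations.

Keys

(Figure: keys)

This returns all the strings (keys) stored in the trie, using an inorder-style traversal: each node's own key is collected first, then its links are followed in character order, so the keys come out sorted.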

public Iterable<String> keys() {
    // Queue is presumably the course library's queue class (it has enqueue()), not java.util.Queue
    Queue<String> queue = new Queue<String>();
    collect(root, "", queue);
    return queue;
}

// prefix: sequence of characters on path from root to x
private void collect(Node x, String prefix, Queue<String> q) {
    if (x == null) return;
    if (x.val != null) q.enqueue(prefix);
    for (char c = 0; c < R; c++)
        collect(x.next[c], prefix + c, q);
}
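For trying this out without the course library, here is a self-contained sketch of the same traversal that collects into a `java.util.List` instead of a queue. That substitution, the class name `TrieKeysDemo`, and the fixed `Integer` values are my own choices; keys are assumed to be extended ASCII.

```java
import java.util.ArrayList;
import java.util.List;

// R-way trie whose keys() returns all stored keys in sorted order.
class TrieKeysDemo {
    static final int R = 256;                  // extended ASCII, as in the text

    static class Node {
        Integer val;
        Node[] next = new Node[R];
    }

    private final Node root = new Node();

    void put(String key, int val) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.val = val;
    }

    List<String> keys() {
        List<String> out = new ArrayList<>();
        collect(root, "", out);
        return out;
    }

    // prefix: sequence of characters on the path from the root to x
    private void collect(Node x, String prefix, List<String> out) {
        if (x == null) return;
        if (x.val != null) out.add(prefix);    // a key ends at this node
        for (char c = 0; c < R; c++)           // visit children in character order
            collect(x.next[c], prefix + c, out);
    }

    public static void main(String[] args) {
        TrieKeysDemo t = new TrieKeysDemo();
        String[] words = { "she", "sells", "sea", "shells", "by", "the", "shore" };
        for (int i = 0; i < words.length; i++) t.put(words[i], i);
        System.out.println(t.keys());  // [by, sea, sells, she, shells, shore, the]
    }
}
```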

Prefix

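(Figure: prefix)

This finds all stored strings that start with the given string, like the suggestion box that appears when you type into a browser's search field.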

public Iterable<String> keyWithPrefix(String prefix) {
    Queue<String> queue = new Queue<String>();
    // x: root of subtrie for all strings
    // beginning with given prefix
    Node x = get(root, prefix, 0);
    collect(x, prefix, queue);
    return queue;
}
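A self-contained sketch of the prefix query follows: walk down to the node for the prefix, then collect everything below it. As before, `java.util.List` stands in for the queue, and the class name `TriePrefixDemo` and `Integer` values are illustrative; keys are assumed to be extended ASCII.

```java
import java.util.ArrayList;
import java.util.List;

// R-way trie with a keys-with-prefix query.
class TriePrefixDemo {
    static final int R = 256;                  // extended ASCII, as in the text

    static class Node {
        Integer val;
        Node[] next = new Node[R];
    }

    private final Node root = new Node();

    void put(String key, int val) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.val = val;
    }

    // Node reached by following the characters of s, or null if the path does not exist.
    private Node find(String s) {
        Node x = root;
        for (int d = 0; x != null && d < s.length(); d++)
            x = x.next[s.charAt(d)];
        return x;
    }

    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        collect(find(prefix), prefix, out);    // root of the subtrie for the prefix
        return out;
    }

    private void collect(Node x, String prefix, List<String> out) {
        if (x == null) return;
        if (x.val != null) out.add(prefix);
        for (char c = 0; c < R; c++)
            collect(x.next[c], prefix + c, out);
    }

    public static void main(String[] args) {
        TriePrefixDemo t = new TriePrefixDemo();
        String[] words = { "she", "sells", "sea", "shells", "by", "the", "shore" };
        for (int i = 0; i < words.length; i++) t.put(words[i], i);
        System.out.println(t.keysWithPrefix("sh"));  // [she, shells, shore]
    }
}
```

The cost is the length of the prefix plus the size of the subtrie below it, so keys that do not share the prefix are never visited.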

Longest Prefix

An example:

(Figure: longest-prefix-sample)

And one more figure:

(Figure: longest-prefix)

This finds the longest string in the trie that is a prefix of the query string.

public String longestPrefixOf(String query) {
    int length = search(root, query, 0, 0);
    return query.substring(0, length);
}

private int search(Node x, String query, int d, int length) {
    if (x == null) return length;
    if (x.val != null) length = d;
    if (d == query.length()) return length;
    char c = query.charAt(d);
    return search(x.next[c], query, d + 1, length);
}
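Here is a self-contained sketch of the longest-prefix search with a few sample queries. The class name `TrieLongestPrefixDemo` and the `Integer` values are illustrative, and keys and queries are assumed to be extended ASCII.

```java
// R-way trie with a longest-prefix query.
class TrieLongestPrefixDemo {
    static final int R = 256;                  // extended ASCII, as in the text

    static class Node {
        Integer val;
        Node[] next = new Node[R];
    }

    private final Node root = new Node();

    void put(String key, int val) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.val = val;
    }

    String longestPrefixOf(String query) {
        int length = search(root, query, 0, 0);
        return query.substring(0, length);
    }

    // length: length of the longest key seen so far on the path for query
    private int search(Node x, String query, int d, int length) {
        if (x == null) return length;               // ran off the trie
        if (x.val != null) length = d;              // a key of length d ends here
        if (d == query.length()) return length;     // the whole query has been consumed
        char c = query.charAt(d);
        return search(x.next[c], query, d + 1, length);
    }

    public static void main(String[] args) {
        TrieLongestPrefixDemo t = new TrieLongestPrefixDemo();
        String[] words = { "she", "sells", "sea", "shells", "by", "the", "shore" };
        for (int i = 0; i < words.length; i++) t.put(words[i], i);
        System.out.println(t.longestPrefixOf("shellsort"));   // shells
        System.out.println(t.longestPrefixOf("shelters"));    // she
        System.out.println(t.longestPrefixOf("quicksort").isEmpty());  // true: no key is a prefix
    }
}
```

The search keeps going after the first key it passes, because a longer key may still match further down the same path.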

Finally, the lecture briefly mentions Patricia tries and suffix trees; here are the figures to give a feel for them.

Patricia Trie

(Figure: patricia-trie)

Suffix Tree

(Figure: suffix-tree)
