Trie Trees and a Java Implementation


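The name "trie" comes from the English word "retrieval". A trie is a tree of characters (a prefix tree), and its core idea is to trade space for time.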

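Let's take a simple example.
   You are given 100000 words, each at most 10 characters long. For every word we have to decide whether it has appeared before and, if it has, at which position it first appeared.
This problem can of course be solved with a hash table, but what I want to introduce here is the trie, which in some respects is more useful. For example, suppose that for some word I want to ask whether its prefix has appeared before. That is awkward to do with a hash table, yet still very simple with a trie.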
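
For comparison, here is a minimal sketch of the hash-table approach mentioned above. The class and method names (FirstSeenByHash, firstPositions, countKeysWithPrefix) are made up for this illustration, and reporting 0 for "not seen before" is my own convention. The second method shows why prefix questions are awkward here: a plain HashMap has no prefix lookup, so the sketch simply scans every key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, written only for this illustration.
class FirstSeenByHash {

    /**
     * For each word, reports the 1-based position of its earlier first
     * occurrence, or 0 when the word has not been seen before.
     */
    static int[] firstPositions(String[] words) {
        Map<String, Integer> firstSeen = new HashMap<>();
        int[] result = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            // putIfAbsent returns null for a new key, otherwise the stored position
            Integer earlier = firstSeen.putIfAbsent(words[i], i + 1);
            result[i] = (earlier == null) ? 0 : earlier;
        }
        return result;
    }

    /** Counts keys starting with the prefix by scanning all keys of the map. */
    static int countKeysWithPrefix(Map<String, Integer> firstSeen, String prefix) {
        int count = 0;
        for (String key : firstSeen.keySet()) {
            if (key.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] words = {"abcd", "abc", "abcd", "abc", "x"};
        System.out.println(Arrays.toString(firstPositions(words))); // [0, 0, 1, 2, 0]
    }
}
```

Exact-match lookups are cheap this way, but each prefix query costs time proportional to the number of distinct words stored.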

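   Now back to the example. With the most naive method, for each word we would search all the words before it to see whether it is among them. That algorithm needs O(n^2) word comparisons, which is clearly unacceptable for 100000 words. So let's think about it differently. Suppose the word I want to look up is abcd. Among the earlier words, those starting with b, c, d, f and so on obviously need not be considered; it is enough to check whether abcd exists among the words starting with a. In the same way, among the words starting with a, we only need to consider those whose second letter is b, and so on. A tree model gradually takes shape.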
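
The letter-by-letter narrowing described above can be sketched as a small trie that remembers, at the node where a word ends, the position at which that word was first added. This is only an illustration under my own assumptions: the names FirstSeenTrie, add and hasPrefix are invented here, words are assumed to contain only the lower-case letters a-z, and 0 is used to mean "not seen before".

```java
// Hypothetical class for illustration; assumes words use only 'a'..'z'.
class FirstSeenTrie {

    private static final class Node {
        final Node[] next = new Node[26]; // one slot per letter, null = no child
        int firstPos = 0;                 // 0 = no word ends at this node yet
    }

    private final Node root = new Node();
    private int added = 0; // how many words have been added so far

    /**
     * Adds a word. Returns 0 if the word is new, otherwise the 1-based
     * position at which it was first added.
     */
    int add(String word) {
        added++;
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a';
            if (node.next[idx] == null) {
                node.next[idx] = new Node(); // create the branch on demand
            }
            node = node.next[idx];
        }
        if (node.firstPos == 0) {
            node.firstPos = added;
            return 0;
        }
        return node.firstPos;
    }

    /** True if at least one added word starts with the given prefix. */
    boolean hasPrefix(String prefix) {
        Node node = root;
        for (int i = 0; i < prefix.length(); i++) {
            node = node.next[prefix.charAt(i) - 'a'];
            if (node == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FirstSeenTrie trie = new FirstSeenTrie();
        System.out.println(trie.add("abcd"));      // 0, new word
        System.out.println(trie.add("abce"));      // 0, new word
        System.out.println(trie.add("abcd"));      // 1, first added at position 1
        System.out.println(trie.hasPrefix("abc")); // true
    }
}
```

Each call touches at most one node per letter, so the cost of a query depends on the word length rather than on how many words have been stored.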

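   As we can see, the number of nodes on level i of a trie can be on the order of 26^i. So, to save space, we create nodes dynamically as a linked structure, or simulate dynamic allocation with arrays. That way the number of nodes never exceeds (number of words) × (word length), plus the root. (Reposted from an expert.)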
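
The "simulate dynamic allocation with arrays" option could look roughly like the sketch below. It is an illustration under my own assumptions, not the code of the quoted author: the class name ArrayTrie and its methods are invented, words are assumed to use only a-z, and the node pool is sized up front from the words × length bound.

```java
// Hypothetical array-backed trie; assumes words use only 'a'..'z'.
class ArrayTrie {

    private final int[][] next;    // next[node][letter] = child index, 0 = no child
    private final int[] wordCount; // how many added words end at each node
    private int size = 1;          // node 0 is the root, so one node is in use

    ArrayTrie(int maxWords, int maxLen) {
        // At most one new node per letter of every word, plus the root.
        int capacity = maxWords * maxLen + 1;
        next = new int[capacity][26];
        wordCount = new int[capacity];
    }

    void add(String word) {
        int node = 0;
        for (int i = 0; i < word.length(); i++) {
            int letter = word.charAt(i) - 'a';
            if (next[node][letter] == 0) {
                next[node][letter] = size++; // take the next unused slot of the pool
            }
            node = next[node][letter];
        }
        wordCount[node]++;
    }

    /** Returns how many times the word was added. */
    int count(String word) {
        int node = 0;
        for (int i = 0; i < word.length(); i++) {
            node = next[node][word.charAt(i) - 'a'];
            if (node == 0) {
                return 0; // the root is never a child, so 0 means "missing"
            }
        }
        return wordCount[node];
    }

    int nodeCount() {
        return size;
    }

    public static void main(String[] args) {
        ArrayTrie trie = new ArrayTrie(3, 3);
        trie.add("abc");
        trie.add("abd");
        trie.add("abc");
        System.out.println(trie.count("abc")); // 2
        System.out.println(trie.nodeCount());  // 5: root, a, b, c, d
    }
}
```

Sizing the pool for 100000 words of length 10 would reserve about a million rows of 26 ints, roughly 100 MB, so in practice the capacity would be chosen from the expected number of nodes, or the linked version would be used instead.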

A Java implementation of the trie follows:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;


/**
 * A word trie that can only deal with the 26 letters a-z; words are stored in lower case.
 * @author Leeclipse
 * @since 2007-11-21
 */

public class Trie{
 
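    private Vertex root; // a Trie has exactly one root vertex

    // Nested class representing one vertex (node) of the Trie
    protected static class Vertex {
        protected int words;     // number of added words that end at this vertex
        protected int prefixes;  // number of added words that pass through this vertex and continue
        protected Vertex[] edges; // 26 child slots, one per letter (same type as the vertex itself)

        Vertex() {
            words = 0;
            prefixes = 0;
            edges = new Vertex[26]; // array elements start out as null
        }
    }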

  
    public Trie () {
        root = new Vertex();
    }

   
    /**
     * List all words in the Trie.
     *
     * @return the words found in the Trie, in alphabetical order
     */
    public List<String> listAllWords() {
        List<String> words = new ArrayList<String>();
        Vertex[] edges = root.edges;

        for (int i = 0; i < edges.length; i++) {
            if (edges[i] != null) {
                String word = "" + (char) ('a' + i);
                depthFirstSearchWords(words, edges[i], word);
            }
        }
        return words;
    }

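    /**
     * Depth-first search for words in the Trie, adding each one to the list.
     *
     * @param words the list that collects the words
     * @param vertex the vertex to search from
     * @param wordSegment the letters on the path from the root to this vertex
     */
    private void depthFirstSearchWords(List<String> words, Vertex vertex, String wordSegment) {
        // A word ends here if it was added at least once. Checking the counter
        // (rather than "has no children") also lists words that are prefixes
        // of longer words, e.g. "eat" when "eating" is stored too.
        if (vertex.words > 0) {
            words.add(wordSegment);
        }
        Vertex[] edges = vertex.edges;
        for (int i = 0; i < edges.length; i++) {
            if (edges[i] != null) {
                String newWord = wordSegment + (char) ('a' + i);
                depthFirstSearchWords(words, edges[i], newWord);
            }
        }
    }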

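    /**
     * Count the added words (repeats included) that start with the given
     * prefix and are strictly longer than it.
     */
    public int countPrefixes(String prefix) {
        return countPrefixes(root, prefix);
    }

    private int countPrefixes(Vertex vertex, String prefixSegment) {
        if (prefixSegment.length() == 0) { // the whole prefix has been consumed
            return vertex.prefixes;
        }

        // addWord stores words in lower case, so look them up the same way
        char c = Character.toLowerCase(prefixSegment.charAt(0));
        int index = c - 'a';
        if (index < 0 || index >= vertex.edges.length || vertex.edges[index] == null) {
            return 0; // not a letter a-z, or no stored word has this prefix
        }
        return countPrefixes(vertex.edges[index], prefixSegment.substring(1));
    }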

    /**
     * Count how many times the given word was added.
     */
    public int countWords(String word) {
        return countWords(root, word);
    }

    private int countWords(Vertex vertex, String wordSegment) {
        if (wordSegment.length() == 0) { // the whole word has been consumed
            return vertex.words;
        }

        // addWord stores words in lower case, so look them up the same way
        char c = Character.toLowerCase(wordSegment.charAt(0));
        int index = c - 'a';
        if (index < 0 || index >= vertex.edges.length || vertex.edges[index] == null) {
            return 0; // not a letter a-z, or the word does NOT exist
        }
        return countWords(vertex.edges[index], wordSegment.substring(1));
    }

    
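    /**
     * Add a word to the Trie.
     *
     * @param word The word to be added; only the letters a-z (any case) are supported.
     */
    public void addWord(String word) {
        // Validate first, so a bad word cannot leave the counters half updated
        for (int i = 0; i < word.length(); i++) {
            char c = Character.toLowerCase(word.charAt(i));
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("Only letters a-z are supported: " + word);
            }
        }
        addWord(root, word);
    }

    /**
     * Add the word starting from the specified vertex.
     * @param vertex The specified vertex.
     * @param word The remaining part of the word to be added.
     */
    private void addWord(Vertex vertex, String word) {
        if (word.length() == 0) { // all characters of the word have been added
            vertex.words++;
        } else {
            vertex.prefixes++; // one more word passes through this vertex
            char c = word.charAt(0);
            c = Character.toLowerCase(c);
            int index = c - 'a';
            if (vertex.edges[index] == null) { // the edge does NOT exist yet
                vertex.edges[index] = new Vertex();
            }

            addWord(vertex.edges[index], word.substring(1)); // go to the next character
        }
    }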

    public static void main(String[] args) { // just used for testing
        Trie trie = new Trie();
        trie.addWord("China");
        trie.addWord("China");
        trie.addWord("China");

        trie.addWord("crawl");
        trie.addWord("crime");
        trie.addWord("ban");
        trie.addWord("China");

        trie.addWord("english");
        trie.addWord("establish");
        trie.addWord("eat");
        System.out.println(trie.root.prefixes); // number of non-empty words added
        System.out.println(trie.root.words);    // number of empty words added

        List<String> list = trie.listAllWords();
        Iterator<String> listIterator = list.iterator();
        while (listIterator.hasNext()) {
            System.out.println(listIterator.next());
        }

        int count = trie.countPrefixes("ch");
        int count1 = trie.countWords("china");
        System.out.println("the count of ch prefixes:" + count);
        System.out.println("the count of china countWords:" + count1);
    }
}
Running it:
C:\test>java Trie
10
0
ban
china
crawl
crime
eat
english
establish
the count of ch prefixes:4
the count of china countWords:4

 

