Data Structures: A Practical Application of the Trie (Filtering Banned Words)

Reposted article · 2012-02-13 17:55 · Category: Data Structures / Practical Algorithms

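A trie, also known as a word-lookup tree or prefix tree, is a tree structure that is sometimes described as a variant of the hash tree. It is typically used to count, sort, and store large numbers of strings (though not only strings), which is why search engines often use it for word-frequency statistics over text. Its advantages: it uses the common prefixes of strings to save storage space, and it avoids unnecessary string comparisons. A lookup takes time proportional to the length of the key, regardless of how many keys are stored, so prefix queries are usually cheaper than with a hash table; for plain exact-match lookups a hash table is generally comparable.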

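It has 3 basic properties:

　　(1) The root node holds no character; every other node holds exactly one character.

　　(2) Concatenating the characters along the path from the root to a node gives the string that node represents.

　　(3) All children of a given node hold different characters.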
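To make the prefix-sharing property concrete, here is a minimal sketch that inserts a few words and counts how many nodes are actually created. The names `PrefixDemoNode`, `PrefixDemo`, and `CountNodes` are hypothetical and exist only for this illustration; they are not part of the article's `TireTree` class.

```csharp
using System.Collections.Generic;

// Hypothetical illustration type; not part of the article's TireTree.
public class PrefixDemoNode
{
    // One entry per distinct next character (property 3).
    public Dictionary<char, PrefixDemoNode> Children =
        new Dictionary<char, PrefixDemoNode>();
}

public static class PrefixDemo
{
    // Inserts each word character by character and returns the number of
    // nodes created, not counting the empty root (property 1).
    public static int CountNodes(IEnumerable<string> words)
    {
        var root = new PrefixDemoNode();
        int created = 0;
        foreach (string word in words)
        {
            PrefixDemoNode node = root;
            foreach (char c in word)
            {
                PrefixDemoNode child;
                if (!node.Children.TryGetValue(c, out child))
                {
                    // Only characters that extend a new path cost a node.
                    child = new PrefixDemoNode();
                    node.Children.Add(c, child);
                    created++;
                }
                node = child;
            }
        }
        return created;
    }
}
```

Calling `PrefixDemo.CountNodes(new[] { "tea", "ten", "to" })` creates 5 nodes for 8 characters of input, because the shared prefixes "t" and "te" are stored only once.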

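Basic operations

　　The basic operations are lookup, insertion, and deletion, although deletion is rarely needed. Here I have implemented deletion only for the whole tree; deleting a single word is also straightforward. The listing below covers insertion and search.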

using System;
using System.Collections.Generic;

namespace ConsoleApplication1
{
   public  class TireTree
    {
          private TireNode root = new TireNode(' ');
          public TireTree()
          {
            
          }
          private void CreateTireTree(TireNode node, string word, int index)
          {
              if (word.Length == index) return;
              char key = word[index];
              TireNode newNode = null;
              if (node.NextNode.ContainsKey(key))
              {
                  newNode = node.NextNode[key];
              }
              else
              {
                  newNode = new TireNode(key);
                  node.NextNode.Add(key, newNode);
              }
              if (word.Length - 1 == index)
              {
                  newNode.Word = word;
              }
              CreateTireTree(node.NextNode[key], word, index + 1);
          }

          public void AddWords(string word)
          {
              CreateTireTree(root, word, 0);
          }
          public List<string> SearchWords(string content)
          {
              List<string> result = new List<string>();
              // Try every start position. Restarting from the root only on a
              // mismatch would miss overlapping words, e.g. "bcd" inside "abcd"
              // when "abce" is also in the tree.
              for (int start = 0; start < content.Length; start++)
              {
                  TireNode currentNode = root;
                  for (int i = start; i < content.Length; i++)
                  {
                      TireNode next;
                      // Stop this pass as soon as no child matches the current character.
                      if (!currentNode.NextNode.TryGetValue(content[i], out next))
                          break;
                      currentNode = next;
                      // A node carrying a word marks the end of a banned word.
                      if (currentNode.IsWord && !result.Contains(currentNode.Word))
                          result.Add(currentNode.Word);
                  }
              }

              return result;
          }
          private class TireNode
           {
              public char Key { get; set; }
              public string Word { get; set; }
              public bool IsWord { get { return this.Word != null; } }
              private Dictionary<char, TireNode> nextNode = new Dictionary<char, TireNode>();
              public Dictionary<char, TireNode> NextNode
              {
                  get { return nextNode; }
                  set { nextNode = value; }
              }

              public TireNode(char key)
              {
                this.Key = key;
              }
        
           }
    }
}
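The text above says that deleting a single word is straightforward but does not show it. The following is one possible sketch, not the author's implementation: it clears the end-of-word marker and then prunes nodes that no longer lead to any word. `SketchTrie` and its members are hypothetical names used so the example is self-contained.

```csharp
using System.Collections.Generic;

// Hypothetical stand-alone trie used only to illustrate single-word deletion.
public class SketchTrie
{
    private class Node
    {
        public string Word;   // non-null marks the end of a stored word
        public Dictionary<char, Node> Next = new Dictionary<char, Node>();
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        if (string.IsNullOrEmpty(word)) return;
        Node node = root;
        foreach (char c in word)
        {
            Node child;
            if (!node.Next.TryGetValue(c, out child))
            {
                child = new Node();
                node.Next.Add(c, child);
            }
            node = child;
        }
        node.Word = word;
    }

    public bool Contains(string word)
    {
        if (string.IsNullOrEmpty(word)) return false;
        Node node = root;
        foreach (char c in word)
        {
            Node child;
            if (!node.Next.TryGetValue(c, out child)) return false;
            node = child;
        }
        return node.Word != null;
    }

    // Removes one word; returns false if the word was not stored.
    public bool Remove(string word)
    {
        if (string.IsNullOrEmpty(word)) return false;

        // Record the path root -> ... -> last node so we can walk back.
        var path = new List<Node> { root };
        Node node = root;
        foreach (char c in word)
        {
            Node child;
            if (!node.Next.TryGetValue(c, out child)) return false;
            node = child;
            path.Add(node);
        }
        if (node.Word == null) return false;   // only a prefix, not a word
        node.Word = null;

        // Prune from the end: drop nodes with no children and no word.
        for (int i = word.Length; i > 0; i--)
        {
            Node current = path[i];
            if (current.Word != null || current.Next.Count > 0) break;
            path[i - 1].Next.Remove(word[i - 1]);
        }
        return true;
    }
}
```

Pruning stops at the first node that still carries a word or has other children, so removing "bad" leaves "badge" intact.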

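Implementation

　　To look up an entry in the dictionary:

　　(1) Start the search at the root node.

　　(2) Take the first letter of the keyword, select the subtree that corresponds to that letter, and continue the search in that subtree.

　　(3) In that subtree, take the second letter of the keyword and again select the corresponding subtree.

　　(4) Repeat this process……

　　(5) When all letters of the keyword have been consumed at some node, read the information attached to that node; the lookup is complete.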
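The five steps above can be sketched as an exact-match lookup. This is an illustrative sketch under assumptions: `LookupTrie`, `Put`, and `TryLookup` are hypothetical names, and the "information attached to the node" is modeled as a plain string.

```csharp
using System.Collections.Generic;

// Hypothetical stand-alone trie that maps keywords to attached information.
public class LookupTrie
{
    private class Node
    {
        public bool HasInfo;   // true if a keyword ends at this node
        public string Info;    // the information attached to that keyword
        public Dictionary<char, Node> Children = new Dictionary<char, Node>();
    }

    private readonly Node root = new Node();

    public void Put(string keyword, string info)
    {
        Node node = root;
        foreach (char c in keyword)
        {
            Node child;
            if (!node.Children.TryGetValue(c, out child))
            {
                child = new Node();
                node.Children.Add(c, child);
            }
            node = child;
        }
        node.HasInfo = true;
        node.Info = info;
    }

    public bool TryLookup(string keyword, out string info)
    {
        info = null;
        Node node = root;                      // step (1): start at the root
        foreach (char c in keyword)            // steps (2)-(4): one letter per level
        {
            Node child;
            if (!node.Children.TryGetValue(c, out child))
                return false;                  // no subtree for this letter
            node = child;
        }
        if (!node.HasInfo) return false;       // keyword is only a prefix
        info = node.Info;                      // step (5): read the attached info
        return true;
    }
}
```

The check on `HasInfo` matters: after `Put("cat", "feline")`, a lookup of "ca" reaches a node but finds nothing attached to it, so it reports a miss.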

　Test code

    private void TestTireTree()
        {
            const int count = 100;
            string[] Arr = new string[count];
            // For demonstration only: fill the Tire tree with random strings.
            for (int i = 0; i < Arr.Length; i++)
            {
                Arr[i] = Guid.NewGuid().ToString();
            }

            TireTree Tree = new TireTree();
            foreach (string str in Arr)
            {
                Tree.AddWords(str);
            }

            string word = "text to check";
            List<string> ResultList = Tree.SearchWords(word);
            foreach (string result in ResultList)
            {
                // The format argument must be passed to string.Format itself.
                Console.WriteLine(string.Format("Illegal word detected: {0}", result));
            }
        }
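Because the test above loads random GUIDs, it will normally print nothing. As a complement, here is a small sketch that goes one step further than detection and masks the banned words it finds. This is an assumption about what a filter might do, not something the original code implements; `BannedWordFilter`, `Add`, and `Mask` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-alone filter; not part of the article's TireTree.
public class BannedWordFilter
{
    private class Node
    {
        public bool IsEnd;   // true if a banned word ends at this node
        public Dictionary<char, Node> Next = new Dictionary<char, Node>();
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        if (string.IsNullOrEmpty(word)) return;
        Node node = root;
        foreach (char c in word)
        {
            Node child;
            if (!node.Next.TryGetValue(c, out child))
            {
                child = new Node();
                node.Next.Add(c, child);
            }
            node = child;
        }
        node.IsEnd = true;
    }

    // Returns a copy of text with every banned word replaced by '*'.
    public string Mask(string text)
    {
        var sb = new StringBuilder(text);
        for (int start = 0; start < text.Length; start++)
        {
            Node node = root;
            int matchEnd = -1;   // index of the last char of the longest match
            for (int i = start; i < text.Length; i++)
            {
                Node child;
                if (!node.Next.TryGetValue(text[i], out child)) break;
                node = child;
                if (node.IsEnd) matchEnd = i;
            }
            for (int i = start; i <= matchEnd; i++) sb[i] = '*';
        }
        return sb.ToString();
    }
}
```

With the words "bad", "badge", and "evil" added, `Mask("a badge of evil deeds")` returns `"a ***** of **** deeds"`. Scanning from every start position costs up to the text length times the longest banned word; that is usually acceptable for short messages.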

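In the code above, CreateTireTree(TireNode node, string word, int index) is implemented recursively. It can be rewritten as an iterative (loop-based) version, which avoids the overhead of one recursive call per character.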

  private void CreateArrayTireTree(TireNode node, string word)
          {
              char[] wordsarray = word.ToCharArray();
              for (int i = 0; i < wordsarray.Length; i++)
              {
                  char key = word[i];
                  TireNode newNode = null;
                  if (node.NextNode.ContainsKey(key))
                  {
                      newNode = node.NextNode[key];
                  }
                  else
                  {
                      newNode = new TireNode(key);
                      node.NextNode.Add(key, newNode);
                  }
                  if (i == wordsarray.Length - 1)
                  {
                      newNode.Word = word;
                  }

                  node = node.NextNode[key];
              }

            
          }
