Trees and tree algorithms

Reposted article. Published 2018-12-08 13:59. Tags: Python, Algorithm

1, Binary tree

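Binary tree: each node has at most two children, as shown in the figure below:

Related concepts:

Node: an element of the tree; it holds a value and links to its children
Path: the sequence of edges leading from one node to another
Root: the only node that has no parent
Edge: the connection between two nodes
Children: the nodes directly below a given node
Parent: the node directly above a given node
Sibling: nodes that share the same parent
Subtree: a node together with all of its descendants
Leaf node: a node with no children
Level: the number of edges on the path from the root to the given node
Height: the maximum level of any node in the tree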
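To make the terms concrete, here is a minimal sketch that is not part of the original article. It assumes a small made-up tree stored as a dict that maps each node to the list of its children; the names `children`, `levels`, `level_of`, `leaves` and `height` are chosen only for this illustration.

```python
# Hypothetical sample tree: each key maps to the list of its children.
children = {
    "r": ["a", "b"],   # r is the root; a and b are siblings
    "a": ["c"],
    "b": [],
    "c": [],
}

def levels(tree, root):
    """Return {node: level}, where level = number of edges from the root."""
    result = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree[node]:
            result[child] = result[node] + 1
            stack.append(child)
    return result

level_of = levels(children, "r")
leaves = sorted(n for n, kids in children.items() if not kids)
height = max(level_of.values())

print(level_of["c"])  # 2: the path r -> a -> c has two edges
print(leaves)         # ['b', 'c']
print(height)         # 2
```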


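A binary tree can be implemented as a nested list. The layout is shown below for a root r with two children a and b, neither of which has children of its own.

    mytree = [ r,
               [ a, [ ], [ ] ],
               [ b, [ ], [ ] ]
             ]

The Python implementation is as follows: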

#coding:utf-8


# nested-list implementation
def binaryTree(r):
    return [r,[],[]]  # root[0] is the root value, root[1] the left subtree, root[2] the right subtree

def insertLeftTree(root,newbranch):
    t = root.pop(1)
    if len(t)>1:
        root.insert(1, [newbranch, t, []])
    else:
        root.insert(1,[newbranch, [], []])
    return root

def insertRightTree(root,newbranch):
    t = root.pop(2)
    if len(t)>1:
        root.insert(2, [newbranch, [], t])
    else:
        root.insert(2,[newbranch, [], []])
    return root

def getRootVal(root):
    return root[0]

def setRootVal(root,val):
    root[0]= val

def getLeftChildren(root):
    return root[1]

def getRightChildren(root):
    return root[2]

r = binaryTree(3)
insertLeftTree(r,4)
insertLeftTree(r,5)
insertRightTree(r,6)
insertRightTree(r,7)
l = getLeftChildren(r)
print(l)

setRootVal(l,9)
print(r)
insertLeftTree(l,11)
print(r)
print(getRightChildren(getRightChildren(r)))


A binary tree can also be implemented with node objects, as shown below:

The Python implementation is as follows:

class BinaryTree(object):
    def __init__(self,value):
        self.key = value
        self.leftChild = None
        self.rightChild = None

    def insertLeft(self,newNode):
        if self.leftChild != None:
            temp = BinaryTree(newNode)
            temp.leftChild = self.leftChild
            self.leftChild = temp
        else:
            self.leftChild = BinaryTree(newNode)

    def insertRight(self,newNode):
        if self.rightChild != None:
            temp = BinaryTree(newNode)
            temp.rightChild= self.rightChild
            self.rightChild = temp
        else:
            self.rightChild = BinaryTree(newNode)

    def getRootVal(self):
        return self.key

    def setRootVal(self,value):
        self.key = value

    def getLeftChild(self):
        return self.leftChild
    
    def getRightChild(self):
        return self.rightChild


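2, Applications of binary trees

2.1 Parse tree

A parse tree represents the structure of real-world constructs such as sentences and mathematical expressions. The figure below shows the parse tree for ((7+3)*(5-2)). The hierarchy of the tree encodes the order of evaluation: subtrees are evaluated from the bottom up, so the tree does the job of the parentheses in the expression.

A fully parenthesized mathematical expression is converted into a parse tree as follows.

Scan the expression token by token:

1. On "(", add a left child to the current node and move down to it.

2. On +, -, * or /, set the value of the current node to that operator, add a right child, and move down to it.

3. On a number, set the value of the current node to that number and move back up to its parent.

4. On ")", move up to the parent of the current node.

The Python implementation is as follows (for Stack, see the article on the stack data structure):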

from stackDemo import Stack  # see the article on the stack data structure

def buildParseTree(expstr):
    explist = expstr.split()
    s = Stack()
    t = BinaryTree('')
    s.push(t)
    current = t
    for token in explist:
        #token = token.strip()
        if token =='(':
            current.insertLeft('')
            s.push(current)
            current = current.getLeftChild()
        elif token in ['*','/','+','-']:
            current.setRootVal(token)
            current.insertRight('')
            s.push(current)
            current = current.getRightChild()
        elif token==')':
            current = s.pop()
        else:
            # operand: store it as a number so that evaluate() does arithmetic
            # rather than string concatenation; int() raises ValueError on a bad token
            current.setRootVal(int(token))
            current = s.pop()
    return t

t = buildParseTree("( ( 10 + 5 ) * 3 )")


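Evaluating the parse tree: once the expression has been converted into a parse tree, the tree can be evaluated recursively. Python code:

import operator
def evaluate(parseTree):
    # operator.div was removed in Python 3; operator.truediv exists in both Python 2 and 3
    operators={'+':operator.add,'-':operator.sub,'*':operator.mul,'/':operator.truediv }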
    rootval = parseTree.getRootVal()
    left = parseTree.getLeftChild()
    right = parseTree.getRightChild()

    if left and right:
        fn = operators[rootval]
        return fn(evaluate(left),evaluate(right))
    else:
        return parseTree.getRootVal()


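An in-order traversal of the parse tree restores the fully parenthesized expression. Python code:

# convert a parse tree back into a fully parenthesized expression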
def printexp(tree):
    val = ''
    if tree:
        val = '('+printexp(tree.getLeftChild())
        val = val +str(tree.getRootVal())
        val = val +printexp(tree.getRightChild())+')'
        if tree.getLeftChild()==None and tree.getRightChild()==None:
            val = val.strip('()')
    return val

t = buildParseTree("( ( 10 + 5 ) * 3 )")
exp = printexp(t)
print(exp)

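The code above depends on the `Stack` class from a separate article. As a self-contained sketch of the same four rules (not the author's code), the version below uses a plain Python list as the stack. The names `Node`, `OPS`, `build_parse_tree` and `to_expression` are made up for this example, and it assumes integer operands separated by spaces.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

class Node:
    def __init__(self, value=None):
        self.value = value
        self.left = None
        self.right = None

def build_parse_tree(expr):
    """Build a parse tree from a space-separated, fully parenthesized expression."""
    root = Node()
    parents = []              # plain list used as the stack of ancestors
    current = root
    for token in expr.split():
        if token == "(":      # rule 1: new left child, move down
            current.left = Node()
            parents.append(current)
            current = current.left
        elif token in OPS:    # rule 2: store operator, new right child, move down
            current.value = token
            current.right = Node()
            parents.append(current)
            current = current.right
        elif token == ")":    # rule 4: move up (the root has no parent)
            if parents:
                current = parents.pop()
        else:                 # rule 3: store operand, move up
            current.value = int(token)
            current = parents.pop()
    return root

def evaluate(node):
    """Evaluate the tree bottom-up: both subtrees first, then the operator."""
    if node.left and node.right:
        return OPS[node.value](evaluate(node.left), evaluate(node.right))
    return node.value

def to_expression(node):
    """In-order walk that restores the fully parenthesized expression."""
    if node.left is None and node.right is None:
        return str(node.value)
    return "( %s %s %s )" % (to_expression(node.left), node.value,
                             to_expression(node.right))

tree = build_parse_tree("( ( 10 + 5 ) * 3 )")
print(evaluate(tree))        # 45
print(to_expression(tree))   # ( ( 10 + 5 ) * 3 )
```

Converting operands with `int()` matters: if they stayed strings, `operator.add` would concatenate them instead of adding them.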

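3, Tree traversal

Tree traversals include pre-order, in-order and post-order traversal.

Pre-order traversal: visit the root first, then the left subtree, and finally the right subtree (recursively). Python implementation: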

def preorder(tree):
    if tree:
        print(tree.getRootVal())
        preorder(tree.getLeftChild())
        preorder(tree.getRightChild())

# pre-order traversal defined as a method of the class
# def preorder(self):
#     print(self.key)
#     if self.leftChild:
#         self.leftChild.preorder()
#     if self.rightChild:
#         self.rightChild.preorder()


In-order traversal: visit the left subtree first, then the root, and finally the right subtree (recursively). Python implementation:

# in-order traversal
def inorder(tree):
    if tree:
        inorder(tree.getLeftChild())
        print(tree.getRootVal())
        inorder(tree.getRightChild())


Post-order traversal: visit the left subtree first, then the right subtree, and finally the root. Python implementation:

def postorder(tree):
    if tree :
        postorder(tree.getLeftChild())
        postorder(tree.getRightChild())
        print(tree.getRootVal())

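The three traversals differ only in where the root is visited. The sketch below is an illustration rather than the article's code: it returns lists instead of printing, which makes the orders easy to compare. The `Node` class and the sample tree are assumptions made for this example.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def preorder(node):    # root, left, right
    return [] if node is None else [node.data] + preorder(node.left) + preorder(node.right)

def inorder(node):     # left, root, right
    return [] if node is None else inorder(node.left) + [node.data] + inorder(node.right)

def postorder(node):   # left, right, root
    return [] if node is None else postorder(node.left) + postorder(node.right) + [node.data]

tree = Node(4, Node(3, Node(5, Node(10)), Node(8)), Node(9, Node(7), Node(12)))
print(preorder(tree))   # [4, 3, 5, 10, 8, 9, 7, 12]
print(inorder(tree))    # [10, 5, 3, 8, 4, 7, 9, 12]
print(postorder(tree))  # [10, 5, 8, 3, 7, 12, 9, 4]
```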

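Level-order traversal, tree depth, rebuilding a tree from its pre-order and in-order traversals, and checking whether two trees are identical: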

class TreeNode(object):
    def __init__(self, data, leftchild=None, rightchild=None):
        self.data = data
        self.leftchild = leftchild
        self.rightchild = rightchild
    def preorder(self):
        print(self.data)
        if self.leftchild:
            self.leftchild.preorder()
        if self.rightchild:
            self.rightchild.preorder()
    def midorder(self):
        # in-order: recurse with midorder (not preorder) on both subtrees
        if self.leftchild:
            self.leftchild.midorder()
        print(self.data)
        if self.rightchild:
            self.rightchild.midorder()
t1 = TreeNode(4,TreeNode(3,TreeNode(5,TreeNode(10)),TreeNode(8)),TreeNode(9,TreeNode(7),TreeNode(12)))    



# level-order traversal
def lookup(root):
    row=[root]
    while row:
        print([x.data for x in row])
        temp=[]
        for item in row:
            if item.leftchild:
                temp.append(item.leftchild)
            if item.rightchild:
                temp.append(item.rightchild)
        row = temp
lookup(t1)

# depth of the tree
def get_height(root):
    if root ==None:
        return 0
    return max(get_height(root.leftchild),get_height(root.rightchild))+1
print(get_height(t1))

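# rebuild the tree from its pre-order and in-order traversals
# (values must be unique, because mid.index() is used to locate the root)
pre=[4,3,5,10,8,9,7,12]  # pre-order of t1
mid=[10,5,3,8,4,7,9,12]  # in-order of t1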
def build(pre,mid):
    if not pre:
        return None
    node = TreeNode(pre[0])
    index = mid.index(pre[0])
    node.leftchild = build(pre[1:index+1],mid[:index])
    node.rightchild = build(pre[index+1:],mid[index+1:])
    return node
tt = build(pre,mid)
tt.preorder()

# check whether two trees are identical
t1 = TreeNode(4,TreeNode(3,TreeNode(5,TreeNode(10)),TreeNode(8)),TreeNode(9,TreeNode(7),TreeNode(12)))    
t2 = TreeNode(4,TreeNode(3,TreeNode(5,TreeNode(10)),TreeNode(8)),TreeNode(9,TreeNode(7),TreeNode(12)))    
t3 = TreeNode(4,TreeNode(3,TreeNode(8,TreeNode(40)),TreeNode(13)),TreeNode(9,TreeNode(7),TreeNode(12)))
def is_same_tree(t1,t2):
    if t1==None and t2==None:
        return True
    elif t1 and t2:
        return is_same_tree(t1.leftchild,t2.leftchild) and t1.data==t2.data and is_same_tree(t1.rightchild,t2.rightchild)
    else:
        return False
print(is_same_tree(t1,t2))
print(is_same_tree(t1,t3))


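Morris traversal: the pre-order, in-order and post-order traversals above all use recursion, which needs extra space for the call stack. Morris traversal is non-recursive and uses O(1) extra space, which makes it better suited to very large binary trees.

The steps of the Morris algorithm (in-order traversal) are:

1. Find the predecessor of the current node. If the predecessor's right child is empty, point it at the current node and move to the current node's left child.

2. If the current node has no left child, print the current node and move to its right child.

3. If the predecessor's right child already points at the current node, reset it to empty, print the current node, and move to the right child.

Predecessor: for a given node, the node that comes immediately before it in the in-order traversal.

How to find the predecessor:

If the node has a left child, start at the left child and follow right-child pointers as far as they go; the node reached is the predecessor.

If the left child has no right child, the left child itself is the predecessor.

If the node has no left child and it is the right child of its parent, its predecessor is its parent.

If the node has no left child and it is the left child of its parent, its predecessor is the nearest ancestor whose right subtree contains it; if there is no such ancestor, the node has no predecessor and is the first node of the traversal.

The Morris algorithm only looks up the predecessor of a node that has a left child, so the code below needs only the first two rules.

The Python implementation of Morris traversal: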

class TreeNode(object):
    def __init__(self, data, leftchild=None, rightchild=None):
        self.data = data
        self.leftchild = leftchild
        self.rightchild = rightchild
    def preorder(self):
        print(self.data)
        if self.leftchild:
            self.leftchild.preorder()
        if self.rightchild:
            self.rightchild.preorder()
    def midorder(self):
        if self.leftchild:
            self.leftchild.midorder()
        print(self.data)
        if self.rightchild:
            self.rightchild.midorder()
t1 = TreeNode(4,TreeNode(3,TreeNode(5,TreeNode(10)),TreeNode(8)),TreeNode(9,TreeNode(7),TreeNode(12)))    
    
# Morris traversal (in-order)
def morris(root):
    if root==None:
        return None
    cur=root
    while cur!=None:
        if cur.leftchild==None:
            print(cur.data)
            cur = cur.rightchild
        else:
            pre = get_predecessor(cur)
            if pre.rightchild==None:
                pre.rightchild=cur      # temporary thread back to cur
                cur = cur.leftchild
            elif(pre.rightchild==cur):
                pre.rightchild=None     # left subtree is done: remove the thread
                print(cur.data)
                cur = cur.rightchild

def get_predecessor(node):
    pre = node
    if pre.leftchild!=None:
        pre = pre.leftchild
        while pre.rightchild!=None and pre.rightchild!=node:
            pre = pre.rightchild
    return pre
t1.midorder()
print("="*20)
morris(t1)


References: "Morris traversal of a binary tree"

"Traversing a binary tree with the Morris method"

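4, Priority queue and binary heap

Priority queue: a priority queue resembles a queue: enqueue adds an element at the end and dequeue removes the element at the front. The difference is that every element carries a priority and the front element is always the one with the highest (or lowest) priority, so enqueue must also move the new element to the position its priority calls for. Priority queues are usually implemented with a binary heap, for which enqueue and dequeue both cost O(log n). (A plain list also works, but inserting into a list costs O(n), and sorting it again costs O(n log n).)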
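For comparison, Python's standard library already ships a binary-heap priority queue in the `heapq` module. The short sketch below is not from the original article; the task names and priorities are made up. It stores `(priority, item)` tuples so that the smallest priority is always removed first.

```python
import heapq

tasks = []                                  # heapq keeps a plain list in heap order
heapq.heappush(tasks, (2, "write report"))  # (priority, item); smaller = more urgent
heapq.heappush(tasks, (1, "fix outage"))
heapq.heappush(tasks, (3, "tidy desk"))

order = []
while tasks:
    priority, item = heapq.heappop(tasks)   # always removes the smallest priority
    order.append(item)

print(order)  # ['fix outage', 'write report', 'tidy desk']
```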

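Binary heap: a binary heap is a complete binary tree. If the key of every parent is greater than or equal to the keys of its children, it is a max-heap; if the key of every parent is less than or equal to the keys of its children, it is a min-heap. (Complete binary tree: every level except the last is completely filled, and on the last level only nodes on the right-hand side may be missing. Full binary tree: every node other than a leaf has two children and all leaves are on the same level, so the number of nodes reaches the maximum.)

A min-heap example and its operations are shown below (the value of a parent is always less than or equal to the values of its children):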

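BinaryHeap()     # create an empty binary heap
insert(k)        # insert a new element
findMin()        # return the minimum without removing it
delMin()         # return the minimum and remove it
isEmpty()        # whether the heap is empty
size()           # number of elements in the heap
buildHeap(list)  # build a binary heap from a list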


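In a complete binary tree whose root has index 1, a node with index p has its left and right children at indices 2p and 2p+1. As the figure above suggests, the whole heap can therefore be stored in a list. The Python implementation of a min-heap follows. (The first element of heapList is a placeholder 0 that is never used; it makes the heap start at index 1, so that parent and child indices are simply p//2, 2p and 2p+1.)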

#coding:utf-8

class BinaryHeap(object):
    def __init__(self):
        self.heapList=[0]
        self.size = 0

    # append the element at the end of the complete binary tree, then move it up to its proper position
    def insert(self,k):
        self.heapList.append(k)
        self.size = self.size+1
        self._percUp(self.size)

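    # while the current node is smaller than its parent, swap them and keep moving up
    def _percUp(self,size):
        i = size
        while i//2>0:   # stop at the root (index 1); index 0 is only a placeholder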
            if self.heapList[i]<self.heapList[i//2]:
                temp = self.heapList[i]
                self.heapList[i] = self.heapList[i//2]
                self.heapList[i//2] = temp
            i=i//2

    # return the root; move the last element to the root so the tree stays complete, then sift the new root down to its proper position
    def delMin(self):
        temp = self.heapList[1]
        self.heapList[1]=self.heapList[self.size]
        self.size = self.size-1
        self.heapList.pop()
        self._percDown(1)
        return temp

    # while the current node is larger than its smaller child, swap them and keep moving down
    def _percDown(self,i):
        while (2*i)<=self.size:
            mc = self._minChild(i)
            if self.heapList[i]>self.heapList[mc]:
                temp = self.heapList[i]
                self.heapList[i]=self.heapList[mc]
                self.heapList[mc] =temp
            i = mc

    # return the index of the smaller of the two children
    def _minChild(self,i):
        if (2*i+1)>self.size:
            return 2*i
        else:
            if self.heapList[2*i] < self.heapList[2*i+1]:
                return 2*i
            else:
                return 2*i+1

    # build a binary heap from a list
    def buildHeap(self,alist):   # parameter renamed from `list` to avoid shadowing the built-in
        i = len(alist)//2
        self.heapList = [0]+alist[:]
        self.size = len(alist)
        while i>0:
            self._percDown(i)
            i = i-1

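The index arithmetic also gives a simple way to check the min-heap property on such a list. The helper below is an illustrative sketch, not part of the original code; `is_min_heap` is a made-up name, and it assumes the same layout as heapList (an unused placeholder at index 0, the heap in indices 1 to n).

```python
def is_min_heap(heap_list):
    """heap_list[0] is an unused placeholder; the heap occupies indices 1..n."""
    n = len(heap_list) - 1
    # every node except the root (index 1) must be >= its parent at index i // 2
    return all(heap_list[i // 2] <= heap_list[i] for i in range(2, n + 1))

print(is_min_heap([0, 5, 9, 11, 14, 18, 19, 21]))  # True
print(is_min_heap([0, 5, 9, 11, 14, 8]))           # False: 8 (index 5) is smaller than its parent 9 (index 2)
```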

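The figure below illustrates insert(): the element is appended at the end of the complete binary tree and then moved up to its proper position.

The figure below illustrates delMin(): the root is returned, the last element is moved to the root so that the tree stays complete, and the new root is then sifted down to its proper position.

insert and delMin both run in O(log n), and buildHeap runs in O(n). Sorting a list with a binary heap takes O(n log n), since it makes n calls to delMin at O(log n) each. The code is as follows:

# build a binary heap from the list, then keep removing the top of the heap to obtain the sorted list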
alist = [54,26,93,17,98,77,31,44,55,20]
h = BinaryHeap()
h.buildHeap(alist)
s=[]
while h.size>0:
    s.append(h.delMin())
print(s)


# heap sort with a min-heap stored in a 1-based list
def build_min_heap(alist):
    size = len(alist)
    hq = [0]+alist
    i = len(alist)//2
    while i>0:
        movedown(hq,i,size)
        i = i-1
    return hq
def movedown(hq,i,size):
    while (2*i)<=size:
        small = 2*i
        if 2*i+1<=size and hq[2*i]>hq[2*i+1]:
            small = 2*i+1
        if hq[i]>hq[small]:
            hq[i],hq[small] = hq[small],hq[i]
        i = small

def heappop(hq):
    temp = hq[1]
    hq[1]=hq[-1]
    hq.pop()
    movedown(hq,1,len(hq)-1)
    return temp    

alist = [2,4,6,7,1,2,5,25,15,20,1,21,33,18,29]
q = build_min_heap(alist)
t = []
for i in range(len(alist)):
    t.append(heappop(q))
print(t)


#coding:utf-8

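# heap sort, in place, with a max-heap stored in a 0-based list
def build_max_heap(alist):
    length = len(alist)
    for i in range(length//2,-1,-1):   # integer division: range() rejects floats on Python 3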
        heapify(alist,i,length)
        
def heapify(alist,i,length):
    left = 2*i+1
    right = 2*i+2
    largest = i
    if left<length and alist[left]>alist[largest]:  
        largest = left
    if right<length and alist[right]>alist[largest]:
        largest = right    
    if largest!=i:
        swap(alist,i,largest)
        heapify(alist,largest,length)
def swap(alist,i,j):
    alist[i],alist[j] = alist[j],alist[i]

def heapsort(alist):
    length = len(alist)
    build_max_heap(alist)
    for i in range(len(alist)-1,0,-1):
        swap(alist,0,i)
        length = length-1
        heapify(alist,0,length)
    return alist
alist = [2,4,6,7,1,2,5,80,10,9,25,15,20,1,21,33,18,29]
print(heapsort(alist))


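5, Binary search tree (BST)

Binary search tree: every value in the left subtree of a node is smaller than the value of the node, and every value in its right subtree is larger (the BST property). See the figure below:

A Python implementation of a binary search tree:

# binary search tree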
class TreeNode(object):
    def __init__(self,value,leftchild=None,rightchild=None,parent=None):
        self.value = value
        self.leftchild = leftchild
        self.rightchild = rightchild
        self.parent = parent
        
    def is_leaf(self):
        return not self.leftchild and not self.rightchild
    
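    def is_leftchild(self):
        # the root has no parent, so guard against parent being None
        return self.parent is not None and self.parent.leftchild==self
    
    def is_rightchild(self):
        return self.parent is not None and self.parent.rightchild==self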
    
    def has_both_children(self):
        return self.leftchild and self.rightchild
    
    def has_left_child(self):
        return self.leftchild
    
    def has_right_child(self):
        return self.rightchild
    
    def delete(self):
        if self.is_leftchild():
            self.parent.leftchild=None
        elif self.is_rightchild():
            self.parent.rightchild=None
        
class BinarySearchTree(object):
    def __init__(self,node=None):
        self.root=node
        self.size = 0
        
    def length(self):
        return self.size
        
    def insert(self,value):
        if self.root==None:
            self.root = TreeNode(value)
            self.size += 1
        else:
            self._insert(self.root,value)

    def _insert(self,node,value):
        if node.value>value:
            if node.leftchild:
                self._insert(node.leftchild,value)
            else:
                node.leftchild = TreeNode(value,parent=node)
                self.size += 1
        elif node.value<value:
            if node.rightchild:
                self._insert(node.rightchild,value)
            else:
                node.rightchild = TreeNode(value,parent=node)
                self.size += 1
        else:
            print("%s already exists"%value)
            
    def search(self,value):
        return self._search(self.root,value)
            
    def _search(self,node,value):
        if node==None:
            return None
        if node.value>value:
            return self._search(node.leftchild,value)
        elif node.value<value:
            return self._search(node.rightchild,value)
        else:
            return node
            
    def delete(self,value):
        node = self._search(self.root,value)
        if node==None:
            return None
        self.size -= 1
        if node.is_leaf():                 # the node is a leaf
            self._replace(node,None)
        elif node.has_both_children():     # the node has two children
            successor = self.find_min(node)
            node.value = successor.value
            # the successor has no left child, so splice it out and let its
            # right child (possibly None) take its place
            self._replace(successor,successor.rightchild)
        else:                              # the node has exactly one child
            self._replace(node,node.leftchild or node.rightchild)

    def _replace(self,node,child):
        # hang child (possibly None) where node used to be; also handles the root
        if child:
            child.parent = node.parent
        if node.parent is None:
            self.root = child
        elif node.parent.leftchild is node:
            node.parent.leftchild = child
        else:
            node.parent.rightchild = child
    
    def find_min(self,node):
        cur = node.rightchild
        while cur.leftchild:     # the leftmost node is the minimum of the right subtree
            cur = cur.leftchild
        return cur
    
    def traverse(self):
        # print the tree level by level; an empty tree prints nothing
        row=[self.root] if self.root else []
        while row:
            print([i.value for i in row])
            temp=[]
            for node in row:
                if node.leftchild:
                    temp.append(node.leftchild)
                if node.rightchild:
                    temp.append(node.rightchild)
            row = temp

if __name__=='__main__':
    root = BinarySearchTree()
    root.insert(18)
    root.insert(13)
    root.insert(8)
    root.insert(16)
    root.insert(28)
    root.insert(20)
    root.insert(38)
    root.traverse()
    root.insert(17)
    root.insert(10)
    print(root.search(16))
    print(root.search(12))
    print("*"*30)
    root.traverse()
    # print("delete leaf")
    # root.delete(10)
    # root.traverse()
    # print("delete node with one child")
    # root.delete(16)
    # root.traverse()
    print("delete node with two children")
    root.delete(13)
    root.traverse()

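The BST property constrains whole subtrees, not just a node and its direct children. As an illustration that is not part of the original code, the sketch below validates a tree by passing down the open interval each subtree has to stay inside. `Node` and `is_bst` are made-up names and the sample trees are assumptions.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def is_bst(node, low=None, high=None):
    """True if every value in the subtree lies strictly between low and high."""
    if node is None:
        return True
    if low is not None and node.value <= low:
        return False
    if high is not None and node.value >= high:
        return False
    return (is_bst(node.left, low, node.value) and
            is_bst(node.right, node.value, high))

good = Node(18, Node(13, Node(8), Node(16)), Node(28, Node(20), Node(38)))
# 19 is larger than its parent 13, but it sits in the left subtree of 18
bad = Node(18, Node(13, Node(8), Node(19)), Node(28))

print(is_bst(good))  # True
print(is_bst(bad))   # False
```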

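In the code above, deleting a node involves three cases:

The node is a leaf: remove it by setting the left or right child pointer of its parent to None.

The node has one child: the child takes over the position of the deleted node.

The node has two children: find the in-order successor of the node (the leftmost node of its right subtree) and let it replace the node. The in-order predecessor (the rightmost node of its left subtree) would work equally well.

A binary search tree can be used to implement a map (dictionary). The common operations are: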

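Map()            # create an empty map
put(key,val)     # insert a key-value pair
get(key)         # return the value stored under key
del map[key]     # delete the pair stored under key
len(map)         # number of pairs in the map
key in map       # whether key is present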


The Python implementation of the map:

#coding:utf-8

class TreeNode(object):
    def __init__(self,key, value, leftChild=None,rightChild=None,parent=None):
        self.key = key
        self.value = value
        self.leftChild = leftChild
        self.rightChild = rightChild
        self.parent = parent
        self.balanceFactor =0

    def hasLeftChild(self):
        return self.leftChild

    def hasRightChild(self):
        return self.rightChild

    def isLeftChild(self):
        return self.parent and self.parent.leftChild==self

    def isRightChild(self):
        return self.parent and self.parent.rightChild==self

    def isRoot(self):
        return not self.parent

    def isLeaf(self):
        return not (self.leftChild or self.rightChild)

    def hasAnyChildren(self):
        return self.leftChild or self.rightChild

    def hasBothChildren(self):
        return self.leftChild and self.rightChild

    def replaceNodeData(self,key,value,lc=None,rc=None):
        self.key=key
        self.value = value
        self.leftChild = lc
        self.rightChild = rc
        if self.hasLeftChild():
            self.leftChild.parent = self
        if self.hasRightChild():
            self.rightChild.parent = self   # re-parent the right child (was `self.rightChild = self`)

    def __iter__(self):
        if self:
            if self.hasLeftChild():
                for elem in self.leftChild:  # calls self.leftChild.__iter__(), so this is recursive
                    yield elem
            yield self.key, self.value, self.balanceFactor
            if self.hasRightChild():
                for elem in self.rightChild:  # calls self.rightChild.__iter__()
                    yield elem

    def findSuccessor(self):  # find the in-order successor
        succ = None
        if self.hasRightChild():
            succ = self.rightChild._findMin()
        else:
            if self.parent:
                if self.isLeftChild():
                    succ = self.parent
                else:
                    self.parent.rightChild = None
                    succ = self.parent.findSuccessor()
                    self.parent.rightChild = self
        return succ

    def _findMin(self):
        current = self
        while current.hasLeftChild():
            current = current.leftChild
        return current

    def spliceOut(self):
        if self.isLeaf():
            if self.isLeftChild():
                self.parent.leftChild=None
            else:
                self.parent.rightChild=None
        elif self.hasAnyChildren():
            if self.hasLeftChild():
                if self.isLeftChild():
                    self.parent.leftChild = self.leftChild
                else:
                    self.parent.rightChild = self.leftChild
                self.leftChild.parent = self.parent
            else:
                if self.isLeftChild():
                    self.parent.leftChild = self.rightChild
                else:
                    self.parent.rightChild = self.rightChild
                self.rightChild.parent = self.parent


class BinarySearchTree(object):

    def __init__(self):
        self.root = None
        self.size = 0

    def length(self):
        return self.size

    def __len__(self):
        return self.size

    def __iter__(self):
        # an empty tree yields nothing instead of failing on None
        return iter(self.root) if self.root else iter(())

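    # insert a key-value pair
    def put(self,key,value):
        if self.root:
            node = self._get(key,self.root)
            if node:                  # key already stored: overwrite the value, size unchanged
                node.value = value
                return
            self._put(key,value,self.root)
        else:
            self.root = TreeNode(key,value)
        self.size = self.size+1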

    def _put(self,key,value,currentNode):
        if currentNode.key<key:
            if currentNode.hasRightChild():
                self._put(key,value,currentNode.rightChild)
            else:
                currentNode.rightChild=TreeNode(key,value,parent=currentNode)
        elif currentNode.key>key:
            if currentNode.hasLeftChild():
                self._put(key,value,currentNode.leftChild)
            else:
                currentNode.leftChild=TreeNode(key,value,parent=currentNode)
        else:
            currentNode.value = value   # same key: overwrite the value only and keep both subtrees

    def __setitem__(self, key, value):
        self.put(key,value)

    # get the value stored under key
    def get(self,key):
        if self.root:
            node = self._get(key,self.root)
            if node:
                return node.value
            else:
                return None
        else:
            return None

    def _get(self,key,currentNode):
        if not currentNode:
            return None
        if currentNode.key==key:
            return currentNode
        elif currentNode.key<key:
            return self._get(key,currentNode.rightChild)  # rightChild may be None
        else:
            return self._get(key,currentNode.leftChild)  # leftChild may be None

    # def _get(self,key,currentNode):
    #     if currentNode.key == key:
    #         return currentNode
    #     elif currentNode.key<key:
    #         if currentNode.hasRightChild():
    #             return self._get(key,currentNode.rightChild)
    #         else:
    #             return None
    #     else:
    #         if currentNode.hasLeftChild():
    #             return self._get(key,currentNode.leftChild)
    #         else:
    #             return None

    def __getitem__(self, key):
        return self.get(key)

    def __contains__(self, key): # implements the `in` operator
        if self._get(key,self.root):
            return True
        else:
            return False

    def delete(self,key):
        if self.size>1:
            node = self._get(key,self.root)
            if node:
                self._del(node)
                self.size = self.size - 1
            else:
                raise KeyError('Error, key not in tree')
        elif self.size==1 and self.root.key==key:
            self.root = None
            self.size = self.size - 1
        else:
            raise KeyError('Error, key not in tree')

    def _del(self,currentNode):
        if currentNode.isLeaf():
            if currentNode.isLeftChild():
                currentNode.parent.leftChild = None
            elif currentNode.isRightChild():
                currentNode.parent.rightChild = None
        elif currentNode.hasBothChildren():
            successor = currentNode.findSuccessor()  # the successor is the minimum of the right subtree, i.e. its leftmost node
            successor.spliceOut()
            currentNode.key = successor.key
            currentNode.value = successor.value
        elif currentNode.hasAnyChildren():
            if currentNode.hasLeftChild():
                if currentNode.isLeftChild():
                    currentNode.parent.leftChild = currentNode.leftChild
                    currentNode.leftChild.parent = currentNode.parent
                elif currentNode.isRightChild():
                    currentNode.parent.rightChild = currentNode.leftChild
                    currentNode.leftChild.parent = currentNode.parent
                else:  # currentNode has no parent (is root)
                    currentNode.replaceNodeData(currentNode.leftChild.key,
                                        currentNode.leftChild.value,
                                        currentNode.leftChild.leftChild,
                                        currentNode.leftChild.rightChild)
            elif currentNode.hasRightChild():
                if currentNode.isLeftChild():
                    currentNode.parent.leftChild = currentNode.rightChild
                    currentNode.rightChild.parent = currentNode.parent
                elif currentNode.isRightChild():
                    currentNode.parent.rightChild = currentNode.rightChild
                    currentNode.rightChild.parent = currentNode.parent
                else:  # currentNode has no parent (is root)
                    currentNode.replaceNodeData(currentNode.rightChild.key,
                                        currentNode.rightChild.value,
                                        currentNode.rightChild.leftChild,
                                        currentNode.rightChild.rightChild)

    def __delitem__(self, key):
        self.delete(key)

if __name__ == '__main__':
    mytree = BinarySearchTree()
    mytree[8]="red"
    mytree[4]="blue"
    mytree[6]="yellow"
    mytree[5]="at"
    mytree[9]="cat"
    mytree[11]="mat"

    print(mytree[6])
    print(mytree[5])
    for x in mytree:
        print(x)

    del mytree[6]
    print('-'*12)
    for x in mytree:
        print(x)


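Deletion is the most complicated operation in the code above. There are three cases: the node is a leaf, the node has two children, or the node has one child. A node with two children is replaced by the minimum of its right subtree (the leftmost node of that subtree).

A complexity analysis of the map shows that put and get depend on the height of the tree. When keys arrive in random order the height is O(log n) on average, so both operations cost O(log n); when the tree is unbalanced, for example when keys are inserted in sorted order, the cost degrades to O(n), as the figure below shows: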
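The effect of insertion order on the height is easy to observe. The following sketch is an illustration, not the article's map class: it uses a stripped-down unbalanced BST (the names `Node`, `insert`, `height` and `build` are made up here) and compares sorted insertion with shuffled insertion of the same 200 keys.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Plain (unbalanced) BST insert; returns the root. Duplicates are ignored."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
        else:
            break
    return root

def height(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

keys = list(range(200))
sorted_height = height(build(keys))     # sorted input degenerates into a chain
random.seed(0)                          # fixed seed so the run is repeatable
random.shuffle(keys)
shuffled_height = height(build(keys))   # shuffled input stays far shallower
print(sorted_height, shuffled_height)
```

With sorted input every key becomes the right child of the previous one, so the height equals the number of keys; this is exactly the O(n) case described above.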

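6, Balanced binary search tree (AVL tree)

Balanced binary search tree: also called an AVL tree, named after its inventors G.M. Adelson-Velskii and E.M. Landis. It adds a balance factor to every node of a binary search tree and keeps the tree balanced after every insertion and deletion, which avoids the O(n) worst case of the plain binary search tree described above. The balance factor of a node is the height of its left subtree minus the height of its right subtree:

balanceFactor = height(leftSubTree) - height(rightSubTree)

The tree is balanced when the balance factor of every node is -1, 0 or 1. A node whose balance factor is greater than 1 or less than -1 is out of balance and must be adjusted. The figure below shows the balance factor of every node of a tree. (1: the node is left-heavy; 0: perfectly balanced; -1: right-heavy.)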
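As a small worked example of the definition (a sketch, not the article's implementation, which stores balanceFactor in each node), the code below recomputes heights on demand. `Node`, `height` and `balance_factor` are names made up for this illustration, and an empty subtree is assumed to have height -1 so that a leaf has height 0.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    """Height in edges; an empty subtree counts as -1, so a leaf has height 0."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)

# A chain A -> B -> C that only ever goes right
chain = Node("A", None, Node("B", None, Node("C")))
print(balance_factor(chain))              # -2: A is out of balance (right-heavy)
print(balance_factor(chain.right))        # -1: B is right-heavy but still allowed
print(balance_factor(chain.right.right))  # 0: the leaf C is perfectly balanced
```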

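Compared with a plain binary search tree, an AVL tree must update the balance factors after every put and delete. If a node becomes unbalanced it has to be rebalanced, which is done with left and right rotations.

Left rotation: in the figure, node A has a balance factor of -2 (right-heavy) and is unbalanced. Rotate it to the left, that is, rotate edge AB counter-clockwise around A.

In detail:

1. B, the right child of A, becomes the new root of the subtree.

2. A becomes the left child of B. If B already had a left child, that child becomes the right child of A (the right child of A used to be B, so that slot is now free).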
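The two steps can be sketched on bare nodes as follows. This is a simplified illustration: unlike the article's TreeNode it keeps no parent pointers and no balance factors, and `Node` and `rotate_left` are made-up names.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(a):
    """Rotate the subtree rooted at a to the left and return its new root."""
    b = a.right          # step 1: the right child B becomes the new subtree root
    a.right = b.left     # step 2: B's old left subtree becomes A's right subtree
    b.left = a           #         and A becomes B's left child
    return b

a = Node("A", None, Node("B", None, Node("C")))
root = rotate_left(a)
print(root.key, root.left.key, root.right.key)  # B A C
```

The caller has to store the returned node wherever A used to hang (the parent's child pointer or the tree's root).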

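Right rotation: in the figure, node E has a balance factor of 2 (left-heavy) and is unbalanced. Rotate it to the right, that is, rotate edge EC clockwise around E.

In detail:

1. C, the left child of E, becomes the new root of the subtree.

2. E becomes the right child of C. If C already had a right child, that child becomes the left child of E (the left child of E used to be C, so that slot is now free).

Special case: in the situation shown in the figure, A is still right-heavy, but a single left rotation leaves the subtree left-heavy, so the tree is still not balanced. A check is therefore needed before every left or right rotation:

1. If a node needs a left rotation (it is right-heavy), check the balance factor of its right child. If the right child is left-heavy, first rotate the right child to the right, then rotate the node to the left.

2. If a node needs a right rotation (it is left-heavy), check the balance factor of its left child. If the left child is right-heavy, first rotate the left child to the left, then rotate the node to the right.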
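The check described above can be sketched like this. It is a simplified illustration under stated assumptions: nodes carry no parent pointers, heights are recomputed on every call (the article's implementation stores balanceFactor instead, which is cheaper), and all names (`Node`, `height`, `balance_factor`, `rotate_left`, `rotate_right`, `rebalance`) are made up for this example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)

def rotate_left(node):
    new_root = node.right
    node.right = new_root.left
    new_root.left = node
    return new_root

def rotate_right(node):
    new_root = node.left
    node.left = new_root.right
    new_root.right = node
    return new_root

def rebalance(node):
    """Return the root of the rebalanced subtree."""
    if balance_factor(node) < -1:               # right-heavy: needs a left rotation
        if balance_factor(node.right) > 0:      # right child is left-heavy
            node.right = rotate_right(node.right)
        return rotate_left(node)
    if balance_factor(node) > 1:                # left-heavy: needs a right rotation
        if balance_factor(node.left) < 0:       # left child is right-heavy
            node.left = rotate_left(node.left)
        return rotate_right(node)
    return node                                 # already balanced

# A is right-heavy, but its right child C is left-heavy (the special case)
zigzag = Node("A", None, Node("C", Node("B")))
root = rebalance(zigzag)
print(root.key, root.left.key, root.right.key)  # B A C
print(balance_factor(root))                     # 0
```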

The AVL tree implemented in Python:

#coding:utf-8

from binarySearchTree import TreeNode, BinarySearchTree

# class AVLTreeNode(TreeNode):
#
#     def __init__(self,*args,**kwargs):
#         self.balanceFactor = 0
#         super(AVLTreeNode,self).__init__(*args,**kwargs)

class AVLTree(BinarySearchTree):

    def _put(self,key,value,currentNode):
        if currentNode.key<key:
            if currentNode.hasRightChild():
                self._put(key,value,currentNode.rightChild)
            else:
                currentNode.rightChild=TreeNode(key,value,parent=currentNode)
                self.updateBalance(currentNode.rightChild)
        elif currentNode.key>key:
            if currentNode.hasLeftChild():
                self._put(key,value,currentNode.leftChild)
            else:
                currentNode.leftChild=TreeNode(key,value,parent=currentNode)
                self.updateBalance(currentNode.leftChild)
        else:
            currentNode.value = value   # same key: overwrite the value only and keep both subtrees

    def _del(self,currentNode):
        if currentNode.isLeaf():
            if currentNode.isLeftChild():
                currentNode.parent.leftChild = None
                currentNode.parent.balanceFactor -=1
            elif currentNode.isRightChild():
                currentNode.parent.rightChild = None
                currentNode.parent.balanceFactor += 1
            if currentNode.parent.balanceFactor>1 or currentNode.parent.balanceFactor<-1:
                self.reblance(currentNode.parent)
        elif currentNode.hasBothChildren():
            successor = currentNode.findSuccessor()  # the successor is the minimum of the right subtree, i.e. its leftmost node
            # update the parent's balanceFactor first
            if successor.isLeftChild():
                successor.parent.balanceFactor -= 1
            elif successor.isRightChild():
                successor.parent.balanceFactor += 1
            successor.spliceOut()
            currentNode.key = successor.key
            currentNode.value = successor.value

            # after the deletion, check whether the parent needs rebalancing and rebalance it if so
            if successor.parent.balanceFactor>1 or successor.parent.balanceFactor<-1:
                self.reblance(successor.parent)
        elif currentNode.hasAnyChildren():

            # update the parent's balanceFactor first
            if currentNode.isLeftChild():
                currentNode.parent.balanceFactor -= 1
            elif currentNode.isRightChild():
                currentNode.parent.balanceFactor += 1

            if currentNode.hasLeftChild():
                if currentNode.isLeftChild():
                    currentNode.parent.leftChild = currentNode.leftChild
                    currentNode.leftChild.parent = currentNode.parent
                elif currentNode.isRightChild():
                    currentNode.parent.rightChild = currentNode.leftChild
                    currentNode.leftChild.parent = currentNode.parent
                else:  # currentNode has no parent (is root)
                    currentNode.replaceNodeData(currentNode.leftChild.key,
                                        currentNode.leftChild.value,
                                        currentNode.leftChild.leftChild,
                                        currentNode.leftChild.rightChild)
            elif currentNode.hasRightChild():
                if currentNode.isLeftChild():
                    currentNode.parent.leftChild = currentNode.rightChild
                    currentNode.rightChild.parent = currentNode.parent
                elif currentNode.isRightChild():
                    currentNode.parent.rightChild = currentNode.rightChild
                    currentNode.rightChild.parent = currentNode.parent
                else:  # currentNode has no parent (is root)
                    currentNode.replaceNodeData(currentNode.rightChild.key,
                                        currentNode.rightChild.value,
                                        currentNode.rightChild.leftChild,
                                        currentNode.rightChild.rightChild)
            # after the removal, check whether rebalancing is needed and rebalance if so
            if currentNode.parent!=None:  # not the root
                if currentNode.parent.balanceFactor>1 or currentNode.parent.balanceFactor<-1:
                    self.reblance(currentNode.parent)

    def updateBalance(self,node):
        if node.balanceFactor>1 or node.balanceFactor<-1:
            self.reblance(node)
            return
        if node.parent!=None:
            if node.isLeftChild():
                node.parent.balanceFactor +=1
            elif node.isRightChild():
                node.parent.balanceFactor -=1
            if node.parent.balanceFactor!=0:
                self.updateBalance(node.parent)

    def reblance(self,node):
        if node.balanceFactor>1:
            if node.leftChild.balanceFactor<0:
                self.rotateLeft(node.leftChild)
            self.rotateRight(node)
        elif node.balanceFactor<-1:
            if node.rightChild.balanceFactor>0:
                self.rotateRight(node.rightChild)
            self.rotateLeft(node)

    def rotateLeft(self,node):
        newroot = node.rightChild
        node.rightChild = newroot.leftChild
        if newroot.hasLeftChild():
            newroot.leftChild.parent = node
        newroot.parent = node.parent
        if node.parent!=None:
            if node.isLeftChild():
                node.parent.leftChild = newroot
            elif node.isRightChild():
                node.parent.rightChild = newroot
        else:
            self.root = newroot
        newroot.leftChild = node
        node.parent = newroot
        node.balanceFactor = node.balanceFactor+1-min(newroot.balanceFactor,0)
        newroot.balanceFactor = newroot.balanceFactor+1+max(node.balanceFactor,0)

    def rotateRight(self,node):
        newroot = node.leftChild
        node.leftChild = newroot.rightChild
        if newroot.rightChild!=None:
            newroot.rightChild.parent = node
        newroot.parent = node.parent
        if node.parent!=None:
            if node.isLeftChild():
                node.parent.leftChild = newroot
            elif node.isRightChild():
                node.parent.rightChild = newroot
        else:
            self.root = newroot
        newroot.rightChild = node
        node.parent = newroot
        node.balanceFactor = node.balanceFactor-1-max(newroot.balanceFactor,0)
        newroot.balanceFactor = newroot.balanceFactor-1+min(node.balanceFactor,0)

if __name__ == '__main__':

    mytree = AVLTree()
    mytree[8]="red"
    mytree[4]="blue"
    mytree[6]="yellow"
    mytree[5]="at"
    mytree[9]="cat"
    mytree[11]="mat"

    print(mytree[6])
    print(mytree[5])

    # dump the tree before the deletion
    print('-'*12)
    print('key','value','balanceFactor')
    for x in mytree:
        print(x)
    print('root:',mytree.root.key)

    # delete a key and dump the tree again
    del mytree[6]
    print('-'*12)
    print('key','value','balanceFactor')
    for x in mytree:
        print(x)
    print('root:',mytree.root.key)


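　　　　AVLTree inherits from the binary search tree and overrides its insert and delete methods; TreeNode also gains a balanceFactor attribute. A left or right rotation changes some balance factors, so they have to be recomputed. In the left rotation shown in the figure, D becomes the new root of the subtree and only the balance factors of B and D change, so only those two need updating. (The right rotation is symmetric.)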

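　　　　　　The balance factor of B is computed as follows, where newBal(B) is B's balance factor after the left rotation, oldBal(B) is its balance factor before the rotation, and h is the height of a node: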
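　　　　A sketch of that derivation, under an assumed layout for the figure: B is the old root with left subtree A and right child D, and D has left subtree C and right subtree E. Here h_X is the height of subtree X, and the balance factor is height(left) minus height(right), as in the code above.

```latex
\begin{aligned}
\mathit{newBal}(B) &= h_A - h_C \\
\mathit{oldBal}(B) &= h_A - h_D = h_A - \bigl(1 + \max(h_C, h_E)\bigr) \\
\mathit{newBal}(B) - \mathit{oldBal}(B) &= 1 + \max(h_C, h_E) - h_C \\
&= 1 + \max(0,\; h_E - h_C) \\
&= 1 + \max\bigl(0,\; -\mathit{oldBal}(D)\bigr) \\
&= 1 - \min\bigl(0,\; \mathit{oldBal}(D)\bigr) \\
\mathit{newBal}(B) &= \mathit{oldBal}(B) + 1 - \min\bigl(0,\; \mathit{oldBal}(D)\bigr)
\end{aligned}
```

　　　　The last line corresponds to the update of node.balanceFactor in rotateLeft, where node plays the role of B and newroot the role of D.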

　　　　　　The balance factor of D is computed as follows:
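　　　　A sketch of the derivation for D, under the same assumed layout (B is the old root with left subtree A and right child D; D has subtrees C and E; h_X is the height of subtree X). After the rotation B is D's left child and has the subtrees A and C.

```latex
\begin{aligned}
\mathit{oldBal}(D) &= h_C - h_E \\
\mathit{newBal}(D) &= \bigl(1 + \max(h_A, h_C)\bigr) - h_E \\
\mathit{newBal}(D) - \mathit{oldBal}(D) &= 1 + \max(h_A, h_C) - h_C \\
&= 1 + \max(h_A - h_C,\; 0) \\
&= 1 + \max\bigl(\mathit{newBal}(B),\; 0\bigr) \\
\mathit{newBal}(D) &= \mathit{oldBal}(D) + 1 + \max\bigl(\mathit{newBal}(B),\; 0\bigr)
\end{aligned}
```

　　　　This is why rotateLeft updates node.balanceFactor first and then uses the updated value when it computes newroot.balanceFactor.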

　　　　Because an AVL tree is always kept balanced, its put and get operations stay at O(log n).
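　　　　One way to see why the height stays logarithmic is to count the fewest nodes an AVL tree of a given height can have. The sketch below is illustrative only: min_avl_nodes is a hypothetical helper, and height is assumed to be measured in edges (a single node has height 0). The sparsest AVL tree of height h has a root, a sparsest subtree of height h-1 and a sparsest subtree of height h-2.

```python
import math


def min_avl_nodes(height):
    """Fewest nodes an AVL tree of the given height can contain.

    Recurrence: N(-1) = 0, N(0) = 1, N(h) = 1 + N(h-1) + N(h-2).
    """
    prev, curr = 0, 1  # N(-1), N(0)
    for _ in range(height):
        prev, curr = curr, 1 + curr + prev
    return curr


for h in range(6):
    n = min_avl_nodes(h)
    # The node count grows exponentially with h, so h grows like log(n).
    print(h, n, round(math.log2(n + 2), 2))
```

　　　　Since any AVL tree of height h has at least min_avl_nodes(h) nodes, the height of a tree with n nodes is bounded by a constant multiple of log n.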

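7. Summary

　　　　So far the map (dictionary) data structure has been implemented with a binary search tree and with an AVL tree, and earlier with a sorted list and a hash table. The complexities of the corresponding operations are as follows: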

8. Other tree structures

　　8.1 Huffman trees and Huffman coding

　　　　Reference: http://www.cnblogs.com/mcgrady/p/3329825.html

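　　　　Huffman tree: a binary tree whose weighted path length (WPL) is the smallest possible for a given set of leaf weights, also called an optimal binary tree. (Weight: the weight assigned to a leaf. Path: the edges from the root to that leaf. The WPL is the sum over all leaves of weight times path length.)

　　　　　　　　　The weighted path lengths of the two trees in the figure below are:

　　　　　　　　　　　Figure a: WPL=5*2+7*2+2*2+13*2=54

　　　　　　　　　　　Figure b: WPL=5*3+2*3+7*2+13*1=48

　　　　　　　　　Figure b has the smaller weighted path length, and it can be shown that figure b is a Huffman tree (an optimal binary tree) for these weights.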
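　　　　The two sums above can be reproduced with a few lines of Python. This is a minimal sketch: weighted_path_length is a hypothetical helper, and the (weight, depth) pairs are read off the two formulas rather than from the figures themselves.

```python
def weighted_path_length(leaves):
    """Sum of weight * depth over all leaves; leaves holds (weight, depth) pairs."""
    return sum(weight * depth for weight, depth in leaves)


tree_a = [(5, 2), (7, 2), (2, 2), (13, 2)]  # every leaf at depth 2
tree_b = [(5, 3), (2, 3), (7, 2), (13, 1)]  # heavier leaves closer to the root

print(weighted_path_length(tree_a))  # 54
print(weighted_path_length(tree_b))  # 48
```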

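　　　　Steps for building a Huffman tree:

　　　　　　　　1. Treat every given node (its left and right subtrees are empty) as the root of its own single-node tree; together these trees form a forest.

　　　　　　　　2. Pick the two trees in the forest whose roots have the smallest weights and make them the left and right subtrees of a new tree. The weight of the new root is the sum of the weights of the two subtree roots. By the convention used here, the lighter tree becomes the left subtree.

　　　　　　　　3. Remove those two trees from the forest and add the new tree to it.

　　　　　　　　4. Repeat steps 2 and 3 until only one tree remains in the forest; that tree is the Huffman tree.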
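　　　　The steps above can be traced on the weights 5, 7, 2 and 13 from the WPL example. The sketch below tracks only the root weights of the forest, not the trees themselves; huffman_merge_steps is a hypothetical helper and heapq serves as the "pick the two smallest" structure. Each leaf's weight is counted once per merge on its path to the root, so the merged weights add up to the WPL of the resulting tree.

```python
import heapq


def huffman_merge_steps(weights):
    """Return the (left, right, merged) root weights of every merge, in order."""
    forest = list(weights)          # step 1: every weight is a single-node tree
    heapq.heapify(forest)
    steps = []
    while len(forest) > 1:          # step 4: repeat until one tree is left
        left = heapq.heappop(forest)    # step 2: the two smallest roots,
        right = heapq.heappop(forest)   # lighter one on the left
        steps.append((left, right, left + right))
        heapq.heappush(forest, left + right)  # step 3: put the new tree back
    return steps


steps = huffman_merge_steps([5, 7, 2, 13])
print(steps)                                 # [(2, 5, 7), (7, 7, 14), (13, 14, 27)]
print(sum(merged for _, _, merged in steps)) # 48
```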

　　　　　　　　The figures below illustrate the construction of a Huffman tree:

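　　　　Huffman coding: the binary code for communication that is derived from a Huffman tree. Every leaf is reached from the root by exactly one path. Each branch to a left subtree stands for a "0" and each branch to a right subtree stands for a "1"; the sequence of 0s and 1s along the path to a leaf is the code of the character stored in that leaf.

In the figure above, the Huffman codes of A, B, C and D are 111, 10, 110 and 0 respectively, as the figure below illustrates: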
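　　　　Because every character sits in a leaf, no code is a prefix of another, so a bit stream can be decoded without separators. The sketch below checks this for the four codes listed above; is_prefix_free and decode are hypothetical helper names, and the message "DABC" is only an example.

```python
codes = {"A": "111", "B": "10", "C": "110", "D": "0"}


def is_prefix_free(code_table):
    """True if no code word is a prefix of a different code word."""
    words = list(code_table.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


def decode(bits, code_table):
    """Decode a bit string by emitting a symbol as soon as a code word matches."""
    reverse = {code: symbol for symbol, code in code_table.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            symbols.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(symbols)


encoded = "".join(codes[symbol] for symbol in "DABC")
print(is_prefix_free(codes))   # True
print(encoded)                 # 011110110
print(decode(encoded, codes))  # DABC
```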

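　　　　Encoding and decoding a string with a Huffman tree: first count how often each character occurs in the string, build a Huffman tree with those frequencies as weights, read off the Huffman code of every character, and finally encode the string. The code below encodes and decodes a string this way.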

# Huffman tree node
class HaffmanNode(object):

    def __init__(self,value=None,weight=None,leftchild=None,rightchild=None):  # value: the character, weight: its frequency
        self.value = value
        self.weight = weight
        self.leftchild=leftchild
        self.rightchild = rightchild

    def is_leaf(self):   # True if the node has no children
        return not self.leftchild and not self.rightchild

    def __lt__(self,other):   # order nodes by weight so that heapq can compare them
        return self.weight<other.weight

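# Walk the Huffman tree and record the code of every leaf
def get_haffman_code(root,code,code_dict1,code_dict2):
    if root.is_leaf():
        code = code or '0'              # a tree with a single leaf would otherwise produce an empty code
        code_dict1[root.value]=code     # character -> code, used for encoding
        code_dict2[code]=root.value     # code -> character, used for decoding
    else:
        get_haffman_code(root.leftchild, code+'0',code_dict1,code_dict2)
        get_haffman_code(root.rightchild, code+'1',code_dict1,code_dict2)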

# Build a Huffman tree from character frequencies
import heapq
def build_haffman_tree(weight_dict):
    hp=[]
    for value,weight in weight_dict.items():   # value: the character, weight: its frequency
        heapq.heappush(hp,HaffmanNode(value,weight))
    while len(hp)>1:
        left = heapq.heappop(hp)     # the two lightest trees; the lighter one goes left
        right = heapq.heappop(hp)
        parent = HaffmanNode(weight=left.weight+right.weight,leftchild=left,rightchild=right)
        heapq.heappush(hp,parent)
    return hp[0]   # the single remaining node is the root of the Huffman tree


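weight_dict = {}   # character -> frequency
code_dict1={}      # character -> code
code_dict2={}      # code -> character
# Huffman-encode the string astr
def haff_encode(astr):
    # reset the module-level tables so that repeated calls do not mix their statistics
    weight_dict.clear()
    code_dict1.clear()
    code_dict2.clear()
    for i in astr:
        if i not in weight_dict:
            weight_dict[i]=1
        else:
            weight_dict[i]+=1
    haffman_tree = build_haffman_tree(weight_dict)
    get_haffman_code(haffman_tree,'',code_dict1,code_dict2)
    encoded_astr = ''.join(code_dict1[i] for i in astr)
    return encoded_astr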

# Decode a Huffman-encoded string
def haff_decode(encoded_astr,code_dict2):
    code = ''
    astr=''
    for i in encoded_astr:
        code = code+i
        if code in code_dict2:
            astr+=code_dict2[code]
            code=''
    return astr

astr="This is my big fancy house"
encoded_astr=haff_encode(astr)
print(encoded_astr)
decoded_astr = haff_decode(encoded_astr,code_dict2)
print(decoded_astr)

Encoding and decoding a string

　Compressing and decompressing files with a Huffman tree:

　　　　References: https://www.jianshu.com/p/4cbbfed4160b

　　　　　　https://github.com/gg-z/huffman_coding

　　　　　　https://gist.github.com/Arianxx/603dc688a4b68f207ada2c4534758637

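　　8.2 Trie (dictionary tree)

　　　　Trie: also called a dictionary tree or prefix tree. It stores words character by character, which makes it convenient for word-frequency counting and prefix matching. A trie looks like this: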

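　　Properties of a trie:

　　　　　　Every node except the root holds one character.

　　　　　　The characters on the path from the root to a node marked as the end of a word spell one complete word.

　　　　　　The nodes on the part of the path that several words share form their common prefix.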

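　　What a trie is good for:

　　　　　　It saves memory, because a shared prefix is stored only once.

　　　　　　Prefix matching is fast: the time complexity is O(n), where n is the length of the word, provided that finding the child for a character takes constant time.

　　　The code below is a simple trie implemented in Python.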

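# Trie (dictionary tree)
class TrieNode(object):
    def __init__(self,char):
        self.char = char
        self.child=[]
        self.is_leaf = False  # True if this node is the last letter of a complete word (it may still have children)
        self.counter = 1      # how many inserted words share the prefix that ends at this node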
        
class TrieTree(object):
    def __init__(self):
        self.root = TrieNode(None)    
    
    # Add a word to the trie
    def add_trie_word(self,word):
        root = self.root
        for char in word:
            found = False
            for node in root.child:
                if node.char==char:
                    node.counter+=1
                    root = node
                    found = True
                    break
            if not found:
                temp = TrieNode(char)
                root.child.append(temp)
                root = temp
        root.is_leaf=True
    
    # Check whether a prefix is in the trie and return how many words share it
    def search_trie_prefix(self,prefix):
        root = self.root
        if not root.child:
            return False,0
        for char in prefix:
            found=False
            for node in root.child:
                if node.char==char:
                    root=node
                    found=True
                    break
            if not found:
                return False,0
        return True,root.counter
        
trie_tree = TrieTree()
trie_tree.add_trie_word("hammer")
trie_tree.add_trie_word("ham")
trie_tree.add_trie_word("had")
print(trie_tree.search_trie_prefix("ha"))
print(trie_tree.search_trie_prefix("ham"))
print(trie_tree.search_trie_prefix("had"))
print(trie_tree.search_trie_prefix("b"))

Trie tree
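　　　　The implementation above scans a list of children for every character, so the cost per character grows with the number of children. A variant that keys the children by character in a dict keeps each step at constant expected time. The sketch below is an alternative under that assumption, not the implementation from the references; DictTrieNode, DictTrie, add, count_prefix and contains are hypothetical names.

```python
class DictTrieNode(object):
    def __init__(self):
        self.children = {}    # character -> DictTrieNode
        self.counter = 0      # number of inserted words passing through this node
        self.is_word = False  # True if a complete word ends here


class DictTrie(object):
    def __init__(self):
        self.root = DictTrieNode()

    def add(self, word):
        """Insert a word, counting it on every node along its path."""
        node = self.root
        node.counter += 1
        for char in word:
            if char not in node.children:
                node.children[char] = DictTrieNode()
            node = node.children[char]
            node.counter += 1
        node.is_word = True

    def _find(self, text):
        """Return the node reached by following text, or None if the path breaks."""
        node = self.root
        for char in text:
            node = node.children.get(char)
            if node is None:
                return None
        return node

    def count_prefix(self, prefix):
        """Number of inserted words that start with prefix."""
        node = self._find(prefix)
        return node.counter if node is not None else 0

    def contains(self, word):
        """True only if word itself was inserted, not merely a prefix of one."""
        node = self._find(word)
        return node is not None and node.is_word


trie = DictTrie()
for word in ("hammer", "ham", "had"):
    trie.add(word)
print(trie.count_prefix("ha"), trie.count_prefix("ham"), trie.count_prefix("b"))  # 3 2 0
print(trie.contains("ham"), trie.contains("ha"))  # True False
```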

Trie references: https://www.cnblogs.com/huangxincheng/archive/2012/11/25/2788268.html

　　　https://towardsdatascience.com/implementing-a-trie-data-structure-in-python-in-less-than-100-lines-of-code-a877ea23c1a1

Reference: http://interactivepython.org/runestone/static/pythonds/Trees/toctree.html
