DNA序列組裝(貪婪算法)


生物信息學原理作業第四彈:DNA序列組裝(貪婪算法)

原理:生物信息學(孫嘯)

大致思想:

      1. 找到權值最大的邊;

      2. 除去以最大權值邊的起始頂點為起始頂點的邊;

      3. 除去以最大權值邊為終點為終點的邊;

      4. 重復上述步驟,得到所有符合條件的邊;

      5. 拼接得到的邊;

      6. 加入孤立點(如果有)。

附上Python代碼,如果有問題我會及時更正(確實不太熟算法)

DNA序列組裝(貪婪算法)

轉載請保留出處!

 1 # -*- coding: utf-8 -*-
 2 """
 3 Created on Mon Dec 4 15:04:39 2017  4 @author: zxzhu  5 python3.6  6 """
 7 from functools import reduce  8 def get_weight(s1,s2):              #通過兩條序列的overlap計算出權值
 9     l = min(len(s1),len(s2))  10     while l>0:  11         if s2[:l] == s1[-l:]:  12             return l  13         else:  14             l-=1
 15     return 0  16 
 17 def print_result(s1,s2):           #將兩條序列去除首尾overlap后合並
 18     weight = get_weight(s1,s2)  19     s = s1 + s2[weight:]  20     #print(s)
 21     return s  22 
 23 def dir_graph(l,t=3):             #得到滿足條件的有向圖(權值大於等於t)
 24     graph = {}  25     for i in l:  26         for j in l:  27             if i!=j:  28                 weight = get_weight(i,j)  29                 if weight >= t:  30                     key = (i,j)  31                     graph[key] = weight  32     return graph  33 
 34 def rm_path(graph,path):           #貪婪算法加入一條邊后應該去除與該邊首尾頂點相同的邊
 35     key = graph.keys()  36     rm_key = []  37     for i in key:  38         if i[1] == path[1] or i[0] == path[0]:  39  rm_key.append(i)  40     for i in rm_key:  41  graph.pop(i)  42     return graph  43 
 44 def get_path(graph,path = []):     #得到滿足條件的所有邊
 45     while graph:  46         max_weight = 0  47         for i in graph.keys():  48             if graph[i] > max_weight:  49                 max_weight = graph[i]  50                 cur_path = i  51  path.append(cur_path)  52         graph = rm_path(graph,cur_path)  53  get_path(graph,path)  54     return path  55 
 56 def out_num(path,V):             #計算某頂點的出度
 57     count = 0  58     for i in path:  59         if i[0] == V:  60             count+=1
 61     return count  62 
 63 def get_last_V(path,last_V = None):           #得到最后一條邊
 64     index = 0  65     if last_V:                                #非隨機尋找出度為0的頂點
 66         for i in path:  67             if i[1] == last_V:  68                 return i,index  69             else:  70                 index+=1
 71         return None                           #沒有找到指向last_V的頂點(一條路徑結束)
 72     else:                                     #隨機尋找出度為0的頂點
 73         for i in path:  74             if out_num(path,i[1]) == 0:  75                 return i,index  76             else:  77                 index+=1
 78         return -1                             #首尾相連
 79         
 80            
 81 def assemble(cur_V,path,new_path = []):       #給滿足條件的邊排序
 82     while path:  83         path.pop(cur_V[1])  84  new_path.insert(0,cur_V[0])  85         cur_V = get_last_V(path,last_V = cur_V[0][0])  86         if cur_V:  87  assemble(cur_V,path,new_path)  88         else:  89             cur_V = get_last_V(path)  90  assemble(cur_V,path,new_path)  91     return new_path  92 
 93 def align_isolated(path,sequence):          #加入孤立頂點
 94     new_path = reduce(lambda x,y:x+y,path)  95     for i in sequence:  96         if i not in new_path:  97  new_path.append(i)  98     return new_path  99 
100 x = 'CCTTTTGG'
101 y = 'TTGGCAATCACT'
102 w = 'AGTATTGGCAATC'
103 u = 'ATGCAAACCT'
104 z = 'AATCGATG'
105 v = 'TCACTCCTTTT'
106 a = w 107 b = y 108 c = 'TCACTAGTA'
109 sequence = [x,y,w,u,z] 110 sequence1 = [a,b,c] 111 graph = dir_graph(sequence1,t=3) 112 print(graph) 113 path = get_path(graph) 114 path = [list(i) for i in path]              #將path中的tuple元素換成list
115 #print(path)
116 start = get_last_V(path)                    #起始出度為0的頂點所在的邊
117 if start == -1:                             #序列首尾相連
118     new_path = reduce(lambda x,y:x+y, path) 119     new_path = new_path[:-1] 120     result = reduce(print_result,new_path) 121 else: 122     new_path = assemble(start,path)             #排序后的邊
123     new_path = align_isolated(new_path,sequence1)   #加入孤立頂點
124     #print(new_path)
125     result = reduce(print_result,new_path)      #組裝
126 #print(new_path)
127 print(result)

 

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM