Needleman–Wunsch 算法的代碼實現

本文轉載自查看原文 2019-07-30 23:06 389 算法/ 動態規划/ Python

# -*- coding: utf-8 -*-
"""   
    :Author: huangsh
    :Date: 19-7-28 下午19:17
    :Description: 使用bidu Needleman–Wunsch 算法來計算兩條序列的最大相似得分  
    如果您對此算法不熟悉，可以去看看我寫的一篇拙文：https://www.jianshu.com/p/002bbebcaaef
"""

from collections import namedtuple

F = namedtuple('F', ('score', 'pointer'))

## 初始化二維矩陣， # 生成 x行，y列的二維矩陣，初始化第0行，0列的元素
def init_array(x, y):  
    array = [[0] * (y) for _ in range(x)]
    array[0][0] = F(0, None)
    for j in range(1, y):
        array[0][j] = F((-5)*j, [0, j-1])
    for i in range(1, x):
        array[i][0] = F((-5)*i, [i-1, 0])
    return array

## 一行一行的計算矩陣中的每個各自中的最優結果。當前格子中的最優結果由它的三個來源推出
def compute(array, seq1, seq2):
    row, col = len(seq2), len(seq1)
    for i in range(1, row+1):
        for j in range(1, col+1):
            if seq1[j-1] == seq2[i-1]:  # 這里簡化了得分矩陣，完全匹配得10分，不完全得5分，有gap減5分
                s = 10
            else:
                s = 5  
            lu = [array[i-1][j-1].score+s, [i-1, j-1]] # idx 0：最大得分，idx 1：來源坐標
            left = [array[i-1][j].score-5, [i-1, j]]
            up = [array[i][j-1].score-5, [i, j-1]]
            max_choice = max([lu,left, up], key=lambda x: x[0])
            score= max_choice[0]
            pointer = max_choice[1]
            array[i][j] = F(score, pointer)  # 在當前保存最大得分，和來源坐標，方便回溯。
    return array

## 回溯。從（m,n）一直回溯到（0，0）
def backtrack(array, seq1, seq2):
    s1 = []
    s2 = []
    row, col = len(seq2), len(seq1)
    while array[row][col].score != 0:
        i, j = array[row][col].pointer # pointer 指向來源方的坐標
        if i+1 == row and j+1 == col: # 左上方
            s1.append(seq1[col-1])
            s2.append(seq2[row-1])
            row, col = i, j
        elif row == i+1 and col == j: # 來源：上方
            s1.append("-")
            s2.append(seq2[i])
            row, col = i, j
        elif row == i and col == j+1: # 左方
            s1.append(seq1[j])
            s2.append("-")
            row, col = i, j
    s1 = ''.join(s1[::-1])  #因為是從最后往前回溯的，需要將逆轉一下list
    s2 = ''.join(s2[::-1])
    return s1, s2

def main(seq1, seq2):
    x, y = len(seq2)+1 , len(seq1)+1 # x是矩陣行數，y是矩陣列數

    array = init_array(x, y)
    array = compute(array, seq1, seq2)
    s1, s2 = backtrack(array, seq1, seq2)
    max_score = array[x-1][y-1].score

    print("最大得分：", max_score)
    print(s1)
    print(s2)

if __name__ == '__main__':
    seq1 = "ATCGCGCAACTGCGCGC"
    seq2 = "ACGCGCACTGCGGC"
    main(seq1, seq2)

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 Global Alignment（全局比對）--從算法（Needleman-Wunsch）到python實現字符串與模式匹配算法（六）：Needleman–Wunsch算法 KMP算法的代碼實現 ecc算法的代碼實現 KNN算法原理以及代碼實現 CatBoost算法和GPU測試（python代碼實現） kmeans算法理解及代碼實現幾行代碼實現負載均衡輪詢算法國密算法Java代碼的標准實現感知機算法（PLA）代碼實現