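Bowtie2 is used to quickly align short reads (typically 50-100 bp) to a reference genome. Unlike general-purpose alignment tools such as BLAST, Bowtie performs better and runs faster when aligning relatively short reads (less than about 1,000 bases) against large references such as whole genomes.
Many other tools and pipelines rely on Bowtie, e.g. TopHat, whose output is commonly passed on to Cufflinks.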
An example alignment (| marks a matching position, - marks a gap):
Read:      GACTGGGCGATCTCGACTTCG
           |||||  |||||||||| |||
Reference: GACTG--CGATCTCGACATCG
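To make the scoring concrete, here is a minimal Python sketch that scores the example alignment column by column. It assumes Bowtie2's documented end-to-end defaults (--ma 0, --mp 6,2, --np 1, --rdg/--rfg 5,3) and that every read base has Phred quality 40. The function names are hypothetical, and this is an illustration of the scoring rules, not Bowtie2's implementation.

```python
import math

# Hypothetical helpers for illustration; not part of Bowtie2.
# Assumed Bowtie2 end-to-end defaults (check the manual for your version):
MATCH_BONUS = 0                    # --ma (0 in end-to-end mode)
MISMATCH_MAX, MISMATCH_MIN = 6, 2  # --mp MX,MN
N_PENALTY = 1                      # --np
GAP_OPEN, GAP_EXTEND = 5, 3        # --rdg / --rfg: a gap of length L costs 5 + 3 * L


def mismatch_penalty(qual: int) -> int:
    """Quality-aware mismatch penalty: MN + floor((MX - MN) * min(Q, 40) / 40)."""
    return MISMATCH_MIN + math.floor((MISMATCH_MAX - MISMATCH_MIN) * min(qual, 40) / 40)


def score_alignment(read_row: str, ref_row: str, qual: int = 40) -> int:
    """Score two equal-length alignment rows where '-' marks a gap.

    Every read base is assumed to have the same Phred quality `qual`.
    """
    if len(read_row) != len(ref_row):
        raise ValueError("alignment rows must have equal length")
    score, i = 0, 0
    while i < len(read_row):
        r, g = read_row[i], ref_row[i]
        if r == "-" or g == "-":
            # Affine gap: one opening penalty plus an extension penalty per gap position
            gap_row = read_row if r == "-" else ref_row
            length = 0
            while i < len(gap_row) and gap_row[i] == "-":
                length += 1
                i += 1
            score -= GAP_OPEN + GAP_EXTEND * length
            continue
        if r == "N" or g == "N":
            score -= N_PENALTY          # ambiguous position
        elif r == g:
            score += MATCH_BONUS        # match
        else:
            score -= mismatch_penalty(qual)  # mismatch
        i += 1
    return score


if __name__ == "__main__":
    read_row = "GACTGGGCGATCTCGACTTCG"
    ref_row = "GACTG--CGATCTCGACATCG"
    s = score_alignment(read_row, ref_row)
    read_len = len(read_row.replace("-", ""))
    min_score = -0.6 + -0.6 * read_len  # default end-to-end --score-min L,-0.6,-0.6
    print(s, round(min_score, 2), s >= min_score)  # -17 -13.2 False
```

Under these assumptions the two-base gap costs 5 + 3 × 2 = 11 and the T/A mismatch costs 6, giving -17, which falls below the default end-to-end minimum of -13.2 for a 21 bp read.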
Differences from Bowtie 1
- For reads longer than about 50 bp, Bowtie 2 is generally faster, more sensitive, and uses less memory than Bowtie 1. For relatively short reads (e.g. less than 50 bp), Bowtie 1 is sometimes faster and/or more sensitive.
- Bowtie 2 supports gapped alignment with affine gap penalties. The number and length of gaps are not restricted, except by way of the configurable scoring scheme. Bowtie 1 finds only ungapped alignments.
- Bowtie 2 supports local alignment, which does not require reads to align end-to-end. Local alignments may be "trimmed" ("soft clipped") at one or both ends in a way that optimizes the alignment score. Bowtie 2 also supports end-to-end alignment, which, like Bowtie 1, requires the entire read to align.
- There is no upper limit on read length in Bowtie 2. Bowtie 1 had an upper limit of around 1000 bp.
- Bowtie 2 allows alignments to overlap ambiguous characters (e.g. Ns) in the reference. Bowtie 1 does not.
- Bowtie 2 does away with Bowtie 1's notion of alignment "stratum" and its distinction between "Maq-like" and "end-to-end" modes. In Bowtie 2, all alignments lie along a continuous spectrum of alignment scores defined by a scoring scheme similar to those of Needleman-Wunsch and Smith-Waterman.
- Bowtie 2's paired-end alignment is more flexible. For example, for pairs that do not align in a paired fashion, Bowtie 2 attempts to find unpaired alignments for each mate.
- Bowtie 2 reports a spectrum of mapping qualities, in contrast to Bowtie 1, which reports either 0 or high.
- Bowtie 2 does not align colorspace reads.
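Bowtie2's command-line options and its genome index format both differ from Bowtie1's.
Below are explanations of some commonly used Bowtie options; see the official manual for full details.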
End-to-end alignment versus local alignment
End-to-end (global) alignment example:
Local alignment example:
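By default, Bowtie2 performs end-to-end alignment, also called "untrimmed" or "unclipped" alignment.
With --local, Bowtie2 performs local alignment instead: it may "trim" or "clip" (soft clip) bases from one or both ends of the read to maximize the alignment score. The higher the score, the more similar the read is to the reference.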
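The soft clipping that local mode allows can be illustrated with a textbook Smith-Waterman local alignment. This is a minimal sketch, not Bowtie2's algorithm: the scores are loosely modeled on Bowtie2's local-mode defaults (match bonus 2, mismatch 6), base qualities are ignored, and a flat per-base gap penalty of 8 is used instead of Bowtie2's affine 5 + 3 × length. The function name is hypothetical.

```python
# Hypothetical illustration; a plain Smith-Waterman, not Bowtie2's implementation.
def local_align(read: str, ref: str, match: int = 2, mismatch: int = -6, gap: int = -8):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, read_start, read_end); read[:read_start] and
    read[read_end:] are the bases that would be soft clipped.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(
                0,                      # local: restart instead of carrying a negative prefix
                H[i - 1][j - 1] + sub,  # align read[i-1] with ref[j-1]
                H[i - 1][j] + gap,      # extra read base (gap in the reference)
                H[i][j - 1] + gap,      # extra reference base (gap in the read)
            )
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, best_i


if __name__ == "__main__":
    score, start, end = local_align("GGACGTACGTCC", "ACGTACGT")
    print(score, start, end)  # 16 2 10: two read bases are soft clipped at each end
```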
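Bowtie2 uses a minimum score threshold: an alignment whose score is greater than or equal to this threshold is considered "valid". The default threshold depends on the mode (in end-to-end mode there is no match bonus by default, so a perfect match scores 0 and every other alignment scores below 0, which is why that threshold is negative):
End-to-end mode: -0.6 + -0.6 × read length (--score-min L,-0.6,-0.6)
Local mode: 20 + 8.0 × ln(read length) (--score-min G,20,8)
Use --score-min to set a different threshold.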
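The two default thresholds above are instances of Bowtie2's "function options", written as TYPE,A,B and meaning f(x) = A + B × g(x), where x is the read length and g is constant (C), linear (L), square root (S) or natural log (G). A minimal sketch that evaluates such a spec and applies the validity rule; the function names are hypothetical.

```python
import math

# Hypothetical helpers for illustration; not part of Bowtie2.
# g(x) for each function type: C = constant, L = linear, S = square root, G = natural log
_G = {
    "C": lambda x: 0.0,
    "L": lambda x: float(x),
    "S": math.sqrt,
    "G": math.log,
}


def eval_function_option(spec: str, read_len: int) -> float:
    """Evaluate a spec such as 'L,-0.6,-0.6' as f(x) = A + B * g(x), x = read length."""
    kind, const, *rest = spec.split(",")
    coeff = float(rest[0]) if rest else 0.0
    return float(const) + coeff * _G[kind](read_len)


def is_valid(score: float, read_len: int, spec: str = "L,-0.6,-0.6") -> bool:
    """An alignment is valid if its score is >= the --score-min threshold."""
    return score >= eval_function_option(spec, read_len)


if __name__ == "__main__":
    for spec in ("L,-0.6,-0.6", "G,20,8"):
        print(spec, round(eval_function_option(spec, 100), 2))
    # L,-0.6,-0.6 -60.6
    # G,20,8 56.84
```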
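Mapping quality: higher = more unique
Because genomes contain many repeated sequences, a read that could have come from several repeated or similar regions cannot be assigned to one of them with certainty.
Bowtie2 therefore reports a mapping quality that expresses how confident it is that the reported location is the read's true point of origin: Q = -10 log10 p, where p is the probability that the alignment does not correspond to the read's true origin.
In a SAM file this value is stored in the MAPQ field (the fifth column).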
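The mapping-quality formula is a Phred scale, so it can be converted in both directions. A minimal sketch, with hypothetical function names; it only expresses the definition Q = -10 log10 p and does not reproduce how Bowtie2 actually estimates p for a read.

```python
import math

# Hypothetical helpers for illustration; not part of Bowtie2.


def mapq_from_p(p: float) -> float:
    """Q = -10 * log10(p), where p is the probability the alignment is wrong."""
    return -10 * math.log10(p)


def p_from_mapq(q: float) -> float:
    """Inverse of the above: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mapq_of_sam_record(line: str) -> int:
    """MAPQ is the 5th tab-separated field of a SAM alignment line."""
    return int(line.rstrip("\n").split("\t")[4])


if __name__ == "__main__":
    print(round(mapq_from_p(0.001), 1))  # 30.0
    print(round(p_from_mapq(20), 4))     # 0.01
```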
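Aligning paired-end inputs
Bowtie2 supports the paired-end and mate-pair reads produced by common sequencers. Use -1 and -2 to pass the files containing mate 1s and mate 2s respectively; both mates are written to a single SAM output, one record per mate.
--ff, --fr and --rf set the expected relative orientation of the two mates (default: --fr, suitable for standard Illumina paired-end reads).
-I and -X set the minimum and maximum fragment length for a valid paired-end alignment (defaults 0 and 500). The fragment length is the outer distance, measured from the outer end of one mate to the outer end of the other. The larger the difference between -I and -X, the slower Bowtie2 runs.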
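The orientation and fragment-length constraints can be sketched as a simple check on two mate alignments. This is a simplified model assuming --fr only (leftmost mate on the forward strand, rightmost on the reverse strand, whichever of them is mate 1) and ignoring Bowtie2's extra rules for overlapping, containing or dovetailing mates; the class and function names are hypothetical. For example, with -X 100, two 20 bp mates separated by a 60 bp gap span exactly 100 bp and pass, while a 61 bp gap fails.

```python
from dataclasses import dataclass

# Hypothetical helper types/functions for illustration; not part of Bowtie2.


@dataclass
class MateAln:
    start: int     # 0-based leftmost reference position covered by the mate
    end: int       # exclusive rightmost position
    forward: bool  # True if the mate aligned to the forward strand


def fragment_length(a: MateAln, b: MateAln) -> int:
    """Outer distance: outer end of one mate to the outer end of the other."""
    return max(a.end, b.end) - min(a.start, b.start)


def is_concordant_fr(a: MateAln, b: MateAln, min_frag: int = 0, max_frag: int = 500) -> bool:
    """Simplified --fr check with -I/-X style limits (defaults 0 and 500).

    The upstream (leftmost) mate must be on the forward strand and the
    downstream mate on the reverse strand.
    """
    up, down = (a, b) if a.start <= b.start else (b, a)
    orientation_ok = up.forward and not down.forward
    return orientation_ok and min_frag <= fragment_length(a, b) <= max_frag


if __name__ == "__main__":
    m1 = MateAln(0, 20, True)
    m2 = MateAln(80, 100, False)  # 60 bp gap between the mates
    print(fragment_length(m1, m2), is_concordant_fr(m1, m2, max_frag=100))  # 100 True
    m2_far = MateAln(81, 101, False)  # 61 bp gap
    print(is_concordant_fr(m1, m2_far, max_frag=100))  # False
```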
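By default, Bowtie 2 searches for both concordant and discordant alignments, though searching for discordant alignments can be disabled with the --no-discordant option. A concordant alignment satisfies the orientation and fragment-length constraints above; a discordant alignment is one in which both mates align uniquely but the pair does not satisfy those constraints.
If a pair has neither a concordant nor a discordant alignment, Bowtie2 by default tries to align each mate on its own (unpaired alignment).
Use --no-mixed to disable this behavior.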
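Bowtie2 records which of these outcomes applied in the YT:Z optional field of each SAM record: CP (pair aligned concordantly), DP (pair aligned discordantly), UP (read is part of a pair that aligned neither concordantly nor discordantly) and UU (read not part of a pair). A minimal sketch that tallies these values from SAM lines; the function names are hypothetical.

```python
from collections import Counter

# Hypothetical helpers for illustration; not part of Bowtie2.


def pair_type(sam_line: str) -> str:
    """Return the YT:Z value (CP, DP, UP or UU) of a SAM record, or '?' if absent."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:  # optional fields follow the 11 mandatory ones
        if field.startswith("YT:Z:"):
            return field[len("YT:Z:"):]
    return "?"


def summarize(lines) -> Counter:
    """Count records per YT:Z value, skipping header lines that start with '@'."""
    return Counter(pair_type(line) for line in lines if line and not line.startswith("@"))
```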
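Interpreting the results
By default, once Bowtie2 finds a valid alignment it generally keeps looking for alignments that are nearly as good or better. A read may therefore align to several locations, but only the best alignment found is reported. When several alignments tie for the best score, Bowtie2 picks one using a pseudo-random number generator seeded from the read's name, sequence and qualities (and --seed), so the same input produces the same choice.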
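Bowtie2 exposes some of this information in SAM optional fields: AS:i is the score of the reported alignment and XS:i, when present, is the score of the best other alignment found. A minimal sketch that flags reads where another alignment scored at least as well, i.e. where the reported location was one of several equally good choices (for concordant pairs XS:i can even exceed AS:i). The function names are hypothetical.

```python
from typing import Optional

# Hypothetical helpers for illustration; not part of Bowtie2.


def optional_int_tag(sam_line: str, tag: str) -> Optional[int]:
    """Return the integer value of an optional field such as AS:i, or None if absent."""
    prefix = tag + ":i:"
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(prefix):
            return int(field[len(prefix):])
    return None


def has_tied_best(sam_line: str) -> bool:
    """True if another alignment (XS:i) scored at least as well as the reported one (AS:i)."""
    as_score = optional_int_tag(sam_line, "AS")
    xs_score = optional_int_tag(sam_line, "XS")
    return as_score is not None and xs_score is not None and xs_score >= as_score
```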
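-D: the maximum number of consecutive seed-extension attempts (dynamic programming problems) that may "fail", i.e. yield no new best or second-best alignment, before Bowtie2 stops searching (default 15). Larger values are slower but make it more likely that the correct alignment is reported for reads that align to many places.
-R: the maximum number of times Bowtie2 "re-seeds" a read with repetitive seeds (default 2). It is usually best left at its default; reducing it may cause alignments to be missed.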
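-k N: search for up to N distinct, valid alignments per read and report each of them, in descending order of alignment score. Bowtie2 does not find alignments in any particular order, so if a read has more than N valid alignments, which N are reported is arbitrary. Every reported alignment after the first has the SAM "secondary" flag (256) set.
-a: like -k, but with no upper limit on the number of alignments reported; this can be very slow for long, repetitive genomes.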
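Because every alignment beyond the first carries the secondary flag (256), the output of -k or -a can be summarized per read by checking that bit. A minimal sketch with hypothetical function names; it skips unaligned records (flag bit 4) so that only alignments are counted.

```python
from collections import defaultdict

# Hypothetical helper for illustration; not part of Bowtie2.
UNMAPPED = 0x4     # SAM FLAG bit 4: segment unmapped
SECONDARY = 0x100  # SAM FLAG bit 256: secondary alignment


def alignments_per_read(lines) -> dict:
    """Map each read name to [primary_count, secondary_count] from SAM text lines."""
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        if not line or line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            continue  # not an alignment
        counts[qname][1 if flag & SECONDARY else 0] += 1
    return dict(counts)
```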
Ambiguous characters
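Any non-whitespace character other than A, C, G or T is considered "ambiguous".
"N" is the most common ambiguous character in reference genomes; Bowtie2 treats every ambiguous character in the reference (including IUPAC codes) as an "N".
--np sets the penalty for an alignment position where the read, the reference, or both contain an ambiguous character (default 1). --n-ceil sets the maximum number of ambiguous characters allowed in a read, as a function of read length (default L,0,0.15); reads exceeding it are filtered out.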
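A minimal sketch of these two rules, with hypothetical function names: one helper models how the reference is normalized (every non-ACGT, non-whitespace character becomes N), the other applies a linear --n-ceil of the form A + B × read length. Upper-casing the input first is an assumption of this sketch, not something stated above.

```python
# Hypothetical helpers for illustration; not part of Bowtie2.


def to_bowtie2_reference(seq: str) -> str:
    """Drop whitespace and turn every non-ACGT character into N (input upper-cased: an assumption)."""
    return "".join(c if c in "ACGT" else "N" for c in seq.upper() if not c.isspace())


def passes_n_ceil(read: str, const: float = 0.0, coeff: float = 0.15) -> bool:
    """Linear --n-ceil (default L,0,0.15): reads with more ambiguous characters than the ceiling are filtered."""
    n_ambiguous = sum(1 for c in read.upper() if c not in "ACGT")
    return n_ambiguous <= const + coeff * len(read)


if __name__ == "__main__":
    print(to_bowtie2_reference("ACGTRYN acgt"))     # ACGTNNNACGT
    print(passes_n_ceil("ACGTNNNACGTACGTACGTAC"))   # True: 3 Ns <= 0.15 * 21 = 3.15
    print(passes_n_ceil("ACGTNNNNCGTACGTACGTAC"))   # False: 4 Ns > 3.15
```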
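Bowtie2 also ships with a number of preset option combinations (e.g. --very-fast, --fast, --sensitive (the default) and --very-sensitive, plus -local counterparts such as --very-sensitive-local); see the documentation for the preset options.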