Installing and Using Bowtie2

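Bowtie2 quickly aligns short reads (50-100 bp) to a reference genome. Unlike general-purpose alignment tools such as BLAST, Bowtie works better and runs faster when aligning relatively short reads (less than 1024 bases) against a large reference genome.

Many other tools build on Bowtie, for example the widely used TopHat and Cufflinks.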

Read:      GACTGGGCGATCTCGACTTCG
           |||||  |||||||||| |||
Reference: GACTG--CGATCTCGACATCG
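To make the diagram above concrete, the following is a minimal sketch that tallies the columns of a gapped alignment. The function name tally_alignment is made up for this illustration and is not part of Bowtie2; the two strings are the read and reference rows from the diagram.

```python
def tally_alignment(read: str, reference: str) -> dict:
    """Count match, mismatch and gap columns of a gapped pairwise alignment.

    Both strings must already be aligned (same length, '-' marks a gap).
    """
    if len(read) != len(reference):
        raise ValueError("aligned strings must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for read_base, ref_base in zip(read, reference):
        if read_base == "-" or ref_base == "-":
            counts["gap"] += 1
        elif read_base == ref_base:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# The two rows of the alignment diagram above.
read = "GACTGGGCGATCTCGACTTCG"
reference = "GACTG--CGATCTCGACATCG"
print(tally_alignment(read, reference))
```

For this pair the tally is 18 matching columns, 1 mismatch and 2 gap columns, which is what the row of "|" characters encodes.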

Differences from Bowtie 1

For reads longer than about 50 bp Bowtie 2 is generally faster, more sensitive, and uses less memory than Bowtie 1. For relatively short reads (e.g. less than 50 bp) Bowtie 1 is sometimes faster and/or more sensitive.
Bowtie 2 supports gapped alignment with affine gap penalties. Number of gaps and gap lengths are not restricted, except by way of the configurable scoring scheme. Bowtie 1 finds just ungapped alignments.
Bowtie 2 supports local alignment, which doesn't require reads to align end-to-end. Local alignments might be "trimmed" ("soft clipped") at one or both extremes in a way that optimizes alignment score. Bowtie 2 also supports end-to-end alignment which, like Bowtie 1, requires that the read align entirely.
There is no upper limit on read length in Bowtie 2. Bowtie 1 had an upper limit of around 1000 bp.
Bowtie 2 allows alignments to overlap ambiguous characters (e.g. Ns) in the reference. Bowtie 1 does not.
Bowtie 2 does away with Bowtie 1's notion of alignment "stratum", and its distinction between "Maq-like" and "end-to-end" modes. In Bowtie 2 all alignments lie along a continuous spectrum of alignment scores, where the scoring scheme is similar to those of Needleman-Wunsch and Smith-Waterman.
Bowtie 2's paired-end alignment is more flexible. E.g. for pairs that do not align in a paired fashion, Bowtie 2 attempts to find unpaired alignments for each mate.
Bowtie 2 reports a spectrum of mapping qualities, in contrast to Bowtie 1, which reports either 0 or high.
Bowtie 2 does not align colorspace reads.

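Both the command-line options and the genome index format of Bowtie2 differ from those of Bowtie 1.

Some commonly used Bowtie2 options are explained below; see the official manual for the details.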

End to end alignment versus local alignment

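End-to-end alignment example:

Local alignment example:

By default Bowtie2 performs end-to-end alignment, also called "untrimmed" or "unclipped" alignment.

The --local option switches to local alignment. In this mode Bowtie2 may "trim" or "clip" the start or end of a read to maximize the alignment score; the higher the score, the more similar the read and the reference.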

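Scoring rules

Bowtie2 has a default minimum score threshold. An alignment whose score reaches or exceeds the threshold is considered "valid".

Default for end-to-end alignment: -0.6 + -0.6 × read length

Default for local alignment: 20 + 8.0 × ln(read length)

Use --score-min to set the threshold.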
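As a worked example of the two default thresholds above, here is a small sketch. The function names are invented for this illustration; only the two formulas come from the text. In the manual's --score-min notation these defaults are written L,-0.6,-0.6 and G,20,8.

```python
import math


def min_score_end_to_end(read_length: int) -> float:
    """Default end-to-end threshold: -0.6 + -0.6 * read length."""
    return -0.6 + -0.6 * read_length


def min_score_local(read_length: int) -> float:
    """Default local threshold: 20 + 8.0 * ln(read length)."""
    return 20 + 8.0 * math.log(read_length)


def is_valid(score: float, read_length: int, local: bool = False) -> bool:
    """An alignment is 'valid' when its score meets or exceeds the threshold."""
    threshold = min_score_local(read_length) if local else min_score_end_to_end(read_length)
    return score >= threshold


for length in (50, 100, 150):
    print(length, round(min_score_end_to_end(length), 2), round(min_score_local(length), 2))
```

For a 100 bp read the end-to-end threshold works out to -60.6 and the local threshold to about 56.84. The end-to-end value is negative because in that mode the best possible score is 0 and every mismatch or gap subtracts from it.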

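Mapping quality: higher = more unique

Genomes contain a large amount of repetitive sequence, so when a read originates from a repeat or from one of several similar genes, Bowtie2 cannot tell which copy the read really came from.

Bowtie2 therefore reports a mapping quality that expresses how confident it is that the read originated from the reported location: Q = -10 × log10(p), where p is the estimated probability that the alignment does not correspond to the read's true point of origin.

In a SAM file this value is stored in the MAPQ field.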
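The relation Q = -10 × log10(p) can be inverted to read a MAPQ value as an error probability. The sketch below does both conversions and pulls MAPQ out of a SAM record, where it is the fifth tab-separated column. The helper names and the example record are hypothetical and only serve the illustration.

```python
import math


def mapq_to_error_probability(mapq: float) -> float:
    """Invert Q = -10 * log10(p) to get p, the chance the placement is wrong."""
    return 10 ** (-mapq / 10)


def error_probability_to_mapq(p: float) -> float:
    """Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def mapq_from_sam_line(line: str) -> int:
    """MAPQ is the 5th mandatory column of a SAM alignment record."""
    fields = line.rstrip("\n").split("\t")
    return int(fields[4])


# Hypothetical SAM record, written out by hand for this example.
record = "read1\t0\tchr1\t100\t42\t21M\t*\t0\t0\tGACTGGGCGATCTCGACTTCG\tIIIIIIIIIIIIIIIIIIIII"
mapq = mapq_from_sam_line(record)
print(mapq, mapq_to_error_probability(mapq))
```

A MAPQ of 30, for instance, corresponds to p = 0.001, and a MAPQ of 0 corresponds to p = 1, meaning the placement carries no confidence at all.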

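Aligning paired-end inputs

Bowtie2 supports the paired-end and mate-pair reads produced by common sequencers. Use -1 and -2 to pass the two mate files of a paired-end run. The output is still a single SAM file, in which each aligned pair appears as two records, one per mate.

The options --ff, --fr and --rf specify the relative orientation of the two mates.

The options -I and -X set the minimum and maximum distance spanned by the two mates (the fragment length, also called the outer distance). The larger the difference between -I and -X, the slower Bowtie2 runs.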

By default, Bowtie 2 searches for both concordant and discordant alignments, though searching for discordant alignments can be disabled with the --no-discordant option.

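Likewise, when Bowtie2 cannot find a paired-end alignment for a pair, it aligns the two mates again as if they were unpaired reads.

Use --no-mixed to turn off this default behavior.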
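The paired-end options above can be combined into one command line. The following sketch only assembles the argument list and prints it, without running Bowtie2. The function build_paired_command and all file and index names are placeholders for this example; -x (index) and -S (SAM output) are the standard Bowtie2 options for those inputs.

```python
import shlex


def build_paired_command(index, mates1, mates2, output_sam,
                         min_fragment=None, max_fragment=None,
                         orientation="fr", no_mixed=False, no_discordant=False):
    """Assemble a bowtie2 argument list for paired-end input (does not run it)."""
    if orientation not in ("fr", "rf", "ff"):
        raise ValueError("orientation must be 'fr', 'rf' or 'ff'")
    cmd = ["bowtie2", "-x", index, "-1", mates1, "-2", mates2, "-S", output_sam]
    cmd.append("--" + orientation)           # relative orientation of the mates
    if min_fragment is not None:
        cmd += ["-I", str(min_fragment)]     # minimum fragment length
    if max_fragment is not None:
        cmd += ["-X", str(max_fragment)]     # maximum fragment length
    if no_mixed:
        cmd.append("--no-mixed")             # no unpaired fallback for the mates
    if no_discordant:
        cmd.append("--no-discordant")        # skip discordant alignments
    return cmd


# Placeholder index and file names.
cmd = build_paired_command("genome_index", "reads_1.fq", "reads_2.fq", "out.sam",
                           min_fragment=100, max_fragment=500, no_mixed=True)
print(shlex.join(cmd))
```

Passing the list to subprocess.run would execute it; keeping the arguments as a list avoids shell quoting problems with unusual file names.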

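Interpreting the results

After finding one valid alignment, Bowtie2 normally keeps searching for alignments with an equal or higher score. A read may map to several different locations, but Bowtie2 reports only the best-scoring one. When several alignments share the best score, it picks one of them using a pseudo-random number.

Option -D: sets the upper limit on the number of consecutive dynamic programming problems (seed extensions) that may fail before Bowtie2 stops searching.

Option -R: sets the maximum number of times Bowtie2 will re-seed a read that has repetitive seeds (usually best left unchanged; lowering it can cause alignments to be missed).

Option -k, followed by an integer N: report up to N valid alignments per read. Bowtie2 does not find alignments in any particular order, so for reads with more than N valid alignments the N reported are not guaranteed to be the best-scoring ones.

Option -a: report every valid alignment found, with no upper limit.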
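The default reporting rule described above (report the best-scoring alignment, break ties pseudo-randomly) can be sketched as follows. This is a conceptual illustration only, not Bowtie2's actual implementation: the function name, the (position, score) tuples and the seeding scheme are all assumptions made for the example.

```python
import random


def pick_reported_alignment(alignments, seed=0):
    """Return one best-scoring alignment; ties are broken pseudo-randomly.

    alignments: list of (position, score) tuples for a single read.
    seed: fixed seed so the choice is reproducible between runs.
    """
    best_score = max(score for _, score in alignments)
    tied = [aln for aln in alignments if aln[1] == best_score]
    rng = random.Random(seed)
    return rng.choice(tied)


# Hypothetical valid alignments of one read: two tie for the best score.
candidates = [(1200, -12), (58000, -6), (910000, -6)]
chosen = pick_reported_alignment(candidates, seed=7)
print(chosen)
```

Because the generator is seeded, the same input yields the same choice every time, which is the point of using a pseudo-random rather than a truly random pick.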

Ambiguous characters

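Any non-whitespace character other than "A", "C", "G" or "T" is considered "ambiguous".

"N" is a common ambiguous character in reference genomes, and Bowtie2 treats every ambiguous character in the reference as "N".

The option --np sets the penalty for an alignment position that involves an ambiguous character, and --n-ceil sets the upper limit on the number of ambiguous characters allowed in an alignment.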
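The definition of an ambiguous character translates directly into code. The sketch below counts ambiguous characters and checks a read against a linear ceiling of the form constant + coefficient × read length. The function names are invented for this example, case-insensitive matching is an assumption, and the default values 0 and 0.15 are taken to mirror the manual's documented --n-ceil default (L,0,0.15); check your version's manual before relying on them.

```python
def count_ambiguous(sequence: str) -> int:
    """Count non-whitespace characters other than A, C, G, T.

    Lower-case bases are treated like upper-case ones (an assumption here).
    """
    unambiguous = set("ACGT")
    return sum(1 for ch in sequence
               if not ch.isspace() and ch.upper() not in unambiguous)


def exceeds_n_ceiling(read: str, constant: float = 0.0, coefficient: float = 0.15) -> bool:
    """True if the read has more ambiguous characters than constant + coefficient * length."""
    length = sum(1 for ch in read if not ch.isspace())
    ceiling = constant + coefficient * length
    return count_ambiguous(read) > ceiling


print(count_ambiguous("ACGTNNRY"))      # N, N, R and Y are ambiguous
print(exceeds_n_ceiling("ACGTACGTAN"))  # 1 ambiguous base in a 10 bp read
```

With these defaults a 10 bp read may contain at most one ambiguous character (ceiling 1.5), so the second call reports that the ceiling is not exceeded.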

Bowtie2 also ships with a number of presets; see the documentation for the preset options.