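How should the alignment results for RNA-seq data be interpreted? Many people ask this online, so here is a rough summary.
HISAT2 and Bowtie 2 print an alignment summary in the same format after alignment, as follows: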
Alignment summary
When HISAT2 finishes running, it prints messages summarizing what happened. These messages are printed to the "standard error" ("stderr") filehandle. For datasets consisting of unpaired reads, the summary might look like this:
Alignment results for single-end data look like this:
20000 reads; of these:
20000 (100.00%) were unpaired; of these:
1247 (6.24%) aligned 0 times
18739 (93.69%) aligned exactly 1 time
14 (0.07%) aligned >1 times
93.77% overall alignment rate
For datasets consisting of pairs, the summary might look like this:
Alignment results for paired-end data look like this:
10000 reads; of these:
10000 (100.00%) were paired; of these:
650 (6.50%) aligned concordantly 0 times
8823 (88.23%) aligned concordantly exactly 1 time
527 (5.27%) aligned concordantly >1 times
----
650 pairs aligned concordantly 0 times; of these:
34 (5.23%) aligned discordantly 1 time
----
616 pairs aligned 0 times concordantly or discordantly; of these:
1232 mates make up the pairs; of these:
660 (53.57%) aligned 0 times
571 (46.35%) aligned exactly 1 time
1 (0.08%) aligned >1 times
96.70% overall alignment rate
(Copied from the HISAT2 official website)
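To work with these numbers programmatically, the count lines of a summary can be pulled out with a regular expression. The following is only a minimal sketch written against the summary text shown above: the names COUNT_LINE and parse_summary are hypothetical helpers, not part of HISAT2 or Bowtie 2, and the pattern assumes the exact wording of the example.

```python
import re

# Matches count lines such as "8823 (88.23%) aligned concordantly exactly 1 time".
# Lines without a "(xx.xx%)" field, e.g. "10000 reads; of these:", are skipped.
COUNT_LINE = re.compile(r"^\s*(\d+) \(\d+\.\d+%\) (.+?)(?:; of these:)?\s*$")


def parse_summary(text):
    """Return {description: count} for every count line in a summary.

    Hypothetical helper: assumes the wording of the example summary above,
    in which each description occurs only once.
    """
    counts = {}
    for line in text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


PAIRED_SUMMARY = """\
10000 reads; of these:
  10000 (100.00%) were paired; of these:
    650 (6.50%) aligned concordantly 0 times
    8823 (88.23%) aligned concordantly exactly 1 time
    527 (5.27%) aligned concordantly >1 times
    ----
    650 pairs aligned concordantly 0 times; of these:
      34 (5.23%) aligned discordantly 1 time
    ----
    616 pairs aligned 0 times concordantly or discordantly; of these:
      1232 mates make up the pairs; of these:
        660 (53.57%) aligned 0 times
        571 (46.35%) aligned exactly 1 time
        1 (0.08%) aligned >1 times
96.70% overall alignment rate
"""

for description, count in parse_summary(PAIRED_SUMMARY).items():
    print(f"{count:>6}  {description}")
```

Because the summary is written to stderr, the text would have to be captured from there (for example by redirecting stderr to a file) before it is passed to a helper like this.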
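There is not much to say about the single-end case, so let's focus on paired-end alignment:
What does "650 (6.50%) aligned concordantly 0 times" mean? The first section describes concordant alignments in paired-end mode. "Aligned concordantly" means that read1 and read2 both align to the genome/transcriptome in a way that fits paired-end expectations, so these 650 pairs are the ones for which no such alignment was found.
In "8823 (88.23%) aligned concordantly exactly 1 time", "exactly 1 time" means there is only one such alignment. ">1 times" means read1 and read2 can align together at more than one location.
The second section covers discordant alignments in paired-end mode.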
650 pairs aligned concordantly 0 times; of these: 34 (5.23%) aligned discordantly 1 time
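"concordantly 0 times" means read1 and read2 cannot both align to the genome/transcriptome together in a way that fits paired-end expectations. "aligned discordantly 1 time" is the hardest part to understand.
Below is the explanation of discordant alignment from the Bowtie 2 official website: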
Concordant pairs match pair expectations, discordant pairs don’t
A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align “concordantly”. If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren’t in the expected relative orientation, or aren’t within the expected distance range, or both), the pair is said to align “discordantly”. Discordant alignments may be of particular interest, for instance, when seeking structural variants.
The expected relative orientation of the mates is set using the --ff, --fr, or --rf options. The expected range of inter-mates distances (as measured from the furthest extremes of the mates; also called “outer distance”) is set with the -I and -X options. Note that setting -I and -X far apart makes Bowtie 2 slower. See documentation for -I and -X.
To declare that a pair aligns discordantly, Bowtie 2 requires that both mates align uniquely. This is a conservative threshold, but this is often desirable when seeking structural variants.
By default, Bowtie 2 searches for both concordant and discordant alignments, though searching for discordant alignments can be disabled with the --no-discordant option.
Mixed mode: paired where possible, unpaired otherwise
If Bowtie 2 cannot find a paired-end alignment for a pair, by default it will go on to look for unpaired alignments for the constituent mates. This is called “mixed mode.” To disable mixed mode, set the --no-mixed option.
Bowtie 2 runs a little faster in --no-mixed mode, but will only consider alignment status of pairs per se, not individual mates.
Key point: If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren’t in the expected relative orientation, or aren’t within the expected distance range, or both), the pair is said to align “discordantly”.
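A discordant alignment means read1 and read2 both align, but not in a way that fits paired-end expectations:
1. The orientation is wrong: in paired-end sequencing the relative orientation of the two mates is fixed.
2. The distance is wrong: the insert size spanned by read1 and read2 is bounded, and the alignment falls outside that range.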
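The two conditions can be sketched as a small check. This is a simplified illustration, not Bowtie 2's actual logic: it assumes both mates already align uniquely to the same reference sequence, that the expected orientation is forward/reverse (--fr), and it ignores the finer cases Bowtie 2 handles (mates that overlap, contain, or dovetail each other). The Mate type and classify_pair function are hypothetical names; the default limits of 0 and 500 are meant to mirror the documented Bowtie 2 defaults for -I and -X.

```python
from typing import NamedTuple


class Mate(NamedTuple):
    start: int   # leftmost aligned reference position (0-based)
    end: int     # one past the rightmost aligned reference position
    strand: str  # "+" for forward, "-" for reverse


def classify_pair(mate1, mate2, min_insert=0, max_insert=500):
    """Label a uniquely aligned pair as "concordant" or "discordant".

    Assumptions (illustrative only): same reference sequence, --fr
    orientation expected, distance measured between the outermost ends.
    """
    # Condition 1: the upstream mate must be forward, the downstream one reverse.
    left, right = sorted([mate1, mate2], key=lambda m: m.start)
    orientation_ok = left.strand == "+" and right.strand == "-"

    # Condition 2: the outer distance must lie within [min_insert, max_insert].
    outer_distance = max(mate1.end, mate2.end) - min(mate1.start, mate2.start)
    distance_ok = min_insert <= outer_distance <= max_insert

    return "concordant" if orientation_ok and distance_ok else "discordant"


print(classify_pair(Mate(100, 150, "+"), Mate(300, 350, "-")))    # concordant
print(classify_pair(Mate(100, 150, "-"), Mate(300, 350, "+")))    # discordant: wrong orientation
print(classify_pair(Mate(100, 150, "+"), Mate(5000, 5050, "-")))  # discordant: too far apart
```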
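The third section covers the remaining reads (pairs that align neither concordantly nor discordantly 1 time), which are aligned in single-end mode. There is not much to say about it.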
The interpretation above comes from SEQanswers:
16182999 reads; of these:
16182999 (100.00%) were paired; of these:
5731231 (35.42%) aligned concordantly 0 times
4522376 (27.95%) aligned concordantly exactly 1 time
5929392 (36.64%) aligned concordantly >1 times
----
5731231 pairs aligned concordantly 0 times; of these:
2381431 (41.55%) aligned discordantly 1 time
----
3349800 pairs aligned 0 times concordantly or discordantly; of these:
6699600 mates make up the pairs; of these:
3814736 (56.94%) aligned 0 times
1883429 (28.11%) aligned exactly 1 time
1001435 (14.95%) aligned >1 times
88.21% overall alignment rate
the Bowtie2 result summary is divided in 3 sections:
Concordant alignment - In your data (4522376 + 5929392) reads align concordantly. Which is 64.59% of reads
Discordant alignment - So now 5731231 reads remain which is 35.41% (100-64.59). Of these, 2381431 reads align discordantly. That is to say, of the non-concordant fraction, 41.55% of reads (2381431 reads) align discordantly.
The rest - Now, remember that alignment whether concord. or discord., but both are aligned in paired-end mode. The rest of the reads either align as singles (i.e. Read1 in one locus & Read2 in completely different locus or one mate aligned and the other unaligned) or may not align at all. So the reads that are in this section is Total -(Concord.+Discord.). That is 16182999 -(10451768+2381431) = 3349800 reads.
Since alignment, if any, here is in single fashion so we calculate in mates (readsx2).
Now to reach the overall alignment, count the mates in total (i.e. mates aligned in paired and mates aligned in single fashion). That would be -
(10451768 x2)+(2381431 x2)+1883429+1001435 = 28551262 mates
That is 28551262 mates aligned of total (16182999 x2) mates, which is 88.21%.
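The arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the calculation as described in the SEQanswers reply; overall_alignment_rate is a hypothetical helper name, and its arguments are simply the counts read off the summary.

```python
def overall_alignment_rate(total_pairs, concordant_once, concordant_multi,
                           discordant_once, mates_once, mates_multi):
    """Return the overall alignment rate (in percent) of a paired-end run.

    Pairs aligned concordantly or discordantly contribute two mates each;
    the remaining mates are counted individually.
    """
    aligned_mates = 2 * (concordant_once + concordant_multi + discordant_once)
    aligned_mates += mates_once + mates_multi
    return 100.0 * aligned_mates / (2 * total_pairs)


# Counts from the SEQanswers example.
rate = overall_alignment_rate(16182999, 4522376, 5929392, 2381431,
                              1883429, 1001435)
print(f"{rate:.2f}%")  # 88.21%

# Counts from the HISAT2 paired-end example at the top of this post.
rate = overall_alignment_rate(10000, 8823, 527, 34, 571, 1)
print(f"{rate:.2f}%")  # 96.70%
```

Both results agree with the "overall alignment rate" line of the corresponding summary.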
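Why distinguish so many alignment types? Wouldn't something simpler do? No.
You will appreciate how important these alignment categories are once you analyze a real project. More on that when time permits. See you next time.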