An illustration of relationships between alignment methods.
The applications / corresponding computational restrictions shown are (green) short pairwise alignment / detailed edit model;
(yellow) database search / divergent homology detection;
(red) whole genome alignment / alignment of long sequences with structural rearrangements;
and (blue) short read mapping / rapid alignment of massive numbers of short sequences. Although solely illustrative, methods with more similar data structures or algorithmic approaches are on closer branches.
The BLASR method combines data structures from short read alignment with optimization methods from whole genome alignment.
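I have not worked with alignment tools in much depth. I only know the basic global and local alignment algorithms, and I know almost nothing about how the aligners actually work inside.
Aligners I have used so far: bwa, bowtie, blasr, SHRiMP, DALIGNER, MHAP, blast, blat, SOAP, Subread, NovoAlign, Maq
Others: MEGABLAST, Mummer, GMAP, STAR, DIAMOND, ELAND, RMAP, ZOOM, SeqMap, CloudBurst
I will build this up gradually and compare how these tools differ, because alignment is the lowest layer of bioinformatics: once sequencing hands you a pile of sequences, the first thing to do is align them.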
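As a reminder of what the basic global and local alignment algorithms compute, here is a minimal Python sketch that returns both scores from the same dynamic-programming loop. The function name `alignment_scores` and the scoring values (match +1, mismatch -1, linear gap -1) are my own illustrative assumptions, not parameters taken from any of the tools listed above.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-1):
    """Return (global_score, local_score) for sequences a and b.

    Uses a linear gap penalty; all scoring values are illustrative assumptions.
    """
    m = len(b)
    # Global (Needleman-Wunsch): the first row pays for leading gaps.
    g_prev = [j * gap for j in range(m + 1)]
    # Local (Smith-Waterman): cells are floored at 0, so the borders stay 0.
    l_prev = [0] * (m + 1)
    best_local = 0
    for i in range(1, len(a) + 1):
        g_cur = [i * gap] + [0] * m  # first column also pays for leading gaps
        l_cur = [0] * (m + 1)
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            g_cur[j] = max(g_prev[j - 1] + s, g_prev[j] + gap, g_cur[j - 1] + gap)
            l_cur[j] = max(0, l_prev[j - 1] + s, l_prev[j] + gap, l_cur[j - 1] + gap)
            best_local = max(best_local, l_cur[j])
        g_prev, l_prev = g_cur, l_cur
    # Global score is the bottom-right cell; local score is the best cell anywhere.
    return g_prev[m], best_local


if __name__ == "__main__":
    print(alignment_scores("TTTACGTAAA", "GGACGTCC"))
```

The global score has to account for every base of both sequences, while the local score only rewards the best-matching substring pair (here the shared ACGT), which is why the two can differ so much on the same input.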
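Start with a good article: Aligner tutorial: GMAP, STAR, BLAT, and BLASR
What are the common kinds of nucleotide sequence alignment?
- Second-generation short reads aligned to a genome
- Third-generation long reads aligned to a genome
- Spliced (transcript) alignment
- Second-generation reads aligned against third-generation reads
- Genome-to-genome alignment
- Multiple sequence alignment
- Database search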
BWA
Burrows-Wheeler Aligner
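Use case: fast alignment of second-generation sequencing data to a genome
bwa is the model program of the sequence alignment world: small, capable, and usable in many settings. It is well worth understanding its internal alignment algorithm, and ideally how that algorithm is implemented.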
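To get a feel for the idea behind the name, here is a minimal Python sketch of a Burrows-Wheeler transform with exact-match backward search over it. This is a toy under stated assumptions, not bwa's implementation: the suffix array is built by naive sorting, the occurrence table is stored uncompressed, only exact matches are found (no mismatches or gaps), and the names `bwt_index` and `find` are hypothetical.

```python
def bwt_index(text):
    """Build a toy BWT index for text (which must not contain '$')."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c]: number of characters in text that are strictly smaller than c.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, bwt, C, occ


def find(pattern, index):
    """Return sorted start positions of exact matches of pattern."""
    sa, bwt, C, occ = index
    lo, hi = 0, len(bwt)  # half-open range of suffix array rows
    for ch in reversed(pattern):  # extend the match one base to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    idx = bwt_index("GATTACAGATTACA")
    print(find("ATTA", idx))
```

Each pattern character costs two table lookups regardless of how long the reference is, which is the property that makes this family of indexes attractive for mapping very large numbers of short reads.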
Fast and accurate short read alignment with Burrows–Wheeler transform - 2009 (online PDF, original paper)
lh3/bwa – Github Burrow-Wheeler Aligner for pairwise alignment between DNA sequences
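- BWA-backtrack: for Illumina reads, up to 100bp (aln/samse/sampe)
- BWA-SW: long-read alignment, for lengths from 70bp to 1Mbp; supports split alignment (bwasw)
- BWA-MEM: the latest and most widely used; same features as BWA-SW but faster and more accurate, and it also outperforms BWA-backtrack on 70-100bp reads (mem)
There are three main academic papers on BWA: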
- Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]. (if you use the BWA-backtrack algorithm)
- Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]. (if you use the BWA-SW algorithm)
- Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]. (if you use the BWA-MEM algorithm or the fastmap command, or want to cite the whole BWA package)
Short-read alignment and assembly algorithms for next-generation sequencing - master's thesis
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.15-r1140
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.
bwa mem
These days almost everyone uses only bwa's mem algorithm.
It deserves a separate note of its own.
SOAPaligner/soap2
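soap2 - official site
The SOAP family has not released its source code; everything ships as binary executables, so there is nothing to install. As with bwa, you build an index first and then align.
SOAP is not very memory-hungry: loading the roughly 3 Gb human genome into memory takes only about 7 GB of RAM, and the alignment that follows needs little additional memory.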
./2bwt-builder ~/human_genome.fa
./soap -a <reads_a> -D <index.files> -o <output>
./soap -a <reads_a> -b <reads_b> -D <index.files> -o <PE_output> -2 <SE_output> -m <min_insert_size> -x <max_insert_size>
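I previously knew next to nothing about SOAP, but quite a few colleagues use tools from the SOAP family.
What changed my mind was a PPT I read: SOAP does have its own strengths as an aligner.
As the slides show, SOAP tolerates a fairly high error rate and handles indels well, which is exactly what I need right now. I could try using SOAP to align second-generation reads to third-generation reads. Mapping.ppt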
BLASR
Basic Local Alignment with Successive Refinement
Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory - BMC Bioinformatics
To be continued~