Alignment software: topic notes


Figure: An illustration of relationships between alignment methods. The applications and corresponding computational restrictions shown are: (green) short pairwise alignment / detailed edit model; (yellow) database search / divergent homology detection; (red) whole genome alignment / alignment of long sequences with structural rearrangements; and (blue) short read mapping / rapid alignment of massive numbers of short sequences. Although solely illustrative, methods with more similar data structures or algorithmic approaches are on closer branches. The BLASR method combines data structures from short read alignment with optimization methods from whole genome alignment.

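I have not used many alignment tools. I only know the simple global and local alignment algorithms, and I know almost nothing about how the tools work internally.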
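For reference, the difference between global and local alignment can be shown with one small dynamic-programming routine. This is only a sketch: the function name `align_score` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration, not parameters taken from any of the tools listed here.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b with a linear gap penalty.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    Scoring values are illustrative assumptions.
    """
    # First row: a global alignment pays for leading gaps, a local one does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never go below 0.
                score = max(score, 0)
                best = max(best, score)
            cur.append(score)
        prev = cur
    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else prev[-1]
```

With these assumed scores, aligning `TTGATTACATT` against `GATTACA` gives a negative global score because the four flanking bases must be paid for as gaps, while the local score is simply that of the exact `GATTACA` match.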

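Aligners I have used so far: bwa, bowtie, blasr, SHRiMP, DALIGNER, MHAP, blast, blat, SOAP, Subread, NovoAlign, Maq

Others: MEGABLAST, Mummer, GMAP, STAR, DIAMOND, ELAND, RMAP, ZOOM, SeqMap, CloudBurst

I will build this up gradually and compare how these tools differ, because alignment is the lowest layer of bioinformatics: once sequencing hands you a pile of sequences, the first thing to do is align them.

Start with a good article: Aligner tutorial: GMAP, STAR, BLAT, and BLASR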

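What are the common kinds of nucleotide sequence alignment?

  1. Second-generation short reads to a genome
  2. Third-generation long reads to a genome
  3. Spliced alignment
  4. Second-generation reads against third-generation reads
  5. Genome against genome
  6. Multiple sequence alignment
  7. Database search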

BWA


Burrows-Wheeler Aligner

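Scope: fast alignment of second-generation sequencing data to a genome

bwa is the model program of the sequence alignment world: small, capable, and suitable for many scenarios. It is well worth understanding its internal alignment algorithm, and ideally how that algorithm is implemented.

Fast and accurate short read alignment with Burrows–Wheeler transform - 2009 (online PDF, original article)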

lh3/bwa – Github    Burrow-Wheeler Aligner for pairwise alignment between DNA sequences

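  1. BWA-backtrack: alignment of Illumina reads, designed for reads up to 100bp (aln/samse/sampe)
  2. BWA-SW: long-read alignment, for lengths from 70bp to 1Mbp; supports split alignment (bwasw)
  3. BWA-MEM: the latest and most commonly used; same scope as BWA-SW but faster and more accurate, and it also outperforms BWA-backtrack on 70-100bp reads (mem)

There are three main papers on BWA: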

  1. Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]. (if you use the BWA-backtrack algorithm)
  2. Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]. (if you use the BWA-SW algorithm)
  3. Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN]. (if you use the BWA-MEM algorithm or the fastmap command, or want to cite the whole BWA package)

The design ideas behind BWA

Short-read alignment and assembly algorithms in next-generation sequencing technology - master's thesis
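As a rough picture of the idea the BWA name refers to, here is a minimal sketch of a Burrows-Wheeler transform with exact-match backward search. It is an illustration under stated assumptions, not BWA's implementation: the function names `bwt_index` and `backward_search` are hypothetical, the suffix array is built by naive sorting, the Occ table is stored in full, and only exact matches are found, whereas BWA-backtrack also handles mismatches and gaps.

```python
def bwt_index(text):
    """Build suffix array, BWT, C table and Occ table for text + '$'.

    Naive construction for illustration only; assumes '$' does not occur
    in text and sorts before every other character.
    """
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wrapping around).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] for ch in alphabet}
    for b in bwt:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return sa, bwt, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

For example, indexing `banana` yields the BWT `annb$aa`, and searching for `ana` narrows the suffix-array range one character at a time until it covers the two occurrences at positions 1 and 3. The cost of the search depends on the pattern length rather than the text length, which is the property that makes this family of indexes attractive for mapping many short reads.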


Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.15-r1140
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]

Command: index         index sequences in the FASTA format
         mem           BWA-MEM algorithm
         fastmap       identify super-maximal exact matches
         pemerge       merge overlapping paired ends (EXPERIMENTAL)
         aln           gapped/ungapped alignment
         samse         generate alignment (single ended)
         sampe         generate alignment (paired ended)
         bwasw         BWA-SW for long queries

         shm           manage indices in shared memory
         fa2pac        convert FASTA to PAC format
         pac2bwt       generate BWT from PAC
         pac2bwtgen    alternative algorithm for generating BWT
         bwtupdate     update .bwt to the new format
         bwt2sa        generate SA from BWT and Occ

Note: To use BWA, you need to first index the genome with `bwa index'.
      There are three alignment algorithms in BWA: `mem', `bwasw', and
      `aln/samse/sampe'. If you are not sure which to use, try `bwa mem'
      first. Please `man ./bwa.1' for the manual.
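The note in the usage text (index first, then try `bwa mem`) can be turned into a small helper that only builds the two command lines. This is a sketch: the helper name `bwa_mem_commands`, the file names, and the default thread count are assumptions; `bwa index`, `bwa mem`, and the `-t` thread option are part of bwa, and `bwa mem` writes SAM to standard output, hence the redirect.

```python
import shlex


def bwa_mem_commands(reference, reads1, reads2=None, threads=4, out_sam="aln.sam"):
    """Build (but do not run) shell commands: index the genome, then bwa mem.

    Helper name, file names and defaults are illustrative assumptions.
    """
    index_cmd = ["bwa", "index", reference]
    mem_cmd = ["bwa", "mem", "-t", str(threads), reference, reads1]
    if reads2 is not None:
        mem_cmd.append(reads2)  # paired-end: second FASTQ file

    def to_shell(argv):
        # Quote every argument so odd file names survive the shell.
        return " ".join(shlex.quote(arg) for arg in argv)

    # bwa mem prints SAM records to stdout, so redirect them to a file.
    return [to_shell(index_cmd), f"{to_shell(mem_cmd)} > {shlex.quote(out_sam)}"]
```

Calling `bwa_mem_commands("ref.fa", "r1.fq", "r2.fq")` returns the indexing command followed by the paired-end alignment command; the strings can be inspected, pasted into a shell, or handed to a job scheduler.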

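Practical algorithm implementation, part 8: suffix trees and suffix arrays [1. Introduction]

bwa mem

These days most people use only bwa's mem algorithm.

It deserves a separate note of its own.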

 

SOAPaligner/soap2

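soap2 - official site

The SOAP tools are distributed without source code, as binary executables only, so there is nothing to compile or install. As with bwa, you build an index first and then align.

SOAP is not very memory hungry: loading the roughly 3 Gb human genome takes about 7 GB of memory, and the alignment that follows needs little beyond that.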

./2bwt-builder ~/human_genome.fa
./soap -a <reads_a> -D <index.files> -o <output>
./soap -a <reads_a> -b <reads_b> -D <index.files> -o <PE_output> -2 <SE_output> -m <min_insert_size> -x <max_insert_size>

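I previously had no impression of SOAP at all, although quite a few colleagues use the SOAP family of tools.

What changed my mind was a slide deck showing that SOAP does have some advantages in alignment.

[Two comparison figures from the slides, not preserved here]

According to those slides (Mapping.ppt), SOAP tolerates a fairly high error rate and also handles indels well. That is exactly what I need right now, so I may try using SOAP to align second-generation reads to third-generation reads.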

 

 

BLASR


Basic Local Alignment with Successive Refinement

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory - BMC Bioinformatics

 

To be continued.

