SNP鑒定的標准流程有三個步驟:
一、mapping
使用BWA MEM將每個樣本的測序數據比對到組裝的參考序列。建立索引和比對:
$ bwa index ref.fa
$ bwa mem –t 40 ref.fa read1.fq read2.fq >aln-pe.sam
二、轉換
$ samtools view -bS aln-pe.sam > aln-pe.bam
$ samtools sort -@ 40 aln-6.bam > aln-6.sorted.bam
$ samtools sort –o aln-pe-sorted.bam aln-pe.bam
$ samtools index aln-pe.sorted.bam
三、SNP calling
$ samtools mpileup -uf Trinity-longest.fasta aln-6.sorted.bam | bcftools call -Ou -mv > var.raw.bcf
$ bcftools view var.raw.bcf | bcftools filter -e 'QUAL<30 || DP<20' > val.filter.vcf
$ bcftools view val.filter.vcf | bcftools filter -i 'TYPE="snp"' > val.snp.vcf
$ bcftools mpileup –Ou –f ref.fa aln-pe.sorted.bam | bcftools call –Ou –mv | bcftools filter -s LowQual -e ‘%QUAL<20 || DP<10’ > var.flt.vcf
https://wikis.utexas.edu/display/bioiteam/variant+calling+using+samtools
The basic Command line
Suppose we have reference sequences in ref.fa, indexed by samtools faidx, and position sorted alignment files aln1.bam and aln2.bam, the following command lines call SNPs and short INDELs:
- samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
- bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
where the -D option sets the maximum read depth to call a SNP. SAMtools acquires sample information from the SM tag in the @RG header lines. One alignment file can contain multiple samples; reads from one sample can also be distributed in different alignment files. SAMtools will regroup the reads anyway. In addition, if no @RG lines are present, each alignment file is taken as one sample.
Since r865, it is possible to generate the consensus sequence with
- samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq