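Most of the data I have worked with recently is targeted sequencing or whole-exome sequencing data, so assessing coverage depth and target capture efficiency has become an essential part of data quality control.
I used to compute per-base depth with samtools depth and then calculate depth and capture efficiency with Perl. Today I came across bamdst (https://github.com/shiquan/bamdst), which turns out to be convenient to use. Following its GitHub page, I am recording how to use it here.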
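For comparison, the old approach (per-base depth from samtools depth, then a script) can be sketched as below. This uses awk instead of the Perl I used before, and `depth_summary` is a hypothetical helper, not part of samtools or bamdst. It assumes the input is the three-column, tab-separated output of `samtools depth -a`, and reports the mean depth plus the fraction of bases at or above each cutoff.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of samtools or bamdst): summarise per-base depth
# as printed by `samtools depth -a`, i.e. tab-separated <chrom> <pos> <depth>.
# Prints the number of bases, the mean depth, and for each cutoff N the fraction
# of bases with depth >= N. For integer depths, ">=1x" is the same as ">0x".
depth_summary() {
  local cutoffs="${1:-1,4,10,30,100}"   # comma-separated depth cutoffs
  awk -F'\t' -v cutoffs="$cutoffs" '
    BEGIN { n = split(cutoffs, c, ",") }
    {
      bases++
      sum += $3
      for (i = 1; i <= n; i++)
        if ($3 + 0 >= c[i] + 0) hit[i]++   # count bases reaching each cutoff
    }
    END {
      if (bases == 0) { print "no positions read" > "/dev/stderr"; exit 1 }
      printf "bases\t%d\n", bases
      printf "mean_depth\t%.2f\n", sum / bases
      for (i = 1; i <= n; i++)
        printf ">=%sx\t%.4f\n", c[i], hit[i] / bases
    }'
}
```

Usage: samtools depth -a -b MT-RNR1.bed test.bam | depth_summary 1,4,10,30,100
The -a flag matters: without it samtools depth omits zero-depth positions and the mean is inflated. samtools and bamdst may also apply different read filters, so the numbers need not match bamdst's report exactly.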
Download and install: download the source archive, unpack it, then build:
cd ./bamdst-master
make
Once it is built, you need a .bed file and a .bam file. Using the example files MT-RNR1.bed and test.bam that ship with the software:
./bamdst-master/bamdst -p ./bamdst-master/example/MT-RNR1.bed -o ./test ./bamdst-master/example/test.bam
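bamdst reads the target regions from the BED file passed to -p, so before running it on your own panel it can be worth sanity-checking that file. The sketch below uses a hypothetical helper, `bed_target_length` (not part of bamdst), which flags malformed lines and prints the total target length. BED coordinates are 0-based and half-open, so an interval covers end - start bases; overlapping intervals are merged so shared bases are counted once. Whether this matches bamdst's [Target] Len of region for a BED with overlaps is an assumption worth checking on your own file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a BED file and print the total target length,
# counting bases shared by overlapping intervals only once.
# BED is 0-based and half-open: [start, end) covers end - start bases.
bed_target_length() {
  LC_ALL=C sort -k1,1 -k2,2n "$1" | awk -F'\t' '
    /^(#|track|browser)/ || NF == 0 { next }        # skip header and blank lines
    NF < 3 || $3 + 0 <= $2 + 0 {                    # malformed or empty interval
      print "bad BED line: " $0 > "/dev/stderr"; bad = 1; next
    }
    $1 != chr || $2 + 0 > end {                     # starts a new merged interval
      if (chr != "") total += end - start
      chr = $1; start = $2 + 0; end = $3 + 0; next
    }
    $3 + 0 > end { end = $3 + 0 }                   # overlaps: extend current interval
    END {
      if (chr != "") total += end - start
      print total + 0
      exit bad                                      # non-zero if any line was bad
    }'
}
```

Usage: bed_target_length ./bamdst-master/example/MT-RNR1.bed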
The output consists of 7 files:
- coverage.report
- cumu.plot
- insert.plot
- chromosome.report
- region.tsv.gz
- depth.tsv.gz
- uncover.bed
Of these, coverage.report carries the most information; its fields are described below:
[Total] Raw Reads (All reads) // All reads in the BAM file(s).
[Total] QC Fail reads // Number of reads that failed QC. This flag is set by other software, such as bwa; see the FLAG field in the BAM format.
[Total] Raw Data(Mb) // Total read data in the BAM file(s).
[Total] Paired Reads // Number of paired reads.
[Total] Mapped Reads // Number of mapped reads.
[Total] Fraction of Mapped Reads // Ratio of mapped reads to raw reads.
[Total] Mapped Data(Mb) // Mapped data in the BAM file(s).
[Total] Fraction of Mapped Data(Mb) // Ratio of mapped data to raw data.
[Total] Properly paired // Paired reads with a proper insert size. See the BAM format specification for details.
[Total] Fraction of Properly paired // Ratio of properly paired reads to mapped reads.
[Total] Read and mate paired // Number of reads where the read (read1) and its mate (read2) are paired.
[Total] Fraction of Read and mate paired // Ratio of read-and-mate-paired reads to mapped reads.
[Total] Singletons // Read mapped but its mate unmapped, or vice versa.
[Total] Read and mate map to diff chr // Read and mate mapped to different chromosomes, usually because of mapping errors or structural variants.
[Total] Read1 // First reads (read1) in paired-end sequencing.
[Total] Read2 // Mate reads (read2).
[Total] Read1(rmdup) // Read1 count after removing duplicates.
[Total] Read2(rmdup) // Read2 count after removing duplicates.
[Total] forward strand reads // Number of forward-strand reads.
[Total] backward strand reads // Number of backward (reverse) strand reads.
[Total] PCR duplicate reads // Number of PCR duplicate reads.
[Total] Fraction of PCR duplicate reads // Ratio of PCR duplicate reads.
[Total] Map quality cutoff value // Mapping-quality cutoff, set with -q. The default is 20, because some variant callers, such as GATK, only consider high-quality reads.
[Total] MapQuality above cutoff reads // Number of reads with mapping quality greater than or equal to the cutoff.
[Total] Fraction of MapQ reads in all reads // Ratio of reads with mapping quality at or above the cutoff to raw reads.
[Total] Fraction of MapQ reads in mapped reads // Ratio of reads with mapping quality at or above the cutoff to mapped reads.
[Target] Target Reads // Number of reads covering the target regions (specified by the BED file).
[Target] Fraction of Target Reads in all reads // Ratio of target reads to raw reads.
[Target] Fraction of Target Reads in mapped reads // Ratio of target reads to mapped reads.
[Target] Target Data(Mb) // Total bases falling in the target regions. If a read only partly covers a target region, only the bases inside the region are counted.
[Target] Target Data Rmdup(Mb) // Total bases in the target regions after removing PCR duplicates.
[Target] Fraction of Target Data in all data // Ratio of target bases to raw bases.
[Target] Fraction of Target Data in mapped data // Ratio of target bases to mapped bases.
[Target] Len of region // Total length of the target regions.
[Target] Average depth // Average depth of the target regions, calculated as "target bases / length of regions".
[Target] Average depth(rmdup) // Average depth of the target regions after removing PCR duplicates.
[Target] Coverage (>0x) // Ratio of target bases with depth greater than 0x, i.e. the fraction of the target regions that is covered at all.
[Target] Coverage (>=4x) // Ratio of target bases with depth greater than or equal to 4x.
[Target] Coverage (>=10x) // Ratio of target bases with depth greater than or equal to 10x.
[Target] Coverage (>=30x) // Ratio of target bases with depth greater than or equal to 30x.
[Target] Coverage (>=100x) // Ratio of target bases with depth greater than or equal to 100x.
[Target] Coverage (>=Nx) // Additional line for a user-defined cutoff; see --cutoffdepth.
[Target] Target Region Count // Number of target regions. For whole-exome sequencing this is typically the total number of targeted exons.
[Target] Region covered > 0x // Number of target regions with average depth greater than 0x.
[Target] Fraction Region covered > 0x // Ratio of target regions with average depth greater than 0x.
[Target] Fraction Region covered >= 4x // Ratio of target regions with average depth greater than or equal to 4x.
[Target] Fraction Region covered >= 10x // Ratio of target regions with average depth greater than or equal to 10x.
[Target] Fraction Region covered >= 30x // Ratio of target regions with average depth greater than or equal to 30x.
[Target] Fraction Region covered >= 100x // Ratio of target regions with average depth greater than or equal to 100x.
[flank] flank size // Size of the flanking region that is counted; 200 bp by default. Capture oligos can also pull down the regions next to the targets.
[flank] Len of region (not include target region) // Length of the flank regions, excluding the target regions.
[flank] Average depth // Average depth of the flank regions.
[flank] flank Reads // Total number of reads covering the flank regions. Note: reads that cover the edge of a target region are also counted in the flank regions.
[flank] Fraction of flank Reads in all reads // Ratio of reads in the flank regions to raw reads.
[flank] Fraction of flank Reads in mapped reads // Ratio of reads in the flank regions to mapped reads.
[flank] flank Data(Mb) // Total bases in the flank regions.
[flank] Fraction of flank Data in all data // Ratio of flank bases to raw data.
[flank] Fraction of flank Data in mapped data // Ratio of flank bases to mapped data.
[flank] Coverage (>0x) // Ratio of flank bases with depth greater than 0x.
[flank] Coverage (>=4x) // Ratio of flank bases with depth greater than or equal to 4x.
[flank] Coverage (>=10x) // Ratio of flank bases with depth greater than or equal to 10x.
[flank] Coverage (>=30x) // Ratio of flank bases with depth greater than or equal to 30x.
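When many samples are run, it is handy to pull a few of these fields (for example [Target] Average depth or [Target] Coverage (>=30x)) into one QC table. The sketch below is a hypothetical helper, `report_value`, not part of bamdst. It assumes each metric line in coverage.report has the form "<optional padding>[Section] Label<TAB>value" and that header lines start with "#"; check that against your own report before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one metric from a bamdst coverage.report.
# ASSUMPTION: metric lines look like "<optional padding>[Section] Label<TAB>value"
# and header lines start with "#". Verify this against your own report.
# Returns non-zero if the label is not found.
report_value() {
  local report="$1" key="$2"
  awk -F'\t' -v key="$key" '
    /^[ \t]*#/ { next }                       # skip "##" header lines
    {
      label = $1
      sub(/^[ ]+/, "", label)                 # drop alignment padding
      sub(/[ ]+$/, "", label)
      if (NF >= 2 && label == key) {
        value = $NF
        gsub(/^[ \t\r]+|[ \t\r]+$/, "", value)
        print value
        found = 1
        exit
      }
    }
    END { exit !found }' "$report"
}
```

Usage: report_value ./test/coverage.report '[Target] Average depth'
Labels are matched as exact strings, so copy them from the report. If a value carries a unit or a percent sign, strip it before doing arithmetic on it.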