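After assembling a genome, you usually need to assess its quality. Current options include mapping short-read (second-generation) and long-read (third-generation) data back to the assembly to check the mapping rate and coverage, as well as BUSCO and LAI. The mapping rate of RNA-seq data is another check we use often: it also shows whether the transcriptome data themselves are problematic, which makes it easier to exclude bad RNA-seq samples during later annotation.
You can align the reads with HISAT2 and look at the alignment rate: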
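# Input reads, assembly, index prefix and output directory
# (relative read paths are resolved from $dir after the cd below)
pe1=5_1.clean.fq.gz
pe2=5_2.clean.fq.gz
ref=/path/assembly_final.fasta
ref_index=/path/assembly_final.fasta
dir=/path/2.hisat2
# Build the HISAT2 index; the FASTA path doubles as the index prefix
hisat2-build -p 4 "$ref" "$ref_index"
# Make sure the output directory exists before entering it
mkdir -p "$dir"
cd "$dir"
# Align; the summary goes to stderr and is saved in output.GeneMapStat.xls
hisat2 -p 8 -t -x "$ref_index" -1 "$pe1" -2 "$pe2" --no-unal --un-conc "$dir" --dta 2>output.GeneMapStat.xls | samtools view -b - > "$dir/output.bam"
# Sort and index the BAM for downstream use
samtools sort -@ 8 -o output.sorted.bam output.bam
samtools index output.sorted.bam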
The result looks like this:
cmd> cat output.GeneMapStat.xls
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 01:10:44
24141982 reads; of these:
  24141982 (100.00%) were paired; of these:
    2457907 (10.18%) aligned concordantly 0 times
    19668819 (81.47%) aligned concordantly exactly 1 time
    2015256 (8.35%) aligned concordantly >1 times
    ----
    2457907 pairs aligned concordantly 0 times; of these:
      280399 (11.41%) aligned discordantly 1 time
    ----
    2177508 pairs aligned 0 times concordantly or discordantly; of these:
      4355016 mates make up the pairs; of these:
        2320566 (53.28%) aligned 0 times
        1297150 (29.79%) aligned exactly 1 time
        737300 (16.93%) aligned >1 times
95.19% overall alignment rate
Time searching: 01:10:44
Overall time: 01:10:44
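As a rule of thumb, an overall alignment rate above 90% is considered good; the 95.19% obtained here clears that threshold.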
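The overall alignment rate can be recomputed from the counts in the summary, which is handy when checking many samples against the 90% threshold. For paired-end data the rate is 1 minus the number of mates that aligned 0 times, divided by twice the number of read pairs. The following is a minimal sketch, assuming the default paired-end summary layout shown above (not the `--new-summary` format); the function name `overall_rate` is hypothetical and not part of HISAT2.

```shell
#!/usr/bin/env bash
# Sketch: recompute the overall alignment rate from a HISAT2 paired-end summary.
# overall = 100 * (1 - mates_aligned_0_times / (2 * read_pairs))
# Assumes the default summary layout; `overall_rate` is an illustrative name.
overall_rate() {
    awk '
        / reads; of these:/   { pairs = $1 }      # total number of read pairs
        /aligned 0 times *$/  { unaligned = $1 }  # mates that never aligned
        END {
            if (pairs == 0) {
                print "no read count found" > "/dev/stderr"
                exit 1
            }
            printf "%.2f\n", 100 * (1 - unaligned / (2 * pairs))
        }
    ' "$1"
}
```

Calling `overall_rate output.GeneMapStat.xls` on the summary above prints 95.19, since 1 - 2320566 / (2 × 24141982) ≈ 0.9519. The pattern is anchored to the end of the line so that only the per-mate "aligned 0 times" count is picked up, not the "pairs aligned 0 times concordantly or discordantly" line.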