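After a genome assembly is finished, its quality usually needs to be evaluated. Common approaches include mapping second- and third-generation sequencing reads back to the assembly to check the mapping rate and coverage, BUSCO, and LAI. The mapping rate of transcriptome (RNA-seq) data is another method we use often: it usually shows whether the transcriptome data itself has problems, and it also makes it easier to exclude problematic transcriptome samples during subsequent annotation.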
The mapping rate can be checked by aligning the reads to the assembly with HISAT2:
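pe1=5_1.clean.fq.gz                    # mate-1 clean reads (relative paths are resolved after "cd $dir" below)
pe2=5_2.clean.fq.gz                    # mate-2 clean reads
ref=/path/assembly_final.fasta         # assembly to evaluate
ref_index=/path/assembly_final.fasta   # HISAT2 index basename; index files are written with this prefix
dir=/path/2.hisat2                     # output directory

hisat2-build -p 4 "$ref" "$ref_index"
cd "$dir"
# The alignment summary goes to stderr and is saved in output.GeneMapStat.xls;
# --no-unal keeps unaligned reads out of the BAM, --un-conc writes pairs that did not align
# concordantly to unconc_1.fq / unconc_2.fq (% is replaced by the mate number),
# and --dta tailors the alignments for transcript assemblers such as StringTie.
hisat2 -p 8 -t -x "$ref_index" -1 "$pe1" -2 "$pe2" --no-unal --un-conc "$dir/unconc_%.fq" --dta 2>output.GeneMapStat.xls | samtools view -bS - > "$dir/output.bam"
samtools sort -@ 8 -o output.sorted.bam output.bam
samtools index output.sorted.bam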
The results are as follows:
cmd>cat output.GeneMapStat.xls
Time loading forward index: 00:00:00
Time loading reference: 00:00:00
Multiseed full-index search: 01:10:44
24141982 reads; of these:
  24141982 (100.00%) were paired; of these:
    2457907 (10.18%) aligned concordantly 0 times
    19668819 (81.47%) aligned concordantly exactly 1 time
    2015256 (8.35%) aligned concordantly >1 times
    ----
    2457907 pairs aligned concordantly 0 times; of these:
      280399 (11.41%) aligned discordantly 1 time
    ----
    2177508 pairs aligned 0 times concordantly or discordantly; of these:
      4355016 mates make up the pairs; of these:
        2320566 (53.28%) aligned 0 times
        1297150 (29.79%) aligned exactly 1 time
        737300 (16.93%) aligned >1 times
95.19% overall alignment rate
Time searching: 01:10:44
Overall time: 01:10:44
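The "overall alignment rate" line counts individual mates rather than pairs. Both mates of every concordant or discordant pair count as aligned, and among the 2177508 pairs that aligned neither way, 2320566 of the 4355016 mates still found no hit. The overall rate is therefore (2 × 24141982 − 2320566) / (2 × 24141982) = 45963398 / 48283964 ≈ 95.19%. Below is a minimal shell sketch that recomputes this number from a summary log. It assumes a paired-end-only run like this one; a log that also contains unpaired reads has an extra section the sketch does not handle. The function name `recompute_overall` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute HISAT2's "overall alignment rate" from the
# per-category counts in its summary log. Assumes paired-end input only.
#   overall = (2 * pairs - mates aligned 0 times) / (2 * pairs)
recompute_overall() {
  awk '
    # "N (100.00%) were paired; of these:" -> number of read pairs
    /were paired; of these:/ { pairs = $1 }
    # "N (xx.xx%) aligned 0 times" -> mates that found no alignment at all
    /%\) aligned 0 times/    { unal = $1 }
    END {
      mates = 2 * pairs
      if (mates == 0) exit 1   # no paired-end section found
      printf "%.2f\n", 100 * (mates - unal) / mates
    }' "$1"
}
```

Running `recompute_overall output.GeneMapStat.xls` on a paired-end log should agree with the last summary line, which is a quick way to confirm how the percentage is derived.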
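As a rule of thumb, an overall alignment rate above 90% indicates that the transcriptome reads map well to the assembly; this run reaches 95.19%.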
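To apply the 90% rule of thumb to several RNA-seq samples at once, for example when deciding which samples to leave out of annotation, a small shell sketch can read the last line of each sample's summary log. It assumes one log per sample, written with `2>` as in the HISAT2 command above; the function names `overall_rate` and `screen_samples` and the `sample*/output.GeneMapStat.xls` layout in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for screening several HISAT2 summary logs.
# The threshold (e.g. 90) is a rule of thumb, not a hard cutoff.

# overall_rate FILE -> prints the number from the "NN.NN% overall alignment rate" line
overall_rate() {
  awk '/overall alignment rate/ { sub(/%/, "", $1); print $1; exit }' "$1"
}

# screen_samples THRESHOLD FILE... -> one tab-separated line per file:
#   <file> <rate> <PASS|LOW|MISSING>
# Returns 1 if any file is LOW or has no rate line (MISSING), else 0.
screen_samples() {
  local threshold=$1 f rate status=0
  shift
  for f in "$@"; do
    rate=$(overall_rate "$f" 2>/dev/null)
    if [ -z "$rate" ]; then
      printf '%s\tNA\tMISSING\n' "$f"
      status=1
    # bash only compares integers, so awk does the decimal comparison
    elif awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r + 0 >= t + 0) }'; then
      printf '%s\t%s\tPASS\n' "$f" "$rate"
    else
      printf '%s\t%s\tLOW\n' "$f" "$rate"
      status=1
    fi
  done
  return "$status"
}
```

For example, `screen_samples 90 sample*/output.GeneMapStat.xls` would print one line per log and return non-zero if any sample falls below 90% or has no summary line, which makes it easy to use in a pipeline before annotation.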