Which version of the genome and annotation files should you use? | Hands-on test


What Ensembl genome version should I use for alignments? (e.g. toplevel.fa vs. primary_assembly.fa)

This is a fine-grained but very practical question: which version should you actually use?

References:

What Ensembl genome version should I use for alignments? (e.g. toplevel.fa vs. primary_assembly.fa)

Results differ when using different ensembl versions

 

First part options:

  • dna_sm - Repeats soft-masked (converts repeat nucleotides to lowercase)
  • dna_rm - Repeats masked (converts repeats to N's)
  • dna - No masking

Second part options:

  • .toplevel - Includes haplotype information (not sure how aligners deal with this)

  • .primary_assembly - Single reference base per position
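The two option lists above combine into the file name of the genome FASTA. A minimal sketch of that naming pattern follows; the helper name `ensembl_fasta_name` and its defaults are made up for illustration, and the pattern is the one seen for human files such as `Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz`. Not every species ships a primary_assembly file, so check the directory listing before relying on it.

```python
# Options taken from the two lists above.
ALLOWED_MASKING = {"dna", "dna_sm", "dna_rm"}
ALLOWED_REGION = {"toplevel", "primary_assembly"}


def ensembl_fasta_name(species, assembly, masking="dna_sm", region="primary_assembly"):
    """Compose an Ensembl-style genome FASTA file name (hypothetical helper)."""
    if masking not in ALLOWED_MASKING:
        raise ValueError(f"unknown masking option: {masking}")
    if region not in ALLOWED_REGION:
        raise ValueError(f"unknown region option: {region}")
    return f"{species}.{assembly}.{masking}.{region}.fa.gz"


# Soft-masked primary assembly (the defaults chosen for this sketch).
print(ensembl_fasta_name("Homo_sapiens", "GRCh38"))
# Unmasked toplevel file.
print(ensembl_fasta_name("Homo_sapiens", "GRCh38", masking="dna", region="toplevel"))
```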

 

Most recommendations favor the soft-masked version, that is, the one in which repeats are lowercased rather than replaced with N.
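To make the difference between the masking options concrete, here is a small sketch that counts soft-masked (lowercase) and hard-masked (N) bases in a sequence. The toy sequences and the function name `masking_stats` are invented for illustration; real genome files would be read record by record instead.

```python
def masking_stats(seq):
    """Count soft-masked (lowercase a/c/g/t) and hard-masked (N or n) bases."""
    soft = sum(1 for base in seq if base in "acgt")
    hard = sum(1 for base in seq if base in "Nn")
    return {"length": len(seq), "soft_masked": soft, "hard_masked": hard}


# Toy sequences: the same 16 bases with an 8-base repeat in the middle.
toy_sm = "ACGTacgtacgtACGT"  # dna_sm style: repeat kept, but lowercased
toy_rm = "ACGTNNNNNNNNACGT"  # dna_rm style: repeat replaced with N

print(masking_stats(toy_sm))
print(masking_stats(toy_rm))

# Soft-masking loses no sequence: uppercasing recovers the unmasked bases.
print(toy_sm.upper())
```

The point of the comparison is that a soft-masked file still contains every base, so a tool that ignores case sees the full sequence, while hard-masking discards the repeat content.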

 

Download the hg19 genome: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/

Reference: how the various genome version names correspond to each other
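For the human builds, the UCSC and GRC/NCBI names refer to the same assemblies. A small lookup like the one below can catch an accidental hg19-versus-GRCh38 pairing; the function name `same_assembly` is hypothetical and the table covers only three human builds. Note that matching assembly names do not guarantee identical sequence names, since UCSC files use `chr1` where Ensembl files use `1`.

```python
# UCSC name -> GRC/NCBI name for three human builds.
UCSC_TO_GRC = {"hg18": "NCBI36", "hg19": "GRCh37", "hg38": "GRCh38"}


def same_assembly(name_a, name_b):
    """Return True if two build names refer to the same human assembly."""
    canonical_a = UCSC_TO_GRC.get(name_a, name_a)
    canonical_b = UCSC_TO_GRC.get(name_b, name_b)
    return canonical_a == canonical_b


print(same_assembly("hg19", "GRCh37"))  # same build under two names
print(same_assembly("hg19", "GRCh38"))  # different builds
```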

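Download the hg19 annotation from GENCODE: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_27/ (GENCODE releases after 19 are built on GRCh38, so hg19 coordinates come from the lifted-over GRCh37 files.)

UCSC also provides the annotation, but only as an export from the web page: http://genome.ucsc.edu/cgi-bin/hgTables

Note: GENCODE seems to have a problem. On https://www.gencodegenes.org/releases/26lift37.html the EBI links can no longer be downloaded.

Reference: http://www.biotrainee.com/thread-2035-1-1.html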

 

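A newer genome is not automatically better. Look at recent CNS (Cell, Nature, Science) papers: very few use the latest genome build. Why? Because the annotation has not caught up, and your results may not line up with other people's.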

 

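Hands-on test

What actually happens when you use different genome versions?

I ran a transcriptome test with hg19 and GRCh38.

The conclusions:

1. Read alignment to the genome is broadly the same, with essentially no difference;

2. With different annotation files, the gene expression results differ greatly, even though both runs used featureCounts.

GRCh38 results: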

Assigned        306852
Unassigned_Unmapped     0
Unassigned_MappingQuality       0
Unassigned_Chimera      0
Unassigned_FragmentLength       0
Unassigned_Duplicate    0
Unassigned_MultiMapping 36280
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_NoFeatures   56950
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    19771
//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /home/lizhixin/databases/ensembl/release91/Homo_s ... ||
||    Features : 1199851                                                      ||
||    Meta-features : 58302                                                   ||
||    Chromosomes/contigs : 47                                                ||
||                                                                            ||
|| Process BAM file /home/lizhixin/project/scRNA-seq/reanalyze/first_five ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 419853                                                ||
||    Successfully assigned fragments : 306852 (73.1%)                        ||
||    Running time : 0.05 minutes                                             ||

  

hg19 results:

Assigned        586467
Unassigned_Unmapped     0
Unassigned_MappingQuality       0
Unassigned_Chimera      0
Unassigned_FragmentLength       0
Unassigned_Duplicate    0
Unassigned_MultiMapping 66997
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_NoFeatures   133437
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    47278
//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /home/lizhixin/databases/cellranger_ref/refdata-c ... ||
||    Features : 1130716                                                      ||
||    Meta-features : 32738                                                   ||
||    Chromosomes/contigs : 45                                                ||
||                                                                            ||
|| Process BAM file /home/lizhixin/project/scRNA-seq/reanalyze/first_five ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||    Total fragments : 834179                                                ||
||    Successfully assigned fragments : 586467 (70.3%)                        ||
||    Running time : 0.05 minutes                                             ||
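The two runs report different totals (419853 versus 834179 fragments), so it can help to look at proportions as well as raw counts. The sketch below parses the non-zero rows of the two summaries shown above and prints each category as a percentage of the total; the function names are made up for illustration, and only the rows with non-zero counts are copied in to keep the example short.

```python
# Non-zero rows copied from the two featureCounts summaries above.
GRCH38_SUMMARY = """
Assigned        306852
Unassigned_MultiMapping 36280
Unassigned_NoFeatures   56950
Unassigned_Ambiguity    19771
"""

HG19_SUMMARY = """
Assigned        586467
Unassigned_MultiMapping 66997
Unassigned_NoFeatures   133437
Unassigned_Ambiguity    47278
"""


def parse_summary(text):
    """Parse 'Status count' rows into a dict, skipping anything else."""
    counts = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def percentages(counts):
    """Express each category as a percentage of all fragments."""
    total = sum(counts.values())
    return {status: 100.0 * n / total for status, n in counts.items()}


for label, text in [("GRCh38", GRCH38_SUMMARY), ("hg19", HG19_SUMMARY)]:
    counts = parse_summary(text)
    print(label, "total fragments:", sum(counts.values()))
    for status, pct in percentages(counts).items():
        print(f"  {status}: {pct:.1f}%")
```

The totals computed this way match the "Total fragments" lines in the two logs, which is a quick sanity check that no category was dropped.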

 

Never mix and match annotation files carelessly!
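One cheap guard against pairing the wrong files is to check that every sequence name used in the annotation also exists in the genome FASTA. The following is only a sketch under simple assumptions: the FASTA name is the first word after `>`, the GTF name is the first tab-separated column, and the toy inputs are invented. It catches naming mismatches such as `1` versus `chr1`, not coordinate differences between builds.

```python
import io


def fasta_seqnames(handle):
    """Collect sequence names (first word of each '>' header line)."""
    names = set()
    for line in handle:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def gtf_seqnames(handle):
    """Collect the first column of every non-comment GTF line."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Toy inputs: UCSC-style FASTA names, Ensembl-style GTF names.
toy_fasta = io.StringIO(">chr1 toy sequence\nACGT\n>chrM\nACGT\n")
toy_gtf = io.StringIO(
    "#!genome-build toy\n"
    "1\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id \"G1\";\n"
    "MT\ttoy\tgene\t1\t4\t.\t+\t.\tgene_id \"G2\";\n"
)

missing = gtf_seqnames(toy_gtf) - fasta_seqnames(toy_fasta)
print("GTF sequence names absent from the FASTA:", sorted(missing))
```

With real files, the same functions can be given handles from `open(...)` (or `gzip.open(..., "rt")` for compressed files) instead of `io.StringIO`.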

  

 

