GO | KEGG: Where Do the Annotations Come From?


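Anyone who has analyzed gene expression data (microarray, RNA-seq, scRNA-seq) has almost certainly run functional annotation and pathway enrichment on a gene set, because it is a powerful tool for studying uncharacterized gene sets.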

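But once the results are in, the boss always has feedback: most of the annotations returned carry little meaning, and only occasionally, by luck, do you get ones you are happy with. So the common annotation databases have obvious shortcomings.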

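Yet we usually turn to annotation only at the validation stage, and using inaccurate data to validate new data is a practice that deserves some thought.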

 

So what exactly are the shortcomings of common annotation databases such as GO and KEGG?

 

To answer that, we first have to understand where GO and KEGG annotations come from.

The Gene Ontology Consortium (GOC) uses two further evidence codes to describe experimental support for an annotation: 

IMP (Inferred by mutant phenotype),

and IPI (Inferred by physical interaction).

The consortium uses other evidence codes to describe inferences used in annotations that are not supported by direct experimental evidence, but these will not be considered in this discussion (http://www.geneontology.org/GO.evidence.shtml). 
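As a rough illustration of how evidence codes can be used in practice, the sketch below keeps only annotations backed by the experimental codes named in the excerpt (IMP and IPI). It assumes the annotations are in the tab-separated GAF layout, where the gene symbol, GO ID, and evidence code sit in columns 3, 5, and 7. The function name `filter_by_evidence`, the sample rows, the gene symbols, and the GO IDs are all made up for this example; IEA is included only as an example of a code outside the chosen set.

```python
# Evidence codes named in the excerpt; pass a different set to widen the filter.
DEFAULT_CODES = frozenset({"IMP", "IPI"})


def filter_by_evidence(gaf_lines, codes=DEFAULT_CODES):
    """Yield (symbol, go_id, evidence) for GAF rows whose evidence code is in `codes`."""
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # malformed row: too few columns to hold an evidence code
        symbol, go_id, evidence = fields[2], fields[4], fields[6]
        if evidence in codes:
            yield symbol, go_id, evidence


# Hypothetical rows for illustration only (placeholder symbols and GO IDs).
_rows = [
    ["DB", "ID1", "GENE_A", "", "GO:9999991", "REF:1", "IMP"],
    ["DB", "ID2", "GENE_B", "", "GO:9999992", "REF:2", "IEA"],
    ["DB", "ID3", "GENE_C", "", "GO:9999993", "REF:3", "IPI"],
]
SAMPLE_LINES = ["!gaf-version: 2.2"] + ["\t".join(r) for r in _rows]

if __name__ == "__main__":
    for hit in filter_by_evidence(SAMPLE_LINES):
        print(hit)
```

Filtering this way trades coverage for reliability: the gene set shrinks, but every remaining annotation has experimental support. That is one way to reduce the risk of validating new data against inaccurate annotations.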

 

First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization.

Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes.

Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database.

 

My answer:

Clearly, the expression of all genes in an organism forms a dynamic network.

A static, tree-like structure such as GO inevitably loses most of that information; a tree and a network are worlds apart.

KEGG is network-shaped, but each of its networks is only small, local, and static, so it necessarily loses some global and dynamic information.

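In other words, genes cannot be classified statically. In practice it is also very hard to truly study the function of a single gene, because pulling one thread moves the whole web. This is why knocking out just one gene can set off such a large chain reaction.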

 

Further reading: Gene Ontology annotations: what they mean and where they come from

KEGG as a reference resource for gene and protein annotation

 

