topGO

本文轉載自查看原文 2018-09-11 09:59 772 Bioconductor

前面我們講過GO.db這個包，現在接着延伸topGO包，該包是用來協助GO富集分析

1）安裝

if("topGO" %in% rownames(installed.packages()) == FALSE) {source("http://bioconductor.org/biocLite.R");biocLite("topGO")}
suppressMessages(library(topGO))
ls("package:topGO")

2）使用方法

該包主要有三個使用步驟：

2.1、Data preparation:准備數據集，用於構建 topGOdata.對象。

2.1.1、包括gene標識符(List of genes identiﬁers)及相應的分值gene scores(例如p值等)
2.1.2、差異表達基因list或經一定標准按照分值篩選的基因集用於后續分析(list of diﬀerentially expressed genes or a criteria for selecting genes
based on their scores)；
2.1.3、identifier和GO term間的map，即GOterm表:（gene-to-GO annotations ） ####例如測試文件中的geneid2go.map
2.1.4、GO的層級結構，由GO.db提供，目前這個包只支持GO.db提供的結構

goterm表示例（gene-to-GO annotations ）：

2.2、Running enrichment tests：進行富集分析，用任何可行的混合統計測試和方法來處理 GO拓撲結構(GO topology)

2.3、Analysis results：用 summary functions 和 visualisation tools對第二步進行統計和可視化

3）簡單示例(guide)

3.1.1、准備輸入文件

library(ALL)
data(ALL)
data(geneList)   ##文件1：基因list，
affyLib <- paste(annotation(ALL), "db", sep = ".")          #####"hgu95av2.db"
library(package = affyLib, character.only = TRUE)       ########GO term表
sum(topDiffGenes(geneList)     ###選擇差異基因集，

3.1.2、構建 topGOdata對象(核心步驟)：

sampleGOdata <- new("topGOdata",     
                     description = "Simple session",  ##topGOdata的描述，可選
　　　　　　　　        ontology = "BP",                 ##可指定要分析的GO term的類型，即BP、CC之類
                     allGenes = geneList,             ##基因identifier的原始列表
　　　　　　　　        geneSel = topDiffGenes,          ##geneSelectionFun聯合作用，篩選出后續參與分析的基因
                     nodeSize = 10,                   ##富集的GO term轄下基因的最小數目，這里選擇10.即最少10個
                     annot = annFUN.db,               ##提取gene-to-GO mappings 的對應關系
　　　　　　　         affyLib = affyLib)   
sampleGOdata

3.2 Performing the enrichment tests

有了topGOdata對象，接下來就可以用來進行富集分析。這里用兩種檢驗方法：Fisher’s exact test (基於 gene counts)和Kolmogorov-Smirnov like test (computes enrichment based on gene scores)。
其中用runTest函數來進行這些檢驗，該函數含有3個參數：第一個是topGOdata對象、第二個是algorithm(用於指定處理 GO graph structured的方法）、第三個是statistic(用於指定檢驗方法)

resultFisher <- runTest(sampleGOdata, algorithm = "classic", statistic = "fisher")   ##Fisher’s exact test

resultKS <- runTest(sampleGOdata, algorithm = "classic", statistic = "ks")  #Kolmogorov-Smirnov test,classic method
resultKS.elim <- runTest(sampleGOdata, algorithm = "elim", statistic = "ks")#Kolmogorov-Smirnov test, elim method

3.3 Analysis of results

當富集檢驗結束后，我們就可以分析並解析結果。

runTest()這個函數用來分析顯著富集的 GO terms及其相應的p值。

allRes <- GenTable(sampleGOdata,                 ##之前構建的topGOdata實例
　　　　　          classicFisher = resultFisher,  ##生成GO graphde的方法
                  classicKS = resultKS,          ##生成GO graphde的方法
　　　　　          elimKS = resultKS.elim,        ##生成GO graphde的方法
                  orderBy = "elimKS",            
　　　　　          ranksOf = "classicFisher", 
　　　　　          topNodes = 10)                 ##這里顯示前10個顯著結果

用 score()函數來測評topGO結果對象中 GO term的 p-values ，並用散點圖來說明。

pValue.classic <- score(resultKS)
pValue.elim <- score(resultKS.elim)[names(pValue.classic)]
gstat <- termStat(sampleGOdata, names(pValue.classic))
gSize <- gstat$Annotated / max(gstat$Annotated) * 4
plot(pValue.classic, pValue.elim, xlab = "p-value classic", ylab = "p-value elim",pch = 19, cex = gSize)

差看顯著富集的GO terms在 GO graph中的分布.

showSigOfNodes(sampleGOdata, score(resultKS.elim), firstSigNodes = 5, useInfo = 'all')

4)實戰

4.1 原始數據集的准備（上面的4個文件）

library(topGO)
library(ALL) ##准備數據集
data(ALL)    ##文件1：原始數據集
BPterms <- ls(GOBPTerm)
MFterms <- ls(GOMFTerm)
CCterms <- ls(GOCCTerm)
head(BPterms)
head(MFterms)
head(CCterms)

library(genefilter)   ##對原始數據進行過濾
selProbes <- genefilter(ALL, filterfun(pOverA(0.20, log2(100)), function(x) (IQR(x) > 0.25)))#數據清洗
eset <- ALL[selProbes, ]   ##數據清洗：這里去掉及其低表達的基因，及探針在每個樣品中表達變化不大的的基因

myInterestingGenes <- sample(geneNames, length(geneNames) / 10) #文件二：經一定標准對p值等篩選獲取感興趣基因集用於后續分析
geneList <- factor(as.integer(geneNames %in% myInterestingGenes))
names(geneList) <- geneNames
str(geneList)

geneID2GO <- readMappings(file = system.file("examples/geneid2go.map", package = "topGO"))##文件三：goterm的map文件
str(head(geneID2GO))

GO2geneID <- inverseList(geneID2GO)  ###額外知識：用inverseList()函數實現gene-to-GOs與 GO-to-genes 之間的轉換
str(head(GO2geneID))     ##

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 10、差異基因topGO富集