goseq

本文轉載自查看原文 2017-07-19 21:27 1552 常用生信軟件的使用和問題

goseq是一個R包，用於尋找GO terms，即基因富集分析。

GO terms是標准化描述基因或基因產物的詞匯，包括三方面，cellular component,molecular funciton,biological process。

每個GO term都有一個GO ID,比如 GO:006260,每個GO term背后都有一系列的相關基因。

GO分析的目的：在差異性基因分析后，我們可能得到很多差異基因，這些基因里的一部分可能跟某個生物過程相關，或幾個生物過程相關。經過GO分析后，我們就能將差異性基因具體的生物功能展示出來，為下一步研究做准備。

GOseq需要輸入的文件：

1.所有有count的genes。

2.差異性表達的genes。

3.genome信息，基因長度信息。#對於許多模式基因組來說，這些內容都被做成了獨立的R包。

4.GO terms包。

>source("http://bioconductor.org/biocLite.R")
>biocLite("goseq")
>biocLite("geneLenDataBase")#genome，genes信息
>biocLite("org.Dm.eg.db")#果蠅的GO categories， （org,<Genome>,<GeneID>,db)

>library("goseq")
>library("geneLenDataBase")
>library("org.Dm.eg.db")

>DEG<-read.table("DEG",header=FALSE）
>ALL<-read.table("ALL"，header=FALSE)
#DEG:差異性基因表 ALL:所有基因表（數據框格式）


>DEG.vector<-c(t(DEG))
>ALL.vector<-c(t(ALL))
#把數據格式轉化為vector，便於下步操作

>gene.vector=as.integer(ALL.vector%in%DEG.vector)
#生成二進制的gene vector（1代表差異性基因，0代表非差異性基因）
>names(gene.vector)<-ALL.vector

>pwf=nullp(gene.vector,"dm3","ensGene")
#生成probability weighting function."dm3"是基因組，"ensgGene"是基因IDs。

>GO.wall=goseq(pwf,"dm3","ensGene")
#生成GO terms ID 。這邊的疑問：genes 沒有mapping 到GO categories。 goseq函數有一個選項：gene2cat，如果gene2cat=NULL，則goseq會自動調用getgo函數實現mapping功能，並將輸出值gene2cat。

>enriched.GO=GO.wall$category[GO.wall$over_represented_pvalue<.05]
#生成差異性 GO terms ID

>library(GO.db)
>capture.output(for(go in enriched.GO[1:length(enriched.GO)]){
print(GOTERM[go])
cat("___________")
}
,file="SigGo.txt")
#生成具體的GO TERM詳解

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 GO分析-GOseq的使用教程