Introduction to the org.Hs.eg.db package (converting between gene IDs, symbols, and other identifiers from NCBI, Ensembl, and similar databases)


1) Installing and loading

-------------------------------------------

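# biocLite() has been retired; current Bioconductor releases install through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!requireNamespace("org.Hs.eg.db", quietly = TRUE)) BiocManager::install("org.Hs.eg.db")
suppressMessages(library(org.Hs.eg.db))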

2) Listing all objects in the package

--------------------------------------------

ls("package:org.Hs.eg.db")

Purpose: these objects can be used to convert between gene IDs

org.Hs.egACCNUM: Map Entrez Gene identifiers to GenBank Accession Numbers
org.Hs.egALIAS2EG: Map between Common Gene Symbol Identifiers and Entrez Gene
org.Hs.eg.db: Bioconductor annotation data package
org.Hs.egCHR: Map Entrez Gene IDs to Chromosomes
org.Hs.egCHRLENGTHS: A named vector for the length of each of the chromosomes
org.Hs.egCHRLOC: Map Entrez Gene IDs to Chromosomal Location
org.Hs.egENSEMBL: Map Ensembl gene accession numbers with Entrez Gene identifiers
org.Hs.egENSEMBLPROT: Map Ensembl protein accession numbers with Entrez Gene identifiers
org.Hs.egENSEMBLTRANS: Map Ensembl transcript accession numbers with Entrez Gene identifiers
org.Hs.egENZYME: Map between Entrez Gene IDs and Enzyme Commission (EC) Numbers
org.Hs.egGENENAME: Map between Entrez Gene IDs and Genes
org.Hs.egGO: Maps between Entrez Gene IDs and Gene Ontology (GO) IDs
org.Hs.egMAP: Map between Entrez Gene Identifiers and cytogenetic maps/bands
org.Hs.egMAPCOUNTS: Number of mapped keys for the maps in package org.Hs.eg.db
org.Hs.egOMIM: Map between Entrez Gene Identifiers and Mendelian Inheritance in Man (MIM) identifiers
org.Hs.egORGANISM: The Organism for org.Hs.eg
org.Hs.egPATH: Mappings between Entrez Gene identifiers and KEGG pathway identifiers
org.Hs.egPFAM: Maps between Manufacturer Identifiers and PFAM Identifiers
org.Hs.egPMID: Map between Entrez Gene Identifiers and PubMed Identifiers
org.Hs.egPROSITE: Maps between Manufacturer Identifiers and PROSITE Identifiers
org.Hs.egREFSEQ: Map between Entrez Gene Identifiers and RefSeq Identifiers
org.Hs.egSYMBOL: Map between Entrez Gene Identifiers and Gene Symbols
org.Hs.egUNIGENE: Map between Entrez Gene Identifiers and UniGene cluster identifiers
org.Hs.egUNIPROT: Map Uniprot accession numbers with Entrez Gene identifiers
org.Hs.eg_dbconn: Collect information about the package annotation DB

Examples:

(using the mget function):
myEIDs <- c("1", "10", "100", "1000", "37690")
mySymbols <- mget(myEIDs, org.Hs.egSYMBOL, ifnotfound=NA)     # myEIDs holds your own IDs; org.Hs.egSYMBOL is one of the package's objects
mySymbols <- unlist(mySymbols)

(using the select function):
myEIDs <- c("ENSG00000130720", "ENSG00000103257", "ENSG00000156414")
cols <- c("SYMBOL", "GENENAME")
select(org.Hs.eg.db, keys=myEIDs, columns=cols, keytype="ENSEMBL")  # returns a data frame
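
Because select() returns a data frame, a key that maps to several values shows up as several rows. The following is a minimal base-R sketch of one way to spot and collapse such rows. The data frame `res` is a made-up stand-in for a select() result (the IDs, symbols, and names are placeholders, not real annotations), and keeping the first hit is only one possible policy.

```r
# Toy data frame shaped like a select() result; every value is a placeholder.
res <- data.frame(
  ENSEMBL  = c("ENSG_toy1", "ENSG_toy1", "ENSG_toy2"),
  SYMBOL   = c("SYM_A", "SYM_A_ALT", "SYM_B"),
  GENENAME = c("toy gene A", "toy gene A (alternate)", "toy gene B"),
  stringsAsFactors = FALSE
)

# Number of rows per key; a count above 1 signals a one-to-many mapping.
hits_per_key <- table(res$ENSEMBL)

# Keep only the first row for each key (an arbitrary but simple policy).
first_hit <- res[!duplicated(res$ENSEMBL), ]
rownames(first_hit) <- NULL
print(first_hit)
```

Whether the first hit is the right one depends on the analysis; inspecting `hits_per_key` first shows how many keys are affected.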

How it works: for example, Entrez Gene identifiers ( https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) are mapped directly to GenBank accession numbers. The mapping is built from the Entrez Gene data at ftp://ftp.ncbi.nlm.nih.gov/gene/DATA

 

 

Take gene2ensembl, one of the files under DATA, as an example to get a feel for how this is actually done:

 wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2ensembl.gz

After decompressing it, take a look:

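The first column is the species ID, the second is the GeneID, the third is the Ensembl gene ID, the fourth is the RNA ID, the fifth is the Ensembl RNA ID, and the sixth is the protein ID. These R packages therefore most likely work by taking files like this one from NCBI, Ensembl, and other databases and converting between gene IDs with a series of scripts. If you are familiar with how NCBI, Ensembl, and similar resources are organized and can write scripts, you can do the conversion yourself without these R packages. Of course, when someone else has already written the code, why reinvent the wheel? Reinventing the wheel is a way to understand the process in depth.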
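
To make the "do it yourself" idea concrete, here is a small base-R sketch of an Ensembl-to-Entrez lookup built from a table with the columns described above. The data frame is a toy stand-in for the decompressed file: its rows, the column names, and the placeholder IDs are assumptions for illustration. Only the taxonomy IDs (9606 for human, 10090 for mouse) are real.

```r
# Toy table with the first three columns described above; rows are made up.
gene2ensembl <- data.frame(
  tax_id          = c("9606", "9606", "9606", "10090"),
  GeneID          = c("1", "1", "2", "11287"),
  Ensembl_gene_id = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSMUSG_C"),
  stringsAsFactors = FALSE
)

# Keep human rows only and drop repeated gene pairs, since a gene can
# appear once per transcript.
is_human <- gene2ensembl$tax_id == "9606"
human <- unique(gene2ensembl[is_human, c("GeneID", "Ensembl_gene_id")])

# Named vector used as a lookup table: Ensembl gene ID -> Entrez GeneID.
ens2eg <- setNames(human$GeneID, human$Ensembl_gene_id)

# Convert a query vector; IDs missing from the table come back as NA.
query <- c("ENSG_B", "ENSG_A", "ENSG_missing")
converted <- unname(ens2eg[query])
print(converted)
```

A named vector only holds one value per key, so this sketch assumes each Ensembl gene ID maps to a single GeneID; a one-to-many table would need a list or a data frame instead.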

3) Basic usage of each object

-----------------------------------------------------------

3.1) org.Hs.egACCNUM (maps Entrez Gene identifiers to GenBank Accession Numbers)

x <- org.Hs.egACCNUM ### Bimap interface
mapped_genes <- mappedkeys(x) ## Get the entrez gene identifiers that are mapped to an ACCNUM
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5] # Get the ACCNUM for the first five genes
xx[[1]] # Get the first one
}
#For the reverse map ACCNUM2EG:
xx <- as.list(org.Hs.egACCNUM2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene identifiers for the first five accession numbers
xx[[1]] # Get the first one
}
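
The lists produced by as.list() above can hold several values per key, which is awkward to filter or join. The sketch below shows one base-R way to flatten such a named list into a two-column data frame. The list `hits` is a toy stand-in with placeholder accession strings, and the column names are arbitrary choices. AnnotationDbi also provides toTable() for Bimap objects, which may be more convenient when the package is loaded.

```r
# Toy named list shaped like as.list(<Bimap>): names are Entrez Gene IDs,
# values are placeholder accession strings (not real accessions).
hits <- list(
  "1"     = "ACC_toy1",
  "10"    = c("ACC_toy2", "ACC_toy3"),
  "37690" = NA
)

# Repeat each key once per value, then pair it with the flattened values.
flat <- data.frame(
  entrez    = rep(names(hits), lengths(hits)),
  accession = unlist(hits, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop keys that had no mapping.
flat <- flat[!is.na(flat$accession), ]
print(flat)
```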

3.2) org.Hs.egALIAS2EG (maps common gene symbol identifiers to Entrez Gene IDs)

x <- org.Hs.egALIAS2EG  ## Bimap interface
xx <- as.list(x)   # Convert the object to a list
xx <- xx[!is.na(xx)] # Remove aliases that do not map to any Entrez Gene ID
if(length(xx) > 0){
xx[1:2]   # The Entrez Gene identifiers for the first two elements of xx
xx[[1]]   # Get the first one
}

3.3) org.Hs.egCHR (maps Entrez Gene IDs to chromosomes)

x <- org.Hs.egCHR        ## Bimap interface
mapped_genes <- mappedkeys(x) #Get entrez gene that are mapped to a chromosome
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]         # Get the CHR for the first five genes
xx[[1]]         # Get the first one
}

3.4) org.Hs.egCHRLENGTHS (the length of each chromosome)

tt <- org.Hs.egCHRLENGTHS  ## A named vector, not a Bimap
tt["1"]       # Length of chromosome 1
for (i in c(1:22,'X','Y')){print(tt[i])}   # Print the length of every chromosome

3.5) org.Hs.egCHRLOC (chromosomal locations of Entrez Gene IDs)

x <- org.Hs.egCHRLOC  ### Bimap interface
mapped_genes <- mappedkeys(x) #Get the entrez gene identifiers that are mapped to chromosome locations
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]   # Get the CHRLOC for the first five genes
xx[[1]]   # Get the first one
}
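
Each element of the CHRLOC list is a named integer vector: the names are chromosomes and, according to the package documentation, a leading minus sign marks a location on the antisense strand. The following sketch splits such a vector into chromosome, position, and strand. The vector `loc` is a toy example with made-up positions, and the output column names are arbitrary.

```r
# Toy CHRLOC-style entry: names are chromosomes, values are start positions.
# A negative value stands for the antisense strand; positions are made up.
loc <- c("19" = -1000L, "7" = 2500L)

parsed <- data.frame(
  chromosome = names(loc),
  start      = abs(unname(loc)),
  strand     = ifelse(unname(loc) < 0, "-", "+"),
  stringsAsFactors = FALSE
)
print(parsed)
```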

3.6) org.Hs.egENSEMBL (maps Ensembl gene accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBL  ## Bimap interface
mapped_genes <- mappedkeys(x)# Get the entrez gene IDs that are mapped to an Ensembl ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]       # Get the Ensembl gene IDs for the first five genes
xx[[1]]   # Get the first one
}
#For the reverse map ENSEMBL2EG:
xx <- as.list(org.Hs.egENSEMBL2EG) # Convert to a list
if(length(xx) > 0){              
xx[1:5]       # Gets the entrez gene IDs for the first five Ensembl IDs
xx[[1]]       # Get the first one
}

3.7) org.Hs.egENSEMBLPROT (maps Ensembl protein accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBLPROT   ## Bimap interface
mapped_genes <- mappedkeys(x) #Get the entrez gene IDs that are mapped to an Ensembl ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {  
xx[1:5]   # Get the Ensembl protein IDs for the first five genes
xx[[1]]     # Get the first one
}
#For the reverse map ENSEMBLPROT2EG:
xx <- as.list(org.Hs.egENSEMBLPROT2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene IDs for the first five Ensembl IDs
xx[[1]] # Get the first one
}

3.8) org.Hs.egENSEMBLTRANS (maps Ensembl transcript accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBLTRANS   ## Bimap interface:
mapped_genes <- mappedkeys(x) #entrez gene IDs that are mapped to an Ensembl ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]   # Get the Ensembl transcript IDs for the first five genes
xx[[1]] # Get the first one
}
#For the reverse map ENSEMBLTRANS2EG:
xx <- as.list(org.Hs.egENSEMBLTRANS2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene IDs for the first five Ensembl IDs
xx[[1]] # Get the first one
}

3.9) org.Hs.egGENENAME (maps Entrez Gene IDs to gene names)

x <- org.Hs.egGENENAME    ## Bimap interface
mapped_genes <- mappedkeys(x) # Entrez Gene identifiers that are mapped to a gene name
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5] # Get the GENE NAME for the first five genes
xx[[1]] # Get the first one
}

3.10) org.Hs.egGO (maps Entrez Gene IDs to Gene Ontology (GO) IDs)

x <- org.Hs.egGO  ## Bimap interface:
mapped_genes <- mappedkeys(x) # entrez gene identifiers that are mapped to a GO ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
got <- xx[[1]] # Try the first one
got[[1]][["GOID"]]
got[[1]][["Ontology"]]
got[[1]][["Evidence"]]
}
# For the reverse map:
xx <- as.list(org.Hs.egGO2EG) # Convert to a list
if(length(xx) > 0){
goids <- xx[2:3] # Gets the entrez gene ids for the 2nd and 3rd GO identifiers
goids[[1]] # Gets the entrez gene ids for the first element of goids
names(goids[[1]]) # Evidence code for the mappings
}
# For org.Hs.egGO2ALLEGS
xx <- as.list(org.Hs.egGO2ALLEGS)
if(length(xx) > 0){

goids <- xx[2:3] # Entrez Gene identifiers for the 2nd and 3rd GO identifiers
goids[[1]] # Gets all the Entrez Gene identifiers for the first element of goids
names(goids[[1]]) # Evidence code for the mappings
}
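
As the code above shows, each gene's entry in the org.Hs.egGO list is itself a list of annotations with GOID, Ontology, and Evidence fields. The sketch below turns one such nested entry into a data frame so it can be filtered by ontology. The entry `got` is a hand-built toy: the three GO IDs are the ontology root terms, while the evidence codes and the pairing with any gene are illustrative assumptions.

```r
# Toy entry shaped like xx[[1]] from as.list(org.Hs.egGO); illustrative only.
got <- list(
  "GO:0008150" = list(GOID = "GO:0008150", Evidence = "ND", Ontology = "BP"),
  "GO:0003674" = list(GOID = "GO:0003674", Evidence = "ND", Ontology = "MF"),
  "GO:0005575" = list(GOID = "GO:0005575", Evidence = "ND", Ontology = "CC")
)

# One data-frame row per annotation, then bind the rows together.
rows <- lapply(got, function(e) {
  data.frame(
    GOID     = e[["GOID"]],
    Evidence = e[["Evidence"]],
    Ontology = e[["Ontology"]],
    stringsAsFactors = FALSE
  )
})
go_table <- do.call(rbind, rows)
rownames(go_table) <- NULL

# Example filter: keep only biological process (BP) annotations.
bp_only <- go_table[go_table$Ontology == "BP", ]
print(bp_only)
```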

3.11) org.Hs.egPATH (maps Entrez Gene identifiers to KEGG pathway identifiers)

x <- org.Hs.egPATH  ## Bimap interface:
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
xx <- as.list(org.Hs.egPATH2EG)
xx <- xx[!is.na(xx)] # Remove pathway identifiers that do not map to any entrez gene id
if(length(xx) > 0){
xx[1:2]
xx[[1]]
}

3.12) org.Hs.egREFSEQ (maps Entrez Gene Identifiers to RefSeq Identifiers)

x <- org.Hs.egREFSEQ
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egREFSEQ2EG
mapped_seqs <- mappedkeys(x)
xx <- as.list(x[mapped_seqs])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}

3.13) org.Hs.egSYMBOL (maps Entrez Gene Identifiers to Gene Symbols)

x <- org.Hs.egSYMBOL
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egSYMBOL2EG
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}

3.14) org.Hs.egUNIGENE (maps Entrez Gene Identifiers to UniGene cluster identifiers)

x <- org.Hs.egUNIGENE
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egUNIGENE2EG
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}

3.15) org.Hs.egUNIPROT (maps Uniprot accession numbers to Entrez Gene identifiers)

x <- org.Hs.egUNIPROT
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
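I hope this walkthrough helps you understand how gene IDs, names, and other identifiers are converted into one another, and gives you some familiarity with databases such as NCBI, Ensembl, and Pfam.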

