Introduction to the org.Hs.eg.db package (converting between gene IDs, symbols, and other identifiers from NCBI, Ensembl, and similar databases)

Reposted article, 2018-09-09 10:15, filed under Bioconductor

1) Installation and loading

-------------------------------------------

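# biocLite() has been retired; BiocManager is the current Bioconductor installer.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!requireNamespace("org.Hs.eg.db", quietly = TRUE)) BiocManager::install("org.Hs.eg.db")
suppressMessages(library(org.Hs.eg.db))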

2) List all objects in the package

--------------------------------------------

ls("package:org.Hs.eg.db")

Purpose: these objects can be used to convert between gene IDs.

org.Hs.egACCNUM: Map Entrez Gene identifiers to GenBank Accession Numbers
org.Hs.egALIAS2EG: Map between Common Gene Symbol Identifiers and Entrez Gene
org.Hs.eg.db: Bioconductor annotation data package
org.Hs.egCHR: Map Entrez Gene IDs to Chromosomes
org.Hs.egCHRLENGTHS: A named vector for the length of each of the chromosomes
org.Hs.egCHRLOC: Entrez Gene IDs to Chromosomal Location
org.Hs.egENSEMBL: Map Ensembl gene accession numbers with Entrez Gene identifiers
org.Hs.egENSEMBLPROT: Map Ensembl protein accession numbers with Entrez Gene identifiers
org.Hs.egENSEMBLTRANS: Map Ensembl transcript accession numbers with Entrez Gene identifiers
org.Hs.egENZYME: Map between Entrez Gene IDs and Enzyme Commission (EC) Numbers
org.Hs.egGENENAME: Map between Entrez Gene IDs and gene names
org.Hs.egGO: Maps between Entrez Gene IDs and Gene Ontology (GO) IDs
org.Hs.egMAP: Map between Entrez Gene Identifiers and cytogenetic maps/bands
org.Hs.egMAPCOUNTS: Number of mapped keys for the maps in package org.Hs.eg.db
org.Hs.egOMIM: Map between Entrez Gene Identifiers and Mendelian Inheritance in Man (MIM) identifiers
org.Hs.egORGANISM: The Organism for org.Hs.eg
org.Hs.egPATH: Mappings between Entrez Gene identifiers and KEGG pathway identifiers
org.Hs.egPFAM: Maps between Entrez Gene Identifiers and PFAM Identifiers
org.Hs.egPMID: Map between Entrez Gene Identifiers and PubMed Identifiers
org.Hs.egPROSITE: Maps between Entrez Gene Identifiers and PROSITE Identifiers
org.Hs.egREFSEQ: Map between Entrez Gene Identifiers and RefSeq Identifiers
org.Hs.egSYMBOL: Map between Entrez Gene Identifiers and Gene Symbols
org.Hs.egUNIGENE: Map between Entrez Gene Identifiers and UniGene cluster identifiers
org.Hs.egUNIPROT: Map UniProt accession numbers with Entrez Gene identifiers
org.Hs.eg_dbconn: Collect information about the package annotation DB

Examples:

(using the mget function):
myEIDs <- c("1", "10", "100", "1000", "37690")
mySymbols <- mget(myEIDs, org.Hs.egSYMBOL, ifnotfound=NA) # myEIDs holds your own Entrez Gene IDs; org.Hs.egSYMBOL is one of the package's objects
mySymbols <- unlist(mySymbols)
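
A caveat about the `unlist` step: it is only tidy when every key maps to exactly one value. The sketch below uses a plain base-R environment as a stand-in for a Bimap (the names `toy_map`, `SYM_A`, and so on are invented for illustration, not real symbols) to show what happens when one key carries two values, and one possible way to keep a single value per key.

```r
# Toy stand-in for a map object: a base-R environment keyed by ID.
# All values are made-up placeholders.
toy_map <- list2env(list(
  "1"   = "SYM_A",
  "10"  = "SYM_B",
  "100" = c("SYM_C", "SYM_C_ALT")   # one key, two values
))

ids  <- c("1", "10", "100", "37690")   # "37690" is absent from toy_map
hits <- mget(ids, envir = toy_map, ifnotfound = NA)

# unlist() keeps every value and renames the duplicated key to "1001", "1002"
flat <- unlist(hits)
print(names(flat))

# Keeping only the first value per key preserves the original key names
first_only <- vapply(hits, function(v) as.character(v[1]), character(1))
print(first_only)
```

Whether taking the first value is acceptable depends on the map; for one-to-many maps it silently discards information.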

(using the select function):
myEIDs <- c("ENSG00000130720", "ENSG00000103257", "ENSG00000156414")  # Ensembl gene IDs this time
cols <- c("SYMBOL", "GENENAME")
select(org.Hs.eg.db, keys=myEIDs, columns=cols, keytype="ENSEMBL")  # returns a data frame
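
When only one column is needed, `mapIds` from AnnotationDbi may be more convenient than `select`, since it returns one entry per key. A minimal sketch, assuming org.Hs.eg.db is installed (the block does nothing otherwise); choosing `multiVals = "first"` is an arbitrary decision made here for illustration:

```r
# Runs only when the annotation package is available.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  suppressMessages(library(org.Hs.eg.db))

  # Which identifier types can serve as keys, and which columns can be returned
  print(keytypes(org.Hs.eg.db))
  print(columns(org.Hs.eg.db))

  ens <- c("ENSG00000130720", "ENSG00000103257", "ENSG00000156414")

  # One symbol per Ensembl ID; multiVals controls what happens
  # when a key maps to several values ("first" keeps the first one)
  sym <- mapIds(org.Hs.eg.db, keys = ens, column = "SYMBOL",
                keytype = "ENSEMBL", multiVals = "first")
  print(sym)
}
```

Other `multiVals` settings such as `"list"` keep every match instead of just the first.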

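How it works: take the simple mapping between Entrez Gene identifiers ( https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) and GenBank accession numbers as an example. The map is built from the Entrez Gene data at ftp://ftp.ncbi.nlm.nih.gov/gene/DATA

Take gene2ensembl, one of the files under DATA, as an example to get a feel for how this is done: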

wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2ensembl.gz

After decompressing the file, take a look at its contents:

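The first column is the species (taxonomy) ID, the second is the GeneID, the third is the Ensembl gene ID, the fourth is the RNA ID, the fifth is the Ensembl RNA ID, and the sixth is the protein ID. These R packages therefore most likely work by taking files like this one from NCBI, Ensembl, and other databases and converting between gene IDs with a series of scripts. If you know how the NCBI, Ensembl, and similar resources are organized and can write scripts yourself, you can do the conversion on your own without these R packages. Of course, someone has already written them, so why reinvent the wheel? Because building it yourself gives you a deeper understanding.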
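
To make the idea concrete, here is a minimal base-R sketch of building both lookup directions from a table laid out as described above. The rows are invented placeholders rather than real records, and the column names are labels chosen here, not the file's actual header.

```r
# Toy table in the column order described above (tab-separated).
# Every ID below is a made-up placeholder.
toy_lines <- c(
  "tax_id\tGeneID\tEnsembl_gene\tRNA_id\tEnsembl_RNA\tprotein_id",
  "9606\t1\tENSG_TOY_A\tNM_TOY_1\tENST_TOY_A1\tNP_TOY_1",
  "9606\t1\tENSG_TOY_A\tNM_TOY_2\tENST_TOY_A2\tNP_TOY_2",
  "9606\t2\tENSG_TOY_B\tNM_TOY_3\tENST_TOY_B1\tNP_TOY_3",
  "10090\t3\tENSMUSG_TOY_C\tNM_TOY_4\tENSMUST_TOY_C1\tNP_TOY_4"
)
gene2ensembl <- read.delim(text = paste(toy_lines, collapse = "\n"),
                           colClasses = "character")

# Keep human rows only (NCBI taxonomy ID 9606)
human <- gene2ensembl[gene2ensembl$tax_id == "9606", ]

# Entrez Gene ID -> Ensembl gene ID(s); a gene appears once per transcript,
# so duplicates are dropped
entrez_to_ensembl <- lapply(split(human$Ensembl_gene, human$GeneID), unique)

# Reverse direction: Ensembl gene ID -> Entrez Gene ID(s)
ensembl_to_entrez <- lapply(split(human$GeneID, human$Ensembl_gene), unique)

print(entrez_to_ensembl)
print(ensembl_to_entrez)
```

With the real file, the same steps would apply after reading the decompressed gene2ensembl in place of the toy text.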

3) Basic usage of each object

-----------------------------------------------------------

3.1) org.Hs.egACCNUM (maps Entrez Gene identifiers to GenBank Accession Numbers)

x <- org.Hs.egACCNUM ### Bimap interface
mapped_genes <- mappedkeys(x) ## Get the entrez gene identifiers that are mapped to an ACCNUM
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5] # Get the ACCNUM for the first five genes
xx[[1]] # Get the first one
}
#For the reverse map ACCNUM2EG:
xx <- as.list(org.Hs.egACCNUM2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene identifiers for the first five accession numbers
xx[[1]] # Get the first one
}

3.2) org.Hs.egALIAS2EG (converts between Common Gene Symbol Identifiers and Entrez Gene)

x <- org.Hs.egALIAS2EG  ## Bimap interface
xx <- as.list(x)   # Convert the object to a list
xx <- xx[!is.na(xx)] # Remove aliases that do not map to any entrez gene id
if(length(xx) > 0){
xx[1:2]   # The entrez gene identifiers for the first two elements of XX
xx[[1]]   # Get the first one
}

3.3) org.Hs.egCHR (maps Entrez Gene IDs to Chromosomes)

x <- org.Hs.egCHR        ## Bimap interface
mapped_genes <- mappedkeys(x) #Get entrez gene that are mapped to a chromosome
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]         # Get the CHR for the first five genes
xx[[1]]         # Get the first one
}

3.4) org.Hs.egCHRLENGTHS (length of each chromosome)

tt <- org.Hs.egCHRLENGTHS  ## a named vector, not a Bimap
tt["1"]       # Length of chromosome 1
for (i in c(1:22,'X','Y')){print(tt[i])}   # Print the length of every chromosome

3.5) org.Hs.egCHRLOC (chromosomal locations of Entrez Gene IDs)

x <- org.Hs.egCHRLOC  ### Bimap interface
mapped_genes <- mappedkeys(x) #Get the entrez gene identifiers that are mapped to chromosome locations
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]   # Get the CHRLOC for the first five genes
xx[[1]]   # Get the first one
}

3.6) org.Hs.egENSEMBL (maps Ensembl gene accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBL  ## Bimap interface
mapped_genes <- mappedkeys(x)# Get the entrez gene IDs that are mapped to an Ensembl ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]       # Get the Ensembl gene IDs for the first five genes
xx[[1]]   # Get the first one
}
#For the reverse map ENSEMBL2EG:
xx <- as.list(org.Hs.egENSEMBL2EG) # Convert to a list
if(length(xx) > 0){              
xx[1:5]       # Gets the entrez gene IDs for the first five Ensembl IDs
xx[[1]]       # Get the first one
}

3.7) org.Hs.egENSEMBLPROT (maps Ensembl protein accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBLPROT   ## Bimap interface
mapped_genes <- mappedkeys(x) # Get the entrez gene IDs that are mapped to an Ensembl protein ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]   # Get the Ensembl protein IDs for the first five genes
xx[[1]]     # Get the first one
}
#For the reverse map ENSEMBLPROT2EG:
xx <- as.list(org.Hs.egENSEMBLPROT2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene IDs for the first five Ensembl protein IDs
xx[[1]] # Get the first one
}

3.8) org.Hs.egENSEMBLTRANS (maps Ensembl transcript accession numbers to Entrez Gene identifiers)

x <- org.Hs.egENSEMBLTRANS   ## Bimap interface
mapped_genes <- mappedkeys(x) # Get the entrez gene IDs that are mapped to an Ensembl transcript ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5]   # Get the Ensembl transcript IDs for the first five genes
xx[[1]] # Get the first one
}
#For the reverse map ENSEMBLTRANS2EG:
xx <- as.list(org.Hs.egENSEMBLTRANS2EG) # Convert to a list
if(length(xx) > 0){
xx[1:5] # Gets the entrez gene IDs for the first five Ensembl transcript IDs
xx[[1]] # Get the first one
}

3.9) org.Hs.egGENENAME (maps Entrez Gene IDs to gene names)

x <- org.Hs.egGENENAME    ## Bimap interface
mapped_genes <- mappedkeys(x) # Get the entrez gene identifiers that are mapped to a gene name
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
xx[1:5] # Get the GENE NAME for the first five genes
xx[[1]] # Get the first one
}

3.10) org.Hs.egGO (maps Entrez Gene IDs to Gene Ontology (GO) IDs)

x <- org.Hs.egGO  ## Bimap interface:
mapped_genes <- mappedkeys(x) # entrez gene identifiers that are mapped to a GO ID
xx <- as.list(x[mapped_genes]) # Convert to a list
if(length(xx) > 0) {
got <- xx[[1]] # Try the first one
got[[1]][["GOID"]]
got[[1]][["Ontology"]]
got[[1]][["Evidence"]]
}
# For the reverse map:
xx <- as.list(org.Hs.egGO2EG) # Convert to a list
if(length(xx) > 0){
goids <- xx[2:3] # Gets the entrez gene ids for the 2nd and 3rd GO identifiers
goids[[1]] # Gets the entrez gene ids for the first element of goids
names(goids[[1]]) # Evidence code for the mappings
}
# For org.Hs.egGO2ALLEGS
xx <- as.list(org.Hs.egGO2ALLEGS)
if(length(xx) > 0){
goids <- xx[2:3] # Entrez Gene identifiers for the 2nd and 3rd GO identifiers
goids[[1]] # Gets all the Entrez Gene identifiers for the first element of goids
names(goids[[1]]) # Evidence code for the mappings
}
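
The nested list returned for org.Hs.egGO is awkward to scan by eye. As a rough sketch, the toy list below imitates the structure accessed above (each gene holds GO entries with `GOID`, `Evidence`, and `Ontology` fields) and flattens it into a data frame. The gene keys and evidence codes are placeholders, not real annotations, and `get_field` is a helper written just for this example.

```r
# Toy list shaped like as.list(org.Hs.egGO): gene -> GO entries -> fields.
# Gene keys and evidence codes are illustrative placeholders.
go_list <- list(
  geneA = list(
    "GO:0000001" = list(GOID = "GO:0000001", Evidence = "IEA", Ontology = "BP"),
    "GO:0000002" = list(GOID = "GO:0000002", Evidence = "TAS", Ontology = "BP")
  ),
  geneB = list(
    "GO:0000001" = list(GOID = "GO:0000001", Evidence = "IDA", Ontology = "BP")
  )
)

# Hypothetical helper: pull one named field out of every GO entry of a gene
get_field <- function(entries, field) {
  unname(vapply(entries, function(e) e[[field]], character(1)))
}

# One row per (gene, GO term) pair
go_table <- do.call(rbind, lapply(names(go_list), function(gene) {
  entries <- go_list[[gene]]
  data.frame(
    ENTREZID = gene,
    GOID     = get_field(entries, "GOID"),
    EVIDENCE = get_field(entries, "Evidence"),
    ONTOLOGY = get_field(entries, "Ontology"),
    stringsAsFactors = FALSE
  )
}))

print(go_table)
```

For the real Bimap objects, AnnotationDbi's `toTable()` should give a comparable flat table without hand-written code.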

3.11) org.Hs.egPATH (maps Entrez Gene identifiers to KEGG pathway identifiers)

x <- org.Hs.egPATH  ## Bimap interface:
mapped_genes <- mappedkeys(x) 
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
xx <- as.list(org.Hs.egPATH2EG)
xx <- xx[!is.na(xx)] # Remove pathway identifiers that do not map to any entrez gene id
if(length(xx) > 0){
xx[1:2]
xx[[1]]
}

3.12) org.Hs.egREFSEQ (maps Entrez Gene Identifiers to RefSeq Identifiers)

x <- org.Hs.egREFSEQ
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egREFSEQ2EG
mapped_seqs <- mappedkeys(x)
xx <- as.list(x[mapped_seqs])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}

3.13) org.Hs.egSYMBOL (maps Entrez Gene Identifiers to Gene Symbols)

x <- org.Hs.egSYMBOL
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egSYMBOL2EG
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}

3.14) org.Hs.egUNIGENE (maps Entrez Gene Identifiers to UniGene cluster identifiers)

x <- org.Hs.egUNIGENE
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}
# For the reverse map:
x <- org.Hs.egUNIGENE2EG
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}

3.15) org.Hs.egUNIPROT (maps UniProt accession numbers to Entrez Gene identifiers)

x <- org.Hs.egUNIPROT
mapped_genes <- mappedkeys(x)
xx <- as.list(x[mapped_genes])
if(length(xx) > 0) {
xx[1:5]
xx[[1]]
}

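Hopefully this walkthrough helps you understand how gene IDs, names, and other identifiers are converted into one another, and gives you some familiarity with databases such as NCBI, Ensembl, and Pfam.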
