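Entrez is a search engine on the National Center for Biotechnology Information (NCBI) website. It integrates several health-science databases, such as the scientific literature, DNA and protein sequence databases, protein 3D structures, protein domain data, expression data, and complete genome assemblies.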
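The Entrez Programming Utilities (eUtils) return search results to a program you write yourself, but you have to supply the URL and parse the XML yourself. The Entrez module (Bio.Entrez in Biopython) saves you both steps: it builds the URL and parses the XML for you.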
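To make the contrast concrete, here is a minimal sketch of the manual route using only the Python standard library. The helper names build_esearch_url and parse_id_list are hypothetical, and the sample XML is a hand-written fragment in the shape of an esearch reply with placeholder ids, not real search output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the eUtils esearch tool.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, email):
    """Assemble the query URL by hand (hypothetical helper)."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "email": email})


def parse_id_list(xml_text):
    """Pull the record ids out of an esearch XML reply (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


# Hand-written stand-in for a server reply; the ids are placeholders.
sample_xml = """<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>"""

url = build_esearch_url("pubmed", "python and bioinformatics", "user@example.com")
ids = parse_id_list(sample_xml)
print(url)
print(ids)
```

With the Entrez module, both helpers collapse into one call to Entrez.esearch() followed by Entrez.read().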
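The functions in the Entrez module correspond to tools of the same name in eUtils, for example esearch and esummary.
The following script searches pubmed for related literature. Every returned result is parsed with Entrez.read().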
from Bio import Entrez

my_em = 'user@example.com'
db = "pubmed"

# Search Entrez using esearch from eUtils.
# esearch returns a handle (called h_search); the result mainly carries the matching ids.
h_search = Entrez.esearch(db=db, email=my_em,
                          term="python and bioinformatics")
# Parse the result with Entrez.read(); record is a dictionary
record = Entrez.read(h_search)
h_search.close()
# Get the list of ids returned by the search; the value of the "IdList" key is a list
res_ids = record["IdList"]

# For each id in the list
for r_id in res_ids:
    # Get summary information for this id
    h_summ = Entrez.esummary(db=db, id=r_id, email=my_em)
    # Parse the result with Entrez.read(). Here it is a list whose first element
    # is a dictionary; other databases return differently structured data.
    summ = Entrez.read(h_summ)
    h_summ.close()
    print(summ[0]['Title'])
    # A record may lack a DOI; .get avoids a KeyError in that case
    print(summ[0].get('DOI', 'DOI not available'))
    print('==============================================')
Output:
do_x3dna: A tool to analyze structural fluctuations of dsDNA or dsRNA from molecular dynamics simulations.
10.1093/bioinformatics/btv190
==============================================
RiboTools: A Galaxy toolbox for qualitative ribosome profiling analysis.
10.1093/bioinformatics/btv174
==============================================
Identification of cell types from single-cell transcriptomes using a novel clustering method.
10.1093/bioinformatics/btv088
==============================================
Efficient visualization of high-throughput targeted proteomics experiments: TAPIR.
10.1093/bioinformatics/btv152
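
The script above sends one esummary request per id. esummary also accepts a comma-separated list of ids, so the same summaries can be requested in batches. The following is a minimal sketch of that idea, assuming Biopython is installed when fetch_titles is called; batch_ids and fetch_titles are hypothetical helper names, and the batch size of 100 is an arbitrary choice.

```python
def batch_ids(ids, size=100):
    """Yield comma-separated id strings, at most `size` ids each."""
    for start in range(0, len(ids), size):
        yield ",".join(ids[start:start + size])


def fetch_titles(ids, email, db="pubmed", size=100):
    """Return the titles for `ids`, sending one esummary request per batch."""
    # Imported here so that batch_ids can be used without Biopython.
    from Bio import Entrez

    titles = []
    for id_string in batch_ids(ids, size):
        handle = Entrez.esummary(db=db, id=id_string, email=email)
        # Assumption: for pubmed the parsed result is a list with one
        # dictionary per requested id, as in the single-id case above.
        summaries = Entrez.read(handle)
        handle.close()
        titles.extend(summary["Title"] for summary in summaries)
    return titles


# batch_ids needs no network access; the ids here are placeholders.
print(list(batch_ids(["1", "2", "3", "4", "5"], size=2)))
```

Calling fetch_titles(res_ids, my_em) would then replace the per-id loop for the titles; the DOI lookup could be added to the same loop in the same way.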