PPI | protein-protein interaction | protein interaction analysis | gene interaction


Reposted article · 2019-05-15 13:34 · Topic: transcriptional regulation

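Preface

When we mine RNA-seq gene expression data, what we really care about is "gene interaction": which genes affect our gene G, and which genes G affects in turn. From that we hope to work out the mechanism of gene regulation.

The intuition is clear enough, but the details raise a lot of questions.

Are we talking about interactions at the level of gene expression, or interactions between gene products?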

------------

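A protein-coding gene is transcribed and translated into a protein. If that protein takes no part in transcription, then in principle it cannot directly affect the expression of another gene, so there is no gene expression interaction to speak of: the expression of the two genes is well insulated, independent of each other.

However, many regulatory genes and other regulatory sequences in the genome have now been identified, such as miRNAs and lncRNAs. They too have to be transcribed from the genome, and their transcripts then affect the expression of other genes (lncRNAs often at the transcriptional level, miRNAs mostly post-transcriptionally). This is what a gene expression interaction really is, even though miRNAs and lncRNAs are not protein-coding genes.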

------------

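Interactions between gene products are far more common. These are protein-protein interactions, the kind of information collected in databases such as STRING.

Protein interactions are also easier to grasp intuitively. In a complex multicellular organism almost every function is carried out by proteins, so many proteins have to bind to one another (physically) in order to do their jobs.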

------------

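There is also gene interaction in the genetics sense, which is quite different from the molecular biology sense. Genetics looks at gene interaction at the macroscopic level, starting from the phenotype. Novel phenotypes often result from the interactions of two genes.

Gene interaction in the genetics sense is the outcome of interactions between gene products in the molecular biology sense.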

Defining genetic interaction

GENE INTERACTIONS

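Mining the STRING database

This database is a real treasure for people who work at the bench. It contains all kinds of protein interaction relationships, so you get a pile of evidence without running a single experiment.

It is also worth knowing about IPA, a paid, high-end analysis package that builds largely on this kind of integrated interaction data. Many senior researchers like to use IPA to pick out "star" genes and then build a story around them; for an example, see my earlier walkthrough of the CSC paper.

First, let's see which files STRING offers for download: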

https://string-db.org/cgi/download.pl?sessionId=yMNmD7s36wS8

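Select your species to cut down the file size. The files most commonly used are the interaction data:

Usually we want to know which other proteins a given protein interacts with, and the type of interaction, and then run downstream analysis. All of that information is in these files.

Note: it is worth understanding exactly which kinds of interaction evidence exist. See the help page, https://string-db.org/cgi/help.pl?sessionId=yMNmD7s36wS8

Evidence channels listed in the STRING user documentation (Getting started, Evidence):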

Conserved Neighborhood
Co-occurrence
Fusion
Co-expression
Experiments
Databases
Text mining

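Each PPI relationship comes from different sources of evidence, so pick the evidence types you need. In my opinion the most reliable ones are Experiments, Databases and Text mining.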
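Picking evidence types can be sketched in R as a simple filter on the per-channel score columns. This is only an illustration: the helper name `filter_by_evidence` and the 0.4 cutoff are my own assumptions, not STRING recommendations, and the toy table reuses the HDAC4 values from the example output shown later in this post.

```r
# Toy table with a subset of the columns of a STRING interaction_partners result.
# Values are copied from the HDAC4 example output in this post.
ppi <- data.frame(
  preferredName_B = c("HDAC7", "SIRT4", "AAAS", "CBX5", "SIRT1", "MAPK1"),
  score  = c(0.934, 0.809, 0.901, 0.779, 0.988, 0.572),
  escore = c(0.320, 0.166, 0.000, 0.463, 0.415, 0.433),
  dscore = c(0.90, 0.00, 0.90, 0.54, 0.90, 0.00),
  tscore = c(0.061985, 0.778, 0.000, 0.159, 0.812, 0.276),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where at least one chosen channel
# reaches min_score (the 0.4 default is an arbitrary assumption).
filter_by_evidence <- function(df, channels = c("escore", "dscore"),
                               min_score = 0.4) {
  supported <- apply(df[, channels, drop = FALSE] >= min_score, 1, any)
  df[supported, , drop = FALSE]
}

curated <- filter_by_evidence(ppi)
print(curated$preferredName_B)
```

With these inputs SIRT4 drops out, because its support comes almost entirely from text mining; adding "tscore" to `channels` would keep it.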

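Of course, we are power users: never do it the hard way when there is an easier one. So let's look at the STRING API.

From any scripting language, read a URL of the following form: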

https://string-db.org/api/[output-format]/interaction_partners?identifiers=[your_identifiers]&[optional_parameters]

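and you get the result back as a data frame. No downloading, no filtering of large files; it is faster, and you can call it whenever you need it.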
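Filling in the URL template can be sketched with a small R helper. `build_string_url` is a hypothetical function written for this post, not part of any STRING client library, and it only covers the two parameters used in this article (`identifiers` and `species`) with a single identifier.

```r
# Hypothetical helper that fills in the URL template above.
# output_format and method map onto the path segments; identifiers and
# species become query parameters.
build_string_url <- function(identifier, species,
                             method = "interaction_partners",
                             output_format = "tsv") {
  base <- paste0("https://string-db.org/api/", output_format, "/", method)
  paste0(
    base,
    "?identifiers=", utils::URLencode(identifier, reserved = TRUE),
    "&species=", utils::URLencode(species, reserved = TRUE)  # space -> %20
  )
}

url <- build_string_url("HDAC4", "Homo sapiens")
print(url)
```

Building the URL in one place makes it easy to swap the identifier or species without hand-encoding spaces each time.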

Example: I want to know which proteins interact with HDAC4, so I fetch them like this:

(For mouse, use species=Mus%20musculus instead.)

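# Query the STRING API for HDAC4 interaction partners in human
url <- "https://string-db.org/api/tsv/interaction_partners?identifiers=HDAC4&species=Homo%20sapiens"
# The response is tab-separated with a header row
webDf <- read.delim(url, header = TRUE, stringsAsFactors = FALSE)
head(webDf)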

       stringId_A      stringId_B preferredName_A preferredName_B ncbiTaxonId score
1 ENSP00000264606 ENSP00000080059           HDAC4           HDAC7        9606 0.934
2 ENSP00000264606 ENSP00000202967           HDAC4           SIRT4        9606 0.809
3 ENSP00000264606 ENSP00000209873           HDAC4            AAAS        9606 0.901
4 ENSP00000264606 ENSP00000209875           HDAC4            CBX5        9606 0.779
5 ENSP00000264606 ENSP00000212015           HDAC4           SIRT1        9606 0.988
6 ENSP00000264606 ENSP00000215832           HDAC4           MAPK1        9606 0.572
  nscore fscore pscore ascore escore dscore   tscore
1      0      0      0  0.061  0.320   0.90 0.061985
2      0      0      0  0.052  0.166   0.00 0.778000
3      0      0      0  0.058  0.000   0.90 0.000000
4      0      0      0  0.062  0.463   0.54 0.159000
5      0      0      0  0.052  0.415   0.90 0.812000
6      0      0      0  0.000  0.433   0.00 0.276000

Interpreting the result:

Output fields (TSV and JSON formats):

Field	Description
stringId_A	STRING identifier (protein A)
stringId_B	STRING identifier (protein B)
preferredName_A	common protein name (protein A)
preferredName_B	common protein name (protein B)
ncbiTaxonId	NCBI taxon identifier
score	combined score
nscore	gene neighborhood score
fscore	gene fusion score
pscore	phylogenetic profile score
ascore	coexpression score
escore	experimental score
dscore	database score
tscore	textmining score
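To make the field table concrete, here is a minimal R sketch that labels each HDAC4 partner with its strongest evidence channel. The labels come from the field descriptions above and the values from the example output; the `top_channel` column is a name I made up for illustration.

```r
# Human-readable labels for the per-channel score columns (from the table above).
channel_labels <- c(
  nscore = "gene neighborhood", fscore = "gene fusion",
  pscore = "phylogenetic profile", ascore = "coexpression",
  escore = "experimental", dscore = "database", tscore = "textmining"
)

# Per-channel scores copied from the HDAC4 example output in this post.
ppi <- data.frame(
  preferredName_B = c("HDAC7", "SIRT4", "AAAS", "CBX5", "SIRT1", "MAPK1"),
  nscore = 0, fscore = 0, pscore = 0,
  ascore = c(0.061, 0.052, 0.058, 0.062, 0.052, 0.000),
  escore = c(0.320, 0.166, 0.000, 0.463, 0.415, 0.433),
  dscore = c(0.90, 0.00, 0.90, 0.54, 0.90, 0.00),
  tscore = c(0.061985, 0.778, 0.000, 0.159, 0.812, 0.276),
  stringsAsFactors = FALSE
)

# For each row, find the channel column holding the highest score.
channel_matrix <- as.matrix(ppi[, names(channel_labels)])
best <- apply(channel_matrix, 1, which.max)
ppi$top_channel <- unname(channel_labels[best])

print(ppi[, c("preferredName_B", "top_channel")])
```

In this small sample most partners are driven by the database channel, SIRT4 by text mining and MAPK1 by experiments, which is a quick way to see where a high combined score actually comes from.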

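To fetch other kinds of information, just change the API method in the URL.

Many tools also run enrichment analysis on top of STRING. They are worth a look too, depending on what you need.

To be continued~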
