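Contents
What is an eQTL? What data is it computed from, and what does the data format look like?
Which regions of the genome are eQTLs usually enriched in?
Several common eQTL databases
What is GTEx? Which version is it on? What data does GTEx contain?
What are the milestone GTEx papers?
How do most research groups use GTEx data?
Downloading GTEx/eQTLGen data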
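Quick facts
An eQTL pairs one SNP with one gene; the gene is usually chosen from those whose TSS lies near the SNP, upstream or downstream. Blood is the gold standard.
Because chromosomes are linear, linkage disequilibrium (LD) complicates every genetic analysis: the SNP you find may not be the causal one, and one of its neighbors may be instead. The same applies to eQTLs. [If LD = 1 across a region, the variants in it can be treated as a single point, even if their functions differ.]
GWAS uses common SNPs. The causal SNP is unknown, but it must have a function. [There must be a traceable path from the starting point to the end point.]
Risk alleles are enriched among minor alleles: "Our statistical results revealed that risk alleles were enriched in minor alleles, especially for variants with low minor allele frequencies (MAFs < 0.1)."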
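To make the LD point above concrete, here is a minimal sketch that estimates r² between two SNPs as the squared Pearson correlation of their genotype dosages. The dosage vectors are made up, and dosage-based r² is only an approximation of haplotype-based LD.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)

# Hypothetical dosages (0, 1 or 2 alternate alleles) for six individuals.
snp_1 = [0, 1, 2, 1, 0, 2]
snp_2 = [0, 1, 2, 1, 0, 2]  # always inherited together with snp_1
snp_3 = [0, 1, 1, 2, 0, 1]  # only partly correlated with snp_1

ld_perfect = r_squared(snp_1, snp_2)
ld_partial = r_squared(snp_1, snp_3)
```

When r² is 1, as for `snp_1` and `snp_2`, an association test cannot tell the two variants apart, which is why they behave as a single point.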
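Thought experiment
What would inheritance and development look like in a haploid organism? Without sexual reproduction there is no recombination; with asexual reproduction, diversity cannot be maintained and comes only from somatic mutation. The genotype is simply the allele, so the unit of computation for both GWAS and eQTL becomes the allele.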
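What is an eQTL? What data is it computed from, and what does the data format look like?
Google "eQTL" and go straight to the image results.
The standard figure shows three genotypes against the expression level of one gene. A SNP close to the gene is a cis-eQTL; a distant one is a trans-eQTL.
Three core elements: SNP, gene, tissue.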
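The three-genotype figure boils down to a regression of expression on genotype dosage for one SNP-gene pair in one tissue. The sketch below uses made-up numbers and a plain least-squares fit; real pipelines add covariates and significance testing.

```python
def ols_fit(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Dosage = number of alternate alleles per individual (0, 1 or 2).
dosage = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# Hypothetical expression values of one gene in one tissue.
expression = [1.0, 1.2, 0.8, 2.0, 2.1, 1.9, 3.0, 3.2, 2.8]

slope, intercept = ols_fit(dosage, expression)

# Per-genotype means: the quantity the usual boxplot lets you eyeball.
group_means = {}
for genotype in sorted(set(dosage)):
    values = [e for d, e in zip(dosage, expression) if d == genotype]
    group_means[genotype] = sum(values) / len(values)
```

The slope is the change in expression per extra copy of the alternate allele; a slope clearly different from zero is what makes the SNP an eQTL for that gene in that tissue.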
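Which regions of the genome are eQTLs usually enriched in?
The distribution resembles an ATAC-seq signal: eQTLs are mainly enriched within 50 kbp upstream or downstream of the TSS, with a peak near the TSS itself.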
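As a small illustration of the 50 kbp window, the sketch below computes the signed SNP-to-TSS distance and keeps only SNPs inside the window. The coordinates and SNP names are hypothetical.

```python
def signed_tss_distance(snp_pos, tss_pos, strand="+"):
    """Offset of the SNP from the TSS; negative means upstream of the gene."""
    offset = snp_pos - tss_pos
    return offset if strand == "+" else -offset

def within_window(snp_pos, tss_pos, window_bp=50_000):
    """True if the SNP lies within window_bp of the TSS on either side."""
    return abs(snp_pos - tss_pos) <= window_bp

# Hypothetical TSS and SNP coordinates on the same chromosome.
tss = 1_000_000
snps = {"snp_a": 990_000, "snp_b": 1_020_000, "snp_c": 1_400_000}

nearby = {
    name: signed_tss_distance(pos, tss)
    for name, pos in snps.items()
    if within_window(pos, tss)
}
```

Binning these signed distances across many SNP-gene pairs gives the kind of histogram that peaks at the TSS.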
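In different tissues, the genotype at the same locus can have opposite relationships with gene expression, which highlights the tissue specificity of eQTLs.
Another highlight of eQTLs is the non-coding genome: "most of the susceptible loci were found in non-coding regions of the genome".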
Here we describe “opposite eQTL effects”, i.e., gene expression effects of eQTLs that are in the opposite direction between different tissues, as the biologically meaningful annotations of genes and genetic variants for understanding the GWAS loci.
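A minimal sketch of how one might flag an "opposite eQTL effect" for a single SNP-gene pair, assuming per-tissue slopes and p-values are already available. The tissue names, values, and the 0.05 threshold are all hypothetical and not taken from the quoted paper.

```python
def has_opposite_effects(effects, alpha=0.05):
    """effects maps tissue -> (slope, p_value).

    Returns True if the significant tissues include both a positive
    and a negative slope.
    """
    signs = {
        slope > 0
        for slope, p_value in effects.values()
        if p_value < alpha and slope != 0
    }
    return len(signs) == 2

# Hypothetical results for one SNP-gene pair.
pair_effects = {
    "tissue_A": (0.42, 1e-8),
    "tissue_B": (-0.31, 3e-5),
    "tissue_C": (0.05, 0.60),  # not significant, ignored
}

flagged = has_opposite_effects(pair_effects)
```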
Several common eQTL databases
GTEx
Blood eQTL
eQTLGen
What is GTEx? Which version is it on? What data does GTEx contain?
The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq. Remaining samples are available from the GTEx Biobank. The GTEx Portal provides open access to data including gene expression, QTLs, and histology images.
In short: the focus is tissue-specific gene expression and regulation; nearly 1,000 individuals and 54 tissues were assayed with WGS, WES, and RNA-Seq; the main data are gene expression and eQTLs.
As of 23 September 2020, the current release is v8.
All tissues are post-mortem (collected at autopsy), and all data are human.
complex trait heritability/complex trait genetics
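Majority of trait-associated variation is non-coding. [Coding sequence accounts for only about 1-2% of the genome.]
Using expression and epigenetic data to inform missing heritability. [For most traits, the heritability explained by GWAS hits is low; how do we find the missing part?]
When you have genotype and gene expression data from the same individuals at scale, eQTL analysis is the natural next step: test whether the genotype at a SNP is associated with the expression of a nearby gene, and dig deeper into any gene of interest. [Think of the familiar boxplot of expression by genotype.]
If the sample size is too small, you can only do a simple allelic expression analysis: check whether one allele of a SNP is specifically or highly expressed in patients, then dig deeper. [This is a very common GWAS downstream analysis: is the risk allele specifically expressed in a given tissue?]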
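One common way to look at allelic expression is to test, at a heterozygous SNP, whether RNA-seq reads carrying the two alleles deviate from a 50:50 ratio. The sketch below uses an exact two-sided binomial test; the read counts are made up, and this is an illustration rather than the analysis any particular study used.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than observing k successes out of n.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical read counts at one heterozygous SNP in one sample.
risk_allele_reads = 30
other_allele_reads = 10

p_value = binomial_two_sided_p(
    risk_allele_reads, risk_allele_reads + other_allele_reads
)
```

A small p-value suggests the two alleles are not expressed equally; real data also need care with reference mapping bias and overdispersion, which this sketch ignores.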
What are the milestone GTEx papers?
https://gtexportal.org/home/publicationsPage
The GTEx Consortium atlas of genetic regulatory effects across human tissues - Science, 11 Sep 2020
Cell type–specific genetic regulation of gene expression across human tissues - Science, 11 Sep 2020
Freshly published: cell type data were obtained with a statistical deconvolution approach, which identified more eQTLs.
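How do most research groups use GTEx data?
Reference: Mulin Jun Li
Downloading eQTLGen data
Beginners are advised to practice on this database first, because the data format is fairly simple.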
cis-eQTLs
This page contains the cis-eQTL results. The statistically significant cis-eQTLs and SMR-prioritised genes for several traits are browsable, the other files can be downloaded.
Download the "Significant cis-eQTLs" file:
Pvalue SNP SNPChr SNPPos AssessedAllele OtherAllele Zscore Gene GeneSymbol GeneChr GenePos NrCohorts NrSamples FDR BonferroniP
3.2717E-310 rs12230244 12 10117369 T A 200.7534 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302
3.2717E-310 rs12229020 12 10117683 G C 200.6568 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302
3.2717E-310 rs61913527 12 10116198 T C 200.2654 ENSG00000172322 CLEC12A 12 10126104 34 30598 0.0 4.1662E-302
Files
-----
File with full cis-eQTL results: 2019-12-11-cis-eQTLsFDR-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz
File with significant (FDR<0.05) cis-eQTL results: 2019-12-11-cis-eQTLsFDR0.05-ProbeLevel-CohortInfoRemoved-BonferroniAdded.txt.gz

Column Names
------------
Pvalue - P-value
SNP - SNP rs ID
SNPChr - SNP chromosome
SNPPos - SNP position
AssessedAllele - Assessed allele, the Z-score refers to this allele
OtherAllele - Not assessed allele
Zscore - Z-score
Gene - ENSG name (Ensembl v71) of the eQTL gene
GeneSymbol - HGNC name of the gene
GeneChr - Gene chromosome
GenePos - Centre of gene position
NrCohorts - Total number of cohorts where this SNP-gene combination was tested
NrSamples - Total number of samples where this SNP-gene combination was tested
FDR - False discovery rate estimated based on permutations
BonferroniP - P-value after Bonferroni correction

Additional information
----------------------
These files contain all cis-eQTL results from eQTLGen, accompanying the article. 19,250 genes that showed expression in blood were tested. Every SNP-gene combination with a distance <1Mb from the center of the gene and tested in at least 2 cohorts was included. Associations where SNP/proxy positioned in Illumina probe were not removed from combined analysis.
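The column list above maps directly onto a small parser. The sketch below reads the three example rows with `csv.DictReader`, keeps rows below an FDR cutoff, and derives the SNP-to-gene-centre distance. It assumes the file is tab-separated; the output field names are my own.

```python
import csv
import io

# Rows copied from the excerpt above, written with spaces for readability.
RAW = """\
Pvalue SNP SNPChr SNPPos AssessedAllele OtherAllele Zscore Gene GeneSymbol GeneChr GenePos NrCohorts NrSamples FDR BonferroniP
3.2717E-310 rs12230244 12 10117369 T A 200.7534 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302
3.2717E-310 rs12229020 12 10117683 G C 200.6568 ENSG00000172322 CLEC12A 12 10126104 34 30596 0.0 4.1662E-302
3.2717E-310 rs61913527 12 10116198 T C 200.2654 ENSG00000172322 CLEC12A 12 10126104 34 30598 0.0 4.1662E-302
"""
# Convert to tab-separated text (assumed to be the real file's delimiter).
TSV = "\n".join("\t".join(line.split()) for line in RAW.splitlines())

def read_cis_eqtls(handle, max_fdr=0.05):
    """Yield simplified records for rows with FDR below max_fdr."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["FDR"]) >= max_fdr:
            continue
        yield {
            "snp": row["SNP"],
            "gene": row["GeneSymbol"],
            "assessed_allele": row["AssessedAllele"],
            "zscore": float(row["Zscore"]),
            # GenePos is the centre of the gene, not the TSS.
            "distance": int(row["SNPPos"]) - int(row["GenePos"]),
        }

records = list(read_cis_eqtls(io.StringIO(TSV)))
```

For the downloaded file, the same function should work on a handle opened with `gzip.open(path, "rt")`. A positive Z-score means the assessed allele is associated with higher expression.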
Downloading GTEx data
Data available include:
- BAM files for RNA-Seq, Whole Exome Seq, and Whole Genome Seq
- Genotype Calls (.vcf) for OMNI SNP Arrays, WES, and WGS
- OMNI SNP Array Intensity files (.idat and .gtc)
- Affymetrix Expression Array Intensity files (.cel)
- Allele Specific Expression (ASE) tables
- All expression matrices from the Portal, including samples that did not pass the Analysis Freeze QC
- Sample Attributes
- Subject Phenotypes
Data format
Download GTEx_Analysis_v8_eQTL_EUR.tar, which holds the data for a single population (EUR).
After extraction there are three folders:
eqtls expression_matrices expression_covariates
eqtls: stored per tissue, with two files for each tissue
eqtls/Vagina.v8.EUR.egenes.txt.gz:
eqtls/Vagina.v8.EUR.signif_pairs.txt.gz:
Adipose_Subcutaneous.v8.EUR.egenes.txt.gz Adipose_Subcutaneous.v8.EUR.signif_pairs.txt.gz
Adipose_Visceral_Omentum.v8.EUR.egenes.txt.gz Adipose_Visceral_Omentum.v8.EUR.signif_pairs.txt.gz
Adrenal_Gland.v8.EUR.egenes.txt.gz Adrenal_Gland.v8.EUR.signif_pairs.txt.gz
Artery_Aorta.v8.EUR.egenes.txt.gz Artery_Aorta.v8.EUR.signif_pairs.txt.gz
Artery_Coronary.v8.EUR.egenes.txt.gz Artery_Coronary.v8.EUR.signif_pairs.txt.gz
Artery_Tibial.v8.EUR.egenes.txt.gz Artery_Tibial.v8.EUR.signif_pairs.txt.gz
Brain_Amygdala.v8.EUR.egenes.txt.gz Brain_Amygdala.v8.EUR.signif_pairs.txt.gz
Brain_Anterior_cingulate_cortex_BA24.v8.EUR.egenes.txt.gz Brain_Anterior_cingulate_cortex_BA24.v8.EUR.signif_pairs.txt.gz
Brain_Caudate_basal_ganglia.v8.EUR.egenes.txt.gz Brain_Caudate_basal_ganglia.v8.EUR.signif_pairs.txt.gz
Brain_Cerebellar_Hemisphere.v8.EUR.egenes.txt.gz Brain_Cerebellar_Hemisphere.v8.EUR.signif_pairs.txt.gz
Brain_Cerebellum.v8.EUR.egenes.txt.gz Brain_Cerebellum.v8.EUR.signif_pairs.txt.gz
Brain_Cortex.v8.EUR.egenes.txt.gz Brain_Cortex.v8.EUR.signif_pairs.txt.gz
Brain_Frontal_Cortex_BA9.v8.EUR.egenes.txt.gz Brain_Frontal_Cortex_BA9.v8.EUR.signif_pairs.txt.gz
Brain_Hippocampus.v8.EUR.egenes.txt.gz Brain_Hippocampus.v8.EUR.signif_pairs.txt.gz
Brain_Hypothalamus.v8.EUR.egenes.txt.gz Brain_Hypothalamus.v8.EUR.signif_pairs.txt.gz
Brain_Nucleus_accumbens_basal_ganglia.v8.EUR.egenes.txt.gz Brain_Nucleus_accumbens_basal_ganglia.v8.EUR.signif_pairs.txt.gz
Brain_Putamen_basal_ganglia.v8.EUR.egenes.txt.gz Brain_Putamen_basal_ganglia.v8.EUR.signif_pairs.txt.gz
Brain_Spinal_cord_cervical_c-1.v8.EUR.egenes.txt.gz Brain_Spinal_cord_cervical_c-1.v8.EUR.signif_pairs.txt.gz
Brain_Substantia_nigra.v8.EUR.egenes.txt.gz Brain_Substantia_nigra.v8.EUR.signif_pairs.txt.gz
Breast_Mammary_Tissue.v8.EUR.egenes.txt.gz Breast_Mammary_Tissue.v8.EUR.signif_pairs.txt.gz
Cells_Cultured_fibroblasts.v8.EUR.egenes.txt.gz Cells_Cultured_fibroblasts.v8.EUR.signif_pairs.txt.gz
Cells_EBV-transformed_lymphocytes.v8.EUR.egenes.txt.gz Cells_EBV-transformed_lymphocytes.v8.EUR.signif_pairs.txt.gz
Colon_Sigmoid.v8.EUR.egenes.txt.gz Colon_Sigmoid.v8.EUR.signif_pairs.txt.gz
Colon_Transverse.v8.EUR.egenes.txt.gz Colon_Transverse.v8.EUR.signif_pairs.txt.gz
Esophagus_Gastroesophageal_Junction.v8.EUR.egenes.txt.gz Esophagus_Gastroesophageal_Junction.v8.EUR.signif_pairs.txt.gz
Esophagus_Mucosa.v8.EUR.egenes.txt.gz Esophagus_Mucosa.v8.EUR.signif_pairs.txt.gz
Esophagus_Muscularis.v8.EUR.egenes.txt.gz Esophagus_Muscularis.v8.EUR.signif_pairs.txt.gz
Heart_Atrial_Appendage.v8.EUR.egenes.txt.gz Heart_Atrial_Appendage.v8.EUR.signif_pairs.txt.gz
Heart_Left_Ventricle.v8.EUR.egenes.txt.gz Heart_Left_Ventricle.v8.EUR.signif_pairs.txt.gz
Kidney_Cortex.v8.EUR.egenes.txt.gz Kidney_Cortex.v8.EUR.signif_pairs.txt.gz
Liver.v8.EUR.egenes.txt.gz Liver.v8.EUR.signif_pairs.txt.gz
Lung.v8.EUR.egenes.txt.gz Lung.v8.EUR.signif_pairs.txt.gz
Minor_Salivary_Gland.v8.EUR.egenes.txt.gz Minor_Salivary_Gland.v8.EUR.signif_pairs.txt.gz
Muscle_Skeletal.v8.EUR.egenes.txt.gz Muscle_Skeletal.v8.EUR.signif_pairs.txt.gz
Nerve_Tibial.v8.EUR.egenes.txt.gz Nerve_Tibial.v8.EUR.signif_pairs.txt.gz
Ovary.v8.EUR.egenes.txt.gz Ovary.v8.EUR.signif_pairs.txt.gz
Pancreas.v8.EUR.egenes.txt.gz Pancreas.v8.EUR.signif_pairs.txt.gz
Pituitary.v8.EUR.egenes.txt.gz Pituitary.v8.EUR.signif_pairs.txt.gz
Prostate.v8.EUR.egenes.txt.gz Prostate.v8.EUR.signif_pairs.txt.gz
Skin_Not_Sun_Exposed_Suprapubic.v8.EUR.egenes.txt.gz Skin_Not_Sun_Exposed_Suprapubic.v8.EUR.signif_pairs.txt.gz
Skin_Sun_Exposed_Lower_leg.v8.EUR.egenes.txt.gz Skin_Sun_Exposed_Lower_leg.v8.EUR.signif_pairs.txt.gz
Small_Intestine_Terminal_Ileum.v8.EUR.egenes.txt.gz Small_Intestine_Terminal_Ileum.v8.EUR.signif_pairs.txt.gz
Spleen.v8.EUR.egenes.txt.gz Spleen.v8.EUR.signif_pairs.txt.gz
Stomach.v8.EUR.egenes.txt.gz Stomach.v8.EUR.signif_pairs.txt.gz
Testis.v8.EUR.egenes.txt.gz Testis.v8.EUR.signif_pairs.txt.gz
Thyroid.v8.EUR.egenes.txt.gz Thyroid.v8.EUR.signif_pairs.txt.gz
Uterus.v8.EUR.egenes.txt.gz Uterus.v8.EUR.signif_pairs.txt.gz
Vagina.v8.EUR.egenes.txt.gz Vagina.v8.EUR.signif_pairs.txt.gz
Whole_Blood.v8.EUR.egenes.txt.gz Whole_Blood.v8.EUR.signif_pairs.txt.gz
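expression_matrices: expression data in BED format. After the first four columns, each column is one individual. The data are again stored as one file per tissue.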
#chr start end gene_id GTEX-111CU GTEX-111FC GTEX-111VG GTEX-111YS GTEX-1122O GTEX-1128S GTEX-11DXX GTEX-11DZ1 GTEX-11EI6 GTEX-11EM3 GTEX
chr1 29552 29553 ENSG00000227232.5 -0.8416212335729142 -0.1573106846101707 -0.6744897501960817 -0.1414683013821586 -0.5244005127080409 0.37970195786468147
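A minimal sketch of reading one row of such an expression BED file into a per-sample mapping. The excerpt above is truncated, so the example keeps only the first six sample IDs and assumes they line up with the six values shown.

```python
# Header and row trimmed to the six columns whose values are visible above.
HEADER = (
    "#chr start end gene_id "
    "GTEX-111CU GTEX-111FC GTEX-111VG GTEX-111YS GTEX-1122O GTEX-1128S"
)
ROW = (
    "chr1 29552 29553 ENSG00000227232.5 "
    "-0.8416212335729142 -0.1573106846101707 -0.6744897501960817 "
    "-0.1414683013821586 -0.5244005127080409 0.37970195786468147"
)

def parse_expression_row(header, row):
    """Split one BED row into gene coordinates and per-sample expression."""
    columns = header.lstrip("#").split()
    fields = row.split()
    sample_ids = columns[4:]  # first four columns describe the gene
    values = [float(v) for v in fields[4:]]
    if len(sample_ids) != len(values):
        raise ValueError("header and row have different numbers of samples")
    return {
        "chr": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "gene_id": fields[3],
        "expression": dict(zip(sample_ids, values)),
    }

gene = parse_expression_row(HEADER, ROW)
```

The sample IDs are what you would use to join expression values to the genotypes of the same individuals.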
expression_covariates: covariates, used to remove the effect of confounders.
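To show what "removing confounders" means in practice, the sketch below regresses a single covariate out of both genotype and expression, then fits the slope on the residuals. This equals the genotype coefficient from a regression that includes the covariate. The data are made up and use one covariate only; it is an illustration, not the GTEx pipeline.

```python
def residuals(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]

def slope_through_origin(x, y):
    """Least-squares slope for centred data."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical data: the covariate is correlated with genotype.
covariate = [0, 0, 0, 1, 1, 1]
genotype = [0, 0, 1, 1, 2, 2]
# Expression built as 0.5 * genotype + 3 * covariate.
expression = [0.5 * g + 3 * c for g, c in zip(genotype, covariate)]

# Ignoring the covariate inflates the apparent genotype effect.
naive = slope_through_origin(
    residuals(genotype, [1] * 3 + [2] * 3),  # placeholder, replaced below
    residuals(expression, [1] * 3 + [2] * 3),
)
mean_g = sum(genotype) / len(genotype)
mean_e = sum(expression) / len(expression)
naive = slope_through_origin(
    [g - mean_g for g in genotype], [e - mean_e for e in expression]
)

# Regressing the covariate out of both sides recovers the true effect.
adjusted = slope_through_origin(
    residuals(genotype, covariate), residuals(expression, covariate)
)
```

Here the naive slope is 2.0 while the adjusted slope is 0.5, the value used to build the data, which is why the covariate files ship alongside the expression matrices.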
References:
GTEx introduction.pdf - a must-read introduction
The Genotype-Tissue Expression Project