Related article:
A Unified Framework for Association Analysis with Multiple Related Phenotypes
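This review is too important not to analyze on its own. It concisely summarizes what GWAS has achieved over the past 10 years, as well as its current limitations. Anyone working in statistical genetics should read it carefully.
When was the first GWAS published, and by whom? The first successful GWAS, published in 2002 by Ozaki et al., studied myocardial infarction.
What is the difference between trans-ethnic analysis and meta-analysis? Trans-ethnic analysis compares or combines results across ancestry groups, whereas meta-analysis pools results across studies. The concepts differ, and so do the goals.
What is a twin study, and what is its advantage? It is used to estimate heritability. In essence it is a controlled comparison: identical twins share the same genetic material, so phenotypic differences between them can be attributed to non-genetic, environmental variation. "identical" or monozygotic (MZ) twins share essentially 100% of their genes, which means that most differences between the twins (such as height, susceptibility to boredom, intelligence, depression, etc.) are due to experiences that one twin has but not the other twin.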
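In practice, twin designs usually compare MZ pairs with dizygotic (DZ) pairs, who share on average half of their segregating alleles. The following is a minimal sketch of Falconer's formula under the classical twin (ACE) model. The function name `falconer` and the two correlations are hypothetical illustration values, not numbers taken from the review.

```python
def falconer(r_mz, r_dz):
    """Return (h2, c2, e2) under the classical twin (ACE) model.

    r_mz, r_dz: phenotypic correlations within MZ and DZ twin pairs.
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share of variance
    c2 = 2.0 * r_dz - r_mz     # shared (common) environment
    e2 = 1.0 - r_mz            # unique environment, incl. measurement error
    return h2, c2, e2


# Hypothetical correlations, for illustration only.
h2, c2, e2 = falconer(r_mz=0.80, r_dz=0.50)
print(f"h2={h2:.2f} c2={c2:.2f} e2={e2:.2f}")
```

The three components sum to 1 by construction, which is a quick sanity check on any pair of input correlations.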
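What is a cohort study in the GWAS context? See: The importance of cohort studies in the post-GWAS era
How are LD and correlation related, and how do they differ? Linkage disequilibrium (LD) is the non-random association of alleles at two loci, and its most common measure, r2, is simply the squared correlation between the alleles at those two loci. LD is therefore a correlation, specialized to pairs of variants. A per-SNP summary, such as the sum of r2 between a SNP and its neighbors within a window (for example 1 Mb), is better described as an LD score than as LD itself.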
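To make the distinction concrete, here is a small sketch that computes pairwise r2 and a per-SNP windowed sum of r2 (an LD-score-like quantity). It assumes genotypes coded as 0/1/2 allele counts and uses the genotype correlation as a stand-in for the haplotype-based r2. The function names, the 1 Mb window, and the toy genotypes are all hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:      # monomorphic SNP: correlation undefined
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_scores(genotypes, positions, window_bp=1_000_000):
    """Sum of r2 between each SNP and every SNP within window_bp (self included)."""
    scores = []
    for j, pos_j in enumerate(positions):
        total = 0.0
        for k, pos_k in enumerate(positions):
            if abs(pos_k - pos_j) <= window_bp:
                total += pearson_r2(genotypes[j], genotypes[k])
        scores.append(total)
    return scores


# Toy data: three SNPs typed in six individuals (made up for illustration).
genotypes = [
    [0, 1, 2, 1, 0, 2],   # SNP A
    [0, 1, 2, 1, 0, 2],   # SNP B, in perfect LD with A
    [1, 0, 1, 2, 1, 0],   # SNP C, far away on the chromosome
]
positions = [100_000, 150_000, 5_000_000]

r2_ab = pearson_r2(genotypes[0], genotypes[1])
scores = ld_scores(genotypes, positions)
print(r2_ab, scores)
```

Here r2 is a property of a pair of SNPs, while each entry of `scores` belongs to a single SNP and depends on its whole neighborhood.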
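What is heritability, and how is it estimated in GWAS? Heritability is the proportion of phenotypic variance in a population that is attributable to genetic variance. It is not a fixed constant: it can differ between populations, for example because environments and gene-environment (GE) interactions differ. See the Wikipedia definition.
Which factors in GWAS experimental design affect power? How is power calculated and controlled? See below.
What is genetic architecture? The joint distribution of effect size and allele frequency.
How many SNPs does a GWAS SNP array typically contain, and how are those sites selected? Arrays mostly target common variants, with minor allele frequency (MAF) typically larger than 1%.
Why do SNP imputation, and what is it based on? Haplotypes: untyped variants are inferred from the haplotype structure of a reference panel.
Under what conditions can WGS detect an association between a rare variant and a trait? When the sample size is large enough or the effect size is large enough. In the end, what matters is having enough power.
What is burden testing of rare variants? Taking the gene as the unit, it tests whether the load of rare variants differs significantly between cases and controls.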
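One simple form of burden test collapses all rare variants in a gene into a single carrier indicator per individual and then compares carrier counts between cases and controls. The sketch below does this with a two-sided Fisher's exact test written from the hypergeometric distribution. The function names, the 1% MAF cutoff, and the toy genotypes are assumptions made for illustration; real analyses often use weighted or regression-based variants of this idea.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def burden_test(genotypes, is_case, mafs, maf_cutoff=0.01):
    """Collapse the rare variants of one gene into a carrier flag and test it."""
    rare = [j for j, maf in enumerate(mafs) if maf < maf_cutoff]
    table = [[0, 0], [0, 0]]      # rows: case/control, cols: carrier/non-carrier
    for geno, case in zip(genotypes, is_case):
        carrier = any(geno[j] > 0 for j in rare)
        table[0 if case else 1][0 if carrier else 1] += 1
    (a, b), (c, d) = table
    return table, fisher_exact_two_sided(a, b, c, d)


# Toy gene with three variants; only the first two are rare (made-up data).
mafs = [0.005, 0.002, 0.20]
cases = [[1, 0, 1]] * 4 + [[0, 1, 0]] * 2 + [[0, 0, 2]] * 4
controls = [[0, 1, 0]] * 1 + [[0, 0, 1]] * 9
genotypes = cases + controls
is_case = [True] * len(cases) + [False] * len(controls)

table, p_value = burden_test(genotypes, is_case, mafs)
print(table, p_value)
```

Collapsing trades resolution for power: no single rare variant has enough carriers to test alone, but the gene-level carrier count may.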
10 Years of GWAS Discovery: Biology, Function, and Translation
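The earlier summary of the first five years: Five Years of GWAS Discovery
The GWAS is an experimental design used to detect associations between genetic variants and traits in samples from populations. One can say genetic variants here, or equally genes or loci.
GWAS is really an integrated approach that covers both experimental design and analysis, used mainly to map the genes that influence complex traits. For a monogenic disease there is no need to run a GWAS.
For normal traits such as height, what gets mapped are loci that influence height. For diseases, what gets mapped are variants associated with the disease.
There are many kinds of variants. GWAS currently focuses mainly on SNPs, but there are also indels, CNVs, and SVs.
This is my main line of work, so it is worth learning from how others phrase it: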
The path from GWAS to biology is not straightforward because an association between a genetic variant at a genomic locus and a trait is not directly informative with respect to the target gene or the mechanism whereby the variant is associated with phenotypic differences.
The statistical power to detect associations between DNA variants and a trait depends on the experimental sample size, the distribution of effect sizes of (unknown) causal genetic variants that are segregating in the population, the frequency of those variants, and the LD between observed genotyped DNA variants and the unknown causal variants.
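The factors listed in this passage can be combined into a rough power calculation. The sketch below assumes a quantitative trait standardized to variance 1, Hardy-Weinberg equilibrium, an additive 1-df test, and a small per-SNP effect, so that the non-centrality parameter is approximately N × r2 × 2p(1-p)β2. The function name `gwas_power` and the example numbers are hypothetical. The default threshold of 5e-8 is the conventional genome-wide significance level.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, r2=1.0, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    n:    sample size
    maf:  allele frequency of the causal variant
    beta: per-allele effect in phenotypic SD units
    r2:   LD (squared correlation) between the genotyped SNP and the causal variant
    """
    q2 = 2.0 * maf * (1.0 - maf) * beta ** 2   # variance explained by the causal variant
    ncp = n * r2 * q2                          # non-centrality parameter (approximation)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)     # two-sided critical value
    shift = ncp ** 0.5
    # P(|Z + shift| > z_crit) for a standard normal Z
    return (1.0 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


# Hypothetical scenarios: same variant, different sample sizes and tagging.
power_small = gwas_power(n=5_000, maf=0.30, beta=0.05)
power_large = gwas_power(n=50_000, maf=0.30, beta=0.05)
power_tagged = gwas_power(n=50_000, maf=0.30, beta=0.05, r2=0.5)
print(power_small, power_large, power_tagged)
```

Because sample size and r2 enter only through their product, imperfect tagging at r2 = 0.5 costs about as much as halving the sample.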
In addition, other genome-wide scans, such as WES and WGS studies, allow testing for a burden of rare variants across shared functional units (e.g., genes) in a way that is not accessible to GWASs.
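This video explains it well, moving from the basics to the deeper material: BroadE: Statistical Genetics - Rare variants
The outline of the review is as follows:
Complex Traits Are Highly Polygenic
A phenotype can be very general, ranging from something as large-scale as height to something as fine-grained as gene expression or epigenetic changes.
Polygenic means that many genes or loci influence a complex trait, so working out how to combine and interpret their joint effect is very important.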
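Pleiotropy Is Pervasive
Pleiotropy means that a mutation at one locus can affect several phenotypes, which is one reason why many phenotypes are highly correlated.
New Analysis Methodology Underpinning New Discovery
Methodological follow-up work on GWAS falls mainly into four areas: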
(1) methods of better modeling population structure and relatedness between individuals in a sample during association analyses [28–34],
(2) methods of detecting novel variants and gene loci on the basis of GWAS summary statistics [35–37],
(3) methods of estimating and partitioning genetic (co)variance [38, 39], and
(4) methods of inferring causality [40–42].
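Common Variants Together Tag a Substantial Proportion of Additive Genetic Variance
Additive genetic variance is the variance captured by a model in which the phenotype changes linearly with allele count across AA, Aa, and aa, rather than following a dominant or recessive pattern.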
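As a small worked example, under a purely additive model with per-allele effect β and allele frequency p, and assuming Hardy-Weinberg genotype frequencies, the variance contributed by one SNP is 2p(1-p)β2. The sketch below checks this by direct enumeration over the three genotypes. The function name and the example values are hypothetical.

```python
def additive_variance(p, beta):
    """Variance of genotypic values beta * x, with x = 0, 1, 2 copies of the allele."""
    # Hardy-Weinberg genotype frequencies for allele frequency p.
    freqs = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    mean = sum(f * beta * x for x, f in freqs.items())
    return sum(f * (beta * x - mean) ** 2 for x, f in freqs.items())


p, beta = 0.3, 0.5                       # illustrative values only
enumerated = additive_variance(p, beta)
closed_form = 2 * p * (1 - p) * beta ** 2
print(enumerated, closed_form)
```

The heterozygote sits exactly halfway between the two homozygotes here. Any departure from that midpoint would contribute dominance variance instead.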
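The Utility of GWAS-Derived Genetic Predictors
A polygenic risk score (PRS) assigns each individual a specific score for their risk of a given disease, based on the variants that individual carries and the effect sizes of those variants.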
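In its simplest form, a PRS is a weighted sum of allele counts, with GWAS effect sizes as the weights (for disease traits these are typically log odds ratios). The sketch below assumes the variants have already been selected and the effect alleles aligned. The function name, effect sizes, and genotypes are made up for illustration.

```python
def polygenic_score(dosages, betas):
    """Weighted sum of effect-allele counts for one individual."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must cover the same variants")
    return sum(b * x for b, x in zip(betas, dosages))


# Hypothetical per-allele effect sizes for three variants.
betas = [0.12, -0.05, 0.30]

# Effect-allele counts (0, 1, or 2) for two hypothetical individuals.
individuals = {
    "ind1": [0, 1, 2],
    "ind2": [2, 2, 0],
}

scores = {name: polygenic_score(g, betas) for name, g in individuals.items()}
print(scores)
```

The raw score has no absolute meaning. It is usually interpreted relative to the score distribution in a reference population.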
The Public Availability of Data Has Enabled Novel Research and Discoveries
GWAS Catalog - EMBL-EBI, the best-known database.
UK Biobank
From GWAS to Biology
To bridge this gap, many databases have emerged: ENCODE, Epigenome RoadMap, and GTEx
Three Exemplars of GWAS Success
Worth reading as an example of how to summarize a project in concise language.