Genetic analysis of human disease | genetic analysis | GWAS


 

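I have never actually worked in this field, so some of its basic concepts are still unclear to me. Here I sort them out.

Notes prompted by reading a genetics review in this field: The Emerging Genetic Landscape of Hirschsprung Disease and Its Potential Clinical Applications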

 

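Questions:

  • What is genetic analysis? What is the logic of its development?
  • How are diseases and variants classified? What is a Mendelian form? What do syndromic and isolated mean? How do de novo and inherited variants differ?
  • Rare damaging or common regulatory risk variants? Universal or ancestry-specific risk alleles?
  • What is effect size? What is the OR (odds ratio)?
  • What is positional cloning? What are linkage analysis and linkage mapping? What is the basic principle?
  • How do trio-based and case-control studies differ? What is trio analysis?

Going deeper:

  • The relationship between statistical power and sample size
  • The relationship between effect size and allele frequency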

 

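Genetic analysis

The concept is very broad. Today's GWAS, as well as WGS, WES, and similar approaches, should all count as genetic analysis.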

Genetic analysis is the overall process of studying and researching in fields of science that involve genetics and molecular biology.

 

Logic of development

The three laws of genetics

  1. Law of segregation
  2. Law of independent assortment
  3. Law of linkage and crossing-over

 

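Stage 1: linkage mapping to find disease-associated coding mutations

Background: at this point it was already known that phenotypic traits are controlled by genes, and gene function was thought to depend on the coding region (the function of non-coding regions was still unclear). Linkage analysis was already available for locating disease-associated genes, mainly starting from pedigree analysis.

Limitation: rare coding variants in RET appear to play a less prominent role in sporadic and S-HSCR compared to the familial and L-HSCR.

Explanation: coding variants are, first of all, very rare and therefore not representative. Second, they generally follow an autosomal dominant inheritance pattern, so they are not the research focus in sporadic cases.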

 

Stage 2: GWAS based on high-throughput SNP arrays to find common variants

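Background: a coding variant is a YES-or-NO matter and cannot explain differences in penetrance. Altogether, the rare damaging variants identified in RET and EDNRB pathways explain only a small fraction (<30%) of sporadic HSCR cases. Rare coding variants no longer have enough explanatory power: no new genes are being found, and they cannot explain differences in expression or disease severity. The missing heritability problem has to be addressed.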

How should common variants be understood? The problem is largely a result of the small contribution of most variants, either because the variants are too rare to contribute population-wide, or because the effect sizes of common variants are, in general, very small.
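To make the "too rare" versus "too small" point concrete, here is a minimal sketch. It assumes a simple additive model under Hardy-Weinberg equilibrium, in which one biallelic variant with allele frequency p and per-allele effect beta contributes 2p(1-p)·beta² to trait variance. The function name and the example numbers are hypothetical and are not taken from the review.

```python
def variance_explained(allele_freq: float, beta: float) -> float:
    """Variance contributed by one biallelic variant under an additive model.

    Assumes Hardy-Weinberg equilibrium; beta is the per-allele effect in
    phenotypic standard-deviation units (an assumption of this sketch).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta ** 2


# Hypothetical variants: one very rare with a large effect,
# one common with a small effect.
rare_large = variance_explained(allele_freq=0.001, beta=1.0)
common_small = variance_explained(allele_freq=0.30, beta=0.05)

print(f"rare, large effect:   {rare_large:.5f}")
print(f"common, small effect: {common_small:.5f}")
```

Under these made-up inputs both variants account for well under 1% of the variance, for opposite reasons.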

How should penetrance be understood? Carrying a genotype does not guarantee the phenotype, because multiple factors interact. Penetrance refers to the likelihood that a clinical condition will occur when a particular genotype is present. For adult-onset diseases, penetrance is usually described by the individual carrier's age, sex, and organ site.
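As a rough illustration, penetrance can be estimated as the fraction of genotype carriers who show the phenotype, optionally stratified (here by sex). The counts below are invented for the sketch and are not real HSCR data.

```python
def penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Estimate P(phenotype | genotype) as affected carriers / all carriers."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must be between 0 and total_carriers")
    return affected_carriers / total_carriers


# Hypothetical carrier counts: (affected carriers, all carriers) per stratum.
carrier_counts = {"male": (45, 100), "female": (20, 100)}

penetrance_by_sex = {
    sex: penetrance(affected, total)
    for sex, (affected, total) in carrier_counts.items()
}
print(penetrance_by_sex)
```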

How should the over-representation of a haplotype be understood?

How should the following be understood: a common functional RET intron 1 enhancer variant (RET+3; rs2435357 T/C) that largely increases risk of HSCR (OR~5)?
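An odds ratio of about 5 means the odds of carrying the risk allele are roughly five times higher in cases than in controls. The sketch below computes an OR and an approximate 95% confidence interval (Woolf's log method) from a 2x2 table of allele counts. The counts are hypothetical, chosen only so that the OR comes out at 5, and are not data for rs2435357.

```python
import math


def odds_ratio(case_risk: int, case_other: int,
               control_risk: int, control_other: int) -> float:
    """Odds ratio from a 2x2 table of allele counts."""
    return (case_risk * control_other) / (case_other * control_risk)


def odds_ratio_ci(case_risk: int, case_other: int,
                  control_risk: int, control_other: int,
                  z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval on the log-odds scale (Woolf's method)."""
    log_or = math.log(odds_ratio(case_risk, case_other, control_risk, control_other))
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical allele counts (risk allele vs. other allele).
or_value = odds_ratio(150, 50, 75, 125)
ci_low, ci_high = odds_ratio_ci(150, 50, 75, 125)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```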

How should epistatic interaction be understood? Epistasis has been used to describe a number of phenomena, including the functional interaction between genes, the genetic outcome of mutations acting within the same genetic pathway, and the statistical deviation from additive gene action.

GWAS results: Altogether, these findings implied that common variants can predispose to HSCR in a low penetrance manner by modifying the phenotypic expression, which opened up a new area of genetic research on HSCR, including family-based and population-based association studies by detecting transmission disequilibrium of common single-nucleotide polymorphisms (SNP) from parents to proband and comparing frequencies of SNPs in cases vs. controls, respectively.
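The family-based approach mentioned here is commonly carried out with the transmission disequilibrium test (TDT). Among heterozygous parents, the test counts how often a given allele is transmitted versus not transmitted to the affected child, and compares the two with a McNemar-type chi-square statistic on 1 degree of freedom. A minimal sketch with hypothetical counts:

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Transmission disequilibrium test for one allele.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the allele to an affected child.
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("need at least one informative transmission")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical trio data: allele transmitted 70 times, not transmitted 30 times.
chi2, p_value = tdt(transmitted=70, untransmitted=30)
print(f"TDT chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Because only transmissions within families are compared, the test is not confounded by population stratification in the way that a case-control comparison can be.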

 

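Stage 3: GWAS analysis based on WES and WGS to identify rare variants

Background: GWAS based on SNP arrays can only identify common variants. WES can additionally identify rare variants in coding regions, and WGS can also identify variants in non-coding regions.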

 

Stage 4: identification of CNVs and SVs

 

Positional cloning

Positional cloning is a laboratory technique used to locate the position of a disease-associated gene along the chromosome. This approach works even when little or no information is available about the biochemical basis of the disease. Positional cloning is used in conjunction with linkage analysis.

It also comes in two broad types: traditional and genome-based (post-genomic).

Traditional versus postgenomic positional cloning strategies. A: Before the availability of the genome sequence, positional cloning involved several labor-intensive steps. After genetic mapping to a chromosomal region, the physical portion of the genome was isolated on large insert DNA clones, often requiring dozens of clones to cover the region. Genes residing on the large insert clones were identified experimentally. Once genes were identified, they were evaluated for potential involvement in the disease by gene expression analysis and DNA sequencing to identify the mutation. B: The availability of the genome sequence streamlines the positional cloning approach, supplanting experimental techniques with in silico analysis of physical map position and gene content. Additional resources, such as gene ontology and gene expression databases, help prioritize candidate genes for mutation analysis. 

 

Linkage analysis

[Not the same as association analysis]

Genetic linkage analysis is a powerful tool to detect the chromosomal location of disease genes. It is based on the observation that genes that reside physically close on a chromosome remain linked during meiosis.

Linkage analysis is a statistical genetic method that aims to identify chromosomal regions that cosegregate with a disease of interest through pedigrees.

For many years, linkage analysis was the primary tool used for the genetic mapping of Mendelian and complex traits with familial aggregation. Linkage analysis was largely supplanted by the wide adoption of genome-wide association studies (GWASs). However, with the recent increased use of whole-genome sequencing (WGS), linkage analysis is again emerging as an important and powerful analysis method for the identification of genes involved in disease aetiology, often in conjunction with WGS filtering approaches. Here, we review the principles of linkage analysis and provide practical guidelines for carrying out linkage studies using WGS data.

Basic principle

These studies built upon the fundamental idea that disease causal variant and nearby genetic markers tend to be transmitted together due to linkage disequilibrium (LD). Such approaches of positional cloning and linkage analysis have been applied to multiplex families where highly informative genetic markers are used to map the disease-associated loci of large effect. Once a locus is linked, a search for rare damaging mutations (i.e., variants with minor allele frequency <1% in general population) in candidate genes within the locus ensues. This strategy remained very popular especially before the GWAS era.
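Co-transmission of a marker and a disease locus in pedigrees is usually quantified with a LOD score: the log10 ratio of the likelihood of the data at recombination fraction θ to the likelihood under free recombination (θ = 0.5). The sketch below covers only the simplest case, phase-known and fully informative meioses, which is an assumption made here for illustration. Real pedigree analysis uses dedicated software and more general likelihoods. The counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_lik_theta = (recombinants * math.log10(theta)
                     + non_recombinants * math.log10(1.0 - theta))
    log_lik_null = meioses * math.log10(0.5)
    return log_lik_theta - log_lik_null


# Hypothetical pedigree data: 1 recombinant out of 10 informative meioses.
theta_grid = [i / 100 for i in range(1, 51)]
best_theta = max(theta_grid, key=lambda t: lod_score(1, 10, t))
best_lod = lod_score(1, 10, best_theta)
print(f"max LOD = {best_lod:.3f} at theta = {best_theta:.2f}")
```

By convention a LOD score of 3 or more is taken as evidence for linkage, so a single small family like this hypothetical one would not be enough on its own. LOD scores from independent families at the same θ can be summed.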

 

Association studies

For technological reasons, association analysis has become practically synonymous with genome-wide association analysis.

In genomics, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS), is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant is associated with a trait. GWA studies typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major human diseases, but can equally be applied to any other genetic variants and any other organisms.
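At its core, a case-control GWAS repeats one simple test at every variant. The sketch below runs an allelic chi-square test (1 degree of freedom) on a 2x2 table of allele counts and compares the p-value with the conventional genome-wide significance threshold of 5e-8. The SNP names and counts are hypothetical. Real analyses typically use regression models with covariates such as ancestry principal components, which this sketch leaves out.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_chi2(case_a: int, case_b: int,
                 control_a: int, control_b: int) -> tuple[float, float]:
    """Pearson chi-square test on a 2x2 table of allele counts.

    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    n = case_a + case_b + control_a + control_b
    row_cases, row_controls = case_a + case_b, control_a + control_b
    col_a, col_b = case_a + control_a, case_b + control_b
    if min(row_cases, row_controls, col_a, col_b) == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = (n * (case_a * control_b - case_b * control_a) ** 2
            / (row_cases * row_controls * col_a * col_b))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival, 1 df
    return chi2, p_value


# Hypothetical allele counts per SNP: (case A, case B, control A, control B).
snp_counts = {
    "snp_strong": (150, 50, 75, 125),
    "snp_null": (100, 100, 100, 100),
}

results = {snp: allelic_chi2(*counts) for snp, counts in snp_counts.items()}
significant = [snp for snp, (_, p) in results.items() if p < GENOME_WIDE_ALPHA]
print(significant)
```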

 

The relationship between effect size and allele frequency

  1. Relationship between effect size and allele frequency (adopted from [25, 26]). Extremely rare genetic variants with large effect sizes (upper left, strong red color) are often identified in family-based genome-wide linkage analyses.
  2. Common genetic variants with small effect sizes (lower right, strong green color) have been identified in traditional GWAS (including only common variants).
  3. Rare variants with small effects (lower left) are difficult to identify.
  4. Common genetic variants with large effects (upper right) have been identified using both linkage analysis and GWAS; however, these are highly unusual for common diseases.

 

Primary research strategies for identification of genetic variants across the allele frequency spectrum (adopted from [27]).

  1. Genome-wide linkage studies are well suited to identification of genetic variants with allele frequencies below 0.3 % with large effect sizes (OR > 5).
  2. Targeted resequencing often leads to identification of genetic variants with allele frequencies between 0.3 and 5 % with moderate effect sizes (2 < OR < 5), but may also be used to identify rare variants with large effects and common variants with modest effects.
  3. Traditional GWAS is suited to identification of common genetic variants with modest effect sizes (OR < 2)
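The three ranges above can be written as a small lookup, sketched below. The thresholds for linkage studies and targeted resequencing come from the list. Treating "common" as an allele frequency above 5% is an assumption of this sketch, and the function name is hypothetical. Combinations outside the listed ranges return a neutral message rather than a guess.

```python
def suggest_strategy(allele_freq: float, odds_ratio: float) -> str:
    """Map (allele frequency, OR) to the primary strategy named in the list above."""
    if not 0.0 < allele_freq <= 1.0:
        raise ValueError("allele_freq must be in (0, 1]")
    if allele_freq < 0.003 and odds_ratio > 5:
        return "genome-wide linkage study"
    if 0.003 <= allele_freq <= 0.05 and 2 < odds_ratio < 5:
        return "targeted resequencing"
    # Assumption: "common" means allele frequency above 5%.
    if allele_freq > 0.05 and odds_ratio < 2:
        return "traditional GWAS"
    return "outside the ranges listed above"


examples = [(0.001, 8.0), (0.02, 3.0), (0.25, 1.3), (0.25, 5.0)]
for freq, oratio in examples:
    print(f"AF={freq:<6} OR={oratio:<4} -> {suggest_strategy(freq, oratio)}")
```

The last example (common variant, OR of 5) falls outside all three ranges, which matches the note that common variants with large effects are highly unusual for common diseases.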

 

Jargon for showing off

post-genomic era

post-GWAS era

post-single cell era

 

 

References

  • Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems
  • Rare and common variants: twenty arguments
  • Genetic linkage analysis in the age of whole-genome sequencing
  • Benefits and limitations of genome-wide association studies
  • The Norwegian preeclampsia family cohort study: A new resource for investigating genetic aspects and heritability of preeclampsia and related phenotypes

  • Statistical power and significance testing in large-scale genetic studies (Pak Sham)

 

