Tajima's D test of neutral evolution


Tajima F (1989) Genetics 123:585-595

1  Average number of pairwise nucleotide differences

\[\hat k = \frac{\sum\sum_{i<j} k_{ij}}{\binom{n}{2}}\]

where \(k_{ij\}\) is the number of nucleotide differences between the \(i\)-th and \(j\)-th DNA sequences, and \(n\) is the number of sequences in the sample.
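The following is a minimal Python sketch of \(\hat k\). It assumes the sequences are already aligned to the same length and counts every mismatch as one difference, with no special handling of gaps or ambiguity codes. The function names are illustrative and do not come from any library.

```python
from itertools import combinations


def pairwise_differences(seq_a: str, seq_b: str) -> int:
    """k_ij: number of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_pairwise_differences(seqs: list[str]) -> float:
    """k-hat: sum of k_ij over all pairs i < j, divided by C(n, 2)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(seqs, 2))  # exactly C(n, 2) unordered pairs
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignment: 4 sequences, 5 sites each.
    toy = ["ACGTA", "ACGTT", "ACCTA", "TCGTA"]
    print(mean_pairwise_differences(toy))  # → 1.5 (9 differences over 6 pairs)
```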

2  D statistic

\[D = \frac{\hat k - \frac{S}{a_1}}{\sqrt{e_1 S + e_2 S (S-1)}}\]

\[a_1 = \sum \limits_{i=1}^{n-1} \frac{1}{i}\]

\[a_2 = \sum \limits_{i=1}^{n-1} \frac{1}{i^2}\]

\[b_1 = \frac{n+1}{3(n-1)}\]

\[b_2 = \frac{2(n^2+n+3)}{9n(n-1)}\]

\[c_1 = b_1 - \frac{1}{a_1}\]

\[c_2 = b_2 - \frac{n+2}{a_1 n} + \frac{a_2}{a_1^2}\]

\[e_1 = \frac{c_1}{a_1}\]

\[e_2 = \frac{c_2}{a_1^2 + a_2}\]

\(S\), the number of segregating (or polymorphic) sites in the sample
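The constants and D translate directly into Python. This is a minimal sketch with two restrictions. First, it requires \(n \ge 4\): for \(n = 2\) and \(n = 3\) both \(c_1\) and \(c_2\) equal 0, so the denominator vanishes. Second, it requires \(S > 0\). The function names are hypothetical.

```python
import math


def tajima_constants(n: int) -> tuple[float, float, float, float]:
    """Return (a1, a2, e1, e2) for n sequences, following the formulas above."""
    if n < 4:
        # For n = 2 and n = 3, c1 = c2 = 0, so the variance term is zero.
        raise ValueError("Tajima's D needs at least 4 sequences")
    a1 = sum(1.0 / i for i in range(1, n))      # sum_{i=1}^{n-1} 1/i
    a2 = sum(1.0 / i**2 for i in range(1, n))   # sum_{i=1}^{n-1} 1/i^2
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return a1, a2, e1, e2


def tajima_d(k_hat: float, s: int, n: int) -> float:
    """D from mean pairwise differences k_hat, S segregating sites, n sequences."""
    if s <= 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1, _, e1, e2 = tajima_constants(n)
    return (k_hat - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Single-SNP example from section 5: k_hat = 0.6, S = 1, n = 5.
    print(tajima_d(0.6, 1, 5))  # ≈ 1.2247
```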

3  Confidence limits for the test

Lower ~ upper limits of D for sample size n:

n    95% limits    99% limits
100 -1.781 ~ 2.073 -2.160 ~ 2.704
200 -1.760 ~ 2.100 -2.132 ~ 2.768
300 -1.748 ~ 2.114 -2.114 ~ 2.802
400 -1.740 ~ 2.123 -2.101 ~ 2.824
500 -1.734 ~ 2.130 -2.092 ~ 2.840
600 -1.728 ~ 2.135 -2.084 ~ 2.853
800 -1.721 ~ 2.143 -2.072 ~ 2.873
1000 -1.715 ~ 2.150 -2.062 ~ 2.887
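Below is a small helper that checks an observed D against the limits tabulated above. It rests on two simplifying assumptions. It only recognises the exact n values listed and does not interpolate between them. It also treats D as significant only when D lies strictly outside the interval. The function name is hypothetical.

```python
# Confidence limits for D copied from the table above:
# sample size n -> {level in percent: (lower, upper)}.
LIMITS = {
    100: {95: (-1.781, 2.073), 99: (-2.160, 2.704)},
    200: {95: (-1.760, 2.100), 99: (-2.132, 2.768)},
    300: {95: (-1.748, 2.114), 99: (-2.114, 2.802)},
    400: {95: (-1.740, 2.123), 99: (-2.101, 2.824)},
    500: {95: (-1.734, 2.130), 99: (-2.092, 2.840)},
    600: {95: (-1.728, 2.135), 99: (-2.084, 2.853)},
    800: {95: (-1.721, 2.143), 99: (-2.072, 2.873)},
    1000: {95: (-1.715, 2.150), 99: (-2.062, 2.887)},
}


def rejects_neutrality(d: float, n: int, level: int = 95) -> bool:
    """True if D lies strictly outside the tabulated interval for (n, level)."""
    if n not in LIMITS:
        raise KeyError(f"no tabulated limits for n={n} (this sketch does not interpolate)")
    lower, upper = LIMITS[n][level]
    return d < lower or d > upper


if __name__ == "__main__":
    print(rejects_neutrality(2.2, 100))      # True: above the 95% upper limit 2.073
    print(rejects_neutrality(2.2, 100, 99))  # False: inside the 99% interval
```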


4  Calculation

Tajima F (1989) Genetics 123:585-595

Carlson CS, et al. (2005) Genome Res 15:1553-1565

With VCFtools, Tajima's D can be computed in bins of 100,000 bp:

vcftools --vcf geno.vcf --TajimaD 100000
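VCFtools writes the binned results to a file with the suffix `.Tajima.D`. The filename prefix is `out` unless `--out` is given. The sketch below loads that table in Python and skips bins whose D is reported as `nan`. It assumes the tab-separated columns are `CHROM`, `BIN_START`, `N_SNPS` and `TajimaD`, so check the header your VCFtools version produces. The function name and the demo rows are hypothetical.

```python
import csv
import io
import math
from typing import TextIO


def read_tajima_d(handle: TextIO) -> list[tuple[str, int, int, float]]:
    """Parse a .Tajima.D table into (chrom, bin_start, n_snps, D) rows.

    Column names are an assumption; bins whose D is 'nan' are skipped.
    """
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        d = float(rec["TajimaD"])  # float() also accepts 'nan' / '-nan'
        if math.isnan(d):
            continue
        rows.append((rec["CHROM"], int(rec["BIN_START"]), int(rec["N_SNPS"]), d))
    return rows


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    demo = "CHROM\tBIN_START\tN_SNPS\tTajimaD\n1\t0\t12\t-0.53\n1\t100000\t0\tnan\n"
    for row in read_tajima_d(io.StringIO(demo)):
        print(row)
```

For a real run, pass an open file handle instead, e.g. `with open("out.Tajima.D") as fh: rows = read_tajima_d(fh)`.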

5  示例

One SNP marker typed in 5 individuals:

A A C C C

Pairwise differences between individuals (d = 1 if the two alleles differ, 0 otherwise):

i j d
1 2 0
1 3 1
1 4 1
1 5 1
2 3 1
2 4 1
2 5 1
3 4 0
3 5 0
4 5 0
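The pairwise table can be generated mechanically. This short sketch enumerates every pair i < j in the same order as the table (1-based indices) and sums the d column.

```python
from itertools import combinations

alleles = ["A", "A", "C", "C", "C"]  # the single SNP from the example

# Every pair i < j (1-based, as in the table); d = 1 if the alleles differ.
table = [
    (i + 1, j + 1, int(alleles[i] != alleles[j]))
    for i, j in combinations(range(len(alleles)), 2)
]
for i, j, d in table:
    print(i, j, d)

total = sum(d for _, _, d in table)
print("sum of d:", total, "pairs:", len(table), "mean:", total / len(table))
```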

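Summing the d column gives 6 differing pairs out of \(\binom{5}{2} = 10\). In general, with \(n_1\) copies of one allele and \(n_2\) copies of the other, exactly \(n_1 n_2\) pairs differ. Here \(n_1 = 2\) (A), \(n_2 = 3\) (C) and \(n = 5\), so

\[\pi = \frac{n_1 n_2}{n(n-1)/2} = \frac{2 n_1 n_2}{n(n-1)} = \frac{2 \times 2 \times 3}{5 \times 4} = 0.6\]

which is \(\hat k\) from section 1 for this single site.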
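Carrying the example through the D statistic of section 2 is a useful arithmetic check, although one SNP in five sequences is far too little data for a meaningful test. With \(n = 5\), \(S = 1\) and \(\hat k = 0.6\), the \(e_2\) term vanishes because \(S - 1 = 0\):

\[a_1 = 1 + \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} = \tfrac{25}{12}, \qquad b_1 = \frac{6}{3 \cdot 4} = \frac{1}{2}, \qquad c_1 = \frac{1}{2} - \frac{12}{25} = \frac{1}{50}, \qquad e_1 = \frac{c_1}{a_1} = \frac{6}{625}\]

\[D = \frac{0.6 - \frac{12}{25}}{\sqrt{6/625}} = \frac{3/25}{\sqrt{6}/25} = \frac{3}{\sqrt{6}} = \sqrt{1.5} \approx 1.2247\]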

