Tajima F (1989) Genetics 123:585-595
1 Average number of pairwise nucleotide differences
\[\hat k = \frac{\sum\sum_{i<j} k_{ij}}{\binom{n}{2}}\]
\(k_{ij}\), the number of nucleotide differences between the \(i\)-th and \(j\)-th DNA sequences; \(n\), the number of sequences in the sample
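A minimal sketch of \(\hat k\) in Python. It assumes the sequences are already aligned, equal-length strings with no gaps or missing data; the function name `average_pairwise_differences` is hypothetical.

```python
from itertools import combinations


def average_pairwise_differences(seqs):
    """k-hat: mean number of differing sites over all C(n, 2) pairs of aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for s_i, s_j in combinations(seqs, 2):
        # k_ij: number of positions where sequence i and sequence j differ
        total += sum(a != b for a, b in zip(s_i, s_j))
    # divide by C(n, 2) = n(n - 1) / 2
    return total / (n * (n - 1) / 2)


print(average_pairwise_differences(["AAT", "ACT", "GCA"]))  # 2.0
```

In the usage line, the three pairs differ at 1, 3 and 2 sites, so \(\hat k = 6/3 = 2\).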
2 D statistic
\[D = \frac{\hat k - \frac{S}{a_1}}{\sqrt{e_1 S + e_2 S (S-1)}}\]
\[a_1 = \sum \limits_{i=1}^{n-1} \frac{1}{i}\]
\[a_2 = \sum \limits_{i=1}^{n-1} \frac{1}{i^2}\]
\[b_1 = \frac{n+1}{3(n-1)}\]
\[b_2 = \frac{2(n^2+n+3)}{9n(n-1)}\]
\[c_1 = b_1 - \frac{1}{a_1}\]
\[c_2 = b_2 - \frac{n+2}{a_1 n} + \frac{a_2}{a_1^2}\]
\[e_1 = \frac{c_1}{a_1}\]
\[e_2 = \frac{c_2}{a_1^2 + a_2}\]
\(S\), the number of segregating (or polymorphic) sites in the sample
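The constants above translate line by line into code. This is a sketch, not a library API (`tajimas_d` is a hypothetical name). It assumes \(n \ge 4\): for \(n = 2\) or \(3\) both \(c_1\) and \(c_2\) are zero, so the denominator vanishes, and \(D\) is likewise undefined when \(S = 0\).

```python
import math


def tajimas_d(k_hat, S, n):
    """Tajima's D from k-hat, the number of segregating sites S, and sample size n."""
    if n < 4:
        raise ValueError("Tajima's D needs n >= 4 sequences")
    if S == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1 / i for i in range(1, n))        # sum_{i=1}^{n-1} 1/i
    a2 = sum(1 / i**2 for i in range(1, n))     # sum_{i=1}^{n-1} 1/i^2
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # numerator: k-hat minus Watterson's estimate S / a1
    return (k_hat - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

For the one-SNP example in Section 5 (\(n = 5\), \(S = 1\), \(\hat k = 0.6\)), `tajimas_d(0.6, 1, 5)` returns about 1.2247 (exactly \(\sqrt{1.5}\) in exact arithmetic).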
3 Confidence limits of D for the hypothesis test
| n | 95% limits of D | 99% limits of D |
|---|---|---|
| 100 | -1.781 ~ 2.073 | -2.160 ~ 2.704 |
| 200 | -1.760 ~ 2.100 | -2.132 ~ 2.768 |
| 300 | -1.748 ~ 2.114 | -2.114 ~ 2.802 |
| 400 | -1.740 ~ 2.123 | -2.101 ~ 2.824 |
| 500 | -1.734 ~ 2.130 | -2.092 ~ 2.840 |
| 600 | -1.728 ~ 2.135 | -2.084 ~ 2.853 |
| 800 | -1.721 ~ 2.143 | -2.072 ~ 2.873 |
| 1000 | -1.715 ~ 2.150 | -2.062 ~ 2.887 |
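A sketch of how the tabulated limits might be used: neutrality is rejected at a given level when D falls outside the interval for that sample size. The values are copied from the table; only the listed \(n\) are supported and no interpolation is attempted. The names `LIMITS` and `outside_limits` are hypothetical.

```python
# 95% and 99% confidence limits of D, copied from the table above.
LIMITS = {
    100: {95: (-1.781, 2.073), 99: (-2.160, 2.704)},
    200: {95: (-1.760, 2.100), 99: (-2.132, 2.768)},
    300: {95: (-1.748, 2.114), 99: (-2.114, 2.802)},
    400: {95: (-1.740, 2.123), 99: (-2.101, 2.824)},
    500: {95: (-1.734, 2.130), 99: (-2.092, 2.840)},
    600: {95: (-1.728, 2.135), 99: (-2.084, 2.853)},
    800: {95: (-1.721, 2.143), 99: (-2.072, 2.873)},
    1000: {95: (-1.715, 2.150), 99: (-2.062, 2.887)},
}


def outside_limits(d, n, level=95):
    """Return True if D lies outside the tabulated limits, i.e. neutrality is rejected."""
    if n not in LIMITS:
        raise KeyError(f"no tabulated limits for n={n}")
    if level not in (95, 99):
        raise ValueError("level must be 95 or 99")
    lower, upper = LIMITS[n][level]
    return d < lower or d > upper
```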
4 Calculation
Tajima F (1989) Genetics 123:585-595
Carlson CS, et al. (2005) Genome Res 15:1553-1565
Tajima's D in bins of 100,000 bp, computed with VCFtools:
vcftools --vcf geno.vcf --TajimaD 100000
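To load the per-bin values into Python, something like the following may work. It assumes the default output prefix (so the file is `out.Tajima.D`) and tab-separated columns `CHROM`, `BIN_START`, `N_SNPS` and `TajimaD`; check the header of your own file. Bins where D could not be computed may contain `nan` and are skipped. The helper name `parse_tajima_d` is hypothetical.

```python
import csv
import math


def parse_tajima_d(lines):
    """Parse VCFtools --TajimaD output into (chrom, bin_start, n_snps, D) tuples.

    Assumes a tab-separated header CHROM, BIN_START, N_SNPS, TajimaD.
    Rows whose D is NaN (e.g. "nan" or "-nan") are skipped.
    """
    rows = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        d = float(rec["TajimaD"])  # float() accepts "nan" and "-nan"
        if math.isnan(d):
            continue
        rows.append((rec["CHROM"], int(rec["BIN_START"]), int(rec["N_SNPS"]), d))
    return rows


# Usage (assumed default --out prefix):
# with open("out.Tajima.D", newline="") as fh:
#     windows = parse_tajima_d(fh)
```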
5 Example
One SNP marker, 5 individuals:
A A C C C
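Pairwise genotype differences between individuals (d = number of differing sites between individuals i and j):
| i | j | d |
|---|---|---|
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 1 |
| 1 | 5 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 1 |
| 2 | 5 | 1 |
| 3 | 4 | 0 |
| 3 | 5 | 0 |
| 4 | 5 | 0 |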
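Here \(n = 5\), \(n_1 = 2\) (allele A) and \(n_2 = 3\) (allele C). A pair differs only when one individual carries A and the other carries C, so \(n_1 n_2 = 6\) of the \(\binom{5}{2} = 10\) pairs differ, matching the six 1s in the table. The average number of pairwise differences (\(\hat k\) in Section 1) is therefore
\[\pi = \frac{n_1 n_2}{n(n-1)/2} = \frac{2 n_1 n_2}{n(n-1)} = \frac{2 \times 2 \times 3}{5 \times 4} = 0.6\]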
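A quick check that the \(n_1 n_2\) shortcut agrees with counting the pairs in the table one by one. Both function names are hypothetical, and the shortcut is only valid for a single biallelic site.

```python
from itertools import combinations


def pi_brute_force(alleles):
    """Average pairwise differences at one site by enumerating all C(n, 2) pairs."""
    n = len(alleles)
    differing = sum(a != b for a, b in combinations(alleles, 2))
    return differing / (n * (n - 1) / 2)


def pi_shortcut(alleles):
    """Same quantity via 2*n1*n2 / (n*(n-1)); valid only for a biallelic site."""
    n = len(alleles)
    kinds = sorted(set(alleles))
    if len(kinds) == 1:
        return 0.0  # monomorphic site: no pair differs
    if len(kinds) != 2:
        raise ValueError("the n1*n2 shortcut assumes at most two alleles")
    n1 = alleles.count(kinds[0])
    n2 = n - n1
    return 2 * n1 * n2 / (n * (n - 1))


alleles = "A A C C C".split()
print(pi_brute_force(alleles), pi_shortcut(alleles))  # 0.6 0.6
```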