[Probability Theory and Mathematical Statistics] Summary 8 - The Three Major Sampling Distributions

Reposted article; originally published 2018-01-21 18:29. Category: Mathematical Statistics

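Note: a sampling distribution is the distribution of a statistic. A statistic contains no unknown parameters and summarizes as much of the sample information as possible. Besides the familiar normal distribution, the chi-square distribution, the t distribution, and the F distribution are the most common distributions used to describe sampling distributions. They are also among the best-known distributions in mathematical statistics: the chi-square test, the t-test, and the F-test are all built on them. The sections below introduce the three distributions in terms of their definitions, properties, plots, and quantiles.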

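0. Quantile points / quantiles (Fractile)

The quantile is a very important concept, and it can be a little hard to grasp at first. The first thing to be clear about is that a quantile divides area. More precisely, it divides the area under the probability density curve of a particular distribution. Each given quantile splits the area under that curve into two parts.

English has two words for this concept, and they differ as follows: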

As nouns the difference between fractile and quantile is that fractile is (statistics) the value of a distribution for which some fraction of the sample lies below while quantile is (statistics) one of the class of values of a variate which divides the members of a batch or sample into equal-sized subgroups of adjacent values or a probability distribution into distributions of equal probability.

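Source: https://wikidiff.com/fractile/quantile

From the description above, the quantile point discussed here is a fractile. There is also a third word, percentile, which seems to be used more often.

Quartiles

Quartiles are a commonly used concept and are one kind of quantile. For a set of data, the quartiles are the values at the 3 cut points that divide the sorted data into 4 equal parts. For example, for 1, 3, 5, 7, 9, 11 the 3 quartiles are 3, 6, and 9 (taking the median of each half; other conventions give slightly different values). They are called the first quartile (Q1), the second quartile (Q2), and the third quartile (Q3).

For a probability density function, the quartiles are the points that divide the area under the density curve into 4 equal parts.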
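The quartile example can be reproduced with a short sketch. It is only an illustration: the variable names are made up here, the "median of each half" rule is one convention among several, and NumPy's default rule (linear interpolation) returns slightly different values for the same data.

```python
import numpy as np
from scipy import stats

data = np.array([1, 3, 5, 7, 9, 11])

# Median-of-halves convention: Q2 is the median of the whole sample,
# Q1 and Q3 are the medians of the lower and upper halves.
half = len(data) // 2
q_halves = (np.median(data[:half]), np.median(data), np.median(data[half:]))

# NumPy's default convention interpolates linearly between order statistics.
q_numpy = np.percentile(data, [25, 50, 75])

# For a continuous distribution the quartiles split the area under the PDF
# into four equal parts; the standard normal is used here as an example.
q_norm = stats.norm.ppf([0.25, 0.5, 0.75])

print(q_halves)  # (3.0, 6.0, 9.0)
print(q_numpy)   # [3.5 6.  8.5]
print(q_norm)
```

The two conventions agree on Q2 but not on Q1 and Q3, which is worth keeping in mind when comparing hand-computed quartiles with library output.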

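Upper $\alpha$ quantile (Upper Percentile)

The upper $\alpha$ quantile is a value in the domain of the probability density function. It splits the area under the density curve along the x-axis into two parts, such that the area enclosed by the curve and the x-axis to the right of this point equals $\alpha$.

Figure 1. The upper $\alpha$ quantile $x_{\alpha}$ of a distribution

Since area under the density curve is probability, the $\alpha$ in "upper $\alpha$ quantile" is both the area of the region to the right of the point and the probability of drawing a value larger than that point from the distribution.

Referring to Figure 1, $P(X \geq x_{\alpha}) = \alpha$

There are two values here, $\alpha$ and $x_{\alpha}$. Once one of them is fixed, the other is determined. So for a given $\alpha$ we can find the upper $\alpha$ quantile $x_{\alpha}$ of a particular distribution; conversely, for any point $x$ in the domain of a particular distribution we can find the probability of drawing a value larger than $x$, that is, the $\alpha$ value of that point.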
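As a small sketch of this two-way relationship, the snippet below uses the standard normal distribution from scipy.stats (chosen only for illustration; any continuous distribution in scipy.stats exposes the same `sf` and `isf` methods).

```python
from scipy import stats

dist = stats.norm()              # standard normal, an arbitrary example

alpha = 0.05
x_alpha = dist.isf(alpha)        # given alpha, find the upper-alpha quantile
alpha_back = dist.sf(x_alpha)    # given x, find the right-tail probability

print(x_alpha)     # about 1.645
print(alpha_back)  # recovers 0.05 up to rounding
```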

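1. Chi-square distribution ($\chi^2$)

As the name suggests, the chi-square distribution has to do with squares. Indeed, a chi-square random variable is the sum of the squares of independent random variables that each follow the standard normal distribution.

1.1 Definition

Let the random variables $X_1, X_2, ..., X_n$ be mutually independent, each following $N(0, 1)$. Then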

$$\chi^2 = \displaystyle \sum_{ i = 1 }^{ n } X_i^2$$

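is said to follow the $\chi^2$ distribution with n degrees of freedom, written $\chi^2 \sim \chi^2(n)$.

The degrees of freedom are the number of independent variables on the right-hand side of the formula above.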
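The definition can be checked by simulation. The following is a rough sketch, not a proof: the seed, the sample size, and the choice n = 5 are arbitrary assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed, an arbitrary choice
n, n_samples = 5, 100_000

# Each row holds n independent N(0, 1) draws; sum their squares row by row.
x = rng.standard_normal(size=(n_samples, n))
chi2_samples = (x ** 2).sum(axis=1)

# Compare empirical quantiles of the simulated sums with chi2(n) quantiles.
probs = [0.1, 0.5, 0.9]
empirical = np.quantile(chi2_samples, probs)
theoretical = stats.chi2.ppf(probs, df=n)

print(empirical)
print(theoretical)
```

The two printed rows should agree to about two significant digits; the remaining gap is sampling noise.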

1.2 Properties

Let $\chi^2 \sim \chi^2(n)$. Then

$E(\chi^2) = n$, $D(\chi^2) = 2n$;
Additivity of the $\chi^2$ distribution: let $Y_1 \sim \chi^2(n_1), Y_2 \sim \chi^2(n_2)$ with $Y_1, Y_2$ independent; then $Y_1 + Y_2 \sim \chi^2(n_1 + n_2)$. This property extends to any finite number of random variables: if $Y_1, ..., Y_m$ are mutually independent and $Y_i \sim \chi^2(n_i)$, then $\displaystyle \sum_{ i = 1 }^{ m } Y_i \sim \chi^2(\displaystyle \sum_{ i = 1}^{m} n_i)$
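Both properties can be sanity-checked numerically. In this sketch the degrees of freedom (20, and 3 + 7), the seed, and the sample size are arbitrary; the additivity check uses a Kolmogorov-Smirnov statistic, which only indicates agreement and does not prove it.

```python
import numpy as np
from scipy import stats

# Mean and variance reported by scipy for chi2(n); expected n and 2n.
n = 20
mean, var = stats.chi2.stats(n, moments='mv')
print(float(mean), float(var))

# Additivity by simulation: chi2(3) + chi2(7) compared with chi2(10).
rng = np.random.default_rng(1)
y1 = stats.chi2.rvs(df=3, size=50_000, random_state=rng)
y2 = stats.chi2.rvs(df=7, size=50_000, random_state=rng)
ks = stats.kstest(y1 + y2, stats.chi2(df=10).cdf)
print(ks.statistic)   # a small value means the sum looks like chi2(10)
```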

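1.3 Plots

The probability density curve of the chi-square distribution looks like this:

Figure 2. PDF of the chi-square distribution $\chi^2(20)$

The Python code is as follows:

Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html#scipy.stats.chi2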

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt


def chi2_distribution(df=1):
    """
    Chi-square distribution. By definition it has a single parameter df,
    which is the n in the definition.
    :param df: degrees of freedom, i.e. the number of independent variables in the sum
    :return:
    """

    fig, ax = plt.subplots(1, 1)

    # Pass the parameter directly and display the probability density function (pdf)
    x = np.linspace(stats.chi2.ppf(0.001, df),
                    stats.chi2.ppf(0.999, df), 200)
    ax.plot(x, stats.chi2.pdf(x, df), 'r-',
            lw=5, alpha=0.6, label=r'$\chi^2$ pdf')

    # Freeze the chi-square distribution and display the frozen pdf
    chi2_dis = stats.chi2(df=df)
    ax.plot(x, chi2_dis.pdf(x), 'k-',
            lw=2, label='frozen pdf')

    # x values at which the cdf equals 0.001, 0.5 and 0.999
    vals = chi2_dis.ppf([0.001, 0.5, 0.999])
    print(vals)

    # Check accuracy of cdf and ppf
    print(np.allclose([0.001, 0.5, 0.999], chi2_dis.cdf(vals)))  # True

    # Generate random numbers and draw their normalized histogram
    # (density=True replaces the normed=True argument removed from Matplotlib)
    r = chi2_dis.rvs(size=10000)
    ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
    plt.ylabel('Probability density')
    plt.title(r'PDF of $\chi^2$({})'.format(df))
    ax.legend(loc='best', frameon=False)
    plt.show()

chi2_distribution(df=20)

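In fact, scipy's description of the chi-square distribution includes two more parameters, loc and scale, with defaults $loc = 0$ and $scale = 1$. With the defaults we get the standardized chi-square distribution; loc and scale shift and rescale it. The official documentation puts it this way: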

The probability density above is defined in the “standardized” form. To shift and/or scale the distribution use the loc and scale parameters. Specifically, chi2.pdf(x, df, loc, scale) is identically equivalent to chi2.pdf(y, df) / scale with y = (x - loc) / scale.
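The identity quoted from the documentation can be verified directly. The values of df, loc, and scale below are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

df, loc, scale = 4, 2.0, 3.0              # arbitrary illustrative values
x = np.linspace(loc + 0.1, loc + 40, 50)

# Left side: pdf with explicit loc and scale.
lhs = stats.chi2.pdf(x, df, loc, scale)

# Right side: standardized pdf evaluated at y, divided by scale.
y = (x - loc) / scale
rhs = stats.chi2.pdf(y, df) / scale

# Shifting and scaling also moves the mean from df to loc + scale * df.
shifted_mean = stats.chi2.mean(df, loc=loc, scale=scale)

print(np.allclose(lhs, rhs), shifted_mean)
```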

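Chi-square distributions with different parameters

Figure 3. PDFs of the chi-square distribution for different parameters

When the degrees of freedom df equal 1 or 2, the curve is monotonically decreasing; when df is 3 or more, it first rises and then falls. By the definition above, df can only be a positive integer, but a non-integer value can also be passed in and plotted. In that case the distribution is no longer a sum of squared normal variables; it is the gamma distribution with shape df/2 and scale 2, whose density is defined for any df > 0.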
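A quick numerical check of these shape claims, written as a sketch: the grid, the non-integer value df = 2.5, and the dictionary name `peaks` are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# A non-integer df: the chi-square density matches a gamma density
# with shape df / 2 and scale 2.
df = 2.5
x = np.linspace(0.1, 15, 100)
same_as_gamma = np.allclose(stats.chi2.pdf(x, df),
                            stats.gamma.pdf(x, a=df / 2, scale=2))
print(same_as_gamma)

# Location of the highest density on a grid, for df = 1, 2, 3.
grid = np.linspace(0.05, 10, 400)
peaks = {k: grid[np.argmax(stats.chi2.pdf(grid, k))] for k in (1, 2, 3)}
print(peaks)   # df = 1 and 2 peak at the left edge, df = 3 peaks near x = 1
```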

The Python code for Figure 3 is as follows:

def diff_chi2_dis():
    """
    Chi-square distributions with different parameters
    :return:
    """
    # chi2_dis_0_5 = stats.chi2(df=0.5)
    chi2_dis_1 = stats.chi2(df=1)
    chi2_dis_4 = stats.chi2(df=4)
    chi2_dis_10 = stats.chi2(df=10)
    chi2_dis_20 = stats.chi2(df=20)

    # The x ranges are chosen per curve; for df=1 the density blows up near 0,
    # so its range starts at the 0.65 quantile to keep the plot readable.
    # x1 = np.linspace(chi2_dis_0_5.ppf(0.01), chi2_dis_0_5.ppf(0.99), 100)
    x2 = np.linspace(chi2_dis_1.ppf(0.65), chi2_dis_1.ppf(0.9999999), 100)
    x3 = np.linspace(chi2_dis_4.ppf(0.000001), chi2_dis_4.ppf(0.999999), 100)
    x4 = np.linspace(chi2_dis_10.ppf(0.000001), chi2_dis_10.ppf(0.99999), 100)
    x5 = np.linspace(chi2_dis_20.ppf(0.00000001), chi2_dis_20.ppf(0.9999), 100)
    fig, ax = plt.subplots(1, 1)
    # ax.plot(x1, chi2_dis_0_5.pdf(x1), 'b-', lw=2, label=r'df = 0.5')
    ax.plot(x2, chi2_dis_1.pdf(x2), 'g-', lw=2, label='df = 1')
    ax.plot(x3, chi2_dis_4.pdf(x3), 'r-', lw=2, label='df = 4')
    ax.plot(x4, chi2_dis_10.pdf(x4), 'b-', lw=2, label='df = 10')
    ax.plot(x5, chi2_dis_20.pdf(x5), 'y-', lw=2, label='df = 20')
    plt.ylabel('Probability density')
    plt.title(r'PDF of $\chi^2$ Distribution')
    ax.legend(loc='best', frameon=False)
    plt.show()

diff_chi2_dis()

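1.4 Computing quantiles

This section mainly covers upper quantiles, since that is the more common case. By the definition of the upper quantile, to compute the $\alpha$ value of a point $x_0$ we only need to compute $1 - \mathrm{cdf}(x_0)$.

Referring to Figure 2, we compute the $\alpha$ values for $x = 20$ and $x = 40$ in the chi-square distribution $\chi^2(20)$.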

>>> from scipy import stats
>>> 1 - stats.chi2.cdf(20, 20)
0.45792971447185216
>>> 1 - stats.chi2.cdf(40, 20)
0.0049954123083075785

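The cdf function above takes two arguments: the first is the value of $x$ and the second is the value of $df$. Figure 2 also shows that $20$ lies roughly in the middle of the PDF, with a little under half of the total area to its right (about 0.458). The value $40$ is far to the right, the area to its right is very small, and the computation shows that only about 0.5% of the values in this distribution are greater than 40.

Because computing the $\alpha$ value of a particular $x$ under some distribution is so important in statistical applications, there is a dedicated function for it. When $\alpha$ is very small (that is, when $x$ is far to the right in the plot), this function gives a more accurate result than the method above.

The official documentation says: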

Survival function (also defined as 1 - cdf, but sf is sometimes more accurate).

>>> stats.chi2.sf(20, 20)
0.45792971447185232
>>> stats.chi2.sf(40, 20)
0.0049954123083075785
>>> stats.chi2.sf(100, 20)
1.2596084591660847e-12
>>> 1 - stats.chi2.cdf(100, 20)
1.2596590437397026e-12
>>> 1 - stats.chi2.cdf(1000, 20)
0.0
>>> stats.chi2.sf(1000, 20)
3.9047966391213445e-199

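As shown above, at $x = 1000$ the first method no longer has enough precision (the CDF rounds to exactly 1 in floating point, so the difference is 0), while the second method still returns a nonzero value.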
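The loss of precision can be made visible with a short sketch. It reuses the same three points as the session above; the variable names are made up for the example.

```python
import numpy as np
from scipy import stats

df = 20
eps = np.finfo(float).eps   # gap between 1.0 and the next larger double, about 2.2e-16

for x in (40, 100, 1000):
    cdf = stats.chi2.cdf(x, df)
    naive = 1 - cdf                  # loses digits once cdf is very close to 1
    direct = stats.chi2.sf(x, df)    # computes the right tail directly
    print(x, cdf == 1.0, naive, direct)
```

Once the true tail probability is smaller than about `eps`, the CDF is stored as exactly 1.0 and `1 - cdf` can only return 0, whereas `sf` never forms that subtraction.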

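When introducing quantiles, we said that within a given distribution, knowing either $x$ or $\alpha$ lets us compute the other. The method above computes $\alpha$ from a known $x$; below we go the other way and find the $x$ that corresponds to a given $\alpha$.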

>>> stats.chi2.isf(0.995, 20)
7.4338442629342243
>>> stats.chi2.isf(0.95, 20)
10.850811394182589
>>> stats.chi2.isf(0.5, 20)
19.337429229428256
>>> stats.chi2.isf(0.05, 20)
31.410432844230929
>>> stats.chi2.isf(0.005, 20)
39.996846312938651
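The calls above can also be written in vectorized form, which makes two facts easy to check: `isf(alpha)` is the same point as `ppf(1 - alpha)`, and `sf` undoes `isf`. This is a minimal sketch with made-up variable names.

```python
import numpy as np
from scipy import stats

df = 20
alphas = np.array([0.995, 0.95, 0.5, 0.05, 0.005])

x_upper = stats.chi2.isf(alphas, df)        # upper-alpha quantiles
x_via_ppf = stats.chi2.ppf(1 - alphas, df)  # same points via the inverse CDF
alphas_back = stats.chi2.sf(x_upper, df)    # back from x to alpha

print(x_upper)
print(np.allclose(x_upper, x_via_ppf), np.allclose(alphas_back, alphas))
```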

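Do values like 0.05 look familiar? This is exactly the table lookup we do when studying statistics: the appendices of probability and mathematical statistics textbooks usually include "upper quantile tables" for the common distributions. With Python, we no longer need to flip through a book to look them up.

See: https://www.medcalc.org/manual/chi-square-table.php

When $x$ is the observed value of a test statistic in a right-tailed test, the $\alpha$ computed this way corresponds to the $p$-value in hypothesis testing.

Figure 4. An appendix table from a statistics textbook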

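2. t distribution

The derivation of the t distribution was first given by the geodesist Friedrich Robert Helmert in 1876 and proved by the mathematician Lüroth. The Englishman William S. Gosset rediscovered and published the t distribution in 1908, while he was working at the Guinness brewery in Dublin, Ireland. Although the brewery forbade its employees from publishing any results related to brewing research, it allowed him to publish the discovery under a pseudonym as long as brewing was not mentioned, so the paper was signed "Student". Later, the t-test and the related theory were developed further by Sir Ronald Aylmer Fisher, who named the distribution Student's t in recognition of Gosset's contribution.

2.1 Definition

Let $X \sim N(0, 1), Y \sim \chi^2(n)$, with X and Y independent. Then the random variable $T = \frac{X} {\sqrt{Y/n}}$ is said to follow the t distribution with n degrees of freedom, written $T \sim t(n)$.

When n = 1, the t distribution is the Cauchy distribution.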
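The construction in the definition can be imitated by simulation, and the Cauchy remark can be checked on the densities. This is a sketch under arbitrary assumptions (seed, sample size, n = 4), not a derivation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)   # fixed seed, an arbitrary choice
n, n_samples = 4, 100_000

x = rng.standard_normal(n_samples)                           # X ~ N(0, 1)
y = stats.chi2.rvs(df=n, size=n_samples, random_state=rng)   # Y ~ chi2(n), drawn independently
t_samples = x / np.sqrt(y / n)

# Compare empirical quantiles of the simulated ratio with t(n) quantiles.
probs = [0.1, 0.5, 0.9]
empirical = np.quantile(t_samples, probs)
theoretical = stats.t.ppf(probs, df=n)
print(empirical, theoretical)

# With n = 1 the t density coincides with the standard Cauchy density.
grid = np.linspace(-5, 5, 101)
same_as_cauchy = np.allclose(stats.t.pdf(grid, 1), stats.cauchy.pdf(grid))
print(same_as_cauchy)
```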

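2.2 Properties

Let $T \sim t(n)$. Then

When $n > 1$, $E(T) = 0$; when $n = 1$, the expectation does not exist (see the expectation of the Cauchy distribution)
When $n > 2$, $D(T) = \frac{n} {n - 2}$; when $n \leq 2$, the variance does not exist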
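For degrees of freedom where both moments exist, the formulas can be compared with the values scipy reports. The choice of n values is arbitrary, and the dictionary name `moments` is made up for this sketch.

```python
from scipy import stats

moments = {}
for n in (3, 5, 20):
    mean, var = stats.t.stats(n, moments='mv')
    moments[n] = (float(mean), float(var))
    # The last column is the textbook variance n / (n - 2).
    print(n, moments[n][0], moments[n][1], n / (n - 2))
```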

2.3 Plots

Figure 5. PDF of t(20)

The Python code is below; it differs little from the code above:

Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# http://user.engineering.uiowa.edu/~dbricker/Stacks_pdf1/Sampling_Distns.pdf
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html


def t_distribution(df=1.0):
    """
    t distribution. By definition it has a single parameter df, the n in the definition.
    :param df: degrees of freedom, i.e. the number of independent variables
               in the chi-square distribution that the t distribution contains
    :return:
    """

    fig, ax = plt.subplots(1, 1)

    # Pass the parameter directly and display the probability density function (pdf)
    x = np.linspace(stats.t.ppf(0.001, df),
                    stats.t.ppf(0.999, df), 200)
    ax.plot(x, stats.t.pdf(x, df), 'r-',
            lw=5, alpha=0.6, label=r't pdf')

    # Freeze the t distribution and display the frozen pdf
    t_dis = stats.t(df=df)
    ax.plot(x, t_dis.pdf(x), 'k-',
            lw=2, label='frozen pdf')

    # x values at which the cdf equals 0.001, 0.5 and 0.999
    vals = t_dis.ppf([0.001, 0.5, 0.999])
    print(vals)  # for df=20: [ -3.55180834e+00   6.72145055e-17   3.55180834e+00]

    # Check accuracy of cdf and ppf
    print(np.allclose([0.001, 0.5, 0.999], t_dis.cdf(vals)))  # True

    # Generate random numbers and draw their normalized histogram
    # (density=True replaces the normed=True argument removed from Matplotlib)
    r = t_dis.rvs(size=10000)
    ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
    plt.ylabel('Probability density')
    plt.title(r'PDF of t({})'.format(df))
    ax.legend(loc='best', frameon=False)
    plt.show()

t_distribution(df=20)


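Figure 6. t distributions with different parameters

Figure 6 shows that $t(1)$ differs quite a lot from the standard normal distribution, but as the degrees of freedom n tend to infinity the t distribution converges to the standard normal distribution (the density formulas become identical in the limit; the probability density function is not listed here). The main difference is that for small n the t distribution has fatter tails than the standard normal distribution, so its density approaches 0 more slowly. The similarities and differences between the two will be discussed in detail later, in the part on hypothesis testing.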
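The fat-tail claim can be quantified by comparing a right-tail probability across degrees of freedom. The cut-off x = 3 and the list of n values are arbitrary choices for this sketch.

```python
from scipy import stats

x = 3.0
# P(T > x) for several degrees of freedom, and the same tail for N(0, 1).
tails = {n: stats.t.sf(x, n) for n in (1, 4, 10, 20, 1000)}
normal_tail = stats.norm.sf(x)

for n, p in tails.items():
    print(f"t({n}): P(T > {x}) = {p:.5f}")
print(f"N(0, 1): P(Z > {x}) = {normal_tail:.5f}")
```

The tail probability shrinks as n grows and approaches the normal value from above, which is the "fatter tails" statement in numbers.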

The Python code is as follows:

def diff_t_dis():
    """
    t distributions with different parameters
    :return:
    """
    norm_dis = stats.norm()
    t_dis_1 = stats.t(df=1)
    t_dis_4 = stats.t(df=4)
    t_dis_10 = stats.t(df=10)
    t_dis_20 = stats.t(df=20)

    # t(1) has very heavy tails, so its x range is cut at the 0.04 and 0.96
    # quantiles to keep it comparable with the other curves.
    x1 = np.linspace(norm_dis.ppf(0.000001), norm_dis.ppf(0.999999), 1000)
    x2 = np.linspace(t_dis_1.ppf(0.04), t_dis_1.ppf(0.96), 1000)
    x3 = np.linspace(t_dis_4.ppf(0.001), t_dis_4.ppf(0.999), 1000)
    x4 = np.linspace(t_dis_10.ppf(0.001), t_dis_10.ppf(0.999), 1000)
    x5 = np.linspace(t_dis_20.ppf(0.001), t_dis_20.ppf(0.999), 1000)
    fig, ax = plt.subplots(1, 1)
    ax.plot(x1, norm_dis.pdf(x1), 'r-', lw=2, label=r'N(0, 1)')
    ax.plot(x2, t_dis_1.pdf(x2), 'b-', lw=2, label='t(1)')
    ax.plot(x3, t_dis_4.pdf(x3), 'g-', lw=2, label='t(4)')
    ax.plot(x4, t_dis_10.pdf(x4), 'm-', lw=2, label='t(10)')
    ax.plot(x5, t_dis_20.pdf(x5), 'y-', lw=2, label='t(20)')
    plt.ylabel('Probability density')
    plt.title(r'PDF of t Distribution')
    ax.legend(loc='best', frameon=False)
    plt.show()

diff_t_dis()


2.4 Computing the upper $\alpha$ quantile of the t distribution

For the computation in Python, see Section 1.4.
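For completeness, here is a minimal sketch of the same calls for the t distribution. The values n = 20 and $\alpha = 0.025$ are arbitrary; by symmetry the lower $\alpha$ quantile is the negative of the upper one.

```python
from scipy import stats

n = 20
alpha = 0.025

t_alpha = stats.t.isf(alpha, n)      # upper-alpha quantile of t(n)
lower = stats.t.ppf(alpha, n)        # lower-alpha quantile, equal to -t_alpha
alpha_back = stats.t.sf(t_alpha, n)  # back from x to alpha

print(t_alpha, lower, alpha_back)    # t_alpha is about 2.086
```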

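3. F distribution

The F distribution is built from two chi-square distributions.

3.1 Definition

Let $X \sim \chi^2(n_1)$, $Y \sim \chi^2(n_2)$, with $X, Y$ independent. Then the random variable $F = \frac{X/n_1} {Y/n_2}$ is said to follow the F distribution with degrees of freedom $(n_1, n_2)$, written $F \sim F(n_1, n_2)$. Here $n_1$ is called the first degrees of freedom and $n_2$ the second degrees of freedom.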

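3.2 Properties

Let $F \sim F(n, m)$. Then

$E(F) = \frac{m}{m - 2}$ for $m > 2$; otherwise the expectation does not exist
$D(F) = \frac{2m^2(n + m - 2)} {n(m - 2)^2(m - 4)}$ for $m > 4$; otherwise the variance does not exist
$\frac{1} {F} \sim F(m, n)$, that is, the reciprocal of an F variable also follows an F distribution (with the parameters swapped)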
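These three properties can be checked against scipy for one parameter pair. The choice (n, m) = (4, 10) is arbitrary, and the variable names are made up for this sketch.

```python
import numpy as np
from scipy import stats

n, m = 4, 10
mean, var = stats.f.stats(n, m, moments='mv')

# Textbook formulas, valid here because m > 4.
mean_formula = m / (m - 2)
var_formula = 2 * m**2 * (n + m - 2) / (n * (m - 2)**2 * (m - 4))
print(float(mean), mean_formula)   # both 1.25
print(float(var), var_formula)     # both 1.5625

# Reciprocal property: P(F <= x) for F(n, m) equals P(G >= 1/x) for G ~ F(m, n).
x = np.linspace(0.1, 5, 50)
reciprocal_ok = np.allclose(stats.f.cdf(x, n, m), stats.f.sf(1 / x, m, n))
print(reciprocal_ok)
```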

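3.3 Plots

Figure 7. PDF of F(4, 10)

Below is the Python code. Figure 7 was drawn in Spyder, and it looks fairly different from the figures above, which were drawn in PyCharm.

Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f.html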

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt


def f_distribution(dfn=4, dfd=10):
    """
    F distribution. It has two parameters, dfn and dfd, the n1 and n2 in the definition.
    :param dfn: first degrees of freedom, those of the chi-square variable in the numerator
    :param dfd: second degrees of freedom, those of the chi-square variable in the denominator
    :return:
    """

    fig, ax = plt.subplots(1, 1)

    # Pass the parameters directly and display the probability density function (pdf)
    x = np.linspace(stats.f.ppf(0.0001, dfn, dfd),
                    stats.f.ppf(0.999, dfn, dfd), 200)
    ax.plot(x, stats.f.pdf(x, dfn, dfd), 'r-',
            lw=5, alpha=0.6, label=r'f pdf')

    # Freeze the F distribution and display the frozen pdf
    f_dis = stats.f(dfn=dfn, dfd=dfd)
    ax.plot(x, f_dis.pdf(x), 'k-',
            lw=2, label='frozen pdf')

    # x values at which the cdf equals 0.001, 0.5 and 0.999
    vals = f_dis.ppf([0.001, 0.5, 0.999])
    print(vals)  # [  0.02081053   0.89881713  11.28275151]

    # Check accuracy of cdf and ppf
    print(np.allclose([0.001, 0.5, 0.999], f_dis.cdf(vals)))  # True

    # Generate random numbers and draw their normalized histogram
    # (density=True replaces the normed=True argument removed from Matplotlib)
    r = f_dis.rvs(size=10000)
    ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
    plt.ylabel('Probability density')
    plt.title(r'PDF of F({}, {})'.format(dfn, dfd))
    ax.legend(loc='best', frameon=False)
    # The keyword is dpi (dots per inch); the original dip was a typo
    plt.savefig('f_dist_pdf2.png', dpi=200)

f_distribution(dfn=4, dfd=10)


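The F distribution has two parameters, dfn and dfd, which are the first degrees of freedom (numerator) and the second degrees of freedom (denominator).

The PDFs of the F distribution for different parameters are shown below:

Figure 8. F distributions with different parameters

Python code: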

def diff_f_dis():
    """
    F distributions with different parameters
    :return:
    """
#    f_dis_0_5 = stats.f(dfn=10, dfd=1)
    f_dis_1_30 = stats.f(dfn=1, dfd=30)
    f_dis_30_5 = stats.f(dfn=30, dfd=5)
    f_dis_30_30 = stats.f(dfn=30, dfd=30)
    f_dis_30_100 = stats.f(dfn=30, dfd=100)
    f_dis_100_100 = stats.f(dfn=100, dfd=100)

    # The x ranges are chosen per curve so that all curves fit in one plot
#    x1 = np.linspace(f_dis_0_5.ppf(0.01), f_dis_0_5.ppf(0.99), 100)
    x2 = np.linspace(f_dis_1_30.ppf(0.2), f_dis_1_30.ppf(0.99), 100)
    x3 = np.linspace(f_dis_30_5.ppf(0.00001), f_dis_30_5.ppf(0.99), 100)
    x4 = np.linspace(f_dis_30_30.ppf(0.00001), f_dis_30_30.ppf(0.999), 100)
    x6 = np.linspace(f_dis_30_100.ppf(0.0001), f_dis_30_100.ppf(0.999), 100)
    x5 = np.linspace(f_dis_100_100.ppf(0.0001), f_dis_100_100.ppf(0.9999), 100)
    fig, ax = plt.subplots(1, 1, figsize=(20, 10))
#    ax.plot(x1, f_dis_0_5.pdf(x1), 'b-', lw=2, label=r'F(0.5, 0.5)')
    ax.plot(x2, f_dis_1_30.pdf(x2), 'g-', lw=2, label='F(1, 30)')
    ax.plot(x3, f_dis_30_5.pdf(x3), 'r-', lw=2, label='F(30, 5)')
    ax.plot(x4, f_dis_30_30.pdf(x4), 'm-', lw=2, label='F(30, 30)')
    ax.plot(x6, f_dis_30_100.pdf(x6), 'c-', lw=2, label='F(30, 100)')
    ax.plot(x5, f_dis_100_100.pdf(x5), 'y-', lw=2, label='F(100, 100)')

    plt.ylabel('Probability density')
    plt.title(r'PDF of F Distribution')
    ax.legend(loc='best', frameon=False)
    # The keyword is dpi (dots per inch); the original dip was a typo
    plt.savefig('f_diff_pdf.png', dpi=500)
    plt.show()

diff_f_dis()


3.4 Computing quantiles

See Section 1.4.
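As a sketch of how the same calls look for the F distribution, the snippet below computes an upper quantile of F(4, 10) and then uses the reciprocal property from Section 3.2 to obtain a quantile for a large $\alpha$, which is the usual trick when a printed table only lists small $\alpha$. The parameter values are arbitrary.

```python
from scipy import stats

n1, n2 = 4, 10
f_upper_05 = stats.f.isf(0.05, n1, n2)   # upper 0.05 quantile of F(4, 10), about 3.48

# Reciprocal property: F_{1-alpha}(n1, n2) = 1 / F_{alpha}(n2, n1).
f_upper_95 = stats.f.isf(0.95, n1, n2)
f_upper_95_via_table = 1 / stats.f.isf(0.05, n2, n1)

print(f_upper_05)
print(f_upper_95, f_upper_95_via_table)  # the two values agree
```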

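4. Relationships among the three sampling distributions

The following example shows how the three sampling distributions relate to the standard normal distribution and to one another:

Let $X, Y, Z$ be mutually independent, each following $N(0, 1)$. Then:

$X^2 + Y^2 + Z^2 \sim \chi^2(3)$
$\frac{X} {\sqrt{(Y^2 + Z^2)/2}} \sim t(2)$
$\frac{2X^2} {Y^2 + Z^2} \sim F(1, 2)$
If $t \sim t(n)$, then $t^2 \sim F(1, n)$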
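Two of these relationships can be checked numerically. In this sketch n = 7, the seed, and the sample size are arbitrary; the first check is exact up to rounding, the second is a simulation and only approximate.

```python
import numpy as np
from scipy import stats

n = 7
x = np.linspace(0.1, 4, 40)

# If T ~ t(n), then P(T^2 > x^2) = P(|T| > x) = 2 * P(T > x),
# which should match the right tail of F(1, n) at x^2.
t_squared_is_f = np.allclose(2 * stats.t.sf(x, n), stats.f.sf(x**2, 1, n))
print(t_squared_is_f)

# Simulation of the example with three independent N(0, 1) variables:
# 2 X^2 / (Y^2 + Z^2) should behave like F(1, 2).
rng = np.random.default_rng(3)
X, Y, Z = rng.standard_normal((3, 100_000))
ratio = 2 * X**2 / (Y**2 + Z**2)
median_empirical = np.median(ratio)
median_theoretical = stats.f.ppf(0.5, 1, 2)
print(median_empirical, median_theoretical)
```

The median is used for the comparison because F(1, 2) has no finite mean, so a sample mean would not settle down.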

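Figure 9. Comparison of the three sampling distributions with the standard normal distribution

Figure 9 shows that the t distribution and the standard normal distribution are both symmetric about 0 and have zero skewness (the converse does not hold: a distribution with zero skewness can still be asymmetric), whereas the chi-square and F distributions are asymmetric and positively skewed (the right tail is longer and the bulk of the distribution sits on the left).

The Python code for Figure 9 is as follows: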

def three_sampling_dis():
    """
    The three sampling distributions and the standard normal distribution
    :return:
    """
    nor_dis = stats.norm()
    chi2_dis = stats.chi2(df=4)
    t_dis = stats.t(df=5)
    f_dis = stats.f(dfn=30, dfd=5)

    x1 = np.linspace(nor_dis.ppf(0.001), nor_dis.ppf(0.999), 1000)
    x2 = np.linspace(chi2_dis.ppf(0.001), chi2_dis.ppf(0.999), 1000)
    x3 = np.linspace(t_dis.ppf(0.001), t_dis.ppf(0.999), 1000)
    x4 = np.linspace(f_dis.ppf(0.001), f_dis.ppf(0.999), 1000)
    fig, ax = plt.subplots(1, 1, figsize=(16, 8))
    ax.plot(x1, nor_dis.pdf(x1), 'r-', lw=2, label=r'N(0, 1)')
    ax.plot(x2, chi2_dis.pdf(x2), 'g-', lw=2, label=r'$\chi^2$(4)')
    ax.plot(x3, t_dis.pdf(x3), 'b-', lw=2, label='t(5)')
    # The label matches the parameters of f_dis above (dfn=30, dfd=5)
    ax.plot(x4, f_dis.pdf(x4), 'm-', lw=2, label='F(30, 5)')

    plt.ylabel('Probability density')
    plt.title(r'PDF of Three Sampling Distributions')
    ax.legend(loc='best', frameon=False)
    # The keyword is dpi (dots per inch); the original dip was a typo
    plt.savefig('diff_dist_pdf.png', dpi=500)
    plt.show()

three_sampling_dis()

You are welcome to read the other articles in the "Probability Theory and Mathematical Statistics with Python" series.

Reference

http://www.auburn.edu/~zengpen/teaching/table-chisq.pdf

https://en.wikipedia.org/wiki/F-distribution

http://mathworld.wolfram.com/F-Distribution.html

https://math.stackexchange.com/questions/1087106/find-the-mean-and-the-variance-of-an-f-random-variable-with-r-1-and-r-2-degr

https://zh.wikipedia.org/wiki/%E5%81%8F%E5%BA%A6

China University MOOC: Zhejiang University & Harbin Institute of Technology, Probability Theory and Mathematical Statistics
