1. Sample Variance
Let the sample mean be $\bar x$, the sample variance $S^2$, the population mean $\mu$, and the population variance $\sigma^2$. The sample variance is then
${S^2} = \frac{1}{{n - 1}}\mathop \sum \limits_{i = 1}^n {\left( {{x_i} - \bar x} \right)^2}$
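Derivation: if the sample were the entire population, the variance would be computed with divisor $n$. In the derivation below, $S^2$ temporarily denotes this $1/n$ version: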
${S^2} = \frac{1}{n}\mathop \sum \limits_{i = 1}^n {\left( {{x_i} - \bar x} \right)^2}$
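Over many repeated draws, the average of the sample variances stabilizes. Let the sample variances from successive draws be $(S_1^2, S_2^2, S_3^2, \dots)$, and denote their average (the expectation) by $E(S^2)$. Then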
${\rm{E}}\left( {{{\rm{S}}^2}} \right) = {\rm{E}}\left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {{x_i} - \bar x} \right)}^2}} \right)$
$ = {\rm{E}}\left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {\left( {{x_i} - \mu } \right) - \left( {\bar x - \mu } \right)} \right)}^2}} \right)$
Since
$\frac{1}{n}\mathop \sum \limits_{i = 1}^n \left( {{x_i} - \mu } \right) = \frac{1}{n}\mathop \sum \limits_{i = 1}^n {x_i} - \mu = \bar x - \mu $
continuing from the expression above,
${\rm{E}}\left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {\left( {{x_i} - \mu } \right) - \left( {\bar x - \mu } \right)} \right)}^2}} \right) = {\rm{E}}\left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {{x_i} - \mu } \right)}^2} - \frac{1}{n}\mathop \sum \limits_{i = 1}^n 2({x_i} - \mu )\left( {\bar x - \mu } \right) + \frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {\bar x - \mu } \right)}^2}} \right)$
$ = {\rm{E}}\left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {{x_i} - \mu } \right)}^2} - 2\left( {\bar x - \mu } \right)\left( {\bar x - \mu } \right) + {{\left( {\bar x - \mu } \right)}^2}} \right)$
$ = {\rm{\;E}}\left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {{x_i} - \mu } \right)}^2} - {{\left( {\bar x - \mu } \right)}^2}} \right)$
$ = {\rm{E}}\left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {{x_i} - \mu } \right)}^2}} \right) - E({\left( {\bar x - \mu } \right)^2}) \le {\sigma ^2}$
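The first term equals $\sigma^2$ and the second is non-negative, so the variance computed with divisor $n$ underestimates the population variance on average. For independent samples, $E((\bar x - \mu)^2) = \sigma^2/n$, which gives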
${\rm{E}}\left( {\frac{1}{n}\mathop \sum \limits_{i = 1}^n {{\left( {{x_i} - \mu } \right)}^2}} \right) - E({\left( {\bar x - \mu } \right)^2}) = {\sigma ^2} - \frac{1}{n}{\sigma ^2} = \frac{{n - 1}}{n}{\sigma ^2}$
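The value $\frac{1}{n}\sigma^2$ used for $E((\bar x - \mu)^2)$ in the line above can be checked with a short derivation. This is a sketch that assumes the $x_i$ are drawn independently from the same population, so that the cross terms $E((x_i - \mu)(x_j - \mu))$ vanish for $i \ne j$ and each diagonal term equals $\sigma^2$:
$E\left( {{{\left( {\bar x - \mu } \right)}^2}} \right) = E\left( {{{\left( {\frac{1}{n}\sum\limits_{i = 1}^n {\left( {{x_i} - \mu } \right)} } \right)}^2}} \right) = \frac{1}{{{n^2}}}\sum\limits_{i = 1}^n {\sum\limits_{j = 1}^n {E\left( {\left( {{x_i} - \mu } \right)\left( {{x_j} - \mu } \right)} \right)} } = \frac{1}{{{n^2}}} \cdot n{\sigma ^2} = \frac{{{\sigma ^2}}}{n}$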
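So the expectation of the $1/n$ sample variance is $(n-1)/n$ times the population variance. Multiplying by $n/(n-1)$ removes this bias, which is why the sample variance is defined with divisor $n-1$.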
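The factor $(n-1)/n$ can be verified exactly on a small case. The sketch below is an illustration, not part of the original derivation: it assumes a hypothetical population of the six faces of a die, enumerates every possible sample of size 3 drawn with replacement (so the draws are independent), and averages both versions of the sample variance using exact fractions. The helper names `mean` and `variance` are made up for this example.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by (len - ddof)."""
    values = list(values)
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / (len(values) - ddof)


# Hypothetical population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
n = 3  # sample size

# Population variance (divisor N, because this is the whole population).
sigma2 = variance(population, ddof=0)

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Expectation of the 1/n version and of the 1/(n-1) version.
expected_biased = mean(variance(s, ddof=0) for s in samples)
expected_unbiased = mean(variance(s, ddof=1) for s in samples)

print(sigma2)             # 35/12
print(expected_biased)    # 35/18, which is (n-1)/n * sigma2
print(expected_unbiased)  # 35/12, equal to sigma2
```

Because all 216 samples are enumerated, the result is an exact expectation rather than a simulation estimate.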
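2. Covariance
Covariance measures the linear association between two random variables under their joint distribution. A strong positive linear relationship gives a large positive covariance, a strong negative one gives a large negative covariance, and variables with no linear relationship have covariance zero.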
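${\rm{Cov}}\left( {x,y} \right) = E\left[ {\left( {x - E\left( x \right)} \right)\left( {y - E\left( y \right)} \right)} \right]$
In the special case of a single variable $x$, the covariance of $x$ with itself equals its variance, written ${\rm{Var}}(x)$:
${\rm{Cov}}\left( {x,x} \right) = {\rm{Var}}\left( x \right) = E\left[ {\left( {x - E\left( x \right)} \right)\left( {x - E\left( x \right)} \right)} \right]$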
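A small exact computation makes both statements concrete. The example below is an assumption chosen for illustration: $x$ takes the values -1, 0, 1 with equal probability and $y = x^2$. The two variables are clearly dependent, yet they have no linear relationship, so the covariance is zero. The helper names `expectation` and `covariance` are hypothetical.

```python
from fractions import Fraction


def expectation(values, probs):
    """E[v] for a discrete variable with the given probabilities."""
    return sum(Fraction(v) * p for v, p in zip(values, probs))


def covariance(us, vs, probs):
    """Cov(u, v) = E[(u - E(u)) (v - E(v))] for paired outcomes."""
    eu = expectation(us, probs)
    ev = expectation(vs, probs)
    return expectation([(u - eu) * (v - ev) for u, v in zip(us, vs)], probs)


# Assumed toy distribution: x is uniform on {-1, 0, 1}, and y = x^2.
xs = [-1, 0, 1]
probs = [Fraction(1, 3)] * 3
ys = [x * x for x in xs]

cov_xy = covariance(xs, ys, probs)  # no linear relationship
cov_xx = covariance(xs, xs, probs)  # covariance of x with itself
var_x = expectation([(x - expectation(xs, probs)) ** 2 for x in xs], probs)

print(cov_xy)  # 0
print(cov_xx)  # 2/3
print(var_x)   # 2/3
```

A zero covariance therefore rules out only a linear relationship; it does not imply that the variables are independent.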
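Sample covariance
For a multidimensional random variable $Q = (x_1, x_2, x_3, \dots, x_n)$, let the $j$-th sample be $[x_{1j}, x_{2j}, \dots, x_{nj}]$ for $j = 1, 2, \dots, m$, where $m$ is the number of samples and $\bar x_a$ is the mean of dimension $a$ over all samples. For any two dimensions $a, b \in \{1, 2, \dots, n\}$, the sample covariance is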
${\rm{cov}}\left( {{{\rm{x}}_{\rm{a}}},{{\rm{x}}_{\rm{b}}}} \right) = \frac{{\mathop \sum \nolimits_{j = 1}^m \left( {{x_{aj}} - {{\bar x}_a}} \right)\left( {{x_{bj}} - {{\bar x}_b}} \right)}}{{m - 1}}$
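The sample covariance formula translates directly into a few lines of Python. This is a minimal sketch using only the standard library; the function name `sample_cov` and the data values are assumptions made up for illustration.

```python
def sample_cov(u, v):
    """Sample covariance of two equal-length sequences, divisor m - 1."""
    m = len(u)
    if m != len(v) or m < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_u = sum(u) / m
    mean_v = sum(v) / m
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (m - 1)


# Illustrative data: m = 5 observations of three dimensions.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # rises with x
z = [5, 4, 3, 2, 1]   # falls as x rises

print(sample_cov(x, y))  # 5.0
print(sample_cov(x, z))  # -2.5
print(sample_cov(x, x))  # 2.5, the sample variance of x
```

The sign of the result follows the direction of the linear relationship, and `sample_cov(x, x)` reduces to the sample variance with divisor $m - 1$.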
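3. Covariance Matrix
For a multidimensional random variable $Q = (x_1, x_2, x_3, \dots, x_n)$, we want the linear relationship between every pair of variables $(x_i, x_j)$. Collecting the covariance of every pair gives the covariance matrix, whose entries are
${\rm{Cov}}\left( {{x_i},{x_j}} \right) = E\left[ {\left( {{x_i} - E\left( {{x_i}} \right)} \right)\left( {{x_j} - E\left( {{x_j}} \right)} \right)} \right]$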
\[{\left[ {{\rm{cov}}\left( {{x_i},{x_j}} \right)} \right]_{n \times n}} = \left[ {\begin{array}{*{20}{c}}
{{\rm{cov}}\left( {{x_1},{x_1}} \right)}&{{\rm{cov}}\left( {{x_1},{x_2}} \right)}&{{\rm{cov}}\left( {{x_1},{x_3}} \right)}& \cdots &{{\rm{cov}}\left( {{x_1},{x_n}} \right)}\\
{{\rm{cov}}\left( {{x_2},{x_1}} \right)}&{{\rm{cov}}\left( {{x_2},{x_2}} \right)}&{{\rm{cov}}\left( {{x_2},{x_3}} \right)}& \cdots &{{\rm{cov}}\left( {{x_2},{x_n}} \right)}\\
{{\rm{cov}}\left( {{x_3},{x_1}} \right)}&{{\rm{cov}}\left( {{x_3},{x_2}} \right)}&{{\rm{cov}}\left( {{x_3},{x_3}} \right)}& \cdots &{{\rm{cov}}\left( {{x_3},{x_n}} \right)}\\
 \vdots & \vdots & \vdots & \ddots & \vdots \\
{{\rm{cov}}\left( {{x_n},{x_1}} \right)}&{{\rm{cov}}\left( {{x_n},{x_2}} \right)}&{{\rm{cov}}\left( {{x_n},{x_3}} \right)}& \cdots &{{\rm{cov}}\left( {{x_n},{x_n}} \right)}
\end{array}} \right]\]
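The matrix can be built by applying the sample covariance to every pair of dimensions. The sketch below assumes the data arrives as a list of $m$ samples, each a tuple of $n$ values; the names `sample_cov` and `covariance_matrix` and the data are hypothetical and chosen only for illustration.

```python
def sample_cov(u, v):
    """Sample covariance of two equal-length sequences, divisor m - 1."""
    m = len(u)
    mean_u = sum(u) / m
    mean_v = sum(v) / m
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (m - 1)


def covariance_matrix(samples):
    """n x n matrix of pairwise sample covariances.

    `samples` is a list of m observations, each with n dimensions.
    """
    columns = list(zip(*samples))  # one sequence per dimension
    n = len(columns)
    return [[sample_cov(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]


# Illustrative data: m = 5 samples of an n = 3 dimensional variable.
samples = [(1, 2, 5), (2, 4, 4), (3, 6, 3), (4, 8, 2), (5, 10, 1)]
matrix = covariance_matrix(samples)

for row in matrix:
    print(row)
# [2.5, 5.0, -2.5]
# [5.0, 10.0, -5.0]
# [-2.5, -5.0, 2.5]
```

The result is symmetric because $\mathrm{cov}(x_i, x_j) = \mathrm{cov}(x_j, x_i)$, and its diagonal holds the sample variance of each dimension. This version recomputes each column mean for every pair, which keeps the code short at the cost of some redundant work.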