Survey Sampling: Proofs and Exercises



Proofs

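Proof 1

Claim: for the simple random sampling estimator \(\bar{y}\), we have \({E}(\bar{y})=\bar{Y}\) and \({V}(\bar{y})=\dfrac{1-f}{n}S^2\).

Let \(a_i\) be the indicator of the event that the population unit \(Y_i\) is included in the sample. Then \(a_i\) is a random variable with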

\[{E}(a_i)=f,\quad {V}(a_i)={E}(a_i^2)-[{E}(a_i)]^2=f(1-f),\\ {E}(a_ia_j)=\frac{n(n-1)}{N(N-1)},\\ \mathrm{cov}(a_i,a_j)={E}(a_ia_j)-{E}(a_i){E}(a_j)=\frac{-f(1-f)}{(N-1)}. \]

The sample mean \(\bar{y}\) can be rewritten as

\[\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i=\frac{1}{n}\sum_{i=1}^{N}a_iY_i. \]

For the expectation, this gives

\[\begin{aligned} E(\bar{y})&=\frac{1}{n}E\left(\sum_{i=1}^{N}a_iY_i \right)\\ &=\frac{1}{n}\sum_{i=1}^{N}E(a_i)Y_i\\ &=\frac{f}{n}\sum_{i=1}^{N}Y_i\\ &=\bar{Y}; \end{aligned} \]

For the variance,

\[\begin{aligned} V(\bar{y})&=\frac{1}{n^2}V\left(\sum_{i=1}^{N}a_iY_i \right)\\ &=\frac{1}{n^2}\left[\sum_{i=1}^{N}Y_i^2V(a_i)+2\sum_{i<j}^{N}Y_iY_j\mathrm{cov}(a_i,a_j) \right]\\ &=\frac{1}{n^2}\left[f(1-f)\sum_{i=1}^{N}Y_i^2-2\frac{f(1-f)}{N-1}\sum_{i<j}^{N}Y_iY_j \right]\\ &=\frac{f(1-f)}{n^2}\left[\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\sum_{i<j}^{N}2Y_iY_j \right]\\ &=\frac{f(1-f)}{n^2}\left[\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\left(\sum_{i=1}^{N}Y_i \right)^2+\frac{1}{N-1}\sum_{i=1}^{N}Y_i^2 \right] \\ &=\frac{f(1-f)}{n^2}\left[\frac{N}{N-1}\sum_{i=1}^{N}Y_i^2-\frac{1}{N-1}\left(\sum_{i=1}^{N}Y_i \right)^2 \right]\\ &=\frac{f(1-f)}{n^2}\frac{N}{N-1}\left[\sum_{i=1}^{N}Y_i^2-N\bar{Y}^2 \right]\\ &=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2\\ &=\frac{1-f}{n}S^2. \end{aligned} \]
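
Both identities can be checked by brute force on a small population. The following is a minimal sketch, assuming Python 3 and a hypothetical six-unit population (the values are illustrative and not from the text); it enumerates every equally likely sample and uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population (values chosen only for illustration).
population = [4, 6, 8, 5, 4, 9]
N, n = len(population), 3
f = Fraction(n, N)

Y_bar = Fraction(sum(population), N)
S2 = sum((Fraction(y) - Y_bar) ** 2 for y in population) / (N - 1)

# Under simple random sampling every size-n subset is equally likely.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
expectation = sum(sample_means) / len(sample_means)
variance = sum((m - expectation) ** 2 for m in sample_means) / len(sample_means)

print(expectation == Y_bar)           # checks E(y_bar) = Y_bar
print(variance == (1 - f) / n * S2)   # checks V(y_bar) = (1 - f) / n * S^2
```

Because `Fraction` avoids rounding, the two comparisons test the identities exactly rather than up to floating-point error.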

Proof 2

Claim: the sample variance is an unbiased estimator of the population variance, \(E(s^2)=S^2\), and the sample covariance is an unbiased estimator of the population covariance, \(E(s_{yx})=S_{yx}\).

With the notation of the previous proof,

\[\begin{aligned} E(s^2)&=E\left[\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2 \right]\\ &=\frac{1}{n-1}E\left(\sum_{i=1}^{n}y_i^2\right)-\frac{n}{n-1}E(\bar{y}^2)\\ &=\frac{1}{n-1}E\left(\sum_{i=1}^{N}a_iY_i^2 \right)-\frac{n}{n-1}\left[\frac{1-f}{n}S^2+\bar{Y}^2 \right]\\ &=\frac{f}{n-1}\sum_{i=1}^{N}Y_i^2-\frac{1-f}{n-1}S^2-\frac{n}{n-1}\bar{Y}^2\\ &=\frac{f}{n-1}\left[(N-1)S^2+N\bar{Y}^2 \right]-\frac{1-f}{n-1}S^2-\frac{n}{n-1}\bar{Y}^2\\ &=S^2\left[\frac{f(N-1)-(1-f)}{n-1} \right]+\bar{Y}^2\left(\frac{fN-n}{n-1} \right)\\ &=S^2. \end{aligned} \]

To prove the second claim, we first compute \(\mathrm{cov}(\bar{y},\bar{x})\). Introduce the variable \(U=Y+X\) and define \(u_i\), \(\bar{u}\), and \(S_u^2\) analogously. Then

\[V(\bar u)=V(\bar y)+V(\bar x)+2\mathrm{cov}(\bar y, \bar x),\\ \begin{aligned} \mathrm{cov}(\bar y,\bar x)&=\frac{1}{2}[V(\bar u)-V(\bar y)-V(\bar x)]\\ &=\frac{1}{2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left[(U_i-\bar{U})^2-(Y_i-\bar{Y})^2-(X_i-\bar{X})^2 \right]\\ &=\frac{1-f}{2n}\frac{1}{N-1}\cdot \sum_{i=1}^{N}2(Y_i-\bar{Y})(X_i-\bar{X})\\ &=\frac{1-f}{n}S_{yx}. \end{aligned} \]

It follows that

\[\begin{aligned} E(s_{yx})&=E\left[\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x}) \right]\\ &=\frac{1}{n-1}E\left(\sum_{i=1}^{n}y_ix_i \right)-\frac{n}{n-1}E(\bar{y}\bar{x})\\ &=\frac{f}{n-1}\sum_{i=1}^{N}Y_iX_i-\frac{n}{n-1}\bar{Y}\bar{X}-\frac{n}{n-1}\frac{1-f}{n}S_{yx}\\ &=\frac{f}{n-1}\left[(N-1)S_{yx}+N\bar{Y}\bar{X} \right]-\frac{n}{n-1}\bar{Y}\bar{X}-\frac{n}{n-1}\frac{1-f}{n}S_{yx}\\ &=S_{yx}\left[\frac{f(N-1)-(1-f)}{n-1}\right]+\bar{Y}\bar{X}\left(\frac{fN-n}{n-1}\right)\\ &=S_{yx}. \end{aligned} \]
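
The unbiasedness of \(s^2\) and \(s_{yx}\) can also be confirmed by enumeration. This is a minimal sketch, assuming Python 3 and a hypothetical population of six \((Y_i, X_i)\) pairs; the helper name `cov` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical paired population (Y_i, X_i), for illustration only.
pairs = [(4, 2), (6, 3), (8, 3), (5, 2), (4, 1), (9, 4)]
N, n = len(pairs), 3

def cov(points):
    """Covariance of (y, x) points with divisor (number of points - 1)."""
    k = len(points)
    my = Fraction(sum(y for y, _ in points), k)
    mx = Fraction(sum(x for _, x in points), k)
    return sum((y - my) * (x - mx) for y, x in points) / (k - 1)

S_yx = cov(pairs)
S2 = cov([(y, y) for y, _ in pairs])   # variance is the covariance of Y with itself

samples = list(combinations(pairs, n))
E_s_yx = sum(cov(s) for s in samples) / len(samples)
E_s2 = sum(cov([(y, y) for y, _ in s]) for s in samples) / len(samples)

print(E_s2 == S2, E_s_yx == S_yx)
```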

Proof 3

Claim: the variance of the ratio estimator \(r\) is approximately

\[V(r)\approx \frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2=\frac{1}{\bar{X}^2}\frac{1-f}{n}(S^2-2RS_{yx}+R^2S_x^2). \]

Define \(G=Y-RX\), and define \(g_i\), \(\bar{g}\), and \(\bar{G}\) analogously. It is easy to check that \(\bar{G}=0\), so

\[\begin{aligned} V(r)&\approx E(r-R)^2\\ &=E\left(\frac{\bar{y}-R\bar{x}}{\bar{x}} \right)^2\\ &\approx\frac{1}{\bar{X}^2}E(\bar{y}-R\bar{x})^2\\ &=\frac{1}{\bar{X}^2}E(\bar g^2)=\frac{1}{\bar{X}^2}V(\bar g)\\ &=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(G_i-\bar{G})^2\\ &=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2. \end{aligned} \]

The second equality in the claim follows from

\[\begin{aligned} V(r)&\approx \frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2\\ &=\frac{1}{\bar{X}^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}[(Y_i-\bar{Y})-R(X_i-\bar{X})]^2\\ &=\frac{1}{\bar{X}^2}\frac{1-f}{n}\left[S^2-2RS_{yx}+R^2S_{x}^2 \right]. \end{aligned} \]

Proof 4

Claim: for \(\bar{y}_{RC}=\dfrac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X}\), we have \(E(\bar{y}_{RC})\approx \bar{Y}\) and \(\displaystyle{V(\bar{y}_{RC})\approx\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2) }\).

Since \(E(\bar{x}_{st})=\bar{X}\) and \(\bar{x}_{st}\approx \bar{X}\) for large samples, we have

\[E(\bar{y}_{RC})=\bar{X}E\left(\frac{\bar{y}_{st}}{\bar{x}_{st}} \right)\approx E(\bar{y}_{st})=\bar{Y}. \]

Apply the transformation \(G=Y-RX\) and define \(G_{hi}\), \(\bar{G}_h\), and \(\bar{g}_{st}\) analogously. Then \(\bar{G}_h=\bar{Y}_h-R\bar{X}_h\) and \(\bar{g}_{st}=\bar{y}_{st}-R\bar{x}_{st}\), so \(E(\bar{g}_{st})=0\). Therefore

\[\begin{aligned} V(\bar{y}_{RC})&\approx E(\bar{y}_{RC}-\bar{Y})^2\\ &\approx E(\bar{y}_{st}-R\bar{x}_{st})^2\\ &=V(\bar{g}_{st})\\ &=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}S_{gh}^2\\ &=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}\left[\frac{1}{N_h-1}\sum_{i=1}^{N_h}(G_{hi}-\bar{G}_h)^2 \right]\\ &=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2). \end{aligned} \]

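Proof 5

Claim: the optimal allocation in stratified sampling satisfies

\[n_h\propto\frac{W_hS_h}{\sqrt{c_h}}. \]

Here \(c_h\) is the cost of surveying one unit in stratum \(h\).

Dropping the terms that do not depend on \(n_h\), the problem reduces to minimizing the product of cost and variance,

\[z=\left(\sum_{h=1}^{L}n_hc_h \right)\left(\sum_{h=1}^{L}\frac{W_h^2S_h^2}{n_h} \right). \]

By the Cauchy-Schwarz inequality,

\[z\ge \left(\sum_{h=1}^{L}\sqrt{c_h}\,W_hS_h \right)^2, \]

with equality if and only if, in every stratum,

\[\frac{n_h^2c_h}{W_h^2S_h^2}=K \]

for some constant \(K\); that is,

\[n_h\propto \frac{W_hS_h}{\sqrt{c_h}}. \]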

Proof 6

Claim: the design effect of cluster sampling is approximately

\[deff=\frac{V(\bar{\bar{y}})}{V_{srs}(\bar{\bar{y}})}\approx 1+(M-1)\rho_{c}. \]

Here \(\rho_c\) is the intracluster correlation coefficient, defined by

\[\rho_c=\frac{2\sum\limits_{i=1}^{N}\sum\limits_{j<k}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}})}{(M-1)(NM-1)S^2}. \]

Assume that \(N\) and \(M\) are both large, so that \(N-1\approx N\) and \(NM-1\approx NM\). Then

\[\begin{aligned} V(\bar{\bar{y}})&=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2\\ &=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{1}{M}\sum_{j=1}^{M}Y_{ij}-\bar{\bar{Y}} \right)^2\\ &=\frac{1}{M^2}\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}}) \right]^2\\ &=\frac{1-f}{nM}\frac{1}{M(N-1)}\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2+2\sum_{j<k}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}}) \right]\\ &=\frac{1-f}{nM}\frac{1}{M(N-1)}\left[(NM-1)S^2+(M-1)(NM-1)S^2\rho_c \right]\\ &=\frac{1-f}{nM}\frac{(NM-1)S^2}{M(N-1)}[1+(M-1)\rho_c]\\ &\approx \frac{1-f}{nM}S^2[1+(M-1)\rho_c]. \end{aligned} \]

Since \(V_{srs}(\bar{\bar{y}})=\dfrac{1-f}{nM}S^2\), we obtain

\[deff\approx [1+(M-1)\rho_c]. \]

Proof 7

Claim: for two-stage sampling,

\[E(\hat\theta)=E_1E_2(\hat\theta),\\ V(\hat\theta)=V_1[E_2(\hat\theta)]+E_1[V_2(\hat\theta)]. \]

The formula for the mean is just the law of total expectation. Writing \(E(\hat\theta)=\theta\), the variance satisfies

\[\begin{aligned} V(\hat\theta)&=E(\hat\theta-\theta)^2\\ &=E_1E_2(\hat\theta-\theta)^2\\ &=E_1[E_2(\hat\theta^2)-2\theta E_2(\hat\theta)+\theta^2]\\ &=E_1[V_2(\hat\theta)+[E_2(\hat\theta)]^2]-\theta^2\\ &=E_1V_2(\hat\theta)+E_1[E_2(\hat\theta)]^2-[E_1E_2(\hat\theta)]^2\\ &=E_1V_2(\hat\theta)+V_1E_2(\hat\theta). \end{aligned} \]

Proof 8

Claim: for two-stage sampling,

\[V(\bar{\bar{y}})=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2. \]

We have \(V(\bar{\bar{y}})=V_1E_2(\bar{\bar{y}})+E_1V_2(\bar{\bar{y}})\), and we compute the two terms separately. For the first term,

\[\begin{aligned} V_1E_2(\bar{\bar{y}})&=V_1E_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i \right)\\ &=V_1\left(\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i \right)\\ &=\frac{1-f_1}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2\\ &=\frac{1-f_1}{n}S_1^2, \end{aligned} \]

and for the second term,

\[\begin{aligned} E_1V_2(\bar{\bar{y}})&=E_1V_2\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i \right)\\ &=E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\frac{1-f_2}{m}\frac{1}{M-1}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i)^2 \right]\\ &=\frac{1}{n}E_1\left[\frac{1}{n}\sum_{i=1}^{n}\frac{1-f_2}{m}S_{2i}^2 \right]\\ &=\frac{1-f_2}{nm}E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\ &=\frac{1-f_2}{nm}\left(\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2 \right)\\ &=\frac{1-f_2}{nm}S_2^2. \end{aligned} \]

This proves the claim.

Proof 9

Claim: for two-stage sampling,

\[E(s_1^2)=S_1^2+\frac{1-f_2}{m}S_2^2,\\ E(s_2^2)=S_2^2. \]

For \(s_1^2\), we have

\[\begin{aligned} E_2[(n-1)s_1^2]&=E_2\left[\sum_{i=1}^{n}(\bar{y}_i-\bar{\bar{y}})^2 \right]\\ &=\sum_{i=1}^{n}E_2(\bar{y}_i^2)-nE_2(\bar{\bar{y}}^2)\\ &=\sum_{i=1}^{n}\{[E_2(\bar{y}_i)]^2+V_2(\bar{y}_i)\}-n\left\{[E_2(\bar{\bar{y}})]^2+V_2(\bar{\bar{y}}) \right\}\\ &=\sum_{i=1}^{n}\bar{Y}_i^2+\sum_{i=1}^{n}\frac{1-f_2}{m}S_{2i}^2-n\left(\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i \right)^2-\frac{1-f_2}{nm}\sum_{i=1}^{n}S_{2i}^2, \end{aligned} \]

Writing \(\bar{Y}_n=\displaystyle{\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_i}\), we obtain

\[\begin{aligned} E_2[(n-1)s_1^2]&=\sum_{i=1}^{n}(\bar{Y}_i-\bar{Y}_{n})^2+\frac{(n-1)(1-f_2)}{nm}\sum_{i=1}^{n}S_{2i}^2,\\ E(s_1^2)&=E_1E_2(s_1^2)\\ &=E_1\left[\frac{1}{n-1}\sum_{i=1}^{n}(\bar{Y}_i-\bar{Y}_n)^2+\frac{1-f_2}{nm}\sum_{i=1}^{n}S_{2i}^2 \right]\\ &=\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2+\frac{1-f_2}{m}E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\ &=S_1^2+\frac{1-f_2}{m}\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2\\ &=S_1^2+\frac{1-f_2}{m}S_{2}^2. \end{aligned} \]

For \(s_2^2\), we have

\[\begin{aligned} E_2(s_2^2)&=E_2\left(\frac{1}{n}\sum_{i=1}^{n}s_{2i}^2 \right)\\ &=\frac{1}{n}\sum_{i=1}^{n}E_2(s_{2i}^2)\\ &=\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2,\\ E(s_2^2)&=E_1E_2(s_2^2)\\ &=E_1\left(\frac{1}{n}\sum_{i=1}^{n}S_{2i}^2 \right)\\ &=\frac{1}{N}\sum_{i=1}^{N}S_{2i}^2\\ &=S_{2}^2. \end{aligned} \]

This completes the proof.

Proof 10

Claim: an unbiased estimator of \(V(\hat{Y}_{HH})\) is

\[v(\hat{Y}_{HH})=\frac{1}{n}\frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{Y_i}{Z_i}-\hat{Y}_{HH} \right)^2. \]

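Let \(t_i\) be the number of times \(Y_i\) is drawn into the sample. Then \(\displaystyle{\sum_{i=1}^{N}t_i=n}\), and the \(t_i\) jointly follow the multinomial distribution \(B(n;Z_1,Z_2,\cdots,Z_N)\), so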

\[E(t_i)=nZ_i,\quad V(t_i)=nZ_i(1-Z_i),\quad \mathrm{cov}(t_i,t_j)=-nZ_iZ_j. \]

Recall that \(V(\hat{Y}_{HH})=\dfrac{1}{n}\displaystyle{\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2}\). Then

\[\begin{aligned} E\left[\sum_{i=1}^{n}\left(\frac{Y_i}{Z_i}-\hat{Y}_{HH} \right)^2\right]&=E\left[\sum_{i=1}^{n}\left(\frac{Y_i}{Z_i} \right)^2-n\hat{Y}_{HH}^2 \right]\\ &=E\left[\sum_{i=1}^{n}\left(\frac{Y_i}{Z_i}-Y \right)^2-n(\hat{Y}_{HH}-Y)^2 \right]\\ &=E\left[\sum_{i=1}^{N}t_i\left(\frac{Y_i}{Z_i}-Y \right)^2 \right]-nE(\hat{Y}_{HH}-Y)^2\\ &=\sum_{i=1}^{N}nZ_i\left(\frac{Y_i}{Z_i}-Y \right)^2-nV(\hat{Y}_{HH})\\ &=(n^2-n)V(\hat{Y}_{HH}). \end{aligned} \]

This proves the claim.

Proof 11

Claim: when \(n\) is fixed, the variance of the Horvitz-Thompson estimator satisfies

\[V(\hat{Y}_{HT})=\sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}Y_i^2+2\sum_{i<j}^{N}\frac{\pi_{ij}-\pi_i\pi_j}{\pi_i\pi_j}Y_iY_j=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2. \]

When \(n\) is fixed, for any given \(i\) we have

\[\sum_{j\ne i}^{N}(\pi_{ij}-\pi_i\pi_j)=\sum_{j\ne i}^{N}\pi_{ij}-\pi_i\sum_{j\ne i}^{N}\pi_j=(n-1)\pi_i-\pi_i(n-\pi_i)=-\pi_i(1-\pi_i), \]

so

\[\begin{aligned} \sum_{i=1}^{N}\frac{1-\pi_i}{\pi_i}Y_i^2&=\sum_{i=1}^{N}\frac{\pi_i(1-\pi_i)Y_i^2}{\pi_i^2}\\ &=\sum_{i=1}^{N}\sum_{j\ne i}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2} \right)\\ &=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2}+\frac{Y_j^2}{\pi_j^2} \right), \end{aligned} \]

Adding the second term gives

\[\begin{aligned} V(\hat{Y}_{HT})&=\sum_{i<j}^{N}\left[(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i^2}{\pi_i^2}+\frac{Y_j^2}{\pi_j^2}-2\frac{Y_iY_j}{\pi_i\pi_j} \right) \right]\\ &=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2. \end{aligned} \]

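Proof 12

Claim: Brewer's sampling method for \(n=2\), which consists of the two steps

  1. draw the first unit with probability proportional to \(\dfrac{Z_i(1-Z_i)}{1-2Z_i}\);
  2. draw the second unit from the remaining units with probability proportional to \(Z_i\),

is a \(\mathrm{\pi PS}\) design, with \(\pi_i=2Z_i\) and \(\pi_{ij}=\dfrac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\left(1+\sum\limits_{k=1}^{N}\dfrac{Z_k}{1-2Z_k}\right)}\).

Let \(D\) be the normalizing constant of the first-draw probabilities. Then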

\[\begin{aligned} D&=\sum_{i=1}^{N}\frac{Z_i(1-Z_i)}{1-2Z_i}\\ &=\sum_{i=1}^{N}\left(\frac{Z_i(1-Z_i)}{1-2Z_i}-\frac{1}{2}Z_i\right)+\frac{1}{2}\\ &=\sum_{i=1}^{N}\frac{Z_i}{2(1-2Z_i)}+\frac{1}{2}\\ &=\frac{1}{2}\left(\sum_{i=1}^{N}\frac{Z_i}{1-2Z_i}+1\right), \end{aligned} \]

\[\begin{aligned} \pi_i&=\frac{Z_i(1-Z_i)}{D(1-2Z_i)}+\sum_{j\ne i}^{N}\frac{Z_iZ_j}{D(1-2Z_j)}\\ &=\frac{Z_i}{D}\left[ 1+\frac{Z_i}{1-2Z_i}+\sum_{j\ne i}^{N}\frac{Z_j}{(1-2Z_j)}\right]\\ &=\frac{Z_i}{D}(2D)\\ &=2Z_i. \end{aligned} \]

\[\begin{aligned} \pi_{ij}&=\frac{Z_i(1-Z_i)}{D(1-2Z_i)}\cdot \frac{Z_j}{1-Z_i}+\frac{Z_j(1-Z_j)}{D(1-2Z_j)}\cdot\frac{Z_i}{1-Z_j}\\ &=\frac{Z_iZ_j(1-2Z_j)+Z_iZ_j(1-2Z_i)}{D(1-2Z_i)(1-2Z_j)}\\ &=\frac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\displaystyle{\left(1+\sum_{k=1}^{N}\frac{Z_k}{1-2Z_k} \right)}}. \end{aligned} \]

This completes the proof.
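
The two-step procedure can be checked numerically by computing the pair probabilities directly from the draw probabilities. This is a minimal sketch, assuming Python 3 and hypothetical selection probabilities \(Z_i\) that sum to 1 with each \(Z_i<1/2\); the names `first_weight` and `prob_ordered` are illustrative.

```python
from fractions import Fraction

# Hypothetical selection probabilities: they sum to 1 and each is below 1/2.
Z = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
N = len(Z)

# Unnormalized first-draw weights Z_i (1 - Z_i) / (1 - 2 Z_i) and their sum D.
first_weight = [z * (1 - z) / (1 - 2 * z) for z in Z]
D = sum(first_weight)

def prob_ordered(i, j):
    """P(unit i is drawn first and unit j is drawn second)."""
    return first_weight[i] / D * Z[j] / (1 - Z[i])

# Joint inclusion probability of each unordered pair {i, j}.
pi_pair = {(i, j): prob_ordered(i, j) + prob_ordered(j, i)
           for i in range(N) for j in range(i + 1, N)}

# With n = 2, pi_i is the sum of pi_ij over all pairs containing i.
pi = [sum(p for pair, p in pi_pair.items() if i in pair) for i in range(N)]

print(pi == [2 * z for z in Z])
```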

Proof 13

Claim: the variance of systematic sampling is

\[V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2, \]

where

\[S^2=\frac{1}{N-1}\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2,\\ S_{wsy}^2=\frac{1}{k}\sum_{r=1}^{k}\frac{1}{n-1}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_{r})^2. \]

Decomposing \(S^2\) gives

\[\begin{aligned} (N-1)S^2&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2\\ &=\sum_{r=1}^{k}\sum_{j=1}^{n}({Y}_{rj}-\bar{Y}_r)^2+\sum_{r=1}^{k}\sum_{j=1}^{n}(\bar{Y}_r-\bar{Y})^2\\ &=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+n\sum_{r=1}^{k}(\bar{Y}_{r}-\bar{Y})^2\\ &=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+N\left[\frac{1}{k}\sum_{r=1}^{k}(\bar{Y}_r-\bar{Y})^2 \right]\\ &=k(n-1)S_{wsy}^2+NV(\bar{y}_{sy}), \end{aligned} \]

and hence

\[V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2. \]
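
The decomposition can be verified on a small frame. This is a minimal sketch, assuming Python 3 and a hypothetical population of \(N=nk=12\) values; the \(r\)-th systematic sample takes every \(k\)-th unit starting at position \(r\).

```python
from fractions import Fraction

# Hypothetical population of N = n * k = 12 values, listed in frame order.
values = [3, 7, 1, 4, 9, 2, 8, 5, 6, 2, 7, 3]
k, n = 3, 4
N = n * k

Y_bar = Fraction(sum(values), N)
S2 = sum((v - Y_bar) ** 2 for v in values) / (N - 1)

# The k possible systematic samples and their means.
samples = [values[r::k] for r in range(k)]
means = [Fraction(sum(s), n) for s in samples]

# True variance of the systematic-sample mean: each start is equally likely.
V_sy = sum((m - Y_bar) ** 2 for m in means) / k

# Average within-sample variance S_wsy^2.
S2_wsy = sum(sum((v - m) ** 2 for v in s) / (n - 1)
             for s, m in zip(samples, means)) / k

rhs = Fraction(N - 1, N) * S2 - Fraction(k * (n - 1), N) * S2_wsy
print(V_sy == rhs)
```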

Proof 14

Claim: for stratified double sampling,

\[E(\bar{y}_{stD})=\bar{Y},\\ V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right). \]

For the mean, note that \(\displaystyle{\sum_{h=1}^{L}w_h'\bar{y}_h'=\bar{y}'}\), where \(\bar{y}'\) is the mean of a simple random sample drawn from the population with sampling fraction \(f_1=\dfrac{n'}{N}\). Hence

\[\begin{aligned} E(\bar{y}_{stD})&=E_1E_2\left(\sum_{h=1}^{L}w_h'\bar{y}_h \right)\\ &=E_1\left(\sum_{h=1}^{L}w_h'\bar{y}_h' \right)\\ &=E_1(\bar{y}')\\ &=\bar{Y}. \end{aligned} \]

For the variance, \(V(\bar{y}_{stD})=V_1E_2(\bar{y}_{stD})+E_1V_2(\bar{y}_{stD})\). Computing the two terms separately (note that \(n_h=n_h'f_{hD}\) and \(n_h'=w_h'n'\)) gives

\[\begin{aligned} V_1E_2(\bar{y}_{stD})&=V_1\left(\sum_{h=1}^{L}w_h'\bar{y}_h' \right)\\ &=V_1(\bar{y}')\\ &=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2;\\ E_1V_2(\bar{y}_{stD})&=E_1\left[\sum_{h=1}^{L}w_h'^2s_h'^2\left(\frac{1}{n_h}-\frac{1}{n_h'} \right) \right]\\ &=E_1\left[\sum_{h=1}^{L}\frac{w_h's_h'^2}{n'}\left(\frac{1}{f_{hD}}-1 \right) \right]\\ &=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)E_1(w_h's_h'^2)\\ &=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)E_1[E_1(w_h's_h'^2|w_h')]\\ &=\frac{1}{n'}\sum_{h=1}^{L}\left(\frac{1}{f_{hD}}-1 \right)S_h^2E_1(w_h')\\ &=\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right). \end{aligned} \]

The last steps use the law of total expectation. Substituting the two terms back gives the result.

Proof 15

Claim: for stratified double sampling with cost \(C_{T}^*=c_1n'+\displaystyle{\sum_{h=1}^{L}c_{2h}n_h}\), the optimal allocation is

\[f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}},\\ n'=\frac{C_{T}^*}{c_1+\displaystyle{\sum_{h=1}^{L}c_{2h}W_hf_{hD}}}. \]

The variance is

\[V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right)=\frac{S^2}{n'}+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'f_{hD}}-\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}-\frac{S^2}{N}, \]

so we minimize

\[C_{T}^*\left(V+\frac{S^2}{N} \right)=\left(c_1+\sum_{h=1}^{L}c_{2h}f_{hD}W_h \right)\left[\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{f_{hD}} \right], \]

By the Cauchy-Schwarz inequality,

\[C_{T}^{*}\left(V+\frac{S^2}{N} \right)\ge \left[\sqrt{c_1\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}+\sum_{h=1}^{L}\sqrt{c_{2h}}W_hS_h \right]^2, \]

with equality if and only if

\[\frac{c_{2h}f_{hD}W_h}{W_hS_h^2/f_{hD}}=\frac{c_1}{\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}, \]

that is,

\[f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}. \]

Substituting this back into the cost function gives \(n'\).

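Exercises

1. Simple random sampling

Consider the following data, where \(Y\) is the variable of interest and \(X\) is an auxiliary variable.

\[\begin{array}{c|ccccc} \hline Y & 4 & 6 & 8 & 5 & 4 \\ X & 2 & 3 & 3 & 2 & 1 \\ \hline \end{array} \]

Given \(N=50\), \(n=5\), and \(\bar{X}=2\), find:

  1. the simple estimate of \(\bar{Y}\) and its \(95\%\) confidence interval;
  2. the ratio estimate of \(\bar{Y}\) and its \(95\%\) confidence interval;
  3. the regression estimate of \(\bar{Y}\) and its \(95\%\) confidence interval.

Solution.

  1. For the simple estimator,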

    \[\bar{y}=5.4,\quad s^2=2.8 \\ v(\bar{y})=\frac{1-f}{n}s^2=0.504 \]

    Computing \(\bar{y}\pm u_{\alpha/2}\sqrt{v(\bar{y})}\) gives the confidence interval

    \[[4.0085,6.7915]. \]

  2. For the ratio estimator, first compute

    \[\bar{x}=2.2,\quad s_x^2=0.7,\quad s_y^2=2.8,\quad s_{xy}=1.15. \]

    so that

    \[r = \frac{\bar{y}}{\bar{x}}=2.4545,\\ \bar{y}_{R}=\frac{\bar{y}}{\bar{x}}\bar{X}=4.9091,\\ v(\bar{y}_{R})=\frac{1-f}{n}(s^2-2rs_{yx}+r^2s_x^2)=0.2469, \]

    Computing \(\bar{y}_{R}\pm u_{\alpha/2}\sqrt{v(\bar{y}_{R})}\) gives the confidence interval

    \[[3.9351,5.8831]. \]

  3. For the regression estimator, first compute the regression coefficient:

    \[b=\frac{s_{yx}}{s_{x}^2}=1.6429,\\ \bar{y}_{lr}=\bar{y}+b(\bar{X}-\bar{x})=5.0714. \]

    To estimate its variance, compute the correlation coefficient:

    \[\hat\rho=\frac{s_{yx}}{s_ys_x}=0.8214,\\ v(\bar{y}_{lr})=\frac{1-f}{n}s_y^2(1-\hat\rho^2)=0.1639, \]

    Computing \(\bar{y}_{lr}\pm u_{\alpha/2}\sqrt{v(\bar{y}_{lr})}\) gives the confidence interval

    \[[4.2779,5.8650]. \]
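
The three estimators of this exercise can be evaluated with a few lines of code. This is a minimal sketch, assuming Python 3 with the standard library only and the normal quantile \(u_{\alpha/2}=1.96\); the variable and function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

y = [4, 6, 8, 5, 4]
x = [2, 3, 3, 2, 1]
N, n, X_bar, u = 50, 5, 2, 1.96
fpc = (1 - n / N) / n                      # (1 - f) / n

y_bar, x_bar = mean(y), mean(x)
s2_y, s2_x = variance(y), variance(x)      # divisor n - 1
s_yx = sum((a - y_bar) * (b - x_bar) for a, b in zip(y, x)) / (n - 1)

def interval(est, var):
    """Normal-approximation confidence interval, rounded to 4 decimals."""
    half = u * sqrt(var)
    return round(est - half, 4), round(est + half, 4)

# Simple estimator
v_simple = fpc * s2_y

# Ratio estimator
r = y_bar / x_bar
y_R = r * X_bar
v_R = fpc * (s2_y - 2 * r * s_yx + r ** 2 * s2_x)

# Regression estimator with estimated slope b
b = s_yx / s2_x
y_lr = y_bar + b * (X_bar - x_bar)
rho2 = s_yx ** 2 / (s2_y * s2_x)
v_lr = fpc * s2_y * (1 - rho2)

print(interval(y_bar, v_simple))
print(interval(y_R, v_R))
print(interval(y_lr, v_lr))
```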

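2. Ratio estimation in stratified random sampling

A population has two strata with \(N_1=15\), \(N_2=10\), \(\bar X_1=20\), and \(\bar X_2=50\). A sample of size \(3\) is drawn from each stratum, with the following results:

\[\begin{array}{c|ccc} \hline Y_1 & 30 & 35 & 40 \\ X_1 & 18 & 18 & 25 \\ \hline Y_2 & 75 & 82 & 85 \\ X_2 & 55 & 40 & 60 \\ \hline \end{array} \]

  1. Give the separate ratio estimate of \(\bar{Y}\) and estimate its variance.
  2. Give the combined ratio estimate of \(\bar{Y}\) and estimate its variance.

Solution.

  1. Here \(W_1=0.6\), \(W_2=0.4\), \(f_1=0.2\), and \(f_2=0.3\). For the separate ratio estimator,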

    \[\bar{r}_1=1.7213,\\ \bar{r}_2=1.5613,\\ \bar{y}_{RS}=W_1\bar{X}_1\bar{r}_1+W_2\bar{X}_2\bar{r}_2=51.8815. \]

    Its variance estimate is

    \[v(\bar{y}_{RS})=\sum_{h=1}^{2}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2\bar{r}_hs_{yxh}+\bar{r}_h^2s_{xh}^2)=12.0071. \]

  2. For the combined ratio estimator,

    \[\bar{y}_{st}=\sum_{h=1}^{2}W_h\bar{y}_h=53.2667,\\ \bar{x}_{st}=\sum_{h=1}^{2}W_h\bar{x}_h=32.8667, \]

    \[r=\frac{\bar{y}_{st}}{\bar{x}_{st}}=1.6207,\quad \bar{X}=32,\\ \bar{y}_{RC}=\frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X}=51.8621,\\ v(\bar{y}_{RC})=\sum_{h=1}^{2}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2rs_{yxh}+r^2s_{xh}^2)=12.5786. \]
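
The separate and combined ratio estimators differ only in which ratio enters each stratum term. This is a minimal sketch, assuming Python 3; the tuple layout and the helper names `moments` and `ratio_variance` are illustrative choices, not notation from the text.

```python
strata = [
    # (N_h, X_bar_h, y sample, x sample)
    (15, 20, [30, 35, 40], [18, 18, 25]),
    (10, 50, [75, 82, 85], [55, 40, 60]),
]
N = sum(s[0] for s in strata)

def moments(y, x):
    """Sample size, means, variances and covariance (divisor n - 1)."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    syy = sum((a - my) ** 2 for a in y) / (n - 1)
    sxx = sum((b - mx) ** 2 for b in x) / (n - 1)
    syx = sum((a - my) * (b - mx) for a, b in zip(y, x)) / (n - 1)
    return n, my, mx, syy, sxx, syx

stats = [(Nh / N, Nh, Xh) + moments(y, x) for Nh, Xh, y, x in strata]

def ratio_variance(ratio_of):
    """Sum over strata of W_h^2 (1-f_h)/n_h (s_y^2 - 2 r s_yx + r^2 s_x^2)."""
    total = 0.0
    for W, Nh, Xh, n, my, mx, syy, sxx, syx in stats:
        r = ratio_of(my, mx)
        total += W ** 2 * (1 - n / Nh) / n * (syy - 2 * r * syx + r ** 2 * sxx)
    return total

# Separate ratio estimator: one ratio per stratum.
y_RS = sum(W * Xh * my / mx for W, Nh, Xh, n, my, mx, *_ in stats)
v_RS = ratio_variance(lambda my, mx: my / mx)

# Combined ratio estimator: a single ratio of the stratified means.
y_st = sum(W * my for W, Nh, Xh, n, my, *_ in stats)
x_st = sum(W * mx for W, Nh, Xh, n, my, mx, *_ in stats)
X_bar = sum(W * Xh for W, Nh, Xh, *_ in stats)
r_c = y_st / x_st
y_RC = r_c * X_bar
v_RC = ratio_variance(lambda my, mx: r_c)

print(round(y_RS, 4), round(v_RS, 4), round(y_RC, 4), round(v_RC, 4))
```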

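3. Sample allocation in stratified random sampling

A proportion is surveyed in a two-stratum population with \(N_1=10\), \(N_2=20\), and \(n_1=n_2=5\), giving \(p_1=0.4\) and \(p_2=0.2\).

  1. Estimate \(P\) by stratified random sampling and give the standard error of \(p_{st}\).
  2. Compute the ratio of the two stratum sample sizes under Neyman allocation, and under optimal allocation when \(c_2=4c_1\).

Solution.

  1. The estimate of \(P\) is

    \[p_{st}=\frac{1}{3}p_1+\frac{2}{3}p_2=0.266667. \]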

    For the variance estimate, use

    \[s_h^2=\frac{n_hp_h(1-p_h)}{n_h-1}, \]

    so that

    \[s_1^2=1.25\times 0.4\times 0.6=0.3,\\ s_2^2=1.25\times 0.2\times 0.8=0.2,\\ v(p_{st})=\frac{1}{9}\frac{1-0.5}{5}0.3+\frac{4}{9}\frac{1-0.25}{5}0.2=0.016667,\\ \sigma(p_{st})=0.1291. \]

  2. Under Neyman allocation, \(n_h\propto W_hS_h\), so

    \[\frac{n_1}{n_2}=\frac{1/3\times \sqrt{0.3}}{2/3\times\sqrt{0.2}}=0.6124. \]

    Under optimal allocation with unequal costs, \(n_h\propto W_hS_h/\sqrt{c_h}\). Taking \(c_1=1\) and \(c_2=4\),

    \[\frac{n_1}{n_2}=\frac{1/3\times \sqrt{0.3}/\sqrt{1}}{2/3\times \sqrt{0.2}/\sqrt{4}}=1.2247. \]
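
The allocation rule \(n_h\propto W_hS_h/\sqrt{c_h}\) can be evaluated directly for both cost settings. This is a minimal sketch, assuming Python 3, with \(s_h\) used in place of the unknown \(S_h\) and unit costs expressed relative to \(c_1=1\); the function name `allocation_ratio` is illustrative.

```python
from math import sqrt

W = [1 / 3, 2 / 3]
p = [0.4, 0.2]
n = [5, 5]
N = [10, 20]

# Sample variance of a 0/1 variable: n p (1 - p) / (n - 1).
s2 = [nh * ph * (1 - ph) / (nh - 1) for nh, ph in zip(n, p)]

p_st = sum(Wh * ph for Wh, ph in zip(W, p))
v_st = sum(Wh ** 2 * (1 - nh / Nh) / nh * s
           for Wh, nh, Nh, s in zip(W, n, N, s2))

def allocation_ratio(costs):
    """n_1 / n_2 when n_h is proportional to W_h * s_h / sqrt(c_h)."""
    weights = [Wh * sqrt(s) / sqrt(c) for Wh, s, c in zip(W, s2, costs)]
    return weights[0] / weights[1]

neyman = allocation_ratio([1, 1])       # equal costs
with_costs = allocation_ratio([1, 4])   # c_2 = 4 c_1

print(round(p_st, 6), round(sqrt(v_st), 4))
print(round(neyman, 4), round(with_costs, 4))
```

A higher unit cost in stratum 2 shifts sample size toward stratum 1, so the ratio \(n_1/n_2\) increases relative to Neyman allocation.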

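4. Equal-probability cluster sampling

A population consists of \(10\) clusters of equal size \(M=10\). Four clusters are drawn at random; the cluster totals \(y_i\) and the unit values \(y_{ij}\) are

\[\begin{array}{c|c|c} \hline i & y_i & y_{ij}\\ \hline 1 & 19 & 1,2,1,3,3,2,1,4,1,1 \\ 2 & 20 & 1,3,2,2,3,1,4,1,1,2 \\ 3 & 16 & 2,1,1,1,1,3,2,1,3,1 \\ 4 & 20 & 1,1,3,2,1,5,1,2,3,1 \\ \hline \end{array} \]

  1. Estimate \(\bar{\bar{Y}}\) and give the standard error of the estimate.
  2. Find the design effect.

Solution.

  1. Here \(\bar{y}_1=1.9\), \(\bar{y}_{2}=2\), \(\bar{y}_3=1.6\), and \(\bar{y}_4=2\). By the properties of simple random sampling,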

    \[\bar{\bar{y}}=\frac{1}{4}\sum_{i=1}^{4}\bar{y}_i=1.875, \]

    \[v(\bar{\bar{y}})=\frac{1-0.4}{4}\frac{1}{3}\sum_{i=1}^{4}(\bar{y}_i-\bar{\bar{y}})^2=0.005375,\\ \sigma(\bar{\bar{y}})=0.07331. \]

  2. Here

    \[s_{b}^2=\frac{1}{n-1}\sum_{i=1}^{n}M(\bar y_i-\bar{\bar y})^2=0.358333,\\ s_w^2=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{M-1}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2=1.202778, \]

    so

    \[\hat \rho_c=\frac{s_b^2-s_w^2}{s_b^2+(M-1)s_w^2}=-0.0755,\\ deff\approx 1+(M-1)\hat\rho_c=0.3204. \]
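
The between-cluster and within-cluster variances, and from them the design effect, follow directly from the unit values in the table. This is a minimal sketch, assuming Python 3; the variable names are illustrative.

```python
clusters = [
    [1, 2, 1, 3, 3, 2, 1, 4, 1, 1],
    [1, 3, 2, 2, 3, 1, 4, 1, 1, 2],
    [2, 1, 1, 1, 1, 3, 2, 1, 3, 1],
    [1, 1, 3, 2, 1, 5, 1, 2, 3, 1],
]
N, n, M = 10, len(clusters), len(clusters[0])

means = [sum(c) / M for c in clusters]
grand = sum(means) / n

# Between-cluster and within-cluster sample variances.
s_b2 = M * sum((m - grand) ** 2 for m in means) / (n - 1)
s_w2 = sum((v - m) ** 2
           for c, m in zip(clusters, means) for v in c) / (n * (M - 1))

# Variance estimate of the cluster-sample mean: (1 - f)/n * s_b^2 / M.
v = (1 - n / N) / n * s_b2 / M

rho = (s_b2 - s_w2) / (s_b2 + (M - 1) * s_w2)
deff = 1 + (M - 1) * rho

print(round(grand, 3), round(v, 6), round(rho, 4), round(deff, 4))
```

A design effect below 1 corresponds to a negative estimated intracluster correlation: units within a cluster are more heterogeneous than units drawn at random.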

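5. Two-stage sampling

A population consists of \(N=10\) clusters of equal size, each containing \(M=50\) units. Three clusters are drawn, and \(5\) units are sampled from each, with the following results:

\[\begin{array}{c|ccccc} \hline 1 & 20 & 25 & 20 & 25 & 20 \\ 2 & 18 & 20 & 22 & 25 & 20 \\ 3 & 25 & 28 & 18 & 15 & 21 \\ \hline \end{array} \]

  1. Estimate \(\bar{\bar{Y}}\), estimate the variance of the estimator, and give a \(95\%\) confidence interval.
  2. If drawing one cluster costs \(c_1\) and surveying one unit costs \(c_2\), with all other notation as in the textbook, derive the optimal \(m\).

Solution.

  1. First compute

    \[\bar{y}_1=22,\quad s_{21}^2=7.5;\\ \bar{y}_2=21,\quad s_{22}^2=7;\\ \bar{y}_3=21.4,\quad s_{23}^2=27.3. \]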

    so that

    \[\bar{\bar{y}}=\frac{1}{3}\sum_{i=1}^{3}\bar{y}_i=21.4667,\\ s_{1}^2=\frac{1}{2}\sum_{i=1}^{3}(\bar{y}_i-\bar{\bar{y}})^2=0.253333,\\ s_2^2=\frac{1}{3}\sum_{i=1}^{3}s_{2i}^2=13.9333. \]

    The variance estimate is

    \[v(\bar{\bar{y}})=\frac{1-0.3}{3}s_1^2+\frac{0.3(1-0.1)}{15}s_2^2=0.3099, \]

    and the \(95\%\) confidence interval is

    \[[20.3756,22.5578]. \]

  2. The variance in two-stage sampling is

    \[V=\frac{1}{n}S_1^2-\frac{1}{N}S_1^2+\frac{1}{nm}S_2^2-\frac{1}{n}\frac{S^2_2}{M}, \]

    so we minimize

    \[(c_1n+c_2nm)\left(\frac{S_1^2-S_2^2/M}{n}+\frac{S_2^2}{nm} \right)=(c_1+c_2m)\left(S_1^2-\frac{S_2^2}{M}+\frac{S_2^2}{m} \right). \]

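    The product is minimized when

    \[\frac{c_1}{S_1^2-S_2^2/M}=\frac{c_2m^2}{S_2^2},\\ m_{opt}=\sqrt{\frac{c_1S_2^2}{c_2\left(S_1^2-\dfrac{S_2^2}{M}\right)}}. \]

    By Proof 9, the unknown quantities are estimated by

    \[\hat{S}_1^2=s_1^2-\frac{1-f_2}{m}s_2^2,\\ \hat{S}_2^2=s_2^2. \]

    Note: with the data of this exercise, the estimate of \(S_1^2-S_2^2/M\) is negative, so the expression under the square root is negative and \(m_{opt}\) cannot be computed from this sample.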
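
The numbers in this exercise can be reproduced as follows. This is a minimal sketch, assuming Python 3 and the quantile 1.96; the plug-in estimate of \(S_1^2\) uses the relation \(E(s_1^2)=S_1^2+\frac{1-f_2}{m}S_2^2\) from Proof 9, and the helper names are illustrative.

```python
from math import sqrt

sample = [
    [20, 25, 20, 25, 20],
    [18, 20, 22, 25, 20],
    [25, 28, 18, 15, 21],
]
N, M = 10, 50
n, m = len(sample), len(sample[0])
f1, f2 = n / N, m / M

def mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / (len(values) - 1)

cluster_means = [mean(c) for c in sample]
y_bar = mean(cluster_means)
s1_sq = sample_variance(cluster_means)            # between cluster means
s2_sq = mean([sample_variance(c) for c in sample])  # average within-cluster variance

v = (1 - f1) / n * s1_sq + f1 * (1 - f2) / (n * m) * s2_sq
half = 1.96 * sqrt(v)
ci = (y_bar - half, y_bar + half)

# Plug-in value of S_1^2 - S_2^2 / M. A negative value means the square root
# in the formula for the optimal m has no real value for these data.
S1_hat = s1_sq - (1 - f2) / m * s2_sq
denominator = S1_hat - s2_sq / M

print(round(y_bar, 4), round(v, 4), [round(b, 4) for b in ci])
print(denominator < 0)
```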

6. \(\mathrm{PPS}\) sampling

Unequal-probability sampling is carried out on a population of \(N=10\) units, with the following results:

\[\begin{array}{c|cccccccccc} \hline i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline Z_i & 0.2 & 0.2 & 0.1 & 0.05 & 0.05 & 0.05 & 0.05 & 0.1 & 0.1 & 0.1 \\ t_i & 2 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ Y_i & 35 & ? & ? & 40 & ? & ? & ? & 20 & 40 & ? \\ \hline \end{array} \]

Estimate the population mean and give the corresponding variance estimate.

Solution. The Hansen-Hurwitz estimator is

\[\hat{Y}_{HH}=\frac{1}{5}\sum_{i=1}^{5}\frac{y_i}{Z_i}=350,\\ \bar{y}_{HH}=\frac{\hat{Y}_{HH}}{N}=35. \]

The variance estimates are

\[v(\hat{Y}_{HH})=\frac{1}{5\times 4}\sum_{i=1}^{5}\left(\frac{y_i}{Z_i}-\hat{Y}_{HH} \right)^2=14437.5,\\ v(\bar{y}_{HH})=\frac{v(\hat{Y}_{HH})}{N^2}=144.375. \]
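
The Hansen-Hurwitz computation treats each draw separately, so a unit drawn twice contributes twice. This is a minimal sketch, assuming Python 3; the tuple layout `(Z_i, y_i, t_i)` is an illustrative encoding of the table.

```python
# (Z_i, y_i, number of times drawn) for the units that were selected.
draws = [(0.2, 35, 2), (0.05, 40, 1), (0.1, 20, 1), (0.1, 40, 1)]
N = 10

# One ratio y_i / Z_i per draw; repeated units appear t_i times.
ratios = [y / z for z, y, t in draws for _ in range(t)]
n = len(ratios)

Y_HH = sum(ratios) / n
v_HH = sum((r - Y_HH) ** 2 for r in ratios) / (n * (n - 1))

mean_HH = Y_HH / N
v_mean = v_HH / N ** 2

print(round(Y_HH, 4), round(mean_HH, 4), round(v_HH, 4), round(v_mean, 4))
```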

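7. Two-stage unequal-probability sampling with replacement

A population has \(N=10\) clusters, each containing \(M=10\) units. Two-stage unequal-probability sampling with replacement is carried out. In the first stage, \(Y_1\) is drawn twice, and \(Y_2\) and \(Y_3\) once each, with selection probabilities

\[Z_1=0.5,\quad Z_2=Z_3=0.1. \]

Simple random sampling is then carried out twice within \(Y_1\) and once within each of \(Y_2\) and \(Y_3\), with \(m=4\). The results are:

\[\begin{array}{c|cccc} \hline Y_1^{(1)} & 3 & 5 & 8 & 10 \\ Y_1^{(2)} & 3 & 7 & 7 & 9 \\ Y_2 & 6 & 9 & 10 & 12 \\ Y_3 & 10 & 15 & 18 & 20 \\ \hline \end{array} \]

Estimate \(\bar{\bar{Y}}\) and estimate the variance of the estimator.

Solution. The estimated cluster totals for the four draws are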

\[\hat{Y}_1=65,\quad \hat{Y}_2=65,\quad \hat{Y}_3=92.5,\quad \hat{Y}_4=157.5, \]

so that

\[\hat{Y}_{HH}=\frac{1}{4}\sum_{i=1}^{4}\frac{\hat{Y}_i}{Z_i}=690,\quad \bar{y}_{HH}=6.9;\\ v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{4}\left(\frac{\hat{Y}_i}{Z_i}-\hat{Y}_{HH} \right)^2=122137.5,\quad v(\bar{y}_{HH})=12.21375. \]

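8. \(\mathrm{\pi PS}\) sampling

Consider a population of \(N=8\) units from which two units are drawn by Brewer's method: \(y_1=12\) and \(y_2=20\), with \(Z_1=0.2\) and \(Z_2=0.1\).

  1. Briefly describe Brewer's sampling method and the condition under which it applies.
  2. Construct the Horvitz-Thompson estimator and estimate the population total.
  3. Suppose instead that the two units were drawn by the Yates-Grundy draw-by-draw method, and that the next draw gives \(y_3=15\) with \(Z_3=0.05\). Construct the Raj estimator of the population total and estimate its variance.

Solution.

  1. In Brewer's method, the first unit is drawn with probability proportional to \(\dfrac{Z_i(1-Z_i)}{1-2Z_i}\); call the selected unit \(j\). The second unit is then drawn with probability proportional to \(Z_i\), that is, with probability \(\dfrac{Z_i}{1-Z_j}\).

    The method requires \(1-2Z_i>0\), that is, \(Z_i<1/2\) for every \(i\).

  2. The estimate of the population total is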

    \[\hat{Y}_{HT}=\frac{1}{2}\left(\frac{y_1}{Z_1}+\frac{y_2}{Z_2} \right)=130. \]

  3. Compute

    \[t_1=\frac{y_1}{Z_1}=60,\\ t_2=y_1+\frac{y_2}{Z_2}(1-Z_1)=172,\\ t_3=y_1+y_2+\frac{y_3}{Z_3}(1-Z_1-Z_2)=242. \]

    so that

    \[\hat{Y}_{Raj}=\frac{1}{3}\sum_{i=1}^{3}t_i=158,\\ v(\hat{Y}_{Raj})=\frac{1}{3\times 2}\sum_{i=1}^{3}(t_i-\hat{Y}_{Raj})^2=2809.333. \]
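
The Raj estimator accumulates the values and selection probabilities of the units already drawn. This is a minimal sketch, assuming Python 3; the running totals `seen_y` and `seen_z` are illustrative names for \(\sum_{j<i}y_j\) and \(\sum_{j<i}Z_j\).

```python
# Units in the order drawn: (y_i, Z_i).
drawn = [(12, 0.2), (20, 0.1), (15, 0.05)]

t = []
seen_y, seen_z = 0.0, 0.0
for y, z in drawn:
    # t_i = (sum of earlier y_j) + y_i / Z_i * (1 - sum of earlier Z_j)
    t.append(seen_y + y / z * (1 - seen_z))
    seen_y += y
    seen_z += z

n = len(t)
Y_raj = sum(t) / n
v_raj = sum((ti - Y_raj) ** 2 for ti in t) / (n * (n - 1))

print([round(ti, 4) for ti in t], round(Y_raj, 4), round(v_raj, 3))
```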

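9. Systematic sampling

A sample of \(10\) units is to be drawn from a population of \(N=30\).

  1. If the sample contains \(Y_{16}\), list all the sampled units.
  2. Under what condition is systematic sampling better than simple random sampling?

Solution.

  1. The sampling interval is \(k=3\) and \(16 \bmod 3=1\), so the sample starts at \(Y_1\) and consists of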

    \[Y_1,Y_4,Y_7,Y_{10},Y_{13},\\ Y_{16},Y_{19},Y_{22},Y_{25},Y_{28}. \]

  2. With

    \[S^2=\frac{1}{N-1}\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2,\\ S_{wsy}^2=\frac{1}{k}\sum_{r=1}^{k}\frac{1}{n-1}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2, \]

    the decomposition of \(S^2\) gives

    \[\begin{aligned} (N-1)S^2&=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y})^2\\ &=\sum_{r=1}^{k}\sum_{j=1}^{n}(Y_{rj}-\bar{Y}_r)^2+\sum_{r=1}^{k}n(\bar{Y}_{r}-\bar{Y})^2\\ &=k(n-1)S_{wsy}^2+Nv(\bar{y}_{sy}), \end{aligned} \]

    and therefore

    \[v(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2. \]

    For simple random sampling,

    \[v(\bar{y}_{srs})=\frac{1-f}{n}S^2=\frac{k-1}{N}S^2, \]

    Taking the difference gives

    \[v(\bar{y}_{srs})-v(\bar{y}_{sy})=\frac{(k-N)S^2+k(n-1)S_{wsy}^2}{N}=\frac{k(n-1)(S_{wsy}^2-S^2)}{N}. \]

    Systematic sampling is therefore more efficient when \(S_{wsy}^2>S^2\).

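10. Stratified double sampling

A population of \(1000000\) units can be divided into \(2\) strata. Because the population structure is unknown, a first-phase sample of \(n'=10000\) units is drawn, giving \(n_1'=2000\) and \(n_2'=8000\). A second-phase sample of \(n_1=n_2=5\) units is then surveyed in detail, giving \(\bar{y}_1=200\) and \(\bar{y}_2=80\), with sample variances \(s_1^2=4500\) and \(s_2^2=200\).

  1. Estimate the population mean \(\bar{Y}\) and estimate the variance of the estimator; the sampling fraction may be ignored.
  2. Find the optimal second-phase sampling fractions \(f_{hD}\).

Solution.

  1. The stratified double sampling estimate is

    \[\bar{y}_{stD}=w_1'\bar{y}_1+w_2'\bar{y}_2=104. \]

    Since \(\dfrac{W_hS_h^2}{n'f_{hD}}\) is estimated by \(\dfrac{w_h'^2s_h^2}{n_h}\) (because \(n_h=f_{hD}w_h'n'\)), the variance estimate is

    \[v(\bar{y}_{stD})=\sum_{h=1}^{L}\frac{w_h'^2s_h^2}{n_h}+\frac{1}{n'}\sum_{h=1}^{L}w_h'(\bar{y}_h-\bar{y}_{stD})^2=61.8304. \]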

  2. The variance is

    \[\begin{aligned} V(\bar{y}_{stD})&=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right)\\ &=\frac{1}{n'}\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'f_{hD}}-\frac{S^2}{N}. \end{aligned} \]

    and the cost is \(C_{T}^*=\displaystyle{c_1n'+n'\sum_{h=1}^{L}c_{2h}W_hf_{hD}}\), so we minimize

    \[\left(c_1+\sum_{h=1}^{L}c_{2h}W_hf_{hD} \right)\left[\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)+\sum_{h=1}^{L}\frac{W_hS_h^2}{f_{hD}} \right], \]

    which gives

    \[\frac{c_1}{S^2-\displaystyle{\sum_{h=1}^{L}W_hS_h^2}}=\frac{c_{2h}f_{hD}^2}{S_h^2},\\ f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}. \]

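11. Ratio estimation in double sampling

For a population with very large \(N\) whose structure is unknown, a first-phase sample of \(n'=10000\) units is drawn to measure the auxiliary variable \(X\), giving \(\bar{x}'=50\). A second-phase sample of \(10\) units then gives \(\bar{y}=80\), \(\bar{x}=40\), \(s_x^2=1600\), \(s_{yx}=2400\), and \(s_{y}^2=8000\). Find the double sampling ratio estimate \(\bar{y}_{RD}\) and estimate its variance.

Solution. The double sampling ratio estimate is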

\[\bar{y}_{RD}=\frac{\bar{y}}{\bar{x}}\bar{x}'=100. \]

Here \(\hat{R}=2\), so the variance estimate is

\[v(\bar{y}_{RD})=\frac{1}{n}s_y^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(\hat{R}^2s_{x}^2-2\hat{R}s_{yx})=480.32. \]

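12. Capture-recapture sampling

To estimate the number of fish in a lake, \(1000\) fish are caught, marked, and released. Then \(150\) fish are caught, of which \(10\) are marked. Use the Chapman estimator to estimate the total number of fish in the lake, and give a variance estimate and a \(95\%\) confidence interval.

Solution. We compute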

\[\tilde{N}=\frac{1001\times 151}{11}-1=13740,\\ v(\tilde{N})=\frac{1001\times 151\times 990\times 140}{11^2\times 12}=14428050. \]

so the confidence interval is

\[[6295,21185]. \]
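
The Chapman estimate, its variance estimate, and the interval can be computed in a few lines. This is a minimal sketch, assuming Python 3 and the normal quantile 1.96; the variable names follow the summary formula for capture-recapture sampling.

```python
from math import sqrt

n1, n2, m = 1000, 150, 10   # marked, caught in second sample, marked among those

N_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
v_hat = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))

half = 1.96 * sqrt(v_hat)
ci = (round(N_hat - half), round(N_hat + half))

print(N_hat, v_hat, ci)
```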

Summary

Sampling methods

  1. Simple estimator under simple random sampling.

    \[\bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i,\\ V(\bar{y})=\frac{1-f}{n}S^2,\\ v(\bar{y})=\frac{1-f}{n}s^2. \]

  2. Ratio estimator under simple random sampling.

    \[\bar{y}_{R}=\frac{\bar{y}}{\bar{x}}\bar{X},\quad r=\frac{\bar{y}}{\bar{x}}, \\ V(\bar{y}_{R})\approx \frac{1-f}{n}(S^2-2RS_{yx}+R^2S_x^2),\\ v(\bar{y}_{R})=\frac{1-f}{n}(s_y^2-2rs_{yx}+r^2s_{x}^2). \]

  3. Regression estimator under simple random sampling, regression coefficient known.

    \[\bar{y}_{lr}=\bar{y}+\beta_0(\bar{X}-\bar{x}),\\ V(\bar{y}_{lr})\approx \frac{1-f}{n}(S^2-2\beta_0S_{yx}+\beta_0^2S_{x}^2),\\ v(\bar{y}_{lr})=\frac{1-f}{n}(s_y^2-2\beta_0s_{yx}+\beta_0^2s_{x}^2). \]

  4. Regression estimator under simple random sampling, regression coefficient unknown.

    \[b=\frac{s_{yx}}{s_{x}^2},\\ \bar{y}_{lr}=\bar{y}+b(\bar{X}-\bar{x}),\\ V(\bar{y}_{lr})\approx \frac{1-f}{n}S^2(1-\rho^2),\\ v(\bar{y}_{lr})\approx \frac{1-f}{n}s_y^2(1-\hat\rho^2). \]

  5. Simple estimator under stratified random sampling.

    \[\bar{y}_{st}=\sum_{h=1}^{L}W_h\bar{y}_{h},\\ V(\bar{y}_{st})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}S_h^2,\\ v(\bar{y}_{st})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}s_h^2. \]

  6. Separate ratio estimator under stratified random sampling.

    \[\bar{y}_{RS}=\sum_{h=1}^{L}W_h\frac{\bar{y}_h}{\bar{x}_h}\bar{X}_h,\quad r_h=\frac{\bar{y}_h}{\bar{x}_h},\\ V(\bar{y}_{RS})\approx \sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2R_hS_{yxh}+R_h^2S_{xh}^2),\\ v(\bar{y}_{RS})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2r_hs_{yxh}+r_h^2s_{xh}^2). \]

  7. Combined ratio estimator under stratified random sampling.

    \[\bar{y}_{RC}=\frac{\bar{y}_{st}}{\bar{x}_{st}}\bar{X},\quad r=\frac{\bar{y}_{st}}{\bar{x}_{st}},\\ V(\bar{y}_{RC})\approx \sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(S_{yh}^2-2RS_{yxh}+R^2S_{xh}^2),\\ v(\bar{y}_{RC})=\sum_{h=1}^{L}W_h^2\frac{1-f_h}{n_h}(s_{yh}^2-2rs_{yxh}+r^2s_{xh}^2). \]

  8. Equal-probability cluster sampling with equal cluster sizes.

    \[\bar{\bar{y}}=\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i,\\ V(\bar{\bar{y}})=\frac{1-f}{n}\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2,\\ v(\bar{\bar{y}})=\frac{1-f}{n}\frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_i-\bar{\bar{y}})^2. \]

  9. Equal-probability two-stage sampling with equal cluster sizes.

    \[\bar{\bar{y}}=\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i,\\ V(\bar{\bar{y}})=\frac{1-f_1}{n}S_1^2+\frac{1-f_2}{nm}S_2^2,\\ v(\bar{\bar{y}})=\frac{1-f_1}{n}s_1^2+\frac{f_1(1-f_2)}{nm}s_2^2. \]

  10. Hansen-Hurwitz estimator for unequal-probability sampling with replacement.

    \[\hat{Y}_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{Z_i},\\ V(\hat{Y}_{HH})=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2,\\ v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{Z_i}-\hat{Y}_{HH} \right)^2. \]

  11. Hansen-Hurwitz estimator for two-stage unequal-probability sampling with replacement.

    \[\hat{Y}_{HH}=\frac{1}{n}\sum_{i=1}^{n}\frac{\hat{Y}_i}{Z_i},\\ V(\hat{Y}_{HH})=\frac{1}{n}\sum_{i=1}^{N}Z_i\left(\frac{Y_i}{Z_i}-Y \right)^2+\frac{1}{n}\sum_{i=1}^{N}\frac{V_2(\hat{Y}_i)}{Z_i},\\ v(\hat{Y}_{HH})=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{\hat{Y}_i}{Z_i}-\hat{Y}_{HH} \right)^2. \]

  12. Horvitz-Thompson estimator for strict \(\mathrm{\pi PS}\) unequal-probability sampling without replacement, \(n\) fixed.

    \[\hat{Y}_{HT}=\sum_{i=1}^{n}\frac{y_i}{\pi_i},\\ V(\hat{Y}_{HT})=\sum_{i<j}^{N}(\pi_i\pi_j-\pi_{ij})\left(\frac{Y_i}{\pi_i}-\frac{Y_j}{\pi_j} \right)^2,\\ v_{YGS}=\sum_{i<j}^{n}\frac{\pi_i\pi_j-\pi_{ij}}{\pi_{ij}}\left(\frac{y_i}{\pi_i}-\frac{y_j}{\pi_j} \right)^2. \]

  13. Raj estimator for Yates-Grundy sampling (not strictly \(\mathrm{\pi PS}\)), \(n\) not fixed.

    \[t_i=\sum_{j=1}^{i-1}y_j+\frac{y_i}{Z_i}\left(1-\sum_{j=1}^{i-1}Z_j \right),\\ \hat{Y}_{Raj}=\frac{1}{n}\sum_{i=1}^{n}t_i,\\ v(\hat{Y}_{Raj})=\frac{1}{n(n-1)}\sum_{i=1}^{n}(t_i-\hat{Y}_{Raj})^2. \]

  14. Stratified double sampling.

    \[\bar{y}_{stD}=\sum_{h=1}^{L}w_h'\bar{y}_h,\\ V(\bar{y}_{stD})=\left(\frac{1}{n'}-\frac{1}{N} \right)S^2+\sum_{h=1}^{L}\frac{W_hS_h^2}{n'}\left(\frac{1}{f_{hD}}-1 \right),\\ v(\bar{y}_{stD})\approx \sum_{h=1}^{L}\frac{w_h'^2s_h^2}{n_h}+\frac{1}{n'}\sum_{h=1}^{L}w_h'(\bar{y}_h-\bar{y}_{stD})^2. \]

  15. Ratio estimator under double sampling.

    \[\bar{y}_{RD}=\frac{\bar{y}}{\bar{x}}\bar{x}',\quad r=\frac{\bar{y}}{\bar{x}},\\ V(\bar{y}_{RD})\approx \left(\frac{1}{n'}-\frac{1}{N} \right)S_y^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(S_y^2-2RS_{yx}+R^2S_x^2),\\ v(\bar{y}_{RD})=\frac{1}{n}s_{y}^2+\left(\frac{1}{n}-\frac{1}{n'} \right)(r^2s_{x}^2-2rs_{yx}). \]

  16. Equal-probability systematic sampling with a fixed interval.

    \[\bar{y}_{sy}=\frac{1}{n}\sum_{i=1}^{n}y_{i},\\ V(\bar{y}_{sy})=\frac{N-1}{N}S^2-\frac{k(n-1)}{N}S_{wsy}^2. \]

  17. Capture-recapture sampling.

    \[\tilde{N}=\frac{(n_1+1)(n_2+1)}{m+1}-1,\\ v(\tilde{N})=\frac{(n_1+1)(n_2+1)(n_1-m)(n_2-m)}{(m+1)^2(m+2)}. \]

Other formulas

  1. Optimal allocation and Neyman allocation in stratified sampling:

    \[n_h\propto\frac{W_hS_h}{\sqrt{c_h}}\xlongequal{c_h=c}W_hS_h. \]

  2. The three variances in cluster sampling and their estimates:

    \[S^2=\frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2,\\ S_b^2=\frac{1}{N-1}\sum_{i=1}^{N}M(\bar{Y}_i-\bar{\bar{Y}})^2,\\ S_{w}^2=\frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i)^2,\\ s_b^2=\frac{1}{n-1}\sum_{i=1}^{n}M(\bar{y}_i-\bar{\bar{y}})^2,\\ s_w^2=\frac{1}{n(M-1)}\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2. \]

  3. Estimate of the intracluster correlation coefficient and the design effect in cluster sampling:

    \[\hat\rho_c=\frac{s_b^2-s_w^2}{s_b^2+(M-1)s_w^2},\\ deff\approx 1+(M-1)\hat\rho_c. \]

  4. First-draw probability and inclusion probabilities in Brewer's method:

    \[Z_i^*\propto\frac{Z_i(1-Z_i)}{1-2Z_i},\\ \pi_i=2Z_i,\\ \pi_{ij}=\frac{4Z_iZ_j(1-Z_i-Z_j)}{(1-2Z_i)(1-2Z_j)\displaystyle{\left(1+\sum_{k=1}^{N}\frac{Z_k}{1-2Z_k}\right)}}. \]

  5. First-draw probability in Midzuno's method:

    \[Z_i^*=\frac{n(N-1)Z_i}{N-n}-\frac{n-1}{N-n}. \]

  6. Optimal second-phase sampling fractions in stratified double sampling:

    \[f_{hD}=S_h\sqrt{\frac{c_1}{c_{2h}\displaystyle{\left(S^2-\sum_{h=1}^{L}W_hS_h^2 \right)}}}. \]

  7. Optimal second-phase sampling ratio for the double sampling ratio estimator:

    \[f=\sqrt{\frac{c_1(S_y^2+R^2S_x^2-2RS_{yx})}{c_2(2RS_{yx}-R^2S_x^2)}}. \]

