R: Common Statistical Tests

Reposted from the original article (view original). 2018-07-31 11:41 1126 R

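A statistical test reaches a decision by comparing a sample result with a sampling distribution. It has five main steps:

1. Formulate the hypotheses
2. Derive the sampling distribution
3. Choose the significance level and the rejection region
4. Compute the test statistic
5. Make the decision

(Source: Baidu Baike)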
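To make the five steps concrete, here is a minimal R sketch that runs a one-sided, one-sample t test by hand on the component-lifetime data from the one-sample example below and then cross-checks it with t.test(). The significance level alpha = 0.05 and the variable names are assumptions chosen for illustration.

```r
# Component lifetimes (hours), from the one-sample example
X <- c(159, 280, 101, 212, 224, 379, 179, 264, 222,
       362, 168, 250, 149, 260, 485, 170)

# Step 1: hypotheses  H0: mu <= 225  vs  H1: mu > 225
mu0 <- 225

# Step 2: under H0, t = (mean - mu0) / (s / sqrt(n)) follows a t distribution with n - 1 df
n  <- length(X)
df <- n - 1

# Step 3: significance level and rejection region (reject if t > t_crit)
alpha  <- 0.05
t_crit <- qt(1 - alpha, df)

# Step 4: test statistic and its upper-tail p-value
t_stat  <- (mean(X) - mu0) / (sd(X) / sqrt(n))
p_value <- pt(t_stat, df, lower.tail = FALSE)

# Step 5: decision
reject_H0 <- t_stat > t_crit
cat(sprintf("t = %.4f, critical value = %.4f, p = %.4f, reject H0: %s\n",
            t_stat, t_crit, p_value, reject_H0))

# Cross-check with the built-in test
ref <- t.test(X, alternative = "greater", mu = mu0)
stopifnot(isTRUE(all.equal(unname(ref$statistic), t_stat)),
          isTRUE(all.equal(ref$p.value, p_value)))
```

Comparing the statistic with a critical value (step 3) and comparing the p-value with alpha lead to the same decision; t.test() reports only the p-value.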

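A hypothesis test, also called a significance test, is another important part of statistical inference; its purpose is to decide whether population parameters differ. In essence, a hypothesis test judges whether an observed "difference" is caused by sampling error or reflects a real difference between populations, and it assesses how strong the evidence is that two treatments have different effects. The strength of this evidence is measured and expressed by the probability P (the P value). Besides the t distribution, other test statistics and distributions suit other kinds of data, such as the F distribution and the χ² distribution. The steps of a hypothesis test are the same whichever of these distributions is used; only the test statistic that has to be computed differs.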

Testing data for normality

Shapiro-Wilk test (the sample size must be between 3 and 5000)

> shapiro.test(rnorm(100, mean = 5, sd = 3))

Shapiro-Wilk normality test

data: rnorm(100, mean = 5, sd = 3)
W = 0.99137, p-value = 0.7738

> shapiro.test(runif(100, min = 2, max = 4))

Shapiro-Wilk normality test

data: runif(100, min = 2, max = 4)
W = 0.94248, p-value = 0.000274
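Because rnorm() and runif() draw fresh random numbers on every call, the p-values above change from run to run. A small sketch, assuming an arbitrary seed of 42, that makes the draws reproducible, reads W and the p-value from the returned htest object, and shows the 3 to 5000 sample-size limit:

```r
set.seed(42)  # arbitrary seed: only makes the random draws reproducible

normal_sample  <- rnorm(100, mean = 5, sd = 3)
uniform_sample <- runif(100, min = 2, max = 4)

sw_normal  <- shapiro.test(normal_sample)
sw_uniform <- shapiro.test(uniform_sample)

# An "htest" object keeps W in $statistic and the p-value in $p.value
alpha <- 0.05
c(normal = sw_normal$p.value, uniform = sw_uniform$p.value) < alpha  # TRUE = reject normality

# With more than 5000 observations shapiro.test() stops with an error
too_big <- try(shapiro.test(rnorm(5001)), silent = TRUE)
inherits(too_big, "try-error")
```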

Hypothesis tests on the mean of a normal population

t test

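Note: in theory the data for a t test must come from a normal distribution. In practice the following convention is used: when the sample size is below 30, a normality test is required; otherwise no normality test is needed, and even if one is done and the data turn out not to be normal, the t test may still be used, because with larger samples the sample mean is approximately normal by the central limit theorem.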

t.test() => Student's t-Test

> require(graphics)
> t.test(1:10, y = c(7:20))       # P = .00001855
> t.test(1:10, y = c(7:20, 200))  # P = .1245, no longer significant

## Classic example: Student's sleep data
> plot(extra ~ group, data = sleep)

## Traditional interface
> with(sleep, t.test(extra[group == 1], extra[group == 2]))

Welch Two Sample t-test

data: extra[group == 1] and extra[group == 2]
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean of x mean of y
     0.75      2.33

## Formula interface
> t.test(extra ~ group, data = sleep)

Welch Two Sample t-test

data: extra by group
t = -1.8608, df = 17.776, p-value = 0.07939
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.3654832  0.2054832
sample estimates:
mean in group 1 mean in group 2
           0.75            2.33
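The Welch output above (t = -1.8608, df = 17.776) can be reproduced from the group means and variances. A sketch of the Welch statistic and the Welch-Satterthwaite degrees of freedom; the variable names are illustrative:

```r
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

vx <- var(x) / length(x)   # squared standard error of mean(x)
vy <- var(y) / length(y)   # squared standard error of mean(y)

# Welch statistic: no assumption of equal variances
t_welch  <- (mean(x) - mean(y)) / sqrt(vx + vy)
# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
# Two-sided p-value
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 5)
```

The non-integer df is why the output reports df = 17.776 rather than 18.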

One population

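The lifetime X (hours) of a certain component is normally distributed, N(mu, sigma^2), with both mu and sigma^2 unknown. The lifetimes of 16 components are given below. Is there reason to believe that the mean lifetime exceeds 225 hours?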

> X <- c(159, 280, 101, 212, 224, 379, 179, 264, 222, 362, 168, 250, 149, 260, 485, 170)
> t.test(X, alternative = "greater", mu = 225)

One Sample t-test

data: X
t = 0.66852, df = 15, p-value = 0.257
alternative hypothesis: true mean is greater than 225
95 percent confidence interval:
 198.2321      Inf
sample estimates:
mean of x
    241.5

Since p = 0.257 > 0.05, H0 is not rejected: the data give no reason to conclude that the mean lifetime exceeds 225 hours.

Two populations

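X is the yield of the old steelmaking furnace process and Y the yield of the new one. Does the new operating method increase the yield?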

> X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
> Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
> t.test(X, Y, var.equal = TRUE, alternative = "less")

Two Sample t-test

data: X and Y
t = -4.2957, df = 18, p-value = 0.0002176
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -1.908255
sample estimates:
mean of x mean of y
    76.23     79.43

var.equal = TRUE assumes the two variances are equal (checked with var.test() below). Since p = 0.0002176 < 0.05, the new method significantly increases the yield.
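With var.equal = TRUE, t.test() uses the pooled-variance two-sample statistic. A minimal sketch that recomputes it for the furnace data (variable names are illustrative):

```r
X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)  # old process
Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)  # new process

nx <- length(X)
ny <- length(Y)
# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 <- ((nx - 1) * var(X) + (ny - 1) * var(Y)) / (nx + ny - 2)

t_pooled  <- (mean(X) - mean(Y)) / sqrt(sp2 * (1 / nx + 1 / ny))
df_pooled <- nx + ny - 2
p_less    <- pt(t_pooled, df_pooled)  # lower tail, for H1: mu_X < mu_Y

c(t = t_pooled, df = df_pooled, p = p_less)
```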

Paired t test

Pair the measurements furnace by furnace and run a paired t test on the differences:

> X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
> Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
> t.test(X - Y, alternative = "less")

One Sample t-test

data: X - Y
t = -4.2018, df = 9, p-value = 0.00115
alternative hypothesis: true mean is less than 0
95 percent confidence interval:
      -Inf -1.803943
sample estimates:
mean of x
     -3.2

Since p = 0.00115 < 0.05, the paired test reaches the same conclusion: the new method increases the yield.
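Testing the differences X - Y against 0 is exactly the calculation of a paired t test. A sketch showing that t.test(..., paired = TRUE) gives the same statistic; this is only valid if the i-th X and the i-th Y really belong together (here, the same furnace):

```r
X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)

paired_test <- t.test(X, Y, paired = TRUE, alternative = "less")
diff_test   <- t.test(X - Y, alternative = "less")

# Both approaches report the same t, df and p-value
c(paired = unname(paired_test$statistic), differences = unname(diff_test$statistic))
```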

Hypothesis tests on the variance of a normal population

var.test() => F Test to Compare Two Variances

> x <- rnorm(50, mean = 0, sd = 2)
> y <- rnorm(30, mean = 1, sd = 1)
> var.test(x, y)                  # do x and y have the same variance?
> var.test(lm(x ~ 1), lm(y ~ 1))  # the same test, specified through fitted models
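var.test() builds its F statistic from the ratio of the two sample variances. A sketch, with an arbitrary seed added for reproducibility, that rebuilds the statistic and the two-sided p-value (twice the smaller tail) and compares them with var.test():

```r
set.seed(1)  # arbitrary seed, added only so the example is reproducible
x <- rnorm(50, mean = 0, sd = 2)
y <- rnorm(30, mean = 1, sd = 1)

F_stat  <- var(x) / var(y)                # ratio of the sample variances
df1     <- length(x) - 1
df2     <- length(y) - 1
p_lower <- pf(F_stat, df1, df2)
p_two   <- 2 * min(p_lower, 1 - p_lower)  # two-sided: twice the smaller tail

vt <- var.test(x, y)
c(manual = F_stat, var.test = unname(vt$statistic))
```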

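Twenty fifth-grade boys were sampled from a primary school and their heights (cm) measured as below. At the 0.05 significance level, is the mean equal to 149, and is sigma^2 equal to 75?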

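> X <- scan()    # type or paste the values below, then an empty line to finish
136 144 143 157 137 159 135 158 147 165 158 142 159 150 156 152 140 149 148 155

> t.test(X, mu = 149)    # H0: mu = 149 (two-sided)

var.test() compares the variances of two samples, so it cannot test sigma^2 = 75 on its own. For a single normal sample, the statistic (n - 1)s^2/sigma0^2 follows a chi-squared distribution with n - 1 degrees of freedom under H0:

> chi2 <- (length(X) - 1) * var(X) / 75
> p.lower <- pchisq(chi2, df = length(X) - 1)
> 2 * min(p.lower, 1 - p.lower)    # two-sided p-value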
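The variance hypothesis can also be decided with the rejection region from step 3 of the general procedure. A sketch, assuming normal data and alpha = 0.05, that computes the chi-squared statistic, the two critical values and a 95% confidence interval for sigma^2; the helper name `chisq_var_test` is hypothetical, not a base R function:

```r
# Hypothetical helper: two-sided chi-squared test of H0: sigma^2 = sigma2_0
# for a normal sample, using the rejection region and a confidence interval
chisq_var_test <- function(x, sigma2_0, alpha = 0.05) {
  df   <- length(x) - 1
  stat <- df * var(x) / sigma2_0                   # ~ chi-squared(df) under H0
  crit <- qchisq(c(alpha / 2, 1 - alpha / 2), df)  # lower and upper critical values
  list(
    statistic = stat,
    critical  = crit,
    reject    = stat < crit[1] || stat > crit[2],
    conf.int  = df * var(x) / rev(crit)            # (1 - alpha) CI for sigma^2
  )
}

heights <- c(136, 144, 143, 157, 137, 159, 135, 158, 147, 165,
             158, 142, 159, 150, 156, 152, 140, 149, 148, 155)
res <- chisq_var_test(heights, sigma2_0 = 75)
res$statistic   # (20 - 1) * s^2 / 75
res$reject
```

For these heights the statistic lies between the two critical values, so sigma^2 = 75 is not rejected at the 0.05 level.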

Analysis of the steelmaking furnace data

> X <- c(78.1, 72.4, 76.2, 74.3, 77.4, 78.4, 76.0, 75.5, 76.7, 77.3)
> Y <- c(79.1, 81.0, 77.3, 79.1, 80.0, 79.1, 79.1, 77.3, 80.2, 82.1)
> var.test(X, Y)

F test to compare two variances

data: X and Y
F = 1.4945, num df = 9, denom df = 9, p-value = 0.559
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.3712079 6.0167710
sample estimates:
ratio of variances
          1.494481

Since p = 0.559 > 0.05, equal variances are not rejected, which supports using var.equal = TRUE in the two-sample t test above.

With more than two groups, test homogeneity of variances with bartlett.test():
> bartlett.test(InsectSprays$count, InsectSprays$spray)

        Bartlett test of homogeneity of variances

data:  InsectSprays$count and InsectSprays$spray
Bartlett's K-squared = 25.96, df = 5, p-value = 9.085e-05

> bartlett.test(count ~ spray, data = InsectSprays)

        Bartlett test of homogeneity of variances

data:  count by spray
Bartlett's K-squared = 25.96, df = 5, p-value = 9.085e-05
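Bartlett's K-squared compares the log of the pooled variance with the logs of the individual group variances. A sketch that recomputes it for InsectSprays from the group sizes and variances (variable names are illustrative):

```r
counts <- InsectSprays$count
groups <- InsectSprays$spray

n_i <- tapply(counts, groups, length)  # group sizes
s2  <- tapply(counts, groups, var)     # group variances
k   <- length(n_i)                     # number of groups
N   <- sum(n_i)                        # total sample size

sp2 <- sum((n_i - 1) * s2) / (N - k)   # pooled variance
numerator  <- (N - k) * log(sp2) - sum((n_i - 1) * log(s2))
correction <- 1 + (sum(1 / (n_i - 1)) - 1 / (N - k)) / (3 * (k - 1))

K2 <- numerator / correction
p  <- pchisq(K2, df = k - 1, lower.tail = FALSE)
c(K2 = K2, df = k - 1, p = p)
```

Bartlett's test is sensitive to non-normality, so a large K-squared can reflect skewed data as well as unequal variances.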


Tests for a binomial proportion

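A batch of vegetable seeds has an average germination rate of P = 0.85. 500 seeds were drawn at random and soaked in a seed-coating agent, and 445 of them germinated. Does the coating agent have an effect?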

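> binom.test(445, 500, p = 0.85)

Exact binomial test

data: 445 and 500
number of successes = 445, number of trials = 500, p-value = 0.01207
alternative hypothesis: true probability of success is not equal to 0.85
95 percent confidence interval:
 0.8592342 0.9160509
sample estimates:
probability of success
                  0.89

Since p = 0.01207 < 0.05, the germination rate after treatment (estimated at 0.89) differs significantly from 0.85, so the coating agent has an effect.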
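For the two-sided alternative, binom.test() adds up the probabilities of all outcomes that are no more likely under H0 than the observed count. A sketch of that rule for the seed data; the 1e-7 relative tolerance is assumed to mirror the one R uses internally to treat near-ties as ties:

```r
x <- 445; n <- 500; p0 <- 0.85

probs <- dbinom(0:n, n, p0)  # P(X = k) under H0 for every k = 0..n
d     <- dbinom(x, n, p0)    # probability of the observed count

# Two-sided p-value: total probability of outcomes at most as likely as the observed one
p_two <- sum(probs[probs <= d * (1 + 1e-7)])
p_two
```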

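From past experience, the rate of chromosomal abnormalities in newborns is generally 1%. A hospital examined 400 local newborns and found one with a chromosomal abnormality. Is the rate in this region lower than the general level?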


> binom.test(1, 400, p = 0.01, alternative = "less")

Exact binomial test

data: 1 and 400
number of successes = 1, number of trials = 400, p-value = 0.09048
alternative hypothesis: true probability of success is less than 0.01
95 percent confidence interval:
 0.0000000 0.0118043
sample estimates:
probability of success
                0.0025

Since p = 0.09048 > 0.05, the data do not show that the regional rate is lower than 1%.
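For alternative = "less", the exact p-value is simply the lower binomial tail P(X <= 1), and the upper confidence limit is the Clopper-Pearson bound qbeta(0.95, x + 1, n - x). A sketch checking both against binom.test():

```r
x <- 1; n <- 400; p0 <- 0.01

p_less   <- pbinom(x, size = n, prob = p0)                  # P(X <= 1) if the true rate is 1%
p_manual <- (1 - p0)^n + n * p0 * (1 - p0)^(n - 1)          # P(X = 0) + P(X = 1), written out
upper    <- qbeta(0.95, x + 1, n - x)                       # one-sided 95% Clopper-Pearson upper limit

bt <- binom.test(x, n, p = p0, alternative = "less")
c(p_less = p_less, upper = upper)
```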

Nonparametric tests

Pearson's chi-squared goodness-of-fit test (chisq.test), including checks for normality

Numbers of fans of five beer brands:
A 210
B 312
C 170
D 85
E 223
Do the numbers of fans differ between brands?

> X <- c(210, 312, 170, 85, 223)
> chisq.test(X)

Chi-squared test for given probabilities

data: X
X-squared = 136.49, df = 4, p-value < 2.2e-16
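With no p argument, chisq.test() tests the hypothesis that all categories are equally likely, so each expected count is the total divided by the number of categories. A hand calculation for the beer data:

```r
X <- c(A = 210, B = 312, C = 170, D = 85, E = 223)  # fans per brand
expected <- rep(sum(X) / length(X), length(X))     # H0: equal shares, 200 per brand
chi2 <- sum((X - expected)^2 / expected)           # Pearson's statistic
df   <- length(X) - 1
p    <- pchisq(chi2, df, lower.tail = FALSE)
c(X_squared = chi2, df = df, p_value = p)
```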

Testing whether students' scores follow a normal distribution

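> X <- scan()    # type or paste the scores below, then an empty line to finish
25 45 50 54 55 61 64 68 72 75 75 78 79 81 83 84 84 84 85 86 86 86 87 89 89 89 90 91 91 92 100

> A <- table(cut(X, br = c(0, 69, 79, 89, 100)))   # cut() bins the scores into intervals; table() counts each bin
> p <- pnorm(c(70, 80, 90, 100), mean(X), sd(X))
> p <- c(p[1], p[2] - p[1], p[3] - p[2], 1 - p[3])  # probability of each bin under the fitted normal
> chisq.test(A, p = p)

Chi-squared test for given probabilities

data: A
X-squared = 8.334, df = 3, p-value = 0.03959

Since p = 0.03959 < 0.05, at the 0.05 level the scores do not fit a normal distribution.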
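Two caveats apply to this normality check. chisq.test() stores the expected counts in $expected, and small ones (a common rule of thumb is below 5) make the chi-squared approximation unreliable. Also, because the mean and standard deviation were estimated from the same data, the usual correction removes two more degrees of freedom, leaving 4 - 1 - 2 = 1; chisq.test() does not do this automatically. A sketch of both checks:

```r
X <- c(25, 45, 50, 54, 55, 61, 64, 68, 72, 75, 75, 78, 79, 81, 83, 84,
       84, 84, 85, 86, 86, 86, 87, 89, 89, 89, 90, 91, 91, 92, 100)
A <- table(cut(X, breaks = c(0, 69, 79, 89, 100)))
q <- pnorm(c(70, 80, 90), mean(X), sd(X))
p <- c(q[1], diff(q), 1 - q[3])  # bin probabilities under the fitted normal

res <- suppressWarnings(chisq.test(A, p = p))  # silence the small-expected-count warning, if any
res$expected                                   # expected counts n * p for each bin

# Two parameters (mean, sd) were estimated, so df = (4 - 1) - 2 = 1
p_adjusted <- pchisq(unname(res$statistic), df = length(A) - 1 - 2, lower.tail = FALSE)
c(reported = res$p.value, adjusted = p_adjusted)
```

With fewer degrees of freedom the same statistic gives a smaller p-value, so the unadjusted test is the more conservative of the two.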

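In hybrid barley offspring the theoretical ratio of awn types is awnless : long-awned : short-awned = 9 : 3 : 4, while the observed counts are 335 : 125 : 160. Do the observations agree with the theoretical ratio?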

> chisq.test(c(335, 125, 160), p = c(9, 3, 4) / 16)

Chi-squared test for given probabilities

data: c(335, 125, 160)
X-squared = 1.362, df = 2, p-value = 0.5061
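chisq.test() can also take the 9 : 3 : 4 ratio directly: with rescale.p = TRUE it divides p by its sum, so the result matches passing c(9, 3, 4) / 16. A short check:

```r
obs <- c(awnless = 335, long = 125, short = 160)

by_ratio <- chisq.test(obs, p = c(9, 3, 4), rescale.p = TRUE)  # 9:3:4 rescaled to sum to 1
by_prob  <- chisq.test(obs, p = c(9, 3, 4) / 16)

by_ratio$expected  # expected counts under 9:3:4 out of 620 plants
```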

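The 42 observations below record how many calls a telephone switchboard received in a fixed time period:
Calls received: 0  1  2  3  4  5  6
Frequency:      7 10 12  8  3  2  0
Does the number of calls received in a time period follow a Poisson distribution?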

> x <- 0:6
> y <- c(7, 10, 12, 8, 3, 2, 0)
> lambda <- mean(rep(x, y))      # estimate the Poisson mean by the sample mean
> q <- ppois(x, lambda)
> n <- length(y)
> p <- numeric(n)
> p[1] <- q[1]
> p[n] <- 1 - q[n - 1]           # last cell: 6 or more calls
> for (i in 2:(n - 1)) p[i] <- q[i] - q[i - 1]   # P(X = i - 1)
> chisq.test(y, p = p)           # warns: several expected counts are below 5

Merge the sparse cells 4, 5 and 6 into a single cell "4 or more" and test again:

> Z <- c(7, 10, 12, 8, 5)
> n <- length(Z)
> p <- p[1:(n - 1)]
> p[n] <- 1 - q[n - 1]           # P(X >= 4)
> chisq.test(Z, p = p)

The resulting p-value is well above 0.05, so the Poisson model is not rejected.
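A compact variant of the Poisson check, using diff() on the cumulative probabilities instead of a loop. It also applies the usual degrees-of-freedom correction: lambda is estimated from the data, so one more df is subtracted, which chisq.test() does not do by itself. Variable names are illustrative:

```r
calls <- 0:6
freq  <- c(7, 10, 12, 8, 3, 2, 0)
lambda <- sum(calls * freq) / sum(freq)   # sample mean = MLE of lambda (80 / 42)

# Merge the sparse upper cells into one ">= 4" cell
obs <- c(freq[1:4], sum(freq[5:7]))
cum <- ppois(0:3, lambda)
p   <- c(cum[1], diff(cum), 1 - cum[4])   # P(X = 0), ..., P(X = 3), P(X >= 4)

expected <- sum(obs) * p
stat     <- sum((obs - expected)^2 / expected)
df       <- length(obs) - 1 - 1           # one estimated parameter (lambda)
p_value  <- pchisq(stat, df, lower.tail = FALSE)
c(X_squared = stat, df = df, p_value = p_value)
```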

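The smaller the P value, the stronger the grounds for rejecting the null hypothesis and the stronger the statistical evidence that the populations differ. Note that failing to reject H0 is not the same as supporting H0; it only means the current sample does not contain enough information to reject H0.
By convention, P > 0.05 is called "not significant", 0.01 < P ≤ 0.05 "significant", and P ≤ 0.01 "highly significant".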
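The conventional labels can be attached to p-values with cut(); a small sketch using p-values reported earlier in this article. The right-closed intervals [0, 0.01], (0.01, 0.05] and (0.05, 1] match the boundaries of the convention:

```r
p_values <- c(sleep = 0.07939, furnace = 0.0002176, seeds = 0.01207)
labels <- cut(p_values, breaks = c(0, 0.01, 0.05, 1), include.lowest = TRUE,
              labels = c("highly significant", "significant", "not significant"))
data.frame(p = p_values, label = labels)
```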
