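(1) QQ probability plot
Purpose and principle: these plots check whether a sample follows a specified theoretical distribution. A PP probability plot compares the empirical cumulative probabilities with the theoretical cumulative probabilities. If they agree, the points lie close to a straight line; equivalently, the differences between the empirical and theoretical probabilities scatter within a band that is symmetric about the horizontal line at 0. A QQ probability plot compares the sample quantiles with the theoretical quantiles in the same way: if they agree, the points lie close to a straight line, or the differences between the sample and theoretical quantiles scatter within a band that is symmetric about the horizontal line at 0. The QQ plot described here puts the sample quantiles on the horizontal axis and the quantiles of the specified theoretical distribution on the vertical axis. Note that R's qqnorm() uses the opposite orientation by default (theoretical quantiles horizontal, sample quantiles vertical).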
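The construction described above can be sketched by hand in base R. This is only a minimal illustration under stated assumptions: the data are simulated (not the possum data), the reference distribution is a normal with the sample mean and standard deviation, and the plotting positions come from ppoints(); the variable names are made up for this example.

```r
# Simulated stand-in sample (an assumption; substitute your own data)
set.seed(1)
x <- sort(rnorm(100, mean = 87, sd = 4))
n <- length(x)
m <- mean(x)
s <- sd(x)

# Empirical cumulative probabilities (plotting positions, (i - 0.5)/n for n > 10)
p_emp <- ppoints(n)
# Theoretical cumulative probability of each sorted observation (PP plot)
p_theo <- pnorm(x, m, s)
# Theoretical quantile at each plotting position (QQ plot)
q_theo <- qnorm(p_emp, m, s)

par(mfrow = c(2, 2))
# PP plot: points near the line y = x indicate agreement
plot(p_emp, p_theo, main = "PP plot",
     xlab = "Empirical cumulative probability",
     ylab = "Theoretical cumulative probability")
abline(0, 1, col = 'red')
# Detrended PP plot: differences should scatter symmetrically about 0
plot(p_emp, p_emp - p_theo, main = "Detrended PP plot",
     xlab = "Empirical cumulative probability", ylab = "Difference")
abline(h = 0, col = 'red')
# QQ plot with sample quantiles on the horizontal axis, as in the text
plot(x, q_theo, main = "QQ plot",
     xlab = "Sample quantiles", ylab = "Theoretical quantiles")
abline(0, 1, col = 'red')
# Detrended QQ plot: differences should scatter symmetrically about 0
plot(x, x - q_theo, main = "Detrended QQ plot",
     xlab = "Sample quantiles", ylab = "Difference")
abline(h = 0, col = 'red')
```

Because the sample here is drawn from a normal distribution, the points should stay close to the red reference lines; a skewed or heavy-tailed sample would bend away from them.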
> library(DAAG)
> data(possum)
> # Use the column directly rather than attach(), which masks objects when repeated
> totlngth <- possum$totlngth
> # Female subset; defined here but not used below (the plot uses all possums)
> fpossum <- possum[possum$sex == "f", ]
>
> # Normal QQ plot; by default R puts theoretical quantiles on the
> # horizontal axis and sample quantiles on the vertical axis
> qqnorm(totlngth,
+ main = "Normal QQ plot of totlngth")
> # Reference line through the first and third quartiles
> qqline(totlngth,
+ col = 'red',
+ lwd = 2)
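The plot shows that the data deviate slightly from normality, especially in the middle region.
(2) Direct comparison with the normal density function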
> library(DAAG)
> data(possum)
> # Use the column directly rather than attach(), which masks objects when repeated
> totlngth <- possum$totlngth
> # Female subset; defined here but not used below (the plots use all possums)
> fpossum <- possum[possum$sex == "f", ]
> # Kernel density estimate; its range sets common axis limits for both panels
> dens <- density(totlngth)
> xlim <- range(dens$x)
> ylim <- range(dens$y)
> # Parameters of the fitted normal (named m and s to avoid shadowing mean() and sd())
> m <- mean(totlngth)
> s <- sd(totlngth)
> par(mfrow = c(1, 2))
>
> # Panel A: class boundaries at 72.5, 77.5, ..., 97.5
> hist(totlngth,
+ breaks = 72.5 + (0:5) * 5,
+ xlim = xlim,
+ ylim = ylim,
+ probability = TRUE,
+ xlab = "total length",
+ main = "A:Breaks at 72.5...")
> # Dashed line: kernel density estimate
> lines(dens,
+ col = par('fg'),
+ lty = 2)
> # Red curve: normal density with the sample mean and standard deviation
> curve(dnorm(x, m, s),
+ col = 'red',
+ add = TRUE)
>
> # Panel B: class boundaries at 75, 80, ..., 100
> hist(totlngth,
+ breaks = 75 + (0:5) * 5,
+ xlim = xlim,
+ ylim = ylim,
+ probability = TRUE,
+ xlab = "total length",
+ main = "B:Breaks at 75")
> lines(dens,
+ col = par('fg'),
+ lty = 2)
> curve(dnorm(x, m, s),
+ col = 'red',
+ add = TRUE)
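In each panel, the dashed line is the kernel density estimate and the red curve is the normal density with the sample mean and standard deviation. Compare the histogram and the dashed line with the red curve to judge how far the data depart from the normal density.
(3) Use the empirical distribution function: compare the empirical distribution function of the data directly with the distribution function of the normal distribution.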
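The two panels differ only in where the class boundaries are placed, and that alone can change the shape of a histogram. The sketch below shows the effect numerically. It is only an illustration under stated assumptions: the data are simulated rather than taken from possum, and the names breaks_a, breaks_b, h_a and h_b are made up for this example.

```r
# Simulated stand-in sample (an assumption; substitute your own data)
set.seed(1)
x <- rnorm(100, mean = 87, sd = 4)

# Two sets of width-5 class boundaries, offset from each other by 2.5,
# both built from the data range so that they cover every observation
lo <- 5 * floor(min(x) / 5)
breaks_b <- seq(lo, max(x) + 5, by = 5)
breaks_a <- seq(lo - 2.5, max(x) + 5, by = 5)

# Compute the histograms without drawing them
h_a <- hist(x, breaks = breaks_a, plot = FALSE)
h_b <- hist(x, breaks = breaks_b, plot = FALSE)

# Same data, different bin heights: compare the density values side by side
print(data.frame(left = head(breaks_a, -1), density = h_a$density))
print(data.frame(left = head(breaks_b, -1), density = h_b$density))
```

Both histograms describe the same sample, so any visual impression of skewness or multiple peaks that appears under only one choice of breaks deserves caution.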
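> library(DAAG)
> data(possum)
> # Use the column directly rather than attach(), which masks objects when repeated
> totlngth <- possum$totlngth
> # Female subset; defined here but not used below (the plot uses all possums)
> fpossum <- possum[possum$sex == "f", ]
> # Parameters of the fitted normal (named m and s to avoid shadowing mean() and sd())
> m <- mean(totlngth)
> s <- sd(totlngth)
> # Empirical CDF: the i-th smallest value has cumulative proportion i/n
> x <- sort(totlngth)
> n <- length(x)
> y <- (1:n)/n
>
> # Step plot of the empirical CDF
> plot(x, y,
+ type = 's',
+ main = "Empirical CDF of totlngth")
> # Red curve: normal CDF with the sample mean and standard deviation
> curve(pnorm(x, m, s),
+ col = 'red',
+ lwd = 2,
+ add = TRUE)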
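Overall, the data do not follow a normal distribution exactly. A further test is needed to see how large the departure from normality is and whether it falls within an acceptable range.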
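One possible follow-up, offered only as a sketch and not as the author's method, is a formal normality test such as Shapiro-Wilk (shapiro.test() in base R), together with the largest vertical gap between the empirical CDF and the fitted normal CDF as a descriptive measure of the departure. The data below are simulated and the names sw, f0 and d_max are made up for this example.

```r
# Simulated stand-in sample (an assumption; substitute your own data)
set.seed(1)
x <- rnorm(100, mean = 87, sd = 4)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(x)
print(sw)

# Largest vertical gap between the empirical CDF and the fitted normal CDF.
# The ECDF jumps from (i - 1)/n to i/n at the i-th sorted value, so both
# sides of each jump are checked.
xs <- sort(x)
n <- length(xs)
f0 <- pnorm(xs, mean(x), sd(x))
d_max <- max(pmax(abs((1:n) / n - f0), abs((0:(n - 1)) / n - f0)))
print(d_max)
```

The value d_max is the Kolmogorov-Smirnov distance, but because the mean and standard deviation are estimated from the same data, the standard Kolmogorov-Smirnov p-value does not apply directly; here it is used only to describe the size of the gap.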