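R supports a wide range of chart types, some of which I had rarely seen before, though that may simply be because I don't do data analysis professionally. Here is a summary of the charts I have learned so far.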
Bar plot
This is a common chart that most statistics packages support. It presents summarized data well and conveys the meaning behind the data concisely.
> library(vcd)
> counts <- table(Arthritis$Improved)
> counts

  None   Some Marked
    42     14     28
> barplot(counts, main="Simple Bar Plot", xlab="Improvement", ylab="Frequency")
> barplot(counts, main="Horizontal Bar Plot", xlab="Frequency", ylab="Improvement", horiz=TRUE)
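Stacked bar plot
This is an evolved version of the bar plot and can express more. Where a simple bar plot shows two dimensions (a category and its count), a stacked bar plot shows three, because each bar is split into segments by a second categorical variable.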
> library(vcd)
> counts <- table(Arthritis$Improved, Arthritis$Treatment)
> counts

         Placebo Treated
  None        29      13
  Some         7       7
  Marked       7      21
> barplot(counts, main="Stacked Bar Plot", xlab="Treatment", ylab="Frequency", col=c("red","yellow","green"), legend=rownames(counts))
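Grouped bar plot
This shows the same information as the stacked bar plot above. Only the presentation differs: the segments are drawn side by side instead of on top of each other.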
> barplot(counts, main="Grouped Bar Plot", xlab="Treatment", ylab="Frequency", col=c("red","yellow","green"), legend=rownames(counts), beside=TRUE)
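Mean bar plot
Personally, I find this similar to the bar plot. As a graphic there is no significant difference; the bars simply show group means instead of counts.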
> states <- data.frame(state.region, state.x77)
> means <- aggregate(states$Illiteracy, by=list(state.region), FUN=mean)
> means
        Group.1        x
1     Northeast 1.000000
2         South 1.737500
3 North Central 0.700000
4          West 1.023077
> means <- means[order(means$x),]
> means
        Group.1        x
3 North Central 0.700000
1     Northeast 1.000000
4          West 1.023077
2         South 1.737500
> barplot(means$x, names.arg=means$Group.1)
> title("Mean Illiteracy Rate")
> # Tweak the layout: wider left margin and perpendicular axis labels for long names
> par(mar=c(5,8,4,2))
> par(las=2)
> counts <- table(Arthritis$Improved)
> barplot(counts, main="Treatment Outcome", horiz=TRUE, cex.names=0.8, names.arg=c("No Improvement","Some Improvement","Marked Improvement"))
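Spinogram
Similar to a stacked bar plot, but every bar is rescaled to the same height, so the only thing that differs is the area of the colored segments within each group. It is a good fit for analyzing what proportion of each group falls into each category.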
> library(vcd)
> attach(Arthritis)   # without this, table(Treatment, Improved) fails with "object 'Treatment' not found"
> counts <- table(Treatment, Improved)
> spine(counts, main="Spinogram Example")
> counts
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
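The proportions a spinogram displays can also be computed directly. The following is a minimal base-R sketch, assuming the counts printed in the session above (they are typed in by hand here so that the example does not depend on the vcd package). The variable names `props` and `group_share` are my own.

```r
# Counts copied from the table printed above
# (rows: treatment group, columns: improvement level)
counts <- matrix(c(29, 7,  7,
                   13, 7, 21),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Treatment = c("Placebo", "Treated"),
                                 Improved  = c("None", "Some", "Marked")))

# Row-wise proportions: within each treatment group the shares sum to 1.
# These correspond to the segment heights inside each equal-height bar.
props <- prop.table(counts, margin = 1)
print(round(props, 2))

# Share of all patients that falls into each treatment group
group_share <- rowSums(counts) / sum(counts)
print(round(group_share, 2))
```

As far as I understand `spine()`, the bar widths reflect the group shares, which is why segment area rather than height alone carries the information.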
Pie chart
The most common chart of all, so not much needs to be said.
> library(plotrix)
> par(mfrow=c(2,2))
> slices <- c(10,12,4,16,8)
> lbls <- c("US","UK","Australia","Germany","France")
> pie(slices, labels=lbls, main="Simple Pie Chart")
> pct <- round(slices/sum(slices)*100)
> lbls2 <- paste(lbls, " ", pct, "%", sep="")
> lbls2
[1] "US 20%"       "UK 24%"       "Australia 8%" "Germany 32%"  "France 16%"
> pie(slices, labels=lbls2, col=rainbow(length(lbls2)), main="Pie Chart with Percentages")
> pie3D(slices, labels=lbls, explode=0.1, main="3D Pie Chart")
> mytable <- table(state.region)
> lbls3 <- paste(names(mytable), "\n", mytable, sep="")   # lbls3 was never defined in the original session; region name plus count is an assumption
> pie(mytable, labels=lbls3, main="Pie Chart from a Table\n (with sample sizes)")
Fan plot
Similar to a pie chart, but much less common.
> library(plotrix)
> slices <- c(10,12,4,16,8)
> lbls <- c("US","UK","Australia","Germany","France")
> fan.plot(slices, labels=lbls, main="Fan Plot")
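Histogram
A column-style chart and one of the most common. It looks like the bar plot described earlier, but each bar counts how many values of a continuous variable fall into a bin.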
> par(mfrow=c(2,2))
> hist(mtcars$mpg)
> hist(mtcars$mpg, breaks=12, col="red", xlab="Miles Per Gallon", main="Colored histogram with 12 bins")
> hist(mtcars$mpg, freq=FALSE, col="red", xlab="Miles Per Gallon", main="Histogram, rug plot, density curve")
> rug(jitter(mtcars$mpg))   # rug plot
> lines(density(mtcars$mpg), col="blue", lwd=2)   # density curve
> x <- mtcars$mpg
> h <- hist(x, breaks=12, col="red", xlab="Miles Per Gallon", main="Histogram with normal curve and box")
> xfit <- seq(min(x), max(x), length=40)
> yfit <- dnorm(xfit, mean=mean(x), sd=sd(x))
> yfit <- yfit*diff(h$mids[1:2])*length(x)   # rescale the density to the count axis
> lines(xfit, yfit, col="blue", lwd=2)
> box()
> mtcars$mpg
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
[17] 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
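The least obvious step in the session above is the line that multiplies `yfit` by the bin width and the sample size. Here is a small sketch of why that works, assuming equal-width bins (which `hist()` produces here). The names `bin_width` and `n` are my own.

```r
x <- mtcars$mpg

# Compute the bins without drawing anything
h <- hist(x, breaks = 12, plot = FALSE)

bin_width <- diff(h$mids[1:2])   # distance between neighbouring bin centres
n <- length(x)                   # number of observations

# A density integrates to 1, so density * bin width is the expected share
# of observations in a bin, and multiplying by n turns that share into a count.
xfit <- seq(min(x), max(x), length.out = 40)
yfit <- dnorm(xfit, mean = mean(x), sd = sd(x)) * bin_width * n

# The same conversion links the histogram's own density and count columns
print(all.equal(h$density * bin_width * n, h$counts))
```

If the histogram were drawn with `freq=FALSE`, the normal curve could be overlaid without any rescaling, because both would then be on the density scale.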
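Kernel density plot
This chart is less common. It feels a bit like a primitive version of a heat map and is used to show how the values of a variable are distributed in terms of density.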
> library(sm)
> par(mfrow=c(2,1))
> d <- density(mtcars$mpg)
> plot(d)
> plot(d, main="Kernel Density of Miles Per Gallon")
> polygon(d, col="red", border="blue")
> attach(mtcars)
> cyl.f <- factor(cyl, levels=c(4,6,8), labels=c("4 cylinder","6 cylinder","8 cylinder"))
> sm.density.compare(mpg, cyl, xlab="Miles Per Gallon")
> title(main="MPG Distribution by Car Cylinders")
> colfill <- c(2:(1+length(levels(cyl.f))))   # no visible effect on its own; it only stores the legend colors
> legend(locator(1), levels(cyl.f), fill=colfill)   # waits for a mouse click on the plot to place the legend
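The legend call in the session above relies on `locator(1)`, which waits for a mouse click, so nothing appears until you click on the plot, and it cannot work in a non-interactive session. The following is a base-R sketch of the same comparison that does not need the sm package and places the legend at a fixed corner. It is an alternative I am assuming gives a comparable picture, not what `sm.density.compare()` does internally.

```r
# One factor level per cylinder count, as in the session above
cyl.f <- factor(mtcars$cyl, levels = c(4, 6, 8),
                labels = c("4 cylinder", "6 cylinder", "8 cylinder"))

# Estimate a kernel density of mpg separately for each group
dens <- lapply(split(mtcars$mpg, cyl.f), density)

# Color indices 2, 3, 4: the same values the colfill vector above holds
colfill <- seq_along(dens) + 1

# Axis limits wide enough to contain every curve
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

# Empty plot first, then one line per group
plot(0, 0, type = "n", xlim = xlim, ylim = ylim,
     xlab = "Miles Per Gallon", ylab = "Density",
     main = "MPG Distribution by Car Cylinders")
for (i in seq_along(dens)) {
  lines(dens[[i]], col = colfill[i], lwd = 2)
}

# Fixed-position legend, so no mouse click is needed
legend("topright", legend = names(dens), fill = colfill)
```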
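Box plot
This chart is also quite interesting. It focuses on five statistics of a set of observations: the minimum, the first quartile (1/4), the median, the third quartile (3/4), and the maximum. This was the first time I came across this way of looking at data, but these five statistics are probably used all the time in everyday statistics, so the box plot is a very practical chart.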
> boxplot(mtcars$mpg, main="Box plot", ylab="Miles per Gallon")
> boxplot(mpg~cyl, data=mtcars, main="Car Mileage Data", xlab="Number of Cylinders", ylab="Miles Per Gallon")
> # Notched box plot; with varwidth=TRUE the box widths also reflect the group sizes
> boxplot(mpg~cyl, data=mtcars, notch=TRUE, varwidth=TRUE, col="red", main="Car Mileage Data", xlab="Number of Cylinders", ylab="Miles Per Gallon")
> # Grouped box plot
> mtcars$cyl.f <- factor(mtcars$cyl, levels=c(4,6,8), labels=c("4","6","8"))
> mtcars$am.f <- factor(mtcars$am, levels=c(0,1), labels=c("auto","standard"))
> mtcars$am.f
 [1] standard standard standard auto     auto     auto     auto     auto     auto
[10] auto     auto     auto     auto     auto     auto     auto     auto     standard
[19] standard standard auto     auto     auto     auto     auto     standard standard
[28] standard standard standard standard standard
Levels: auto standard
> boxplot(mpg~am.f*cyl.f, data=mtcars, varwidth=TRUE, col=c("gold","darkgreen"), main="MPG Distribution by Auto Type", xlab="Auto Type", ylab="Miles Per Gallon")
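The five statistics behind a box plot can be inspected directly in base R. This is a minimal sketch using `fivenum()` and `boxplot.stats()` on the same `mtcars$mpg` data. The element names assigned to `five` are my own labels.

```r
x <- mtcars$mpg

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five)

# The values boxplot() actually draws, plus any points it flags as outliers
bs <- boxplot.stats(x)
print(bs$stats)   # whisker ends, hinges, and median
print(bs$out)     # observations beyond the whiskers, if any
```

Note that the whiskers only reach the most extreme observations within 1.5 times the box length of the hinges, so when outliers are present the whisker ends differ from the true minimum and maximum.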
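Violin plot
This follows the same approach as the box plot but gives more explicit information about how the density of the variable is distributed.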
> library(vioplot)
> x1 <- mtcars$mpg[mtcars$cyl==4]
> x2 <- mtcars$mpg[mtcars$cyl==6]
> x3 <- mtcars$mpg[mtcars$cyl==8]
> vioplot(x1, x2, x3, names=c("4 cyl","6 cyl","8 cyl"), col="gold")
> title("Violin Plots of Miles Per Gallon", ylab="Miles Per Gallon", xlab="Number of Cylinders")
Dot chart
Another fairly common chart. Its evolved form is arguably the scatter plot.
> dotchart(mtcars$mpg, labels=row.names(mtcars), cex=.7, main="Gas Mileage for Car Models", xlab="Miles Per Gallon")
> # Grouped dot chart: sorted by mpg, grouped and colored by cylinder count
> x <- mtcars[order(mtcars$mpg),]
> x$cyl <- factor(x$cyl)
> x$color[x$cyl==4] <- "red"
> x$color[x$cyl==6] <- "blue"
> x$color[x$cyl==8] <- "darkgreen"
> dotchart(x$mpg, labels=row.names(x), cex=.7, groups=x$cyl, gcolor="black", color=x$color, pch=19, main="Gas Mileage for Car Models\ngrouped by cylinder", xlab="Miles Per Gallon")