For the book3.csv dataset, implement the following:
(1) Create a training set and a test set
(2) Use the rpart package to build a CART classification decision tree for the class variable
(3) Test with the test set and evaluate the model
The book3.csv dataset contains eleven predictor columns and a class label; they are renamed to x1~x11 and class in the code below.

setwd('D:\\data')
list.files()
dat <- read.csv(file = "book3.csv", header = TRUE)
# Rename the variables; class is predicted from x1~x11
colnames(dat) <- c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","class")
# Set the seed before sampling so that the train/test split is reproducible
set.seed(1)
n <- nrow(dat)
split <- sample(n, n * (3/4))
traindata <- dat[split, ]
testdata  <- dat[-split, ]
library(rpart)
# Build the CART classification tree (Gini criterion) and print its complexity table
Gary1 <- rpart(class ~ ., data = testdata, method = "class",
               control = rpart.control(minsplit = 1), parms = list(split = "gini"))
printcp(Gary1)
# Evaluate the model with a confusion matrix
pre1 <- predict(Gary1, newdata = testdata, type = "class")
tab <- table(pre1, testdata$class)
tab
# Accuracy of the model's predictions
sum(diag(tab)) / sum(tab)
Implementation process
Preprocess the data and create the training and test sets
setwd('D:\\data')
list.files()
dat <- read.csv(file = "book3.csv", header = TRUE)
# Rename the variables; class is predicted from x1~x11
colnames(dat) <- c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","class")
set.seed(1)                       # seed set before sampling so the split is reproducible
n <- nrow(dat)
split <- sample(n, n * (3/4))     # indices of a random 3/4 of the rows
traindata <- dat[split, ]
testdata  <- dat[-split, ]
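As a quick sanity check (not part of the original script), the sizes of the two sets and the class balance can be inspected; this assumes the class column holds the two labels that appear in the confusion matrix further below (惡性 = malignant, 良性 = benign).

# Sanity check (not in the original script): split sizes and class balance
nrow(traindata); nrow(testdata)      # roughly 3/4 vs. 1/4 of the rows
prop.table(table(traindata$class))   # class proportions in the training set
prop.table(table(testdata$class))    # class proportions in the test set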
Set the seed of the random number generator so that the results are reproducible. For the train/test split itself to be reproducible, set.seed() must be called before sample(), as in the code above.
set.seed(1)
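A minimal illustration of what the seed buys (not part of the original write-up): resetting the same seed makes sample() return exactly the same indices.

# Illustration (not in the original): the same seed reproduces the same draw
set.seed(1); s1 <- sample(100, 10)
set.seed(1); s2 <- sample(100, 10)
identical(s1, s2)   # TRUE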
Load the rpart package, which implements the CART algorithm used to build the classification tree
library(rpart)
Build the decision tree and test it on the test set
> Gary1 <- rpart(class ~ ., data = testdata, method = "class",
+                control = rpart.control(minsplit = 1), parms = list(split = "gini"))
> printcp(Gary1)

Classification tree:
rpart(formula = class ~ ., data = testdata, method = "class",
    parms = list(split = "gini"), control = rpart.control(minsplit = 1))

Variables actually used in tree construction:
[1] x1  x10 x2  x4  x5  x8

Root node error: 57/175 = 0.32571

n= 175

        CP nsplit rel error  xerror     xstd
1 0.754386      0  1.000000 1.00000 0.108764
2 0.052632      1  0.245614 0.31579 0.070501
3 0.035088      3  0.140351 0.31579 0.070501
4 0.017544      6  0.035088 0.35088 0.073839
5 0.010000      7  0.017544 0.31579 0.070501
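The complexity table printed above can also drive pruning: the tree is cut back at the complexity parameter (CP) with the smallest cross-validated error (xerror). The lines below are a sketch, not part of the original script; they only rely on rpart's prune() function and the cptable component of the fitted object.

# Sketch (not in the original): prune at the CP with the smallest cross-validated error
best.cp <- Gary1$cptable[which.min(Gary1$cptable[, "xerror"]), "CP"]
Gary1.pruned <- prune(Gary1, cp = best.cp)
printcp(Gary1.pruned)
plot(Gary1.pruned); text(Gary1.pruned, use.n = TRUE)   # quick base-graphics view of the tree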
Evaluate the model with a confusion matrix. In the table below the rows are the predictions (pre1) and the columns are the actual classes; the two labels are 惡性 (malignant) and 良性 (benign).
> pre1 <- predict(Gary1, newdata = testdata, type = 'class')
> tab <- table(pre1, testdata$class)
> tab

pre1   惡性 良性
  惡性   57    1
  良性    0  117
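Overall accuracy can hide class-specific behaviour, so sensitivity and specificity can be read off the same table. A short sketch, not in the original write-up, assuming 惡性 (malignant) is treated as the positive class.

# Sketch (not in the original): per-class rates, with 惡性 (malignant) as the positive class
TP <- tab["惡性", "惡性"]; FN <- tab["良性", "惡性"]   # rows = prediction, columns = truth
TN <- tab["良性", "良性"]; FP <- tab["惡性", "良性"]
TP / (TP + FN)   # sensitivity: proportion of malignant cases identified
TN / (TN + FP)   # specificity: proportion of benign cases identified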
Accuracy of the model's predictions
Entries on the diagonal are cases where the predicted value matches the actual value; off-diagonal entries are misclassifications. The accuracy is therefore the sum of the diagonal divided by the total number of cases.
> sum(diag(tab))/sum(tab)
[1] 0.9942857
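The same figure can be computed directly from the predictions without building the table first; a one-line equivalent check that is not in the original script.

# Equivalent check (not in the original): accuracy computed straight from the predictions
mean(pre1 == testdata$class)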