For the book3.csv dataset, implement the following:
(1) Create a training set and a test set
(2) Use the rpart package to build a CART classification decision tree for the class variable
(3) Test with the test set and evaluate the model
The book3.csv dataset contains eleven predictor columns and a class label; they are renamed to x1~x11 and class in the code below.

setwd('D:\\data')
list.files()
dat <- read.csv(file = "book3.csv", header = TRUE)
# Rename the variables; class is predicted from x1~x11
colnames(dat) <- c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","class")
# Set the seed before sampling so that the train/test split is reproducible
set.seed(1)
n <- nrow(dat)
split <- sample(n, n * (3/4))
traindata <- dat[split, ]
testdata  <- dat[-split, ]
library(rpart)
# Build the CART classification tree (Gini criterion) and print its complexity table
Gary1 <- rpart(class ~ ., data = testdata, method = "class",
               control = rpart.control(minsplit = 1), parms = list(split = "gini"))
printcp(Gary1)
# Evaluate the model with a confusion matrix
pre1 <- predict(Gary1, newdata = testdata, type = "class")
tab <- table(pre1, testdata$class)
tab
# Accuracy of the model's predictions
sum(diag(tab)) / sum(tab)
Implementation process
Preprocess the data and create the training and test sets
setwd('D:\\data')
list.files()
dat <- read.csv(file = "book3.csv", header = TRUE)
# Rename the variables; class is predicted from x1~x11
colnames(dat) <- c("x1","x2","x3","x4","x5","x6","x7","x8","x9","x10","x11","class")
set.seed(1)                       # seed set before sampling so the split is reproducible
n <- nrow(dat)
split <- sample(n, n * (3/4))     # indices of a random 3/4 of the rows
traindata <- dat[split, ]
testdata  <- dat[-split, ]
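As a quick sanity check (not part of the original script), the sizes of the two sets and the class balance can be inspected; this assumes the class column holds the two labels that appear in the confusion matrix further below (惡性 = malignant, 良性 = benign).

# Sanity check (not in the original script): split sizes and class balance
nrow(traindata); nrow(testdata)      # roughly 3/4 vs. 1/4 of the rows
prop.table(table(traindata$class))   # class proportions in the training set
prop.table(table(testdata$class))    # class proportions in the test set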
Set the seed of the random number generator so that the results are reproducible. For the train/test split itself to be reproducible, set.seed() must be called before sample(), as in the code above.
set.seed(1)
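A minimal illustration of what the seed buys (not part of the original write-up): resetting the same seed makes sample() return exactly the same indices.

# Illustration (not in the original): the same seed reproduces the same draw
set.seed(1); s1 <- sample(100, 10)
set.seed(1); s2 <- sample(100, 10)
identical(s1, s2)   # TRUE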
Load the rpart package, which implements the CART algorithm used to build the classification tree
library(rpart)
Build the decision tree and test it on the test set
> Gary1 <- rpart(class ~ ., data = testdata, method = "class",
+                control = rpart.control(minsplit = 1), parms = list(split = "gini"))
> printcp(Gary1)

Classification tree:
rpart(formula = class ~ ., data = testdata, method = "class",
    parms = list(split = "gini"), control = rpart.control(minsplit = 1))

Variables actually used in tree construction:
[1] x1  x10 x2  x4  x5  x8

Root node error: 57/175 = 0.32571

n= 175

        CP nsplit rel error  xerror     xstd
1 0.754386      0  1.000000 1.00000 0.108764
2 0.052632      1  0.245614 0.31579 0.070501
3 0.035088      3  0.140351 0.31579 0.070501
4 0.017544      6  0.035088 0.35088 0.073839
5 0.010000      7  0.017544 0.31579 0.070501
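The complexity table printed above can also drive pruning: the tree is cut back at the complexity parameter (CP) with the smallest cross-validated error (xerror). The lines below are a sketch, not part of the original script; they only rely on rpart's prune() function and the cptable component of the fitted object.

# Sketch (not in the original): prune at the CP with the smallest cross-validated error
best.cp <- Gary1$cptable[which.min(Gary1$cptable[, "xerror"]), "CP"]
Gary1.pruned <- prune(Gary1, cp = best.cp)
printcp(Gary1.pruned)
plot(Gary1.pruned); text(Gary1.pruned, use.n = TRUE)   # quick base-graphics view of the tree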
Evaluate the model with a confusion matrix. In the table below the rows are the predictions (pre1) and the columns are the actual classes; the two labels are 惡性 (malignant) and 良性 (benign).
> pre1 <- predict(Gary1, newdata = testdata, type = 'class')
> tab <- table(pre1, testdata$class)
> tab

pre1   惡性 良性
  惡性   57    1
  良性    0  117
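Overall accuracy can hide class-specific behaviour, so sensitivity and specificity can be read off the same table. A short sketch, not in the original write-up, assuming 惡性 (malignant) is treated as the positive class.

# Sketch (not in the original): per-class rates, with 惡性 (malignant) as the positive class
TP <- tab["惡性", "惡性"]; FN <- tab["良性", "惡性"]   # rows = prediction, columns = truth
TN <- tab["良性", "良性"]; FP <- tab["惡性", "良性"]
TP / (TP + FN)   # sensitivity: proportion of malignant cases identified
TN / (TN + FP)   # specificity: proportion of benign cases identified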
Accuracy of the model's predictions
Entries on the diagonal are cases where the predicted value matches the actual value; off-diagonal entries are misclassifications. The accuracy is therefore the sum of the diagonal divided by the total number of cases.
> sum(diag(tab))/sum(tab)
[1] 0.9942857
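The same figure can be computed directly from the predictions without building the table first; a one-line equivalent check that is not in the original script.

# Equivalent check (not in the original): accuracy computed straight from the predictions
mean(pre1 == testdata$class)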