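1. Understanding regression trees and model trees

Decision trees for numeric prediction:

Regression tree: predicts the mean value of the training cases that reach a leaf node; despite the name, no linear regression is involved.
Model tree: at each leaf node, fits a multiple linear regression model to the cases that reach that node. A tree with many leaves therefore contains many such models, which makes it larger and harder to interpret than the equivalent regression tree, although it may be more accurate.

Adding regression to decision trees:

In classification trees, homogeneity is measured by entropy; in numeric decision trees it is measured by statistics such as the variance, the standard deviation, or the mean absolute deviation.

Standard deviation reduction (SDR): a common splitting criterion.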

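It is the standard deviation of the original values minus the weighted standard deviation of the values after the split, where each partition is weighted by its share of the cases.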
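Written as a formula, with T the set of target values at the node and T_1 ... T_n the subsets produced by a candidate split (this notation is an assumption; the post gives the definition only in words):

```latex
\mathrm{SDR} = \mathrm{sd}(T) - \sum_{i=1}^{n} \frac{|T_i|}{|T|}\,\mathrm{sd}(T_i)
```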
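For example, suppose the SDR is 1.2 for a split on feature A and 1.4 for a split on feature B. The split on B reduces the standard deviation more (the resulting groups are more homogeneous), so B is used first. A regression tree that stops after this one split predicts the mean of whichever group a new case falls into. A model tree goes one step further: in each of the two groups it fits a linear regression of the outcome on feature A, and a new case is predicted by whichever of the two linear models belongs to its group.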
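The SDR comparison can be reproduced with a few lines of base R. This is a minimal sketch: the target values and both candidate splits are illustrative numbers chosen so that the results round to roughly 1.2 and 1.4, they are not taken from the wine data, and the helper name `sdr` is hypothetical.

```r
# Illustrative target values at a node (hypothetical, not from the wine data)
tee <- c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7)

# Candidate split on feature A: 9 cases in one group, 6 in the other
at1 <- c(1, 1, 1, 2, 2, 3, 4, 5, 5)
at2 <- c(6, 6, 7, 7, 7, 7)

# Candidate split on feature B: 7 cases in one group, 8 in the other
bt1 <- c(1, 1, 1, 2, 2, 3, 4)
bt2 <- c(5, 5, 6, 6, 7, 7, 7, 7)

# SDR = sd before the split minus the size-weighted sd after it
sdr <- function(parent, children) {
  weights <- sapply(children, length) / length(parent)
  sd(parent) - sum(weights * sapply(children, sd))
}

sdr_a <- sdr(tee, list(at1, at2))
sdr_b <- sdr(tee, list(bt1, bt2))
round(c(A = sdr_a, B = sdr_b), 2)

# Regression-tree predictions after splitting on B: the mean of each group
c(mean(bt1), mean(bt2))
```

Because the split on B gives the larger reduction, it would be chosen first; the two group means on the last line are what a regression tree with this single split would predict.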

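2. Example: applying regression trees and model trees

Rating the quality of wine

1) Collecting data

The white wine data contains 4898 wine samples with 11 chemical features each (acidity, sugar content, pH, density, and so on), plus a column with the quality rating.
Data download:

Link: https://pan.baidu.com/s/1pN_PtZOYjOz2I-KJqSq6pw Extraction code: 6swg

2) Exploring and preparing the data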

## Step 2: Exploring and preparing the data ----
wine <- read.csv("whitewines.csv")

# examine the wine data
str(wine)

# the distribution of quality ratings
hist(wine$quality)

# summary statistics of the wine data
summary(wine)

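# use the first 3750 rows (about three quarters) for training
# and the remaining 1148 rows for testing
wine_train <- wine[1:3750, ]
wine_test <- wine[3751:4898, ]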
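Splitting by row position is only safe when the rows of the file are already in random order, and the post does not say whether they are. If that is in doubt, a random split is a simple alternative. The sketch below uses a stand-in data frame named `toy` (hypothetical) with the same number of rows as the wine data, and an arbitrary seed.

```r
set.seed(123)  # arbitrary seed so the split is reproducible

# stand-in data frame with as many rows as the wine data (hypothetical)
toy <- data.frame(id = 1:4898)

# draw 3750 row indices at random for training; the rest form the test set
train_idx <- sample(nrow(toy), 3750)
toy_train <- toy[train_idx, , drop = FALSE]
toy_test <- toy[-train_idx, , drop = FALSE]

nrow(toy_train)
nrow(toy_test)
```

Applied to the real data, the same indexing would be `wine[train_idx, ]` and `wine[-train_idx, ]`.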

3) Training a model on the data

## Step 3: Training a model on the data ----
# regression tree using rpart
library(rpart)
m.rpart <- rpart(quality ~ ., data = wine_train)

# get basic information about the tree
m.rpart

# get more detailed information about the tree
summary(m.rpart)

# use the rpart.plot package to create a visualization
library(rpart.plot)

# a basic decision tree diagram
rpart.plot(m.rpart, digits = 3)

# a few adjustments to the diagram
rpart.plot(m.rpart, digits = 4, fallen.leaves = TRUE, type = 3, extra = 101)

alcohol is the first variable the tree splits on, so in this tree it is the single most important predictor of wine quality.
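The root split is one indicator of importance; a fitted rpart object also stores an importance score per variable in its `variable.importance` component, which can be checked against it. The sketch below uses the built-in `mtcars` data as a stand-in so that it runs without the wine file; the object name `m.toy` is hypothetical.

```r
library(rpart)

# mtcars ships with R; it stands in for the wine data here
m.toy <- rpart(mpg ~ ., data = mtcars)

# name of the variable used at the root split (first row of the frame)
root_var <- as.character(m.toy$frame$var[1])
root_var

# importance scores stored by rpart, rescaled to sum to 1
imp <- m.toy$variable.importance
round(imp / sum(imp), 2)
```

For the wine model the same two expressions would be applied to `m.rpart`.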

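4) Evaluating model performance

① Compare the range of the predicted values with that of the actual values, and check their correlation
② Measure performance with the mean absolute error
Mean absolute error (MAE): how far, on average, a prediction is from the true value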

## Step 4: Evaluate model performance ----

# generate predictions for the testing dataset
p.rpart <- predict(m.rpart, wine_test)

# compare the distribution of predicted values vs. actual values
summary(p.rpart)
summary(wine_test$quality)

# compare the correlation
cor(p.rpart, wine_test$quality)

# function to calculate the mean absolute error
MAE <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# mean absolute error between actual and predicted values
MAE(wine_test$quality, p.rpart)

# baseline: mean absolute error when always predicting the training mean
mean(wine_train$quality) # result = 5.87
MAE(wine_test$quality, 5.87)
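To make the MAE and its baseline concrete, here is a small worked example on made-up numbers (the vectors are hypothetical, not wine results). A model is only useful if its MAE is clearly below the MAE obtained by always predicting the mean.

```r
MAE <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

actual <- c(5, 6, 7)         # hypothetical true ratings
predicted <- c(5.5, 6, 6)    # hypothetical model predictions

# absolute errors are 0.5, 0 and 1, so the MAE is 1.5 / 3 = 0.5
mae_model <- MAE(actual, predicted)

# baseline: always predict the mean (6); errors are 1, 0 and 1
mae_baseline <- MAE(actual, mean(actual))

c(model = mae_model, baseline = mae_baseline)
```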

5) Improving model performance

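A regression tree predicts with only a single value at each leaf node. A model tree improves on this by replacing the leaf nodes with regression models.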
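The difference between the two kinds of leaf can be seen on a small simulated data set. The following is a hand-built sketch in base R, not the M5' algorithm: the data are simulated, the split point at x = 5 is fixed by hand rather than learned, and all object names are hypothetical.

```r
set.seed(1)  # arbitrary seed for reproducibility

# simulated data with a different linear trend on each side of x = 5
x <- runif(200, 0, 10)
y <- ifelse(x < 5, 2 + x, 20 - 2 * x) + rnorm(200, sd = 0.5)
d <- data.frame(x = x, y = y)

left <- d[d$x < 5, ]
right <- d[d$x >= 5, ]

# regression-tree style leaves: one constant (the mean) per leaf
pred_mean <- ifelse(d$x < 5, mean(left$y), mean(right$y))

# model-tree style leaves: one linear model per leaf
lm_left <- lm(y ~ x, data = left)
lm_right <- lm(y ~ x, data = right)
pred_lm <- ifelse(d$x < 5, predict(lm_left, d), predict(lm_right, d))

# mean absolute error of each approach on the simulated data
mae_mean <- mean(abs(d$y - pred_mean))
mae_lm <- mean(abs(d$y - pred_lm))
c(constant_leaves = mae_mean, linear_leaves = mae_lm)
```

Because the target changes linearly within each leaf, the per-leaf linear models follow it much more closely than the per-leaf means do.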

M5' algorithm (M5-prime): the RWeka::M5P function


## Step 5: Improving model performance ----
# train a M5' Model Tree
library(RWeka)
m.m5p <- M5P(quality ~ ., data = wine_train)

# display the tree
m.m5p

# get a summary of the model's performance
summary(m.m5p)

# generate predictions for the model
p.m5p <- predict(m.m5p, wine_test)

# summary statistics about the predictions
summary(p.m5p)

# correlation between the predicted and true values
cor(p.m5p, wine_test$quality)

# mean absolute error of predicted and true values
# (uses a custom function defined above)
MAE(wine_test$quality, p.m5p)

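The splits are similar to those of the regression tree, but each node terminates in a linear model (LM1, LM2 ... LM163) rather than in a single numeric prediction.

Compared with the regression tree, the model tree improves the range of the predictions, the correlation, and the mean absolute error.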
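To put the three measures side by side for several models, a small helper can collect them in one table. This is a sketch: `compare_models` is a hypothetical helper, and the numbers are made up for illustration, not results from the wine data.

```r
# collect prediction range, correlation and MAE for a named list of predictions
compare_models <- function(actual, preds) {
  data.frame(
    model = names(preds),
    min = sapply(preds, min),
    max = sapply(preds, max),
    cor = sapply(preds, function(p) cor(p, actual)),
    mae = sapply(preds, function(p) mean(abs(actual - p))),
    row.names = NULL
  )
}

# made-up values for illustration only
actual <- c(5, 6, 7, 6, 5, 8)
preds <- list(
  tree = c(5.5, 5.5, 6.5, 6.5, 5.5, 6.5),
  model_tree = c(5.2, 6.1, 6.8, 5.9, 5.3, 7.4)
)

res <- compare_models(actual, preds)
res
```

With the objects created in this post, the call would be `compare_models(wine_test$quality, list(rpart = p.rpart, m5p = p.m5p))`.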

PS: The output of regression trees and model trees is fairly hard to interpret, and this post only explains it briefly.
