Data Analysis and Mining with R: the Naive Bayes Classification Algorithm (Case 2)


Following on from case 1, we work through another example using a different approach.

 

Straight to the code:

#!/usr/bin/Rscript

library(plyr)
library(reshape2)

# 1. Build a naive Bayes classifier from the training set
# 1.1. Compute the class probabilities

## Compute the probability of each class in the training set D, i.e. P{c_i}
## Input:  trainData     the training set, as a data frame
##         strClassName  the name of the column in trainData that holds the class labels
## Output: a data frame, the set of P{c_i}: class name | probability (column name: prob)
class_prob <- function(trainData, strClassName){
    # number of samples in the training set
    # nrow returns the number of rows
    length.train <- nrow(trainData)
    dTemp <- ddply(trainData, strClassName, "nrow")
    dTemp <- ddply(dTemp, strClassName, mutate, prob = nrow/length.train)
    # drop the count column, keeping class name | prob
    dTemp[,-2]
}
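As a quick sanity check (my sketch, not part of the original code), the same class priors can be computed with base R's `table` and `prop.table` on a hypothetical label vector:

```r
# class priors P{c_i} via base R, for comparison with class_prob()
taste <- c("good", "good", "bad", "bad", "bad", "good")   # hypothetical labels
priors <- prop.table(table(taste))
print(priors)   # bad and good each occur 3 times out of 6, so both priors are 0.5
```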

## 1.2. Compute, within each class, the probability of each feature value
## i.e. the conditional probabilities P{fi|c_i} in the training set D
## Input:  trainData     the training set, as a data frame
##         strClassName  the name of the class-label column; every other column is treated as a feature
## Output: a data frame, the set of P{fi|c_i}: class name | feature name | feature value | probability (column name: prob)
feature_class_prob <- function(trainData, strClassName){
    # reshape from wide to long format
    data.melt <- melt(trainData, id=c(strClassName))
    # count the frequency of each (class, feature, value) combination
    aa <- ddply(data.melt, c(strClassName,"variable","value"), "nrow")
    # convert counts to probabilities within each (class, feature) group
    bb <- ddply(aa, c(strClassName,"variable"), mutate, sum = sum(nrow), prob = nrow/sum)
    # set the column names
    colnames(bb) <- c("class.name",
                    "feature.name",
                    "feature.value",
                    "feature.nrow",
                    "feature.sum",
                    "prob")
    # return the result
    bb[,c(1,2,3,6)]
}
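Independently of plyr/reshape2, the conditional probabilities P{fi|c_i} that `feature_class_prob` produces are just row-wise proportions of a contingency table. A minimal base-R sketch on toy data (my example, not from the post):

```r
# P(humidity = v | play = c) as row-wise proportions of a contingency table
train <- data.frame(
    humidity = c("high", "high", "normal", "normal"),
    play     = c("no",   "yes",  "yes",    "yes")
)
cond <- prop.table(table(train$play, train$humidity), margin = 1)
print(cond)   # P(high|no) = 1, P(high|yes) = 1/3, P(normal|yes) = 2/3
```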

## The naive Bayes classifier is now complete.

## 2. Use the generated naive Bayes classifier for prediction
## Predict with the classifier built above, combining P{c_i} and P{fi|c_i}
## Input:  oneObs  data frame, the sample to predict: feature name | feature value
##         pc      data frame, the class probabilities P{c_i} in the training set D: class name | probability
##         pfc     data frame, the per-class feature-value probabilities P{fi|c_i}:
##                 class name | feature name | feature value | probability
## Output: data frame, the posterior probability of each class for the sample:
##         class name | posterior probability (column name: pre_prob)
pre_class <- function(oneObs, pc, pfc){
    colnames(oneObs) <- c("feature.name", "feature.value")
    colnames(pc) <- c("class.name", "prob")
    colnames(pfc) <- c("class.name", "feature.name", "feature.value", "prob")
    # look up the conditional probability of each observed feature value
    feature.all <- join(oneObs, pfc, by=c("feature.name","feature.value"), type="inner")
    # multiply the matched conditional probabilities together (prod is the product function)
    feature.prob <- ddply(feature.all, .(class.name), summarize, prob_fea=prod(prob))
    # look up the class probabilities
    class.all <- join(feature.prob, pc, by="class.name", type="inner")
    # output the result
    ddply(class.all, .(class.name), mutate, pre_prob=prob_fea*prob)[,c(1,4)]
}

## 3. Testing on data
## Use the apple data above as an example
# training set
train.apple <- data.frame(
    size   = c("","","","","",""),
    weight = c("","","","","",""),
    color  = c("","","","","",""),
    taste  = c("good","good","bad","bad","bad","good")
)
# sample to predict
oneObs <- data.frame(
    feature.name  = c("size", "weight", "color"),
    feature.value = c("","","")
)
# predict the class
pc <- class_prob(train.apple, "taste")
pfc <- feature_class_prob(train.apple, "taste")
pre_class(oneObs, pc, pfc)
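The decision rule inside `pre_class` is simply: for each class, multiply the prior by the matched conditional probabilities and pick the largest product. With made-up numbers (assumed purely for illustration):

```r
# hypothetical priors and conditional probabilities for a two-class problem
p_good <- 0.5
p_bad  <- 0.5
cond_good <- c(size = 0.4, weight = 0.6, color = 0.5)   # assumed P(f_i | good)
cond_bad  <- c(size = 0.7, weight = 0.3, color = 0.8)   # assumed P(f_i | bad)
post_good <- p_good * prod(cond_good)   # 0.5 * 0.4 * 0.6 * 0.5 = 0.06
post_bad  <- p_bad  * prod(cond_bad)    # 0.5 * 0.7 * 0.3 * 0.8 = 0.084
# the predicted class is the one with the larger unnormalised posterior
names(which.max(c(good = post_good, bad = post_bad)))   # "bad"
```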

The prediction result is:

class.name pre_prob
1 bad 0.07407407
2 good 0.03703704

So the predicted taste of this apple is: bad.
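One caveat this code inherits from its `type="inner"` join: a feature value that never co-occurs with a class in the training set either contributes probability 0 or is silently dropped from that class's product, which distorts the comparison. The standard remedy is Laplace (add-one) smoothing, sketched here with hypothetical counts (not implemented in the post's code):

```r
# Laplace (add-one) smoothing: no conditional probability is ever exactly 0
counts <- c(high = 3, normal = 0)      # hypothetical within-class value counts
unsmoothed <- counts / sum(counts)     # P(normal | class) = 0 zeroes the whole product
smoothed <- (counts + 1) / (sum(counts) + length(counts))
print(smoothed)                        # high = 4/5, normal = 1/5
```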

 

****************************************************************************************

Now let's use this method to predict on the data set from case 1.

# sample data set
# build a matrix first: data.frame() does not accept byrow/nrow/ncol/dimnames
data <- as.data.frame(matrix(c("sunny","hot","high","weak","no",
                 "sunny","hot","high","strong","no",
                 "overcast","hot","high","weak","yes",
                 "rain","mild","high","weak","yes",
                 "rain","cool","normal","weak","yes",
                 "rain","cool","normal","strong","no",
                 "overcast","cool","normal","strong","yes",
                 "sunny","mild","high","weak","no",
                 "sunny","cool","normal","weak","yes",
                 "rain","mild","normal","weak","yes",
                 "sunny","mild","normal","strong","yes",
                 "overcast","mild","high","strong","yes",
                 "overcast","hot","normal","weak","yes",
                 "rain","mild","high","strong","no"),
                 byrow = TRUE,
                 nrow = 14,
                 ncol = 5,
                 dimnames = list(NULL, c("outlook","temperature","humidity","wind","playtennis"))))

# sample to predict
ddata<-data.frame(
    feature.name =c("outlook", "temperature","humidity","wind"),
    feature.value =c("overcast","mild","normal","weak")
)


# predict the class
pc <- class_prob(data,"playtennis")
pfc <- feature_class_prob(data,"playtennis")
pre_class(ddata, pc, pfc)

The prediction result is:

class.name   pre_prob
1         no 0.02666667
2        yes 0.13168724

The prediction is yes, matching the result from case 1.
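Note that `pre_prob` is an unnormalised quantity, the prior times the product of conditional probabilities; dividing by the sum over classes turns the two printed values into posteriors that sum to 1:

```r
# normalise the unnormalised posteriors printed above
pre_prob <- c(no = 0.02666667, yes = 0.13168724)
posterior <- pre_prob / sum(pre_prob)
round(posterior, 4)   # no = 0.1684, yes = 0.8316
```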

 

