Naive Bayes Classifier


1. Bayes' Theorem

Consider two events A and B. Let $p(A)$ be the probability that A occurs, $p(B)$ the probability that B occurs, $p(B|A)$ the probability that B occurs given that A has occurred, $p(A|B)$ the probability that A occurs given that B has occurred, and $p(AB)$ the probability that A and B occur together. Then

$p(AB) = p(A)\,p(B|A) = p(B)\,p(A|B)$    (1)

From equation (1), Bayes' theorem follows:
$p(B|A) = \dfrac{p(B)\,p(A|B)}{p(A)}$    (2)
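
As a quick numeric sanity check of equations (1) and (2) (the probability values below are made up purely for illustration):

p_A, p_B = 0.5, 0.3          # assumed p(A), p(B)
p_A_given_B = 0.4            # assumed p(A|B)

# equation (2): p(B|A) = p(B) * p(A|B) / p(A)
p_B_given_A = p_B * p_A_given_B / p_A
print(p_B_given_A)           # 0.24

# equation (1): both factorizations give the same joint probability p(AB)
assert abs(p_A * p_B_given_A - p_B * p_A_given_B) < 1e-12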

Given a partition $\{B_1, B_2, \dots, B_n\}$ of the sample space, where $B_i$ and $B_j$ are pairwise disjoint, i.e. $B_i \cap B_j = \varnothing$ for $i \neq j$, the law of total probability states that for any event A,
$p(A) = \sum_{i=1}^{n} p(B_i)\,p(A|B_i)$    (3)

The general form of Bayes' theorem is then
$p(B_i|A) = \dfrac{p(B_i)\,p(A|B_i)}{\sum_{i=1}^{n} p(B_i)\,p(A|B_i)}$    (4)
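
A short sketch of equations (3) and (4) over an assumed three-event partition (all numbers below are made up for illustration):

import numpy as np

# assumed partition priors p(B_i) and likelihoods p(A|B_i)
p_B = np.array([0.2, 0.5, 0.3])          # sums to 1
p_A_given_B = np.array([0.6, 0.1, 0.4])

p_A = np.sum(p_B * p_A_given_B)          # equation (3): law of total probability
p_B_given_A = p_B * p_A_given_B / p_A    # equation (4): posterior over the partition
print(p_A, p_B_given_A)                  # 0.29 [0.4138 0.1724 0.4138]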

2. Basic Principle of Naive Bayes

Given a training data set $\{(X_1,y_1),(X_2,y_2),(X_3,y_3),\dots,(X_m,y_m)\}$, where $m$ is the number of samples and each sample has $n$ features, i.e. $X_i = (x_{i1}, x_{i2}, \dots, x_{in})$. The set of class labels is $\{y_1, y_2, \dots, y_k\}$. Let $p(y=y_i \mid X=x)$ denote the probability that the output is $y_i$ when the input sample $X$ is $x$.
Now, given a new sample $x$, to decide which class it belongs to we can compute $p(y=y_1|x)$, $p(y=y_2|x)$, $p(y=y_3|x)$, …, $p(y=y_k|x)$; the sample belongs to the class with the largest value. That is, we seek the maximum posterior probability $\arg\max_y p(y|x)$.


How can these posterior probabilities be computed? By Bayes' theorem,

$p(y=y_i|x) = \dfrac{p(y_i)\,p(x|y_i)}{p(x)}$    (5)

In general, the naive Bayes method assumes that the features are mutually independent, so equation (5) can be written as:
$p(y=y_i|x) = \dfrac{p(y_i)\,p(x|y_i)}{p(x)} = \dfrac{p(y_i)\prod_{j=1}^{n} p(x_j|y_i)}{\prod_{j=1}^{n} p(x_j)}$    (6)

Since the denominator of equation (6) is the same for every $p(y=y_i|x)$, it can be dropped in practice. The naive Bayes decision rule therefore takes the following form:
$y = \arg\max_{y_i} p(y_i)\,p(x|y_i) = \arg\max_{y_i} p(y_i)\prod_{j=1}^{n} p(x_j|y_i)$    (7)
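
Equation (7) in code: with the class priors and the per-feature conditional probabilities of a sample in hand (the values below are assumed for illustration), prediction is a product followed by an argmax:

import numpy as np

priors = np.array([0.4, 0.6])      # assumed p(y_1), p(y_2)
cond = np.array([[0.3, 0.5],       # assumed p(x_1|y_1), p(x_2|y_1)
                 [0.2, 0.1]])      # assumed p(x_1|y_2), p(x_2|y_2)

scores = priors * np.prod(cond, axis=1)   # p(y_i) * prod_j p(x_j|y_i)
print(scores, np.argmax(scores))          # [0.06 0.012] 0 -> class y_1 wins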

The following sections show how to estimate $p(y)$ and $p(x|y)$ from the training samples.

3. Parameter Estimation for Naive Bayes

3.1 Maximum Likelihood Estimation

In the naive Bayes method, learning means estimating the prior probability $p(y)$ and the conditional probability $p(x|y)$, and then using them to compute the posterior probability $p(y|x)$ of a new sample.

There are many ways to estimate the prior and conditional probabilities, such as maximum likelihood estimation and the multinomial, Gaussian, and Bernoulli models.
Under maximum likelihood estimation, the estimate of the prior probability $p(y)$ is:

$p(y=y_i) = \dfrac{\sum_{t=1}^{m} I(y_t = y_i)}{m}$    (8)

where $I(\cdot)$ is the indicator function, so the numerator counts the training samples whose label is $y_i$.

Suppose the set of all possible values of the $j$-th feature is $\{a_{j1}, a_{j2}, \dots, a_{js_j}\}$. Then the maximum likelihood estimate of the conditional probability $p(x^{(j)}|y=y_i)$ is:
$p(x^{(j)}=a_{jl} \mid y=y_i) = \dfrac{\sum_{t=1}^{m} I(x_t^{(j)}=a_{jl},\, y_t=y_i)}{\sum_{t=1}^{m} I(y_t=y_i)}$    (9)

that is, the fraction of the samples with label $y_i$ whose $j$-th feature equals $a_{jl}$.
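
Equations (8) and (9) are just relative frequencies; here is a minimal sketch using collections.Counter on toy data (labels and feature values assumed for illustration):

from collections import Counter

y = [1, 1, -1, 1, -1]             # toy labels
x = ['S', 'M', 'S', 'S', 'M']     # toy values of one categorical feature

# equation (8): p(y=1) = count(y==1) / m
p_y1 = Counter(y)[1] / len(y)                                                # 3/5

# equation (9): p(x='S' | y=1) = count(x=='S' and y==1) / count(y==1)
p_S_y1 = sum(xi == 'S' and yi == 1 for xi, yi in zip(x, y)) / Counter(y)[1]  # 2/3
print(p_y1, p_S_y1)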

Example 1
This example comes from Li Hang's Statistical Learning Methods.
In the table, $X^{(1)}$ and $X^{(2)}$ are features, whose value sets are $A_1=\{1,2,3\}$ and $A_2=\{S,M,L\}$ respectively. $Y$ is the class label, with $Y \in \{1, -1\}$.

Find the class label of $x = (2, S)$.

The data are shown below, where the values $\{S,M,L\}$ of feature $X^{(2)}$ are encoded as $\{0,1,2\}$.

import numpy as np
import pandas as pd

x1 = np.array([1,1,1,1,1,2,2,2,2,2,3,3,3,3,3])
x2 = np.array([0,1,1,0,0,0,1,1,2,2,2,1,1,2,2])
y = np.array([-1,-1,1,1,-1,-1,-1,1,1,1,1,1,1,1,-1])

dataSet = np.concatenate((x1[:,None],x2[:,None],y[:,None]),axis=1)

df = pd.DataFrame(dataSet,index=np.arange(1,16,1),columns=['X1','X2','y'])

df.T
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
X1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
X2 0 1 1 0 0 0 1 1 2 2 2 1 1 2 2
y -1 -1 1 1 -1 -1 -1 1 1 1 1 1 1 1 -1

Solution

Step 1: compute the prior probabilities
$p(y=-1)=\frac{6}{15}$, $p(y=1)=\frac{9}{15}$

Step 2: compute the conditional probabilities
(2.1) Feature $X_1$:
$p(X_1=1|y=-1)=\frac{3}{6}=\frac{1}{2}$, $p(X_1=2|y=-1)=\frac{2}{6}=\frac{1}{3}$, $p(X_1=3|y=-1)=\frac{1}{6}$
$p(X_1=1|y=1)=\frac{2}{9}$, $p(X_1=2|y=1)=\frac{3}{9}=\frac{1}{3}$, $p(X_1=3|y=1)=\frac{4}{9}$

(2.2) Feature $X_2$:
$p(X_2=0|y=-1)=\frac{3}{6}=\frac{1}{2}$, $p(X_2=1|y=-1)=\frac{2}{6}=\frac{1}{3}$, $p(X_2=2|y=-1)=\frac{1}{6}$
$p(X_2=0|y=1)=\frac{1}{9}$, $p(X_2=1|y=1)=\frac{4}{9}$, $p(X_2=2|y=1)=\frac{4}{9}$

Step 3: compute the posteriors
$p(y=-1)\,p(X=(2,S)|y=-1) = p(y=-1)\,p(X_1=2|y=-1)\,p(X_2=S|y=-1) = \frac{6}{15}\cdot\frac{1}{3}\cdot\frac{1}{2} = \frac{1}{15}$
$p(y=1)\,p(X=(2,S)|y=1) = p(y=1)\,p(X_1=2|y=1)\,p(X_2=S|y=1) = \frac{9}{15}\cdot\frac{1}{3}\cdot\frac{1}{9} = \frac{1}{45}$

Since $\frac{1}{15} > \frac{1}{45}$, the class label of the sample is $-1$.

Below is Python code for maximum likelihood naive Bayes; its output matches the hand calculation above.

class MLENB:
    """ Maximum likelihood estimation Naive Bayes Attributes ---------- class_prior_ : array, shape (n_classes, ) Smoothed empirical probability for each class. class_count_: array, shape (n_classes,) number of training samples observed in each class. MLE_: array, shape(n_classes, n_features) Maximum likelihood estimation of each feature per class, each of element is a dict """

    def __init__(self):
        pass

    def fit(self,X,y):
        """Fit maximum likelihood estimation Naive Bayes according to X, y Parameters ---------- X : array-like, shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape (n_samples,) Target values. Returns ------- self : object Returns self. """
        n_samples = X.shape[0]
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.class_count_ = np.empty(n_classes)
        self.class_prior_ = np.empty(n_classes)
        self.MLE_ = np.empty((n_classes,n_features),dtype=dict)

        self.target_unique = np.unique(y)
        for i in range(n_classes):
            dataX_tu = X[y == self.target_unique[i]]
            self.class_prior_[i] = dataX_tu.shape[0] / float(len(y))
            self.class_count_[i] = dataX_tu.shape[0]

            for j in range(n_features):
                feature = dataX_tu[:,j]
                feature_unique = np.unique(feature)
                fp = {}
                for f_item in feature_unique:
                    fp[f_item] = list(feature).count(f_item) / float(len(feature))
                self.MLE_[i,j] = fp

        return self

    def __predict_likelihood(self,x):
        if x.ndim == 1:
            x = np.array([x])
        n_samples = x.shape[0]
        n_features = x.shape[1]
        n_classes = len(self.class_count_)

        likelihood = []
        for x_item in x:
            class_p = []
            for i in range(n_classes):
                p = self.class_prior_[i]
                for j in range(n_features):
                    if x_item[j] in self.MLE_[i,j]:
                        p *= self.MLE_[i,j][x_item[j]]
                    else:
                        p = 0.0  # feature value unseen in this class: MLE probability is zero
                class_p.append(p)           
            likelihood.append(class_p)
        return np.array(likelihood)


    def predict(self,x):
        """Perform classification on an array of test vectors X. Parameters ---------- X : array-like, shape = [n_samples, n_features] Returns ------- C : array, shape = [n_samples] Predicted target values for X """

        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self,x):
        """ Return probability estimates for the test vector X. Parameters ---------- X : array-like, shape = [n_samples, n_features] Returns ------- C : array-like, shape = [n_samples, n_classes] Returns the probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute `classes_`. """
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])    
# Test results
X = dataSet[:,0:-1]
y = dataSet[:,-1]

mlenb = MLENB()
mlenb.fit(X,y)
print(mlenb.predict(np.array([2,0])))
print(mlenb.predict_proba(np.array([2,0])))

[-1]
[[ 0.75  0.25]]

3.2 Multinomial Naive Bayes

Maximum likelihood estimation may yield probability estimates that are exactly zero.

This distorts the posterior computation and biases the classification. In that case, the multinomial model can be used to smooth the prior and conditional probabilities. The formulas are as follows.
The estimate of the prior probability $p(y)$ is:
先驗概率 p(y) 的預計例如以下:

$p(y=y_i) = \dfrac{\sum_{t=1}^{m} I(y_t=y_i) + \alpha}{m + k\alpha}$    (10)

where $k$ is the number of classes.

Suppose the set of all possible values of the $j$-th feature is $\{a_{j1}, a_{j2}, \dots, a_{js_j}\}$. Then the estimate of the conditional probability $p(x^{(j)}|y=y_i)$ is:

$p(x^{(j)}=a_{jl} \mid y=y_i) = \dfrac{\sum_{t=1}^{m} I(x_t^{(j)}=a_{jl},\, y_t=y_i) + \alpha}{\sum_{t=1}^{m} I(y_t=y_i) + s_j\alpha}$    (11)

Here $\alpha$ is the smoothing parameter. When $\alpha=1$, this is Laplace smoothing; when $\alpha=0$, it degenerates to maximum likelihood estimation; when $0<\alpha<1$, it is called Lidstone smoothing.
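
A quick illustration of how $\alpha$ affects a zero count (toy counts, assumed: a value never observed in a class of 6 samples, with $s_j = 3$ possible values):

count, class_count, s_j = 0, 6, 3     # assumed toy counts

for alpha in (0.0, 0.5, 1.0):         # MLE, Lidstone, Laplace
    p = (count + alpha) / (class_count + s_j * alpha)
    print(alpha, p)                   # 0.0 -> 0.0, 0.5 -> 0.0667, 1.0 -> 0.1111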

An open question: how does multinomial naive Bayes differ from the Bayesian estimation described in Li Hang's Statistical Learning Methods? The method in this article follows Li Hang's Bayesian estimation.

Reference Python code for multinomial naive Bayes:

class MultinomialNB:
    """Naive Bayes classifier for multinomial models Attributes ---------- class_prior_ : array, shape (n_classes, ) Smoothed empirical probability for each class. class_count_: array, shape (n_classes,) number of training samples observed in each class. bayes_estimation_: array, shape(n_classes, n_features) bayes estimations of each feature per class, each of element is a dict """
    def __init__(self, alpha=1.0):
        self.alpha_ = alpha

    def fit(self,X,y):
        n_samples = X.shape[0]
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.class_count_ = np.empty(n_classes)
        self.class_prior_ = np.empty(n_classes)
        self.bayes_estimation_ = np.empty((n_classes,n_features),dtype=dict)

        self.target_unique = np.unique(y)
        for i in range(n_classes):
            dataX_tu = X[y == self.target_unique[i]]
            self.class_prior_[i] = (dataX_tu.shape[0] + self.alpha_) / (float(len(y)) + n_classes * self.alpha_)
            self.class_count_[i] = dataX_tu.shape[0]

            for j in range(n_features):
                feature = dataX_tu[:,j]
                feature_unique = np.unique(feature)
                fp = {}
                for f_item in feature_unique:
                    fp[f_item] = (list(feature).count(f_item) + self.alpha_) / (float(len(feature)) + len(feature_unique) * self.alpha_)
                self.bayes_estimation_[i,j] = fp

        return self

    def __predict_likelihood(self,x):
        if x.ndim == 1:
            x = np.array([x])
        n_samples = x.shape[0]
        n_features = x.shape[1]
        n_classes = len(self.class_count_)

        likelihood = []
        for x_item in x:
            class_p = []
            for i in range(n_classes):
                p = self.class_prior_[i]
                for j in range(n_features):
                    if x_item[j] in self.bayes_estimation_[i,j]:
                        p *= self.bayes_estimation_[i,j][x_item[j]]
                    else:
                        p = 0.0  # value never seen in this class: falls back to zero (not smoothed) here
                class_p.append(p)           
            likelihood.append(class_p)
        return np.array(likelihood)


    def predict(self,x):
        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self,x):
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])
# Test results
X = dataSet[:,0:-1]
y = dataSet[:,-1]

mnb = MultinomialNB()
mnb.fit(X,y)
print(mnb.predict(np.array([2,0])))
print(mnb.predict_proba(np.array([2,0])))
[-1]
[[ 0.65116279  0.34883721]]
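
For reference, scikit-learn's CategoricalNB implements the same per-feature categorical model with Laplace smoothing (this assumes scikit-learn >= 0.22 is installed). Its details differ slightly from the code above, e.g. the class prior is not smoothed, so predict_proba will not match exactly, though the predicted label agrees here:

# cross-check with scikit-learn (assumed available in the environment)
from sklearn.naive_bayes import CategoricalNB

cnb = CategoricalNB(alpha=1.0)           # Laplace smoothing on the conditionals
cnb.fit(X, y)
print(cnb.predict(np.array([[2, 0]])))   # [-1], same label as above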

3.3 Gaussian Naive Bayes

When the input features are continuous, the counting methods above cannot be used to estimate the conditional probabilities; a Gaussian model can be used instead.

The Gaussian model assumes that each feature follows a Gaussian distribution.
The likelihood of a feature is then estimated as:

$p(x_i|y) = \dfrac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\dfrac{(x_i-\mu_y)^2}{2\sigma_y^2}\right)$    (12)

where $\sigma_y^2$ and $\mu_y$ are the variance and the mean of the $i$-th feature within class $y$.
The Python code is as follows:

class GaussianNB:
    """ Attributes ---------- class_prior_ : array, shape (n_classes,) probability of each class. class_count_ : array, shape (n_classes,) number of training samples observed in each class. theta_ : array, shape (n_classes, n_features) mean of each feature per class sigma_ : array, shape (n_classes, n_features) variance of each feature per class """

    def __init__(self):
        pass

    def fit(self, X, y):
        n_samples = X.shape[0]
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.theta_ = np.zeros([n_classes,n_features]) 
        self.sigma_ = np.zeros([n_classes,n_features]) 
        self.class_prior_ = np.zeros(n_classes)
        self.class_count_ = np.zeros(n_classes)

        self.target_unique = np.unique(y)    
        for i in range(n_classes):
            dataX_tu = X[y == self.target_unique[i]]
            self.class_prior_[i] = dataX_tu.shape[0] / float(len(y))
            self.class_count_[i] = dataX_tu.shape[0]
            self.theta_[i,:] = np.mean(dataX_tu,axis=0)
            self.sigma_[i,:] = np.var(dataX_tu,axis=0)

        return self

    def __predict_likelihood(self,x):
        if x.ndim == 1:
            x = np.array([x])

        n_samples = x.shape[0]
        likelihood = []
        for x_item in x:
            # per-feature Gaussian densities, combined in log space
            gaussian = np.exp(-(x_item-self.theta_)**2 / (2 * self.sigma_)) / np.sqrt(2*np.pi*self.sigma_)
            p = np.exp(np.sum(np.log(gaussian),axis=1))
            likelihood.append(self.class_prior_ * p)
        return np.array(likelihood)

    def predict(self,x):
        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self,x):
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])
# Test results
X = dataSet[:,0:-1]
y = dataSet[:,-1]

gnb = GaussianNB()
gnb.fit(X,y)
print(gnb.predict(np.array([2,0])))
print(gnb.predict_proba(np.array([2,0])))
[-1]
[[ 0.74566865  0.25433135]]
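
As a sanity check (assuming SciPy is available), the per-feature Gaussian likelihoods of equation (12) can be recomputed with scipy.stats.norm and combined by hand; the normalized scores match predict_proba above:

from scipy.stats import norm

x_new = np.array([2, 0])
# per-feature Gaussian densities for each class, from the fitted means and variances
pdf = norm.pdf(x_new, loc=gnb.theta_, scale=np.sqrt(gnb.sigma_))
scores = gnb.class_prior_ * np.prod(pdf, axis=1)
print(scores / scores.sum())   # matches predict_proba above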

3.4 Bernoulli Naive Bayes
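
The Bernoulli model applies when every feature is binary (0/1). The conditional probability of the $j$-th feature given class $y_i$ is

$p(x^{(j)} \mid y=y_i) = p_{ij}\,x^{(j)} + (1-p_{ij})(1-x^{(j)})$    (13)

where $p_{ij}$ is the (smoothed) probability that feature $j$ equals 1 within class $y_i$. Unlike the multinomial model, the absence of a feature ($x^{(j)}=0$) also contributes to the likelihood. Below is a minimal sketch in the style of the classes above, assuming binary 0/1 features and Laplace smoothing with the same $\alpha$ convention as in section 3.2:

import numpy as np

class BernoulliNB:
    """Naive Bayes for binary (0/1) features, with Laplace smoothing.

    A sketch only: it assumes every feature value is 0 or 1.
    """

    def __init__(self, alpha=1.0):
        self.alpha_ = alpha

    def fit(self, X, y):
        self.target_unique = np.unique(y)
        n_classes = len(self.target_unique)
        self.class_prior_ = np.empty(n_classes)
        self.feature_prob_ = np.empty((n_classes, X.shape[1]))   # p(x_j = 1 | y_i)
        for i, c in enumerate(self.target_unique):
            dataX_tu = X[y == c]
            self.class_prior_[i] = (dataX_tu.shape[0] + self.alpha_) / (len(y) + n_classes * self.alpha_)
            # smoothed estimate of p(x_j = 1 | y_i); each binary feature has 2 possible values
            self.feature_prob_[i] = (dataX_tu.sum(axis=0) + self.alpha_) / (dataX_tu.shape[0] + 2 * self.alpha_)
        return self

    def predict(self, x):
        if x.ndim == 1:
            x = np.array([x])
        # Bernoulli likelihood p*x + (1-p)*(1-x), multiplied over features (equation 13)
        likelihood = np.array([self.class_prior_ *
                               np.prod(self.feature_prob_ * xi + (1 - self.feature_prob_) * (1 - xi), axis=1)
                               for xi in x])
        return self.target_unique[np.argmax(likelihood, axis=1)]

Note that the example data in this article is not binary ($X_1 \in \{1,2,3\}$, $X_2 \in \{0,1,2\}$), so this model cannot be applied to it directly; the features would first have to be converted to binary indicators.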

4. Notes on Naive Bayes

  1. The counting-based models work only with categorical predictors; numerical predictors must be categorized or binned before use (or modeled with a continuous likelihood, as in the Gaussian model of section 3.3).
  2. It rests on the assumption of predictor independence, and thus cannot detect or account for relationships between the predictors, unlike, for example, a decision tree.

