The Naive Bayes Classifier


1. Bayes' Theorem

Suppose there are two events, A and B. The probability that A occurs is p(A), and the probability that B occurs is p(B). The probability that B occurs given that A has occurred is p(B|A), the probability that A occurs given that B has occurred is p(A|B), and the probability that A and B occur together is p(AB). Then:

p(AB) = p(A)p(B|A) = p(B)p(A|B)    (1)

From Eq. (1), Bayes' theorem follows:
p(B|A) = \frac{p(B)\,p(A|B)}{p(A)}    (2)

Given a partition \{B_1, B_2, \dots, B_n\} of the sample space, where B_i and B_j are disjoint for i \neq j, i.e. B_i \cap B_j = \emptyset, the law of total probability gives, for an event A:
p(A) = \sum_{i=1}^{n} p(B_i)\,p(A|B_i)    (3)

The generalized form of Bayes' theorem is then:
p(B_i|A) = \frac{p(B_i)\,p(A|B_i)}{\sum_{k=1}^{n} p(B_k)\,p(A|B_k)}    (4)
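
As a quick numeric illustration of Eqs. (3) and (4), here is a minimal sketch with made-up probabilities (the numbers are not from the original text):

# A minimal numeric sketch of Eqs. (3)-(4) with made-up probabilities.
# Suppose B1, B2, B3 partition the sample space.
p_B = [0.3, 0.5, 0.2]          # priors p(B_i)
p_A_given_B = [0.9, 0.4, 0.1]  # likelihoods p(A|B_i)

# Law of total probability, Eq. (3): p(A) = sum_i p(B_i) p(A|B_i)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))

# Bayes' theorem, Eq. (4): p(B_i|A) = p(B_i) p(A|B_i) / p(A)
posterior = [pb * pa / p_A for pb, pa in zip(p_B, p_A_given_B)]
print(p_A)        # 0.49
print(posterior)  # [0.551..., 0.408..., 0.040...]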

2. The Basic Principle of Naive Bayes

Given a training set \{(X_1,y_1), (X_2,y_2), (X_3,y_3), \dots, (X_m,y_m)\}, where m is the number of samples and each sample has n features, i.e. X_i = (x_{i1}, x_{i2}, \dots, x_{in}). The set of class labels is \{y_1, y_2, \dots, y_k\}. Let p(y = y_i | X = x) denote the probability that the output is y_i when the input sample X is x.
Suppose a new sample x is given and we must decide which class it belongs to. We can compute p(y=y_1|x), p(y=y_2|x), p(y=y_3|x), …, p(y=y_k|x) separately; whichever value is largest determines the class. In other words, we seek the maximum posterior probability \arg\max_y p(y|x).


How can these posterior probabilities be computed? By Bayes' theorem,

p(y=y_i|x) = \frac{p(y_i)\,p(x|y_i)}{p(x)}    (5)

The naive Bayes method assumes that the features are mutually independent, so Eq. (5) can be written as:
p(y=y_i|x) = \frac{p(y_i)\,p(x|y_i)}{p(x)} = \frac{p(y_i)\prod_{j=1}^{n} p(x_j|y_i)}{\prod_{j=1}^{n} p(x_j)}    (6)

Because the denominator of Eq. (6) is the same for every p(y=y_i|x), it can be omitted in practice. The naive Bayes decision rule thus takes the following form:
y = \arg\max_{y_i} p(y_i)\,p(x|y_i) = \arg\max_{y_i} p(y_i)\prod_{j=1}^{n} p(x_j|y_i)    (7)
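
Eq. (7) is simply an argmax over unnormalized class scores. A minimal sketch (the scores used here are the ones computed for Example 1 later in the article):

import numpy as np

# The decision rule of Eq. (7): pick the class with the largest
# unnormalized score p(y_i) * prod_j p(x_j | y_i).
classes = np.array([-1, 1])
scores = np.array([1/15, 1/45])  # scores for x = (2, S) in Example 1 below
y_hat = classes[np.argmax(scores)]
print(y_hat)  # -1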

The next sections show how to estimate p(y) and p(x|y) from the samples.

3. Parameter Estimation for Naive Bayes

3.1 Maximum Likelihood Estimation

In the naive Bayes method, learning means estimating the prior probability p(y) and the conditional probability p(x|y); the posterior probability p(y|x) of a new sample is then computed from them.

There are many ways to estimate the prior and conditional probabilities, for example maximum likelihood estimation and the multinomial, Gaussian, and Bernoulli models.
Under maximum likelihood estimation, the estimate of the prior probability p(y) is:

p(y = y_i) = \frac{\sum_{t=1}^{m} I(y^{(t)} = y_i)}{m}    (8)

where I(\cdot) is the indicator function and y^{(t)} denotes the label of the t-th training sample.

Suppose the set of all possible values of the j-th feature is \{a_{j1}, a_{j2}, \dots, a_{js_j}\}. Then the maximum likelihood estimate of the conditional probability p(x^{(j)}|y=y_i) is:
p(x^{(j)} = a_{jl} \mid y = y_i) = \frac{\sum_{t=1}^{m} I(x_t^{(j)} = a_{jl},\; y^{(t)} = y_i)}{\sum_{t=1}^{m} I(y^{(t)} = y_i)}    (9)

Example 1
This example comes from Li Hang's Statistical Learning Methods.
In the table, X^{(1)} and X^{(2)} are the features, with value sets A_1 = \{1,2,3\} and A_2 = \{S,M,L\} respectively; Y is the class label, with Y \in \{1, -1\}.

Determine the class label of x = (2, S).

The data are shown below; the values \{S, M, L\} of feature X^{(2)} are encoded as \{0, 1, 2\}.

import numpy as np
import pandas as pd

x1 = np.array([1,1,1,1,1,2,2,2,2,2,3,3,3,3,3])
x2 = np.array([0,1,1,0,0,0,1,1,2,2,2,1,1,2,2])
y = np.array([-1,-1,1,1,-1,-1,-1,1,1,1,1,1,1,1,-1])

dataSet = np.concatenate((x1[:,None],x2[:,None],y[:,None]),axis=1)

df = pd.DataFrame(dataSet,index=np.arange(1,16,1),columns=['X1','X2','y'])

df.T
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
X1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
X2 0 1 1 0 0 0 1 1 2 2 2 1 1 2 2
y -1 -1 1 1 -1 -1 -1 1 1 1 1 1 1 1 -1

Solution

Step 1: compute the prior probabilities
p(y = -1) = \frac{6}{15}, \quad p(y = 1) = \frac{9}{15}

Step 2: compute the conditional probabilities
(2.1) Feature X1:
p(X_1=1|y=-1) = \frac{3}{6} = \frac{1}{2}, \quad p(X_1=2|y=-1) = \frac{2}{6} = \frac{1}{3}, \quad p(X_1=3|y=-1) = \frac{1}{6}
p(X_1=1|y=1) = \frac{2}{9}, \quad p(X_1=2|y=1) = \frac{3}{9} = \frac{1}{3}, \quad p(X_1=3|y=1) = \frac{4}{9}

(2.2) Feature X2:
p(X_2=0|y=-1) = \frac{3}{6} = \frac{1}{2}, \quad p(X_2=1|y=-1) = \frac{2}{6} = \frac{1}{3}, \quad p(X_2=2|y=-1) = \frac{1}{6}
p(X_2=0|y=1) = \frac{1}{9}, \quad p(X_2=1|y=1) = \frac{4}{9}, \quad p(X_2=2|y=1) = \frac{4}{9}

Step 3: compute the unnormalized posteriors
p(y=-1)\,p(X=(2,S)|y=-1) = p(y=-1)\,p(X_1=2|y=-1)\,p(X_2=S|y=-1) = \frac{6}{15} \cdot \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{15}
p(y=1)\,p(X=(2,S)|y=1) = p(y=1)\,p(X_1=2|y=1)\,p(X_2=S|y=1) = \frac{9}{15} \cdot \frac{1}{3} \cdot \frac{1}{9} = \frac{1}{45}

Since 1/15 > 1/45, the class label of the sample is y = -1.
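
As a sanity check, the three steps above can be reproduced by direct counting on the x1, x2, y arrays defined earlier (a small verification sketch, not part of the original solution):

from collections import Counter

# Reproduce steps 1-3 for the query x = (2, S), i.e. (X1, X2) = (2, 0).
for c in (-1, 1):
    n_c = np.sum(y == c)
    prior = n_c / len(y)                 # step 1: p(y = c)
    p_x1 = Counter(x1[y == c])[2] / n_c  # step 2: p(X1 = 2 | y = c)
    p_x2 = Counter(x2[y == c])[0] / n_c  # step 2: p(X2 = 0 | y = c)
    print(c, prior * p_x1 * p_x2)        # step 3: -1 -> 1/15, 1 -> 1/45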

Below is a Python implementation of naive Bayes with maximum likelihood estimation; its output agrees with the hand computation above.

class MLENB:
    """ Maximum likelihood estimation Naive Bayes Attributes ---------- class_prior_ : array, shape (n_classes, ) Smoothed empirical probability for each class. class_count_: array, shape (n_classes,) number of training samples observed in each class. MLE_: array, shape(n_classes, n_features) Maximum likelihood estimation of each feature per class, each of element is a dict """

    def __init__(self):
        pass

    def fit(self,X,y):
        """Fit maximum likelihood estimation Naive Bayes according to X, y Parameters ---------- X : array-like, shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features. y : array-like, shape (n_samples,) Target values. Returns ------- self : object Returns self. """
        n_samples = X.shape[0]
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.class_count_ = np.empty(n_classes)
        self.class_prior_ = np.empty(n_classes)
        self.MLE_ = np.empty((n_classes,n_features),dtype=dict)

        self.target_unique = np.unique(y)
        for i in range(n_classes):
            dataX_tu = X[y == self.target_unique[i]]
            self.class_prior_[i] = dataX_tu.shape[0] / float(len(y))
            self.class_count_[i] = dataX_tu.shape[0]

            for j in range(n_features):
                feature = dataX_tu[:,j]
                feature_unique = np.unique(feature)
                fp = {}
                for f_item in feature_unique:
                    fp[f_item] = list(feature).count(f_item) / float(len(feature))
                self.MLE_[i,j] = fp

        return self

    def __predict_likelihood(self,x):
        if x.ndim == 1:
            x = np.array([x])
        n_samples = x.shape[0]
        n_features = x.shape[1]
        n_classes = len(self.class_count_)

        likelihood = []
        for x_item in x:
            class_p = []
            for i in range(n_classes):
                p = self.class_prior_[i]
                for j in range(n_features):
                    if x_item[j] in self.MLE_[i,j]:
                        p *= self.MLE_[i,j][x_item[j]]
                    else:
                        # Feature value never seen in this class: the MLE
                        # of its conditional probability is zero.
                        p = 0.0
                class_p.append(p)           
            likelihood.append(class_p)
        return np.array(likelihood)


    def predict(self,x):
        """Perform classification on an array of test vectors X. Parameters ---------- X : array-like, shape = [n_samples, n_features] Returns ------- C : array, shape = [n_samples] Predicted target values for X """

        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self,x):
        """ Return probability estimates for the test vector X. Parameters ---------- X : array-like, shape = [n_samples, n_features] Returns ------- C : array-like, shape = [n_samples, n_classes] Returns the probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute `classes_`. """
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])    
# Test results
X = dataSet[:,0:-1]
y = dataSet[:,-1]

mlenb = MLENB()
mlenb.fit(X,y)
print(mlenb.predict(np.array([2,0])))
print(mlenb.predict_proba(np.array([2,0])))

[-1]
[[ 0.75  0.25]]
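
For comparison, scikit-learn provides a categorical naive Bayes model. The following cross-check is an addition to the original text; it assumes scikit-learn >= 0.22 (where CategoricalNB was introduced) and uses a near-zero smoothing value to approximate the unsmoothed MLE above:

# Cross-check with scikit-learn (assumed: scikit-learn >= 0.22).
# An alpha close to 0 approximates unsmoothed maximum likelihood estimation.
from sklearn.naive_bayes import CategoricalNB

cnb = CategoricalNB(alpha=1e-10).fit(X, y)
print(cnb.predict([[2, 0]]))        # expected: [-1]
print(cnb.predict_proba([[2, 0]]))  # expected: approximately [[0.75, 0.25]]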

3.2 Multinomial Naive Bayes

With maximum likelihood estimation, some of the estimated probabilities can be zero.

A zero estimate distorts the posterior computation and biases the classification. In that case we can use the multinomial model, which applies smoothing to the prior and conditional probability estimates. The formulas are as follows.
The estimate of the prior probability p(y) is:

p(y = y_i) = \frac{\sum_{t=1}^{m} I(y^{(t)} = y_i) + \alpha}{m + k\alpha}    (10)

where k is the number of classes.

Suppose the set of all possible values of the j-th feature is \{a_{j1}, a_{j2}, \dots, a_{js_j}\}. Then the estimate of the conditional probability p(x^{(j)}|y=y_i) is:

p(x^{(j)} = a_{jl} \mid y = y_i) = \frac{\sum_{t=1}^{m} I(x_t^{(j)} = a_{jl},\; y^{(t)} = y_i) + \alpha}{\sum_{t=1}^{m} I(y^{(t)} = y_i) + s_j\alpha}    (11)

where s_j is the number of possible values of the j-th feature.

Here \alpha \geq 0 is the smoothing parameter. When \alpha = 1 this is Laplace smoothing; when \alpha = 0 it degenerates to maximum likelihood estimation; when 0 < \alpha < 1 it is called Lidstone smoothing.
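
To make Eqs. (10) and (11) concrete: applied to Example 1 with \alpha = 1 (Laplace smoothing), Eq. (10) gives p(y=-1) = \frac{6+1}{15+2\times 1} = \frac{7}{17} and p(y=1) = \frac{9+1}{15+2\times 1} = \frac{10}{17}, and Eq. (11) gives, for instance, p(X_1=2|y=-1) = \frac{2+1}{6+3\times 1} = \frac{1}{3}.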

An open question: how does multinomial naive Bayes differ from the Bayesian estimation described in Li Hang's Statistical Learning Methods? The method in this article follows Li Hang's Bayesian estimation.

A reference Python implementation of multinomial naive Bayes follows:

class MultinomialNB:
    """Naive Bayes classifier for multinomial models Attributes ---------- class_prior_ : array, shape (n_classes, ) Smoothed empirical probability for each class. class_count_: array, shape (n_classes,) number of training samples observed in each class. bayes_estimation_: array, shape(n_classes, n_features) bayes estimations of each feature per class, each of element is a dict """
    def __init__(self, alpha=1.0):
        self.alpha_ = alpha  # smoothing parameter

    def fit(self,X,y):
        n_samples = X.shape[0]
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.class_count_ = np.empty(n_classes)
        self.class_prior_ = np.empty(n_classes)
        self.bayes_estimation_ = np.empty((n_classes,n_features),dtype=dict)

        self.target_unique = np.unique(y)
        for i in range(n_classes):
            dataX_tu = X[y == self.target_unique[i]]
            self.class_prior_[i] = (dataX_tu.shape[0] + self.alpha_) / (float(len(y)) + n_classes * self.alpha_)
            self.class_count_[i] = dataX_tu.shape[0]

            for j in range(n_features):
                feature = dataX_tu[:,j]
                # Note: this uses the number of distinct values seen within
                # this class as s_j; Eq. (11) defines s_j as the number of
                # possible values of feature j over the whole training set.
                feature_unique = np.unique(feature)
                fp = {}
                for f_item in feature_unique:
                    fp[f_item] = (list(feature).count(f_item) + self.alpha_) / (float(len(feature)) + len(feature_unique) * self.alpha_)
                self.bayes_estimation_[i,j] = fp

        return self

    def __predict_likelihood(self,x):
        if x.ndim == 1:
            x = np.array([x])
        n_samples = x.shape[0]
        n_features = x.shape[1]
        n_classes = len(self.class_count_)

        likelihood = []
        for x_item in x:
            class_p = []
            for i in range(n_classes):
                p = self.class_prior_[i]
                for j in range(n_features):
                    if x_item[j] in self.bayes_estimation_[i,j]:
                        p *= self.bayes_estimation_[i,j][x_item[j]]
                    else:
                        # Feature value never seen in this class. A fully
                        # smoothed model would assign alpha / (count + s_j * alpha);
                        # this implementation simply assigns zero.
                        p = 0.0
                class_p.append(p)           
            likelihood.append(class_p)
        return np.array(likelihood)


    def predict(self,x):
        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self,x):
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])
# Test results
X = dataSet[:,0:-1]
y = dataSet[:,-1]

mnb = MultinomialNB()
mnb.fit(X,y)
print(mnb.predict(np.array([2,0])))
print(mnb.predict_proba(np.array([2,0])))
[-1]
[[ 0.65116279  0.34883721]]
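As a check, these printed probabilities agree with a hand computation under \alpha = 1: p(y=-1)\,p(X_1=2|y=-1)\,p(X_2=0|y=-1) = \frac{7}{17}\cdot\frac{1}{3}\cdot\frac{4}{9} = \frac{28}{459} \approx 0.0610, while p(y=1)\,p(X_1=2|y=1)\,p(X_2=0|y=1) = \frac{10}{17}\cdot\frac{1}{3}\cdot\frac{1}{6} = \frac{10}{306} \approx 0.0327. Normalizing, 0.0610 / (0.0610 + 0.0327) \approx 0.6512, matching the output above.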

3.3 Gaussian Naive Bayes

When the input features are continuous, the counting-based estimates above cannot be used for the prior and conditional probabilities; the Gaussian model can be used instead.

The Gaussian model assumes that, within each class, every feature follows a Gaussian distribution.
The likelihood of a feature is estimated as follows:

p(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^2}} \exp\left(-\frac{(x_i-\mu_{y,i})^2}{2\sigma_{y,i}^2}\right)    (12)

where \sigma_{y,i}^2 is the variance and \mu_{y,i} is the mean of the i-th feature over the samples of class y.
The corresponding Python code:

class GaussianNB:
    """ Attributes ---------- class_prior_ : array, shape (n_classes,) probability of each class. class_count_ : array, shape (n_classes,) number of training samples observed in each class. theta_ : array, shape (n_classes, n_features) mean of each feature per class sigma_ : array, shape (n_classes, n_features) variance of each feature per class """

    def __init__(self):
        pass

    def fit(self, X, y):
        n_samples = X.shape[0]
        n_features = X.shape[1]
        n_classes = len(set(y))

        self.theta_ = np.zeros([n_classes,n_features]) 
        self.sigma_ = np.zeros([n_classes,n_features]) 
        self.class_prior_ = np.zeros(n_classes)
        self.class_count_ = np.zeros(n_classes)

        self.target_unique = np.unique(y)    
        for i in range(n_classes):
            dataX_tu = X[y == self.target_unique[i]]
            self.class_prior_[i] = dataX_tu.shape[0] / float(len(y))
            self.class_count_[i] = dataX_tu.shape[0]
            self.theta_[i,:] = np.mean(dataX_tu,axis=0)
            self.sigma_[i,:] = np.var(dataX_tu,axis=0)

        return self

    def __predict_likelihood(self,x):
        if x.ndim == 1:
            x = np.array([x])

        n_samples = x.shape[0]
        likelihood = []
        for x_item in x:
            gaussian = np.exp(-(x_item-self.theta_)**2 / (2 * self.sigma_)) / np.sqrt(2*np.pi*self.sigma_)
            p = np.exp(np.sum(np.log(gaussian),axis=1))
            likelihood.append(self.class_prior_ * p)
        return np.array(likelihood)

    def predict(self,x):
        likelihood = self.__predict_likelihood(x)
        max_index = np.argmax(likelihood, axis=1)
        return np.array([self.target_unique[i] for i in max_index])

    def predict_proba(self,x):
        likelihood = self.__predict_likelihood(x)
        return np.array([lh / np.sum(lh) for lh in likelihood])
# Test results
X = dataSet[:,0:-1]
y = dataSet[:,-1]

gnb = GaussianNB()
gnb.fit(X,y)
print(gnb.predict(np.array([2,0])))
print(gnb.predict_proba(np.array([2,0])))
[-1]
[[ 0.74566865  0.25433135]]
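
As a further check (an addition, not part of the original text), scikit-learn's GaussianNB fits the same per-class Gaussians; it adds a small variance-smoothing term, so its probabilities may differ slightly in the last decimals:

# Cross-check with scikit-learn's GaussianNB (illustrative addition).
from sklearn.naive_bayes import GaussianNB as SklearnGaussianNB

sk_gnb = SklearnGaussianNB().fit(X, y)
print(sk_gnb.predict([[2, 0]]))        # expected: [-1]
print(sk_gnb.predict_proba([[2, 0]]))  # expected: approximately [[0.746, 0.254]]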

3.4 Bernoulli Naive Bayes
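
When every feature is binary (0/1), the Bernoulli model can be used. The standard class-conditional likelihood of feature j is

p(x^{(j)} | y = y_i) = p_{ij}\,x^{(j)} + (1 - p_{ij})(1 - x^{(j)})

where p_{ij} = p(x^{(j)} = 1 | y = y_i), typically estimated with smoothing as in Eq. (11) with s_j = 2. Below is a minimal sketch in the style of the classes above; the class name and structure are illustrative additions, not code from the original article:

class BernoulliNBSketch:
    """Bernoulli Naive Bayes (illustrative sketch; assumes binary features)."""

    def __init__(self, alpha=1.0):
        self.alpha_ = alpha  # smoothing parameter

    def fit(self, X, y):
        self.target_unique = np.unique(y)
        n_classes = len(self.target_unique)
        self.class_prior_ = np.empty(n_classes)
        self.feature_prob_ = np.empty((n_classes, X.shape[1]))
        for i, c in enumerate(self.target_unique):
            dataX_tu = X[y == c]
            self.class_prior_[i] = (dataX_tu.shape[0] + self.alpha_) / (len(y) + n_classes * self.alpha_)
            # Smoothed estimate of p(x_j = 1 | y = c); binary features mean
            # s_j = 2 in the denominator of Eq. (11).
            self.feature_prob_[i] = (dataX_tu.sum(axis=0) + self.alpha_) / (dataX_tu.shape[0] + 2 * self.alpha_)
        return self

    def predict(self, x):
        x = np.atleast_2d(x)
        likelihood = []
        for x_item in x:
            # Bernoulli likelihood p^x * (1-p)^(1-x), multiplied over features.
            p = np.prod(self.feature_prob_ ** x_item
                        * (1 - self.feature_prob_) ** (1 - x_item), axis=1)
            likelihood.append(self.class_prior_ * p)
        return self.target_unique[np.argmax(likelihood, axis=1)]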

4. Notes on Naive Bayes

  1. The counting-based models work only with categorical predictors; numerical predictors must be categorized or binned before use (or handled with the Gaussian model described in Section 3.3).
  2. Naive Bayes rests on the assumption of predictor independence, so it cannot detect or account for relationships between the predictors, unlike, for example, a decision tree.

