Basic Assumptions of Naive Bayes
- The training data are generated independently and identically distributed (i.i.d.) according to the joint distribution $P\left( {X,Y} \right)$
- Conditional independence assumption (given the class, the features are mutually independent): \[P\left( {X = x|Y = {c_k}} \right) = P\left( {{X^{\left( 1 \right)}} = {x^{\left( 1 \right)}},{X^{\left( 2 \right)}} = {x^{\left( 2 \right)}}, \ldots ,{X^{\left( n \right)}} = {x^{\left( n \right)}}|Y = {c_k}} \right) = \prod\limits_{j = 1}^n {P\left( {{X^{\left( j \right)}} = {x^{\left( j \right)}}|Y = {c_k}} \right)} \]
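The practical payoff of this assumption is a drastic reduction in the number of parameters to estimate: instead of one joint conditional distribution over all feature combinations per class, only per-feature marginals are needed. A minimal sketch of the parameter count (the function names are mine, not from the source):

```python
def n_params_full(n_features, n_values, n_classes):
    # Without the independence assumption: one joint distribution over all
    # n_values ** n_features feature combinations, per class (minus 1 because
    # each distribution sums to 1).
    return n_classes * (n_values ** n_features - 1)

def n_params_naive(n_features, n_values, n_classes):
    # With conditional independence: one marginal per feature, per class.
    return n_classes * n_features * (n_values - 1)

# 10 binary features, 2 classes: exponential vs. linear growth.
print(n_params_full(10, 2, 2))   # 2046
print(n_params_naive(10, 2, 2))  # 20
```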
Idea of the Algorithm
For a given input $x$, the learned model computes the posterior probability distribution $P\left( {Y = {c_k}|X = x} \right)$ and assigns $x$ to the class with the largest posterior probability. The posterior is computed by Bayes' theorem: \[P\left( {Y = {c_k}|X = x} \right) = \frac{{P\left( {Y = {c_k}} \right)\prod\limits_j {P\left( {{X^{\left( j \right)}} = {x^{\left( j \right)}}|Y = {c_k}} \right)} }}{{\sum\limits_i {P\left( {Y = {c_i}} \right)\prod\limits_j {P\left( {{X^{\left( j \right)}} = {x^{\left( j \right)}}|Y = {c_i}} \right)} } }}\]
The naive Bayes classifier can therefore be written as: \[y = \arg {\max _{{c_k}}}P\left( {Y = {c_k}|X = x} \right) = \arg {\max _{{c_k}}}\frac{{P\left( {Y = {c_k}} \right)\prod\limits_j {P\left( {{X^{\left( j \right)}} = {x^{\left( j \right)}}|Y = {c_k}} \right)} }}{{\sum\limits_i {P\left( {Y = {c_i}} \right)\prod\limits_j {P\left( {{X^{\left( j \right)}} = {x^{\left( j \right)}}|Y = {c_i}} \right)} } }}\]
Since the denominator is the same for every class $c_k$, this is equivalent to: \[y = \arg {\max _{{c_k}}}P\left( {Y = {c_k}} \right)\prod\limits_j {P\left( {{X^{\left( j \right)}} = {x^{\left( j \right)}}|Y = {c_k}} \right)} \]
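The classifier above amounts to counting: estimate $P(Y=c_k)$ and each $P(X^{(j)}=x^{(j)}|Y=c_k)$ from frequencies, then take the class maximizing the product. A minimal sketch under these assumptions (the `train`/`predict` names and the toy data are hypothetical, not from the source; no smoothing is applied):

```python
from collections import Counter, defaultdict

def train(samples):
    """Estimate P(Y=c) and P(X^(j)=v | Y=c) by maximum likelihood (counting).

    samples: list of (feature_tuple, class_label) pairs.
    """
    class_counts = Counter(y for _, y in samples)
    cond_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for x, y in samples:
        for j, v in enumerate(x):
            cond_counts[(y, j)][v] += 1
    n = len(samples)
    priors = {c: cnt / n for c, cnt in class_counts.items()}
    likelihood = {
        key: {v: cnt / sum(counter.values()) for v, cnt in counter.items()}
        for key, counter in cond_counts.items()
    }
    return priors, likelihood

def predict(priors, likelihood, x):
    """Return argmax_c P(Y=c) * prod_j P(X^(j)=x^(j) | Y=c)."""
    best, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for j, v in enumerate(x):
            # Unseen (class, feature, value) combinations get probability 0
            # under plain maximum likelihood.
            score *= likelihood[(c, j)].get(v, 0.0)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy data: two discrete features, two classes.
samples = [((0, 0), 'A'), ((0, 1), 'A'), ((0, 0), 'A'),
           ((1, 1), 'B'), ((1, 0), 'B'), ((1, 1), 'B')]
priors, likelihood = train(samples)
print(predict(priors, likelihood, (0, 1)))  # A
```

Note that plain maximum likelihood assigns probability 0 to feature values never seen with a class, which zeroes out the whole product; the parameter-estimation section that follows is where smoothing addresses this.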
The naive Bayes method assigns an instance to the class with the largest posterior probability. This is equivalent to minimizing the expected risk when the loss function is the 0-1 loss.
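The equivalence can be shown in one step (a standard derivation, sketched here rather than taken from the source). Under the 0-1 loss $L\left( {Y,f\left( X \right)} \right) = I\left( {Y \ne f\left( X \right)} \right)$, the expected risk is \[{R_{\exp }}\left( f \right) = {E_X}\sum\limits_k {L\left( {{c_k},f\left( X \right)} \right)P\left( {{c_k}|X} \right)}  = {E_X}\left[ {1 - P\left( {f\left( X \right)|X} \right)} \right]\] Minimizing this pointwise for each $x$ gives \[f\left( x \right) = \arg {\min _y}\sum\limits_k {L\left( {{c_k},y} \right)P\left( {{c_k}|X = x} \right)}  = \arg {\max _{{c_k}}}P\left( {{c_k}|X = x} \right)\] which is exactly the maximum a posteriori rule above.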
Parameter Estimation