[Scikit-learn] 2.1 Clustering - Variational Bayesian Gaussian Mixture


The most important question: why does the Bayesian GMM fit better?

A passage in PRML explains this:

 

Ref: http://freemind.pluskid.org/machine-learning/deciding-the-number-of-clusterings/ 

The linked post also covers several other unsupervised clustering approaches.

 

From: http://scikit-learn.org/stable/modules/mixture.html#variational-bayesian-gaussian-mixture

Due to its Bayesian nature, the variational algorithm needs more hyperparameters than expectation-maximization, the most important of these being the concentration parameter weight_concentration_prior.

  • Specifying a low value for the concentration prior will make the model put most of the weight on a few components and set the remaining components' weights very close to zero.
  • High values of the concentration prior will allow a larger number of components to be active in the mixture.
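A minimal sketch of this contrast (the toy data and parameter values below are my own illustration, not taken from the scikit-learn documentation):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical toy data: two well-separated 2-D clusters.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + [6, 6]])

for prior in (0.01, 1000.0):      # low vs. high concentration prior
    model = BayesianGaussianMixture(
        n_components=5,
        weight_concentration_prior_type='dirichlet_distribution',
        weight_concentration_prior=prior,
        random_state=0,
    ).fit(X)
    # low prior -> a few dominant components; high prior -> weight spread over many
    print(prior, np.round(model.weights_, 3))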

 

The examples below compare Gaussian mixture models with a fixed number of components to variational Gaussian mixture models with a Dirichlet process prior. Here, a classical Gaussian mixture is fitted with 5 components on a dataset composed of 2 clusters.

We can see that the variational Gaussian mixture with a Dirichlet process prior is able to limit itself to only 2 components whereas the Gaussian mixture fits the data with a fixed number of components that has to be set a priori by the user. In this case the user has selected n_components=5 which does not match the true generative distribution of this toy dataset. Note that with very few observations, the variational Gaussian mixture models with a Dirichlet process prior can take a conservative stand, and fit only one component.
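Roughly what that comparison looks like in code (a sketch with made-up two-cluster data; the weights it prints will not match the numbers quoted below exactly):

import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

# Hypothetical two-cluster toy data.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(300, 2), rng.randn(300, 2) + [5, 0]])

gmm = GaussianMixture(n_components=5, random_state=1).fit(X)
bgmm = BayesianGaussianMixture(
    n_components=5,
    weight_concentration_prior_type='dirichlet_process',
    random_state=1,
).fit(X)

print('GMM weights:         ', np.round(gmm.weights_, 3))   # all 5 components carry weight
print('Bayesian GMM weights:', np.round(bgmm.weights_, 3))  # weight concentrates on ~2 components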

 

The Dirichlet distribution performs a kind of automatic component selection, keeping only the components that actually matter.

Weights of the 5 components, GMM:          [ 0.1258077   0.23638361  0.23330578  0.26361639  0.14088652]
Weights of the 5 components, Bayesian GMM: [ 0.001019    0.00101796  0.49948856  0.47955123  0.01892325]

 

Now the question:

Why does the Dirichlet prior push three of the component weights toward near zero while the plain GMM does not? Is it just a difference in convergence speed?

It should have nothing to do with speed. Once the prior is added, the posterior over the mixing weights is again a Dirichlet, so the parameter estimates inherit the Dirichlet's sparsity-inducing properties.
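A rough sketch of the mechanism, in my own notation (following the standard variational treatment, e.g. PRML Chapter 10, rather than anything specific to this post): with a symmetric Dirichlet prior $\mathrm{Dir}(\boldsymbol{\pi} \mid \alpha_0, \ldots, \alpha_0)$ on the mixing weights, the variational posterior over the weights is again a Dirichlet,

$$
q(\boldsymbol{\pi}) = \mathrm{Dir}(\boldsymbol{\pi} \mid \alpha_0 + N_1, \ldots, \alpha_0 + N_K),
\qquad
\mathbb{E}[\pi_k] = \frac{\alpha_0 + N_k}{K\alpha_0 + N},
$$

where $N_k$ is the expected number of points assigned to component $k$ and $N = \sum_k N_k$. A component that explains almost no data has $N_k \approx 0$, so for small $\alpha_0$ its expected weight is pushed toward zero, while a maximum-likelihood GMM has no such term and keeps every component alive.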

The raw data

Our data set will be the classic Old Faithful dataset
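One way the data variable could be prepared (the file name faithful.csv is a placeholder; any source with 'eruptions' and 'waiting' columns works):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical local copy of the Old Faithful data.
data = pd.read_csv('faithful.csv')   # columns: 'eruptions', 'waiting'
print(data.head())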

plt.scatter(data['eruptions'], data['waiting'], alpha=0.5)
plt.xlabel('eruptions')
plt.ylabel('waiting')

How do we fit the model?

from sklearn.mixture import BayesianGaussianMixture

mixture_model = BayesianGaussianMixture(
    n_components=10,
    random_state=5,  # control the pseudo-random initialization
    weight_concentration_prior_type='dirichlet_distribution',
    weight_concentration_prior=1.0,  # parameter of the Dirichlet component prior
    max_iter=200,  # choose this to be big in case it takes a long time to fit
)
mixture_model.fit(data)
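A quick check of what the fit produced (not in the original post): the learned weights show how many of the 10 components actually survive.

import numpy as np

print(np.round(mixture_model.weights_, 3))
# components with non-negligible weight are the "effective" clusters
print('effective components:', int(np.sum(mixture_model.weights_ > 0.01)))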

 

Ref: http://scikit-learn.org/stable/auto_examples/mixture/plot_concentration_prior.html

The routines below can be called directly:

 

import numpy as np
import matplotlib as mpl

def plot_ellipses(ax, weights, means, covars):
    """
    Given a list of mixture component weights, means, and covariances,
    plot ellipses to show the orientation and scale of the Gaussian
    mixture dispersal.
    """
    for n in range(means.shape[0]):
        # eigendecomposition of the n-th covariance matrix
        eig_vals, eig_vecs = np.linalg.eigh(covars[n])
        unit_eig_vec = eig_vecs[0] / np.linalg.norm(eig_vecs[0])
        angle = np.arctan2(unit_eig_vec[1], unit_eig_vec[0])
        angle = 180 * angle / np.pi                    # Ellipse needs degrees
        eig_vals = 2 * np.sqrt(2) * np.sqrt(eig_vals)  # scale eigenvalues to ellipse axes
        ell = mpl.patches.Ellipse(means[n], eig_vals[0], eig_vals[1],
                                  angle=180 + angle, edgecolor=None)
        ell2 = mpl.patches.Ellipse(means[n], eig_vals[0], eig_vals[1],
                                   angle=180 + angle, edgecolor='black',
                                   fill=False, linewidth=1)
        ell.set_clip_box(ax.bbox)
        ell2.set_clip_box(ax.bbox)
        ell.set_alpha(weights[n])                      # weight controls transparency
        ell.set_facecolor('#56B4E9')
        ax.add_artist(ell)
        ax.add_artist(ell2)

# Called from plot_results() below as:
#   plot_ellipses(ax1, model.weights_, model.means_, model.covariances_)

 

import matplotlib.pyplot as plt
from matplotlib import gridspec

def plot_results(model, x, y, title, plot_title=False):
    fig, ax = plt.subplots(3, 1, sharex=False)
    # ax above is unused; ax1 and ax2 are redefined below
    gs = gridspec.GridSpec(3, 1)            # custom sub-plot layout
    ax1 = plt.subplot(gs[0:2, 0])

    # the next four lines are standard boilerplate
    ax1.set_title(title)
    ax1.scatter(x, y, s=5, marker='o', alpha=0.8)
    ax1.set_xticks(())
    ax1.set_yticks(())

    n_components = model.get_params()['n_components']

    # ax1: draw the ellipses
    plot_ellipses(ax1, model.weights_, model.means_, model.covariances_)

    # ax2: draw the weights
    ax2 = plt.subplot(gs[2, 0])
    ax2.get_xaxis().set_tick_params(direction='out')
    ax2.yaxis.grid(True, alpha=0.7)
    for k, w in enumerate(model.weights_):
        ax2.bar(k, w, width=0.9, color='#56B4E9', zorder=3,
                align='center', edgecolor='black')
        ax2.text(k, w + 0.007, "%.1f%%" % (w * 100.),
                 horizontalalignment='center')
    ax2.set_xlim(-.6, n_components - .4)
    ax2.set_ylim(0., 1.1)
    ax2.tick_params(axis='y', which='both', left=False, right=False,
                    labelleft=False)
    ax2.tick_params(axis='x', which='both', top=False)
    if plot_title:
        ax1.set_ylabel('Estimated Mixtures')
        ax2.set_ylabel('Weight of each component')

plot_results(mixture_model, data['eruptions'], data['waiting'],
             'weight_concentration_prior={}'.format(1.0))

 

Watching the fitting process:

lower_bounds = []
mixture_model = BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    max_iter=1,                  # one EM step per call to fit()
    random_state=2,
    weight_concentration_prior_type='dirichlet_distribution',
    warm_start=True,             # keep parameters between successive fit() calls
)

# With max_iter=1 and warm_start=True, each fit() runs a single iteration.
for i in range(200):
    mixture_model.fit(data)
    if mixture_model.converged_:
        break
    lower_bounds.append(mixture_model.lower_bound_)
    if i % 5 == 0 and i < 60:
        plt.figure()
        plot_results(mixture_model, data['eruptions'], data['waiting'],
                     'EM step={}, lower_bound={}'.format(i, mixture_model.lower_bound_))

plt.figure()
plt.plot(lower_bounds)
plt.gca().set_xlabel('step')
plt.gca().set_ylabel('lower bound')
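As an aside (not from the original post), a similar view of the optimization can be had without the warm_start trick: BayesianGaussianMixture accepts verbose and verbose_interval, and with verbose=2 it prints the change in the lower bound as it fits.

BayesianGaussianMixture(
    n_components=10,
    covariance_type='full',
    weight_concentration_prior_type='dirichlet_distribution',
    max_iter=200,
    random_state=2,
    verbose=2,            # print progress during fitting
    verbose_interval=10,  # report every 10 iterations
).fit(data)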

 

 

 

 

The lower bound increases monotonically as the EM steps proceed.

 

Comparing the results of different initializations:

for seed in range(6, 11):
    lower_bounds = []
    mixture_model = BayesianGaussianMixture(
        n_components=10,
        covariance_type='full',
        max_iter=1,
        random_state=seed,
        weight_concentration_prior_type='dirichlet_distribution',
        warm_start=True,
    )
    for i in range(200):
        mixture_model.fit(data)
        if mixture_model.converged_:
            break
        lower_bounds.append(mixture_model.lower_bound_)
    plt.plot(lower_bounds)

plt.gca().set_xlabel('step')
plt.gca().set_ylabel('lower bound')
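Different seeds converge to different local optima of the lower bound. A simple way to choose among them (a sketch of the same idea that n_init implements internally) is to keep the run with the highest final lower bound:

best_seed, best_lb = None, float('-inf')
for seed in range(6, 11):
    model = BayesianGaussianMixture(
        n_components=10,
        covariance_type='full',
        weight_concentration_prior_type='dirichlet_distribution',
        max_iter=200,
        random_state=seed,
    ).fit(data)
    if model.lower_bound_ > best_lb:
        best_seed, best_lb = seed, model.lower_bound_
print('best seed:', best_seed, 'lower bound:', best_lb)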

Result: 

 

