Restricted Boltzmann Machines (RBM)

This article is a repost; the original was posted on 2014-03-29 20:53.

1. Energy-Based Models (EBM)

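Energy-based models (EBM) associate a scalar energy with each configuration of the variables of interest. Learning amounts to modifying the energy function so that its shape has desirable properties; for example, plausible configurations should have low energy. Energy-based probabilistic models define a probability distribution through an energy function E(x):

$$p(x) = \frac{e^{-E(x)}}{Z} \qquad (1)$$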

Here the normalizing factor Z is called the partition function:

$$Z = \sum_x e^{-E(x)}$$
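As a minimal sketch of Eq. (1), the snippet below enumerates every configuration of a tiny discrete model to compute Z and p(x). The energy function `energy` and the two-bit state space are made up for illustration and do not come from the original article.

```python
import itertools
import math

def energy(x):
    """Hypothetical toy energy: configurations with equal bits get lower energy."""
    return 0.0 if x[0] == x[1] else 2.0

# All configurations of two binary variables.
states = list(itertools.product([0, 1], repeat=2))

# Partition function Z = sum_x exp(-E(x)).
Z = sum(math.exp(-energy(x)) for x in states)

# p(x) = exp(-E(x)) / Z for every configuration.
p = {x: math.exp(-energy(x)) / Z for x in states}
```

Brute-force enumeration like this is only feasible for very small state spaces, because the number of terms in Z grows exponentially with the number of variables.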

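An EBM can be trained by performing (stochastic) gradient descent on the empirical negative log-likelihood of the training data. This takes two steps: 1. define the log-likelihood; 2. define the loss function.

Log-likelihood, for a training set D of N examples and model parameters θ:

$$\mathcal{L}(\theta, \mathcal{D}) = \frac{1}{N} \sum_{x^{(i)} \in \mathcal{D}} \log p(x^{(i)})$$

The loss function is the negative log-likelihood:

$$\ell(\theta, \mathcal{D}) = -\mathcal{L}(\theta, \mathcal{D})$$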
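The loss can be evaluated directly for a model small enough to enumerate. The following is a sketch under assumed names: `neg_log_likelihood`, `make_energy`, the one-parameter energy and the four-example data set are all hypothetical.

```python
import math

def neg_log_likelihood(data, energy, states):
    """Average negative log-likelihood of `data` under p(x) = exp(-E(x)) / Z."""
    log_z = math.log(sum(math.exp(-energy(x)) for x in states))
    # -log p(x) = E(x) + log Z
    return sum(energy(x) + log_z for x in data) / len(data)

def make_energy(theta):
    """Hypothetical one-parameter energy over a single binary variable."""
    return lambda x: -theta * x

states = [0, 1]
data = [1, 1, 1, 0]  # made-up training set: three ones and one zero

# theta = 0 gives the uniform distribution; theta = log(3) gives p(1) = 3/4,
# which matches the empirical frequency of ones in `data`.
loss_uniform = neg_log_likelihood(data, make_energy(0.0), states)
loss_fitted = neg_log_likelihood(data, make_energy(math.log(3.0)), states)
```

The fitted parameter yields the lower loss, which is what gradient descent on the negative log-likelihood aims for.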

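2. EBMs with Hidden Units

In many cases we cannot observe every variable of an example, or we want to introduce unobserved variables to increase the expressive power of the model. We therefore split the variables into two parts: an observed (visible) part x and an unobserved (hidden) part h.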

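In this case, the probability of x is expressed as a marginal over h:

$$P(x) = \sum_h P(x, h) = \sum_h \frac{e^{-E(x, h)}}{Z} \qquad (2)$$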

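To bring this into the same form as Eq. (1), we introduce the notion of free energy:

$$F(x) = -\log \sum_h e^{-E(x, h)} \qquad (3)$$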

The probability can then be written as

$$P(x) = \frac{e^{-F(x)}}{Z}, \qquad Z = \sum_x e^{-F(x)}$$
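To check numerically that summing out the hidden part gives a distribution of the same form as Eq. (1), here is a small sketch. The joint energy `energy(x, h)` with one visible bit and two hidden bits is invented for the example.

```python
import itertools
import math

def energy(x, h):
    """Hypothetical joint energy over one visible bit x and two hidden bits h."""
    return -(0.5 * x + 1.0 * x * h[0] - 0.3 * h[1])

hidden_states = list(itertools.product([0, 1], repeat=2))
visible_states = [0, 1]

def free_energy(x):
    # F(x) = -log sum_h exp(-E(x, h))
    return -math.log(sum(math.exp(-energy(x, h)) for h in hidden_states))

# The partition function computed from the joint energy and from the
# free energy should agree.
Z_joint = sum(math.exp(-energy(x, h))
              for x in visible_states for h in hidden_states)
Z_free = sum(math.exp(-free_energy(x)) for x in visible_states)

# Marginal p(x) written in the same form as Eq. (1), with F in place of E.
p = {x: math.exp(-free_energy(x)) / Z_free for x in visible_states}
```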

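The gradient of the negative log-likelihood then takes a particularly interesting form:

$$-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta} \qquad (4)$$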

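This gradient has two parts, called the positive phase and the negative phase. The names refer to their effect on the probability defined by the model, not to the sign of each term: the positive phase raises the probability of the training data by lowering its free energy, while the negative phase lowers the probability of samples generated by the model.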
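For readers who want the intermediate step, the two phases follow from differentiating the log of the partition function. This is a short derivation sketch that uses only the definitions $p(x) = e^{-F(x)}/Z$ and $Z = \sum_{\tilde{x}} e^{-F(\tilde{x})}$:

$$
\begin{aligned}
-\log p(x) &= F(x) + \log Z \\
\frac{\partial \log Z}{\partial \theta}
  &= \frac{1}{Z} \sum_{\tilde{x}} e^{-F(\tilde{x})} \left(-\frac{\partial F(\tilde{x})}{\partial \theta}\right)
   = -\sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta} \\
-\frac{\partial \log p(x)}{\partial \theta}
  &= \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta}
\end{aligned}
$$

The first term depends only on the training example x; the second is an expectation under the model distribution.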

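Computing this gradient analytically is usually very difficult, because it requires $E_P\left[\partial F(x) / \partial \theta\right]$, an expectation over all possible configurations of x under the model distribution P.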

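The first step towards making this tractable is to estimate the expectation with a fixed number of samples. The samples used to estimate the negative-phase gradient are called negative particles, denoted N, and the gradient can be written as

$$-\frac{\partial \log p(x)}{\partial \theta} \approx \frac{\partial F(x)}{\partial \theta} - \frac{1}{|\mathcal{N}|} \sum_{\tilde{x} \in \mathcal{N}} \frac{\partial F(\tilde{x})}{\partial \theta} \qquad (5)$$

Ideally, the elements $\tilde{x}$ of N are sampled according to P, which makes this a Monte Carlo estimate.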

With the formula above the computation becomes almost practical. The only remaining question is how to obtain the negative particles N.
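The sample-based estimate can be illustrated on a model so small that the exact expectation is known. This is a sketch under stated assumptions: the one-variable model with F(x) = -theta * x, the value of `theta`, the sample count, and the training example are all hypothetical, and the negative particles are drawn from the exact model distribution, which is precisely the step that is hard for real models.

```python
import math
import random

theta = 0.8  # hypothetical parameter of a one-variable model, F(x) = -theta * x

def dF_dtheta(x):
    # Derivative of F(x) = -theta * x with respect to theta.
    return -float(x)

# Exact model probability of x = 1; available here only because the model is tiny.
p1 = math.exp(theta) / (1.0 + math.exp(theta))

rng = random.Random(0)
# Negative particles: samples drawn from the model distribution P.
negative_particles = [1 if rng.random() < p1 else 0 for _ in range(20000)]

x_train = 1  # a single made-up training example

# Exact gradient: positive phase minus the exact expectation under P.
exact_grad = dF_dtheta(x_train) - (p1 * dF_dtheta(1) + (1.0 - p1) * dF_dtheta(0))

# Approximate gradient: the expectation is replaced by an average
# over the negative particles.
approx_grad = dF_dtheta(x_train) - (
    sum(dF_dtheta(x) for x in negative_particles) / len(negative_particles))
```

With enough particles the two gradients agree closely; the quality of the estimate depends entirely on how well the particles follow P.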

Restricted Boltzmann Machines (RBM)

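The energy function of an RBM, with visible units v and hidden units h, is defined as:

$$E(v, h) = -b^\top v - c^\top h - h^\top W v \qquad (6)$$

where W holds the connection weights between hidden and visible units, and b and c are the biases of the visible and hidden layers respectively.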

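The free energy can then be written as:

$$F(v) = -b^\top v - \sum_i \log \sum_{h_i} e^{h_i (c_i + W_i v)}$$

where $W_i$ denotes the i-th row of W.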

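Because an RBM has no connections within a layer, the hidden units are conditionally independent given the visible units, and vice versa:

$$p(h \mid v) = \prod_i p(h_i \mid v), \qquad p(v \mid h) = \prod_j p(v_j \mid h)$$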

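RBMs with binary units

With binary units ($v_j, h_i \in \{0, 1\}$), the conditional distributions take the form of the usual sigmoid activation, where $W_i$ is the i-th row of W and $W_{\cdot j}$ its j-th column:

$$P(h_i = 1 \mid v) = \mathrm{sigm}(c_i + W_i v) \qquad (7)$$

$$P(v_j = 1 \mid h) = \mathrm{sigm}(b_j + W_{\cdot j}^\top h) \qquad (8)$$

The free energy simplifies further to:

$$F(v) = -b^\top v - \sum_i \log\left(1 + e^{c_i + W_i v}\right) \qquad (9)$$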
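The simplified free energy for binary units can be checked against a brute-force sum over all hidden configurations. The sketch below assumes an energy of the form E(v, h) = -b'v - c'h - h'Wv and stores the weights as `W[i][j]` (hidden unit i, visible unit j); the sizes and numeric values are arbitrary and chosen only for the example.

```python
import itertools
import math

# Hypothetical tiny RBM: 3 visible units, 2 hidden units.
W = [[0.5, -0.2, 0.1],   # W[i][j] couples hidden unit i with visible unit j
     [-0.3, 0.4, 0.2]]
b = [0.1, -0.1, 0.0]     # visible biases
c = [0.2, -0.4]          # hidden biases

def energy(v, h):
    # E(v, h) = -b'v - c'h - h'Wv
    bv = sum(bj * vj for bj, vj in zip(b, v))
    ch = sum(ci * hi for ci, hi in zip(c, h))
    hWv = sum(h[i] * W[i][j] * v[j]
              for i in range(len(h)) for j in range(len(v)))
    return -bv - ch - hWv

def free_energy_bruteforce(v):
    # F(v) = -log sum_h exp(-E(v, h)), enumerating all hidden configurations.
    hs = itertools.product([0, 1], repeat=len(c))
    return -math.log(sum(math.exp(-energy(v, h)) for h in hs))

def free_energy_closed_form(v):
    # F(v) = -b'v - sum_i log(1 + exp(c_i + W_i v))
    bv = sum(bj * vj for bj, vj in zip(b, v))
    pre = [c[i] + sum(W[i][j] * v[j] for j in range(len(v)))
           for i in range(len(c))]
    return -bv - sum(math.log1p(math.exp(a)) for a in pre)
```

The closed form costs one pass over the hidden units, whereas the brute-force version needs a sum over all 2^n hidden configurations.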

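Update equations with binary units

Differentiating the binary-unit free energy $F(v) = -b^\top v - \sum_i \log(1 + e^{c_i + W_i v})$ with respect to each parameter, and taking the model expectation over $\tilde{v} \sim P$ for the negative phase, gives the negative log-likelihood gradients for a training example v:

$$-\frac{\partial \log p(v)}{\partial W_{ij}} = E_{\tilde{v}}\left[\mathrm{sigm}(c_i + W_i \tilde{v})\, \tilde{v}_j\right] - \mathrm{sigm}(c_i + W_i v)\, v_j$$

$$-\frac{\partial \log p(v)}{\partial c_i} = E_{\tilde{v}}\left[\mathrm{sigm}(c_i + W_i \tilde{v})\right] - \mathrm{sigm}(c_i + W_i v)$$

$$-\frac{\partial \log p(v)}{\partial b_j} = E_{\tilde{v}}\left[\tilde{v}_j\right] - v_j \qquad (10)$$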

Sampling in an RBM

Samples of p(x) can be obtained by running a Markov chain to convergence, using Gibbs sampling as the transition operator.

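Gibbs sampling of the joint of N random variables $S = (S_1, \dots, S_N)$ proceeds through a sequence of N sampling sub-steps of the form $S_i \sim p(S_i \mid S_{-i})$, where $S_{-i}$ denotes the other N-1 variables in S.

In an RBM, the units within one layer are conditionally independent given the other layer, so the chain can alternate between sampling all hidden units given the visible units and all visible units given the hidden units. This can be depicted as:

$$v^{(0)} \rightarrow h^{(0)} \rightarrow v^{(1)} \rightarrow h^{(1)} \rightarrow \cdots \rightarrow v^{(t)} \rightarrow h^{(t)}$$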

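Running such a chain to convergence for every parameter update is very time-consuming, so a more efficient procedure is needed.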
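A plain-Python sketch of this alternating (block) Gibbs chain for binary units is shown below. It is not the Theano implementation from this article: the helper names, the `W[i][j]` layout (hidden unit i, visible unit j) and the use of `random.Random` are assumptions made for the example.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample_h_given_v(v, W, c, rng):
    # P(h_i = 1 | v) = sigm(c_i + W_i v); all hidden units are sampled as one block.
    probs = [sigmoid(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
             for i in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs]

def sample_v_given_h(h, W, b, rng):
    # P(v_j = 1 | h) = sigm(b_j + sum_i W_ij h_i); all visible units as one block.
    probs = [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
             for j in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs]

def gibbs_chain(v0, W, b, c, steps, rng):
    """Run `steps` full Gibbs steps (v -> h -> v) starting from v0."""
    v = list(v0)
    for _ in range(steps):
        h = sample_h_given_v(v, W, c, rng)
        v = sample_v_given_h(h, W, b, rng)
    return v
```

For instance, `gibbs_chain([0, 1], W, b, c, steps=1000, rng=random.Random(0))` would return one visible configuration after 1000 steps; how many steps are needed for convergence depends on the model and is not known in advance.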

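Contrastive Divergence (CD-k)

CD uses two tricks to speed up sampling:

Suitable initialization: the Markov chain is started from a training example rather than from a random state.

Stopping after k steps of Gibbs sampling instead of waiting for convergence. Usually k=1.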
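The two tricks can be put together in a short plain-Python sketch of one CD-k update for a binary RBM. This is an illustration, not the Theano code of this article: the function names, the learning rate `lr`, the `W[i][j]` layout (hidden unit i, visible unit j) and the toy training loop are all assumptions.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def hidden_probs(v, W, c):
    return [sigmoid(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
            for i in range(len(c))]

def visible_probs(h, W, b):
    return [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
            for j in range(len(b))]

def bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd_k_update(v_data, W, b, c, rng, k=1, lr=0.1):
    """Apply one CD-k update in place for a single training example."""
    # Trick 1: start the chain at the training example.
    v_neg = list(v_data)
    # Trick 2: run only k Gibbs steps instead of waiting for convergence.
    for _ in range(k):
        h = bernoulli(hidden_probs(v_neg, W, c), rng)
        v_neg = bernoulli(visible_probs(h, W, b), rng)
    ph_data = hidden_probs(v_data, W, c)  # positive-phase statistics
    ph_neg = hidden_probs(v_neg, W, c)    # negative-phase statistics
    # Gradient descent on the negative log-likelihood estimate.
    for i in range(len(c)):
        for j in range(len(b)):
            W[i][j] += lr * (ph_data[i] * v_data[j] - ph_neg[i] * v_neg[j])
        c[i] += lr * (ph_data[i] - ph_neg[i])
    for j in range(len(b)):
        b[j] += lr * (v_data[j] - v_neg[j])
    return v_neg

# Toy usage: 2 hidden and 2 visible units, trained on one repeated example.
rng = random.Random(0)
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
c = [0.0, 0.0]
for _ in range(200):
    v_neg = cd_k_update([1, 0], W, b, c, rng, k=1)
```

Since the only training example is `[1, 0]`, the visible bias of the first unit can only grow and that of the second can only shrink during this loop.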

Implementation

Constructing the RBM class

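import numpy
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class RBM(object):
  """Restricted Boltzmann Machine (RBM) """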
  def __init__(self, input=None, n_visible=784, n_hidden=500,
               W=None, hbias=None, vbias=None, numpy_rng=None,
               theano_rng=None):
      """
      RBM constructor. Defines the parameters of the model along with
      basic operations for inferring hidden from visible (and vice-versa),
      as well as for performing CD updates.

      :param input: None for standalone RBMs or symbolic variable if RBM is
      part of a larger graph.

      :param n_visible: number of visible units

      :param n_hidden: number of hidden units

      :param W: None for standalone RBMs or symbolic variable pointing to a
      shared weight matrix in case RBM is part of a DBN network; in a DBN,
      the weights are shared between RBMs and layers of a MLP

      :param hbias: None for standalone RBMs or symbolic variable pointing
      to a shared hidden units bias vector in case RBM is part of a
      different network

      :param vbias: None for standalone RBMs or a symbolic variable
      pointing to a shared visible units bias
      """

      self.n_visible = n_visible
      self.n_hidden = n_hidden


      if numpy_rng is None:
          # create a number generator
          numpy_rng = numpy.random.RandomState(1234)

      if theano_rng is None:
          theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

      if W is None:
          # W is initialized with `initial_W`, which is uniformly sampled
          # from -4*sqrt(6./(n_visible+n_hidden)) to 4*sqrt(6./(n_hidden+n_visible));
          # the output of uniform is converted using asarray to dtype
          # theano.config.floatX so that the code is runnable on GPU.
          # Drawing from numpy_rng (not the global numpy.random) keeps the
          # initialization reproducible for a given seed.
          initial_W = numpy.asarray(numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)),
                    dtype=theano.config.floatX)
          # theano shared variables for weights and biases
          W = theano.shared(value=initial_W, name='W')

      if hbias is None:
          # create shared variable for hidden units bias
          hbias = theano.shared(value=numpy.zeros(n_hidden,
                                dtype=theano.config.floatX), name='hbias')

      if vbias is None:
          # create shared variable for visible units bias
          vbias = theano.shared(value=numpy.zeros(n_visible,
                                dtype=theano.config.floatX), name='vbias')


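      # initialize input layer for standalone RBM or layer0 of DBN;
      # compare against None explicitly, and use T.matrix so the input
      # has dtype theano.config.floatX like the parameters above
      self.input = input if input is not None else T.matrix('input')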

      self.W = W
      self.hbias = hbias
      self.vbias = vbias
      self.theano_rng = theano_rng
      # **** WARNING: It is not a good idea to put things in this list
      # other than shared variables created in this function.
      self.params = [self.W, self.hbias, self.vbias]

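The next step is to define the functions that implement Eqs. (7) and (8), i.e. the conditional distributions of the hidden units given the visible units and of the visible units given the hidden units. In the code, `hbias` and `vbias` are the hidden and visible biases, and `W` has shape (n_visible, n_hidden).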

def propup(self, vis):
    ''' This function propagates the visible units activation upwards to
    the hidden units

    Note that we return also the pre_sigmoid_activation of the layer. As
    it will turn out later, due to how Theano deals with optimization and
    stability this symbolic variable will be needed to write down a more
    stable graph (see details in the reconstruction cost function)
    '''
    pre_sigmoid_activation = T.dot(vis, self.W) + self.hbias
    return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

def sample_h_given_v(self, v0_sample):
    ''' This function infers state of hidden units given visible units '''
    # compute the activation of the hidden units given a sample of the visibles
    pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
    # get a sample of the hiddens given their activation
    # Note that theano_rng.binomial returns a symbolic sample of dtype
    # int64 by default. If we want to keep our computations in floatX
    # for the GPU we need to specify to return the dtype floatX
    h1_sample = self.theano_rng.binomial(size=h1_mean.shape, n=1, p=h1_mean,
                                         dtype=theano.config.floatX)
    return [pre_sigmoid_h1, h1_mean, h1_sample]

def propdown(self, hid):
    '''This function propagates the hidden units activation downwards to
    the visible units

    Note that we return also the pre_sigmoid_activation of the layer. As
    it will turn out later, due to how Theano deals with optimization and
    stability this symbolic variable will be needed to write down a more
    stable graph (see details in the reconstruction cost function)
    '''
    pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
    return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

def sample_v_given_h(self, h0_sample):
    ''' This function infers state of visible units given hidden units '''
    # compute the activation of the visible given the hidden sample
    pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
    # get a sample of the visible given their activation
    # Note that theano_rng.binomial returns a symbolic sample of dtype
    # int64 by default. If we want to keep our computations in floatX
    # for the GPU we need to specify to return the dtype floatX
    v1_sample = self.theano_rng.binomial(size=v1_mean.shape, n=1, p=v1_mean,
                                         dtype=theano.config.floatX)
    return [pre_sigmoid_v1, v1_mean, v1_sample]
