jrae is a Java implementation of the algorithm proposed in the paper "Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions": a semi-supervised recursive autoencoder used to predict sentiment distributions. See the paper for details; the code is available at https://github.com/sancha/jrae.
Bird's-eye view
Training flow in the main function
// Train the neural-network weights from the parameters and data
FineTunableTheta tunedTheta = rae.train(params);
tunedTheta.Dump(params.ModelFile);
System.out.println("RAE trained. The model file is saved in "
    + params.ModelFile);

// Feature extractor
RAEFeatureExtractor fe = new RAEFeatureExtractor(params.EmbeddingSize,
    tunedTheta, params.AlphaCat, params.Beta, params.CatSize,
    params.Dataset.Vocab.size(), rae.f);

// Extract the training data for the classifier
List<LabeledDatum<Double, Integer>> classifierTrainingData = fe
    .extractFeaturesIntoArray(params.Dataset, params.Dataset.Data,
        params.TreeDumpDir);

// Measure training accuracy
SoftmaxClassifier<Double, Integer> classifier = new SoftmaxClassifier<Double, Integer>();
Accuracy TrainAccuracy = classifier.train(classifierTrainingData);
System.out.println("Train Accuracy :" + TrainAccuracy.toString());
 
        
Several important interfaces and their implementation classes
1. Minimizer<T extends DifferentiableFunction>
public interface Minimizer<T extends DifferentiableFunction> {
  /**
   * Attempts to find an unconstrained minimum of the objective
   * <code>function</code> starting at <code>initial</code>, within
   * <code>functionTolerance</code>.
   *
   * @param function          the objective function
   * @param functionTolerance a <code>double</code> value
   * @param initial           an initial feasible point
   * @return Unconstrained minimum of function
   */
  double[] minimize(T function, double functionTolerance, double[] initial);
  double[] minimize(T function, double functionTolerance, double[] initial, int maxIterations);
}
 
        As the comment says, this interface attempts to find an unconstrained minimum of a given objective function. The objective must be differentiable everywhere, i.e. it must implement the DifferentiableFunction interface. functionTolerance is the convergence tolerance, initial is the starting point, and maxIterations is the maximum number of iterations. A toy implementation sketch follows the two interface definitions below.
public interface DifferentiableFunction extends Function {
  double[] derivativeAt(double[] x);
}
public interface Function {
  int dimension();
  double valueAt(double[] x);
}
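To make the contract concrete, here is a minimal toy sketch (not part of jrae) that implements the two interfaces above for f(x) = Σ x_i², whose gradient is 2x:

// Toy objective for illustration only: f(x) = sum_i x_i^2, gradient 2x.
public class QuadraticFunction implements DifferentiableFunction {
  private final int dim;

  public QuadraticFunction(int dim) { this.dim = dim; }

  @Override
  public int dimension() { return dim; }

  @Override
  public double valueAt(double[] x) {
    double sum = 0;
    for (double xi : x)
      sum += xi * xi;
    return sum;
  }

  @Override
  public double[] derivativeAt(double[] x) {
    double[] grad = new double[dim];
    for (int i = 0; i < dim; i++)
      grad[i] = 2 * x[i];
    return grad;
  }
}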
 
        The QNMinimizer class implements this interface and optimizes the objective function with the L-BFGS algorithm. Its class comment is reproduced below:
/**
 * This code is part of the Stanford NLP Toolkit.
 *
 * An implementation of L-BFGS for Quasi Newton unconstrained minimization.
 *
 * The general outline of the algorithm is taken from:
 * <blockquote><i>Numerical Optimization</i> (second edition) 2006
 * Jorge Nocedal and Stephen J. Wright</blockquote>
 * A variety of different options are available.
 *
 * <h3>LINESEARCHES</h3>
 *
 * BACKTRACKING: This routine simply starts with a guess for step size of 1. If
 * the step size doesn't supply a sufficient decrease in the function value the
 * step is updated through step = 0.1*step. This method is certainly simpler,
 * but doesn't allow for an increase in step size, and isn't well suited for
 * Quasi Newton methods.
 *
 * MINPACK: This routine is based off of the implementation used in MINPACK.
 * This routine finds a point satisfying the Wolfe conditions, which state that
 * a point must have a sufficiently smaller function value, and a gradient of
 * smaller magnitude. This provides enough to prove theoretically quadratic
 * convergence. In order to find such a point the linesearch first finds an
 * interval which must contain a satisfying point, and then progressively
 * reduces that interval all using cubic or quadratic interpolation.
 *
 * SCALING: L-BFGS allows the initial guess at the hessian to be updated at each
 * step. Standard BFGS does this by approximating the hessian as a scaled
 * identity matrix. To use this method set the scaleOpt to SCALAR. A better way
 * of approximate the hessian is by using a scaling diagonal matrix. The
 * diagonal can then be updated as more information comes in. This method can be
 * used by setting scaleOpt to DIAGONAL.
 *
 * CONVERGENCE: Previously convergence was gauged by looking at the average
 * decrease per step dividing that by the current value and terminating when
 * that value became smaller than TOL. This method fails when the function
 * value approaches zero, so two other convergence criteria are used. The first
 * stores the initial gradient norm |g0|, then terminates when the new gradient
 * norm, |g|, is sufficiently smaller: i.e., |g| < eps*|g0|. The second checks
 * if |g| < eps*max( 1 , |x| ), which is essentially checking to see if the
 * gradient is numerically zero.
 *
 * Each of these convergence criteria can be turned on or off by setting the
 * flags:
 * <blockquote><code>
 * private boolean useAveImprovement = true;
 * private boolean useRelativeNorm = true;
 * private boolean useNumericalZero = true;
 * </code></blockquote>
 *
 * To use the QNMinimizer first construct it using
 * <blockquote><code>
 * QNMinimizer qn = new QNMinimizer(mem, true)
 * </code></blockquote>
 * mem - the number of previous estimate vector pairs to store, generally 15
 * is plenty. true - this tells the QN to use the MINPACK linesearch with
 * DIAGONAL scaling. false would lead to the use of the criteria used in the
 * old QNMinimizer class.
 */
OK. You can refer to my earlier articles for the principles behind the L-BFGS algorithm; this class implements that algorithm, with a few modifications to some of the details. I will skip the concrete implementation for now and come back to it later.
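Below is a minimal usage sketch, assuming the toy QuadraticFunction defined earlier and the constructor described in the class comment (mem = 15, MINPACK line search with DIAGONAL scaling); the exact overloads in the real class may differ:

// Sketch only: constructor arguments follow the class comment above.
QNMinimizer qn = new QNMinimizer(15, true);    // mem = 15, use MINPACK line search
DifferentiableFunction f = new QuadraticFunction(3);
double[] initial = { 1.0, -2.0, 0.5 };
double[] xmin = qn.minimize(f, 1e-6, initial); // should converge near the zero vector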
2. DifferentiableFunction
DifferentiableFunction was already defined above and corresponds to a differentiable function. The abstract class MemoizedDifferentiableFunction implements this interface and encapsulates some common code:
public abstract class MemoizedDifferentiableFunction implements DifferentiableFunction {
	protected double[] prevQuery, gradient;
	protected double value;
	protected int evalCount;
	
	protected void initPrevQuery()
	{
		prevQuery = new double[ dimension() ];
	}
	
	protected boolean requiresEvaluation(double[] x)
	{
		if(DoubleArrays.equals(x,prevQuery))
			return false;
		
		System.arraycopy(x, 0, prevQuery, 0, x.length);
		evalCount++;	
		return true;
	}
	
	@Override
	public double[] derivativeAt(double[] x){
		if(DoubleArrays.equals(x,prevQuery))
			return gradient;
		valueAt(x);
		return gradient;
	}
}
 
        The shared behaviour is: the previous query point is cached, so if the same point is queried again the cached result is returned directly; the number of evaluations is counted; and the differentiation flow is implemented by first calling valueAt to obtain the current value $f(x)$ and then returning the gradient (derivative). valueAt is left to subclasses, with the convention that while computing $f(x)$ a subclass also computes $f'(x)$ and stores it in the gradient field, as in the sketch below.
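As a hypothetical illustration of that convention (not one of the real subclasses), a cost whose valueAt computes the gradient as a side effect could look like this:

// Hypothetical subclass, for illustration only: f(x) = ||x||^2,
// gradient 2x stored during valueAt, as the base class expects.
public class QuadraticCost extends MemoizedDifferentiableFunction {
  private final int dim;

  public QuadraticCost(int dim) {
    this.dim = dim;
    initPrevQuery();              // allocate the prevQuery cache
  }

  @Override
  public int dimension() { return dim; }

  @Override
  public double valueAt(double[] x) {
    if (!requiresEvaluation(x))   // x equals the previous query: return cached value
      return value;
    gradient = new double[dim];
    double sum = 0;
    for (int i = 0; i < dim; i++) {
      sum += x[i] * x[i];
      gradient[i] = 2 * x[i];     // gradient computed alongside the value
    }
    value = sum;
    return value;
  }
}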
Its two concrete subclasses are RAECost and SoftmaxCost.
SoftmaxCost computes, for the given samples, the error of a given set of weights; the derivative gives the gradient along which that error decreases. It corresponds to a two-layer network whose input layer is the features and whose output layer is the label, with softmax as the transfer function.
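As a generic reminder of the math involved (a sketch only, not the actual SoftmaxCost code), the softmax probabilities, cross-entropy loss, and weight gradient for a single example can be computed like this:

// Generic softmax + cross-entropy sketch for one example; not jrae code.
// probs[k] = exp(W_k . x) / sum_j exp(W_j . x);  loss = -log(probs[y]);
// gradient w.r.t. W_k = (probs[k] - [k == y]) * x
static double softmaxLoss(double[][] W, double[] x, int y, double[][] gradW) {
  int K = W.length, d = x.length;
  double[] scores = new double[K];
  double max = Double.NEGATIVE_INFINITY;
  for (int k = 0; k < K; k++) {
    for (int i = 0; i < d; i++)
      scores[k] += W[k][i] * x[i];
    max = Math.max(max, scores[k]);
  }
  double Z = 0;
  for (int k = 0; k < K; k++)
    Z += Math.exp(scores[k] - max);            // subtract max for numerical stability
  double loss = 0;
  for (int k = 0; k < K; k++) {
    double p = Math.exp(scores[k] - max) / Z;  // predicted probability of class k
    if (k == y)
      loss = -Math.log(p);
    for (int i = 0; i < d; i++)
      gradW[k][i] = (p - (k == y ? 1 : 0)) * x[i];
  }
  return loss;
}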
RAECost computes, for the given samples, the error of a given set of weights, where the error is the sum of the error of building the recursive tree and the label-classification error; the derivative gives the gradient, which is likewise the sum of the two gradients.
When the Minimizer interface is invoked for optimization, the first argument passed in is this RAECost object; once the optimization finishes, training is complete.
References:
http://www.socher.org/index.php/Main/Semi-SupervisedRecursiveAutoencodersForPredictingSentimentDistributions
