Following the previous post, "Neural Networks: The Backpropagation Algorithm", this post implements a BP (backpropagation) neural network in Java. Matrix operations use the jblas library. Functionality will then be added step by step: support for parallel computation, then for adjusting the input vectors, and finally for the L-BFGS learning algorithm.
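For readers who have not used jblas before, here is a minimal sketch of the handful of matrix operations the code below relies on (creating a DoubleMatrix, matrix multiplication, broadcasting a bias column vector); the shapes and values are made up purely for illustration.

import org.jblas.DoubleMatrix;

public class JblasDemo {
    public static void main(String[] args) {
        // 3x2 weight matrix with uniform random entries in [0, 1)
        DoubleMatrix w = DoubleMatrix.rand(3, 2);
        // one 2-dimensional input sample as a column vector
        DoubleMatrix x = new DoubleMatrix(new double[] { 1.0, 2.0 });
        // bias column vector, one entry per output node
        DoubleMatrix b = DoubleMatrix.zeros(3, 1);
        // matrix product plus bias, broadcast across columns
        DoubleMatrix z = w.mmul(x).addColumnVector(b);
        System.out.println(z);
    }
}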
God said, let there be a neural network, and there was a neural network. God also said the network shall have nodes, weights, activation functions, an output function, a cost function, and perhaps an accuracy function as well; and thus the neural network was complete:
import java.util.ArrayList;
import java.util.List;

import org.jblas.DoubleMatrix;

public class Net {
    List<DoubleMatrix> weights = new ArrayList<DoubleMatrix>();
    List<DoubleMatrix> bs = new ArrayList<>();
    List<ScalarDifferentiableFunction> activations = new ArrayList<>();
    CostFunctionFactory costFunc;
    CostFunctionFactory accuracyFunc;
    int[] nodesNum;
    int layersNum;

    public Net(int[] nodesNum, ScalarDifferentiableFunction[] activations, CostFunctionFactory costFunc) {
        super();
        this.initNet(nodesNum, activations);
        this.costFunc = costFunc;
        this.layersNum = nodesNum.length - 1;
    }

    public Net(int[] nodesNum, ScalarDifferentiableFunction[] activations, CostFunctionFactory costFunc,
            CostFunctionFactory accuracyFunc) {
        this(nodesNum, activations, costFunc);
        this.accuracyFunc = accuracyFunc;
    }

    public void resetNet() {
        // A bare toArray() returns Object[] and would fail a cast to
        // ScalarDifferentiableFunction[] at runtime, so pass a typed array.
        this.initNet(nodesNum, this.activations.toArray(new ScalarDifferentiableFunction[0]));
    }

    private void initNet(int[] nodesNum, ScalarDifferentiableFunction[] activations) {
        assert (nodesNum != null && activations != null
                && nodesNum.length == activations.length + 1 && nodesNum.length > 1);
        this.nodesNum = nodesNum;
        this.weights.clear();
        this.bs.clear();
        this.activations.clear();
        for (int i = 0; i < nodesNum.length - 1; i++) {
            // columns == number of inputs; rows == number of outputs.
            int columns = nodesNum[i];
            int rows = nodesNum[i + 1];
            // symmetric uniform initialization in [-r1, r1]
            double r1 = Math.sqrt(6) / Math.sqrt(rows + columns + 1);
            // W
            DoubleMatrix weight = DoubleMatrix.rand(rows, columns).muli(2 * r1).subi(r1);
            weights.add(weight);
            // b
            DoubleMatrix b = DoubleMatrix.zeros(rows, 1);
            bs.add(b);
            // activations
            this.activations.add(activations[i]);
        }
    }
}
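The ScalarDifferentiableFunction and CostFunctionFactory types are not listed in this post. As a rough idea of what an activation might look like, here is a hypothetical Sigmoid, assuming the interface exposes valueAt(x) and derivativeAt(x), which is how the propagation code below calls it.

// Hypothetical activation; assumes ScalarDifferentiableFunction declares
// valueAt(double) and derivativeAt(double), matching how Propagation uses it.
public class Sigmoid implements ScalarDifferentiableFunction {
    @Override
    public double valueAt(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    @Override
    public double derivativeAt(double x) {
        double y = valueAt(x);
        // derivative expressed through the function value: s(x) * (1 - s(x))
        return y * (1 - y);
    }
}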
Having finished creating the neural network, God went to rest. Man said, I want to use the neural network: I will compute each layer's output with forward propagation, then adjust the network's state with backward propagation, and in the end it shall tell me which way the prey lies and why the flowers smell so sweet.
import java.util.Collections;

import org.jblas.DoubleMatrix;

public class Propagation {

    Net net;

    public Propagation(Net net) {
        super();
        this.net = net;
    }

    // Forward pass over multiple samples (one sample per column).
    public ForwardResult forward(DoubleMatrix input) {
        ForwardResult result = new ForwardResult();
        result.input = input;
        DoubleMatrix currentResult = input;
        int index = -1;
        for (DoubleMatrix weight : net.weights) {
            index++;
            DoubleMatrix b = net.bs.get(index);
            final ScalarDifferentiableFunction activation = net.activations.get(index);
            currentResult = weight.mmul(currentResult).addColumnVector(b);
            result.netResult.add(currentResult);

            // Derivative of the activation, evaluated at the pre-activation values.
            DoubleMatrix derivative = MatrixUtil.applyNewElements(
                    new ScalarFunction() {
                        @Override
                        public double valueAt(double x) {
                            return activation.derivativeAt(x);
                        }
                    }, currentResult);

            currentResult = MatrixUtil.applyNewElements(activation, currentResult);
            result.finalResult.add(currentResult);
            result.derivativeResult.add(derivative);
        }
        result.netResult = null; // no longer needed
        return result;
    }

    // Backward pass; gradients are averaged over all samples.
    public BackwardResult backward(DoubleMatrix target, ForwardResult forwardResult) {
        BackwardResult result = new BackwardResult();
        DoubleMatrix cost = DoubleMatrix.zeros(1, target.columns);
        DoubleMatrix output = forwardResult.finalResult.get(forwardResult.finalResult.size() - 1);
        DoubleMatrix outputDelta = DoubleMatrix.zeros(output.rows, output.columns);
        DoubleMatrix outputDerivative = forwardResult.derivativeResult
                .get(forwardResult.derivativeResult.size() - 1);
        DoubleMatrix accuracy = null;
        if (net.accuracyFunc != null) {
            accuracy = DoubleMatrix.zeros(1, target.columns);
        }
        for (int i = 0; i < target.columns; i++) {
            CostFunction costFunc = net.costFunc.create(target.getColumn(i).toArray());
            cost.put(i, costFunc.valueAt(output.getColumn(i).toArray()));

            DoubleMatrix column1 = new DoubleMatrix(costFunc.derivativeAt(output.getColumn(i).toArray()));
            DoubleMatrix column2 = outputDerivative.getColumn(i);
            outputDelta.putColumn(i, column1.muli(column2));

            if (net.accuracyFunc != null) {
                CostFunction accuracyFunc = net.accuracyFunc.create(target.getColumn(i).toArray());
                accuracy.put(i, accuracyFunc.valueAt(output.getColumn(i).toArray()));
            }
        }
        result.deltas.add(outputDelta);
        result.cost = cost;
        result.accuracy = accuracy;

        for (int i = net.layersNum - 1; i >= 0; i--) {
            DoubleMatrix pdelta = result.deltas.get(result.deltas.size() - 1);

            // Weight gradient, averaged over all samples.
            DoubleMatrix layerInput = i == 0 ? forwardResult.input : forwardResult.finalResult.get(i - 1);
            DoubleMatrix gradient = pdelta.mmul(layerInput.transpose()).div(target.columns);
            result.gradients.add(gradient);

            // Bias gradient.
            result.biasGradients.add(pdelta.rowMeans());

            // Delta of the previous layer. For i == 0 this is the input-layer error,
            // i.e. the gradient for adjusting the input itself, and is not averaged.
            DoubleMatrix delta = net.weights.get(i).transpose().mmul(pdelta);
            if (i > 0)
                delta = delta.muli(forwardResult.derivativeResult.get(i - 1));
            result.deltas.add(delta);
        }
        Collections.reverse(result.gradients);
        Collections.reverse(result.biasGradients);

        // Only the input delta is kept; the other deltas are no longer needed.
        DoubleMatrix inputDeltas = result.deltas.get(result.deltas.size() - 1);
        result.deltas.clear();
        result.deltas.add(inputDeltas);
        return result;
    }

    public Net getNet() {
        return net;
    }
}
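To see how these pieces fit together before the Trainer and Learner appear, here is a rough usage sketch. It assumes the hypothetical Sigmoid above and a hypothetical SquaredError implementing CostFunctionFactory (neither is part of the listed code), and applies a plain gradient-descent step by hand; the real update logic belongs to the Learner in the next post.

// Hypothetical wiring; Sigmoid and SquaredError stand in for real
// ScalarDifferentiableFunction / CostFunctionFactory implementations.
int[] nodes = { 2, 3, 1 };                       // 2 inputs, one hidden layer of 3, 1 output
ScalarDifferentiableFunction[] acts = { new Sigmoid(), new Sigmoid() };
Net net = new Net(nodes, acts, new SquaredError());
Propagation prop = new Propagation(net);

DoubleMatrix input = DoubleMatrix.rand(2, 10);   // 10 samples, one per column
DoubleMatrix target = DoubleMatrix.rand(1, 10);

ForwardResult fwd = prop.forward(input);
BackwardResult bwd = prop.backward(target, fwd);

// One plain gradient-descent step; a Learner would normally own this.
double learningRate = 0.1;
for (int i = 0; i < net.layersNum; i++) {
    net.weights.get(i).subi(bwd.gradients.get(i).mul(learningRate));
    net.bs.get(i).subi(bwd.biasGradients.get(i).mul(learningRate));
}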
The code above performs one complete forward/backward pass. Training here is batch training: all samples are processed together. However, we can pass in an input/target with only a single column to get sample-by-sample (adapt-style) online training, or split the samples into many batches to get mini-batch training; see the sketch below. None of this is Propagation's concern: it simply runs whatever data you hand it forward once and backward once, then returns the results to you untouched. What to pass in, and how to use the results, is the job of the Trainer and the Learner, to be covered in the next post.
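For example, the same Propagation can serve all three training modes simply by changing which columns of the sample matrix you hand it. A small hedged sketch, reusing the prop, input, and target from the previous snippet (the batch size is arbitrary, and the gradient-application steps are left as comments):

// Full batch: all samples at once.
BackwardResult batch = prop.backward(target, prop.forward(input));

// Online ("adapt") training: one column of input/target at a time.
for (int i = 0; i < input.columns; i++) {
    DoubleMatrix x = input.getColumn(i);
    DoubleMatrix y = target.getColumn(i);
    BackwardResult one = prop.backward(y, prop.forward(x));
    // ...apply the gradients for this single sample here...
}

// Mini-batch: contiguous ranges of columns.
int batchSize = 5;
for (int start = 0; start < input.columns; start += batchSize) {
    int end = Math.min(start + batchSize, input.columns);
    int[] idx = new int[end - start];
    for (int k = 0; k < idx.length; k++) idx[k] = start + k;
    DoubleMatrix x = input.getColumns(idx);
    DoubleMatrix y = target.getColumns(idx);
    BackwardResult mini = prop.backward(y, prop.forward(x));
    // ...apply the averaged gradients for this mini-batch here...
}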