Neural networks can be trained in two ways: supervised and unsupervised. Propagation training algorithms are a very effective family of supervised training algorithms. The six propagation algorithms are as follows:
1. Backpropagation Training
2. Quick Propagation Training (QPROP)
3. Manhattan Update Rule
4. Resilient Propagation Training (RPROP)
5. Scaled Conjugate Gradient (SCG)
6. Levenberg Marquardt (LMA)
1. Backpropagation Training
Backpropagation is one of the oldest training methods for feedforward neural networks. It uses two parameters in conjunction with the gradients calculated in the previous section. The first parameter is the learning rate, which is essentially a percentage that determines how directly the gradient should be applied to the weight matrix. The gradient is multiplied by the learning rate and then added to the weight matrix. This slowly optimizes the weights toward values that produce a lower error.
One of the problems with the backpropagation algorithm is that gradient descent will seek out local minima. These local minima are points of low error, but may not be the global minimum. The second parameter provided to the backpropagation algorithm helps it escape local minima. This second parameter is called momentum. Momentum specifies to what degree the weight changes from the previous iteration should be applied to the current iteration.
The momentum parameter is essentially a percentage, just like the learning rate. To use momentum, the backpropagation algorithm must keep track of what changes were applied to the weight matrix in the previous iteration. These changes are reapplied to the current iteration, scaled by the momentum parameter. Usually the momentum parameter is less than one, so the weight changes from the previous training iteration are less significant than the changes calculated for the current iteration. For example, setting the momentum to 0.5 causes 50% of the previous iteration's changes to be applied to the current weight matrix.
Summary: the earliest of these methods; it requires both a learning rate and a momentum parameter.
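As a rough sketch of how this maps onto code, the example below builds a small feedforward network in Encog 3 (the library this text is drawn from) and trains it with Backpropagation. The XOR data, the network layout, and the learning rate 0.7 / momentum 0.3 are illustrative choices rather than values prescribed above; the class and package names are taken from the Encog 3 Java API.

```java
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.back.Backpropagation;

public class BackpropExample {

    // XOR truth table, used here only as a tiny illustrative data set.
    public static final double[][] INPUT = { {0, 0}, {1, 0}, {0, 1}, {1, 1} };
    public static final double[][] IDEAL = { {0}, {1}, {1}, {0} };

    public static void main(String[] args) {
        // A small feedforward network: 2 inputs, 3 hidden neurons, 1 output.
        BasicNetwork network = new BasicNetwork();
        network.addLayer(new BasicLayer(null, true, 2));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 3));
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
        network.getStructure().finalizeStructure();
        network.reset();

        MLDataSet trainingSet = new BasicMLDataSet(INPUT, IDEAL);

        // Backpropagation takes both a learning rate and a momentum value;
        // 0.7 and 0.3 here are illustrative, not recommended settings.
        Backpropagation train = new Backpropagation(network, trainingSet, 0.7, 0.3);

        int epoch = 1;
        do {
            train.iteration();
            System.out.println("Epoch #" + epoch + " Error: " + train.getError());
            epoch++;
        } while (train.getError() > 0.01 && epoch <= 5000);
        train.finishTraining();
    }
}
```

The other trainers in this section can be dropped into the same training loop; only the constructor changes, as the later sketches show.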
2. Manhattan Update Rule
One of the problems with the backpropagation training algorithm is the degree to which the weights are changed. Gradient descent can often apply too large a change to the weight matrix. The Manhattan Update Rule and resilient propagation training algorithms use only the sign of the gradient; the magnitude is discarded. This means only whether the gradient is positive, negative, or near zero matters.
For the Manhattan Update Rule, this sign is used to determine how to update the weight matrix value. If the gradient is near zero, no change is made to the weight value. If the gradient is positive, the weight value is increased by a specific amount. If the gradient is negative, the weight value is decreased by a specific amount. The amount by which the weight value is changed is a constant that you must provide to the Manhattan Update Rule algorithm, such as 0.00001. Manhattan propagation generally requires a small learning rate.
Summary: requires a learning-rate parameter. Each weight is changed by a fixed amount, which addresses the problem that the weight changes computed by gradient descent are often too large.
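Assuming the same `network` and `trainingSet` as in the backpropagation sketch above, switching to the Manhattan Update Rule only changes the trainer construction; 0.00001 is the example constant mentioned in the text.

```java
import org.encog.neural.networks.training.propagation.manhattan.ManhattanPropagation;

// The third argument is the fixed amount by which each weight is raised or
// lowered, depending only on the sign of its gradient.
ManhattanPropagation train = new ManhattanPropagation(network, trainingSet, 0.00001);
```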
3. Quick Propagation Training (QPROP)
Quick propagation (QPROP) is another variant of propagation training. Quick propagation is based on Newton's Method, which is a means of finding a function's roots. This can be adapted to the task of minimizing the error of a neural network. Typically QPROP performs much better than backpropagation. The user must provide QPROP with a learning rate parameter. However, there is no momentum parameter as QPROP is typically more tolerant of higher learning rates. A learning rate of 2.0 is generally a good starting point.
Summary: QPROP is based on Newton's Method; it requires a learning-rate parameter but no momentum parameter.
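A corresponding sketch for QPROP, again reusing the `network` and `trainingSet` from the first example; 2.0 is the starting learning rate suggested above.

```java
import org.encog.neural.networks.training.propagation.quick.QuickPropagation;

// QPROP takes a learning rate but no momentum parameter.
QuickPropagation train = new QuickPropagation(network, trainingSet, 2.0);
```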
4. Resilient Propagation Training (RPROP)
The resilient propagation training (RPROP) algorithm is often the most efficient training algorithm for supervised feedforward neural networks. One particular advantage to the RPROP algorithm is that it requires no parameter setting before using it. There are no learning rates, momentum values or update constants that need to be determined. This is good because it can be difficult to determine the exact optimal learning rate.
The RPROP algorithm works similarly to the Manhattan Update Rule in that only the sign of the gradient is used. However, rather than using a fixed constant to update the weight values, a much more granular approach is used. These deltas do not remain fixed as they do in the Manhattan Update Rule or the backpropagation algorithm; rather, the delta values change as training progresses.
The RPROP algorithm does not keep one global update value, or delta. Rather, an individual delta is kept for every weight matrix value. These deltas are first initialized to a very small number. Every iteration of the RPROP algorithm updates the weight values according to these delta values. However, as previously mentioned, the delta values do not remain fixed: the sign of the gradient determines how each delta should be modified further. This allows every individual weight matrix value to be trained individually, an advantage not provided by either the backpropagation algorithm or the Manhattan Update Rule.
Summary: often the most efficient supervised training algorithm for feedforward neural networks; no parameters need to be provided.
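Because RPROP needs no tuning parameters, its constructor (assuming the same `network` and `trainingSet` as above) takes only the network and the training data:

```java
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

// No learning rate, momentum, or update constant: each weight keeps its own
// delta, which RPROP adjusts internally as training progresses.
ResilientPropagation train = new ResilientPropagation(network, trainingSet);
```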
5. Scaled Conjugate Gradient (SCG)
Scaled Conjugate Gradient (SCG) is a fast and efficient training method. SCG is based on a class of optimization algorithms called Conjugate Gradient Methods (CGM). SCG is not applicable to all data sets, but when it is used within its applicability it is quite efficient. Like RPROP, SCG has the advantage that there are no parameters that must be set.
Summary: requires no parameters, but is not applicable to all data sets.
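SCG is likewise constructed without any tuning parameters (same assumed `network` and `trainingSet` as above):

```java
import org.encog.neural.networks.training.propagation.scg.ScaledConjugateGradient;

// Like RPROP, SCG exposes no learning rate or momentum to set.
ScaledConjugateGradient train = new ScaledConjugateGradient(network, trainingSet);
```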
6. Levenberg Marquardt (LMA)
The Levenberg Marquardt algorithm (LMA) is a very efficient training method for neural networks. In many cases, LMA will outperform Resilient Propagation. LMA is a hybrid algorithm based on both the Gauss-Newton algorithm (GNA) and gradient descent (backpropagation), integrating the strengths of both. Gradient descent is guaranteed to converge to a local minimum, albeit slowly. GNA is quite fast but often fails to converge. By using a damping factor to interpolate between the two, a hybrid method is created.
Summary: a highly effective training algorithm that requires no parameters. LMA is a hybrid of the Gauss-Newton algorithm and gradient descent. Gradient descent is guaranteed to converge to a local minimum, but slowly; the Gauss-Newton algorithm is fast but often fails to converge. By using a damping factor to interpolate between the two, LMA combines the strengths of both.
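A matching sketch for LMA, again assuming the `network` and `trainingSet` from the first example; note that in Encog the LMA trainer lives outside the propagation package.

```java
import org.encog.neural.networks.training.lma.LevenbergMarquardtTraining;

// No parameters to supply: the damping factor that blends the Gauss-Newton
// and gradient-descent steps is managed internally by the trainer.
LevenbergMarquardtTraining train = new LevenbergMarquardtTraining(network, trainingSet);
```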
-------- Reference: Encog3java-user (the Encog 3 for Java user guide)