機器學習案例二：缺失時間序列數據填補與ESN(回聲狀態網絡）

本文轉載自查看原文 2020-05-12 18:13 868 機器學習與數據挖掘

　　時間序列數據是一種與時間因素有關系的連續的數據，通常使用傳感器等來獲取，具有極高的應用價值，可以實時記錄被監測設備或人的狀態，同時可以用於預測建模，得到對某事件未來發展的一個期望。

在使用傳感器進行數據采集的過程中，在沒有備用傳感器的情況下，會由於種種原因出現采集到的數據在某個時間段內數據缺失的現象。針對某個時間段內的部分數據缺失需要進行科學的驗證，最重要的是要驗證的是在數據缺失的前后傳感器采集的數據是否發生了質的變化（如果發生則認為缺失數據前或后的數據是可用的，整體不可用）。

　　時間序列數據的填補不像單一缺失值的填補那么輕松，特別是在時間序列具有變化趨勢和明顯的周期波動現象。常用的時間學列填補方法的思路是從前到后填補、從后到前填補和兩端同時開始填補。

　　本例中以某傳感器采集的時間序列數據為基礎，來使用具有遞歸性質的神經網絡來對缺失的數據進行填補。（數據量在1500左右，數據量不是很大）

　　常用的具有遞歸性質的神經網絡有Elman神經網絡和ESN神經網絡（由於本例數據較少，因此沒有使用現在很流行的LSTM神經網絡）。Elman神經網絡的出現時間較早，原理較簡單，這里介紹ESN神經網絡。Jarger在2004年首先提出針對傳統遞歸神經網絡訓練算法改進的新型遞歸神經網絡，即回聲狀態網絡（ESN）。對於BP神經網絡中訓練樣本效率非常低的情況，回聲狀態網絡憑借獨特結構形態和訓練方式有效避免了神經網絡規模無法擴大以及局部最優情況。為了解決傳統神經網絡遇到的收斂慢和局部最小等問題，全新的ESN神經網絡內部構造了儲備池作為中心計算單元的重要結構，最大程度地模仿了生物神經元的構造和計算特征。由於沒有使用梯度下降的學習算法，回聲狀態網絡轉而使用單次訓練算法而非大量重復多次訓練。另外模型中的復雜網絡結構（儲備池）由數量極大的神經元群相互連接，需要事先初始化儲備池神經網絡連接矩陣的權值，這使得ESN較之其他神經網絡具有更好的穩定性。相對於其他神經網絡而言，ESN能夠更好的描述非線性混沌時間序列。

　　ESN的代碼如下：

%% Prepare
clear all; 
disp('Loading data......');    

%% Data input
% Train data
traindata = '';
% Teach data
teachdata ='';

%% Data prepare
train = xlsread(traindata);
teach = xlsread(teachdata);

%% Exercise
tic
InputSequence = train;
OutputSequence = teach;
%% ESN
% 訓練集和測試集
% [Am,An] = size(YA);
% tic
% InputSequence = YA(1:494,:);
% OutputSequence = input2;

% split the data into train and test

tic

train_fraction = 0.7 ; % use 50% in training and 50% in testing
[trainInputSequence, testInputSequence] = split_train_test(InputSequence,train_fraction);
[trainOutputSequence,testOutputSequence] = split_train_test(OutputSequence,train_fraction);

% generate an esn 
nInputUnits = 9; nInternalUnits = 50; nOutputUnits = 1; 
esn = generate_esn(nInputUnits, nInternalUnits, nOutputUnits, ...
    'spectralRadius',0.2,'inputScaling',[0.1;0.1;0.1;0.1;0.1;0.1;0.1;0.1;0.1],'inputShift',[0;0;0;0;0;0;0;0;0], ...
    'teacherScaling',[0.3],'teacherShift',[-0.2],'feedbackScaling', 0, ...
    'type', 'plain_esn'); 

esn.internalWeights = esn.spectralRadius * esn.internalWeights_UnitSR;

% train the ESN
nForgetPoints = 50 ; % discard the first 100 points
[trainedEsn, stateMatrix] = train_esn(trainInputSequence, trainOutputSequence, esn, nForgetPoints) ; 

nForgetPoints = 50 ; 
predictedTrainOutput = test_esn(trainInputSequence, trainedEsn, nForgetPoints);
predictedTestOutput = test_esn(testInputSequence,  trainedEsn, nForgetPoints) ; 

% create input-output plots
nPlotPoints = 60 ; 
nPlotPoints1 = 100 ; 
plot_sequence(trainOutputSequence(nForgetPoints+1:end,:), predictedTrainOutput, nPlotPoints1,...
    'training: teacher sequence (red) vs predicted sequence (blue)');
grid on;
plot_sequence(testOutputSequence(nForgetPoints+1:end,:), predictedTestOutput, nPlotPoints, ...
    'testing: teacher sequence (red) vs predicted sequence (blue)') ; 
grid on;

%compute NRMSE training error
trainError = compute_error(predictedTrainOutput, trainOutputSequence); 
disp(sprintf('train NRMSE = %s', num2str(trainError)))

%compute NRMSE testing error
testError = compute_error(predictedTestOutput, testOutputSequence); 
disp(sprintf('test NRMSE = %s', num2str(testError)))

disp('訓練結束!');
toc

　　ESN神經網絡的特殊結構需要調節的參數有隱含層神經元的個數、儲備池的半徑、輸入信號的縮放比例、輸入信號的偏移、輸出信號的縮放比例和縮放信號的偏移。其中，隱含層的神經元個數對模型的預測精度影響最大，剩余的其他參數中，儲備池的半徑也對預測精度有較大的影響。在進行仿真的時候，神經元的傳遞函數選擇ESN神經網絡中的plain_esn。

　　因此在使用ESN神經網絡的時候，主要需要調整的參數是隱含層的神經元個數和儲備池的半徑，一下是ESN的主函數：

nInputUnits = 9; nInternalUnits = 50; nOutputUnits = 1; 
esn = generate_esn(nInputUnits, nInternalUnits, nOutputUnits, ...
    'spectralRadius',0.2,'inputScaling',[0.1;0.1;0.1;0.1;0.1;0.1;0.1;0.1;0.1],'inputShift',[0;0;0;0;0;0;0;0;0], ...
    'teacherScaling',[0.3],'teacherShift',[-0.2],'feedbackScaling', 0, ...
    'type', 'plain_esn');

需要調節的是：1）nInternalUnits（隱含層的神經元個數）和 2）spectralRadius（儲備池的半徑）。

通過調節參數就可以進行數據的預測填補了。

plain_esn的源碼：

function internalState = plain_esn(totalstate , esn , varargin)

% PLAIN_ESN computes the new internal states of the ESN by using the simple
% esn equations
%
% input arguments:
% totalstate: the previous totalstate, vector of size 
%     (esn.nInternalUnits + esn.nInputUnits + esn.nOutputUnits) x 1
% esn: the ESN structure
%
% output: 
% internalState: the updated internal state, size esn.nInternalUnits x 1
%
% Created April 30, 2006, D. Popovici
% Copyright: Fraunhofer IAIS 2006 / Patent pending%
% Revision 1, June 6, 2006, H.Jaeger
% Revision 2, June 23, 2007, H. Jaeger (include esn.feedbackScaling)


internalState =  feval( esn.reservoirActivationFunction , ...
    [esn.internalWeights , esn.inputWeights , esn.feedbackWeights * diag(esn.feedbackScaling )] * totalstate)  ;   
%%%% add noise to the Esn 
internalState = internalState + esn.noiseLevel * (rand(esn.nInternalUnits,1) - 0.5) ;

還有其他源碼需要去下載資源包(http://bbs.06climate.com/forum.php?mod=viewthread&tid=35933)

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 回聲狀態網絡（ESN）基礎教程回聲狀態網絡ESN（Echo State Networks）機器學習（二十二）— 數據缺失處理方法機器學習sklearn（五）：數據處理（二）缺失值處理機器學習中數據缺失的處理及建模方法【機器學習】--時間序列算法從初識到應用數據缺失值、異常值的識別和填補機器學習--標准化和缺失值處理、數據降維基於機器學習的web異常檢測——基於HMM的狀態序列建模，將原始數據轉化為狀態機表示，然后求解概率判斷異常與否時間序列數據如何插補缺失值？