Example: training with 4-fold cross-validation
(1) Given a dataset data and a label set label
The number of samples is
sampNum = len(data)
(2) Split all the examples into 4 groups (folds)
The number of examples in each fold is
foldNum = sampNum // 4
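The grouping in step (2) can be sketched by hand before turning to a library. This is only an illustration under stated assumptions: 8 samples, 4 folds, no shuffling, and a hypothetical helper named `split_into_folds`. It relies on `np.array_split`, which also copes with sample counts that are not a multiple of the number of folds.

```python
import numpy as np

def split_into_folds(samp_num, n_folds):
    """Split indices 0..samp_num-1 into n_folds contiguous groups (hypothetical helper)."""
    indices = np.arange(samp_num)
    # array_split allows uneven groups: the first samp_num % n_folds groups get one extra index
    return np.array_split(indices, n_folds)

folds = split_into_folds(8, 4)  # assumed sizes: 8 samples, 4 folds
for i, fold in enumerate(folds):
    print(i, fold.tolist())
```

When the sample count does not divide evenly, for instance 6 samples in 4 folds, the groups come out with sizes 2, 2, 1, 1 rather than a single fixed foldNum.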
(3) Generate the training and validation indices for each fold with scikit-learn's KFold
See Section 3.1, Cross-validation, of the scikit-learn documentation
import numpy as np
from sklearn.model_selection import KFold  # replaces the removed sklearn.cross_validation module

# dataset
data = np.array([[1, 3], [2, 4], [3.1, 3], [4, 5], [5.0, 0.3], [4.1, 3.1]])
label = np.array([0, 1, 1, 1, 0, 0])
sampNum = len(data)

# 4-fold (3 parts for training, 1 part for validation)
kf = KFold(n_splits=4)
for iFold, (train_index, val_index) in enumerate(kf.split(data)):
    # X_train, y_train are the training set of fold iFold; X_val, y_val are its validation set
    X_train, X_val, y_train, y_val = data[train_index], data[val_index], label[train_index], label[val_index]
The given dataset is as follows:
The index set of all samples is:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
The training-set and validation-set indices for each iFold (4 in total) are:
iFold = 0 (the training set contains 6 examples and the validation set contains 2 examples)
iFold = 1
iFold = 2
iFold = 3
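The per-fold indices can be reproduced with a short sketch. It assumes 8 samples with indices 0 to 7, placeholder feature values (only the number of rows matters for the split), and the current `sklearn.model_selection.KFold` API with its default `shuffle=False`.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((8, 2))    # assumed: 8 placeholder samples, indices 0..7
kf = KFold(n_splits=4)  # shuffle=False by default, so each fold is a contiguous block

splits = [(train.tolist(), val.tolist()) for train, val in kf.split(X)]
for iFold, (train_index, val_index) in enumerate(splits):
    print("iFold =", iFold, "train:", train_index, "val:", val_index)
```

Under these assumptions each fold holds out 2 consecutive indices for validation and trains on the remaining 6.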
The training set and validation set for each iFold are:
X_train, X_val, y_train, y_val = data[train_index], data[val_index], label[train_index], label[val_index]
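The indexing step can be tried on the 6-sample array from the code above. This is a sketch rather than the author's exact output: it assumes the current `sklearn.model_selection.KFold` API without shuffling, and the `shapes` list is added here only to record what each fold looks like.

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.array([[1, 3], [2, 4], [3.1, 3], [4, 5], [5.0, 0.3], [4.1, 3.1]])
label = np.array([0, 1, 1, 1, 0, 0])

shapes = []  # illustrative bookkeeping, not part of the original code
for train_index, val_index in KFold(n_splits=4).split(data):
    # Integer-array indexing selects the rows listed in each index array
    X_train, X_val = data[train_index], data[val_index]
    y_train, y_val = label[train_index], label[val_index]
    shapes.append((X_train.shape, X_val.shape))
    print(X_train.shape, X_val.shape, y_val.tolist())
```

Because 6 samples do not divide evenly into 4 folds, the first two validation sets hold 2 samples each and the last two hold 1 each.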