Python sklearn: the difference between StratifiedKFold and KFold when generating cross-validation datasets

Reposted article (original link at the end) · 2020-04-15 11:44 · python / machine learning

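1. Main differences between StratifiedKFold and KFold, and their parameters
KFold cross-sampling: the dataset is split into n_splits mutually exclusive subsets (folds). In each round one fold is used as the test set and the remaining n_splits-1 folds form the training set, so the experiment is run n_splits times and yields n_splits results.
Note: when the dataset cannot be divided evenly, the first n_samples % n_splits folds contain n_samples // n_splits + 1 samples and the remaining folds contain n_samples // n_splits samples. (For example, splitting 10 rows into 3 folds gives one fold of 4 rows and two folds of 3 rows.)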
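A quick check of the fold-size rule, as a minimal sketch assuming scikit-learn and NumPy are installed. The 10-sample array is made up purely for illustration; splitting it into 3 folds should give test folds of 4, 3 and 3 samples.

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative data: 10 samples with a single feature
X = np.arange(10).reshape(-1, 1)

# Size of each test fold produced by 3-fold KFold (no shuffling)
fold_sizes = [len(test) for _, test in KFold(n_splits=3).split(X)]
print(fold_sizes)  # [4, 3, 3]
```

The extra sample goes to the first fold because 10 % 3 = 1.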

sklearn.model_selection.KFold(n_splits=5, shuffle=False, random_state=None)

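n_splits: the number of folds to split the data into (at least 2; the default is 5, and was 3 before scikit-learn 0.22)
shuffle: whether to shuffle the samples before splitting them into folds (default False)
If False, no shuffling is done: the folds are consecutive blocks of the data in its original order, so every run produces the same split and random_state has no effect (scikit-learn 0.24 and later raise a ValueError if random_state is set while shuffle=False)
If True, the samples are shuffled once before splitting; with random_state=None the folds differ from run to run
random_state: the random seed, used only when shuffle=True. Fixing it to an integer (0 is a common choice) makes the generated folds identical on every run, which is convenient when tuning parameters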
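The interaction between shuffle and random_state can be sketched as follows, assuming scikit-learn and NumPy are installed. The helper `fold_test_indices` and the 8-sample array are hypothetical names and data chosen for illustration; the exact shuffled folds are not printed because they depend on NumPy's random generator.

```python
import numpy as np
from sklearn.model_selection import KFold

# Illustrative data: 8 samples with a single feature
X = np.arange(8).reshape(-1, 1)

def fold_test_indices(splitter):
    """Hypothetical helper: test indices of every fold as plain lists."""
    return [test.tolist() for _, test in splitter.split(X)]

# shuffle=False: folds are consecutive blocks, identical on every run
print(fold_test_indices(KFold(n_splits=4)))  # [[0, 1], [2, 3], [4, 5], [6, 7]]

# shuffle=True with a fixed integer seed: shuffled, but reproducible
a = fold_test_indices(KFold(n_splits=4, shuffle=True, random_state=0))
b = fold_test_indices(KFold(n_splits=4, shuffle=True, random_state=0))
print(a == b)  # True

# Even after shuffling, every sample lands in exactly one test fold
print(sorted(i for fold in a for i in fold))  # [0, 1, 2, 3, 4, 5, 6, 7]
```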

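StratifiedKFold performs stratified sampling for cross-validation. Its main difference from KFold is that it splits the data according to the proportion of each class in the labels, so every fold keeps roughly the same class ratio as the full dataset (which is why its split method requires y, whereas KFold ignores y).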

sklearn.model_selection.StratifiedKFold(n_splits=5, shuffle=False, random_state=None)

The parameters have the same meaning as for KFold.
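The effect of stratification is easiest to see on imbalanced labels. The sketch below assumes scikit-learn and NumPy are installed and uses hypothetical data (9 samples of class 0 followed by 3 of class 1); the helper `class_counts` is also illustrative. It counts how many samples of each class end up in every test fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced data: 9 samples of class 0, then 3 of class 1
X = np.arange(12).reshape(-1, 1)
y = np.array([0] * 9 + [1] * 3)

def class_counts(splitter):
    """Hypothetical helper: [count of class 0, count of class 1] per test fold."""
    return [np.bincount(y[test], minlength=2).tolist()
            for _, test in splitter.split(X, y)]

print(class_counts(KFold(n_splits=3)))            # [[4, 0], [4, 0], [1, 3]]
print(class_counts(StratifiedKFold(n_splits=3)))  # [[3, 1], [3, 1], [3, 1]]
```

KFold leaves two test folds without any class 1 samples, while StratifiedKFold keeps the 3:1 ratio of the full dataset in every fold.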

2. Comparing the two with an example
First, generate 8 rows of data (features and labels):

import numpy as np
from sklearn.model_selection import StratifiedKFold, KFold

# 8 samples with 4 features each
X = np.array([
    [1, 2, 3, 4],
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
    [61, 62, 63, 64],
    [71, 72, 73, 74]
])

# Binary labels: four samples of class 1 and four of class 0
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

Splitting with KFold: without shuffling, the test folds are rows 1-2, 3-4, 5-6 and 7-8, taken in order.

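# Test folds are rows 1-2, 3-4, 5-6 and 7-8, in order.
# random_state is omitted: it has no effect without shuffle=True
# (scikit-learn 0.24 and later raise a ValueError if it is set here).
kfolder = KFold(n_splits=4)
for train, test in kfolder.split(X, y):
    print('Train: %s | test: %s' % (train, test))
# Output:
# Train: [2 3 4 5 6 7] | test: [0 1]
# Train: [0 1 4 5 6 7] | test: [2 3]
# Train: [0 1 2 3 6 7] | test: [4 5]
# Train: [0 1 2 3 4 5] | test: [6 7]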
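Looking at the labels rather than the indices shows the weakness of this split. The following sketch, assuming scikit-learn and NumPy are installed, reuses the labels from the example above; X is replaced by a placeholder of zeros because KFold only needs the number of rows.

```python
import numpy as np
from sklearn.model_selection import KFold

# Same labels as the example; X is a placeholder with the right number of rows
X = np.zeros((8, 1))
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

# Labels that end up in each KFold test fold
kfold_test_labels = [y[test].tolist()
                     for _, test in KFold(n_splits=4).split(X, y)]
print(kfold_test_labels)  # [[1, 1], [0, 0], [1, 1], [0, 0]]
```

Every test fold contains a single class, so each round evaluates a model on one class only.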

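Stratified splitting with StratifiedKFold: the data are drawn according to the label proportions. In this dataset labels 0 and 1 occur in a 1:1 ratio, so each fold is also drawn with a 1:1 ratio of labels.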

# Draw data according to the label proportions: labels 0 and 1 are 1:1
# in this dataset, so every test fold also contains them in a 1:1 ratio.
# random_state is omitted for the same reason as with KFold above.
sfolder = StratifiedKFold(n_splits=4)
for train, test in sfolder.split(X, y):
    print('Train: %s | test: %s' % (train, test))
# Output:
# Train: [1 3 4 5 6 7] | test: [0 2]
# Train: [0 2 4 5 6 7] | test: [1 3]
# Train: [0 1 2 3 5 7] | test: [4 6]
# Train: [0 1 2 3 4 6] | test: [5 7]
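The same label check for StratifiedKFold confirms the 1:1 ratio. This is a sketch assuming scikit-learn and NumPy are installed; as before, X is a placeholder of zeros and only the labels from the example matter.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Same labels as the example; X is a placeholder with the right number of rows
X = np.zeros((8, 1))
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

# Sorted labels in each StratifiedKFold test fold
skf_test_labels = [sorted(y[test].tolist())
                   for _, test in StratifiedKFold(n_splits=4).split(X, y)]
print(skf_test_labels)  # [[0, 1], [0, 1], [0, 1], [0, 1]]
```

Each test fold holds one sample of each class, matching the class ratio of the whole dataset.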

————————————————
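Copyright notice: this is an original article by CSDN blogger "ckSpark", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original and this notice.
Original link: https://blog.csdn.net/MsSpark/article/details/84455402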
