What does the stratify parameter of cross_validation.train_test_split mean?

Reposted article, originally dated 2017-01-29 22:13. Category: machine learning.

Splitting with stratify is more rigorous than calling train_test_split on its own.

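stratify preserves the class distribution that the data had before the split. Suppose there are 100 samples, 80 in class A and 20 in class B. Calling train_test_split(... test_size=0.25, stratify=y_all) then produces the following split:
training: 75 samples, of which 60 belong to class A and 15 to class B.
testing: 25 samples, of which 20 belong to class A and 5 to class B.
With stratify, both the training set and the testing set keep the class ratio A:B = 4:1, the same as the ratio before the split (80:20). stratify is typically used when the class distribution is imbalanced, as it is here.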
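
The 80/20 example can be reproduced with a minimal sketch like the one below. It assumes a recent scikit-learn release, where train_test_split is imported from sklearn.model_selection rather than sklearn.cross_validation. The feature list X_all and the random_state value are made up for illustration and do not come from the original post.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy labels matching the example: 80 samples of class "A", 20 of class "B".
y_all = ["A"] * 80 + ["B"] * 20
# Hypothetical single-feature inputs, one row per label (illustration only).
X_all = [[i] for i in range(100)]

# stratify=y_all asks for the A:B ratio to be preserved in both subsets.
# random_state is fixed only so that repeated runs pick the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

# Count how many samples of each class ended up in each subset.
train_counts = Counter(y_train)
test_counts = Counter(y_test)
print("training:", len(y_train), dict(train_counts))
print("testing:", len(y_test), dict(test_counts))
```

Because 75 and 25 divide evenly at a 4:1 ratio, the per-class counts should be 60/15 for training and 20/5 for testing whatever random_state is used. Only the choice of which rows go where changes. Without stratify, the counts would vary from one random split to the next.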

The sklearn documentation does not explain this parameter very clearly.

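Documentation (the link points to the old sklearn.cross_validation module, which was removed in scikit-learn 0.20; the function now lives in sklearn.model_selection):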

http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
