Splitting a dataset with sklearn's train_test_split

Reposted from the original article · 2018-01-24 16:38 · Machine Learning Experiments

sklearn.model_selection.train_test_split randomly splits a dataset into a training set and a test set of a given proportion.

1. Usage:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=0.2, random_state=0)

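2. Parameters:

train_data: the feature set of the samples

train_target: the label set of the samples

test_size: the size of the test set. A float between 0 and 1 is the proportion of the dataset used as the test set; an integer is the absolute number of test samples

random_state: the seed for the random number generator. On the same dataset, the same seed produces the same split, while different seeds generally produce different splits

X_train, y_train: together form the training set

X_test, y_test: together form the test set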
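To make the test_size and random_state descriptions concrete, here is a minimal sketch. The toy data (20 single-feature samples) and the variable names are assumptions made up for illustration; only train_test_split and its test_size and random_state parameters come from the text above.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 samples with one feature each, plus binary labels.
X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]

# A float test_size is a proportion: 0.25 of 20 samples gives 5 test samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
frac_sizes = (len(X_tr), len(X_te))

# An int test_size is an absolute count: exactly 4 test samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=4, random_state=0)
int_sizes = (len(X_tr), len(X_te))

# Same data and same seed: the two calls return identical splits.
first = train_test_split(X, y, test_size=0.25, random_state=42)
second = train_test_split(X, y, test_size=0.25, random_state=42)
same_seed_matches = first == second

print(frac_sizes, int_sizes, same_seed_matches)
```

Fixing random_state is mainly useful when an experiment has to be repeatable; leaving it unset gives a different shuffle on each run.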

3. Example:

Generate a dataset of 100 samples and randomly split off 20% of it as the test set

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 3.6

# In old scikit-learn releases the function lived in sklearn.cross_validation:
# from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

# Build 100 records: 100 two-dimensional feature vectors and their 100 labels
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# Randomly hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print("train:", len(X_train), "test:", len(X_test))

# Inspect the samples that ended up in the test set
for features, label in zip(X_test, y_test):
    print("".join(features), label)

'''
train: 80 test: 20
feature two  2
feature two  2
feature one  1
feature two  2
feature two  2
feature one  1
feature one  1
feature two  2
feature two  2
feature two  2
feature two  2
feature one  1
feature two  2
feature two  2
feature two  2
feature one  1
feature one  1
feature one  1
feature two  2
feature one  1
'''
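In the output listed above, the 20 test samples contain 8 of class 1 and 12 of class 2, even though the full dataset is balanced 50/50. If the class ratio should be preserved in both parts, train_test_split also accepts a stratify argument. That parameter is not covered in the original article, so the following is only a sketch of one possible variation, reusing the same data:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Same data as in the example: 50 samples per class.
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# stratify=y asks for the class proportions of y to be kept in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# Count how many samples of each class landed in each part.
test_counts = Counter(y_test)
train_counts = Counter(y_train)
print("test:", dict(test_counts), "train:", dict(train_counts))
```

With a balanced two-class dataset and a 20% test split, each class should contribute 10 samples to the test set and 40 to the training set.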
