This article is largely adapted from the post by blogger 易悠, "Machine Learning on the Pima Indians Dataset: Classification Algorithms (Predicting the Onset of Diabetes from Diagnostic Measurements)":
https://blog.csdn.net/yizheyouye/article/details/79791473
Explanatory notes have been added in places to make the material easier for beginners to follow.
Dataset Overview
This dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases.
Its goal is to diagnostically predict whether a patient has diabetes, based on the diagnostic measurements included in the data.
Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females of Pima Indian heritage who are at least 21 years old.
The dataset consists of several medical predictor variables and one target variable, Outcome. The predictors include the patient's number of pregnancies, BMI, insulin level, age, and so on.
Field Descriptions
- Pregnancies: number of times pregnant
- Glucose: plasma glucose concentration
- BloodPressure: blood pressure (mm Hg)
- SkinThickness: skin fold thickness (mm)
- Insulin: 2-hour serum insulin (mu U/ml)
- BMI: body mass index (weight in kg / (height in m)^2)
- DiabetesPedigreeFunction: diabetes pedigree function
- Age: age (years)
- Outcome: class label (0 or 1)
1. Load Libraries
import xgboost as xgb                                  # (imported here but not used in this section)
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns                                  # high-level statistical plotting API built on matplotlib
from sklearn.preprocessing import StandardScaler       # standardization
from sklearn.model_selection import train_test_split  # train/test splitting
2. Load the Data
pima = pd.read_csv("D:\\xgbtest\\pima-indians-diabetes.csv")
pima.shape  # pandas' shape attribute gives the dimensions as (row count, column count)
(768, 9)
# head() shows a small sample of a Series or DataFrame; by default the first five rows. The count can be customized.
pima.head()
|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
# pandas' describe() summarizes every numeric column:
# [count, mean, std (standard deviation), min, 25%, 50% (median), 75%, max]
pima.describe()
|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| count | 768.000000 | 768.000000 | 768.000000 | 768.000000 | 768.000000 | 768.000000 | 768.000000 | 768.000000 | 768.000000 |
| mean | 3.845052 | 120.894531 | 69.105469 | 20.536458 | 79.799479 | 31.992578 | 0.471876 | 33.240885 | 0.348958 |
| std | 3.369578 | 31.972618 | 19.355807 | 15.952218 | 115.244002 | 7.884160 | 0.331329 | 11.760232 | 0.476951 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.078000 | 21.000000 | 0.000000 |
| 25% | 1.000000 | 99.000000 | 62.000000 | 0.000000 | 0.000000 | 27.300000 | 0.243750 | 24.000000 | 0.000000 |
| 50% | 3.000000 | 117.000000 | 72.000000 | 23.000000 | 30.500000 | 32.000000 | 0.372500 | 29.000000 | 0.000000 |
| 75% | 6.000000 | 140.250000 | 80.000000 | 32.000000 | 127.250000 | 36.600000 | 0.626250 | 41.000000 | 1.000000 |
| max | 17.000000 | 199.000000 | 122.000000 | 99.000000 | 846.000000 | 67.100000 | 2.420000 | 81.000000 | 1.000000 |
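Note the minimum of 0 for Glucose, BloodPressure, SkinThickness, Insulin, and BMI above: a zero is physiologically implausible for these measurements, so it almost certainly stands in for a missing value. A quick way to gauge the extent of the problem (a minimal sketch, not part of the original post, assuming `pima` from above):

```python
# Count the physiologically impossible zeros per column; these likely encode missing values.
zero_cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
print((pima[zero_cols] == 0).sum())
```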
pima.groupby('Outcome').size()  # count the rows in each Outcome group
Outcome
0 500
1 268
dtype: int64
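The classes are imbalanced: 500 negatives to 268 positives, roughly 65%/35%. That matters when reading the accuracy scores later, since a model that always predicts 0 would already be about 65% accurate. A one-line check (sketch, assuming `pima` from above):

```python
# Class proportions; the majority-class baseline accuracy is the larger of the two.
print(pima['Outcome'].value_counts(normalize=True))
```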
3. Data Visualization
pima.hist(figsize=(16,14))  # plot each column's distribution; figsize sets the width and height (in inches) of the overall figure
(output: a 3×3 grid of AxesSubplot objects, one histogram per column)

sns.pairplot(pima, hue='Outcome')
# Diagonal: histogram (distribution) of each attribute; off-diagonal: pairwise scatter plots between attributes
# Common seaborn commands:
# [1] set_style() sets the theme; seaborn ships 5 presets: darkgrid, whitegrid, dark, white, ticks (default darkgrid)
# [2] set() configures background, color palette, etc. via parameters; more commonly used
# [3] distplot(): an enhanced version of hist
# [4] kdeplot(): kernel density curve
# [5] boxplot(): box plot
# [6] jointplot(): joint distribution plot
# [7] heatmap(): heat map
# [8] pairplot(): multi-variable plot; supports many variable types and is a handy tool for feature analysis
# hue: color the points by the values of a given column
# kind: controls the type of the off-diagonal plots; "scatter" or "reg"
<seaborn.axisgrid.PairGrid at 0x1f2b12b5e08>

sns.pairplot(pima)
<seaborn.axisgrid.PairGrid at 0x1f2b46b8908>

pima.plot(kind='box', subplots=True, layout=(3,3), sharex=False, sharey=False, figsize=(16,14))
# pandas .plot(): data can be a Series or a DataFrame; the parameters below are for DataFrame.plot
# [0] data: DataFrame
# [1] x: label or position, default None; a column label or position
# [2] y: label or position, default None; a column label or position
# [3] kind: str ('line' line chart, 'bar' bar chart, 'barh' horizontal bar chart, 'hist' histogram,
#     'box' box plot, 'kde' kernel density estimate, which adds a density curve to a histogram,
#     'density' same as 'kde', 'area' area chart, 'pie' pie chart, 'scatter' scatter plot, 'hexbin')
# [4] subplots: boolean, default False; draw a separate subplot for each column
# [5] sharex: boolean, default True if ax is None else False
# [6] sharey: boolean, default False
# [7] loglog: boolean, default False; use a log scale on both the x and y axes
(output: one boxplot AxesSubplot per column, from Pregnancies through Outcome)
dtype: object

column_x = pima.columns[0:len(pima.columns) - 1]  # select the feature columns, dropping the target column (Outcome)
column_x
Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
'BMI', 'DiabetesPedigreeFunction', 'Age'],
dtype='object')
corr = pima[pima.columns].corr()  # compute the correlation coefficients between variables, yielding an N × N matrix
corr
|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| Pregnancies | 1.000000 | 0.129459 | 0.141282 | -0.081672 | -0.073535 | 0.017683 | -0.033523 | 0.544341 | 0.221898 |
| Glucose | 0.129459 | 1.000000 | 0.152590 | 0.057328 | 0.331357 | 0.221071 | 0.137337 | 0.263514 | 0.466581 |
| BloodPressure | 0.141282 | 0.152590 | 1.000000 | 0.207371 | 0.088933 | 0.281805 | 0.041265 | 0.239528 | 0.065068 |
| SkinThickness | -0.081672 | 0.057328 | 0.207371 | 1.000000 | 0.436783 | 0.392573 | 0.183928 | -0.113970 | 0.074752 |
| Insulin | -0.073535 | 0.331357 | 0.088933 | 0.436783 | 1.000000 | 0.197859 | 0.185071 | -0.042163 | 0.130548 |
| BMI | 0.017683 | 0.221071 | 0.281805 | 0.392573 | 0.197859 | 1.000000 | 0.140647 | 0.036242 | 0.292695 |
| DiabetesPedigreeFunction | -0.033523 | 0.137337 | 0.041265 | 0.183928 | 0.185071 | 0.140647 | 1.000000 | 0.033561 | 0.173844 |
| Age | 0.544341 | 0.263514 | 0.239528 | -0.113970 | -0.042163 | 0.036242 | 0.033561 | 1.000000 | 0.238356 |
| Outcome | 0.221898 | 0.466581 | 0.065068 | 0.074752 | 0.130548 | 0.292695 | 0.173844 | 0.238356 | 1.000000 |
plt.subplots(figsize=(10,5))  # set the canvas size with plt first, then draw
sns.heatmap(corr, annot=True)  # visualize the correlation matrix as a heat map
<matplotlib.axes._subplots.AxesSubplot at 0x1f2bb97ab08>
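To read the heat map's Outcome row at a glance, the correlations with the target can also be sorted directly (a small sketch, assuming `corr` from above):

```python
# Features ranked by their linear correlation with Outcome; Glucose leads at ~0.47.
print(corr['Outcome'].drop('Outcome').sort_values(ascending=False))
```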

4. Feature Selection
X = pima.iloc[:, 0:8]  # all rows, first 8 columns (excludes Outcome)
Y = pima.iloc[:, 8]    # the 9th column, used as the target
select_top_4 = SelectKBest(score_func=chi2, k=4)  # keep the 4 highest-scoring features by the chi-square test
fit = select_top_4.fit(X, Y)  # fit the selector on the features and target
features = fit.transform(X)   # reduce X to the selected features
fit.get_support(indices=True).tolist()  # indices of the 4 selected columns: [1, 4, 5, 7]
# So the best-performing features are: Glucose, Insulin, BMI, and Age
# SelectKBest() keeps only the K highest-scoring features
# SelectPercentile() keeps only a user-specified percentage of the highest-scoring features
# Other common univariate statistical tests: false positive rate (SelectFpr), false discovery rate (SelectFdr), family-wise error rate (SelectFwe)
# GenericUnivariateSelect performs feature selection with a configurable strategy, so the selector itself can be tuned by hyper-parameter search
# SelectKBest() and SelectPercentile() expose the per-feature scores and p-values
#
# sklearn.feature_selection.SelectPercentile(score_func=<function f_classif>, percentile=10)
# sklearn.feature_selection.SelectKBest(score_func=<function f_classif>, k=10)
# Options for the score_func parameter:
# [1] Regression: f_regression: correlation; computes each variable's correlation with the target, then derives F and p values
#     mutual_info_regression: mutual information; measures the information shared between X and Y:
#     how much knowing one variable reduces the uncertainty about the other.
# [2] Classification: chi2: chi-square test
#     f_classif: analysis of variance; computes the ANOVA F value (between-group mean square / within-group mean square)
#     mutual_info_classif: mutual information; can capture any kind of statistical dependency, but as a non-parametric method it needs more samples for an accurate estimate.
[1, 4, 5, 7]
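To see not only which columns were chosen but how they scored, the fitted selector exposes per-feature chi-square scores (a small sketch, assuming `fit` and `column_x` from above):

```python
# Pair each feature name with its chi-square score, highest first.
for name, score in sorted(zip(column_x, fit.scores_), key=lambda t: -t[1]):
    print(f"{name}: {score:.2f}")
```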
features[0:5]  # the new feature columns
array([[148. , 0. , 33.6, 50. ],
[ 85. , 0. , 26.6, 31. ],
[183. , 0. , 23.3, 32. ],
[ 89. , 94. , 28.1, 21. ],
[137. , 168. , 43.1, 33. ]])
pima.head()
|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
X_features = pd.DataFrame(data=features, columns=["Glucose","Insulin","BMI","Age"])  # build a new feature DataFrame
X_features.head()
|   | Glucose | Insulin | BMI | Age |
|---|---|---|---|---|
| 0 | 148.0 | 0.0 | 33.6 | 50.0 |
| 1 | 85.0 | 0.0 | 26.6 | 31.0 |
| 2 | 183.0 | 0.0 | 23.3 | 32.0 |
| 3 | 89.0 | 94.0 | 28.1 | 21.0 |
| 4 | 137.0 | 168.0 | 43.1 | 33.0 |
5. Standardization
Standardization rescales each attribute to mean 0 and standard deviation 1: every value x is replaced by (x - μ) / σ.
It is useful when an algorithm expects its input features on a comparable scale, for example when it assumes roughly Gaussian-distributed inputs.
rescaledX = StandardScaler().fit_transform(X_features)  # standardize the feature matrix with StandardScaler from sklearn.preprocessing
X = pd.DataFrame(data=rescaledX, columns=X_features.columns)  # build a new feature DataFrame from the scaled values
X.head()
|   | Glucose | Insulin | BMI | Age |
|---|---|---|---|---|
| 0 | 0.848324 | -0.692891 | 0.204013 | 1.425995 |
| 1 | -1.123396 | -0.692891 | -0.684422 | -0.190672 |
| 2 | 1.943724 | -0.692891 | -1.103255 | -0.105584 |
| 3 | -0.998208 | 0.123302 | -0.494043 | -1.041549 |
| 4 | 0.504055 | 0.765836 | 1.409746 | -0.020496 |
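One caveat worth flagging: the scaler above was fit on the full dataset before the train/test split below, so the test rows influence the scaling statistics (a mild form of data leakage). A leakage-free variant, sketched here as an alternative rather than as what the original post does, splits first and fits the scaler on the training portion only:

```python
# Split first, then fit the scaler on the training rows only and apply it to both splits.
X_tr, X_te, y_tr, y_te = train_test_split(X_features, Y, random_state=22, test_size=0.2)
scaler = StandardScaler().fit(X_tr)   # mean/std computed from the training rows only
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)  # test rows scaled with the training statistics
```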
6. Machine Learning: Building Binary Classification Models
# Split the data into feature train/test sets and target train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=22, test_size=0.2)  # test_size=0.2 puts 20% of the rows in the test set
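Because the classes are imbalanced, it can also help to keep the 65/35 class ratio identical in both splits; train_test_split supports this via stratify (a hedged variant, not used in the original post):

```python
# Stratified split: preserves the Outcome class proportions in both the train and test sets.
X_train_s, X_test_s, Y_train_s, Y_test_s = train_test_split(
    X, Y, random_state=22, test_size=0.2, stratify=Y)
```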
X_train.describe()
|   | Glucose | Insulin | BMI | Age |
|---|---|---|---|---|
| count | 614.000000 | 614.000000 | 614.000000 | 614.000000 |
| mean | 0.027972 | 0.026404 | -0.004205 | 0.013040 |
| std | 1.010934 | 1.044467 | 1.001749 | 1.000775 |
| min | -3.783654 | -0.692891 | -4.060474 | -1.041549 |
| 25% | -0.653939 | -0.692891 | -0.582887 | -0.786286 |
| 50% | -0.090591 | -0.380306 | 0.000942 | -0.360847 |
| 75% | 0.660541 | 0.435886 | 0.584771 | 0.638934 |
| max | 2.444478 | 6.652839 | 3.478529 | 4.063716 |
X_test.describe()
|   | Glucose | Insulin | BMI | Age |
|---|---|---|---|---|
| count | 154.000000 | 154.000000 | 154.000000 | 154.000000 |
| mean | -0.111523 | -0.105273 | 0.016766 | -0.051990 |
| std | 0.953583 | 0.796792 | 0.999346 | 1.001728 |
| min | -3.783654 | -0.692891 | -4.060474 | -1.041549 |
| 25% | -0.779128 | -0.692891 | -0.643173 | -0.871374 |
| 50% | -0.325319 | -0.692891 | 0.019980 | -0.445935 |
| 75% | 0.457109 | 0.303472 | 0.476889 | 0.660206 |
| max | 2.381884 | 3.474899 | 4.455807 | 2.787399 |
from sklearn.model_selection import KFold  # k-fold CV: when data is limited, randomly split the dataset into k folds; each round uses one fold as the test set and the remaining k-1 as the training set
from sklearn.model_selection import cross_val_score  # cross-validation
from sklearn.linear_model import LogisticRegression  # logistic regression
from sklearn.naive_bayes import GaussianNB           # naive Bayes
from sklearn.neighbors import KNeighborsClassifier   # k-nearest neighbors
from sklearn.tree import DecisionTreeClassifier      # decision tree
from sklearn.svm import SVC                          # support vector machine
# Build the collection of models to train
models = []
models.append(("LR", LogisticRegression()))      # logistic regression
models.append(("NB", GaussianNB()))              # Gaussian naive Bayes
models.append(("KNN", KNeighborsClassifier()))   # k-nearest-neighbors classifier
models.append(("DT", DecisionTreeClassifier()))  # decision tree classifier
models.append(("SVM", SVC()))                    # support vector classifier
results = []
names = []
for name, model in models:
    kfold = KFold(n_splits=10, random_state=22)  # n_splits=10 splits the training data into 10 folds
    cv_result = cross_val_score(model, X_train, Y_train, cv=kfold, scoring="accuracy")
    names.append(name)
    results.append(cv_result)
for i in range(len(names)):
    print(names[i], results[i].mean())  # mean of the 10 fold scores
    # print(names[i], results[i])
(repeated sklearn FutureWarnings about the default LogisticRegression solver and the default SVC gamma omitted)
LR 0.7768905341089372
NB 0.7604970914859862
KNN 0.7459280803807509
DT 0.7052353252247487
SVM 0.776890534108937
7. PCA and Grid Search over SVM Parameters
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components=2, kernel='rbf')  # kernel='rbf' uses the Gaussian (RBF) kernel; n_components=2 reduces the data to 2 dimensions
X_train_pca = kpca.fit_transform(X_train)  # fit_transform fits on the training data, learning whatever statistics the transform needs, then transforms that same data
X_test_pca = kpca.transform(X_test)  # the test data is only transformed, using what was learned from the training data
X_train_pca
array([[ 0.15782849, 0.52629779],
[-0.46935192, -0.08705639],
[-0.50419273, -0.10623074],
...,
[-0.15498918, -0.30343083],
[-0.00947015, 0.49025615],
[-0.47013626, 0.33490378]])
X_train_pca[:,0]  # the first column (first principal component)
array([ 0.15782849, -0.46935192, -0.50419273, -0.40127659,  0.26693241,
        0.12419275, -0.41346825, -0.36722434,  0.47125014,  0.33551338,
       ...
       -0.20684292, -0.15498918, -0.00947015, -0.47013626])
X_test_pca
array([[-0.38756402,  0.11835035],
       [ 0.57195862, -0.00960603],
       [-0.37208979,  0.37118901],
       ...,
       [ 0.22868244, -0.22886431],
       [-0.40853044, -0.52556456],
       [-0.35517997,  0.16749218]])
Y_train
738 0
178 0
185 1
647 1
654 0
..
491 0
502 1
358 0
356 1
132 1
Name: Outcome, Length: 614, dtype: int64
plt.figure(figsize=(10,8))
plt.scatter(X_train_pca[:,0], X_train_pca[:,1], c=Y_train, cmap='plasma')  # c=Y_train maps the two classes to two colors
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
Text(0, 0.5, 'Second principal component')
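KernelPCA does not report an explained-variance ratio, so it is hard to say how much structure two components retain. As a rough check one can run plain linear PCA alongside it (a sketch, not part of the original post):

```python
from sklearn.decomposition import PCA

# Linear PCA exposes explained_variance_ratio_, giving a feel for how much
# variance two components capture in the linear case.
pca = PCA(n_components=2).fit(X_train)
print(pca.explained_variance_ratio_)        # fraction of variance per component
print(pca.explained_variance_ratio_.sum())  # total fraction captured by 2 components
```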

# [2] SVC
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
classifier = SVC(kernel = 'rbf')
classifier.fit(X_train_pca, Y_train)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False)
# Predict the diabetes outcome on the test set with the SVC
y_pred = classifier.predict(X_test_pca)
cm = confusion_matrix(Y_test, y_pred)
# cm
print(classification_report(Y_test, y_pred))
              precision    recall  f1-score   support

           0       0.74      0.86      0.80       100
           1       0.63      0.44      0.52        54

    accuracy                           0.71       154
   macro avg       0.69      0.65      0.66       154
weighted avg       0.70      0.71      0.70       154
# Use grid search to improve the model: a brute-force sweep over combinations of parameter values
from sklearn.model_selection import GridSearchCV
# Larger C penalizes misclassification more heavily: higher training accuracy but weaker generalization
# gamma: float; kernel coefficient (default 'auto' in this sklearn version), effective only for 'rbf', 'poly' and 'sigmoid'
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)
grid.fit(X_train_pca, Y_train)
# Show the best parameters found by the exhaustive search
print('Best estimator found by grid search: ', grid.best_estimator_)
# Predict
grid_predictions = grid.predict(X_test_pca)
# Classification report
print(classification_report(Y_test, grid_predictions))
Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] C=0.1, gamma=1 ..................................................
[CV] ................................... C=0.1, gamma=1, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ................................... C=0.1, gamma=1, total=   0.0s
...
[CV] C=100, gamma=0.001 ..............................................
[CV] ............................... C=100, gamma=0.001, total=   0.0s
(similar [CV] lines for the remaining parameter combinations omitted)
Best estimator found by grid search:  SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
probability=False, random_state=None, shrinking=True, tol=0.001,
verbose=False)
              precision    recall  f1-score   support

           0       0.73      0.89      0.80       100
           1       0.66      0.39      0.49        54

    accuracy                           0.71       154
   macro avg       0.69      0.64      0.65       154
weighted avg       0.70      0.71      0.69       154
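Besides best_estimator_, the fitted grid object also exposes the winning parameter dictionary and its cross-validated score directly (a small sketch, assuming `grid` from above):

```python
# The best hyper-parameter combination and its mean cross-validation accuracy.
print(grid.best_params_)  # here: {'C': 10, 'gamma': 1}
print(grid.best_score_)   # mean CV accuracy of that combination
```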
ax = sns.boxplot(data=results)  # box plot of the 10-fold CV scores per model (results/names from section 6)
ax.set_xticklabels(names)
[Text(0, 0, 'LR'),
Text(0, 0, 'NB'),
Text(0, 0, 'KNN'),
Text(0, 0, 'DT'),
Text(0, 0, 'SVM')]

8. Predicting on the Test Data
# Predict with logistic regression
lr = LogisticRegression()  # build the LR model
lr.fit(X_train, Y_train)   # fit on the training data
predictions = lr.predict(X_test)  # predict on the test features
print(accuracy_score(Y_test, predictions))  # print the evaluation metric (classification accuracy)
0.7142857142857143
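Accuracy alone can be flattering on an imbalanced dataset; ROC AUC is a useful complement because it scores the predicted probabilities rather than hard labels (a hedged sketch, not in the original post):

```python
from sklearn.metrics import roc_auc_score

# ROC AUC computed from the predicted probability of the positive class.
proba = lr.predict_proba(X_test)[:, 1]
print(roc_auc_score(Y_test, proba))
```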
print(classification_report(Y_test,predictions))
              precision    recall  f1-score   support

           0       0.73      0.88      0.80       100
           1       0.65      0.41      0.50        54

    accuracy                           0.71       154
   macro avg       0.69      0.64      0.65       154
weighted avg       0.70      0.71      0.69       154
conf = confusion_matrix(Y_test, predictions)  # confusion matrix
label = ["0", "1"]  # class labels for the axes
sns.heatmap(conf, annot=True, xticklabels=label, yticklabels=label)
<matplotlib.axes._subplots.AxesSubplot at 0x1f2bbd44a48>
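From the confusion matrix one can also read off sensitivity (recall on the diabetic class) and specificity, which matter more than raw accuracy in a medical screening setting (a small sketch, assuming `conf` from above):

```python
# sklearn's confusion_matrix layout: rows are the true class, columns the predicted class.
tn, fp, fn, tp = conf.ravel()
sensitivity = tp / (tp + fn)  # true positive rate: diabetic patients correctly flagged
specificity = tn / (tn + fp)  # true negative rate: healthy patients correctly cleared
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```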
