Heart Disease Prediction (SVM Model)


Problem

Solve the heart disease problem

Here is a small dataset provided by the Cleveland Clinic Foundation for Heart Disease. The CSV contains several hundred rows; each row describes a patient, and each column describes an attribute.

Use this information to predict whether a patient has heart disease. In this dataset, that is a binary classification task.

Remember, the most important step is preprocessing the data and transforming it into feature columns.

Code

Data Preprocessing

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
df = pd.read_csv(URL)

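# According to the dataset description:
# - age, trestbps, chol, thalach, and oldpeak are numerical and need no encoding.
# - sex, fbs, exang, and target are binary and need no encoding.
# - cp, restecg, slope, ca, and thal are multi-class and must be preprocessed.

# One-hot encode the multi-class attributes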
a = pd.get_dummies(df['cp'], prefix = "cp")
b = pd.get_dummies(df['restecg'], prefix = "restecg")
c = pd.get_dummies(df['slope'], prefix = "slope")
d = pd.get_dummies(df['ca'], prefix = "ca")
e = pd.get_dummies(df['thal'], prefix = "thal")
df = pd.concat([df, a, b, c, d, e], axis = 1)
df = df.drop(columns = ['cp', 'restecg', 'slope', 'ca', 'thal'])
df.head(5)
# Extract the features X and the labels Y
Y = df.target.values
X = df.drop(['target'], axis = 1)

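# Split into training and test sets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, Y)

# Standardize the features; fit the scaler on the training set only
# so that test-set statistics do not leak into training
sc = StandardScaler()
sc.fit(x_train)
x_train = sc.transform(x_train)
x_test = sc.transform(x_test)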
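
To make the one-hot encoding step easier to follow, here is a minimal sketch on a toy frame. The values are made up for illustration and are not taken from heart.csv; only the column name cp and the get_dummies call mirror the code above. The dtype=int argument is an addition here so the indicator columns print as 0/1.

```python
import pandas as pd

# Toy frame with hypothetical values; "cp" stands in for a multi-class column.
toy = pd.DataFrame({"age": [63, 45, 58], "cp": [1, 3, 1]})

# One indicator column per distinct category value, named "<prefix>_<value>".
dummies = pd.get_dummies(toy["cp"], prefix="cp", dtype=int)

# Append the indicator columns and drop the original categorical column.
encoded = pd.concat([toy, dummies], axis=1).drop(columns=["cp"])
print(encoded)
```

Each distinct value of cp becomes its own column, so the model does not treat the category codes as an ordered quantity.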

Building the SVM Model

from sklearn.svm import SVC

svm = SVC(random_state = 1)
svm.fit(x_train, y_train)

acc = svm.score(x_test, y_test)*100
print("Test Accuracy of SVM Algorithm: {:.2f}%".format(acc))
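
Because train_test_split draws a random split, the accuracy printed above can change from run to run. One possible way to get a steadier estimate is cross-validation with the scaler and the SVM wrapped in a single pipeline. The sketch below is an alternative, not the method used in this post, and it runs on synthetic data from make_classification (an assumption) rather than on heart.csv.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption, not the heart.csv dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=1)

# The pipeline refits the scaler on each training fold, so the held-out
# fold never influences the scaling statistics.
model = make_pipeline(StandardScaler(), SVC(random_state=1))

# 5-fold cross-validated accuracy, one score per fold.
scores = cross_val_score(model, X_demo, y_demo, cv=5)
print("Mean CV accuracy: {:.2f}%".format(scores.mean() * 100))
```

To try this on the real data, X_demo and y_demo would be replaced by the unscaled feature matrix and the target labels.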

