1、Clustering models
from sklearn.cluster import KMeans
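A minimal sketch of fitting KMeans; the toy 2-D points and the choice of n_clusters=2 are just for illustration:
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]  # toy 2-D samples
km = KMeans(n_clusters=2, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each sample
print(km.cluster_centers_)  # coordinates of the two cluster centres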
2、Datasets
from sklearn.datasets import load_iris
sklearn's standard data structure:
data = [[feature1, feature2, feature3], ...]   # shape (n_samples, n_features)
target = [0, 2, 1, 2, 1, 2, 0, ...]            # one class label per sample
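For example, loading the iris dataset shows this structure (shapes below assume the standard 150-sample iris data shipped with sklearn):
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)    # (150, 4): n_samples x n_features
print(iris.target.shape)  # (150,): one class label per sample
print(iris.target[:5])    # first few labels, e.g. [0 0 0 0 0]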
3、Feature selection, used to pick a subset of features
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
fs = SelectKBest(chi2, k=10)
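A usage sketch on iris (k=2 instead of k=10 because iris only has 4 features; note that chi2 requires non-negative feature values):
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
fs = SelectKBest(chi2, k=2)                    # keep the 2 features with the highest chi2 score
X_new = fs.fit_transform(iris.data, iris.target)
print(X_new.shape)                             # (150, 2)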
4、Preprocessing
from sklearn.preprocessing import LabelEncoder, LabelBinarizer, Binarizer
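Small illustrative calls for these three transformers (the input lists are made up):
from sklearn.preprocessing import LabelEncoder, LabelBinarizer, Binarizer

le = LabelEncoder()
print(le.fit_transform(["cat", "dog", "cat"]))   # [0 1 0]: strings -> integer labels

lb = LabelBinarizer()
print(lb.fit_transform([1, 2, 6, 4, 2]))         # one-hot rows, one column per class

bz = Binarizer(threshold=0.5)
print(bz.transform([[0.2, 0.8, 0.5]]))           # [[0. 1. 0.]]: only values > threshold become 1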
5、Model evaluation and selection (model_selection)
from sklearn.model_selection import KFold
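A minimal sketch of 5-fold splitting; X and y here are just small arrays for illustration:
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(10)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    # each iteration yields index arrays for one train/test split
    print(train_idx, test_idx)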
6、Model evaluation (metrics)
from sklearn import metrics
y_pred = [0,2,1,3]
y_true = [0,1,2,3]
metrics.accuracy_score(y_true, y_pred)
0.5
metrics.accuracy_score(y_true, y_pred, normalize=False)  # returns 2, the count of correct predictions
roc_auc_score: based on the ROC curve (Receiver Operating Characteristic), which shows how the True Positive Rate and False Positive Rate change as the decision threshold varies.
AUC is the area under that curve; the higher the value, the better the classifier.
https://zhuanlan.zhihu.com/p/100059009
https://www.zhihu.com/question/39840928
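A minimal roc_auc_score call (the labels and scores below are made up; the second argument is the predicted score or probability for the positive class, not the hard label):
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]        # predicted scores for the positive class
print(roc_auc_score(y_true, y_score))  # 0.75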
7、Naive Bayes
sklearn.naive_bayes
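A sketch using GaussianNB on iris; GaussianNB is just one of the variants in this module (MultinomialNB and BernoulliNB also exist):
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
gnb = GaussianNB().fit(iris.data, iris.target)
print(gnb.predict(iris.data[:3]))   # predicted classes for the first 3 samples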
8、Nearest neighbors
sklearn.neighbors
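A sketch with KNeighborsClassifier on iris (n_neighbors=5 is the default, shown explicitly here):
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=5).fit(iris.data, iris.target)
print(knn.predict(iris.data[:3]))   # classify by majority vote of the 5 nearest training samples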
9、sklearn.svm (support vector machines)
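A sketch with SVC on iris; the RBF kernel and C=1.0 are the defaults, written out only for illustration:
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
clf = SVC(kernel="rbf", C=1.0).fit(iris.data, iris.target)
print(clf.predict(iris.data[:3]))   # predicted classes for the first 3 samples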
10、sklearn.tree (decision trees)
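A sketch with DecisionTreeClassifier on iris; max_depth=3 is an arbitrary choice to keep the tree small:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
print(tree.predict(iris.data[:3]))
print(tree.feature_importances_)    # how much each feature contributes to the splits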