scikit-learn,又寫作sklearn,是一個開源的基於python語言的機器學習工具包。它通過NumPy, SciPy和
Matplotlib等python數值計算的庫實現高效的算法應用,並且涵蓋了幾乎所有主流機器學習算法。
http://scikit-learn.org/stable/index.html
安裝必要的包:
pip install numpy pandas matplotlib scikit-learn graphviz scipy jupyter
本例在jupyter里運行,直接復制到jupyter里運行即可。
# -*- coding:utf-8 -*- from sklearn import tree from sklearn.datasets import load_wine from sklearn.model_selection import train_test_split wine = load_wine() print(wine.data.shape) print(wine.target) #如果wine是一張表,應該長這樣: import pandas as pd pd.concat([pd.DataFrame(wine.data),pd.DataFrame(wine.target)],axis=1) print(wine.feature_names) print(wine.target_names) Xtrain, Xtest, Ytrain, Ytest = train_test_split(wine.data,wine.target,test_size=0.3) print(Xtrain.shape) print(Xtest.shape) clf = tree.DecisionTreeClassifier(criterion="entropy") clf = clf.fit(Xtrain, Ytrain) score = clf.score(Xtest, Ytest) #返回預測的准確度 print(score) feature_name = ['酒精','蘋果酸','灰','灰的鹼性','鎂','總酚','類黃酮','非黃烷類酚類','花青素','顏色強度','色調','od280/od315稀釋葡萄酒','脯氨酸'] import graphviz dot_data = tree.export_graphviz(clf ,feature_names= feature_name ,class_names=["琴酒","雪莉","貝爾摩德"] ,filled=True ,rounded=True ) graph = graphviz.Source(dot_data)
graph #直接在jupyter里顯示為圖片
graph.render("tree") #同級目錄下生成tree.pdf文件
運行結果:
(178, 13) [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2] ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline'] ['class_0' 'class_1' 'class_2'] (124, 13) (54, 13) 0.9629629629629629

沒有jupyter的同學看這里:https://www.cnblogs.com/v5captain/p/6688494.html
機器學習不能沒有它,嘿嘿!

