What is data mining? What is Weka?

Data mining is a mature technology. Weka is a data mining toolkit; the name is an acronym for Waikato Environment for Knowledge Analysis.

Exploring the Explorer

Building a classifier

Classify panel
J48: a decision tree classifier

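Use the J48 configuration panel to change parameters, and the "More" button to read about them
- Set minNumObj to 15 to avoid small leaves
- … option: pruned vs unpruned trees
Right-click the earlier run in the result list to open a small menu, then click "Visualize tree".
Confusion Matrix
Weka reports the percentage of correctly classified instances to four decimal places; in practice we usually round it to a whole number.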
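
As a rough illustration of how the percentage of correctly classified instances relates to the confusion matrix, here is a minimal Python sketch. It does not call Weka; the function name `accuracy_percent` and the counts in `matrix` are made up for the example, and it assumes rows are actual classes and columns are predicted classes.

```python
def accuracy_percent(confusion):
    """Percentage of correctly classified instances from a square confusion matrix.

    Rows are actual classes and columns are predicted classes, so the
    correctly classified instances sit on the main diagonal.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no instances")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


# Hypothetical two-class matrix (made-up counts, not output from a real run).
matrix = [
    [50, 10],  # actual class a: 50 predicted as a, 10 predicted as b
    [5, 35],   # actual class b: 5 predicted as a, 35 predicted as b
]

pct = accuracy_percent(matrix)
print(f"{pct:.4f} %")     # four decimal places, as in the Weka output
print(f"{round(pct)} %")  # rounded to a whole number for reporting
```

The off-diagonal cells (10 and 5 here) are the misclassified instances, so the two views always add up to the total number of instances.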

Using a filter

Preprocessing the data is important before applying a classifier.
Filters are divided into attribute filters and instance filters.

Use a filter to remove an attribute

Method 1: select the attribute directly and click Remove at the bottom left.
Method 2: just as when choosing a classifier, click Choose on the Preprocess tab to open the filter list.
Open weather.nominal.arff (again!)
- Check the filters
  – supervised vs unsupervised
  – attribute vs instance
- Choose the unsupervised attribute filter Remove
- Check the More information; look at the options
- Set attributeIndices to 3 and click OK (this removes the humidity attribute, whose index is 3)
- Apply the filter
- Recall that you can Save the result
- Press Undo
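
To make the effect of the Remove step concrete, here is a small Python sketch of dropping an attribute by its 1-based index. It is not Weka code: `remove_attributes` is a hypothetical helper, and the rows are a hand-typed sample in the style of the weather data rather than the full file.

```python
def remove_attributes(header, rows, indices):
    """Return copies of header and rows without the attributes at the given 1-based indices."""
    drop = {i - 1 for i in indices}  # convert to 0-based positions
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


header = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["overcast", "hot", "high", "FALSE", "yes"],
    ["rainy", "cool", "normal", "FALSE", "yes"],
]

# Index 3 is humidity, matching the step above.
new_header, new_rows = remove_attributes(header, rows, [3])
print(new_header)
print(new_rows[0])
```

Because the function returns new lists and leaves its inputs untouched, the original `header` and `rows` play the role of the state you get back after Undo.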

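Setting attributeIndex and nominalIndices

MultiFilter is used to apply several filters in combination, one after another; AllFilter simply passes every instance through unchanged.
Supervised filters use the class values when filtering; they are less widely used than unsupervised filters, which do not use the class values.
When choosing a filter, we must decide between supervised and unsupervised, and between attribute and instance filters. After that, use common sense to find the filter you want in the list.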
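
Combining several filters can be pictured as function composition: each filter takes a dataset and returns a new one, and the combined filter runs them in order. The sketch below is plain Python with hypothetical helpers (`drop_column`, `drop_rows_where`, `chain`) and made-up rows; it illustrates the idea and is not how Weka implements it.

```python
def drop_column(index):
    """Build a filter that removes the attribute at a 1-based index."""
    def apply(header, rows):
        keep = [i for i in range(len(header)) if i != index - 1]
        return [header[i] for i in keep], [[r[i] for i in keep] for r in rows]
    return apply


def drop_rows_where(name, value):
    """Build a filter that removes instances whose attribute `name` equals `value`."""
    def apply(header, rows):
        col = header.index(name)
        return header, [r for r in rows if r[col] != value]
    return apply


def chain(filters):
    """Combine filters so that each one sees the output of the previous one."""
    def apply(header, rows):
        for f in filters:
            header, rows = f(header, rows)
        return header, rows
    return apply


header = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["rainy", "cool", "normal", "FALSE", "yes"],
    ["sunny", "mild", "normal", "TRUE", "yes"],
]

# Order matters: the instance filter needs humidity, so it runs before the column is dropped.
combined = chain([drop_rows_where("humidity", "high"), drop_column(3)])
new_header, new_rows = combined(header, rows)
print(new_header)
print(new_rows)
```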

Remove instances where humidity is high

- Supervised or unsupervised? Unsupervised
- Attribute or instance? Instance
- Look at them
- Select RemoveWithValues
- Set attributeIndex
- Set nominalIndices
- Apply
- Undo
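
The two options in the steps above work together: one index picks the attribute, the other picks which of its nominal values to match. Here is a hedged Python sketch of that logic. `remove_with_values` is a hypothetical stand-in written for this note, the rows are a hand-typed sample, and the label order `["high", "normal"]` is an assumption about how the attribute is declared.

```python
def remove_with_values(rows, labels, attribute_index, nominal_indices):
    """Drop rows whose value for one nominal attribute is among the selected labels.

    attribute_index and nominal_indices are 1-based, like the options in the steps above.
    labels lists the attribute's nominal values in declaration order.
    """
    col = attribute_index - 1
    unwanted = {labels[i - 1] for i in nominal_indices}
    return [row for row in rows if row[col] not in unwanted]


humidity_labels = ["high", "normal"]  # assumed declaration order
rows = [
    ["sunny", "hot", "high", "FALSE", "no"],
    ["overcast", "hot", "high", "FALSE", "yes"],
    ["rainy", "cool", "normal", "FALSE", "yes"],
    ["rainy", "cool", "normal", "TRUE", "no"],
    ["sunny", "mild", "normal", "TRUE", "yes"],
]

# Attribute 3 is humidity; nominal value 1 is "high" under the assumed order.
kept = remove_with_values(rows, humidity_labels, attribute_index=3, nominal_indices=[1])
print(len(kept), "instances left")
```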

Filters can be very powerful
Judiciously removing attributes can
– improve performance
– increase comprehensibility

Question 3
Identify one of the attributes that was removed by clicking Undo and then Apply. Now figure out why it was removed.
A The attribute name was too short
B Only one of the attribute’s values actually appears in the dataset
C The attribute only had two possible values
[B]
An attribute that has the same value for all instances in the dataset doesn’t yield any additional information, and Weka therefore deems it to be useless.
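
The check behind answer B can be sketched in a few lines of Python: an attribute is uninformative if only one distinct value appears in its column. The helper name `constant_attributes` and the tiny dataset are invented for illustration; the filter the question refers to may apply further criteria.

```python
def constant_attributes(header, rows):
    """Names of attributes that take at most one distinct value across all rows."""
    return [
        name
        for i, name in enumerate(header)
        if len({row[i] for row in rows}) <= 1
    ]


# Made-up data: "region" has the same value in every instance.
header = ["region", "size", "class"]
rows = [
    ["north", "small", "yes"],
    ["north", "large", "no"],
    ["north", "small", "no"],
]

print(constant_attributes(header, rows))
```

Note that the declared set of possible values does not matter here, only the values that actually occur in the data.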

Question 4
Open the glass.arff dataset (which was downloaded when you installed Weka). Apply the unsupervised attribute filter Normalize. What is the new range (i.e. minimum and maximum) of the Na attribute?
The Normalize filter scales attributes into the range [0, 1].
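
Scaling into [0, 1] is usually done with min-max normalization, (v - min) / (max - min). A minimal Python sketch follows; the input numbers are invented rather than taken from glass.arff, and mapping a constant column to 0.0 is a choice made for this sketch, not a statement about Weka's behaviour.

```python
def normalize(values):
    """Min-max scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption for this sketch: a constant column maps to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


sample = [11.0, 13.0, 17.0]  # made-up values
scaled = normalize(sample)
print(min(scaled), max(scaled))
```

Whatever the original range was, the smallest value becomes 0 and the largest becomes 1.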

Visualizing your data

Open iris.arff
- Bring up Visualize panel
- Click one of the plots; examine some instances
- Set x axis to petalwidth and y axis to petallength
- Click on Class colour to change the colour
- Bars on the right correspond to attributes: click for x axis; right-click for y axis
- Jitter slider
- Show Select Instance: Rectangle option
- Submit, Reset, Clear and Save

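We can choose different x and y axes from the drop-down menus. A quicker way is to click the small bars that represent the attributes: clicking one bar changes the x axis to sepal length, another to sepal width, another to petal length, and so on. Right-clicking a bar changes the y axis, for example to sepal length. This lets us browse the different plots quickly.
The jitter slider helps you tell apart points that lie very close to one another.
Selecting part of the dataset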
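
The idea behind the jitter slider mentioned above can be sketched as adding a small random offset to each plotted point, so that overlapping points separate on screen. This Python sketch uses a hypothetical `jitter` function with uniform noise and a fixed seed; how Weka generates its offsets is not covered in these notes.

```python
import random


def jitter(points, amount, seed=0):
    """Shift each (x, y) point by a random offset of at most `amount` on each axis."""
    rng = random.Random(seed)  # fixed seed so the picture is reproducible
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Three instances with identical measurements would be drawn as a single dot.
points = [(0.2, 1.4), (0.2, 1.4), (0.2, 1.4)]
shaken = jitter(points, amount=0.05)
for p in shaken:
    print(p)
```

Only the displayed positions change; the underlying data values stay the same.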

Visualizing classification errors

- Run J48 (trees > J48)
- Visualize classifier errors (right-click the run in the Result list)
- Plot predictedclass against class
- Identify errors shown by confusion matrix
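
Plotting the predicted class against the actual class amounts to pairing the two labels for every instance: pairs that agree are correct, pairs that differ are the errors counted in the off-diagonal cells of the confusion matrix. A small Python sketch, with made-up labels and hypothetical helper names:

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(actual, predicted))


def misclassified(actual, predicted):
    """Positions of the instances whose predicted label differs from the actual one."""
    return [i for i, (a, p) in enumerate(zip(actual, predicted)) if a != p]


# Made-up labels, not the output of a real J48 run.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

counts = confusion_counts(actual, predicted)
errors = misclassified(actual, predicted)
print(errors)
print(counts[("versicolor", "virginica")], counts[("virginica", "versicolor")])
```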

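Get to know your data in depth and build visualizations of it. You can do all kinds of things: clean your data, remove outliers, and inspect classification errors.
For example, there is a filter that adds the classification as a new attribute. Let's take a look at it. It is a supervised filter, because it uses the class: the attribute filter AddClassification. Open its configuration panel, choose J48 as the machine learning scheme, and set outputClassification to True. Close the configuration panel and apply the filter. It adds a new attribute whose values are the classifications produced by J48.
Weka is very powerful; you can do all sorts of things with classifiers and filters.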
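
The AddClassification step described above can be pictured as appending one extra column that holds the classifier's output for each instance. In this Python sketch, `add_classification` and `toy_classifier` are hypothetical: the classifier is a hand-written threshold rule standing in for a trained J48 model, the new column name `predicted` is arbitrary, and the data is made up.

```python
def add_classification(header, rows, classify, new_name="predicted"):
    """Return copies of header and rows with the classifier's output appended as a new attribute."""
    new_header = header + [new_name]
    new_rows = [row + [classify(row)] for row in rows]
    return new_header, new_rows


def toy_classifier(row):
    # Hypothetical rule on the first column; not a tree learned by J48.
    return "small" if row[0] < 1.0 else "large"


header = ["petalwidth", "class"]
rows = [
    [0.2, "small"],
    [1.4, "large"],
    [0.9, "large"],  # the toy rule gets this one wrong
]

new_header, new_rows = add_classification(header, rows, toy_classifier)
print(new_header)
for row in new_rows:
    print(row)
```

With the prediction stored next to the actual class, the misclassified instances can be found by comparing the two columns.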

Source of this content: https://www.jianshu.com/p/4e77cf818618

Introduction to the basic concepts and operations of Weka