Machine Learning: A Demo of Calling Weka's Classifier from Eclipse

Reposted article, originally published 2016-04-08 11:07. Tags: weka, machine learning

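Weka implements many machine learning algorithms, and both lab research and commercial development end up using it to some degree. I think of Weka as a local counterpart to SparkML: SparkML provides distributed machine learning algorithms for big data, whereas Weka runs on a single machine. When the data set is not very large, Weka is a good way to try out models, decide which one to use, and then carry on with the rest of the data mining work.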

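Below is a demo that calls a Weka Classifier from Eclipse. With this example you can step into the Weka source code and trace how a classification algorithm works. The next section analyzes the source of the simplest algorithm, IB1 (1-NN), in detail.

The following code walks through calling the IB1 classifier.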

package mytest;

import java.io.File;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IB1;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
//import wlsvm.WLSVM;

public class SimpleClassification { // classifier demo

    public static void main(String[] args) {
        Instances ins = null;
        Classifier cfs = null;
        try {
            File file = new File("E:\\Develop/Weka-3-6/data/contact-lenses.arff");
//            File file = new File("E:\\yuce/data.csv");
            ArffLoader loader = new ArffLoader();
            loader.setFile(file);
            ins = loader.getDataSet();

            // Always set the class index on the Instances before using them; otherwise using the Instances object throws an exception
            ins.setClassIndex(ins.numAttributes() - 1);
            

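            cfs = new IB1();
            // Train the classifier; without this call it has no model to predict with
            cfs.buildClassifier(ins);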

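//            Option settings (these flags look like LibSVM-style options for the commented-out WLSVM import and are not meant for IB1)
//            String[] options=weka.core.Utils.splitOptions("-S 0 -K 2 -D 3 -G 0.0 -R 0.0 -N 0.5 -M 40.0 -C 1.0 -E 0.0010 -P 0.1 -B 0");
//            cfs.setOptions(options);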
            
            
            Instance testInst;
            Evaluation testingEvaluation = new Evaluation(ins);
            int length = ins.numInstances();
            for (int i = 0; i < length; i++) {
                testInst = ins.instance(i);
                // Test the classifier on each instance and record the prediction
                double predictValue = testingEvaluation.evaluateModelOnceAndRecordPrediction(cfs,
                        testInst);
                
                System.out.println(testInst.classValue()+"--"+predictValue);
            }

            System.out.println("Classifier accuracy: " + (1 - testingEvaluation.errorRate()));

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}

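The steps in detail:

1) Read the data set from the ARFF file and parse it into an Instances object.

2) Create a classifier: new IB1(); It has to be trained with cfs.buildClassifier(ins) before it can predict anything.

3) Set options if needed (splitOptions followed by setOptions) and set the class attribute, which is usually the last attribute: ins.setClassIndex(ins.numAttributes() - 1);

4) Create an evaluator: new Evaluation(ins)

5) Evaluate, printing each test instance's actual and predicted class and then the evaluation metrics: testingEvaluation.evaluateModelOnceAndRecordPrediction(cfs, testInst); Note that this loop tests on the same Instances object that was loaded as training data, so it is an evaluation on the training set rather than a true cross-validation.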
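To make step 5 more concrete: scoring a 1-nearest-neighbour learner on the very instances it memorised is always optimistic, because every instance is its own nearest neighbour. The following is a minimal, Weka-free sketch of that effect, not Weka's IB1 source. The class name OneNnSketch, its methods, and the mismatch-count distance are assumptions made for illustration; the eight rows are the "young" instances from contact-lenses.arff.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative 1-NN over nominal attributes; NOT Weka's IB1 implementation. */
final class OneNnSketch {

    /** Distance = number of attribute positions that differ (the last column is the class). */
    static int distance(String[] a, String[] b) {
        int d = 0;
        for (int i = 0; i < a.length - 1; i++) {
            if (!a[i].equals(b[i])) {
                d++;
            }
        }
        return d;
    }

    /** Class label of the nearest training row; row index `skip` is ignored (-1 ignores nothing). */
    static String classify(List<String[]> train, String[] query, int skip) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (int i = 0; i < train.size(); i++) {
            if (i == skip) {
                continue;
            }
            String[] row = train.get(i);
            int d = distance(row, query);
            if (d < bestDist) { // ties keep the earliest row
                bestDist = d;
                best = row[row.length - 1];
            }
        }
        return best;
    }

    /** Accuracy when testing each row against all rows, or against all other rows (leave-one-out). */
    static double accuracy(List<String[]> data, boolean leaveOneOut) {
        int correct = 0;
        for (int i = 0; i < data.size(); i++) {
            String[] row = data.get(i);
            String predicted = classify(data, row, leaveOneOut ? i : -1);
            if (row[row.length - 1].equals(predicted)) {
                correct++;
            }
        }
        return (double) correct / data.size();
    }

    /** The eight "young" rows of contact-lenses.arff. */
    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"young", "myope", "no", "reduced", "none"},
            new String[] {"young", "myope", "no", "normal", "soft"},
            new String[] {"young", "myope", "yes", "reduced", "none"},
            new String[] {"young", "myope", "yes", "normal", "hard"},
            new String[] {"young", "hypermetrope", "no", "reduced", "none"},
            new String[] {"young", "hypermetrope", "no", "normal", "soft"},
            new String[] {"young", "hypermetrope", "yes", "reduced", "none"},
            new String[] {"young", "hypermetrope", "yes", "normal", "hard"});
    }

    public static void main(String[] args) {
        List<String[]> rows = sample();
        System.out.println("tested on training rows: " + accuracy(rows, false));
        System.out.println("leave-one-out:           " + accuracy(rows, true));
    }
}
```

On this sample the training-set figure is perfect while the leave-one-out figure is lower, which is why an accuracy measured the first way should not be read as a cross-validated result.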

The data set:

@relation contact-lenses

@attribute age             {young, pre-presbyopic, presbyopic}
@attribute spectacle-prescrip    {myope, hypermetrope}
@attribute astigmatism        {no, yes}
@attribute tear-prod-rate    {reduced, normal}
@attribute contact-lenses    {soft, hard, none}

@data
%
% 24 instances
%
young,myope,no,reduced,none
young,myope,no,normal,soft
young,myope,yes,reduced,none
young,myope,yes,normal,hard
young,hypermetrope,no,reduced,none
young,hypermetrope,no,normal,soft
young,hypermetrope,yes,reduced,none
young,hypermetrope,yes,normal,hard
pre-presbyopic,myope,no,reduced,none
pre-presbyopic,myope,no,normal,soft
pre-presbyopic,myope,yes,reduced,none
pre-presbyopic,myope,yes,normal,hard
pre-presbyopic,hypermetrope,no,reduced,none
pre-presbyopic,hypermetrope,no,normal,soft
pre-presbyopic,hypermetrope,yes,reduced,none
pre-presbyopic,hypermetrope,yes,normal,none
presbyopic,myope,no,reduced,none
presbyopic,myope,no,normal,none
presbyopic,myope,yes,reduced,none
presbyopic,myope,yes,normal,hard
presbyopic,hypermetrope,no,reduced,none
presbyopic,hypermetrope,no,normal,soft
presbyopic,hypermetrope,yes,reduced,none
presbyopic,hypermetrope,yes,normal,none

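The data in detail:

1) @relation contact-lenses gives the relation (table) name.

2) @attribute age {young, pre-presbyopic, presbyopic} gives an attribute's name and type; here the type is nominal, with the allowed values listed in braces.

3) @data starts the data section, which is laid out like an array: one instance per line, with comma-separated values in attribute order.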
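The three parts described above can be seen by reading the file by hand. The sketch below is only an illustration for the nominal-only subset of ARFF used in this data set, and it is not how Weka's ArffLoader works internally. The class ArffSketch and its fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative reader for nominal-only ARFF text; NOT Weka's ArffLoader. */
final class ArffSketch {
    String relation;
    final Map<String, List<String>> attributes = new LinkedHashMap<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(List<String> lines) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank line or comment
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Assumes a nominal attribute: name followed by {v1, v2, ...}
                int open = line.indexOf('{');
                int close = line.lastIndexOf('}');
                String name = line.substring("@attribute".length(), open).trim();
                List<String> values = new ArrayList<>();
                for (String v : line.substring(open + 1, close).split(",")) {
                    values.add(v.trim());
                }
                arff.attributes.put(name, values);
            } else if (lower.startsWith("@data")) {
                inData = true;
            } else if (inData) {
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                arff.rows.add(cells);
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        ArffSketch arff = parse(Arrays.asList(
            "@relation contact-lenses",
            "@attribute age {young, pre-presbyopic, presbyopic}",
            "@attribute contact-lenses {soft, hard, none}",
            "@data",
            "% comment lines are skipped",
            "young,none"));
        System.out.println(arff.relation + " " + arff.attributes.keySet() + " rows=" + arff.rows.size());
    }
}
```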

Weka also supports data in CSV format, but it is best to convert it to an ARFF data set with Weka's own tools.
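Weka's weka.core.converters package ships its own CSV and ARFF converters, and those remain the better choice for real data. Purely to show what the conversion amounts to, here is a minimal hand-written sketch. It assumes a header row, no quoted or missing values, and that every column is nominal; the class CsvToArffSketch and the relation name passed in are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative CSV-to-ARFF conversion for all-nominal data; not a replacement for Weka's converters. */
final class CsvToArffSketch {

    static String convert(String relation, List<String> csvLines) {
        String[] header = csvLines.get(0).split(",");

        // Collect the distinct values seen in each column, in order of first appearance.
        List<Set<String>> values = new ArrayList<>();
        for (int c = 0; c < header.length; c++) {
            values.add(new LinkedHashSet<>());
        }
        List<String> dataRows = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] cells = line.split(",");
            for (int c = 0; c < header.length; c++) {
                cells[c] = cells[c].trim();
                values.get(c).add(cells[c]);
            }
            dataRows.add(String.join(",", cells));
        }

        // The header row becomes @attribute declarations; the remaining rows become the @data section.
        StringBuilder out = new StringBuilder();
        out.append("@relation ").append(relation).append("\n\n");
        for (int c = 0; c < header.length; c++) {
            out.append("@attribute ").append(header[c].trim())
               .append(" {").append(String.join(", ", values.get(c))).append("}\n");
        }
        out.append("\n@data\n");
        for (String row : dataRows) {
            out.append(row).append("\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(convert("demo", Arrays.asList(
            "astigmatism,contact-lenses",
            "no,soft",
            "yes,hard",
            "no,none")));
    }
}
```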

The output is:

When reposting, please credit the source: http://www.cnblogs.com/rongyux/
