OpenCV 3.0 provides an ml module that groups together its machine learning algorithms. It offers the following:
1. Normal Bayes classifier — I covered this in another post: 在opencv3中實現機器學習之:利用正態貝葉斯分類
2. K-nearest neighbors (KNN) classifier
3. Support vector machine (SVM) — see my other post: 在opencv3中實現機器學習之:利用svm(支持向量機)分類
4. Decision tree
5. AdaBoost
6. Gradient boosted trees
7. Random forest
8. Artificial neural network (ANN)
9. EM (expectation-maximization)
These algorithms are covered in any machine learning textbook, and their classification workflows are broadly similar, consisting of three steps:
1. Collect the sample data (sampleData)
2. Train the classifier (model)
3. Predict on the test data (testData)
What differs between them is the parameter setup in OpenCV. In the snippets below, assume the training data is trainingDataMat, already labeled with labelsMat, and the data to classify is testMat.
1. Normal Bayes classifier
// create the normal Bayes classifier
Ptr<NormalBayesClassifier> model = NormalBayesClassifier::create();
// set up the training data
Ptr<TrainData> tData = TrainData::create(trainingDataMat, ROW_SAMPLE, labelsMat);
// train the classifier
model->train(tData);
// predict on the test data
float response = model->predict(testMat);
2. K-nearest neighbors
Ptr<KNearest> knn = KNearest::create();   // create the KNN classifier
knn->setDefaultK(K);                      // set the value of k
knn->setIsClassifier(true);
// set up the training data
Ptr<TrainData> tData = TrainData::create(trainingDataMat, ROW_SAMPLE, labelsMat);
knn->train(tData);
float response = knn->predict(testMat);
3. Support vector machine
Ptr<SVM> svm = SVM::create();             // create the classifier
svm->setType(SVM::C_SVC);                 // set the SVM type
svm->setKernel(SVM::POLY);                // set the kernel function
svm->setDegree(0.5);
svm->setGamma(1);
svm->setCoef0(1);
svm->setNu(0.5);
svm->setP(0);
svm->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 1000, 0.01));
svm->setC(C);
Ptr<TrainData> tData = TrainData::create(trainingDataMat, ROW_SAMPLE, labelsMat);
svm->train(tData);
float response = svm->predict(testMat);
4. Decision tree
Ptr<DTrees> dtree = DTrees::create();   // create the classifier
dtree->setMaxDepth(8);                  // set the maximum tree depth
dtree->setMinSampleCount(2);
dtree->setUseSurrogates(false);
dtree->setCVFolds(0);                   // number of cross-validation folds
dtree->setUse1SERule(false);
dtree->setTruncatePrunedTree(false);
Ptr<TrainData> tData = TrainData::create(trainingDataMat, ROW_SAMPLE, labelsMat);
dtree->train(tData);
float response = dtree->predict(testMat);
5. AdaBoost
Ptr<Boost> boost = Boost::create();
boost->setBoostType(Boost::DISCRETE);
boost->setWeakCount(100);
boost->setWeightTrimRate(0.95);
boost->setMaxDepth(2);
boost->setUseSurrogates(false);
boost->setPriors(Mat());
Ptr<TrainData> tData = TrainData::create(trainingDataMat, ROW_SAMPLE, labelsMat);
boost->train(tData);
float response = boost->predict(testMat);
6. Gradient boosted trees
This algorithm is commented out in OpenCV 3.0 (for reasons unknown), so here is the old-API version:
GBTrees::Params params(GBTrees::DEVIANCE_LOSS,  // loss_function_type
                       100,                     // weak_count
                       0.1f,                    // shrinkage
                       1.0f,                    // subsample_portion
                       2,                       // max_depth
                       false                    // use_surrogates
                       );
Ptr<TrainData> tData = TrainData::create(trainingDataMat, ROW_SAMPLE, labelsMat);
Ptr<GBTrees> gbtrees = StatModel::train<GBTrees>(tData, params);
float response = gbtrees->predict(testMat);
7. Random forest
Ptr<RTrees> rtrees = RTrees::create();
rtrees->setMaxDepth(4);
rtrees->setMinSampleCount(2);
rtrees->setRegressionAccuracy(0.f);
rtrees->setUseSurrogates(false);
rtrees->setMaxCategories(16);
rtrees->setPriors(Mat());
rtrees->setCalculateVarImportance(false);
rtrees->setActiveVarCount(1);
rtrees->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER, 5, 0));
Ptr<TrainData> tData = TrainData::create(trainingDataMat, ROW_SAMPLE, labelsMat);
rtrees->train(tData);
float response = rtrees->predict(testMat);
8. Artificial neural network
Ptr<ANN_MLP> ann = ANN_MLP::create();
ann->setLayerSizes(layer_sizes);
ann->setActivationFunction(ANN_MLP::SIGMOID_SYM, 1, 1);
ann->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 300, FLT_EPSILON));
ann->setTrainMethod(ANN_MLP::BACKPROP, 0.001);
// note: ANN_MLP expects float responses rather than integer class labels
Ptr<TrainData> tData = TrainData::create(trainingDataMat, ROW_SAMPLE, labelsMat);
ann->train(tData);
float response = ann->predict(testMat);
9. EM (expectation-maximization)
EM differs slightly from the previous algorithms: it needs to build one model per class. trainingDataMat is split into per-class modelSamples, and each modelSamples set trains its own model.
Core training code:
// one EM model per class; labels are assumed to run from 0 to nmodels-1
double minVal, maxVal;
minMaxLoc(labelsMat, &minVal, &maxVal);
int nmodels = (int)maxVal + 1;
vector<Ptr<EM> > em_models(nmodels);
Mat modelSamples;
for (int i = 0; i < nmodels; i++)
{
    const int componentCount = 3;
    modelSamples.release();
    // gather the training rows that belong to class i
    for (int j = 0; j < labelsMat.rows; j++)
    {
        if (labelsMat.at<int>(j, 0) == i)
            modelSamples.push_back(trainingDataMat.row(j));
    }
    // learn a mixture model for this class
    if (!modelSamples.empty())
    {
        Ptr<EM> em = EM::create();
        em->setClustersNumber(componentCount);
        em->setCovarianceMatrixType(EM::COV_MAT_DIAGONAL);
        em->trainEM(modelSamples, noArray(), noArray(), noArray());
        em_models[i] = em;
    }
}
Prediction:
Mat logLikelihoods(1, nmodels, CV_64FC1, Scalar(-DBL_MAX));
for (int i = 0; i < nmodels; i++)
{
    if (!em_models[i].empty())
        logLikelihoods.at<double>(i) = em_models[i]->predict2(testMat, noArray())[0];
}
// the predicted class is the index with the largest log-likelihood
With so many machine learning algorithms available, in my experience you really only need to master SVM for practical use.
In OpenCV the ANN algorithm is also called a multi-layer perceptron, so when training you must specify the layer structure.
The EM algorithm needs one model per class.
For hands-on practice code using some of these algorithms, see: 在opencv3中的機器學習算法練習:對OCR進行分類