Indoor Positioning Series (4): Implementing Location Fingerprinting (Testing Various Machine Learning Classifiers)


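The most commonly used algorithm in location fingerprinting is k-nearest neighbors (kNN). The goal of this post is to learn how to use scikit-learn, Python's machine learning library. I tried a range of common machine learning classifiers and regressors and compared how well each one localizes when used for location fingerprinting.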

Importing the data

For a description of the data, see: http://www.cnblogs.com/rubbninja/p/6118430.html

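# Import the data
import numpy as np
import scipy.io as scio
offline_data = scio.loadmat('offline_data_random.mat')
online_data = scio.loadmat('online_data.mat')
# Offline (training) set: reference positions and the RSS fingerprints measured there
offline_location, offline_rss = offline_data['offline_location'], offline_data['offline_rss']
# Online (test) set: the first 1000 points of the trace and the RSS measured at each one
trace, rss = online_data['trace'][0:1000, :], online_data['rss'][0:1000, :]
del offline_data
del online_data
# Positioning error: mean Euclidean distance between predicted and true positions
# (same units as the coordinates; the results below divide by 100 to report meters)
def accuracy(predictions, labels):
    return np.mean(np.sqrt(np.sum((predictions - labels)**2, 1)))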
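Despite its name, `accuracy` returns an error, not an accuracy, so lower is better. Let $(\hat{x}_i, \hat{y}_i)$ be the prediction and $(x_i, y_i)$ the true position for the $i$-th of $N$ online samples. The code then computes the expression below. The coordinates appear to be in centimeters, which would explain why every result is divided by 100 to report meters.

```latex
\bar{e} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{(\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2}
```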

kNN regression

# kNN regression: average the positions of the 40 nearest fingerprints
from sklearn import neighbors
knn_reg = neighbors.KNeighborsRegressor(40, weights='uniform', metric='euclidean')
%time knn_reg.fit(offline_rss, offline_location)
%time predictions = knn_reg.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 92 ms
Wall time: 182 ms
accuracy:  2.24421479398 m
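At prediction time, `KNeighborsRegressor` does something simple. It finds the k stored fingerprints closest to the online RSS vector and averages their coordinates. Below is a minimal pure-Python sketch of that idea; it is not scikit-learn's implementation. The function name `knn_locate` and the toy data are hypothetical. The `"distance"` option follows the idea behind scikit-learn's `weights='distance'` (inverse-distance weighting) but is not guaranteed to match it exactly. `math.dist` requires Python 3.8 or later.

```python
import math

def knn_locate(fingerprints, locations, query, k, weights="uniform"):
    """Estimate a location from the k nearest RSS fingerprints (hypothetical helper).

    fingerprints: RSS vectors collected offline, one per reference point
    locations:    (x, y) coordinates aligned with fingerprints
    query:        RSS vector measured online
    weights:      "uniform" averages the k neighbours equally;
                  "distance" weights each neighbour by 1 / distance
    """
    # Brute force: Euclidean distance from the query to every stored fingerprint.
    dists = [math.dist(fp, query) for fp in fingerprints]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]

    if weights == "uniform":
        w = [1.0] * len(nearest)
    elif weights == "distance":
        exact = [i for i in nearest if dists[i] == 0.0]
        if exact:
            # Exact fingerprint match: average only the matching reference points.
            nearest, w = exact, [1.0] * len(exact)
        else:
            w = [1.0 / dists[i] for i in nearest]
    else:
        raise ValueError("weights must be 'uniform' or 'distance'")

    total = sum(w)
    x = sum(wi * locations[i][0] for wi, i in zip(w, nearest)) / total
    y = sum(wi * locations[i][1] for wi, i in zip(w, nearest)) / total
    return (x, y)

# Toy example: three reference points and their two-AP fingerprints.
fps = [(-40, -70), (-70, -40), (-55, -55)]
locs = [(0, 0), (10, 0), (5, 5)]
print(knn_locate(fps, locs, (-42, -68), k=2))  # (2.5, 2.5)
```

With `k=2` and uniform weights, the two closest fingerprints belong to (0, 0) and (5, 5), so the estimate is their midpoint. With `weights="distance"`, the estimate moves toward the much closer (0, 0).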

Logistic regression

# Logistic regression is a classifier, so snap each reference position to a 1 m grid
# cell and encode that cell as a single integer label: x_m * 100 + y_m
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
from sklearn.linear_model import LogisticRegressionCV
clf_l2_LR_cv = LogisticRegressionCV(Cs=20, penalty='l2', tol=0.001)
predict_labels = clf_l2_LR_cv.fit(offline_rss, labels).predict(rss)
# Decode the predicted labels back into grid coordinates (in cm)
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')
accuracy:  3.08581348591 m
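The classifiers here (logistic regression, and the SVC and random forest classifiers below) need discrete labels. The code therefore rounds each coordinate to whole meters and packs the pair into one integer. Below is a minimal pure-Python sketch of that encode/decode pair. The names `encode_cell` and `decode_cell` are hypothetical. Coordinates are assumed to be in centimeters, based on the `/100` conversions in the original code. The packing only round-trips while the rounded y coordinate stays between 0 and 99 m.

```python
import math

def encode_cell(x_cm, y_cm):
    """Pack a position (cm) into the integer label of its 1 m grid cell.

    Same scheme as the labels above: round both coordinates to whole meters,
    then combine them as x_m * 100 + y_m.
    """
    x_m, y_m = round(x_cm / 100.0), round(y_cm / 100.0)
    if not 0 <= y_m < 100:
        raise ValueError("y must round to 0..99 m for this packing to be reversible")
    return x_m * 100 + y_m

def decode_cell(label):
    """Unpack a label into (x, y) in cm: the cell's rounded grid point."""
    x_m = math.floor(label / 100.0)
    y_m = label - x_m * 100
    return (x_m * 100, y_m * 100)

print(encode_cell(1234, 5678))  # 1257
print(decode_cell(1257))        # (1200, 5700)
```

Decoding returns the rounded grid point, not the original position. Even when the predicted cell is correct, a 1 m grid can therefore introduce up to half a cell diagonal of error, 0.5·√2 ≈ 0.71 m.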

Support Vector Machine for Regression (SVR)

from sklearn import svm
# SVR predicts a single output, so train one model per coordinate
clf_x = svm.SVR(C=1000, gamma=0.01)
clf_y = svm.SVR(C=1000, gamma=0.01)
%time clf_x.fit(offline_rss, offline_location[:, 0])
%time clf_y.fit(offline_rss, offline_location[:, 1])
%time x = clf_x.predict(rss)
%time y = clf_y.predict(rss)
predictions = np.column_stack((x, y))
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 9min 27s
Wall time: 12min 42s
Wall time: 1.06 s
Wall time: 1.05 s
accuracy:  2.2468400825 m
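SVR predicts a single value, which is why the code trains `clf_x` and `clf_y` separately. Later sections get the same effect with `MultiOutputRegressor`, which fits one copy of the base estimator per target column. The pure-Python sketch below shows only that pattern. `PerCoordinateRegressor` and the trivial `MeanRegressor` stand-in are hypothetical; the sketch assumes only a scikit-learn-style `fit(X, y)` / `predict(X)` interface.

```python
class MeanRegressor:
    """Hypothetical stand-in model: always predicts the mean of its training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class PerCoordinateRegressor:
    """Fit one independent single-output model per target column (x, y, ...)."""

    def __init__(self, make_model):
        self.make_model = make_model  # factory returning a fresh, unfitted model

    def fit(self, X, Y):
        self.models_ = []
        for j in range(len(Y[0])):
            column = [row[j] for row in Y]  # j-th coordinate of every sample
            self.models_.append(self.make_model().fit(X, column))
        return self

    def predict(self, X):
        # Predict each coordinate separately, then zip them back into rows.
        per_output = [m.predict(X) for m in self.models_]
        return [list(row) for row in zip(*per_output)]


reg = PerCoordinateRegressor(MeanRegressor).fit([[1], [2], [3]], [[0, 10], [2, 20], [4, 60]])
print(reg.predict([[5], [6]]))  # [[2.0, 30.0], [2.0, 30.0]]
```

Because each coordinate gets its own independent model, training cost roughly doubles for 2-D positions. That is consistent with the two separate SVR fit times above.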

Support Vector Machine for Classification (SVC)

from sklearn import svm
# Same 1 m grid labels as in the logistic regression example
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
clf_svc = svm.SVC(C=1000, tol=0.01, gamma=0.001)
%time clf_svc.fit(offline_rss, labels)
%time predict_labels = clf_svc.predict(rss)
# Decode the predicted labels back into coordinates (in cm)
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')
Wall time: 1min 16s
Wall time: 15 s
accuracy:  2.50931890608 m

Random forest regressor

from sklearn.ensemble import RandomForestRegressor
estimator = RandomForestRegressor(n_estimators=150)
%time estimator.fit(offline_rss, offline_location)
%time predictions = estimator.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')
Wall time: 58.6 s
Wall time: 196 ms
accuracy:  2.20778352008 m

Random forest classifier

from sklearn.ensemble import RandomForestClassifier
labels = np.round(offline_location[:, 0]/100.0) * 100 + np.round(offline_location[:, 1]/100.0)
estimator = RandomForestClassifier(n_estimators=20, max_features=None, max_depth=20)  # limited by memory, so the number of trees is rather small
%time estimator.fit(offline_rss, labels)
%time predict_labels = estimator.predict(rss)
x = np.floor(predict_labels/100.0)
y = predict_labels - x * 100
predictions = np.column_stack((x, y)) * 100
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')
Wall time: 39.6 s
Wall time: 113 ms
accuracy:  2.56860790666 m

Linear Regression

from sklearn.linear_model import LinearRegression
predictions = LinearRegression().fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')
accuracy:  3.83239841667 m

Ridge Regression

from sklearn.linear_model import RidgeCV
clf = RidgeCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')
accuracy:  3.83255676918 m

Lasso regression

from sklearn.linear_model import MultiTaskLassoCV
clf = MultiTaskLassoCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')
accuracy:  3.83244688001 m

Elastic Net

from sklearn.linear_model import MultiTaskElasticNetCV
clf = MultiTaskElasticNetCV(alphas=np.logspace(-4, 4, 10))
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, 'm')
accuracy:  3.832486036 m

Bayesian Ridge Regression

from sklearn.linear_model import BayesianRidge
from sklearn.multioutput import MultiOutputRegressor
clf = MultiOutputRegressor(BayesianRidge())
predictions = clf.fit(offline_rss, offline_location).predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
accuracy:  3.83243319129 m

Gradient Boosting for regression (GBDT)

from sklearn import ensemble
from sklearn.multioutput import MultiOutputRegressor
clf = MultiOutputRegressor(ensemble.GradientBoostingRegressor(n_estimators=100, max_depth=10))
%time clf.fit(offline_rss, offline_location)
%time predictions = clf.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 43.4 s
Wall time: 17 ms
accuracy:  2.22100945095 m

Multi-layer Perceptron regressor (neural network)

from sklearn.neural_network import MLPRegressor
clf = MLPRegressor(hidden_layer_sizes=(100, 100))
%time clf.fit(offline_rss, offline_location)
%time predictions = clf.predict(rss)
acc = accuracy(predictions, trace)
print("accuracy: ", acc/100, "m")
Wall time: 1min 1s
Wall time: 6 ms
accuracy:  2.4517504109 m

Summary

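The linear models above (linear, ridge, Lasso, elastic net and Bayesian ridge regression, all at about 3.83 m) clearly perform too poorly. The table below therefore summarizes only the remaining models (lower error is better):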

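| Algorithm | Mean positioning error |
| --- | --- |
| kNN | 2.24 m |
| Logistic regression | 3.09 m |
| Support vector machine (SVR) | 2.25 m |
| Random forest (regressor) | 2.21 m |
| Gradient Boosting for regression | 2.22 m |
| Multi-layer Perceptron regressor | 2.45 m |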

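Judging roughly by positioning error, kNN, SVM, RF and GBDT are the four best models. Many of the algorithms above were not carefully tuned, so these results are only rough, and I had no idea at all how to tune the neural network. Two practical points are also worth noting. First, SVM is slow to train and tedious to tune. Second, kNN's prediction time should grow in proportion to the amount of training data, so for real-time positioning it is probably worse than RF and GBDT.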
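The claim about kNN's prediction cost holds for a brute-force search. Every online query is compared against every stored fingerprint, so the work per query grows linearly with the fingerprint database: roughly O(N·D) for N fingerprints of D access points. RF and GBDT prediction, by contrast, only traverses trees that were already built. The sketch below is a hypothetical instrumented search that counts distance evaluations to make the linear growth visible. It does not model the tree-based neighbor search that scikit-learn can also use, which prunes some comparisons.

```python
import math

def nearest_with_count(fingerprints, query, k):
    """Brute-force k-nearest search that also reports how many distances it computed."""
    evaluations = 0
    scored = []
    for idx, fp in enumerate(fingerprints):
        scored.append((math.dist(fp, query), idx))
        evaluations += 1  # exactly one distance per stored fingerprint
    scored.sort()
    return [idx for _, idx in scored[:k]], evaluations

# Doubling the database doubles the work per query.
small_db = [(float(i), float(-i)) for i in range(1000)]
large_db = [(float(i), float(-i)) for i in range(2000)]
_, n_small = nearest_with_count(small_db, (0.0, 0.0), k=40)
_, n_large = nearest_with_count(large_db, (0.0, 0.0), k=40)
print(n_small, n_large)  # 1000 2000
```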


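Author: [rubbninja](http://www.cnblogs.com/rubbninja/) Source: [http://www.cnblogs.com/rubbninja/](http://www.cnblogs.com/rubbninja/) About the author: currently working mainly on machine learning and wireless positioning; discussion and corrections are welcome! Copyright notice: the copyright of this article belongs to the author and cnblogs. Please credit the source when reposting.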

