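My undergraduate thesis project involved training predictive models with machine learning. Linear regression, SVM, and random forest (RF) all performed poorly, so I needed a simple neural network as a comparison experiment. Without a deep understanding of how neural networks are optimized, I called the interfaces provided by R packages directly. I am recording a few notes here so I can reflect on and improve this later.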
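I mainly used three packages: nnet, neuralnet, and h2o. Their manuals document how to build models, make predictions, and tune them. nnet fits a simple network with a single hidden layer. neuralnet takes a vector for its hidden argument and so can build more than one hidden layer, although I only used a single small one here. h2o provides deep neural networks (DNNs).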
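library(nnet)

# Load the data: columns 1 to 35 are predictors, column 36 is the label (Effort).
data <- read.csv("tomcat_done_1.csv", header = TRUE)

total_size <- 363  # total number of samples
test_size  <- 90   # samples held out for testing

# Randomly pick the training rows; the remaining rows form the test set.
train <- sample(1:dim(data)[1], total_size - test_size)
train_set   <- data[train, ]
test        <- data[-train, 1:35]
test_effort <- data[-train, 36]

count <- 0  # not used in this snippet

# One hidden layer with 9 units, weight decay 0.015, linear output for regression.
m <- nnet(Effort ~ ., train_set, size = 9, decay = 0.015, maxit = 10,
          linout = TRUE, trace = FALSE, MaxNWts = 8000)

preds <- predict(m, test)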
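The code above stops at the predictions. A natural next step is to compare them with the held-out test_effort values. The sketch below is only an illustration: the choice of MAE and RMSE is an assumption, not necessarily the metric used in the thesis, and the numbers are made up.

```r
# Made-up values standing in for test_effort and preds.
actual    <- c(10, 20, 30)
predicted <- c(12, 18, 33)

# predict() on an nnet model returns a one-column matrix; as.vector() flattens it
# (it is a no-op on the plain vector used here).
err <- as.vector(predicted) - actual

mae  <- mean(abs(err))     # mean absolute error
rmse <- sqrt(mean(err^2))  # root mean squared error

cat("MAE:", mae, "RMSE:", rmse, "\n")
```

With the real model, actual would be test_effort and predicted would be preds.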
library(neuralnet)

data <- read.csv("tomcat_done_2.csv", header = TRUE)

total_size <- 363  # total number of samples
test_size  <- 90   # samples held out for testing

# Randomly pick the training rows; the remaining rows form the test set.
train <- sample(1:dim(data)[1], total_size - test_size)
train_set   <- data[train, ]
test        <- data[-train, 1:35]
test_effort <- data[-train, 36]

count <- 0  # not used in this snippet

# All 35 predictors are spelled out in the formula; one hidden layer with 2 units.
m <- neuralnet(Effort ~ CountDeclClass + CountDeclClassMethod + CountDeclClassVariable
               + CountDeclFunction + CountDeclInstanceMethod + CountDeclInstanceVariable
               + CountDeclMethod + CountDeclMethodDefault + CountDeclMethodPrivate
               + CountDeclMethodProtected + CountDeclMethodPublic + CountLine
               + CountLineBlank + CountLineCode + CountLineCodeDecl + CountLineCodeExe
               + CountLineComment + CountSemicolon + CountStmt + CountStmtDecl + CountStmtExe
               + SumCyclomatic + SumCyclomaticModified + SumCyclomaticStrict + SumEssential
               + MaxCyclomatic + MaxCyclomaticModified + MaxCyclomaticStrict + MaxEssential
               + MaxNesting + AvgCyclomatic + AvgCyclomaticModified + AvgCyclomaticStrict
               + AvgEssential + RatioCommentToCode, data = train_set, hidden = 2)

# compute() returns a list; the predicted values are in its net.result element.
preds <- compute(m, test)$net.result
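Typing out 35 predictor names by hand is error-prone. If the installed neuralnet version does not accept the `Effort ~ .` shorthand, the formula can be built from the column names instead. This is a minimal sketch using reformulate() from base R's stats package; the tiny data frame is a hypothetical stand-in for the real train_set.

```r
# Hypothetical stand-in for the real data: three predictors plus the Effort column.
train_set <- data.frame(
  CountLine     = c(120, 80, 45, 300),
  CountStmt     = c(60, 41, 20, 150),
  SumCyclomatic = c(14, 9, 3, 40),
  Effort        = c(5.0, 3.2, 1.1, 12.4)
)

# Every column except the response becomes a predictor.
predictors <- setdiff(names(train_set), "Effort")

# Build the formula Effort ~ CountLine + CountStmt + SumCyclomatic.
f <- reformulate(predictors, response = "Effort")
print(f)
```

The resulting f can be passed as the first argument to neuralnet() in place of the hand-written formula.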
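The data has to be preprocessed into the format each model expects before it is fed in. For example, some packages require the label values to be mapped to [0,1]. Note to self: read the manuals and the original papers to understand the optimization methods!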
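One common way to map labels to [0,1] is min-max scaling. The sketch below assumes that is the preprocessing wanted; the helper names min_max_scale and min_max_unscale are hypothetical, and the Effort values are made up.

```r
# Map a numeric vector onto [0, 1] using the given minimum and maximum.
min_max_scale <- function(x, lo = min(x), hi = max(x)) {
  (x - lo) / (hi - lo)
}

# Invert the mapping to get values back on the original scale.
min_max_unscale <- function(z, lo, hi) {
  z * (hi - lo) + lo
}

# Made-up Effort values standing in for the real label column.
effort <- c(2, 4, 6, 10)
lo <- min(effort)
hi <- max(effort)

scaled   <- min_max_scale(effort, lo, hi)    # values now lie in [0, 1]
restored <- min_max_unscale(scaled, lo, hi)  # back on the original scale

print(scaled)
print(restored)
```

In practice lo and hi would be taken from the training set only and reused for the test set, and predictions would be passed through min_max_unscale before comparing them with the true values. A constant column (hi equal to lo) would cause a division by zero and needs separate handling.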