This blog post continues the series on implementing simple machine learning methods in R.
Today's topic is optimising a simple linear regression model.
Among the common optimisation methods, we consider stochastic gradient descent (SGD). Its advantage is that each parameter update does not need to sweep over every element of the dataset, which greatly reduces the computational cost.
The algorithm is outlined below.
First, let's fix the notation for the parameters we need: w is the weight vector, eta the learning rate, lambda the regularisation coefficient, and phi(x) the feature vector, with a constant 1 prepended as X0 for the intercept.
To avoid overfitting, the algorithm adds an L2 regularisation term, and in the update step we can see exactly how this term affects each parameter update.
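Written out explicitly (the notation below is ours, chosen to match the code that follows), the regularised objective over N training points is

$$E(\mathbf{w}) = \frac{1}{2N}\sum_{n=1}^{N}\left(\mathbf{w}^{\top}\boldsymbol{\phi}_n - y_n\right)^2 + \frac{\lambda}{2}\,\mathbf{w}^{\top}\mathbf{w},$$

and the SGD update on a single datapoint i with learning rate eta is

$$\mathbf{w}^{(\tau)} = (1 - \eta\lambda)\,\mathbf{w}^{(\tau-1)} + \eta\left(y_i - \mathbf{w}^{(\tau-1)\top}\boldsymbol{\phi}_i\right)\boldsymbol{\phi}_i.$$

In other words, the regularisation term simply shrinks the previous weights by the factor (1 - eta * lambda) before the usual gradient step is applied.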
The code follows:
## Load libraries
library(ggplot2)
library(reshape2)
library(mvtnorm)

## Function for reading the data
read_data <- function(fname, sc) {
  data <- read.csv(file = fname, head = TRUE, sep = ",")
  nr = dim(data)[1]
  nc = dim(data)[2]
  x = data[1:nr, 1:(nc - 1)]
  y = data[1:nr, nc]
  if (isTRUE(sc)) {
    x = scale(x)  ## Scale x
    y = scale(y)  ## Scale y
  }
  return(list("x" = x, "y" = y))
}
We define a function to read the data; here we also standardise the dataset. Putting the features and the target on a comparable scale conditions the problem better, so gradient descent converges in fewer iterations.
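As a quick illustration (the toy matrix below is ours, not the assignment data), scale() centres each column and divides by its standard deviation:

toy <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
s <- scale(toy)
colMeans(s)       ## approximately 0 for each column
apply(s, 2, sd)   ## exactly 1 for each column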
## Matrix product function: returns the model predictions
predict_func <- function(Phi, w) {
  return(Phi %*% w)
}

## Function to compute the cost function
train_obj_func <- function(Phi, w, label, lambda) {
  # Cost function including the L2 norm regularisation
  return(.5 * mean((predict_func(Phi, w) - label)^2) + .5 * lambda * t(w) %*% w)
}

## Return the training error after each iteration
get_errors <- function(data, label, W) {
  n_weights = dim(W)[1]
  Phi <- cbind('X0' = 1, data)
  errors = matrix(, nrow = n_weights, ncol = 2)
  for (tau in 1:n_weights) {
    errors[tau, 1] = tau
    errors[tau, 2] = train_obj_func(Phi, W[tau, ], label, 0)  ## Get the unregularised errors: lambda set to 0
  }
  return(errors)
}
Together these define the matrix-product (prediction) function, the objective function, and a function that recovers the per-iteration training error.
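A quick sanity check of the objective on made-up values (Phi_demo, w_demo and the labels below are hypothetical, purely for illustration):

Phi_demo <- cbind('X0' = 1, matrix(1:4, ncol = 2))  ## 2 datapoints: bias plus 2 features
w_demo <- c(0.1, 0.2, 0.3)
train_obj_func(Phi_demo, w_demo, label = c(1, 0), lambda = 0.5)  ## 0.7675 for these toy values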
sgd_train <- function(train_x, train_y, lambda, eta, epsilon, max_epoch) {
  ## Prepare the training data: attach a column of 1s for X0 (the intercept)
  Phi <- as.matrix(cbind('X0' = 1, train_x))

  ## Maximum number of SGD iterations: one update per datapoint per epoch
  train_len = dim(train_x)[1]
  tau_max = max_epoch * train_len

  W <- matrix(, nrow = tau_max, ncol = ncol(Phi))
  set.seed(1234)
  ## Randomly generate the starting parameters
  W[1, ] <- runif(ncol(Phi))

  tau = 1  # counter
  ## Create a matrix to store the value of the cost function at each iteration
  obj_func_val <- matrix(, nrow = tau_max, ncol = 1)
  obj_func_val[tau, 1] = train_obj_func(Phi, W[tau, ], train_y, lambda)

  while (tau <= tau_max) {
    # Check the termination criterion
    if (obj_func_val[tau, 1] <= epsilon) { break }

    # Shuffle the data
    train_index <- sample(1:train_len, train_len, replace = FALSE)

    # Loop over each datapoint
    for (i in train_index) {
      # Increment the counter
      tau <- tau + 1
      if (tau > tau_max) { break }

      # Make the weight update
      y_pred <- predict_func(Phi[i, ], W[tau - 1, ])
      W[tau, ] <- sgd_update_weight(W[tau - 1, ], Phi[i, ], train_y[i], y_pred, lambda, eta)

      # Keep track of the objective function
      obj_func_val[tau, 1] = train_obj_func(Phi, W[tau, ], train_y, lambda)
    }
  }
  # Return the values of the training objective function as well as the weights
  return(list('vals' = obj_func_val, 'W' = W))
}

# Updating the weight vector
sgd_update_weight <- function(W_prev, x, y_true, y_pred, lambda, eta) {
  ## Compute the gradient of the squared error for this datapoint
  grad = -(y_true - y_pred) * x
  ## The regularisation term is folded into the previous weights:
  ## they shrink by (1 - eta * lambda) before the gradient step
  return(W_prev * (1 - eta * lambda) - eta * grad)
}
With the update and objective functions written above, the algorithm itself is implemented in R.
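Before running on the real data, a minimal smoke test on synthetic data (the toy variables below are ours, not part of the assignment) can confirm that the objective decreases:

set.seed(42)
toy_x <- matrix(rnorm(20), ncol = 2)               ## 10 datapoints, 2 features
toy_y <- toy_x %*% c(1, -2) + rnorm(10, sd = 0.1)  ## linear target plus noise
toy_res <- sgd_train(toy_x, toy_y, lambda = 0, eta = 0.05, epsilon = 1e-6, max_epoch = 5)
head(toy_res$vals, 3)  ## objective at the first few updates
tail(toy_res$vals, 3)  ## typically far smaller after 5 epochs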
The experiment follows.
## Load the train data and train labels
train.data <- read_data('assignment1_datasets/Task1C_train.csv', TRUE)$x
train.label <- read_data('assignment1_datasets/Task1C_train.csv', TRUE)$y
## Load the test data and test labels
test.data <- read_data('assignment1_datasets/Task1C_test.csv', TRUE)$x
test.label <- read_data('assignment1_datasets/Task1C_test.csv', TRUE)$y

# Set the maximum number of epochs
max_epoch = 18

## Implement SGD with ridge regression
options(warn = -1)

## Initialise the related parameters
epsilon = .001  ## Termination threshold
eta = .01       ## Learning rate
lambda = 0.5    ## Regularisation parameter

## Run SGD and record the cost-function values
train_res2 = sgd_train(train.data, train.label, lambda, eta, epsilon, max_epoch)

## Calculate the errors
## Note that we only visualise the training error here, to check convergence
errors2 = get_errors(train.data, train.label, train_res2$W)
Next, we plot the SGD error curve.
## Visualisation for SGD
plot(train_res2$vals, main = "SGD", type = "l", col = "blue",
     xlab = "iteration", ylab = "training objective function")
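Since errors2 (the unregularised training error from get_errors) is computed above but not plotted, a sketch for overlaying it on the same axes, assuming the two curves live on a comparable scale, could be:

lines(errors2[, 1], errors2[, 2], col = "red")
legend("topright",
       legend = c("objective (lambda = 0.5)", "error (lambda = 0)"),
       col = c("blue", "red"), lty = 1)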
Since the model here is fairly simple, we can see that the objective function converges very quickly.