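I've recently been reading 《機器學習:實用案例解析》 (Machine Learning for Hackers). While building the spam filter, I followed the book's code to read the email files used to train the classifier, and the following error came up during reading: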
seq.default(which(text == "")[1] + 1, length(text), 1) : 'from' cannot be NA, NaN or infinite
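From a quick look, I suspect a file-encoding problem when the files are read. Whatever the root cause, the error itself is raised when which(text == "")[1] is NA, that is, when no empty line is found in the lines that were read. The code I narrowed it down to is: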
R
get.msg <- function(path) {
  con <- file(path, open = "rt", encoding = "latin1")
  text <- readLines(con)
  # The message body starts on the line after the first empty line
  msg <- text[seq(which(text == "")[1] + 1, length(text), 1)]
  close(con)
  return(paste(msg, collapse = "\n"))
}
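To make the failure concrete, here is a minimal sketch that reproduces it without any files. The two character vectors are made-up stand-ins for the output of readLines, not data from the book. It shows that which() finds nothing when there is no empty line, so the index becomes NA and seq() refuses it. The exact wording of the error message may differ between R versions.

```r
# Made-up stand-ins for the lines of two email files
with.blank <- c("Subject: hi", "", "body line")   # has a header/body separator
no.blank   <- c("Subject: hi", "body line")       # no empty line at all

idx.ok <- which(with.blank == "")[1]   # position of the first empty line
idx.na <- which(no.blank == "")[1]     # which() is empty, so [1] yields NA

print(idx.ok)
print(is.na(idx.na))

# seq() rejects an NA starting value; capture the message instead of stopping
err <- tryCatch(
  seq(idx.na + 1, length(no.blank), 1),
  error = function(e) conditionMessage(e)
)
print(err)
```

So any message in which no empty line is found, whether because of encoding trouble or simply because the file has no body separator, triggers this error.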
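I didn't feel like digging into exactly where the problem lies, and I've only just started learning R, so the simplest approach is to add error handling: catch the error and deal with it. The simplest tool for that is tryCatch. After a quick search, this is how tryCatch is used in R: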
R
result = tryCatch({
    expr
}, warning = function(w) {
    # warning-handler-code
}, error = function(e) {
    # error-handler-code
}, finally = {
    # cleanup-code
})
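The template above can be exercised with a small runnable example. This is only an illustrative sketch: parse.number is a hypothetical helper invented here, and the return values chosen for each handler (NA and -1) are arbitrary. It shows that the value of tryCatch is the value of the expression when nothing goes wrong, or the value of whichever handler ran, and that the finally block runs in every case.

```r
# Hypothetical helper: converts text to a number and shows each tryCatch branch
parse.number <- function(x) {
  tryCatch({
    if (!is.character(x)) stop("input must be a character string")
    as.numeric(x)                 # warns if x is not numeric text
  }, warning = function(w) {
    NA_real_                      # returned when a warning was signalled
  }, error = function(e) {
    -1                            # returned when an error was signalled
  }, finally = {
    message("finished parsing")   # runs in all three cases
  })
}

print(parse.number("42"))    # no condition: result of the expression
print(parse.number("abc"))   # as.numeric() warns: warning handler runs
print(parse.number(42))      # stop() is called: error handler runs
```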
After that it's simple: change the code to the following and the problem goes away:
R
get.msg <- function(path) {
  con <- file(path, open = "rt", encoding = "latin1")
  text <- readLines(con)
  # Fall back to an empty body if the message cannot be sliced
  msg <- tryCatch({
    text[seq(which(text == "")[1] + 1, length(text), 1)]
  }, error = function(e) {
    ""
  })
  close(con)
  return(paste(msg, collapse = "\n"))
}
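Overall, I just took the simplest route to get past this problem. In a real project you would probably need to track down the actual cause; tryCatch is only a way to guard against a handful of rare error cases.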
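For comparison, one way to handle the cause directly rather than catching the error is to check the index before slicing. This is a sketch under assumptions, not the book's code: get.msg.checked is a name made up here, and treating a message without a separator as having an empty body is a choice that mirrors the tryCatch version above. It also uses on.exit so the connection is closed even if reading fails.

```r
# Hypothetical variant of get.msg that tests for the missing separator explicitly
get.msg.checked <- function(path) {
  con <- file(path, open = "rt", encoding = "latin1")
  on.exit(close(con))                      # always close the connection
  text <- readLines(con)
  first.blank <- which(text == "")[1]
  # No empty line, or nothing after it: treat the body as empty
  if (is.na(first.blank) || first.blank >= length(text)) {
    return("")
  }
  paste(text[(first.blank + 1):length(text)], collapse = "\n")
}

# Usage with two temporary files standing in for real emails
ok  <- tempfile()
bad <- tempfile()
writeLines(c("Subject: hi", "", "line 1", "line 2"), ok)
writeLines(c("Subject: hi", "no separator here"), bad)

print(get.msg.checked(ok))    # body after the first empty line
print(get.msg.checked(bad))   # empty string, no error raised
```

An explicit check like this only covers the one failure that is understood, so unexpected problems such as an unreadable file still surface as errors instead of being silently turned into empty messages.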