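R is not particularly fast at handling large datasets, but installing additional packages such as data.table can speed up both reading and processing.
Example: read the same table (16,700 rows, 230 columns) once with read.csv and once with the fread function from the data.table package, and time each call.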
# Read the data with read.csv
timestart <- Sys.time()
data <- read.csv("XXXXs.csv", header = TRUE, stringsAsFactors = FALSE)
timeend <- Sys.time()
runningtime <- timeend - timestart
print(runningtime)
# Result: Time difference of 4.451127 secs
# Read the data with fread from the data.table package
library(data.table)
timestart <- Sys.time()
data1 <- fread("XXXXs.csv", header = TRUE, stringsAsFactors = FALSE)
timeend <- Sys.time()
runningtime <- timeend - timestart
print(runningtime)
# Result: Time difference of 0.9460249 secs
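Since the original CSV file is not included here, the comparison can be approximated with a generated file. The following is only a rough sketch under stated assumptions: the random numeric data, the temporary file, and the variable names (n_rows, n_cols, csv_path, df_base, dt_fread) are all made up for illustration, and the timings will differ from the figures above depending on the machine and on the column types in the real data. It uses system.time() instead of subtracting two Sys.time() values.

```r
library(data.table)

# Hypothetical stand-in for the original file: random numeric data
# with the same shape as the table in the example (16,700 x 230).
n_rows <- 16700
n_cols <- 230
set.seed(1)
dt <- as.data.table(matrix(rnorm(n_rows * n_cols), nrow = n_rows))

# Write the table to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
fwrite(dt, csv_path)

# system.time() reports user, system, and elapsed seconds for an expression.
t_base  <- system.time(df_base  <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE))
t_fread <- system.time(dt_fread <- fread(csv_path, header = TRUE))

print(t_base["elapsed"])
print(t_fread["elapsed"])

# Both readers should return a table of the same shape.
stopifnot(identical(dim(df_base), dim(dt_fread)))

# Remove the temporary file.
unlink(csv_path)
```

Note that fread returns a data.table rather than a plain data.frame. If downstream code expects a data.frame, passing data.table = FALSE to fread is one way to get one.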
References:
R data.table quick reference (cnblogs, Little_Rookie): https://www.cnblogs.com/nxld/p/6059570.html
https://zhuanlan.zhihu.com/p/22317779?refer=rdatamining
data.table reference manual: https://cran.r-project.org/web/packages/data.table/data.table.pdf