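We normally work with a few million rows of data, but the company recently won a large contract, and the data volume is now over a hundred million rows.
The data is already in the database and needs to be analysed in R, but it is far too large to load into memory.
For this situation, R offers a solution in the ff, ffbase and ETLUtils packages.
Together they make it easy to load and transform database data into ff objects, which keep the data on disk instead of holding all of it in RAM. The ETLUtils package provides read.dbi.ffdf and has been extended with read.odbc.ffdf
for querying Oracle, MySQL, PostgreSQL and SQLite databases.
An example follows.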
library(ETLUtils)
library(RMySQL)

# Connection details (placeholders; replace with your own)
login <- list()
login$user <- "bnosac"
login$password <- "YourPassword"
login$dbname <- "YourDB"
login$host <- "localhost/IPaddress"  # either localhost or the server's IP address

# Run the query over DBI and store the result as an ffdf
x <- read.dbi.ffdf(
  query = "select * from semetis.keywords_performance_endofday",
  dbConnect.args = list(
    drv = dbDriver("MySQL"),
    dbname = login$dbname, user = login$user,
    password = login$password, host = login$host),
  VERBOSE = TRUE)

dim(x)
# [1] 14969674 27
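A result set this large can be imported because the rows are pulled from the database in batches instead of in one go (read.dbi.ffdf exposes first.rows and next.rows arguments to control the batch size). The following is only a minimal sketch of that batching idea using plain DBI calls, not the ETLUtils implementation. It assumes the DBI and RSQLite packages are installed, and the in-memory table `keywords` and its columns are made up for illustration.

```r
library(DBI)

# Hypothetical stand-in for a large database table (in-memory SQLite).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "keywords",
             data.frame(id = 1:2500, clicks = rep(1:5, 500)))

# Send the query once, then fetch the result in fixed-size batches.
res <- dbSendQuery(con, "select * from keywords")
total_rows <- 0
total_clicks <- 0
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 1000)   # at most 1000 rows in memory at a time
  total_rows <- total_rows + nrow(chunk)
  total_clicks <- total_clicks + sum(chunk$clicks)
}
dbClearResult(res)
dbDisconnect(con)

total_rows
total_clicks
```

Only one batch is held in memory at any moment; read.dbi.ffdf writes each batch to an on-disk ffdf instead of just summing it.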
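
# Append yesterday's rows to the existing ffdf x, this time over ODBC
library(RODBC)

# ODBC connection details (placeholders; replace with your own)
login <- list()
login$dsn <- "YourDSN"
login$uid <- "bnosac"
login$pwd <- "YourPassword"

x <- read.odbc.ffdf(
  query = "select * from semetis.keywords_performance_endofday where date = CURRENT_DATE-1",
  odbcConnect.args = list(dsn = login$dsn, uid = login$uid, pwd = login$pwd),
  x = x,  # passing the existing ffdf appends the new rows to it
  VERBOSE = TRUE)

dim(x)
# [1] 15062904 27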
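Passing x = x is what makes the second call grow the existing ffdf rather than create a new one. As a rough illustration of appending to an on-disk data frame, the sketch below uses ffdfappend from ffbase on two small made-up batches. It assumes the ff and ffbase packages are installed, and it is not a claim about how read.odbc.ffdf appends internally.

```r
library(ff)
library(ffbase)

# Two hypothetical batches standing in for the initial load and the daily update.
first_batch <- data.frame(id = 1:1000, clicks = rep(1:4, 250))
new_batch   <- data.frame(id = 1001:1500, clicks = rep(1:5, 100))

# Create the on-disk data frame from the first batch...
x <- as.ffdf(first_batch)

# ...and append the second batch to it.
x <- ffdfappend(x, new_batch)

dim(x)
```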
Specifying a local directory for the ff files:
save.ffdf(ffdfname, dir = "/PATH/TO/STORE/FF/FILES")
https://www.rdocumentation.org/packages/ffbase/versions/0.12.3/topics/save.ffdf
load.ffdf(dir = "/PATH/TO/STORE/FF/FILES")
https://www.rdocumentation.org/packages/ffbase/versions/0.12.3/topics/load.ffdf
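To make the save/load round trip concrete, here is a small sketch under stated assumptions: the ff and ffbase packages are installed, the data frame is made up, and a temporary directory stands in for /PATH/TO/STORE/FF/FILES. Loading into a separate environment (`restored`, a name chosen here for illustration) keeps the restored object apart from the original.

```r
library(ff)
library(ffbase)

# Hypothetical small ffdf standing in for the imported table.
x <- as.ffdf(data.frame(id = 1:1000, value = seq(0.5, 500, by = 0.5)))

# Temporary directory used here in place of a real storage path.
ff_dir <- file.path(tempdir(), "ffdb_demo")

# Store the ff files and their metadata in ff_dir.
save.ffdf(x, dir = ff_dir, overwrite = TRUE)

# Restore the saved objects into a fresh environment.
restored <- new.env()
load.ffdf(dir = ff_dir, envir = restored)

dim(restored$x)
```

save.ffdf takes the ffdf objects themselves as arguments and saves them under their variable names, so the restored object is again called x.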
More details on read.dbi.ffdf:
https://www.rdocumentation.org/packages/ETLUtils/versions/1.3/topics/read.dbi.ffdf
