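Java and R have different strengths. Java, a mainstream development language, is well suited to building systems, while R excels at statistical analysis. Combining the two, with Java building the system and R serving as the analysis engine, yields an application with analytical capabilities.
There are two ways to call R from Java code: Rserve and JRI.
I. Rserve (remote communication mode)
Rserve is a TCP/IP-based server that transfers data over a binary protocol. It accepts remote connections, which lets client languages call R.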
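Because Rserve is an ordinary TCP server, a client can check whether one is listening before it tries anything heavier. The sketch below uses only the JDK. It relies on two things the article does not state and that should be treated as assumptions: that Rserve listens on port 6311 by default, and that the server greets each new connection with a 32-byte ID string beginning with the ASCII characters "Rsrv". The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class RserveProbe {

    /** True if the bytes start with the ASCII signature "Rsrv" (assumed greeting prefix). */
    static boolean looksLikeRserve(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        return "Rsrv".equals(new String(header, 0, 4, StandardCharsets.US_ASCII));
    }

    /** Connects, reads up to 32 bytes of greeting, and checks the signature. */
    static boolean probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            InputStream in = socket.getInputStream();
            byte[] header = new byte[32]; // assumed length of the ID string
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // server closed the stream early
                }
                read += n;
            }
            return read >= 4 && looksLikeRserve(header);
        } catch (IOException e) {
            // Refused connection, timeout, or any other I/O failure: treat as "not Rserve"
            return false;
        }
    }
}
```

A call such as RserveProbe.probe("localhost", 6311, 500) would then return true only if something answering with that signature is listening.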
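1. Configuration
Rserve is currently published on CRAN as a package and can be installed directly with install.packages("Rserve"). To use it, load the package in the R console and run Rserve() to start the server; clients can then call it.
In Eclipse, add two jar dependencies, REngine.jar and RserveEngine.jar. The jars can be obtained in two ways: a. once Rserve is installed in R, they are in the library\Rserve\java directory; b. download them from the web. For a Maven project, adding the following to pom.xml is enough.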
    <!-- https://mvnrepository.com/artifact/org.rosuda.REngine/Rserve -->
    <dependency>
        <groupId>org.rosuda.REngine</groupId>
        <artifactId>Rserve</artifactId>
        <version>1.8.1</version>
    </dependency>
2. Basic code
    import org.rosuda.REngine.REXPMismatchException;
    import org.rosuda.REngine.Rserve.RConnection;
    import org.rosuda.REngine.Rserve.RserveException;

    public class Temp {
        public static void main(String[] args) throws REXPMismatchException {
            RConnection connection = null;
            try {
                // Create the connection object (connects to the local Rserve)
                connection = new RConnection();

                System.out.println("Mean");
                String vector = "c(1,2,3,4)";
                connection.eval("meanVal<-mean(" + vector + ")");
                double mean = connection.eval("meanVal").asDouble();
                System.out.println("the mean of given vector is=" + mean);

                System.out.println("Run script");
                // The path can also be written as D:\\\\myAdd.R
                connection.eval("source('D:/myAdd.R')");
                int num1 = 20;
                int num2 = 10;
                int sum = connection.eval("myAdd(" + num1 + "," + num2 + ")").asInteger();
                System.out.println("the sum=" + sum);
            } catch (RserveException e) {
                e.printStackTrace();
            } finally {
                // Close only if the connection was opened; avoids a NullPointerException
                if (connection != null) {
                    connection.close();
                }
            }
        }
    }
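The example above builds R expressions by concatenating strings, which gets error-prone once the values come from variables or the path contains backslashes or quotes. A small helper can take care of the formatting. This is only a sketch: RExpr and its methods are hypothetical names, not part of the Rserve client library, and it handles just numeric vectors and single-quoted strings.

```java
import java.util.StringJoiner;

final class RExpr {

    /** Builds an R numeric vector literal such as c(1.0,2.0,3.0). */
    static String vector(double... values) {
        StringJoiner joiner = new StringJoiner(",", "c(", ")");
        for (double v : values) {
            joiner.add(number(v));
        }
        return joiner.toString();
    }

    /** Wraps text in single quotes, escaping backslashes and single quotes for R. */
    static String quote(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Formats one double the way R expects; Java prints "Infinity", R wants "Inf". */
    private static String number(double v) {
        if (Double.isNaN(v)) {
            return "NaN";
        }
        if (Double.isInfinite(v)) {
            return v > 0 ? "Inf" : "-Inf";
        }
        return Double.toString(v);
    }
}
```

With it, the two calls from the example could be written as connection.eval("meanVal<-mean(" + RExpr.vector(1, 2, 3, 4) + ")") and connection.eval("source(" + RExpr.quote("D:\\myAdd.R") + ")").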
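3. Multithreading (Unix)
On Unix, a Java application can access a single Rserve instance from multiple threads. For every new connection, Rserve starts a new process, and each new connection (and therefore each new process) has its own working directory.
Example:
Step 1: start Rserve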
    R CMD Rserve --RS-port 1000
The Java multithreaded code follows. In it, one Rserve instance serves four Java threads.
    // File: RServeMultiThreadClient.java
    package com.studytrails.rserve;

    public class RServeMultiThreadClient {
        public static void main(String[] args) {
            RServeMultiThread thread1 = new RServeMultiThread(1000);
            RServeMultiThread thread2 = new RServeMultiThread(1000);
            RServeMultiThread thread3 = new RServeMultiThread(1000);
            RServeMultiThread thread4 = new RServeMultiThread(1000);
            thread1.start();
            thread2.start();
            thread3.start();
            thread4.start();
            try {
                thread1.join();
                thread2.join();
                thread3.join();
                thread4.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    // File: RServeMultiThread.java
    package com.studytrails.rserve;

    import org.rosuda.REngine.REXP;
    import org.rosuda.REngine.REXPMismatchException;
    import org.rosuda.REngine.Rserve.RConnection;
    import org.rosuda.REngine.Rserve.RserveException;

    public class RServeMultiThread extends Thread {
        private int port = 0;

        public RServeMultiThread(int port) {
            this.port = port;
        }

        public void run() {
            RConnection c = null;
            try {
                // Each thread opens its own connection to the same Rserve instance
                c = new RConnection("localhost", port);
                c.eval("N = " + port);
                c.eval("x1=rnorm(N)");
                c.eval("x2 = 1 + x1 + rnorm(N)");
                c.eval("y <- 1 + x1 + x2");
                c.eval("df <- data.frame(y,x1,x2)");
                c.eval("fit <- lm(y ~ x1 + x2, data = df)");
                REXP x1 = c.eval("fit[[1]][2]");
                System.out.println("Thread with port " + port + " result: " + x1.asDouble());
                Thread.sleep(5000);
            } catch (RserveException e1) {
                e1.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (REXPMismatchException e) {
                e.printStackTrace();
            } finally {
                // Release the connection so the server-side process can exit
                if (c != null) {
                    c.close();
                }
            }
        }
    }
Reference: http://www.studytrails.com/r/rserve/r-java-multi-thread-using-rserve-unix/
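The same start-then-join pattern can also be written with an ExecutorService instead of hand-managed Thread objects, which makes it easier to collect a result from each worker. The sketch below shows only the concurrency skeleton: each Callable is a stand-in for a job that would open its own RConnection, evaluate something, and return a number. ParallelRJobs and runAll are hypothetical names, and no Rserve call is made here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelRJobs {

    /** Runs every job on its own pool thread and returns the results in submission order. */
    static List<Double> runAll(List<Callable<Double>> jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobs.size()));
        try {
            List<Double> results = new ArrayList<>();
            // invokeAll blocks until all jobs have finished, like the join() calls
            for (Future<Double> future : pool.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real client, each job would create and close its own connection inside call(), so that every worker gets a separate Rserve process as described above.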
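4. Starting Rserve automatically from Java code
In a real application, starting Rserve by hand is not practical; the Java code needs to be able to start it automatically. For the code, see library\Rserve\client\java\Rserve\test\StartRserve.java under the R installation directory, or https://github.com/s-u/REngine/blob/master/Rserve/test/StartRserve.java. The logic is as follows:
(1) Run RConnection c = new RConnection(); and see whether it throws an exception, which tells you whether Rserve is already running;
(2) If it is not running, start Rserve:
a. On Windows, query the registry to find the R installation path, then launch it;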
    Process rp = Runtime.getRuntime().exec("reg query HKLM\\Software\\R-core\\R");
    StreamHog regHog = new StreamHog(rp.getInputStream(), true);
    rp.waitFor();
    regHog.join();
    installPath = regHog.getInstallPath();
    launchRserve(installPath + "\\bin\\R.exe");
b. On Unix, try launching from the following paths (my guess: if typing R on the command line starts the interactive R console, launchRserve("R") is enough)
    (launchRserve("R") ||
     /* try some common unix locations of R */
     ((new File("/Library/Frameworks/R.framework/Resources/bin/R")).exists() && launchRserve("/Library/Frameworks/R.framework/Resources/bin/R")) ||
     ((new File("/usr/local/lib/R/bin/R")).exists() && launchRserve("/usr/local/lib/R/bin/R")) ||
     ((new File("/usr/lib/R/bin/R")).exists() && launchRserve("/usr/lib/R/bin/R")) ||
     ((new File("/usr/local/bin/R")).exists() && launchRserve("/usr/local/bin/R")) ||
     ((new File("/sw/bin/R")).exists() && launchRserve("/sw/bin/R")) ||
     ((new File("/usr/common/bin/R")).exists() && launchRserve("/usr/common/bin/R")) ||
     ((new File("/opt/bin/R")).exists() && launchRserve("/opt/bin/R"))
    );
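Both lookup steps above can be sketched with the JDK alone. The first method pulls the InstallPath value out of text shaped like the output of reg query; the sample format it expects (key name, then REG_SZ, then the value on one line) is an assumption about that output, and this is not a copy of the StreamHog class from StartRserve.java. The second method expresses the chain of exists() checks as a loop over candidate paths. RLocator and its method names are hypothetical.

```java
import java.io.File;
import java.util.List;

final class RLocator {

    /** Returns the value of the InstallPath entry, or null if no such line is found. */
    static String parseInstallPath(String regOutput) {
        for (String line : regOutput.split("\\r?\\n")) {
            int key = line.indexOf("InstallPath");
            if (key < 0) {
                continue;
            }
            // Drop the key name, then the type marker, leaving only the path
            String rest = line.substring(key + "InstallPath".length()).trim();
            int type = rest.indexOf("REG_SZ");
            if (type >= 0) {
                rest = rest.substring(type + "REG_SZ".length()).trim();
            }
            return rest;
        }
        return null;
    }

    /** Returns the first candidate path that exists on disk, or null if none does. */
    static String firstExisting(List<String> candidates) {
        for (String path : candidates) {
            if (new File(path).exists()) {
                return path;
            }
        }
        return null;
    }
}
```

A caller could pass the Unix locations listed above to firstExisting and hand the result to whatever launches R, falling back to plain "R" when it returns null.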
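The launchRserve function first starts Rserve through the R command, then runs RConnection c = new RConnection(); and checks whether it throws an exception to decide whether startup succeeded.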
    if (osname != null && osname.length() >= 7 && osname.substring(0,7).equals("Windows")) {
        isWindows = true; /* Windows startup */
        p = Runtime.getRuntime().exec("\""+cmd+"\" -e \"library(Rserve);Rserve("+(debug?"TRUE":"FALSE")+",args='"+rsrvargs+"')\" "+rargs);
    } else /* unix startup */
        p = Runtime.getRuntime().exec(new String[] {
            "/bin/sh", "-c",
            "echo 'library(Rserve);Rserve("+(debug?"TRUE":"FALSE")+",args=\""+rsrvargs+"\")'|"+cmd+" "+rargs
        });

    int attempts = 5; /* try up to 5 times before giving up. We can be conservative here, because at this point the process execution itself was successful and the start up is usually asynchronous */
    while (attempts > 0) {
        try {
            RConnection c = new RConnection();
            System.out.println("Rserve is running.");
            c.close();
            return true;
        } catch (Exception e2) {
            System.out.println("Try failed with: "+e2.getMessage());
        }
        /* a safety sleep just in case the start up is delayed or asynchronous */
        try { Thread.sleep(500); } catch (InterruptedException ix) { };
        attempts--;
    }
    return false;
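The wait-and-retry loop at the end of launchRserve is a general pattern and can be pulled out into a small helper. The sketch below is one way to do that, not the code from StartRserve.java: Retry.until is a hypothetical name, and the check is passed in as a BooleanSupplier so the helper stays independent of Rserve.

```java
import java.util.function.BooleanSupplier;

final class Retry {

    /**
     * Calls check up to attempts times and returns true as soon as it succeeds.
     * Sleeps delayMs between failed attempts, but not after the last one.
     */
    static boolean until(BooleanSupplier check, int attempts, long delayMs) {
        for (int i = 0; i < attempts; i++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (i < attempts - 1) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

The original loop corresponds to Retry.until(check, 5, 500), where check would try to open and close an RConnection and return false if that throws.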
5. Limitations of Rserve on Windows
- no parallel connections are supported, subsequent connections share the same namespace
- sessions are not supported - this is a consequence of the fact that parallel connections are not supported
Since the Windows operating system doesn't support fork method for spawning copies of a process, it is not possible to initialize R and use initialized copies for all subsequent connections in parallel. Therefore the Rserve for Windows supports no concurrent connections. This implies that all subsequent connections share the same namespace and sessions (as in >=0.4 version on unix) cannot be supported. It is still possible to start multiple Rserves to handle multiple connections (just make sure you use different port for each one).
Reference: http://rforge.net/Rserve/rserve-win.html
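The workaround mentioned in the quote, several Rserve instances on different ports, needs some bookkeeping on the Java side so that two threads never talk to the same instance at once. A minimal sketch of such bookkeeping follows. RservePortPool is a hypothetical class, the port numbers in the usage note are examples, and starting the instances themselves is left out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class RservePortPool {
    // Ports of Rserve instances that no thread is currently using
    private final BlockingQueue<Integer> free;

    RservePortPool(int... ports) {
        if (ports.length == 0) {
            throw new IllegalArgumentException("at least one port is required");
        }
        free = new ArrayBlockingQueue<>(ports.length);
        for (int port : ports) {
            free.add(port);
        }
    }

    /** Returns a port nobody else holds, or null if every instance is busy. */
    Integer tryAcquire() {
        return free.poll();
    }

    /** Hands a port back once its connection has been closed. */
    void release(int port) {
        free.offer(port);
    }
}
```

A worker would call tryAcquire(), open new RConnection("localhost", port) on the returned port, and call release(port) in a finally block after closing the connection. With new RservePortPool(6311, 6312), at most two workers run R code at the same time.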
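II. JRI (embedded mode)
JRI, short for Java/R Interface, takes a completely different approach: it calls R's dynamic link library directly and uses R functions that way.
For usage details, see https://www.cnblogs.com/tomcattd/p/3369938.html
I have not tried this approach in practice, so I will not go into further detail.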
Excerpts comparing the pros and cons of the two approaches
Excerpt 1
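1.1 JRI (embedded mode): in my experience its biggest advantage is relatively good support for Chinese text. However, JRI can easily bring down the whole system. If an exception or error occurs while Java is calling R, it is usually fatal: the Java virtual machine crashes and takes the whole system with it. This is a nightmare.
1.2 Rserve (remote communication mode): the biggest advantage of this mode is that the Java web project does not need to manage the R runtime and simply communicates over TCP/IP. Its major drawback is weak support for Chinese text, especially on Windows, where Chinese is essentially unsupported; on Linux, support seems somewhat better. Without full Chinese support, Rserve is unusable for a system whose inputs or return values contain Chinese.
Summary: I started the project with JRI, but after deployment it crashed frequently, so I eventually abandoned JRI and switched to remote calls through Rserve. Although Chinese is not supported, the parameters the project passes contain no Chinese, and the results are processed by R and written to the database; only a status value is returned to the web server.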
Excerpt 2
Instead of RServe, you can use JRI, that is shipped with rJava package.
In my opinion JRI is better than RServe, because instead of creating a separate process it uses native calls to integrate Java and R.
With JRI you don't have to worry about ports, connections, watchdogs, etc... The calls to R are done using an operating system library (libjri).
The methods are pretty similar to RServe, and you can still use REXP objects.