最近需要做一個UI,在UI上做一個可以提交的spark程序的功能;
1-zeppelin就是這樣的一個工具,其內部也是比較繁瑣的。有興趣的可以了解下。
2-SparkLauncher,spark自帶的類
linux下其基本用法:
public static void main(String[] args) throws Exception { HashMap<String, String> envParams = new HashMap<>(); envParams.put("YARN_CONF_DIR", "/home/hadoop/cluster/hadoop-release/etc/hadoop"); envParams.put("HADOOP_CONF_DIR", "/home/hadoop/cluster/hadoop-release/etc/hadoop"); envParams.put("SPARK_HOME", "/home/hadoop/cluster/spark-new"); envParams.put("SPARK_PRINT_LAUNCH_COMMAND", "1"); SparkAppHandle spark = new SparkLauncher(envParams) .setAppResource("/home/hadoop/cluster/spark-new/examples/jars/spark-examples_2.11-2.2.1.jar") .setMainClass("org.apache.spark.examples.SparkPi") .setMaster("yarn") .startApplication(); Thread.sleep(100000); }
運行結果:
信息: 18/12/03 18:12:12 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.462 s 十二月 03, 2018 6:12:12 下午 org.apache.spark.launcher.OutputRedirector redirect 信息: 18/12/03 18:12:12 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 3.395705 s 十二月 03, 2018 6:12:12 下午 org.apache.spark.launcher.OutputRedirector redirect 信息: Pi is roughly 3.1461157305786527
windows下運行:
如果linux能運行,那就安裝windows下所依賴包,包含jdk,hadoop,scala,spark;
可以參考https://blog.csdn.net/u011513853/article/details/52865076
代碼貼上:
public class SparkLauncherTest { private static String YARN_CONF_DIR = null; private static String HADOOP_CONF_DIR = null; private static String SPARK_HOME = null; private static String SPARK_PRINT_LAUNCH_COMMAND = "1"; private static String Mater = null; private static String appResource = null; private static String mainClass = null; public static void main(String[] args) throws Exception { if (args.length != 1) { System.out.println("Usage: ServerStatisticSpark <local>"); System.exit(1); } TrackerConfig trackerConfig = TrackerConfig.loadConfig(); if ("local".equals(args[0])){ YARN_CONF_DIR="D:\\software\\hadoop-2.4.1\\etc\\hadoop"; HADOOP_CONF_DIR="D:\\software\\hadoop-2.4.1\\etc\\hadoop"; SPARK_HOME="D:\\spark-new"; Mater = "local"; appResource = "D:\\spark-new\\examples\\jars\\spark-examples_2.11-2.2.1.jar"; } else { YARN_CONF_DIR="/home/hadoop/cluster/hadoop-release/etc/hadoop"; HADOOP_CONF_DIR="/home/hadoop/cluster/hadoop-release/etc/hadoop"; SPARK_HOME="/home/hadoop/cluster/spark-new"; Mater = "yarn"; appResource = "/home/hadoop/cluster/spark-new/examples/jars/spark-examples_2.11-2.2.1.jar"; } HashMap<String, String> envParams = new HashMap<>(); envParams.put("YARN_CONF_DIR", YARN_CONF_DIR); envParams.put("HADOOP_CONF_DIR", HADOOP_CONF_DIR); envParams.put("SPARK_HOME", SPARK_HOME); envParams.put("SPARK_PRINT_LAUNCH_COMMAND", SPARK_PRINT_LAUNCH_COMMAND); mainClass = "org.apache.spark.examples.SparkPi"; SparkAppHandle spark = new SparkLauncher(envParams) .setAppResource(appResource) .setMainClass(mainClass) .setMaster(Mater) .startApplication(); Thread.sleep(100000); } }
運行結果:
信息: 18/12/04 17:01:11 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.808691 s 十二月 04, 2018 5:01:11 下午 org.apache.spark.launcher.OutputRedirector redirect 信息: Pi is roughly 3.1455757278786396
遇到的問題,sparkLauncher一直運行不了;
這時hadoop,jdk都用了很長時間,排除其原因;
本地可以編寫和運行scala,應該也不屬於其中的問題;
最后發現cmd運行spark\bin下的spark-submit會出現問題。於是重新拷貝linux下的spark包;
發現spark-shell可以正常運行,原來會報錯:不是內部或外部命令,也不是可運行的程序或批處理文件

現在還存在的問題:
打jar包時,會有部分類打不進去,報錯信息類沒有找到;
等UI做成后,會更新整個流程。
