[Repost] [1.0.2] A detailed guide to setting up and testing the development environment for a Maven-managed Spark project written in Scala


 

 

  Original post: http://blog.csdn.net/pengych_321/article/details/52014249#comments  (with thanks to the author)

 

 

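Scenario

  Suppose the project's data research and requirements analysis are nearly finished and the coding phase is about to begin. What needs to happen before coding starts? Three things: agreeing on the development tools, setting up the development environment and testing it locally, and setting up and testing the test environment. This article records in detail how the development environment for a real Spark project was set up.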

 

 

Analysis

Development tools

Operating system: Windows 10
JDK version: jdk1.8.0_91
Scala version: 2.10.6
Maven version: apache-maven-3.3.9
IDE: IntelliJ IDEA 2016.1.3
Main development language: Scala
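
Before going further, it can help to confirm which Scala library and Java runtime the IDE actually picks up, since a mismatch with the versions listed above is a common source of confusing build errors. The following is a minimal sketch, not part of the original article; the object name VersionCheck is made up for illustration.

```scala
// Minimal sketch: report the Scala library and Java runtime versions in use.
// VersionCheck is a hypothetical name, not something from the original project.
object VersionCheck {

  // Returns (Scala library version, Java runtime version) as reported at runtime
  def versions: (String, String) =
    (scala.util.Properties.versionNumberString, System.getProperty("java.version"))

  def main(args: Array[String]): Unit = {
    val (scalaVersion, javaVersion) = versions
    println(s"Scala library: $scalaVersion")
    println(s"Java runtime:  $javaVersion")
  }
}
```

With the tool versions listed above, one would expect this to report a 2.10.x Scala library and a 1.8 Java runtime; anything else suggests the IDE is using a different SDK than intended.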

 

 

Setting up and testing the development environment

I. Setup walkthrough
1. Create a new Maven project
A new Maven project named fantasia is used as the example here.

 

 

 On each of the next two wizard pages, fill in the settings and click Next.

 

 

 

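After you click Finish, IDEA loads the Maven, JUnit, and other related plugins. This can take around 30 minutes, depending on your network speed.

2. Customize the Maven repository directory
IDEA ships with a built-in Maven plugin whose default repository directory is C:\Users\${username}\.m2\repository. Here we point the project at a new repository so that the dependency jars are easier to manage: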

 

 

3. Configure the dependencies in pom.xml
All the jars the project is likely to need are imported in one go. The full file is:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.pl.bdeu.bigdata</groupId>
  <artifactId>fantasia</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.10.6</scala.version>
    <spark.version>1.6.2</spark.version>
    <hadoop.version>2.6.0</hadoop.version>
  </properties>
  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.6</version>
    </dependency>
    <dependency>
      <groupId>org.json</groupId>
      <artifactId>json</artifactId>
      <version>20090211</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>2.4.3</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.4.3</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
      <version>2.4.3</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>1.1.41</version>
    </dependency>
    <dependency>
      <groupId>fastutil</groupId>
      <artifactId>fastutil</artifactId>
      <version>5.0.9</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>

 

 

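4. Basic project structure
Create two sub-packages, collector and core:
collector: holds the Spark jobs for data collection
core: holds the Spark jobs for the core business logic
The resource directory holds the configuration files: database connection details, Kafka environment details, and so on.
Further packages can be defined later as the individual modules require.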
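
As an illustration of how jobs in collector or core might read such configuration files, here is a minimal sketch using java.util.Properties. Everything specific in it is an assumption rather than something from the original project: the object name ConfigSketch, the file name fantasia.properties, and the keys jdbc.url and kafka.brokers are all hypothetical, and the classpath lookup assumes the file is packaged from Maven's standard src/main/resources directory.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.Properties

// Minimal sketch of a configuration loader; all names here are hypothetical.
object ConfigSketch {

  // Parses key=value pairs from an input stream and closes the stream afterwards
  def load(in: InputStream): Properties = {
    val props = new Properties()
    try props.load(in) finally in.close()
    props
  }

  // Looks a file up on the classpath; returns None when it is not found
  def fromClasspath(name: String): Option[Properties] =
    Option(getClass.getClassLoader.getResourceAsStream(name)).map(load)

  def main(args: Array[String]): Unit = {
    // Inline sample content so the sketch runs without any file on disk
    val sample =
      "jdbc.url=jdbc:mysql://localhost:3306/fantasia\nkafka.brokers=localhost:9092\n"
    val props = load(new ByteArrayInputStream(sample.getBytes(StandardCharsets.ISO_8859_1)))
    println(props.getProperty("jdbc.url"))
    println(props.getProperty("kafka.brokers"))

    // Hypothetical file name; prints false unless such a file is on the classpath
    println(fromClasspath("fantasia.properties").isDefined)
  }
}
```

Returning an Option from the classpath lookup lets a job fail with a clear message when the file is missing instead of hitting a NullPointerException later.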

 

 

 

 

5. Local environment test
Write a FrameworkExeTest class to check that the framework is usable.

 

 

package com.pl.bdeu.bigdata

import org.apache.commons.logging.LogFactory
import org.apache.spark.{SparkConf, SparkContext}

/**
 * author pengych@pl.com
 * date 2016/7/24
 * function: framework usability test
 *
 * Execution result:
 * (hello,2)
 * (pl,1)
 * (fantasia,1)
 */
object FrameworkExeTest {

  def main(args: Array[String]): Unit = {
    val log = LogFactory.getLog("FrameworkExeTest")

    // Run Spark locally with one worker thread per available core
    val conf = new SparkConf().setMaster("local[*]").setAppName("fantasia framework test")
    val sc = new SparkContext(conf)
    if (log.isDebugEnabled) {
      log.debug("SparkContext initialized")
    }

    // Word count: split each line on spaces, pair every word with 1, sum the counts per word
    val linesRDD = sc.textFile("E:\\wordcount.txt")
    linesRDD
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

    sc.stop()
  }
}
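
To reason about the expected counts without starting Spark or creating E:\wordcount.txt, the same transformation can be sketched on plain Scala collections. This is an illustration only, not the article's code: the object name WordCountSketch is made up, groupBy stands in for reduceByKey, and the sample lines are a guess at what the input file might contain, since the article does not show it.

```scala
// Minimal sketch: the word-count steps on ordinary Scala collections.
// WordCountSketch and the sample input are hypothetical.
object WordCountSketch {

  // Split every line on spaces, then count how often each word occurs
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Assumed file content; the original wordcount.txt is not shown in the article
    val sample = Seq("hello pl", "hello fantasia")
    count(sample).foreach(println)
  }
}
```

For this sample the counts match the result recorded in the comment of FrameworkExeTest, although the order in which the pairs are printed is not guaranteed.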

 

 

 

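Summary

  • Patience matters, because the network may well be slow.
    Do not manually kill the loading process while IDEA is downloading the dependencies; doing so can easily lead to all kinds of "package not found" errors.

  • Set the custom repository path in the <localRepository> tag of settings.xml under the Maven installation directory: ~\apache-maven-3.3.9\conf\settings.xml
    This article sets the repository path to E:\apache-maven-3.3.9\repository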

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>E:\apache-maven-3.3.9\repository</localRepository>
</settings>

 

