Spark + IDEA Standalone Environment Setup + IDEA Shortcuts


1. Configuring the Spark runtime environment in IDEA

    For reference, see this post: http://www.cnblogs.com/jackchen-Net/p/6867838.html

3.1. Use Project Structure to view the project's configuration

 

3.2. If Scala is not already set up in IDEA, install it locally

   If you need to install multiple Scala versions, note the following:

   If you already installed Scala locally from the msi installer and need a second version, download the zip package instead; you can simply unzip it and point IDEA at it, as shown in step 3.3.

   Note: Scala download page: http://www.scala-lang.org/download/2.10.4.html

3.3. Check the Scala SDK configuration; use the green "+" shown in the screenshot below to add a locally downloaded Scala package

   

   

3.4. Important: if you hit the following error while running Spark code, change the Scala version

Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
at akka.actor.RootActorPath.$div(ActorPath.scala:159)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:464)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:452)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:191)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:230)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:584)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:577)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:108)
at akka.Akka$.delayedEndpoint$akka$Akka$1(Akka.scala:11)
at akka.Akka$delayedInit$body.apply(Akka.scala:9)
at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:383)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at akka.Akka$.main(Akka.scala:9)
at akka.Akka.main(Akka.scala)

  The fix is to switch the project's Scala version from 2.11 to 2.10. (Note: the Spark version here is 1.6.0.)
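  A quick way to confirm which Scala runtime actually ends up on the classpath is a one-line check like the sketch below (the object name ScalaVersionCheck is hypothetical and not part of the original project); the value it prints is what has to match the Spark assembly, regardless of which SDK is installed in IDEA:

object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // Prints e.g. "version 2.10.4" -- the Scala library version on the classpath
    println(scala.util.Properties.versionString)
  }
}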

3.5. Import the jars the program needs to run

  •  In Project Structure > Libraries, click "+" and add spark-assembly-1.6.0-hadoop2.6.0.jar under the Classes location.
  •  Download the Spark 1.6.0 source from the Spark website (spark1.6.0-src.tgz), unzip it locally on Windows, then click the "+" on the far right to attach all of the source packages so you can browse the Spark source code. (An sbt-based alternative is sketched below.)
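
If you manage the project with sbt rather than importing the jars by hand (the post itself uses manual imports, so this is only a sketch under that assumption; the project name is hypothetical), pinning the Scala version and the Spark dependency in build.sbt also avoids the 2.10/2.11 mismatch described in step 3.4:

// build.sbt -- sketch for an sbt-managed project (the post imports jars manually instead)
name := "spark-wordcount-demo"   // hypothetical project name

scalaVersion := "2.10.4"         // must match the Scala binary version the Spark assembly was built for

// Spark 1.6.0 core; %% appends the Scala binary suffix (_2.10) to the artifact name
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"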

 

 3.6. Create a Scala file and write the code

package com.bigdata.demo

import org.apache.spark.{SparkConf, SparkContext}

/**
  * Created by SimonsZhao on 3/25/2017.
  */
object wordCount {
  def main(args: Array[String]) {
    // Run locally; the app name is what shows up in the Spark UI
    val conf = new SparkConf().setMaster("local").setAppName("wordCount")
    val sc = new SparkContext(conf)
    // Read the test file, split each line on tabs, and count each word
    val data = sc.textFile("E://scala//spark//testdata//word.txt")
    data.flatMap(_.split("\t")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
  }
}
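
If you want to keep the counts rather than just print them, a variant like the one below (a sketch, not from the original post; the output path is hypothetical) writes one part file per partition and stops the context explicitly:

// Variant of the example above, reusing the same SparkConf/SparkContext setup
val counts = sc.textFile("E://scala//spark//testdata//word.txt")
  .flatMap(_.split("\t"))
  .map((_, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("E://scala//spark//testdata//wordcount_out") // the directory must not already exist
sc.stop()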

3.7. Run output

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/03/25 17:25:49 INFO SparkContext: Running Spark version 1.6.0
17/03/25 17:25:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/25 17:25:50 INFO SecurityManager: Changing view acls to: SimonsZhao
17/03/25 17:25:50 INFO SecurityManager: Changing modify acls to: SimonsZhao
17/03/25 17:25:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(SimonsZhao); users with modify permissions: Set(SimonsZhao)
17/03/25 17:25:51 INFO Utils: Successfully started service 'sparkDriver' on port 53279.
17/03/25 17:25:51 INFO Slf4jLogger: Slf4jLogger started
17/03/25 17:25:51 INFO Remoting: Starting remoting
17/03/25 17:25:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.191.1:53292]
17/03/25 17:25:51 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 53292.
17/03/25 17:25:51 INFO SparkEnv: Registering MapOutputTracker
17/03/25 17:25:51 INFO SparkEnv: Registering BlockManagerMaster
17/03/25 17:25:51 INFO DiskBlockManager: Created local directory at C:\Users\SimonsZhao\AppData\Local\Temp\blockmgr-7e548732-b1db-4e3c-acdb-37e686b10dff
17/03/25 17:25:51 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
17/03/25 17:25:51 INFO SparkEnv: Registering OutputCommitCoordinator
17/03/25 17:25:51 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/03/25 17:25:51 INFO SparkUI: Started SparkUI at http://192.168.191.1:4040
17/03/25 17:25:52 INFO Executor: Starting executor ID driver on host localhost
17/03/25 17:25:52 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53299.
17/03/25 17:25:52 INFO NettyBlockTransferService: Server created on 53299
17/03/25 17:25:52 INFO BlockManagerMaster: Trying to register BlockManager
17/03/25 17:25:52 INFO BlockManagerMasterEndpoint: Registering block manager localhost:53299 with 2.4 GB RAM, BlockManagerId(driver, localhost, 53299)
17/03/25 17:25:52 INFO BlockManagerMaster: Registered BlockManager
17/03/25 17:25:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 153.6 KB)
17/03/25 17:25:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 167.5 KB)
17/03/25 17:25:52 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53299 (size: 13.9 KB, free: 2.4 GB)
17/03/25 17:25:52 INFO SparkContext: Created broadcast 0 from textFile at wordCount.scala:11
17/03/25 17:25:54 WARN : Your hostname, SimonsCJ resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:bf01%30, but we couldn't find any external IP address!
17/03/25 17:25:55 INFO FileInputFormat: Total input paths to process : 1
17/03/25 17:25:55 INFO SparkContext: Starting job: collect at wordCount.scala:12
17/03/25 17:25:55 INFO DAGScheduler: Registering RDD 3 (map at wordCount.scala:12)
17/03/25 17:25:55 INFO DAGScheduler: Got job 0 (collect at wordCount.scala:12) with 1 output partitions
17/03/25 17:25:55 INFO DAGScheduler: Final stage: ResultStage 1 (collect at wordCount.scala:12)
17/03/25 17:25:55 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/03/25 17:25:55 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/03/25 17:25:55 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordCount.scala:12), which has no missing parents
17/03/25 17:25:55 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.1 KB, free 171.6 KB)
17/03/25 17:25:55 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 173.9 KB)
17/03/25 17:25:55 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:53299 (size: 2.3 KB, free: 2.4 GB)
17/03/25 17:25:55 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/03/25 17:25:55 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at wordCount.scala:12)
17/03/25 17:25:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/03/25 17:25:55 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2129 bytes)
17/03/25 17:25:55 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/03/25 17:25:55 INFO HadoopRDD: Input split: file:/E:/scala/spark/testdata/word.txt:0+19
17/03/25 17:25:55 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
17/03/25 17:25:55 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
17/03/25 17:25:55 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
17/03/25 17:25:55 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
17/03/25 17:25:55 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
17/03/25 17:25:55 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2253 bytes result sent to driver
17/03/25 17:25:55 INFO DAGScheduler: ShuffleMapStage 0 (map at wordCount.scala:12) finished in 0.228 s
17/03/25 17:25:55 INFO DAGScheduler: looking for newly runnable stages
17/03/25 17:25:55 INFO DAGScheduler: running: Set()
17/03/25 17:25:55 INFO DAGScheduler: waiting: Set(ResultStage 1)
17/03/25 17:25:55 INFO DAGScheduler: failed: Set()
17/03/25 17:25:55 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordCount.scala:12), which has no missing parents
17/03/25 17:25:55 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 194 ms on localhost (1/1)
17/03/25 17:25:55 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/03/25 17:25:55 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.6 KB, free 176.4 KB)
17/03/25 17:25:55 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1600.0 B, free 178.0 KB)
17/03/25 17:25:55 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:53299 (size: 1600.0 B, free: 2.4 GB)
17/03/25 17:25:55 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
17/03/25 17:25:55 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at wordCount.scala:12)
17/03/25 17:25:55 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/03/25 17:25:55 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1894 bytes)
17/03/25 17:25:55 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
17/03/25 17:25:55 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/03/25 17:25:55 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 6 ms
17/03/25 17:25:55 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1349 bytes result sent to driver
17/03/25 17:25:55 INFO DAGScheduler: ResultStage 1 (collect at wordCount.scala:12) finished in 0.059 s
17/03/25 17:25:55 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 59 ms on localhost (1/1)
17/03/25 17:25:55 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/03/25 17:25:55 INFO DAGScheduler: Job 0 finished: collect at wordCount.scala:12, took 0.532461 s
(you,1)
(hello,2)
(me,1)
17/03/25 17:25:55 INFO SparkContext: Invoking stop() from shutdown hook
17/03/25 17:25:56 INFO SparkUI: Stopped Spark web UI at http://192.168.191.1:4040
17/03/25 17:25:56 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/25 17:25:56 INFO MemoryStore: MemoryStore cleared
17/03/25 17:25:56 INFO BlockManager: BlockManager stopped
17/03/25 17:25:56 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/25 17:25:56 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/25 17:25:56 INFO SparkContext: Successfully stopped SparkContext
17/03/25 17:25:56 INFO ShutdownHookManager: Shutdown hook called
17/03/25 17:25:56 INFO ShutdownHookManager: Deleting directory C:\Users\SimonsZhao\AppData\Local\Temp\spark-220c67fe-f2c3-400b-bfe1-fe833e33e74f

2. Setting up Spark on Windows

 2.1. Go to the Spark website and download the Spark package built for your Hadoop version

     http://spark.apache.org/docs/latest/

 2.2. Configure the environment variables on Windows (create a SPARK_HOME system variable pointing to the Spark installation directory, then add %SPARK_HOME%\bin; to PATH).

       Open a Windows command prompt and type spark-shell; the shell should start as shown in the screenshot below.

     Otherwise, change into the directory where Spark 1.6.0 was downloaded and installed and run spark-shell from there.

       The output looks like this:

 

3. Verify that it works:

    3.1. Prepare the test data

  In E:\scala\spark\testdata, create a word.txt file containing the following lines

      Hello you

      Hello me

    3.2. Run the word count and check the output
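
  As a rough sketch of what to type at the spark-shell prompt (sc is created for you by the shell; adjust the path and the split delimiter to match your data file):

// Word count entered directly in spark-shell
val counts = sc.textFile("E:/scala/spark/testdata/word.txt")
  .flatMap(_.split(" "))   // the sample file is space-separated
  .map((_, 1))
  .reduceByKey(_ + _)
counts.collect().foreach(println)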

3. Other problems you may encounter

java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Curr

  Normally spark-shell starts successfully and drops you into the Spark REPL, but some users may hit a null pointer error. This usually happens because winutils.exe is missing from Hadoop's bin directory. The fix is:

  • Go to https://github.com/steveloughran/winutils and download the project's zip. Pick the directory matching your installed Hadoop version, go into its bin directory, and copy the winutils.exe you find there into Hadoop's bin directory (mine is F:\hadoop\bin).
  • In a cmd window, run F:\hadoop\bin\winutils.exe chmod 777 /tmp/hive to fix the permissions. Replace F:\hadoop\bin with the bin directory where you actually installed it.
  • After these steps, open a new cmd window; if everything went well, you should now be able to start Spark simply by typing spark-shell.

END~

4. Handy IDEA shortcuts

/**
* IDEA shortcuts
* Alt+Enter          import the missing class / apply a quick fix
* Ctrl+Alt+L         reformat code
* Alt+Insert         generate constructors, getters/setters and other common members
* Ctrl+D             duplicate the current line
* Shift+Enter        start a new line
* Ctrl+N             find a class
* Double-tap Shift   search everywhere in the project (classes, files, symbols, actions)
* Ctrl+Alt+O         optimize imports
* Ctrl+J             insert a Live Template; press Ctrl+J when you cannot remember an
*                    abbreviation, e.g. type "it" and press Ctrl+J to see what is offered
* Ctrl+O             override methods from the parent class
* Ctrl+Q             show quick documentation (JavaDoc)
* Ctrl+Alt+M         extract method
* Ctrl+Alt+V         extract local variable
* Ctrl+Alt+C         extract constant
* Ctrl+Alt+F         extract field
* Ctrl+Alt+T         surround with if/else, try/catch, etc.
* Shift+Alt+Down     move the current line down
*/

