Deploying a Hadoop Pseudo-Distributed Environment and Installing Spark and IntelliJ IDEA

Environment and Software Requirements

System information:

  • Linux promote 4.1.12-1-default #1 SMP PREEMPT Thu Oct 29 06:43:42 UTC 2015 (e24bad1) x86_64 x86_64 x86_64 GNU/Linux

Required software:

  • jdk-8u101-linux-x64.rpm
  • scala-2.11.8.rpm
  • hadoop-2.6.4.tar.gz
  • spark-2.0.0-bin-hadoop2.6.tgz
  • ideaIC-2016.2.2.tar.gz

Creating the spark User

As the root user, run the following commands to create the spark user. Throughout this document, $HOME refers to the spark user's home directory.

useradd -m -d /home/spark -s /bin/bash spark
passwd spark

Configuring SSH

As the spark user, run the following commands:

ssh-keygen -t rsa -P ""  
cd /home/spark/.ssh 
cat id_rsa.pub >> authorized_keys

Run the following command to check whether passwordless SSH is working:

ssh localhost

If the configuration succeeded, you will be logged in and greeted with the welcome message:

spark@promote:~/.ssh> ssh localhost
Last failed login: Fri Aug 19 23:13:27 CST 2016 from localhost on ssh:notty
There were 3 failed login attempts since the last successful login.
Have a lot of fun...
spark@promote:~>
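
If ssh localhost still prompts for a password, the key files are probably too permissive; sshd normally refuses to use keys whose files are group- or world-readable. Tightening the permissions usually fixes this:

chmod 700 /home/spark/.ssh
chmod 600 /home/spark/.ssh/authorized_keys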

Installing the JDK

Upload jdk-8u101-linux-x64.rpm to the machine and, as root, install it with:

rpm -ivh jdk-8u101-linux-x64.rpm

Edit /etc/profile and append the following lines:

export JAVA_HOME=/usr/java/jdk1.8.0_101
export PATH=$JAVA_HOME/bin:$PATH

Run the following commands to verify the installation:

su - spark
echo $JAVA_HOME
java -version

If the installation succeeded, you will see output like this:

promote:~ # su - spark
spark@promote:~> echo $JAVA_HOME
/usr/java/jdk1.8.0_101
spark@promote:~> java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

Installing Scala

Upload scala-2.11.8.rpm to the machine and, as root, install it with:

rpm -ivh scala-2.11.8.rpm

Edit /etc/profile and append the following lines:

export SCALA_HOME=/usr/share/scala
export PATH=$SCALA_HOME/bin:$PATH

Run the following commands to verify the installation:

su - spark
echo $SCALA_HOME
scala -version

If the installation succeeded, you will see output like this:

promote:~ # su - spark
spark@promote:~> echo $SCALA_HOME
/usr/share/scala
spark@promote:~> scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

Installing Hadoop

All steps in this section are performed as the spark user.
Upload hadoop-2.6.4.tar.gz to $HOME and unpack it:

tar zxvf hadoop-2.6.4.tar.gz

Edit $HOME/.profile and append the following lines:

export HADOOP_HOME=/home/spark/hadoop-2.6.4
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$HADOOP_HOME/bin:$PATH

Note that HADOOP_OPTS points java.library.path at $HADOOP_HOME/lib, while the prebuilt native libraries actually live in $HADOOP_HOME/lib/native; this mismatch is the likely source of the harmless "Unable to load native-hadoop library" warnings that appear in the job output later.

Run the following commands to verify that the Hadoop environment variables are configured correctly:

source $HOME/.profile
hadoop version

If everything is configured correctly, you will see output like this:

spark@promote:~> hadoop version
Hadoop 2.6.4
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
Compiled by jenkins on 2016-02-12T09:45Z
Compiled with protoc 2.5.0
From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
This command was run using /home/spark/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar

Editing the Hadoop Configuration Files

Because this guide sets up a pseudo-distributed environment, the following configuration files need to be modified:

  • $HADOOP_HOME/etc/hadoop/core-site.xml
  • $HADOOP_HOME/etc/hadoop/hdfs-site.xml
  • $HADOOP_HOME/etc/hadoop/mapred-site.xml
  • $HADOOP_HOME/etc/hadoop/yarn-site.xml

Configuring core-site.xml

Edit $HADOOP_HOME/etc/hadoop/core-site.xml and set the <configuration> element to the following:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/spark/hadoop-2.6.4/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Configuring hdfs-site.xml

Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml and set the <configuration> element to the following:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/spark/hadoop-2.6.4/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/spark/hadoop-2.6.4/hdfs/data</value>
    </property>
</configuration>

Configuring mapred-site.xml

Create mapred-site.xml by copying the bundled template:

cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml

Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml and set the <configuration> element to the following; setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN rather than the local runner:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Configuring yarn-site.xml

Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml and set the <configuration> element to the following; the mapreduce_shuffle auxiliary service lets NodeManagers serve map output to reduce tasks:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Formatting the NameNode

Format the NameNode:

hdfs namenode -format
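
Formatting only needs to be done once; reformatting the NameNode of a cluster that already holds data will orphan the existing DataNode blocks. To confirm the format succeeded, check that the metadata directory configured in hdfs-site.xml was populated (the current subdirectory should contain a VERSION file and an initial fsimage):

ls /home/spark/hadoop-2.6.4/hdfs/name/current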

Starting Hadoop

Start Hadoop:

cd $HADOOP_HOME/sbin
./start-all.sh

If the following prompt appears during startup, type yes and press Enter:

Are you sure you want to continue connecting (yes/no)?

If startup succeeds, you will see output like this:

spark@promote:~> cd $HADOOP_HOME/sbin
spark@promote:~/hadoop-2.6.4/sbin> ./start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/spark/hadoop-2.6.4/logs/hadoop-spark-namenode-promote.out
localhost: starting datanode, logging to /home/spark/hadoop-2.6.4/logs/hadoop-spark-datanode-promote.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 0f:2a:39:18:ac:17:70:0f:24:d7:45:3c:d6:c7:16:59 [MD5].
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/spark/hadoop-2.6.4/logs/hadoop-spark-secondarynamenode-promote.out
starting yarn daemons
starting resourcemanager, logging to /home/spark/hadoop-2.6.4/logs/yarn-spark-resourcemanager-promote.out
localhost: starting nodemanager, logging to /home/spark/hadoop-2.6.4/logs/yarn-spark-nodemanager-promote.out
spark@promote:~/hadoop-2.6.4/sbin>

Run jps; you should see all of the following processes:

spark@promote:~> jps
6946 Jps
6647 NodeManager
6378 SecondaryNameNode
6203 DataNode
6063 NameNode
6527 ResourceManager

Open http://localhost:50070 to view Hadoop's web UI.
[screenshot: hadoop_overview]

Open http://localhost:8088 to view YARN's resource-manager UI.
[screenshot: hadoop_cluster]

HDFS Operations

Creating a User Directory on HDFS

Create a user directory on HDFS:

hdfs dfs -mkdir -p /user/liyanjie

Uploading Local Files to the HDFS User Directory

Upload local files to HDFS:

hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/liyanjie

Listing Files on HDFS

List the files on HDFS:

hdfs dfs -ls /user/liyanjie

If the command succeeds, you will see:

spark@promote:~> hdfs dfs -ls /user/liyanjie
Found 9 items
-rw-r--r--   1 spark supergroup       4436 2016-08-19 23:41 /user/liyanjie/capacity-scheduler.xml
-rw-r--r--   1 spark supergroup       1077 2016-08-19 23:41 /user/liyanjie/core-site.xml
-rw-r--r--   1 spark supergroup       9683 2016-08-19 23:41 /user/liyanjie/hadoop-policy.xml
-rw-r--r--   1 spark supergroup       1130 2016-08-19 23:41 /user/liyanjie/hdfs-site.xml
-rw-r--r--   1 spark supergroup        620 2016-08-19 23:41 /user/liyanjie/httpfs-site.xml
-rw-r--r--   1 spark supergroup       3523 2016-08-19 23:41 /user/liyanjie/kms-acls.xml
-rw-r--r--   1 spark supergroup       5511 2016-08-19 23:41 /user/liyanjie/kms-site.xml
-rw-r--r--   1 spark supergroup        862 2016-08-19 23:41 /user/liyanjie/mapred-site.xml
-rw-r--r--   1 spark supergroup        758 2016-08-19 23:41 /user/liyanjie/yarn-site.xml

Running the wordcount Example

Run the wordcount example:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /user/liyanjie /output

On success, you will see output like this:

spark@promote:~/hadoop-2.6.4/sbin> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /user/liyanjie /output
16/08/20 11:30:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/20 11:30:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/20 11:30:45 INFO input.FileInputFormat: Total input paths to process : 9
16/08/20 11:30:45 INFO mapreduce.JobSubmitter: number of splits:9
16/08/20 11:30:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1471663809891_0001
16/08/20 11:30:46 INFO impl.YarnClientImpl: Submitted application application_1471663809891_0001
16/08/20 11:30:46 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1471663809891_0001/
16/08/20 11:30:46 INFO mapreduce.Job: Running job: job_1471663809891_0001
16/08/20 11:30:52 INFO mapreduce.Job: Job job_1471663809891_0001 running in uber mode : false
16/08/20 11:30:52 INFO mapreduce.Job:  map 0% reduce 0%
16/08/20 11:31:02 INFO mapreduce.Job:  map 44% reduce 0%
16/08/20 11:31:03 INFO mapreduce.Job:  map 67% reduce 0%
16/08/20 11:31:05 INFO mapreduce.Job:  map 78% reduce 0%
16/08/20 11:31:06 INFO mapreduce.Job:  map 89% reduce 0%
16/08/20 11:31:08 INFO mapreduce.Job:  map 100% reduce 0%
16/08/20 11:31:09 INFO mapreduce.Job:  map 100% reduce 100%
16/08/20 11:31:09 INFO mapreduce.Job: Job job_1471663809891_0001 completed successfully
16/08/20 11:31:09 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=21822
                FILE: Number of bytes written=1111057
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=28641
                HDFS: Number of bytes written=10525
                HDFS: Number of read operations=30
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=9
                Launched reduce tasks=1
                Data-local map tasks=9
                Total time spent by all maps in occupied slots (ms)=50573
                Total time spent by all reduces in occupied slots (ms)=4388
                Total time spent by all map tasks (ms)=50573
                Total time spent by all reduce tasks (ms)=4388
                Total vcore-milliseconds taken by all map tasks=50573
                Total vcore-milliseconds taken by all reduce tasks=4388
                Total megabyte-milliseconds taken by all map tasks=51786752
                Total megabyte-milliseconds taken by all reduce tasks=4493312
        Map-Reduce Framework
                Map input records=789
                Map output records=2880
                Map output bytes=36676
                Map output materialized bytes=21870
                Input split bytes=1041
                Combine input records=2880
                Combine output records=1262
                Reduce input groups=603
                Reduce shuffle bytes=21870
                Reduce input records=1262
                Reduce output records=603
                Spilled Records=2524
                Shuffled Maps =9
                Failed Shuffles=0
                Merged Map outputs=9
                GC time elapsed (ms)=1389
                CPU time spent (ms)=5120
                Physical memory (bytes) snapshot=2571784192
                Virtual memory (bytes) snapshot=18980491264
                Total committed heap usage (bytes)=1927282688
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=27600
        File Output Format Counters 
                Bytes Written=10525

Run the following command to confirm that the wordcount result files have been generated under /output:

hdfs dfs -ls /output

Output:

spark@promote:~> hdfs dfs -ls /output
Found 2 items
-rw-r--r--   1 spark supergroup          0 2016-08-20 11:17 /output/_SUCCESS
-rw-r--r--   1 spark supergroup      10525 2016-08-20 11:17 /output/part-r-00000

Download the wordcount result file to $HOME:

hdfs dfs -get /output/part-r-00000 ${HOME}

Alternatively, open http://localhost:50070 and use "Utilities" → "Browse the file system" from the menu bar to download the result file.
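
You can also inspect the results directly on HDFS without downloading them; each line of the output is a word followed by a tab and its count:

hdfs dfs -cat /output/part-r-00000 | head -n 20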

Installing Spark

All steps in this section are performed as the spark user.
Upload spark-2.0.0-bin-hadoop2.6.tgz to $HOME and unpack it:

tar zxvf spark-2.0.0-bin-hadoop2.6.tgz

Edit $HOME/.profile and append the following lines:

export SPARK_HOME=/home/spark/spark-2.0.0-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$PATH

Run the following commands to verify that the Spark environment variables are configured correctly:

source $HOME/.profile
echo $SPARK_HOME

If everything is configured correctly, you will see:

spark@promote:~> source $HOME/.profile
spark@promote:~> echo $SPARK_HOME
/home/spark/spark-2.0.0-bin-hadoop2.6

Copy spark-env.sh.template to spark-env.sh:

cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh

Edit $SPARK_HOME/conf/spark-env.sh and append the following lines. (SPARK_MASTER_IP is deprecated in Spark 2.0 in favor of SPARK_MASTER_HOST, but it is still honored.)

export SPARK_MASTER_IP=localhost
export SPARK_WORKER_MEMORY=1000m

Start Spark's standalone master and worker:

cd $SPARK_HOME/sbin
./start-all.sh
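
After the script completes, verify the daemons with jps; alongside the Hadoop processes you should now also see a Master and a Worker. The standalone master's web UI is served on http://localhost:8080 by default.

jps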

Run the SparkPi example to test whether Spark is installed correctly:

cd $SPARK_HOME/bin
./run-example SparkPi

If Spark is installed correctly, you will see output like the following; note the "Pi is roughly" line near the end:

spark@promote:~/spark-2.0.0-bin-hadoop2.6/bin> ./run-example SparkPi
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/20 11:48:54 INFO SparkContext: Running Spark version 2.0.0
16/08/20 11:48:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/20 11:48:54 WARN Utils: Your hostname, promote resolves to a loopback address: 127.0.0.1; using 192.168.0.108 instead (on interface eth0)
16/08/20 11:48:54 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/20 11:48:54 INFO SecurityManager: Changing view acls to: spark
16/08/20 11:48:54 INFO SecurityManager: Changing modify acls to: spark
16/08/20 11:48:54 INFO SecurityManager: Changing view acls groups to: 
16/08/20 11:48:54 INFO SecurityManager: Changing modify acls groups to: 
16/08/20 11:48:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(spark); groups with view permissions: Set(); users  with modify permissions: Set(spark); groups with modify permissions: Set()
16/08/20 11:48:55 INFO Utils: Successfully started service 'sparkDriver' on port 54474.
16/08/20 11:48:55 INFO SparkEnv: Registering MapOutputTracker
16/08/20 11:48:55 INFO SparkEnv: Registering BlockManagerMaster
16/08/20 11:48:55 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5aa2d4f8-4ccb-48dd-b1e7-95ba3d75fa9c
16/08/20 11:48:55 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/08/20 11:48:55 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/20 11:48:55 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/20 11:48:55 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.108:4040
16/08/20 11:48:55 INFO SparkContext: Added JAR file:/home/spark/spark-2.0.0-bin-hadoop2.6/examples/jars/scopt_2.11-3.3.0.jar at spark://192.168.0.108:54474/jars/scopt_2.11-3.3.0.jar with timestamp 1471664935636
16/08/20 11:48:55 INFO SparkContext: Added JAR file:/home/spark/spark-2.0.0-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.0.0.jar at spark://192.168.0.108:54474/jars/spark-examples_2.11-2.0.0.jar with timestamp 1471664935636
16/08/20 11:48:55 INFO Executor: Starting executor ID driver on host localhost
16/08/20 11:48:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45881.
16/08/20 11:48:55 INFO NettyBlockTransferService: Server created on 192.168.0.108:45881
16/08/20 11:48:55 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.108, 45881)
16/08/20 11:48:55 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.108:45881 with 366.3 MB RAM, BlockManagerId(driver, 192.168.0.108, 45881)
16/08/20 11:48:55 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.108, 45881)
16/08/20 11:48:55 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
16/08/20 11:48:55 INFO SharedState: Warehouse path is 'file:/home/spark/spark-2.0.0-bin-hadoop2.6/bin/spark-warehouse'.
16/08/20 11:48:56 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
16/08/20 11:48:56 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
16/08/20 11:48:56 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
16/08/20 11:48:56 INFO DAGScheduler: Parents of final stage: List()
16/08/20 11:48:56 INFO DAGScheduler: Missing parents: List()
16/08/20 11:48:56 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
16/08/20 11:48:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
16/08/20 11:48:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1169.0 B, free 366.3 MB)
16/08/20 11:48:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.108:45881 (size: 1169.0 B, free: 366.3 MB)
16/08/20 11:48:56 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/20 11:48:56 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
16/08/20 11:48:56 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/08/20 11:48:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 5478 bytes)
16/08/20 11:48:56 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1, PROCESS_LOCAL, 5478 bytes)
16/08/20 11:48:56 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/08/20 11:48:56 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/08/20 11:48:56 INFO Executor: Fetching spark://192.168.0.108:54474/jars/spark-examples_2.11-2.0.0.jar with timestamp 1471664935636
16/08/20 11:48:56 INFO TransportClientFactory: Successfully created connection to /192.168.0.108:54474 after 45 ms (0 ms spent in bootstraps)
16/08/20 11:48:56 INFO Utils: Fetching spark://192.168.0.108:54474/jars/spark-examples_2.11-2.0.0.jar to /tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86/userFiles-a9db2c3e-c81e-4b2c-ac24-4742aa25bf42/fetchFileTemp8233164352392794360.tmp
16/08/20 11:48:56 INFO Executor: Adding file:/tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86/userFiles-a9db2c3e-c81e-4b2c-ac24-4742aa25bf42/spark-examples_2.11-2.0.0.jar to class loader
16/08/20 11:48:56 INFO Executor: Fetching spark://192.168.0.108:54474/jars/scopt_2.11-3.3.0.jar with timestamp 1471664935636
16/08/20 11:48:56 INFO Utils: Fetching spark://192.168.0.108:54474/jars/scopt_2.11-3.3.0.jar to /tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86/userFiles-a9db2c3e-c81e-4b2c-ac24-4742aa25bf42/fetchFileTemp1317824548914840322.tmp
16/08/20 11:48:56 INFO Executor: Adding file:/tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86/userFiles-a9db2c3e-c81e-4b2c-ac24-4742aa25bf42/scopt_2.11-3.3.0.jar to class loader
16/08/20 11:48:56 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 959 bytes result sent to driver
16/08/20 11:48:56 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 959 bytes result sent to driver
16/08/20 11:48:57 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 321 ms on localhost (1/2)
16/08/20 11:48:57 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 495 ms on localhost (2/2)
16/08/20 11:48:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
16/08/20 11:48:57 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.531 s
16/08/20 11:48:57 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.818402 s
Pi is roughly 3.139915699578498
16/08/20 11:48:57 INFO SparkUI: Stopped Spark web UI at http://192.168.0.108:4040
16/08/20 11:48:57 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/20 11:48:57 INFO MemoryStore: MemoryStore cleared
16/08/20 11:48:57 INFO BlockManager: BlockManager stopped
16/08/20 11:48:57 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/20 11:48:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/20 11:48:57 INFO SparkContext: Successfully stopped SparkContext
16/08/20 11:48:57 INFO ShutdownHookManager: Shutdown hook called
16/08/20 11:48:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-b6f9abc1-cef6-4c72-a66d-ffc727f27d86
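
As a further check that Spark can reach the HDFS instance configured earlier, you can read the wordcount output from inside spark-shell ($SPARK_HOME/bin/spark-shell). A minimal sketch, assuming the /output directory produced above still exists:

// sc is the SparkContext that spark-shell creates automatically
val results = sc.textFile("hdfs://localhost:9000/output/part-r-00000")

// number of distinct words (603 in the wordcount run above)
println(results.count())

// print the first few "word<TAB>count" lines
results.take(10).foreach(println)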

Installing IntelliJ IDEA

Upload ideaIC-2016.2.2.tar.gz to $HOME and unpack it:

tar zxvf ideaIC-2016.2.2.tar.gz

Run the following commands to launch the IDEA setup:

cd $HOME/idea-IC-162.1628.40/bin
./idea.sh

The setup wizard will then appear.
Choose whether to import previous settings.
[screenshot: IDEA_install_1]

Accept the license agreement.
[screenshot: IDEA_install_2]

Choose a UI theme.
[screenshot: IDEA_install_3]

Create a desktop entry.
[screenshot: IDEA_install_4]

Optionally generate a launcher script; if you check this option, you will be asked for the root password when the setup completes.
[screenshot: IDEA_install_5]

Select the tools you need.
[screenshot: IDEA_install_6]

On the plugins page, click the Install button under Scala to install the Scala plugin; when it finishes, click "Start using IntelliJ IDEA".
[screenshot: IDEA_install_7]

Enter the root password and click "OK".
[screenshot: IDEA_install_8]

IDEA then starts and shows the welcome screen.
[screenshot: IDEA_use_1]

Creating a Maven-Managed Scala Project

On the IDEA welcome screen, click "Create New Project" to open the project-creation dialog.
Select "Maven" as the project type on the left, click "New..." next to "Project SDK" to register the system JDK, then click "Next".
[screenshot: IDEA_use_2]

Enter the GroupId, ArtifactId, and Version, then click "Next".
[screenshot: IDEA_use_3]

Set the project name and location, then click "Finish".
[screenshot: IDEA_use_4]

The project is now created, but Scala classes cannot be added to it yet; the Scala SDK must first be added to External Libraries.
Open "File" → "Project Structure..." to enter the project settings.
[screenshot: IDEA_use_6]

Go to the "Libraries" tab and click the green + to add a Scala SDK.
[screenshot: IDEA_use_7]

Click OK.
[screenshot: IDEA_use_8]

Click OK to add the Scala SDK to the current project.
[screenshot: IDEA_use_9]

Click OK again.
[screenshot: IDEA_use_10]

When writing Scala code, you may want to keep it in a dedicated sources root named "scala". To add one:
Right-click the src directory and create a new directory named scala.
[screenshot: IDEA_use_11]

Then right-click the scala directory and choose "Mark Directory as" → "Sources Root".
[screenshot: IDEA_use_12]

Under the scala directory, create a package named com.liyanjie.test, then add a Scala class of kind Object to it.
[screenshot: IDEA_use_13]

[screenshot: IDEA_use_14]

Write the HelloWorld object (a minimal sketch follows below), then run it via the "Run" menu or Alt+Shift+F10.
[screenshot: IDEA_use_15]
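
A minimal HelloWorld object for this project could look like the following sketch; the package name matches the one created above:

package com.liyanjie.test

// A top-level object with a main method is runnable directly from IDEA.
object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("Hello, World!")
  }
}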

