1 Cluster Planning
Standalone mode is used: 18 machines in total, one master and 17 slaves.
2 Versions
scala-2.11.7.tgz
spark-1.4.1-bin-hadoop2.6.tgz
3 Installation
This assumes Hadoop is already installed; if it is not, see the Hadoop installation post.
3.1 Install Scala
$ cd /opt/soft
$ tar -zxf /home/hadoop/scala-2.11.7.tgz
$ mv scala-2.11.7/ scala
3.2 Install Spark
$ tar -zxf /home/hadoop/spark-1.4.1-bin-hadoop2.6.tgz
$ mv spark-1.4.1-bin-hadoop2.6/ spark
3.3 Add environment variables
Append the following to /etc/profile:
export SCALA_HOME=/opt/soft/scala
export SPARK_HOME=/opt/soft/spark
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
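To make the new variables take effect in the current shell without logging out and back in:
$ source /etc/profile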
4 Configuring Spark
4.1 Configure slaves
$ cd /opt/soft/spark/conf
$ cp slaves.template slaves
$ cat slaves
# A Spark Worker will be started on each of the machines listed below.
a02
a03
a04
a05
a06
a07
a08
a09
a10
a11
a12
a13
a14
a15
a16
a17
a18
4.2 Configure spark-env.sh
$ cp spark-env.sh.template spark-env.sh
$ vim spark-env.sh
# common settings
export SCALA_HOME=/opt/soft/scala/
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/
export SPARK_LOCAL_DIRS=/opt/soft/spark/
export SPARK_CONF_DIR=/opt/soft/spark/conf/
export SPARK_PID_DIR=/opt/spark/pid_file/
# standalone settings
export SPARK_MASTER_IP=a01
export SPARK_MASTER_PORT=7077
# number of CPU cores used by each Worker process
export SPARK_WORKER_CORES=4
# amount of memory used by each Worker process
export SPARK_WORKER_MEMORY=9g
# number of Worker processes to run on each node
export SPARK_WORKER_INSTANCES=6
# local directory Workers use as scratch space when running tasks
export SPARK_WORKER_DIR=/opt/spark/local
# web UI port
export SPARK_MASTER_WEBUI_PORT=8099
# Spark History Server settings
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://a01:9000/user/spark/applicationHistory"
This is a standalone-mode configuration. Tune each value to the actual hardware of the machines, but always make sure that
SPARK_WORKER_CORES * SPARK_WORKER_INSTANCES <= total CPU cores of a single machine
SPARK_WORKER_MEMORY * SPARK_WORKER_INSTANCES <= total memory of a single machine
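As a quick check with the values above: 4 cores × 6 instances = 24 cores and 9g × 6 instances = 54g, so each slave needs at least 24 CPU cores and 54 GB of RAM.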
More configuration options are documented in spark-env.sh.template.
SPARK_HISTORY_OPTS configures the Spark History Server; see http://www.cnblogs.com/luogankun/p/3981645.html for details.
4.3 Configure spark-defaults.conf
$ cp spark-defaults.conf.template spark-defaults.conf
$ vim spark-defaults.conf
# use standalone mode by default
spark.master                     spark://a01:7077
# Spark History Server settings
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://a01:9000/user/spark/applicationHistory
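The event log directory should already exist in HDFS before applications are run; assuming the paths above, it can be created with:
$ hdfs dfs -mkdir -p /user/spark/applicationHistory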
With the configuration complete, re-pack the spark directory and copy it to the slave nodes.
On each slave, do step 3 first (Scala and the environment variables), then simply unpack the spark tarball you just copied over.
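A minimal sketch of distributing the configured directory, assuming passwordless SSH and the host names a02 through a18 from the slaves file (the tarball name is only illustrative):
$ cd /opt/soft
$ tar -zcf spark-configured.tgz spark
$ for i in $(seq -w 2 18); do scp spark-configured.tgz a$i:/opt/soft/; done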
5 Startup
$ /opt/soft/spark/sbin/start-all.sh
Check that the expected processes are now present on every machine.
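Running jps on each node is a quick check: the master should show a Master process and each slave should show the configured number of Worker processes (six here).
$ jps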
5.1 Starting a worker by hand
When starting with start-all.sh, an individual worker occasionally fails to come up, and in production a worker may go down.
In those cases you can restart just that worker instead of restarting the whole cluster.
a. Find the failed worker
In the web UI, find the machine whose worker count does not match the expected number, then check the worker logs on that machine to see which worker failed; if its process is still hanging around, simply kill it.
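A minimal way to locate and kill a stale worker process on that machine (the PID is whatever the ps output shows):
$ ps -ef | grep org.apache.spark.deploy.worker.Worker
$ kill <pid>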
b. Restart the worker
Use the command:
$SPARK_HOME/sbin/spark-daemon.sh [--config <conf-dir>] (start|stop|status) <spark-command> <spark-instance-number> <args...>
First argument: --config $SPARK_HOME/conf
Second argument: start
Third argument: org.apache.spark.deploy.worker.Worker (the full class name of the Worker)
Fourth argument: the worker's instance number, chosen according to the workers already running on that machine
Fifth argument onward: the startup arguments. The snippet below is taken from the argument parser WorkerArguments.scala in the Spark source; the options are self-explanatory, so pass whichever you need.
case ("--ip" | "-i") :: value :: tail => Utils.checkHost(value, "ip no longer supported, please use hostname " + value) host = value parse(tail) case ("--host" | "-h") :: value :: tail => Utils.checkHost(value, "Please use hostname " + value) host = value parse(tail) case ("--port" | "-p") :: IntParam(value) :: tail => port = value parse(tail) case ("--cores" | "-c") :: IntParam(value) :: tail => cores = value parse(tail) case ("--memory" | "-m") :: MemoryParam(value) :: tail => memory = value parse(tail) case ("--work-dir" | "-d") :: value :: tail => workDir = value parse(tail) case "--webui-port" :: IntParam(value) :: tail => webUiPort = value parse(tail) case ("--properties-file") :: value :: tail => propertiesFile = value parse(tail)
An example:
sbin/spark-daemon.sh --config conf/ start org.apache.spark.deploy.worker.Worker 2 --webui-port 8082 -c 4 -m 9G spark://a01:7077
Note: the master URL at the end is required.
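Based on the usage line above, the same script should also stop a worker started this way, as long as the instance number matches:
$ sbin/spark-daemon.sh --config conf/ stop org.apache.spark.deploy.worker.Worker 2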
6 Installing jobserver
jobserver depends on sbt, so sbt must be installed first:
rpm -ivh https://dl.bintray.com/sbt/rpm/sbt-0.13.7.rpm
Install git, pull the code from GitHub, and start the server:
$ yum install git
# clone the project
SHELL$ git clone https://github.com/ooyala/spark-jobserver.git
# enter sbt from the project root
SHELL$ sbt
......
[info] Loading project definition from /home/pingjie/wordspace/spark-jobserver/project
> # start jobserver locally (developer mode)
> re-start --- -Xmx4g
......
# this downloads spark-core, jetty, liftweb and other dependencies
job-server-extras Starting spark.jobserver.JobServer.main()
[success] Total time: 111 s, completed 2015-9-22 9:59:21
Then open http://localhost:8090 to see the web UI.
Installation is complete.
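A quick way to confirm the server is responding from the shell; on a fresh install the jar list should come back empty:
$ curl localhost:8090/jars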
6.2 API
JARS
GET /jars - list all uploaded jars and when they were last updated
POST /jars/<appName> - upload a new jar under the name appName
Contexts
GET /contexts - list all current contexts
POST /contexts/<name> - create a new context
DELETE /contexts/<name> - delete a context and stop all jobs running in it
Jobs
GET /jobs - list all jobs
POST /jobs - submit a new job
GET /jobs/<jobId> - query the result and status of a job
GET /jobs/<jobId>/config - query a job's configuration
DELETE /jobs/<jobId> - delete the specified job
6.3 Getting familiar with the jobserver commands
Use job-server-tests as a test case. First compile and package it; the command is much like Maven:
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ sbt job-server-tests/package
[info] Loading project definition from /home/pingjie/wordspace/spark-jobserver/project
Missing bintray credentials /home/pingjie/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/pingjie/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/pingjie/.bintray/.credentials. Some bintray features depend on this.
Missing bintray credentials /home/pingjie/.bintray/.credentials. Some bintray features depend on this.
[info] Set current project to root (in build file:/home/pingjie/wordspace/spark-jobserver/)
[info] scalastyle using config /home/pingjie/wordspace/spark-jobserver/scalastyle-config.xml
[info] Processed 5 file(s)
[info] Found 0 errors
[info] Found 0 warnings
[info] Found 0 infos
[info] Finished in 4 ms
[success] created output: /home/pingjie/wordspace/spark-jobserver/job-server-tests/target
[warn] Credentials file /home/pingjie/.bintray/.credentials does not exist
[info] Updating {file:/home/pingjie/wordspace/spark-jobserver/}job-server-tests...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] Done updating.
[info] scalastyle using config /home/pingjie/wordspace/spark-jobserver/scalastyle-config.xml
[info] Processed 3 file(s)
[info] Found 0 errors
[info] Found 0 warnings
[info] Found 0 infos
[info] Finished in 0 ms
[success] created output: /home/pingjie/wordspace/spark-jobserver/job-server-api/target
[info] Compiling 5 Scala sources to /home/pingjie/wordspace/spark-jobserver/job-server-tests/target/scala-2.10/classes...
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Packaging /home/pingjie/wordspace/spark-jobserver/job-server-tests/target/scala-2.10/job-server-tests_2.10-0.5.3-SNAPSHOT.jar ...
[info] Done packaging.
[success] Total time: 41 s, completed 2015-9-22 10:06:19
The build succeeded, and the jar has been generated under the target directory.
# upload a new jar
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl --data-binary @job-server-tests/target/scala-2.10/job-server-tests_2.10-0.5.3-SNAPSHOT.jar localhost:8090/jars/test
OK
# list all current jars
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl localhost:8090/jars
{
  "test": "2015-09-22T10:10:29.815+08:00"
}
# submit a new job without specifying a context; a context is created automatically
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl -d "input.string= hello job server " 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
{
  "status": "STARTED",
  "result": {
    "jobId": "64196fca-80da-4c74-9b6f-27c5954ee25c",
    "context": "bf196647-spark.jobserver.WordCountExample"
  }
}
# submit another job the same way (curl -d already implies POST, so -X POST is optional)
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl -X POST -d "input.string= hello job server " 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
{
"status": "STARTED",
"result": {
"jobId": "d09ec0c4-91db-456d-baef-633b5c0ff504",
"context": "7500533c-spark.jobserver.WordCountExample"
}
}
# list all jobs; the job submitted above is already there
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl 'localhost:8090/jobs'
[{
  "duration": "0.715 secs",
  "classPath": "spark.jobserver.WordCountExample",
  "startTime": "2015-09-22T10:19:34.591+08:00",
  "context": "bf196647-spark.jobserver.WordCountExample",
  "status": "FINISHED",
  "jobId": "64196fca-80da-4c74-9b6f-27c5954ee25c"
}]
# list all contexts; currently empty
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl 'localhost:8090/contexts'
[]
# create a new context, specifying the number of CPU cores and the memory per worker node
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl -d "" 'localhost:8090/contexts/test-contexts?num-cpu-cores=1&mem-per-node=512m'
OK
# list contexts again; the one just created is there
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl 'localhost:8090/contexts'
["test-contexts"]
# submit a job to the specified context
pingjie@pingjie-youku:~/wordspace/spark-jobserver$ curl -X POST -d "input.string= hello job server " 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-contexts&sync=true'
{
"status": "OK",
"result": {
"job": 1,
"hello": 1,
"server": 1
}
}
The order for submitting a job on jobserver is:
1. Upload the jar
2. Create a context
3. Submit the job
You can also skip creating a context and submit the job directly, as in the earlier example; in that case a default context is created for the job, and it will take all of the resources jobserver has left.
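Putting the three steps together, using the same names as in the examples above (test, test-contexts and spark.jobserver.WordCountExample):
$ curl --data-binary @job-server-tests/target/scala-2.10/job-server-tests_2.10-0.5.3-SNAPSHOT.jar localhost:8090/jars/test
$ curl -d "" 'localhost:8090/contexts/test-contexts?num-cpu-cores=1&mem-per-node=512m'
$ curl -d "input.string= hello job server " 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-contexts&sync=true'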
6.4 Configuration file
Open the configuration file and you will find master set to local[4]; change it to our cluster address.
$ vim spark-jobserver/config/local.conf.template
master = "local[4]"
In addition, the storage method and path for data objects:
jobdao = spark.jobserver.io.JobFileDAO

filedao {
  rootdir = /tmp/spark-job-server/filedao/data
}
The default context settings below can be overridden by the parameters passed when a context is started through the REST interface.
# universal context configuration. These settings can be overridden, see README.md
context-settings {
  num-cpu-cores = 2          # total number of CPU cores to use. Required.
  memory-per-node = 512m     # memory per Spark executor node, -Xmx style, e.g. 512m, 1G, etc.

  # in case spark distribution should be accessed from HDFS (as opposed to being installed on every mesos slave)
  # spark.executor.uri = "hdfs://namenode:8020/apps/spark/spark.tgz"

  # uris of jars to be loaded into the classpath for this context
  # dependent-jar-uris = ["file:///some/path/present/in/each/mesos/slave/somepackage.jar"]
}
That covers basic usage; jobserver deployment and using it from a project will be covered next.
6.5 Deployment
Copy config/local.sh.template to local.sh and set the relevant parameters. jobserver can be configured for deployment to multiple hosts, and the installation path, Spark home, Spark conf directory and other parameters are specified here.
# Environment and deploy file
# For use with bin/server_deploy, bin/server_package etc.
DEPLOY_HOSTS="a01"
APP_USER=hadoop
APP_GROUP=hadoop
# optional SSH Key to login to deploy server
#SSH_KEY=/path/to/keyfile.pem
INSTALL_DIR=/opt/soft/job-server
LOG_DIR=/opt/soft/job-server/logs
PIDFILE=spark-jobserver.pid
SPARK_HOME=/opt/soft/spark
SPARK_CONF_DIR=$SPARK_HOME/conf
# Only needed for Mesos deploys
#SPARK_EXECUTOR_URI=/usr/spark/spark-1.4.0-bin-hadoop2.4.tgz
# Only needed for YARN running outside of the cluster
# You will need to COPY these files from your cluster to the remote machine
# Normally these are kept on the cluster in /etc/hadoop/conf
# YARN_CONF_DIR=/pathToRemoteConf/conf
SCALA_VERSION=2.11.7
Deploying jobserver involves a long wait. To make the setup easier, it is best to configure passwordless SSH to the deploy hosts first.
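Assuming the file above is saved as config/local.sh, the deploy is run from the project root with the matching environment name, which pushes the packaged server to every host in DEPLOY_HOSTS:
$ bin/server_deploy.sh local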
6.6 Start
./server_start.sh