Note: for a standalone installation of Spark, the machine only needs Scala and the JDK. Things like Hadoop and Zookeeper do not need to be installed at all.
You only need to download the following three packages: the JDK, Scala, and the Spark binary distribution.
1. Install the JDK
Configure the environment variables:
vim /etc/profile
Set the paths according to wherever you extracted the archive.
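As a minimal sketch (the version number and install path below are assumptions; substitute your own extraction path), the JDK lines added to /etc/profile would look like:
export JAVA_HOME=/usr/local/jdk1.8.0_144   # assumed install path
export PATH=$JAVA_HOME/bin:$PATH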
Then make the changes take effect:
source /etc/profile
2. Install Scala
Configure the environment variables in the same way:
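A similar sketch for Scala (again, the version and path are assumptions):
export SCALA_HOME=/usr/local/scala-2.11.8   # assumed install path
export PATH=$SCALA_HOME/bin:$PATH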
Likewise, run source /etc/profile to apply them.
3. Finally, install Spark
Configure the environment variables in the same way and run the command to make them take effect. PS: the existing $PATH must be kept in the PATH entry, otherwise bash scripts will break.
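A sketch of the Spark entries (the install path below matches the one used in the test script later; adjust it to your own, and note that $PATH is kept at the end as required):
export SPARK_HOME=/home/hadoop/spark-2.2.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH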
Now check whether Spark starts successfully.
cd into the Spark installation directory and run ./bin/spark-shell
If you land in the Scala interactive shell, Spark has started successfully.
Next, write a Python script to test it:
# _*_ coding:utf-8 _*_
from __future__ import print_function

from pyspark.sql import SparkSession
from pyspark.sql import Row


def json_dataset_example(spark):
    sc = spark.sparkContext

    # read a JSON file into a DataFrame
    path = "/home/hadoop/spark-2.2.0-bin-hadoop2.7/mydemo/employees.json"
    peopleDF = spark.read.json(path)
    peopleDF.printSchema()

    # register a temp view and query it with SQL
    peopleDF.createOrReplaceTempView("employees")
    teenagerNamesDF = spark.sql("SELECT name FROM employees WHERE salary BETWEEN 3500 AND 4500")
    teenagerNamesDF.show()

    # build a DataFrame directly from a JSON string
    jsonStrings = ['{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}']
    otherPeopleRDD = sc.parallelize(jsonStrings)
    otherPeople = spark.read.json(otherPeopleRDD)
    otherPeople.show()


if __name__ == "__main__":
    spark = SparkSession \
        .builder \
        .appName("myPeople demo") \
        .getOrCreate()

    json_dataset_example(spark)

    spark.stop()
Submit the test script.
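Assuming the script above was saved as mydemo/json_demo.py under the Spark directory (the filename is hypothetical), the submit command run from that directory would look something like:
./bin/spark-submit mydemo/json_demo.py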
Output:
No problems. That's a wrap.