At first I was using Python 3.7 with Spark 2.1, and Spark Streaming failed with this error: RuntimeError: generator raised StopIteration.

Python code:
import os

JAVA_HOME = '/usr/local/java/jdk1.8.0_131'
PYSPARK_PYTHON = "/usr/local/python3/python"
SPARK_HOME = "/bigdata/spark-2.1.2-bin-hadoop2.3"
os.environ["JAVA_HOME"] = JAVA_HOME
os.environ["PYSPARK_PYTHON"] = PYSPARK_PYTHON
os.environ["PYSPARK_DRIVER_PYTHON"] = PYSPARK_PYTHON
# os.environ["SPARK_HOME"] = SPARK_HOME

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == '__main__':
    sc = SparkContext("local[2]", appName="NetworkWordCount")
    # second argument: the batch interval (in seconds) between computations
    ssc = StreamingContext(sc, 1)
    # listen for data on the given host and port (feed it with: nc -lk 9999)
    lines = ssc.socketTextStream('localhost', 9999)
    # split each line on spaces into individual words
    words = lines.flatMap(lambda line: line.split(" "))
    # turn each word into a (word, 1) pair
    pairs = words.map(lambda word: (word, 1))
    # count occurrences per word
    wordCounts = pairs.reduceByKey(lambda x, y: x + y)
    # print the results; this output action triggers the transformations above
    wordCounts.pprint()
    # start the StreamingContext
    ssc.start()
    # wait for the computation to terminate
    ssc.awaitTermination()
The error led me to this Stack Overflow thread: https://stackoverflow.com/questions/56591963/runtimeerror-generator-raised-stopiteration-how-to-fix-this-python-issue
It says Python 3.7 and Spark 2.1 are incompatible: starting with Python 3.7, PEP 479 turns a StopIteration raised inside a generator into a RuntimeError, and the generator code shipped with old PySpark still relies on the pre-3.7 behavior.
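The PEP 479 change is easy to reproduce on its own, without Spark at all. This minimal sketch shows the same RuntimeError on Python 3.7+:

```python
def gen():
    yield 1
    # Before Python 3.7, this silently ended the generator;
    # under PEP 479 it is converted into a RuntimeError.
    raise StopIteration

try:
    list(gen())
except RuntimeError as e:
    print(e)  # generator raised StopIteration
```

Old PySpark versions raise StopIteration inside their own generators in exactly this way, which is why the fix is upgrading Spark rather than changing your script.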
Sure enough, downloading Spark 3.3 from the official site (https://spark.apache.org/downloads.html) solved it.
While I'm at it, here's a record of how I install and use Spark with Python:
1: Download from the official site, upload to the server, and extract
2: Configure environment variables in ~/.bashrc
3: Copy /spark/python/pyspark into your Python installation's packages directory
4: Run pyspark from /spark/bin
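The steps above can be sketched as shell commands. This is a rough outline, not a definitive install script: the Spark version, archive name, install path, and py4j zip name are assumptions and should be adjusted to match your actual download.

```shell
# 1: extract the downloaded archive (path and version are assumed)
tar -xzf spark-3.3.0-bin-hadoop3.tgz -C /bigdata

# 2: environment variables, appended to ~/.bashrc
export SPARK_HOME=/bigdata/spark-3.3.0-bin-hadoop3
export PATH=$SPARK_HOME/bin:$PATH

# 3: make pyspark importable — copying it into Python's packages
#    directory works, as in the post; adding it to PYTHONPATH is an
#    alternative that avoids copying:
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-*-src.zip:$PYTHONPATH

# 4: launch the interactive PySpark shell
$SPARK_HOME/bin/pyspark
```

Running `pip install pyspark` into the same Python interpreter is another way to get step 3 done, and keeps the library upgradeable through pip.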