Today I downloaded the greenplum-spark-connector.
The official introduction accesses it from Java (https://greenplum.cn/2020/03/27/greenplum-spark-connector/),
but I tried it from Python and it works as well:
import os
from pyspark.sql import SparkSession
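# Set the Spark home and the Python interpreter to run with; both can also be configured in the environment instead
os.environ["SPARK_HOME"] = "/opt/cloudera/parcels/CDH/lib/spark"  # parcel version suffix: -6.2.0-1.cdh6.2.0.p0.967373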
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"
spark = SparkSession.builder.appName('local').getOrCreate()
# For this to work, the greenplum-spark_2.11-1.6.2.jar driver must be placed in /opt/cloudera/parcels/CDH/lib/spark/jars on every node
url = 'jdbc:postgresql://192.168.1.214:5432/xxgl'
table = 'users'
properties = {"user":"gpadmin","password":"11111111"}
df = spark.read.jdbc(url, table, properties=properties)
df.show()
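
Note that the read above goes through Spark's generic JDBC source with a PostgreSQL URL, so as far as I can tell the connector jar is only supplying the JDBC driver there. Below is a minimal sketch of reading the same table through the connector's own "greenplum" data source instead. It is untested against this cluster and rests on some assumptions: the option names (url, dbschema, dbtable, user, password, partitionColumn) are the ones I understand the connector to document, the "public" schema and the "id" partition column are hypothetical, and the helper names greenplum_options, read_greenplum_table and main are made up for illustration.

```python
def greenplum_options(host, port, database, table, user, password,
                      schema="public", partition_column=None):
    """Build the option map passed to the connector's data source.

    The "public" default schema is an assumption, not taken from the post.
    """
    opts = {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbschema": schema,
        "dbtable": table,
        "user": user,
        "password": password,
    }
    # To my understanding the 1.x connector expects an integer-like column
    # here to split the table into Spark partitions.
    if partition_column is not None:
        opts["partitionColumn"] = partition_column
    return opts


def read_greenplum_table(spark, options):
    """Load a table via the connector's "greenplum" format (assumed short name)."""
    return spark.read.format("greenplum").options(**options).load()


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gp-connector").getOrCreate()
    opts = greenplum_options(
        "192.168.1.214", 5432, "xxgl", "users", "gpadmin", "11111111",
        partition_column="id",  # hypothetical column name
    )
    read_greenplum_table(spark, opts).show()
```

Calling main() on a node that has the connector jar in place should behave like the JDBC read, except that the data is pulled by the executors in parallel rather than over a single JDBC connection, assuming the connector works the way its documentation describes.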
