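Spark: reading data directly from MySQL
Both methods below serve the same purpose: getting MySQL data into Spark for data cleaning.
Method 1:
① Start spark-shell with the MySQL connector jar: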
[root@head42 spark]# spark-shell --jars /opt/spark/jars/mysql-connector-java-5.1.38.jar
② Run in spark-shell:
val df=spark.read.format("jdbc").option("url","jdbc:mysql://192.168.56.103:3306/test").option("dbtable","customer").option("user","root").option("password","ok").load()
// "dbtable": name of the MySQL table to read
df.show
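The connection settings can also be collected in one place and the JDBC driver class named explicitly, which may help when the jar was passed with --jars but the driver is still not found. This is a minimal sketch, not the only way to do it: mysqlOptions is a hypothetical helper introduced here, and the host, database, table and credentials are simply the ones used above.

```scala
// Sketch only: mysqlOptions is a hypothetical helper, not part of Spark.
// "driver" is the class name of MySQL Connector/J 5.1.x, the jar used in this post.
def mysqlOptions(host: String, db: String, table: String,
                 user: String, password: String): Map[String, String] =
  Map(
    "url"      -> s"jdbc:mysql://$host:3306/$db",
    "driver"   -> "com.mysql.jdbc.Driver",
    "dbtable"  -> table,
    "user"     -> user,
    "password" -> password
  )

val opts = mysqlOptions("192.168.56.103", "test", "customer", "root", "ok")

// In spark-shell, where `spark` is already defined:
// val df = spark.read.format("jdbc").options(opts).load()
// df.printSchema()
```

The connector jar still has to be available to Spark; naming the driver only tells Spark which class to load from it.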
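If this fails with java.sql.SQLException: No suitable driver,
copy the MySQL connector jar from Hive's lib directory into Spark's jars directory:
[root@head42 ~]# cd /opt/hive/lib/
[root@head42 lib]# cp mysql-connector-java-5.1.38.jar /opt/spark/jars/
Then restart spark-shell and run the code above again.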
============================================================
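Method 2:
① Create a Sqoop script and run it:
#!/bin/bash
# test in the --connect URL is the MySQL database
# --target-dir is the HDFS path where the imported data is stored
# -m 1 runs the import with a single map task, so one output file is written
sqoop import \
--connect jdbc:mysql://localhost:3306/test \
--table table_name \
--username root \
--password ok \
--target-dir /data/mydata13 \
-m 1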
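② In Hive, create an external table over that directory (the data of an external table stays on HDFS):
-- location must be the same path as --target-dir in the Sqoop script
create external table orders(
order_id int,
order_date timestamp,
order_customer_id int,
order_status string
)
row format delimited
fields terminated by ','
location '/data/mydata13';
Once the table exists, connect Spark to the Hive metastore as described in the section below, and the data can be cleaned from Spark.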
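Because Sqoop writes plain comma-separated text by default, a cleaning rule can be written as an ordinary Scala function and tried on a few lines before it is applied to the whole data set. A minimal sketch under stated assumptions: Order and parseOrder are hypothetical names, the sample lines are made up for illustration, and the rule itself (drop rows that do not parse, upper-case the status) is only an example.

```scala
import java.sql.Timestamp
import scala.util.Try

// Mirrors the columns of the Hive table orders defined above.
case class Order(orderId: Int, orderDate: Timestamp, customerId: Int, status: String)

// Hypothetical helper: returns None for lines that do not fit the schema,
// so malformed rows are dropped instead of failing the whole job.
def parseOrder(line: String): Option[Order] =
  line.split(",", -1) match {
    case Array(id, date, customer, status) if status.trim.nonEmpty =>
      Try(Order(id.trim.toInt, Timestamp.valueOf(date.trim),
                customer.trim.toInt, status.trim.toUpperCase)).toOption
    case _ => None
  }

// Made-up sample lines, for illustration only.
val sample = Seq(
  "1,2013-07-25 00:00:00.0,11599,closed", // valid row
  "2,not-a-date,256,PENDING",             // timestamp does not parse
  "3,2013-07-25 00:00:00.0,12111"         // one column missing
)
val cleaned = sample.flatMap(parseOrder)   // keeps only the first row
```

In spark-shell the same function could be applied to the imported files with something like spark.read.textFile("/data/mydata13").flatMap(line => parseOrder(line).toSeq), assuming the path from the Sqoop script above.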
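Spark: connecting to the Hive metastore (backed by MySQL)
Method 1:
1) Start the Hive metastore service:
[root@head42 ~]# hive --service metastore &
[root@head42 ~]# netstat -ano|grep 9083    # optional: check that the metastore is listening on port 9083
2) Start spark-shell and point it at the metastore: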
[root@head42 ~]# spark-shell --conf spark.hadoop.hive.metastore.uris=thrift://localhost:9083
3) Test from the Scala prompt:
scala> spark.sql("show tables").show
+--------+--------------+-----------+
|database|     tableName|isTemporary|
+--------+--------------+-----------+
| default|      customer|      false|
| default|text_customers|      false|
+--------+--------------+-----------+
scala> spark.sql("select * from database_name.table_name").show // query a table in another database
If the Hive tables are listed, the connection works.
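The same setting can be passed from code when Spark runs as a standalone application instead of spark-shell. A minimal sketch, assuming Spark was built with Hive support and the metastore from step 1 is running; the application name is a placeholder, and the property key simply mirrors the --conf flag used above.

```scala
import org.apache.spark.sql.SparkSession

// Sketch for a standalone application; "hive-metastore-demo" is a placeholder name.
// The key is the same one passed with --conf in step 2.
val spark = SparkSession.builder()
  .appName("hive-metastore-demo")
  .config("spark.hadoop.hive.metastore.uris", "thrift://localhost:9083")
  .enableHiveSupport() // requires the Hive classes on the classpath
  .getOrCreate()

// Lists the tables registered in the Hive metastore, as in step 3.
spark.sql("show tables").show()
```

In spark-shell this is not needed, because the shell creates the session itself and picks the property up from the command line.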
Method 2:
1) Copy Hive's hive-site.xml into Spark's conf directory.
2) Edit Spark's copy of hive-site.xml
so that its <configuration> element contains this property:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
</configuration>
3) In a separate terminal, start the metastore:
[root@head42 conf]# hive --service metastore
4) Start Spark:
[root@head42 conf]# spark-shell
5) Test:
scala> spark.sql("show tables").show
+--------+--------------+-----------+
|database|     tableName|isTemporary|
+--------+--------------+-----------+
| default|      customer|      false|
| default|text_customers|      false|
+--------+--------------+-----------+
scala> spark.sql("select * from database_name.table_name").show // query a table in another database
If the Hive tables are listed, the setup works.
