Integrating Spark SQL with Hive: running SQL and calling Hive from the spark-sql and spark-shell commands


1. Install Hive
If you also need to create a database user and grant it privileges, see: http://blog.csdn.net/tototuzuoquan/article/details/52785504

2. Copy the configured hive-site.xml, core-site.xml and hdfs-site.xml into $SPARK_HOME/conf

[root@hadoop1 conf]# cd /home/tuzq/software/hive/apache-hive-1.2.1-bin
[root@hadoop1 conf]# cp hive-site.xml $SPARK_HOME/conf
[root@hadoop1 spark-1.6.2-bin-hadoop2.6]# cd $HADOOP_HOME
[root@hadoop1 hadoop]# cp core-site.xml $SPARK_HOME/conf
[root@hadoop1 hadoop]# cp hdfs-site.xml $SPARK_HOME/conf

Sync the conf directory to the other Spark nodes in the cluster:

[root@hadoop1 conf]# scp -r * root@hadoop2:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop3:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop4:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop5:$PWD

After the files are in place, remember to restart the Spark cluster. For starting and stopping the cluster, see:

http://blog.csdn.net/tototuzuoquan/article/details/74481570
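A quick way to confirm that Spark picks up the Hive metastore after the restart is to list the databases and tables it can see from spark-shell (started as in step 3 below). This is an optional sanity check and assumes the prebuilt Spark binaries with Hive support, where the shell's sqlContext is a HiveContext:

scala> sqlContext.sql("show databases").show()
scala> sqlContext.tableNames().foreach(println)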

Change the log level in Spark's log4j configuration to ERROR, typically by setting log4j.rootCategory=ERROR, console in $SPARK_HOME/conf/log4j.properties (the original screenshot is omitted).
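If you prefer not to edit log4j.properties, roughly the same effect can be achieved per session once the shell is running (a small optional sketch):

scala> sc.setLogLevel("ERROR")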

3. Start spark-shell, specifying the location of the MySQL JDBC driver

bin/spark-shell --master spark://hadoop1:7077,hadoop2:7077 --executor-memory 1g --total-executor-cores 2 --driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar

If a ConnectionRefused error is reported during startup (the screenshots are omitted here), troubleshoot it by following the URL given in the error message:
https://wiki.apache.org/hadoop/ConnectionRefused
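If startup succeeds, you can optionally confirm that the MySQL driver jar really is on the driver classpath before running any Hive queries. The class name below is the driver class shipped in mysql-connector-java 5.1.x:

scala> Class.forName("com.mysql.jdbc.Driver")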

4. Call HQL with sqlContext.sql
Before using it, start Hive and create the person table:

hive> create table person(id bigint, name string, age int) row format delimited fields terminated by " ";
OK
Time taken: 2.152 seconds
hive> show tables;
OK
func
person
wyp
Time taken: 0.269 seconds, Fetched: 3 row(s)
hive>
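The same DDL can also be issued from spark-shell through the HiveContext introduced later in this walkthrough; a minimal sketch (note the single-space field delimiter matching the table definition above):

scala> hiveContext.sql("create table if not exists person(id bigint, name string, age int) row format delimited fields terminated by ' '")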

Check the contents of person.txt on HDFS:

[root@hadoop3 ~]# hdfs dfs -cat /person.txt
1 zhangsan 19
2 lisi 20
3 wangwu 28
4 zhaoliu 26
5 tianqi 24
6 chengnong 55
7 zhouxingchi 58
8 mayun 50
9 yangliying 30
10 lilianjie 51
11 zhanghuimei 35
12 lian 53
13 zhangyimou 54
[root@hadoop3 ~]# hdfs dfs -cat hdfs://mycluster/person.txt
(same 13 lines as above)

Load the data into the person table:

hive> load data inpath '/person.txt' into table person;
Loading data to table default.person
Table default.person stats: [numFiles=1, totalSize=193]
OK
Time taken: 1.634 seconds
hive> select * from person;
OK
1 zhangsan 19
2 lisi 20
3 wangwu 28
4 zhaoliu 26
5 tianqi 24
6 chengnong 55
7 zhouxingchi 58
8 mayun 50
9 yangliying 30
10 lilianjie 51
11 zhanghuimei 35
12 lian 53
13 zhangyimou 54
Time taken: 0.164 seconds, Fetched: 13 row(s)
hive>
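Loading the file and spot-checking the row count can likewise be done from spark-shell through a HiveContext instead of the hive CLI; a minimal sketch:

scala> hiveContext.sql("load data inpath '/person.txt' into table person")
scala> hiveContext.sql("select count(*) from person").show()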
Now query the table from spark-shell. If you are using spark-2.1.1-bin-hadoop2.7, the shell does not predefine sqlContext, so first run: val sqlContext = new org.apache.spark.sql.SQLContext(sc)
With spark-1.6.2-bin-hadoop2.6, sqlContext is already available and this step is not needed.
scala> sqlContext.sql("select * from person limit 2").show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  1|zhangsan| 19|
|  2|    lisi| 20|
+---+--------+---+

scala>
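On spark-2.x the recommended entry point is the SparkSession that spark-shell exposes as spark, so the equivalent of the query above is simply:

scala> spark.sql("select * from person limit 2").show()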

Alternatively, use org.apache.spark.sql.hive.HiveContext (still inside the same spark-shell session):

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val hiveContext = new HiveContext(sc)
Wed Jul 12 12:43:36 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
(the same warning is printed a second time)
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6d9a46d7

scala> hiveContext.sql("select * from person")
res2: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> hiveContext.sql("select * from person").show
+---+-----------+---+
| id|       name|age|
+---+-----------+---+
|  1|   zhangsan| 19|
|  2|       lisi| 20|
|  3|     wangwu| 28|
|  4|    zhaoliu| 26|
|  5|     tianqi| 24|
|  6|  chengnong| 55|
|  7|zhouxingchi| 58|
|  8|      mayun| 50|
|  9| yangliying| 30|
| 10|  lilianjie| 51|
| 11|zhanghuimei| 35|
| 12|       lian| 53|
| 13| zhangyimou| 54|
+---+-----------+---+

scala>
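For completeness: in standalone applications on Spark 2.x, HiveContext is deprecated in favour of a SparkSession built with Hive support. A minimal sketch (the application name is illustrative):

import org.apache.spark.sql.SparkSession

// Build a session that reads hive-site.xml from the conf directory / classpath
val spark = SparkSession.builder()
  .appName("SparkSqlHiveDemo")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("select * from person").show()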

You can also launch the spark-sql shell directly and type HiveQL statements (for example select * from person;) at its prompt:

bin/spark-sql \
--master spark://hadoop1:7077,hadoop2:7077 \
--executor-memory 1g \
--total-executor-cores 2 \
--driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar

5. Start spark-shell, specifying the location of the MySQL JDBC driver

bin/spark-shell --master spark://hadoop1:7077,hadoop2:7077 --executor-memory 1g --total-executor-cores 2 --driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar

5.1. Call HQL with sqlContext.sql (the following commands are run inside spark-shell)

scala> sqlContext.sql("select * from person limit 2")
res0: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> sqlContext.sql("select * from person limit 2").show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  1|zhangsan| 19|
|  2|    lisi| 20|
+---+--------+---+

scala>
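If the same Hive table will be queried repeatedly in one session, it can optionally be cached in memory first (a small sketch using the standard SQLContext caching API):

scala> sqlContext.cacheTable("person")
scala> sqlContext.sql("select count(*) from person").show()
scala> sqlContext.uncacheTable("person")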

Alternatively, use org.apache.spark.sql.hive.HiveContext:

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val hiveContext = new HiveContext(sc)
(log output omitted)

scala> hiveContext.sql("select * from person")
res2: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> hiveContext.sql("select * from person").show
+---+-----------+---+
| id|       name|age|
+---+-----------+---+
|  1|   zhangsan| 19|
|  2|       lisi| 20|
|  3|     wangwu| 28|
|  4|    zhaoliu| 26|
|  5|     tianqi| 24|
|  6|  chengnong| 55|
|  7|zhouxingchi| 58|
|  8|      mayun| 50|
|  9| yangliying| 30|
| 10|  lilianjie| 51|
| 11|zhanghuimei| 35|
| 12|       lian| 53|
| 13| zhangyimou| 54|
+---+-----------+---+
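Finally, query results can be written back to Hive as a new table; a minimal sketch in which the output table name person_over_30 is just an example:

scala> val adults = hiveContext.sql("select id, name, age from person where age >= 30")
scala> adults.write.mode("overwrite").saveAsTable("person_over_30")
scala> hiveContext.sql("select * from person_over_30").show()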