Because of some chained dependencies inside Spark, Spark 2.1 currently supports only Hive 1.2.1.
- Hive 1.2.1 installation
1. Download the Hive 1.2.1 binary package from http://mirror.bit.edu.cn/apache/hive/hive-1.2.1/
2. Configure system environment variables in /etc/profile
export HIVE_HOME=/opt/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
Then run source /etc/profile to make the new settings take effect.
3. Extract
tar -zxvf apache-hive-1.2.1-bin.tar.gz
mv apache-hive-1.2.1-bin /opt/hive-1.2.1
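To confirm the files are in place and $HIVE_HOME resolves correctly, a quick sanity check (assuming the steps above completed without errors):
echo $HIVE_HOME           # should print /opt/hive-1.2.1
ls $HIVE_HOME/bin/hive    # the launcher script should exist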
4.修改配置文件
Hive can actually run without any configuration changes: by default it stores its metadata in an embedded Derby database. Since most people are not familiar with Derby, we switch to MySQL for metadata storage, and we also want to change the data and log locations, so we need to configure our own environment. The configuration is described below.
cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
cp $HIVE_HOME/conf/hive-env.sh.template $HIVE_HOME/conf/hive-env.sh
cp $HIVE_HOME/conf/hive-exec-log4j.properties.template $HIVE_HOME/conf/hive-exec-log4j.properties
cp $HIVE_HOME/conf/hive-log4j.properties.template $HIVE_HOME/conf/hive-log4j.properties
Modify hive-env.sh
vi $HIVE_HOME/conf/hive-env.sh
export HADOOP_HOME=/root/hadoop
export HIVE_CONF_DIR=/opt/hive-1.2.1/conf
Modify hive-log4j.properties
mkdir $HIVE_HOME/logs
vi $HIVE_HOME/conf/hive-log4j.properties
Change the log directory setting: hive.log.dir=/opt/hive-1.2.1/logs
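The same edit can be made non-interactively; a minimal sketch, assuming GNU sed and the default template contents:
sed -i 's|^hive.log.dir=.*|hive.log.dir=/opt/hive-1.2.1/logs|' $HIVE_HOME/conf/hive-log4j.properties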
Modify hive-site.xml
The template copied above is very long; here we remove it and write a minimal hive-site.xml from scratch:
rm -f $HIVE_HOME/conf/hive-site.xml
vim $HIVE_HOME/conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>Enforce metastore schema version consistency. True: Verify that the version information stored in the metastore is compatible with the version of the Hive jars. Also disable automatic schema migration attempts; users are required to manually migrate the schema after a Hive upgrade, which ensures proper metastore schema migration. (Default) False: Warn if the version information stored in the metastore doesn't match the version of the Hive jars.</description>
  </property>
  <!-- MySQL server address -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for the metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>admin</value>
    <description>JDBC username</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>admin</value>
    <description>JDBC password</description>
  </property>
</configuration>
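Note that the MySQL account referenced above must already exist and be allowed to access the hive database. A minimal sketch, assuming root access to a local MySQL 5.x server (admin/admin simply mirrors the credentials in hive-site.xml):
# create the metastore database and grant the Hive user access
mysql -uroot -p <<'EOF'
CREATE DATABASE IF NOT EXISTS hive;
GRANT ALL PRIVILEGES ON hive.* TO 'admin'@'localhost' IDENTIFIED BY 'admin';
FLUSH PRIVILEGES;
EOF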
Import the MySQL connector JAR
cp mysql-connector-java-5.1.17.jar $HIVE_HOME/lib    # copy the MySQL driver JAR
5. Initialize the metastore schema (the bin directory here is under the Hive installation):
./bin/schematool -initSchema -dbType mysql
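If initialization succeeded, the metastore tables should now exist in MySQL; a quick check, assuming the credentials configured above:
mysql -uadmin -padmin -e 'SHOW TABLES;' hive    # should list metastore tables such as DBS and TBLS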
6. Start Hive:
./bin/hive
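A short smoke test can also be run non-interactively with hive -e (the table name test is just an example, and a working Hadoop/HDFS setup is assumed):
$HIVE_HOME/bin/hive -e "CREATE TABLE test (id INT, name STRING);"    # stored under the default warehouse directory
$HIVE_HOME/bin/hive -e "SHOW TABLES;"                                # should list the new table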
Integrating spark-2.0.0 with hive-1.2.1
Integrating Spark SQL with Hive
1. Copy $HIVE_HOME/conf/hive-site.xml and hive-log4j.properties into $SPARK_HOME/conf/
2. In $SPARK_HOME/conf/, modify spark-env.sh and add:
export HIVE_HOME=/opt/hive-1.2.1
export SPARK_CLASSPATH=$HIVE_HOME/lib:$SPARK_CLASSPATH
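Spark also needs the MySQL driver on its classpath to reach the metastore. One approach (an assumption about your layout, not the only option) is to drop the connector into Spark's jars directory:
cp $HIVE_HOME/lib/mysql-connector-java-5.1.17.jar $SPARK_HOME/jars/    # Spark 2.x loads everything in jars/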
3. Optionally, adjust Spark's log4j configuration so that extra INFO messages are not printed to the console:
log4j.rootCategory=WARN, console
(some messages will still be printed)
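If $SPARK_HOME/conf/log4j.properties does not exist yet, create it from the bundled template before editing the line above:
cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties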
Go into $SPARK_HOME/bin and run spark-sql
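As a final end-to-end check, query Hive through Spark SQL (this assumes the test table created in the Hive smoke test above):
$SPARK_HOME/bin/spark-sql -e "SHOW TABLES;"                  # should list the Hive tables, including test
$SPARK_HOME/bin/spark-sql -e "SELECT COUNT(*) FROM test;"    # returns 0 for the empty table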