Installing Hive 2.1
1. Prerequisites: install the JDK and Hadoop
2. Download and unpack Hive, then set the HIVE_HOME and PATH environment variables
3. With the Hadoop environment variables set, create the directories Hive needs on HDFS
./hadoop fs -mkdir -p /tmp
./hadoop fs -mkdir -p /user/hive/warehouse
./hadoop fs -chmod g+w /tmp
./hadoop fs -chmod g+w /user/hive/warehouse
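A quick listing confirms the directories exist with group write permission (the owner and exact mode will vary on your cluster):
bash> hadoop fs -ls /user/hive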
4. Rename Hive's configuration file templates
conf/hive-default.xml.template -> hive-site.xml
conf/hive-log4j.properties.template -> hive-log4j.properties
conf/hive-exec-log4j.properties.template -> hive-exec-log4j.properties
5. Edit hive-site.xml
Replace every occurrence of ${system:java.io.tmpdir} and ${system:user.name}; in vim:
:%s@\${system:java.io.tmpdir}@/home/c3/hive_tmp@g
:%s@\${system:user.name}@/c3@g
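After the substitution, a property that used those variables, such as hive.exec.local.scratchdir, should end up looking roughly like this (the doubled slash left by the two replacements is harmless for a local path):
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/c3/hive_tmp//c3</value>
</property>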
Starting Hive then fails with:
Caused by: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
This is because the metastore database has not been initialized; by default, Hive keeps its metadata in an embedded Derby database.
schematool -initSchema -dbType derby
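To confirm the schema was created, schematool can report the metastore version (a quick check):
bash> schematool -dbType derby -info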
Putting the metadata in MySQL
1. Put mysql-connector-java-5.1.30-bin.jar into $HIVE_HOME/lib
2. In $HIVE_HOME/conf/hive-site.xml, set the database connection string, driver, user, and password (note that a literal & inside an XML value must be written as &amp;):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.1.195.50:3306/hivedb?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>umobile</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>umobile</value>
  <description>password to use against metastore database</description>
</property>
3. Initialize the Hive schema in MySQL (the DDL scripts live under $HIVE_HOME/scripts):
schematool -initSchema -dbType mysql
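If the initialization succeeded, the standard metastore tables appear in MySQL (DBS, TBLS, COLUMNS_V2, and friends):
mysql> use hivedb;
mysql> show tables;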
Testing Hive
Tables and data created in Hive are stored on HDFS. The hive.metastore.warehouse.dir property in hive-site.xml defines the HDFS path; it defaults to /user/hive/warehouse.
bash> hive
hive> create database hello;
$ hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxrwxr-x   - c3 supergroup          0 2016-06-30 17:18 /user/hive/warehouse/hello.db
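If /user/hive/warehouse is not where you want the warehouse, the property can be overridden in hive-site.xml (a sketch; /data/hive/warehouse is just an example path):
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/data/hive/warehouse</value>
</property>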
Create a database and a table, and insert data
create database test;
use test;
create table test_table (id int, name string, no int)
  row format delimited fields terminated by ','
  stored as textfile;
insert into test_table values (1, 'test', 1);
select * from test_table;
The insert reports the following warning:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
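The warning is only advisory; MR still works in Hive 2.1. When Tez or Spark is installed, the engine can be switched per session (a sketch; neither engine is set up in this walkthrough):
hive> set hive.execution.engine=tez;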
I was too aggressive; I should not have used such a new Hive release...
Downgrading to Hive 1.2 and reinstalling
schematool -initSchema -dbType mysql now fails with:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
Fix:
Copy the newer jline JAR that ships with Hive into Hadoop:
hive/lib/jline-2.12.jar -> hadoop/share/hadoop/yarn/lib/
then rename Hadoop's old jline-0.9.94.jar to jline-0.9.94.jar.bak
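As shell commands (assuming HIVE_HOME and HADOOP_HOME are set):
bash> cp $HIVE_HOME/lib/jline-2.12.jar $HADOOP_HOME/share/hadoop/yarn/lib/
bash> mv $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar $HADOOP_HOME/share/hadoop/yarn/lib/jline-0.9.94.jar.bak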
hive> CREATE TABLE student(id STRING, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
bash> echo 1,wang >> student.txt
bash> echo 2,wang >> student.txt
bash> echo 3,wang >> student.txt
hive> load data local inpath '/home/c3/student.txt' into table student;
(the local inpath must be an absolute path)
hive> select * from student;
Note: the INSERT INTO ... VALUES statement cannot be used here.
Starting Hive in HiveServer mode
bash> hive --service hiveserver
Error: Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer
HiveServer itself has many problems (security, concurrency, and so on). To address them, Hive 0.11.0 introduced a brand-new service, HiveServer2, which nicely resolves HiveServer's security and concurrency issues. Its launcher is ${HIVE_HOME}/bin/hiveserver2, and you can start the HiveServer2 service as follows:
bash> hive --service hiveserver2
bash> hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001  # specify a custom port
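Before wiring up a GUI client, beeline (shipped with Hive) is a quick way to verify the server is reachable (user c3 and an empty password are assumptions for a cluster without authentication):
bash> beeline -u jdbc:hive2://localhost:10001 -n c3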
Once it starts successfully, you can access Hive from DBVisualizer, just like MySQL:
Tools > Driver Manager... > create a new Hive driver
With the Hive driver created, you can set up a connection to the Hive database:
Database Type: Generic
Driver: the driver created above
userid/password: anything
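For reference, the settings that matter when defining the driver (a sketch; the exact JAR name varies by Hive version, and hive-jdbc-*-standalone.jar from $HIVE_HOME/lib is usually enough):
Driver Class: org.apache.hive.jdbc.HiveDriver
URL: jdbc:hive2://<hiveserver2-host>:10001/default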
Data preparation: create a comma-separated text file and upload it to the /1 directory on HDFS, then create an external table in Hive whose storage location is the /1 directory (Hive reads every file under the location directory, so the file name does not actually have to match the table name).
bash> vim table_test
d1,user1,1000
d1,user2,2000
d1,user3,3000
d2,user4,4000
d2,user5,5000
bash> hadoop fs -mkdir /1
bash> hadoop fs -put table_test /1
hive> CREATE EXTERNAL TABLE table_test (
          dept STRING,
          userid STRING,
          sal INT
      ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE LOCATION '/1';
hive> select * from table_test;
In addition, SparkSQL can also serve as a JDBC server. The difference from using Hive as the server above is that SparkSQL executes SQL with the Spark engine, whereas Hive executes it with MR.
1. Copy hive-site.xml to $SPARK_HOME/conf
2. Add the JDBC driver JAR to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh
export SPARK_CLASSPATH=.:/home/Hadoop/software/mysql-connector-java-5.1.27-bin.jar
3. Start it: ./start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001 --hiveconf hive.server2.thrift.bind.host=rti9
4. Watch the log: tail -f /home/c3/apps/spark-1.6.1-bin-hadoop2.6/logs/spark-c3-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-rti9.out
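The Spark Thrift Server speaks the same HiveServer2 protocol, so connectivity can be checked with beeline as well (host rti9 and port 10001 come from the start command above; Spark also ships its own copy under $SPARK_HOME/bin):
bash> beeline -u jdbc:hive2://rti9:10001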