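Installing Hive is fairly simple, and so is using it: once the underlying Hadoop cluster is set up, you only need to initialize a few directories and a database.
Installation involves two things:
1. Set up a data source to store the metadata. The default is the embedded Derby database, which does not allow remote connections, so switch to MySQL.
2. Configure the Java path and the classpath.
Download: http://mirrors.shuosc.org/apache/hive/hive-2.3.2/
One caveat: this address changes over time, so it may no longer work. You can pick a mirror from the official site instead: http://www.apache.org/dyn/closer.cgi/hive/
After unpacking, first set the Hive environment variables: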
vi /etc/profile
Add:
export HIVE_HOME=/home/sri_udap/app/apache-hive-2.3.2-bin
export PATH=$PATH:$HIVE_HOME/bin
Apply the change:
source /etc/profile
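A quick way to confirm the profile change took effect is to check that the Hive bin directory is now an entry of PATH. The helper below is only a sketch; `path_contains` is a made-up name, not something shipped with Hive or Hadoop.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated, PATH-style string $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `path_contains "$HIVE_HOME/bin" "$PATH" && echo "hive is on PATH"` prints the message only when the export above is active in the current shell.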
In the conf directory, copy the templates to create the config files:
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
Start with the other two config files.
Edit Hadoop's config file hadoop-env.sh as follows:
export HADOOP_CLASSPATH=.:$CLASSPATH:$HADOOP_CLASSPATH:$HADOOP_HOME/bin
Even with the classpath configured this way, Hive initialization later kept failing with Java class errors. After some research I switched to a more reliable approach:
for f in $HADOOP_HOME/hadoop-*.jar; do
  CLASSPATH=${CLASSPATH}:$f
done
for f in $HADOOP_HOME/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f
done
for f in $HIVE_HOME/lib/*.jar; do
  CLASSPATH=${CLASSPATH}:$f
done
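The three loops above repeat the same pattern, so they can be folded into one function. This is a minimal sketch, not what the original setup used: `build_classpath` is a hypothetical name, and unlike the loops above it skips directories that contain no jars instead of appending the unexpanded `*.jar` pattern.

```shell
# Hypothetical helper: print a colon-separated classpath made of every
# .jar file found directly inside the directories given as arguments.
build_classpath() {
  cp=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      # When nothing matches, the glob stays literal; skip it.
      [ -e "$jar" ] || continue
      # Add a ':' separator only when cp is already non-empty.
      cp="${cp:+$cp:}$jar"
    done
  done
  printf '%s\n' "$cp"
}
```

Assuming the same variables as above, it could be used as `CLASSPATH="$CLASSPATH:$(build_classpath "$HADOOP_HOME/lib" "$HIVE_HOME/lib")"`.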
In the $HIVE_HOME/conf directory (where hive-env.sh was created from its template above), edit hive-env.sh and add the following:
export HADOOP_HOME=/home/sri_udap/app/hadoop-2.7.2
export HIVE_CONF_DIR=/home/sri_udap/app/apache-hive-2.3.2-bin/conf
export HIVE_AUX_JARS_PATH=/home/sri_udap/app/apache-hive-2.3.2-bin/lib
Edit hive-site.xml and change the following properties:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hivetest</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivetest</value>
</property>
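Copy a MySQL connector jar into the lib directory. I used mysql-connector-java-5.1.30.jar.
Next, create the base directories on HDFS by hand. These are the locations configured in hive-site.xml (the Hive data directory, the warehouse, the logs, and so on), and they need permissions granted as well: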
bin/hadoop fs -mkdir -p /user/hive/warehouse
bin/hadoop fs -mkdir -p /user/hive/tmp
bin/hadoop fs -mkdir -p /user/hive/log
bin/hadoop fs -chmod -R 777 /user/hive/warehouse
bin/hadoop fs -chmod -R 777 /user/hive/tmp
bin/hadoop fs -chmod -R 777 /user/hive/log
Now initialization can begin. Start Hadoop first, then run this command from Hive's bin directory:
./schematool -initSchema -dbType mysql
At this point you should hit an error:
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:148)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:487)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:430)
    ... 7 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1804)
    at java.net.URI.<init>(URI.java:752)
    at org.apache.hadoop.fs.Path.initialize(Path.java:145)
    ... 10 more
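This happens because "system:java.io.tmpdir" cannot be resolved. Replace it with a temp directory you created yourself; mine is /home/sri_udap/app/apache-hive-2.3.2-bin/temp.
Replace every occurrence of this setting in hive-site.xml. The ${system:user.name} variable is not resolved either, so if you have the patience, fix that one too by dropping the "system:" prefix. Otherwise you will end up like me, with an oddly named directory being created: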
[root@master temp]# ls
9c9855ee-f160-48d4-ab74-9d597c81bb13_resources c1d48876-f1c9-4f97-bc3a-f9743fecc417_resources ${system:user.name}
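Both substitutions described above can be scripted with sed rather than edited by hand. The function below is a sketch under a few assumptions: `fix_hive_site` is a hypothetical name, the replacement directory must not contain `|`, `&` or `\` (they are special to sed here), and the file is rewritten through a temporary copy so it works with any POSIX sed.

```shell
# Hypothetical helper: rewrite the two unresolved placeholders in hive-site.xml.
#   $1 = path to hive-site.xml
#   $2 = temp directory to use instead of ${system:java.io.tmpdir}
fix_hive_site() {
  file=$1
  tmpdir=$2
  # First expression: swap the tmpdir placeholder for the given directory.
  # Second expression: drop the "system:" prefix from the user.name placeholder.
  sed -e "s|[\$]{system:java[.]io[.]tmpdir}|$tmpdir|g" \
      -e 's|[$]{system:user[.]name}|${user.name}|g' \
      "$file" > "$file.new" && mv "$file.new" "$file"
}
```

With the paths used in this post, the call would be `fix_hive_site "$HIVE_HOME/conf/hive-site.xml" /home/sri_udap/app/apache-hive-2.3.2-bin/temp`. Keeping a backup of hive-site.xml first is a sensible precaution.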
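Run the initialization again. You should then see a set of tables created in MySQL, which completes the setup.
Basic usage:
Create a few tables. (After a table is created, Hive adds a directory on HDFS with the same name as the table, and after data is loaded, files appear under that directory. In Hive, data is just directories and files.)
Create two tables:
hive> CREATE TABLE t1(id int);  -- managed (internal) table t1 with a single int column, id
hive> CREATE TABLE t2(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';  -- managed (internal) table t2 with two tab-separated columns
Then prepare two txt files that follow these field delimiters, and load them into the tables: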
[root@master temp]# cat t1.txt
1
2
3
4
5
6
7
9
[root@master temp]# cat t2.txt
1	a
2	b
3	c
9	x
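The two sample files can also be generated rather than typed, which guarantees that t2.txt really uses a tab between its columns. A small sketch, with `make_sample_files` as a hypothetical helper name:

```shell
# Hypothetical helper: write the sample files t1.txt and t2.txt into directory $1.
make_sample_files() {
  dir=$1
  # t1.txt: one id per line.
  printf '%s\n' 1 2 3 4 5 6 7 9 > "$dir/t1.txt"
  # t2.txt: id and name separated by a tab; printf reuses the format
  # for each pair of arguments.
  printf '%s\t%s\n' 1 a 2 b 3 c 9 x > "$dir/t2.txt"
}
```

Calling `make_sample_files .` writes both files into the current directory.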
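Load the data:
hive> LOAD DATA LOCAL INPATH '/t1.txt' INTO TABLE t1;  -- load from the local filesystem
hive> LOAD DATA INPATH 't2.txt' INTO TABLE t2;  -- load from HDFS
Simple queries would work at this point, but to make Hive generate a MapReduce job, we write a slightly more complex statement: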
hive> select t2.name from t1 left join t2 on t1.id = t2.id;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20171228104347_a63966e5-d32a-41c9-a363-79aef39cac63
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/sri_udap/app/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/sri_udap/app/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2017-12-28 10:43:53 Starting to launch local task to process map join; maximum memory = 932184064
2017-12-28 10:43:54 Dump the side-table for tag: 1 with group count: 4 into file: file:/home/sri_udap/app/apache-hive-2.3.2-bin/temp/${system:user.name}/9c9855ee-f160-48d4-ab74-9d597c81bb13/hive_2017-12-28_10-43-47_556_6806677688398200490-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile31--.hashtable
2017-12-28 10:43:54 Uploaded 1 File to: file:/home/sri_udap/app/apache-hive-2.3.2-bin/temp/${system:user.name}/9c9855ee-f160-48d4-ab74-9d597c81bb13/hive_2017-12-28_10-43-47_556_6806677688398200490-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile31--.hashtable (364 bytes)
2017-12-28 10:43:54 End of local task; Time Taken: 1.103 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1514424221956_0004, Tracking URL = http://master:8088/proxy/application_1514424221956_0004/
Kill Command = /home/sri_udap/app/hadoop-2.7.2/bin/hadoop job -kill job_1514424221956_0004
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2017-12-28 10:44:10,516 Stage-3 map = 0%, reduce = 0%
2017-12-28 10:44:16,416 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.88 sec
MapReduce Total cumulative CPU time: 1 seconds 880 msec
Ended Job = job_1514424221956_0004
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 1.88 sec HDFS Read: 5568 HDFS Write: 205 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 880 msec
OK
a
b
c
That's all. Questions and discussion are welcome.