Deploying and Using the Hive 1.2.2 Data Warehouse on a Hadoop 2.7.3 Cluster

Reposted article; originally published 2017-12-06 15:41.

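HBase is a distributed, column-oriented NoSQL database built on HDFS storage. It stores data in tables made up of rows and columns, with columns grouped into column families. HBase does not provide a SQL-like query language. To query it the way you would with SQL, you can use Phoenix, which translates SQL queries into HBase scans and the corresponding operations, or you can use the Hive warehouse tool covered here, with HBase serving as Hive's storage.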

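Hive is a data warehouse that runs on top of Hadoop. It maps structured data files to database tables and provides a simple SQL-like query language called HQL, translating the statements into MapReduce jobs. This makes it convenient to query and analyze data with SQL, and it suits data that does not change frequently. Hive's underlying storage can be HBase or files stored on HDFS.
The two are distinct technologies that both build on Hadoop. Used together, they can handle different kinds of business workloads: Hive for offline analysis and statistics, HBase for online queries.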

1. Install Hive from the binary package
Download URL: http://mirrors.shuosc.org/apache/hive/stable/apache-hive-1.2.2-bin.tar.gz
tar -zxf apache-hive-1.2.2-bin.tar.gz

Configure the environment variables

# vi /etc/profile
HIVE_HOME=/data/yunva/apache-hive-1.2.2-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH
# source /etc/profile
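
Before moving on, it can help to confirm that the variables are really exported, since only exported variables reach child processes such as the hive launcher script. The sketch below assumes the install path used in this article; `child_view` and `path_ok` are illustrative names.

```shell
# Assumed install path, taken from this article; adjust to your layout.
HIVE_HOME=/data/yunva/apache-hive-1.2.2-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH

# A child shell only sees variables that were exported.
child_view=$(sh -c 'printf "%s" "$HIVE_HOME"')
echo "HIVE_HOME as seen by a child process: $child_view"

# Confirm that $HIVE_HOME/bin is one of the PATH entries.
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) path_ok=yes ;;
  *) path_ok=no ;;
esac
echo "hive bin directory on PATH: $path_ok"
```

If the child process prints an empty value, the variable was assigned but not exported.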

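2. Install MySQL to store Hive's metadata (because of resource constraints, MySQL was installed on a separate server here; the commands below install MariaDB, which is MySQL-compatible)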

# yum install -y mariadb mariadb-server
# systemctl start mariadb

Create the Hive metastore database and the connection user in MySQL

mysql>create database hive;
mysql>grant all on *.* to 'hive'@'%' identified by 'hive';
mysql>flush privileges;
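
The grant above gives the hive user privileges on every database. If the account is only used for the metastore, a narrower grant is usually enough. The following is a sketch under that assumption, written as a shell script that only generates the SQL file. The file name `hive_metastore_grants.sql` is illustrative, and the `GRANT ... IDENTIFIED BY` form is the one accepted by MariaDB and MySQL 5.x.

```shell
# Write the statements to a file so they can be reviewed before running.
sql_file="${TMPDIR:-/tmp}/hive_metastore_grants.sql"
cat > "$sql_file" <<'EOF'
CREATE DATABASE IF NOT EXISTS hive;
-- Limit the hive account to the metastore database only.
GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;
EOF

echo "Wrote $sql_file"
# To apply it on the database server (assumes a local root login):
#   mysql -u root -p < "$sql_file"
```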

3. Configure Hive

cd /data/yunva/apache-hive-1.2.2-bin/conf
cp hive-default.xml.template hive-default.xml

Configure the MySQL connection details for Hive
# vim hive-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.10.11.214:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>  
    </property>          

    <property> 
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
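
A typo in this file usually only surfaces when the metastore fails to start, so it can be worth reading the values back before going further. The sketch below extracts a property value with awk. `get_prop` is a hypothetical helper, it assumes each `<name>` and `<value>` sits on its own line as in the file above, and the demo runs against a throwaway copy rather than the real configuration.

```shell
# Print the <value> of one property from a Hadoop-style XML config file.
# Assumes <name> and <value> each sit on a single line, as in this article.
get_prop() {
  awk -v want="$2" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
    /<value>/ { if (n == want) {
                  v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                  print v; exit } }
  ' "$1"
}

# Demo on a throwaway copy so the sketch runs without a Hive install.
demo_dir=$(mktemp -d)
cat > "$demo_dir/hive-site.xml" <<'EOF'
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.10.11.214:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
</configuration>
EOF

url=$(get_prop "$demo_dir/hive-site.xml" javax.jdo.option.ConnectionURL)
user=$(get_prop "$demo_dir/hive-site.xml" javax.jdo.option.ConnectionUserName)
echo "metastore URL:  $url"
echo "metastore user: $user"
```

Against the real file, the call would be `get_prop /data/yunva/apache-hive-1.2.2-bin/conf/hive-site.xml javax.jdo.option.ConnectionURL`.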

4. Install the Java driver for MySQL (MySQL Connector/J)
Download URL: https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.45.tar.gz
Copy the extracted mysql-connector-java-5.1.45-bin.jar into the /data/yunva/apache-hive-1.2.2-bin/lib directory

5. Start the Hive service

# hive --service metastore &

[root@test3 apache-hive-1.2.2-bin]# ps -ef|grep hive
root      4302  3176 99 14:09 pts/0    00:00:06 /usr/java/jdk1.8.0_65/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/yunva/hadoop-2.7.3/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/yunva/hadoop-2.7.3 -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /data/yunva/apache-hive-1.2.2-bin/lib/hive-service-1.2.2.jar org.apache.hadoop.hive.metastore.HiveMetaStore
root      4415  3176  0 14:09 pts/0    00:00:00 grep hive
[root@test3 apache-hive-1.2.2-bin]# jps
15445 HRegionServer
4428 Jps
4302 RunJar    # the Hive metastore shows up in jps as a process named RunJar
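
Because the metastore is started in the background, the prompt returns before the service is ready. A small check like the following can confirm that something is listening on the metastore's Thrift port (9083, the port used in the client configuration in this article). This is a sketch: `port_open` is a hypothetical helper, and the `/dev/tcp` redirection is a bash feature rather than a real device file.

```shell
# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
# Uses bash's built-in /dev/tcp pseudo-path; run this with bash.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if port_open 127.0.0.1 9083; then
  metastore_state=listening
else
  metastore_state=not-listening
fi
echo "metastore on 127.0.0.1:9083: $metastore_state"
```

From a client machine, replace 127.0.0.1 with the address of the server that runs the metastore.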

Client configuration (the client needs a working Hadoop environment)
scp -P 48490 -r apache-hive-1.2.2-bin 10.10.114.112:/data/yunva

Configure the environment variables:
vim /etc/profile

# hive client
HIVE_HOME=/data/yunva/apache-hive-1.2.2-bin
PATH=$PATH:$HIVE_HOME/bin
export HIVE_HOME PATH

# vi hive-site.xml (or keep the original configuration unchanged, in which case there are two Hive servers)

<configuration>
<!-- Connect to the Hive metastore via Thrift -->
   <property>
       <name>hive.metastore.uris</name>
        <value>thrift://hive_server_ip:9083</value>
   </property>
</configuration>

A quick test:
Running the hive command opens the command-line interface:

[root@test3 apache-hive-1.2.2-bin]# hive

Logging initialized using configuration in jar:file:/data/yunva/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 1.158 seconds, Fetched: 1 row(s)

hive> create database yunvatest;
hive> use yunvatest;
OK
Time taken: 0.021 seconds
hive> show databases;
OK
default
yunvatest
Time taken: 0.225 seconds, Fetched: 2 row(s)
hive> create table table_test(id string,name string);
OK
Time taken: 0.417 seconds
hive> show tables;
OK
table_test
Time taken: 0.033 seconds, Fetched: 1 row(s)
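
The same statements can also be run without the interactive prompt, which is handy for a repeatable smoke test. The sketch below writes them to a script file and runs it with `hive -f` when the hive command is available. The file name `smoke_test.hql` is illustrative, and `IF NOT EXISTS` is added so that rerunning the script does not fail.

```shell
hql_file="${TMPDIR:-/tmp}/smoke_test.hql"
cat > "$hql_file" <<'EOF'
CREATE DATABASE IF NOT EXISTS yunvatest;
USE yunvatest;
CREATE TABLE IF NOT EXISTS table_test (id STRING, name STRING);
SHOW TABLES;
EOF

if command -v hive >/dev/null 2>&1; then
  # -f runs every statement in the file, then exits.
  hive -f "$hql_file"
else
  echo "hive not found on PATH; statements saved to $hql_file"
fi
```

For a single statement, `hive -e 'show databases;'` does the same thing without a file.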

6. Common Hive SQL commands
6.1 First create a test database

hive> create database test;
hive> use test;

Create table tb1 and set the field delimiter to tab (otherwise the loaded columns come back as NULL)

hive> create table tb1(id int,name string) row format delimited fields terminated by '\t';

To create another table with the same structure as tb1:
hive> create table table2 like tb1;

Check the table structure:
hive> describe table2;
OK
id int
name string
Time taken: 0.126 seconds, Fetched: 2 row(s)

6.2 Load data from a local file into a Hive table
First create the data file; the fields must be separated by a tab character:

# cat seasons.txt
1    spring
2    summer
3    autumn
4    winter
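
Tabs are easy to lose when a file is typed by hand or pasted, and a file with spaces instead of tabs loads as NULL columns. One way to avoid that is to generate the file with printf and verify it. This is a sketch: it writes to a temporary location rather than /root/seasons.txt, and `bad_lines` and `size` are illustrative names.

```shell
data_file="${TMPDIR:-/tmp}/seasons.txt"
# printf repeats the format for each pair of arguments; \t is a real tab.
printf '%s\t%s\n' 1 spring 2 summer 3 autumn 4 winter > "$data_file"

# Count lines that do not have exactly two tab-separated fields.
bad_lines=$(awk -F'\t' 'NF != 2 { n++ } END { print n + 0 }' "$data_file")
size=$(wc -c < "$data_file" | tr -d ' ')
echo "malformed lines: $bad_lines, size: $size bytes"
```

The resulting file is 36 bytes, which matches the totalSize=36 that Hive reports when the same content is loaded in section 6.3.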

Then load the data:
hive> load data local inpath '/root/seasons.txt' overwrite into table tb1;
Check that the load succeeded

hive> select * from tb1;
OK
1    spring
2    summer
3    autumn
4    winter

6.3 Load data from HDFS into a Hive table:

List the directories under the HDFS root
hadoop fs -ls /

Create the /test directory
hadoop fs -mkdir /test
Use the put command to write the file into /test as siji.txt
hadoop fs -put /root/seasons.txt /test/siji.txt

View the contents of siji.txt

# hadoop fs -cat /test/siji.txt
17/12/06 14:54:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1    spring
2    summer
3    autumn
4    winter

hive> load data inpath '/test/siji.txt' overwrite into table table2;
Loading data to table test.table2
Table test.table2 stats: [numFiles=1, numRows=0, totalSize=36, rawDataSize=0]
OK
Time taken: 0.336 seconds

Check that the load succeeded

hive> select * from table2;
OK
1    spring
2    summer
3    autumn
4    winter
Time taken: 0.074 seconds, Fetched: 4 row(s)
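
One behavioral difference between the two load forms is worth knowing: `load data local inpath` copies the local file into the warehouse, while `load data inpath` moves the HDFS file, so /test/siji.txt is no longer at its original path afterwards. The sketch below only imitates this with cp and mv in a local temporary directory. It is an analogy, not Hive itself, and all directory names in it are made up.

```shell
work=$(mktemp -d)
mkdir -p "$work/local" "$work/hdfs_test" "$work/warehouse/tb1" "$work/warehouse/table2"
printf '1\tspring\n' > "$work/local/seasons.txt"
printf '1\tspring\n' > "$work/hdfs_test/siji.txt"

# LOAD DATA LOCAL INPATH behaves like a copy: the source file stays.
cp "$work/local/seasons.txt" "$work/warehouse/tb1/"
# LOAD DATA INPATH behaves like a move: the source file is gone afterwards.
mv "$work/hdfs_test/siji.txt" "$work/warehouse/table2/"

[ -e "$work/local/seasons.txt" ] && local_src=kept || local_src=gone
[ -e "$work/hdfs_test/siji.txt" ] && hdfs_src=kept || hdfs_src=gone
echo "local source: $local_src, HDFS source: $hdfs_src"
```

On a real cluster, running `hadoop fs -ls /test` after the load is the way to see this.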

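6.4 The operations above cover basic tables. To improve processing performance, Hive provides a partitioning mechanism, so let's look at the partitioned-table concept:

1>. A table's partition columns are specified when the table is created
2>. A table can have one or more partitions, which divide the data into chunks
3>. A partition column appears as a field in the table schema, but its value is not stored in the data files
Advantage of partitioned tables: rows are assigned to different partitions according to a condition, which narrows the scan range of a query and improves retrieval speed and processing performance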
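
To make the idea concrete: on disk, each partition is a directory named column=value, and a query that filters on the partition column only has to read the matching directories. The sketch below imitates that layout on the local filesystem. It is an analogy only, and the `warehouse/tb_demo` path is made up.

```shell
root=$(mktemp -d)/warehouse/tb_demo
for dt in 2017-12-06 2017-12-07; do
  mkdir -p "$root/dt=$dt"
  # The data file holds only id and name; dt lives in the directory name.
  printf '%s\t%s\n' 1 spring 2 summer > "$root/dt=$dt/seasons.txt"
done

# "WHERE dt='2017-12-06'" only needs the files under one directory.
scanned=$(find "$root/dt=2017-12-06" -type f | wc -l | tr -d ' ')
total=$(find "$root" -type f | wc -l | tr -d ' ')
echo "files scanned with partition filter: $scanned of $total"
```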

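6.5 Single-level partitioned table:
Create the single-level partitioned table tb2 (only one level of subdirectories under the table's HDFS directory):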
hive> create table tb2(id int,name string) partitioned by (dt string) row format delimited fields terminated by '\t';

Note: dt is the name of the partition column.

Load data from a file into the partitioned table and specify the partition (the table must already exist)

hive> load data local inpath '/root/seasons.txt' into table tb2 partition (dt='2017-12-06');
hive> load data local inpath '/root/seasons.txt' into table tb2 partition (dt='2017-12-07');

View the table data

hive> select * from tb2;
OK
1    spring    2017-12-06
2    summer    2017-12-06
3    autumn    2017-12-06
4    winter    2017-12-06
1    spring    2017-12-07
2    summer    2017-12-07
3    autumn    2017-12-07
4    winter    2017-12-07
Time taken: 0.086 seconds, Fetched: 8 row(s)

Check how the table directory in the HDFS warehouse has changed

[root@test4_haili_dev ~]# hadoop fs -ls -R /user/hive/warehouse/test.db/tb2
17/12/06 15:09:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxrwxrwx   - root supergroup          0 2017-12-06 15:07 /user/hive/warehouse/test.db/tb2/dt=2017-12-06
-rwxrwxrwx   3 root supergroup         36 2017-12-06 15:07 /user/hive/warehouse/test.db/tb2/dt=2017-12-06/seasons.txt
drwxrwxrwx   - root supergroup          0 2017-12-06 15:07 /user/hive/warehouse/test.db/tb2/dt=2017-12-07
-rwxrwxrwx   3 root supergroup         36 2017-12-06 15:07 /user/hive/warehouse/test.db/tb2/dt=2017-12-07/seasons.txt

As shown, the data loaded into tb2 is placed in separate directories by date

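6.6 Multi-level partitioned table:
Create the multi-level partitioned table table3 (the table's HDFS directory has first-level directories, each containing subdirectories)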

hive> create table table3(id int,name string) partitioned by (dt string,location string) row format delimited fields terminated by '\t';

Load data from a file into the partitioned table and specify the partition

hive> load data local inpath '/root/seasons.txt' into table table3 partition (dt='2017-12-06',location='guangzhou');
hive> load data local inpath '/root/seasons.txt' into table table3 partition (dt='2017-12-07',location='shenzhen');

View the table data

hive> select * from table3;
OK
1    spring    2017-12-06    guangzhou
2    summer    2017-12-06    guangzhou
3    autumn    2017-12-06    guangzhou
4    winter    2017-12-06    guangzhou
1    spring    2017-12-07    shenzhen
2    summer    2017-12-07    shenzhen
3    autumn    2017-12-07    shenzhen
4    winter    2017-12-07    shenzhen

Check how the table directory in the HDFS warehouse has changed

[root@test3 yunva]# hadoop fs -ls -R /user/hive/warehouse/test.db/table3
17/12/06 15:22:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxrwxrwx   - root supergroup          0 2017-12-06 15:19 /user/hive/warehouse/test.db/table3/dt=2017-12-06
drwxrwxrwx   - root supergroup          0 2017-12-06 15:19 /user/hive/warehouse/test.db/table3/dt=2017-12-06/location=guangzhou
-rwxrwxrwx   3 root supergroup         36 2017-12-06 15:19 /user/hive/warehouse/test.db/table3/dt=2017-12-06/location=guangzhou/seasons.txt
drwxrwxrwx   - root supergroup          0 2017-12-06 15:20 /user/hive/warehouse/test.db/table3/dt=2017-12-07
drwxrwxrwx   - root supergroup          0 2017-12-06 15:20 /user/hive/warehouse/test.db/table3/dt=2017-12-07/location=shenzhen
-rwxrwxrwx   3 root supergroup         36 2017-12-06 15:20 /user/hive/warehouse/test.db/table3/dt=2017-12-07/location=shenzhen/seasons.txt

As shown, each first-level dt partition directory is further divided into location partitions.

View the table's partitions
hive> show partitions table3;
OK
dt=2017-12-06/location=guangzhou
dt=2017-12-07/location=shenzhen
Time taken: 0.073 seconds, Fetched: 2 row(s)
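
Each line of the show partitions output is the partition's relative directory path, so a script can split it back into column and value pairs. A minimal sketch follows. `parse_partition` is a hypothetical helper, not part of Hive, and it assumes the values contain no `/` or `=` characters.

```shell
# Split "k1=v1/k2=v2" into one "key<TAB>value" line per partition column.
parse_partition() {
  printf '%s\n' "$1" | tr '/' '\n' | awk -F'=' '{ print $1 "\t" $2 }'
}

spec='dt=2017-12-06/location=guangzhou'
parsed=$(parse_partition "$spec")
printf '%s\n' "$parsed"

# Pick out the value of one partition column by name.
location=$(printf '%s\n' "$parsed" | awk -F'\t' '$1 == "location" { print $2 }')
echo "location column value: $location"
```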

Query data by partition

hive> select name from table3 where dt='2017-12-06';
OK
spring
summer
autumn
winter
Time taken: 0.312 seconds, Fetched: 4 row(s)

Rename a partition
hive> alter table table3 partition (dt='2017-12-06',location='guangzhou') rename to partition(dt='20171206',location='shanghai');

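Drop a partition (if the rename above was run first, the partition is now dt='20171206', location='shanghai', and that is the spec to drop)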
hive> alter table table3 drop partition(dt='2017-12-06',location='guangzhou');
OK
Time taken: 0.113 seconds
The query no longer returns any rows
hive> select name from table3 where dt='2017-12-06';
OK
Time taken: 0.078 seconds

Search tables by name pattern
hive> show tables 'tb*';
OK
tb1
tb2

Add a new column to a table

hive> alter table tb1 add columns (comment string);
OK
Time taken: 0.106 seconds
hive> describe tb1;
OK
id                      int                                         
name                    string                                      
comment                 string                                      
Time taken: 0.079 seconds, Fetched: 3 row(s)

Rename a table
hive> alter table tb1 rename to new_tb1;
OK
Time taken: 0.095 seconds
hive> show tables;
OK
new_tb1
table2
table3
tb2

Drop a table
hive> drop table new_tb1;
OK
Time taken: 0.094 seconds
hive> show tables;
OK
table2
table3
tb2
