Two ways to pass variables to Hive

This article is a repost. 2016-03-04 13:35, tagged hive.

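When developing data analysis code with Hive, you often need to change run-time parameters. For example, the date value in a select statement may differ from run to run, so it must be possible to change it dynamically. When there is a lot of code and many parameters, replacing literal values with variables becomes necessary. This article summarizes two ways to pass parameters into Hive SQL to meet this need.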

Preparing the test table and test data

First, prepare the test table and test data used in the tests that follow:

hive> create database test;

Time taken: 2.606 seconds

Then run the SQL file that creates the table and loads the data:

[czt@www.crazyant.net testHivePara]$ hive -f student.sql

Hive history file=/tmp/crazyant.net/hive_job_log_czt_201309131615_1720869864.txt

Time taken: 2.131 seconds

Time taken: 0.878 seconds

Copying data from file:/home/users/czt/testdata_student

Copying file: file:/home/users/czt/testdata_student

Loading data to table test.student

Time taken: 1.76 seconds

The contents of student.sql are as follows:

use test;
-- student information table
create table IF NOT EXISTS student(
    sno bigint comment 'student number',
    sname string comment 'name',
    sage bigint comment 'age',
    pdate string comment 'enrollment date'
)
COMMENT 'student information table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH
'/home/users/czt/testdata_student'
INTO TABLE student;

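The testdata_student test data file contains the following rows (fields are tab-separated, matching FIELDS TERMINATED BY '\t' in the table definition):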

1 name1 21 20130901

2 name2 22 20130901

3 name3 23 20130901

4 name4 24 20130901

5 name5 25 20130902

6 name6 26 20130902

7 name7 27 20130902

8 name8 28 20130902

9 name9 29 20130903

10 name10 30 20130903

11 name11 31 20130903

12 name12 32 20130904

13 name13 33 20130904
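The rows above follow a simple pattern, so the file can be regenerated rather than typed by hand. The following is a minimal sketch, not part of the original article: the function name gen_students is hypothetical, and the date boundaries are read off the listing above. It writes real tab characters, which the listing cannot show.

```shell
#!/bin/bash
# Hypothetical helper: print the 13 sample rows, tab-separated.
gen_students() {
  i=1
  while [ "$i" -le 13 ]; do
    # Enrollment dates follow the grouping shown in the listing above.
    if   [ "$i" -le 4 ];  then pdate=20130901
    elif [ "$i" -le 8 ];  then pdate=20130902
    elif [ "$i" -le 11 ]; then pdate=20130903
    else                       pdate=20130904
    fi
    # Columns: sno, sname, sage, pdate
    printf '%s\tname%s\t%s\t%s\n' "$i" "$i" "$((i + 20))" "$pdate"
    i=$((i + 1))
  done
}

gen_students
```

Redirect the output to the path that student.sql loads from, for example into testdata_student in the directory named in the LOAD DATA statement.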

Method 1: set variables in the shell and use them directly in hive -e

The test shell script (shellhive.sh):

#!/bin/bash
tablename="student"
limitcount="8"
hive -S -e "use test; select * from ${tablename} limit ${limitcount};"

Output:

[czt@www.crazyant.net testHivePara]$ sh -x shellhive.sh

+ tablename=student

+ limitcount=8

+ hive -S -e 'use test; select * from student limit 8;'

1 name1 21 20130901

2 name2 22 20130901

3 name3 23 20130901

4 name4 24 20130901

5 name5 25 20130902

6 name6 26 20130902

7 name7 27 20130902

8 name8 28 20130902

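Hive's language is SQL-like and lacks the flexibility and flow control of the shell, so the shell + Hive development pattern is very common. Variables defined in the shell can be referenced directly inside the hive -e statement.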
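Because the shell pastes these values into the SQL text verbatim, it can help to take them from the command line and check them first. The following is a minimal sketch under that assumption; the function name build_query and the numeric check are illustrative additions, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: build the query text from a table name and a row limit.
build_query() {
  # Reject an empty or non-numeric limit so it cannot smuggle in extra SQL.
  case "$2" in
    ''|*[!0-9]*) echo "limit must be a number: $2" >&2; return 1 ;;
  esac
  printf 'use test; select * from %s limit %s;\n' "$1" "$2"
}

# Fall back to the values used in shellhive.sh when no arguments are given.
query="$(build_query "${1:-student}" "${2:-8}")" || exit 1
echo "$query"

# Run the query only when the hive CLI is available on PATH.
if command -v hive >/dev/null 2>&1; then
  hive -S -e "$query"
fi
```

Run without arguments, it builds the same statement as shellhive.sh; passing a table name and a limit changes the query without editing the script.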

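Note: a variable defined with -hiveconf cannot be used this way in hive -e. Inside the double-quoted string, the shell expands ${hiveconf:...} before Hive ever sees it.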

Modify the shell script above so that the date parameter is defined with -hiveconf:

#!/bin/bash
tablename="student"
limitcount="8"
hive -S \
-hiveconf enter_school_date="20130902" \
-hiveconf min_age="26" \
-e \
" use test; \
select * from ${tablename} \
where \
pdate='${hiveconf:enter_school_date}' \
and \
sage>'${hiveconf:min_age}' \
limit ${limitcount};"

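This run fails. The script runs in a shell, and because the query is inside double quotes, the shell itself tries to expand ${hiveconf:enter_school_date} and ${hiveconf:min_age}. No matching shell variables are defined, so an empty string is substituted at each position.

At run time the SQL statement is expanded into the following: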

+ hive -S -hiveconf enter_school_date=20130902 -hiveconf min_age=26 -e 'use test; explain select * from student where pdate='\'''\'' and sage>'\'''\'' limit 8;'
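One possible way around this is to escape each $ that belongs to Hive as \$, so that the shell leaves the reference intact while still expanding its own variables. The sketch below was not tested in the original article. It assumes that Hive applies hiveconf substitution to a -e string in the same way it does to a file.

```shell
#!/bin/bash
tablename="student"
limitcount="8"

# \$ produces a literal $, so the ${hiveconf:...} references reach Hive unchanged.
# ${tablename} and ${limitcount} are still expanded by the shell.
query="use test; select * from ${tablename} where pdate='\${hiveconf:enter_school_date}' and sage>'\${hiveconf:min_age}' limit ${limitcount};"

# Show the text that would be handed to Hive.
printf '%s\n' "$query"

# Run the query only when the hive CLI is available on PATH.
if command -v hive >/dev/null 2>&1; then
  hive -S -hiveconf enter_school_date="20130902" -hiveconf min_age="26" -e "$query"
fi
```

The printed statement still contains the two ${hiveconf:...} references, unlike the trace above where they collapsed to empty strings. Putting the whole query in single quotes would protect the references as well, but then the shell variables would no longer expand either.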

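Method 2: define variables with -hiveconf and use them in a SQL file

Handling line breaks and similar details is awkward in hive -e, so it is only suitable for small amounts of SQL. In practice most queries are written in .hql files and invoked with hive -f. In that case, variables can be defined with -hiveconf and then used directly in the SQL.

First write the calling shell script: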

#!/bin/bash

hive -hiveconf enter_school_date="20130902" -hiveconf min_ag="26" -f testvar.sql

Contents of the called testvar.sql file:

use test;
select * from student
where
pdate='${hiveconf:enter_school_date}'
and
sage > '${hiveconf:min_ag}'
limit 8;

Execution:

[czt@www.crazyant.net testHivePara]$ sh -x shellhive.sh

+ hive -hiveconf enter_school_date=20130902 -hiveconf min_ag=26 -f testvar.sql

Hive history file=/tmp/czt/hive_job_log_czt_201309131651_2035045625.txt

Time taken: 2.143 seconds

Total MapReduce jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Kill Command = hadoop job -kill job_20130911213659_42303

2013-09-13 16:52:00,300 Stage-1 map = 0%, reduce = 0%

2013-09-13 16:52:14,609 Stage-1 map = 28%, reduce = 0%

2013-09-13 16:52:24,642 Stage-1 map = 71%, reduce = 0%

2013-09-13 16:52:34,639 Stage-1 map = 98%, reduce = 0%

Ended Job = job_20130911213659_42303

7 name7 27 20130902

8 name8 28 20130902

Time taken: 54.268 seconds
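Since testvar.sql no longer contains any literal values, the same file can be reused for different dates. The following is a minimal sketch of such a driver. The function run_for_date and the DRY_RUN switch are hypothetical additions, and the dates are the ones present in the test data.

```shell
#!/bin/bash
# DRY_RUN=1 (the default in this sketch) only prints each command.
# Set DRY_RUN=0 to actually invoke Hive.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: $1 = enrollment date, $2 = minimum age.
run_for_date() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "hive -hiveconf enter_school_date=$1 -hiveconf min_ag=$2 -f testvar.sql"
  else
    hive -hiveconf enter_school_date="$1" -hiveconf min_ag="$2" -f testvar.sql
  fi
}

# Run the same SQL file once per enrollment date in the test data.
for d in 20130901 20130902 20130903 20130904; do
  run_for_date "$d" 26
done
```

The variable name min_ag is kept exactly as it appears in testvar.sql, because the name given to -hiveconf must match the one referenced in the SQL file.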
