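After the cube build finished, my job was to write SQL that generates data-analysis email reports. The problem is that this kind of repetitive work is inefficient, error-prone, and a waste of time. Fortunately, Kylin provides a RESTful API, so these analysis requests can be turned into HTTP requests.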
1. RESTful API
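Kylin uses HTTP basic authentication. The credentials, in the form username:password, are Base64-encoded (Base64 is an encoding, not encryption) and sent in the Authorization header of a POST request: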
curl -c cookiefile.txt -X POST -H "Authorization: Basic QURNSU46S1lMSU4=" -H 'Content-Type: application/json' http://<host>:7070/kylin/api/user/authentication
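The token in the Authorization header can be derived rather than hard-coded. The following is a minimal sketch; the helper name basic_auth_header is hypothetical, and ADMIN / KYLIN are simply the credentials that the token in the curl command decodes to:

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP basic authentication."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


# ADMIN / KYLIN are the credentials encoded in the curl example
header_value = basic_auth_header("ADMIN", "KYLIN")
print(header_value)  # Basic QURNSU46S1lMSU4=
```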
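Once authenticated, the cookie file can be reused (no need to authenticate again) to send GET or POST requests to Kylin, for example to fetch information about a cube: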
curl -b cookiefile.txt -H 'Content-Type: application/json' http://<host>:7070/kylin/api/cubes/kylin_sales_cube
To send a SQL query to Kylin, the data of the POST request should follow this JSON format:
{
"sql":"select * from TEST_KYLIN_FACT",
"offset":0,
"limit":50000,
"acceptPartial":false,
"project":"DEFAULT"
}
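Here, offset is the number of rows to skip from the first row of the result, and limit caps the number of rows returned; the backend splices both into the SQL. The curl commands for sending a SQL query, with the JSON either inline or read from a file (sql.json):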
curl -b cookiefile.txt -X POST -H 'Content-Type: application/json' -d '{"sql":"select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt", "offset":0, "limit":50000, "acceptPartial":false, "project":"learn_kylin"}' http://<host>:7070/kylin/api/query
curl -b cookiefile.txt -X POST -H 'Content-Type: application/json' -d @sql.json http://<host>:7070/kylin/api/query
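The request body can also be generated instead of written by hand, which avoids quoting mistakes in the JSON. This is a rough sketch; build_query_body is a hypothetical helper, and the field names are the ones from the JSON format shown earlier:

```python
import json


def build_query_body(sql, project, offset=0, limit=50000):
    """Return the body for a query request as a dict (hypothetical helper)."""
    return {
        "sql": sql,
        "offset": offset,
        "limit": limit,
        "acceptPartial": False,
        "project": project,
    }


body = build_query_body("select * from TEST_KYLIN_FACT", "DEFAULT")

# json.dumps handles quoting and escaping; the text can be saved as sql.json
body_text = json.dumps(body)
print(body_text)
```

Saving body_text to a file named sql.json gives the input expected by the curl command that uses -d @sql.json.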
2. Python in practice
Python's excellent requests module wraps HTTP requests and responses and is a pleasure to use. Its Session object takes care of authentication and cookie persistence:
import requests
url = 'http://<host>:7070/kylin/api/user/authentication'
s = requests.session()
headers = {'Authorization': 'Basic QURNSU46S1lMSU4='}
s.post(url, headers=headers)
A Session object reuses the TCP connection and keeps the cookies itself, so no cookie file is needed for the subsequent HTTP requests:
# query cube info
url2 = 'http://<host>:7070/kylin/api/cubes/kylin_sales_cube'
r = s.get(url2)
r.json()
# sql query
url3 = 'http://<host>:7070/kylin/api/query'
sql_str = 'select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt'
# pass a dict via json= so requests serializes it and sets Content-Type: application/json
payload = {'sql': sql_str, 'offset': 0, 'limit': 50000, 'acceptPartial': False, 'project': 'learn_kylin'}
r = s.post(url3, json=payload)
results = r.json()['results']
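The result of a Kylin SQL query is in results, which is a list of lists (one inner list per row). The authentication and SQL query interfaces can therefore be wrapped as follows: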
import requests


def authenticate():
    """
    authenticate user
    :return: session
    """
    url = 'http://<host>:7070/kylin/api/user/authentication'
    headers = {'Authorization': 'Basic QURNSU46S1lMSU4='}
    s = requests.session()
    s.headers.update({'Content-Type': 'application/json'})
    s.post(url, headers=headers)
    return s


def query(sql_str, session):
    """
    sql query
    :param sql_str: string of sql
    :param session: session object
    :return: results(type is list)
    """
    url = 'http://<host>:7070/kylin/api/query'
    # build the body as a dict so that quotes inside the SQL are escaped correctly
    payload = {'sql': sql_str, 'offset': 0, 'limit': 50000,
               'acceptPartial': False, 'project': 'xxx'}
    r = session.post(url, json=payload)
    results = r.json()['results']
    return results
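Since the query results come back as a list of row lists, they map directly onto CSV rows, which is convenient when the report is sent as a CSV attachment. A minimal sketch, assuming a hypothetical helper results_to_csv and a header supplied by the caller; the sample rows are made up for illustration and are not real query output:

```python
import csv
import io


def results_to_csv(header, rows):
    """Serialize a header row plus a list of row lists into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up rows in the list-of-lists shape described for results
sample = [["2012-01-01", "466.9", "12"], ["2012-01-02", "970.3", "17"]]
csv_text = results_to_csv(["part_dt", "total_selled", "sellers"], sample)
print(csv_text)
```

In real use, the rows argument would be the list returned by the query wrapper, and the text could be written to the file that gets attached to the email.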
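How the email report is generated from here depends on the specific business requirements. Here I will just share how to add an attachment to an email: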
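from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart()
# read the CSV as bytes; it is Base64-encoded when attached
att1 = MIMEText(open('./resources/xxx.csv', 'rb').read(), 'base64', 'gb2312')
att1["Content-Type"] = 'application/octet-stream'
att1["Content-Disposition"] = 'attachment; filename="xxx.csv"'
msg.attach(att1)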
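For completeness, here is one way the whole message might be assembled and sent. This is only a sketch under assumptions: build_report and send_report are hypothetical helpers, the addresses, subject, and SMTP host are placeholders, and it uses MIMEApplication from the standard library for the attachment rather than the MIMEText approach shown in this post:

```python
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_report(subject, sender, recipients, body, csv_bytes, filename):
    """Assemble a multipart message with a text body and one CSV attachment."""
    msg = MIMEMultipart()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.attach(MIMEText(body, "plain", "utf-8"))
    # MIMEApplication Base64-encodes the bytes by default
    att = MIMEApplication(csv_bytes, _subtype="octet-stream")
    att.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(att)
    return msg


def send_report(msg, smtp_host, smtp_port=25):
    """Send the message through an SMTP server (host and port are assumptions)."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.send_message(msg)


# Placeholder values; nothing is sent until send_report is called
report = build_report("Daily sales", "me@example.com", ["team@example.com"],
                      "See attached.", b"part_dt,total_selled\n", "xxx.csv")
```

Keeping message construction separate from sending makes it possible to inspect the message without an SMTP server.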