[Kylin in Practice] Generating Email Reports


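After a cube build finishes, my job is to write SQL that produces the data-analysis email report. Doing this by hand is repetitive, inefficient, error-prone, and a waste of time. Fortunately, Kylin provides a RESTful API, so these analysis requests can be turned into HTTP requests.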

1. RESTful API

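Kylin uses HTTP basic authentication. The credential string username:password is Base64-encoded (Base64 is an encoding, not an encryption algorithm) and sent in the Authorization header of a POST request: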

curl -c cookiefile.txt -X POST -H "Authorization: Basic QURNSU46S1lMSU4=" -H 'Content-Type: application/json' http://<host>:7070/kylin/api/user/authentication
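The Authorization value in the command above can be derived rather than copied. A minimal Python sketch, where the helper name basic_auth_header is hypothetical and ADMIN / KYLIN are the credentials that the encoded token in the curl command decodes to:

```python
import base64

def basic_auth_header(username, password):
    """Return the value for an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"

print(basic_auth_header("ADMIN", "KYLIN"))  # Basic QURNSU46S1lMSU4=
```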

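Once authenticated, the cookie file can be reused (no need to authenticate again) to send GET or POST requests to Kylin, for example to fetch information about a cube: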

curl -b cookiefile.txt -H 'Content-Type: application/json' http://<host>:7070/kylin/api/cubes/kylin_sales_cube

To send a SQL query to Kylin, the data of the POST request must follow this JSON format:

{  
   "sql":"select * from TEST_KYLIN_FACT",
   "offset":0,
   "limit":50000,
   "acceptPartial":false,
   "project":"DEFAULT"
}

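Here, offset is the offset from the first row of the result set and limit caps the number of rows returned; the backend appends both to the SQL when processing the request. The curl commands for sending a SQL query, with the payload given inline or read from a file: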

curl -b cookiefile.txt -X POST -H 'Content-Type: application/json' -d '{"sql":"select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt", "offset":0, "limit":50000, "acceptPartial":false, "project":"learn_kylin"}' http://<host>:7070/kylin/api/query

curl -b cookiefile.txt -X POST -H 'Content-Type: application/json' -d @sql.json http://<host>:7070/kylin/api/query
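The second command reads its payload from sql.json. One way to produce that file without hand-escaping the SQL is sketched below; build_query_payload is a hypothetical helper, and the field names are the ones from the JSON format shown earlier.

```python
import json

def build_query_payload(sql, project, offset=0, limit=50000):
    """Build the request body for Kylin's /kylin/api/query endpoint."""
    return {
        "sql": sql,
        "offset": offset,
        "limit": limit,
        "acceptPartial": False,
        "project": project,
    }

sql = ("select part_dt, sum(price) as total_selled, "
       "count(distinct seller_id) as sellers "
       "from kylin_sales group by part_dt")
payload = build_query_payload(sql, "learn_kylin")

# json.dumps handles quoting, so quotes inside the SQL stay valid JSON.
body = json.dumps(payload, indent=2)
print(body)
```

Redirecting the printed output into sql.json gives the file that `curl -d @sql.json` expects.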

2. Python in Practice

Python's requests module already wraps HTTP requests and responses and is a pleasure to use. Its Session object takes care of authentication and cookie persistence:

import requests

url = 'http://<host>:7070/kylin/api/user/authentication'
s = requests.session()
headers = {'Authorization': 'Basic QURNSU46S1lMSU4='}
s.post(url, headers=headers)

A Session object reuses the TCP connection and keeps cookies in memory, so no cookie file is needed before moving on to the next HTTP request:

# query cube info
url2 = 'http://<host>:7070/kylin/api/cubes/kylin_sales_cube'
r = s.get(url2)
r.json()

# sql query
url3 = 'http://<host>:7070/kylin/api/query'
sql_str = 'select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt'
payload = {'sql': sql_str, 'offset': 0, 'limit': 50000, 'acceptPartial': False, 'project': 'learn_kylin'}
# json= serializes the payload and sets Content-Type: application/json
r = s.post(url3, json=payload)
results = r.json()['results']

The result of a Kylin SQL query is in results, whose type is list[list]. Kylin authentication and the SQL query interface can therefore be wrapped as follows:

import requests


def authenticate():
    """
    authenticate user
    :return: session
    """
    url = 'http://<host>:7070/kylin/api/user/authentication'
    headers = {'Authorization': 'Basic QURNSU46S1lMSU4='}
    s = requests.session()
    s.headers.update({'Content-Type': 'application/json'})
    s.post(url, headers=headers)
    return s


def query(sql_str, session):
    """
    sql query
    :param sql_str: string of sql 
    :param session: session object
    :return: results(type is list)
    """
    url = 'http://<host>:7070/kylin/api/query'
    payload = {'sql': sql_str, 'offset': 0, 'limit': 50000,
               'acceptPartial': False, 'project': 'xxx'}
    # json= escapes quotes inside the SQL, which string concatenation does not
    r = session.post(url, json=payload)
    r.raise_for_status()
    results = r.json()['results']
    return results
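
Because results is a list of rows, it maps directly onto CSV, a convenient format for a report attachment. A minimal sketch, assuming the column names are supplied by hand; rows_to_csv and the sample rows are hypothetical, standing in for the return value of query():

```python
import csv
import io

def rows_to_csv(header, rows):
    """Serialize a list[list] query result into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sample rows, shaped like the list[list] in `results`.
sample = [["2012-01-01", "466.9037", "12"],
          ["2012-01-02", "970.2347", "17"]]
csv_text = rows_to_csv(["part_dt", "total_selled", "sellers"], sample)
print(csv_text)
```

Going through the csv module rather than joining with commas means any value that itself contains a comma or quote is escaped correctly.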

How the email report itself is generated depends on the specific business requirements. Here is how to add an attachment to an email:

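from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

msg = MIMEMultipart()
# MIMEApplication base64-encodes the raw bytes and sets the Content-Type
with open('./resources/xxx.csv', 'rb') as f:
    att1 = MIMEApplication(f.read(), _subtype='octet-stream')
att1.add_header('Content-Disposition', 'attachment', filename='xxx.csv')
msg.attach(att1)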
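
To put the attachment in context, here is a sketch of assembling a complete message with the standard email package. The function name, addresses, and SMTP host are hypothetical placeholders; the actual send is left as comments since it needs a reachable mail server.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_report_email(sender, recipients, subject, body, csv_bytes, filename):
    """Assemble a multipart email with a text body and one CSV attachment."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain", "utf-8"))

    att = MIMEApplication(csv_bytes, _subtype="octet-stream")
    att.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(att)
    return msg

msg = build_report_email(
    "report@example.com", ["team@example.com"], "Daily sales report",
    "See the attached CSV.", b"part_dt,total_selled\n", "report.csv")

# Sending (placeholder host, not run here):
# import smtplib
# with smtplib.SMTP("smtp.example.com") as server:
#     server.send_message(msg)
```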

