Hive: importing Excel data into Hive by converting Excel to txt

This article is a repost (see the original). Published 2020-12-24 09:53. Tags: hive, python

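I. Requirements:

1. Each month the customer uploads an Excel file with a fixed layout to a designated directory. Only the month at the end of the file name changes. For example, October's file is zhongdiangedan202010.xlsx and November's is zhongdiangedan202011.xlsx.

2. Import the uploaded Excel file into Hive and then do further data analysis there.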
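Because only the month suffix changes, the month can be derived from a file name, or the file name built from a month. A minimal sketch, assuming the prefix `zhongdiangedan` and a `YYYYMM` suffix exactly as in the examples above; `month_from_filename` and `filename_for_month` are hypothetical helper names, not part of the original scripts.

```python
import re
from pathlib import Path

# Hypothetical constant: the prefix shared by all monthly uploads.
FILE_PREFIX = "zhongdiangedan"
# Prefix + 4-digit year + 2-digit month + ".xlsx"
NAME_PATTERN = re.compile(rf"^{FILE_PREFIX}(\d{{4}})(\d{{2}})\.xlsx$")


def month_from_filename(filename: str) -> str:
    """Return the YYYYMM suffix of a monthly upload, e.g. '202010'."""
    match = NAME_PATTERN.match(Path(filename).name)  # ignore any directory part
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    year, month = match.groups()
    if not 1 <= int(month) <= 12:
        raise ValueError(f"invalid month in file name: {filename!r}")
    return year + month


def filename_for_month(month_id: str) -> str:
    """Inverse direction: '202011' -> 'zhongdiangedan202011.xlsx'."""
    return f"{FILE_PREFIX}{month_id}.xlsx"


if __name__ == "__main__":
    print(month_from_filename("/data/excelfile/zhongdiangedan202010.xlsx"))  # 202010
    print(filename_for_month("202011"))  # zhongdiangedan202011.xlsx
```

Rejecting names that do not match keeps a stray file in the upload directory from being loaded under a wrong month.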

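II. Approach:

1. Use Python's pandas module to convert the Excel file into a txt file;

2. Write a shell script that uses hdfs dfs -put to load the txt file into the target table, so the job is easy to run every month.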
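For a monthly run, the month argument has to be supplied each time. If the job is scheduled shortly after month end to process the month that just closed (an assumption; the article does not say how the month is chosen), the `YYYYMM` value can be computed instead of typed. `previous_month_id` is a hypothetical helper:

```python
from datetime import date, timedelta
from typing import Optional


def previous_month_id(today: Optional[date] = None) -> str:
    """Return YYYYMM for the calendar month before `today` (defaults to today)."""
    today = today or date.today()
    # Step back one day from the 1st of the current month to land in the previous month.
    last_day_of_previous_month = today.replace(day=1) - timedelta(days=1)
    return last_day_of_previous_month.strftime("%Y%m")


if __name__ == "__main__":
    # Print the value so a scheduler can pass it to both scripts as their argument.
    print(previous_month_id())
```

Saved as, say, a hypothetical previous_month.py, a cron wrapper could run `month=$(python previous_month.py)` and pass `$month` to both scripts. Stepping back from the 1st of the month avoids month-length and year-boundary arithmetic.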

III. The pandas.read_excel function:

def read_excel(io,sheet_name=0,header=0,names=None,index_col=None,usecols=None,squeeze=False,dtype=None,engine=None,converters=None,true_values=None,false_values=None,skiprows=None,nrows=None,na_values=None,keep_default_na=True,na_filter=True,verbose=False,parse_dates=False,date_parser=None,thousands=None,comment=None,skipfooter=0,convert_float=True,mangle_dupe_cols=True)

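The signature above is from older pandas releases (pandas 2.x removed squeeze, convert_float and mangle_dupe_cols). The commonly used parameters are: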

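Parameter	Default	Meaning
io	none (required)	Path to the Excel file (file-like objects are also accepted).
sheet_name	0	Which sheet(s) to read: 0 reads the first sheet; a list such as sheet_name=[0,1] reads several sheets; sheet_name=None reads all sheets (a list or None returns a dict of DataFrames).
header	0	Row to use as the column labels: header=0 uses the first row; header=None means the sheet has no header row.
names	None	List of column names to use instead of the labels in the file.
index_col	None	Column(s), by position or by name, to use as the row index.
usecols	None	Columns to parse; None parses all columns. Accepts a list of positions or names, an Excel-style range string such as "A:E", or a callable.
skiprows	None	Rows to skip: an int skips that many rows at the start of the sheet; a list skips those 0-indexed rows, e.g. skiprows=[1,9] skips the 2nd and 10th rows.
skipfooter	0	Number of rows at the end of the sheet to skip.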
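To see how header, names, usecols and skiprows behave, the sketch below writes a tiny made-up workbook to memory and reads it back several ways. It assumes pandas with the openpyxl package installed (openpyxl is the engine pandas uses for .xlsx files); the sheet contents and the `read` helper are invented for illustration.

```python
import io

import pandas as pd

# Made-up sample data written to an in-memory .xlsx file (needs openpyxl).
buffer = io.BytesIO()
pd.DataFrame(
    {"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"], "amount": [10, 20, 30, 40]}
).to_excel(buffer, sheet_name="Sheet1", index=False, engine="openpyxl")
XLSX_BYTES = buffer.getvalue()


def read(**kwargs) -> pd.DataFrame:
    """Read the sample workbook with the given read_excel options."""
    return pd.read_excel(io.BytesIO(XLSX_BYTES), engine="openpyxl", **kwargs)


df_default = read()                # header=0: row 0 supplies the column labels
df_no_header = read(header=None)   # row 0 is data; columns are labelled 0, 1, 2
df_cols = read(usecols="A:B")      # only Excel columns A and B (id, name)
df_named = read(names=["user_id", "user_name", "total"])  # replace the header labels
df_skipped = read(skiprows=[1])    # drop sheet row 1 (0-indexed), i.e. the id == 1 row

print(df_default.shape, df_no_header.shape)  # (4, 3) (5, 3)
print(list(df_cols.columns))                 # ['id', 'name']
print(df_skipped["id"].tolist())             # [2, 3, 4]
```

Note that skiprows counts sheet rows including the header row, which is why skiprows=[1] removes the first data row rather than the header.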

IV. Implementation:

excel_to_txt.py

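#!/usr/bin/env python
import sys

import pandas as pd  # reading .xlsx files also requires the openpyxl package


def excel_to_txt(monthNo):
    # Read the monthly workbook. header defaults to 0 (first row is the header);
    # header=None treats every row, including any title row, as data.
    df = pd.read_excel('/data/excelfile/zhongdiangedan%s.xlsx' % monthNo, header=None)
    print('Writing txt file')
    # Write tab-separated text without a header row or the DataFrame index.
    df.to_csv('/data/txtlfile/zhongdiangedan%s.txt' % monthNo, header=False, sep='\t', index=False)
    print('File written successfully!')


if __name__ == '__main__':
    monthNo = sys.argv[1]  # month suffix, e.g. 202010
    excel_to_txt(monthNo)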

Run the Python script: python excel_to_txt.py 202010
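Two details of the text format are worth guarding against. An integer column that contains blank cells is read as float, so 1 is written out as 1.0; and a tab or line break inside a cell would shift fields or split a row once Hive splits lines on '\t' and '\n'. A hedged variant of the conversion, assuming every column should stay a plain string (matching the all-string Hive table): it reads with dtype=str and replaces embedded tabs and line breaks with a space. `clean_for_hive` is a hypothetical helper name.

```python
import pandas as pd


def clean_for_hive(df: pd.DataFrame) -> pd.DataFrame:
    """Turn every cell into a string without tabs or line breaks.

    Missing cells become '' (Hive then reads an empty string, not NULL).
    """
    as_text = df.astype(object).where(df.notna(), "")
    return as_text.astype(str).replace(r"[\t\r\n]+", " ", regex=True)


def excel_to_txt(month_no: str) -> None:
    """Variant of the conversion above; the paths are the article's own."""
    # dtype=str: keep cell values as text instead of letting pandas infer numbers.
    df = pd.read_excel(
        f"/data/excelfile/zhongdiangedan{month_no}.xlsx", header=None, dtype=str
    )
    clean_for_hive(df).to_csv(
        f"/data/txtlfile/zhongdiangedan{month_no}.txt",
        sep="\t", header=False, index=False,
    )
```

This excel_to_txt can replace the function of the same name in excel_to_txt.py; the command-line entry point stays the same.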

Create the table:

create table impt_excel_data_zhongdiangedan(
col1 string,
col2 string,
col3 string,
col4 string,
col5 string,
col6 string
)partitioned by (month_id string)
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile
location '/data/hive/database_name/impt/impt_excel_data_zhongdiangedan';
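With fields terminated by '\t' and partitioned by (month_id string), each line of the txt file becomes one row, and each month's data lives in its own sub-directory of the table location. The sketch below approximates both rules, e.g. for checking a file before loading it. The month_id=<value> directory is Hive's default partition layout; the field mapping (extra fields ignored, missing trailing fields read as NULL) is a simplification of the default text SerDe that ignores escaping and the \N NULL marker. `partition_dir` and `hive_row` are hypothetical helpers.

```python
import posixpath
import re
from typing import List, Optional

# Location from the CREATE TABLE statement ("database_name" is the article's placeholder).
TABLE_LOCATION = "/data/hive/database_name/impt/impt_excel_data_zhongdiangedan"
NUM_COLUMNS = 6  # col1 .. col6


def partition_dir(month_id: str, location: str = TABLE_LOCATION) -> str:
    """Default HDFS directory for one partition: <location>/month_id=<value>."""
    if not re.fullmatch(r"\d{6}", month_id):
        raise ValueError(f"month_id should look like YYYYMM, got {month_id!r}")
    return posixpath.join(location, f"month_id={month_id}")


def hive_row(line: str, num_columns: int = NUM_COLUMNS) -> List[Optional[str]]:
    """Roughly how one tab-separated line maps onto the table's string columns."""
    fields = line.rstrip("\n").split("\t")[:num_columns]  # extra fields are ignored
    return fields + [None] * (num_columns - len(fields))  # missing ones become NULL
```

If the Excel sheet has more or fewer than six columns, hive_row shows which values would be dropped or read as NULL.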

txt_to_hive.sh

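#!/bin/bash
set -e  # stop at the first failing command
monthNo=$1
# Switch to the database, then drop and re-create this month's partition
# (dropping a partition of this managed table also removes its old data files).
hive -e "
        use database_name;
        alter table impt_excel_data_zhongdiangedan drop if exists partition (month_id='$monthNo');
        alter table impt_excel_data_zhongdiangedan add partition (month_id='$monthNo');
"
# Load the data: copy the txt file into the partition directory
hdfs dfs -put /data/txtlfile/zhongdiangedan$monthNo.txt /data/hive/database_name/impt/impt_excel_data_zhongdiangedan/month_id=$monthNo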

Run the shell script: sh txt_to_hive.sh 202010
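An alternative to hdfs dfs -put plus manual partition management is Hive's LOAD DATA LOCAL INPATH ... OVERWRITE INTO TABLE ... PARTITION (...) statement, which copies the local file into the partition directory, creates the partition if needed and replaces what the partition held before. The Python sketch below only builds the command unless dry_run=False; the helper names are hypothetical, while the paths and table name are the article's. With HiveServer2/beeline, LOCAL refers to the server's file system rather than the client's.

```python
import re
import subprocess
from typing import List


def load_statement(local_file: str, table: str, month_id: str) -> str:
    """HiveQL that loads a local file into one partition, replacing its contents."""
    if not re.fullmatch(r"\d{6}", month_id):
        raise ValueError(f"month_id should look like YYYYMM, got {month_id!r}")
    return (
        f"LOAD DATA LOCAL INPATH '{local_file}' "
        f"OVERWRITE INTO TABLE {table} PARTITION (month_id='{month_id}')"
    )


def hive_command(database: str, statement: str) -> List[str]:
    """Argument list for the Hive CLI; a list avoids shell quoting issues."""
    return ["hive", "-e", f"use {database}; {statement};"]


def run(month_id: str, dry_run: bool = True) -> List[str]:
    """Build (and optionally execute) the load for one month."""
    cmd = hive_command(
        "database_name",
        load_statement(
            f"/data/txtlfile/zhongdiangedan{month_id}.txt",
            "impt_excel_data_zhongdiangedan",
            month_id,
        ),
    )
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if hive fails
    return cmd
```

Because OVERWRITE replaces the partition's contents, re-running a month does not duplicate rows, which removes the need for the separate drop/add partition steps.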

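Disclaimer!

Articles reposted on this site are for personal study and reference only, and this site assumes no legal responsibility for copyright. If this article infringes your rights or privacy, please email yoyou2525@163.com to have it removed.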
