Two ways to write a pandas DataFrame to a CSV file on HDFS:
1.
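from hdfs.client import Client
client = Client("http://localhost:50070")
# Build the whole CSV as one string, then upload it in a single call
client.write(hdfs_url, df.to_csv(index=False), overwrite=True, encoding='utf-8')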
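2.
# Open a streaming writer on HDFS and let pandas write into it.
# encoding is passed to client.write so the writer accepts text on Python 3.
with client.write(hdfs_url, overwrite=True, encoding='utf-8') as writer:
    df.to_csv(writer, index=False)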
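Method 2 is recommended. Method 1 builds the entire CSV string in memory before uploading it, whereas method 2 streams the data to HDFS as pandas produces it, so it writes much more efficiently.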
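As a minimal sketch, the streaming approach can be wrapped in a small helper. The name `write_df_to_hdfs_csv` is hypothetical and not part of the `hdfs` package. `client` is assumed to behave like `hdfs.client.Client`, and `df` like a pandas DataFrame.

```python
def write_df_to_hdfs_csv(client, hdfs_path, df, overwrite=True):
    """Stream df to hdfs_path as CSV.

    Hypothetical helper, not part of the hdfs package. client is assumed
    to behave like hdfs.client.Client and df like a pandas DataFrame.
    """
    # Called without a data argument, client.write returns a file-like
    # writer; encoding="utf-8" lets it accept the text that to_csv emits.
    with client.write(hdfs_path, overwrite=overwrite, encoding="utf-8") as writer:
        # to_csv writes into the open handle instead of returning
        # one large string that would then have to be uploaded.
        df.to_csv(writer, index=False)
```

A call might look like `write_df_to_hdfs_csv(client, "/tmp/out.csv", df)`, where the path is only an example.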
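Reading text data from HDFS
from hdfs.client import Client
client = Client("http://localhost:50070")
filepath = "test.txt"
# encoding='utf-8' makes read() return str instead of raw bytes
with client.read(filepath, encoding='utf-8') as fs:
    content = fs.read()
print(content)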
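For large text files, reading everything with a single `read()` may not be practical. The sketch below iterates line by line instead, using the `delimiter` argument of `client.read`, which requires `encoding` to be set. The helper name `iter_hdfs_lines` is hypothetical, and `client` is assumed to behave like `hdfs.client.Client`.

```python
def iter_hdfs_lines(client, hdfs_path, encoding="utf-8"):
    """Yield the decoded lines of a text file stored on HDFS.

    Hypothetical helper, not part of the hdfs package. client is assumed
    to behave like hdfs.client.Client.
    """
    # With a delimiter set, the context manager yields an iterator over
    # the delimited pieces rather than a file-like object.
    with client.read(hdfs_path, encoding=encoding, delimiter="\n") as reader:
        for line in reader:
            yield line
```

It could be used as `for line in iter_hdfs_lines(client, "test.txt"): print(line)`.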
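Reading an Excel file from HDFS
import io
import pandas as pd
with client.read(filepath) as fs:  # filepath must point to an Excel file
    content = fs.read()
# Wrap the raw bytes in a file-like object before handing them to pandas
table = pd.read_excel(io.BytesIO(content))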
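The same pattern works for other binary formats: read the bytes from HDFS, wrap them in an in-memory buffer, and pass the buffer to a parser. A minimal sketch follows. The helper name `read_hdfs_with_parser` is hypothetical, `client` is assumed to behave like `hdfs.client.Client`, and `parser` is any callable that accepts a binary file-like object, such as `pd.read_excel`.

```python
import io


def read_hdfs_with_parser(client, hdfs_path, parser):
    """Read a binary file from HDFS and parse it from an in-memory buffer.

    Hypothetical helper, not part of the hdfs package. parser is any
    callable that accepts a binary file-like object.
    """
    # No encoding is passed, so the reader returns raw bytes.
    with client.read(hdfs_path) as reader:
        raw = reader.read()
    # BytesIO gives the parser a seekable file-like object.
    return parser(io.BytesIO(raw))
```

For example, `table = read_hdfs_with_parser(client, "report.xlsx", pd.read_excel)`, where the file name is only an illustration. The whole file is held in memory, so this suits files of moderate size.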