This article comes from Tencent Cloud. Author: Python進階者
Preparing the data

    import pandas as pd
    from datetime import datetime, date

    df = pd.DataFrame({
        'Date and time': [datetime(2015, 1, 1, 11, 30, 55),
                          datetime(2015, 1, 2, 1, 20, 33),
                          datetime(2015, 1, 3, 11, 10),
                          datetime(2015, 1, 4, 16, 45, 35),
                          datetime(2015, 1, 5, 12, 10, 15)],
        'Dates only': [date(2015, 2, 1),
                       date(2015, 2, 2),
                       date(2015, 2, 3),
                       date(2015, 2, 4),
                       date(2015, 2, 5)],
        'Numbers': [1010, 2020, 3030, 2020, 1515],
        'Percentage': [.1, .2, .33, .25, .5],
    })
    # One Excel formula string per data row; data starts at worksheet row 2
    # because row 1 holds the header
    df['final'] = [f"=C{i}*D{i}" for i in range(2, df.shape[0] + 2)]
    df

Result:
[Figure: the resulting DataFrame]
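The final column stores Excel formula strings rather than computed values. As a minimal sketch of how those strings line up with worksheet rows, assuming the header occupies row 1 and the columns land in A through E (the names n_rows and formulas are illustrative only):

```python
# Five data rows, an illustrative stand-in for df.shape[0]
n_rows = 5

# Data starts on worksheet row 2 because row 1 holds the header,
# so the formulas refer to rows 2 through n_rows + 1.
formulas = [f"=C{i}*D{i}" for i in range(2, n_rows + 2)]

for formula in formulas:
    print(formula)
```

When the DataFrame is written with index=False, column C holds Numbers and column D holds Percentage, so each formula multiplies the two values on its own row.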
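Saving data directly with pandas
What options do we have for saving this pandas object as an Excel file? The simplest is to save it directly:

    df.to_excel("demo1.xlsx", sheet_name='Sheet1', index=False)

The result looks like this:
[Figure: the saved demo1.xlsx]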
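pd.ExcelWriter accepts two parameters, datetime_format and date_format, which set the display format for date-type data. So we create the ExcelWriter as follows and save the result:

    writer = pd.ExcelWriter("demo1.xlsx",
                            datetime_format='mmm d yyyy hh:mm:ss',
                            date_format='mmmm dd yyyy')
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    # close() writes the file; writer.save() was removed in pandas 2.0
    writer.close()

You can see that the format in the saved Excel file has indeed changed:
[Figure: demo1.xlsx with the custom date formats]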
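The two format strings use Excel's number-format codes, not Python's strftime codes. As a rough illustration of what they produce, the sketch below approximates both formats with the standard library. The helper names render_datetime and render_date are hypothetical, and the output assumes an English locale; Excel itself does the real rendering.

```python
from datetime import date, datetime


def render_datetime(dt):
    """Approximate Excel's 'mmm d yyyy hh:mm:ss' (hypothetical helper)."""
    # 'mmm' -> abbreviated month, 'd' -> day without leading zero,
    # 'hh:mm:ss' without AM/PM -> zero-padded 24-hour time
    return f"{dt.strftime('%b')} {dt.day} {dt.strftime('%Y %H:%M:%S')}"


def render_date(d):
    """Approximate Excel's 'mmmm dd yyyy' (hypothetical helper)."""
    # 'mmmm' -> full month name, 'dd' -> zero-padded day
    return d.strftime('%B %d %Y')


print(render_datetime(datetime(2015, 1, 1, 11, 30, 55)))
print(render_date(date(2015, 2, 1)))
```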
Coloring table output with the pandas Styler
To set the text color or background color of specific columns, we can use the pandas.io.formats.style tool directly. It colors the specified columns according to the specified rules:

    # Note: on pandas >= 2.1, Styler.applymap is deprecated in favor of Styler.map
    df_style = (
        df.style
        .applymap(lambda x: 'color:red', subset=["Date and time"])
        .applymap(lambda x: 'color:green', subset=["Dates only"])
        .applymap(lambda x: 'color:#999999', subset=["Numbers"])  # gray, rgb(153, 153, 153)
        .background_gradient(cmap="PuBu", low=0, high=0.5, subset=["Percentage"])
    )
    df_style

Display result:
[Figure: the styled DataFrame]
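    writer = pd.ExcelWriter("demo_style.xlsx",
                            datetime_format='mmm d yyyy hh:mm:ss',
                            date_format='mmmm dd yyyy')
    df_style.to_excel(writer, sheet_name='Sheet1', index=False)
    # close() writes the file; writer.save() was removed in pandas 2.0
    writer.close()

Saved result:
[Figure: demo_style.xlsx]
Although the pandas Styler also supports setting display formats, bar charts, and so on, these have no effect when writing to Excel. So we can rely on the Styler only for coloring, and it can color only the data cells, not the header.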
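Each rule passed to applymap is just a function that receives one cell value and returns a CSS string, so the coloring can depend on the value. A minimal sketch, where the function name color_large_numbers and the threshold of 2000 are made up for illustration:

```python
def color_large_numbers(value, threshold=2000):
    """Return a CSS rule for one cell (hypothetical rule, illustrative threshold)."""
    # An empty string leaves the cell unstyled
    return 'color:red' if value > threshold else ''


# Stand-in for the 'Numbers' column of the DataFrame above
numbers = [1010, 2020, 3030, 2020, 1515]
styles = [color_large_numbers(n) for n in numbers]
print(styles)
```

With pandas, such a function would be passed as df.style.applymap(color_large_numbers, subset=["Numbers"]).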
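Saving data with pandas using the xlsxwriter engine
Going further, we also want to change the display format of other data types such as numbers. To do that, we take the workbook out of the ExcelWriter and operate on it:

    writer = pd.ExcelWriter("demo1.xlsx")
    workbook = writer.book
    workbook

Result:

    <xlsxwriter.workbook.Workbook at 0x52fde10>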
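The returned result is an xlsxwriter object, which means pandas is using xlsxwriter as its default Excel writing engine here (pandas picks xlsxwriter for .xlsx files when it is installed, and otherwise falls back to openpyxl). In other words, the ExcelWriter creation code above is equivalent to:

    pd.ExcelWriter("demo1.xlsx", engine='xlsxwriter')

For more on xlsxwriter, see the official documentation: https://xlsxwriter.readthedocs.org/
The following code sets specific formats for the numeric columns: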

    writer = pd.ExcelWriter("demo1.xlsx",
                            engine='xlsxwriter',
                            datetime_format='mmm d yyyy hh:mm:ss',
                            date_format='mmmm dd yyyy')
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    # Column widths for the two date columns
    worksheet.set_column('A:A', 19)
    worksheet.set_column('B:B', 17)
    # Number formats: thousands separator with two decimals, and whole percentages
    format1 = workbook.add_format({'num_format': '#,##0.00'})
    format2 = workbook.add_format({'num_format': '0%'})
    # set_column(range, width, format) applies the format to the whole column
    worksheet.set_column('C:C', 8, format1)
    worksheet.set_column('D:D', 11, format2)
    worksheet.set_column('E:E', 6, format1)
    # close() writes the file; writer.save() was removed in pandas 2.0
    writer.close()

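As with the date formats, '#,##0.00' and '0%' are Excel number-format codes. The sketch below approximates them with Python's format mini-language to show what the cells should look like. It is an illustration only, and Excel's own rounding may differ in edge cases.

```python
# Stand-ins for the 'Numbers' and 'Percentage' columns
numbers = [1010, 2020, 3030, 2020, 1515]
percentages = [.1, .2, .33, .25, .5]

# '#,##0.00' -> thousands separator and two decimals
as_numbers = [f"{n:,.2f}" for n in numbers]
# '0%' -> value multiplied by 100, no decimals, percent sign
as_percent = [f"{p:.0%}" for p in percentages]

print(as_numbers)
print(as_percent)
```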
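Writing pandas data with specified styles using xlsxwriter
Now suppose I want to customize the style of the Excel header and add borders to the data. After going through the xlsxwriter API documentation, I found no API for changing the style of an arbitrary range. Styles can only be changed per column with set_column or per row with set_row, and both apply to the entire row or column. That is good enough for display formats, but not for styles such as background colors and borders. In this respect xlsxwriter is less convenient than openpyxl. It does have one advantage, though: a style can be specified directly at the moment the data is written.
Let's see how to save data with specified styles directly through xlsxwriter: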

    import xlsxwriter

    workbook = xlsxwriter.Workbook('demo2.xlsx')
    worksheet = workbook.add_worksheet('sheet1')
    # Create the style for the column names
    header_format = workbook.add_format({
        'bold': True,
        'text_wrap': True,
        'valign': 'top',
        'fg_color': '#D7E4BC',
        'border': 1})
    # Write one row of data starting at cell A1, styled with header_format
    worksheet.write_row(0, 0, df.columns, header_format)
    # Create a batch of style objects
    format1 = workbook.add_format({'border': 1, 'num_format': 'mmm d yyyy hh:mm:ss'})
    format2 = workbook.add_format({'border': 1, 'num_format': 'mmmm dd yyyy'})
    format3 = workbook.add_format({'border': 1, 'num_format': '#,##0.00'})
    format4 = workbook.add_format({'border': 1, 'num_format': '0%'})
    # Starting from the second row (index 1, since indices are zero-based),
    # write each column with its own style
    worksheet.write_column(1, 0, df.iloc[:, 0], format1)
    worksheet.write_column(1, 1, df.iloc[:, 1], format2)
    worksheet.write_column(1, 2, df.iloc[:, 2], format3)
    worksheet.write_column(1, 3, df.iloc[:, 3], format4)
    worksheet.write_column(1, 4, df.iloc[:, 4], format3)
    # Set the column widths, measured in characters
    worksheet.set_column('A:A', 19)
    worksheet.set_column('B:B', 17)
    worksheet.set_column('C:C', 8)
    worksheet.set_column('D:D', 12)
    worksheet.set_column('E:E', 6)
    workbook.close()

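write_row and write_column take zero-based (row, column) indices, while set_column is called here with A1-style column letters. A small sketch of how the two notations relate; rowcol_to_a1 is a hypothetical helper written for illustration (xlsxwriter ships its own xl_rowcol_to_cell in xlsxwriter.utility):

```python
def rowcol_to_a1(row, col):
    """Convert zero-based (row, col) indices to an A1-style cell reference."""
    letters = ""
    n = col + 1  # switch to 1-based for the base-26 conversion
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row + 1}"


print(rowcol_to_a1(0, 0))   # where the header row starts
print(rowcol_to_a1(1, 4))   # first data cell of the fifth column
print(rowcol_to_a1(0, 26))  # first column after Z
```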

    import itertools
    from openpyxl.styles import Alignment, Font, PatternFill, Border, Side

    # `worksheet` here is the openpyxl worksheet taken from a writer created
    # with engine='openpyxl' (see the complete code further down)
    font = Font(name="微軟雅黑", bold=True)
    alignment = Alignment(vertical="top", wrap_text=True)
    pattern_fill = PatternFill(fill_type="solid", fgColor="D7E4BC")
    side = Side(style="thin")
    border = Border(left=side, right=side, top=side, bottom=side)
    # Style every cell in the header row
    for cell in itertools.chain(*worksheet["A1:E1"]):
        cell.font = font
        cell.alignment = alignment
        cell.fill = pattern_fill
        cell.border = border

The code above uses itertools.chain to iterate over every cell conveniently, without writing nested for loops.
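Indexing an openpyxl worksheet with a range such as "A1:E1" returns a tuple of rows, each of which is a tuple of cells. The sketch below uses plain strings as stand-ins for cell objects to show how itertools.chain flattens that structure:

```python
import itertools

# Stand-in for worksheet["A1:C2"]: a tuple of rows, each a tuple of cells
cell_range = (
    ("A1", "B1", "C1"),
    ("A2", "B2", "C2"),
)

# Unpacking the rows into chain() yields every cell in row-major order
flat = list(itertools.chain(*cell_range))
print(flat)

# chain.from_iterable does the same without unpacking
flat_alt = list(itertools.chain.from_iterable(cell_range))
print(flat_alt == flat)
```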
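Next, change the formats of the numeric columns:

    # Add a border to every data cell
    for cell in itertools.chain(*worksheet["A2:E6"]):
        cell.border = border
    # Thousands separator with two decimals for columns C and E
    for cell in itertools.chain(*worksheet["C2:C6"], *worksheet["E2:E6"]):
        cell.number_format = '#,##0.00'
    # Whole percentages for column D
    for cell in itertools.chain(*worksheet["D2:D6"]):
        cell.number_format = '0%'

Then set the width of each column:

    worksheet.column_dimensions["A"].width = 20
    worksheet.column_dimensions["B"].width = 17
    worksheet.column_dimensions["C"].width = 10
    worksheet.column_dimensions["D"].width = 12
    worksheet.column_dimensions["E"].width = 8

Finally, save:

    # close() writes the file; writer.save() was removed in pandas 2.0
    writer.close()
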
Complete code:

    import itertools

    import pandas as pd
    from openpyxl.styles import Alignment, Font, PatternFill, Border, Side

    writer = pd.ExcelWriter("demo3.xlsx",
                            engine='openpyxl',
                            datetime_format='mmm d yyyy hh:mm:ss',
                            date_format='mmmm dd yyyy')
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Header style
    font = Font(name="微軟雅黑", bold=True)
    alignment = Alignment(vertical="top", wrap_text=True)
    pattern_fill = PatternFill(fill_type="solid", fgColor="D7E4BC")
    side = Side(style="thin")
    border = Border(left=side, right=side, top=side, bottom=side)
    for cell in itertools.chain(*worksheet["A1:E1"]):
        cell.font = font
        cell.alignment = alignment
        cell.fill = pattern_fill
        cell.border = border

    # Borders and number formats for the data cells
    for cell in itertools.chain(*worksheet["A2:E6"]):
        cell.border = border
    for cell in itertools.chain(*worksheet["C2:C6"], *worksheet["E2:E6"]):
        cell.number_format = '#,##0.00'
    for cell in itertools.chain(*worksheet["D2:D6"]):
        cell.number_format = '0%'

    # Column widths
    worksheet.column_dimensions["A"].width = 20
    worksheet.column_dimensions["B"].width = 17
    worksheet.column_dimensions["C"].width = 10
    worksheet.column_dimensions["D"].width = 12
    worksheet.column_dimensions["E"].width = 8

    # close() writes the file; writer.save() was removed in pandas 2.0
    writer.close()

Final result:
[Figure: demo3.xlsx]

    import itertools

    from openpyxl import load_workbook
    from openpyxl.styles import Border, Side

    # template.xlsx already contains the styled header row and the column widths
    workbook = load_workbook('template.xlsx')
    worksheet = workbook["Sheet1"]
    # Append the data rows; i is the current row number, used for the formatting below
    for i, row in enumerate(df.values, 2):
        worksheet.append(row.tolist())
    # Add borders to the whole range of cells that was just written
    side = Side(style="thin")
    border = Border(left=side, right=side, top=side, bottom=side)
    for cell in itertools.chain(*worksheet[f"A2:E{i}"]):
        cell.border = border
    # Apply the custom number format to each column
    for cell in itertools.chain(*worksheet[f"A2:A{i}"]):
        cell.number_format = 'mmm d yyyy hh:mm:ss'
    for cell in itertools.chain(*worksheet[f"B2:B{i}"]):
        cell.number_format = 'mmmm dd yyyy'
    for cell in itertools.chain(*worksheet[f"C2:C{i}"], *worksheet[f"E2:E{i}"]):
        cell.number_format = '#,##0.00'
    for cell in itertools.chain(*worksheet[f"D2:D{i}"]):
        cell.number_format = '0%'
    workbook.save(filename="demo4.xlsx")

Final result:
[Figure: demo4.xlsx]
Clearly, once openpyxl has loaded a template, the code for styling the header and setting the column widths can be dropped.
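In the template code, the loop variable i doubles as the last row number, which works only when the template holds exactly one header row and the DataFrame is not empty. A sketch of the same bookkeeping with the last row computed up front; header_rows and data_rows are illustrative names, not part of the original code:

```python
# Stand-in for df.values: five data rows
data_rows = [[1010, .1], [2020, .2], [3030, .33], [2020, .25], [1515, .5]]
header_rows = 1  # assumption: the template contains only the header row

first_row = header_rows + 1
last_row = header_rows + len(data_rows)

# Same kind of range strings as in the template code,
# built without relying on the loop variable
border_range = f"A{first_row}:E{last_row}"
percent_range = f"D{first_row}:D{last_row}"
print(border_range, percent_range)
```

If data_rows were empty, last_row would be smaller than first_row, which makes the empty case easy to detect before any formatting is applied.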
Saving data with auto-fitted column widths in pandas
Most of the time we need neither custom styles nor formula strings; we simply write out the final result text. In that case we can use pandas to compute the width of each column and then save the Excel data.
For example, suppose we have the following data:

    df = pd.DataFrame({
        'Region': ['East', 'East', 'South', 'North', 'West', 'South',
                   'North', 'West', 'West', 'South', 'West', 'South'],
        'Item': ['Apple', 'Apple', 'Orange', 'Apple', 'Apple', 'Pear',
                 'Pear', 'Orange', 'Grape', 'Pear', 'Grape', 'Orange'],
        'Volume': [9000, 5000, 9000, 2000, 9000, 7000,
                   9000, 1000, 1000, 10000, 6000, 3000],
        'Month': ['July', 'July', 'September', 'November', 'November', 'October',
                  'August', 'December', 'November', 'April', 'January', 'May']
    })
    df

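
    import numpy as np

    # Character width of each header (GBK encoding counts a CJK character as 2)
    column_widths = (
        df.columns.to_series()
        .apply(lambda x: len(x.encode('gbk'))).values
    )
    # Maximum character width of the values in each column
    # (on pandas >= 2.1, DataFrame.map is the preferred name for applymap)
    max_widths = (
        df.astype(str)
        .applymap(lambda x: len(x.encode('gbk')))
        .max().values
    )
    # Overall width: the larger of header width and widest value, per column
    widths = np.max([column_widths, max_widths], axis=0)
    widths

Result:

    array([6, 6, 6, 9], dtype=int64)
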
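The same width calculation can be written without pandas, which makes the rule explicit: each column's width is the larger of its header width and its widest value, measured in GBK bytes so that a CJK character counts as two. A sketch using a shortened copy of the data (display_width and the variable names are illustrative):

```python
def display_width(text):
    """Width in GBK bytes: ASCII characters count as 1, CJK characters as 2."""
    return len(str(text).encode("gbk"))


# Shortened stand-in for the DataFrame: header -> column values
columns = {
    'Region': ['East', 'South', 'North', 'West'],
    'Item': ['Apple', 'Orange', 'Pear', 'Grape'],
    'Volume': [9000, 10000, 1000],
    'Month': ['July', 'September', 'November'],
}

# Per column: the larger of the header width and the widest value
widths = [
    max(display_width(header), *(display_width(v) for v in values))
    for header, values in columns.items()
]
print(widths)
print(display_width("中文"))
```

Text containing characters outside GBK would raise UnicodeEncodeError, so this measure suits ASCII and Chinese text.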
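Now let's adapt the earlier code.
First, save the data with auto-fitted column widths using the xlsxwriter engine:

    writer = pd.ExcelWriter("auto_column_width1.xlsx", engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    worksheet = writer.sheets['Sheet1']
    # set_column(first_col, last_col, width) with zero-based column indices
    for i, width in enumerate(widths):
        worksheet.set_column(i, i, width)
    # close() writes the file; writer.save() was removed in pandas 2.0
    writer.close()
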
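Then, save the data with auto-fitted column widths using the openpyxl engine (with openpyxl, the width that is set ends up about 0.5 characters narrower, so we simply add 1):

    from openpyxl.utils import get_column_letter

    writer = pd.ExcelWriter("auto_column_width2.xlsx", engine='openpyxl')
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    worksheet = writer.sheets['Sheet1']
    # openpyxl columns are addressed by letter, counted from 1
    for i, width in enumerate(widths, 1):
        worksheet.column_dimensions[get_column_letter(i)].width = width + 1
    # close() writes the file; writer.save() was removed in pandas 2.0
    writer.close()

You can see that the column widths are set fairly accurately.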