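Download PDFtk Server: https://www.pdflabs.com/tools/pdftk-server/
If the PDF is password-protected, first convert it into a PDF without a password:
pdftk protected.pdf input_pw PASSWORD output unprotected.pdf
If the PDF has no password, skip the previous step.
Extract the attachments (the PDF must not be password-protected):
pdftk unprotected.pdf unpack_files output output_dir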
If running the command from Python reports that the pdftk command does not exist,
add os.chdir(<pdftk bin directory>) before running it.
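As an alternative to changing the working directory, the executable can be looked up explicitly. The following is only a minimal sketch: `resolve_pdftk` is a hypothetical helper name, not part of the original script, and it assumes the executable is called `pdftk`. It prefers a copy found on PATH and otherwise searches the given bin directory.

```python
import shutil


def resolve_pdftk(pdftk_bin_folder, name="pdftk"):
    """Return the full path of the pdftk executable, or None if not found.

    Hypothetical helper: looks on PATH first, then in `pdftk_bin_folder`.
    """
    # shutil.which searches PATH by default.
    found = shutil.which(name)
    if found:
        return found
    # Passing path= restricts the search to the install directory.
    return shutil.which(name, path=pdftk_bin_folder)
```

The returned path could then be used as the first element of the command passed to `subprocess.run`, so that the working directory does not need to change.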
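Full code:
import os
import subprocess


def get_attachment(pdf_path, psd, pdftk_bin_folder):
    # Resolve the paths first so they stay valid after os.chdir below.
    pdf_path = os.path.abspath(pdf_path)
    pdf_folder_path = os.path.dirname(pdf_path)
    # Run from the pdftk bin directory so the pdftk command can be found
    # even when it is not on PATH.
    os.chdir(pdftk_bin_folder)
    source_path = pdf_path
    if psd:
        # Decrypt into a temporary copy; unpack_files needs a PDF without a password.
        source_path = os.path.join(pdf_folder_path, "temp.pdf")
        subprocess.run(["pdftk", pdf_path, "input_pw", psd, "output", source_path], check=True)
    # Extract the attachments into the folder that contains the PDF.
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(["pdftk", source_path, "unpack_files", "output", pdf_folder_path], check=True)


if __name__ == '__main__':
    # pdf_path = r"C:\Users\86173\Desktop\test\word\2-protected.pdf"
    # psd = "dfcver"
    pdf_path = r"C:\Users\86173\Desktop\test\word\無密碼1.pdf"
    psd = ""
    pdftk_bin_folder = r"C:\Program Files (x86)\PDFtk Server\bin"
    try:
        get_attachment(pdf_path, psd, pdftk_bin_folder)
        print("Extraction succeeded")
    except Exception as e:
        print("Extraction failed")
        print(e)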
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
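If the PDF's encryption level is 1 or 2, it can be decrypted with the PyPDF2 or PyPDF3 modules; if the encryption level is up to 4 (inclusive), pdftk can be used; if the encryption level is 5, pikepdf can decrypt it.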
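For the pikepdf case, a minimal sketch might look like the following. It assumes pikepdf is installed; `decrypt_with_pikepdf` and its parameter names are hypothetical and chosen only for illustration. By default `Pdf.save` writes the file without encryption, so the output is an unprotected copy.

```python
def decrypt_with_pikepdf(src, dst, password):
    """Write an unprotected copy of `src` to `dst` (hypothetical helper)."""
    # Imported here so this sketch can be loaded even where pikepdf is not installed.
    import pikepdf

    # Opening with the password decrypts the document in memory.
    with pikepdf.open(src, password=password) as pdf:
        # Saving without an encryption argument drops the existing encryption.
        pdf.save(dst)
```

The unprotected copy can then be passed to pdftk's unpack_files step.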
The encryption level can be read from the exception that PyPDF2 or PyPDF3 raises when it cannot decrypt the file:
NotImplementedError: only algorithm code 1 and 2 are supported. This PDF uses code 5
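The level can be pulled out of that message with a regular expression. The sketch below is illustrative only: `parse_algorithm_code` and `suggest_tool` are hypothetical names, the pattern assumes the message wording shown above, and the tool mapping simply mirrors the rule of thumb stated in this post (reading "4" as "up to and including 4").

```python
import re


def parse_algorithm_code(message):
    """Return the algorithm code named in the error message, or None."""
    # Assumes the wording "This PDF uses code N" from the exception above.
    match = re.search(r"This PDF uses code (\d+)", message)
    return int(match.group(1)) if match else None


def suggest_tool(code):
    """Map an algorithm code to the tool suggested in this post."""
    if code in (1, 2):
        return "PyPDF2"
    if code is not None and code <= 4:
        return "pdftk"
    if code == 5:
        return "pikepdf"
    # Unknown or missing code: no suggestion.
    return None
```

In practice the message would come from `str(e)` inside an `except NotImplementedError as e:` block around the PyPDF2 decrypt call.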