About
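To convert HTML to PDF you could use a front-end plugin, but I am a Python person and my front-end skills are weak, so I had to find another way.
After some trial and error, I settled on wkhtmltopdf, a very powerful tool.
After some more trial and error, I decided to use PDFKit, a Python package that wraps wkhtmltopdf.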
install wkhtmltopdf
install pdfkit
pip install pdfkit
usage wkhtmltopdf
API overview
PDFKit has three commonly used APIs:
- from_url: export a remote URL page to PDF.
- from_file: export an HTML file to PDF.
- from_string: export a string to PDF.
import pdfkit
pdfkit.from_url('https://www.google.com.hk','out1.pdf')
pdfkit.from_file('123.html','out2.pdf')
pdfkit.from_string('Hello!','out3.pdf')
from_url
def from_url(url, output_path, options=None, toc=None, cover=None,
             configuration=None, cover_first=False):
    """
    Convert file or files from URLs to PDF document
    :param url: one or more URL pages to export to PDF
    :param output_path: path of the output PDF file. If False, the PDF is returned as a string.
    :param options: (optional) wkhtmltopdf options, for example to set the encoding
    :param toc: (optional) generate a table of contents for the PDF
    :param cover: (optional) HTML file to use as the cover page; it is inserted before the TOC
    :param configuration: (optional) settings, an instance of pdfkit.configuration.Configuration()
    :param cover_first: (optional) if True, cover always precedes TOC
    Returns: True on success
    """
    r = PDFKit(url, 'url', options=options, toc=toc, cover=cover,
               configuration=configuration, cover_first=cover_first)
    return r.to_pdf(output_path)
Example:
import pdfkit
# I had to give the path to wkhtmltopdf.exe explicitly, even after adding it to PATH.
config_pdf = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
pdfkit.from_url(url='https://www.cnblogs.com/Neeo/articles/11566990.html', output_path='p3.pdf', configuration=config_pdf)
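The hard-coded path in the example can also be looked up at runtime. The following is a minimal sketch, not part of PDFKit: `find_wkhtmltopdf` and `make_config` are hypothetical helper names, and the fallback directory is just the install location used in this post. It relies only on `shutil.which` from the standard library and on `pdfkit.configuration`, which is shown above.

```python
import shutil


def find_wkhtmltopdf(candidates=("wkhtmltopdf",), extra_dirs=()):
    """Return the first usable wkhtmltopdf path, or None if none is found."""
    for name in candidates:
        # shutil.which searches the directories listed in PATH.
        found = shutil.which(name)
        if found:
            return found
        # Fall back to explicitly listed install directories.
        for directory in extra_dirs:
            found = shutil.which(name, path=directory)
            if found:
                return found
    return None


def make_config(extra_dirs=(r"C:\Program Files\wkhtmltopdf\bin",)):
    """Build a pdfkit configuration from the discovered executable."""
    import pdfkit  # imported lazily so the lookup helper works without pdfkit

    path = find_wkhtmltopdf(extra_dirs=extra_dirs)
    if path is None:
        raise FileNotFoundError("wkhtmltopdf executable not found")
    return pdfkit.configuration(wkhtmltopdf=path)
```

With this in place, `configuration=make_config()` could replace the `config_pdf` line, and a missing install fails with a clear message instead of at conversion time.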
from_file
def from_file(input, output_path, options=None, toc=None, cover=None, css=None,
              configuration=None, cover_first=False):
    """
    Convert HTML file or files to PDF document
    :param input: path to HTML file or list with paths or file-like object
    :param output_path: path to output PDF file. False means file will be returned as string.
    :param options: (optional) dict with wkhtmltopdf options, with or w/o '--'
    :param toc: (optional) dict with toc-specific wkhtmltopdf options, with or w/o '--'
    :param cover: (optional) string with url/filename with a cover html page
    :param css: (optional) string with path to css file which will be added to a single input file
    :param configuration: (optional) instance of pdfkit.configuration.Configuration()
    :param cover_first: (optional) if True, cover always precedes TOC
    Returns: True on success
    """
    r = PDFKit(input, 'file', options=options, toc=toc, cover=cover, css=css,
               configuration=configuration, cover_first=cover_first)
    return r.to_pdf(output_path)
Example:
import pdfkit
config_pdf = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
pdfkit.from_file(input='h.html', output_path='p2.pdf', configuration=config_pdf)
Multiple input files are supported:
import pdfkit
options = {
    'encoding': 'UTF-8',
    'custom-header': [('Accept-Encoding', 'gzip')],
    'page-size': 'Letter',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'no-outline': None,  # flag option: it takes no value
}
config_pdf = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
pdfkit.from_file(input=['h.html', 'w.html'], output_path='p2.pdf', configuration=config_pdf, options=options)
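To make the shape of the `options` dict easier to picture, here is a rough model of how such a dict could be flattened into wkhtmltopdf command-line arguments. This is an illustration only, not PDFKit's actual implementation; `options_to_args` is a hypothetical name, and the handling of `None` as "flag without a value" is an assumption based on how the options are written in this post.

```python
def options_to_args(options):
    """Flatten an options dict into a list of command-line arguments (illustrative only)."""
    args = []
    for key, value in options.items():
        # Keys may be written with or without the leading '--'.
        flag = key if key.startswith("--") else "--" + key
        if isinstance(value, (list, tuple)):
            # Repeatable option: one flag per entry, e.g. custom-header.
            for item in value:
                args.append(flag)
                if isinstance(item, (list, tuple)):
                    args.extend(str(part) for part in item)
                else:
                    args.append(str(item))
        elif value is None or value == "":
            # Flag without a value, e.g. no-outline.
            args.append(flag)
        else:
            args.append(flag)
            args.append(str(value))
    return args


if __name__ == "__main__":
    demo = {
        "encoding": "UTF-8",
        "custom-header": [("Accept-Encoding", "gzip")],
        "no-outline": None,
    }
    print(options_to_args(demo))
```

Seen this way, each key is a wkhtmltopdf long option, a list value repeats the option once per entry, and an empty value yields a bare flag.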
A CSS file can be added:
css = 'example.css'
pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)
# Multiple CSS files
css = ['example.css', 'example2.css']
pdfkit.from_file('file.html', 'out.pdf', options=options, css=css)
from_string
def from_string(input, output_path, options=None, toc=None, cover=None, css=None,
                configuration=None, cover_first=False):
    """
    Convert given string or strings to PDF document
    :param input: string with the desired text. Can be raw text or HTML.
    :param output_path: path to output PDF file. False means file will be returned as string.
    :param options: (optional) dict with wkhtmltopdf options, with or w/o '--'
    :param toc: (optional) dict with toc-specific wkhtmltopdf options, with or w/o '--'
    :param cover: (optional) string with url/filename with a cover html page
    :param css: (optional) path to a css file which will be added to the input string
    :param configuration: (optional) instance of pdfkit.configuration.Configuration()
    :param cover_first: (optional) if True, cover always precedes TOC
    Returns: True on success
    """
    r = PDFKit(input, 'string', options=options, toc=toc, cover=cover, css=css,
               configuration=configuration, cover_first=cover_first)
    return r.to_pdf(output_path)
Example:
import pdfkit
config_pdf = pdfkit.configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')
pdfkit.from_string(input='hello pdfkit wkhtmltopdf', output_path='p4.pdf', configuration=config_pdf)
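Since `from_string` accepts HTML as well as raw text, a common pattern is to build the HTML in Python and hand it over directly. Below is a small sketch under that assumption; `build_html` and `html_to_pdf` are hypothetical helpers, not PDFKit functions. Declaring the charset in the HTML and passing the `encoding` option is a precaution I would take for non-ASCII text, not something the library requires.

```python
from html import escape


def build_html(title, lines):
    """Return a small, complete HTML document as a string."""
    # Escape user-supplied text so characters like '<' render literally.
    body = "\n".join("<p>{}</p>".format(escape(line)) for line in lines)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        "<title>{}</title>\n"
        "</head>\n<body>\n<h1>{}</h1>\n{}\n</body>\n</html>"
    ).format(escape(title), escape(title), body)


def html_to_pdf(html, output_path, configuration=None):
    """Render an HTML string to a PDF file with pdfkit.from_string."""
    import pdfkit  # imported lazily; requires pdfkit and wkhtmltopdf

    return pdfkit.from_string(html, output_path,
                              options={"encoding": "UTF-8"},
                              configuration=configuration)
```

Usage would look like `html_to_pdf(build_html('Report', ['hello pdfkit wkhtmltopdf']), 'p4.pdf', configuration=config_pdf)`.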
Corrections are welcome. That's all. See also: [PDF with pdfkit](https://www.cnblogs.com/niejinmei/p/8157680.html) | [Scraping web pages with Python and saving them as PDF](