[Translation] ASP.NET MVC: Converting HTML to PDF with iTextSharp (Solving the img Tag Problem)

This article is a repost. 2016-05-30 16:46 · Tags: HTML to PDF / asp.net

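Foreword: A project required me to convert HTML markup to PDF. Most of it was working, but I found that with iTextSharp (I am on version 5.5.9) the src of an img tag only works with an absolute path.

I spent a whole morning searching Baidu without finding a single solution that was even remotely relevant. In the end I turned to Google and finally found one.

This is the original article: http://www.am22tech.com/html-to-pdf/ (you may need a VPN to reach it)

This is an example I put together afterwards (it uses the second solution): http://files.cnblogs.com/files/zuochengsi-9/HTML%E8%BD%ACPDF.zip

If anything is unclear, you can also refer to this blog post of mine: http://www.cnblogs.com/zuochengsi-9/p/5483808.html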

------------------------------------------------------------------

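Main text:

I searched everywhere for a complete example, but none of them fully met my requirement. My requirement is very simple:

Create a PDF document from an HTML page. The HTML contains img tags, and they use relative paths.

I found valuable information at http://kuujinbo.info/iTextSharp/tableWithImageToPdf.aspx

and http://hamang.net/2008/08/14/html-to-pdf-in-net/

In the end, the ASP.NET code below solved my problem. I hope it helps you too!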

Prerequisites:

Download and copy iTextSharp.dll (my version is 5.1.1).

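Problem and solutions:

The new iTextSharp library already does a good job of converting HTML to PDF. It has one major flaw, though: image URLs must be absolute.

If you use a relative path, the HTMLWorker class throws an exception.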

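There are two ways to solve this:

1. Use the IImageProvider interface to pull every image out of the HTML and then "paste" it into the PDF.

However, styling applied to img in the HTML, such as height and width, is not preserved.

2. Parse the HTML and replace the relative URLs with absolute URLs before writing the PDF file.

This approach preserves the height and width set on <img> in the HTML, so it is clearly the better one.

Still, I provide both solutions so you can choose for yourself.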
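Both solutions come down to the same step: turning a relative image path into an absolute URL. Here is a minimal sketch of that step alone, using only System.Uri. The class name ImageUrlSketch, the method ResolveSrc, and the example.com addresses are made up for illustration and are not part of the original article or of iTextSharp.

```csharp
using System;

public static class ImageUrlSketch
{
    // Hypothetical helper: resolves an <img> src value against the URL of the page being converted.
    // An src that is already absolute is returned as the same absolute URL.
    public static string ResolveSrc(Uri pageUrl, string src)
    {
        Uri resolved;
        if (Uri.TryCreate(pageUrl, src, out resolved))
        {
            return resolved.AbsoluteUri;
        }
        // Could not be resolved: hand the value back untouched.
        return src;
    }

    public static void Main()
    {
        // Assumed page address, for illustration only.
        Uri page = new Uri("http://example.com/blog/post.aspx");
        Console.WriteLine(ResolveSrc(page, "/images/logo.png"));              // root-relative path
        Console.WriteLine(ResolveSrc(page, "images/logo.png"));               // relative to the page folder
        Console.WriteLine(ResolveSrc(page, "https://cdn.example.com/a.png")); // already absolute
    }
}
```

In a page you would pass Request.Url as pageUrl. Unlike plain string concatenation of scheme, authority and src, this also copes with paths that do not start with a slash.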

Basic setup:

Add a new page to your project:

PostToPDF_AM22.aspx

PostToPDF_AM22.aspx.cs

Method 1:

using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.UI;
    using System.Web.UI.WebControls;

    //For converting HTML TO PDF- START
    using iTextSharp.text;
    using iTextSharp.text.html;
    using iTextSharp.text.pdf;
    using iTextSharp.text.xml;
    using iTextSharp.text.html.simpleparser;
    using System.IO;
    using System.util;
    using System.Text.RegularExpressions;
    //For converting HTML TO PDF- END

    public partial class PostToPDF_AM22 : System.Web.UI.Page
    {
    protected void Page_Load(object sender, EventArgs e)
    {
    //Get the HTML code from your database or wherever you have stored it and store
    //it in HTMLCode variable.
    string HTMLCode = string.Empty;
    ConvertHTMLToPDF(HTMLCode);
    }
    protected void ConvertHTMLToPDF(string HTMLCode)
    {
    HttpContext context = HttpContext.Current;

    //Render PlaceHolder to temporary stream
    System.IO.StringWriter stringWrite = new StringWriter();
    System.Web.UI.HtmlTextWriter htmlWrite = new HtmlTextWriter(stringWrite);

    StringReader reader = new StringReader(HTMLCode);

    //Create PDF document
    Document doc = new Document(PageSize.A4);
    HTMLWorker parser = new HTMLWorker(doc);
    PdfWriter.GetInstance(doc, new FileStream(Server.MapPath("~") + "/App_Data/HTMLToPDF.pdf",
    FileMode.Create));
    doc.Open();

    /********************************************************************************/
    var interfaceProps = new Dictionary<string, Object>();
    var ih = new ImageHander() { BaseUri = Request.Url.ToString() };

    interfaceProps.Add(HTMLWorker.IMG_PROVIDER, ih);

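    //Pass interfaceProps so that the parser actually uses the ImageHander registered above
    foreach (IElement element in HTMLWorker.ParseToList(
    new StringReader(HTMLCode), null, interfaceProps))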
    {
    doc.Add(element);
    }
    doc.Close();
    Response.End();

    /********************************************************************************/

    }

    //handle Image relative and absolute URL's
    public class ImageHander : IImageProvider
    {
    public string BaseUri;
    public iTextSharp.text.Image GetImage(string src,
    IDictionary<string, string> h,
    ChainedProperties cprops,
    IDocListener doc)
    {
    string imgPath = string.Empty;

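    //Treat both http:// and https:// sources as already absolute
    if (!src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
    !src.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
    {
    imgPath = HttpContext.Current.Request.Url.Scheme + "://" +
    HttpContext.Current.Request.Url.Authority + src;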
    }
    else
    {
    imgPath = src;
    }

    return iTextSharp.text.Image.GetInstance(imgPath);
    }
    }
    }

Method 2:

using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.UI;
    using System.Web.UI.WebControls;

    //For converting HTML TO PDF- START
    using iTextSharp.text;
    using iTextSharp.text.html;
    using iTextSharp.text.pdf;
    using iTextSharp.text.xml;
    using iTextSharp.text.html.simpleparser;
    using System.IO;
    using System.util;
    using System.Text.RegularExpressions;
    //For converting HTML TO PDF- END

    public partial class PostToPDF_AM22 : System.Web.UI.Page
    {
    protected void Page_Load(object sender, EventArgs e)
    {
    //Get the HTML code from your database or wherever you have stored it and store
    //it in HTMLCode variable.
    string HTMLCode = string.Empty;
    ConvertHTMLToPDF(HTMLCode);
    }
    protected void ConvertHTMLToPDF(string HTMLCode)
    {
    HttpContext context = HttpContext.Current;

    //Render PlaceHolder to temporary stream
    System.IO.StringWriter stringWrite = new StringWriter();
    System.Web.UI.HtmlTextWriter htmlWrite = new HtmlTextWriter(stringWrite);

    /********************************************************************************/
    //Try adding source strings for each image in content
    string tempPostContent = getImage(HTMLCode);
    /*********************************************************************************/

    StringReader reader = new StringReader(tempPostContent);

    //Create PDF document
    Document doc = new Document(PageSize.A4);
    HTMLWorker parser = new HTMLWorker(doc);
    PdfWriter.GetInstance(doc, new FileStream(Server.MapPath("~") + "/App_Data/HTMLToPDF.pdf",
    FileMode.Create));
    doc.Open();

    try
    {
    //Parse Html and dump the result in PDF file
    parser.Parse(reader);
    }
    catch (Exception ex)
    {
    //Display parser errors in PDF.
    Paragraph paragraph = new Paragraph("Error!" + ex.Message);
    Chunk text = paragraph.Chunks[0] as Chunk;
    if (text != null)
    {
    text.Font.Color = BaseColor.RED;
    }
    doc.Add(paragraph);
    }
    finally
    {
    doc.Close();
    }
    }

    public string getImage(string input)
    {
    if (input == null)
    return string.Empty;
    string tempInput = input;
    string pattern = @"<img(.|\n)+?>";
    string src = string.Empty;
    HttpContext context = HttpContext.Current;

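    //Change the relative URL's to absolute URL's for an image, if any in the HTML code.
    //RightToLeft yields the last match first, so an edit never shifts the index of a match still to be processed.
    foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline |
    RegexOptions.RightToLeft))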
    {
    if (m.Success)
    {
    string tempM = m.Value;
    string pattern1 = "src=[\'|\"](.+?)[\'|\"]";
    Regex reImg = new Regex(pattern1, RegexOptions.IgnoreCase | RegexOptions.Multiline);
    Match mImg = reImg.Match(m.Value);

    if (mImg.Success)
    {
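    //Take the captured URL as-is: lower-casing it would break case-sensitive paths
    src = mImg.Groups[1].Value;

    if (!src.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
    !src.StartsWith("https://", StringComparison.OrdinalIgnoreCase))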
    {
    //Insert new URL in img tag
    src = "src=\"" + context.Request.Url.Scheme + "://" +
    context.Request.Url.Authority + src + "\"";
    try
    {
    tempM = tempM.Remove(mImg.Index, mImg.Length);
    tempM = tempM.Insert(mImg.Index, src);

    //insert new url img tag in whole html code
    tempInput = tempInput.Remove(m.Index, m.Length);
    tempInput = tempInput.Insert(m.Index, tempM);
    }
    catch (Exception)
    {
    //Leave this <img> tag unchanged if the replacement fails
    }
    }
    }
    }
    }
    return tempInput;
    }

    string getSrc(string input)
    {
    string pattern = "src=[\'|\"](.+?)[\'|\"]";
    System.Text.RegularExpressions.Regex reImg = new System.Text.RegularExpressions.Regex(pattern,
    System.Text.RegularExpressions.RegexOptions.IgnoreCase |

    System.Text.RegularExpressions.RegexOptions.Multiline);
    System.Text.RegularExpressions.Match mImg = reImg.Match(input);
    if (mImg.Success)
    {
    //Return the captured URL without the attribute name or either kind of quote
    return mImg.Groups[1].Value;
    }

    return string.Empty;
    }
    }
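The index bookkeeping in getImage (Remove, Insert, RightToLeft) can also be left to Regex.Replace with a match evaluator. The following is only a sketch of that alternative, not the original author's code: the class ImgSrcRewriter, the method MakeImageUrlsAbsolute, and the sample URLs are hypothetical. It touches nothing but the src value, so width and height attributes stay as they are.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcRewriter
{
    // "head" is everything from <img up to src=, "q" is the opening quote, "url" is the attribute value.
    // \k<q> requires the closing quote to be the same character as the opening one.
    private static readonly Regex ImgSrc = new Regex(
        @"(?<head><img\b[^>]*?\ssrc\s*=\s*)(?<q>[""'])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: rewrites every <img> src in html to an absolute URL based on pageUrl.
    public static string MakeImageUrlsAbsolute(string html, Uri pageUrl)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        return ImgSrc.Replace(html, delegate(Match m)
        {
            Uri resolved;
            if (!Uri.TryCreate(pageUrl, m.Groups["url"].Value, out resolved))
            {
                // Leave the tag untouched if the value cannot be resolved.
                return m.Value;
            }
            string quote = m.Groups["q"].Value;
            return m.Groups["head"].Value + quote + resolved.AbsoluteUri + quote;
        });
    }

    public static void Main()
    {
        // Assumed input and page address, for illustration only.
        string html = "<p><img width=\"100\" src='/images/a.png' height=\"50\"></p>";
        Console.WriteLine(MakeImageUrlsAbsolute(html, new Uri("http://example.com/blog/post.aspx")));
    }
}
```

The result could be passed to a StringReader in place of the output of getImage. A regular expression is still a rough tool for HTML, so unquoted src values, for example, are not handled by this sketch.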

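Notes:

Both solutions above share a ConvertHTMLToPDF method. It places requirements on the format of the HTML it receives; see the official iTextSharp site for details.

The result is saved as a PDF document named HTMLToPDF.pdf in the App_Data folder of your web site.

Remember that you need to write the code in the Page_Load event above that fetches the HTML from your database or from some other file.

Pass that HTML to the conversion function and it will create the PDF file for you.

If you run into any problems, write them in the comments and I will do my best to help.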

——————————————————————————————————————

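This was my first translation, so I translated it as literally as I could. Even so, after doing it I no longer feel the resistance to reading English material that I used to. It really is true that trying brings rewards!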
