Using the HtmlAgilityPack HTML Manipulation Library

Reposted article, 2016-03-30 21:55. Tags: C#, class libraries, HTML parsing, HtmlAgilityPack

HtmlAgilityPack is an open-source HTML parsing library for .NET. It parses HTML into a DOM that can be queried with XPath. Its namespace is HtmlAgilityPack.
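
As a quick illustration of XPath querying, here is a minimal sketch that parses an HTML string and collects the src of every img element. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (XPathDemo, ImageSources) are hypothetical and chosen only for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Minimal sketch (hypothetical names): parse an HTML string with
// HtmlAgilityPack and query it with XPath. Requires the HtmlAgilityPack NuGet package.
public static class XPathDemo
{
    // Returns the src value of every <img> that has a src attribute, in document order.
    public static List<string> ImageSources(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);  // parse from a string; HtmlWeb.Load(url) downloads and parses instead

        var result = new List<string>();
        // "//img[@src]" = any img element, at any depth, that has a src attribute
        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
        if (imgs == null)  // by default SelectNodes returns null, not an empty collection
        {
            return result;
        }

        foreach (HtmlNode img in imgs)
        {
            result.Add(img.GetAttributeValue("src", ""));
        }
        return result;
    }
}
```

Note the null check: by default SelectNodes returns null rather than an empty collection when the XPath expression matches nothing, which is why the code in this article tests for null before looping.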

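1. Load an HTML page over the network, take the HTML inside its body element, rewrite the src attribute of every img element so that relative paths point at the page's location, and return the result as a string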

    // Requires: using System; using HtmlAgilityPack;
    if (l_sWenBenHtmlFtpPath.EndsWith(".html", StringComparison.OrdinalIgnoreCase))
    {
        HtmlWeb htmlWeb = new HtmlWeb();
        HtmlDocument htmlDoc = htmlWeb.Load(l_sWenBenHtmlFtpPath);  // download and parse the page
        HtmlNode bodyTag = htmlDoc.DocumentNode.SelectSingleNode("//body");  // query with XPath syntax
        if (bodyTag != null)
        {
            Uri baseUri = new Uri(l_sWenBenHtmlFtpPath);
            // ".//img" searches only inside <body>; "//img" would search the whole document
            HtmlNodeCollection imgNodes = bodyTag.SelectNodes(".//img");
            if (imgNodes != null)  // null (not empty) when no img element matches
            {
                foreach (HtmlNode imgTag in imgNodes)
                {
                    string imgHttpPath = imgTag.GetAttributeValue("src", "");
                    if (imgHttpPath.Length == 0) continue;  // skip <img> without a src instead of throwing
                    // Resolve the src against the page URL: relative paths become
                    // absolute, already-absolute URLs stay unchanged
                    imgTag.SetAttributeValue("src", new Uri(baseUri, imgHttpPath).AbsoluteUri);
                }
            }
            l_sWenBenHtml = bodyTag.InnerHtml;
        }
    }
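
The src-rewriting step of example 1 can be isolated in a small helper. This minimal sketch uses only the base class library and relies on System.Uri to resolve each src against the page URL (RFC 3986 reference resolution) rather than joining strings, so root-relative paths such as /img/a.png and already-absolute URLs also come out right. UrlHelper and ResolveImageSrc are hypothetical names, and the URLs in the comments are made-up examples.

```csharp
using System;

// Minimal sketch (hypothetical names): resolve an <img> src against the URL of
// the page it came from, the way a browser does, using only System.Uri.
public static class UrlHelper
{
    // pageUrl must be absolute, e.g. "http://example.com/docs/page.html".
    // src may be relative ("img/a.png", "../a.png", "/a.png") or absolute.
    public static string ResolveImageSrc(string pageUrl, string src)
    {
        var baseUri = new Uri(pageUrl, UriKind.Absolute);
        // Uri(Uri, string) applies RFC 3986 reference resolution; an absolute
        // src is returned unchanged. Throws UriFormatException on malformed input.
        return new Uri(baseUri, src).AbsoluteUri;
    }
}
```

For the page http://example.com/docs/page.html, "img/a.png" resolves to http://example.com/docs/img/a.png and "/static/a.png" to http://example.com/static/a.png, whereas plain concatenation of the directory and the src would produce a broken path in the second case.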

2. Use HtmlAgilityPack to load an HTML-formatted string as an HTML document object, then manipulate its DOM

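    // Requires: using System.Web; (HttpUtility) and using HtmlAgilityPack;
    // 1. Decode the HTML string submitted by the front end (URL-decode, then HTML-decode)
    string sDecodeString = HttpUtility.HtmlDecode(HttpUtility.UrlDecode(sEncodeString));
    // 2. Wrap the fragment in a complete HTML document
    sDecodeString = @"<!DOCTYPE html><html><head><meta http-equiv=""content-type"" content=""text/html;charset=UTF-8""/>"
        + @"</head><body><div>"
        + sDecodeString + @"</div></body></html>";
    // 3. Rewrite the src attribute of each img tag (HTML DOM manipulation in C#)
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(sDecodeString.Replace("\n", " "));  // newlines are replaced by spaces before parsing
    HtmlNode node = doc.DocumentNode;
    HtmlNodeCollection nodes = node.SelectNodes("//img");   // query with XPath syntax
    if (nodes != null)  // SelectNodes returns null when there are no img nodes; looping over it would throw
    {
        // Keep only the file name in each img src
        foreach (HtmlNode imgTag in nodes)
        {
            string imgHttpPath = imgTag.GetAttributeValue("src", "");
            if (imgHttpPath.Length == 0) continue;  // skip <img> without a src instead of throwing
            imgHttpPath = imgHttpPath.Substring(imgHttpPath.LastIndexOf('/') + 1);
            imgTag.SetAttributeValue("src", imgHttpPath);
        }
    }
    // 4. Get the processed HTML string
    sHtmlString = node.OuterHtml;    // the HTML string after the img src attributes were rewritten
    // 5. Save the string to an HTML file
    // do something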
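
Step 3 of example 2 keeps only the file name of each src via LastIndexOf. A hypothetical variant helper, sketched here with only the base class library, also drops a ?query or #fragment and accepts backslash separators. The names SrcHelper and FileNameOnly are for illustration only and are not part of HtmlAgilityPack.

```csharp
// Minimal sketch (hypothetical names): reduce an <img> src to its bare file name.
public static class SrcHelper
{
    private static readonly char[] QueryOrFragment = { '?', '#' };
    private static readonly char[] Separators = { '/', '\\' };

    public static string FileNameOnly(string src)
    {
        // Cut off "?query" and "#fragment" first, so "a.png?v=2" yields "a.png"
        int cut = src.IndexOfAny(QueryOrFragment);
        string path = cut >= 0 ? src.Substring(0, cut) : src;

        // Position of the last '/' or '\'; -1 when there is none, so the whole path is kept
        int slash = path.LastIndexOfAny(Separators);
        return path.Substring(slash + 1);
    }
}
```

Inside the foreach loop it would replace the Substring/LastIndexOf line, e.g. imgTag.SetAttributeValue("src", SrcHelper.FileNameOnly(imgHttpPath));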

......
