Using Regular Expressions in C# to Get Tag Attributes or Values from Web Page Source


1. Get the source code of a web page from its URL:

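    using System.IO;
    using System.Net;
    using System.Text;

    // The using directives go at the top of the file; the method goes inside a class.
    // Note: WebRequest is marked obsolete in .NET 6 and later (warning SYSLIB0014);
    // HttpClient is the recommended replacement there.
    private string GetHtmlinfo(string PageUrl)
    {
        WebRequest request = WebRequest.Create(PageUrl);
        // using blocks close the response, stream and reader even if reading throws
        using (WebResponse response = request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(resStream, Encoding.UTF8))
        {
            // Read the whole body, decoding it as UTF-8, and return it to the caller
            return sr.ReadToEnd();
        }
    }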
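GetHtmlinfo decodes every response as UTF-8, so a page served in another charset (many Chinese sites still use gb2312/GBK or big5) comes out garbled. One possible approach, sketched here under the assumption that the page declares its charset in a `<meta>` tag, is to read the raw bytes, look for `charset=` with a regex, and decode with that encoding. CharsetSniffer, GetMetaCharset and Decode are hypothetical names. A charset in the HTTP Content-Type header, when present, should normally take precedence over the meta tag, and on .NET Core / .NET 5+ legacy code pages such as gb2312 are only available after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`; without it this sketch falls back to UTF-8.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Matches charset=... inside a <meta> tag; covers both <meta charset="x"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=x">
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*['""]?(?<cs>[\w-]+)", RegexOptions.IgnoreCase);

    // Returns the declared charset name, or null if the page declares none
    public static string GetMetaCharset(string html)
    {
        Match m = MetaCharset.Match(html);
        return m.Success ? m.Groups["cs"].Value : null;
    }

    // Decodes raw page bytes: meta tags are plain ASCII, so sniff the charset from an
    // ASCII view of the bytes, then fall back to UTF-8 if nothing usable is declared
    public static string Decode(byte[] body)
    {
        string asciiView = Encoding.ASCII.GetString(body);
        string name = GetMetaCharset(asciiView);
        Encoding enc = Encoding.UTF8;
        if (name != null)
        {
            try { enc = Encoding.GetEncoding(name); }
            catch (ArgumentException) { /* unknown or unregistered encoding: keep UTF-8 */ }
        }
        return enc.GetString(body);
    }
}
```

To use it, copy the response stream into a MemoryStream and pass `ToArray()` to Decode instead of wrapping the stream in a StreamReader with a fixed encoding.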

2. Get the value (inner text) of a tag:

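    using System.Text.RegularExpressions;

    /// <summary>
    /// Gets the text inside the specified tag in a string.
    /// </summary>
    /// <param name="str">The HTML source string</param>
    /// <param name="title1">Start of the opening tag, without the leading angle bracket</param>
    /// <param name="title2">Name of the closing tag</param>
    /// <returns>The text between the tags, or an empty string if there is no match</returns>
    public static string GetTitleContent(string str, string title1, string title2)
    {
        // title1 and title2 are inserted into the pattern verbatim (not escaped),
        // so regex metacharacters in them keep their special meaning.
        // The closing tag is "</{1}>" with no space; "</ {1}>" would never match "</b>".
        string tmpStr = string.Format("<{0}[^>]*?>(?<Text>[^<]*)</{1}>", title1, title2); // capture the text between the tags

        Match TitleMatch = Regex.Match(str, tmpStr, RegexOptions.IgnoreCase);

        string result = TitleMatch.Groups["Text"].Value;
        return result;
    }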

Example:
 HTML source (現排名 means "current rank"): <span class="t1_tx">現排名:<b class="color1">20</b>

 Parameters: title1 = @"span class=""t1_tx"">現排名:<b class=""color1""";

             title2 = "b";
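To try the example without a live page, the sketch below wraps the same pattern (closing tag written as `</{1}>`) in a small static class, and adds a variant that uses Regex.Matches to return every occurrence of a tag instead of only the first. TagTextDemo, GetTagText and GetAllTagTexts are hypothetical names for illustration. Like the original, it only handles tags whose content contains no nested tags, because the text is captured with `[^<]*`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagTextDemo
{
    // Same pattern shape as GetTitleContent: <title1 ...>text</title2>
    public static string GetTagText(string html, string title1, string title2)
    {
        string pattern = string.Format("<{0}[^>]*?>(?<Text>[^<]*)</{1}>", title1, title2);
        return Regex.Match(html, pattern, RegexOptions.IgnoreCase).Groups["Text"].Value;
    }

    // Variant: collect the text of every <tag ...>text</tag> occurrence.
    // The tag name is escaped, and \b keeps "b" from also matching <br> or <body>.
    public static List<string> GetAllTagTexts(string html, string tag)
    {
        string name = Regex.Escape(tag);
        string pattern = string.Format(@"<{0}\b[^>]*>(?<Text>[^<]*)</{0}\s*>", name);
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            results.Add(m.Groups["Text"].Value);
        }
        return results;
    }
}
```

For instance, `TagTextDemo.GetAllTagTexts("<b>1</b><br><B class='c'>2</B>", "b")` returns "1" and "2"; the `<br>` is skipped because of the `\b` after the tag name.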

3. Get an attribute of a tag:

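    // Named differently from GetTitleContent in section 2: both take (string, string, string),
    // so they cannot coexist as overloads in the same class.
    /// <summary>
    /// Gets the value of the specified attribute of a tag in a string.
    /// </summary>
    /// <param name="str">The HTML source string</param>
    /// <param name="title">The tag name</param>
    /// <param name="attrib">The attribute name</param>
    /// <returns>The attribute value, or an empty string if there is no match</returns>
    public static string GetTitleAttribute(string str, string title, string attrib)
    {
        // Matches <title ... attrib=value ...>, with the value optionally wrapped in ' or ";
        // \1 requires the same quote to close it. Values containing whitespace are not matched.
        string tmpStr = string.Format("<{0}[^>]*?{1}=(['\"]?)(?<url>[^'\"\\s>]+)\\1[^>]*>", title, attrib);

        Match TitleMatch = Regex.Match(str, tmpStr, RegexOptions.IgnoreCase);

        string result = TitleMatch.Groups["url"].Value;
        return result;
    }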
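The attribute pattern above stops at the first whitespace, quote or `>` inside a value, so a quoted value containing spaces (such as `class="a b"`) is not returned, and an attribute name such as `href` also matches inside `data-href`. A sketch of an alternative, relying on .NET's support for reusing one group name across several alternatives, requires whitespace before the attribute name and reads double-quoted, single-quoted and unquoted values separately. TagAttributeDemo and GetAllAttributeValues are hypothetical names, and this is still a regex approximation rather than a full HTML parser.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagAttributeDemo
{
    // Returns the value of attr on every <tag ...> in html.
    // "double" and 'single' quoted values may contain spaces; unquoted values end at
    // whitespace or '>'. The \s before the name keeps "href" from matching "data-href".
    public static List<string> GetAllAttributeValues(string html, string tag, string attr)
    {
        string pattern = string.Format(
            @"<{0}\b[^>]*?\s{1}\s*=\s*(?:""(?<v>[^""]*)""|'(?<v>[^']*)'|(?<v>[^\s>]+))",
            Regex.Escape(tag), Regex.Escape(attr));
        var values = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            // Only one alternative participates, so group "v" holds that value
            values.Add(m.Groups["v"].Value);
        }
        return values;
    }
}
```

For example, `GetAllAttributeValues(html, "a", "href")` collects every link target on a page; relative URLs come back as written and would still need resolving against the page URL, for instance with `new Uri(baseUri, value)`.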

 

 

