Several Ways to Extract Hyperlink Addresses with Regular Expressions in C#

This article is a repost, dated 2015-03-19 16:33.

When writing a crawler or a CMS, you often need to extract href links or src addresses from HTML. Regular expressions make this easy.

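Method 1: extract the href value and the link text
<PRE class="brush: c-sharp;">
// (?is) turns on case-insensitive and single-line matching;
// \1 requires the closing quote to match the opening one.
Regex reg = new Regex(@"(?is)<a[^>]*?href=(['""]?)(?<url>[^'""\s>]+)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");
MatchCollection mc = reg.Matches(yourStr);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups["url"].Value + "\n";  // the href value
    richTextBox2.Text += m.Groups["text"].Value + "\n"; // the content between <a> and </a>
}
</PRE>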
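
The Method 1 snippet writes into a WinForms `richTextBox2` and reads from an undefined `yourStr`, so it cannot be run on its own. The following is a minimal console sketch of the same pattern. The class name `LinkExtractor`, the helper `ExtractLinks`, and the sample HTML are assumptions made for illustration and do not come from the original article.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Same pattern as Method 1: group "url" is the href value,
    // group "text" is the content between <a> and </a>.
    private static readonly Regex AnchorRegex = new Regex(
        @"(?is)<a[^>]*?href=(['""]?)(?<url>[^'""\s>]+)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");

    // Hypothetical helper: returns one (url, text) pair per matched <a> element.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorRegex.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["text"].Value));
        }
        return links;
    }

    public static void Main()
    {
        // Illustrative input: one double-quoted and one single-quoted href.
        string sample = "<p><a href=\"http://example.com/a\">First</a> <A HREF='/b'>Second</A></p>";
        foreach (var link in ExtractLinks(sample))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```

For the sample input this prints `http://example.com/a -> First` and `/b -> Second`. Returning the pairs instead of appending to a control keeps the extraction separate from how the results are displayed.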
Method 2: extract href values
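<PRE class="brush: c-sharp;">
Regex r;
Match m;
// Group 1 captures the href value: either the text inside double quotes
// or an unquoted run of non-whitespace characters.
r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);
// Walk through the matches one at a time with NextMatch().
for (m = r.Match(inputString); m.Success; m = m.NextMatch())
{
    Console.WriteLine("Found href " + m.Groups[1] + " at " + m.Groups[1].Index);
}
</PRE>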
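
One thing to watch with the Method 2 pattern: its second alternative, `\S+`, also handles single-quoted and unquoted values, so for input such as `href='b.html'>B</a>` it keeps consuming up to the next whitespace and captures the quotes and the following markup as well. The sketch below is one possible variant, not the original author's pattern. It adds a single-quoted alternative and stops an unquoted value at whitespace or `>`. The names `HrefScanner` and `FindHrefs` and the sample input are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefScanner
{
    // Variant of the Method 2 pattern (an assumption, not from the article):
    // double-quoted, single-quoted, or unquoted value, all captured in group 1.
    private static readonly Regex HrefRegex = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|'(?<1>[^']*)'|(?<1>[^\\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical helper: collects every captured href value in document order.
    public static List<string> FindHrefs(string input)
    {
        var result = new List<string>();
        for (Match m = HrefRegex.Match(input); m.Success; m = m.NextMatch())
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Illustrative input covering the three quoting styles.
        string sample = "<a href=\"a.html\">A</a><a href='b.html'>B</a><a href=c.html>C</a>";
        foreach (string href in FindHrefs(sample))
        {
            Console.WriteLine(href);
        }
    }
}
```

For the sample input this prints `a.html`, `b.html`, and `c.html` on separate lines.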

Method 3: extract img src
<PRE class="brush: c-sharp;">
// (?i) makes the match case-insensitive; \1 requires the closing quote to match the opening one.
Regex reg = new Regex(@"(?i)<img[^>]*?\ssrc\s*=\s*(['""]?)(?<src>[^'""\s>]+)\1[^>]*>");
MatchCollection mc = reg.Matches(yourStr);
foreach (Match m in mc)
{
    Console.Write(m.Groups["src"].Value + "\n"); // the src value
}
</PRE>

Method 4: extract img src and return the paths as an array
<PRE class="brush: c-sharp;">
/// <summary>
/// Gets the paths of the img elements.
/// </summary>
/// <param name="htmlText">HTML text as a string</param>
/// <returns>The image paths as an array</returns>
public static string[] GetHtmlImageUrlList(string htmlText)
{
    // Match <img ... src=...> with optional quotes and optional whitespace around '=';
    // the named group imgUrl holds the path.
    Regex regImg = new Regex(@"<img\b[^<>]*?\bsrc\s*=\s*[""']?\s*(?<imgUrl>[^\s""'<>]*)[^<>]*?/?\s*>", RegexOptions.IgnoreCase);
    // One Match per img tag found in the text.
    MatchCollection matches = regImg.Matches(htmlText);
    int i = 0;
    string[] sUrlList = new string[matches.Count];
    // Walk through every matched img tag.
    foreach (Match match in matches)
    {
        // Store the src path of each img in the array.
        sUrlList[i++] = match.Groups["imgUrl"].Value;
    }
    return sUrlList;
}
</PRE>
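
Because the `imgUrl` group in Method 4 uses `*`, a tag such as `<img src="">` still matches and contributes an empty string to the result. The sketch below shows one way to drop those entries with LINQ. It uses an equivalent spelling of the Method 4 pattern, with `\s` standing in for the longer whitespace classes. The names `ImageUrlHelper` and `GetNonEmptyImageUrls` and the sample HTML are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ImageUrlHelper
{
    // Equivalent to the Method 4 pattern; the named group imgUrl holds the path.
    private static readonly Regex ImgRegex = new Regex(
        @"<img\b[^<>]*?\bsrc\s*=\s*[""']?\s*(?<imgUrl>[^\s""'<>]*)[^<>]*?/?\s*>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: same extraction as Method 4, minus empty src values.
    public static string[] GetNonEmptyImageUrls(string htmlText)
    {
        return ImgRegex.Matches(htmlText)
            .Cast<Match>()                          // MatchCollection -> IEnumerable<Match>
            .Select(m => m.Groups["imgUrl"].Value)  // take the captured path
            .Where(url => url.Length > 0)           // skip <img src="">
            .ToArray();
    }

    public static void Main()
    {
        // Illustrative input: two usable images and one empty src.
        string sample = "<img src=\"a.png\" alt=\"x\"><IMG SRC='b.jpg' /><img src=\"\">";
        foreach (string url in GetNonEmptyImageUrls(sample))
        {
            Console.WriteLine(url);
        }
    }
}
```

For the sample input this prints `a.png` and `b.jpg`; the tag with the empty `src` is skipped.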
