Using Regular Expressions to Extract Tag Content from a Web Page

This article is a repost (2017-10-27 09:19). Tag: regular expressions

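A colleague wanted to extract specific content from the tags of an HTML page and asked me to take a look. After some experimenting, I built a small tool for it.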

Goal: match the text of the a tags inside the p tag that begins with <p><label id="catalog_FUND">基金：</label> (the label reads "Fund:").

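Solution: matching everything with a single regex is too difficult, so the task is split into two steps. First, get all the a tags inside this p tag, as shown in the figure.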
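The pattern the article uses for this first step is not given in the text. As a hedged sketch, assuming the target p element is not nested inside another p and closes with a literal </p>, step one could look like the following. The class FundParagraph, the method GetAnchors and the sample input are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FundParagraph
{
    // Step one (hypothetical helper): isolate the <p> that starts with the
    // "catalog_FUND" label, then return every complete <a ...>...</a> inside it.
    public static List<string> GetAnchors(string html)
    {
        var anchors = new List<string>();

        // (.*?) is lazy, so the capture stops at the first </p> after the label
        // instead of running on to the last </p> in the page.
        // Singleline lets . match newlines in case the paragraph spans lines.
        var paragraph = Regex.Match(
            html,
            @"<p><label id=""catalog_FUND"">基金：</label>(.*?)</p>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        if (!paragraph.Success)
            return anchors;

        // Collect each <a ...>...</a> element from the paragraph body only
        foreach (Match a in Regex.Matches(paragraph.Groups[1].Value, @"<a[^><]*>[^><]*</a>", RegexOptions.IgnoreCase))
        {
            anchors.Add(a.Value);
        }
        return anchors;
    }
}
```

Links outside the labelled paragraph are ignored because the second regex only runs over the paragraph body captured in group 1.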

Then extract the text from those a tags, as shown in the figure.

The regex:

<a[^><]*>([^><]*)</a>
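To see this pattern on its own, the sketch below applies it to a few a tags and reads capture group 1, which holds the link text. The class AnchorText and its Extract method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorText
{
    // Step two: group 1 of <a[^><]*>([^><]*)</a> is the text between the tags.
    // Because [^><]* stops at any angle bracket, an <a> whose text contains
    // markup (for example <a><b>x</b></a>) is skipped rather than mis-captured.
    public static List<string> Extract(string html)
    {
        var texts = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<a[^><]*>([^><]*)</a>", RegexOptions.IgnoreCase))
        {
            texts.Add(m.Groups[1].Value);
        }
        return texts;
    }
}
```

With IgnoreCase, upper-case tags such as <A class="x">Fund B</A> are matched as well.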

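The construct used most in the pattern above is [^><]*: zero or more characters that are neither < nor >. Because it cannot cross an angle bracket, neither the tag match nor the captured text can spill over into a neighboring tag. A ? placed after a quantifier (for example *? or .*?) switches it to non-greedy (lazy) mode: the regex still has to find a match, but the quantifier consumes as few characters as possible, so each match covers as few a tags as possible. The pattern above gets a similar effect from [^><]* and therefore does not need ?.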
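The difference between greedy and non-greedy quantifiers is easiest to see side by side. The sketch below runs the greedy pattern <a.*>(.*)</a> and its lazy counterpart <a.*?>(.*?)</a> over the same two links; both patterns and the class name GreedyVsLazy are illustrations, not taken from the article.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    const string Html = "<a href=\"/1\">One</a> <a href=\"/2\">Two</a>";

    // Greedy .* first runs to the end of the input and backtracks only as far
    // as needed, so one match spans from the first <a to the last </a>.
    public static string[] Greedy() =>
        Regex.Matches(Html, "<a.*>(.*)</a>").Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

    // Lazy .*? takes as little as possible, so each <a>...</a> is a separate match.
    public static string[] Lazy() =>
        Regex.Matches(Html, "<a.*?>(.*?)</a>").Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
}
```

The greedy version returns a single match whose group 1 is only "Two", because the first .* swallowed everything up to the last opening tag's >. The lazy version returns "One" and "Two".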

The tool's code-behind (WPF):

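    using System;
    using System.IO;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Windows;
    using System.Windows.Documents;

    // Event handlers from the window's code-behind class (class declaration and
    // XAML not shown). orginText is a RichTextBox, regular holds the pattern,
    // ResultText receives the extracted text.

    private void Readtxt_Click(object sender, RoutedEventArgs e)
    {
        // Load Content.txt from the current working directory
        string path = Path.Combine(Environment.CurrentDirectory, "Content.txt");

        if (File.Exists(path))
        {
            // Encoding.Default is the system ANSI code page on .NET Framework
            // (e.g. GBK on Simplified Chinese Windows); use Encoding.UTF8 for UTF-8 files
            var content = File.ReadAllText(path, Encoding.Default);
            this.orginText.AppendText(content);
        }
    }

    private void testMatch_Click(object sender, RoutedEventArgs e)
    {
        // Take the whole text of the source RichTextBox
        var textRange = new TextRange(orginText.Document.ContentStart, orginText.Document.ContentEnd);
        var content = textRange.Text;
        var pattern = regular.Text;

        if (string.IsNullOrEmpty(pattern) || string.IsNullOrEmpty(content))
            return;

        MatchCollection matches;
        try
        {
            // Building the regex throws ArgumentException for a malformed pattern
            matches = Regex.Matches(content, pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
        }
        catch (ArgumentException ex)
        {
            MessageBox.Show("Invalid pattern: " + ex.Message);
            return;
        }

        // A single Matches call both tests and collects (no separate IsMatch pass)
        if (matches.Count == 0)
        {
            MessageBox.Show("fail");
            return;
        }

        MessageBox.Show("ok");
        foreach (Match item in matches)
        {
            // Groups[0] is the whole match; Groups[1] exists only if the pattern has a capture group
            if (item.Groups.Count > 1)
            {
                ResultText.AppendText(item.Groups[1].Value + Environment.NewLine);
            }
        }
    }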
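The matching logic in testMatch_Click does not depend on WPF. A sketch that factors it into a plain method makes it testable from a console program or a unit test. The class RegexTool and method ExtractFirstGroup are hypothetical, and falling back to the whole match when the pattern has no capture group is a choice of this sketch, not the original tool's behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexTool
{
    // UI-free version of the testMatch_Click logic (hypothetical helper).
    // Returns group 1 of each match, or the whole match if the pattern has no
    // capturing group. Returns null when the pattern itself is invalid.
    public static List<string> ExtractFirstGroup(string content, string pattern)
    {
        Regex regex;
        try
        {
            regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
        }
        catch (ArgumentException)
        {
            return null; // malformed pattern, e.g. an unbalanced "("
        }

        var results = new List<string>();
        foreach (Match m in regex.Matches(content))
        {
            // Groups[0] is the whole match; Groups[1] exists only if the pattern has a group
            results.Add(m.Groups.Count > 1 ? m.Groups[1].Value : m.Value);
        }
        return results;
    }
}
```

Called with the article's pattern <a[^><]*>([^><]*)</a>, it returns the link texts in document order.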
