Scraping Web Page Content with C#

This article is a repost. 2013-08-19 14:14, category: C# Basics

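This example scrapes the news section of the Sina (新浪網) website.

Viewing the page source in Google Chrome shows that the content we want sits between the following two comment tags: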

<!-- publish_helper name='要聞-新聞' p_id='1' t_id='850' d_id='1' -->

... content ...

<!-- publish_helper name='要聞-財經' p_id='30' t_id='98' d_id='1' -->

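Create a web site project in Visual Studio. The page needs a text box for the URL, a button, and a label for the output; the code below refers to them as txtURL, Enter_Click (the button's click handler), and lblMessage.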

Downloading data from the network is done mainly with the WebClient class (in the System.Net namespace).
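As a minimal sketch of the class in isolation, WebClient can also return a resource directly as a string. The names PageFetcher and FetchText are hypothetical and not part of the original article, and the caller is assumed to know the page's character encoding:

```csharp
using System;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Hypothetical helper: downloads a resource and returns it as text.
    public static string FetchText(string url, Encoding encoding)
    {
        using (var client = new WebClient())
        {
            // The Encoding property is what WebClient uses when downloading strings
            client.Encoding = encoding;
            return client.DownloadString(url);
        }
    }
}
```

DownloadString saves the separate decoding step that DownloadData requires, at the cost of having to choose the encoding before the download starts.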

The following code fetches the selected content:

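        // Requires: using System; using System.Net; using System.Text;
        protected void Enter_Click(object sender, EventArgs e)
        {
            const string startMarker = "<!-- publish_helper name='要聞-新聞' p_id='1' t_id='850' d_id='1' -->";
            const string endMarker = "<!-- publish_helper name='要聞-財經' p_id='30' t_id='98' d_id='1' -->";

            string download;
            using (WebClient client = new WebClient())  // WebClient does the download
            {
                // DownloadData returns a byte array, so a byte[] is needed to hold it
                byte[] myDataBuffer = client.DownloadData(txtURL.Text);
                // Decode the bytes into text; Encoding.Default is the system default
                // encoding, so it must match the page's actual charset
                download = Encoding.Default.GetString(myDataBuffer);
            }

            // Locate the two markers found by inspecting the page source
            int startIndex = download.IndexOf(startMarker, StringComparison.Ordinal);
            int endIndex = startIndex < 0 ? -1 : download.IndexOf(endMarker, startIndex, StringComparison.Ordinal);
            if (endIndex < 0)  // IndexOf returns -1 when a marker is missing
            {
                lblMessage.Text = "Markers not found in the downloaded page.";
                return;
            }

            // Take the news content from the start marker up to (not including) the end marker
            lblMessage.Text = download.Substring(startIndex, endIndex - startIndex);
        }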
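The marker-based extraction can also be factored into a small reusable helper. The sketch below is an illustration rather than the original author's code: HtmlSlicer and ExtractBetween are hypothetical names, the markers themselves are excluded from the result, and null is returned when either marker is missing (IndexOf returns -1 in that case, and passing that to Substring would throw).

```csharp
using System;

public static class HtmlSlicer
{
    // Hypothetical helper: returns the text strictly between startMarker and
    // endMarker, or null when either marker cannot be found.
    public static string ExtractBetween(string source, string startMarker, string endMarker)
    {
        int start = source.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;

        start += startMarker.Length;  // skip past the start marker itself

        // Search for the end marker only after the start marker
        int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return source.Substring(start, end - start);
    }
}
```

For example, ExtractBetween("<!-- A -->news<!-- B -->finance", "<!-- A -->", "<!-- B -->") returns "news".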

Running the page displays the extracted news content in lblMessage.

Finally, besides reading the downloaded data as text, WebClient can also save it directly to a file or expose it as a stream.

            // Save the resource directly to a local file
            using (WebClient wc = new WebClient())
            {
                wc.DownloadFile(TextBox1.Text, @"F:\test.txt");
            }
            Label1.Text = "File download complete";

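            // Read the resource as a stream (requires: using System.IO; using System.Net;)
            using (WebClient wc = new WebClient())
            using (Stream s = wc.OpenRead(TextBox1.Text))
            using (StreamReader sr = new StreamReader(s))  // decodes as UTF-8 unless told otherwise
            {
                Label1.Text = sr.ReadToEnd();
            }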
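When reading from a stream, the encoding can be passed to StreamReader explicitly instead of relying on its UTF-8 default. The following is a sketch under that assumption; StreamText and ReadAll are hypothetical names, and the right encoding is whichever charset the page declares:

```csharp
using System.IO;
using System.Text;

public static class StreamText
{
    // Hypothetical helper: reads an entire stream as text using the given encoding.
    // Disposing the reader also closes the underlying stream.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The stream returned by WebClient.OpenRead could be passed in as the first argument, and any other Stream, such as a MemoryStream, works the same way.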
