Building a Dynamic Web Page Crawler with .NET Core + Headless Chrome

Reposted article, 2018-05-23 17:50

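Ordinary HTTP request libraries can only fetch a page's static content. To scrape content that is generated dynamically by JavaScript, you can drive a browser that runs without a GUI. Many people used to rely on PhantomJS as their headless browser, but the PhantomJS team has announced that it is halting development, so a replacement is needed. This article uses headless Chrome to scrape dynamic page content.

The crawler is implemented as follows:

1. Reference the following NuGet packages in the .NET Core project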

Selenium.WebDriver
Selenium.WebDriver.ChromeDriver

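Note: After you reference Selenium.WebDriver.ChromeDriver, a chromedriver.exe file is copied into the code directory. An .exe file only runs on Windows, so you also need to download the latest Linux build of chromedriver from http://chromedriver.storage.googleapis.com/index.html, add it to the project, and set its property to "Copy to Output Directory". Only then will the published program run correctly on both Linux and Windows.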
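The "Copy to Output Directory" setting from the note above can also be written directly in the project file. The fragment below is a minimal sketch, assuming an SDK-style .csproj and assuming the downloaded Linux binary was saved in the project root under the name chromedriver; adjust the path if you placed it elsewhere.

```xml
<!-- Add inside the <Project> element of the .csproj file. -->
<ItemGroup>
  <!-- Assumption: the Linux chromedriver binary sits next to the .csproj. -->
  <None Update="chromedriver">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```

On Linux the copied file may also need its execute permission set before the driver can be launched.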

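Note 2: The server that hosts the crawler must have a version of Chrome installed that matches the chromedriver version (installing the latest version of both is sufficient).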

2. Crawler code

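using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main(string[] args)
    {
        ChromeOptions op = new ChromeOptions();
        op.AddArgument("--headless");   // run Chrome without a GUI
        op.AddArgument("--no-sandbox"); // disable the sandbox so Chrome runs properly on Linux
        // Look for chromedriver in the build output directory, where it was copied
        ChromeDriver cd = new ChromeDriver(AppContext.BaseDirectory, op, TimeSpan.FromSeconds(180));
        try
        {
            cd.Navigate().GoToUrl("http://chart.icaile.com/sd11x5.php");
            // By.Id works on both Selenium 3 and 4 (FindElementById was removed in 4)
            string text = cd.FindElement(By.Id("fixedtable")).Text;
            Console.WriteLine(text);
        }
        finally
        {
            cd.Quit(); // always shut down the browser and the driver process
        }
        Console.Read();
    }
}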
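
Because the target table is produced by JavaScript, it may not exist yet at the moment the page load returns. One way to handle this is to poll until the element appears. The helper below is only a sketch of that idea: the names Poll and Until are hypothetical and not part of Selenium, and the timeout values in the usage comment are arbitrary assumptions. Selenium also provides its own WebDriverWait class for the same purpose.

```csharp
using System;
using System.Threading;

static class Poll
{
    // Calls probe repeatedly until it returns a non-null value or the timeout elapses.
    // Exceptions thrown by probe are treated as "not ready yet" and retried.
    public static T Until<T>(Func<T> probe, TimeSpan timeout, TimeSpan interval) where T : class
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            try
            {
                T result = probe();
                if (result != null) return result;
            }
            catch (Exception)
            {
                // e.g. Selenium's NoSuchElementException while the page is still rendering
            }
            if (DateTime.UtcNow >= deadline)
                throw new TimeoutException("Condition not met within " + timeout);
            Thread.Sleep(interval);
        }
    }
}

// Hypothetical usage with the driver variable cd from the crawler code:
//   string text = Poll.Until(
//       () => cd.FindElement(By.Id("fixedtable")).Text,
//       TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(500));
```

Swallowing every exception keeps the sketch short; a real crawler would probably catch only the "element not found" case so that genuine failures surface immediately.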
