TypeScript study notes - Simple web crawler 1: fetching the full content of a web page

Reposted from the original article. 2020-12-11 16:48, TypeScript

1. Create a new folder named crowller.

2. Inside that folder, run npm init -y to initialize the project. This generates the package.json file.

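3. Inside the folder, run tsc --init to create the TypeScript configuration file tsconfig.json. This assumes tsc is already on your PATH (for example from a global TypeScript install); otherwise run npx tsc --init after step 4.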

4. Install TypeScript and the ts-node tool as development dependencies.

npm install typescript --save-dev
npm install ts-node --save-dev

5. Inside the crowller folder, create a src folder, and inside src create the file crowller.ts.

Open package.json and add the following command under "scripts":

"scripts": {
  "dev": "ts-node ./src/crowller.ts"
},

6. Open crowller.ts and write the code.

The code uses the superagent package, a lightweight Ajax (HTTP request) library that also works in Node.js.

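superagent is written in JavaScript. If you import it directly into TypeScript code, the compiler does not know which methods the library provides, so you also need to install its type definition file (*.d.ts): npm i @types/superagent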

npm i superagent --save
npm i @types/superagent
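To make the role of the type definitions concrete, here is a minimal sketch of what typed access looks like. The ResponseLike interface and the summarize function are hypothetical and heavily simplified; the real definitions in @types/superagent are much larger. The point is only that once a shape is declared, the compiler knows that text is a string and rejects misspelled or missing members.

```typescript
// Hypothetical, simplified stand-in for a typed HTTP response.
// It is NOT the actual type shipped by @types/superagent.
interface ResponseLike {
  status: number;
  text: string;
}

// Because the parameter is typed, `res.text.length` is checked at compile time;
// a typo such as `res.txt` would be reported as an error before the code runs.
function summarize(res: ResponseLike): string {
  return `${res.status}: ${res.text.length} characters`;
}

// Hand-written sample value, so the sketch runs without any network access.
const sample: ResponseLike = { status: 200, text: '<html></html>' };
console.log(summarize(sample)); // prints "200: 13 characters"
```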

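import superagent from 'superagent';

class Crowller {
  private _url: string;
  private rawHtml = '';

  constructor(url: string) {
    this._url = url;
    // Start fetching right away. A constructor cannot await, so report request errors here.
    this.getRawHtml().catch((err) => console.error(err));
  }

  // Fetch the page and keep its full HTML text.
  async getRawHtml(): Promise<void> {
    const result = await superagent.get(this._url);
    this.rawHtml = result.text;
    console.log(this.rawHtml);
  }

  get url(): string {
    return this._url;
  }

  set url(url: string) {
    this._url = url;
  }
}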

const r = new Crowller('https://www.cnblogs.com/shine-lovely/p/12777684.html')
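The class above starts the request from its constructor, so the caller gets no promise to await and cannot tell when the HTML is ready. One possible alternative, shown here only as a sketch, keeps the constructor free of side effects and exposes an async method instead. The names Fetcher, PageLoader and fakeFetch are hypothetical and are not part of superagent or the original notes; the fetch function is injected so the example runs offline.

```typescript
// Hypothetical names throughout: Fetcher, PageLoader and fakeFetch are
// illustrative only and do not come from superagent or the original notes.
type Fetcher = (url: string) => Promise<string>;

class PageLoader {
  private rawHtml = '';

  // The constructor only stores its inputs; no request is started here.
  constructor(
    private readonly url: string,
    private readonly fetchText: Fetcher
  ) {}

  // The caller awaits this method, so a failed request rejects the
  // returned promise instead of being lost inside the constructor.
  async load(): Promise<string> {
    this.rawHtml = await this.fetchText(this.url);
    return this.rawHtml;
  }
}

// Offline stand-in for a real HTTP call, so the sketch needs no network.
const fakeFetch: Fetcher = async (url) => `<html><body>${url}</body></html>`;

const loaded: Promise<string> = new PageLoader('https://example.com/', fakeFetch).load();
loaded.then((html) => console.log(html.length));
```

To use this with superagent, the injected function could be something like async (url) => (await superagent.get(url)).text, which is the same call the class above makes.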

7. Run npm run dev in the terminal.
