Full-stack web application [scrape data (Scrapy) -> store it in the database through a RESTful API -> push it to the front end over WebSocket]

Reposted article. 2018-09-10 23:59. Tags: Scrapy / RESTful API / MongoDB / Vue.js / WebSocket

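This post covers the core part of one feature of the sub-project

https://github.com/fanqingsong/web_full_stack_application

Scrapy scrapes and parses the data, the Python requests library pushes the parsed data to a web service API, and the web service saves the data to a MongoDB database.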

Implementation steps:

1. Use the requests library to talk to the web service API.

2. Use Scrapy to scrape the data.

3. Combine steps 1 and 2 into the complete feature.

The Requests library (save to DB through a RESTful API)

For installation and a quick start, see:

http://docs.python-requests.org/en/master/user/quickstart/#response-content

Sample code that passed testing:

insert_to_db.py

import requests

API_URL = 'http://localhost:3000/api/v1/summary'


class ApiError(Exception):
    """Raised when the web service returns an unexpected status code."""


# ------------- GET --------------
resp = requests.get(API_URL)
if resp.status_code != 200:
    # This means something went wrong.
    raise ApiError('GET /api/v1/summary {}'.format(resp.status_code))

for todo_item in resp.json():
    print('{} {}'.format(todo_item['Technology'], todo_item['Count']))

# ------------- POST --------------
technology = {"Technology": "Django", "Count": "50"}

resp = requests.post(API_URL, json=technology)
if resp.status_code != 201:
    raise ApiError('POST /api/v1/summary {}'.format(resp.status_code))

print("-------------------")
print(resp.text)

print('Created Technology. ID: {}'.format(resp.json()["_id"]))

Python virtualenv runtime environment

https://realpython.com/python-virtual-environments-a-primer/

Create a new virtual environment inside the directory:
# Python 2:
$ virtualenv env

# Python 3
$ python3 -m venv env
Note: By default, this will not include any of your existing site packages.

Activate on Windows:

env\Scripts\activate
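Before installing packages, it can help to confirm from Python that the environment is really active. The following is a minimal sketch, not part of the original project: the helper name in_virtualenv is made up for illustration, and it relies on sys.base_prefix (Python 3 venv and recent virtualenv) or sys.real_prefix (older virtualenv).

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("packages install under:", sys.prefix)
```

If the first line prints False after activation, pip is probably still pointing at the system-wide interpreter.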

Scrapy (scrape data)

https://scrapy.org/

An open source and collaborative framework for extracting the data you need from websites.

In a fast, simple, yet extensible way.

https://scrapy-chs.readthedocs.io/zh_CN/0.24/topics/architecture.html

Scrapy architecture

For installation and usage, see:

https://www.cnblogs.com/lightsong/p/8732537.html

Fixes for errors during installation and running:

1. Scrapy fails at run time with ImportError: No module named win32api

https://blog.csdn.net/u013687632/article/details/57075514

pip install pypiwin32

2. error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools

https://www.cnblogs.com/baxianhua/p/8996715.html

1. From http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted download the Twisted .whl file that matches your setup (mine was Twisted-17.5.0-cp36-cp36m-win_amd64.whl). The number after cp is the Python version, and amd64 means 64-bit.

2. Run the command:
pip install C:\Users\CR\Downloads\Twisted-17.5.0-cp36-cp36m-win_amd64.whl
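Picking the right wheel depends on reading the tags in its filename and comparing them with the local interpreter. The sketch below illustrates this under stated assumptions: the function names are made up for this example, the filename layout follows the standard wheel naming convention (name-version-python tag-ABI tag-platform tag), and local_python_tag assumes CPython.

```python
import struct
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # Layout: name-version[-build]-python_tag-abi_tag-platform_tag
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


def local_python_tag():
    """Tag of the running interpreter, assuming CPython (e.g. 'cp36' for 3.6)."""
    return "cp{}{}".format(sys.version_info[0], sys.version_info[1])


def local_bits():
    """Pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    info = parse_wheel_filename("Twisted-17.5.0-cp36-cp36m-win_amd64.whl")
    print(info)
    print("this interpreter:", local_python_tag(), "{}-bit".format(local_bits()))
```

The wheel to download is the one whose python_tag equals the local tag and whose platform tag matches the interpreter's bitness (win_amd64 for a 64-bit Python on Windows).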

Sample code:

quotes_spider.py

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.xpath('span/small/text()').extract_first(),
            }

        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

From this directory, run:

scrapy runspider quotes_spider.py -o quotes.json

Output:

[
{"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen"},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor"},
{"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson"},
{"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz"},
{"text": "\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d", "author": "Suzanne Collins"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"},
{"text": "\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d", "author": "Terry Pratchett"},
{"text": "\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d", "author": "Dr. Seuss"},
{"text": "\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d", "author": "George Carlin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"},
{"text": "\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d", "author": "Jane Austen"}
]
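The \u201c and \u201d sequences in this output are JSON escapes for the curly quotation marks, not corrupted text. A short sketch shows that any JSON parser restores them; the record is copied from the output above and embedded as a string here only so that the example runs without the quotes.json file.

```python
import json

# One record copied from the output above, exactly as Scrapy wrote it.
raw = r'''[
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"}
]'''

quotes = json.loads(raw)
for quote in quotes:
    # \u201c and \u201d decode to the curly quotation marks “ and ”.
    print("{} ({})".format(quote["text"], quote["author"]))
```

To read the real file instead, json.load on the opened quotes.json gives the same list of dictionaries. Scrapy also has a FEED_EXPORT_ENCODING setting; setting it to 'utf-8' should make the exporter write the characters unescaped.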

End-to-end example

https://github.com/fanqingsong/web_data_visualization

Because the zhipin website has anti-crawler measures, this example uses Scrapy's official quotes example as the scraping target instead.

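The flow is:

1. Scrape the data with two Scrapy components: the spider and the item pipeline.

2. Store the data: the requests library's post method pushes it to the API of the webservice_quotes server.

3. webservice_quotes saves the data to MongoDB.

4. The browser opens the Vue page and connects to the websocket_quotes server.

5. websocket_quotes periodically (every 1 s) reads the data from MongoDB and pushes it to the browser, where it is stored as the Vue application's data and bound to the template view.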
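The periodic read-and-push in the last step can be sketched as a simple polling loop. This is only an illustration in Python and not the project's websocket_quotes server: fetch_quotes stands in for the MongoDB query, each client is a plain async callable standing in for a WebSocket send, and all names here are hypothetical.

```python
import asyncio


async def poll_and_push(fetch_quotes, clients, interval=1.0, rounds=None):
    """Every `interval` seconds, read all quotes and send them to each client.

    fetch_quotes: callable returning the current list of quotes
                  (stands in for a MongoDB query).
    clients:      iterable of async callables (stand in for WebSocket sends).
    rounds:       stop after this many pushes; None means run forever.
    """
    done = 0
    while rounds is None or done < rounds:
        quotes = fetch_quotes()
        for send in list(clients):
            await send(quotes)
        done += 1
        await asyncio.sleep(interval)


def demo():
    """Run three push rounds against an in-memory store and one fake client."""
    store = [{"text": "hello", "author": "nobody"}]  # hypothetical data
    received = []

    async def fake_client(quotes):
        received.append(list(quotes))

    asyncio.run(poll_and_push(lambda: store, [fake_client], interval=0.01, rounds=3))
    return received


if __name__ == "__main__":
    print(len(demo()), "pushes delivered")
```

Polling resends the full data set on every tick, which is easy to reason about but wasteful when nothing has changed; pushing only after a successful insert would be one alternative design.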

Scrapy item pipeline: pushing data to the web service API

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html

import requests


class ApiError(Exception):
    """Raised when the web service returns an unexpected status code."""


class ScratchZhipinPipeline(object):
    def process_item(self, item, spider):

        print("--------------------")
        print(item['text'])
        print(item['author'])
        print("--------------------")

        # save to db through web service; dict() keeps a scrapy.Item JSON-serializable
        resp = requests.post('http://localhost:3001/api/v1/quote', json=dict(item))
        if resp.status_code != 201:
            raise ApiError('POST /api/v1/quote {}'.format(resp.status_code))
        print(resp.text)
        print('Created quote. ID: {}'.format(resp.json()["_id"]))

        return item
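As the comment at the top of the pipeline file notes, the pipeline only runs once it is listed in the ITEM_PIPELINES setting. A minimal settings.py fragment might look like the following; the package name scratch_zhipin is an assumption inferred from the class name, so substitute the real project package.

```python
# settings.py (fragment)
# The module path below is an assumption: replace "scratch_zhipin" with the
# actual name of the Scrapy project package that contains pipelines.py.
ITEM_PIPELINES = {
    "scratch_zhipin.pipelines.ScratchZhipinPipeline": 300,
}
```

The integer sets the order when several pipelines are enabled: values conventionally range from 0 to 1000, and lower numbers run first.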

Run the spider: scrapy crawl quotes

Run the web service: npm run webservice_quotes

Run the WebSocket server: npm run websocket_quotes

Run the Vue development environment: npm run dev


Generating a requirements.txt file for Python

http://www.cnblogs.com/zhaoyingjie/p/6645811.html

Quickly generate the requirements.txt install file:
(CenterDesigner) xinghe@xinghe:~/PycharmProjects/CenterDesigner$ pip freeze > requirements.txt

Install the required packages:

pip install -r requirements.txt
