Random User-Agent
https://github.com/hellysmile/fake-useragent
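Add a custom middleware to DOWNLOADER_MIDDLEWARES, and disable Scrapy's built-in UserAgentMiddleware there by setting it to None. Otherwise the built-in middleware sets the User-Agent header first and the setdefault() call below never takes effect:
from fake_useragent import UserAgent

class RandomUserAgentMiddlware(object):
    # Randomly rotate the User-Agent header on every request
    def __init__(self, crawler):
        super(RandomUserAgentMiddlware, self).__init__()
        self.ua = UserAgent()
        # Which UserAgent attribute to read: "random", "chrome", "firefox", ...
        self.ua_type = crawler.settings.get("RANDOM_UA_TYPE", "random")

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_request(self, request, spider):
        def get_ua():
            # getattr(self.ua, "random") is the same as self.ua.random
            return getattr(self.ua, self.ua_type)

        request.headers.setdefault('User-Agent', get_ua())
Note: add this setting to settings.py: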
RANDOM_UA_TYPE = "random"
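A sketch of the matching DOWNLOADER_MIDDLEWARES entry in settings.py. The module path "myproject.middlewares" is hypothetical; replace it with wherever the middleware class actually lives. The priority 543 is just the value Scrapy's project template uses for its example middleware.

```python
# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    # Turn off Scrapy's built-in User-Agent middleware so it does not set the
    # header before the random one gets a chance to.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Hypothetical module path for the custom middleware shown above.
    "myproject.middlewares.RandomUserAgentMiddlware": 543,
}
```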
Dynamic IPs
1. Use free proxy IPs, e.g. from Xici (西刺), collecting and maintaining the IP pool yourself
2. Free plugin: scrapy_proxies
https://github.com/aivarsk/scrapy-proxies
3. Paid plugin: scrapy-crawlera
https://github.com/scrapy-plugins/scrapy-crawlera
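For option 1 (a self-maintained pool of free proxies), one way to wire the pool into Scrapy is a small downloader middleware that sets request.meta["proxy"], which Scrapy's built-in HttpProxyMiddleware honors. This is a minimal sketch: the class name RandomProxyMiddleware and the setting name PROXY_POOL are hypothetical, and it does not check whether a proxy is still alive.

```python
import random


class RandomProxyMiddleware:
    """Hypothetical downloader middleware: pick a random proxy from a
    self-maintained pool (e.g. IPs collected from a free proxy site)."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_POOL is a hypothetical setting: a list of "http://host:port" strings.
        return cls(crawler.settings.getlist("PROXY_POOL"))

    def process_request(self, request, spider):
        # Leave any proxy the spider set explicitly; otherwise choose one at random.
        if self.proxies and "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxies)
        return None  # let Scrapy continue processing the request normally
```

Register it in DOWNLOADER_MIDDLEWARES like the User-Agent middleware. Free proxies fail often, so in practice the pool also needs periodic re-validation.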
CAPTCHAs
1. Solve them in code (tesseract-ocr)
2. Online CAPTCHA-solving service, e.g. Yundama (雲打碼)
3. Solve them manually
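A rough sketch of options 1 and 3. It assumes the Python packages Pillow and pytesseract (a wrapper around the tesseract-ocr binary), neither of which the notes name. The OCR path converts the image to grayscale, binarizes it with an arbitrary threshold of 140, and asks Tesseract to read a single line of text. The manual path saves the image and asks a person to type the answer. Simple OCR only works on weakly distorted CAPTCHAs.

```python
import io


def make_threshold_table(threshold=140):
    """Lookup table for PIL's Image.point(): pixels brighter than the
    threshold become white (255), everything else black (0)."""
    return [255 if value > threshold else 0 for value in range(256)]


def ocr_captcha(image_bytes, threshold=140):
    """Option 1: grayscale + binarize, then OCR with Tesseract."""
    # Imported lazily so the manual path works without these optional packages.
    from PIL import Image
    import pytesseract

    image = Image.open(io.BytesIO(image_bytes)).convert("L")  # grayscale
    image = image.point(make_threshold_table(threshold))       # black and white
    # --psm 7: treat the image as a single line of text.
    return pytesseract.image_to_string(image, config="--psm 7").strip()


def manual_captcha(image_bytes, path="captcha.png", ask=input):
    """Option 3: save the image and let a person type what it says."""
    with open(path, "wb") as f:
        f.write(image_bytes)
    return ask(f"Open {path} and enter the CAPTCHA: ").strip()
```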