Scraping free IPs to build your own proxy IP pool

This article is a repost (see the original). 2021-09-22 07:39 201 B1-Network Protocols

Step 1: pick a website that lists free proxy IPs and scrape every IP it publishes. Some candidate sites:

http://www.66ip.cn/index.html

https://seofangfa.com/proxy/

https://ip.jiangxianli.com/

http://www.xiladaili.com/gaoni/6/

http://www.xsdaili.cn/dayProxy/ip/2459.html

http://www.dailiip.cc/freedailiip/2020/0929/966.html

http://31f.cn/

https://www.chenjiayu.cn/archives6462.html

https://www.89ip.cn/index.html

https://www.kuaidaili.com/free/inha/

https://www.feizhuip.com/news-getInfo-id-1122.html

https://ip.ihuan.me/

https://www.7yip.cn/free/

http://ip.yqie.com/ipproxy.htm

http://ip.yqie.com/world.aspx

http://www.ip3366.net/free/?stype=3

http://www.pachongdaili.com/free/freelist1.html

Free proxy IPs vary in quality from site to site. If you need high-quality proxies, or a large number of stable ones, buying them is still the better option.

When extracting the IPs from a page, use regular expressions.
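
As a rough illustration of the extraction step, the sketch below pulls (ip, port) pairs out of an HTML table with one regular expression. The table layout, the `ROW_PATTERN` name, the `extract_proxies` helper and the sample HTML are all assumptions made for this example; each real site has its own markup and needs its own pattern.

```python
import re

# Matches two adjacent cells of the assumed form <td>IP</td><td>PORT</td>.
# Real sites differ, so this pattern is an illustration only.
ROW_PATTERN = re.compile(
    r"<td>\s*(\d{1,3}(?:\.\d{1,3}){3})\s*</td>\s*<td>\s*(\d{2,5})\s*</td>",
    re.S,
)


def extract_proxies(html):
    """Return a list of (ip, port) tuples found in the HTML text."""
    return [(ip, int(port)) for ip, port in ROW_PATTERN.findall(html)]


# Hypothetical sample page, not taken from any of the sites listed above.
SAMPLE_HTML = """
<table>
  <tr><td>1.2.3.4</td><td>8080</td><td>HTTP</td></tr>
  <tr><td>10.20.30.40</td><td>3128</td><td>HTTPS</td></tr>
</table>
"""

if __name__ == "__main__":
    for ip, port in extract_proxies(SAMPLE_HTML):
        print(ip, port)
```

The pattern only checks the shape of the address, not that each octet is at most 255, which is usually enough for a list that gets verified afterwards anyway.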

The vast majority of these free proxies do not actually work.

Step 2: use the requests library to send a request through each IP to verify it, and keep only the IPs that work.

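To check whether a proxy IP works, visit an IP lookup site through the proxy, read the address and location it reports, and compare them with what the same site reports when no proxy is used. If the two differ, the proxy is in effect.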

Opening icanhazip.com in a browser returns your exit IP (also called the public IP) directly. Searching for "IP" on Baidu also shows your exit IP.

Visiting IP138.com or http://ip.chinaz.com/ in a browser also shows your external address.
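
The comparison described above could be sketched roughly as follows. This version uses only the standard library's urllib.request so that it runs without extra packages; the same idea works with the proxies argument of requests. The helper names `fetch_exit_ip` and `proxy_works` are made up for this example, and it assumes that icanhazip.com answers with the caller's IP as plain text and that the proxy accepts plain HTTP connections.

```python
import http.client
import ipaddress
import urllib.request

IP_ECHO_URL = "http://icanhazip.com"  # assumed to return the exit IP as plain text


def fetch_exit_ip(proxy_url=None, timeout=5):
    """Return the exit IP reported by the echo site, optionally via a proxy."""
    # An empty mapping disables any proxy configured in the environment.
    mapping = {"http": proxy_url} if proxy_url else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(mapping))
    with opener.open(IP_ECHO_URL, timeout=timeout) as resp:
        return resp.read().decode("ascii", "replace").strip()


def proxy_works(proxy_url, fetch=fetch_exit_ip):
    """True if the proxied request succeeds and reports a different exit IP."""
    try:
        direct_ip = fetch()
        proxied_ip = fetch(proxy_url)
        # Reject error pages that a broken proxy may return instead of an IP.
        ipaddress.ip_address(proxied_ip)
    except (OSError, ValueError, http.client.HTTPException):
        return False
    return proxied_ip != direct_ip


# Example call (needs network access; the address is a placeholder):
# print(proxy_works("http://1.2.3.4:8080"))
```

The `fetch` parameter exists only so the comparison logic can be exercised without touching the network, for example by passing a stub function.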

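Step 3: save all the working IPs to a database.

This gives you fresh, free proxy IPs. So that the IPs can be reused many times, I store them in a MySQL database.

The code for writing is as follows: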

def insert(self,l):
    # l is a list of 5-field tuples, one per scraped proxy
    print("Inserting {} rows".format(len(l)))
    self.cur.executemany("insert into xc values(%s,%s,%s,%s,%s)",l)
    self.con.commit()

The code for reading is as follows:

def select(self):
    # xieyi is the protocol column (HTTP or HTTPS)
    self.cur.execute("select ip,port,xieyi from xc")
    info=self.cur.fetchall()
    return info
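
The article does not show the definition of the xc table, so the sketch below guesses one. To stay runnable without a MySQL server it uses the standard library's sqlite3 as a stand-in, which means `?` placeholders instead of the `%s` used by pymysql. The columns ip, port and xieyi come from the select statement above; the names `location` and `anonymity` and the helper `open_pool` are assumptions made for this example.

```python
import sqlite3


def open_pool(path=":memory:"):
    """Open the database and create the xc table if it does not exist yet."""
    con = sqlite3.connect(path)
    # Five columns, matching the five values in the insert statement.
    # location and anonymity are hypothetical names.
    con.execute(
        "create table if not exists xc ("
        "ip text, port text, location text, anonymity text, xieyi text)"
    )
    return con


def insert(con, rows):
    """Insert a list of 5-field tuples in one batch."""
    con.executemany("insert into xc values (?,?,?,?,?)", rows)
    con.commit()


def select(con):
    """Return (ip, port, xieyi) for every stored proxy."""
    return con.execute("select ip,port,xieyi from xc").fetchall()


if __name__ == "__main__":
    con = open_pool()
    # Placeholder row, not a real proxy.
    insert(con, [("1.2.3.4", "8080", "Beijing", "high anonymity", "HTTP")])
    print(select(con))
```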

The whole process is implemented with python + re + requests + mysql.

####

import re
import requests
import pymysql
import time
class xiciSpider(object):
    def __init__(self):
        self.req=requests.Session()
        self.headers={
            'Accept-Encoding':'gzip, deflate, br',
            'Accept-Language':'zh-CN,zh;q=0.8',
            'Referer':'http://www.xicidaili.com/nn/',
            'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Ubuntu Chromium/60.0.3112.113 Chrome/60.0.3112.113 Safari/537.36',
        }
        self.proxyHeaders={
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                          '(KHTML, like Gecko) Ubuntu Chromium/60.0.3112.113 Chrome/60.0.3112.113 Safari/537.36',
        }
        self.con=pymysql.Connect(
            host='127.0.0.1',
            user='root',
            password="*****",
            database='xici',
            port=3306,
            charset='utf8',
        )
        self.cur=self.con.cursor()


    def getPage(self,url):
        page=self.req.get(url,headers=self.headers).text
        # print(page)
        return page

    def Page(self,text):
        # Pause between pages to avoid hammering the site
        time.sleep(2)
        # Raw strings keep \d and \. intact for the regex engine.
        # Five groups per table row: ip, port, link text, second "country" cell, protocol
        pattern=re.compile(r'<tr class=".*?">.*?'
                           r'<td class="country"><img.*?/></td>.*?'
                           r'<td>(\d+\.\d+\.\d+\.\d+)</td>.*?'
                           r'<td>(\d+)</td>.*?'
                           r'<td>.*?'
                           r'<a href=".*?">(.*?)</a>.*?'
                           r'</td>.*?'
                           r'<td class="country">(.*?)</td>.*?'
                           r'<td>([A-Z]+)</td>.*?'
                           r'</tr>'
                           ,re.S)
        l=re.findall(pattern,text)
        return l
    def getUrl(self,pageNum):
        url='http://www.xicidaili.com/nn/'+str(pageNum)
        return url

    def insert(self,l):
        # l is a list of 5-field tuples, one per scraped proxy
        print("Inserting {} rows".format(len(l)))
        self.cur.executemany("insert into xc values(%s,%s,%s,%s,%s)",l)
        self.con.commit()
    def select(self):
        # xieyi is the protocol column (HTTP or HTTPS)
        self.cur.execute("select ip,port,xieyi from xc")
        info=self.cur.fetchall()
        return info
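    def getAccessIP(self,size=1):
        info=self.select()
        p=[]
        for i in info:
            if len(p)==size:
                return p
            # requests matches proxies keys against the lowercase URL scheme,
            # so the stored "HTTP"/"HTTPS" value must be lowercased
            scheme=i[2].lower()
            # Assumption: the proxy itself accepts plain connections
            proxy="http://{}:{}".format(i[0],i[1])
            try:
                self.req.get("{}://www.baidu.com".format(scheme),proxies={scheme:proxy},timeout=5)
                p.append(i)
            except Exception:
                print("{} is invalid".format(i))
        print(p)
        # Return whatever was found, even if fewer than size
        return p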

    def getNewipToMysql(self):
        for i in range(2300):
            page=self.getPage(self.getUrl(i))
            # Use self here; the global name p only exists when run as a script
            self.insert(self.Page(page))

if __name__=='__main__':
    p=xiciSpider()
    # p.Page(p.getPage('http://www.xicidaili.com/nn/'))

    # for i in range(2300):
    #     page=p.getPage(p.getUrl(i))
    #     p.insert(p.Page(page))
    p.getAccessIP()

#####

from xc import xiciSpider

p=xiciSpider()
# Run this method first to store the scraped IPs in MySQL
p.getNewipToMysql()
# Get working proxy IPs; fetches 1 by default, pass size to get more
ip=p.getAccessIP()
print(ip)

####
