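For friends who have a little, and only a little, Python 3 and web basics. Experts and passers-by, feel free to skip this.
(There is a lot of filler in this post. It's my first time writing at length online, so I got excited and couldn't stop. If any experts do pass by, please don't just skip it; I'd be grateful for a pointer or two.)
Thanks to 博客園 for giving me the chance: the id I like hadn't been taken yet, which is honestly scary.
*Note: this paragraph is filler; skip it if you want the main text. In my second year of university I taught myself some Python as a hobby (mainly because the syntax is clean and pretty, and there are no braces. Code alignment? Java code has to be aligned anyway~). Luckily Python 3 was already popular when I started, so I never learned Python 2, which saved me a lot of hassle.*
Background for this code: back in my second year of high school I was browsing a scam site and, out of curiosity, left my deskmate's phone number on it. More than half a year later my deskmate was still getting call after call from scammers. The original URL can no longer be found, so I searched Baidu for the keyword "牛股" (hot stocks) and went through the results one by one. Only one site let you enter a phone number; the rest all asked you to add them on WeChat. I had thought about pranking our teacher with it, but decided against that, so I'll just fill their database with some junk for fun instead. Whether it actually worked in the end, I can't say for sure.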
- Operating environment
  - win10 1803 64-bit
  - Chrome 68.0.3440.106 (official build) (64-bit)
  - pycharm-UI (PyCharm Professional) 2018.2
  - Python 3.6.5
  - Libraries (non-built-in ones can be installed directly with pip):
    - pymysql: import pymysql
    - requests: import requests
    - json (built-in): import json
    - Faker: from faker import Faker
First, pick a target
The target site is this one; its url is: http://gpyd.gp241.com/nyqpc/bd2.html?id=20110052

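1. First, of course, capture the post/get address
On the home page, click "點擊領取9月牛股" ("click to claim September's hot stocks"). Once the dialog pops up, press F12 to open the developer tools.
In the developer tools select "Network", then click the claim button on the page. A new entry will appear in the Network panel.
Then copy the data we need into PyCharm and arrange it as a JSON-like dict:
2. This gives us the following data: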
url = "https://download.zslxt.com/tinterface.php"
headers = {
    "Host": "download.zslxt.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    "Accept": "*/*",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "http://gpyd.gp241.com/nyqpc/bd2.html?id=20110052",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    # Content-Length (107 in the captured request) is left out:
    # the HTTP client computes it from the actual body.
    "Origin": "http://gpyd.gp241.com",
    "Connection": "keep-alive"
}
data = {
    "bm": "gbk",
    "gpdm": "",
    "id": "20110052",
    "phone": "15666668888",
    "qudao": 98,
    "remarks": "牛有圈百度2)"
}
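Copying headers out of the Network panel by hand is easy to get wrong (a dropped slash is hard to spot). Below is a minimal sketch of doing it mechanically, assuming the raw request headers were pasted from the developer tools as plain "Name: value" lines. `parse_raw_headers` and its skip list are hypothetical helpers written for this post, not part of requests.

```python
def parse_raw_headers(raw, skip=("host", "content-length")):
    """Turn pasted 'Name: value' lines into a dict.

    Headers named in `skip` are dropped, on the assumption that the
    HTTP client fills them in itself from the URL and the body.
    """
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # ignore blank or malformed lines
        name, value = line.split(":", 1)  # split only on the first colon
        if name.strip().lower() in skip:
            continue
        headers[name.strip()] = value.strip()
    return headers


# Example input: a few of the headers from this post, as pasted text.
raw = """
Host: download.zslxt.com
Accept: */*
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Content-Length: 107
Origin: http://gpyd.gp241.com
Referer: http://gpyd.gp241.com/nyqpc/bd2.html?id=20110052
"""
headers = parse_raw_headers(raw)
print(headers)
```

Splitting on the first colon only is what keeps values such as `http://...` intact.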
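Looking at the fields in data, it should be clear that phone is the number we just typed in. The other fields can be left alone: I switched browsers and they did not change. I don't know where they come from; possibly they are related to Baidu promotion.
With that, we can send one record to the site.
First we need the requests library. I won't introduce it here: it can make get/post/... requests and can also keep a login alive with a session, so it is useful for a lot of things.
One thing to note: the Content-Type here is application/x-www-form-urlencoded, so the data dict is passed directly via data= and requests form-encodes it. Converting it with json.dumps() first would send a JSON string, which does not match that content type.
requests.post returns the response object. This snippet doesn't use it, but don't let that mislead you.
import requests

url = "https://download.zslxt.com/tinterface.php"
headers = {
    "Host": "download.zslxt.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    "Accept": "*/*",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "http://gpyd.gp241.com/nyqpc/bd2.html?id=20110052",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    # No hard-coded Content-Length: requests computes it from the body.
    "Origin": "http://gpyd.gp241.com",
    "Connection": "keep-alive"
}
data = {
    "bm": "gbk",
    "gpdm": "",
    "id": "20110052",
    "phone": "15666668888",
    "qudao": 98,
    "remarks": "牛有圈百度2)"
}
# Pass the dict itself; requests encodes it as key=value&key=value.
requests.post(url=url, headers=headers, data=data)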
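For reference, the difference between a form-encoded body and a JSON body can be seen without sending anything. This is a minimal sketch using only the standard library: `urllib.parse.urlencode` produces the same kind of key=value body a form post carries, and `parse_qs` stands in for a server-side form parser. It is an illustration, not the internals of requests or of the target site.

```python
import json
from urllib.parse import urlencode, parse_qs

data = {
    "bm": "gbk",
    "gpdm": "",
    "id": "20110052",
    "phone": "15666668888",
    "qudao": 98,
    "remarks": "牛有圈百度2)",
}

form_body = urlencode(data)   # key=value pairs joined by &, percent-encoded
json_body = json.dumps(data)  # one JSON object literal

print(form_body)
print(json_body)

# A form parser recovers the individual fields from the form body ...
parsed_form = parse_qs(form_body, keep_blank_values=True)
# ... but sees the JSON string as one odd field name, with no "phone" key.
parsed_json = parse_qs(json_body, keep_blank_values=True)

print(parsed_form["phone"])
print("phone" in parsed_json)
```

A server that reads the body as a form would only find `phone` in the first string.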
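But surely we can't send it just once.
So let me first introduce the "fakest" library in Python: Faker
Its ability to fabricate data is surprisingly powerful. Have a look if you're interested.
In this example we only need about two features: generating a random user-agent and a random phone number (this site doesn't even need a random user-agent: I submitted about two thousand records without a proxy IP and was never blocked).
After this, our code has learned the bare basics of disguise.
(A digression: the other day on some forum I saw someone ask why their account got banned even though they kept switching proxies... I was on my phone and had no account on that forum, so I couldn't reply. One account constantly changing IPs is itself abnormal behavior.)
With that, the code becomes: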
import requests
from faker import Faker

f = Faker(locale="zh-CN")
user_agent = f.user_agent()  # random browser user-agent string
phone = f.phone_number()     # random phone number for the zh-CN locale
url = "https://download.zslxt.com/tinterface.php"
headers = {
    "Host": "download.zslxt.com",
    "User-Agent": user_agent,
    "Accept": "*/*",
    "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "http://gpyd.gp241.com/nyqpc/bd2.html?id=20110052",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    # No hard-coded Content-Length: requests computes it from the body.
    "Origin": "http://gpyd.gp241.com",
    "Connection": "keep-alive"
}
data = {
    "bm": "gbk",
    "gpdm": "",
    "id": "20110052",
    "phone": phone,
    "qudao": 98,
    "remarks": "牛有圈百度2)"
}
# Pass the dict itself so the body matches the form Content-Type.
req = requests.post(url=url, headers=headers, data=data)
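We're nearly done. To make it convenient to send data in a loop, let's wrap this up as a function:
When I first started learning Python I really disliked writing functions. A couple of lines of code would do the job, and wrapping them in a function felt like padding the line count for no benefit. But today I came across a story:
To detect empty milk cartons, a postdoc and a farmer solved the problem in two different ways: one invented a machine, the other used a fan.
Often the problems we run into while learning something new can be solved with methods we already know. As the problems get deeper, though, sometimes only the newly learned knowledge will do. Writing functions (or wrapping things in classes) never hurts, provided the code is for practice and not for an emergency.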
import requests
from faker import Faker

f = Faker(locale="zh-CN")


def duang():
    """Send one record with a random user-agent and phone number."""
    user_agent = f.user_agent()
    phone = f.phone_number()
    url = "https://download.zslxt.com/tinterface.php"
    headers = {
        "Host": "download.zslxt.com",
        "User-Agent": user_agent,
        "Accept": "*/*",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "http://gpyd.gp241.com/nyqpc/bd2.html?id=20110052",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        # No hard-coded Content-Length: requests computes it from the body.
        "Origin": "http://gpyd.gp241.com",
        "Connection": "keep-alive"
    }
    data = {
        "bm": "gbk",
        "gpdm": "",
        "id": "20110052",
        "phone": phone,
        "qudao": 98,
        "remarks": "牛有圈百度2)"
    }
    # Pass the dict itself so the body matches the form Content-Type.
    req = requests.post(url=url, headers=headers, data=data)
    return user_agent, phone, req
Now it is easy to call. Let's write a main block to call it:
import requests
from faker import Faker

f = Faker(locale="zh-CN")


def duang():
    """Send one record with a random user-agent and phone number."""
    user_agent = f.user_agent()
    phone = f.phone_number()
    url = "https://download.zslxt.com/tinterface.php"
    headers = {
        "Host": "download.zslxt.com",
        "User-Agent": user_agent,
        "Accept": "*/*",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "http://gpyd.gp241.com/nyqpc/bd2.html?id=20110052",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        # No hard-coded Content-Length: requests computes it from the body.
        "Origin": "http://gpyd.gp241.com",
        "Connection": "keep-alive"
    }
    data = {
        "bm": "gbk",
        "gpdm": "",
        "id": "20110052",
        "phone": phone,
        "qudao": 98,
        "remarks": "牛有圈百度2)"
    }
    # Pass the dict itself so the body matches the form Content-Type.
    req = requests.post(url=url, headers=headers, data=data)
    return user_agent, phone, req


if __name__ == '__main__':
    for i in range(100000):
        user_agent, phone, req = duang()
        print(i, '\t', phone, '\t', req.status_code, '\n', user_agent)
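This just prints some information. The network dropped while I was out for a meal, so it only ran a little over 3000 times, and I won't post a screenshot here (anyone using this for practice can try improving the code so that it waits and resumes after a network drop).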
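One way to approach the "wait and resume" exercise is a small retry wrapper. The sketch below is not the code I ran: `call_with_retry` and its parameters are hypothetical names, and the doubling delay is just one reasonable choice. It catches `OSError` by default because the exceptions raised by requests derive from `IOError`/`OSError`. The `sleep` argument is injectable so the demo runs without actually waiting.

```python
import time


def call_with_retry(send, retries=5, base_delay=1.0,
                    errors=(OSError,), sleep=time.sleep):
    """Call send(); on a listed error, wait and try again.

    The delay doubles after every failure. After `retries` failed
    retries the last error is re-raised.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return send()
        except errors:
            if attempt == retries:
                raise
            sleep(delay)
            delay *= 2


# Demo with a stand-in for the network call: fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network down")
    return "ok"


waits = []  # record the delays instead of really sleeping
result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)
```

In the main loop one would call something like `call_with_retry(duang)` in place of `duang()`.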
Here is the full code (it also writes to MySQL):
import pymysql
import requests
from faker import Faker

f = Faker(locale="zh-CN")


def duang():
    """Send one record with a random user-agent and phone number."""
    user_agent = f.user_agent()
    phone = f.phone_number()
    url = "https://download.zslxt.com/tinterface.php"
    headers = {
        "Host": "download.zslxt.com",
        "User-Agent": user_agent,
        "Accept": "*/*",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "http://gpyd.gp241.com/nyqpc/bd2.html?id=20110052",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        # No hard-coded Content-Length: requests computes it from the body.
        "Origin": "http://gpyd.gp241.com",
        "Connection": "keep-alive"
    }
    data = {
        "bm": "gbk",
        "gpdm": "",
        "id": "20110052",
        "phone": phone,
        "qudao": 98,
        "remarks": "牛有圈百度2)"
    }
    # Pass the dict itself so the body matches the form Content-Type.
    req = requests.post(url=url, headers=headers, data=data)
    return user_agent, phone, req.status_code


if __name__ == '__main__':
    # Open one connection for the whole run instead of one per iteration.
    db = pymysql.connect(host="localhost", user="root",
                         password="xiaoyan", database="python")
    cur = db.cursor()
    try:
        for i in range(100000):
            user_agent, phone, status_code = duang()
            # Parameterized query: the driver escapes the values,
            # so a quote inside the user-agent cannot break the SQL.
            cur.execute(
                "INSERT INTO python1duang VALUES (default, %s, %s, %s)",
                (user_agent, phone, status_code),
            )
            db.commit()
            print(i, '\t', phone, '\t', status_code, '\n', user_agent)
    finally:
        db.close()
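A note on the INSERT: passing the values as query parameters is safer than pasting them into the SQL string, because a quote character inside a user-agent would otherwise end the string early. The sketch below shows the idea with the standard library's sqlite3 in place of MySQL so it runs anywhere. sqlite3 uses `?` placeholders where pymysql uses `%s`, and the table layout here is an assumption made for the demo, not my real table definition.

```python
import sqlite3

# In-memory database as a stand-in for the MySQL server.
db = sqlite3.connect(":memory:")
cur = db.cursor()

# Assumed layout: an auto-incrementing id plus the three logged values.
cur.execute(
    "CREATE TABLE python1duang ("
    "id INTEGER PRIMARY KEY, user_agent TEXT, phone TEXT, status_code INTEGER)"
)

# A user-agent containing a single quote, which would break naive
# string formatting of the SQL statement.
row = ("Mozilla/5.0 (it's a test)", "15666668888", 200)

# Placeholders: the driver handles quoting of the values.
cur.execute(
    "INSERT INTO python1duang (user_agent, phone, status_code) VALUES (?, ?, ?)",
    row,
)
db.commit()

cur.execute("SELECT user_agent, phone, status_code FROM python1duang")
stored = cur.fetchone()
print(stored)
db.close()
```

The connection is opened once and closed once, which is also cheaper than reconnecting on every loop iteration.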



