Analysis result: the goal is to crawl the data returned by URLs such as:
http://xxx.com?method=getrequest&gesnum=00000001
http://xxx.com?method=getrequest&gesnum=00000002
http://xxx.com?method=getrequest&gesnum=00000003
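The three URLs differ only in the zero-padded gesnum value, so they can be generated rather than typed out. A minimal sketch, assuming the counter is always eight digits wide; `BASE_URL` and `build_urls` are illustrative names that do not appear in the original script:

```python
BASE_URL = 'http://xxx.com?method=getrequest&gesnum='

def build_urls(first, last):
    """Return the URLs for gesnum values first..last (inclusive)."""
    # '{:08d}' pads the number with leading zeros to eight digits
    return [BASE_URL + '{:08d}'.format(n) for n in range(first, last + 1)]

for url in build_urls(1, 3):
    print(url)
```

Building the list in memory this way would also avoid the intermediate num.txt file, at the cost of not having the numbers on disk for inspection.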
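The returned JSON contains a lone escape character “\”, which Python 3 could not handle when the response was parsed with .json():
req = requests.get(url=url, headers=headers, verify=False, timeout=60).json()
So the response body is read as bytes-type binary data and decoded by hand instead:
req = requests.get(url=url, headers=headers, verify=False, allow_redirects=False, timeout=60)
data = json.dumps(bytes.decode(req.content, 'UTF-8'))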
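To make the failure concrete, here is a minimal sketch with a made-up payload (the real response is not shown in this post). `parse_lenient` is a hypothetical helper and an alternative to the json.dumps workaround, not what the script below does: it doubles any backslash that does not start a valid JSON escape, then parses again.

```python
import json
import re

# A backslash followed by a valid JSON escape character is kept as is;
# any other backslash is treated as a stray one.
_ESCAPE = re.compile(r'\\(["\\/bfnrtu])|\\')

def parse_lenient(raw):
    """Decode bytes as UTF-8 and parse JSON, repairing stray backslashes."""
    text = raw.decode('utf-8')
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Double every stray backslash so JSON reads it as a literal backslash.
        repaired = _ESCAPE.sub(
            lambda m: m.group(0) if m.group(1) else '\\\\', text)
        return json.loads(repaired)

# Made-up payload: the body contains C:\data, and "\d" is not a valid JSON escape.
raw = b'{"email": "user@example.com", "path": "C:\\data"}'
result = parse_lenient(raw)
print(result['email'])
```

This sketch does not repair every malformed input; for example, a `\u` that is not followed by four hex digits would still fail. When only a regex search over the text is needed, as in this post, decoding the bytes and skipping JSON parsing altogether is the simpler route.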
#!/usr/bin/python3
# -*- coding:utf-8 -*-
# Written on Windows 7 x64, Notepad++ + Python 3.5.0
import re
import json

import requests
import urllib3

# Suppress the warning that verify=False would otherwise trigger on each request
urllib3.disable_warnings()

cookie = '''JSESSIONID=1B7407076DE01727BC48DCD56FF9BA70; entsoft=entsoft; JSESSIONID=4877B5AC1DF6307E90CF1641D3863A6C; radId=45991FBF-0BC4-3BA4-08E2-00072022FB2C'''
headers = {
    'Accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cookie': cookie,
}

# Write 00000001-00000300 to num.txt, one number per line
def getNum():
    filename = 'C:\\Users\\Administrator\\Desktop\\腳本\\num.txt'
    with open(filename, 'w') as file:
        # range() excludes its upper bound, so 301 is needed to reach 00000300
        for i in range(1, 301):
            file.write(("%08d" % i) + '\n')

def main():
    #url = 'http://xxx.com?method=getrequest&gesnum=00000001'
    getNum()
    filename = 'C:\\Users\\Administrator\\Desktop\\腳本\\num.txt'
    with open(filename, 'r') as file:
        for line in file:
            # Strip the trailing newline so it does not end up in the URL
            url = 'http://xxx.com?method=getrequest&gesnum={line}'.format(line=line.strip())
            #print(url)
            #req = requests.get(url=url, headers=headers, verify=False, timeout=60).json()
            # Problem: the JSON data contains a lone escape character "\" that
            # .json() could not handle, so the approach below is used instead
            req = requests.get(url=url, headers=headers, verify=False, allow_redirects=False, timeout=60)
            #print(req.content)
            # response.text returns the body as unicode text
            # response.content returns the body as bytes
            # Using the unicode text raised an error, so the bytes are decoded explicitly
            # json.dumps only serializes the decoded text as a JSON string literal;
            # nothing is parsed here, so the stray "\" no longer raises
            data = json.dumps(bytes.decode(req.content, 'UTF-8'))
            #print(data)
            # Match an email address with a regular expression
            emailRegex = r"[-_\w\.]{0,64}@([-\w]{1,63}\.)*[-\w]{1,63}"
            email = re.search(emailRegex, data)
            print(email)

if __name__ == '__main__':
    main()
<_sre.SRE_Match object; span=(158, 184), match='xxxx@hotmail.com'>
<_sre.SRE_Match object; span=(145, 170), match='xxxx@nordictelecom.net'>
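print(email) shows the whole match object, which is why the output above carries the span and the surrounding wrapper. A minimal sketch of pulling out just the address with Match.group(0); `extract_email` and the sample text are illustrative, while the pattern is the one used in the script:

```python
import re

EMAIL_REGEX = r"[-_\w\.]{0,64}@([-\w]{1,63}\.)*[-\w]{1,63}"

def extract_email(text):
    """Return the first email address found in text, or None."""
    match = re.search(EMAIL_REGEX, text)
    # group(0) is the full matched string; re.search returns None on no match
    return match.group(0) if match else None

# Made-up response text standing in for the decoded body
sample = '{"name": "x", "email": "user@example.com"}'
print(extract_email(sample))
```

Writing the result of group(0) straight to a file would make a separate clean-up pass over the printed output unnecessary.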
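#!/usr/bin/python3
# -*- coding:utf-8 -*-
# Written on Windows 7 x64, Notepad++ + Python 3.5.0
def main():
    # email_handle.txt is assumed to hold the output printed by the previous script
    filename = "C:\\Users\\Administrator\\Desktop\\腳本\\email_handle.txt"
    filename1 = "C:\\Users\\Administrator\\Desktop\\腳本\\email_handle_handle.txt"
    # Both files are closed automatically when the with block exits
    with open(filename, 'r') as file, open(filename1, 'w') as file1:
        for line in file:
            # Drop the first 48 characters, i.e. a prefix such as
            # "<_sre.SRE_Match object; span=(158, 184), match='"
            # (this only lines up when both span offsets have three digits)
            data = line[48:]
            print(data)
            file1.write(data)

if __name__ == '__main__':
    main()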
xxxx@hotmail.com'>
xxxx@nordictelecom.net'>
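The fixed slice line[48:] only lines up when both span offsets have three digits, and it leaves the trailing '> in place, as the output above shows. A sketch of a more tolerant alternative that reads the match='...' field with a regular expression; `emails_from_repr_lines` is a hypothetical name, and the third sample line is made up to show a shorter span:

```python
import re

# Capture whatever sits between match=' and the next single quote
MATCH_FIELD = re.compile(r"match='([^']*)'")

def emails_from_repr_lines(lines):
    """Return the text of the match='...' field from each line that has one."""
    found = []
    for line in lines:
        m = MATCH_FIELD.search(line)
        if m:
            found.append(m.group(1))
    return found

sample = [
    "<_sre.SRE_Match object; span=(158, 184), match='xxxx@hotmail.com'>\n",
    "<_sre.SRE_Match object; span=(145, 170), match='xxxx@nordictelecom.net'>\n",
    "<_sre.SRE_Match object; span=(45, 61), match='user@example.com'>\n",
]
print(emails_from_repr_lines(sample))
```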
Two ways to use cookies in a Python crawler
https://blog.csdn.net/weixin_38706928/article/details/80376572
Python3: text-encoding problems such as UnicodeDecodeError/UnicodeEncodeError: 'gbk' codec can't decode/encode bytes
https://www.cnblogs.com/worstprogrammer/p/5189758.html
Simulating login with Python (using the requests library)
https://blog.csdn.net/majianfei1023/article/details/49927969
Certificate verification and disabling warnings in Python's urllib3 package
https://blog.csdn.net/taiyangdao/article/details/72825735
Online JSON parsing, formatting, and validation
https://www.json.cn/
