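Problems encountered:

1. Encoding of the user's nickname.

The cookie sent with the request to https://m.weibo.cn/api/container/getSecond?containerid=1005055232655493_-_FOLLOWERS&page=4 contains a parameter, H5_INDEX_TITLE, whose value is the userName after urlencode. The userName itself can be found in the page source of m.weibo.cn once you are logged in to Weibo.

In the page source, userName is escaped: Chinese and other non-ASCII characters are written as Unicode escapes, while English letters are left as they are. I extract it with a regular expression, and in a match such as Meeeeeeeeeeeeee\u4e36 the \u4e36 is not interpreted as an escape sequence, so it never becomes a Chinese character. As a result, urlencoding the matched userName does not give the H5_INDEX_TITLE value we want, because \u4e36 is encoded as a plain six-character string.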
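To see the problem concretely, the following minimal sketch runs the same kind of regular expression over a made-up fragment of page source. The `page_source` string is an assumption for illustration only, not the real m.weibo.cn markup. The match ends in a literal backslash followed by `u4e36`, six separate characters, rather than the single character 丶.

```python
import re

# Hypothetical fragment of page source; the real page is much larger.
page_source = r'{"userName":"Meeeeeeeeeeeeee\u4e36","uid":"0"}'

# Same pattern idea as in the article: capture the value of "userName".
raw_name = re.findall(r'"userName":"(.*?)"', page_source)[0]

print(raw_name)          # the escape is still spelled out as text
print(raw_name[-6:])     # the last six characters: \ u 4 e 3 6
print('丶' in raw_name)   # False: the Chinese character itself is not there
```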

Solution: use decode

username.encode('utf-8').decode('unicode_escape'): decode('unicode_escape') interprets the literal \u escape sequences and turns them into the characters they represent.
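A minimal sketch of this fix, using the example value from above. It also shows `json.loads` as an alternative, on the assumption (not confirmed by the article) that the nickname in the page source is the body of a JSON string literal. The helper names `decode_with_unicode_escape` and `decode_with_json` are made up for illustration. Note that the `unicode_escape` route is only safe when the input is pure ASCII apart from the escapes, because that codec decodes the remaining bytes as Latin-1.

```python
import json


def decode_with_unicode_escape(raw):
    # Turn literal \uXXXX sequences into the characters they stand for.
    # Safe only if `raw` is ASCII apart from the escapes themselves.
    return raw.encode('utf-8').decode('unicode_escape')


def decode_with_json(raw):
    # Assumption: `raw` is the body of a JSON string literal, so wrapping it
    # in double quotes and parsing it resolves the escapes.
    return json.loads('"' + raw + '"')


raw_name = 'Meeeeeeeeeeeeee\\u4e36'   # what the regex match looks like
print(decode_with_unicode_escape(raw_name))
print(decode_with_json(raw_name))
```

Both calls print the nickname with the final character shown as 丶 instead of the escape text.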

Reference: "Python: displaying Unicode escapes read from a file as Chinese characters"

2. urlencode

from urllib import parse

query = {
    'name': 'username',
    'age': 20,
}
parse.urlencode(query)
# 'name=username&age=20'
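Applied to the nickname problem from section 1, the difference shows up directly in the encoded output. This is a small sketch using the example nickname; whether the server compares the cookie value byte for byte is not something the article verifies.

```python
from urllib import parse

raw_name = 'Meeeeeeeeeeeeee\\u4e36'                           # regex match, escape still literal
nickname = raw_name.encode('utf-8').decode('unicode_escape')  # decoded nickname

# Encoding the raw match percent-encodes the backslash itself.
encoded_raw = parse.urlencode({'H5_INDEX_TITLE': raw_name})
# Encoding the decoded nickname percent-encodes the UTF-8 bytes of the character.
encoded_nickname = parse.urlencode({'H5_INDEX_TITLE': nickname})

print(encoded_raw)
print(encoded_nickname)
```

The first value ends in %5Cu4e36 (an encoded backslash followed by plain text), while the second ends in %E4%B8%B6, the three UTF-8 bytes of 丶.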

Online encoding/decoding tool: https://tool.lu/encdec/

The code is as follows:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import requests
import time
import random
import re
import csv
from urllib import parse
# user_agent is a local helper module that is not shown in this post;
# getUserAgent() is expected to return a User-Agent string.
from user_agent import getUserAgent


class GetMweiboFollow(object):

    def __init__(self, username, password):
        '''
        Store username and password on the instance, and use a requests Session()
        so that the login state is kept after logging in to Weibo.
        :param username: Sina Weibo login account (email, mobile number, etc.; QQ login is not supported)
        :param password: account password
        '''
        self.__username = username
        self.__password = password
        self.request = requests.Session()

    def login_mweibo(self):
        # Log in to Weibo. Returns (True, uid, cookie_info) on success and None on failure.
        print('Disable Weibo login protection before logging in!')
        user_agent = getUserAgent()
        # Content-Length is not set by hand: requests computes it from the form data.
        headers = {
            'Accept': '*/*',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.8',
            'Cache-Control': 'no-cache',
            'Connection': 'keep-alive',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Host': 'passport.weibo.cn',
            'Origin': 'https://passport.weibo.cn',
            'Pragma': 'no-cache',
            'Referer': 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F',
            # http%3A%2F%2Fm.weibo.cn%2F is the urlencoded form of http://m.weibo.cn/
            'User-Agent': user_agent
        }
        data = {
            'username': self.__username,
            'password': self.__password,
            'savestate': '1',
            'r': 'http://m.weibo.cn/',
            'ec': '0',
            'pagerefer': 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F',
            'entry': 'mweibo',
            'wentry': '',
            'loginfrom': '',
            'client_id': '',
            'code': '',
            'qq': '',
            'mainpageflag': '1',
            'hff': '',
            'hfp': ''
        }
        login_url = 'https://passport.weibo.cn/sso/login'
        try:
            time.sleep(random.uniform(1.0, 2.5))
            login_response = self.request.post(login_url, headers=headers, data=data)
            result = login_response.json()
            # The HTTP status code is 200 whether or not the login succeeds, so check the JSON body:
            # a successful login has retcode 20000000 (and an empty msg).
            if str(result.get('retcode')) != '20000000':
                print('Login failed: {}'.format(result.get('msg')))
                return None
            print('{} logged in to Weibo successfully!'.format(data['username']))
            self.uid = result['data']['uid']   # the user ID (uid)
            # Set-Cookie from this response is used later to build the Cookie header
            self.cookie_info = login_response.headers['Set-Cookie']
            return True, self.uid, self.cookie_info
        except Exception as e:
            print('Error:', e.args)
            return None

    def get_cookies(self):
        # Build the Cookie header that get_follow_url() sends when requesting the follow list.
        # A regular expression extracts SUB, SUHB, SCF and SSOLoginState from the login Set-Cookie header.
        comp = re.compile(r'SUB=(.*?);.*?SUHB=(.*?);.*?SCF=(.*?);.*?SSOLoginState=(.*?);.*?ALF=(.*?);.*?')
        reg_info = re.findall(comp, self.cookie_info)[0]
        SUB, SUHB, SCF, SSOLoginState = reg_info[:4]   # ALF (reg_info[4]) is not needed
        # The cookie name is SUHB, matching the name captured by the regular expression above.
        m_weibo_cookie = 'SUB={};SUHB={};SCF={};SSOLoginState={}'.format(SUB, SUHB, SCF, SSOLoginState)
        headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'Cookie': m_weibo_cookie,
            'Host': 'm.weibo.cn',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': getUserAgent()
        }
        # Request m.weibo.cn to obtain the remaining cookie values, _T_WM and H5_INDEX_TITLE.
        # MLOGIN, H5_INDEX and WEIBOCN_FROM have fixed values.
        # H5_INDEX_TITLE is the user's nickname after urlencode.
        m_weibo_resp = self.request.get('https://m.weibo.cn/', headers=headers)
        username = re.findall(r'"userName":"(.*?)"', m_weibo_resp.text)[0]
        # In the page source, Chinese characters in the nickname appear as Unicode escapes (e.g. \u4e5d)
        # while English letters do not. Encode the string to bytes, then decode with unicode_escape
        # so that the literal \u sequences become real characters.
        username_unicode = username.encode('utf-8').decode('unicode_escape')
        _T_WM = re.findall(r'_T_WM=(.*?);', m_weibo_resp.headers['Set-Cookie'])[0]
        MLOGIN = 1
        H5_INDEX = 3
        WEIBOCN_FROM = 1110006030
        # urlencode returns the complete 'H5_INDEX_TITLE=<encoded nickname>' pair
        H5_INDEX_TITLE = parse.urlencode({'H5_INDEX_TITLE': username_unicode})
        self.build_weibo_cookie = m_weibo_cookie + ';' \
                                  + '_T_WM' + '=' + _T_WM + ';' \
                                  + 'MLOGIN' + '=' + str(MLOGIN) + ';' \
                                  + 'H5_INDEX' + '=' + str(H5_INDEX) + ';' \
                                  + H5_INDEX_TITLE + ';' \
                                  + 'WEIBOCN_FROM' + '=' + str(WEIBOCN_FROM)

    def get_follow_url(self, page=1):
        # API endpoint for the follow list (loaded via Ajax); each page holds at most ten entries.
        # Returns (response, max_page) for the requested page, or None if the request fails.
        user_agent = getUserAgent()
        contain_uid = '100505' + str(self.uid)
        params = {'containerid': '{}_-_FOLLOWERS'.format(contain_uid)}
        cookie = self.build_weibo_cookie
        if page > 1:
            # Pages after the first need the page parameter and one extra cookie value.
            params['page'] = page
            cookie += ';M_WEIBOCN_PARAMS=fid%3D{}_-_FOLLOWERS%26uicode%3D10000012'.format(contain_uid)
        headers = {
            'Accept': 'application/json, text/plain, */*',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            'Cookie': cookie,
            'Host': 'm.weibo.cn',
            'Referer': 'https://m.weibo.cn/p/second?containerid={}'.format(params['containerid']),
            'User-Agent': user_agent,
            'X-Requested-With': 'XMLHttpRequest'
        }
        follow_url = 'https://m.weibo.cn/api/container/getSecond'
        try:
            time.sleep(random.uniform(0.5, 2.7))
            resp = self.request.get(follow_url, headers=headers, params=params)
            if resp.status_code == 200:
                follow_maxPage = int(resp.json()['data']['maxPage'])
                return resp, max(follow_maxPage, 1)
        except Exception as e:
            print(e.args)
        return None

    def get_follow(self, response):
        # Yield one dict per followed user found in a page of the follow-list API.
        follow_info = response.json()['data']['cards']
        for info in follow_info:
            user = info['user']
            follow = {'id': user['id'],
                      'screen_name': user['screen_name'],
                      'gender': user['gender'],
                      'description': user['description'],
                      'followers_count': user['followers_count'],
                      'follow_count': user['follow_count'],
                      'statuses_count': user['statuses_count'],
                      'scheme': info['scheme']
                      }
            # A JSON true is parsed to Python True; the string 'true' is also accepted
            # in case the API returns the flag as text.
            if user.get('verified') in (True, 'true'):
                follow['verified_reason'] = user.get('verified_reason', 'None')
            else:
                follow['verified_reason'] = 'None'
            yield follow

    def write_to_csv(self, response=None, has_title=True):
        # has_title=True: (re)create follow.csv and write the column names; normally done once.
        # has_title=False: append the users found in `response` to follow.csv.
        if has_title:
            with open('follow.csv', 'w', encoding='utf-8', newline='') as file:
                follow_title = csv.writer(file, delimiter=',')
                follow_title.writerow(['id', 'screen_name', 'gender', 'description', 'follow_count', 'followers_count',
                                       'statuses_count', 'scheme', 'verified_reason'])
        else:
            with open('follow.csv', 'a', encoding='utf-8', newline='') as file:
                follow = csv.writer(file, delimiter=',')
                for data in self.get_follow(response):
                    print(data)
                    follow.writerow([data['id'], data['screen_name'], data['gender'], data['description'],
                                     data['follow_count'], data['followers_count'], data['statuses_count'],
                                     data['scheme'], data['verified_reason']])


def main():
    user = input('user:')
    passwd = input('password:')
    start_time = time.time()
    gkp = GetMweiboFollow(user, passwd)
    if not gkp.login_mweibo():   # stop here if the login did not succeed
        print('Login failed; aborting.')
        return
    gkp.get_cookies()
    first = gkp.get_follow_url()   # request page 1 once and reuse the result
    if first is None:
        print('Failed to fetch the follow list!')
    else:
        response, follow_maxPage = first
        gkp.write_to_csv(has_title=True)
        gkp.write_to_csv(response, has_title=False)
        # Fetch the remaining pages, if any, and append them to the csv file.
        for page in range(2, follow_maxPage + 1):
            result = gkp.get_follow_url(page)
            if result is None:
                print('Failed to fetch page {}; stopping.'.format(page))
                break
            gkp.write_to_csv(result[0], has_title=False)
    end_time = time.time()
    print('Elapsed time:', end_time - start_time)


if __name__ == '__main__':
    main()
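
As an alternative to calling `write_to_csv` once for the header and once per page, the header and the rows can be written in a single pass with `csv.DictWriter`. The sketch below is not the author's code: `FIELDS`, `write_follows` and the sample record are hypothetical, and it writes to an in-memory buffer so that it can be run without logging in.

```python
import csv
import io

# Column order taken from the header row written by write_to_csv.
FIELDS = ['id', 'screen_name', 'gender', 'description', 'follow_count',
          'followers_count', 'statuses_count', 'scheme', 'verified_reason']


def write_follows(file, follows):
    # Write the header once, then one row per dict.
    writer = csv.DictWriter(file, fieldnames=FIELDS)
    writer.writeheader()
    for follow in follows:
        writer.writerow(follow)


# Made-up record with the same keys that get_follow() yields.
sample = [{'id': 1, 'screen_name': 'example', 'gender': 'm', 'description': '',
           'follow_count': 10, 'followers_count': 20, 'statuses_count': 30,
           'scheme': 'https://m.weibo.cn/', 'verified_reason': 'None'}]

buffer = io.StringIO()
write_follows(buffer, sample)
print(buffer.getvalue())
```

With a real file, open it with newline='' as the original code does, and pass the generator returned by get_follow() as `follows`.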
