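Common request headers:
host: the domain name of the site, e.g. www.lagou.com
content-type: the media type of the request body
user-agent: identifies the client (browser or program) that sends the request
cookie: the cookies sent along with the request
referer: the URL of the page the request came from (the previous page)
Location: (a response header) the address to redirect to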
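As a rough illustration of how these headers end up in a request, here is a minimal sketch. The helper name build_headers and the constant DEFAULT_UA are hypothetical (they do not come from any library); the resulting dict is the kind of value that can be passed as headers= to requests.get or requests.post.

```python
from urllib.parse import urlsplit

# A browser-like User-Agent string (taken from the examples in these notes).
DEFAULT_UA = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36')


def build_headers(url, referer=None, user_agent=DEFAULT_UA):
    """Hypothetical helper: return common request headers for `url`."""
    headers = {
        'Host': urlsplit(url).netloc,  # the domain part of the URL
        'User-Agent': user_agent,      # many sites reject requests without it
    }
    if referer:
        headers['Referer'] = referer   # the page the request "came from"
    return headers


headers = build_headers('https://github.com/session',
                        referer='https://github.com/login')
print(headers['Host'])
```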
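Scraping Chouti:
Note: the most common anti-scraping check is to verify that the request carries a user-agent header, so always send this header when scraping.
Auto-login to Chouti and viewing the personal page
Note: the requests must carry cookies. They can be read from the response: the requests module wraps the cookies returned with a response, and ret.cookies.get_dict() returns them as a dict.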
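To make the idea of "cookies as a dict" concrete, here is a small standard-library sketch of what turning Set-Cookie headers into a name/value dict amounts to. It is not how requests implements get_dict(); the function name and the sample header values are made up for illustration.

```python
from http.cookies import SimpleCookie


def set_cookie_headers_to_dict(set_cookie_values):
    """Hypothetical helper: collect name/value pairs from Set-Cookie header values."""
    cookies = {}
    for raw in set_cookie_values:
        parsed = SimpleCookie()
        parsed.load(raw)  # parses "name=value; Path=/; ..." into morsels
        for name, morsel in parsed.items():
            cookies[name] = morsel.value
    return cookies


# Made-up header values, similar in shape to what a login page returns.
sample = ['sessionid=abc123; Path=/; HttpOnly', 'lang=zh; Path=/']
cookie_dict = set_cookie_headers_to_dict(sample)
print(cookie_dict)
```

A dict of this shape is what gets passed back as cookies= on the next request.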
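Auto-login to GitHub
Fetch the login page
Send the login POST request
The POST request goes to https://github.com/session. Besides the required login name and password, the form data also needs the commit and utf8 parameters, plus a dynamic parameter, authenticity_token, which changes on every request. Where does this parameter come from?
Parameters like this are usually either hidden in the page, like the csrf token in Django, or generated dynamically in JavaScript by some algorithm.
GitHub's authenticity_token is a hidden field in the page.
It can be parsed out directly with BeautifulSoup.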
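The notes use BeautifulSoup for this step. As a dependency-free sketch of the same idea, the hidden field can also be read with the standard library's HTMLParser. The class and function names and the sample HTML below are assumptions made for illustration, not GitHub's real markup.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Remember the value of the first <input> whose name matches."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if (tag == 'input' and self.value is None
                and attr_map.get('name') == self.field_name):
            self.value = attr_map.get('value')


def extract_hidden_value(html, field_name):
    """Hypothetical helper: return the field's value, or None if absent."""
    parser = HiddenInputParser(field_name)
    parser.feed(html)
    return parser.value


# Made-up login form, only to exercise the parser.
sample_html = ('<form><input type="hidden" name="authenticity_token" '
               'value="tok-123" /></form>')
token = extract_hidden_value(sample_html, 'authenticity_token')
print(token)
```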
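The POST request must carry cookies
Every response may set cookies. When viewing a page, the cookies to send are either those from the previous response, those from the first visit to the site, or a combination of the cookies from two or more responses.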
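Combining cookies from several responses comes down to merging dicts, with later responses overriding earlier ones. A minimal sketch follows; the helper name and the cookie names and values are invented for illustration.

```python
def merge_cookies(*cookie_dicts):
    """Hypothetical helper: merge cookie dicts, later ones win on conflicts."""
    merged = {}
    for cookies in cookie_dicts:
        merged.update(cookies)
    return merged


# Made-up cookies from a first visit and from the login response.
first_visit = {'session': 'aaa', 'logged_in': 'no'}
after_login = {'session': 'bbb', 'user_token': 'ccc'}

all_cookies = merge_cookies(first_visit, after_login)
print(all_cookies)
```

As a design note, requests.Session keeps cookies across requests automatically, which may be simpler when no manual control over the cookie set is needed.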
Code example:

import requests
from bs4 import BeautifulSoup

# Headers shared by the login POST and the settings-page GET
headers = {
    'Host': 'github.com',
    'Origin': 'https://github.com',
    'Pragma': 'no-cache',
    'Referer': 'https://github.com/login',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36',
}

# ########## GET the login page ##########
ret = requests.get(url='https://github.com/login')
ret_cookie_dict = ret.cookies.get_dict()

# Read the hidden authenticity_token field from the login page
soup = BeautifulSoup(ret.text, 'html.parser')
token = soup.find(name='input', attrs={'name': 'authenticity_token'}).attrs.get('value')

# ########## POST the login form ##########
ret1 = requests.post(
    url='https://github.com/session',
    headers=headers,
    data={
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': token,
        'login': 'Zhao-panpan',
        'password': 'xxxxx',
    },
    cookies=ret_cookie_dict,
)
ret1_cookie_dict = ret1.cookies.get_dict()

# ########## View the email settings page ##########
ret3 = requests.get(
    url='https://github.com/settings/emails',
    headers=headers,
    cookies=ret1_cookie_dict,
)
print(ret3.text)
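Note: requests often carry custom headers as well, such as code or token.
Scraping Lagou
Logging in sends a POST request to the login.json URL
Format of the request headers and the request data
After clicking login, the browser jumps to the home page https://www.lagou.com/. This takes four requests: three redirects and one final request.
Notes:
The request header x-requested-with: XMLHttpRequest marks the request as an Ajax request.
When the response body is a JSON string rather than HTML, res.json() returns the parsed content (a dict for a JSON object).
Passing allow_redirects=False when sending a request stops the requests module from following redirects automatically.
Difference between the data and json parameters of a POST request: both are sent as the request body. With data, the body is form-encoded (content-type application/x-www-form-urlencoded) and looks like name=xxxx&age=18. With json, the data is serialized to a JSON string and sent with content-type application/json.
When the browser's developer tools show the submitted data under Request Payload, it is usually JSON; in that case pass it as json={'k1': 'v1'}.
To inspect redirects in the browser, tick Preserve log in the Network panel so that the requests of every redirect are kept.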
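To make the data/json difference concrete, the sketch below builds both body encodings by hand with the standard library. It only illustrates what ends up in the request body; requests performs the equivalent encoding itself when data= or json= is passed.

```python
import json
from urllib.parse import urlencode

payload = {'name': 'xxxx', 'age': 18}

# What data=payload puts in the body
# (Content-Type: application/x-www-form-urlencoded)
form_body = urlencode(payload)

# What json=payload puts in the body, up to whitespace
# (Content-Type: application/json)
json_body = json.dumps(payload)

print(form_body)  # name=xxxx&age=18
print(json_body)  # {"name": "xxxx", "age": 18}
```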
Code example:

import re
import requests

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'

# Cookies accumulated from every response
all_cookie_dict = {}

# ########## Step 1: visit the login page ##########
r1 = requests.get(
    url='https://passport.lagou.com/login/login.html',
    headers={'User-Agent': USER_AGENT},
)
# The anti-forgery token and code are embedded in the page's JavaScript
token = re.findall("X_Anti_Forge_Token = '(.*)';", r1.text)[0]
code = re.findall("X_Anti_Forge_Code = '(.*)';", r1.text)[0]
all_cookie_dict.update(r1.cookies.get_dict())

# ########## Step 2: log in ##########
r2 = requests.post(
    url='https://passport.lagou.com/login/login.json',
    data={
        'isValidate': 'true',
        'username': '15131255089',
        'password': '4565465',
        'request_form_verifyCode': '',
        'submit': '',
    },
    headers={
        'User-Agent': USER_AGENT,
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Host': 'passport.lagou.com',
        'Origin': 'https://passport.lagou.com',
        'Referer': 'https://passport.lagou.com/login/login.html',
        'X-Anit-Forge-Code': code,
        'X-Anit-Forge-Token': token,
    },
    cookies=all_cookie_dict,
)
r2_response_json = r2.json()
all_cookie_dict.update(r2.cookies.get_dict())

# ########## Step 3: grant ##########
r3 = requests.get(
    url='https://passport.lagou.com/grantServiceTicket/grant.html',
    headers={
        'User-Agent': USER_AGENT,
        'Referer': 'https://passport.lagou.com/login/login.html',
        'Host': 'passport.lagou.com',
    },
    cookies=all_cookie_dict,
    allow_redirects=False,
)
all_cookie_dict.update(r3.cookies.get_dict())

# Headers shared by the redirect requests in steps 4 to 7
redirect_headers = {
    'User-Agent': USER_AGENT,
    'Referer': 'https://passport.lagou.com/login/login.html',
    'Host': 'www.lagou.com',
    'Upgrade-Insecure-Requests': '1',
}

# ########## Step 4: action ##########
r4 = requests.get(
    url=r3.headers['Location'],
    headers=redirect_headers,
    cookies=all_cookie_dict,
    allow_redirects=False,
)
all_cookie_dict.update(r4.cookies.get_dict())

# ########## Step 5: fetch the authentication info ##########
r5 = requests.get(
    url=r4.headers['Location'],
    headers=redirect_headers,
    cookies=all_cookie_dict,
    allow_redirects=False,
)
all_cookie_dict.update(r5.cookies.get_dict())
print(r5.headers['Location'])

# ########## Step 6 ##########
r6 = requests.get(
    url=r5.headers['Location'],
    headers=redirect_headers,
    cookies=all_cookie_dict,
    allow_redirects=False,
)
all_cookie_dict.update(r6.cookies.get_dict())
print(r6.headers['Location'])

# ########## Step 7 ##########
r7 = requests.get(
    url=r6.headers['Location'],
    headers=redirect_headers,
    cookies=all_cookie_dict,
    allow_redirects=False,
)
all_cookie_dict.update(r7.cookies.get_dict())

# ########## Step 8: view personal info ##########
r8 = requests.get(
    url='https://gate.lagou.com/v1/neirong/account/users/0/',
    headers={
        'User-Agent': USER_AGENT,
        'Host': 'gate.lagou.com',
        'Pragma': 'no-cache',
        'Referer': 'https://account.lagou.com/v2/account/userinfo.html',
        'X-L-REQ-HEADER': '{deviceType:1}',
    },
    cookies=all_cookie_dict,
)
r8_response_json = r8.json()
# print(r8_response_json)
all_cookie_dict.update(r8.cookies.get_dict())

# ########## Step 9: update personal info ##########
# The PUT needs the submitCode and submitToken returned by step 8
r9 = requests.put(
    url='https://gate.lagou.com/v1/neirong/account/users/0/',
    headers={
        'User-Agent': USER_AGENT,
        'Host': 'gate.lagou.com',
        'Origin': 'https://account.lagou.com',
        'Referer': 'https://account.lagou.com/v2/account/userinfo.html',
        'X-L-REQ-HEADER': '{deviceType:1}',
        'X-Anit-Forge-Code': r8_response_json.get('submitCode'),
        'X-Anit-Forge-Token': r8_response_json.get('submitToken'),
        'Content-Type': 'application/json;charset=UTF-8',
    },
    json={"userName": "wupeiqi999", "sex": "MALE", "portrait": "images/myresume/default_headpic.png", "positionName": "...", "introduce": "...."},
    cookies=all_cookie_dict,
)
print(r9.text)
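The redirect steps in the Lagou example repeat one pattern: request a URL without automatic redirects, collect the cookies, then request the address in the Location header. A rough sketch of that pattern as a loop is shown below. The function follow_redirects and the fetch callback are hypothetical; the callback is injected so the sketch runs without network access, and the fake site data is invented. It also assumes that every Location value is an absolute URL.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(fetch, start_url, cookies, max_hops=10):
    """Follow Location headers by hand, accumulating cookies in `cookies`.

    `fetch(url, cookies)` must return (status_code, headers, new_cookies).
    Returns the final URL and the list of URLs visited.
    """
    url = start_url
    visited = []
    for _ in range(max_hops):
        status, headers, new_cookies = fetch(url, cookies)
        visited.append(url)
        cookies.update(new_cookies)  # keep every cookie seen so far
        location = headers.get('Location')
        if status not in REDIRECT_CODES or not location:
            return url, visited
        url = location
    raise RuntimeError('too many redirects')


# Invented stand-in for a site that redirects twice before answering.
FAKE_SITE = {
    'https://example.com/a': (302, {'Location': 'https://example.com/b'}, {'step': '1'}),
    'https://example.com/b': (302, {'Location': 'https://example.com/c'}, {'ticket': 'xyz'}),
    'https://example.com/c': (200, {}, {'login': 'ok'}),
}


def fake_fetch(url, cookies):
    return FAKE_SITE[url]


jar = {}
final_url, hops = follow_redirects(fake_fetch, 'https://example.com/a', jar)
print(final_url, jar)
```

With requests, the fetch callback could wrap requests.get(url, cookies=cookies, allow_redirects=False) and return the response's status_code, headers and cookies.get_dict().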
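Scraping Douyin
Visiting a user's page sends an Ajax request for all of that user's works
In that Ajax request, the _signature query parameter varies; it is generated by JavaScript code
The JS that generates it must have been loaded before the Ajax request is sent
Search all the JS files for _signature first
Then search for signature
signature is produced by the sign function of _bytedAcrawler, called with uid (the user ID)
Then search for _bytedAcrawler
The sign method comes from an imported module: require("douyin_falcon:node_modules/byted-acrawler/dist/runtime")
Next, look through all the static resources in the Sources panel
Find these JS files, put them into an HTML page, pass in the data, and console.log the result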
<!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>Title</title> </head> <body> <script> __M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function (l, e) { Function(function (l) { return 'e(e,a,r){(b[e]||(b[e]=t("x,y","x "+e+" y")(r,a)}a(e,a,r){(k[r]||(k[r]=t("x,y","new x[y]("+Array(r+1).join(",x[y]")(1)+")")(e,a)}r(e,a,r){n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t)s[n="$"+t]=r[n];for(t=0,b=s=a;t<b;t)s[t]=a[t];c(e,0,s)}c(t,b,k){u(e){v[x]=e}f{g=,ting(bg)}l{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(h,y,d,g,v=[],x=0;;)switch(g=){case 1:u(!)4:f5:u((e){a=0,r=e;{c=a<r;c&&u(e[a]),c}}(6:y=,u((y8:if(g=,lg,g=,y===c)b+=g;else if(y!==l)y9:c10:u(s(11:y=,u(+y)12:for(y=f,d=[],g=0;g<y;g)d[g]=y.charCodeAt(g)^g+y;u(String.fromCharCode.apply(null,d13:y=,h=delete [y]14:59:u((g=)?(y=x,v.slice(x-=g,y:[])61:u([])62:g=,k[0]=65599*k[0]+k[1].charCodeAt(g)>>>065:h=,y=,[y]=h66:u(e(t[b],,67:y=,d=,u((g=).x===c?r(g.y,y,k):g.apply(d,y68:u(e((g=t[b])<"<"?(b--,f):g+g,,70:u(!1)71:n72:+f73:u(parseInt(f,3675:if(){bcase 74:g=<<16>>16g76:u(k[])77:y=,u([y])78:g=,u(a(v,x-=g+1,g79:g=,u(k["$"+g])81:h=,[f]=h82:u([f])83:h=,k[]=h84:!085:void 086:u(v[x-1])88:h=,y=,h,y89:u({e{r(e.y,arguments,k)}e.y=f,e.x=c,e})90:null91:h93:h=0:;default:u((g<<16>>16)-16)}}n=this,t=n.Function,s=Object.keys||(e){a={},r=0;for(c in e)a[r]=c;a=r,a},b={},k={};r'.replace(/[-]/g, function (e) { return l[15 & e.charCodeAt(0)] }) }("v[x++]=v[--x]t.charCodeAt(b++)-32function return ))++.substrvar .length(),b+=;break;case ;break}".split("")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&effkx[!cs"l".Pq%widthl"@q&heightl"vr*getContextx$"2d[!cs#l#,*;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2q*shadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$jl s#i$1ek1s$gr#tack4)zgr#tac$! 
+0o![#cj?o ]!l$b%s"o ]!l"l$b*b^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"jl s&l&z0l!$ +["cs\'(0l#i\'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0s*yWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l <d>&+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i\'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl\'g,)gk}ejo{cm,)|yn~Lij~em["cl$b%@d<l&zl\'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {value: !0})]) }); _bytedAcrawler = require("douyin_falcon:node_modules/byted-acrawler/dist/runtime"); signature = _bytedAcrawler.sign('58841646784'); console.log(signature); </script> </body> </html>
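Running it raises an error
Searching for __M shows that it is attached to the parameter t of a self-executing function; t is passed in, and the argument is this
Note: !function(...){...}(...) is a self-executing function
require is a member of __M, so it has to be called as __M.require
With that, the dynamic _signature value can be obtained
The final JS file to run: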
!function(t) { if (t.__M = t.__M || {}, !t.__M.require) { var e, n, r = document.getElementsByTagName("head")[0], i = {}, o = {}, a = {}, u = {}, c = {}, s = {}, l = function(t, n) { if (!(t in u)) { u[t] = !0; var i = document.createElement("script"); if (n) { var o = setTimeout(n, e.timeout); i.onerror = function() { clearTimeout(o), n() } ; var a = function() { clearTimeout(o) }; "onload"in i ? i.onload = a : i.onreadystatechange = function() { ("loaded" === this.readyState || "complete" === this.readyState) && a() } } return i.type = "text/javascript", i.src = t, r.appendChild(i), i } }, f = function(t, e, n) { var r = i[t] || (i[t] = []); r.push(e); var o, a = c[t] || c[t + ".js"] || {}, u = a.pkg; o = u ? s[u].url || s[u].uri : a.url || a.uri || t, l(o, n && function() { n(t) } ) }; n = function(t, e) { "function" != typeof e && (e = arguments[2]), t = t.replace(/\.js$/i, ""), o[t] = e; var n = i[t]; if (n) { for (var r = 0, a = n.length; a > r; r++) n[r](); delete i[t] } } , e = function(t) { if (t && t.splice) return e.async.apply(this, arguments); t = e.alias(t); var n = a[t]; if (n) return n.exports; var r = o[t]; if (!r) throw "[ModJS] Cannot find module `" + t + "`"; n = a[t] = { exports: {} }; var i = "function" == typeof r ? r.apply(n, [e, n.exports, n]) : r; return i && (n.exports = i), n.exports && !n.exports["default"] && Object.defineProperty && Object.isExtensible(n.exports) && Object.defineProperty(n.exports, "default", { value: n.exports }), n.exports } , e.async = function(n, r, i) { function a(t) { for (var n, r = 0, h = t.length; h > r; r++) { var p = e.alias(t[r]); p in o ? 
(n = c[p] || c[p + ".js"], n && "deps"in n && a(n.deps)) : p in s || (s[p] = !0, l++, f(p, u, i), n = c[p] || c[p + ".js"], n && "deps"in n && a(n.deps)) } } function u() { if (0 === l--) { for (var i = [], o = 0, a = n.length; a > o; o++) i[o] = e(n[o]); r && r.apply(t, i) } } "string" == typeof n && (n = [n]); var s = {} , l = 0; a(n), u() } , e.resourceMap = function(t) { var e, n; n = t.res; for (e in n) n.hasOwnProperty(e) && (c[e] = n[e]); n = t.pkg; for (e in n) n.hasOwnProperty(e) && (s[e] = n[e]) } , e.loadJs = function(t) { l(t) } , e.loadCss = function(t) { if (t.content) { var e = document.createElement("style"); e.type = "text/css", e.styleSheet ? e.styleSheet.cssText = t.content : e.innerHTML = t.content, r.appendChild(e) } else if (t.url) { var n = document.createElement("link"); n.href = t.url, n.rel = "stylesheet", n.type = "text/css", r.appendChild(n) } } , e.alias = function(t) { return t.replace(/\.js$/i, "") } , e.timeout = 5e3, t.__M.define = n, t.__M.require = e } }(this); __M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function (l, e) { Function(function (l) { return 'e(e,a,r){(b[e]||(b[e]=t("x,y","x "+e+" y")(r,a)}a(e,a,r){(k[r]||(k[r]=t("x,y","new x[y]("+Array(r+1).join(",x[y]")(1)+")")(e,a)}r(e,a,r){n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t)s[n="$"+t]=r[n];for(t=0,b=s=a;t<b;t)s[t]=a[t];c(e,0,s)}c(t,b,k){u(e){v[x]=e}f{g=,ting(bg)}l{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(h,y,d,g,v=[],x=0;;)switch(g=){case 1:u(!)4:f5:u((e){a=0,r=e;{c=a<r;c&&u(e[a]),c}}(6:y=,u((y8:if(g=,lg,g=,y===c)b+=g;else if(y!==l)y9:c10:u(s(11:y=,u(+y)12:for(y=f,d=[],g=0;g<y;g)d[g]=y.charCodeAt(g)^g+y;u(String.fromCharCode.apply(null,d13:y=,h=delete [y]14:59:u((g=)?(y=x,v.slice(x-=g,y:[])61:u([])62:g=,k[0]=65599*k[0]+k[1].charCodeAt(g)>>>065:h=,y=,[y]=h66:u(e(t[b],,67:y=,d=,u((g=).x===c?r(g.y,y,k):g.apply(d,y68:u(e((g=t[b])<"<"?(b--,f):g+g,,70:u(!1)71:n72:+f73:u(parseInt(f,3675:if(){bcase 
74:g=<<16>>16g76:u(k[])77:y=,u([y])78:g=,u(a(v,x-=g+1,g79:g=,u(k["$"+g])81:h=,[f]=h82:u([f])83:h=,k[]=h84:!085:void 086:u(v[x-1])88:h=,y=,h,y89:u({e{r(e.y,arguments,k)}e.y=f,e.x=c,e})90:null91:h93:h=0:;default:u((g<<16>>16)-16)}}n=this,t=n.Function,s=Object.keys||(e){a={},r=0;for(c in e)a[r]=c;a=r,a},b={},k={};r'.replace(/[-]/g, function (e) { return l[15 & e.charCodeAt(0)] }) }("v[x++]=v[--x]t.charCodeAt(b++)-32function return ))++.substrvar .length(),b+=;break;case ;break}".split("")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&effkx[!cs"l".Pq%widthl"@q&heightl"vr*getContextx$"2d[!cs#l#,*;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2q*shadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$jl s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$b*b^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"jl s&l&z0l!$ +["cs\'(0l#i\'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0s*yWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l <d>&+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i\'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl\'g,)gk}ejo{cm,)|yn~Lij~em["cl$b%@d<l&zl\'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {value: !0})]) }); _bytedAcrawler = __M.require("douyin_falcon:node_modules/byted-acrawler/dist/runtime"); signature = _bytedAcrawler.sign('58841646784'); console.log(signature);
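The next step is running this JS code from Python
To do that, first run the code with node on the command line, then run that command from Python, either with popen from the os module or with the subprocess module, and read the result
Running it with node (command line: node <path to the JS file>) raises an error
The error points to document.getElementsByTagName("head")[0]
Fix: on the user's Douyin page, console.log(document.getElementsByTagName("head")[0]) prints a head tag
Replace document.getElementsByTagName("head")[0] with the markup of that tag, as a string
After the replacement another error appears: __M cannot be found
Fix: in the browser, this refers to the window object, so __M is a global; under node, this is not the global object, so __M has to be accessed as this.__M
Then another error appears: userAgent cannot be found
Fix: in the browser, userAgent is read from navigator.userAgent
So define a navigator object with a userAgent in the JS file
So the final JS file is: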
navigator = {
userAgent:"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"
} if (t.__M = t.__M || {}, !t.__M.require) { var e, n, r = "<head> <meta charset=\"utf-8\"><title>快來加入抖音短視頻,讓你發現最有趣的我!</title><meta name=\"viewport\" content=\"width=device-width,initial-scale=1,user-scalable=0,minimum-scale=1,maximum-scale=1,minimal-ui,viewport-fit=cover\"><meta name=\"format-detection\" content=\"telephone=no\"><meta name=\"baidu-site-verification\" content=\"szjdG38sKy\"><meta name=\"keywords\" content=\"抖音、抖音音樂、抖音短視頻、抖音官網、amemv\"><meta name=\"description\" content=\"抖音短視頻-記錄美好生活的視頻平台\"><meta name=\"apple-mobile-web-app-capable\" content=\"yes\"><meta name=\"apple-mobile-web-app-status-bar-style\" content=\"default\"><link rel=\"apple-touch-icon-precomposed\" href=\"//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/image/logo/logo_launcher_v2_40f12f4.png\"><link rel=\"shortcut icon\" href=\"//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/image/logo/favicon_v2_7145ff0.ico\" type=\"image/x-icon\"><meta http-equiv=\"X-UA-Compatible\" content=\"IE=Edge;chrome=1\"><meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" content=\"portrait\"><script async=\"\" src=\"//www.google-analytics.com/analytics.js\"></script><script type=\"text/javascript\">!function(){function e(e){return this.config=e,this}e.prototype={reset:function(){var e=Math.min(document.documentElement.clientWidth,750)/750*100;document.documentElement.style.fontSize=e+\"px\";var t=parseFloat(window.getComputedStyle(document.documentElement).fontSize),n=e/t;1!=n&&(document.documentElement.style.fontSize=e*n+\"px\")}},window.Adapter=new e,window.Adapter.reset(),window.onload=function(){window.Adapter.reset()},window.onresize=function(){window.Adapter.reset()}}();</script> <meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" 
content=\"portrait\"><script>tac='i)69eo056r4s!i$1afls\"0,<8~z|\x7f@QGNCJF[\\\\^D\\\\KFYSk~^WSZhg,(lfi~ah`{md\"inb|1d<,%Dscafgd\"in,8[xtm}nLzNEGQMKAdGG^NTY\x1ckgd\"inb<b|1d<g,&TboLr{m,(\x02)!jx-2n&vr$testxg,%@tug{mn ,%vrfkbm[!cb|'</script><script type=\"text/javascript\">!function(){function e(e){return this.config=e,this}e.prototype={reset:function(){var e=Math.min(document.documentElement.clientWidth,750)/750*100;document.documentElement.style.fontSize=e+\"px\";var t=parseFloat(window.getComputedStyle(document.documentElement).fontSize),n=e/t;1!=n&&(document.documentElement.style.fontSize=e*n+\"px\")}},window.Adapter=new e,window.Adapter.reset(),window.onload=function(){window.Adapter.reset()},window.onresize=function(){window.Adapter.reset()}}();</script><meta name=\"pathname\" content=\"aweme_mobile_user\"> <meta name=\"screen-orientation\" content=\"portrait\"><meta name=\"x5-orientation\" content=\"portrait\"><meta name=\"theme-color\" content=\"#161823\"><meta name=\"pathname\" content=\"aweme_mobile_video\"><link rel=\"dns-prefetch\" href=\"//s3.bytecdn.cn/\"><link rel=\"dns-prefetch\" href=\"//s3a.bytecdn.cn/\"><link rel=\"dns-prefetch\" href=\"//s3b.bytecdn.cn/\"><link rel=\"dns-prefetch\" href=\"//s0.pstatp.com/\"><link rel=\"dns-prefetch\" href=\"//s1.pstatp.com/\"><link rel=\"dns-prefetch\" href=\"//s2.pstatp.com/\"><link rel=\"dns-prefetch\" href=\"//v1-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v1-dy.ixiguavideo.com/\"><link rel=\"dns-prefetch\" href=\"//v3-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v3-dy.ixiguavideo.com/\"><link rel=\"dns-prefetch\" href=\"//v6-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v6-dy.ixiguavideo.com/\"><link rel=\"dns-prefetch\" href=\"//v9-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v9-dy.ixiguavideo.com/\"><link rel=\"dns-prefetch\" href=\"//v11-dy.ixigua.com/\"><link rel=\"dns-prefetch\" href=\"//v11-dy.ixiguavideo.com/\"><link rel=\"stylesheet\" 
href=\"//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/style/base_99078a4.css\"><style>@font-face{font-family:iconfont;src:url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eadf2f.eot);src:url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eadf2f.eot#iefix) format('embedded-opentype'),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_9eb9a50.woff) format('woff'),url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_da2e2ef.ttf) format('truetype'),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/font/iconfont_31180f7.svg#iconfont) format('svg')}.iconfont{font-family:iconfont!important;font-size:.24rem;font-style:normal;letter-spacing:-.045rem;margin-left:-.085rem}@font-face{font-family:icons;src:url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_2f1b1cd.eot);src:url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_2f1b1cd.eot#iefix) format('embedded-opentype'),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_87ad39c.woff) format('woff'),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_5848858.ttf) format('truetype'),url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/iconfont_20c7f77.svg#iconfont) format('svg')}.icons{font-family:icons!important;font-size:.24rem;font-style:normal;-webkit-font-smoothing:antialiased;-webkit-text-stroke-width:.2px;-moz-osx-font-smoothing:grayscale}@font-face{font-family:Ies;src:url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/Ies_317064f.woff2?ba9fc668cd9544e80b6f5998cdce1672) format(\"woff2\"),url(//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/Ies_a07f3d4.woff?ba9fc668cd9544e80b6f5998cdce1672) format(\"woff\"),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/Ies_4c0d8be.ttf?ba9fc668cd9544e80b6f5998cdce1672) 
format(\"truetype\"),url(//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/static/icons/Ies_1ac3f94.svg?ba9fc668cd9544e80b6f5998cdce1672#Ies) format(\"svg\")}i{line-height:1}i[class^=ies-]:before,i[class*=\" ies-\"]:before{font-family:Ies!important;font-style:normal;font-weight:400!important;font-variant:normal;text-transform:none;line-height:1;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.ies-checked:before{content:\"\\f101\"}.ies-chevron-left:before{content:\"\\f102\"}.ies-chevron-right:before{content:\"\\f103\"}.ies-clear:before{content:\"\\f104\"}.ies-close:before{content:\"\\f105\"}.ies-copy:before{content:\"\\f106\"}.ies-delete:before{content:\"\\f107\"}.ies-edit:before{content:\"\\f108\"}.ies-help-circle:before{content:\"\\f109\"}.ies-info:before{content:\"\\f10a\"}.ies-loading:before{content:\"\\f10b\"}.ies-location:before{content:\"\\f10c\"}.ies-paste:before{content:\"\\f10d\"}.ies-plus:before{content:\"\\f10e\"}.ies-query:before{content:\"\\f10f\"}.ies-remove:before{content:\"\\f110\"}.ies-search:before{content:\"\\f111\"}.ies-settings:before{content:\"\\f112\"}.ies-shopping-bag:before{content:\"\\f113\"}.ies-sort-left:before{content:\"\\f114\"}.ies-sort-right:before{content:\"\\f115\"}.ies-title-decorate-left:before{content:\"\\f116\"}.ies-title-decorate-right:before{content:\"\\f117\"}.ies-triangle-right:before{content:\"\\f118\"}.ies-triangle-top:before{content:\"\\f119\"}.ies-video:before{content:\"\\f11a\"}</style> <link rel=\"stylesheet\" href=\"//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/component/loading/index_5108ff2.css\">\n" + "<link rel=\"stylesheet\" href=\"//s3.bytecdn.cn/ies/resource/falcon/douyin_falcon/component/banner/index_3941ffc.css\">\n" + "<link rel=\"stylesheet\" href=\"//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/component/common/openBrowser/index_2c31596.css\">\n" + "<link rel=\"stylesheet\" href=\"//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/page/reflow_user/index_ecb0bc9.css\">\n" + 
"<link rel=\"stylesheet\" href=\"//s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/pkg/video_93fd288.css\"></head>", i = {}, o = {}, a = {}, u = {}, c = {}, s = {}, l = function (t, n) { if (!(t in u)) { u[t] = !0; var i = document.createElement("script"); if (n) { var o = setTimeout(n, e.timeout); i.onerror = function () { clearTimeout(o), n() } ; var a = function () { clearTimeout(o) }; "onload" in i ? i.onload = a : i.onreadystatechange = function () { ("loaded" === this.readyState || "complete" === this.readyState) && a() } } return i.type = "text/javascript", i.src = t, r.appendChild(i), i } }, f = function (t, e, n) { var r = i[t] || (i[t] = []); r.push(e); var o, a = c[t] || c[t + ".js"] || {}, u = a.pkg; o = u ? s[u].url || s[u].uri : a.url || a.uri || t, l(o, n && function () { n(t) } ) }; n = function (t, e) { "function" != typeof e && (e = arguments[2]), t = t.replace(/\.js$/i, ""), o[t] = e; var n = i[t]; if (n) { for (var r = 0, a = n.length; a > r; r++) n[r](); delete i[t] } } , e = function (t) { if (t && t.splice) return e.async.apply(this, arguments); t = e.alias(t); var n = a[t]; if (n) return n.exports; var r = o[t]; if (!r) throw "[ModJS] Cannot find module `" + t + "`"; n = a[t] = { exports: {} }; var i = "function" == typeof r ? r.apply(n, [e, n.exports, n]) : r; return i && (n.exports = i), n.exports && !n.exports["default"] && Object.defineProperty && Object.isExtensible(n.exports) && Object.defineProperty(n.exports, "default", { value: n.exports }), n.exports } , e.async = function (n, r, i) { function a(t) { for (var n, r = 0, h = t.length; h > r; r++) { var p = e.alias(t[r]); p in o ? 
(n = c[p] || c[p + ".js"], n && "deps" in n && a(n.deps)) : p in s || (s[p] = !0, l++, f(p, u, i), n = c[p] || c[p + ".js"], n && "deps" in n && a(n.deps)) } } function u() { if (0 === l--) { for (var i = [], o = 0, a = n.length; a > o; o++) i[o] = e(n[o]); r && r.apply(t, i) } } "string" == typeof n && (n = [n]); var s = {} , l = 0; a(n), u() } , e.resourceMap = function (t) { var e, n; n = t.res; for (e in n) n.hasOwnProperty(e) && (c[e] = n[e]); n = t.pkg; for (e in n) n.hasOwnProperty(e) && (s[e] = n[e]) } , e.loadJs = function (t) { l(t) } , e.loadCss = function (t) { if (t.content) { var e = document.createElement("style"); e.type = "text/css", e.styleSheet ? e.styleSheet.cssText = t.content : e.innerHTML = t.content, r.appendChild(e) } else if (t.url) { var n = document.createElement("link"); n.href = t.url, n.rel = "stylesheet", n.type = "text/css", r.appendChild(n) } } , e.alias = function (t) { return t.replace(/\.js$/i, "") } , e.timeout = 5e3, t.__M.define = n, t.__M.require = e } }(this); this.__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function (l, e) { Function(function (l) { return 'e(e,a,r){(b[e]||(b[e]=t("x,y","x "+e+" y")(r,a)}a(e,a,r){(k[r]||(k[r]=t("x,y","new x[y]("+Array(r+1).join(",x[y]")(1)+")")(e,a)}r(e,a,r){n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t)s[n="$"+t]=r[n];for(t=0,b=s=a;t<b;t)s[t]=a[t];c(e,0,s)}c(t,b,k){u(e){v[x]=e}f{g=,ting(bg)}l{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(h,y,d,g,v=[],x=0;;)switch(g=){case 1:u(!)4:f5:u((e){a=0,r=e;{c=a<r;c&&u(e[a]),c}}(6:y=,u((y8:if(g=,lg,g=,y===c)b+=g;else if(y!==l)y9:c10:u(s(11:y=,u(+y)12:for(y=f,d=[],g=0;g<y;g)d[g]=y.charCodeAt(g)^g+y;u(String.fromCharCode.apply(null,d13:y=,h=delete [y]14:59:u((g=)?(y=x,v.slice(x-=g,y:[])61:u([])62:g=,k[0]=65599*k[0]+k[1].charCodeAt(g)>>>065:h=,y=,[y]=h66:u(e(t[b],,67:y=,d=,u((g=).x===c?r(g.y,y,k):g.apply(d,y68:u(e((g=t[b])<"<"?(b--,f):g+g,,70:u(!1)71:n72:+f73:u(parseInt(f,3675:if(){bcase 
74:g=<<16>>16g76:u(k[])77:y=,u([y])78:g=,u(a(v,x-=g+1,g79:g=,u(k["$"+g])81:h=,[f]=h82:u([f])83:h=,k[]=h84:!085:void 086:u(v[x-1])88:h=,y=,h,y89:u({e{r(e.y,arguments,k)}e.y=f,e.x=c,e})90:null91:h93:h=0:;default:u((g<<16>>16)-16)}}n=this,t=n.Function,s=Object.keys||(e){a={},r=0;for(c in e)a[r]=c;a=r,a},b={},k={};r'.replace(/[-]/g, function (e) { return l[15 & e.charCodeAt(0)] }) }("v[x++]=v[--x]t.charCodeAt(b++)-32function return ))++.substrvar .length(),b+=;break;case ;break}".split("")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&effkx[!cs"l".Pq%widthl"@q&heightl"vr*getContextx$"2d[!cs#l#,*;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2q*shadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$jl s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$b*b^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"jl s&l&z0l!$ +["cs\'(0l#i\'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0s*yWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l <d>&+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i\'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl\'g,)gk}ejo{cm,)|yn~Lij~em["cl$b%@d<l&zl\'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {value: !0})]) }); _bytedAcrawler = this.__M.require("douyin_falcon:node_modules/byted-acrawler/dist/runtime"); signature = _bytedAcrawler.sign(process.argv[2]) console.log(signature);
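Note: to pass arguments when running a JS file with node on the command line, use process.argv.
process.argv is an array containing the command-line arguments. The first element is the path of the node executable and the second is the path of the JS file; the remaining elements are the arguments we pass.
So process.argv[2] is the argument we passed.
Now the dynamic value can be obtained from the terminal; the next step is running the command from Python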
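import os
import subprocess

user_id = '58841646784'

# Option 1: os.popen (raw string so the backslashes in the path stay literal)
signature = os.popen(r'node D:\Python\pachong\signed.js %s' % user_id)
print(signature.read())

# Option 2: subprocess.getoutput
signat = subprocess.getoutput(r'node D:\Python\pachong\signed.js %s' % user_id)
print(signat)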
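As an alternative to formatting a command string, subprocess.run accepts the command as a list of arguments, which avoids quoting problems in paths and lets failures surface as exceptions. The sketch below is one possible wrapper; run_script is a hypothetical helper name. Because node may not be installed where this is read, the demo uses the current Python interpreter and a temporary script as a stand-in for node signed.js <user_id>.

```python
import os
import subprocess
import sys
import tempfile


def run_script(interpreter, script_path, *args, timeout=30):
    """Run `interpreter script_path args...` and return its stripped stdout."""
    completed = subprocess.run(
        [interpreter, script_path, *args],
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode output to str
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout.strip()


# Stand-in for the JS file: a tiny script that echoes its first argument.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as handle:
    handle.write('import sys\nprint("sig-" + sys.argv[1])\n')
    demo_script = handle.name

try:
    demo_output = run_script(sys.executable, demo_script, '58841646784')
finally:
    os.remove(demo_script)

print(demo_output)
```

With node available, the call would look like run_script('node', 'signed.js', user_id).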
Sending the request:
import subprocess

import requests

user_id = '58841646784'

# Generate _signature by running the JS file with node
signature = subprocess.getoutput('node signed.js %s' % user_id)

user_video_list = []

# ########## Fetch the user's works ##########
user_video_params = {
    'user_id': str(user_id),
    'count': '21',
    'max_cursor': '0',
    'aid': '1128',
    '_signature': signature,
    'dytk': 'b4dceed99803a04a1c4395ffc81f3dbc',  # '114f1984d1917343ccfb14d94e7ce5f5'
}

res = requests.get(
    url="https://www.douyin.com/aweme/v1/aweme/post/",
    params=user_video_params,
    headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
        'x-requested-with': 'XMLHttpRequest',
        'referer': 'https://www.douyin.com/share/user/58841646784',
    },
)
print(res.text)
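Running it returns: {"status_code": 0, "has_more": true, "aweme_list": []}
has_more being true means the request was accepted and there is more data to fetch, even though aweme_list is empty here, so the request has to be repeated several times.
Scraper code example: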
import subprocess

import requests

user_id = '58841646784'  # 6556303280

# Fetch all of this user's works.
# The signature comes from the JS module
# douyin_falcon:node_modules/byted-acrawler/dist/runtime:
#     signature = _bytedAcrawler.sign('user ID')
signature = subprocess.getoutput('node s1.js %s' % user_id)

# Headers shared by every request below
HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'referer': 'https://www.douyin.com/share/user/58841646784',
}

# ########## Fetch the user's own works ##########
user_video_list = []
user_video_params = {
    'user_id': str(user_id),
    'count': '21',
    'max_cursor': '0',
    'aid': '1128',
    '_signature': signature,
    'dytk': 'b4dceed99803a04a1c4395ffc81f3dbc',  # '114f1984d1917343ccfb14d94e7ce5f5'
}


def get_aweme_list(max_cursor=None):
    if max_cursor:
        user_video_params['max_cursor'] = str(max_cursor)
    res = requests.get(
        url="https://www.douyin.com/aweme/v1/aweme/post/",
        params=user_video_params,
        headers=HEADERS,
    )
    content_json = res.json()
    aweme_list = content_json.get('aweme_list', [])
    user_video_list.extend(aweme_list)
    # Keep requesting while the server reports more data
    if content_json.get('has_more') == 1:
        return get_aweme_list(content_json.get('max_cursor'))


get_aweme_list()

# ########## Fetch the works the user liked ##########
favor_video_list = []
favor_video_params = {
    'user_id': str(user_id),
    'count': '21',
    'max_cursor': '0',
    'aid': '1128',
    '_signature': signature,
    'dytk': 'b4dceed99803a04a1c4395ffc81f3dbc',
}


def get_favor_list(max_cursor=None):
    if max_cursor:
        favor_video_params['max_cursor'] = str(max_cursor)
    res = requests.get(
        url="https://www.douyin.com/aweme/v1/aweme/favorite/",
        params=favor_video_params,
        headers=HEADERS,
    )
    content_json = res.json()
    aweme_list = content_json.get('aweme_list', [])
    favor_video_list.extend(aweme_list)
    if content_json.get('has_more') == 1:
        return get_favor_list(content_json.get('max_cursor'))


get_favor_list()


# ########## Download the videos ##########
def download_videos(video_list):
    for item in video_list:
        video_id = item['video']['play_addr']['uri']
        video = requests.get(
            url='https://aweme.snssdk.com/aweme/v1/playwm/',
            params={'video_id': video_id},
            headers=HEADERS,
            stream=True,
        )
        file_name = video_id + '.mp4'
        with open(file_name, 'wb') as f:
            # Write in chunks instead of the default one byte at a time
            for chunk in video.iter_content(chunk_size=8192):
                f.write(chunk)


download_videos(user_video_list)
download_videos(favor_video_list)
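The recursive pagination in the scraper can also be written as a loop, which avoids deep recursion and makes it easy to stop when the server keeps answering with an empty aweme_list. The sketch below takes the page-fetching step as a callback so it runs without network access. The names collect_all, fetch_page and max_empty_retries are hypothetical, and the fake responses are invented to mirror the has_more / max_cursor / aweme_list fields seen above.

```python
def collect_all(fetch_page, max_empty_retries=5):
    """Collect items page by page.

    `fetch_page(cursor)` must return a dict with the keys
    'aweme_list', 'has_more' and 'max_cursor'.
    """
    items = []
    cursor = 0
    empty_streak = 0
    while True:
        page = fetch_page(cursor)
        batch = page.get('aweme_list', [])
        items.extend(batch)
        if not page.get('has_more'):
            break  # the server reports no further pages
        # Count consecutive empty pages so the loop cannot spin forever
        empty_streak = 0 if batch else empty_streak + 1
        if empty_streak > max_empty_retries:
            break
        cursor = page.get('max_cursor', cursor)
    return items


# Invented responses: one empty page, then two pages with data.
responses = iter([
    {'aweme_list': [], 'has_more': 1, 'max_cursor': 0},
    {'aweme_list': ['v1', 'v2'], 'has_more': 1, 'max_cursor': 21},
    {'aweme_list': ['v3'], 'has_more': 0, 'max_cursor': 42},
])
seen_cursors = []


def fake_fetch_page(cursor):
    seen_cursors.append(cursor)
    return next(responses)


videos = collect_all(fake_fetch_page)
print(videos)
```

In the real scraper, fetch_page would send the requests.get call with max_cursor set to the given cursor and return res.json().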