1、酷狗音樂型md5加密給我上身體(這應該就是加密了吧,,要不然挺尷尬T_T),我這個不是爬取酷狗TOP500,而是搜索之后在下載歌曲
如下圖上,當你播放歌曲跳到另一個頁面(酷狗有一個專門播放歌曲的頁面),F12打開network,然后刷新頁面,就會發現歌曲下載地址在下圖所示類型數據包中
2、然后我們去分析它的請求頭
https://wwwapi.kugou.com/yy/index.php?r=play/getdata&callback=jQuery1910026577801017397817_1596164131720&hash=AC9B095DCF364F751BCED2E148638AFA&album_id=&dfid=0zPSYM1q2qH90xArIr1yOGtc&mid=5cd316b98602d9b29f31085c96e6c682&platid=4&_=1596164131721
他用的get請求將參數加到鏈接后面,我們只需要分析一下參數就可以了,這里我就直接說結果了
只需要它的hash值改變,那么這個請求鏈接就是那首歌的信息
也就可以說hash值相當於歌曲的唯一標識id
3、然后我們去找hash值,剛開始以為hash值是在js文件中生成的,搜索之后部分js代碼如下
找了半天都沒有找到是怎么來的,代碼里面好像option就是參數,其他地方直接傳過來的,,,那么估計hash值不是js中生成的了,那么就去其他文件里面找找有沒有這個hash值的來源
在這個文件中有歌曲的信息,你把列表信息打開,就會發現有一個FileHash,和哪個一對比,果然一樣
(我的不一樣,是因為這兩首歌都不一樣,我只是給你們說一下過程,不要在意細節)
然后我們就要去分析這個數據包的請求頭
https://complexsearch.kugou.com/v2/search/song?callback=callback123&keyword=%E7%A8%BB%E9%A6%99&page=1&pagesize=30&bitrate=0&isfuzzy=0&tag=em&inputtype=0&platform=WebFilter&userid=-1&clientver=2000&iscorrection=1&privilege_filter=0&srcappid=2919&clienttime=1596167570360&mid=1596167570360&uuid=1596167570360&dfid=-&signature=BA1355A64A559B7A779E96825A7631E8 callback: callback123 keyword: 稻香 page: 1 pagesize: 30 bitrate: 0 isfuzzy: 0 tag: em inputtype: 0 platform: WebFilter userid: -1 clientver: 2000 iscorrection: 1 privilege_filter: 0 srcappid: 2919 clienttime: 1596167570360 mid: 1596167570360 uuid: 1596167570360 dfid: - signature: BA1355A64A559B7A779E96825A7631E8
這個請求也是get請求,我們分析它的參數就可以了,你多分析幾個這樣的數據包,看一下參數就會看出來那些值在改變,這里我說結果
clienttime和mid和uuid和signature一直在改變,而且clienttime和mid和uuid的值都一樣
然后我們就就接着去找js代碼
大致一找就會發現clienttime和mid和uuid的值都是時間戳(我怎么說看着這么眼熟),但是沒有signature
o.join方法就是把o這個列表里面的信息變成一個字符串,例如:
<script type="text/javascript"> var arr = new Array(3) arr[0] = "George" arr[1] = "John" arr[2] = "Thomas" document.write(arr.join()) </script> 輸出: George,John,Thomas
我們斷點調試發現o里面的數據是
0: "NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt" 1: "bitrate=0" 2: "callback=callback123" 3: "clienttime=1596179805676" 4: "clientver=2000" 5: "dfid=-" 6: "inputtype=0" 7: "iscorrection=1" 8: "isfuzzy=0" 9: "keyword=稻香" 10: "mid=1596179805676" 11: "page=1" 12: "pagesize=30" 13: "platform=WebFilter" 14: "privilege_filter=0" 15: "srcappid=2919" 16: "tag=em" 17: "userid=-1" 18: "uuid=1596179805676" 19: "NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt" length: 20
然后我們去看faultylabs.MD5代碼

1 faultylabs.MD5 = function(a) { 2 function b(a) { 3 var b = (a >>> 0).toString(16); 4 return "00000000".substr(0, 8 - b.length) + b 5 } 6 function c(a) { 7 for (var b = [], c = 0; c < a.length; c++) 8 b = b.concat(k(a[c])); 9 return b 10 } 11 function d(a) { 12 for (var b = [], c = 0; 8 > c; c++) 13 b.push(255 & a), 14 a >>>= 8; 15 return b 16 } 17 function e(a, b) { 18 return a << b & 4294967295 | a >>> 32 - b 19 } 20 function f(a, b, c) { 21 return a & b | ~a & c 22 } 23 function g(a, b, c) { 24 return c & a | ~c & b 25 } 26 function h(a, b, c) { 27 return a ^ b ^ c 28 } 29 function i(a, b, c) { 30 return b ^ (a | ~c) 31 } 32 function j(a, b) { 33 return a[b + 3] << 24 | a[b + 2] << 16 | a[b + 1] << 8 | a[b] 34 } 35 function k(a) { 36 for (var b = [], c = 0; c < a.length; c++) 37 if (a.charCodeAt(c) <= 127) 38 b.push(a.charCodeAt(c)); 39 else 40 for (var d = encodeURIComponent(a.charAt(c)).substr(1).split("%"), e = 0; e < d.length; e++) 41 b.push(parseInt(d[e], 16)); 42 return b 43 } 44 function l() { 45 for (var a = "", c = 0, d = 0, e = 3; e >= 0; e--) 46 d = arguments[e], 47 c = 255 & d, 48 d >>>= 8, 49 c <<= 8, 50 c |= 255 & d, 51 d >>>= 8, 52 c <<= 8, 53 c |= 255 & d, 54 d >>>= 8, 55 c <<= 8, 56 c |= d, 57 a += b(c); 58 return a 59 } 60 function m(a) { 61 for (var b = new Array(a.length), c = 0; c < a.length; c++) 62 b[c] = a[c]; 63 return b 64 } 65 function n(a, b) { 66 return 4294967295 & a + b 67 } 68 function o() { 69 function a(a, b, c, d) { 70 var f = v; 71 v = u, 72 u = t, 73 t = n(t, e(n(s, n(a, n(b, c))), d)), 74 s = f 75 } 76 var b = p.length; 77 p.push(128); 78 var c = p.length % 64; 79 if (c > 56) { 80 for (var k = 0; 64 - c > k; k++) 81 p.push(0); 82 c = p.length % 64 83 } 84 for (k = 0; 56 - c > k; k++) 85 p.push(0); 86 p = p.concat(d(8 * b)); 87 var m = 1732584193 88 , o = 4023233417 89 , q = 2562383102 90 , r = 271733878 91 , s = 0 92 , t = 0 93 , u = 0 94 , v = 0; 95 for (k = 0; k < p.length / 64; k++) { 96 s = m, 97 t = o, 98 u = q, 99 v = r; 100 var w = 64 * k; 101 a(f(t, u, v), 3614090360, j(p, w), 7), 102 a(f(t, u, v), 3905402710, j(p, w + 4), 12), 103 a(f(t, u, v), 606105819, j(p, w + 8), 17), 104 a(f(t, u, v), 3250441966, j(p, w + 12), 22), 105 a(f(t, u, v), 4118548399, j(p, w + 16), 7), 106 a(f(t, u, v), 1200080426, j(p, w + 20), 12), 107 a(f(t, u, v), 2821735955, j(p, w + 24), 17), 108 a(f(t, u, v), 4249261313, j(p, w + 28), 22), 109 a(f(t, u, v), 1770035416, j(p, w + 32), 7), 110 a(f(t, u, v), 2336552879, j(p, w + 36), 12), 111 a(f(t, u, v), 4294925233, j(p, w + 40), 17), 112 a(f(t, u, v), 2304563134, j(p, w + 44), 22), 113 a(f(t, u, v), 1804603682, j(p, w + 48), 7), 114 a(f(t, u, v), 4254626195, j(p, w + 52), 12), 115 a(f(t, u, v), 2792965006, j(p, w + 56), 17), 116 a(f(t, u, v), 1236535329, j(p, w + 60), 22), 117 a(g(t, u, v), 4129170786, j(p, w + 4), 5), 118 a(g(t, u, v), 3225465664, j(p, w + 24), 9), 119 a(g(t, u, v), 643717713, j(p, w + 44), 14), 120 a(g(t, u, v), 3921069994, j(p, w), 20), 121 a(g(t, u, v), 3593408605, j(p, w + 20), 5), 122 a(g(t, u, v), 38016083, j(p, w + 40), 9), 123 a(g(t, u, v), 3634488961, j(p, w + 60), 14), 124 a(g(t, u, v), 3889429448, j(p, w + 16), 20), 125 a(g(t, u, v), 568446438, j(p, w + 36), 5), 126 a(g(t, u, v), 3275163606, j(p, w + 56), 9), 127 a(g(t, u, v), 4107603335, j(p, w + 12), 14), 128 a(g(t, u, v), 1163531501, j(p, w + 32), 20), 129 a(g(t, u, v), 2850285829, j(p, w + 52), 5), 130 a(g(t, u, v), 4243563512, j(p, w + 8), 9), 131 a(g(t, u, v), 1735328473, j(p, w + 28), 14), 132 a(g(t, u, v), 2368359562, j(p, w + 48), 20), 133 a(h(t, u, v), 4294588738, j(p, w + 20), 4), 134 a(h(t, u, v), 2272392833, j(p, w + 32), 11), 135 a(h(t, u, v), 1839030562, j(p, w + 44), 16), 136 a(h(t, u, v), 4259657740, j(p, w + 56), 23), 137 a(h(t, u, v), 2763975236, j(p, w + 4), 4), 138 a(h(t, u, v), 1272893353, j(p, w + 16), 11), 139 a(h(t, u, v), 4139469664, j(p, w + 28), 16), 140 a(h(t, u, v), 3200236656, j(p, w + 40), 23), 141 a(h(t, u, v), 681279174, j(p, w + 52), 4), 142 a(h(t, u, v), 3936430074, j(p, w), 11), 143 a(h(t, u, v), 3572445317, j(p, w + 12), 16), 144 a(h(t, u, v), 76029189, j(p, w + 24), 23), 145 a(h(t, u, v), 3654602809, j(p, w + 36), 4), 146 a(h(t, u, v), 3873151461, j(p, w + 48), 11), 147 a(h(t, u, v), 530742520, j(p, w + 60), 16), 148 a(h(t, u, v), 3299628645, j(p, w + 8), 23), 149 a(i(t, u, v), 4096336452, j(p, w), 6), 150 a(i(t, u, v), 1126891415, j(p, w + 28), 10), 151 a(i(t, u, v), 2878612391, j(p, w + 56), 15), 152 a(i(t, u, v), 4237533241, j(p, w + 20), 21), 153 a(i(t, u, v), 1700485571, j(p, w + 48), 6), 154 a(i(t, u, v), 2399980690, j(p, w + 12), 10), 155 a(i(t, u, v), 4293915773, j(p, w + 40), 15), 156 a(i(t, u, v), 2240044497, j(p, w + 4), 21), 157 a(i(t, u, v), 1873313359, j(p, w + 32), 6), 158 a(i(t, u, v), 4264355552, j(p, w + 60), 10), 159 a(i(t, u, v), 2734768916, j(p, w + 24), 15), 160 a(i(t, u, v), 1309151649, j(p, w + 52), 21), 161 a(i(t, u, v), 4149444226, j(p, w + 16), 6), 162 a(i(t, u, v), 3174756917, j(p, w + 44), 10), 163 a(i(t, u, v), 718787259, j(p, w + 8), 15), 164 a(i(t, u, v), 3951481745, j(p, w + 36), 21), 165 m = n(m, s), 166 o = n(o, t), 167 q = n(q, u), 168 r = n(r, v) 169 } 170 return l(r, q, o, m).toUpperCase() 171 } 172 var p = null 173 , q = null; 174 return "string" == typeof a ? p = k(a) : a.constructor == Array ? 0 === a.length ? p = a : "string" == typeof a[0] ? p = c(a) : "number" == typeof a[0] ? p = a : q = typeof a[0] : "undefined" != typeof ArrayBuffer ? a instanceof ArrayBuffer ? p = m(new Uint8Array(a)) : a instanceof Uint8Array || a instanceof Int8Array ? p = m(a) : a instanceof Uint32Array || a instanceof Int32Array || a instanceof Uint16Array || a instanceof Int16Array || a instanceof Float32Array || a instanceof Float64Array ? p = m(new Uint8Array(a.buffer)) : q = typeof a : q = typeof a, 175 q && alert("MD5 type mismatch, cannot process " + q), 176 o() 177 }
我感覺我已經分析不出來了,兄弟們,誰要是弄出來了給我說一說可不^_^,我感覺太太太太太復雜了!!!!!
雖然我們不能從頭開始模擬,但是我們可以下載一首歌曲找到這首歌曲的hash值,然后再去模擬請求
這也是沒有辦法中的辦法了,唉,沒辦法,,,這加密我也看不懂呀(但是代碼還是可以爬取vip音樂)
注意:
請求到的數據包不是json格式,不能轉化成字典了,,,唉,,,,,可以用re模塊來弄
然后就是最后歌曲的鏈接也需要處理一下,因為原鏈接里面包含"\"
代碼:
1 import requests 2 import hashlib 3 import time 4 import json 5 import re 6 7 class Spider: 8 def __init__(self): 9 self.headers = { 10 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36' 11 12 } 13 ''' 14 self.url = 'https://complexsearch.kugou.com/v2/search/song?' 15 music_name = input('輸入歌曲名字:') 16 self.clienttime = input('輸入參數時間戳:') 17 self.signature = input('輸入參數hash值:') 18 19 self.data = [ 20 'callback=callback123', 21 'keyword={}'.format(music_name), 22 'page=1', 23 'pagesize=30', 24 'bitrate=0', 25 'isfuzzy=0', 26 'tag=em', 27 'inputtype=0', 28 'platform=WebFilter', 29 'userid=-1', 30 'clientver=2000', 31 'iscorrection=1', 32 'privilege_filter=0', 33 'srcappid=2919', 34 'clienttime={}'.format(self.clienttime), 35 'mid={}'.format(self.clienttime), 36 'uuid={}'.format(self.clienttime), 37 'dfid=-', 38 'signature={}'.format(self.signature) 39 ]''' 40 ''' 41 def get_para(self): 42 u = "&".join(self.data) 43 response = requests.get(self.url+u, headers=self.headers) 44 print(response.text) 45 response.encoding = response.apparent_encoding # 這個apparent_encoding就是讓系統根據頁面來判斷用何種編碼 46 response = response.json() # 得到josn字典dict 47 print('----------',response) 48 music_list = response['data']['lists'] 49 print("共計" + str(len(music_list)) + "結果: ") 50 self.all_singers = [] # 放置所有歌手人名 51 self.names = [] # 放置歌曲名字 52 self.all_hash = [] # 放置所有rid,rid是網頁所需參數 53 a = 0 54 for music in music_list: 55 # print(music) 56 singer = music["SingerName"] # 歌手名 57 name = str(a) + " " + music["FileName"] # 歌曲名 58 rid = music["FileHash"] # 取出rid,之后要對這個字符串進行切割 59 60 self.all_singers.append(singer) # 將對應信息放到列表中 61 self.names.append(name) 62 self.all_hash.append(rid) 63 a = a + 1 64 infs = dict(zip(self.names, self.all_singers)) 65 infs = json.dumps(infs, ensure_ascii=False, indent=4, separators=(',', ':')) 66 infs = infs.replace('"', ' ') 67 infs = infs.replace(':', '——————') 68 print(infs) 69 self.get_music_playurl()''' 70 71 def down_music(self,url): 72 music = requests.get(url, headers=self.headers).content 73 with open('1.mp3', 'wb') as f: 74 f.write(music) 75 76 def get_music_playurl(self): 77 self.url2 = 'https://wwwapi.kugou.com/yy/index.php?' 78 hash_result = input("請輸入需下載歌曲的哈希值:") 79 self.data2 = [ 80 'r=play/getdata', 81 'callback=jQuery1910026577801017397817_1596164131720', 82 'hash={}'.format(hash_result), 83 'album_id=', 84 'dfid=0zPSYM1q2qH90xArIr1yOGtc', 85 'mid=5cd316b98602d9b29f31085c96e6c682', 86 'platid=4', 87 '_=1596164131722' 88 ] 89 u = "&".join(self.data2) 90 response = requests.get(self.url2+u, headers=self.headers) 91 html = response.text 92 pattern = 'play_url":"(.*?)"' 93 url = re.search(pattern, html).group(1) 94 url = url.replace('\\','') 95 print(url) 96 self.down_music(url) 97 98 ''' 99 def get_md5(t): 100 t = t.encode('utf-8') 101 md5 = hashlib.md5(t).hexdigest() 102 return md5 103 ''' 104 if __name__ == '__main__': 105 spider = Spider() 106 spider.get_music_playurl() 107 108 ''' 109 例子:稻香 110 0A62227CAAB66F54D43EC084B4BDD81F 111 '''