Scraping Xiaohongshu


1. Open the page to scrape: https://tophub.today/n/L4MdA5ldxD

2. Press F12 to open the browser developer tools and copy the request headers.

3. Right-click the page and view its source to find the elements that hold the titles.

4. Implement the code.

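import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://tophub.today/n/L4MdA5ldxD'

def getHTMLText(url):
    """Fetch the page and return its HTML, or an empty string on failure."""
    # headers must be a dict mapping header name to value,
    # not a set holding a single 'name: value' string
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3314.0 Safari/537.36 SE 2.X MetaSr 1.0'}
    try:
        r = requests.get(url, timeout=30, headers=headers)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except requests.RequestException:
        # An empty string parses to an empty result instead of fake content
        return ''

def saveHTMLText(title, html, c):
    """Append the first c [rank, title] rows found in html to the title list."""
    soup = BeautifulSoup(html, 'html.parser')
    # <span class="t"> is the selector used in this post; check it against the page source
    a = soup.find_all('span', class_='t')
    for rank, tag in enumerate(a[:c], start=1):
        title.append([rank, tag.get_text(strip=True)])

# Fetch first, then parse, then build the table
title = []
html = getHTMLText(url)
saveHTMLText(title, html, c=10)
df = pd.DataFrame(title, columns=['Rank', 'Title'])
print(df.T)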
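
Because the live page can change or be unreachable, it may help to check the parsing step on its own first. The following is only a minimal sketch: the sample markup is made up for illustration and may not match the real page, and `extract_titles` is a hypothetical helper name. The `span` tag with class `t` is the only detail taken from the code above.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr><td><a href="#"><span class="t">First topic</span></a></td></tr>
  <tr><td><a href="#"><span class="t">Second topic</span></a></td></tr>
  <tr><td><a href="#"><span class="t">Third topic</span></a></td></tr>
</table>
"""

def extract_titles(html, limit):
    """Return up to `limit` (rank, title) tuples from <span class="t"> tags."""
    soup = BeautifulSoup(html, 'html.parser')
    spans = soup.find_all('span', class_='t')
    # Ranks start at 1 and follow the order of the tags in the document
    return [(rank, span.get_text(strip=True))
            for rank, span in enumerate(spans[:limit], start=1)]

rows = extract_titles(SAMPLE_HTML, limit=2)
print(rows)
```

If the selector stops matching on the real page, this function returns an empty list, which is a quick signal that the class name in the page source should be checked again.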

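
Step 2 copies headers from the developer tools as plain `name: value` lines, while `requests` expects a dict. One possible way to bridge the two is sketched below. `parse_raw_headers` is a hypothetical helper, not part of `requests`, and the `accept-language` value is an assumed example rather than something taken from the page.

```python
def parse_raw_headers(raw):
    """Turn 'name: value' lines copied from the developer tools into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, HTTP/2 pseudo-headers such as ':authority',
        # and anything without a colon separator
        if not line or line.startswith(':') or ':' not in line:
            continue
        name, value = line.split(':', 1)  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers

# Example input; the accept-language line is an assumption for illustration.
raw = """
user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36
accept-language: zh-CN,zh;q=0.9
"""
headers = parse_raw_headers(raw)
print(headers)
```

The resulting dict can be passed as the `headers` argument of `requests.get`.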
 

