Simple crawler operations: 1. Scrape web page data and print it 2. Write the scraped data to an xls spreadsheet

To set up the Python environment, see the Runoob tutorial on installing and using pip:

Link: https://www.runoob.com/w3cnote/python-pip-install-usage.html
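
Once Python and pip are available, the third-party packages used by the examples below (taken from their import statements) can be installed from the command line; the exact command may vary by environment:

pip install requests lxml xlwt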

1. Scrape web page data and print it

import requests
from lxml import etree

# Fetch the page source
html = requests.get("https://www.ghpym.com/category/videos")
# Print the raw source for debugging
# print(html.text)

# Convert the source into a form that XPath can match
etree_html = etree.HTML(html.text)

# XPath copied from the browser for a single item:
# //*[@id="wrap"]/div/div/div/ul/li[1]/div[2]/h2/a/text()
content = etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/@href')
for each in content:
    replace = each.replace('\n', '').replace(' ', '')  # strip newlines and spaces
    if replace == '\n' or replace == "":
        continue
    else:
        print(replace)

content = etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/text()')
for each in content:
    replace = each.replace('\n', '').replace(' ', '')
    if replace == '\n' or replace == "":
        continue
    else:
        print(replace)

print("Done")
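
As a variant, the two XPath queries can be paired with zip() so each link is printed next to its title. This is a minimal sketch reusing the same URL and selectors as the example above, and it assumes the page structure stays the same:

import requests
from lxml import etree

# Fetch and parse the page (same URL and selectors as above)
html = requests.get("https://www.ghpym.com/category/videos")
etree_html = etree.HTML(html.text)

links = etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/@href')
titles = etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/text()')

# Pair each link with its title and skip blank entries
for href, text in zip(links, titles):
    text = text.replace('\n', '').replace(' ', '')
    if text:
        print(href, text)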

2. Write the scraped data to an xls spreadsheet

# coding:utf-8
from lxml import etree
import requests
import xlwt

title = []

def get_film_name(url):
    html = requests.get(url).text
    # It is usually worth printing the html here first to confirm there is
    # content before going further.
    # print(html)
    s = etree.HTML(html)  # convert the source into a form that XPath can match
    filename = s.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/@href')  # returns a list
    print(filename)
    title.extend(filename)

def get_all_film_name():
    # Note: the URL below contains no '{}' placeholder, so format(i) has no
    # effect and the same page is requested each time; add a pagination
    # placeholder if the target site supports paging.
    for i in range(0, 250, 25):
        url = 'https://www.ghpym.com/category/videos'.format(i)
        get_film_name(url)

if __name__ == '__main__':
    myxls = xlwt.Workbook()
    sheet1 = myxls.add_sheet(u'top250', cell_overwrite_ok=True)
    get_all_film_name()
    for i in range(0, len(title)):
        sheet1.write(i, 0, i + 1)
        sheet1.write(i, 1, title[i])
    myxls.save('top250.xls')
    print("Done")
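
To check the result, the saved workbook can be read back. This is a minimal sketch assuming the xlrd package (which can read the .xls files produced by xlwt) is installed:

import xlrd

# Open the workbook saved by the script above and print each row
book = xlrd.open_workbook('top250.xls')
sheet = book.sheet_by_index(0)
for r in range(sheet.nrows):
    print(sheet.cell_value(r, 0), sheet.cell_value(r, 1))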

 

