Simple crawler operations: 1. Scrape web page data and print it 2. Write the scraped data to an xls spreadsheet

To set up the Python environment, see the Runoob tutorial on installing and using pip:

Link: https://www.runoob.com/w3cnote/python-pip-install-usage.html
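
Once Python and pip are available, the third-party packages used by the examples below (taken from their import statements) can be installed from the command line; the exact command may vary by environment:

pip install requests lxml xlwt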

1. Scrape web page data and print it

import requests
from lxml import etree

# Fetch the page source
html = requests.get("https://www.ghpym.com/category/videos")
# Print the raw source for debugging
# print(html.text)

# Convert the source into a form that XPath can match
etree_html = etree.HTML(html.text)

# XPath copied from the browser for a single item:
# //*[@id="wrap"]/div/div/div/ul/li[1]/div[2]/h2/a/text()
content = etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/@href')
for each in content:
    replace = each.replace('\n', '').replace(' ', '')  # strip newlines and spaces
    if replace == '\n' or replace == "":
        continue
    else:
        print(replace)

content = etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/text()')
for each in content:
    replace = each.replace('\n', '').replace(' ', '')
    if replace == '\n' or replace == "":
        continue
    else:
        print(replace)

print("Done")
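
As a variant, the two XPath queries can be paired with zip() so each link is printed next to its title. This is a minimal sketch reusing the same URL and selectors as the example above, and it assumes the page structure stays the same:

import requests
from lxml import etree

# Fetch and parse the page (same URL and selectors as above)
html = requests.get("https://www.ghpym.com/category/videos")
etree_html = etree.HTML(html.text)

links = etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/@href')
titles = etree_html.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/text()')

# Pair each link with its title and skip blank entries
for href, text in zip(links, titles):
    text = text.replace('\n', '').replace(' ', '')
    if text:
        print(href, text)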

2. Write the scraped data to an xls spreadsheet

# coding:utf-8
from lxml import etree
import requests
import xlwt

title = []

def get_film_name(url):
    html = requests.get(url).text
    # It is usually worth printing the html here first to confirm there is
    # content before going further.
    # print(html)
    s = etree.HTML(html)  # convert the source into a form that XPath can match
    filename = s.xpath('//*[@id="wrap"]/div/div/div/ul/li/div[2]/h2/a/@href')  # returns a list
    print(filename)
    title.extend(filename)

def get_all_film_name():
    # Note: the URL below contains no '{}' placeholder, so format(i) has no
    # effect and the same page is requested each time; add a pagination
    # placeholder if the target site supports paging.
    for i in range(0, 250, 25):
        url = 'https://www.ghpym.com/category/videos'.format(i)
        get_film_name(url)

if __name__ == '__main__':
    myxls = xlwt.Workbook()
    sheet1 = myxls.add_sheet(u'top250', cell_overwrite_ok=True)
    get_all_film_name()
    for i in range(0, len(title)):
        sheet1.write(i, 0, i + 1)
        sheet1.write(i, 1, title[i])
    myxls.save('top250.xls')
    print("Done")
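
To check the result, the saved workbook can be read back. This is a minimal sketch assuming the xlrd package (which can read the .xls files produced by xlwt) is installed:

import xlrd

# Open the workbook saved by the script above and print each row
book = xlrd.open_workbook('top250.xls')
sheet = book.sheet_by_index(0)
for r in range(sheet.nrows):
    print(sheet.cell_value(r, 0), sheet.cell_value(r, 1))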

 

