Scraping Stock Data with a Python Crawler

Reposted article, 2017-11-08 20:43. Tags: crawler / python / bs4 / re / python crawler / requests

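Preface:

This article builds a crawler script that collects the Shanghai stock codes from Eastmoney (東方財富網), then fetches each stock's page from Baidu Stocks (百度股票), so that the data for every Shanghai stock is scraped and saved to a local file.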

System environment:

64-bit Windows 10, 64-bit Python 3.6, with PyCharm as the IDE

Prerequisites:

Basic knowledge of BeautifulSoup and of regular expressions with the re module

Code:

import requests
from bs4 import BeautifulSoup
import traceback
import re
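def getHTMLText(url):
    """Fetch a page and return its text, or "" if the request fails."""
    try:
        # Replace this placeholder with your own browser's User-Agent string
        user_agent = 'your own browser User-Agent string'
        headers = {'User-Agent': user_agent}
        r = requests.get(url, headers=headers, timeout=30)
        r.raise_for_status()
        # Guess the encoding from the content so Chinese text decodes correctly
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""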

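def getStockList(lst, stock_list_url):
    """Append every Shanghai stock code found on the list page to lst."""
    html = getHTMLText(stock_list_url)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            # Keep the first "sh" + six digits match in this link
            lst.append(re.findall(r"sh\d{6}", href)[0])
        except (KeyError, IndexError):
            # The tag has no href, or the href contains no Shanghai stock code
            continue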

def getStockInfo(lst, stock_info_url, fpath):
    """Fetch each stock's page and append its fields to fpath, one dict per line."""
    for stock in lst:
        url = stock_info_url + stock + '.html'
        html = getHTMLText(url)
        try:
            if html == "":
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            # A missing stock page still returns HTML, just without this block
            if stockInfo is None:
                continue
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            # The key '股票名稱' means "stock name"
            infoDict.update({'股票名稱': name.text.split()[0]})

            # Each dt holds a field name and the matching dd holds its value
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
        except Exception:
            traceback.print_exc()
            continue


def main():
    stock_list_url = 'http://quote.eastmoney.com/stocklist.html'
    stock_info_url = 'http://gupiao.baidu.com/stock/'
    output_file = 'D://Postgraduate//Python//python項目//Python網絡爬蟲與信息提取-中國大學MOOC//3 網絡爬蟲之實戰//BaiduStockInfo.txt'
    slist = []
    getStockList(slist,stock_list_url)
    getStockInfo(slist,stock_info_url,output_file)



if __name__ == '__main__':
    main()
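
Because getStockInfo writes each record as str(infoDict) followed by a newline, the output file can be read back one line at a time. The snippet below is a minimal sketch of that idea and is not part of the original script: the helper name parse_stock_lines and the sample records (including the 'open' field) are made up for illustration, and only the '股票名稱' key comes from the code above.

```python
import ast

# Two sample lines in the format getStockInfo writes: str(dict) + '\n'.
# The names and values are invented for illustration.
sample_lines = [
    "{'股票名稱': 'ExampleCo', 'open': '10.00'}\n",
    "{'股票名稱': 'OtherCo', 'open': '20.50'}\n",
]

def parse_stock_lines(lines):
    """Turn each non-empty str(dict) line back into a dict (hypothetical helper)."""
    return [ast.literal_eval(line) for line in lines if line.strip()]

records = parse_stock_lines(sample_lines)
print(records[0]['股票名稱'])
```

With a real output file, the same helper could be given the lines of a file opened with encoding='utf-8'. ast.literal_eval only accepts Python literals, so it is a safer choice here than eval.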

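Code explanation:

The first function, getHTMLText, fetches the source of the requested page.

The second function, getStockList, collects all the Shanghai stock codes from Eastmoney. Viewing the page source shows that the stock codes are stored in the 'a' tags (the original post illustrated this with a screenshot).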

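Therefore, the function first uses find_all to collect every 'a' tag, then reads the href attribute of each tag and matches it against the regular expression "sh\d{6}", that is, "sh" followed by six digits, for example sh200010.
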
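
The matching step can be tried on its own without any network access. The snippet below is a small sketch under stated assumptions: the href values are hypothetical stand-ins for the links on the list page, and extract_sh_codes is an illustrative helper that uses re.search instead of the script's re.findall(...)[0].

```python
import re

# Hypothetical href values; the real ones come from the 'a' tags on the list page.
hrefs = [
    "http://quote.eastmoney.com/sh600000.html",
    "http://quote.eastmoney.com/sz000001.html",
    "http://quote.eastmoney.com/center/",
]

def extract_sh_codes(hrefs):
    """Return the first 'sh' + six-digit code in each href, skipping hrefs without one."""
    codes = []
    for href in hrefs:
        match = re.search(r"sh\d{6}", href)
        if match:
            codes.append(match.group(0))
    return codes

codes = extract_sh_codes(hrefs)
print(codes)
```

Only the first link contains "sh" followed by six digits, so it is the only one that contributes a code. Checking the match object replaces the try/except around an empty findall result.
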
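The third function takes the stock codes produced by the second function and builds a URL for each one, then uses the first function to fetch that page. It first checks whether html is empty and, if so, skips to the next stock. A second check is also needed, and this is the key point: when a stock's page does not exist, the server still returns some HTML source, so the result of the following call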

stockInfo = soup.find('div',attrs = {'class':'stock-bets'})

may be None. Without a check, the next line would call find_all on None and raise an AttributeError, so add:

if stockInfo is None:
    continue
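
To see the guard in action without contacting the site, the parsing step can be run against two small HTML fragments. This is a sketch, not the author's code: both fragments and the field names 'open' and 'volume' are invented, and parse_stock_page is a hypothetical helper that follows the same find / None check / dt-dd pairing as getStockInfo.

```python
from bs4 import BeautifulSoup

# Hypothetical fragments: one with the 'stock-bets' block, one without it.
page_with_data = """
<div class="stock-bets">
  <a class="bets-name">ExampleCo (<span>600000</span>)</a>
  <dl><dt>open</dt><dd>10.00</dd></dl>
  <dl><dt>volume</dt><dd>1234</dd></dl>
</div>
"""
page_without_data = "<html><body><p>page not found</p></body></html>"

def parse_stock_page(html):
    """Return a dict of the dt/dd pairs, or None when the block is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    stock_info = soup.find('div', attrs={'class': 'stock-bets'})
    if stock_info is None:
        # Non-empty HTML, but no stock data: the case the guard exists for
        return None
    name = stock_info.find(attrs={'class': 'bets-name'})
    info = {'股票名稱': name.text.split()[0]}
    # Pair each field name (dt) with its value (dd)
    for dt, dd in zip(stock_info.find_all('dt'), stock_info.find_all('dd')):
        info[dt.text] = dd.text
    return info

print(parse_stock_page(page_with_data))
print(parse_stock_page(page_without_data))
```

The second fragment is non-empty, so the html == "" check alone would not catch it. Only the None check stops the code from calling methods on a missing element.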
