Scheduled crawling of all Shanghai (SSE) and Shenzhen (SZSE) stock quotes on trading days, stored in a database


I. The project consists of the following four steps:

  1. Configure the database
  2. Write the crawler script
  3. Configure a scheduled Jenkins job
  4. Check the collected data

II. Step-by-step details

1. Configure the database

Table definition (only some of the fields are shown):

CREATE TABLE `stockmarket` (
  `date` varchar(12) NOT NULL DEFAULT '' COMMENT 'trade date',
  `stockCode` varchar(100) NOT NULL DEFAULT '' COMMENT 'stock code',
  `stockName` varchar(100) DEFAULT NULL COMMENT 'stock name',
  `close` decimal(19,2) DEFAULT NULL COMMENT 'closing price',
  `high` decimal(19,2) DEFAULT NULL COMMENT 'high',
  `low` decimal(19,2) DEFAULT NULL COMMENT 'low',
  `amplitudeRatio` decimal(19,2) DEFAULT NULL COMMENT 'amplitude',
  `turnoverRatio` decimal(19,2) DEFAULT NULL COMMENT 'turnover rate',
  `preClose` decimal(19,2) DEFAULT NULL COMMENT 'previous close',
  `open` decimal(19,2) DEFAULT NULL COMMENT 'opening price',
  PRIMARY KEY (`date`,`stockCode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
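Because the primary key is (date, stockCode), running the job twice on the same day makes every insert fail with a duplicate-key error. If a rerun should overwrite that day's figures instead, MySQL's INSERT ... ON DUPLICATE KEY UPDATE is one option. The sketch below is not part of the author's project: build_upsert_sql is a hypothetical helper that only builds a parameterized statement, using pymysql's %s placeholders, for the columns listed above.

```python
# Hypothetical helper: build a parameterized MySQL upsert for the stockmarket table.
COLUMNS = ("date", "stockCode", "stockName", "close", "high", "low",
           "amplitudeRatio", "turnoverRatio", "preClose", "open")
KEY_COLUMNS = ("date", "stockCode")  # the table's composite primary key


def build_upsert_sql(table="stockmarket", columns=COLUMNS, keys=KEY_COLUMNS):
    """Return an INSERT ... ON DUPLICATE KEY UPDATE statement with %s placeholders."""
    col_list = ",".join(f"`{c}`" for c in columns)
    placeholders = ",".join(["%s"] * len(columns))
    # Only non-key columns are overwritten when the (date, stockCode) row already exists
    updates = ",".join(f"`{c}`=VALUES(`{c}`)" for c in columns if c not in keys)
    return (f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
            f"ON DUPLICATE KEY UPDATE {updates}")

# Usage with a pymysql cursor, where row is a tuple in COLUMNS order:
#   cur.execute(build_upsert_sql(), row)
#   conn.commit()
```

MySQL 8.0.20 and later deprecate VALUES() inside this clause in favor of a row alias, but the form above is still accepted.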

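Put the connection settings in a .json file (the script below reads databases_conf.json); the script loads this configuration to connect to the database:

{
  "stockMarket": {
    "host": "localhost",
    "port": 3326,
    "user": "root",
    "password": "password",
    "database": "stockMarket",
    "charset": "utf8"
  }
}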
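The author's connect_dataBase module, which reads this file, is not shown in the post. A minimal sketch of the same idea, assuming the "stockMarket" block sits at the top level of the JSON file as above, turns that block into keyword arguments for pymysql.connect(). The name load_db_kwargs is hypothetical.

```python
import json


def load_db_kwargs(path, section="stockMarket"):
    """Read one section of the JSON config and return kwargs for pymysql.connect()."""
    with open(path, encoding="utf-8") as f:
        conf = json.load(f)[section]
    return {
        "host": conf["host"],
        "port": int(conf["port"]),  # pymysql expects an integer port
        "user": conf["user"],
        "password": conf["password"],
        "database": conf["database"],
        "charset": conf.get("charset", "utf8"),
    }

# Usage (needs a reachable MySQL server):
#   import pymysql
#   conn = pymysql.connect(**load_db_kwargs("databases_conf.json"))
#   cur = conn.cursor()
```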


2. Write the crawler script

Python libraries used:

import re,pymysql,json,time,requests

The script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author : Torre Yang Edit with Python3.6
# @Email  : klyweiwei@163.com
# @Time   : 2018/6/28 10:50
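# Scheduled job: crawl the daily quotes of every listed stock
# Fields collected: date, code, name, close, high, low, amplitude, turnover rate, previous close, open
import json
import re
import time

import pymysql
import requests

import getSoup           # author's own helper module (not shown); returns a parsed page that supports .select()
import connect_dataBase  # author's own helper module (not shown); reads the JSON config and opens the DB connection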
# Database connection: settings come from databases_conf.json
connectDB = connect_dataBase.ConnectDatabase()
get_conf = connectDB.get_conf('databases_conf.json')
db = get_conf["stockMarket"]
conn, cur = connectDB.connect_db(db["host"], db["user"], db["password"],
                                 db["database"], db["port"])

# Step 1: collect the codes of all Shanghai/Shenzhen stocks from the East Money stock list
url = 'http://quote.eastmoney.com/stocklist.html#'
soup = getSoup.getSoup(url)
uls = soup.select('div#quotesearch li')
# Regular expression that pulls every stock code (e.g. sh201003) out of the links
re1 = re.compile(r'href="http://quote\.eastmoney\.com/(.+?)\.html"')
stockCodes = re1.findall(str(uls))

# Step 2: query the Baidu stock API for each code and store the quotes
# Trading date of this run (computed once, not once per record)
date = time.strftime("%Y-%m-%d", time.localtime())
# Keys read from each API record, in table column order (date is prepended)
fields = ('stockCode', 'stockName', 'close', 'high', 'low',
          'amplitudeRatio', 'turnoverRatio', 'preClose', 'open')
# Parameterized insert: pymysql escapes the values, so names containing quotes cannot break the SQL
sql = ('insert into stockmarket(date,stockCode,stockName,close,high,low,'
       'amplitudeRatio,turnoverRatio,preClose,open) '
       'values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)')
for stockCode in stockCodes:
    url = ('https://gupiao.baidu.com/api/rails/stockbasicbatch?from=pc&os_ver=1'
           '&cuid=xxx&vv=100&format=json&stock_code=' + stockCode)
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        # response.json() replaces json.loads(res, encoding=...); that argument was removed in Python 3.9
        jsonDatas = response.json()
    except (requests.RequestException, ValueError):
        # Skip this code instead of going on with undefined data
        print('Request failed or empty response, skipping', stockCode)
        continue
    for data in jsonDatas.get('data') or []:
        row = (date,) + tuple(data.get(name) for name in fields)
        # Skip incomplete records (replaces the old "'None' in sql" string check)
        if any(value is None for value in row):
            print('Missing fields, skipping', stockCode)
            continue
        try:
            # Assumes connect_db() returned a pymysql connection and cursor
            cur.execute(sql, row)
            conn.commit()
        except pymysql.MySQLError:
            # e.g. a duplicate (date, stockCode) when the job is rerun on the same day
            conn.rollback()
            print('Insert failed, skipping', stockCode)

print('Data collection finished')
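Both parsing steps of the script can be pulled out into pure functions and checked offline, without contacting either website. This is a sketch, not the author's code: extract_codes and rows_from_payload are hypothetical names, the regex is a stricter variant of re1 that assumes codes look like sh201003 or sz000001, and the payload shape (a "data" list of dicts with the keys used above) is inferred from the script rather than from any API documentation.

```python
import re

# Stricter variant of re1: escaped dots, sh/sz prefix plus 6 digits (assumed code format)
CODE_RE = re.compile(r'href="http://quote\.eastmoney\.com/(s[hz]\d{6})\.html"')

# Keys in stockmarket column order; the date is prepended separately
FIELDS = ("stockCode", "stockName", "close", "high", "low",
          "amplitudeRatio", "turnoverRatio", "preClose", "open")


def extract_codes(html):
    """Return the stock codes found in the stock-list HTML, de-duplicated, in page order."""
    return list(dict.fromkeys(CODE_RE.findall(html)))


def rows_from_payload(payload, date):
    """Turn one decoded API response into row tuples ready for cur.executemany().

    Records with any missing or null field are skipped, like the script's None check.
    """
    rows = []
    for record in payload.get("data") or []:
        row = (date,) + tuple(record.get(key) for key in FIELDS)
        if any(value is None for value in row):
            continue
        rows.append(row)
    return rows
```

In the script, stockCodes = extract_codes(str(uls)) and cur.executemany(sql, rows_from_payload(response.json(), date)) would then replace the inline parsing; note that executemany() sends all rows of one response together, so a single failing row fails that whole batch.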

3. Configure Jenkins

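Set up remote SSH execution and a scheduled job. (Tip: collect the data in the evening, or at least after the market closes, because quotes keep changing during trading hours.)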

Jenkins > Configure System > SSH remote hosts. (I use a CentOS 7 virtual machine with JDK, Python 3, MySQL, Tomcat and other common services already installed.)
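Since quotes keep changing while the market is open, the script can also guard itself against being triggered too early. A minimal sketch, assuming the regular SSE/SZSE session ends at 15:00 Beijing time on weekdays; it does not know about exchange holidays, which would need a trading calendar. should_collect is a hypothetical name. On the Jenkins side, a "Build periodically" schedule such as H 18 * * 1-5 runs the job once around 18:00, Monday to Friday, in the Jenkins server's time zone.

```python
import datetime as dt

# China Standard Time is a fixed UTC+8 offset with no daylight saving
BEIJING = dt.timezone(dt.timedelta(hours=8))
MARKET_CLOSE = dt.time(15, 0)  # end of the regular SSE/SZSE session


def should_collect(now=None):
    """Return True on a weekday (Mon-Fri) at or after 15:00 Beijing time.

    Exchange holidays are NOT handled; that would need a trading calendar.
    """
    now = (now or dt.datetime.now(BEIJING)).astimezone(BEIJING)
    return now.weekday() < 5 and now.time() >= MARKET_CLOSE


# Usage at the top of the crawler script:
#   if not should_collect():
#       raise SystemExit('Market still open or not a weekday; skipping this run')
```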


4. Check the results
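One quick way to check a run is to count the rows stored per trading day: a day with far fewer rows than the others points to failed requests or skipped records. rows_per_day is a hypothetical helper; it only needs a DB-API cursor, so it works with the pymysql cursor from the script and can also be tried against an in-memory sqlite3 database.

```python
def rows_per_day(cur, limit=5):
    """Return [(date, row_count), ...] for the most recent trading days, newest first."""
    cur.execute(
        "SELECT `date`, COUNT(*) FROM stockmarket "
        "GROUP BY `date` ORDER BY `date` DESC LIMIT %d" % int(limit)
    )
    return [tuple(r) for r in cur.fetchall()]

# Usage with the crawler's connection:
#   for day, count in rows_per_day(conn.cursor()):
#       print(day, count)
```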


Source code: https://github.com/Testworm/stockMarket.git


