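Northbound capital has long been called "smart money" and is known for left-side trading (positioning ahead of a trend reversal). Many institutions and large investors now watch northbound flows and adjust their own trades accordingly, which is pretty much an open secret. The Hong Kong Stock Exchange publishes northbound shareholdings after each day's close, so we can scrape that data and copy northbound capital's homework ourselves.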
The scraping relies on urllib and BeautifulSoup, both covered in 《Python 學習筆記:獲取網絡數據》 (Python Study Notes: Fetching Web Data).
We scrape the Shanghai Connect and Shenzhen Connect data separately, then concatenate the two dataframes and save the result as a CSV file.
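As a rough illustration of the concatenate-and-save step on its own, here is a minimal sketch. The rows are made-up placeholders rather than real holdings, and the extra `market` column is an assumption added for illustration; it is not part of the scraped table.

```python
import io

import pandas as pd

# Placeholder rows for illustration only; these are NOT real holdings.
sh = pd.DataFrame({
    'code': ['X0001'],
    'stock': ['Example SH stock'],
    'shareholding': [1000],
    'shareholding%': [1.5],
})
sz = pd.DataFrame({
    'code': ['X0002', 'X0003'],
    'stock': ['Example SZ stock A', 'Example SZ stock B'],
    'shareholding': [2000, 3000],
    'shareholding%': [0.8, 2.1],
})

# Hypothetical extra column: tag each frame with its source market
# so the rows remain distinguishable after they are stacked.
sh['market'] = 'sh'
sz['market'] = 'sz'

# Stack the two frames; ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat([sh, sz], ignore_index=True)

# Write to an in-memory buffer here; pass a file path instead to save to disk.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Tagging the source market is optional, but it can make later filtering easier once both tables sit in one file.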
Enough talk, here is the code.
Code
import ssl
import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# First URL: Shanghai Connect; second URL: Shenzhen Connect
urls = ['https://sc.hkexnews.hk/TuniS/www.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh&t=sh',
        'https://sc.hkexnews.hk/TuniS/www.hkexnews.hk/sdw/search/mutualmarket_c.aspx?t=sh&t=sz']

# Output file name. The original never defines fname, so this is a placeholder.
fname = 'northbound.csv'

dates = []
df_list = []

for url in urls:
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'lxml')

    # The date the holdings refer to is stored in the search-date input box
    date = soup.find('input', class_='input-searchDate')['value']
    dates.append(date)

    # Each table cell keeps its value in a nested div.mobile-list-body
    codes = [td.find('div', class_='mobile-list-body').string
             for td in soup.find_all('td', class_='col-stock-code')]
    names = [td.find('div', class_='mobile-list-body').string
             for td in soup.find_all('td', class_='col-stock-name')]
    # Strip thousands separators before converting to int
    shareholding = [int(td.find('div', class_='mobile-list-body').string.replace(',', ''))
                    for td in soup.find_all('td', class_='col-shareholding')]
    # Strip the percent sign before converting to float
    percent = [float(td.find('div', class_='mobile-list-body').string.strip('%'))
               for td in soup.find_all('td', class_='col-shareholding-percent')]

    df = pd.DataFrame(list(zip(codes, names, shareholding, percent)),
                      columns=['code', 'stock', 'shareholding', 'shareholding%'])
    df_list.append(df)

if dates[0] == dates[1]:
    # Both pages report the same date: combine the SH and SZ dataframes
    output = pd.concat(df_list)
    output.to_csv(fname, encoding='utf-8', index=False)
else:
    # The two pages disagree on the date, so nothing is saved
    print('failed to get northbound data from web')
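The list comprehensions above convert each cell directly, so a single empty or oddly formatted cell would raise an exception and abort the run. BeautifulSoup's `.string` returns `None` when a tag has more than one child, which is one way this can happen. Below is a minimal sketch of more defensive conversion, assuming the cell text looks like `1,234,567` for holdings and `1.23%` for percentages. The helper names `parse_shareholding` and `parse_percent` are hypothetical and not part of the original script.

```python
from typing import Optional


def parse_shareholding(text: Optional[str]) -> Optional[int]:
    """Convert text such as '1,234,567' to an int; return None if missing or malformed."""
    if text is None:
        return None
    cleaned = text.strip().replace(',', '')
    try:
        return int(cleaned)
    except ValueError:
        return None


def parse_percent(text: Optional[str]) -> Optional[float]:
    """Convert text such as '1.23%' to a float; return None if missing or malformed."""
    if text is None:
        return None
    cleaned = text.strip().rstrip('%').strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


# Well-formed input converts as expected; bad input yields None instead of raising.
print(parse_shareholding('1,234,567'))
print(parse_percent('1.23%'))
print(parse_shareholding(None))
```

With helpers like these, a row with a bad cell shows up as a missing value in the dataframe instead of stopping the whole scrape.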
