Introduction to quantmod
quantmod is a powerful R package for financial analysis. It covers data retrieval, data cleaning, modeling, and more.
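1. Data retrieval: getSymbols
The default data source is Yahoo Finance.
Shanghai Stock Exchange tickers take the suffix .ss, e.g. getSymbols("600030.ss"). Shenzhen Stock Exchange tickers take .sz, e.g. getSymbols("000002.sz").
2. Symbol aliasing: setSymbolLookup (maps a symbol to a lookup name and data source)
3. Dividends: getDividends
4. Dividend and split adjustment of OHLC prices: adjustOHLC
5. Stock splits: getSplits
6. Option chains: getOptionChain
7. Financial statements: getFinancials (alias getFin)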
Example: register WANKE as an alias for China Vanke (000002.sz) on Yahoo, then download it. The series contains missing values, which triggers a warning:

```r
> library(quantmod)
> setSymbolLookup(WANKE=list(name="000002.sz", src="yahoo"))
> getSymbols("WANKE")
[1] "WANKE"
Warning message:
000002.sz contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
> head(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2008-03-17         14.221         14.221        14.221           13.65
2008-03-18             NA             NA            NA              NA
2008-03-19             NA             NA            NA              NA
2008-03-20             NA             NA            NA              NA
2008-03-21             NA             NA            NA              NA
2008-03-24             NA             NA            NA              NA
           000002.SZ.Volume 000002.SZ.Adjusted
2008-03-17        123340858           13.10156
2008-03-18               NA                 NA
2008-03-19               NA                 NA
2008-03-20               NA                 NA
2008-03-21               NA                 NA
2008-03-24               NA                 NA
```
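The warning names two common remedies: dropping incomplete rows (na.omit) or filling gaps by interpolation (na.approx, from the zoo package). The following base-R sketch runs on a small made-up price vector, not Vanke data. It uses base R's approx() as a stand-in for zoo's na.approx, so it runs without extra packages:

```r
# Toy closing prices with a two-day gap (made-up numbers)
close <- c(10.0, NA, NA, 10.6, 10.8)
dates <- as.Date("2017-08-01") + 0:4

# Option 1: drop incomplete observations, as na.omit() does
kept <- na.omit(data.frame(date = dates, close = close))

# Option 2: linear interpolation across the gap, similar in spirit to zoo::na.approx()
ok <- !is.na(close)
filled <- approx(x = as.numeric(dates[ok]), y = close[ok],
                 xout = as.numeric(dates))$y

print(kept)    # three complete rows remain
print(filled)  # the gap is filled with 10.2 and 10.4
```

Interpolating across a long gap invents prices that never traded, so for long runs of NA dropping the rows is usually the safer choice. The model section below uses na.omit.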
Machine Learning: Classification
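First, simplify the problem by predicting only whether the stock goes up or down. This turns it into a classification problem, in which the historical data are split into two classes, UP and DOWN. To simplify further, assume that the direction depends only on historical data.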
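The labeling rule used in the R code below compares each day's close with its open. Here it is on four made-up days. A flat day, where close equals open, falls into DOWN, because only a strictly positive change counts as UP:

```r
# Made-up open/close prices for four days
open  <- c(10.0, 10.5, 11.0, 10.8)
close <- c(10.4, 10.5, 10.7, 11.0)

price_change <- close - open
# Same rule as the quantmod code: strictly positive change is UP, everything else DOWN
class <- ifelse(price_change > 0, "UP", "DOWN")
print(class)  # "UP" "DOWN" "DOWN" "UP"
```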
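We use a Naive Bayes classifier as the learning method. Its defining ("naive") assumption is that, given the class C, all attributes A_i are conditionally independent of one another.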
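Written out with Bayes' rule, the independence assumption lets the joint likelihood factor into one term per attribute. The classifier then picks the class with the largest product. Here C ranges over {UP, DOWN}, and the attributes A_i are the predictors, such as the day of the week:

```latex
P(C \mid A_1,\dots,A_n)
  = \frac{P(C)\,\prod_{i=1}^{n} P(A_i \mid C)}{P(A_1,\dots,A_n)}
  \;\propto\; P(C)\,\prod_{i=1}^{n} P(A_i \mid C),
\qquad
\hat{C} = \operatorname*{arg\,max}_{c \in \{\mathrm{UP},\,\mathrm{DOWN}\}} P(c)\,\prod_{i=1}^{n} P(A_i \mid c)
```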
The R implementation is as follows:
```r
> library(lubridate)  # date utilities
> library(e1071)      # provides naiveBayes()
> library(quantmod)
> setSymbolLookup(WANKE=list(name="000002.sz", src="yahoo"))
> getSymbols("WANKE")
[1] "WANKE"
> head(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2008-03-17         14.221         14.221        14.221           13.65
2008-03-18             NA             NA            NA              NA
2008-03-19             NA             NA            NA              NA
2008-03-20             NA             NA            NA              NA
2008-03-21             NA             NA            NA              NA
2008-03-24             NA             NA            NA              NA
           000002.SZ.Volume 000002.SZ.Adjusted
2008-03-17        123340858           13.10156
2008-03-18               NA                 NA
2008-03-19               NA                 NA
2008-03-20               NA                 NA
2008-03-21               NA                 NA
2008-03-24               NA                 NA
> tail(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2017-07-31          23.52          23.58         23.10           23.37
2017-08-01          23.35          23.55         23.20           23.42
2017-08-02          23.45          24.12         23.43           23.58
2017-08-03          23.58          23.58         22.79           23.11
2017-08-04          23.00          23.06         22.71           22.84
2017-08-07          22.82          23.05         22.68           22.71
           000002.SZ.Volume 000002.SZ.Adjusted
2017-07-31         30942482              23.37
2017-08-01         20952262              23.42
2017-08-02         35391017              23.58
2017-08-03         45518939              23.11
2017-08-04         29612306              22.84
2017-08-07         23409149              22.71
>
> startDate <- as.Date("2010-01-01")  # defined but not used below
> endDate <- as.Date("2017-01-01")    # defined but not used below
> DayofWeek <- wday(WANKE, label=TRUE)
> PriceChange <- Cl(WANKE) - Op(WANKE)             # close minus open
> Class <- ifelse(PriceChange > 0, "UP", "DOWN")   # strictly positive = UP, otherwise DOWN
> DataSet <- data.frame(DayofWeek, Class)
> MyModel <- naiveBayes(DataSet[,1], DataSet[,2])
> MyModel

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = DataSet[, 1], y = DataSet[, 2])

A-priori probabilities:
DataSet[, 2]
     DOWN        UP 
0.5148148 0.4851852 

Conditional probabilities:
            x
DataSet[, 2]       Sun       Mon      Tues       Wed     Thurs       Fri
        DOWN 0.0000000 0.2374101 0.1510791 0.2158273 0.1870504 0.2086331
        UP   0.0000000 0.1603053 0.2442748 0.1908397 0.2137405 0.1908397
            x
DataSet[, 2]       Sat
        DOWN 0.0000000
        UP   0.0000000
```
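For a discrete predictor such as the weekday, the fitted model is essentially a table of relative frequencies. The a-priori probabilities are the class proportions. Each row of the conditional probabilities is the weekday distribution within one class. e1071's naiveBayes uses laplace = 0 by default, so weekdays that never occur (weekends, in trading data) stay at exactly 0.

The following base-R sketch reproduces that computation on eight made-up days. predict_day is a hypothetical helper, not part of e1071:

```r
# Made-up training data: weekday and UP/DOWN label for eight days
day   <- factor(c("Mon", "Tues", "Mon", "Wed", "Tues", "Wed", "Mon", "Tues"),
                levels = c("Mon", "Tues", "Wed"))
class <- factor(c("DOWN", "UP", "DOWN", "DOWN", "UP", "DOWN", "UP", "UP"),
                levels = c("DOWN", "UP"))

# A-priori probabilities: class proportions
prior <- prop.table(table(class))
prior_vec <- setNames(as.numeric(prior), names(prior))

# Conditional probabilities P(day | class): contingency table normalised by row
cond <- prop.table(table(class, day), margin = 1)

# Hypothetical helper: choose the class maximising P(class) * P(day | class)
predict_day <- function(d) {
  score <- prior_vec * cond[, d]
  names(which.max(score))
}

print(prior_vec)          # DOWN 0.5, UP 0.5
print(cond)               # DOWN: 0.5 0 0.5; UP: 0.25 0.75 0
preds <- sapply(levels(day), predict_day)
print(preds)              # Mon "DOWN", Tues "UP", Wed "DOWN"
```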
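The a-priori (prior) probabilities of UP and DOWN over the whole dataset. DOWN also covers days on which the close equals the open, because the code labels every change that is not strictly positive as DOWN: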
```
DataSet[, 2]
     DOWN        UP 
0.5148148 0.4851852 
```
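The conditional probabilities P(weekday | class) give, for each class, the share of its days that fall on each weekday, so each row sums to 1. They are not the probability of a rise on a given weekday; obtaining that requires combining them with the priors via Bayes' rule.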
```
Conditional probabilities:
            x
DataSet[, 2]       Sun       Mon      Tues       Wed     Thurs       Fri
        DOWN 0.0000000 0.2374101 0.1510791 0.2158273 0.1870504 0.2086331
        UP   0.0000000 0.1603053 0.2442748 0.1908397 0.2137405 0.1908397
            x
DataSet[, 2]       Sat
        DOWN 0.0000000
        UP   0.0000000
```
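To turn the two tables into a prediction, multiply the prior by the conditional probability for the weekday in question and normalise (Bayes' rule). The following base-R sketch copies the printed numbers above. Sun and Sat are omitted because they are 0 for both classes. posterior is a hypothetical helper:

```r
# Numbers copied from the MyModel printout above
prior <- c(DOWN = 0.5148148, UP = 0.4851852)
cond <- rbind(
  DOWN = c(Mon = 0.2374101, Tues = 0.1510791, Wed = 0.2158273,
           Thurs = 0.1870504, Fri = 0.2086331),
  UP   = c(Mon = 0.1603053, Tues = 0.2442748, Wed = 0.1908397,
           Thurs = 0.2137405, Fri = 0.1908397)
)

# Hypothetical helper: P(class | day) via Bayes' rule, normalised over the two classes
posterior <- function(day) {
  unnorm <- prior * cond[, day]
  unnorm / sum(unnorm)
}

post_tues <- posterior("Tues")
print(round(post_tues, 4))   # DOWN 0.3962, UP 0.6038

# Most probable class for every trading day
preds <- sapply(colnames(cond), function(d) names(which.max(posterior(d))))
print(preds)
```

With these numbers the weekday-only model predicts UP on Tuesdays and Thursdays and DOWN on Mondays, Wednesdays and Fridays.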
Improving the model
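Exponential moving average (EMA)
The model is extended with a second feature, EMACross = EMA(5) - EMA(10) of the opening price, used alongside the day of the week. Incomplete rows are first removed with na.omit, and the last 60 labelled days are held out as a test set.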
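EMA(x, n) comes from the TTR package, which quantmod loads. It updates a running average with weight alpha = 2/(n + 1) on the newest value. The following base-R sketch implements that recursion. It assumes TTR's default behaviour of seeding the first value with the simple mean of the first n observations; check ?TTR::EMA if exact agreement matters. ema() is a hypothetical helper, not a quantmod function:

```r
# Hypothetical helper: exponential moving average with alpha = 2 / (n + 1),
# seeded with the simple mean of the first n values (assumed to match TTR's default)
ema <- function(x, n) {
  alpha <- 2 / (n + 1)
  out <- rep(NA_real_, length(x))
  if (length(x) < n) return(out)
  out[n] <- mean(x[1:n])
  if (length(x) > n) {
    for (i in (n + 1):length(x)) {
      out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
    }
  }
  out
}

# Made-up opening prices: a steady rise followed by a decline
open <- c(10, 11, 12, 13, 14, 15, 14, 13, 12, 11)
fast <- ema(open, 3)
slow <- ema(open, 5)
cross <- round(fast - slow, 2)   # analogous to EMACross = EMA5 - EMA10
print(cross)                     # NA NA NA NA 1.00 1.00 0.67 0.28 -0.06 -0.33
```

A positive crossover means the short-term average sits above the longer-term one, i.e. recent prices have been rising. It turns negative a few days after prices start falling. The longer EMA needs n observations before it exists, which is why the first rows of DataSet2 are dropped below.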
```r
> W <- na.omit(WANKE)                     # drop rows that contain NA
> DayofWeek <- wday(W, label=TRUE)
> PriceChange <- Cl(W) - Op(W)
> Class <- ifelse(PriceChange > 0, "UP", "DOWN")
> EMA5 <- EMA(Op(W), n = 5)               # 5-day EMA of the open
> EMA10 <- EMA(Op(W), n = 10)             # 10-day EMA of the open
> EMACross <- EMA5 - EMA10
> EMACross <- round(EMACross, 2)
> DataSet2 <- data.frame(DayofWeek, EMACross, Class)
> DataSet2 <- DataSet2[-c(1:10),]         # drop the first rows, where EMA10 is not yet defined
> head(DataSet2)
           DayofWeek   EMA X000002.SZ.Close
2016-07-14     Thurs  0.11             DOWN
2016-07-15       Fri  0.04             DOWN
2016-07-18       Mon  0.00             DOWN
2016-07-19      Tues -0.10             DOWN
2016-07-20       Wed -0.23             DOWN
2016-07-21     Thurs -0.28             DOWN
> tail(DataSet2)
           DayofWeek   EMA X000002.SZ.Close
2017-07-31       Mon -0.34             DOWN
2017-08-01      Tues -0.31               UP
2017-08-02       Wed -0.26               UP
2017-08-03     Thurs -0.19             DOWN
2017-08-04       Fri -0.24             DOWN
2017-08-07       Mon -0.27             DOWN
> length(DayofWeek)
[1] 270
> TrainingSet <- DataSet2[1:200,]
> # DataSet2 has 270 - 10 = 260 rows; indexing 201:270 would append 10 all-NA rows
> TestSet <- DataSet2[201:nrow(DataSet2),]
> EMACrossModel <- naiveBayes(TrainingSet[,1:2], TrainingSet[,3])
> EMACrossModel

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = TrainingSet[, 1:2], y = TrainingSet[, 3])

A-priori probabilities:
TrainingSet[, 3]
DOWN   UP 
 0.5  0.5 

Conditional probabilities:
                DayofWeek
TrainingSet[, 3]  Sun  Mon Tues  Wed Thurs  Fri  Sat
            DOWN 0.00 0.22 0.13 0.24  0.18 0.23 0.00
            UP   0.00 0.16 0.27 0.17  0.23 0.17 0.00

                EMA
TrainingSet[, 3]    [,1]      [,2]
            DOWN  0.0333 0.4119553
            UP   -0.0177 0.4191522

> table(predict(EMACrossModel, TestSet), TestSet[,3], dnn=list('predicted','actual'))
         actual
predicted DOWN UP
     DOWN   16 21
     UP     13 10
```

For the numeric EMA feature, naiveBayes fits a normal distribution within each class: column [,1] is the class mean and column [,2] the standard deviation.
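The confusion table can be summarised with a single accuracy figure and compared against the trivial baseline of always predicting the more frequent class in the test set. The following base-R sketch copies the counts from the table above:

```r
# Confusion matrix copied from the output above (rows = predicted, columns = actual)
cm <- matrix(c(16, 13, 21, 10), nrow = 2,
             dimnames = list(predicted = c("DOWN", "UP"),
                             actual    = c("DOWN", "UP")))

accuracy <- sum(diag(cm)) / sum(cm)          # correct predictions / all predictions
baseline <- max(colSums(cm)) / sum(cm)       # always predict the majority actual class

print(round(accuracy, 4))   # 0.4333
print(round(baseline, 4))   # 0.5167
```

On these 60 test days the model is right 26 times (about 43%). Always predicting UP would be right 31 times (about 52%), so on this split the EMA crossover model does not beat the trivial baseline.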