An example of using Pandas for regression

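This example comes from the book "Python for Data Analysis", whose author, Wes McKinney, is also the creator of pandas.

pandas provides some convenient features, such as ordinary least squares (OLS), which can be used to estimate the parameters of a regression equation. pandas can also print an ANOVA-like summary that includes the coefficient of determination (R-squared), the F-statistic, and so on.

OK, let's go straight to the example.

Data preparation

First, create 1000 stocks, with ticker symbols (5 characters each) generated at random.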

In [29]: import string

In [30]: import numpy as np

In [32]: import random

In [33]: random.seed(0)

In [34]: N = 1000

In [35]: def rands(n):
   ....:     choices = string.ascii_uppercase
   ....:     return ''.join([random.choice(choices) for _ in range(n)])
   ....:

In [36]: tickers = np.array([rands(5) for x in range(N)])
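The ticker generation above can also be written as a standalone script. The following is a minimal sketch, assuming Python 3; the private random.Random(0) generator and the n_unique check are additions for illustration and are not part of the original session.

```python
import random
import string

import numpy as np

N = 1000  # number of tickers, as in the session above


def rands(n, rng):
    """Return a random string of n uppercase ASCII letters."""
    return ''.join(rng.choice(string.ascii_uppercase) for _ in range(n))


# Assumption: a private generator, so the global random state is untouched.
rng = random.Random(0)
tickers = np.array([rands(5, rng) for _ in range(N)])

# Random 5-letter codes are not guaranteed to be distinct, so count them.
n_unique = len(set(tickers.tolist()))
print(tickers[:5], n_unique)
```

If n_unique is smaller than N, some tickers are duplicated and the resulting index will contain repeated labels.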

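Regression analysis

Suppose we have a multiple-factor model, as follows:

y = 0.7 * x1 - 1.2 * x2 + 0.3 * x3 + random value

Build a portfolio according to this model, then regress the resulting values on the three factors to see whether the estimated coefficients are close to those in the model above.

First, create three random arrays (each of size 1000, one entry per stock created above), named fac1, fac2, and fac3.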

In [56]: import pandas as pd

In [57]: from pandas import Series, DataFrame

In [58]: from numpy.random import rand

In [59]: fac1, fac2, fac3 = np.random.rand(3, 1000)

In [62]: ticker_subset = tickers.take(np.random.permutation(N)[:1000])

Use the 1000 selected stocks to build the portfolio according to the model above. The resulting values are the dependent variable y.

In [64]: port = Series(0.7*fac1 - 1.2*fac2 + 0.3*fac3 + rand(1000), index=ticker_subset)

Now regress the observed y on x1, x2, and x3. First, assemble the three factors into a DataFrame.

In [65]: factors = DataFrame({'f1':fac1, 'f2':fac2, 'f3':fac3}, index=ticker_subset)

Then call pd.ols directly to run the regression (pd.ols is available only in pandas versions before 0.20):

In [70]: pd.ols(y=port, x=factors)
Out[70]:

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <f1> + <f2> + <f3> + <intercept>

Number of Observations:         1000
Number of Degrees of Freedom:   4

R-squared:         0.6867
Adj R-squared:     0.6857

Rmse:              0.2859

F-stat (3, 996):   727.6383, p-value:     0.0000

Degrees of Freedom: model 3, resid 996

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
            f1     0.6968     0.0311      22.44     0.0000     0.6359     0.7577
            f2    -1.2672     0.0312     -40.64     0.0000    -1.3283    -1.2061
            f3     0.3345     0.0310      10.80     0.0000     0.2738     0.3952
     intercept     0.5018     0.0275      18.28     0.0000     0.4480     0.5557
---------------------------------End of Summary---------------------------------


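From the regression output, the fitted equation is:

y = 0.5018 + 0.6968 * f1 - 1.2672 * f2 + 0.3345 * f3

Compare this with the actual model:

y = 0.7 * x1 - 1.2 * x2 + 0.3 * x3 + random value

The two match fairly well: the estimated slopes are close to 0.7, -1.2, and 0.3, and the p-value of every coefficient is effectively zero, so each factor is clearly significant. The intercept of about 0.5 is also expected: rand draws from a uniform distribution on [0, 1), so the random term has a mean of 0.5, which the regression absorbs into the intercept.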
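To see where the intercept comes from, the following sketch fits the same model twice, once with uniform noise and once with that noise shifted to mean zero. It is a minimal illustration using np.linalg.lstsq rather than pd.ols; the seed, the helper name fit, and the variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumption: arbitrary seed for this sketch
n = 1000

X = rng.random((n, 3))           # three factors, uniform on [0, 1)
true_coefs = np.array([0.7, -1.2, 0.3])
noise = rng.random(n)            # uniform on [0, 1), so its mean is 0.5


def fit(y):
    """Least-squares fit of y on the three factors plus an intercept column."""
    design = np.column_stack([X, np.ones(n)])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs                 # order: f1, f2, f3, intercept


raw = fit(X @ true_coefs + noise)               # noise with mean 0.5
centered = fit(X @ true_coefs + (noise - 0.5))  # same noise shifted by 0.5

print("intercept with rand noise:    ", raw[3])
print("intercept with centered noise:", centered[3])
```

Shifting the noise by 0.5 moves only the intercept, by exactly that amount; the three slope estimates stay the same.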

If you only care about the coefficients, you can read the beta attribute directly.

In [71]: pd.ols(y=port, x=factors).beta
Out[71]:
f1           0.696817
f2          -1.267172
f3           0.334505
intercept    0.501836
dtype: float64
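For pandas 0.20 and later, where pd.ols is no longer available, the same coefficients can be estimated in other ways; statsmodels is the usual replacement. The sketch below is one possible alternative that needs only NumPy and pandas. It is not the original author's method: the seed, the variable names, and the hand-computed R-squared are assumptions for illustration, so the numbers will differ slightly from the output above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # assumption: arbitrary seed for this sketch
n = 1000

factors = pd.DataFrame(rng.random((n, 3)), columns=['f1', 'f2', 'f3'])
port = (0.7 * factors['f1'] - 1.2 * factors['f2'] + 0.3 * factors['f3']
        + rng.random(n))

# Add the constant column that pd.ols included automatically.
design = factors.assign(intercept=1.0)

coefs, *_ = np.linalg.lstsq(design.to_numpy(), port.to_numpy(), rcond=None)
beta = pd.Series(coefs, index=design.columns)

# R-squared: share of the variance of y explained by the fit.
resid = port - design.to_numpy() @ coefs
r2 = 1.0 - (resid ** 2).sum() / ((port - port.mean()) ** 2).sum()

print(beta)
print("R-squared:", r2)
```

The resulting beta Series has the same labels (f1, f2, f3, intercept) as the .beta attribute shown above.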

So, what do you think? Isn't pandas great!
