快速用梯度下降法實現一個Logistic Regression 分類器

本文轉載自查看原文 2015-02-14 17:31 7557 梯度下降法/ data mining/ logistic regression

前陣子聽說一個面試題：你實現一個logistic Regression需要多少分鍾？搞數據挖掘的人都會覺得實現這個簡單的分類器分分鍾就搞定了吧？

因為我做數據挖掘的時候，從來都是順手用用工具的，尤其是微軟內部的TLC相當強大，各種機器學習的算法都有，於是自從離開學校后就沒有自己實現過這些基礎的算法。當有一天心血來潮自己實現一個logistic regression的時候，我會說用了3個小時么？。。。羞羞

---------------------------------------------------前言結束----------------------------------------------

當然logistic regression的淵源還是有點深的，想復習理論知識的話可以去http://en.wikipedia.org/wiki/Logistic_regression ，我這里就直接講實現啦。

首先要了解一個logistic function

這個函數的圖像是這個樣子的：

而我們要實現的logistic regression model，就是要去學習出一組權值w：

x 指feature構成的向量。這個向量w就可以將每個instance映射到一個實數了。

假如我們要出里的是2分類問題，那么問題就被描述為學習出一組w，使得h(正樣本)趨近於1， h(負樣本)趨近於0.

現在就變成了一個最優化問題，我們要讓誤差最小化。現在問題來了，怎么定義誤差函數呢？

首先想到的是L2型損失函數啦，於是啪啪啪寫上了

。

很久沒有復習logistic regression的人最容易犯錯的就是在這了。正確的寫法是：

，

然后對它求偏導數得到梯度下降法的迭代更新方程：

。

於是你會發現這個迭代方程是和線性回歸的是一樣的！

理清了過程時候，代碼就變得異常簡單了：

  1  public class LogisticRegression
  2     {
  3         private int _maxIteration = 1000;
  4         private double _stepSize = 0.000005;
  5         //private double _stepSize = 0.1;
  6         private double _lambda = 0.1;
  7         private double decay = 0.95;
  8 
  9         public int dim;
 10         public double[] theta;
 11 
 12         public LogisticRegression(int dim)
 13         {
 14             this.dim = dim;
 15         }
 16 
 17         public LogisticRegression(int dim, double stepSize)
 18             : this(dim)
 19         {
 20             this._stepSize = stepSize;
 21         }
 22 
 23         public void Train(Instance[] instances)
 24         {
 25             Initialize();
 26 
 27             int instCnt = instances.Length;
 28             double[] dev =new double[this.dim];
 29             for (int t = 0; t < this._maxIteration; t++)
 30             {
 31                 double cost = 0;
 32                 for (int i = 0; i < instCnt; i++)
 33                 {
 34                     double h_x = MathLib.Logistic(MathLib.VectorInnerProd(instances[i].featureValues, this.theta));
 35                     // calculate cost function
 36                     cost += instances[i].label * Math.Log(h_x) + (1 - instances[i].label) * Math.Log(1 - h_x);
 37                 }
 38                 cost *= -1.0 / instCnt; 
 39                 Console.WriteLine("{0},{1}", t, cost);
 40 
 41                
 42                 for (int i = 0; i < instCnt; i++)
 43                 {
 44                     ResetArray(dev);
 45                     double h_x = MathLib.Logistic(MathLib.VectorInnerProd(instances[i].featureValues, this.theta));
 46                     double error =   h_x- instances[i].label ;
 47                     for (int j = 0; j < this.dim; j++)
 48                     {
 49                         dev[j] += error*instances[i].featureValues[j] + 2*dev[j]*this._lambda;
 50                         this.theta[j] -= this._stepSize * dev[j] ;
 51                         //BoundaryLimiting(ref this.theta[j], 0, 1);
 52                     }
 53                 }
 54                 //this._stepSize *= decay;
 55                 //if (this._stepSize > 0.000001)
 56                 //{
 57                 //    this._stepSize = 0.000001;
 58                 //}
 59             }
 60         }
 61          
 62         private void BoundaryLimiting(ref double alpha, double lowerbound, double upperbound)
 63         {
 64             if (alpha < lowerbound)
 65             {
 66                 alpha = lowerbound;
 67             }
 68             else if (alpha > upperbound)
 69             {
 70                 alpha = upperbound;
 71             }
 72         }
 73  
 74         public double[] Predict(Instance[] instances)
 75         {
 76             double[] results = new double[instances.Length];
 77             for (int i = 0; i < results.Length; i++)
 78             {
 79                 results[i] = MathLib.Logistic(MathLib.VectorInnerProd(instances[i].featureValues, this.theta));
 80             }
 81             return results;
 82         }
 83 
 84         private void ResetArray(double[] dev)
 85         {
 86             for (int i = 0; i < dev.Length; i++)
 87             {
 88                 dev[i] = 0;
 89             }
 90         }
 91 
 92         private void Initialize()
 93         {
 94             Random ran = new Random(DateTime.Now.Second);
 95 
 96             this.theta = new double[this.dim];
 97             for (int i = 0; i < this.dim; i++)
 98             {
 99                 this.theta[i] = ran.NextDouble() * 0 ; // initialize theta with a small value
100             }
101         }
102 
103 
104         public static void Test()
105         {
106             LogisticRegression lr = new LogisticRegression(3);
107 
108             List<Instance> instances = new List<Instance>();
109             using (StreamReader rd = new StreamReader(@"D:\\local exp\\data.csv"))
110             {
111                 string content = rd.ReadLine();
112                 while ((content = rd.ReadLine()) != null)
113                 {
114                     instances.Add(Instance.ParseInstance(content,','));
115                 }
116             }
117 
118            // MinMaxNormalize(instances); 
119 
120             lr.Train(instances.ToArray()); 
121             
122         }
123 
124         private static void MinMaxNormalize(List<Instance> instances)
125         {
126             int dim = instances[0].dim;
127             double[] min = new double[dim];
128             double[] max = new double[dim];
129 
130             int instCnt = instances.Count;
131             for (int i = 0; i < instCnt; i++)
132             {
133                 for (int j = 0; j < dim; j++)
134                 {
135                     if (i == 0 || instances[i].featureValues[j] < min[j])
136                     {
137                         min[j] = instances[i].featureValues[j];
138                     }
139                     if (i == 0 || instances[i].featureValues[j] > max[j])
140                     {
141                         max[j] = instances[i].featureValues[j];
142                     }
143                 }
144             }
145 
146 
147             for (int j = 0; j < dim; j++)
148             {
149                 double gap = max[j] - min[j];
150                 if (gap <= 0)
151                 {
152                     continue;
153                 }
154                 for (int i = 0; i < instCnt; i++)
155                 {
156                     instances[i].featureValues[j] = (instances[i].featureValues[j] - min[j]) / gap;
157                 }
158             }
159              
160         }
161     }

前面提到說我花了3個小時，其中很大一部分原因是在debug算法為啥沒有收斂。這里有個很重要的步驟是把feature規范化到[0,1] 。如果不normalize的話，參數調起來比較麻煩，loss function也經常蹦到NaN去了。

以下是對比normalize和不加normalization的收斂曲線圖：

我用的實現數據可以在 http://pingax.com/wp-content/uploads/2013/12/data.csv 下載到。它是一個2維的數據，分布如下：

免責聲明！

本站轉載的文章為個人學習借鑒使用，本站對版權不負任何法律責任。如果侵犯了您的隱私權益，請聯系本站郵箱yoyou2525@163.com刪除。

猜您在找 logistic回歸----- 隨機梯度下降法隨機梯度下降分類器和回歸器梯度下降法VS隨機梯度下降法 (Python的實現) matlab實現梯度下降法(Gradient Descent)的一個例子梯度下降法和隨機梯度下降法梯度下降法和隨機梯度下降法梯度下降法分析梯度下降法小結梯度下降法梯度下降法小結