Binary classification models: logistic regression


Linear classifiers

Logistic regression is used mostly for binary classification problems. Although it passes a linear model through a nonlinear (sigmoid) transform, its greatest strength is precisely that it solves the two-class problem. In the financial industry today, logistic regression is essentially the standard tool for predicting whether a user is a good customer, because it makes up for the lack of interpretability in black-box models (SVM, neural networks, random forests, and so on). (source: Zhihu)

1. Logistic regression

  • Logistic regression is actually a classification algorithm, not a regression algorithm. It typically uses known independent variables to predict the value of a discrete dependent variable (a binary value such as 0/1, yes/no, true/false). Simply put, it predicts the probability that an event occurs by fitting a logit function. So what it predicts is a probability value, which naturally lies between 0 and 1. -- it computes a single output

1.2 Sigmoid

The logistic function:
\(g(z)=\frac{1}{1+e^{-z}}\)

  • The sigmoid function is an S-shaped curve whose values lie in (0, 1); away from 0, the function's value quickly approaches 0 or 1. This property is crucial for solving binary classification problems
  • In binary classification, the output y can only take the value 0 or 1, so we wrap a sigmoid function around the linear regression hypothesis, constraining its range to (0, 1) and completing the conversion from a raw value to a probability. The hypothesis function of logistic regression takes the following form
    \(h_{\theta}(x)=g\left(\theta^{T} x\right)=\frac{1}{1+e^{-\theta^{T} x}}=P(y=1 | x ; \theta)\)
    So if \(P(y=1 | x ; \theta)=0.7\), this means that when the input is x, the probability that y=1 is 0.7
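A minimal Python sketch of this hypothesis (the helper names sigmoid and hypothesis are illustrative, not from any library):

import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z)); output is always in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x), the model's estimate of P(y=1 | x; theta)
    return sigmoid(np.dot(theta, x))

theta = np.array([0.1, 0.5, -0.3])  # illustrative parameter values
x = np.array([1.0, 2.0, 1.5])       # x[0] = 1 acts as the intercept term
print(hypothesis(theta, x))         # a probability strictly between 0 and 1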

1.3 Decision boundary

A decision boundary, also called a decision surface, is the line or curve (or, in N-dimensional space, the plane or surface) that separates samples of different classes.

Since the hypothesis function above represents a probability, we can derive:
if \(h_{\theta}(x) \geqslant 0.5 \Rightarrow y=1\)
if \(h_{\theta}(x)<0.5 \Rightarrow y=0\)

1.3.1 Linear decision boundary

1.3.2 Nonlinear decision boundary
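A short sketch of the distinction, using scikit-learn's make_circles as an assumed toy dataset: on raw features the boundary \(\theta^{T} x=0\) is a straight line, while adding polynomial features bends it into a curve in the original feature space (predictions apply the 0.5 threshold from above).

from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Concentric circles: no straight line can separate the two classes well
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

# Linear decision boundary: theta^T x = 0 is a straight line
linear_clf = LogisticRegression().fit(X, y)
print("linear boundary accuracy:", linear_clf.score(X, y))

# Nonlinear decision boundary: degree-2 features make the boundary a conic
poly_clf = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
poly_clf.fit(X, y)
print("polynomial boundary accuracy:", poly_clf.score(X, y))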

1.4 Cost function / loss function

In linear regression, the cost function is
\(J(\theta)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}\)

  • Because it is a convex function, it can be solved directly with gradient descent; a local minimum is also the global minimum
  • Only when the cost function is convex (or has been transformed into a convex function) is gradient descent guaranteed to reach the global minimum
  • In logistic regression, \(h_{\theta }(x)\) is a complicated nonlinear function, so plugging it into the squared-error cost above yields a non-convex function, and applying gradient descent directly can get stuck in a local minimum. By analogy with linear regression, logistic regression's \(J(\theta )\) is derived as follows
  • For an input x, the probabilities that the classification result is class 1 or class 0 are, respectively:
    \(P(y=1 | x ; \theta)=h(x) ; \quad P(y=0 | x ; \theta)=1-h(x)\)
  • Combining these into a single expression:
    \(P(y | x ; \theta)=(h(x))^{y}(1-h(x))^{1-y}\)

1.4.1 Likelihood function

\(\begin{aligned} L(\theta) &=\prod_{i=1}^{m} P\left(y^{(i)} | x^{(i)} ; \theta\right) \\ &=\prod_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)\right)^{y^{(i)}}\left(1-h_{\theta}\left(x^{(i)}\right)\right)^{1-y^{(i)}} \end{aligned}\)
Taking the logarithm of the likelihood function:
\(\begin{aligned} l(\theta) &=\log L(\theta) \\ &=\sum_{i=1}^{m}\left(y^{(i)} \log h_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right) \end{aligned}\)

  • By maximum likelihood estimation, we would use gradient ascent to find the maximum; therefore, to be able to use gradient descent instead, the cost function must be constructed as a convex function
    Hence
    \(J(\theta )=-\frac{1}{m} l(\theta )\)
    and gradient descent can now be applied
  • The update rule for \(\theta_{j}\) is
    \(\theta_{j}:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J(\theta)\)
    with the intermediate differentiation steps omitted, this yields
    \(\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(\mathrm{x}^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}, \quad(j=0 \ldots n)\)
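A minimal NumPy sketch of this update rule, vectorized over all m training examples (the learning rate and iteration count are illustrative choices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, alpha=0.1, n_iters=1000):
    # X: (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    # y: (m,) vector of 0/1 labels
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sigmoid(X @ theta)      # h_theta(x^(i)) for every example
        grad = X.T @ (h - y) / m    # (1/m) * sum_i (h - y^(i)) * x_j^(i)
        theta -= alpha * grad       # simultaneous update of every theta_j
    return theta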

1.5 Regularization

Add a penalty term to the loss function: the larger the parameter values, the larger the penalty, pushing the algorithm to shrink the parameters.
The loss function \(J(β)\) in abbreviated form:

\(J(\beta)=\frac{1}{m} \sum_{i=1}^{m} \operatorname{cost}(y, \beta)+\frac{\lambda}{2 m} \sum_{j=1}^{n} \beta_{j}^{2}\)

  • When the model's β parameters grow large, the penalty makes the loss large, so the algorithm works to shrink the β values in order to minimize the loss.
  • λ is the key regularization parameter: the larger λ is, the harsher the penalty and the more the model tends to underfit; conversely, a small λ tends toward overfitting

1.5.1 Lasso

L1 regularization
\(J(\beta)=\frac{1}{m} \sum_{i=1}^{m} \operatorname{cost}(y, \beta)+\frac{\lambda}{2 m} \sum_{j=1}^{n}\left|\beta_{j}\right|\)

1.5.2 Ridge

L2 regularization
\(J(\beta)=\frac{1}{m} \sum_{i=1}^{m} \operatorname{cost}(y, \beta)+\frac{\lambda}{2 m} \sum_{j=1}^{n} \beta_{j}^{2}\)
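A hedged sketch contrasting the two penalties in scikit-learn; note that sklearn's C parameter is the inverse of λ, and the synthetic dataset here is a stand-in:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Lasso (L1) tends to drive many coefficients exactly to zero
l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X, y)
# Ridge (L2) shrinks coefficients toward zero without zeroing them out
l2 = LogisticRegression(penalty='l2', solver='lbfgs', C=0.1).fit(X, y)

print("L1 zero coefficients:", np.sum(l1.coef_ == 0))
print("L2 zero coefficients:", np.sum(l2.coef_ == 0))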

1.6 Python implementation

class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

1.6.1 Parameters

  • penalty: {‘l1’, ‘l2’, ‘elasticnet’, ‘none’}, default=’l2’ — the regularization term; the default is l2
    • The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties. ‘elasticnet’ is only supported by the ‘saga’ solver. If ‘none’ (not supported by the liblinear solver), no regularization is applied.
  • dual: bool, default=False
    • Dual formulation is only implemented for the l2 penalty with the liblinear solver. Prefer dual=False when n_samples > n_features; in most cases this is False
  • tol: float, default=1e-4 — tolerance; the stopping criterion for the iterations
  • C: float, default=1.0
    • Inverse of regularization strength; must be a positive float. As in support vector machines, smaller values specify stronger regularization
  • solver: {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default=’lbfgs’
    • Algorithm to use in the optimization problem.
    • For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.
    • For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.
    • ‘newton-cg’, ‘lbfgs’, ‘sag’ and ‘saga’ handle L2 or no penalty
    • ‘liblinear’ and ‘saga’ also handle L1 penalty
    • ‘saga’ also supports ‘elasticnet’ penalty
    • ‘liblinear’ does not support setting penalty='none'
  • multi_class: {‘auto’, ‘ovr’, ‘multinomial’}, default=’auto’ — this parameter applies to multiclass problems and is not involved in binary classification
    • If the option chosen is ‘ovr’, then a binary problem is fit for each label.
    • For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’.
    • ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.
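The solver/penalty compatibility rules listed above can be exercised directly; a small sketch with a placeholder dataset (the max_iter bump for saga is an illustrative choice to help convergence):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)

# lbfgs (the default) handles only L2 or no penalty
LogisticRegression(solver='lbfgs', penalty='l2').fit(X, y)
# liblinear additionally handles L1
LogisticRegression(solver='liblinear', penalty='l1').fit(X, y)
# only saga supports elasticnet, which also requires l1_ratio
LogisticRegression(solver='saga', penalty='elasticnet', l1_ratio=0.5,
                   max_iter=5000).fit(X, y)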

1.6.2 Attributes

  • classes_: ndarray of shape (n_classes,)
    A list of class labels known to the classifier.
  • coef_: ndarray of shape (1, n_features) or (n_classes, n_features)
    • Coefficient of the features in the decision function.
    • coef_ is of shape (1, n_features) when the given problem is binary. In particular, when multi_class='multinomial', coef_ corresponds to outcome 1 (True) and -coef_ corresponds to outcome 0 (False).
  • intercept_:ndarray of shape (1,) or (n_classes,)
    Intercept (a.k.a. bias) added to the decision function.
    If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the given problem is binary. In particular, when multi_class='multinomial', intercept_ corresponds to outcome 1 (True) and -intercept_ corresponds to outcome 0 (False).

  • n_iter_: ndarray of shape (n_classes,) or (1,)
    Actual number of iterations for all classes. If binary or multinomial, it returns only 1 element. For liblinear solver, only the maximum number of iteration across all classes is given.
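A short sketch showing these attributes after fitting a binary problem (shapes follow the descriptions above; the synthetic data is illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
lr = LogisticRegression().fit(X, y)

print(lr.classes_)     # class labels, e.g. [0 1]
print(lr.coef_.shape)  # (1, n_features) for a binary problem
print(lr.intercept_)   # shape (1,) for a binary problem
print(lr.n_iter_)      # a single element for binary problems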

This snippet evidently sits inside a loop over candidate C values; the grid of C_values below is an illustrative assumption, and X_train, y_train, X_valid, y_valid are assumed to be predefined train/validation splits.

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

train_errs, valid_errs = [], []
C_values = [0.001, 0.01, 0.1, 1, 10, 100, 1000]  # assumed grid of C values

for C_value in C_values:
    # Create LogisticRegression object and fit
    lr = LogisticRegression(C=C_value)
    lr.fit(X_train, y_train)

    # Evaluate error rates and append to lists
    train_errs.append(1.0 - lr.score(X_train, y_train))
    valid_errs.append(1.0 - lr.score(X_valid, y_valid))

# Plot results
plt.semilogx(C_values, train_errs, C_values, valid_errs)
plt.legend(("train", "validation"))
plt.show()

