Randomized logistic regression (random logistic regression) for feature selection



 

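#author:231469242@qq.com
#WeChat official account: pythonEducation
# Requires scikit-learn < 0.21: RandomizedLogisticRegression was deprecated in 0.19 and removed in 0.21.
import numpy as np
from sklearn.linear_model import LogisticRegression as LR
from sklearn.linear_model import RandomizedLogisticRegression as RLR

# Assumption (not shown in the original post): data is a pandas DataFrame whose
# first 8 columns are the features and whose 9th column is the label.
x = data.iloc[:, :8].values
y = data.iloc[:, 8].values

rlr = RLR()                  # build a randomized logistic regression model to screen the variables
rlr.fit(x, y)                # train the model
support = rlr.get_support()  # boolean mask of the selected features
selected = np.array(data.iloc[:, :8].columns)[support]
print('Selected features: %s' % ','.join(selected))

x = data[selected].values    # keep only the selected features (.values replaces .as_matrix(), removed in pandas 1.0)
lr = LR()                    # build a logistic regression model
lr.fit(x, y)                 # train it on the selected features
print('Logistic regression training finished')
print('Mean accuracy of the model: %s' % lr.score(x, y))  # accuracy on the training data itself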
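RandomizedLogisticRegression no longer exists in scikit-learn 0.21 and later, so the snippet above fails at import time on a current install. One rough stand-in for the same two-step workflow (select features, then refit LogisticRegression) is SelectFromModel wrapped around a single L1-penalized LogisticRegression. Note that this swaps in plain L1 selection, not stability selection. The synthetic DataFrame, the column names x0 to x7, and C=0.05 are assumptions made only so the sketch runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the post's `data` (assumption): 8 feature columns, then a 0/1 label.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8))
y = (3 * X[:, 0] - 3 * X[:, 1] + rng.logistic(size=500) > 0).astype(int)
data = pd.DataFrame(X, columns=["x%d" % i for i in range(8)])
data["label"] = y

features = data.iloc[:, :8]
# A single L1-penalized fit acts as the selector (plain L1 selection, not stability selection).
selector = SelectFromModel(LogisticRegression(penalty="l1", C=0.05, solver="liblinear"))
selector.fit(features.values, data["label"].values)
mask = selector.get_support()          # boolean mask, analogous to rlr.get_support()
selected = features.columns[mask]
print("Selected features: %s" % ",".join(selected))

x = data[selected].values              # .values instead of the removed .as_matrix()
lr = LogisticRegression().fit(x, data["label"].values)
print("Mean accuracy on the training data: %s" % lr.score(x, data["label"].values))
```

Because a single L1 fit can be sensitive to the particular sample, its selection is usually less stable than the resampling-based scores the original class produced.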

  

Scikit-Learn API:

sklearn.linear_model: generalized linear models
sklearn.linear_model.LogisticRegression: logistic regression classifier
Methods:

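score(X, y[, sample_weight]): returns the mean accuracy on the given test data and labels.
Parameters:

X: array-like, test samples; y: array-like, true labels for X.

sample_weight: optional, sample weights.

Returns:

score: float, mean accuracy of self.predict(X) with respect to y, i.e. the (optionally weighted) fraction of correctly classified samples. It is a single number for the whole model, not a score for each feature.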
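A quick way to see what score() computes is to recompute it by hand. The sketch below uses a synthetic dataset from make_classification (an assumption; the post's data is not shown) and checks that score(X, y) equals the fraction of predictions that match y, and that passing sample_weight turns it into a weighted average of the same 0/1 "correct" indicator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)

acc = clf.score(X, y)                  # mean accuracy reported by score()
manual = np.mean(pred == y)            # fraction of correct predictions, computed by hand
print(acc, manual)

# With sample_weight, score() becomes a weighted mean of the "correct" indicator.
w = np.linspace(0.5, 2.0, len(y))      # arbitrary weights, for illustration only
weighted = clf.score(X, y, sample_weight=w)
manual_weighted = np.average(pred == y, weights=w)
print(weighted, manual_weighted)
```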

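sklearn.linear_model.RandomizedLogisticRegression: randomized logistic regression (deprecated in scikit-learn 0.19 and removed in 0.21)
The official documentation explains randomized logistic regression as follows: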

Randomized Logistic Regression works by subsampling the training data and fitting a L1-penalized LogisticRegression model where the penalty of a random subset of coefficients has been scaled. By performing this double randomization several times, the method assigns high scores to features that are repeatedly selected across randomizations. This is known as stability selection. In short, features selected more often are considered good features.

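Interpretation: the training data are subsampled many times and a regression model is fitted on each subsample. In other words, the selection algorithm is run repeatedly on different subsets of the data, each time with the L1 penalty rescaled on a different random subset of coefficients, and the features with the highest scores are kept at the end. This is the stability selection method. A feature scores highly because it is judged important often: its score is its selection frequency, the number of times it is selected as important divided by the number of times it is tested. In scikit-learn's implementation every feature takes part in every fit, so this is simply the fraction of resampling rounds in which the feature's coefficient is nonzero.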
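The procedure described above can be sketched with current scikit-learn, which no longer ships RandomizedLogisticRegression. This is a minimal hand-rolled sketch, not the removed class: the helper name `stability_scores`, the synthetic data, and the parameter values (100 rounds, 75% subsamples, scaling 0.5, C=0.02, threshold 0.25) are illustrative assumptions. Each round subsamples the rows, shrinks a random half of the columns (which raises the L1 penalty on those coefficients), fits an L1-penalized LogisticRegression, and records which coefficients are nonzero. A feature's score is the fraction of rounds in which it survived.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def stability_scores(X, y, n_resampling=100, sample_fraction=0.75,
                     scaling=0.5, C=0.02, random_state=0):
    """Hypothetical helper: fraction of randomized L1-logistic fits in which
    each feature keeps a nonzero coefficient (stability selection)."""
    rng = np.random.default_rng(random_state)
    n_samples, n_features = X.shape
    n_sub = int(sample_fraction * n_samples)
    counts = np.zeros(n_features)
    for _ in range(n_resampling):
        # 1) Subsample the rows without replacement.
        rows = rng.choice(n_samples, size=n_sub, replace=False)
        # 2) Randomly multiply about half of the columns by (1 - scaling).
        #    Shrinking a column by w is equivalent to multiplying the L1
        #    penalty on its coefficient by 1/w.
        weights = 1.0 - scaling * rng.integers(0, 2, size=n_features)
        model = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        model.fit(X[rows] * weights, y[rows])
        # 3) Record which coefficients survived the L1 penalty.
        counts += model.coef_.ravel() != 0
    return counts / n_resampling


# Synthetic data (assumption): only columns 0 and 1 drive the label.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 8))
y = (3 * X[:, 0] - 3 * X[:, 1] + rng.logistic(size=1000) > 0).astype(int)
X = StandardScaler().fit_transform(X)  # common scale, so the L1 penalty treats columns alike

scores = stability_scores(X, y)
selected = scores > 0.25               # selection threshold, chosen for illustration
print(np.round(scores, 2))
print("Selected feature indices:", np.flatnonzero(selected))
```

Scaling the columns rather than editing the penalty directly is a design choice: LogisticRegression has no per-coefficient penalty weights, and multiplying a column by w has the same effect as multiplying its coefficient's L1 penalty by 1/w.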

 


 

 

