Hyperparameter tuning


See the official documentation for details.

Definition

Parameters that must be specified before fitting the model.

Where it applies

  • Linear regression: Choosing parameters
  • Ridge/lasso regression: Choosing alpha
  • k-Nearest Neighbors: Choosing n_neighbors
  • Parameters like alpha and k: Hyperparameters
  • Hyperparameters cannot be learned by fitting the model (see the sketch below)
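
A minimal sketch of the distinction, using toy data (the Ridge model and all values below are illustrative assumptions, not from the original):

# Hyperparameters are set before fitting; model parameters are learned by fit
import numpy as np
from sklearn.linear_model import Ridge

X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0.1, 0.9, 2.1, 2.9])

ridge = Ridge(alpha=0.5)  # alpha: a hyperparameter, chosen before fitting
ridge.fit(X_toy, y_toy)
print(ridge.coef_)        # coef_: a model parameter, learned from the data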

GridSearchCV

sklearn.model_selection.GridSearchCV

  • An automated hyperparameter search module
  • Grid search + cross-validation
  • Within the specified parameter ranges, it steps through the candidate values, trains the estimator with each setting, and keeps the setting with the highest score on the validation set; in effect, a train-and-compare process
class sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, iid='deprecated', refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score=nan, return_train_score=False)

Parameters

  • estimator: the model object
  • param_grid: dict or list of dictionaries; define a dictionary mapping each parameter name to its candidate values
  • scoring: string, callable, list/tuple, dict or None, default: None; the evaluation metric (loss) to optimize, e.g. MSE or RMSE
  • n_jobs: number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Controls how many CPU cores run in parallel
  • cv: int, cross-validation generator or an iterable, optional; the number of folds for k-fold cross-validation, 5 by default
    • Determines the cross-validation splitting strategy. Possible inputs for cv are:
    • None, to use the default 5-fold cross validation; an integer, to specify the number of folds in a (Stratified)KFold; a CV splitter;
    • An iterable yielding (train, test) splits as arrays of indices.
    • For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
  • verbose: controls the verbosity of the output; the higher, the more messages are printed (a sketch tying these parameters together follows this list)
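
A minimal sketch tying these parameters together (the Ridge model, the alpha grid, and the synthetic data below are illustrative assumptions, not from the original):

# Illustrative use of estimator, param_grid, scoring, n_jobs, cv and verbose
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)

search = GridSearchCV(
    estimator=Ridge(),                            # estimator: the model object
    param_grid={'alpha': np.logspace(-3, 3, 7)},  # param_grid: dict of candidate values
    scoring='neg_mean_squared_error',             # scoring: the metric to optimize
    n_jobs=-1,                                    # n_jobs: use all processors
    cv=5,                                         # cv: 5-fold cross-validation
    verbose=1,                                    # verbose: print progress messages
)
search.fit(X_demo, y_demo)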

Attributes

Common ones (a usage sketch follows the list):

  • cv_results_: dict of numpy (masked) ndarrays; reports the result of every cross-validation run
  • best_estimator_: the best estimator found
  • best_params_: dict
    • Returns the parameters of the best model
    • Parameter setting that gave the best results on the hold out data.
    • For multi-metric evaluation, this is present only if refit is specified.
  • best_score_: float
    • Returns the score of the best model
    • Mean cross-validated score of the best_estimator
    • For multi-metric evaluation, this is present only if refit is specified.
    • This attribute is not available if refit is a function.
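
As a sketch of how these attributes are read after fitting (this continues the hypothetical `search` object from the sketch above; pandas is assumed to be available):

# Inspect the fitted search object (continues the sketch above)
import pandas as pd

print(search.best_params_)     # best parameter setting on the hold-out data
print(search.best_score_)      # mean cross-validated score of best_estimator_
print(search.best_estimator_)  # the refitted best model (refit=True by default)

# cv_results_ is a dict of arrays; a DataFrame makes every CV round easy to scan
results = pd.DataFrame(search.cv_results_)
print(results[['params', 'mean_test_score', 'rank_test_score']])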

Worked example

# Import necessary modules
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Setup the hyperparameter grid
# Create the set of candidate values for C
c_space = np.logspace(-5, 8, 15)
# Store the candidate values in a dictionary
param_grid = {'C': c_space}

# Instantiate a logistic regression classifier: logreg
# Hyperparameter tuning, here for a logistic regression classifier
logreg = LogisticRegression()

# Instantiate the GridSearchCV object: logreg_cv
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)

# Fit it to the data (X and y are assumed to be predefined feature/target arrays)
logreg_cv.fit(X, y)

# Print the tuned parameters and score
# Report the best parameter setting found
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))
# Report the best cross-validated score
print("Best score is {}".format(logreg_cv.best_score_))

<script.py> output:
    Tuned Logistic Regression Parameters: {'C': 3.727593720314938}
    Best score is 0.7708333333333334

GridSearchCV can be computationally expensive, especially if you are searching over a large hyperparameter space and dealing with multiple hyperparameters. A solution to this is to use RandomizedSearchCV, in which not all hyperparameter values are tried out. Instead, a fixed number of hyperparameter settings is sampled from specified probability distributions.
Grid search is essentially a for loop that visits every parameter combination, so when many parameters are tuned the computation becomes very expensive; a randomized search that samples settings at random is preferable in that case.

RandomizedSearchCV is used in exactly the same way as GridSearchCV, but it replaces the exhaustive grid search with random sampling from the parameter space. For parameters that are continuous, it can sample from a distribution, which grid search cannot do. Its search power depends on the n_iter setting. The code is given below.

RandomizedSearchCV

  • Randomized search
  • Not every parameter value is tried; instead, a fixed number of settings is drawn from the specified probability distributions
    I still don't quite understand this?
    The runtimes can be compared, as in the sketch below
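
A rough sketch of that runtime comparison (the dataset, parameter ranges, and any timings produced are made up for illustration):

# Rough runtime comparison: grid search vs randomized search (illustrative)
import time
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X_cmp, y_cmp = make_classification(n_samples=500, random_state=0)
model = DecisionTreeClassifier(random_state=0)

grid = {'max_depth': [3, 5, 7, None], 'min_samples_leaf': [1, 2, 4, 8]}
dist = {'max_depth': [3, 5, 7, None], 'min_samples_leaf': randint(1, 9)}

t0 = time.perf_counter()
GridSearchCV(model, grid, cv=5).fit(X_cmp, y_cmp)     # tries all 16 combinations
t1 = time.perf_counter()
RandomizedSearchCV(model, dist, n_iter=5, cv=5,
                   random_state=0).fit(X_cmp, y_cmp)  # samples only 5 settings
t2 = time.perf_counter()

print("grid:   {:.2f}s".format(t1 - t0))
print("random: {:.2f}s".format(t2 - t1))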

Compared with grid search, the parameters differ slightly.

Never mind, let me just list the common ones; see the official documentation for the rest.

Compared with GridSearchCV, the main addition is the n_iter parameter

  • n_iter: the number of parameter settings that are sampled (default 10); cv_results_ reports the result of each cross-validation round here as well, just as in GridSearchCV

Worked example

# Import necessary modules
from scipy.stats import randint
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

# Setup the parameters and distributions to sample from: param_dist
# Using a decision tree as the example; note that param_dist must be a dictionary
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Instantiate a Decision Tree classifier: tree
tree = DecisionTreeClassifier()

# Instantiate the RandomizedSearchCV object: tree_cv
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)

# Fit it to the data
tree_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))

<script.py> output:
    Tuned Decision Tree Parameters: {'criterion': 'gini', 'max_depth': 3, 'max_features': 5, 'min_samples_leaf': 2}
    Best score is 0.7395833333333334
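
The example above used the default n_iter=10. As a variation, n_iter can be raised to sample more settings (the values below are illustrative, not from the original):

# Variation on the example above: sample 20 settings instead of the default 10
# (reuses tree, param_dist, X and y from the block above; random_state fixes the seed)
tree_cv = RandomizedSearchCV(tree, param_dist, n_iter=20, cv=5, random_state=42)
tree_cv.fit(X, y)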

Limitations of tuning

  • grid: exhaustive, so the computational cost grows combinatorially with the number of parameters and candidate values
  • random: not every setting is tried, so the best combination may be missed

