PyCaret: Training Models (Creating, Comparing, and Tuning Models)


1. Comparing Models

This is the recommended first step in the workflow of any supervised experiment. The function trains every model in the model library with default hyperparameters and evaluates performance metrics using cross-validation. It returns the trained model objects. The evaluation metrics used are:

Classification: Accuracy, AUC, Recall, Precision, F1, Kappa, MCC
Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE
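For intuition, the core classification metrics above can all be derived from confusion-matrix counts. A minimal stdlib sketch, using made-up counts rather than real experiment output:

```python
# Hypothetical confusion-matrix counts (not from a real experiment)
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # fraction of correct predictions
precision = tp / (tp + fp)                    # correctness of positive calls
recall    = tp / (tp + fn)                    # coverage of actual positives
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, round(recall, 4), round(f1, 4))
```

Kappa and MCC combine all four counts differently, and AUC needs predicted scores rather than hard labels; PyCaret computes all of these via scikit-learn internally.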

The output of the function is a table showing each model's average score across the folds. The number of folds can be set with the fold parameter of compare_models; by default it is 10. The table is sorted (highest to lowest) by the metric chosen with the sort parameter. By default the table is sorted by Accuracy for classification experiments and by R2 for regression experiments. Certain models are excluded from the comparison because of their longer run times; to bypass this precaution, set the turbo parameter to False.

This function is only available in the pycaret.classification and pycaret.regression modules.
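The bookkeeping compare_models performs can be pictured in a few lines of plain Python: average each candidate's fold scores, round to 4 decimals, and sort by the chosen metric. The model names and per-fold scores below are hypothetical stand-ins, not real PyCaret output:

```python
fold = 5  # as in compare_models(fold = 5)

# hypothetical per-fold accuracy for three candidate models
fold_scores = {
    'lr':       [0.78, 0.80, 0.79, 0.81, 0.77],
    'rf':       [0.84, 0.82, 0.85, 0.83, 0.86],
    'lightgbm': [0.83, 0.85, 0.84, 0.82, 0.84],
}

# mean score across folds, rounded to 4 decimals like PyCaret's table
table = {m: round(sum(s) / fold, 4) for m, s in fold_scores.items()}

# sort descending by the chosen metric (sort = 'Accuracy')
ranking = sorted(table, key=table.get, reverse=True)
best = ranking[0]
print(ranking)
```

With n_select = 3, compare_models would return the trained model objects for the first three entries of this ranking instead of just the best one.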

(1) Classification example:

from pycaret.datasets import get_data
diabetes = get_data('diabetes')
# Importing module and initializing setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')
# return best model
best = compare_models()
# return top 3 models based on 'Accuracy'
top3 = compare_models(n_select = 3)
# return best model based on AUC
best = compare_models(sort = 'AUC') #default is 'Accuracy'
# compare specific models
best_specific = compare_models(whitelist = ['dt','rf','xgboost'])
# blacklist certain models
best_specific = compare_models(blacklist = ['catboost', 'svm'])

(2) Regression example:

from pycaret.datasets import get_data
boston = get_data('boston')
# Importing module and initializing setup
from pycaret.regression import *
reg1 = setup(data = boston, target = 'medv')
# return best model
best = compare_models()
# return top 3 models based on 'R2'
top3 = compare_models(n_select = 3)
# return best model based on MAPE
best = compare_models(sort = 'MAPE') #default is 'R2'
# compare specific models
best_specific = compare_models(whitelist = ['dt','rf','xgboost'])
# blacklist certain models
best_specific = compare_models(blacklist = ['catboost', 'svm'])

2. Creating Models

Creating a model in any module is as simple as calling create_model. It takes a single required parameter: the model ID as a string. For the supervised modules (classification and regression), the function returns a table of k-fold cross-validated performance metrics along with the trained model object. For the unsupervised clustering module, it returns performance metrics together with the trained model object, while for the remaining unsupervised modules (anomaly detection, natural language processing, and association rule mining) it returns only the trained model object. The evaluation metrics used are:
Classification: Accuracy, AUC, Recall, Precision, F1, Kappa, MCC
Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE
The number of folds can be set with the fold parameter of create_model; by default it is 10. All metrics are rounded to 4 decimal places by default, which can be changed with the round parameter of create_model. Although there is a separate function for ensembling a trained model, there is also a quick way to ensemble a model at creation time via the ensemble and method parameters of create_model.
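What that quick ensembling does, e.g. create_model('dt', ensemble = True, method = 'Bagging'), can be sketched in plain Python: fit one copy of the learner per bootstrap resample, then majority-vote their predictions. Everything below (the data, the toy threshold learner, the hand-picked resamples) is made up for illustration; PyCaret itself wraps scikit-learn's bagging estimators:

```python
from collections import Counter

data = [(x, 1 if x > 5 else 0) for x in range(10)]  # toy (feature, label) pairs

def fit(sample):
    # toy learner: cut halfway between the two class means of the sample
    m0 = [x for x, y in sample if y == 0]
    m1 = [x for x, y in sample if y == 1]
    cut = (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2
    return lambda x: 1 if x > cut else 0

# three bootstrap resamples (indices hand-picked for determinism)
boots = [
    [0, 1, 2, 6, 7, 8, 3, 9, 4, 6],
    [1, 2, 3, 4, 7, 8, 9, 0, 6, 7],
    [5, 6, 7, 8, 9, 0, 1, 2, 3, 4],
]
learners = [fit([data[i] for i in idx]) for idx in boots]

def bag_predict(x):
    votes = [f(x) for f in learners]
    return Counter(votes).most_common(1)[0][0]

print(bag_predict(2), bag_predict(8))  # low x -> 0, high x -> 1
```

Each resample gives a slightly different cut point, and the vote smooths out that variance, which is the point of bagging.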

Classification models:

ID Name
‘lr’ Logistic Regression
‘knn’ K Nearest Neighbour
‘nb’ Naive Bayes
‘dt’ Decision Tree Classifier
‘svm’ SVM – Linear Kernel
‘rbfsvm’ SVM – Radial Kernel
‘gpc’ Gaussian Process Classifier
‘mlp’ Multi Layer Perceptron
‘ridge’ Ridge Classifier
‘rf’ Random Forest Classifier
‘qda’ Quadratic Discriminant Analysis
‘ada’ Ada Boost Classifier
‘gbc’ Gradient Boosting Classifier
‘lda’ Linear Discriminant Analysis
‘et’ Extra Trees Classifier
‘xgboost’ Extreme Gradient Boosting
‘lightgbm’ Light Gradient Boosting
‘catboost’ CatBoost Classifier

Regression models:

ID Name
‘lr’ Linear Regression
‘lasso’ Lasso Regression
‘ridge’ Ridge Regression
‘en’ Elastic Net
‘lar’ Least Angle Regression
‘llar’ Lasso Least Angle Regression
‘omp’ Orthogonal Matching Pursuit
‘br’ Bayesian Ridge
‘ard’ Automatic Relevance Determination
‘par’ Passive Aggressive Regressor
‘ransac’ Random Sample Consensus
‘tr’ TheilSen Regressor
‘huber’ Huber Regressor
‘kr’ Kernel Ridge
‘svm’ Support Vector Machine
‘knn’ K Neighbors Regressor
‘dt’ Decision Tree
‘rf’ Random Forest
‘et’ Extra Trees Regressor
‘ada’ AdaBoost Regressor
‘gbr’ Gradient Boosting Regressor
‘mlp’ Multi Layer Perceptron
‘xgboost’ Extreme Gradient Boosting
‘lightgbm’ Light Gradient Boosting
‘catboost’ CatBoost Regressor

Clustering models:

ID Name
‘kmeans’ K-Means Clustering
‘ap’ Affinity Propagation
‘meanshift’ Mean shift Clustering
‘sc’ Spectral Clustering
‘hclust’ Agglomerative Clustering
‘dbscan’ Density-Based Spatial Clustering
‘optics’ OPTICS Clustering
‘birch’ Birch Clustering
‘kmodes’ K-Modes Clustering

Anomaly detection models:

ID Name
‘abod’ Angle-based Outlier Detection
‘iforest’ Isolation Forest
‘cluster’ Clustering-Based Local Outlier
‘cof’ Connectivity-Based Outlier Factor
‘histogram’ Histogram-based Outlier Detection
‘knn’ k-Nearest Neighbors Detector
‘lof’ Local Outlier Factor
‘svm’ One-class SVM detector
‘pca’ Principal Component Analysis
‘mcd’ Minimum Covariance Determinant
‘sod’ Subspace Outlier Detection
‘sos’ Stochastic Outlier Selection

Natural language processing models:

ID Model
‘lda’ Latent Dirichlet Allocation
‘lsi’ Latent Semantic Indexing
‘hdp’ Hierarchical Dirichlet Process
‘rp’ Random Projections
‘nmf’ Non-Negative Matrix Factorization

Classification example:

# Importing dataset 
from pycaret.datasets import get_data 
diabetes = get_data('diabetes') 

# Importing module and initializing setup 
from pycaret.classification import * 
clf1 = setup(data = diabetes, target = 'Class variable')

# train logistic regression model
lr = create_model('lr') #lr is the id of the model

# check the model library to see all models
models()

# train rf model using 5 fold CV
rf = create_model('rf', fold = 5)

# train svm model without CV
svm = create_model('svm', cross_validation = False)

# train xgboost model with max_depth = 10
xgboost = create_model('xgboost', max_depth = 10)

# train xgboost model on gpu
xgboost_gpu = create_model('xgboost', tree_method = 'gpu_hist', gpu_id = 0) #0 is gpu-id

# train multiple lightgbm models with different learning_rate values
import numpy as np
lgbms = [create_model('lightgbm', learning_rate = i) for i in np.arange(0.1,1,0.1)]

# train custom model
from gplearn.genetic import SymbolicClassifier
symclf = SymbolicClassifier(generations = 50)
sc = create_model(symclf)

Regression example:

# Importing dataset 
from pycaret.datasets import get_data 
boston = get_data('boston') 

# Importing module and initializing setup 
from pycaret.regression import * 
reg1 = setup(data = boston, target = 'medv') 

# train linear regression model
lr = create_model('lr') #lr is the id of the model

# check the model library to see all models
models()

# train rf model using 5 fold CV
rf = create_model('rf', fold = 5)

# train svm model without CV
svm = create_model('svm', cross_validation = False)

# train xgboost model with max_depth = 10
xgboost = create_model('xgboost', max_depth = 10)

# train xgboost model on gpu
xgboost_gpu = create_model('xgboost', tree_method = 'gpu_hist', gpu_id = 0) #0 is gpu-id

# train multiple lightgbm models with different learning_rate values
import numpy as np
lgbms = [create_model('lightgbm', learning_rate = i) for i in np.arange(0.1,1,0.1)]

# train custom model
from gplearn.genetic import SymbolicRegressor
symreg = SymbolicRegressor(generations = 50)
sc = create_model(symreg)

Clustering example:

# Importing dataset
from pycaret.datasets import get_data
jewellery = get_data('jewellery')

# Importing module and initializing setup
from pycaret.clustering import *
clu1 = setup(data = jewellery)

# check the model library to see all models
models()

# training kmeans model
kmeans = create_model('kmeans')

# training kmodes model
kmodes = create_model('kmodes')

Anomaly detection example:

# Importing dataset
from pycaret.datasets import get_data
anomalies = get_data('anomalies')

# Importing module and initializing setup
from pycaret.anomaly import *
ano1 = setup(data = anomalies)

# check the model library to see all models
models()

# training Isolation Forest
iforest = create_model('iforest')

# training KNN model
knn = create_model('knn')

Natural language processing example:

# Importing dataset
from pycaret.datasets import get_data
kiva = get_data('kiva')

# Importing module and initializing setup
from pycaret.nlp import *
nlp1 = setup(data = kiva, target = 'en')

# check the model library to see all models
models()

# training LDA model
lda = create_model('lda')

# training NNMF model
nmf = create_model('nmf')

Association rules example:

# Importing dataset
from pycaret.datasets import get_data
france = get_data('france')

# Importing module and initializing setup
from pycaret.arules import *
arule1 = setup(data = france, transaction_id = 'InvoiceNo', item_id = 'Description')

# creating Association Rule model
mod1 = create_model(metric = 'confidence')
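The 'confidence' metric being optimized here has a simple definition: confidence(A -> B) = support(A and B) / support(A), where support is the fraction of transactions containing an itemset. A stdlib sketch on a hypothetical transaction list (not the 'france' dataset):

```python
# Hypothetical transactions, purely for illustration
transactions = [
    {'bread', 'milk'},
    {'bread', 'butter'},
    {'bread', 'milk', 'butter'},
    {'milk', 'butter'},
]

def support(itemset):
    # fraction of transactions that contain every item in itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # confidence(A -> B) = support(A | B) / support(A)
    return support(antecedent | consequent) / support(antecedent)

print(confidence({'bread'}, {'milk'}))
```

create_model(metric = 'confidence') ranks candidate rules by exactly this kind of score, computed over the real transaction table built by setup.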

3. Tuning Models

Tuning the hyperparameters of a machine learning model in any module is as simple as calling tune_model. It tunes the hyperparameters of the model passed as an estimator using a random grid search over a predefined, fully customizable grid. Optimizing hyperparameters requires an objective function, which in supervised experiments such as classification or regression is linked automatically to the target variable. However, for unsupervised experiments such as clustering, anomaly detection, and natural language processing, PyCaret lets you define a custom objective function by specifying a supervised target variable with the supervised_target parameter of tune_model (see the examples below). For supervised learning, the function returns a table of k-fold cross-validation scores for the common evaluation metrics along with the trained model object; for unsupervised learning, it returns only the trained model object. The evaluation metrics used for supervised learning are:
Classification: Accuracy, AUC, Recall, Precision, F1, Kappa, MCC
Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE
The number of folds can be set with the fold parameter of tune_model; by default it is 10. All metrics are rounded to 4 decimal places by default, which can be changed with the round parameter. Because tune_model performs a random grid search over a predefined search space, its results depend on the number of iterations over that space. By default the function performs 10 random iterations, which can be changed with the n_iter parameter of tune_model. Increasing n_iter may increase training time but usually produces a better-optimized model. The metric to optimize can be set with the optimize parameter; by default, regression tasks optimize R2 and classification tasks optimize Accuracy.
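The random grid search that tune_model performs can be sketched in a few lines: draw n_iter random configurations from the grid and keep the best-scoring one. The grid and the stand-in scoring function below are hypothetical; a real run would score each configuration with k-fold cross-validation:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible
grid = {'max_depth': [2, 4, 6, 8], 'min_samples_leaf': [2, 3, 4, 5, 6]}

def cv_score(params):
    # stand-in objective; tune_model would run k-fold CV here instead
    return 1.0 - abs(params['max_depth'] - 6) * 0.05 - params['min_samples_leaf'] * 0.01

n_iter = 10  # as in tune_model(dt, n_iter = 10)
candidates = [{k: random.choice(v) for k, v in grid.items()} for _ in range(n_iter)]
best = max(candidates, key=cv_score)
print(best)
```

This also shows why a larger n_iter helps: more draws from the grid raise the chance of landing near the optimum, at the cost of more CV runs.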

Classification example:

# Importing dataset 
from pycaret.datasets import get_data 
diabetes = get_data('diabetes') 

# Importing module and initializing setup 
from pycaret.classification import * 
clf1 = setup(data = diabetes, target = 'Class variable')

# train a decision tree model
dt = create_model('dt')

# tune hyperparameters of decision tree
tuned_dt = tune_model(dt)

# tune hyperparameters with increased n_iter
tuned_dt = tune_model(dt, n_iter = 50)

# tune hyperparameters to optimize AUC
tuned_dt = tune_model(dt, optimize = 'AUC') #default is 'Accuracy'

# tune hyperparameters with custom_grid
import numpy as np
params = {"max_depth": np.random.randint(1, int(len(diabetes.columns) * 0.85), 20),
          "max_features": np.random.randint(1, len(diabetes.columns), 20),
          "min_samples_leaf": [2,3,4,5,6],
          "criterion": ["gini", "entropy"]
          }

tuned_dt_custom = tune_model(dt, custom_grid = params)

# tune multiple models dynamically
top3 = compare_models(n_select = 3)
tuned_top3 = [tune_model(i) for i in top3]

Regression example:

from pycaret.datasets import get_data 
boston = get_data('boston') 

# Importing module and initializing setup 
from pycaret.regression import * 
reg1 = setup(data = boston, target = 'medv')

# train a decision tree model
dt = create_model('dt')

# tune hyperparameters of decision tree
tuned_dt = tune_model(dt)

# tune hyperparameters with increased n_iter
tuned_dt = tune_model(dt, n_iter = 50)

# tune hyperparameters to optimize MAE
tuned_dt = tune_model(dt, optimize = 'MAE') #default is 'R2'

# tune hyperparameters with custom_grid
import numpy as np
params = {"max_depth": np.random.randint(1, int(len(boston.columns) * 0.85), 20),
          "max_features": np.random.randint(1, len(boston.columns), 20),
          "min_samples_leaf": [2,3,4,5,6],
          "criterion": ["gini", "entropy"]
          }

tuned_dt_custom = tune_model(dt, custom_grid = params)

# tune multiple models dynamically
top3 = compare_models(n_select = 3)
tuned_top3 = [tune_model(i) for i in top3]

Clustering example:

# Importing dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# Importing module and initializing setup
from pycaret.clustering import *
clu1 = setup(data = diabetes)

# Tuning K-Modes Model
tuned_kmodes = tune_model('kmodes', supervised_target = 'Class variable')

Anomaly detection example:

# Importing dataset
from pycaret.datasets import get_data
boston = get_data('boston')

# Importing module and initializing setup
from pycaret.anomaly import *
ano1 = setup(data = boston)

# Tuning Isolation Forest Model
tuned_iforest = tune_model('iforest', supervised_target = 'medv')

Natural language processing example:

# Importing dataset
from pycaret.datasets import get_data
kiva = get_data('kiva')

# Importing module and initializing setup
from pycaret.nlp import *
nlp1 = setup(data = kiva, target = 'en')

# Tuning LDA Model
tuned_lda = tune_model('lda', supervised_target = 'status')

 

