Using sklearn.preprocessing.LabelEncoder

Reposted article, 2018-12-17 22:00. Tags: python, LabelEncoder, sklearn

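Before training a model, we usually need to preprocess the training data. Assigning a number to each category is one common step, for example encoding the categories "male" and "female" as integers. LabelEncoder in sklearn.preprocessing handles this.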

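Purpose

Encodes n categories as integers from 0 to n-1 (inclusive). The codes follow the sorted order of the labels, so the label that sorts first gets 0.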
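To make the numbering rule concrete, here is a minimal sketch. The city names are hypothetical sample data, not part of the original article. It shows that the fitted classes_ attribute holds the distinct labels in sorted order and that each label's code is its index in that array.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data with three distinct categories
cities = ["tokyo", "paris", "tokyo", "amsterdam"]

le = LabelEncoder()
codes = le.fit_transform(cities).tolist()

# classes_ holds the distinct labels in sorted order;
# a label's code is its position in this array
classes = [str(c) for c in le.classes_]

print(classes)  # ['amsterdam', 'paris', 'tokyo']
print(codes)    # [2, 1, 2, 0]
```

Note that the order in which labels first appear in the data has no effect on the codes.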

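Example

Suppose we want to encode gender data. There are two cases to consider: data without NaN and data with NaN.
First, import the packages we need: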

import numpy as np
import pandas as pd 
from sklearn import preprocessing

Without NaN

The data:

sex = pd.Series(["male", "female", "female", "male"])

Process it with LabelEncoder as follows:

le = preprocessing.LabelEncoder()    # create a LabelEncoder
le = le.fit(["male", "female"])      # fit it; classes are sorted, so female -> 0 and male -> 1
sex = le.transform(sex)              # encode the original data with the fitted encoder
print(sex)

Output:

[1 0 0 1]

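As you can see, LabelEncoder turns the string categories in the source data into integers, which are easier to train on.
The original categories can also be recovered from the encoded values:

le.inverse_transform([1,0,0,1])

Output: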

array(['male', 'female', 'female', 'male'], dtype='<U6')
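When it is unclear which integer stands for which category, the mapping can be read from the fitted encoder rather than guessed. The sketch below builds a plain dictionary from classes_; the variable names mapping and restored are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder().fit(["male", "female"])

# Pair each learned class with its integer code (its index in classes_)
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'female': 0, 'male': 1}

# Round trip: transform followed by inverse_transform returns the original labels
restored = le.inverse_transform(le.transform(["male", "female"])).tolist()
print(restored)  # ['male', 'female']
```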

With NaN

Suppose the data contains NaN:

sex = pd.Series(["male", "female", "female", np.nan])

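Now running

le = preprocessing.LabelEncoder()    # create a LabelEncoder
le = le.fit(["male", "female"])      # fit it; classes are sorted, so female -> 0 and male -> 1
sex = le.transform(sex)              # encode the original data with the fitted encoder
print(sex)

raises an error, because NaN is not one of the labels the encoder was fitted on: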

ValueError: y contains previously unseen labels: nan
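One way to catch this before calling transform is to inspect the data with pandas first. The following is a minimal sketch, assuming the set of fitted labels is known up front; the names known, missing_count and unseen are illustrative only.

```python
import numpy as np
import pandas as pd

sex = pd.Series(["male", "female", "female", np.nan])

# Labels the encoder was fitted on (assumed to be known in advance)
known = {"male", "female"}

# Count missing entries, then list any non-missing labels the encoder has not seen
missing_count = int(sex.isna().sum())
unseen = sorted(set(sex.dropna()) - known)

print(missing_count)  # 1
print(unseen)         # []
```

Here the only problem is the single missing value, so replacing it is enough.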

The fix is simple: replace the NaN values with a placeholder label.

sex.fillna("unknown", inplace=True)

le = preprocessing.LabelEncoder()    # create a LabelEncoder
le = le.fit(["male", "female", "unknown"])      # fit it; sorted order gives female -> 0, male -> 1, unknown -> 2
sex = le.transform(sex)              # encode the original data with the fitted encoder
print(sex)

Output:

[1 0 0 2]

Here NaN is replaced with "unknown" and "unknown" is added to the labels passed to le.fit, so it is encoded as 2.
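As a variation on the fix above, the encoder can learn its labels from the filled data itself, so the label list does not have to be typed by hand. This is a sketch rather than the original author's approach; it uses fit_transform, which fits and encodes in one call, and the non-mutating form of fillna.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

sex = pd.Series(["male", "female", "female", np.nan])

# fillna without inplace returns a new Series and leaves the original untouched
filled = sex.fillna("unknown")

le = LabelEncoder()
encoded = le.fit_transform(filled).tolist()  # learn the labels and encode in one step
classes = [str(c) for c in le.classes_]

print(classes)  # ['female', 'male', 'unknown']
print(encoded)  # [1, 0, 0, 2]
```

The trade-off is that the encoder only knows labels that occur in the data it was fitted on, so a label that first shows up later would still raise an error in transform.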

Summary

sklearn.preprocessing.LabelEncoder offers a simple and convenient way to encode the categories in your data as integers.
