In [5]: from sklearn import preprocessing
   ...: le = preprocessing.LabelEncoder()
   ...: le.fit(["paris", "paris", "tokyo", "amsterdam"])
   ...: print('Label classes: %s' % le.classes_)
   ...: print('Encoded labels: %s' % le.transform(["tokyo", "tokyo", "paris"]))
   ...: print('Decoded labels: %s' % le.inverse_transform([2, 2, 1]))
   ...:
Label classes: ['amsterdam' 'paris' 'tokyo']
Encoded labels: [2 2 1]
Decoded labels: ['tokyo' 'tokyo' 'paris']
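sklearn.preprocessing.LabelEncoder(): normalizes labels by encoding each distinct label value as an integer between 0 and n_classes - 1, i.e. a value in range(n_classes).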
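For example, ["paris", "paris", "tokyo", "amsterdam"] contains 3 distinct labels, so the encoded values are 0, 1 and 2, assigned in sorted (lexicographic) order: amsterdam → 0, paris → 1, tokyo → 2.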
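The mapping described above can be reproduced by hand. The following is a minimal pure-Python sketch of the same idea, not scikit-learn's actual implementation; the names `classes`, `to_code`, `encode` and `decode` are made up for this illustration.

```python
labels = ["paris", "paris", "tokyo", "amsterdam"]

# Distinct labels in sorted order play the role of le.classes_.
classes = sorted(set(labels))

# Each label's position in the sorted list becomes its integer code.
to_code = {label: code for code, label in enumerate(classes)}


def encode(values):
    """Map labels to integer codes (analogous to transform)."""
    return [to_code[v] for v in values]


def decode(codes):
    """Map integer codes back to labels (analogous to inverse_transform)."""
    return [classes[c] for c in codes]


print(classes)                            # ['amsterdam', 'paris', 'tokyo']
print(encode(["tokyo", "tokyo", "paris"]))  # [2, 2, 1]
print(decode([2, 2, 1]))                  # ['tokyo', 'tokyo', 'paris']
```

As in the sketch, where `encode` raises a KeyError for a label not seen when the mapping was built, LabelEncoder's transform rejects labels that were not present during fit.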