pandas: converting non-numeric features to numeric ones (one-hot encoding)


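import pandas as pd
import numpy as np

# Each feature is stored as a 1 x 9 row vector.
name = np.array([['jack', 'ross', 'john', 'blues', 'frank', 'bitch', 'haha', 'asd', 'loubin']])
age = np.array([[12, 32, 23, 4, 32, 45, 65, 23, 65]])
married = np.array([[1, 0, 1, 1, 0, 1, 0, 0, 0]])
gender = np.array([[0, 0, 0, 0, 1, 1, 1, 1, 1]])

# Stack the rows into a 4 x 9 array, then transpose it to 9 rows x 4 columns.
# Note: mixing strings and integers in one array converts every value to a string.
matrix = np.concatenate((name, age, married, gender), axis=0)
matrix = matrix.T

data = pd.DataFrame(data=matrix, columns=['name', 'age', 'married', 'gender'])
print(data)

# One-hot encode the 'name' column; each new column is named prefix + '_' + value.
# dtype=int gives the 0/1 values shown in the output (pandas 2.0+ defaults to booleans).
print(pd.get_dummies(data=data['name'], prefix='name', dtype=int))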

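The output is shown below. Each column of the new table is named after a value of the encoded column, and a prefix can be specified for these names.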

C:\software\Anaconda\envs\ml\python.exe C:/學習/python/科比生涯數據分析/venv/groupy.py
     name age married gender
0    jack  12       1      0
1    ross  32       0      0
2    john  23       1      0
3   blues   4       1      0
4   frank  32       0      1
5   bitch  45       1      1
6    haha  65       0      1
7     asd  23       0      1
8  loubin  65       0      1
   name_asd  name_bitch  name_blues  ...  name_john  name_loubin  name_ross
0         0           0           0  ...          0            0          0
1         0           0           0  ...          0            0          1
2         0           0           0  ...          1            0          0
3         0           0           1  ...          0            0          0
4         0           0           0  ...          0            0          0
5         0           1           0  ...          0            0          0
6         0           0           0  ...          0            0          0
7         1           0           0  ...          0            0          0
8         0           0           0  ...          0            1          0

[9 rows x 9 columns]

Process finished with exit code 0
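
The call above encodes one column and returns only the dummy columns. As a complement, here is a minimal sketch of encoding `name` while keeping the other columns in the same table. It assumes the DataFrame is built from a dict rather than from the NumPy matrix used above, so that `age`, `married`, and `gender` keep their integer dtypes. The shortened sample data, the variable names `df` and `encoded`, and the prefix `n` are illustrative only.

```python
import pandas as pd

# Illustrative data: a shortened version of the table above, built from a dict
# so that each column keeps its own dtype (names are strings, the rest integers).
df = pd.DataFrame({
    'name': ['jack', 'ross', 'john'],
    'age': [12, 32, 23],
    'married': [1, 0, 1],
    'gender': [0, 0, 0],
})

# Encode only the 'name' column; the other columns pass through unchanged.
# dtype=int requests 0/1 values instead of the booleans newer pandas returns.
encoded = pd.get_dummies(df, columns=['name'], prefix='n', dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

With `columns=['name']`, the untouched columns come first and the dummy columns are appended after them, one per distinct name. Building the frame from a dict also matters if `pd.get_dummies` is called on the whole DataFrame without `columns`: by default it encodes string-like columns, and in the matrix-based version every column holds strings.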

 

 

