python: pandas tip (process each DataFrame element with apply instead of a for loop)


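Processing a pandas column with apply is far faster than iterating over it with a for loop. In the test below, the for loop took about 3 hours, while apply took about 30 seconds: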

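We have a DataFrame loaded with pandas. Its features column holds combinations of 0/1 feature flags, but each value is stored as a str (a string such as "0,0,1,1,0,1,1"). We want to convert each value into a list of int 0s and 1s.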
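The screenshot of the DataFrame did not survive in this copy. As a minimal sketch, assuming a single `features` column of comma-separated strings (the sample values are made up for illustration and are not the original data), such a DataFrame could look like this:

```python
import pandas as pd

# Hypothetical sample data: each row is a comma-separated string of 0/1 flags.
df = pd.DataFrame({
    "features": ["0,0,1,1,0,1,1", "1,0,0,1,1,0,0", "0,1,1,0,0,0,1"],
})

print(df.head())
# Each cell is still one string, not a list of integers.
print(type(df["features"][0]))
```

The real data presumably has far more rows, which is why the per-row cost of each approach matters.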

 

 

 

(1) Using a for loop (took about 3 hours)

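from tqdm import tqdm  # progress bar that also reports elapsed time

for i in tqdm(range(df.shape[0])):
    # Each row is a string such as "0,0,1,1,0,1,1"; split on commas to get a list of strings.
    # Note: df['features'][i] = ... is chained assignment, a pattern pandas discourages.
    df['features'][i] = df['features'][i].split(",")
    # Walk the list and convert each element to int
    for j in range(len(df['features'][i])):
        df['features'][i][j] = int(df['features'][i][j])

print(type(df['features'][0]))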

 

 

(2) Recommended: use the apply method (took about 30 seconds)

from time import time

def func(x):
    # Split the comma-separated string and convert each piece to int
    return [int(v) for v in x.split(",")]

stime = time()
df['new_features'] = df['features'].apply(func)
endtime = time()

print("time:" + str(endtime - stime) + "s")
# df.head()
print("over")
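To sanity-check the conversion without the original data, the sketch below runs the same idea on a small made-up DataFrame and confirms that the elements come out as `int`. The helper name `parse_flags` and the sample strings are assumptions for illustration, not part of the original test.

```python
import pandas as pd

def parse_flags(s):
    """Split a comma-separated string and convert each piece to int."""
    return [int(v) for v in s.split(",")]

# Hypothetical sample data standing in for the real 'features' column.
df = pd.DataFrame({"features": ["0,0,1,1,0,1,1", "1,0,0,1,1,0,0"]})

# apply calls parse_flags once per element and collects the results in a new column.
df["new_features"] = df["features"].apply(parse_flags)

first = df["new_features"][0]
print(first)
print(type(first), type(first[0]))
```

Writing to a new column leaves the original strings intact, so the result can be compared against the input.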

 

 

