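Processing a pandas column with apply is far faster than a hand-written for loop. In the test below, the loop took about 3 hours and apply took about 30 seconds: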
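We have a DataFrame loaded with pandas. Its features column holds combinations of 0/1 feature values, but unfortunately each value is stored as a str (a comma-separated string). We want to convert each one into a list of int 0s and 1s.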
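To make the starting point and the goal concrete, here is a small sketch of the conversion for a single row. The two sample strings are made up for illustration; only the column name features and the "0,0,1,1,0,1,1" shape come from the post.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame
df = pd.DataFrame({"features": ["0,0,1,1,0,1,1", "1,0,0,1,1,0,0"]})

# What one cell looks like before conversion: a plain string
raw = df["features"].iloc[0]

# What we want instead: a list of ints
target = [int(v) for v in raw.split(",")]

print(type(raw), raw)
print(type(target), target)
```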
(1) Using a for loop (took about 3 hours)
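from tqdm import tqdm  # progress bar; also reports elapsed time

# Note: df['features'][i] = ... is chained assignment, which pandas discourages.
# It is kept as-is because this is the version that was timed.
for i in tqdm(range(df.shape[0])):
    # Each row is a string such as "0,0,1,1,0,1,1"; split on commas to get a list
    df['features'][i] = df['features'][i].split(",")
    # Walk the list and convert every element to int
    for j in range(len(df['features'][i])):
        df['features'][i][j] = int(df['features'][i][j])

print(type(df['features'][0]))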
(2) Recommended: use the apply method (took about 30 seconds)
from time import time

def func(x):
    # Split "0,0,1,..." on commas and convert each piece to int
    return [int(v) for v in x.split(",")]

stime = time()
df['new_features'] = df['features'].apply(func)
endtime = time()

print("time:" + str(endtime - stime) + "s")
# df.head()
print("over")
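If you want to try the comparison without the original data, the sketch below builds a small synthetic DataFrame and times both approaches. The row count, the random data, and the names parse_features and df_demo are assumptions made for this example. The loop is also a simplified stand-in: it reads cells one at a time with .iloc rather than writing back with chained assignment. The measured times will depend on your machine and pandas version.

```python
import random
from time import perf_counter

import pandas as pd


def parse_features(x):
    # Turn "0,1,1" into [0, 1, 1]
    return [int(v) for v in x.split(",")]


# Synthetic stand-in for the real data: 2000 rows of 7 comma-separated 0/1 values
random.seed(0)
n_rows = 2000
rows = [",".join(random.choice("01") for _ in range(7)) for _ in range(n_rows)]
df_demo = pd.DataFrame({"features": rows})

# Approach 1: explicit Python loop, reading one cell at a time
start = perf_counter()
loop_result = []
for i in range(len(df_demo)):
    loop_result.append(parse_features(df_demo["features"].iloc[i]))
loop_seconds = perf_counter() - start

# Approach 2: let pandas call the function on every element via apply
start = perf_counter()
apply_result = df_demo["features"].apply(parse_features)
apply_seconds = perf_counter() - start

print(f"loop:  {loop_seconds:.4f}s")
print(f"apply: {apply_seconds:.4f}s")
print("same result:", apply_result.tolist() == loop_result)
```

Checking that both approaches return the same lists before comparing timings helps ensure the speed-up is not coming from doing different work.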