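There are three common ways to iterate over the rows of a pandas DataFrame.
- iterrows(): returns the index and the row as separate variables, but is significantly slower
- itertuples(): faster than iterrows(), but returns the index together with the row values; ir[0] is the index
- zip: the fastest, but the row's index is not available unless df.index is zipped in as well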
import pandas as pd
df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})
0. for i in df: is not a way to iterate over rows; it yields the column labels
for i in df:
    print(i)
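To make the behaviour of the loop above concrete, here is a minimal sketch. The three-row frame `small` is a hypothetical stand-in for `df`, used only to keep the output short. It collects what the loop yields and compares it with the column labels.

```python
import pandas as pd

# Hypothetical small frame, used only for illustration.
small = pd.DataFrame({'a': range(0, 3), 'b': range(10, 13)})

# Iterating over the DataFrame itself walks the column labels, not the rows.
labels = [i for i in small]
print(labels)  # ['a', 'b']

# The result is the same as iterating over small.columns.
print(labels == list(small.columns))  # True
```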
Precisely because for ... in df does not iterate over rows directly, we look at the following methods.
1. iterrows(): returns the index and the row as separate variables, but is significantly slower
Each item yielded by df.iterrows() is itself a tuple: (index, Series)
count = 0
for i, r in df.iterrows():
    print(i, '-->', r, type(r))
    count += 1
    if count > 5:
        break
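One consequence of each row arriving as a Series is worth a quick check: a Series has a single dtype, so values from columns of different dtypes can be upcast. The sketch below assumes a hypothetical two-column frame `mixed` with one integer and one float column.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
mixed = pd.DataFrame({'n': [1, 2], 'x': [0.5, 1.5]})

# Take the first (index, Series) pair produced by iterrows().
index, row = next(mixed.iterrows())

# The row Series holds both values under one dtype,
# so the integer from column 'n' comes back as a float.
print(row.dtype)       # float64
print(type(row['n']))
```

If the original dtypes matter, itertuples() or zip over the columns avoids this conversion.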
2. itertuples(): faster than iterrows(), but returns the index together with the row values; ir[0] is the index
count = 0
for tup in df.itertuples():
    print(tup[0], '-->', tup[1:], type(tup[1:]))
    count += 1
    if count > 5:
        break
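The tuples from itertuples() are namedtuples by default, so fields can also be read by name instead of by position. A minimal sketch, again on a hypothetical three-row frame `small`:

```python
import pandas as pd

# Hypothetical small frame, used only for illustration.
small = pd.DataFrame({'a': range(0, 3), 'b': range(10, 13)})

# By default the first field is the index, exposed as .Index,
# followed by one field per column.
first = next(small.itertuples())
print(first.Index, first.a, first.b)  # 0 0 10
print(first[0] == first.Index)        # True

# index=False drops the index and name=None yields plain tuples,
# so each row becomes a bare (a, b) pair.
plain = list(small.itertuples(index=False, name=None))
print(plain)
```

Name-based access only works when the column labels are valid Python identifiers; otherwise positional access as in the loop above is the safer choice.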
3. zip: the fastest, but the row's index is not available unless df.index is zipped in as well
count = 0
for tup in zip(df['a'], df['b']):
    print(tup, type(tup[1:]))
    count += 1
    if count > 5:
        break
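If the index is needed with the zip approach, one option is to pass the index to zip as an extra iterable. This is a sketch on a hypothetical frame `small` with string labels, so the index is easy to tell apart from the values.

```python
import pandas as pd

# Hypothetical small frame with a non-default index, for illustration only.
small = pd.DataFrame(
    {'a': range(0, 3), 'b': range(10, 13)},
    index=['x', 'y', 'z'],
)

# Zipping small.index alongside the columns gives (index, a, b) per row.
rows = list(zip(small.index, small['a'], small['b']))
for idx, a, b in rows:
    print(idx, '-->', a, b)
```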
4. Performance comparison
import time
import pandas as pd

df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

# Time iterrows()
list1 = []
start = time.time()
for i, r in df.iterrows():
    list1.append((r['a'], r['b']))
print("iterrows time  :", time.time() - start)

# Time itertuples(); ir[0] is the index, so the columns start at ir[1]
list1 = []
start = time.time()
for ir in df.itertuples():
    list1.append((ir[1], ir[2]))
print("itertuples time:", time.time() - start)

# Time zip over the two columns
list1 = []
start = time.time()
for r in zip(df['a'], df['b']):
    list1.append((r[0], r[1]))
print("zip time       :", time.time() - start)
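A single time.time() measurement can be noisy. As an alternative sketch, not the measurement used above, the same comparison can be run with the standard timeit module, taking the best of a few repeats. The helper names `via_iterrows`, `via_itertuples` and `via_zip` are hypothetical, introduced only for this example.

```python
import timeit
import pandas as pd

df = pd.DataFrame({'a': range(0, 10000), 'b': range(10000, 20000)})

# Hypothetical helpers: each builds the same list of (a, b) pairs.
def via_iterrows(frame):
    return [(r['a'], r['b']) for _, r in frame.iterrows()]

def via_itertuples(frame):
    return [(t[1], t[2]) for t in frame.itertuples()]

def via_zip(frame):
    return list(zip(frame['a'], frame['b']))

# Sanity check: the three approaches should produce identical results.
same_result = via_iterrows(df) == via_itertuples(df) == via_zip(df)
print("same result:", same_result)

# Run each helper 3 times and keep the fastest run, in seconds.
timings = {}
for fn in (via_iterrows, via_itertuples, via_zip):
    timings[fn.__name__] = min(timeit.repeat(lambda: fn(df), number=1, repeat=3))
    print(f"{fn.__name__}: {timings[fn.__name__]:.4f} s")
```

The absolute numbers depend on the machine and the pandas version, so only the relative ordering is meaningful.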