Situation: the following code was found in a PySpark program:
import pandas as pd
from pyspark.sql import *
from pyspark.sql.functions import *
from pyspark.sql.types import *
import copy

spark = SparkSession.builder.appName("BR-54751").enableHiveSupport().getOrCreate()

df = spark.sql(
    """
    select screen_width, screen_height, count(distinct account_id) as people
    from ff_facts.daily_active_account_facts
    where local_dt between "20200511" and "20200515"
      and system_platform = "iOS"
    group by screen_height, screen_width
    order by screen_width, screen_height
    """
)

data = df.toPandas()

# Normalize orientation: make width >= height (landscape)
for i in range(data.shape[0]):
    if data.iloc[i, 0] < data.iloc[i, 1]:
        temp = data.iloc[i, 0]
        data.iloc[i, 0] = data.iloc[i, 1]
        data.iloc[i, 1] = temp

# Sort rows by screen_width, ascending
data = data.sort_values(by="screen_width", ascending=True)
data.reset_index(drop=True, inplace=True)

print("total=", sum(data["people"]))  # <-- this line fails
The last line raises an error pointing at the sum() call.
Cause:
The imports declare from pyspark.sql.functions import *, so the sum() call in the script collides with PySpark's sum().
I assumed sum() was Python's built-in function, but the wildcard import rebound the name to pyspark.sql.functions.sum, which builds a Spark Column expression for use in DataFrame aggregations rather than adding up the values of a pandas Series, hence the error.
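To see the collision directly, here is a minimal sketch (it assumes PySpark is installed; nothing Spark-specific is executed, only the name binding is inspected):

import builtins

print(builtins.sum([1, 2, 3]))       # 6 -- the real built-in, always reachable

from pyspark.sql.functions import *  # rebinds the bare name sum in this namespace

print(sum)                           # now pyspark.sql.functions.sum, which returns a
                                     # Column expression for Spark aggregations and
                                     # cannot add up a pandas Series, so
                                     # sum(data["people"]) fails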
Solution:
Option 1: delete from pyspark.sql.functions import * (or replace it with explicit imports of only the functions the job needs); see the sketch below.
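A safer habit than the wildcard import (a sketch of the pattern, not the only option) is a module alias, so Spark functions are always spelled explicitly and can never shadow built-ins:

from pyspark.sql import functions as F

# df is the DataFrame from the script above.
# Spark-side aggregation is written with the alias...
agg_df = df.groupBy("screen_width").agg(F.sum("people").alias("people"))

# ...and the bare name sum still refers to Python's built-in:
total = sum([1, 2, 3])  # 6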
Option 2: change sum(data["people"]) to data["people"].sum(), which calls the pandas Series method and bypasses the name collision entirely.
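Applied to the script above, either of these works even with the wildcard import left in place (builtins is the standard-library module that always exposes the original built-ins):

# Preferred: the pandas Series method needs no global name lookup at all
total = data["people"].sum()
print("total=", total)

# Equivalent workaround: reach the shadowed built-in explicitly
import builtins
total = builtins.sum(data["people"])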