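1. Problem statement
Output basic information about the dataset, such as the maximum, minimum, and mean.
Count the variables and samples that have missing values.
Use a box plot to identify outliers.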
2. Maximum, minimum, and mean
Maximum:

import pandas as pd

# Load the catering sales data
data = pd.read_csv("C:\\Users\\Administrator\\Desktop\\catering_sale.csv")
# Maximum of each numeric column
print(data.max(numeric_only=True))

Output (銷量 is the sales column):
銷量 9106.44
dtype: float64
Minimum:

import pandas as pd

data = pd.read_csv("C:\\Users\\Administrator\\Desktop\\catering_sale.csv")
# Minimum of each numeric column
print(data.min(numeric_only=True))

Output:
銷量 22.0
dtype: float64
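Mean:

import pandas as pd

data = pd.read_csv("C:\\Users\\Administrator\\Desktop\\catering_sale.csv")
# Mean of each numeric column. Calling data.describe().mean() instead would
# average the summary rows (count, mean, std, min, quartiles, max).
print(data.mean(numeric_only=True))

Output of the original listing, which called data.describe().mean() and therefore averaged the eight summary statistics; this figure is not the mean of the sales column:
銷量 2621.079309
dtype: float64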
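Because `describe()` returns a table of summary statistics, calling `.mean()` on that table is easy to confuse with the mean of the data. The sketch below illustrates the difference on a small made-up column; the column name `sales` and its values are hypothetical and are not taken from catering_sale.csv.

```python
import pandas as pd

# Hypothetical values standing in for a sales column (not the real dataset).
sales = pd.DataFrame({"sales": [10.0, 20.0, 30.0, 100.0]})

# describe() rows: count, mean, std, min, 25%, 50%, 75%, max
summary = sales.describe()

column_mean = sales["sales"].mean()        # mean of the data itself
summary_mean_row = summary.loc["mean", "sales"]  # same value, read from describe()
mean_of_summary = summary["sales"].mean()  # average of the eight summary rows

print(column_mean)
print(summary_mean_row)
print(mean_of_summary)
```

Reading the `mean` row with `summary.loc["mean", ...]` or calling `.mean()` on the column directly both give the data mean; averaging the whole `describe()` table does not.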
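3. Number of missing values

import pandas as pd

data = pd.read_csv("C:\\Users\\Administrator\\Desktop\\catering_sale.csv")
# isnull() marks each missing cell as True; sum() counts them per column
data2 = data.isnull().sum()
print(data2)

Output (日期 is the date column, 銷量 is the sales column):
日期 0
銷量 1
dtype: int64
The sales column has one missing value; the date column has none.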
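The per-column counts above can be extended to answer both questions from the problem statement: how many variables (columns) and how many samples (rows) contain a missing value. This is a minimal sketch on a made-up table; the column names `date`, `sales`, `guests` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical table with a few gaps (not the real dataset).
df = pd.DataFrame({
    "date": ["2015-03-01", "2015-02-28", "2015-02-27", "2015-02-26"],
    "sales": [51.0, np.nan, 3442.1, np.nan],
    "guests": [12, 30, np.nan, 25],
})

missing_per_column = df.isnull().sum()                       # missing cells per column
columns_with_missing = int((missing_per_column > 0).sum())   # variables with any gap
rows_with_missing = int(df.isnull().any(axis=1).sum())       # samples with any gap

print(missing_per_column)
print(columns_with_missing, rows_with_missing)
```

`any(axis=1)` flags a row as soon as one of its cells is missing, so a row with several gaps is still counted once.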
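4. Identifying outliers with a box plot

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("C:\\Users\\Administrator\\Desktop\\catering_sale.csv")

plt.figure()
plt.rcParams['font.sans-serif'] = ['SimHei']  # font able to render the Chinese labels
plt.rcParams['axes.unicode_minus'] = False    # display minus signs correctly
p = data.boxplot(return_type='dict')          # draw the box plot
# 'fliers' holds the outlier points; [0] is the first (only) numeric column
x = p['fliers'][0].get_xdata()
y = p['fliers'][0].get_ydata()
y.sort()
for i in range(len(x)):
    if i > 0:
        # shift the label left when two neighbouring outliers are close together
        plt.annotate(y[i], xy=(x[i], y[i]), xytext=(x[i] + 0.05 - 0.8 / (y[i] - y[i-1]), y[i]))
    else:
        plt.annotate(y[i], xy=(x[i], y[i]), xytext=(x[i] + 0.08, y[i]))
plt.show()

Output: a box plot of the sales column with each outlier labelled with its value (figure not reproduced here).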
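The points a box plot draws as outliers can also be listed numerically. The sketch below assumes the common rule that matplotlib applies by default: a value is an outlier if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The helper `iqr_outliers` and the sample values are hypothetical, introduced only for illustration.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    clean = values.dropna()          # ignore missing values
    q1 = clean.quantile(0.25)
    q3 = clean.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return clean[(clean < lower) | (clean > upper)]

# Hypothetical sample (not the real dataset): two extreme values among typical ones.
sample = pd.Series([50.0, 2400.0, 2500.0, 2600.0, 2700.0, 2800.0, 2900.0, 9000.0])
outliers = iqr_outliers(sample)
print(outliers.tolist())
```

Applied to a real column, for example `iqr_outliers(data["銷量"])`, this should list the same points the plot marks, provided the plot uses the default whisker setting.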
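5. Complete code

import pandas as pd
import matplotlib.pyplot as plt  # plotting library

data = pd.read_csv("C:\\Users\\Administrator\\Desktop\\catering_sale.csv")

# Basic statistics of the numeric columns
print(data.max(numeric_only=True))
print(data.min(numeric_only=True))
print(data.mean(numeric_only=True))

# Number of missing values per column
data2 = data.isnull().sum()
print(data2)

# Box plot with annotated outliers
plt.figure()
plt.rcParams['font.sans-serif'] = ['SimHei']  # font able to render the Chinese labels
plt.rcParams['axes.unicode_minus'] = False    # display minus signs correctly
p = data.boxplot(return_type='dict')          # draw the box plot
x = p['fliers'][0].get_xdata()
y = p['fliers'][0].get_ydata()
y.sort()
for i in range(len(x)):
    if i > 0:
        # shift the label left when two neighbouring outliers are close together
        plt.annotate(y[i], xy=(x[i], y[i]), xytext=(x[i] + 0.05 - 0.8 / (y[i] - y[i-1]), y[i]))
    else:
        plt.annotate(y[i], xy=(x[i], y[i]), xytext=(x[i] + 0.08, y[i]))
plt.show()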