Exploring Student Alcohol Consumption
The data is available on GitHub.
Step 1 - Import the necessary libraries
import pandas as pd
import numpy as np
Step 2 - The dataset
path4 = "./data/student-mat.csv"
Step 3 - Assign the data to a variable called student
student = pd.read_csv(path4)
student.head()
Output: a table with the first five rows of student.
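A minimal sketch of Step 3 that does not need the file on disk: it reads a tiny CSV snippet through io.StringIO. The column names are a subset of the dataset's, but the three rows are hypothetical. The snippet passes sep=";" because, as far as I know, the copy of student-mat.csv distributed by the UCI Machine Learning Repository is semicolon-delimited; the pd.read_csv(path4) call above assumes a comma-separated copy, so drop sep if that is what you have.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt; the column names follow the dataset,
# the values are made up for illustration.
raw = io.StringIO(
    "school;sex;age;Mjob;Fjob\n"
    "GP;F;18;at_home;teacher\n"
    "GP;F;17;at_home;other\n"
    "MS;M;16;health;services\n"
)

# sep=";" tells read_csv that fields are separated by semicolons
student_demo = pd.read_csv(raw, sep=";")

print(student_demo.shape)  # (3, 5)
print(student_demo.head())
```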
Step 4 - Slice the DataFrame from column 'school' to column 'guardian'
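# .copy() makes stud_alcoh an independent DataFrame, so assigning to its
# columns later does not trigger pandas' SettingWithCopyWarning
stud_alcoh = student.loc[:, "school":"guardian"].copy()
stud_alcoh.head()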
Output: the first five rows of stud_alcoh, limited to the columns from school to guardian.
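Label-based slicing with .loc includes both endpoints and follows the column order stored in the frame, which differs from positional slicing with .iloc, where the stop position is excluded. The sketch below uses a hypothetical four-column frame to show both forms selecting the same columns.

```python
import pandas as pd

# Hypothetical frame; only the column order matters here
df = pd.DataFrame(
    [["GP", "F", "mother", 2]],
    columns=["school", "sex", "guardian", "traveltime"],
)

# Label slice: "school" and "guardian" are both included
by_label = df.loc[:, "school":"guardian"]

# Positional slice: the stop position is excluded, so add 1
stop = df.columns.get_loc("guardian") + 1
by_position = df.iloc[:, :stop]

print(list(by_label.columns))        # ['school', 'sex', 'guardian']
print(by_label.equals(by_position))  # True
```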
Step 5 - Create a lambda function that capitalizes a string (here, converts it to uppercase)
captalizer = lambda x: x.upper()
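Note that str.upper converts the whole string to uppercase, which matches the output in Step 6; str.capitalize would change only the first character. A quick comparison (first_letter is a hypothetical name used only for this illustration):

```python
captalizer = lambda x: x.upper()         # whole string to uppercase
first_letter = lambda x: x.capitalize()  # first character upper, rest lower

print(captalizer("at_home"))    # AT_HOME
print(first_letter("at_home"))  # At_home
```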
Step 6 - Convert every value in the 'Fjob' column to uppercase
stud_alcoh['Fjob'].apply(captalizer)
Output:
0 TEACHER
1 OTHER
2 OTHER
3 SERVICES
4 OTHER
5 OTHER
6 OTHER
7 TEACHER
8 OTHER
9 OTHER
10 HEALTH
11 OTHER
12 SERVICES
13 OTHER
14 OTHER
15 OTHER
16 SERVICES
17 OTHER
18 SERVICES
19 OTHER
20 OTHER
21 HEALTH
22 OTHER
23 OTHER
24 HEALTH
25 SERVICES
26 OTHER
27 SERVICES
28 OTHER
29 TEACHER
...
365 OTHER
366 SERVICES
367 SERVICES
368 SERVICES
369 TEACHER
370 SERVICES
371 SERVICES
372 AT_HOME
373 OTHER
374 OTHER
375 OTHER
376 OTHER
377 SERVICES
378 OTHER
379 OTHER
380 TEACHER
381 OTHER
382 SERVICES
383 SERVICES
384 OTHER
385 OTHER
386 AT_HOME
387 OTHER
388 SERVICES
389 OTHER
390 SERVICES
391 SERVICES
392 OTHER
393 OTHER
394 AT_HOME
Name: Fjob, dtype: object
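Series.apply returns a new Series rather than changing the column it was called on, which is why the frame is still lowercase in Step 8. For a plain string transformation like this one, the vectorized .str accessor should give the same result. A small sketch with a hypothetical three-value column:

```python
import pandas as pd

# Hypothetical stand-in for the Fjob column
df = pd.DataFrame({"Fjob": ["teacher", "other", "services"]})

upper = df["Fjob"].apply(lambda x: x.upper())

print(upper.tolist())       # ['TEACHER', 'OTHER', 'SERVICES']
print(df["Fjob"].tolist())  # ['teacher', 'other', 'services'] (unchanged)

# Vectorized alternative using the .str accessor
vectorized = df["Fjob"].str.upper()
print(vectorized.tolist() == upper.tolist())  # True
```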
Step 7 - Print the last few rows of the dataset
stud_alcoh.tail()
Output: the last five rows of stud_alcoh.
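tail() returns the last n rows, with n defaulting to 5, and keeps their original index labels rather than renumbering them. A sketch on a hypothetical ten-row frame:

```python
import pandas as pd

df = pd.DataFrame({"age": range(10, 20)})  # hypothetical 10-row frame

print(df.tail().index.tolist())    # [5, 6, 7, 8, 9]  (n defaults to 5)
print(df.tail(3)["age"].tolist())  # [17, 18, 19]
```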
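Step 8 - Notice that the original DataFrame is still lowercase: apply returns a new Series and does not modify the column in place. Fix this by assigning the results back to the 'Mjob' and 'Fjob' columns.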
stud_alcoh['Mjob'] = stud_alcoh['Mjob'].apply(captalizer)
stud_alcoh['Fjob'] = stud_alcoh['Fjob'].apply(captalizer)
stud_alcoh.tail()
Output: the last five rows of stud_alcoh, now with Mjob and Fjob in uppercase.
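The same fix can be written as a loop over the columns to convert. The sketch uses a hypothetical two-row frame in place of stud_alcoh and first shows that calling apply without assigning the result leaves the frame unchanged.

```python
import pandas as pd

captalizer = lambda x: x.upper()

# Hypothetical stand-in for stud_alcoh
df = pd.DataFrame({"Mjob": ["at_home", "health"], "Fjob": ["teacher", "other"]})

df["Fjob"].apply(captalizer)  # result is discarded, df is unchanged
print(df["Fjob"].tolist())    # ['teacher', 'other']

# Assigning the returned Series back is what updates the frame
for col in ["Mjob", "Fjob"]:
    df[col] = df[col].apply(captalizer)

print(df.to_dict("list"))  # {'Mjob': ['AT_HOME', 'HEALTH'], 'Fjob': ['TEACHER', 'OTHER']}
```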
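Step 9 - Create a function called majority that returns a boolean value, and store its results in a new column called legal_drinker (the age of majority is taken to be older than 17)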
def majority(x):
    # True when the age is strictly greater than 17, otherwise False
    return x > 17
stud_alcoh['legal_drinker'] = stud_alcoh['age'].apply(majority)
stud_alcoh.head()
Output: the first five rows of stud_alcoh, including the new legal_drinker column.
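Because majority only compares a number with 17, the same column can also be built with a vectorized comparison, which avoids calling a Python function once per row. Note the boundary: a 17-year-old is not counted as a legal drinker. The ages below are hypothetical.

```python
import pandas as pd

def majority(x):
    return x > 17

ages = pd.Series([15, 17, 18, 22], name="age")  # hypothetical ages

via_apply = ages.apply(majority)
via_vector = ages > 17  # element-wise comparison on the whole Series

print(via_apply.tolist())                         # [False, False, True, True]
print(via_vector.tolist() == via_apply.tolist())  # True
```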
Step 10 - Multiply every number in the dataset by 10 (integer values only; strings and booleans are left unchanged)
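def times10(x):
    # Multiply integers only; type(x) is int is False for bools,
    # so the legal_drinker column is left unchanged
    if type(x) is int:
        return 10 * x
    return x

# In pandas >= 2.1, applymap is deprecated in favour of DataFrame.map
stud_alcoh.applymap(times10).head(10)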
Output: the first ten rows of stud_alcoh, with every integer value multiplied by 10.
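The type(x) is int check matters here: the legal_drinker column holds booleans, and because bool is a subclass of int, an isinstance check would turn True into 10. The sketch below, on a hypothetical three-column frame, compares the element-wise version with a vectorized one that multiplies only integer-typed columns, selected with select_dtypes. It calls DataFrame.map when available (pandas 2.1 and later) and falls back to applymap otherwise.

```python
import numpy as np
import pandas as pd

def times10(x):
    # type(x) is int is False for bools, so True/False are left unchanged
    if type(x) is int:
        return 10 * x
    return x

# Hypothetical stand-in for stud_alcoh
df = pd.DataFrame({"school": ["GP", "MS"], "age": [18, 17], "legal_drinker": [True, False]})

# Element-wise: DataFrame.map in pandas >= 2.1, applymap in older versions
elementwise = df.map(times10) if hasattr(pd.DataFrame, "map") else df.applymap(times10)

# Vectorized: multiply only the integer columns (np.bool_ is not an np.integer type)
vectorized = df.copy()
int_cols = vectorized.select_dtypes(include=np.integer).columns
vectorized[int_cols] = vectorized[int_cols] * 10

print(elementwise["age"].tolist())            # [180, 170]
print(elementwise["legal_drinker"].tolist())  # [True, False]
print(elementwise.to_dict("list") == vectorized.to_dict("list"))  # True
```

The vectorized form avoids a Python call per cell, which tends to matter more as the frame grows.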
References:
1. http://pandas.pydata.org/pandas-docs/stable/cookbook.html#cookbook
2. https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/
3. https://github.com/guipsamora/pandas_exercises
