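1 map() is one of Python's higher-order functions. A higher-order function is a function that accepts another function as an argument, and functional programming is the highly abstract paradigm built on this idea.
To understand higher-order functions, first note that a function can be assigned to a variable. A function name is itself just a variable, so it can also be rebound to some other value. For that reason, choose variable names with care so that they do not clash with function names.
What makes map() special is that its first argument is a variable that refers to a function, and its second argument is a sequence (more generally, any iterable), most often a list. It passes each element of the list to the function and collects the return values, in order, into a new result (a list in Python 2; see section 2 for Python 3).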
Reference: https://www.liaoxuefeng.com/wiki/897692888725344/923030148673312
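The points in section 1 can be illustrated with a minimal sketch. The names `apply_twice`, `square`, and `shadow_demo` are hypothetical and chosen only for this example; they do not come from the referenced tutorial.

```python
def apply_twice(func, value):
    """Hypothetical higher-order function: it takes a function as an argument."""
    return func(func(value))


def square(x):
    return x * x


# A function name is just a variable that refers to a function object,
# so the function can be assigned to another name and passed around.
f = square
squares = list(map(f, [1, 2, 3, 4]))
nested = apply_twice(square, 3)


def shadow_demo():
    """Show what happens when a variable name hides a built-in function."""
    abs = 10  # this local name now hides the built-in abs()
    try:
        abs(-1)
    except TypeError as exc:
        return str(exc)


print(squares)        # [1, 4, 9, 16]
print(nested)         # 81
print(shadow_demo())  # the message says the int object is not callable
```

The shadowing is done inside a function on purpose, so the built-in `abs` stays usable in the rest of the program.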
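2 Because map() passes the elements of its second argument to the function one at a time, it can be used to split a string into its individual characters.
In Python 2, map() returns a list, whereas in Python 3 it returns a map object (a lazy iterator). To display the results, wrap the call in list().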

a = 12300000

def shuchu(k):
    return k

print(map(shuchu, str(a)))
b = list(map(shuchu, str(a)))
print(b)
# <map object at 0x7ff47f3c0828>
# ['1', '2', '3', '0', '0', '0', '0', '0']

# A dict cannot be passed to map() directly as the mapping; the following is wrong
d = {1: '(', -1: ')'}
c = list(map(d, [1, 1, -1]))
# TypeError: 'dict' object is not callable
Reference: https://www.cnblogs.com/linshuhui/p/8980927.html
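The TypeError in section 2 occurs because the built-in map() needs a callable as its first argument. One possible workaround, sketched here as an illustration rather than as the referenced article's method, is to pass one of the dict's lookup methods instead of the dict itself. The sketch also shows that a Python 3 map object can be consumed only once.

```python
d = {1: '(', -1: ')'}
values = [1, 1, -1]

# d.get and d.__getitem__ are callables, so map() accepts them.
via_get = list(map(d.get, values))              # missing keys would give None
via_getitem = list(map(d.__getitem__, values))  # missing keys would raise KeyError
via_lambda = list(map(lambda k: d[k], values))

# Any callable works as the first argument, for example int() on each character.
digits = list(map(int, str(12300000)))

# A map object is a lazy iterator: after one full pass it is exhausted.
m = map(str.upper, "abc")
first_pass = list(m)
second_pass = list(m)

print(via_get)      # ['(', '(', ')']
print(digits)       # [1, 2, 3, 0, 0, 0, 0, 0]
print(first_pass)   # ['A', 'B', 'C']
print(second_pass)  # []
```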
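3 The pandas Series.map() method can be used to combine selected columns of two DataFrames. When it is given another Series, each value is looked up in that Series' index and replaced by the matching value.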

import pandas as pd
import numpy as np

df1 = pd.DataFrame(
    {'A': [1, 2, 3, 'not in df2 index'],   # the last value is absent from df2's index, so it maps to NaN
     'B': ['a', 'b', 'c', 'd'],
     'C': ['Tom', 'Jack', 'Bob', 'roushi']
     })
print(df1)

df2 = pd.DataFrame(
    {'A': [1, 2, 3, 4],
     'B': [6, 7, 8, 9]})
print(df2)

# Equivalent to matching df1's column A against df2's index, then mapping to df2['B']
df1['A_mapped_by_df2_index'] = df1['A'].map(df2['B'])
print(df1)

#                   A  B       C
# 0                 1  a     Tom
# 1                 2  b    Jack
# 2                 3  c     Bob
# 3  not in df2 index  d  roushi

#    A  B
# 0  1  6
# 1  2  7
# 2  3  8
# 3  4  9

#                   A  B       C  A_mapped_by_df2_index
# 0                 1  a     Tom                    7.0
# 1                 2  b    Jack                    8.0
# 2                 3  c     Bob                    9.0
# 3  not in df2 index  d  roushi                    NaN
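In the example above the lookup goes through df2's default integer index (0 to 3), which is why the value 1 maps to 7 rather than 6. If the intent is to match on the values of a key column instead, one option is to turn that column into the index first. The following is a minimal sketch under that assumption; the frames `left` and `right` and the column name `B_from_right` are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3, 5],
                     'C': ['Tom', 'Jack', 'Bob', 'roushi']})
right = pd.DataFrame({'A': [1, 2, 3, 4],
                      'B': [6, 7, 8, 9]})

# Build a lookup Series whose index holds the key values of right['A'].
lookup = right.set_index('A')['B']

# Each value of left['A'] is looked up in lookup's index;
# keys that are absent (here 5) become NaN.
left['B_from_right'] = left['A'].map(lookup)
print(left)
```

This approach assumes the key column holds unique values, since the lookup Series needs an unambiguous index.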
4 Basic usage
a. Dictionary mapping

import pandas as pd
from pandas import Series, DataFrame

data = DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'Pastrami',
                           'corned beef', 'Bacon', 'pastrami', 'honey ham', 'nova lox'],
                  'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
meat_to_animal = {
    'bacon': 'pig',
    'pulled pork': 'pig',
    'pastrami': 'cow',
    'corned beef': 'cow',
    'honey ham': 'pig',
    'nova lox': 'salmon'
}

# str.lower() converts all uppercase characters in a string to lowercase.
# It is needed because the keys of meat_to_animal are lowercase,
# while some entries in the food column are capitalized.
data['animal'] = data['food'].map(str.lower).map(meat_to_animal)
print(data)
print(data.info())

a = data['food'].map(lambda x: meat_to_animal[x.lower()])
print(a)

#           food  ounces  animal
# 0        bacon     4.0     pig
# 1  pulled pork     3.0     pig
# 2        bacon    12.0     pig
# 3     Pastrami     6.0     cow
# 4  corned beef     7.5     cow
# 5        Bacon     8.0     pig
# 6     pastrami     3.0     cow
# 7    honey ham     5.0     pig
# 8     nova lox     6.0  salmon

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 9 entries, 0 to 8
# Data columns (total 3 columns):
# food      9 non-null object
# ounces    9 non-null float64
# animal    9 non-null object
# dtypes: float64(1), object(2)
# memory usage: 296.0+ bytes
# None

# 0       pig
# 1       pig
# 2       pig
# 3       cow
# 4       cow
# 5       pig
# 6       cow
# 7       pig
# 8    salmon
# Name: food, dtype: object

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                    'b': ['one', 'two', 'three', 'four', 'five']})
df2 = pd.DataFrame({'c': [5, 4, 2, 1, 2, 3]})
# Build a dict from df1's two columns and use it to map df2's column c
d = df2['c'].map(dict(zip(df1['a'], df1['b'])))
print(d)

# 0     five
# 1     four
# 2      two
# 3      one
# 4      two
# 5    three
# Name: c, dtype: object
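The two spellings used above, `.map(meat_to_animal)` and `.map(lambda x: meat_to_animal[x.lower()])`, behave differently when a key is missing from the dictionary. The sketch below illustrates this with a hypothetical extra entry, 'tofu', which is not in the original data.

```python
import pandas as pd

food = pd.Series(['bacon', 'Pastrami', 'tofu'])  # 'tofu' is a made-up entry
meat_to_animal = {'bacon': 'pig', 'pastrami': 'cow'}

# Passing the dict itself: keys that are missing become NaN.
mapped = food.str.lower().map(meat_to_animal)

# Indexing the dict inside a lambda: a missing key raises KeyError.
try:
    food.map(lambda x: meat_to_animal[x.lower()])
    raised = False
except KeyError:
    raised = True

# dict.get with a default value avoids both NaN and the exception.
with_default = food.map(lambda x: meat_to_animal.get(x.lower(), 'unknown'))

print(mapped)
print(raised)
print(with_default)
```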
b. Combined with lambda. When the function is simple, a lambda is very quick to write.

import pandas as pd
from pandas import Series, DataFrame

index = pd.date_range('2017-08-15', periods=10)
ser = Series(list(range(10)), index=index)
print(ser)

ser.index = ser.index.map(lambda x: x.day)
print(ser)

# 2017-08-15    0
# 2017-08-16    1
# 2017-08-17    2
# 2017-08-18    3
# 2017-08-19    4
# 2017-08-20    5
# 2017-08-21    6
# 2017-08-22    7
# 2017-08-23    8
# 2017-08-24    9
# Freq: D, dtype: int64

# 15    0
# 16    1
# 17    2
# 18    3
# 19    4
# 20    5
# 21    6
# 22    7
# 23    8
# 24    9
# dtype: int64

# Multiply the elements of two lists pairwise, then sum the products.
# Note that when map() is applied to lists, it passes the elements one at a time.
a = [1, 2, 3, 4]
b = [2, 3, 4, 5]
sumab = sum(map(lambda x, y: x*y, a, b))
print(sumab)
# 40
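The sum-of-products example passes two lists to map(). As a small sketch of equivalent spellings (not taken from the original post), the same result can be written with operator.mul or with zip(). The last lines show that, in Python 3, map() with several iterables stops at the shortest one.

```python
import operator

a = [1, 2, 3, 4]
b = [2, 3, 4, 5]

dot_lambda = sum(map(lambda x, y: x * y, a, b))
dot_operator = sum(map(operator.mul, a, b))     # same thing without a lambda
dot_zip = sum(x * y for x, y in zip(a, b))      # generator-expression form

# With iterables of unequal length, map() stops when the shortest is exhausted.
shortest = list(map(operator.add, [1, 2, 3], [10, 20]))

print(dot_lambda, dot_operator, dot_zip)  # 40 40 40
print(shortest)                           # [11, 22]
```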
c. Function mapping
d. Series mapping
See section 3.
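Item c (function mapping) has no example of its own, so here is a minimal sketch. The function `size_label` and its 6-ounce threshold are hypothetical, chosen only to show a named function being passed to Series.map().

```python
import pandas as pd


def size_label(ounces):
    """Hypothetical helper: classify a portion by weight."""
    return 'large' if ounces >= 6 else 'small'


ounces = pd.Series([4, 3, 12, 6, 7.5])

# Series.map() calls size_label once per element and returns a new Series.
labels = ounces.map(size_label)
print(labels)
```

A named function is easier to reuse and to test than an inline lambda when the logic grows beyond a single expression.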
5 Pool and ThreadPool both provide a map() method; Pool works with processes, while ThreadPool works with threads.
Using Pool:

import gc
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import dask.dataframe as dd
from multiprocessing import Pool

# fenpei() and tfun() are user-defined functions that are not shown in this post.
listdata = []
processnum = 12
user_repay = pd.read_hdf('../data/user_repay_second.h5')
for i in range(processnum):
    datai = fenpei(user_repay, i, processnum)
    # print(datai['index'].nunique())  # bin by index
    listdata.append([i, datai])
    del datai
    gc.collect()

time1 = dt.datetime.now()
# Each element of listdata is passed to tfun in one of the worker processes
with Pool(processnum) as p:
    p.map(tfun, listdata)
print((dt.datetime.now() - time1).total_seconds())
del listdata
gc.collect()
Using ThreadPool:

import time
from datetime import datetime
from multiprocessing.dummy import Pool as ThreadPool
from functools import partial

def add(x, y):
    print(datetime.now(), "enter add func...")
    time.sleep(2)
    print(datetime.now(), "leave add func...")
    return x+y

# pool.map() passes a single argument, so unpack the tuple here
def add_wrap(args):
    return add(*args)

if __name__ == "__main__":
    pool = ThreadPool(4)  # pool size is 4
    print(pool.map(add_wrap, [(1,2),(3,4),(5,6)]))
    # close the pool and wait for the workers to exit
    pool.close()
    pool.join()
References: https://blog.csdn.net/moxiaomomo/article/details/77075125
https://www.liaoxuefeng.com/wiki/1016959663602400/1017629247922688
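The ThreadPool example uses a wrapper function to unpack argument tuples, and it imports functools.partial without using it. As a sketch of two alternatives (assuming Python 3.3 or later, where pools have starmap() and work as context managers), starmap() unpacks each tuple itself, and partial() fixes one argument so that plain map() can supply the other. The sleep and print calls are dropped here to keep the sketch short.

```python
from functools import partial
from multiprocessing.dummy import Pool as ThreadPool


def add(x, y):
    return x + y


# The with-block shuts the pool down on exit; both calls below have
# already returned their complete results by then.
with ThreadPool(4) as pool:
    # starmap() unpacks each tuple into positional arguments: add(1, 2), ...
    sums = pool.starmap(add, [(1, 2), (3, 4), (5, 6)])
    # partial(add, 10) fixes x=10, so map() only has to supply y.
    plus_ten = pool.map(partial(add, 10), [1, 2, 3])

print(sums)      # [3, 7, 11]
print(plus_ten)  # [11, 12, 13]
```

Both calls return their results in input order, just like the built-in map().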