The map() function


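1 map() is one of Python's higher-order functions. A higher-order function is a function that accepts other functions as arguments, and functional programming is the highly abstract paradigm built around this idea.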

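To understand higher-order functions, first note that a function can be assigned to a variable. A function name is itself just a variable, so it can also be rebound to some other value. For that reason, choose variable names with care so that they do not clash with function names.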
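A minimal sketch of the points above: a function is an ordinary object that can be bound to another name, passed to another function, or lost through a careless assignment. The names `shout` and `apply_twice` are made up for illustration.

```python
def shout(text):
    """Return the text in upper case followed by an exclamation mark."""
    return text.upper() + "!"


def apply_twice(func, value):
    """Higher-order function: takes another function as its first argument."""
    return func(func(value))


# A function can be assigned to a variable and called through that variable.
f = shout
greeting = f("hi")                    # 'HI!'
doubled = apply_twice(shout, "hi")    # 'HI!!'

# A function name is only a variable, so a careless assignment replaces it.
backup = shout
shout = "just a string now"
still_callable = callable(shout)      # False: shout("hi") would raise TypeError
shout = backup                        # restore the function
```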

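What makes map() special is that its first argument is a variable that refers to a function and its second argument is an iterable, most often a list. It passes each element of the list to the function and collects the return values into a new sequence (a list in Python 2, a map object in Python 3).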
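As a rough illustration of that description, the sketch below applies a small function to every element of a list and compares the result with an explicit loop. The function `square` is an illustrative name. The last lines show that in Python 3 the map object is an iterator that can be consumed only once.

```python
def square(x):
    return x * x


numbers = [1, 2, 3, 4, 5]

# map() calls square() once per element; list() collects the results.
squares = list(map(square, numbers))    # [1, 4, 9, 16, 25]

# The same work written as an explicit loop.
manual = []
for n in numbers:
    manual.append(square(n))

# In Python 3 the map object is a one-shot iterator.
lazy = map(square, numbers)
first_pass = list(lazy)                 # [1, 4, 9, 16, 25]
second_pass = list(lazy)                # [] because the iterator is exhausted
```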

Reference: https://www.liaoxuefeng.com/wiki/897692888725344/923030148673312

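2 Because map() passes the elements of its second argument to the function one at a time, it can be used to split a string into its individual characters.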

In Python 2, map() returns a list; in Python 3 it returns a map object. To display the results, wrap the call in list().

 

a = 12300000
def shuchu(k):
    return k
print(map(shuchu, str(a)))
b = list(map(shuchu, str(a)))
print(b)
# <map object at 0x7ff47f3c0828>
# ['1', '2', '3', '0', '0', '0', '0', '0']

# A dict cannot be passed to map() as the function, because a dict is not callable; the code below is wrong
d = {1:'(', -1:')'}
c = list(map(d, [1,1,-1]))
# TypeError: 'dict' object is not callable
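One possible way around the TypeError, sketched here as a suggestion and not taken from the original post, is to pass one of the dict's bound methods instead of the dict itself. `d.get` returns None for a missing key, while `d.__getitem__` raises KeyError.

```python
d = {1: '(', -1: ')'}
values = [1, 1, -1]

# d.get is callable, so map() can apply it to each element.
brackets = list(map(d.get, values))            # ['(', '(', ')']

# d.__getitem__ behaves like d[key] and raises KeyError for unknown keys.
strict = list(map(d.__getitem__, values))      # ['(', '(', ')']

# With d.get, a key that is not in the dict maps to None.
with_missing = list(map(d.get, [1, 0]))        # ['(', None]

joined = ''.join(map(d.get, values))           # '(()'
```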

 

Reference: https://www.cnblogs.com/linshuhui/p/8980927.html

3 The map() method of a pandas Series can combine selected columns of two DataFrames.

import pandas as pd
import numpy as np

df1 = pd.DataFrame( {'A':[1,2,3,'not in df2 index'],
                     'B':['a','b','c','d'],
                     'C':['Tom','Jack','Bob','roushi']
                   })
print(df1)
df2 = pd.DataFrame( {'A':[1,2,3,4],
                     'B':[6,7,8,9]})
print(df2)
# Each value in df1['A'] is looked up in the index of df2['B'] and replaced by the matching value;
# a value that is not in that index becomes NaN
df1['A_mapped'] = df1['A'].map(df2['B'])
print(df1)
#                   A  B       C
# 0                 1  a     Tom
# 1                 2  b    Jack
# 2                 3  c     Bob
# 3  not in df2 index  d  roushi
#    A  B
# 0  1  6
# 1  2  7
# 2  3  8
# 3  4  9
#                   A  B       C  A_mapped
# 0                 1  a     Tom       7.0
# 1                 2  b    Jack       8.0
# 2                 3  c     Bob       9.0
# 3  not in df2 index  d  roushi       NaN
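Note that the example above matches df1['A'] against the row index of df2 (0 to 3), not against df2's own A column, which is why 1 maps to 7 and not 6. If a lookup by key column is what is wanted, one option (an assumption about intent, not something the original post states) is to move that column into the index first with `set_index`. The variable names `by_index`, `lookup` and `by_key` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 5],
                    'C': ['Tom', 'Jack', 'Bob', 'roushi']})
df2 = pd.DataFrame({'A': [1, 2, 3, 4],
                    'B': [6, 7, 8, 9]})

# Lookup against df2's default row index 0..3, as in the example above.
by_index = df1['A'].map(df2['B'])       # 7.0, 8.0, 9.0, NaN

# Lookup against df2's A column: make A the index of the lookup Series.
lookup = df2.set_index('A')['B']
by_key = df1['A'].map(lookup)           # 6.0, 7.0, 8.0, NaN
```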

4 Basic usage

a. Mapping with a dict

import pandas as pd
from pandas import Series, DataFrame

data = DataFrame({'food':['bacon','pulled pork','bacon','Pastrami',
   'corned beef','Bacon','pastrami','honey ham','nova lox'],
     'ounces':[4,3,12,6,7.5,8,3,5,6]})
meat_to_animal = {
 'bacon':'pig',
 'pulled pork':'pig',
 'pastrami':'cow',
 'corned beef':'cow',
 'honey ham':'pig',
 'nova lox':'salmon' }
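# str.lower() converts every uppercase character in a string to lowercase. It is needed here because the keys of meat_to_animal are all lowercase, while some entries in the food column are capitalized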
data['animal'] = data['food'].map(str.lower).map(meat_to_animal)
print(data)
print(data.info())
a = data['food'].map(lambda x: meat_to_animal[x.lower()])
print(a)
#           food  ounces  animal
# 0        bacon     4.0     pig
# 1  pulled pork     3.0     pig
# 2        bacon    12.0     pig
# 3     Pastrami     6.0     cow
# 4  corned beef     7.5     cow
# 5        Bacon     8.0     pig
# 6     pastrami     3.0     cow
# 7    honey ham     5.0     pig
# 8     nova lox     6.0  salmon
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 9 entries, 0 to 8
# Data columns (total 3 columns):
# food      9 non-null object
# ounces    9 non-null float64
# animal    9 non-null object
# dtypes: float64(1), object(2)
# memory usage: 296.0+ bytes
# None
# 0       pig
# 1       pig
# 2       pig
# 3       cow
# 4       cow
# 5       pig
# 6       cow
# 7       pig
# 8    salmon
# Name: food, dtype: object

import pandas as pd
# Build a lookup dict from two columns of df1, then map df2['c'] through it
df1 = pd.DataFrame({'a':[1,2,3,4,5],
                    'b':['one','two','three','four','five']})
df2 = pd.DataFrame({'c':[5,4,2,1,2,3]})
d = df2['c'].map(dict(zip(df1['a'],df1['b'])))
print(d)
# 0     five
# 1     four
# 2      two
# 3      one
# 4      two
# 5    three
# Name: c, dtype: object
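The two styles used in this subsection differ when a key is missing, which the examples above do not show because every food has an entry. A small sketch with made-up data ('tofu' is not in the dict): passing the dict to `Series.map` yields NaN for the missing key, while the lambda that indexes the dict raises KeyError.

```python
import pandas as pd

meat_to_animal = {'bacon': 'pig', 'pastrami': 'cow'}
food = pd.Series(['Bacon', 'pastrami', 'tofu'])

# Passing the dict itself: unknown keys become NaN.
mapped = food.str.lower().map(meat_to_animal)

# Indexing the dict inside a lambda: unknown keys raise KeyError.
try:
    food.map(lambda x: meat_to_animal[x.lower()])
    lambda_failed = False
except KeyError:
    lambda_failed = True

# dict.get with a default avoids both NaN and the exception.
safe = food.map(lambda x: meat_to_animal.get(x.lower(), 'unknown'))
```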

b. Combining with lambda: when the function is simple, a lambda is very quick to write.

import pandas as pd
from pandas import Series, DataFrame

index = pd.date_range('2017-08-15', periods=10)
ser = Series(list(range(10)), index=index)
print(ser)
ser.index = ser.index.map(lambda x: x.day)
print(ser)
# 2017-08-15    0
# 2017-08-16    1
# 2017-08-17    2
# 2017-08-18    3
# 2017-08-19    4
# 2017-08-20    5
# 2017-08-21    6
# 2017-08-22    7
# 2017-08-23    8
# 2017-08-24    9
# Freq: D, dtype: int64
# 15    0
# 16    1
# 17    2
# 18    3
# 19    4
# 20    5
# 21    6
# 22    7
# 23    8
# 24    9
# dtype: int64

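# Multiply the elements of two lists pairwise, then sum the products
# Note that when map() is applied to lists, the elements are passed to the function one at a time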
a = [1,2,3,4]
b = [2,3,4,5]
sumab = sum(map(lambda x,y:x*y, a,b))
print(sumab)
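For reference, the sum-of-products example above evaluates to 40. The sketch below shows two equivalent spellings (`operator.mul` and a generator over `zip`) and one detail worth knowing: when the iterables differ in length, map() stops at the shortest one.

```python
from operator import mul

a = [1, 2, 3, 4]
b = [2, 3, 4, 5]

# With two iterables, map() passes one element from each per call.
dot_lambda = sum(map(lambda x, y: x * y, a, b))    # 2 + 6 + 12 + 20 = 40

# operator.mul does the same job as the lambda.
dot_mul = sum(map(mul, a, b))

# A generator expression over zip() is a common alternative.
dot_zip = sum(x * y for x, y in zip(a, b))

# map() stops as soon as the shortest iterable is exhausted.
truncated = list(map(mul, [1, 2, 3], [10, 20]))    # [10, 40]
```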

c. Mapping with a function
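The original gives no example for this case, so the following is only a small sketch of what mapping with a named function might look like. The function `size_label`, its threshold and the sample data are all made up for illustration.

```python
import pandas as pd


def size_label(ounces):
    """Classify a portion as 'large' or 'small' (threshold chosen arbitrarily)."""
    return 'large' if ounces >= 6 else 'small'


ounces = pd.Series([4, 3, 12, 7.5])

# Series.map calls size_label once per value and returns a new Series.
labels = ounces.map(size_label)     # small, small, large, large
```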

d. Mapping with a Series

See section 3.

5 Pool and ThreadPool: Pool works with processes and ThreadPool works with threads. Both provide a map() method.

Using Pool:

import datetime as dt
import gc

import matplotlib.pyplot as plt
import dask.dataframe as dd
import pandas as pd
from multiprocessing import Pool

# fenpei() and tfun() are the author's own helpers and are not shown in this post
listdata = []
processnum = 12
user_repay = pd.read_hdf('../data/user_repay_second.h5')
for i in range(processnum):
    datai = fenpei(user_repay, i, processnum)
    # print(datai['index'].nunique())  # bucketed by index
    listdata.append([i, datai])
    del datai
    gc.collect()
time1 = dt.datetime.now()
with Pool(processnum) as p:
    p.map(tfun, listdata)
print((dt.datetime.now() - time1).total_seconds())
del listdata
gc.collect()

Using ThreadPool:

import time
from datetime import datetime
from multiprocessing.dummy import Pool as ThreadPool
from functools import partial


def add(x, y):
    print(datetime.now(), "enter add func...")
    time.sleep(2)
    print(datetime.now(), "leave add func...")
    return x + y


def add_wrap(args):
    return add(*args)


if __name__ == "__main__":
    pool = ThreadPool(4)  # pool size is 4
    print(pool.map(add_wrap, [(1, 2), (3, 4), (5, 6)]))
    # close the pool and wait for the workers to exit
    pool.close()
    pool.join()
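The `add_wrap` helper above exists only to unpack each tuple before calling `add`. A possible alternative, sketched here under the assumption of Python 3.3 or later, is `starmap`, which unpacks the argument tuples itself. The helper name `add_pairs` and the shortened sleep are illustrative.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def add(x, y):
    time.sleep(0.1)     # stand-in for slow, I/O-bound work
    return x + y


def add_pairs(pairs, workers=4):
    """Add each (x, y) pair on a pool of threads and return the sums in order."""
    # starmap() unpacks every tuple into positional arguments for add()
    # and blocks until all results are available.
    with ThreadPool(workers) as pool:
        return pool.starmap(add, pairs)


sums = add_pairs([(1, 2), (3, 4), (5, 6)])    # [3, 7, 11]
```

Results come back in the same order as the input pairs, just as with `pool.map`.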

References: https://blog.csdn.net/moxiaomomo/article/details/77075125

          https://www.liaoxuefeng.com/wiki/1016959663602400/1017629247922688

