【Alias Method for Sampling】How It Works
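For sampling a random variable with a discrete distribution, the Alias Method for Sampling is a very efficient approach.
After a one-time setup that builds two lookup tables (O(K) time for K outcomes), each draw costs O(1).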
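Why the tables are correct, written out as a short derivation. The sketch uses the `Prob` and `Alias` arrays from the code that follows: all K columns have height 1, and column i returns outcome i with probability Prob[i] and outcome Alias[i] otherwise. Because a column is chosen uniformly, the probability of drawing outcome j is the sum of the mass that every column contributes to j. The second line checks this with the `probs` list from the `__main__` block. For that list, the pop order in `alias_setup` produces Prob = [1, 1, 0.5, 0.5, 0.5] and Alias = [0, 0, 4, 1, 3], so column 3 is the only column that aliases to outcome 1.

```latex
P(X = j) = \frac{1}{K}\Big(\mathrm{Prob}[j] + \sum_{i \,:\, \mathrm{Alias}[i] = j} \big(1 - \mathrm{Prob}[i]\big)\Big)

P(X = 1) = \frac{1}{5}\big(\mathrm{Prob}[1] + (1 - \mathrm{Prob}[3])\big) = \frac{1}{5}(1 + 0.5) = 0.3
```

Each draw uses one uniform integer (the column) and one uniform real (the biased coin), which gives the O(1) cost per sample.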
...
【Python Code】
```python
#!/usr/bin/env python
# encoding: utf-8
__author__ = 'ScarlettZero'
# 20180522
# AliasMethod Sampling

import time

import numpy as np
import numpy.random as npr


def alias_setup(probs):
    '''
    :param probs: a discrete probability distribution (sums to 1)
    :return: the Alias array and the Prob array
    '''
    K = len(probs)                       # K is the number of categories
    Prob = np.zeros(K)                   # Prob array: probability of keeping the column's own outcome
    Alias = np.zeros(K, dtype=np.int64)  # Alias array: the outcome stacked on top of each column
                                         # (np.int was removed in NumPy 1.24)

    # Sort the data into the outcomes with probabilities
    # that are larger and smaller than 1/K
    smaller = []  # columns whose scaled probability is < 1
    larger = []   # columns whose scaled probability is >= 1
    for kk, prob in enumerate(probs):
        Prob[kk] = K * prob  # scale each probability by K so the total is K
        if Prob[kk] < 1.0:   # then split into the two groups
            smaller.append(kk)
        else:
            larger.append(kk)

    # Loop through and create little binary mixtures that appropriately allocate
    # the larger outcomes over the overall uniform mixture,
    # i.e. top up every column to exactly 1.
    while len(smaller) > 0 and len(larger) > 0:
        small = smaller.pop()
        large = larger.pop()

        Alias[small] = large                              # fill the Alias array
        Prob[large] = Prob[large] - (1.0 - Prob[small])  # move mass from the large column onto the small one

        if Prob[large] < 1.0:
            smaller.append(large)
        else:
            larger.append(large)

    # Leftover columns should be exactly 1; remove floating-point residue so
    # they never fall through to their unset alias.
    for kk in smaller + larger:
        Prob[kk] = 1.0

    print("Prob is :", Prob)
    print("Alias is :", Alias)
    return Alias, Prob


def alias_draw(Alias, Prob):
    '''
    :param Alias: the Alias array
    :param Prob: the Prob array
    :return: one sampled outcome
    '''
    K = len(Alias)

    # Draw from the overall uniform mixture: pick a column kk uniformly from 0..K-1.
    kk = npr.randint(K)

    # Draw from the binary mixture, either keeping the small one, or choosing the associated larger one:
    # draw c uniformly from [0, 1); if c < Prob[kk] return kk, otherwise return Alias[kk].
    if npr.rand() < Prob[kk]:
        return kk
    else:
        return Alias[kk]


if __name__ == '__main__':
    start = time.time()
    K = 5  # 5 categories
    N = 5  # number of samples to draw

    # Get a random probability vector.
    # probs = npr.dirichlet(np.ones(K), 1).ravel()  # .ravel(): flatten to 1-D
    probs = [0.2, 0.3, 0.1, 0.2, 0.2]

    # Construct the table once, then reuse it for every draw
    Alias, Prob = alias_setup(probs)
    # For these probs: Prob -> [1, 1, 0.5, 0.5, 0.5], Alias -> [0, 0, 4, 1, 3]

    # Generate variates: X holds the N samples
    X = np.zeros(N, dtype=np.int64)
    for nn in range(N):
        X[nn] = alias_draw(Alias, Prob)
    print("Final samples X:", X)

    end = time.time()
    spend = end - start
    print("Elapsed: %0.4f s" % spend)

    # Baseline: sample the same distribution directly with np.random.choice
    sure_k = np.random.choice(K, 1, p=probs)
    print("sure_k:", sure_k)

# On parallelizing SEM: I first tried sampling k with the Alias Method, but saw no
# speed-up over the previous approach (SEM sampled k with
# sure_k = np.random.choice(aspects_num, 1, p=p)).
# The Alias Method only pays off when the same table serves many draws; rebuilding
# the alias table before every single draw gains nothing.
```
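The closing comment says the Alias Method only pays off when one table serves many draws. The sketch below illustrates that point. It is not the author's code, and all three helper names are hypothetical: `build_alias_table`, `sample_many` and `implied_probs`. It builds the table once and then draws a whole batch with vectorized NumPy indexing. `implied_probs` reads back the distribution the table encodes, which makes it easy to check any construction. The sketch makes these assumptions:
- NumPy 1.17 or later, for the `np.random.default_rng` Generator API.
- Every column aliases to itself by default (a Vose-style variant), so no column ever falls through to a wrong outcome.

```python
import numpy as np


def build_alias_table(probs):
    """Build (prob, alias) arrays; a Vose-style sketch with self-aliasing defaults."""
    probs = np.asarray(probs, dtype=float)
    k = len(probs)
    prob = probs * k                  # scale so the average column height is 1
    alias = np.arange(k)              # default: every column aliases itself
    small = [i for i in range(k) if prob[i] < 1.0]
    large = [i for i in range(k) if prob[i] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                  # top part of column s belongs to outcome l
        prob[l] -= 1.0 - prob[s]      # l donated (1 - prob[s]) of its mass
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:           # leftovers equal 1 up to rounding error
        prob[i] = 1.0
    return prob, alias


def sample_many(prob, alias, n, rng):
    """Draw n samples at once: O(1) work per sample, vectorized with NumPy."""
    cols = rng.integers(len(prob), size=n)   # uniform column choice
    keep = rng.random(n) < prob[cols]        # one biased coin per draw
    return np.where(keep, cols, alias[cols])


def implied_probs(prob, alias):
    """Recover the distribution encoded by the table (useful for checking)."""
    k = len(prob)
    out = prob / k                                # mass each column keeps for itself
    np.add.at(out, alias, (1.0 - prob) / k)       # mass each column hands to its alias
    return out


if __name__ == "__main__":
    probs = [0.2, 0.3, 0.1, 0.2, 0.2]
    prob, alias = build_alias_table(probs)        # setup: done once
    rng = np.random.default_rng(0)
    samples = sample_many(prob, alias, 100_000, rng)
    freqs = np.bincount(samples, minlength=len(probs)) / len(samples)
    print("empirical frequencies:", np.round(freqs, 3))
```

Drawing the batch in one vectorized call avoids Python-level overhead per sample. This is the regime in which building the table up front is worth its O(K) cost.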
Output:
【Reference】
2. The Alias Method: Efficient Sampling with Many Discrete Outcomes