Alias Method for Sampling

Reposted article · 2018-05-27 21:28 · AI-ML-DL / DataMining

【Alias Method for Sampling】Principle

The Alias Method is a very efficient way to sample from a random variable with a discrete distribution.

After a one-time setup that builds the tables, each draw takes $O(1)$ time.
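
To make the idea concrete, the sketch below builds the two tables for a small distribution and then reconstructs the distribution they encode. Each of the K columns is picked with probability 1/K; column k returns k with probability Prob[k] and Alias[k] otherwise. The helper names `build_alias_table` and `implied_distribution` are illustrative assumptions, not part of the original code, and unfilled columns alias to themselves here for simplicity.

```python
import numpy as np

def build_alias_table(probs):
    """Build Prob/Alias tables (illustrative helper, not from the article)."""
    K = len(probs)
    prob = np.asarray(probs, dtype=float) * K   # scale so the average column height is 1
    alias = np.arange(K)                        # assumption: each column aliases to itself by default
    smaller = [k for k in range(K) if prob[k] < 1.0]
    larger = [k for k in range(K) if prob[k] >= 1.0]
    while smaller and larger:
        small = smaller.pop()
        large = larger.pop()
        alias[small] = large                    # top up the short column with mass from the tall one
        prob[large] -= 1.0 - prob[small]
        (smaller if prob[large] < 1.0 else larger).append(large)
    return prob, alias

def implied_distribution(prob, alias):
    """Recover the distribution that the two tables encode."""
    K = len(prob)
    dist = np.zeros(K)
    for k in range(K):
        dist[k] += prob[k] / K                  # column k chosen, own outcome kept
        dist[alias[k]] += (1.0 - prob[k]) / K   # column k chosen, alias returned
    return dist

probs = [0.2, 0.3, 0.1, 0.2, 0.2]
prob, alias = build_alias_table(probs)
print("Prob :", prob)
print("Alias:", alias)
print("Implied distribution:", implied_distribution(prob, alias))
```

The implied distribution should match `probs` up to floating-point rounding. That is why a draw needs only one uniform column choice and one comparison.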

【Python Code】

#!/usr/bin/env python
# encoding: utf-8
__author__ = 'ScarlettZero'

# 20180522
# AliasMethod Sampling

import time
import numpy as np
import numpy.random as npr

def alias_setup(probs):
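    '''
    Build the alias tables for a discrete distribution.

    :param probs: a discrete probability distribution (probabilities summing to 1)
    :return: the Alias array and the Prob array
    '''
    K = len(probs)  # K is the number of categories
    Prob = np.zeros(K)  # Prob array: probability of keeping the column's own category
    Alias = np.zeros(K, dtype=int)  # Alias array: the second-layer category of each column (np.int was removed in NumPy 1.24)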

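    # Sort the data into the outcomes with probabilities
    # that are larger and smaller than 1/K.
    smaller = []  # columns whose scaled probability is below 1
    larger = []  # columns whose scaled probability is 1 or above

    for kk, prob in enumerate(probs):
        Prob[kk] = K * prob  # scale each probability by K so that the total is K
        if Prob[kk] < 1.0:  # split into two groups: below 1, and 1 or above
            smaller.append(kk)
        else:
            larger.append(kk)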

    # Loop through and create little binary mixtures that appropriately allocate
    # the larger outcomes over the overall uniform mixture.

    # Pair columns up so that every column ends with a total height of 1.
    while len(smaller) > 0 and len(larger) > 0:
        small = smaller.pop()
        large = larger.pop()

        Alias[small] = large  # fill in the Alias array
        Prob[large] = Prob[large] - (1.0 - Prob[small])  # move mass from the large column to top up the small one

        if Prob[large] <1.0:
            smaller.append(large)
        else:
            larger.append(large)
    print("Prob is :", Prob)
    print("Alias is :", Alias)
    return Alias,Prob

def alias_draw(Alias,Prob):
    '''
    :param Alias: the Alias array
    :param Prob: the Prob array
    :return: one sampled category index
    '''
    K=len(Alias)

    # Draw from the overall uniform mixture.
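    kk = int(np.floor(npr.rand() * K))  # pick a column uniformly at random

    # Draw from the binary mixture, either keeping the small one, or choosing the associated larger one.
    # Sampling: pick a random column kk (a random integer in [0, K-1]), then draw a random number c in [0, 1).
    # If c is less than Prob[kk], return kk; otherwise return Alias[kk].
    if npr.rand() < Prob[kk]:  # compare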
        return kk
    else:
        return Alias[kk]

if __name__ == '__main__':
    start=time.time()

    K = 5  # number of categories
    N = 5  # number of samples to draw

    # Get a random probability vector.
    # probs = npr.dirichlet(np.ones(K), 1).ravel()  # .ravel() flattens a multi-dimensional array to 1-D
    probs =[0.2,0.3,0.1,0.2,0.2]
    # Construct the table
    Alias, Prob = alias_setup(probs)

    # Sample output from a different run (these values do not correspond to the fixed probs above):
    # Prob is : [ 0.25058826  0.69258202  0.83010441  0.87901003  1.        ]
    # Alias is : [4 4 4 4 0]
    ######

    # Generate variates.
    # X holds the N sampled category indices
    X = np.zeros(N, dtype=int)
    for nn in range(N):
        X[nn] = alias_draw(Alias, Prob)
    print("Final sampling result X:", X)

    end=time.time()
    spend=end-start
    print("Elapsed time: %0.4f s" % spend)

    sure_k = np.random.choice(5, 1, p=probs)
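    print("sure_k:", sure_k)
    # On parallelizing SEM: I first tried the Alias Method for sampling k, but saw no efficiency gain over the previous approach (SEM previously sampled k with  sure_k = np.random.choice(aspects_num, 1, p=p)).
    # The Alias Method only pays off when one table serves many draws. Rebuilding the alias table for every single sample gives no benefit.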
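
The closing remark about reusing the table can be illustrated by doing the setup once and then drawing many samples from the same tables. This is a minimal sketch under stated assumptions: `build_alias_table` and `alias_draw_many` are hypothetical helper names, and the vectorized batch draw is a variation on the per-sample `alias_draw`, not the author's code.

```python
import numpy as np

def build_alias_table(probs):
    """One-time O(K) setup (illustrative helper, not from the article)."""
    K = len(probs)
    prob = np.asarray(probs, dtype=float) * K
    alias = np.arange(K)  # assumption: unfilled columns alias to themselves
    smaller = [k for k in range(K) if prob[k] < 1.0]
    larger = [k for k in range(K) if prob[k] >= 1.0]
    while smaller and larger:
        small, large = smaller.pop(), larger.pop()
        alias[small] = large
        prob[large] -= 1.0 - prob[small]
        (smaller if prob[large] < 1.0 else larger).append(large)
    return prob, alias

def alias_draw_many(prob, alias, n, rng):
    """Draw n samples at once from prebuilt tables."""
    cols = rng.integers(0, len(prob), size=n)   # one uniform column per sample
    keep = rng.random(n) < prob[cols]           # keep the column's own outcome?
    return np.where(keep, cols, alias[cols])    # otherwise return its alias

rng = np.random.default_rng(0)
probs = [0.2, 0.3, 0.1, 0.2, 0.2]

prob, alias = build_alias_table(probs)          # setup happens only once
samples = alias_draw_many(prob, alias, 100_000, rng)

# Empirical frequencies should be close to probs.
freq = np.bincount(samples, minlength=len(probs)) / samples.size
print("Empirical frequencies:", freq)
```

When only one value is needed per distribution, as in the SEM case described above, the setup cost dominates and a direct call such as `np.random.choice` is likely just as good.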


【Reference】

1、Alias Method離散分布隨機取樣 (Alias Method: random sampling from discrete distributions)

2、The Alias Method: Efficient Sampling with Many Discrete Outcomes
