Randomness in PyTorch: np.random.seed(), np.random.RandomState(), cudnn.benchmark, and cudnn.deterministic

Reposted 2021-04-01 15:36. Tags: PyTorch, Python

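I.

When processing data in Python, you will often run into two NumPy APIs:

np.random.seed() and np.random.RandomState()

How these two functions behave is not obvious at first. After reading a number of articles online and comparing the two, I am recording my notes here.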

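1. np.random.seed()

Passing a number to seed() is like setting up a "treasure bowl" filled with random numbers: each number identifies one bowl.

If you pass the same number to seed(), you get the same bowl, so the random numbers you take out are the same every time.

If you do not set a seed, different random numbers are generated on every run. Sometimes, however, the seed is clearly set and unchanged, and the generated random arrays still differ.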

import numpy as np

np.random.seed(0)
a = np.random.rand(10)
b = np.random.rand(10)
print(a)
print("\n")
print(b)

# Output
[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548  0.64589411
 0.43758721 0.891773   0.96366276 0.38344152]


[0.79172504 0.52889492 0.56804456 0.92559664 0.07103606 0.0871293
 0.0202184  0.83261985 0.77815675 0.87001215]

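The seed was set and never changed, yet the two outputs differ.

The reason is that the seed only fixes the generator's starting state. Every call advances that state, so the second np.random.rand(10) returns the next 10 numbers of the same sequence instead of repeating the first 10. The sequence as a whole is still fully determined by the seed.

So how do we make the two random arrays identical?

Call np.random.seed(0) again before the second draw.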
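Each call to np.random.rand() continues from where the previous call left off in the seeded sequence. A minimal sketch of that behaviour, comparing two consecutive draws of 10 with one draw of 20 from the same seed (the variable names are only illustrative):

```python
import numpy as np

np.random.seed(0)
a = np.random.rand(10)   # first 10 numbers of the seed-0 sequence
b = np.random.rand(10)   # next 10 numbers; the generator state has advanced

np.random.seed(0)
c = np.random.rand(20)   # the same 20 numbers, drawn in a single call

# a followed by b is exactly the 20-number sequence
print(np.array_equal(np.concatenate([a, b]), c))
```
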

np.random.seed(0)
a = np.random.rand(4,3)

np.random.seed(0)
b =  np.random.rand(4,3)

print(a)
print("\n")
print(b)


# Output
[[0.5488135  0.71518937 0.60276338]
 [0.54488318 0.4236548  0.64589411]
 [0.43758721 0.891773   0.96366276]
 [0.38344152 0.79172504 0.52889492]]


[[0.5488135  0.71518937 0.60276338]
 [0.54488318 0.4236548  0.64589411]
 [0.43758721 0.891773   0.96366276]
 [0.38344152 0.79172504 0.52889492]]

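Two custom functions illustrate further how np.random.seed() behaves. The first reseeds inside the loop, the second seeds once before it.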

def rng():
    for i in range(5):
        np.random.seed(123)
        print(np.random.rand(4))

rng()


# Output
[0.69646919 0.28613933 0.22685145 0.55131477]
[0.69646919 0.28613933 0.22685145 0.55131477]
[0.69646919 0.28613933 0.22685145 0.55131477]
[0.69646919 0.28613933 0.22685145 0.55131477]
[0.69646919 0.28613933 0.22685145 0.55131477]



def rng2():
    np.random.seed(123)
    for i in range(5):
        print(np.random.rand(4))
        
rng2()


# Output
[0.69646919 0.28613933 0.22685145 0.55131477]
[0.71946897 0.42310646 0.9807642  0.68482974]
[0.4809319  0.39211752 0.34317802 0.72904971]
[0.43857224 0.0596779  0.39804426 0.73799541]
[0.18249173 0.17545176 0.53155137 0.53182759]

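2. np.random.RandomState()

numpy.random.RandomState() creates a pseudo-random number generator object that carries its own internal state. Given the same seed, it always produces the same sequence, whichever distribution you sample from (uniform with rand(), normal with randn(), and so on).

Pseudo-random numbers are a sequence computed by a deterministic algorithm that appears to come from a uniform distribution on [0,1]. They are not truly random, but they share statistical properties of random numbers, such as uniformity and independence. (Source: Baidu)

Different seeds, however, produce different sequences. This matters whenever you need to control how random data is generated.

np.random.RandomState() is used in almost the same way as numpy.random.seed(). The difference is that the state lives in the returned object rather than in NumPy's global generator.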

rng = np.random.RandomState(0)
a = rng.rand(4)

rng = np.random.RandomState(0)
b = rng.rand(4)

print(a)
print("\n")
print(b)

# Output
[0.5488135  0.71518937 0.60276338 0.54488318]


[0.5488135  0.71518937 0.60276338 0.54488318]

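This produces identical random arrays, just as numpy.random.seed() does. Because the state belongs to the rng object, you have to draw through rng; otherwise you will not get the same arrays.

Creating another numpy.random.RandomState() does not change what np.random.rand() returns, because np.random.rand() draws from NumPy's default global generator. As with the global seed, consecutive draws from the same rng advance its state and therefore differ: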

rng = np.random.RandomState(0)
a = rng.randn(4)
b = rng.randn(4)

print(a)
print(b)

# Output
[1.76405235 0.40015721 0.97873798 2.2408932 ]
[ 1.86755799 -0.97727788  0.95008842 -0.15135721]
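The separation between a RandomState object and the global generator can be checked directly. The following is a small sketch, not part of the original article: it draws from a private generator between seeding and using the global one, and the global result is unaffected. The seed 123 and the variable names are arbitrary choices.

```python
import numpy as np

np.random.seed(0)
rng = np.random.RandomState(123)   # private generator with its own state

_ = rng.rand(5)                    # drawing from rng ...
x = np.random.rand(3)              # ... does not advance the global generator

np.random.seed(0)
y = np.random.rand(3)              # reference: first 3 numbers for global seed 0

print(np.array_equal(x, y))
```

This is why passing an rng object around explicitly is convenient when several parts of a program need independent, reproducible streams.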

Again, two custom functions illustrate further how np.random.RandomState() behaves.

def rng1():
    for i in range(4):
        rng = np.random.RandomState(0)
        print("i = ",i)
        print(rng.rand(3,2))
rng1()


# Output
i =  0
[[0.5488135  0.71518937]
 [0.60276338 0.54488318]
 [0.4236548  0.64589411]]
i =  1
[[0.5488135  0.71518937]
 [0.60276338 0.54488318]
 [0.4236548  0.64589411]]
i =  2
[[0.5488135  0.71518937]
 [0.60276338 0.54488318]
 [0.4236548  0.64589411]]
i =  3
[[0.5488135  0.71518937]
 [0.60276338 0.54488318]
 [0.4236548  0.64589411]]




def rng3():
    rng =np.random.RandomState(0)
    for i in range(4):
        print("i = ",i)
        print(rng.rand(3,2))
rng3()

# Output
i =  0
[[0.5488135  0.71518937]
 [0.60276338 0.54488318]
 [0.4236548  0.64589411]]
i =  1
[[0.43758721 0.891773  ]
 [0.96366276 0.38344152]
 [0.79172504 0.52889492]]
i =  2
[[0.56804456 0.92559664]
 [0.07103606 0.0871293 ]
 [0.0202184  0.83261985]]
i =  3
[[0.77815675 0.87001215]
 [0.97861834 0.79915856]
 [0.46147936 0.78052918]]

Edited 2020-11-06

Original link: https://zhuanlan.zhihu.com/p/66507920

Problem

You will often see this line in PyTorch code:

torch.backends.cudnn.benchmark = True


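Everyone says it makes the program run faster. Does it really, and under what circumstances should you set it?

Solution

In most cases, setting this flag lets cuDNN's built-in auto-tuner search for the most efficient algorithm for the current configuration, which improves runtime performance.

As a rule of thumb:

If the dimensions and type of the network's input data change little, setting torch.backends.cudnn.benchmark = True can improve runtime performance;
If the network's input data changes on every iteration, cuDNN has to search for the best configuration again each time, which lowers runtime performance instead.

That makes things much clearer.

Reposted from: https://www.pytorchtutorial.com/when-should-we-set-cudnn-benchmark-to-true/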
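The rule of thumb can be written down as a tiny helper. This is only a sketch: set_cudnn_benchmark and its fixed_input_shapes parameter are hypothetical names, and the flag only has an effect when the model actually runs on a CUDA device with cuDNN.

```python
import torch

def set_cudnn_benchmark(fixed_input_shapes: bool) -> bool:
    """Enable the cuDNN auto-tuner only when input shapes stay constant.

    Hypothetical helper for illustration; returns the resulting flag value.
    """
    torch.backends.cudnn.benchmark = fixed_input_shapes
    return torch.backends.cudnn.benchmark

# e.g. training on images with a fixed resolution and batch size
print(set_cudnn_benchmark(True))

# e.g. inputs whose shape changes from iteration to iteration
print(set_cudnn_benchmark(False))
```
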

Why can the same network architecture give completely different results when the learning rate, number of iterations, and batch size are all identical? Fixing the random seed is very important. If you use a framework such as PyTorch, also check that the framework's own seed is fixed. And if you use CUDA, do not forget the CUDA random seed. You will also need torch.backends.cudnn.deterministic here.

What is torch.backends.cudnn.deterministic? As the name suggests, setting this flag to True makes cuDNN use only deterministic convolution algorithms. Combined with a fixed Torch random seed, this should make the network produce the same output for the same input on every run. The code looks roughly like this:

import torch

def init_seeds(seed=0):
    # Sets the seed for generating random numbers.
    torch.manual_seed(seed)
    # Sets the seed for generating random numbers for the current GPU.
    # It's safe to call this function if CUDA is not available;
    # in that case, it is silently ignored.
    torch.cuda.manual_seed(seed)
    # Sets the seed for generating random numbers on all GPUs.
    # Also safe to call if CUDA is not available.
    torch.cuda.manual_seed_all(seed)
    # Note: as written, the cuDNN flags are only changed when seed == 0.
    if seed == 0:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
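The function above seeds only PyTorch. If the training script also uses Python's random module or NumPy (as in the first part of this article), those generators need seeding too. The sketch below is one possible way to do that, under that assumption; seed_everything and sample are hypothetical helper names, and the check at the end only covers CPU random number generation, not every GPU operation.

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    """Hypothetical helper: seed Python, NumPy and PyTorch in one place."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)   # silently ignored without CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

def sample():
    """Draw one value from each of the three generators."""
    return random.random(), np.random.rand(2), torch.rand(2)

seed_everything(42)
first = sample()
seed_everything(42)
second = sample()

# All three generators repeat their output after reseeding
print(first[0] == second[0])
print(np.array_equal(first[1], second[1]))
print(torch.equal(first[2], second[2]))
```
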

Edited 2020-05-15

Original link: https://zhuanlan.zhihu.com/p/141063432
