NumPy 之 案例(隨機漫步)


import numpy as np

The numpy.random module supplements(補充) the built-in Python random with functions for efficiently generating whole arrays of sample values from many kinds of probalility distributions. For example, you can get a 4x4 array of samples from the standard normal distribution using normal:

samples = np.random.normal(size=(4,4))
samples
array([[-0.49777854,  1.01894039,  0.3542692 ,  1.0187122 ],
       [-0.07139068, -0.44245259, -2.05535526,  0.49974435],
       [ 0.80183078, -0.11299759,  1.22759314,  1.37571884],
       [ 0.32086762, -0.79930024, -0.31965109,  0.23004107]])

Python's built-in random module, by contrast(對比), only samples one value at a time. As you can see from this benchmark, numpy.random is well over an order of magnitude faster for generating very large samples:

from random import normalvariate

n = 100000

'python 運行時間:'
%timeit samples = [normalvariate(0,1) for _ in range(n)]
'python 運行時間:'
127 ms ± 7.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
'np.random 運行時間:'
%timeit np.random.normal(size=n)
'np.random 運行時間:'
4.2 ms ± 277 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%time 

# 我后來還是喜歡 %%time 這樣的計時方式

from random import normalvariate 

n = 1000000

'Python run time:'
samples = [normalvariate(0,1) for _ in range(n)]
Wall time: 1.08 s

We say that these are pseudorandom numbers(偽隨機數) because they are generated by an algorithim with deterministic behavior(確定行為的算法生成的) You can change NumPy's random number generation seed number generation seed using np.random.seed:

"cj還是不理解seed(), 是讓算法不改變嗎? 每次都是同一個?"

np.random.seed(1234)
"現在理解了, seed(n),叫做隨機種子, n表示在接下來的n次(抽樣)的結果是一樣的"
'cj還是不理解seed(), 是讓算法不改變嗎? 每次都是同一個?'
'現在理解了, seed(n),叫做隨機種子, n表示在接下來的n次(抽樣)的結果是一樣的'

The data generation functions in numpy.random use a global random seed. To avoid global state, you can use numpy.random.RandomState to create a random number generator islolated(單獨的) from others.

rng = np.random.RandomState(1234)

rng.randn(10)
array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])

See Table 4-8 for a partial list of functions available in numpy.random. I wil give some examples of leveraging(利用) these function's ablility to generate large arrays of samples all at onece in the next section.

  • seed Seed the random number generator (不明白這個函數還是)
  • permutation Return a random pemutation(排列) of a sequence, or return a permuted range
  • shuffle Randomly permute(轉換) a sequence in-place (隨機洗牌)

  • rand Draw samples from a uniform distribution U~[0, 1], rand(shape)(均勻分布, 每個值出現的可能性是一樣的)
  • uniform U~[0, 1], uniform(low=0, high=1, size)
  • randint Draw random integers from a given low-to-high range. (a,b) 邊界都是可以取到的

  • randn 標准正態分布\(N(\mu=0, \sigma=1)\), randn(shape)
  • normal 正態分布\(N(\mu, \sigma, size)\), normal(low=0, high=1, size)
  • binomial 二項分布
  • chisquare 卡方分布
  • beta
  • gamma
"permutaion(x), 產生0-x范圍內x個隨機自然數的一個排列"
np.random.permutation(6)

"shuffle(seq) 將一個序列隨機打亂, in-place 哦 "
arr1 = np.arange(6)
np.random.shuffle(arr1)
arr1
'permutaion(x), 產生0-x范圍內x個隨機自然數的一個排列'
array([1, 2, 5, 0, 3, 4])
'shuffle(seq) 將一個序列隨機打亂, in-place 哦 '
array([5, 2, 3, 0, 4, 1])
"rand(shape) 0-1的均勻分布哦"
"shape=(2x2x3)-> 2里的每個1是3,每個1里面是1 "
"[[], []]"
"[  [  [6],[6],[6] ],   [ [6],[6],[6]  ]  ]"

np.random.rand(2,1,1)

"uniform()"
np.random.uniform(3,4,5)
'rand(shape) 0-1的均勻分布哦'
'shape=(2x2x3)-> 2里的每個1是3,每個1里面是1 '
'[[], []]'
'[  [  [6],[6],[6] ],   [ [6],[6],[6]  ]  ]'
array([[[0.06152103]],

       [[0.8725525 ]]])
'uniform()'
array([3.35219682, 3.62783057, 3.51758469, 3.37434633, 3.64026243])
"randn(shape), 標准正態分布"
np.random.randn(1,2,3)

"滿足normal(loc=0, scale=1, size=None) 均值80, 標准差10"
np.random.normal(80, 20, 6)
'randn(shape), 標准正態分布'

array([[[0.49112636, 0.90638754, 0.05000051],
        [1.21431522, 0.67847748, 1.3797269 ]]])

'滿足normal(loc=0, scale=1, size=None) 均值80, 標准差10'

array([55.07130243, 56.34397557, 68.95608996, 31.40875572, 89.80741058,
       37.38567435])

"binorma(n, p, size), n次試驗, p次成功概率, 5個樣本量"
np.random.binomial(100, 0.4, 5)

"chisquare(10,2) 服從於自由度為10的卡方分布下的2個樣本"
np.random.chisquare(10, 2)
'binorma(n, p, size), n次試驗, p次成功概率, 樣本量'

array([47, 42, 41, 34, 44])

array([20.07301382, 14.54581473])

Example:Random Walks

The simulation of random walks(隨機漫步) provides an illustrative(實例) of utilizing(使用) array operations. Let's first consider a simple random walk starting at 0 with steps of 1 and -1 occuring(出現) with equal probalility. (1,-1 等概率出現)

Here is a pure Python way to implement a single random walk with 1000 steps using the built-in random module.

import random
import matplotlib.pyplot as plt


def random_walk(position=0, steps=1000, walk=[]):
    """
    position=0  # 初始位置
    walk = [] # 結果列表
    steps = 1000
    """
    for i in range(steps):
        state = 1 if random.randint(0,1) else -1
        position += state
        # 將每次位置存入結果列表中
        walk.append(position)
        
    return walk

# test
walk_result = random_walk()

"plot the first 100 values on one of these random walks:"
"默認折線圖"

plt.plot(walk_result[:100])
'plot the first 100 values on one of these random walks:'

'默認折線圖'

[<matplotlib.lines.Line2D at 0x19329c95208>]

png

You might make the observation that walk is simply the
cumulative(累積的) sum of the random steps and could be evaluate as an array expression, Thus, I use the np.random module to draw 1000 coin flips at once, set these to 1 and -1, and compute the cumulative sum:

nsteps = 1000

"用size代替for循環, 面向數組編程哦"
draws = np.random.randint(0, 2, size=nsteps)

"np.where 三元很厲害, 代替if-else"
steps = np.where(draws > 0, 1, -1)

'cumsum 累積求和'
walk = steps.cumsum()

plt.plot(walk)
'用size代替for循環, 面向數組編程哦'

'np.where 三元很厲害, 代替if-else'

'cumsum 累積求和'

[<matplotlib.lines.Line2D at 0x1932a032278>]

png

From this we can begin to extract(提取) statistics like the minmun and maxmun value along the walks trajectory(軌跡)

"walk的極值"
walk.min()
walk.max()

type(walk)
'walk的極值'

-17

15

numpy.ndarray

# cj test cumsum()
cj_arr = np.array([[1,2,3],[4,5,6]])
cj_arr

"cumsum()累積求和, 返回的是narray 可看到累積的過程哦"
np.cumsum(cj_arr, axis=0) # 往下, 顯示沒列的和

np.cumsum(cj_arr, axis=1) # 往右, 顯示每行的和
array([[1, 2, 3],
       [4, 5, 6]])

'cumsum()累積求和, 返回的是narray 可看到累積的過程哦'

array([[1, 2, 3],
       [5, 7, 9]], dtype=int32)

array([[ 1,  3,  6],
       [ 4,  9, 15]], dtype=int32)

A more complicated(更復雜的) statistic is the first crossing time, the step at which the random walk reaches a particular value. Here we might want to know how long it took the random walk to get at least 10 steps away from the orgin 0 in either direction. (距離原點為10, 需要多少次) np.abs(walk) >= 10 gives us a boolean array indicating(指明) where(是否) the walk has reached or exceeded 10, but we want the index of the first 10 or -10, Turn out(結果是), we can compute this using argmax, which returns the first index of maximum value in the boolean array(True is the maximum): -> argmax()返回數組中, 最大值的第一個索引, 配合maximum使用

"查詢累積數組中, 絕對距離大於10的,的第一個值的索引 "
(np.abs(walk) >= 10).argmax()
'查詢累積數組中, 絕對距離大於10的,的第一個值的索引 '

49

Note that(注意) using argmax here is not always efficient because it always makes a full scan of the array. -> 用argmax()來找最大值索引, 效率是不高的,因為需要遍歷整個數組. In this special case, once a True is observed we know it to be the maximum value.

# cj test

np.max([1,3,2,5,7])

"maximum返回一個數組, 逐個比較傳入數組的值,和第二個參數比較, quit其大,替換"
np.maximum([10,1,3,2,5,7],6)
7

'maximum返回一個數組, 逐個比較傳入數組的值,和第二個參數比較, quit其大,替換'

array([10,  6,  6,  6,  6,  7])

"待補充中..."
np.maximum(np.abs(walk), 10).argmax()
530

# cj_arr1 = np.array([10,6,6,6,6,7]).argmax()
# cj_arr1
0

Simulating Many Random Walk at Once

If your goal to simulate many random walks, say 5000 of them, you can generate all of the random walks with minor modiffcations(微小的修改) to the preceding code(之前的代碼). If passed a 2-tuple, the numpy.random functions will generate a two-dimensional array of draws, and we can compute the cumulative sum across the rows to compute all 5000 random walks in one shot:

# cj test
np.random.randint(0,2, size=(3,4))
array([[0, 0, 0, 1],
       [0, 1, 0, 1],
       [0, 0, 1, 1]])

"一次生成500次行走記錄, 每次走1000不的大數據集5000 x 1000"
nwalks = 5000
nsteps = 1000

# 0 or 1
%time draws = np.random.randint(0, 2, size=(nwalks, nsteps))

"5000000萬次, 這耗時也太短了,厲害了, 1秒=10^3ms"

" >0, 1, <0, -1 "
steps = np.where(draws > 0, 1, -1)

"軸1, 列方向, 右邊, 按照每行累積"
walks = steps.cumsum(axis=1)

walks

'一次生成500次行走記錄, 每次走1000不的大數據集5000 x 1000'

Wall time: 58 ms

'5000000萬次, 這耗時也太短了,厲害了'

' >0, 1, <0, -1 '

'軸1, 列方向, 右邊, 按照每行累積'

array([[ -1,   0,  -1, ...,  22,  21,  22],
       [ -1,  -2,  -3, ...,  52,  53,  54],
       [  1,   0,  -1, ..., -20, -19, -20],
       ...,
       [ -1,  -2,  -3, ..., -48, -47, -48],
       [  1,   2,   1, ...,  -8,  -7,  -8],
       [ -1,  -2,  -1, ..., -10, -11, -10]], dtype=int32)

Now, we can compute the maximun and minimum values obtained(獲得) over all of the walks: ->獲取最大值, 最小值

"整個數組的"

'最大值', walks.max()
'最小值', walks.min()
'整個數組的'

('最大值', 112)

('最小值', -109)

Out of these walks, let's compute the minimum crossing time to 30 or -30. This is slightly tricky(稍微有些棘手) because not all 5000 of them reach 30. We can check this using the any method:

hist30 = (np.abs(walks) >= 30).any(axis=1)  # 按照行

hist30

'統計 number that hit 30 or -30, 有多少行'
hist30.sum()
array([ True,  True,  True, ...,  True,  True,  True])

'統計 number that hit 30 or -30, 有多少行'

3441

We can use this boolean array to select out the rows of walks that actually cross the absolute 30 level an d call argmax across axis=1 to get the crossing times:

'cj待理解'
crossing_times = (np.abs(walks[hist30]) >= 30).argmax(1)
crossing_times.mean()
'cj待理解'

497.102877070619

Feel free to experiment(積極地嘗試) with other distributions for the steps other than equal-sized coin flips(硬幣試驗). You need only use a different random number generation function, like normal to generate normally distribute steps with some mean and standard deviation(標准差)

import sys

steps = np.random.normal(loc=0, scale=0.25, size=(5000, 1000))

"查看這個對象占多少內存"

"{}的{}占用{}字節".format(steps.shape, steps.dtype, sys.getsizeof(steps))
'查看這個對象占多少內存'

'(5000, 1000)的float64占用40000112字節'

Conclusion

While(當然) much of the rest of the book will focus on building data wrangling(數據整理) skills with pandas, we will continue to work in a similar array-based style. In Appendix A, we will dig deeper(深入挖掘) into NumPy features to help you further develop your array computing skills.


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM