As shown in the figure above, computing the integral of f(x) over the interval [a, b] amounts to finding the area of the red region enclosed by the curve and the x-axis. Below, the Monte Carlo method is used to compute the definite integral ∫(x² + 4x·sin(x))dx over the interval [2, 3].
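The estimator implemented in the code below is the standard "area = width × average height" form, with the sample points drawn uniformly from the integration interval:

$$\int_a^b f(x)\,dx \;\approx\; \hat{I}_N = (b-a)\,\frac{1}{N}\sum_{i=1}^{N} f(x_i), \qquad x_i \sim U(a,b).$$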
```python
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return x**2 + 4*x*np.sin(x)

def intf(x):
    # antiderivative of f(x), used to compute the exact value of the integral
    return x**3/3.0 + 4.0*np.sin(x) - 4.0*x*np.cos(x)

a = 2
b = 3

# use N draws
N = 10000

X = np.random.uniform(low=a, high=b, size=N)  # N values uniformly drawn from a to b
Y = f(X)                                      # evaluate f(x) at the sampled points

# Monte Carlo estimate of the integral: area = width * average height
Imc = (b - a) * np.sum(Y) / N

exactval = intf(b) - intf(a)

print("Monte Carlo estimation=", Imc, "Exact number=", exactval)

# How does the accuracy depend on the number of points (samples)?
# Let's try the same 1-D integral: the Monte Carlo method yields approximate
# answers whose accuracy depends on the number of draws.
Imc = np.zeros(1000)
Na = np.arange(0, 1000)

exactval = intf(b) - intf(a)

for N in np.arange(1, 1000):                  # start at 1 to avoid dividing by zero
    X = np.random.uniform(low=a, high=b, size=N)
    Y = f(X)
    Imc[N] = (b - a) * np.sum(Y) / N

plt.plot(Na[10:], np.sqrt((Imc[10:] - exactval)**2), alpha=0.7)
plt.plot(Na[10:], 1/np.sqrt(Na[10:]), 'r')    # reference curve: error ~ 1/sqrt(N)
plt.xlabel("N")
plt.ylabel("sqrt((Imc-ExactValue)$^2$)")
plt.show()
```
>>>
Monte Carlo estimation= 11.8181144118 Exact number= 11.8113589251
As the figure above shows, the computational error gradually decreases as the number of sample points increases. There are two ways to improve the accuracy of the simulation: one is to increase the number of trials N; the other is to reduce the variance σ². Increasing the number of trials inevitably increases the total computing time, so it is clearly not a good way to improve precision. Below we introduce importance sampling as a way to reduce the variance and improve the accuracy of the integral estimate.
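This behavior follows from the standard Monte Carlo error estimate: for N uniform draws, the statistical error of the estimate $\hat{I}_N$ scales as

$$\sigma_{\hat{I}_N} \approx (b-a)\,\frac{\sigma_f}{\sqrt{N}},$$

where $\sigma_f$ is the standard deviation of $f(x)$ over the sampled points; this is the $1/\sqrt{N}$ reference curve drawn in red in the plot above. Halving the error therefore requires four times as many samples, while reducing the variance $\sigma_f^2$ improves accuracy at the same sample count.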
The idea behind importance sampling is that, instead of sampling from the original probability distribution of the process, we sample from a modified distribution under which the events that matter most for the result occur more often. This raises the sampling efficiency and reduces the computation wasted on events that contribute little to the result. For example, when integrating g(x) over the interval [a, b] with uniform sampling, roughly as many sample points fall where g(x) is small as where g(x) is large, which is clearly inefficient. If instead we sample from a probability density f(x) whose shape is close to that of g(x), samples that contribute strongly to the integral become more likely than samples that contribute little, and the integral can be rewritten as

$$I = \int_a^b g(x)\,dx = \int_a^b \frac{g(x)}{f(x)}\,f(x)\,dx.$$
Here x is a random variable sampled according to the probability density f(x); clearly, on the interval [a, b] we must have

$$\int_a^b f(x)\,dx = 1.$$
Therefore the integral I can be viewed as the expectation of the random variable Y = g(x)/f(x), and it can be estimated as

$$I = E\!\left[\frac{g(x)}{f(x)}\right] \approx \frac{1}{N}\sum_{i=1}^{N}\frac{g(x_i)}{f(x_i)},$$

where the $x_i$ are sample points drawn from the probability density f(x).
The following example uses a normal density f(x) to approximate g(x) = x·sin(x) and draws the sample points from that normal distribution to compute the integral ∫g(x)dx over the interval [0, π].
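For reference, the exact value follows from the antiderivative used in the code below (infun = sin(x) − x·cos(x)):

$$\int_0^{\pi} x\sin x\,dx = \bigl[\sin x - x\cos x\bigr]_0^{\pi} = \pi.$$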
```python
# -*- coding: utf-8 -*-
# Example: calculate ∫sin(x)·x dx on [0, pi]

# The integrand has a shape similar to a Gaussian, and therefore
# we choose a Gaussian as the importance sampling distribution.

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

mu = 2
sig = .7

f = lambda x: np.sin(x)*x
infun = lambda x: np.sin(x) - x*np.cos(x)      # antiderivative of f(x)
p = lambda x: (1/np.sqrt(2*np.pi*sig**2))*np.exp(-(x-mu)**2/(2.0*sig**2))
normfun = lambda x: norm.cdf(x-mu, scale=sig)  # CDF of N(mu, sig^2)

plt.figure(figsize=(18,8))  # set the figure size

# range of integration
xmax = np.pi
xmin = 0

# number of draws
N = 1000

# plot the integrand and the importance sampling density
x = np.linspace(xmin, xmax, 1000)
plt.subplot(1,2,1)
plt.plot(x, f(x), 'b', label=r'Original $x\sin(x)$')
plt.plot(x, p(x), 'r', label='Importance Sampling Function: Normal')
plt.xlabel('x')
plt.legend()

# =============================================
# EXACT SOLUTION
# =============================================
Iexact = infun(xmax) - infun(xmin)
print(Iexact)

# ============================================
# VANILLA MONTE CARLO
# ============================================
Ivmc = np.zeros(1000)
for k in np.arange(0, 1000):
    x = np.random.uniform(low=xmin, high=xmax, size=N)
    Ivmc[k] = (xmax-xmin)*np.mean(f(x))

# ============================================
# IMPORTANCE SAMPLING
# ============================================
# Choose a Gaussian similar in shape to the original function, so that
# more points are drawn around the peak and fewer where the integrand is small.
Iis = np.zeros(1000)
for k in np.arange(0, 1000):
    # draw from the Gaussian: xis ~ N(mu, sig^2)
    xis = mu + sig*np.random.randn(N, 1)
    xis = xis[(xis < xmax) & (xis > xmin)]

    # note: the sampling density restricted to [0, pi] must integrate to 1,
    # so multiply by the normalization constant, i.e. the integral of p(x) over [0, pi]
    normal = normfun(np.pi) - normfun(0)
    Iis[k] = np.mean(f(xis)/p(xis))*normal

plt.subplot(1,2,2)
plt.hist(Iis, 30, histtype='step', label='Importance Sampling')
plt.hist(Ivmc, 30, color='r', histtype='step', label='Vanilla MC')
plt.vlines(np.pi, 0, 100, color='g', linestyle='dashed')  # exact value
plt.legend()
plt.show()
```
As the left-hand plot shows, the curve sin(x)·x has a shape close to that of the normal density, so more sample points land near the peak of the curve than at its low parts. The exact value of the integral is π. In the right-hand plot, both methods compute the integral 1000 times; most of the results cluster near the exact value π ≈ 3.1416, and the counts fall off further away, as expected. However, the estimates from the vanilla method (red histogram) clearly have a larger variance than those from importance sampling (blue histogram). Importance sampling therefore reduces the variance and improves the accuracy. Note also that the choice of f(x) affects the accuracy of the result: if the chosen f(x) differs substantially from g(x), the variance of the estimate grows.
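A minimal sketch of how this variance difference can be quantified, reusing the same f, p, mu, sig and interval as above: repeat both estimators many times and compare the empirical standard deviations of the resulting estimates.

```python
# Minimal sketch (same setup as above): repeat both estimators many times and
# compare the spread of the resulting estimates of ∫ x·sin(x) dx on [0, pi].
import numpy as np
from scipy.stats import norm

f = lambda x: np.sin(x) * x
mu, sig = 2.0, 0.7
p = lambda x: np.exp(-(x - mu)**2 / (2.0 * sig**2)) / np.sqrt(2 * np.pi * sig**2)
xmin, xmax = 0.0, np.pi
N, repeats = 1000, 1000

# probability mass of N(mu, sig^2) inside [0, pi], used to renormalize
Z = norm.cdf(xmax, loc=mu, scale=sig) - norm.cdf(xmin, loc=mu, scale=sig)

vmc = np.empty(repeats)   # vanilla Monte Carlo estimates
imp = np.empty(repeats)   # importance sampling estimates
for k in range(repeats):
    x = np.random.uniform(xmin, xmax, N)
    vmc[k] = (xmax - xmin) * np.mean(f(x))

    xs = np.random.normal(mu, sig, N)
    xs = xs[(xs > xmin) & (xs < xmax)]   # keep samples inside [0, pi]
    imp[k] = np.mean(f(xs) / p(xs)) * Z

print("exact value        :", np.pi)
print("vanilla MC mean/std:", vmc.mean(), vmc.std())
print("importance mean/std:", imp.mean(), imp.std())
```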
Reference:
http://iacs-courses.seas.harvard.edu/courses/am207/blog/lecture-3.html