This approach approximates the solution $u(t,x)$ of a nonlinear PDE by a neural network $u_\theta(t,x)$, i.e., $u_\theta(t,x) \approx u(t,x)$, where $u_\theta \colon [0,T] \times \mathcal{D} \to \mathbb{R}$ denotes a function realized by a neural network with parameters $\theta$.
The continuous-time approach for parabolic PDEs described in (Raissi et al., 2017 (Part I)) is based on the neural network approximation $u_\theta \colon [0,T] \times \mathcal{D} \to \mathbb{R}$ of the solution $u$ and the residual of this approximation,
$$ \begin{align} r_\theta (t,x) := \partial_t u_\theta (t,x) + \mathcal{N}[u_\theta] (t,x). \end{align} $$
To incorporate the PDE residual $r_\theta$ into a loss function to be minimized, PINNs require a further differentiation to evaluate the differential operators $\partial_t u_\theta$ and $\mathcal{N}[u_\theta]$. The PINN term $r_\theta$ therefore shares the same parameters as the original network $u_\theta(t,x)$ and respects the physics implicit in the nonlinear PDE. These derivatives can be computed by automatic differentiation, as provided by machine learning libraries such as TensorFlow or PyTorch.
The PINN approach for solving the initial-boundary value problem now proceeds by minimizing the loss functional
$$ \begin{align} \phi_\theta(X) := \phi_\theta^r(X^r) + \phi_\theta^0(X^0) + \phi_\theta^b(X^b), \end{align} $$
where $X$ denotes the collected training data and the loss function $\phi_\theta$ contains the following terms:
- the mean squared residual $$ \begin{align*} \phi_\theta^r(X^r) := \frac{1}{N_r}\sum_{i=1}^{N_r} \left|r_\theta\left(t_i^r, x_i^r\right)\right|^2 \end{align*} $$ evaluated at a set of collocation points $X^r:=\{(t_i^r, x_i^r)\}_{i=1}^{N_r} \subset (0,T] \times \mathcal{D}$, where $r_\theta$ is the physics-informed neural network,
- the mean squared misfit of the initial and boundary conditions $$ \begin{align*} \phi_\theta^0(X^0) := \frac{1}{N_0} \sum_{i=1}^{N_0} \left|u_\theta\left(t_i^0, x_i^0\right) - u_0\left(x_i^0\right)\right|^2 \quad \text{ and } \quad \phi_\theta^b(X^b) := \frac{1}{N_b} \sum_{i=1}^{N_b} \left|u_\theta\left(t_i^b, x_i^b\right) - u_b\left(t_i^b, x_i^b\right)\right|^2 \end{align*} $$ evaluated at sample points $X^0:=\{(t^0_i,x^0_i)\}_{i=1}^{N_0} \subset \{0\} \times \mathcal{D}$ and $X^b:=\{(t^b_i,x^b_i)\}_{i=1}^{N_b} \subset (0,T] \times \partial \mathcal{D}$, where $u_\theta$ is the neural network approximation of the solution $u\colon[0,T] \times \mathcal{D} \to \mathbb{R}$.
Note that the training data $X$ consists entirely of time-space coordinates.
Example: Burgers equation
To illustrate the PINN approach, we consider the one-dimensional Burgers equation on the spatial domain $\mathcal{D} = [-1,1]$,
$$ \begin{aligned} \partial_t u + u \, \partial_x u - (0.01/\pi) \, \partial_{xx} u &= 0, \quad &&\quad (t,x) \in (0,1] \times (-1,1),\\ u(0,x) &= - \sin(\pi \, x), \quad &&\quad x \in [-1,1],\\ u(t,-1) = u(t,1) &= 0, \quad &&\quad t \in (0,1]. \end{aligned} $$
This equation arises, for example, in models of traffic flow, fluid mechanics and gas dynamics, and can be derived from the Navier-Stokes equations; see (Basdevant et al., 1986). In the notation introduced above, the nonlinear operator is $\mathcal{N}[u] = u \, \partial_x u - (0.01/\pi) \, \partial_{xx} u$.
1. Import packages and set problem-specific data
The example code was run with TensorFlow version 2.3.0. The implementation relies mainly on the scientific computing library NumPy and the machine learning library TensorFlow.
All computations were carried out on an i7 CPU (10th Gen) with 16 GByte DDR3 RAM (2.6 GHz) and took a few minutes.
```python
# Import TensorFlow and NumPy
import tensorflow as tf
import numpy as np

# Set data type
DTYPE='float32'
tf.keras.backend.set_floatx(DTYPE)

# Set constants
pi = tf.constant(np.pi, dtype=DTYPE)
viscosity = .01/pi

# Define initial condition
def fun_u_0(x):
    return -tf.sin(pi * x)

# Define boundary condition
def fun_u_b(t, x):
    n = x.shape[0]
    return tf.zeros((n,1), dtype=DTYPE)

# Define residual of the PDE
def fun_r(t, x, u, u_t, u_x, u_xx):
    return u_t + u * u_x - viscosity * u_xx
```
2. Generate a set of collocation points
We assume that the collocation points $X_r$ as well as the points for the initial time and boundary data, $X_0$ and $X_b$, are drawn from a uniform distribution. Although uniformly distributed data already yield satisfactory results, the authors of (Raissi et al., 2017 (Part I)) employed a space-filling Latin hypercube sampling strategy (Stein, 1987). Our numerical experiments indicate that this strategy does improve convergence somewhat, but to keep the code simple we stick with uniform sampling here (a sketch of the Latin hypercube alternative is given after the sampling code below). The amounts of training data are $N_0 = N_b = 50$ and $N_r = 10000$.
```python
# Set number of data points
N_0 = 50
N_b = 50
N_r = 10000

# Set boundary
tmin = 0.
tmax = 1.
xmin = -1.
xmax = 1.

# Lower bounds
lb = tf.constant([tmin, xmin], dtype=DTYPE)
# Upper bounds
ub = tf.constant([tmax, xmax], dtype=DTYPE)

# Set random seed for reproducible results
tf.random.set_seed(0)

# Draw uniform sample points for initial boundary data
t_0 = tf.ones((N_0,1), dtype=DTYPE)*lb[0]
x_0 = tf.random.uniform((N_0,1), lb[1], ub[1], dtype=DTYPE)
X_0 = tf.concat([t_0, x_0], axis=1)

# Evaluate initial condition at x_0
u_0 = fun_u_0(x_0)

# Boundary data
t_b = tf.random.uniform((N_b,1), lb[0], ub[0], dtype=DTYPE)
x_b = lb[1] + (ub[1] - lb[1]) * tf.keras.backend.random_bernoulli((N_b,1), 0.5, dtype=DTYPE)
X_b = tf.concat([t_b, x_b], axis=1)

# Evaluate boundary condition at (t_b, x_b)
u_b = fun_u_b(t_b, x_b)

# Draw uniformly sampled collocation points
t_r = tf.random.uniform((N_r,1), lb[0], ub[0], dtype=DTYPE)
x_r = tf.random.uniform((N_r,1), lb[1], ub[1], dtype=DTYPE)
X_r = tf.concat([t_r, x_r], axis=1)

# Collect boundary and initial data in lists
X_data = [X_0, X_b]
u_data = [u_0, u_b]
```
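As mentioned above, the collocation points could instead be drawn with a space-filling Latin hypercube design. The snippet below is only a minimal sketch of this alternative, not part of the original setup; it assumes SciPy (version 1.7 or newer, for the `scipy.stats.qmc` module) is available, and the name `X_r_lhs` is introduced here purely for illustration.

```python
# Optional alternative: Latin hypercube sampling of the collocation points
# (illustrative sketch only; assumes SciPy >= 1.7 is installed)
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=2, seed=0)
sample = sampler.random(n=N_r)                        # points in [0, 1)^2
X_r_lhs = qmc.scale(sample, [tmin, xmin], [tmax, xmax])
X_r_lhs = tf.constant(X_r_lhs, dtype=DTYPE)           # columns correspond to (t, x)
```

Replacing `X_r` by `X_r_lhs` would leave the rest of the code unchanged.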
Next we can plot all sample points, including the initial and boundary data: the collocation points are shown in red, while the initial and boundary points appear in other colors, shaded by their prescribed values.
```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9,6))
plt.scatter(t_0, x_0, c=u_0, marker='X', vmin=-1, vmax=1)
plt.scatter(t_b, x_b, c=u_b, marker='X', vmin=-1, vmax=1)
plt.scatter(t_r, x_r, c='r', marker='.', alpha=0.1)
plt.xlabel('$t$')
plt.ylabel('$x$')

plt.title('Positions of collocation points and boundary data');
#plt.savefig('Xdata_Burgers.jpg', bbox_inches='tight', dpi=300)
```
3. Set up the network architecture
In this example, following (Raissi et al., 2017 (Part I)), we set up a feedforward neural network with the following forward-pass structure:
- scale the input to the interval $[-1,1]$,
- 8 fully connected layers with 20 neurons each, every layer followed by a hyperbolic tangent activation function,
- one fully connected output layer.
This setup results in a network with $3021$ trainable parameters (first hidden layer: $2\cdot 20+20=60$; seven intermediate layers: $20\cdot 20+20=420$ each; output layer: $20 \cdot 1 + 1 = 21$); a quick programmatic check follows the model definition below.
```python
def init_model(num_hidden_layers=8, num_neurons_per_layer=20):
    # Initialize a feedforward neural network
    model = tf.keras.Sequential()

    # Input is two-dimensional (time + one spatial dimension)
    model.add(tf.keras.Input(2))

    # Introduce a scaling layer to map the input from [lb, ub] to [-1, 1]
    scaling_layer = tf.keras.layers.Lambda(
        lambda x: 2.0*(x - lb)/(ub - lb) - 1.0)
    model.add(scaling_layer)

    # Append hidden layers
    for _ in range(num_hidden_layers):
        model.add(tf.keras.layers.Dense(num_neurons_per_layer,
            activation=tf.keras.activations.get('tanh'),
            kernel_initializer='glorot_normal'))

    # Output is one-dimensional
    model.add(tf.keras.layers.Dense(1))

    return model
```
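As a quick sanity check of the parameter count stated above, one can build the model and query Keras directly; this snippet is merely illustrative and not part of the original tutorial.

```python
# Verify the number of trainable parameters (expected: 3021)
print(init_model().count_params())
```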
4. Define the loss function and its gradient
The following code defines a function that computes the residual of the PDE at the collocation points $X_r = \{(t^r_i,x^r_i)\}_{i=1}^{N_r}$:
$$ \begin{align} r_\theta (t,x) := \partial_t u_\theta (t,x) + \mathcal{N}[u_\theta] (t,x). \end{align} $$
The required derivatives are computed with TensorFlow's automatic differentiation toolbox. For the Burgers equation, this amounts to computing $\partial_t u_\theta$, $\partial_x u_\theta$ and $\partial_{xx} u_\theta$. In TensorFlow, this is realized via a GradientTape (see the documentation), which keeps track of the watched variables, in our case t and x, in order to compute the derivatives.
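Before applying this to the PINN residual, the following toy example illustrates how a persistent GradientTape yields second derivatives; the function $y = \sin(x)$ and the variable names used here are purely illustrative and not part of the Burgers setup.

```python
# Toy example: second derivative of sin(x) with a persistent GradientTape
x_demo = tf.constant([[0.5], [1.0]], dtype=DTYPE)

with tf.GradientTape(persistent=True) as tape_demo:
    tape_demo.watch(x_demo)
    y = tf.sin(x_demo)
    # The first derivative is computed inside the tape context
    # so that it is itself recorded for the second derivative
    dy_dx = tape_demo.gradient(y, x_demo)       # cos(x_demo)

d2y_dx2 = tape_demo.gradient(dy_dx, x_demo)     # -sin(x_demo)
del tape_demo
```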
```python
def get_r(model, X_r):
    # A tf.GradientTape is used to compute derivatives in TensorFlow
    with tf.GradientTape(persistent=True) as tape:
        # Split t and x to compute partial derivatives
        t, x = X_r[:, 0:1], X_r[:,1:2]

        # Variables t and x are watched during tape
        # to compute derivatives u_t and u_x
        tape.watch(t)
        tape.watch(x)

        # Determine residual
        u = model(tf.stack([t[:,0], x[:,0]], axis=1))

        # Compute gradient u_x within the GradientTape
        # since we need second derivatives
        u_x = tape.gradient(u, x)

    u_t = tape.gradient(u, t)
    u_xx = tape.gradient(u_x, x)

    del tape

    return fun_r(t, x, u, u_t, u_x, u_xx)
```
The next function computes the loss functional of our problem,
$$ \begin{align} \phi_\theta(X) := \phi_\theta^r(X^r) + \phi_\theta^0(X^0) + \phi_\theta^b(X^b), \end{align} $$
as a function of the training data. The collocation points are passed in X_r, while the initial and boundary data are stored in X_data = [X_0, X_b] and u_data = [u_0, u_b].
```python
def compute_loss(model, X_r, X_data, u_data):
    # Compute phi^r
    r = get_r(model, X_r)
    phi_r = tf.reduce_mean(tf.square(r))

    # Initialize loss
    loss = phi_r

    # Add phi^0 and phi^b to the loss
    for i in range(len(X_data)):
        u_pred = model(X_data[i])
        loss += tf.reduce_mean(tf.square(u_data[i] - u_pred))

    return loss
```
The next function computes the gradient of the loss function $\phi_\theta$ with respect to the unknown parameters, the trainable_variables in TensorFlow terminology, i.e. $\nabla_\theta \phi_\theta$. This is again realized via a GradientTape, which now keeps track of the model parameters $\theta$, accessible through model.trainable_variables.
```python
def get_grad(model, X_r, X_data, u_data):
    with tf.GradientTape(persistent=True) as tape:
        # This tape is for derivatives with
        # respect to trainable variables
        tape.watch(model.trainable_variables)
        loss = compute_loss(model, X_r, X_data, u_data)

    g = tape.gradient(loss, model.trainable_variables)
    del tape

    return loss, g
```
5. Set up the optimizer and train the model
We initialize the model and set the learning rate to the piecewise-constant decay $$ \delta(n) = 0.01 \, \textbf{1}_{\{n < 1000\}} + 0.001 \, \textbf{1}_{\{1000 \le n < 3000\}} + 0.0005 \, \textbf{1}_{\{3000 \le n\}} $$ Then we set up a tf.keras.optimizer to train the model.
```python
# Initialize model aka u_\theta
model = init_model()

# We choose a piecewise decay of the learning rate, i.e., the
# step size in the gradient descent type algorithm
# the first 1000 steps use a learning rate of 0.01
# from 1000 - 3000: learning rate = 0.001
# from 3000 onwards: learning rate = 0.0005
lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay([1000,3000],[1e-2,1e-3,5e-4])

# Choose the optimizer
optim = tf.keras.optimizers.Adam(learning_rate=lr)
```
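Since PiecewiseConstantDecay schedules are callable, one can check, purely as an illustration, that the schedule matches the decay $\delta(n)$ defined above:

```python
# Inspect the learning rate at a few representative steps (illustrative check only)
for step in [0, 2000, 4000]:
    print(step, lr(step).numpy())   # expected: 0.01, 0.001, 0.0005
```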
We train the model for $N = 5000$ epochs (this takes roughly 3 minutes). Each iteration is wrapped in the function train_step(). Note that @tf.function is a Python Decorator: it redefines the function that follows as a TensorFlow graph, which may speed up training significantly. In this example, train_step is compiled as such a graph, and training becomes noticeably faster.
```python
from time import time

# Define one training step as a TensorFlow function to increase speed of training
@tf.function
def train_step():
    # Compute current loss and gradient w.r.t. parameters
    loss, grad_theta = get_grad(model, X_r, X_data, u_data)

    # Perform gradient descent step
    optim.apply_gradients(zip(grad_theta, model.trainable_variables))

    return loss

# Number of training epochs
N = 5000
hist = []

# Start timer
t0 = time()

for i in range(N+1):
    loss = train_step()

    # Append current loss to hist
    hist.append(loss.numpy())

    # Output current loss every 50 iterations
    if i%50 == 0:
        print('It {:05d}: loss = {:10.8e}'.format(i,loss))

# Print computation time
print('\nComputation time: {} seconds'.format(time()-t0))
```
Plot the approximate solution obtained by the trained network.
```python
from mpl_toolkits.mplot3d import Axes3D

# Set up meshgrid
N = 600
tspace = np.linspace(lb[0], ub[0], N + 1)
xspace = np.linspace(lb[1], ub[1], N + 1)
T, X = np.meshgrid(tspace, xspace)
Xgrid = np.vstack([T.flatten(),X.flatten()]).T

# Determine predictions of u(t, x)
upred = model(tf.cast(Xgrid,DTYPE))

# Reshape upred
U = upred.numpy().reshape(N+1,N+1)

# Surface plot of solution u(t,x)
fig = plt.figure(figsize=(9,6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(T, X, U, cmap='viridis');
ax.view_init(35,35)
ax.set_xlabel('$t$')
ax.set_ylabel('$x$')
ax.set_zlabel('$u_\\theta(t,x)$')
ax.set_title('Solution of Burgers equation');
#plt.savefig('Burgers_Solution.jpg', bbox_inches='tight', dpi=300);
```
Plot the evolution of the loss over the training epochs.
```python
fig = plt.figure(figsize=(9,6))
ax = fig.add_subplot(111)
ax.semilogy(range(len(hist)), hist,'k-')
ax.set_xlabel('$n_{epoch}$')
ax.set_ylabel('$\\phi_{n_{epoch}}$');
#plt.savefig('evolution of loss.jpg', bbox_inches='tight', dpi=300)
```