This method approximates the solution $u(t,x)$ of a nonlinear partial differential equation by a neural network $u_\theta(t,x)$, i.e., $u_\theta(t,x) \approx u(t,x)$, where $u_\theta \colon [0,T] \times \mathcal{D} \to \mathbb{R}$ denotes a function realized by a neural network with parameters $\theta$.
The continuous time approach for parabolic PDEs described in (Raissi et al., 2017 (Part I)) is based on the residual of the neural network approximation $u_\theta \colon [0,T] \times \mathcal{D} \to \mathbb{R}$ of the solution $u$,
$$ \begin{align} r_\theta (t,x) := \partial_t u_\theta (t,x) + \mathcal{N}[u_\theta] (t,x). \end{align} $$
To incorporate the PDE residual $r_\theta$ into a loss function to be minimized, PINNs require a further differentiation to evaluate the differential operators $\partial_t u_\theta$ and $\mathcal{N}[u_\theta]$. Thus the residual $r_\theta$ shares the same parameters as the original network $u_\theta(t,x)$ and encodes the physics implied by the nonlinear PDE. These derivatives can be obtained by automatic differentiation, as provided by machine learning libraries such as TensorFlow or PyTorch.
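For illustration, the sketch below shows how such a derivative could be obtained with TensorFlow's automatic differentiation; the toy network used here is hypothetical and only serves as a demonstration (the actual residual computation for the Burgers equation is implemented in step 4 below).

# A toy network u(t, x), only to illustrate automatic differentiation
# (hypothetical example; the model for the Burgers equation is built in step 3)
import tensorflow as tf

toy_net = tf.keras.Sequential()
toy_net.add(tf.keras.Input(2))
toy_net.add(tf.keras.layers.Dense(20, activation='tanh'))
toy_net.add(tf.keras.layers.Dense(1))

t = tf.constant([[0.5]])
x = tf.constant([[0.3]])

with tf.GradientTape() as tape:
    tape.watch(t)                            # track t to differentiate w.r.t. it
    u = toy_net(tf.concat([t, x], axis=1))   # network output u_theta(t, x)

u_t = tape.gradient(u, t)                    # partial derivative of u_theta w.r.t. t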
The PINN approach for solving the initial-boundary value problem now proceeds by minimizing the loss functional
$$ \begin{align} \phi_\theta(X) := \phi_\theta^r(X^r) + \phi_\theta^0(X^0) + \phi_\theta^b(X^b), \end{align} $$
where $X$ denotes the collected training data and the loss function $\phi_\theta$ consists of the following parts:
- the mean squared residual $$ \begin{align*} \phi_\theta^r(X^r) := \frac{1}{N_r}\sum_{i=1}^{N_r} \left|r_\theta\left(t_i^r, x_i^r\right)\right|^2 \end{align*} $$ evaluated at a number of collocation points $X^r:=\{(t_i^r, x_i^r)\}_{i=1}^{N_r} \subset (0,T] \times \mathcal{D}$, where $r_\theta$ is the physics-informed neural network,
- the mean squared misfit of the initial and boundary conditions $$ \begin{align*} \phi_\theta^0(X^0) := \frac{1}{N_0} \sum_{i=1}^{N_0} \left|u_\theta\left(t_i^0, x_i^0\right) - u_0\left(x_i^0\right)\right|^2 \quad \text{ and } \quad \phi_\theta^b(X^b) := \frac{1}{N_b} \sum_{i=1}^{N_b} \left|u_\theta\left(t_i^b, x_i^b\right) - u_b\left(t_i^b, x_i^b\right)\right|^2 \end{align*} $$ evaluated at the points $X^0:=\{(t^0_i,x^0_i)\}_{i=1}^{N_0} \subset \{0\} \times \mathcal{D}$ and $X^b:=\{(t^b_i,x^b_i)\}_{i=1}^{N_b} \subset (0,T] \times \partial \mathcal{D}$, where $u_\theta$ denotes the neural network approximation of the solution $u\colon[0,T] \times \mathcal{D} \to \mathbb{R}$.
Note that the training data $X$ consists entirely of space-time coordinates.
Example: Burgers equation
To illustrate the PINN approach, we consider the one-dimensional Burgers equation on the spatial domain $\mathcal{D} = [-1,1]$,
$$ \begin{aligned} \partial_t u + u \, \partial_x u - (0.01/\pi) \, \partial_{xx} u &= 0, \quad &&\quad (t,x) \in (0,1] \times (-1,1),\\ u(0,x) &= - \sin(\pi \, x), \quad &&\quad x \in [-1,1],\\ u(t,-1) = u(t,1) &= 0, \quad &&\quad t \in (0,1]. \end{aligned} $$
This equation arises in various applications such as traffic flow, fluid mechanics and gas dynamics, and can also be derived from the Navier-Stokes equations; see (Basdevant et al., 1986).
1. Import dependencies and set up problem-specific data
The example code runs with TensorFlow 2.3.0. The implementation relies mainly on the scientific computing library NumPy and the machine learning library TensorFlow.
All computations were carried out on an Intel i7 CPU (10th Gen) with 16 GByte DDR3 RAM (2.6 GHz) and take roughly a few minutes.
# Import TensorFlow and NumPy
import tensorflow as tf
import numpy as np

# Set data type
DTYPE='float32'
tf.keras.backend.set_floatx(DTYPE)

# Set constants
pi = tf.constant(np.pi, dtype=DTYPE)
viscosity = .01/pi

# Define initial condition
def fun_u_0(x):
    return -tf.sin(pi * x)

# Define boundary condition
def fun_u_b(t, x):
    n = x.shape[0]
    return tf.zeros((n,1), dtype=DTYPE)

# Define residual of the PDE
def fun_r(t, x, u, u_t, u_x, u_xx):
    return u_t + u * u_x - viscosity * u_xx
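As an optional quick check (not part of the original workflow), the initial condition can be evaluated at a few test points:

# Optional sanity check: evaluate the initial condition at x = 0 and x = 0.5
x_test = tf.constant([[0.0], [0.5]], dtype=DTYPE)
print(fun_u_0(x_test))   # expected values: 0 and -1, since -sin(pi/2) = -1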
2. Sample the data points
We assume that the collocation points $X_r$ as well as the points for the initial and boundary data, $X_0$ and $X_b$, are generated by uniform sampling. Although uniformly sampled data already yield satisfactory results, the authors of (Raissi et al., 2017 (Part I)) employed a space-filling Latin hypercube sampling strategy (Stein, 1987). Our numerical experiments indicate that this strategy indeed improves convergence slightly, but to keep the code simple, this example uses uniform sampling (an optional Latin hypercube sketch is shown after the sampling code below). The numbers of training points are $N_0 = N_b = 50$ and $N_r = 10000$.
# Set number of data points
N_0 = 50
N_b = 50
N_r = 10000

# Set boundary
tmin = 0.
tmax = 1.
xmin = -1.
xmax = 1.

# Lower bounds
lb = tf.constant([tmin, xmin], dtype=DTYPE)
# Upper bounds
ub = tf.constant([tmax, xmax], dtype=DTYPE)

# Set random seed for reproducible results
tf.random.set_seed(0)

# Draw uniform sample points for initial boundary data
t_0 = tf.ones((N_0,1), dtype=DTYPE)*lb[0]
x_0 = tf.random.uniform((N_0,1), lb[1], ub[1], dtype=DTYPE)
X_0 = tf.concat([t_0, x_0], axis=1)

# Evaluate initial condition at x_0
u_0 = fun_u_0(x_0)

# Boundary data
t_b = tf.random.uniform((N_b,1), lb[0], ub[0], dtype=DTYPE)
x_b = lb[1] + (ub[1] - lb[1]) * tf.keras.backend.random_bernoulli((N_b,1), 0.5, dtype=DTYPE)
X_b = tf.concat([t_b, x_b], axis=1)

# Evaluate boundary condition at (t_b,x_b)
u_b = fun_u_b(t_b, x_b)

# Draw uniformly sampled collocation points
t_r = tf.random.uniform((N_r,1), lb[0], ub[0], dtype=DTYPE)
x_r = tf.random.uniform((N_r,1), lb[1], ub[1], dtype=DTYPE)
X_r = tf.concat([t_r, x_r], axis=1)

# Collect boundary and initial data in lists
X_data = [X_0, X_b]
u_data = [u_0, u_b]
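For reference, the space-filling Latin hypercube strategy mentioned above could be realized, for example, with SciPy's quasi-Monte Carlo module. This is only a sketch assuming SciPy >= 1.7 is available; it is not used in the remainder of this example.

# Optional: Latin hypercube sampling of collocation points (requires SciPy >= 1.7)
from scipy.stats import qmc

# Draw N_r points from a Latin hypercube design on [0,1]^2 ...
sampler = qmc.LatinHypercube(d=2, seed=0)
sample = sampler.random(n=N_r)                               # shape (N_r, 2)

# ... and rescale them to the domain [tmin, tmax] x [xmin, xmax]
X_r_lhs = lb + (ub - lb) * tf.constant(sample, dtype=DTYPE)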
Next, we plot all sampled points: the collocation points are shown in red, while the initial and boundary data points are colored according to the prescribed values $u_0$ and $u_b$.
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9,6))
plt.scatter(t_0, x_0, c=u_0, marker='X', vmin=-1, vmax=1)
plt.scatter(t_b, x_b, c=u_b, marker='X', vmin=-1, vmax=1)
plt.scatter(t_r, x_r, c='r', marker='.', alpha=0.1)
plt.xlabel('$t$')
plt.ylabel('$x$')
plt.title('Positions of collocation points and boundary data');
#plt.savefig('Xdata_Burgers.jpg', bbox_inches='tight', dpi=300)
3. Set up the network architecture
In this example, following (Raissi et al., 2017 (Part I)), we set up a feedforward neural network with the following forward-pass structure:
- scaling of the input to the interval $[-1,1]$,
- 8 fully connected hidden layers with 20 neurons and a hyperbolic tangent activation function each,
- one fully connected output layer.
This setup results in a network with $3021$ trainable parameters (first hidden layer: $2\cdot 20+20=60$; seven intermediate layers: $20\cdot 20+20=420$ each; output layer: $20 \cdot 1 + 1 = 21$).
def init_model(num_hidden_layers=8, num_neurons_per_layer=20):
    # Initialize a feedforward neural network
    model = tf.keras.Sequential()

    # Input is two-dimensional (time + one spatial dimension)
    model.add(tf.keras.Input(2))

    # Introduce a scaling layer to map the input from [lb, ub] to [-1, 1]
    scaling_layer = tf.keras.layers.Lambda(
        lambda x: 2.0*(x - lb)/(ub - lb) - 1.0)
    model.add(scaling_layer)

    # Append hidden layers
    for _ in range(num_hidden_layers):
        model.add(tf.keras.layers.Dense(num_neurons_per_layer,
            activation=tf.keras.activations.get('tanh'),
            kernel_initializer='glorot_normal'))

    # Output is one-dimensional
    model.add(tf.keras.layers.Dense(1))

    return model
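The parameter count stated above can be verified directly; the Lambda scaling layer contributes no trainable parameters.

# Print a model summary; it should report 3,021 trainable parameters
init_model().summary()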
4. Define the loss function and its gradient
The following code defines a function which evaluates the residual of the PDE at the collocation points $X_r = \{(t^r_i,x^r_i)\}_{i=1}^{N_r}$:
$$ \begin{align} r_\theta (t,x) := \partial_t u_\theta (t,x) + \mathcal{N}[u_\theta] (t,x). \end{align} $$
The required derivatives are obtained with TensorFlow's automatic differentiation toolbox. For the Burgers equation, this means computing $\partial_t u_\theta$, $\partial_x u_\theta$ and $\partial_{xx} u_\theta$.
In TensorFlow, this is done with a GradientTape (see the documentation for details), which keeps track of the watched variables, here t and x, in order to compute the derivatives.
def get_r(model, X_r):

    # A tf.GradientTape is used to compute derivatives in TensorFlow
    with tf.GradientTape(persistent=True) as tape:
        # Split t and x to compute partial derivatives
        t, x = X_r[:, 0:1], X_r[:,1:2]

        # Variables t and x are watched during tape
        # to compute derivatives u_t and u_x
        tape.watch(t)
        tape.watch(x)

        # Determine residual
        u = model(tf.stack([t[:,0], x[:,0]], axis=1))

        # Compute gradient u_x within the GradientTape
        # since we need second derivatives
        u_x = tape.gradient(u, x)

    u_t = tape.gradient(u, t)
    u_xx = tape.gradient(u_x, x)

    del tape

    return fun_r(t, x, u, u_t, u_x, u_xx)
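As an optional check, the residual can be evaluated for a freshly initialized (untrained) network on the collocation points; one residual value per collocation point is returned.

# Residual of an untrained network, one value per collocation point
r_test = get_r(init_model(), X_r)
print(r_test.shape)   # expected: (10000, 1)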
The next function computes the loss functional of our problem,
$$ \begin{align} \phi_\theta(X) := \phi_\theta^r(X^r) + \phi_\theta^0(X^0) + \phi_\theta^b(X^b), \end{align} $$
as a function of the training data. The collocation points are given by X_r, while the initial and boundary data are stored in X_data = [X_0, X_b] and u_data = [u_0, u_b].
def compute_loss(model, X_r, X_data, u_data):

    # Compute phi^r
    r = get_r(model, X_r)
    phi_r = tf.reduce_mean(tf.square(r))

    # Initialize loss
    loss = phi_r

    # Add phi^0 and phi^b to the loss
    for i in range(len(X_data)):
        u_pred = model(X_data[i])
        loss += tf.reduce_mean(tf.square(u_data[i] - u_pred))

    return loss
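Analogously, the full loss of an untrained network can be evaluated once as a quick optional check:

# Scalar loss value for a randomly initialized network
print(compute_loss(init_model(), X_r, X_data, u_data).numpy())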
The following function computes the gradient of the loss function $\phi_\theta$ with respect to the unknown variables of the model, called trainable_variables in TensorFlow, i.e., $\nabla_\theta \phi_\theta$. This is again achieved with a GradientTape, which this time traces the model parameters $\theta$; they can be accessed via model.trainable_variables.
def get_grad(model, X_r, X_data, u_data):

    with tf.GradientTape(persistent=True) as tape:
        # This tape is for derivatives with
        # respect to trainable variables
        tape.watch(model.trainable_variables)
        loss = compute_loss(model, X_r, X_data, u_data)

    g = tape.gradient(loss, model.trainable_variables)
    del tape

    return loss, g
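A short optional check confirms that get_grad returns one gradient tensor per trainable variable (the kernels and biases of the nine Dense layers); the throwaway names test_model, test_loss and test_grads are used only for this check.

# One gradient per trainable variable
test_model = init_model()
test_loss, test_grads = get_grad(test_model, X_r, X_data, u_data)
print(len(test_grads) == len(test_model.trainable_variables))   # expected: True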
5. Set up the optimizer and train the model
Next, we initialize the model and choose a piecewise constant decay of the learning rate,
$$ \delta(n) = 0.01 \, \textbf{1}_{\{n < 1000\}} + 0.001 \, \textbf{1}_{\{1000 \le n < 3000\}} + 0.0005 \, \textbf{1}_{\{3000 \le n\}}, $$
and set up a tf.keras optimizer for the model.
# Initialize model aka u_\theta
model = init_model()

# We choose a piecewise decay of the learning rate, i.e., the
# step size in the gradient descent type algorithm
# the first 1000 steps use a learning rate of 0.01
# from 1000 - 3000: learning rate = 0.001
# from 3000 onwards: learning rate = 0.0005
lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay([1000,3000],[1e-2,1e-3,5e-4])

# Choose the optimizer
optim = tf.keras.optimizers.Adam(learning_rate=lr)
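The schedule object lr is callable with the current step number, so the piecewise decay can be inspected directly (optional):

# Learning rates before, between, and after the decay boundaries
print(lr(0).numpy(), lr(2000).numpy(), lr(4000).numpy())   # expected: 0.01 0.001 0.0005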
We train the model for $N = 5000$ iterations (this takes roughly 3 minutes). Each optimization step is wrapped in the function train_step(). Note: @tf.function is a so-called decorator in Python. It compiles the function that follows it into a TensorFlow graph, which can speed up training significantly; in this example, declaring train_step as a TensorFlow graph noticeably improves the training speed.
from time import time

# Define one training step as a TensorFlow function to increase speed of training
@tf.function
def train_step():
    # Compute current loss and gradient w.r.t. parameters
    loss, grad_theta = get_grad(model, X_r, X_data, u_data)

    # Perform gradient descent step
    optim.apply_gradients(zip(grad_theta, model.trainable_variables))

    return loss

# Number of training epochs
N = 5000
hist = []

# Start timer
t0 = time()

for i in range(N+1):

    loss = train_step()

    # Append current loss to hist
    hist.append(loss.numpy())

    # Output current loss after 50 iterates
    if i%50 == 0:
        print('It {:05d}: loss = {:10.8e}'.format(i,loss))

# Print computation time
print('\nComputation time: {} seconds'.format(time()-t0))
Plot the computed solution
from mpl_toolkits.mplot3d import Axes3D

# Set up meshgrid
N = 600
tspace = np.linspace(lb[0], ub[0], N + 1)
xspace = np.linspace(lb[1], ub[1], N + 1)
T, X = np.meshgrid(tspace, xspace)
Xgrid = np.vstack([T.flatten(),X.flatten()]).T

# Determine predictions of u(t, x)
upred = model(tf.cast(Xgrid,DTYPE))

# Reshape upred
U = upred.numpy().reshape(N+1,N+1)

# Surface plot of solution u(t,x)
fig = plt.figure(figsize=(9,6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(T, X, U, cmap='viridis');
ax.view_init(35,35)
ax.set_xlabel('$t$')
ax.set_ylabel('$x$')
ax.set_zlabel('$u_\\theta(t,x)$')
ax.set_title('Solution of Burgers equation');
#plt.savefig('Burgers_Solution.jpg', bbox_inches='tight', dpi=300);
Plot the convergence history
fig = plt.figure(figsize=(9,6))
ax = fig.add_subplot(111)
ax.semilogy(range(len(hist)), hist,'k-')
ax.set_xlabel('$n_{epoch}$')
ax.set_ylabel('$\\phi_{n_{epoch}}$');
#plt.savefig('evolution of loss.jpg', bbox_inches='tight', dpi=300)