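[Preface] Drop Path is a regularization method commonly used in NAS. Because the network structure is often dynamic during training, Drop Path works well as a regularizer, and it is widely used in FractalNet, NASNet, and similar work.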
Dropout
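Dropout is one of the earliest methods for combating overfitting and the ancestor of all drop-style methods. It was proposed by Hinton et al. in 2012 and used in AlexNet, the network from the paper ImageNet Classification with Deep Convolutional Neural Networks.
Principle: during forward propagation, each neuron's activation is switched off with probability 1-keep_prob (0 < keep_prob < 1).
Function: this improves generalization, because the model cannot rely too heavily on a few local nodes. During training, each neuron is kept with probability keep_prob and switched off with probability 1-keep_prob. At test time no neurons are switched off; in the original formulation, the outputs of neurons that had dropout applied during training are instead multiplied by keep_prob.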
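Specifically:
Suppose a neuron's output activation is a. Without dropout, its expected output is a. With dropout, the neuron is either kept or switched off, so it can be treated as a discrete random variable following a 0-1 (Bernoulli) distribution, where p denotes the keep probability keep_prob. The expected output becomes p*a + (1-p)*0 = p*a. To keep the expectation identical to the case without dropout, the kept output has to be divided by p. Doing this division during training (often called inverted dropout) removes the need for any scaling at test time, and it is what the PyTorch code below does.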
Author: 種子_fe
Link: https://www.imooc.com/article/30129
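The expectation argument above can be checked numerically. The following is a minimal sketch in plain Python, not code from any library; the helper name `inverted_dropout_mean` and the values chosen for a and p are illustrative assumptions. It averages many samples of a single activation that is kept with probability p and divided by p when kept.

```python
import random


def inverted_dropout_mean(a: float, p: float, trials: int, seed: int = 0) -> float:
    """Average output of one neuron under inverted dropout (hypothetical helper).

    a is the activation value, p is the keep probability (keep_prob).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        kept = rng.random() < p              # Bernoulli(p): kept with probability p
        total += (a / p) if kept else 0.0    # kept outputs are rescaled by 1 / p
    return total / trials


mean = inverted_dropout_mean(a=2.0, p=0.8, trials=200_000)
print(mean)  # should be close to a = 2.0
```

Without the division by p, the same average would settle near p*a = 1.6 instead of 2.0, which is the mismatch the rescaling is meant to remove.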
Implementation: the PyTorch implementation is as follows.
class _DropoutNd(Module):
    __constants__ = ['p', 'inplace']
    p: float
    inplace: bool

    def __init__(self, p: float = 0.5, inplace: bool = False) -> None:
        super(_DropoutNd, self).__init__()
        if p < 0 or p > 1:
            raise ValueError("dropout probability has to be between 0 and 1, "
                             "but got {}".format(p))
        self.p = p
        self.inplace = inplace

    def extra_repr(self) -> str:
        return 'p={}, inplace={}'.format(self.p, self.inplace)


class Dropout(_DropoutNd):
    def forward(self, input: Tensor) -> Tensor:
        return F.dropout(input, self.p, self.training, self.inplace)
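To see how `self.training` changes the behaviour of this module, here is a small usage sketch. It assumes a working PyTorch installation; the tensor size and the value p=0.5 are arbitrary choices for illustration. Note that in PyTorch, p is the probability of zeroing an element, so the keep probability is 1 - p.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)   # p is the probability of zeroing an element
x = torch.ones(1000)

drop.train()               # training mode: elements are zeroed or rescaled
y_train = drop(x)

drop.eval()                # evaluation mode: dropout is a no-op
y_eval = drop(x)

print(y_train.unique())    # only 0 and 1 / (1 - p) appear
print(y_train.mean())      # stays close to the input mean of 1.0
print(y_eval.equal(x))     # True
```

Because the rescaling already happens in training mode, the evaluation-mode output needs no extra multiplication by the keep probability.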
The dropout implementation in functional.py:
def dropout(input: Tensor, p: float = 0.5, training: bool = True, inplace: bool = False) -> Tensor:
    r"""
    During training, randomly zeroes some of the elements of the input
    tensor with probability :attr:`p` using samples from a Bernoulli
    distribution.

    See :class:`~torch.nn.Dropout` for details.

    Args:
        p: probability of an element to be zeroed. Default: 0.5
        training: apply dropout if is ``True``. Default: ``True``
        inplace: If set to ``True``, will do this operation in-place. Default: ``False``
    """
    if has_torch_function_unary(input):
        return handle_torch_function(dropout, (input,), input, p=p, training=training, inplace=inplace)
    if p < 0.0 or p > 1.0:
        raise ValueError("dropout probability has to be between 0 and 1, " "but got {}".format(p))
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
The concrete implementation is ultimately found in Dropout.cpp:
template<bool feature_dropout, bool alpha_dropout, bool inplace, typename T>
Ctype<inplace> _dropout_impl(T& input, double p, bool train) {
  TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p);
  if (p == 0 || !train || input.numel() == 0) {
    return input;
  }

  if (p == 1) {
    return multiply<inplace>(input, at::zeros({}, input.options()));
  }

  at::Tensor b; // used for alpha_dropout only
  auto noise = feature_dropout ? make_feature_noise(input) : at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
  noise.bernoulli_(1 - p);
  if (alpha_dropout) {
    constexpr double alpha = 1.7580993408473766;
    double a = 1. / std::sqrt((alpha * alpha * p + 1) * (1 - p));
    b = noise.add(-1).mul_(alpha * a).add_(alpha * a * p);
    noise.mul_(a);
  } else {
    noise.div_(1 - p);
  }

  if (!alpha_dropout) {
    return multiply<inplace>(input, noise);
  } else {
    return multiply<inplace>(input, noise).add_(b);
  }
}
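Flow (here p is the drop probability, so the keep probability is 1-p):
- Check the range of p and the training state.
- Sample a Bernoulli (0-1) mask in which each element is 1 with probability 1-p.
- Return input / (1-p) * mask.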
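The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not PyTorch's own code: the function name `manual_dropout` is hypothetical, and the feature-dropout, alpha-dropout, and in-place branches of the C++ version are left out.

```python
import torch


def manual_dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Hypothetical re-implementation of the flow; p is the drop probability."""
    # Step 1: validate p and handle the cases where nothing needs sampling.
    if p < 0.0 or p > 1.0:
        raise ValueError("dropout probability has to be between 0 and 1, but got {}".format(p))
    if p == 0.0 or not training or x.numel() == 0:
        return x
    if p == 1.0:
        return torch.zeros_like(x)
    # Step 2: Bernoulli mask, each element is 1 with probability 1 - p.
    mask = torch.empty_like(x).bernoulli_(1 - p)
    # Step 3: rescale so the expected output equals the input.
    return x * mask / (1 - p)


torch.manual_seed(0)
x = torch.full((4, 5), 3.0)
out = manual_dropout(x, p=0.25)
print(out)  # every entry is either 0 or 3 / 0.75 = 4
```

The explicit parentheses in `(1 - p)` matter: written as `input / 1 - p`, the expression would subtract p from the input rather than rescale it.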
Drop Path
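Principle: as the name suggests, Drop Path randomly drops branches (paths) of a multi-branch structure in a deep network during training.
Function: it is generally added to a network as a regularizer, but it makes training harder. In NAS problems especially, if the drop prob is set too high, the model may even fail to converge.
Implementation: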
import torch
import torch.nn as nn


def drop_path(x, drop_prob: float = 0., training: bool = False):
    # Identity when dropping is disabled or the module is in evaluation mode.
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize: 1 with probability keep_prob, else 0
    output = x.div(keep_prob) * random_tensor  # rescale kept samples by 1 / keep_prob
    return output


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob: float = 0.):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)
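The key difference from Dropout is the mask shape: one value per sample rather than one per element. The sketch below repeats the same logic in compact form so that it runs on its own; the batch size and drop_prob=0.5 are arbitrary values picked for illustration.

```python
import torch


def drop_path(x, drop_prob: float = 0.0, training: bool = False):
    # Same logic as the function above, repeated so this snippet is self-contained.
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # one mask value per sample
    mask = (keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)).floor_()
    return x.div(keep_prob) * mask


torch.manual_seed(0)
x = torch.ones(8, 3, 4, 4)  # a batch of 8 feature maps
out = drop_path(x, drop_prob=0.5, training=True)

# Each sample is either dropped entirely (all 0) or kept and scaled by 1 / keep_prob.
per_sample = out.flatten(1)
for i, row in enumerate(per_sample):
    print(i, row.unique().tolist())
```

Every printed row contains a single value, either 0.0 or 2.0, which shows that the whole sample shares one keep-or-drop decision.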
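With the Dropout theory as groundwork, the Drop Path implementation above is fairly easy to follow. In practice it is typically used like this: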
x = x + self.drop_path(self.conv(x))
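Drop Path should not be applied directly to the main path like this, because a dropped sample would have all of its features zeroed with no skip connection to carry the input forward: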
x = self.drop_path(x)
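The residual usage can be sketched as a small module. This is an illustration under assumptions rather than code from any particular network: `ResidualBlock`, its `channels` argument, and the single 3x3 convolution are hypothetical, and `DropPath` is re-declared in compact form so that the snippet runs on its own.

```python
import torch
import torch.nn as nn


class DropPath(nn.Module):
    """Compact per-sample drop path, following the same logic as the class above."""

    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1 - self.drop_prob
        shape = (x.shape[0],) + (1,) * (x.ndim - 1)
        mask = (keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)).floor_()
        return x.div(keep_prob) * mask


class ResidualBlock(nn.Module):
    """Hypothetical block: the conv branch may be dropped, the skip path never is."""

    def __init__(self, channels: int, drop_prob: float):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.drop_path = DropPath(drop_prob)

    def forward(self, x):
        return x + self.drop_path(self.conv(x))


torch.manual_seed(0)
block = ResidualBlock(channels=3, drop_prob=0.5).train()
x = torch.randn(8, 3, 4, 4)
out = block(x)

# A sample whose branch was dropped passes through unchanged: out[i] equals x[i].
dropped = [i for i in range(x.shape[0]) if torch.equal(out[i], x[i])]
print("samples that took only the skip path:", dropped)
```

With the skip connection in place, a dropped branch degrades gracefully to the identity mapping for that sample, whereas applying the module to `x` alone would hand an all-zero tensor to the next layer.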
Reference
https://www.cnblogs.com/dan-baishucaizi/p/14703263.html