Upsample (upsampling / interpolation)
torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)
Upsamples given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.
The input data is assumed to be of the form minibatch x channels x [optional depth] x [optional height] x width. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor.
Input shapes:
- 1D data: [N, C, W]
- 2D data: [N, C, H, W]
- 3D data: [N, C, D, H, W]
The algorithms available for upsampling are nearest neighbor and linear, bilinear, bicubic and trilinear for 3D, 4D and 5D input Tensor, respectively.
One can either give a scale_factor or the target output size to calculate the output size. (You cannot give both, as it is ambiguous)
Parameters:
size: (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], optional) – output spatial sizes
scale_factor (float or Tuple[float] or Tuple[float, float] or Tuple[float, float, float], optional) – multiplier for spatial size. Has to match input size if it is a tuple.
The output size can be specified either directly via size or as a multiple of the input size via scale_factor.
mode (str, optional) – the upsampling algorithm: one of 'nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'. Default: 'nearest'
align_corners (bool, optional) – if True, the corner pixels of the input and output tensors are aligned, and thus preserve the values at those pixels. This only has effect when mode is 'linear', 'bilinear', or 'trilinear'. Default: False
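As a quick illustration (a minimal sketch; the tensor values and variable names are just for demonstration), either size or scale_factor determines the output shape:

```python
import torch
import torch.nn as nn

x = torch.arange(1., 5.).view(1, 1, 2, 2)  # (N, C, H, W) = (1, 1, 2, 2)

# Either give the target output size...
up_size = nn.Upsample(size=(4, 4), mode='nearest')
# ...or a scale factor -- but not both at once.
up_scale = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

print(up_size(x).shape)   # torch.Size([1, 1, 4, 4])
print(up_scale(x).shape)  # torch.Size([1, 1, 4, 4])
```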
Input/output shapes:
Input: (N, C, W_in), (N, C, H_in, W_in) or (N, C, D_in, H_in, W_in)
Output: (N, C, W_out), (N, C, H_out, W_out) or (N, C, D_out, H_out, W_out), where each output spatial size is either the given size or ⌊input size × scale_factor⌋.
Note:
With align_corners = True, the linearly interpolating modes (linear, bilinear, bicubic and trilinear) do not proportionally align the output and input pixels, so the output values can depend on the input size. This was the default behavior of these modes before version 0.3.1; since then, the default is align_corners = False. See below for concrete examples of how this affects the output.
For an explanation of align_corners, see the Zhihu article referenced here.
The input is:
After upsampling it by a factor of two, we get the figure below:
First, consider align_corners=False, the default for interpolate in PyTorch. Under this convention, pixel values are taken to sit at the centers of pixel squares. Looking at the pixels inside the green box, we find that they strictly follow the definition of bilinear interpolation. The four corner points, however, keep the values of the original image, and the points along the edges are bilinearly interpolated from the corner values. Globally, then, the interior and the border follow rather different rules.
In the align_corners=True world view, pixel values sit on the grid points themselves, as shown in the figure above:
A careful reader will notice that the 3×3 image, upsampled by a factor of two, becomes 5×5. More generally, an input of size (2x+1) × (2x+1) becomes (4x+1) × (4x+1) after align_corners=True upsampling. So although the content lines up neatly, the sizes lose that power-of-two elegance. (Note: here the output size was explicitly specified as 5×5.)
- When **align_corners = True**, pixels are treated as points sitting on the grid nodes, and the corner pixels are aligned, so the sample points are equally spaced.
- When **align_corners = False**, pixels are treated as points on the grid's cross lines; the corner points still take the original image's corner pixel values, but the interpolated points are placed as in the figure above, so the spacing between points is unequal (see the sketch below).
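A small sketch of the difference (input values chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]])  # (1, 1, 2, 2)

# align_corners=False (default): pixels are treated as 1x1 areas and we
# interpolate between pixel centers; the corners keep the original values,
# but border samples are effectively replicated, so interior and border
# follow different rules.
print(F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False))

# align_corners=True: pixels are treated as grid points; the input and
# output corners coincide and all sample points are evenly spaced.
print(F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True))
```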
If you want to downsample or do a regular resize, you should use interpolate().
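For example (a minimal sketch with arbitrary sizes):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# Downsampling / general resizing with F.interpolate:
y = F.interpolate(x, size=(4, 4), mode='bilinear', align_corners=False)
print(y.shape)  # torch.Size([1, 3, 4, 4])
```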
UpsamplingNearest2d
torch.nn.UpsamplingNearest2d(size=None, scale_factor=None)
UpsamplingBilinear2d
torch.nn.UpsamplingBilinear2d(size=None, scale_factor=None)
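These two modules are thin wrappers around Upsample: UpsamplingNearest2d corresponds to mode='nearest', and UpsamplingBilinear2d to mode='bilinear' with align_corners=True. A quick sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 2, 2)
up_nearest = nn.UpsamplingNearest2d(scale_factor=2)   # == Upsample(mode='nearest')
up_bilinear = nn.UpsamplingBilinear2d(scale_factor=2) # == Upsample(mode='bilinear', align_corners=True)
print(up_nearest(x).shape)   # torch.Size([1, 1, 4, 4])
print(up_bilinear(x).shape)  # torch.Size([1, 1, 4, 4])
```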
ConvTranspose (transposed convolution)
ConvTranspose1d
torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')
Applies a 1D transposed convolution operator over an input image composed of several input planes.
This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual mathematical deconvolution operation).
Parameters:
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1. Controls the stride of the cross-correlation.
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of the input. Default: 0. Controls the amount of implicit zero padding, dilation * (kernel_size - 1) - padding, on both sides; see the note below for details.
output_padding (int or tuple, optional) – Additional size added to one side of the output shape. Default: 0. See the note below for details.
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1. Controls the connections between inputs and outputs; in_channels and out_channels must both be divisible by groups.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1. Controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
Note:
Depending of the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv1d and a ConvTranspose1d are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find output shape, but does not actually add zero-padding to output.
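A small sketch of this shape relationship (layer sizes chosen arbitrarily):

```python
import torch
import torch.nn as nn

# With the same parameters, ConvTranspose1d undoes Conv1d's effect on shape.
conv = nn.Conv1d(8, 16, kernel_size=3, stride=2, padding=1)
deconv = nn.ConvTranspose1d(16, 8, kernel_size=3, stride=2, padding=1,
                            output_padding=1)

x = torch.randn(1, 8, 10)
y = conv(x)    # torch.Size([1, 16, 5])
z = deconv(y)  # torch.Size([1, 8, 10])
# output_padding=1 resolves the ambiguity: with stride=2, inputs of
# length 9 and length 10 both map to length 5 under conv.
print(y.shape, z.shape)
```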
In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.
Input/output shapes:
Input: (N, C_in, L_in)
Output: (N, C_out, L_out) where
L_out = (L_in − 1) × stride − 2 × padding + dilation × (kernel_size − 1) + output_padding + 1
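Plugging numbers into this formula (a minimal check with arbitrary sizes):

```python
import torch
import torch.nn as nn

conv_t = nn.ConvTranspose1d(16, 8, kernel_size=3, stride=2,
                            padding=1, output_padding=1)
x = torch.randn(4, 16, 50)  # (N, C_in, L_in)
# L_out = (50 - 1)*2 - 2*1 + 1*(3 - 1) + 1 + 1 = 100
print(conv_t(x).shape)      # torch.Size([4, 8, 100])
```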
Variables:
ConvTranspose1d.weight (Tensor) – the learnable weights of the module
ConvTranspose1d.bias (Tensor) – the learnable bias of the module
ConvTranspose2d
torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')
The parameters kernel_size, stride, padding, output_padding can either be:
- a single int – in which case the same value is used for the height and width dimensions
- a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
Shape:
- Input: (N, C_in, H_in, W_in)
- Output: (N, C_out, H_out, W_out) where
H_out = (H_in − 1) × stride[0] − 2 × padding[0] + dilation[0] × (kernel_size[0] − 1) + output_padding[0] + 1
W_out = (W_in − 1) × stride[1] − 2 × padding[1] + dilation[1] × (kernel_size[1] − 1) + output_padding[1] + 1
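For instance, the common decoder pattern of doubling the resolution with kernel_size=2 and stride=2 (sizes here are arbitrary):

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
x = torch.randn(1, 64, 16, 16)
# H_out = W_out = (16 - 1)*2 - 0 + (2 - 1) + 0 + 1 = 32
print(up(x).shape)  # torch.Size([1, 32, 32, 32])
```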
ConvTranspose3d
torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')
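The 3D version works the same way, with an extra depth dimension (a minimal sketch):

```python
import torch
import torch.nn as nn

up3d = nn.ConvTranspose3d(16, 8, kernel_size=2, stride=2)
x = torch.randn(1, 16, 4, 8, 8)  # (N, C, D, H, W)
print(up3d(x).shape)             # torch.Size([1, 8, 8, 16, 16])
```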
How does ConvTranspose2d work, and how do deep networks upsample?
Unpooling
Another way to upsample is unpooling; see the reference code from segnet_pytorch:
```python
# Stage 5
x51 = F.relu(self.bn51(self.conv51(x4p)))
x52 = F.relu(self.bn52(self.conv52(x51)))
x53 = F.relu(self.bn53(self.conv53(x52)))
# id5 records the indices of the max values during pooling;
# return_indices must be set to True to obtain them.
x5p, id5 = F.max_pool2d(x53, kernel_size=2, stride=2, return_indices=True)

# Stage 5d
# max_unpool2d performs max upsampling: id5 tells it where to place each
# value, and all other positions are filled with zeros.
x5d = F.max_unpool2d(x5p, id5, kernel_size=2, stride=2)
x53d = F.relu(self.bn53d(self.conv53d(x5d)))
x52d = F.relu(self.bn52d(self.conv52d(x53d)))
x51d = F.relu(self.bn51d(self.conv51d(x52d)))
```
Test:
```python
# Test max unpooling
m = nn.MaxPool2d((3, 3), stride=(1, 1), return_indices=True)
upm = nn.MaxUnpool2d((3, 3), stride=(1, 1))
data4 = torch.randn(1, 1, 3, 3)
output5, indices = m(data4)
output6 = upm(output5, indices)
print('\ndata4:', data4,
      '\nmaxPool2d', output5,
      '\nindices:', indices,
      '\noutput6:', output6)
```
The output is:
```
data4: tensor([[[[ 2.3151, -1.0391,  0.1074],
                 [ 1.9360,  0.2524,  2.3735],
                 [-0.1151,  0.4684, -1.8800]]]])
maxPool2d tensor([[[[2.3735]]]])
indices: tensor([[[[5]]]])
output6: tensor([[[[0.0000, 0.0000, 0.0000],
                   [0.0000, 0.0000, 2.3735],
                   [0.0000, 0.0000, 0.0000]]]])
```
Upsampling via transposed convolution, or via Upsample followed by a convolution:
```python
def upconv2x2(in_channels, out_channels, mode='transpose'):
    if mode == 'transpose':
        # Transposed convolution. kernel_size and stride must match the
        # values used in the corresponding downsampling layer so that the
        # original (H, W) is restored.
        return nn.ConvTranspose2d(
            in_channels,
            out_channels,
            kernel_size=2,
            stride=2)
    else:
        # Upsample itself does not change the number of channels;
        # scale_factor is the upsampling factor relative to the input size.
        # The extra 1x1 convolution is what allows the output channel count
        # to change, so this branch behaves much like ConvTranspose2d,
        # just with a different upsampling method.
        return nn.Sequential(
            nn.Upsample(mode='bilinear', scale_factor=2, align_corners=True),
            conv1x1(in_channels, out_channels))


def conv1x1(in_channels, out_channels, groups=1):
    return nn.Sequential(
        nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=1,
            groups=groups,
            stride=1),
        nn.BatchNorm2d(out_channels))
```
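Assuming the definitions above, both modes produce the same output shape:

```python
import torch

x = torch.randn(1, 64, 16, 16)
print(upconv2x2(64, 32, mode='transpose')(x).shape)  # torch.Size([1, 32, 32, 32])
print(upconv2x2(64, 32, mode='upsample')(x).shape)   # torch.Size([1, 32, 32, 32])
```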
PixelShuffle
In PyTorch, the upsampling layers are grouped under the Vision Layers of torch.nn; there are four of them:
- ① PixelShuffle
- ② Upsample
- ③ UpsamplingNearest2d
- ④ UpsamplingBilinear2d
The class is defined as follows:
class torch.nn.PixelShuffle(upscale_factor)
Here upscale_factor is the magnification factor, an int.
Taking a 4D input (N, C, H, W) as an example, PixelShuffle reshapes a tensor of shape (∗, r²C, H, W) into one of shape (∗, C, rH, rW), where r is the upscale factor. Formally, its input and output shapes are:
Input: (N, C × upscale_factor², H, W)
Output: (N, C, H × upscale_factor, W × upscale_factor)
This effectively turns feature-map channels into image pixels.
```python
>>> ps = nn.PixelShuffle(3)
>>> input = torch.randn(1, 9, 4, 4)
>>> output = ps(input)
>>> print(output.size())
torch.Size([1, 1, 12, 12])
```