Upsample (upsampling, interpolation)
Upsample
torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)
Upsamples the given multi-channel 1D (temporal), 2D (spatial), or 3D (volumetric) data.
The input data is assumed to be of the form minibatch x channels x [optional depth] x [optional height] x width. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor.
Input formats:
1D data: [N, C, W]
2D data: [N, C, H, W]
3D data: [N, C, D, H, W]
The algorithms available for upsampling are nearest neighbor (for any of these inputs), linear for 3D input Tensors, bilinear and bicubic for 4D input Tensors, and trilinear for 5D input Tensors.
One can give either a scale_factor or the target output size to determine the output size. (You cannot give both, as that would be ambiguous.)
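The two ways of specifying the output size can be sketched as follows. This is a minimal illustration: the tensor shape and the variable names are arbitrary choices made here, not part of the original notes.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3, 8, 8)  # [N, C, H, W]

# Option 1: describe the output through a multiplier on the spatial size.
up_by_factor = nn.Upsample(scale_factor=2, mode='nearest')
# Option 2: describe the output through an explicit target size.
up_to_size = nn.Upsample(size=(16, 16), mode='nearest')

y_factor = up_by_factor(x)
y_size = up_to_size(x)
print(y_factor.shape)  # torch.Size([2, 3, 16, 16])
print(y_size.shape)    # torch.Size([2, 3, 16, 16])

# Passing both arguments is rejected once the module is applied to data.
try:
    nn.Upsample(size=(16, 16), scale_factor=2)(x)
    both_rejected = False
except ValueError:
    both_rejected = True
print(both_rejected)
```

Only the spatial dimensions change; N and C pass through untouched.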
Parameters:
size: (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], optional) – output spatial sizes
scale_factor: (float or Tuple[float] or Tuple[float, float] or Tuple[float, float, float], optional) – multiplier for spatial size. If it is a tuple, its length has to match the number of spatial dimensions of the input.
The output size can be set directly with size, or indirectly through the scaling multiplier scale_factor.
mode: (str, optional) – the upsampling algorithm: one of 'nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'. Default: 'nearest'
align_corners: (bool, optional) – if True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has an effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: False
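Input and output shapes:
Input: (N, C, W_in), (N, C, H_in, W_in) or (N, C, D_in, H_in, W_in)
Output: (N, C, W_out), (N, C, H_out, W_out) or (N, C, D_out, H_out, W_out), where each output spatial size is floor(input size × scale_factor) when scale_factor is given, or the corresponding entry of size otherwise.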
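Note:
With align_corners = True, the linearly interpolating modes (linear, bilinear, bicubic, and trilinear) do not proportionally align the output and input pixels, so the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is align_corners = False. See below for a concrete example of how this affects the output.
For an explanation of align_corners, see this Zhihu article.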
The input is:

Upsampling it by a factor of two gives the figure below:

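First consider align_corners=False, which is the default for interpolate in PyTorch. Under this setting, each pixel value is taken to sit at the center of its pixel cell. Looking at the pixels inside the green box, we find that they strictly follow the definition of bilinear interpolation. The four corner points keep the values of the original image, and the points along the edges are bilinearly interpolated from the corner values. Seen globally, then, the interior and the border follow noticeably different rules.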

In the align_corners=True view of the world, pixel values sit on the grid points, as shown in the figure above:

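Attentive readers will notice that the 3*3 image became 5*5 after being upsampled by two. More generally, an input of size (2x+1) * (2x+1) becomes (4x+1) * (4x+1) after this align_corners=True upsampling. So although the content lines up neatly, the sizes lose the elegance of integer powers of 2. (Note: the output size was explicitly set to 5x5 here; with scale_factor=2 a 3*3 input would produce a 6*6 output.)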

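- When **align_corners = True**, pixels are treated as points on a grid. The corner points of the input and output coincide, and the interpolated points are equally spaced between them.
- When **align_corners = False**, pixels are treated as square cells whose values sit at the cell centers, and the input and output are aligned at the outer edges of the corner cells. The output corners still take the values of the input corner pixels, but the remaining points are sampled as in the figure above, so the output points are not equally spaced relative to the input points.
If you need downsampling or general resizing, use interpolate() instead.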
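The difference between the two align_corners settings is easiest to see in 1D. The sketch below uses a three-sample ramp chosen here for illustration (it is not the image from the figures) and upsamples it with linear interpolation under both settings.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[0.0, 1.0, 2.0]]])  # shape [N=1, C=1, W=3]

# align_corners=True: first and last samples of input and output coincide,
# so 3 points mapped onto 5 points are evenly spaced with a step of 0.5.
y_true = F.interpolate(x, size=5, mode='linear', align_corners=True)

# align_corners=False: values sit at cell centers; output positions that fall
# outside the outermost input centers are clamped to the edge values.
y_false = F.interpolate(x, scale_factor=2, mode='linear', align_corners=False)

print(y_true)   # tensor([[[0.0000, 0.5000, 1.0000, 1.5000, 2.0000]]])
print(y_false)  # tensor([[[0.0000, 0.2500, 0.7500, 1.2500, 1.7500, 2.0000]]])
```

In the second result the step is 0.25 at both ends and 0.5 in the interior, which is the uneven spacing described in the list above.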
UpsamplingNearest2d
torch.nn.UpsamplingNearest2d(size=None, scale_factor=None)
UpsamplingBilinear2d
torch.nn.UpsamplingBilinear2d(size=None, scale_factor=None)
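These two classes are thin conveniences over the generic interpolation routine, and the PyTorch documentation marks them as deprecated in favor of interpolate(). The sketch below checks the correspondence on a small tensor; the input values are an arbitrary choice made here, and the legacy classes may emit a deprecation warning depending on the PyTorch version.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.arange(9.0).reshape(1, 1, 3, 3)

# UpsamplingBilinear2d corresponds to bilinear mode with align_corners=True.
legacy_bilinear = nn.UpsamplingBilinear2d(scale_factor=2)(x)
generic_bilinear = F.interpolate(x, scale_factor=2, mode='bilinear',
                                 align_corners=True)

# UpsamplingNearest2d corresponds to nearest mode.
legacy_nearest = nn.UpsamplingNearest2d(scale_factor=2)(x)
generic_nearest = F.interpolate(x, scale_factor=2, mode='nearest')

print(torch.allclose(legacy_bilinear, generic_bilinear))
print(torch.allclose(legacy_nearest, generic_nearest))
```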
ConvTranspose (transposed convolution)
torch.nn.ConvTranspose1d
torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')
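Applies a 1D transposed convolution operator over an input image composed of several input planes.
This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation in the mathematical sense).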
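The "gradient of Conv1d with respect to its input" reading can be checked numerically. In the sketch below (shapes, seed, and variable names are arbitrary choices made here), autograd computes the input gradient of a strided Conv1d, and the same result is obtained by applying a transposed convolution with the same weight tensor to the upstream gradient.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 2, 9, requires_grad=True)  # [N, C_in, L_in]
w = torch.randn(3, 2, 3)                      # conv1d weight: [C_out, C_in, k]

y = F.conv1d(x, w, stride=2)                  # shape [1, 3, 4]
g = torch.randn_like(y)                       # an arbitrary upstream gradient
y.backward(g)                                 # autograd fills x.grad

# Transposed convolution of the upstream gradient with the SAME weights.
x_grad_via_transpose = F.conv_transpose1d(g, w, stride=2)

print(x_grad_via_transpose.shape)  # torch.Size([1, 2, 9])
print(torch.allclose(x.grad, x_grad_via_transpose, atol=1e-5))
```

The length 9 is chosen so that the strided convolution consumes the whole input, which means no output_padding is needed to get back to the original length.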
Parameters:
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1. Controls the stride of the cross-correlation.
padding (int or tuple, optional) – dilation * (kernel_size - 1) - padding zero-padding will be added to both sides of the input. Default: 0. Controls the amount of implicit zero-padding on both sides; see the note below for details.
output_padding (int or tuple, optional) – Additional size added to one side of the output shape. Default: 0. See the note below for details.
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1. Controls the connections between inputs and outputs; in_channels and out_channels must both be divisible by groups.
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1. Controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
Note:
Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding.
The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv1d and a ConvTranspose1d are initialized with the same parameters, they are inverses of each other with regard to the input and output shapes. However, when stride > 1, Conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape; it does not actually add zero-padding to the output.
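The ambiguity that output_padding resolves can be reproduced with a small experiment. The sketch below uses hyperparameters picked here for illustration (kernel_size=3, stride=2, padding=1): two input lengths collapse to the same Conv1d output length, and output_padding selects which of them the transposed layer maps back to.

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(1, 1, kernel_size=3, stride=2, padding=1)

# With stride=2, lengths 7 and 8 both produce an output of length 4.
len_from_7 = conv(torch.randn(1, 1, 7)).shape[-1]
len_from_8 = conv(torch.randn(1, 1, 8)).shape[-1]

z = torch.randn(1, 1, 4)
# Same hyperparameters, no output_padding: maps back to the shorter length.
deconv_short = nn.ConvTranspose1d(1, 1, kernel_size=3, stride=2, padding=1)
# output_padding=1 picks the longer of the two candidate lengths.
deconv_long = nn.ConvTranspose1d(1, 1, kernel_size=3, stride=2, padding=1,
                                 output_padding=1)

print(len_from_7, len_from_8)         # 4 4
print(deconv_short(z).shape[-1])      # 7
print(deconv_long(z).shape[-1])       # 8
```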
In some circumstances when using the CUDA backend with cuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on Reproducibility for background.
Input and output shapes:
Input: (N, C_in, L_in)
Output: (N, C_out, L_out), where
L_out = (L_in − 1) × stride − 2 × padding + dilation × (kernel_size − 1) + output_padding + 1
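The output-length formula translates directly into a small helper. In the sketch below, conv_transpose_out_len is a hypothetical function written for this note (it is not part of PyTorch), and the configuration is arbitrary; the prediction is compared against what the layer actually produces.

```python
import torch
import torch.nn as nn

def conv_transpose_out_len(l_in, kernel_size, stride=1, padding=0,
                           output_padding=0, dilation=1):
    """Hypothetical helper: output length of a 1D transposed convolution."""
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)

cfg = dict(kernel_size=4, stride=3, padding=1, output_padding=2, dilation=2)
layer = nn.ConvTranspose1d(in_channels=2, out_channels=5, **cfg)

x = torch.randn(1, 2, 10)
predicted = conv_transpose_out_len(10, **cfg)
actual = layer(x).shape[-1]
print(predicted, actual)  # 34 34
```

Worked by hand: (10 − 1) × 3 − 2 × 1 + 2 × (4 − 1) + 2 + 1 = 27 − 2 + 6 + 2 + 1 = 34.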
Variables
ConvTranspose1d.weight (Tensor)
ConvTranspose1d.bias (Tensor)
torch.nn.ConvTranspose2d
torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')
The parameters kernel_size, stride, padding, output_padding can either be:
- a single int – in which case the same value is used for the height and width dimensions
- a tuple of two ints – in which case the first int is used for the height dimension and the second int for the width dimension
Shape:
- Input: (N, C_in, H_in, W_in)
- Output: (N, C_out, H_out, W_out), where
H_out = (H_in − 1) × stride[0] − 2 × padding[0] + dilation[0] × (kernel_size[0] − 1) + output_padding[0] + 1
W_out = (W_in − 1) × stride[1] − 2 × padding[1] + dilation[1] × (kernel_size[1] − 1) + output_padding[1] + 1
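With tuple arguments, the height and width are computed independently, each from its own entry. The following sketch uses per-dimension settings chosen here purely for illustration, so that the two dimensions behave differently.

```python
import torch
import torch.nn as nn

layer = nn.ConvTranspose2d(
    in_channels=3, out_channels=8,
    kernel_size=(3, 5),      # (height, width)
    stride=(2, 1),
    padding=(1, 2),
    output_padding=(1, 0),
)

x = torch.randn(1, 3, 16, 20)
y = layer(x)

# H_out = (16 - 1) * 2 - 2 * 1 + 1 * (3 - 1) + 1 + 1 = 32
# W_out = (20 - 1) * 1 - 2 * 2 + 1 * (5 - 1) + 0 + 1 = 20
print(y.shape)  # torch.Size([1, 8, 32, 20])
```

Here the height is doubled while the width is left unchanged, because only the first entry of stride is greater than 1.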
torch.nn.ConvTranspose3d
torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')
How ConvTranspose2d works: how do deep networks upsample?
Unpooling
Another way to upsample is max unpooling. For reference code, see segnet_pytorch:
# Stage 5
x51 = F.relu(self.bn51(self.conv51(x4p)))
x52 = F.relu(self.bn52(self.conv52(x51)))
x53 = F.relu(self.bn53(self.conv53(x52)))
# id5 records the index of each maximum picked by the pooling step;
# this requires return_indices=True
x5p, id5 = F.max_pool2d(x53, kernel_size=2, stride=2, return_indices=True)

# Stage 5d
# Max unpooling: id5 decides where each value is written back;
# every other position is filled with 0
x5d = F.max_unpool2d(x5p, id5, kernel_size=2, stride=2)
x53d = F.relu(self.bn53d(self.conv53d(x5d)))
x52d = F.relu(self.bn52d(self.conv52d(x53d)))
x51d = F.relu(self.bn51d(self.conv51d(x52d)))
Test:
import torch
import torch.nn as nn

# Test upsampling by max unpooling
m = nn.MaxPool2d((3, 3), stride=(1, 1), return_indices=True)
upm = nn.MaxUnpool2d((3, 3), stride=(1, 1))
data4 = torch.randn(1, 1, 3, 3)
output5, indices = m(data4)
output6 = upm(output5, indices)
print('\ndata4:', data4,
      '\nmaxPool2d', output5,
      '\nindices:', indices,
      '\noutput6:', output6)
Its output is:
data4: tensor([[[[ 2.3151, -1.0391,  0.1074],
          [ 1.9360,  0.2524,  2.3735],
          [-0.1151,  0.4684, -1.8800]]]])
maxPool2d tensor([[[[2.3735]]]])
indices: tensor([[[[5]]]])
output6: tensor([[[[0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 2.3735],
          [0.0000, 0.0000, 0.0000]]]])
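The test above pools the whole 3x3 input into a single value. A stride-2 round trip, closer to what the SegNet fragment does, can be sketched as follows; the deterministic input is an arbitrary choice made here. The second half shows the output_size argument of max_unpool2d, which is useful when the original size is odd and the default unpooled size would come out one pixel short.

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.0).reshape(1, 1, 4, 4)

# Keep the flat position of each 2x2 window's maximum.
pooled, idx = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)
# Write each maximum back to its recorded position; the rest stays 0.
restored = F.max_unpool2d(pooled, idx, kernel_size=2, stride=2)

print(pooled)    # the maxima 5, 7, 13, 15 as a 2x2 map
print(restored)  # 4x4 map, zero everywhere except those four positions

# Odd-sized input: 5x5 pools to 2x2, which would unpool to 4x4 by default.
x_odd = torch.randn(1, 1, 5, 5)
pooled_odd, idx_odd = F.max_pool2d(x_odd, kernel_size=2, stride=2,
                                   return_indices=True)
restored_odd = F.max_unpool2d(pooled_odd, idx_odd, kernel_size=2, stride=2,
                              output_size=x_odd.size())
print(restored_odd.shape)  # torch.Size([1, 1, 5, 5])
```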
Upsampling with a transposed convolution, or with Upsample followed by a convolution:
import torch.nn as nn


def upconv2x2(in_channels, out_channels, mode='transpose'):
    if mode == 'transpose':
        # Transposed convolution: set the input and output channels here.
        # kernel_size and stride must match the values used by the
        # corresponding downsampling step so that the original width and
        # height are restored.
        return nn.ConvTranspose2d(
            in_channels, out_channels, kernel_size=2, stride=2)
    else:
        # nn.Upsample never changes the channel count (its output channels
        # always equal its input channels); scale_factor is the enlargement
        # factor relative to the current input size.
        # The 1x1 convolution appended after it makes the output channels
        # configurable, so this branch does roughly the same job as
        # ConvTranspose2d() and differs only in the upsampling method.
        return nn.Sequential(
            nn.Upsample(mode='bilinear', scale_factor=2, align_corners=True),
            conv1x1(in_channels, out_channels))


def conv1x1(in_channels, out_channels, groups=1):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1,
                  groups=groups, stride=1),
        nn.BatchNorm2d(out_channels))
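To compare the two routes side by side, the self-contained sketch below builds both variants directly (without the helper functions and without BatchNorm, to keep it short) and reports output shape and parameter count. The channel sizes and the n_params helper are illustrative choices made here.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)

# Route 1: learned upsampling with a transposed convolution.
up_transpose = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)

# Route 2: fixed bilinear upsampling, then a 1x1 convolution for the channels.
up_bilinear = nn.Sequential(
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True),
    nn.Conv2d(64, 32, kernel_size=1),
)

def n_params(module):
    # Hypothetical helper: total number of learnable parameters.
    return sum(p.numel() for p in module.parameters())

print(up_transpose(x).shape, n_params(up_transpose))  # [1, 32, 32, 32], 8224
print(up_bilinear(x).shape, n_params(up_bilinear))    # [1, 32, 32, 32], 2080
```

Both routes double the spatial size and halve the channels; the second has fewer learnable parameters because the interpolation itself is fixed.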
PixelShuffle
In PyTorch, the upsampling layers are grouped under Vision Layers in torch.nn. There are four of them:
- ① PixelShuffle
- ② Upsample
- ③ UpsamplingNearest2d
- ④ UpsamplingBilinear2d
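The class is defined as follows:
class torch.nn.PixelShuffle(upscale_factor)
Here upscale_factor is the enlargement factor, of type int.
Taking a 4D input (N, C, H, W) as an example, PixelShuffle rearranges a tensor of shape (∗, C × r², H, W) into a tensor of shape (∗, C, r × H, r × W), where r is upscale_factor. Formally, its input and output shapes are:
Input: (N, C × upscale_factor², H, W)
Output: (N, C, H × upscale_factor, W × upscale_factor)
This amounts to turning feature-map channels into image pixels.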
>>> ps = nn.PixelShuffle(3)
>>> input = torch.randn(1, 9, 4, 4)
>>> output = ps(input)
>>> print(output.size())
torch.Size([1, 1, 12, 12])
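The rearrangement that PixelShuffle performs can be written out by hand with reshape and permute. The sketch below is an illustration of the mapping rather than PyTorch's actual implementation, and the variable names are chosen here: each group of r × r channels is spread over an r × r block of output pixels.

```python
import torch
import torch.nn as nn

r = 3
x = torch.randn(1, 2 * r * r, 4, 4)   # [N, C * r^2, H, W] with C = 2
n, c_in, h, w = x.shape
c_out = c_in // (r * r)

manual = (
    x.reshape(n, c_out, r, r, h, w)    # split channels into (C, r, r)
     .permute(0, 1, 4, 2, 5, 3)        # reorder to (N, C, H, r, W, r)
     .reshape(n, c_out, h * r, w * r)  # merge into (N, C, H * r, W * r)
)
builtin = nn.PixelShuffle(r)(x)

print(builtin.shape)                 # torch.Size([1, 2, 12, 12])
print(torch.equal(manual, builtin))
```

No values are computed or interpolated; the elements are only moved, which is why the total number of elements stays the same.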
