A Walkthrough of the HourglassNet Architecture


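HourglassNet ("hourglass network", 沙漏網絡 in Chinese) was originally used for human keypoint detection. Code: https://github.com/bearpaw/pytorch-pose

It was later widely adopted in other fields. The one I know of is stereo depth estimation; I plan to write a separate blog post on that topic soon, so here is only a brief introduction. In stereo depth estimation, the hourglass was first used in PSMNet (https://github.com/JiaRenChang/PSMNet), and many later stereo works inherited this way of using it, for example GwcNet (https://github.com/xy-guo/GwcNet).

This post explains the network structure of HourglassNet in detail. The code is publicly available, and this post follows the PyTorch implementation at: https://github.com/bearpaw/pytorch-pose/blob/master/pose/models/hourglass.py

The code is as follows: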

import torch.nn as nn
import torch.nn.functional as F
# from tensorboardX import SummaryWriter  # only needed for the commented-out graph export at the bottom
# from .preresnet import BasicBlock, Bottleneck
import torch
from torch.autograd import Variable

class Bottleneck(nn.Module):
    expansion = 2

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()

        self.bn1 = nn.BatchNorm2d(inplanes)
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=True)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=True)
        self.bn3 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 2, kernel_size=1, bias=True)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.bn1(x)
        out = self.relu(out)
        out = self.conv1(out)

        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv2(out)

        out = self.bn3(out)
        out = self.relu(out)
        out = self.conv3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual

        return out

# The hourglass is essentially one large auto-encoder
class Hourglass(nn.Module):
    def __init__(self, block, num_blocks, planes, depth):
        super(Hourglass, self).__init__()
        self.depth = depth
        self.block = block
        self.hg = self._make_hour_glass(block, num_blocks, planes, depth)

    def _make_residual(self, block, num_blocks, planes):
        layers = []
        for i in range(0, num_blocks):
            layers.append(block(planes*block.expansion, planes))
        return nn.Sequential(*layers)

    def _make_hour_glass(self, block, num_blocks, planes, depth):
        hg = []
        for i in range(depth):
            res = []
            for j in range(3):
                res.append(self._make_residual(block, num_blocks, planes))
            if i == 0:
                res.append(self._make_residual(block, num_blocks, planes))
            hg.append(nn.ModuleList(res))
        return nn.ModuleList(hg)

    def _hour_glass_forward(self, n, x):
        up1 = self.hg[n-1][0](x)
        low1 = F.max_pool2d(x, 2, stride=2)
        low1 = self.hg[n-1][1](low1)

        if n > 1:
            low2 = self._hour_glass_forward(n-1, low1)
        else:
            low2 = self.hg[n-1][3](low1)
        low3 = self.hg[n-1][2](low2)
        up2 = F.interpolate(low3, scale_factor=2)
        out = up1 + up2
        return out

    def forward(self, x):
        return self._hour_glass_forward(self.depth, x)


class HourglassNet(nn.Module):
    '''Hourglass model from Newell et al ECCV 2016'''
    def __init__(self, block, num_stacks=2, num_blocks=4, num_classes=16):
        super(HourglassNet, self).__init__()

        self.inplanes = 64
        self.num_feats = 128
        self.num_stacks = num_stacks
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=True)
        self.bn1 = nn.BatchNorm2d(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self._make_residual(block, self.inplanes, 1)
        self.layer2 = self._make_residual(block, self.inplanes, 1)
        self.layer3 = self._make_residual(block, self.num_feats, 1)
        self.maxpool = nn.MaxPool2d(2, stride=2)

        # build hourglass modules
        ch = self.num_feats*block.expansion
        hg, res, fc, score, fc_, score_ = [], [], [], [], [], []
        for i in range(num_stacks):
            hg.append(Hourglass(block, num_blocks, self.num_feats, 4))
            res.append(self._make_residual(block, self.num_feats, num_blocks))
            fc.append(self._make_fc(ch, ch))
            score.append(nn.Conv2d(ch, num_classes, kernel_size=1, bias=True))
            if i < num_stacks-1:
                fc_.append(nn.Conv2d(ch, ch, kernel_size=1, bias=True))
                score_.append(nn.Conv2d(num_classes, ch, kernel_size=1, bias=True))
        self.hg = nn.ModuleList(hg)
        self.res = nn.ModuleList(res)
        self.fc = nn.ModuleList(fc)
        self.score = nn.ModuleList(score)
        self.fc_ = nn.ModuleList(fc_)
        self.score_ = nn.ModuleList(score_)

    def _make_residual(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=True),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def _make_fc(self, inplanes, outplanes):
        bn = nn.BatchNorm2d(inplanes)
        conv = nn.Conv2d(inplanes, outplanes, kernel_size=1, bias=True)
        return nn.Sequential(
                conv,
                bn,
                self.relu,
            )

    def forward(self, x):
        out = []
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.layer1(x)
        x = self.maxpool(x)
        x = self.layer2(x)
        x = self.layer3(x)

        for i in range(self.num_stacks):
            y = self.hg[i](x)
            y = self.res[i](y)
            y = self.fc[i](y)
            score = self.score[i](y)
            out.append(score)
            if i < self.num_stacks-1:
                fc_ = self.fc_[i](y)
                score_ = self.score_[i](score)
                x = x + fc_ + score_

        return out


if __name__ == "__main__":
    model = HourglassNet(Bottleneck, num_stacks=2, num_blocks=4, num_classes=2)
    model2 = Hourglass(block=Bottleneck, num_blocks=4, planes=128, depth=4)
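    # Variable is a no-op wrapper in PyTorch >= 0.4; plain tensors work directly.
    input_data = torch.rand(2, 3, 256, 256)
    input_data2 = torch.rand(2, 256, 64, 64)

    output = model(input_data)
    # One heatmap tensor per stack, each of shape (2, num_classes, 64, 64).
    print([o.shape for o in output])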
    # writer = SummaryWriter(log_dir='../log', comment='source_arc')
    # with writer:
    #     writer.add_graph(model2, (input_data2, ))

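Let's go through it step by step.

In earlier auto-encoders the smallest building unit was typically a single convolutional layer; here the smallest unit is a Bottleneck.

The author first defines the Hourglass module. Its detailed structure is shown below; the image is fairly large, so open it in a new window to view it at full resolution.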

 

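To avoid confusion, let me clarify a few terms:

Bottleneck blocks make up the Hourglass module.

Hourglass modules, together with other modules, make up the final HourglassNet.

The Bottleneck module code is as follows: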

class Bottleneck(nn.Module):
    expansion = 2

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()

        self.bn1 = nn.BatchNorm2d(inplanes)
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=True)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=True)
        self.bn3 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 2, kernel_size=1, bias=True)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.bn1(x)
        out = self.relu(out)
        out = self.conv1(out)

        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv2(out)

        out = self.bn3(out)
        out = self.relu(out)
        out = self.conv3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual

        return out
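To make the channel bookkeeping of the Bottleneck easier to see, here is a compact restatement of the block as a sketch. The class name PreActBottleneck and the internally created skip projection are my own simplifications (the listing above receives the projection through the downsample argument instead); the layer order BN, ReLU, conv repeated three times follows the listing.

```python
import torch
import torch.nn as nn


class PreActBottleneck(nn.Module):
    """Sketch of the Bottleneck above: (BN -> ReLU -> conv) x 3 plus a skip connection."""
    expansion = 2

    def __init__(self, inplanes, planes):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(inplanes), nn.ReLU(),
            nn.Conv2d(inplanes, planes, kernel_size=1),           # squeeze channels
            nn.BatchNorm2d(planes), nn.ReLU(),
            nn.Conv2d(planes, planes, kernel_size=3, padding=1),  # spatial mixing
            nn.BatchNorm2d(planes), nn.ReLU(),
            nn.Conv2d(planes, planes * self.expansion, kernel_size=1),  # expand channels
        )
        # Simplification: build the 1x1 projection here, only when channel counts differ.
        self.skip = None
        if inplanes != planes * self.expansion:
            self.skip = nn.Conv2d(inplanes, planes * self.expansion, kernel_size=1)

    def forward(self, x):
        residual = x if self.skip is None else self.skip(x)
        return self.body(x) + residual


# Inside the hourglass, blocks are built as block(planes * expansion, planes),
# so input and output channel counts match and no projection is needed.
block = PreActBottleneck(inplanes=256, planes=128).eval()
x = torch.rand(1, 256, 64, 64)
with torch.no_grad():
    y = block(x)
print(x.shape, y.shape)
```

With planes=128 and expansion=2, the block takes 256 channels in and gives 256 channels out, which is why Bottlenecks can be chained freely inside the Hourglass module.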

The Hourglass module code is as follows:

# The hourglass is essentially one large auto-encoder
class Hourglass(nn.Module):
    def __init__(self, block, num_blocks, planes, depth):
        super(Hourglass, self).__init__()
        self.depth = depth
        self.block = block
        self.hg = self._make_hour_glass(block, num_blocks, planes, depth)

    def _make_residual(self, block, num_blocks, planes):
        layers = []
        for i in range(0, num_blocks):
            layers.append(block(planes*block.expansion, planes))
        return nn.Sequential(*layers)

    def _make_hour_glass(self, block, num_blocks, planes, depth):
        hg = []
        for i in range(depth):
            res = []
            for j in range(3):
                res.append(self._make_residual(block, num_blocks, planes))
            if i == 0:
                res.append(self._make_residual(block, num_blocks, planes))
            hg.append(nn.ModuleList(res))
        return nn.ModuleList(hg)

    def _hour_glass_forward(self, n, x):
        up1 = self.hg[n-1][0](x)
        low1 = F.max_pool2d(x, 2, stride=2)
        low1 = self.hg[n-1][1](low1)

        if n > 1:
            low2 = self._hour_glass_forward(n-1, low1)
        else:
            low2 = self.hg[n-1][3](low1)
        low3 = self.hg[n-1][2](low2)
        up2 = F.interpolate(low3, scale_factor=2)
        out = up1 + up2
        return out

    def forward(self, x):
        return self._hour_glass_forward(self.depth, x)
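The recursion in _hour_glass_forward is easier to follow with a trace of the spatial sizes it visits. The sketch below keeps the same recursive structure but, as an assumption made purely for brevity, replaces each stack of Bottlenecks with a single 3x3 convolution; TinyHourglass, conv_block and the trace list are hypothetical names that do not exist in the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(ch):
    # Stand-in for a stack of Bottlenecks (assumption, to keep the sketch short).
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())


class TinyHourglass(nn.Module):
    def __init__(self, ch, depth):
        super().__init__()
        self.depth = depth
        # Per level: [skip branch, before recursion, after recursion];
        # level 0 additionally holds the innermost block, as in the listing above.
        self.levels = nn.ModuleList([
            nn.ModuleList([conv_block(ch) for _ in range(4 if i == 0 else 3)])
            for i in range(depth)
        ])

    def _forward(self, n, x, trace):
        trace.append(tuple(x.shape[-2:]))
        up1 = self.levels[n - 1][0](x)                    # skip branch, full resolution
        low1 = self.levels[n - 1][1](F.max_pool2d(x, 2, stride=2))
        if n > 1:
            low2 = self._forward(n - 1, low1, trace)      # go one level deeper
        else:
            trace.append(tuple(low1.shape[-2:]))
            low2 = self.levels[0][3](low1)                # innermost block
        low3 = self.levels[n - 1][2](low2)
        up2 = F.interpolate(low3, scale_factor=2)         # back to this level's size
        return up1 + up2

    def forward(self, x):
        trace = []
        return self._forward(self.depth, x, trace), trace


hg = TinyHourglass(ch=8, depth=4).eval()
x = torch.rand(1, 8, 64, 64)
with torch.no_grad():
    y, trace = hg(x)
print(trace)    # spatial sizes on the way down: 64, 32, 16, 8, 4
print(y.shape)  # same shape as the input
```

One consequence of this structure is that the input height and width should be divisible by 2**depth (16 for depth=4); otherwise the upsampled branch and the skip branch end up with different sizes and the addition fails.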

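The Bottleneck module is used not only here but also later in the overall network.

As shown in the figure above, the Bottleneck serves as the basic unit from which the Hourglass module is built, and the resulting network is quite large. In the middle, max pooling reduces the spatial resolution, and F.interpolate then restores it. F.interpolate takes a scale_factor argument that specifies the scaling ratio, so it replaces the more involved transposed convolution with a direct resize by a fixed factor. I do not fully understand all the differences between this function and transposed convolution; one clear difference is that F.interpolate defaults to nearest-neighbour mode and has no learnable parameters, whereas a transposed convolution learns its upsampling weights.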
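A small comparison may help with the question about F.interpolate versus transposed convolution. The sketch below upsamples the same tensor both ways; the tensor sizes and the kernel_size=2, stride=2 configuration of nn.ConvTranspose2d are my own choices for illustration and do not come from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(1, 256, 16, 16)

# Nearest-neighbour upsampling: each value is repeated in a 2x2 patch, nothing is learned.
up_nearest = F.interpolate(x, scale_factor=2)  # default mode is 'nearest'

# Transposed convolution: also doubles the resolution, but through learned weights.
deconv = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2)
with torch.no_grad():
    up_deconv = deconv(x)

n_params = sum(p.numel() for p in deconv.parameters())
print(up_nearest.shape, up_deconv.shape)  # both are (1, 256, 32, 32)
print(n_params)                           # 256*256*2*2 weights + 256 biases

# Sampling every second pixel of the nearest-neighbour output recovers the input exactly.
print(torch.equal(up_nearest[..., ::2, ::2], x))
```

Both produce the same output shape, so the choice is between a fixed, parameter-free resize and a learned one that costs extra parameters and memory.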

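This essentially forms one large auto-encoder. Traditionally, for segmentation or other dense prediction tasks, the design would stop here, because a single auto-encoder can already solve the problem. The author does not stop there: this architecture is treated as a basic unit and stacked, and many such units can be repeated to improve accuracy. GPU memory is clearly a major bottleneck, so the author stacks only two in the experiments (num_stacks=2 is also the default in the code above), see the figure below.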

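Before the two stacks, the features clearly need to be downsampled. The author's approach here is fairly blunt: a 7x7 stride-2 convolution and a 2x2 max pool reduce the resolution by a factor of 4, and three layers (layer1 to layer3) extract high-level features along the way. In the code above each of these layers is built with blocks=1, so each contains a single Bottleneck, three Bottlenecks in total. The author also explains the motivation in the paper: keypoint detection relies on high-level semantic information, so extra layers are needed.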
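The downsampling done by this front end can be checked with a few lines. The sketch below keeps only the layers that change the spatial size (the 7x7 stride-2 convolution and the max pool) and leaves out the three Bottleneck layers, which use stride 1 and therefore only change the channel count (64 to 128 to 256 to 256 according to the listing). The name stem is my own.

```python
import torch
import torch.nn as nn

# Only the resolution-changing part of the front end; the Bottleneck layers are omitted.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),  # 256 -> 128
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2),                             # 128 -> 64
).eval()

x = torch.rand(1, 3, 256, 256)
shapes = []
with torch.no_grad():
    for layer in stem:
        x = layer(x)
        shapes.append(tuple(x.shape))

for layer, shape in zip(stem, shapes):
    print(type(layer).__name__, shape)
```

A 256x256 image therefore reaches the first hourglass at 64x64, which matches the (2, 256, 64, 64) test input used for the standalone Hourglass in the main block of the listing.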

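By this point the network already has a considerable number of parameters, yet the author still appends two hourglass structures. Each hourglass is followed by an output (the red parts in the figure above), so there are actually two outputs, which amounts to adding supervision at an intermediate stage. To keep the channel counts consistent, a score_ module maps the prediction back to the feature channel count, and the result is added to the output of fc_ and to the stack's input x to form the input of the next stack.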
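Since forward returns a list with one prediction per stack, intermediate supervision can be sketched as a sum of per-stack losses. The snippet below is only an illustration under assumptions: the random tensors stand in for the network outputs and ground-truth heatmaps, and mean squared error is my choice of loss rather than something taken from the code above.

```python
import torch
import torch.nn.functional as F

num_stacks, num_classes = 2, 16

# Stand-ins for the list returned by HourglassNet.forward: one heatmap tensor per stack.
outputs = [torch.rand(2, num_classes, 64, 64, requires_grad=True)
           for _ in range(num_stacks)]
target = torch.rand(2, num_classes, 64, 64)  # hypothetical ground-truth heatmaps

# Intermediate supervision: every stack's prediction is compared with the same target,
# so the earlier stack receives a gradient signal of its own.
loss = sum(F.mse_loss(o, target) for o in outputs)
loss.backward()

print(loss.item())
print([o.grad is not None for o in outputs])
```

At inference time one would typically use only the last element of the list as the final prediction.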

In the figure above, each hourglass is followed by a res module made up of 4 Bottlenecks; I am not sure why the author adds this extra res module here.

An fc module then fuses the channels, and finally the score module makes the number of output channels match the ground truth.

That is roughly how it works.
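The channel flow through fc, score, fc_ and score_ can be traced with stand-in tensors. This sketch rebuilds the four modules with the same layer types as in the listing, using ch=256 and num_classes=16 (the listing's defaults); the random tensors x and y replace the real stack input and the hourglass plus res output.

```python
import torch
import torch.nn as nn

ch, num_classes = 256, 16

fc = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.ReLU())
score = nn.Conv2d(ch, num_classes, 1)    # features -> heatmaps
fc_ = nn.Conv2d(ch, ch, 1)               # features -> features
score_ = nn.Conv2d(num_classes, ch, 1)   # heatmaps -> features
for m in (fc, score, fc_, score_):
    m.eval()

x = torch.rand(1, ch, 64, 64)            # input to the current stack
y = torch.rand(1, ch, 64, 64)            # stand-in for the hourglass + res output

with torch.no_grad():
    y = fc(y)
    heatmap = score(y)                      # supervised output of this stack
    next_x = x + fc_(y) + score_(heatmap)   # input to the next stack

print(heatmap.shape)  # (1, 16, 64, 64): one channel per keypoint class
print(next_x.shape)   # (1, 256, 64, 64): back to the feature channel count
```

The three terms can only be added because score_ brings the 16-channel prediction back to 256 channels, which is the remapping described above.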

 

