Project Notes on "DeepLung: Deep 3D Dual Path Nets for Automated Pulmonary Nodule Detection and Classification" (Part 2, Section 1): Model Design


I will only cover the detection part of the model; I did not run the benign/malignant classification experiments that come later. For detection, the paper uses the U-Net structure adopted by many pulmonary nodule detection papers, or more precisely, a 3D U-Net built from DPN blocks. See the figure below.

DPN comes from Yan Shuicheng's (顏水成) team. Put simply, it combines dense and residual connections: as shown in the figure above, part of the input feature map is added to the output through the residual path, while the other part is concatenated with the residual result. Personally, I find this network messy, and a network that is not clean is not a good network. Luckily, the paper also provides a residual-only version, so what I will actually walk through is the residual-only U-Net, shown in the next figure.

As the figure shows, the input is a 96×96×96 cube containing the annotated nodule. It first passes through 24 convolution kernels of size 3×3×3, bringing the channel count to 24, and then through 4 stages that shrink the spatial size to 1/16. Next comes the upsampling stage, implemented with deconvolution (transposed convolution): in each of two consecutive stages, the deconvolved feature map is concatenated with the corresponding lower-level feature map. Two final convolutions then bring the channel count to 15, drawn as 3×5 in the figure to make the meaning clear: at every output position there are three proposals, one per anchor size, with the three anchor sizes set to [5, 10, 20]; for each anchor the network predicts z, y, x, d, p, i.e., the nodule's 3D coordinates, its diameter, and a confidence.
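To make the output layout concrete, here is a minimal sketch of the reshape done at the end of the forward pass. The config dict and its 'anchors' key follow the repository's convention, but treat the exact values here as assumptions.

import torch

# Assumed anchor configuration: three anchor diameters, as described above.
config = {'anchors': [5.0, 10.0, 20.0]}

batch, grid = 2, 24   # the output grid is 1/4 of the 96-voxel input, i.e. 24
raw = torch.randn(batch, 5 * len(config['anchors']), grid, grid, grid)   # head output: 15 channels per voxel

# The same reshape as at the end of Net.forward:
# (b, 15, 24, 24, 24) -> (b, 24, 24, 24, 3, 5), i.e. 5 values per anchor per voxel.
out = raw.view(batch, 5 * len(config['anchors']), -1)
out = out.transpose(1, 2).contiguous().view(batch, grid, grid, grid,
                                            len(config['anchors']), 5)
print(out.shape)  # torch.Size([2, 24, 24, 24, 3, 5])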

Now let's look at the source code, which uses the PyTorch framework.

First, the design of the residual block, found in layers.py:

class PostRes(nn.Module):
    def __init__(self, n_in, n_out, stride = 1):
        super(PostRes, self).__init__()
        # main path: conv-bn-relu-conv-bn
        self.conv1 = nn.Conv3d(n_in, n_out, kernel_size = 3, stride = stride, padding = 1)
        self.bn1 = nn.BatchNorm3d(n_out)
        self.relu = nn.ReLU(inplace = True)
        self.conv2 = nn.Conv3d(n_out, n_out, kernel_size = 3, padding = 1)
        self.bn2 = nn.BatchNorm3d(n_out)

        # skip connection: a 1x1 conv when the stride or channel count changes, otherwise identity
        if stride != 1 or n_out != n_in:
            self.shortcut = nn.Sequential(
                nn.Conv3d(n_in, n_out, kernel_size = 1, stride = stride),
                nn.BatchNorm3d(n_out))
        else:
            self.shortcut = None

    def forward(self, x):
        residual = x
        if self.shortcut is not None:
            residual = self.shortcut(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        
        out += residual
        out = self.relu(out)
        return out

As you can see, the structure is basically the same as the 2D residual block: conv-bn-relu throughout, with either an identity mapping or a 1×1 convolution used as the skip connection, depending on the stride and on whether the input and output channel counts match.
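As a quick shape check, here is a minimal usage sketch of PostRes; the tensor sizes are just illustrative.

import torch

# Channel change (24 -> 32) triggers the 1x1-convolution shortcut; stride 1 keeps the spatial size.
block = PostRes(24, 32, stride=1)
x = torch.randn(2, 24, 24, 24, 24)   # (batch, channels, D, H, W)
print(block(x).shape)                # torch.Size([2, 32, 24, 24, 24])

# Same channel count and stride 1 -> identity shortcut (self.shortcut is None).
block_id = PostRes(32, 32)
print(block_id(torch.randn(2, 32, 24, 24, 24)).shape)  # torch.Size([2, 32, 24, 24, 24])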

Next comes the network itself, in res18.py:

class Net(nn.Module):  
    def __init__(self):
        super(Net, self).__init__()
        # The first few layers consume the most memory, so use simple convolutions to save memory.
        # Call these layers preBlock, i.e., before the residual blocks of later layers.
        self.preBlock = nn.Sequential(
            nn.Conv3d(1, 24, kernel_size = 3, padding = 1),
            nn.BatchNorm3d(24),
            nn.ReLU(inplace = True),
            nn.Conv3d(24, 24, kernel_size = 3, padding = 1),
            nn.BatchNorm3d(24),
            nn.ReLU(inplace = True))

        # 3 poolings, each pooling downsamples the feature map by a factor 2.
        # 3 groups of blocks. The first block of each group has one pooling.
        num_blocks_forw = [2,2,3,3]
        num_blocks_back = [3,3]
        self.featureNum_forw = [24,32,64,64,64]
        self.featureNum_back = [128,64,64]
        for i in range(len(num_blocks_forw)):
            blocks = []
            for j in range(num_blocks_forw[i]):
                if j == 0:
                    blocks.append(PostRes(self.featureNum_forw[i], self.featureNum_forw[i+1]))
                else:
                    blocks.append(PostRes(self.featureNum_forw[i+1], self.featureNum_forw[i+1]))
            setattr(self, 'forw' + str(i + 1), nn.Sequential(*blocks))

        for i in range(len(num_blocks_back)):
            blocks = []
            for j in range(num_blocks_back[i]):
                if j == 0:
                    # back2 (i == 0) also receives the 3-channel coord input, hence the extra 3 channels.
                    if i == 0:
                        addition = 3
                    else:
                        addition = 0
                    blocks.append(PostRes(self.featureNum_back[i+1] + self.featureNum_forw[i+2] + addition,
                                          self.featureNum_back[i]))
                else:
                    blocks.append(PostRes(self.featureNum_back[i], self.featureNum_back[i]))
            setattr(self, 'back' + str(i + 2), nn.Sequential(*blocks))

        self.maxpool1 = nn.MaxPool3d(kernel_size = 2, stride = 2, return_indices = True)
        self.maxpool2 = nn.MaxPool3d(kernel_size = 2, stride = 2, return_indices = True)
        self.maxpool3 = nn.MaxPool3d(kernel_size = 2, stride = 2, return_indices = True)
        self.maxpool4 = nn.MaxPool3d(kernel_size = 2, stride = 2, return_indices = True)
        self.unmaxpool1 = nn.MaxUnpool3d(kernel_size = 2, stride = 2)
        self.unmaxpool2 = nn.MaxUnpool3d(kernel_size = 2, stride = 2)

        # "path" modules: transposed convolutions that double the spatial resolution.
        self.path1 = nn.Sequential(
            nn.ConvTranspose3d(64, 64, kernel_size = 2, stride = 2),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace = True))
        self.path2 = nn.Sequential(
            nn.ConvTranspose3d(64, 64, kernel_size = 2, stride = 2),
            nn.BatchNorm3d(64),  # the blog snippet had 64*k, but k is undefined in res18.py; 64 matches the deconvolution output
            nn.ReLU(inplace = True))
        self.drop = nn.Dropout3d(p = 0.5, inplace = False)
        self.output = nn.Sequential(nn.Conv3d(self.featureNum_back[0], 64, kernel_size = 1),
                                    nn.ReLU(),
                                    #nn.Dropout3d(p = 0.3),
                                    nn.Conv3d(64, 5 * len(config['anchors']), kernel_size = 1))

    def forward(self, x, coord):
        out = self.preBlock(x)  # 24
        out_pool, indices0 = self.maxpool1(out)
        out1 = self.forw1(out_pool)  # 32
        out1_pool, indices1 = self.maxpool2(out1)
        out2 = self.forw2(out1_pool)  # 64
        #out2 = self.drop(out2)
        out2_pool, indices2 = self.maxpool3(out2)
        out3 = self.forw3(out2_pool)  # 64
        out3_pool, indices3 = self.maxpool4(out3)
        out4 = self.forw4(out3_pool)  # 64
        #out4 = self.drop(out4)
        rev3 = self.path1(out4)
        comb3 = self.back3(torch.cat((rev3, out3), 1))  # 64 + 64
        #comb3 = self.drop(comb3)
        rev2 = self.path2(comb3)
        comb2 = self.back2(torch.cat((rev2, out2, coord), 1))  # 64 + 64 + 3 in, 128 out
        comb2 = self.drop(comb2)
        out = self.output(comb2)
        size = out.size()
        out = out.view(out.size(0), out.size(1), -1)
        #out = out.transpose(1, 4).transpose(1, 2).transpose(2, 3).contiguous()
        out = out.transpose(1, 2).contiguous().view(size[0], size[2], size[3], size[4], len(config['anchors']), 5)
        #out = out.view(-1, 5)
        return out

One part of the code that is a bit convoluted is the loop-based construction of the forw and back modules; personally, I think writing each module out explicitly would be clearer, even if it takes a few more lines. Also note that the path modules are simply the deconvolution blocks.
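To verify the overall shapes, here is a minimal, hypothetical forward pass. It assumes the config dict with config['anchors'] = [5., 10., 20.] defined in res18.py is in scope, and a random tensor stands in for the 3-channel coordinate map the repository feeds as coord.

import torch

net = Net()
x = torch.randn(1, 1, 96, 96, 96)      # a single-channel 96x96x96 CT crop
coord = torch.randn(1, 3, 24, 24, 24)  # placeholder for the coordinate map at 1/4 resolution,
                                       # concatenated with the features right before back2
with torch.no_grad():
    out = net(x, coord)
print(out.shape)  # expected: torch.Size([1, 24, 24, 24, 3, 5])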

That is the whole network structure. The real difficulty lies in the definition of the loss and in the label mapping. Below we look at the loss definition; label mapping and data augmentation will wait for the middle and final parts of these notes.

The loss is also defined as a PyTorch module, in layers.py.

Here is the code.

class Loss(nn.Module):
    def __init__(self, num_hard = 0):
        super(Loss, self).__init__()
        self.sigmoid = nn.Sigmoid()
        self.classify_loss = nn.BCELoss() # binary cross-entropy loss for the confidence
        self.regress_loss = nn.SmoothL1Loss() # smooth L1 loss for the regression targets
        self.num_hard = num_hard # number of hard negatives kept per sample (hard mining)

    def forward(self, output, labels, train = True):
        batch_size = labels.size(0) # number of samples in the batch (labels' 0th dimension)
        output = output.view(-1, 5) # flatten so that each row is one anchor's 5 outputs
        labels = labels.view(-1, 5) # reshape the labels the same way

        pos_idcs = labels[:, 0] > 0.5 # boolean mask over anchors, True where the label marks a positive anchor
        pos_idcs = pos_idcs.unsqueeze(1).expand(pos_idcs.size(0), 5) # expand the mask to all 5 columns so it can index full rows
        pos_output = output[pos_idcs].view(-1, 5) # outputs at the positive anchors
        pos_labels = labels[pos_idcs].view(-1, 5) # the positive labels themselves

        neg_idcs = labels[:, 0] < -0.5 # likewise, mask of the negative anchors (labelled with a negative value)
        neg_output = output[:, 0][neg_idcs] # note: unlike above, only the confidence matters for negatives, since their position and diameter do not enter the loss
        neg_labels = labels[:, 0][neg_idcs]

        if self.num_hard > 0 and train: # if hard mining is enabled
            neg_output, neg_labels = hard_mining(neg_output, neg_labels, self.num_hard * batch_size) # keep only the highest-confidence negatives; easy negatives are small fry and not worth worrying about
        neg_prob = self.sigmoid(neg_output) # squash the negative outputs to (0, 1) so they can be read as confidences; the network has no final sigmoid, so its raw outputs are unbounded
                         # I do not quite get why the sigmoid is not simply added to the network itself
        #classify_loss = self.classify_loss(
         #   torch.cat((pos_prob, neg_prob), 0),
          #  torch.cat((pos_labels[:, 0], neg_labels + 1), 0))
        if len(pos_output)>0:
            pos_prob = self.sigmoid(pos_output[:, 0]) # sigmoid on the positive anchors' confidence
            pz, ph, pw, pd = pos_output[:, 1], pos_output[:, 2], pos_output[:, 3], pos_output[:, 4] # predicted z, h, w, d, to be matched against the labels
            lz, lh, lw, ld = pos_labels[:, 1], pos_labels[:, 2], pos_labels[:, 3], pos_labels[:, 4] # label z, h, w, d, to be matched against the predictions

            regress_losses = [              # regression losses
                self.regress_loss(pz, lz),
                self.regress_loss(ph, lh),
                self.regress_loss(pw, lw),
                self.regress_loss(pd, ld)]
            regress_losses_data = [l.data[0] for l in regress_losses] # on PyTorch >= 0.4 this would be l.item()
            classify_loss = 0.5 * self.classify_loss(                # classification loss, computed separately for positives and negatives
            pos_prob, pos_labels[:, 0]) + 0.5 * self.classify_loss(
            neg_prob, neg_labels + 1)
            pos_correct = (pos_prob.data >= 0.5).sum() # positives whose predicted confidence really is >= 0.5 are correctly predicted positives
            pos_total = len(pos_prob) # total number of positives

        else: # no positive labels: the regression loss is zeroed (negatives contribute no regression loss anyway) and only the negatives' classification loss is computed
            regress_losses = [0,0,0,0]
            classify_loss =  0.5 * self.classify_loss(
            neg_prob, neg_labels + 1)
            pos_correct = 0 # no positive anchors or labels in this batch
            pos_total = 0 # so the total is 0 as well
            regress_losses_data = [0,0,0,0]
        classify_loss_data = classify_loss.data[0] # .item() on PyTorch >= 0.4

        #loss = classify_loss # pytorch 0.4
        loss = classify_loss.clone()
        for regress_loss in regress_losses: # add the regression losses to the classification loss to get the total (scalar) loss
            loss += regress_loss

        neg_correct = (neg_prob.data < 0.5).sum() # negatives whose predicted confidence really is < 0.5 are correctly predicted negatives
        neg_total = len(neg_prob) # total number of negatives

        return [loss, classify_loss_data] + regress_losses_data + [pos_correct, pos_total, neg_correct, neg_total]
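The hard_mining helper called in the loss is not reproduced in this post; a minimal top-k sketch in the same spirit (the actual implementation in layers.py may differ in details) looks like this:

import torch

def hard_mining(neg_output, neg_labels, num_hard):
    # Keep only the num_hard negatives with the highest predicted confidence,
    # i.e. the hardest, most confusing negative anchors.
    num_hard = min(num_hard, len(neg_output))
    _, idcs = torch.topk(neg_output, num_hard)
    return neg_output[idcs], neg_labels[idcs]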

The explanations of the loss are in the comments alongside the code. One thing I did not fully understand at first: why add 1 to the confidence when computing the negative-sample loss? Presumably the negative anchors are assigned the label -1 during label mapping, so adding 1 maps them to 0, the target value nn.BCELoss expects. That raises another question: what confidence is assigned to anchors that are neither positive nor negative? It should not be random; perhaps they are set to 0, in which case they fall outside both the > 0.5 and < -0.5 masks and simply never contribute to the loss.
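A tiny sketch of that label convention; the 1 / -1 / 0 values are my assumption, inferred from the masks in the loss code above.

import torch

# Assumed per-anchor confidence labels: 1 = positive, -1 = negative, 0 = ignored.
conf_labels = torch.tensor([1., -1., 0., -1.])

pos_mask = conf_labels > 0.5    # tensor([ True, False, False, False])
neg_mask = conf_labels < -0.5   # tensor([False,  True, False,  True]); the 0-labelled anchor is in neither mask

neg_targets = conf_labels[neg_mask] + 1   # -1 + 1 = 0, the target nn.BCELoss expects for negatives
print(pos_mask, neg_mask, neg_targets)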

In the middle part, I would like to cover label mapping and data augmentation, but I do not fully understand them yet, so please bear with me. Once the middle part is done, the final part will briefly cover training, validation, and testing; with all of that finished, the three parts of these DeepLung notes will form a complete whole.

