6. Custom Loss Functions

This article is reposted from the original (2022-03-19 22:24).

6.1 Custom Loss Functions

Common loss functions in the torch.nn module: MSELoss, L1Loss, BCELoss, ...

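Non-official losses: DiceLoss, HuberLoss, SobolevLoss, ... (recent PyTorch versions do ship nn.HuberLoss). Such loss functions target specific, less general models, and PyTorch cannot add all of them to the library, so implementing them requires writing a custom loss function.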

Goal: learn how to define a custom loss function.

6.1.1 Defining a loss as a function

For example:

import torch

def my_loss(output, target):
    # mean squared error built only from tensor ops, so autograd can differentiate it
    loss = torch.mean((output - target) ** 2)
    return loss
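A quick sanity check, assuming plain float tensors of arbitrary shape: because my_loss uses only PyTorch operations, autograd can differentiate it, and its value should match the built-in nn.MSELoss with its default mean reduction.

```python
import torch
import torch.nn as nn

def my_loss(output, target):
    # mean squared error written with tensor ops only
    return torch.mean((output - target) ** 2)

torch.manual_seed(0)
output = torch.randn(4, 3, requires_grad=True)  # stand-in for model predictions
target = torch.randn(4, 3)                      # stand-in for labels

loss = my_loss(output, target)
builtin = nn.MSELoss()(output, target)  # reduction='mean' by default

loss.backward()  # gradients flow back to `output`
print(loss.item(), builtin.item())
print(output.grad.shape)
```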

6.1.2 Defining a loss as a class

Defining a loss as a class is more common. Looking at the inheritance of the built-in losses, some inherit from _Loss and some from _WeightedLoss; _WeightedLoss itself inherits from _Loss, and _Loss inherits from nn.Module.

A loss function can therefore be treated like a layer of the network, so our custom loss class should likewise inherit from nn.Module.

Take DiceLoss as an example:

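Dice Loss is a loss function commonly used in segmentation. It is based on the Dice similarity coefficient (DSC) between the predicted region X and the ground-truth region Y, and the loss is 1 - DSC:

\[ DSC = \frac{2|X \cap Y|}{|X| + |Y|} \]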
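As a small worked example with two hypothetical pixel sets, take X = {1, 2, 3} (predicted foreground) and Y = {2, 3, 4} (ground truth). They share two pixels, so:

```latex
\[ DSC = \frac{2|X \cap Y|}{|X| + |Y|} = \frac{2 \cdot 2}{3 + 3} = \frac{2}{3}, \qquad 1 - DSC = \frac{1}{3} \]
```

The implementation computes a "soft" version of this: the intersection becomes the sum of element-wise products of sigmoid probabilities and labels, and a smoothing term is added to the numerator and denominator.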

The implementation:

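import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(DiceLoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = torch.sigmoid(inputs)  # logits -> probabilities in [0, 1]
        # flatten predictions and labels into 1-D vectors
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        intersection = (inputs * targets).sum()  # soft |X ∩ Y|
        # smooth keeps the ratio defined when both X and Y are empty
        dice = (2. * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)
        return 1 - dice

# Usage: inputs are raw logits, targets a binary mask of the same shape
criterion = DiceLoss()
loss = criterion(inputs, targets)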

Other common losses include BCE-Dice Loss, Jaccard/Intersection over Union (IoU) Loss, Focal Loss, ...

class DiceBCELoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(DiceBCELoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = torch.sigmoid(inputs)  # logits -> probabilities
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        intersection = (inputs * targets).sum()
        dice_loss = 1 - (2. * intersection + smooth) / (inputs.sum() + targets.sum() + smooth)
        BCE = F.binary_cross_entropy(inputs, targets, reduction='mean')
        Dice_BCE = BCE + dice_loss  # the two terms are weighted equally

        return Dice_BCE
--------------------------------------------------------------------
    
class IoULoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(IoULoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = torch.sigmoid(inputs)  # logits -> probabilities
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        intersection = (inputs * targets).sum()
        total = (inputs + targets).sum()  # |X| + |Y|
        union = total - intersection      # |X ∪ Y| = |X| + |Y| - |X ∩ Y|

        IoU = (intersection + smooth) / (union + smooth)

        return 1 - IoU
--------------------------------------------------------------------
    
ALPHA = 0.8
GAMMA = 2

class FocalLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(FocalLoss, self).__init__()

    def forward(self, inputs, targets, alpha=ALPHA, gamma=GAMMA, smooth=1):
        inputs = torch.sigmoid(inputs)  # logits -> probabilities
        inputs = inputs.view(-1)
        targets = targets.view(-1)
        # per-element BCE, i.e. -log(p_t); the focal factor must be applied before averaging
        BCE = F.binary_cross_entropy(inputs, targets, reduction='none')
        BCE_EXP = torch.exp(-BCE)  # p_t: probability assigned to the true class
        focal_loss = alpha * (1 - BCE_EXP) ** gamma * BCE

        return focal_loss.mean()
# More loss functions: see reference link 1
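To see why the focal factor belongs on each element rather than on the batch-mean BCE, here is a minimal sketch with two hypothetical predictions for positive targets: an easy one (probability 0.9) and a hard one (probability 0.1). With gamma = 2, the factor (1 - p_t)^gamma shrinks the easy example's loss far more than the hard one's. The constant alpha = 0.8 matches ALPHA above.

```python
import torch
import torch.nn.functional as F

alpha, gamma = 0.8, 2
probs = torch.tensor([0.9, 0.1])    # predicted probabilities (already passed through sigmoid)
targets = torch.tensor([1.0, 1.0])  # both are positive examples

bce = F.binary_cross_entropy(probs, targets, reduction='none')  # per-element -log(p_t)
p_t = torch.exp(-bce)                                           # recovers p_t
focal = alpha * (1 - p_t) ** gamma * bce                        # per-element focal loss

print(bce)    # plain BCE per example
print(focal)  # the easy example is scaled by 0.8 * (1 - 0.9)^2, the hard one by 0.8 * 0.9^2
```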

Note:

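When a custom loss involves mathematical operations, it is best to use PyTorch's tensor operations throughout. Autograd then computes the gradients for us, and the loss runs on CUDA directly. With NumPy or SciPy operations things get cumbersome: the data has to leave the autograd graph (and the GPU), so gradients no longer flow back automatically.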
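A minimal illustration of this note, using a hypothetical L1-style loss: the version written with tensor operations keeps the autograd graph, while routing the same computation through NumPy requires .detach().numpy() and yields a value with no gradient history, so backward() would have nothing to propagate to the model.

```python
import numpy as np
import torch

pred = torch.tensor([0.2, 0.8, 0.5], requires_grad=True)
target = torch.tensor([0.0, 1.0, 1.0])

# Tensor ops: the result carries a grad_fn, so backward() works
torch_loss = (pred - target).abs().mean()
torch_loss.backward()

# NumPy ops: the tensor must be detached first, which cuts the graph
np_value = np.abs(pred.detach().numpy() - target.numpy()).mean()
numpy_loss = torch.tensor(np_value)

print(torch_loss.grad_fn is not None)  # True
print(numpy_loss.grad_fn)              # None: no path back to pred
print(pred.grad)                       # gradients come from the tensor-op version only
```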

6.2 Dynamically Adjusting the Learning Rate

A suitable learning-rate decay strategy can improve model accuracy. In PyTorch, such a strategy is called a scheduler.

How to choose one of the existing learning-rate schedules as needed
How to define and implement a custom learning-rate schedule

6.2.1 Using the official schedulers

Overview of the official API

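The learning rate is one of the most important hyperparameters when training a neural network. As a popular deep learning framework, PyTorch ships several learning-rate scheduling methods in torch.optim.lr_scheduler, including LambdaLR, MultiplicativeLR, StepLR, MultiStepLR, ExponentialLR, CosineAnnealingLR, ReduceLROnPlateau, CyclicLR, OneCycleLR and CosineAnnealingWarmRestarts.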

Using the official API

PyTorch's documentation also provides helpful example code showing how to use these schedulers; the explanation below follows that official code.

# choose an optimizer
optimizer = torch.optim.Adam(...)
# choose one or more of the learning-rate schedulers mentioned above
scheduler1 = torch.optim.lr_scheduler....
scheduler2 = torch.optim.lr_scheduler....
...
schedulern = torch.optim.lr_scheduler....
# training
for epoch in range(100):
    train(...)
    validate(...)
    optimizer.step()
    # adjust the learning rate only after the optimizer has updated the parameters
    scheduler1.step()
    ...
    schedulern.step()

Note:

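When using the official torch.optim.lr_scheduler, call scheduler.step() after optimizer.step(). Since PyTorch 1.1.0, calling them in the opposite order skips the first value of the learning-rate schedule, and PyTorch emits a warning.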
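As a concrete, runnable sketch of the loop above, the example below uses StepLR to divide the learning rate by 10 every 30 epochs. The tiny linear model and random batch are hypothetical stand-ins for train(...) and validate(...); optimizer.step() runs before scheduler.step() in every epoch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # hypothetical toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

x, y = torch.randn(16, 4), torch.randn(16, 1)  # hypothetical training batch
lrs = []
for epoch in range(60):
    lrs.append(optimizer.param_groups[0]['lr'])  # learning rate used in this epoch
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()   # update the parameters first...
    scheduler.step()   # ...then advance the schedule

print(lrs[0], lrs[29], lrs[30], lrs[59])
```

In a real training loop, optimizer.step() usually runs once per batch inside train(...), while scheduler.step() for StepLR runs once per epoch.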

6.2.2 Custom schedulers

Define a function adjust_learning_rate that changes the lr value in each param_group.

Suppose we want the learning rate to drop to 1/10 of its value every 30 epochs:

def adjust_learning_rate(optimizer, epoch):
    # args.lr is the initial learning rate, e.g. parsed with argparse elsewhere in the script
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
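Because args.lr depends on an argparse setup that is not shown, the following self-contained variant passes the initial learning rate explicitly as a hypothetical base_lr argument. The rule is unchanged: lr = base_lr * 0.1^(epoch // 30).

```python
import torch
import torch.nn as nn

def adjust_learning_rate(optimizer, epoch, base_lr):
    """Step decay: divide the learning rate by 10 every 30 epochs."""
    lr = base_lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:  # every group gets the same lr
        param_group['lr'] = lr
    return lr

model = nn.Linear(4, 1)  # hypothetical toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

schedule = [adjust_learning_rate(optimizer, epoch, base_lr=0.1) for epoch in (0, 29, 30, 60)]
print(schedule)
```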

With adjust_learning_rate defined, we can call it during training to change the learning rate dynamically:

def adjust_learning_rate(optimizer,...):
    ...
optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)
for epoch in range(10):
    # set this epoch's learning rate before training on it
    adjust_learning_rate(optimizer, epoch)
    train(...)
    validate(...)
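The same step-decay rule can also be expressed without a hand-written function by passing a lambda to torch.optim.lr_scheduler.LambdaLR, which multiplies the initial learning rate by the factor the lambda returns for the current epoch. This is an alternative to the custom function, not what the code above does; the toy model is hypothetical.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # hypothetical toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# factor applied to the initial lr: 1 for epochs 0-29, 0.1 for 30-59, 0.01 from 60 on
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.1 ** (epoch // 30))

lrs = []
for epoch in range(61):
    lrs.append(optimizer.param_groups[0]['lr'])
    # train(...) and validate(...) would run here; with no gradients this step changes nothing
    optimizer.step()
    scheduler.step()

print(lrs[0], lrs[30], lrs[60])
```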

6.3 Model Fine-tuning

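A major application of transfer learning is model fine-tuning. Simply put, we take a model that someone else has already trained on a similar task, swap in our own data, and adjust its parameters through further training. PyTorch provides many pretrained network models (VGG, the ResNet family, the MobileNet family, ...), all trained by PyTorch on large datasets. Knowing how to fine-tune lets us quickly reuse these pretrained models for our own tasks.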

Understand the fine-tuning workflow
Know the common models PyTorch provides
Learn how to train only selected layers of a model

6.3.1 The fine-tuning workflow

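1. Pretrain a neural network model, the source model, on a source dataset (e.g. ImageNet).
2. Create a new neural network model, the target model. It copies all of the source model's design and parameters except the output layer. We assume these parameters contain knowledge learned from the source dataset that also applies to the target dataset. We also assume the source model's output layer is closely tied to the source dataset's labels, so it is not used in the target model.
3. Add an output layer to the target model whose output size equals the number of classes in the target dataset, and randomly initialize its parameters.
4. Train the target model on the target dataset. The output layer is trained from scratch, while the parameters of all other layers are fine-tuned starting from the source model's parameters.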

![finetune](/Users/whn/Desktop/thorough-pytorch-main/第六章 PyTorch進階訓練技巧/figures/finetune.png)

6.3.2 Using existing model architectures

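Here we take common torchvision models as examples and show how to use the model architectures and parameters PyTorch provides for image classification. Usage is similar for other tasks and architectures: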

Instantiating a network

import torchvision.models as models
resnet18 = models.resnet18()
# resnet18 = models.resnet18(pretrained=False)  # equivalent to the line above
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
inception = models.inception_v3()
googlenet = models.googlenet()
shufflenet = models.shufflenet_v2_x1_0()
mobilenet_v2 = models.mobilenet_v2()
mobilenet_v3_large = models.mobilenet_v3_large()
mobilenet_v3_small = models.mobilenet_v3_small()
resnext50_32x4d = models.resnext50_32x4d()
wide_resnet50_2 = models.wide_resnet50_2()
mnasnet = models.mnasnet1_0()

Passing the pretrained argument

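Pass True or False to decide whether to use pretrained weights. By default pretrained = False, meaning the pretrained weights are not used; with pretrained = True, the model is loaded with weights pretrained on a dataset.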

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)
shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet_v2 = models.mobilenet_v2(pretrained=True)
mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
mnasnet = models.mnasnet1_0(pretrained=True)

Notes:

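PyTorch model files usually have the extension .pt or .pth. At run time the program first checks the default path for already-downloaded weights; once the weights have been downloaded, later loads do not download them again.
Downloading pretrained weights is often slow. Instead, you can look up the model_urls in the corresponding model's source code and download the file manually with Xunlei (Thunder) or another download tool. On Linux and macOS, pretrained weights are downloaded by default into the .cache folder in the user's home directory (~/.cache/torch/hub/checkpoints); on Windows the path is C:\Users\<username>\.cache\torch\hub\checkpoints. torch.utils.model_zoo.load_url() also accepts a model_dir argument to set where the weights are downloaded.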
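To check or change where these files go, torch.hub exposes get_dir() and set_dir(); downloaded checkpoints land in the checkpoints subfolder of that directory (setting the TORCH_HOME environment variable has a similar effect). The temporary directory below is a hypothetical location used only for illustration.

```python
import os
import tempfile
import torch

print(torch.hub.get_dir())  # typically ~/.cache/torch/hub unless TORCH_HOME is set

custom_dir = os.path.join(tempfile.gettempdir(), "my_torch_hub")  # hypothetical location
torch.hub.set_dir(custom_dir)

# load_state_dict_from_url() (what model_zoo.load_url points to) stores files here
# unless an explicit model_dir is given
checkpoint_dir = os.path.join(torch.hub.get_dir(), "checkpoints")
print(checkpoint_dir)
```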

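If that is inconvenient, you can also download the weights yourself, place them in the project folder, and then load the parameters into the network: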

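import torch
import torchvision.models as models
model = models.resnet50(pretrained=False)  # architecture only, no download
model.load_state_dict(torch.load('./model/resnet50-19c8e357.pth'))  # weights from the local file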
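Loading a local checkpoint assumes the .pth file already sits in ./model/. A self-contained sketch of the same save/load cycle, using a hypothetical small network and a temporary directory instead of a downloaded ResNet checkpoint; map_location='cpu' lets a checkpoint saved on a GPU be loaded on a CPU-only machine.

```python
import os
import tempfile
import torch
import torch.nn as nn

def make_model():
    # hypothetical architecture; both copies must share it for load_state_dict to work
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

source = make_model()
path = os.path.join(tempfile.mkdtemp(), "weights.pth")
torch.save(source.state_dict(), path)          # a .pth file holding the state_dict

target = make_model()                          # freshly initialized, different weights
state = torch.load(path, map_location="cpu")
result = target.load_state_dict(state)         # strict=True by default: keys must match

x = torch.randn(3, 8)
print(torch.equal(source(x), target(x)))       # True after loading
```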

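If a download is interrupted, be sure to delete the partial weight file from the corresponding path; otherwise later loads may fail with an error.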

6.3.3 Training specific layers

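By default, parameters have .requires_grad = True. If we train from scratch or fine-tune the whole model, we do not need to worry about this. But if we are extracting features and only want to compute gradients for the newly initialized layer while leaving the other parameters unchanged, we need to freeze those layers by setting requires_grad = False. The official PyTorch tutorial provides the following helper: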

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

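Below we again use resnet18 as the example, changing its 1000-class output to 4 classes while training only the last layer and leaving the feature-extraction parameters unchanged. Note that we first freeze the gradients of the model parameters and only then replace the fully connected output layer, so the parameters of the new layer do require gradients.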

import torch.nn as nn
import torchvision.models as models
# freeze the gradients of the parameters
feature_extract = True
model = models.resnet18(pretrained=True)
set_parameter_requires_grad(model, feature_extract)
# replace the output layer: 1000 classes -> 4 classes
num_ftrs = model.fc.in_features  # 512 for resnet18
model.fc = nn.Linear(in_features=num_ftrs, out_features=4, bias=True)

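During training the model still runs backpropagation, but only the fc layer's parameters receive gradients, so parameter updates happen only in the fc layer. By setting the requires_grad attribute of the parameters, we train only the chosen layers of the model, which is essential for fine-tuning.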

6.4 Half-Precision Training

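By default, PyTorch stores floating-point numbers as torch.float32. More bits give higher precision, but most scenarios do not need that much precision, and keeping only half the bits, i.e. the torch.float16 format, often has little effect on the result. Because the number of bits is halved, this is called "half precision", as illustrated below: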

![amp](/Users/whn/Desktop/thorough-pytorch-main/第六章 PyTorch進階訓練技巧/figures/float16.jpg)

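Half precision clearly reduces GPU memory usage, so the GPU can hold more data at once for computation. This section describes how to set up half-precision computation in PyTorch.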
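The memory saving, and the price paid in range, can be checked with a few tensor properties: float16 uses 2 bytes per element instead of 4, its largest finite value is 65504, and very small values such as 1e-8 underflow to zero. That underflow is one reason small gradients can be lost when training purely in half precision.

```python
import torch

x32 = torch.zeros(1000, dtype=torch.float32)
x16 = x32.half()  # same values stored as float16

print(x32.element_size(), x16.element_size())   # 4 2 (bytes per element)
print(torch.finfo(torch.float16).max)           # 65504.0
print(torch.tensor(1e-8, dtype=torch.float16))  # rounds to 0: underflow
print(torch.tensor(1e-8, dtype=torch.float32))  # still representable in float32
```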

How to set up half-precision training in PyTorch
Points to watch when using half-precision training

6.4.1 Setting up half-precision training

In PyTorch, half-precision training is configured with autocast, which needs to be set up in the following three places:

import autocast

from torch.cuda.amp import autocast

Model setup

In the model definition, use Python's decorator syntax to wrap the model's forward function with autocast:

@autocast()   
def forward(self, x):
    ...
    return x
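A self-contained sketch of the decorator approach. The decorator matters mostly when the model is wrapped in nn.DataParallel: autocast state is thread-local, and decorating forward makes each replica's forward pass run under autocast. The toy model is hypothetical; this sketch uses torch.autocast (available since PyTorch 1.10) and enables it only when a GPU is present, so on a CPU-only machine it simply runs in float32.

```python
import torch
import torch.nn as nn

use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

class TinyNet(nn.Module):  # hypothetical model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 10)

    @torch.autocast(device_type="cuda", enabled=use_amp)
    def forward(self, x):
        return self.fc(x)  # the matmul runs in float16 when autocast is enabled

model = TinyNet().to(device)
out = model(torch.randn(4, 32, device=device))
print(out.dtype)  # torch.float16 on a GPU, torch.float32 otherwise
```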

Training loop

During training, simply place the step that feeds data into the model, and everything after it, inside "with autocast():":

for x in train_loader:
    x = x.cuda()
    with autocast():
        output = model(x)
        ...
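The steps above use only autocast. PyTorch's automatic mixed precision recipe normally pairs it with a gradient scaler, which multiplies the loss before backward() so that small float16 gradients do not underflow, then unscales them before the optimizer step. A minimal sketch, assuming a hypothetical linear model and random batches standing in for train_loader; AMP is enabled only when CUDA is available, so the same code runs as ordinary float32 training on a CPU.

```python
import torch
import torch.nn as nn

use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

model = nn.Linear(16, 4).to(device)  # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
# newer releases also offer torch.amp.GradScaler("cuda"); this older spelling still works
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

# hypothetical random batches standing in for train_loader
train_loader = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(5)]

losses = []
for x, y in train_loader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    with torch.autocast(device_type=device, enabled=use_amp):
        output = model(x)
        loss = criterion(output, y)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, skips the step if they contain inf/NaN
    scaler.update()                # adjusts the scale factor for the next iteration
    losses.append(loss.item())
```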

Note:

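Half-precision training mainly pays off when the data itself is large (e.g. 3D images or video). When the data is small (e.g. MNIST handwritten digits are only 28×28 pixels), half-precision training may not bring a significant improvement.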
