How to load only part of a pretrained model's parameters

Reposted from the original article. 2019-07-24 10:24 · PyTorch

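PyTorch ships many pretrained models (through torchvision). The classification models are, for the most part, trained on the ImageNet dataset and predict 1000 classes.

In practice, though, a classification project is often not that simple. We may need more than a single classification task, for example two or more tasks predicted at once, and the model then has to be modified.

After the modification we still want the parts of the new model that match the pretrained model to be initialized from the pretrained parameters. How can this be done?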

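1. Single classification task

This is the simplest case: just change the 1000 output classes to the number of classes you need. For example, to classify gender into two classes using PyTorch's pretrained resnet18: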

#coding:utf-8
import torch
from torchvision import  models
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

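# Choose the model to use
model_conv = models.resnet18(pretrained=True)

# resnet18 has a single fully connected layer
# Read the number of input features of that layer via .in_features
fc_features = model_conv.fc.in_features

# The default number of output features is 1000
# Change it to 2 for our binary task, i.e. man and woman
model_conv.fc = nn.Linear(fc_features, 2)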
model_conv.to(device)

The model is now set up.
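
As a quick sanity check, the sketch below applies the same head swap to a small stand-in network and confirms that the output has two columns. `TinyNet` is a hypothetical class used only so the example runs without downloading weights; it is not part of torchvision, and the input size is arbitrary.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for resnet18: a feature extractor followed by `fc`."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


model = TinyNet()
# Same pattern as above: read in_features, then swap in a new 2-class head.
fc_features = model.fc.in_features
model.fc = nn.Linear(fc_features, 2)

model.eval()
with torch.no_grad():
    out = model(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 2])
```

Only the new `fc` layer starts from random initialization; every other layer keeps whatever weights the model had before the swap.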

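2. Two or more classification tasks

Take two tasks as an example. In the single-task case above we only changed the output size of an existing layer; the overall structure of the model stayed the same.

When we want two tasks at once, such as classifying gender and race together, we need to add new layers on top of the original code and build a new model.

For example: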

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self, block, layers, zero_init_residual=False,
                 groups=1, width_per_group=64, replace_stride_with_dilation=None,
                 norm_layer=None ,gender_classes=2, race_classes=4):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        # The original fully connected layer is commented out
        # self.fc = nn.Linear(512 * block.expansion, num_classes)

        # Replaced by two parallel fully connected layers
        self.gen_fc = nn.Linear(512 * block.expansion, gender_classes)
        self.race_fc = nn.Linear(512 * block.expansion, race_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.view(x.size(0), -1)

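        # Two parallel fully connected heads. Each head returns softmax
        # probabilities; nn.CrossEntropyLoss expects raw logits instead.
        gender = F.softmax(self.gen_fc(x), 1)
        race = F.softmax(self.race_fc(x), 1)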

        return gender, race

def resnet18Owned(**kwargs):
    """Constructs a ResNet-18 model.
    """
    model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
    return model


def test():
    net = resnet18Owned(gender_classes=2,race_classes=4)
    gender, race = net(Variable(torch.randn(2,3,224,224)))
    print('gender :', gender.size(),gender)
    print('race :', race.size(), race)

if __name__ == '__main__':
    test()

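This is a fairly simple example: a resnet18 with one fully connected layer is turned into a resnet18 with two parallel fully connected layers. How can the parameters of the pretrained resnet18 be reused here?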

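#coding:utf-8
import torch
from torchvision import models
from torch import nn

# resnet18Owned is the constructor defined in the code above

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load the pretrained model to get its structure and parameters
pretrained_resnet18 = models.resnet18(pretrained=True)
pretrained_resnet18_dict = pretrained_resnet18.state_dict()

# Build your own model and get its structure and parameters as well
model_conv = resnet18Owned(gender_classes=2, race_classes=3)
model_conv_dict = model_conv.state_dict()

# Keep only the entries whose keys exist in both models, i.e. the parameters
# of every layer except the fully connected layers
pretrained_resnet18_dict = {k: v for k, v in pretrained_resnet18_dict.items() if k in model_conv_dict}
# Overwrite the matching entries of your own model's state dict with these values.
# Every layer except the modified fully connected layers now holds pretrained parameters
model_conv_dict.update(pretrained_resnet18_dict)
# Finally load the merged state dict into your model
model_conv.load_state_dict(model_conv_dict)
model_conv.to(device)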
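
Filtering by key name works here because the new heads have new names. If a layer in your own model kept the name `fc` but changed its output size, the name-only filter would let a tensor of the wrong shape through and `load_state_dict` would fail. The sketch below adds a shape check to the filter. The two `nn.ModuleDict` toy networks are assumptions made for illustration only, so that the example runs without downloading resnet18.

```python
import torch
from torch import nn

# Hypothetical toy models: a shared "body" plus differently sized heads.
pretrained = nn.ModuleDict({"body": nn.Linear(4, 8), "fc": nn.Linear(8, 1000)})
own = nn.ModuleDict({
    "body": nn.Linear(4, 8),
    "fc": nn.Linear(8, 2),       # same name as in `pretrained`, different shape
    "race_fc": nn.Linear(8, 3),  # exists only in the new model
})

pretrained_dict = pretrained.state_dict()
own_dict = own.state_dict()

# Keep only entries whose name AND shape both match the target model.
compatible = {
    k: v for k, v in pretrained_dict.items()
    if k in own_dict and v.shape == own_dict[k].shape
}
skipped = sorted(set(pretrained_dict) - set(compatible))

own_dict.update(compatible)
own.load_state_dict(own_dict)

print(sorted(compatible))  # ['body.bias', 'body.weight']
print(skipped)             # ['fc.bias', 'fc.weight']
```

Printing the skipped keys makes it easy to confirm that only the layers you meant to retrain were left at their initial values.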

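I later learned of a simpler approach:

Once your own model is defined, if you only want the pretrained parameters for the parts whose structure matches, pass strict=False when loading. strict defaults to True, which requires the keys of the pretrained state dict to match the keys of your own network exactly; otherwise loading fails. With strict=False, keys that appear in only one of the two are skipped, but a key present in both with a different tensor shape still raises an error. It is used as follows: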

model_conv.load_state_dict(torch.utils.model_zoo.load_url('https://download.pytorch.org/models/resnet18-5c106cde.pth'), strict=False)

Let's check that setting strict=False alone is enough:

#coding:utf-8
import torch
from torchvision import models
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Build your own model (resnet18Owned is defined in the code above)
model_conv = resnet18Owned(gender_classes=2, race_classes=3)
model_conv.load_state_dict(torch.utils.model_zoo.load_url('https://download.pytorch.org/models/resnet18-5c106cde.pth'), strict=False)

model_conv.to(device)
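
One way to verify what a non-strict load actually did is to look at the value `load_state_dict` returns. In recent PyTorch releases it is a named tuple with `missing_keys` and `unexpected_keys` fields; older releases may not return it, so treat this as a sketch under that assumption. The toy `nn.ModuleDict` models below are hypothetical stand-ins for resnet18 and resnet18Owned.

```python
import torch
from torch import nn

# Hypothetical toy models standing in for resnet18 and resnet18Owned.
pretrained = nn.ModuleDict({"body": nn.Linear(4, 8), "fc": nn.Linear(8, 1000)})
own = nn.ModuleDict({
    "body": nn.Linear(4, 8),
    "gen_fc": nn.Linear(8, 2),
    "race_fc": nn.Linear(8, 3),
})

result = own.load_state_dict(pretrained.state_dict(), strict=False)

# Keys the model expects but the checkpoint lacks: these keep their initial values.
print(result.missing_keys)
# Keys present in the checkpoint that the model has no slot for: these are ignored.
print(result.unexpected_keys)
# The shared layer now holds the pretrained weights.
print(torch.equal(own["body"].weight, pretrained["body"].weight))  # True
```

For the two-head resnet18 above, one would expect the `gen_fc` and `race_fc` parameters to be reported as missing and the original `fc` parameters as unexpected.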

This is the official download URL of the pretrained model: https://download.pytorch.org/models/resnet18-5c106cde.pth

⚠️ If your torch version is 1.0.1 or earlier, use torch.utils.model_zoo.load_url(); with 1.1.0 or later you can use the newer torch.hub.load_state_dict_from_url().
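
If the same script has to run on both older and newer installs, one option is to pick the loader at runtime. The helper name `get_url_loader` is hypothetical, and this is only a minimal sketch. The actual download is left commented out so that the snippet runs offline.

```python
import torch


def get_url_loader():
    """Return whichever URL-based state_dict loader this torch install provides."""
    hub = getattr(torch, "hub", None)
    if hub is not None and hasattr(hub, "load_state_dict_from_url"):
        return hub.load_state_dict_from_url
    # Fallback for older releases that only ship the model_zoo helper.
    from torch.utils import model_zoo
    return model_zoo.load_url


load_url = get_url_loader()

# Usage (downloads the checkpoint on first call):
# state_dict = load_url('https://download.pytorch.org/models/resnet18-5c106cde.pth')
# model_conv.load_state_dict(state_dict, strict=False)
```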
