Before starting, one piece of preparation is needed: installing the COCO API, mainly for its IoU computation routines, which are used to evaluate model performance. I spent an evening plus a morning on the old laptop I bought in 2012 and, despite trying many methods found online, still could not get it to work. I got stuck on pycocotools; the final error was: e:\anaconda3\include\pyconfig.h(59): fatal error C1083: Cannot open include file: "io.h": No such file or directory.
This tutorial trains a Mask R-CNN instance segmentation model on the Penn-Fudan pedestrian detection and segmentation dataset. The Penn-Fudan dataset contains 170 images with 345 pedestrian instances. The scenes are mostly campus and urban street views, and every image contains at least one pedestrian.
This post draws on the article 手把手教你訓練自己的Mask R-CNN圖像實例分割模型 (PyTorch官方教程) by the author 一個菜鳥的奮斗, a hands-on walkthrough of training your own Mask R-CNN instance segmentation model based on the official PyTorch tutorial.
Data Processing
After downloading the dataset as described in the official tutorial, you can take a look at its structure; PennFudanPed/readme explains it in detail.
PennFudanPed/
  PedMasks/
    FudanPed00001_mask.png
    FudanPed00002_mask.png
    FudanPed00003_mask.png
    FudanPed00004_mask.png
    ...
  PNGImages/
    FudanPed00001.png
    FudanPed00002.png
    FudanPed00003.png
    FudanPed00004.png
The images look like this:
一個菜鳥的奮斗 gives the following way to preview the data:
from PIL import Image

img = Image.open('PennFudanPed/PNGImages/FudanPed00001.png')
img.show()

mask = Image.open('PennFudanPed/PedMasks/FudanPed00001_mask.png')
# each mask pixel stores an instance index (0 = background), so assign a
# palette to make the instances show up as distinct colors
mask.putpalette([
    0, 0, 0,       # black background
    255, 0, 0,     # index 1 is red
    255, 255, 0,   # index 2 is yellow
    255, 153, 0,   # index 3 is orange
])
mask.show()
Before training the model, we need to write the dataset loading interface. In fact, this step has to be done first for every new problem; building the model and preparing the data go hand in hand. Here is the code with my brief comments:
# -*- coding:utf-8 -*-
#@Time : 2020/2/14 9:13
#@Author: zhangqingbo
#@File : 1_torchvision_object_detection_finetuning.py
import os
import numpy as np
import torch
from PIL import Image
class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to ensure that
        # they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        # Split the color-encoded mask into a set of binary masks.
        # Take FudanPed00001 as an example: it contains two pedestrians. In
        # FudanPed00001_mask the background pixels are 0, the pixels of the
        # first instance are 1, those of the second are 2, and so on, so
        # `mask` is a 559 x 536 2D array in which instance 1's pixels are all
        # 1 and instance 2's pixels are all 2. `obj_ids` holds the unique
        # values of `mask`; it was originally [0, 1, 2], but the line
        # `obj_ids = obj_ids[1:]` above has already dropped the background
        # value 0, leaving [1, 2].
        # The line below then builds `masks` of shape (2, H, W): one binary
        # mask per instance, in which the pixels belonging to that instance
        # are True and everything else is False. A very slick one-liner!
        masks = mask == obj_ids[:, None, None]
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            # pos[0] holds the row indices (y coordinates),
            # pos[1] holds the column indices (x coordinates)
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
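Before moving on, a quick sanity check of the dataset class can be helpful. The snippet below is a minimal sketch (it assumes the PennFudanPed/ folder has been extracted next to the script; passing transforms=None returns the raw PIL image and the target dictionary):

# minimal check of the dataset class defined above
dataset = PennFudanDataset('PennFudanPed', transforms=None)
img, target = dataset[0]
print(img.size)               # (width, height) of the PIL image
print(target["boxes"])        # one [xmin, ymin, xmax, ymax] row per pedestrian
print(target["masks"].shape)  # (num_objs, H, W) binary masks as uint8
print(target["labels"])       # all ones, since "person" is the only class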
Building the Model
Mask R-CNN is an extension of Faster R-CNN. Faster R-CNN predicts bounding boxes and classification scores for potential objects in an image; Mask R-CNN adds an extra branch on top of it that predicts a segmentation mask for each instance.
There are two ways to modify a model from the torchvision model zoo to achieve this. The first is to take a pretrained model, replace its final layer, and fine-tune it. The second is to swap in a different backbone as needed, for example replacing ResNet with MobileNet.
1. Finetuning from a pretrained model
# if you want to start from a model pre-trained on COCO and want to finetune it for your particular classes.
# Here is a possible way of doing it:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2 # 1 class(person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
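A quick way to confirm the head was actually replaced is to inspect the new predictor's output sizes; with num_classes = 2 the classifier should output 2 scores and the box regressor 2 × 4 values:

# the new box predictor outputs 2 class scores (background + person)
# and 2 * 4 box regression values
print(model.roi_heads.box_predictor.cls_score.out_features)  # 2
print(model.roi_heads.box_predictor.bbox_pred.out_features)  # 8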
2. Modifying the model to add a different backbone
# if you want to replace the backbone of the model with a different one.
# For example, the default backbone (ResNet-50) may have too many parameters
# to deploy easily in some applications, so a lighter network such as MobileNet
# can be used instead.
# Here is a possible way of doing it:
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
# load a pre-trained model for classification and return only the features
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
# FasterRCNN needs to know the number of output channels in a backbone.
# For mobilenet_v2, it's 1280, so we need to add it here
backbone.out_channels = 1280
# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)
# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
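To make sure the pieces fit together, one simple check (similar to the inference example in the official tutorial) is to run the assembled model on a random image in eval mode; the input is a list of 3×H×W tensors with values in [0, 1]:

import torch

# forward a dummy image through the assembled model as a sanity check
model.eval()
x = [torch.rand(3, 300, 400)]
predictions = model(x)
# one dict per input image, with "boxes", "labels" and "scores" tensors
print(predictions[0].keys())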
The goal of this post is to train a Mask R-CNN instance segmentation model on the PennFudan dataset, which corresponds to the first case above. The official network definitions and interfaces are available in torchvision.models.detection and can be used directly.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # num_classes = 2  # 1 class (person) + background
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)
    return model
"""
在PyTorch官方的references/detection/中,有一些封裝好的用於模型訓練和測試的函數.
其中references/detection/engine.py、references/detection/utils.py、references/detection/transforms.py是我們需要用到的。
首先,將這些文件拷貝過來.這一步也是折騰半天,官網教程沒有說的很清楚,原來是在GitHub/Pytorch里面有一個vision模塊,里面包含了utils.py,transform.py
h和engine.py這些文件。
# Download TorchVision repo to use some files from references/detection
>>git clone https://github.com/pytorch/vision.git
>>cd vision
>>git checkout v0.4.0
>>cp references/detection/utils.py ../
>>cp references/detection/transforms.py ../
>>cp references/detection/coco_eval.py ../
>>cp references/detection/engine.py ../
>>cp references/detection/coco_utils.py ../
"""
3. Data augmentation / transforms
import transforms as T

def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
4. Training (which I did not manage to run successfully)
The dataset, model, and data augmentation parts are now all in place. Once the model is initialized and the optimizer and learning-rate schedule are chosen, training can start.
Here the model is set to train for 10 epochs, and after each epoch its performance is evaluated on the test set.
from engine import train_one_epoch, evaluate
import utils

def main():
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset in train and test set
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)
    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)
    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)
    # let's train it for 10 epochs
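    # The remaining training loop, sketched here following the official
    # torchvision tutorial (this is the part I could not run, since
    # evaluate() relies on pycocotools): 10 epochs, stepping the LR
    # scheduler and evaluating on the test set after each epoch.
    num_epochs = 10
    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

if __name__ == "__main__":
    main()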