PaddlePaddle (飛槳) YoloV3 study notes

Reposted article (original dated 2020-08-27 12:28).

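After two weeks of learning PaddlePaddle on fairly simple networks, I ran into YoloV3, which is a much bigger beast. Its code covers image augmentation (expanding the training data), anchor box generation (and fine-tuning), candidate region generation, target labeling, feature extraction, mapping features to positions, building the loss function, multi-scale detection, and more. The end result is an end-to-end object detection program. I did not read the original paper; I worked straight through the ipython notebook from the Paddle course and wrote down every difficulty I ran into.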

1. Object detection and well-known models

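Object detection is a step up from image classification: it has to recognize the objects in an image (possibly of several kinds) and mark where each one is with a box. If the detector is fast enough, it can process video frames and so detect objects in video.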

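The most basic approach, which I remember from Andrew Ng's machine learning open course, goes like this. Suppose we already have an image classification model that cannot output positions. We can slide windows of several sizes across the input image (left to right, top to bottom, covering everything), classify the contents of each window, and take the window's position as the object's position. The amount of computation is far too large for this to be practical.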
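
To get a feel for why the sliding-window approach is too expensive, the sketch below counts how many windows a classifier would have to score. The image size, window sizes, and stride are made-up values for illustration, and `count_windows` is a hypothetical helper, not part of the course code.

```python
def count_windows(img_w, img_h, win_sizes, stride):
    """Count square sliding windows that fit fully inside the image."""
    total = 0
    for win in win_sizes:
        if win > img_w or win > img_h:
            continue  # this window size does not fit at all
        steps_x = (img_w - win) // stride + 1  # positions from left to right
        steps_y = (img_h - win) // stride + 1  # positions from top to bottom
        total += steps_x * steps_y
    return total

# Assumed values: a 640x480 image, three window sizes, a stride of 8 pixels.
n = count_windows(640, 480, win_sizes=[64, 128, 256], stride=8)
print(n)  # 8215
```

Under these assumed settings the classifier would run 8215 times on a single image, and that is with square windows only.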

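Newer detection algorithms therefore first generate candidate regions that may contain objects, and run recognition only on those regions. Well-known examples are R-CNN and Fast R-CNN, which rely on Selective Search, and Faster R-CNN and Mask R-CNN, which use a region proposal network (RPN).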

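YoloV3 uses a single network to produce candidate regions and predict object classes and positions at the same time, which is why it is called a single-stage detection algorithm.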

2. Basics, terminology, and concepts

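The rectangle drawn around an object in an image is called a bounding box. A box is usually recorded in one of two formats: xyxy (the coordinates of the top-left and bottom-right corners) and xywh (the center coordinates plus the width and height).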
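
Converting between the two formats is simple arithmetic. The sketch below treats coordinates as continuous values (no one-pixel offset), which is an assumption; `xyxy_to_xywh` and `xywh_to_xyxy` are hypothetical helper names.

```python
def xyxy_to_xywh(box):
    """[x1, y1, x2, y2] -> [cx, cy, w, h], with continuous coordinates."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1]

def xywh_to_xyxy(box):
    """[cx, cy, w, h] -> [x1, y1, x2, y2], with continuous coordinates."""
    cx, cy, w, h = box
    return [cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0]

box = [100.0, 50.0, 300.0, 150.0]
print(xyxy_to_xywh(box))                # [200.0, 100.0, 200.0, 100.0]
print(xywh_to_xyxy(xyxy_to_xywh(box)))  # [100.0, 50.0, 300.0, 150.0]
```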

In the training set, the real boxes that contain targets (usually annotated by hand) are called ground truth boxes. The boxes the model predicts are called prediction boxes.

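There is a third kind of box, the anchor box. It is a box that people assume rather than observe, and it can even be generated at random. Recognition takes place inside the anchor boxes.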

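# Draw anchor boxes
import math

def draw_anchor_box(center, length, scales, ratios, img_height, img_width):
    """
    Generate a set of anchor boxes centered on center.
    length is the base side length
    scales is a list of size multipliers
    ratios is a list of aspect ratios (height / width, as computed below)
    img_height and img_width are the image size; anchors are clipped so they stay inside the image
    """
    bboxes = []
    for scale in scales:
        for ratio in ratios:
            h = length*scale*math.sqrt(ratio)
            w = length*scale/math.sqrt(ratio)
            x1 = max(center[0] - w/2., 0.)
            y1 = max(center[1] - h/2., 0.)
            x2 = min(center[0] + w/2. - 1.0, img_width - 1.0)
            y2 = min(center[1] + h/2. - 1.0, img_height - 1.0)
            print(center[0], center[1], w, h)
            bboxes.append([x1, y1, x2, y2])

    # draw_rectangle and currentAxis are not defined in this excerpt; they are
    # assumed to come from an earlier cell of the course notebook
    for bbox in bboxes:
        draw_rectangle(currentAxis, bbox, edgecolor = 'b')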

The blue boxes are the three generated anchor boxes, each with a different aspect ratio.
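
The width and height formulas in the anchor code keep the area fixed while the shape changes. A minimal sketch of just that arithmetic, without any plotting, is below. `anchor_sizes` is a hypothetical helper, and the length, scale, and ratio values are chosen only to give round numbers.

```python
import math

def anchor_sizes(length, scales, ratios):
    """Return (w, h) for each scale/ratio pair, with ratio = h / w."""
    sizes = []
    for scale in scales:
        for ratio in ratios:
            h = length * scale * math.sqrt(ratio)
            w = length * scale / math.sqrt(ratio)
            sizes.append((w, h))
    return sizes

# Assumed values: base length 200, one scale, three aspect ratios.
for w, h in anchor_sizes(length=200, scales=[1.0], ratios=[0.25, 1.0, 4.0]):
    print(f"w={w:.0f} h={h:.0f} area={w * h:.0f}")
```

Each of the three anchors has the same area, (length * scale) squared, because the square root of the ratio multiplies the height and divides the width.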

Intersection over Union (IoU)

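This is a very important concept. It measures how much two boxes overlap and is defined as IoU = area(A ∩ B) / area(A ∪ B), where A and B are the two boxes.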

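The result lies between 0 and 1, and a larger value means more overlap. The implementation is also neat: clamping the intersection width and height at zero covers every possible case, including boxes that do not overlap at all.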

# Compute IoU for boxes in xyxy format; this function is saved in box_utils.py
import numpy as np

def box_iou_xyxy(box1, box2):
    # Top-left and bottom-right corners of box1
    x1min, y1min, x1max, y1max = box1[0], box1[1], box1[2], box1[3]
    # Area of box1 (the + 1 counts both edge pixels)
    s1 = (y1max - y1min + 1.) * (x1max - x1min + 1.)
    # Top-left and bottom-right corners of box2
    x2min, y2min, x2max, y2max = box2[0], box2[1], box2[2], box2[3]
    # Area of box2
    s2 = (y2max - y2min + 1.) * (x2max - x2min + 1.)

    # Corners of the intersection rectangle
    xmin = np.maximum(x1min, x2min)
    ymin = np.maximum(y1min, y2min)
    xmax = np.minimum(x1max, x2max)
    ymax = np.minimum(y1max, y2max)
    # Height, width, and area of the intersection (clamped to 0 if the boxes do not overlap)
    inter_h = np.maximum(ymax - ymin + 1., 0.)
    inter_w = np.maximum(xmax - xmin + 1., 0.)
    intersection = inter_h * inter_w
    # Area of the union
    union = s1 + s2 - intersection
    # Intersection over union
    iou = intersection / union
    return iou

# Compute IoU for boxes in xywh format (continuous coordinates, so no + 1 here)
def box_iou_xywh(box1, box2):
    # Corners and area of box1, derived from its center, width, and height
    x1min, y1min = box1[0] - box1[2]/2.0, box1[1] - box1[3]/2.0
    x1max, y1max = box1[0] + box1[2]/2.0, box1[1] + box1[3]/2.0
    s1 = box1[2] * box1[3]

    # Corners and area of box2
    x2min, y2min = box2[0] - box2[2]/2.0, box2[1] - box2[3]/2.0
    x2max, y2max = box2[0] + box2[2]/2.0, box2[1] + box2[3]/2.0
    s2 = box2[2] * box2[3]

    # Intersection rectangle, clamped to 0 if the boxes do not overlap
    xmin = np.maximum(x1min, x2min)
    ymin = np.maximum(y1min, y2min)
    xmax = np.minimum(x1max, x2max)
    ymax = np.minimum(y1max, y2max)
    inter_h = np.maximum(ymax - ymin, 0.)
    inter_w = np.maximum(xmax - xmin, 0.)
    intersection = inter_h * inter_w

    union = s1 + s2 - intersection
    iou = intersection / union
    return iou
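
A few worked cases make the clamping behavior concrete. The sketch below is a compact pure-Python restatement, not the course code; `iou_xyxy` and its `pixel_offset` parameter are hypothetical names. With `pixel_offset=1.0` it follows the inclusive-pixel convention of `box_iou_xyxy`, and with `0.0` it follows the continuous convention of `box_iou_xywh`.

```python
def iou_xyxy(a, b, pixel_offset=0.0):
    """IoU of two [x1, y1, x2, y2] boxes under a chosen edge convention."""
    area_a = (a[2] - a[0] + pixel_offset) * (a[3] - a[1] + pixel_offset)
    area_b = (b[2] - b[0] + pixel_offset) * (b[3] - b[1] + pixel_offset)
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]) + pixel_offset, 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]) + pixel_offset, 0.0)
    inter = inter_w * inter_h
    return inter / (area_a + area_b - inter)

a = [0.0, 0.0, 10.0, 10.0]
shifted = [5.0, 0.0, 15.0, 10.0]
print(iou_xyxy(a, a))                           # identical boxes: 1.0
print(iou_xyxy(a, [20.0, 20.0, 30.0, 30.0]))    # disjoint boxes: 0.0
print(iou_xyxy(a, shifted))                     # 50 / 150, about 0.333
print(iou_xyxy(a, shifted, pixel_offset=1.0))   # 66 / 176 = 0.375
```

The last two lines show that the two conventions give different values for the same pair of boxes, so it is worth using one convention consistently.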

3. Insect detection dataset and image augmentation

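The dataset comes in three parts: train, val, and test. The train and val parts include annotations in XML format. The annotations are simply the ground truth boxes (gtbox for short) of the insects together with their classes (6 classes in total), and they can be drawn on top of the images.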
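
Reading such annotations could look like the sketch below. It assumes a Pascal VOC style layout (`object`, `name`, `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`), which the notes do not spell out; the sample XML, class names, and numbers are invented, and `parse_gt_boxes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC style; names and numbers are invented.
SAMPLE_XML = """
<annotation>
  <object>
    <name>insect_a</name>
    <bndbox><xmin>473</xmin><ymin>578</ymin><xmax>612</xmax><ymax>727</ymax></bndbox>
  </object>
  <object>
    <name>insect_b</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>180</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>
"""

def parse_gt_boxes(xml_text):
    """Return a list of (class_name, [x1, y1, x2, y2]) from a VOC-style XML string."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        bnd = obj.find("bndbox")
        box = [float(bnd.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        records.append((name, box))
    return records

for name, box in parse_gt_boxes(SAMPLE_XML):
    print(name, box)
```

If the real files use different tag names, only the strings passed to `find` and `findall` would need to change.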

To mimic real-world conditions, the training set is augmented. This enlarges the dataset, suppresses overfitting, and improves generalization.

(1) Randomly changing brightness, contrast, and color

import numpy as np
import cv2
from PIL import Image, ImageEnhance
import random

# Randomly change brightness, contrast, and color
def random_distort(img):
    # Randomly change brightness
    def random_brightness(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Brightness(img).enhance(e)
    # Randomly change contrast
    def random_contrast(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Contrast(img).enhance(e)
    # Randomly change color saturation
    def random_color(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Color(img).enhance(e)

    # Apply the three operations in a random order
    ops = [random_brightness, random_contrast, random_color]
    np.random.shuffle(ops)

    # Convert the ndarray to a PIL image, distort it, and convert back
    img = Image.fromarray(img)
    img = ops[0](img)
    img = ops[1](img)
    img = ops[2](img)
    img = np.asarray(img)

    return img

These operations do not change the image size, so the object classes and box annotations stay the same.
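
The effect of the enhancement factor can be checked on a small synthetic image. This is a minimal sketch assuming NumPy and Pillow are installed; the random test image and the variable names are made up for illustration.

```python
import numpy as np
from PIL import Image, ImageEnhance

# Assumed test input: a random 32x48 RGB image stored as a uint8 array.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)

pil_img = Image.fromarray(img)
same = np.asarray(ImageEnhance.Brightness(pil_img).enhance(1.0))    # factor 1.0 leaves the image unchanged
darker = np.asarray(ImageEnhance.Brightness(pil_img).enhance(0.5))  # factor below 1 darkens it
black = np.asarray(ImageEnhance.Brightness(pil_img).enhance(0.0))   # factor 0.0 gives a black image

print(same.shape == img.shape, np.array_equal(same, img))
print(darker.mean() < img.mean(), int(black.max()))
```

The array shape is the same before and after, which is why the boxes need no adjustment for this kind of augmentation.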

[Figure: example image after random distortion]
