[Computer Vision] Reproducing Segmentation Networks (1): SegNet


Tags: ComputerVision

Compilation

  1. src/caffe/layers/contrastive_loss_layer.cpp:56:30: error: no matching function for call to ‘max(double, float)’
    Dtype dist = std::max(margin - sqrt(dist_sq_.cpu_data()[i]), Dtype(0.0));

Replace line 56 with the following line:
Dtype dist = std::max(margin - (float)sqrt(dist_sq_.cpu_data()[i]), Dtype(0.0));
  2. .build_release/lib/libcaffe.so: undefined reference to `cv::imread(cv::String const&, int)'

Edit the Makefile and add opencv_imgcodecs to the OpenCV libraries:
LIBRARIES += glog gflags protobuf leveldb snappy \
    lmdb boost_system hdf5_hl hdf5 m \
    opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs

Data Processing

  1. Computing median frequency balancing
    Image segmentation often runs into class imbalance. If your goal is high accuracy for every class, class balancing during training matters a lot; if you only care about overall pixel accuracy, it matters much less, because a class that covers few pixels contributes little to overall pixel accuracy even when its own accuracy is low.
    Here is how the median frequency balancing used in this post is computed:
    For a multi-class segmentation dataset, every class has a class frequency: the number of pixels belonging to that class divided by the total number of pixels in the dataset. Take the median of all class frequencies and divide it by a class's own frequency to get that class's weight:

\[weight_i = \mathrm{median}(freq)/freq_i \]

This guarantees that classes with a small pixel share get a weight greater than 1 and classes with a large share get a weight smaller than 1, which achieves the balancing effect.
For example, my own data has two classes, 0 and 1, and 55 training images of 500x500 pixels. Counting the 0 and 1 pixels over the 55 images:
count1 = 227611
count0 = 13522389
freq1 = 227611 / (500 * 500 * 55) = 0.0166
freq0 = 13522389 / (500 * 500 * 55) = 0.9834
median = 0.5
weight1 = 0.5 / 0.0166 = 30.12
weight0 = 0.5 / 0.9834 = 0.508
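
A minimal Python sketch of this weight computation (not code from the original post), assuming the labels are already single-channel images with pixel values 0 to num_classes-1; the function name and the glob pattern are placeholders:

import glob
import numpy as np
from skimage.io import imread

def median_freq_balancing_weights(label_glob, num_classes):
    # Count how many pixels of each class appear in the whole training set.
    counts = np.zeros(num_classes, dtype=np.int64)
    for path in glob.glob(label_glob):
        lab = imread(path)
        for c in range(num_classes):
            counts[c] += np.sum(lab == c)
    # Class frequency = class pixel count / total pixel count.
    freqs = counts.astype(np.float64) / counts.sum()
    # weight_i = median(freq) / freq_i; this sketch assumes every class
    # occurs at least once, otherwise the division hits a zero frequency.
    return np.median(freqs) / freqs

# With the counts above (two classes, 55 images of 500x500):
# freqs ~= [0.9834, 0.0166]  ->  weights ~= [0.508, 30.12]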

  2. webdemo weights
    The class count and labels used by the author's web demo do not match the model file he released, so you can run tests with the webdemo weights, but it is better not to finetune from them; finetune directly from VGG-16 instead.

  3. Converting RGB labels to grayscale labels

Some datasets provide RGB-coded labels (one color per class), but the label map fed to the network during training is usually encoded with values 0 to class_num-1, so a conversion step is needed. Below is a Python 2 conversion script:

#!/usr/bin/env python
import os
import numpy as np
from itertools import izip
from argparse import ArgumentParser
from collections import OrderedDict
from skimage.io import ImageCollection, imsave
from skimage.transform import resize


camvid_colors = OrderedDict([
    ("Animal", np.array([64, 128, 64], dtype=np.uint8)),
    ("Archway", np.array([192, 0, 128], dtype=np.uint8)),
    ("Bicyclist", np.array([0, 128, 192], dtype=np.uint8)),
    ("Bridge", np.array([0, 128, 64], dtype=np.uint8)),
    ("Building", np.array([128, 0, 0], dtype=np.uint8)),
    ("Car", np.array([64, 0, 128], dtype=np.uint8)),
    ("CartLuggagePram", np.array([64, 0, 192], dtype=np.uint8)),
    ("Child", np.array([192, 128, 64], dtype=np.uint8)),
    ("Column_Pole", np.array([192, 192, 128], dtype=np.uint8)),
    ("Fence", np.array([64, 64, 128], dtype=np.uint8)),
    ("LaneMkgsDriv", np.array([128, 0, 192], dtype=np.uint8)),
    ("LaneMkgsNonDriv", np.array([192, 0, 64], dtype=np.uint8)),
    ("Misc_Text", np.array([128, 128, 64], dtype=np.uint8)),
    ("MotorcycleScooter", np.array([192, 0, 192], dtype=np.uint8)),
    ("OtherMoving", np.array([128, 64, 64], dtype=np.uint8)),
    ("ParkingBlock", np.array([64, 192, 128], dtype=np.uint8)),
    ("Pedestrian", np.array([64, 64, 0], dtype=np.uint8)),
    ("Road", np.array([128, 64, 128], dtype=np.uint8)),
    ("RoadShoulder", np.array([128, 128, 192], dtype=np.uint8)),
    ("Sidewalk", np.array([0, 0, 192], dtype=np.uint8)),
    ("SignSymbol", np.array([192, 128, 128], dtype=np.uint8)),
    ("Sky", np.array([128, 128, 128], dtype=np.uint8)),
    ("SUVPickupTruck", np.array([64, 128, 192], dtype=np.uint8)),
    ("TrafficCone", np.array([0, 0, 64], dtype=np.uint8)),
    ("TrafficLight", np.array([0, 64, 64], dtype=np.uint8)),
    ("Train", np.array([192, 64, 128], dtype=np.uint8)),
    ("Tree", np.array([128, 128, 0], dtype=np.uint8)),
    ("Truck_Bus", np.array([192, 128, 192], dtype=np.uint8)),
    ("Tunnel", np.array([64, 0, 64], dtype=np.uint8)),
    ("VegetationMisc", np.array([192, 192, 0], dtype=np.uint8)),
    ("Wall", np.array([64, 192, 0], dtype=np.uint8)),
    ("Void", np.array([0, 0, 0], dtype=np.uint8))
])


def convert_label_to_grayscale(im):
    # Start from 255 so any unmatched pixel can be caught by the assert below.
    out = (np.ones(im.shape[:2]) * 255).astype(np.uint8)
    for gray_val, (label, rgb) in enumerate(camvid_colors.items()):
        # Pixels whose three channels all match this class color get its index.
        match_pxls = np.where((im == np.asarray(rgb)).sum(-1) == 3)
        out[match_pxls] = gray_val
    assert (out != 255).all(), "rounding errors or missing classes in camvid_colors"
    return out.astype(np.uint8)


def make_parser():
    parser = ArgumentParser()
    parser.add_argument(
        'label_dir',
        help="Directory containing all RGB camvid label images as PNGs"
    )
    parser.add_argument(
        'out_dir',
        help="""Directory to save grayscale label images.
        Output images have same basename as inputs so be careful not to
        overwrite original RGB labels""")
    return parser


if __name__ == '__main__':
    parser = make_parser()
    args = parser.parse_args()
    labs = ImageCollection(os.path.join(args.label_dir, "*"))
    os.makedirs(args.out_dir)
    for i, (inpath, im) in enumerate(izip(labs.files, labs)):
        print(i + 1, "of", len(labs))
        # resize to caffe-segnet input size and preserve label values
        resized_im = (resize(im, (360, 480), order=0) * 255).astype(np.uint8)
        out = convert_label_to_grayscale(resized_im)
        outpath = os.path.join(args.out_dir, os.path.basename(inpath))
        imsave(outpath, out)
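
To run the script, assuming it is saved as convert_labels.py (the filename is just an example):

python convert_labels.py /path/to/camvid/rgb_labels /path/to/camvid/gray_labels

Note that the output directory must not already exist, because the script creates it with os.makedirs.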

Training Results

Test result of a model finetuned from VGG-16, after 20000 iterations:
[image: gQZ7n.png]
Label:
[image: gQyPQ.png]
Result of a model finetuned from VGG-16 on my own data:
[image: g4BBu.png]
Label:
[image: g45vH.png]

Test result:
[image: g49kN.png]

Reference

  1. Demystifying Segnet:http://5argon.info/portfolio/d/SegnetTrainingGuide.pdf

