SSD Vehicle Detection: Implementation


Part 1: Model Implementation

1. We use `build_model` to construct the model:

def build_model(image_size,
                n_classes,
                mode='training',
                l2_regularization=0.0,
                min_scale=0.1,
                max_scale=0.9,
                scales=None,
                aspect_ratios_global=[0.5, 1.0, 2.0],
                aspect_ratios_per_layer=None,
                two_boxes_for_ar1=True,
                steps=None,
                offsets=None,
                clip_boxes=False,
                variances=[1.0, 1.0, 1.0, 1.0],
                coords='centroids',
                normalize_coords=False,
                subtract_mean=None,
                divide_by_stddev=None,
                swap_channels=False,
                confidence_thresh=0.01,
                iou_threshold=0.45,
                top_k=200,
                nms_max_output_size=400,
                return_predictor_sizes=False):

An explanation of the function's parameters:

Arguments:
        image_size (tuple): The input image size in the format `(height, width, channels)`.
        n_classes (int): The number of positive classes; 5 in this project.
        mode (str, optional):
            In 'training' mode the model outputs relative coordinates; in 'inference' mode, absolute ones. Relative coordinates are fractions of the image size, absolute coordinates are real pixel values.
            One of 'training', 'inference' and 'inference_fast'. In 'training' mode,
            the model outputs the raw prediction tensor, while in 'inference' and 'inference_fast' modes,
            the raw predictions are decoded into absolute coordinates and filtered via confidence thresholding,
            non-maximum suppression, and top-k filtering. The difference between latter two modes is that
            'inference' follows the exact procedure of the original Caffe implementation, while
            'inference_fast' uses a faster prediction decoding procedure.
        l2_regularization (float, optional): The L2-regularization rate. Applies to all convolutional layers.
        min_scale (float, optional):
            The s_min and s_max of the SSD paper: anchor scales as fractions of the shorter image side. s_min is the scale used by the first predictor layer and s_max by the last.
            The smallest scaling factor for the size of the anchor boxes as a fraction
            of the shorter side of the input images.
        max_scale (float, optional): The largest scaling factor for the size of the anchor boxes as a fraction
            of the shorter side of the input images. All scaling factors between the smallest and the
            largest will be linearly interpolated. Note that the second to last of the linearly interpolated
            scaling factors will actually be the scaling factor for the last predictor layer, while the last
            scaling factor is used for the second box for aspect ratio 1 in the last predictor layer
            if `two_boxes_for_ar1` is `True`.
        scales (list, optional):
            A list giving the anchor scale for each predictor layer.
            A list of floats containing scaling factors per convolutional predictor layer.
            This list must be one element longer than the number of predictor layers. The first `k` elements are the
            scaling factors for the `k` predictor layers, while the last element is used for the second box
            for aspect ratio 1 in the last predictor layer if `two_boxes_for_ar1` is `True`. This additional
            last scaling factor must be passed either way, even if it is not being used. If a list is passed,
            this argument overrides `min_scale` and `max_scale`. All scaling factors must be greater than zero.
        aspect_ratios_global (list, optional):
            The global aspect ratios, shared by all predictor layers.
            The list of aspect ratios for which anchor boxes are to be
            generated. This list is valid for all predictor layers. The original implementation uses more aspect ratios
            for some predictor layers and fewer for others. If you want to do that, too, then use the next argument instead.
        aspect_ratios_per_layer (list, optional):
            Per-layer aspect ratio settings.
            A list containing one aspect ratio list for each predictor layer.
            This allows you to set the aspect ratios for each predictor layer individually. If a list is passed,
            it overrides `aspect_ratios_global`.
        two_boxes_for_ar1 (bool, optional):
            Whether to generate two boxes when the aspect ratio list contains 1. The second box's scale is the geometric mean of the current layer's scale and the next layer's, e.g. with s = 0.2 for this layer and s = 0.34 for the next.
            Only relevant for aspect ratio lists that contain 1. Will be ignored otherwise.
            If `True`, two anchor boxes will be generated for aspect ratio 1. The first will be generated
            using the scaling factor for the respective layer, the second one will be generated using
            geometric mean of said scaling factor and next bigger scaling factor.
        steps (list, optional):
            The stride, in pixels, from one anchor center point to the next. Defaults to the input image size (e.g. 300*300) divided by the current feature map size (e.g. 10*10).
            `None` or a list with as many elements as there are predictor layers. The elements can be
            either ints/floats or tuples of two ints/floats. These numbers represent for each predictor layer how many
            pixels apart the anchor box center points should be vertically and horizontally along the spatial grid over
            the image. If the list contains ints/floats, then that value will be used for both spatial dimensions.
            If the list contains tuples of two ints/floats, then they represent `(step_height, step_width)`.
            If no steps are provided, then they will be computed such that the anchor box center points will form an
            equidistant grid within the image dimensions.
        offsets (list, optional):
            The center position of the first (top-left) anchor; may be left as `None`.
            `None` or a list with as many elements as there are predictor layers. The elements can be
            either floats or tuples of two floats. These numbers represent for each predictor layer how many
            pixels from the top and left borders of the image the top-most and left-most anchor box center points should be
            as a fraction of `steps`. The last bit is important: The offsets are not absolute pixel values, but fractions
            of the step size specified in the `steps` argument. If the list contains floats, then that value will
            be used for both spatial dimensions. If the list contains tuples of two floats, then they represent
            `(vertical_offset, horizontal_offset)`. If no offsets are provided, then they will default to 0.5 of the step size,
            which is also the recommended setting.
        clip_boxes (bool, optional):
            Whether to clip anchors that cross the image boundary. Defaults to `False`; clipping does not improve results much.
            If `True`, clips the anchor box coordinates to stay within image boundaries.
        variances (list, optional):
            Defaults to 1.0 for every coordinate, the value chosen by the authors.
            A list of 4 floats >0. The anchor box offset for each coordinate will be divided by
            its respective variance value.
        coords (str, optional):
            The bounding-box coordinate representation:
            The box coordinate format to be used internally by the model (i.e. this is not the input format
            of the ground truth labels). Can be either 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width,
            and height), 'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
        normalize_coords (bool, optional):
            Whether to represent pixel coordinates in normalized form.
            Set to `True` if the model is supposed to use relative instead of absolute coordinates,
            i.e. if the model predicts box coordinates within [0,1] instead of absolute coordinates.
        subtract_mean (array-like, optional):
            Mean normalization: shifts image pixel values into roughly [-127, +127].
            `None` or an array-like object of integers or floating point values
            of any shape that is broadcast-compatible with the image shape. The elements of this array will be
            subtracted from the image pixel intensity values. For example, pass a list of three integers
            to perform per-channel mean normalization for color images.
        divide_by_stddev (array-like, optional):
            Standardization: after mean subtraction, rescales values into [0, 1] or [-0.5, +0.5].
            `None` or an array-like object of non-zero integers or
            floating point values of any shape that is broadcast-compatible with the image shape. The image pixel
            intensity values will be divided by the elements of this array. For example, pass a list
            of three integers to perform per-channel standard deviation normalization for color images.
        swap_channels (list, optional):
            Channel reordering; defaults to `False`.
            Either `False` or a list of integers representing the desired order in which the input
            image channels should be swapped.
        confidence_thresh (float, optional):
            The probability threshold for detections: predictions whose class confidence reaches this value are kept as candidate detections.
            A float in [0,1), the minimum classification confidence in a specific
            positive class in order to be considered for the non-maximum suppression stage for the respective class.
            A lower value will result in a larger part of the selection process being done by the non-maximum suppression
            stage, while a larger value will result in a larger part of the selection process happening in the confidence
            thresholding stage.
        iou_threshold (float, optional):
            The IoU threshold used in non-maximum suppression: two boxes whose IoU exceeds this value are assumed to detect the same object, and the lower-scoring one is removed.
            A float in [0,1]. All boxes that have a Jaccard similarity of greater than `iou_threshold`
            with a locally maximal box will be removed from the set of predictions for a given class, where 'maximal' refers
            to the box's confidence score.
        top_k (int, optional):
            Keep the k highest-scoring boxes; e.g. if at most 3 objects are expected per image, this can be set to 3.
            The number of highest scoring predictions to be kept for each batch item after the
            non-maximum suppression stage.
        nms_max_output_size (int, optional): The maximal number of predictions that will be left over after the NMS stage.
        return_predictor_sizes (bool, optional):
            If `True`, also returns the feature-map size of each predictor layer; useful for debugging.
            If `True`, this function not only returns the model, but also
            a list containing the spatial dimensions of the predictor layers. This isn't strictly necessary since
            you can always get their sizes easily via the Keras API, but it's convenient and less error-prone
            to get them this way. They are only relevant for training anyway (SSDBoxEncoder needs to know the
            spatial dimensions of the predictor layers), for inference you don't need them.
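Two of these conventions are worth making concrete. If `scales` is not passed, the per-layer scales are presumably obtained by linear interpolation between `min_scale` and `max_scale`, and the extra aspect-ratio-1 box uses the geometric mean of adjacent scales, as the docstring describes. A small sketch (assuming 4 predictor layers, as in this model):

import numpy as np

n_predictor_layers = 4
min_scale, max_scale = 0.1, 0.9

# One scale per predictor layer plus one extra entry, used only for the
# second aspect-ratio-1 box of the last layer: [0.1, 0.3, 0.5, 0.7, 0.9]
scales = np.linspace(min_scale, max_scale, n_predictor_layers + 1)

# Second box for aspect ratio 1: geometric mean of this layer's scale and
# the next one, e.g. with s_k = 0.2 and s_k+1 = 0.34, sqrt(0.2 * 0.34) ~ 0.26
s_extra = np.sqrt(scales[0] * scales[1])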

2. The network architecture

    conv1 = Conv2D(32, (5, 5), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv1')(x1)
    conv1 = BatchNormalization(axis=3, momentum=0.99, name='bn1')(conv1)
    conv1 = ELU(name='elu1')(conv1)  # Exponential Linear Unit activation
    pool1 = MaxPooling2D(pool_size=(2, 2), name='pool1')(conv1)

    conv2 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv2')(pool1)
    conv2 = BatchNormalization(axis=3, momentum=0.99, name='bn2')(conv2)
    conv2 = ELU(name='elu2')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2), name='pool2')(conv2)

    conv3 = Conv2D(64, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv3')(pool2)
    conv3 = BatchNormalization(axis=3, momentum=0.99, name='bn3')(conv3)
    conv3 = ELU(name='elu3')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2), name='pool3')(conv3)

    conv4 = Conv2D(64, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv4')(pool3)
    conv4 = BatchNormalization(axis=3, momentum=0.99, name='bn4')(conv4)
    conv4 = ELU(name='elu4')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2), name='pool4')(conv4)

    conv5 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv5')(pool4)
    conv5 = BatchNormalization(axis=3, momentum=0.99, name='bn5')(conv5)
    conv5 = ELU(name='elu5')(conv5)
    pool5 = MaxPooling2D(pool_size=(2, 2), name='pool5')(conv5)

    conv6 = Conv2D(48, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv6')(pool5)
    conv6 = BatchNormalization(axis=3, momentum=0.99, name='bn6')(conv6)
    conv6 = ELU(name='elu6')(conv6)
    pool6 = MaxPooling2D(pool_size=(2, 2), name='pool6')(conv6)

    conv7 = Conv2D(32, (3, 3), strides=(1, 1), padding="same", kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv7')(pool6)
    conv7 = BatchNormalization(axis=3, momentum=0.99, name='bn7')(conv7)
    conv7 = ELU(name='elu7')(conv7)
    pool7 = MaxPooling2D(pool_size=(2, 2), name='pool7')(conv7)

Seven convolutional layers in total are used for feature extraction.

Layers conv4, conv5, conv6, and conv7 feed the SSD prediction heads, as set up below.
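The heads use `n_boxes`, the number of anchors per feature-map cell for each of the four predictor layers. With the global aspect-ratio list [0.5, 1.0, 2.0] and `two_boxes_for_ar1=True`, each cell presumably gets 4 boxes; a sketch of that bookkeeping:

    # One box per aspect ratio, plus one extra box when aspect ratio 1
    # gets a second, geometric-mean-scaled box.
    if (1 in aspect_ratios_global) and two_boxes_for_ar1:
        n_boxes = len(aspect_ratios_global) + 1  # 3 + 1 = 4
    else:
        n_boxes = len(aspect_ratios_global)
    n_boxes = [n_boxes] * 4  # the same count for all four predictor layers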

    # Classification heads
    classes4 = Conv2D(n_boxes[0] * n_classes, (3, 3), strides=(1,1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes4')(conv4)
    classes5 = Conv2D(n_boxes[1] * n_classes, (3, 3), strides=(1,1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes5')(conv5)
    classes6 = Conv2D(n_boxes[2] * n_classes, (3, 3), strides=(1,1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes6')(conv6)
    classes7 = Conv2D(n_boxes[3] * n_classes, (3, 3), strides=(1,1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='classes7')(conv7)
    # Localization heads
    boxes4 = Conv2D(n_boxes[0] * 4, (3, 3), strides=(1,1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes4')(conv4)
    boxes5 = Conv2D(n_boxes[1] * 4, (3, 3), strides=(1,1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes5')(conv5)
    boxes6 = Conv2D(n_boxes[2] * 4, (3, 3), strides=(1,1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes6')(conv6)
    boxes7 = Conv2D(n_boxes[3] * 4, (3, 3), strides=(1,1), padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='boxes7')(conv7)

    # Generate the anchor boxes
    anchors4 = AnchorBoxes(img_height, img_width, this_scale=scales[0], next_scale=scales[1], aspect_ratios=aspect_ratios[0],
                           two_boxes_for_ar1= two_boxes_for_ar1, this_steps=steps[0], this_offsets=offsets[0],
                           clip_boxes=clip_boxes, variances= variances, coords=coords, normalize_coords=normalize_coords, name='anchors4')(boxes4)
    anchors5 = AnchorBoxes(img_height, img_width, this_scale=scales[1], next_scale=scales[2], aspect_ratios=aspect_ratios[1],
                           two_boxes_for_ar1= two_boxes_for_ar1, this_steps=steps[1], this_offsets=offsets[1],
                           clip_boxes=clip_boxes, variances= variances, coords=coords, normalize_coords=normalize_coords, name='anchors5')(boxes5)
    anchors6 = AnchorBoxes(img_height, img_width, this_scale=scales[2], next_scale=scales[3], aspect_ratios=aspect_ratios[2],
                           two_boxes_for_ar1= two_boxes_for_ar1, this_steps=steps[2], this_offsets=offsets[2],
                           clip_boxes=clip_boxes, variances= variances, coords=coords, normalize_coords=normalize_coords, name='anchors6')(boxes6)
    anchors7 = AnchorBoxes(img_height, img_width, this_scale=scales[3], next_scale=scales[4], aspect_ratios=aspect_ratios[3],
                           two_boxes_for_ar1= two_boxes_for_ar1, this_steps=steps[3], this_offsets=offsets[3],
                           clip_boxes=clip_boxes, variances= variances, coords=coords, normalize_coords=normalize_coords, name='anchors7')(boxes7)
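The excerpt omits the reshape-and-concatenate step that produces the `classes_concat`, `boxes_concat`, and `anchors_concat` tensors referenced below. A sketch of that glue, following the upstream ssd_keras SSD7 pattern (the layer names are assumptions):

    # Flatten each head to (batch, n_boxes_layer, ...) so the four predictor
    # layers can be concatenated along the box dimension.
    classes4_reshaped = Reshape((-1, n_classes), name='classes4_reshape')(classes4)
    classes5_reshaped = Reshape((-1, n_classes), name='classes5_reshape')(classes5)
    classes6_reshaped = Reshape((-1, n_classes), name='classes6_reshape')(classes6)
    classes7_reshaped = Reshape((-1, n_classes), name='classes7_reshape')(classes7)
    boxes4_reshaped = Reshape((-1, 4), name='boxes4_reshape')(boxes4)
    boxes5_reshaped = Reshape((-1, 4), name='boxes5_reshape')(boxes5)
    boxes6_reshaped = Reshape((-1, 4), name='boxes6_reshape')(boxes6)
    boxes7_reshaped = Reshape((-1, 4), name='boxes7_reshape')(boxes7)
    # AnchorBoxes outputs 8 values per box: 4 coordinates plus 4 variances.
    anchors4_reshaped = Reshape((-1, 8), name='anchors4_reshape')(anchors4)
    anchors5_reshaped = Reshape((-1, 8), name='anchors5_reshape')(anchors5)
    anchors6_reshaped = Reshape((-1, 8), name='anchors6_reshape')(anchors6)
    anchors7_reshaped = Reshape((-1, 8), name='anchors7_reshape')(anchors7)

    classes_concat = Concatenate(axis=1, name='classes_concat')([classes4_reshaped, classes5_reshaped, classes6_reshaped, classes7_reshaped])
    boxes_concat = Concatenate(axis=1, name='boxes_concat')([boxes4_reshaped, boxes5_reshaped, boxes6_reshaped, boxes7_reshaped])
    anchors_concat = Concatenate(axis=1, name='anchors_concat')([anchors4_reshaped, anchors5_reshaped, anchors6_reshaped, anchors7_reshaped])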
    # Softmax over the class predictions
    classes_softmax = Activation('softmax', name='classes_softmax')(classes_concat)

    # Final predictions: class confidences, box offsets, and anchor data,
    # concatenated along the last axis
    predictions = Concatenate(axis=2, name='predictions')([classes_softmax, boxes_concat, anchors_concat])
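In 'training' mode, the function presumably finishes by wrapping the graph into a Keras Model, where `x` is the model's Input layer (not shown in the excerpt):

    model = Model(inputs=x, outputs=predictions)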

Part 2: Training

Model parameters

img_height = 300  # Image height
img_width = 480  # Image width
img_channels = 3  # Number of image channels
intensity_mean = 127.5  # For image normalization: maps pixel values toward `[-1,1]`
intensity_range = 127.5  # For image normalization: maps pixel values toward `[-1,1]`
n_classes = 5  # Number of positive classes (background excluded)
scales = [0.08, 0.16, 0.32, 0.64, 0.96]  # Anchor scaling factors. If set, `min_scale` and `max_scale` are ignored
aspect_ratios = [0.5, 1.0, 2.0]  # Aspect ratios for the anchors
two_boxes_for_ar1 = True  # Whether to generate two anchors for aspect ratio 1
steps = None  # The anchor step sizes can be set manually; not recommended
offsets = None  # The offsets of the top-left anchors can be set manually; not recommended
clip_boxes = False  # Whether to clip anchors to the image boundaries
variances = [1.0, 1.0, 1.0, 1.0]  # Scaling parameters for the target coordinates; keeping 1.0 is recommended
normalize_coords = True  # Whether to use coordinates relative to the image size
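With these parameters, the training-mode model is presumably constructed along these lines (the `l2_regularization` value here is an assumption):

model = build_model(image_size=(img_height, img_width, img_channels),
                    n_classes=n_classes,
                    mode='training',
                    l2_regularization=0.0005,
                    scales=scales,
                    aspect_ratios_global=aspect_ratios,
                    aspect_ratios_per_layer=None,
                    two_boxes_for_ar1=two_boxes_for_ar1,
                    steps=steps,
                    offsets=offsets,
                    clip_boxes=clip_boxes,
                    variances=variances,
                    normalize_coords=normalize_coords,
                    subtract_mean=intensity_mean,
                    divide_by_stddev=intensity_range)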

Loss setup

adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)

model.compile(optimizer=adam, loss=ssd_loss.compute_loss)
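`SSDLoss` implements the multi-task objective from the SSD paper. With `alpha=1.0` and `neg_pos_ratio=3` (hard negative mining keeps at most three negative anchors per positive), the loss is

L(x, c, l, g) = \frac{1}{N}\left(L_{\mathrm{conf}}(x, c) + \alpha \, L_{\mathrm{loc}}(x, l, g)\right)

where N is the number of matched positive anchors, L_conf is the softmax cross-entropy over class confidences, and L_loc is the smooth-L1 loss on the box offsets of the positive matches.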

Data augmentation

data_augmentation_chain = DataAugmentationConstantInputSize(random_brightness=(-48, 48, 0.5),
                                                            random_contrast=(0.5, 1.8, 0.5),
                                                            random_saturation=(0.5, 1.8, 0.5),
                                                            random_hue=(18, 0.5),
                                                            random_flip=0.5,
                                                            random_translate=((0.03,0.5), (0.03,0.5), 0.5),
                                                            random_scale=(0.5, 2.0, 0.5),
                                                            n_trials_max=3,
                                                            clip_boxes=True,
                                                            overlap_criterion='area',
                                                            bounds_box_filter=(0.3, 1.0),
                                                            bounds_validator=(0.5, 1.0),
                                                            n_boxes_min=1,
                                                            background=(0,0,0))
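The chain is a callable that transforms an image together with its box labels; a minimal usage sketch, assuming the ssd_keras convention of (image, labels) in and out:

# labels: one row per box, in the format (class_id, xmin, ymin, xmax, ymax)
augmented_image, augmented_labels = data_augmentation_chain(image, labels)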

Encoder setup

predictor_sizes = [model.get_layer('classes4').output_shape[1:3],
                   model.get_layer('classes5').output_shape[1:3],
                   model.get_layer('classes6').output_shape[1:3],
                   model.get_layer('classes7').output_shape[1:3]]

ssd_input_encoder = SSDInputEncoder(img_height=img_height,
                                    img_width=img_width,
                                    n_classes=n_classes,
                                    predictor_sizes=predictor_sizes,
                                    scales=scales,
                                    aspect_ratios_global=aspect_ratios,
                                    two_boxes_for_ar1=two_boxes_for_ar1,
                                    steps=steps,
                                    offsets=offsets,
                                    clip_boxes=clip_boxes,
                                    variances=variances,
                                    matching_type='multi',
                                    pos_iou_threshold=0.5,
                                    neg_iou_limit=0.3,
                                    normalize_coords=normalize_coords)
Generators

train_generator = train_dataset.generate(batch_size=batch_size,
                                         shuffle=True,
                                         transformations=[data_augmentation_chain],
                                         label_encoder=ssd_input_encoder,
                                         returns={'processed_images',
                                                  'encoded_labels'},
                                         keep_images_without_gt=False)

val_generator = val_dataset.generate(batch_size=batch_size,
                                     shuffle=False,
                                     transformations=[],
                                     label_encoder=ssd_input_encoder,
                                     returns={'processed_images',
                                              'encoded_labels'},
                                     keep_images_without_gt=False)

Training callbacks

model_checkpoint = ModelCheckpoint(filepath='ssd7_epoch-{epoch:02d}_loss-{loss:.4f}_val_loss-{val_loss:.4f}.h5',
                                   monitor='val_loss',
                                   verbose=1,
                                   save_best_only=True,
                                   save_weights_only=False,
                                   mode='auto',
                                   period=1)

csv_logger = CSVLogger(filename='ssd7_training_log.csv',
                       separator=',',
                       append=True)

early_stopping = EarlyStopping(monitor='val_loss',
                               min_delta=0.0,
                               patience=10,
                               verbose=1)

reduce_learning_rate = ReduceLROnPlateau(monitor='val_loss',
                                         factor=0.2,
                                         patience=8,
                                         verbose=1,
                                         epsilon=0.001,
                                         cooldown=0,
                                         min_lr=0.00001)

callbacks = [model_checkpoint,
             csv_logger,
             early_stopping,
             reduce_learning_rate]

Training settings

batch_size = 16
initial_epoch   = 0
final_epoch     = 50
steps_per_epoch = 500
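With the generators, callbacks, and these settings in place, training is presumably launched via Keras' `fit_generator`; `val_dataset_size` (the number of validation images) is an assumed variable here:

from math import ceil

history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=steps_per_epoch,
                              epochs=final_epoch,
                              callbacks=callbacks,
                              validation_data=val_generator,
                              validation_steps=ceil(val_dataset_size/batch_size),
                              initial_epoch=initial_epoch)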

Training results

The loss recorded during training:

(figure: the per-epoch training log, not included here)

You can see that the model at epoch 46 was no more accurate than the one at epoch 45, so the epoch-46 weights were not saved (ModelCheckpoint was configured with save_best_only=True); the same applies to the other skipped epochs.

The best-performing weights are from epoch 48.

The loss still shows an overall downward trend, so training for more epochs should make the model more accurate.

These are the weight files we saved:

Let's test the model.
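Since the model was built in 'training' mode, its raw output must first be decoded into absolute boxes. A sketch using ssd_keras' `decode_detections` (the 0.5 confidence threshold is an assumption):

y_pred = model.predict(batch_images)

# Decode raw offsets into (class_id, confidence, xmin, ymin, xmax, ymax)
y_pred_decoded = decode_detections(y_pred,
                                   confidence_thresh=0.5,
                                   iou_threshold=0.45,
                                   top_k=200,
                                   normalize_coords=normalize_coords,
                                   img_height=img_height,
                                   img_width=img_width)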

Image 1
Image file: ./driving_datasets/1478899079843657225.jpg

Hand-labeled ground truth (columns: class_id, xmin, ymin, xmax, ymax):

[[  1  83 144 105 158]
 [  1 124 142 138 153]
 [  1 174 138 194 156]
 [  1 183 139 235 178]
 [  1 189 138 197 149]
 [  1 209 137 219 147]
 [  2 139 134 160 151]]
Predictions:

   class  conf  xmin   ymin   xmax   ymax
[[  1.     0.98 175.51 138.55 195.6  157.32]
 [  1.     0.72 183.55 134.15 242.   175.66]
 [  1.     0.59  80.93 142.11 108.24 158.76]]

The ground truth above contains seven hand-labeled boxes, of which our model detected three. The model is fairly accurate on nearby targets. Two of the missed cars are far away and partly occluded by trees, and another is distant, small, and somewhat blurry.

Let's try some other images.

The model is not yet very accurate, so we will train for more epochs.

Training more epochs to improve the model's accuracy:
initial_epoch   = 0
final_epoch     = 100
steps_per_epoch = 1000

Reviewing the training results:

(figure: loss curves, not included here)

The loss has dropped markedly compared with the earlier run.

Testing:

(figures: detection results on test images, not included here)

Summary
1. The model's loss is still trending downward, so it can be trained for more epochs.
2. The model has difficulty with very small distant targets, as well as blurry targets, occluded targets, and targets in shadow.
3. The data labeling has some problems: some small targets were left unlabeled, which adds noise during training.