Generating LMDB datasets for Caffe

Reposted article, 2019-05-10 10:38.

This post explains how to generate LMDB files under the Caffe framework. It covers two tasks: classification and detection.

Classification task

Step 1: Generate train.txt and test.txt

A supervised learning task usually has a training set (the train_data folder) and a test set (the test_data folder), as shown in the figure below.

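For a multi-class problem, the train_data folder contains one sub-folder per class, and each sub-folder holds the images that belong to that class. The test_data folder follows the same layout.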

We therefore generate train.txt and test.txt first, for use in the next step.

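Taking train.txt as an example, each line holds an image path relative to the train_data folder, then a space, then the integer class index.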
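
A minimal sketch of reading that line format back, assuming each entry is a relative image path, one space, and an integer class index (which is what the generation script in this post writes). The helper names `parse_listing_line` and `check_listing` are hypothetical and only serve as an illustration.

```python
def parse_listing_line(line):
    """Split one listing entry into (relative_path, label)."""
    # Split on the last space, so a path that contains spaces still parses.
    rel_path, label = line.rstrip("\n").rsplit(" ", 1)
    return rel_path, int(label)


def check_listing(lines, num_classes):
    """Return 1-based numbers of lines that are malformed or out of range."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            _, label = parse_listing_line(line)
        except ValueError:
            # Raised when the space is missing or the label is not an integer.
            bad.append(number)
            continue
        if not 0 <= label < num_classes:
            bad.append(number)
    return bad
```

Running `check_listing` on the lines of train.txt with the number of class folders is a cheap way to catch a broken listing before building the LMDB.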

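First, to avoid problems caused by Chinese characters in file names, we rename every file. If none of your file names contain Chinese characters, you can skip this step.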

import os
import shutil
import random

# Rename every file
# Raw strings (r'...') stop backslash sequences such as \U and \t from being read as escapes
ToRename_train = r'C:\Users\dengshunge\Desktop\plate_dataV6\train_data'
ToRename_test = r'C:\Users\dengshunge\Desktop\plate_dataV6\test_data'
# subDict lists the class sub-folder names and has to be filled in by hand
subDict = ['ao_plate','black_plate','blue_plate','doubleYellow_plate','gang_plate','gua_plate','jiaolian_plate','jing_plate','lingshiguan_plate','newEnergy_plate','nongyong_plate','yellow_plate']
for i in range(len(subDict)):
    ToRename_train1 = os.path.join(ToRename_train,subDict[i])
    ToRename_test1 = os.path.join(ToRename_test,subDict[i])
    if not os.path.exists(ToRename_train1) or not os.path.exists(ToRename_test1):
        raise Exception('ERROR')
    files_train = list(os.listdir(ToRename_train1))
    random.shuffle(files_train)
    files_test = list(os.listdir(ToRename_test1))
    random.shuffle(files_test)
    for s in range(len(files_train)):
        oldname = os.path.join(ToRename_train1,files_train[s])
        # newname is the new file name
        newname = ToRename_train1+'\\newname_train_'+str(s)+'.jpg'
        os.rename(oldname,newname)
    for s in range(len(files_test)):
        oldname = os.path.join(ToRename_test1,files_test[s])
        # newname is the new file name
        newname = ToRename_test1+'\\newname_test_'+str(s)+'.jpg'
        os.rename(oldname,newname)
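
The script above gives every file a .jpg extension and renames in a single pass, so a rerun can overwrite a file that already carries one of the new names. Below is a sketch of a more defensive variant, assuming the same goal of ASCII-only names. The function names `plan_renames` and `apply_renames` and the temporary `.renaming` suffix are hypothetical, and this swaps the random order of the original for a sorted one.

```python
import os


def plan_renames(folder, prefix):
    """Return (old_path, new_path) pairs that give files ASCII-only names."""
    pairs = []
    # sorted() makes the numbering reproducible between runs.
    for index, name in enumerate(sorted(os.listdir(folder))):
        ext = os.path.splitext(name)[1].lower()  # keep the real extension
        new_name = f"{prefix}_{index}{ext}"
        pairs.append((os.path.join(folder, name), os.path.join(folder, new_name)))
    return pairs


def apply_renames(pairs):
    """Rename in two passes so a new name never overwrites a pending file."""
    staged = []
    for old_path, new_path in pairs:
        tmp_path = old_path + ".renaming"  # hypothetical temporary suffix
        os.rename(old_path, tmp_path)
        staged.append((tmp_path, new_path))
    for tmp_path, new_path in staged:
        os.rename(tmp_path, new_path)
```

For one class folder this would be called as `apply_renames(plan_renames(folder, 'newname_train'))`.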

Once every file has been renamed, train.txt and test.txt can be generated.

import os
import shutil
import random

# Build train.txt and test.txt
# Change train_path, test_path and restoreFile to match your setup
train_path = r'C:\Users\dengshunge\Desktop\plate_dataV6\train_data'
test_path = r'C:\Users\dengshunge\Desktop\plate_dataV6\test_data'
# Names of the class sub-folders
subPath = ['ao_plate','black_plate','blue_plate','doubleYellow_plate','gang_plate','gua_plate','jiaolian_plate','jing_plate','lingshiguan_plate','newEnergy_plate','nongyong_plate','yellow_plate']
# Folder where the generated train.txt and test.txt are saved
restoreFile = r'C:\Users\dengshunge\Desktop'
# Generate train.txt (the file is opened in append mode, so delete an old train.txt before rerunning)
for i in range(len(subPath)):
    train_path1 = os.path.join(train_path,subPath[i])
    if not os.path.exists(train_path1):
        raise Exception('error')
    restoreFile_train = os.path.join(restoreFile,'train.txt')
    with open(restoreFile_train,'a') as f:
        files = os.listdir(train_path1)
        for s in files:
            f.write(os.path.join(subPath[i],s)+' '+str(i)+'\n')
# Generate test.txt
for i in range(len(subPath)):
    test_path1 = os.path.join(test_path,subPath[i])
    if not os.path.exists(test_path1):
        raise Exception('error')
    restoreFile_test = os.path.join(restoreFile,'test.txt')
    with open(restoreFile_test,'a') as f:
        files = os.listdir(test_path1)
        for s in files:
            f.write(os.path.join(subPath[i],s)+' '+str(i)+'\n')
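
The script above joins paths with os.path.join, which produces backslashes on Windows, while the LMDB in this post is built on Linux. The following sketch is one way to sidestep that, assuming the same folder layout. It swaps in a forward slash and write mode 'w' in place of the original's os.path.join and append mode, and the function names `build_listing` and `write_listing` are hypothetical.

```python
import os


def build_listing(data_root, class_names):
    """Return listing lines '<class>/<file> <label>' for every class folder."""
    lines = []
    for label, class_name in enumerate(class_names):
        class_dir = os.path.join(data_root, class_name)
        if not os.path.isdir(class_dir):
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        for filename in sorted(os.listdir(class_dir)):
            # Forward slash on purpose: the listing is consumed on Linux.
            lines.append(f"{class_name}/{filename} {label}")
    return lines


def write_listing(lines, out_path):
    """Write the listing with mode 'w' so a rerun replaces the old file."""
    with open(out_path, "w", encoding="utf-8", newline="\n") as f:
        f.writelines(line + "\n" for line in lines)
```

Calling `write_listing(build_listing(train_path, subPath), 'train.txt')` twice leaves a single copy of each entry, because the file is rewritten each time.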

Step 2: Modify create_imagenet.sh

If Caffe is installed and you have train.txt and test.txt, you can use the tools that ship with Caffe to generate the LMDB files.

create_imagenet.sh is located in /caffe/examples/imagenet.

Copy create_imagenet.sh into a working folder; I used /Desktop/convertLMDB. Put the dataset, train.txt and test.txt in the same convertLMDB folder, as shown in the figure.

Edit create_imagenet.sh as indicated by the added comments below, changing the values to suit your setup.

#!/usr/bin/env sh
# Create the imagenet lmdb inputs
# N.B. set the path to the imagenet train + val data dirs
set -e

# Where the generated LMDB files are stored
EXAMPLE=/home/dengshunge/Desktop/convertLMDB
# Where train.txt and test.txt are located
DATA=/home/dengshunge/Desktop/convertLMDB
# Location of caffe/build/tools
TOOLS=/home/dengshunge/caffe/build/tools

# Training and test image folders; do not forget the trailing '/'
TRAIN_DATA_ROOT=/home/dengshunge/Desktop/convertLMDB/plate_dataV6/train_data/
VAL_DATA_ROOT=/home/dengshunge/Desktop/convertLMDB/plate_dataV6/test_data/

# Set RESIZE=true to resize the images to 256x256. Leave as false if images have
# already been resized using another tool.
# To resize the input images, set RESIZE to true and set the image height and width below
RESIZE=true
if $RESIZE; then
  RESIZE_HEIGHT=30
  RESIZE_WIDTH=120
else
  RESIZE_HEIGHT=0
  RESIZE_WIDTH=0
fi

if [ ! -d "$TRAIN_DATA_ROOT" ]; then
  echo "Error: TRAIN_DATA_ROOT is not a path to a directory: $TRAIN_DATA_ROOT"
  echo "Set the TRAIN_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet training data is stored."
  exit 1
fi

if [ ! -d "$VAL_DATA_ROOT" ]; then
  echo "Error: VAL_DATA_ROOT is not a path to a directory: $VAL_DATA_ROOT"
  echo "Set the VAL_DATA_ROOT variable in create_imagenet.sh to the path" \
       "where the ImageNet validation data is stored."
  exit 1
fi

echo "Creating train lmdb..."

# train_lmdb in $EXAMPLE/train_lmdb is the name of the LMDB; change it as needed
# $DATA/train.txt must match the name of the train.txt you generated; otherwise change it here
# The same applies to the test LMDB
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/train_lmdb

echo "Creating test lmdb..."

GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $VAL_DATA_ROOT \
    $DATA/test.txt \
    $EXAMPLE/test_lmdb

echo "Done."

Step 3: Generate the LMDB files

In a terminal, run ./create_imagenet.sh:

dengshunge@computer-5054:~/Desktop/convertLMDB$ ./create_imagenet.sh -shuffle

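The result looks like the figure below. If a generated LMDB is only a dozen or so KB, generation has probably failed. The log also shows that the data is shuffled automatically while the LMDB is built.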
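
The size check can be automated with a few lines of Python. This is only a sketch: it relies on the fact that an LMDB directory stores its records in a file named data.mdb, and the 100 KB threshold and both function names are assumptions chosen for illustration, not values from Caffe.

```python
import os


def lmdb_size_bytes(lmdb_dir):
    """Return the size of data.mdb inside an LMDB directory, in bytes."""
    return os.path.getsize(os.path.join(lmdb_dir, "data.mdb"))


def looks_suspiciously_small(lmdb_dir, min_bytes=100 * 1024):
    """True when data.mdb is smaller than min_bytes (an arbitrary threshold)."""
    return lmdb_size_bytes(lmdb_dir) < min_bytes
```

For example, `looks_suspiciously_small('train_lmdb')` returning True would be a hint to reread the convert_imageset log for missing-file errors.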

Finally, create_imagenet.sh and the data preprocessing .py file can be downloaded from my GitHub; adapt them to your needs.

Detection task

The method mainly follows another blogger's article. This walkthrough uses the Tiny-DSOD version of Caffe from GitHub; the Tiny-DSOD/data folder shows clearly what needs to be prepared.

Step 1: Prepare the image files and XML files

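A detection task needs annotations, so prepare the following files:

Image files
Label files: XML files in Pascal VOC format, one per image, with the same file name as the image

As shown in the figure, the image files are on the left and the corresponding XML files are on the right.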
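
Since every image must have an XML file with the same name, it can help to check the pairing before going further. A minimal sketch, assuming images and XML files live in two separate folders; the function name `unmatched_files` and the list of image extensions are assumptions.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image extensions


def unmatched_files(image_dir, xml_dir):
    """Return (images_without_xml, xmls_without_image) as sorted stem lists."""
    image_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(image_dir)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTS
    }
    xml_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(xml_dir)
        if name.lower().endswith(".xml")
    }
    return sorted(image_stems - xml_stems), sorted(xml_stems - image_stems)
```

Both returned lists should be empty for a consistent dataset.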

Step 2: Generate train.txt and test.txt

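First, look at the format of train.txt and test.txt. As shown in the figure, each line has two parts: the image path on the left and the path of the matching XML file on the right, separated by a space. Knowing the format, we can generate the files. What should the paths look like? As explained further below, they are relative paths, which are later joined with data_root_dir in create_data.sh to form absolute paths.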
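
A sketch of producing such lines, assuming images and annotations sit in two sub-folders of the data root. The folder names JPEGImages and Annotations used in the usage note are the Pascal VOC convention and an assumption here, as is the function name `detection_listing`. Note that the create_data.sh shown later in this post loops over the subsets test and trainval, so it reads test.txt and trainval.txt.

```python
import os


def detection_listing(data_root, image_subdir, xml_subdir):
    """Return lines '<image path> <xml path>', both relative to data_root."""
    lines = []
    image_dir = os.path.join(data_root, image_subdir)
    for name in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in (".jpg", ".jpeg", ".png"):
            continue  # ignore files that are not images
        xml_name = stem + ".xml"
        if not os.path.isfile(os.path.join(data_root, xml_subdir, xml_name)):
            continue  # skip images that have no annotation
        # Forward slashes keep the relative paths usable on Linux.
        lines.append(f"{image_subdir}/{name} {xml_subdir}/{xml_name}")
    return lines
```

With the VOC layout, `detection_listing(data_root, 'JPEGImages', 'Annotations')` yields entries such as `JPEGImages/001.jpg Annotations/001.xml`.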

Step 3: Generate labelmap.prototxt and test_name_size.txt

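First, look at the format of labelmap.prototxt. As shown in the figure below, it consists of several item entries. The item with label 0 is the background, followed by the labels you annotated yourself. Label numbers should preferably be consecutive, and the name of each item must match the name used in the XML files.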
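
For more than a handful of classes, the file can be rendered from a list of names. This sketch assumes the item layout of the VOC label map that ships with SSD-style Caffe forks, where the background entry is named none_of_the_above. The function names are hypothetical, and `unknown_object_names` assumes Pascal VOC XML with object/name elements.

```python
import xml.etree.ElementTree as ET


def labelmap_text(class_names):
    """Render labelmap.prototxt text; label 0 is reserved for background."""
    entries = [("none_of_the_above", 0, "background")]
    for label, name in enumerate(class_names, start=1):
        entries.append((name, label, name))  # consecutive labels from 1
    blocks = []
    for name, label, display_name in entries:
        blocks.append(
            "item {\n"
            f'  name: "{name}"\n'
            f"  label: {label}\n"
            f'  display_name: "{display_name}"\n'
            "}\n"
        )
    return "".join(blocks)


def unknown_object_names(xml_text, class_names):
    """Return object names in a VOC XML that are missing from class_names."""
    root = ET.fromstring(xml_text)
    names = {obj.findtext("name") for obj in root.iter("object")}
    names.discard(None)  # objects without a <name> element
    return sorted(names - set(class_names))
```

An empty result from `unknown_object_names` for every annotation file means the label map covers all annotated names.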

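Next, look at test_name_size.txt, shown in the figure below. It has three columns: the first is the image name, and the second and third are the image height and width. Note that the image name has no file extension. I do not know what this file is used for; the function call below does not reference it.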
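
One way to obtain the three columns without opening the images is to read the size element that Pascal VOC annotations normally carry. This is a sketch under that assumption; if your XML files lack a size element, the dimensions have to come from the images themselves. The function name `name_size_line` is hypothetical.

```python
import xml.etree.ElementTree as ET


def name_size_line(image_stem, xml_text):
    """Return '<stem> <height> <width>' using the <size> element of a VOC XML."""
    size = ET.fromstring(xml_text).find("size")
    if size is None:
        raise ValueError("annotation has no <size> element")
    height = int(size.findtext("height"))
    width = int(size.findtext("width"))
    # Column order follows the description above: name, height, width.
    return f"{image_stem} {height} {width}"
```

The stem passed in is the image file name without its extension, matching the first column described above.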

labelmap.prototxt can therefore be edited by hand, while for train.txt, test.txt and test_name_size.txt a function template is provided here that you can adapt as needed.

Step 4: Generate the LMDB

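Tiny-DSOD/data/VOC0712/create_data.sh has been modified as shown below. root_dir is the Caffe path, used mainly to call scripts/create_annoset.py under it; lmdbFile is where the LMDB is written, and lmdbLink is a symbolic link to lmdbFile. The remaining settings are commented and should be self-explanatory.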

cur_dir=$(cd $( dirname ${BASH_SOURCE[0]} ) && pwd )
# Path to Caffe
root_dir="/home/dengshunge/Tiny-DSOD-master"

cd $root_dir

redo=1
# Root directory of the data; joined with the relative paths in the txt files
data_root_dir="/home/dengshunge/Desktop/data"
# Folder containing trainval.txt and test.txt
txtFileDir="/home/dengshunge/Desktop/LMDB"
# Where the LMDB is stored
lmdbFile="/home/dengshunge/Desktop/LMDB/lmdb"
# Symbolic link to the LMDB location
lmdbLink="/home/dengshunge/Desktop/LMDB/lmdbLink"
# Location of the label map file
mapfile="/home/dengshunge/Desktop/LMDB/labelmap.prototxt"
# Task type
anno_type="detection"
# Database format
db="lmdb"
# Image size: width,height = 0,0 keeps the original image size; otherwise images are resized to (width,height)
min_dim=0
max_dim=0
width=300
height=300

extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in test trainval
do
  python3 $root_dir/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir $txtFileDir/$subset.txt $lmdbFile/$subset"_"$db $lmdbLink
done

One more note: Tiny-DSOD/scripts/create_annoset.py seems to have a small shortcoming, so I modified the beginning of the file as follows; the rest is unchanged.

import argparse
import os
import shutil
import subprocess
import sys

# Change this to your Caffe path
caffe_root = "/home/dengshunge/Tiny-DSOD-master"

# sys.path.append(caffe_root)
sys.path.insert(0,caffe_root+'/python')

from caffe.proto import caffe_pb2
from google.protobuf import text_format

Run create_data.sh to generate the corresponding LMDB files.

I have put these files on my GitHub, where you can download and use them.
