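MXNet Quick-Start Tutorial

Reposted 2018-12-14 11:44 · Deep learning frameworks / mxnet

A while ago I had to use MXNet at work, but the MXNet documentation is, well... So I'm writing a few things down here, for my own convenience and for everyone else's.

1. Installing and Using MXNet

Source repository: https://github.com/dmlc/mxnet

Below is the installation and test procedure for a single node, based on the official documentation: http://mxnt.ml/en/latest/build.html#building-on-linux

1.1 Installing the basic dependencies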

　　sudo apt-get update

　　sudo apt-get install -y build-essential git libblas-dev libopencv-dev

1.2 Downloading MXNet

　　git clone --recursive https://github.com/dmlc/mxnet

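1.3 Installing CUDA

See this blog post for details: http://blog.csdn.net/a350203223/article/details/50262535

1.4 Building MXNet with GPU support

Inside the mxnet/ directory, find the mxnet/make/ subdirectory and copy its config.mk into mxnet/. Open the copy in a text editor, then find and change the following two lines: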

　　USE_CUDA = 1

　　USE_CUDA_PATH = /usr/local/cuda

After making the changes, build from the mxnet/ directory:

　　make -j4
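The config.mk edit in step 1.4 can also be scripted if you rebuild often. The sketch below is only an illustration: the helper name enable_cuda is made up, and it assumes both settings appear in config.mk as plain `NAME = value` lines like the two shown above.

```python
import re


def enable_cuda(config_text, cuda_path="/usr/local/cuda"):
    """Return config.mk text with USE_CUDA and USE_CUDA_PATH rewritten.

    Hypothetical helper: assumes each setting sits on its own
    `NAME = value` line.
    """
    # Replace whatever value follows "USE_CUDA =" with 1.
    # The pattern does not match USE_CUDA_PATH, because "_" is not "=".
    text = re.sub(r"(?m)^USE_CUDA\s*=.*$", "USE_CUDA = 1", config_text)
    # Point USE_CUDA_PATH at the CUDA install directory.
    text = re.sub(r"(?m)^USE_CUDA_PATH\s*=.*$",
                  lambda m: "USE_CUDA_PATH = " + cuda_path, text)
    return text


# A made-up three-line sample standing in for the real config.mk.
sample = "USE_CUDA = 0\nUSE_CUDA_PATH = NONE\nUSE_OPENCV = 1\n"
patched = enable_cuda(sample)
print(patched)
```

In practice you would read mxnet/config.mk, pass its contents through the helper, and write the result back before running make.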

1.5 Installing Python support

　　cd python;

　　python setup.py install

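Sometimes you also need to install setuptools and numpy (sudo apt-get install python-numpy).

1.6 Running the MNIST handwritten-digit example

MNIST is a handwritten-digit recognition dataset with 60,000 training images and 10,000 test images, each a 28x28 grayscale image. MXNet ships an MNIST recognition example in mxnet/example/image-classification, so let's run it first: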

　　cd mxnet/example/image-classification

　　python train_mnist.py

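The first run downloads the MNIST dataset automatically.

The command above uses the default arguments: an MLP network, computed on the CPU.

To use the LeNet network with GPU acceleration, run: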

　　python train_mnist.py --gpus 0 --network lenet

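To really understand how to use a framework, the first step is to train it on your own data. This step is crucial.

2. Data Preprocessing in MXNet

All of the data preprocessing code lives in tools/im2rec.py. First you need to produce a list file. The .lst file has three columns: index, label, and image path.

[Figure: example .lst file]

I filled in the labels arbitrarily, so they are all 0. Also, im2rec in the latest MXNet (at the time of writing) has a bug: every index in the list it generates is 0. Supposedly the index isn't used for anything, but I fixed it anyway by replacing the yield generator with a plain append.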
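Since the .lst file is just text, it can also be written by hand. Here is a minimal sketch, assuming the three columns are tab-separated; write_lst is a hypothetical helper, not part of im2rec.py, and the image paths are made up. Using enumerate gives every row its own index, which avoids the all-zero index problem described above.

```python
import os
import tempfile


def write_lst(samples, lst_path):
    """Write (label, relative_image_path) pairs as a .lst file.

    Hypothetical helper. Each output line is
    index <TAB> label <TAB> path, with a distinct index per image.
    """
    with open(lst_path, "w") as f:
        for index, (label, path) in enumerate(samples):
            f.write("%d\t%d\t%s\n" % (index, label, path))


# Made-up sample data: two classes, three images.
samples = [(0, "cats/a.jpg"), (0, "cats/b.jpg"), (1, "dogs/c.jpg")]
lst_path = os.path.join(tempfile.mkdtemp(), "try.lst")
write_lst(samples, lst_path)

# Read the file back to check the three-column layout.
with open(lst_path) as f:
    rows = [line.rstrip("\n").split("\t") for line in f]
print(rows)
```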

The command is:

　　　　sudo python im2rec.py --list=True /home/erya/dhc/result/try /home/erya/dhc/result/ --recursive=True --shuffle=true --train-ratio=0.8

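Every argument is documented in the script itself. Briefly, the ones used here are: --list=True says that this run builds the list; it is followed by the prefix for the generated list files (I included a path here) and then the path of the folder that holds the images; --recursive controls whether subfolders are entered recursively to read images; and --train-ratio sets how the dataset is split between train and val.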
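To make the shuffle and train-ratio options concrete, here is a rough sketch of what such a split amounts to. It is my own illustration, not a copy of the im2rec.py code; split_train_val and the file names are hypothetical.

```python
import random


def split_train_val(items, train_ratio=0.8, shuffle=True, seed=0):
    """Split items into (train, val) lists. Hypothetical helper."""
    items = list(items)
    if shuffle:
        # Shuffle a copy so the caller's list is left untouched.
        random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


paths = ["img_%02d.jpg" % i for i in range(10)]
train, val = split_train_val(paths, train_ratio=0.8)
print(len(train), len(val))
```

Every image ends up in exactly one of the two lists, and the fixed seed keeps the split reproducible between runs.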

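Running the im2rec.py command above produces three files:

[Figure: the three generated files]

Then run the following commands to generate the final .rec files: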

　　sudo python im2rec.py /home/erya/dhc/result/try_val.lst /home/erya/dhc/result --quality=100

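and

　　sudo python im2rec.py /home/erya/dhc/result/try_train.lst /home/erya/dhc/result --quality=100

These produce a .rec file for each .lst file. The arguments are simple enough to need no explanation; result is the directory where I keep the images.

That completes the data preprocessing. In short, you first generate the .lst file, which you can perfectly well produce yourself. Later, when I worked on segmentation, the label was itself an image.

3. A Very Simple Demo

Code first: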

import mxnet as mx
import logging
import numpy as np

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)  # logging setup; can be ignored for now
def ConvFactory(data, num_filter, kernel, stride=(1,1), pad=(0, 0), act_type="relu"):
    conv = mx.symbol.Convolution(data=data, workspace=256,
                                 num_filter=num_filter, kernel=kernel, stride=stride, pad=pad)
    return conv  # trimmed down to a single convolution (act_type is unused)
def DownsampleFactory(data, ch_3x3):
    # conv 3x3
    conv = ConvFactory(data=data, kernel=(3, 3), stride=(2, 2), num_filter=ch_3x3, pad=(1, 1))
    # pool
    pool = mx.symbol.Pooling(data=data, kernel=(3, 3), stride=(2, 2), pool_type='max')
    # concat
    concat = mx.symbol.Concat(*[conv, pool])
    return concat
def SimpleFactory(data, ch_1x1, ch_3x3):
    # 1x1
    conv1x1 = ConvFactory(data=data, kernel=(1, 1), pad=(0, 0), num_filter=ch_1x1)
    # 3x3
    conv3x3 = ConvFactory(data=data, kernel=(3, 3), pad=(1, 1), num_filter=ch_3x3)
    #concat
    concat = mx.symbol.Concat(*[conv1x1, conv3x3])
    return concat
if __name__ == "__main__":
    batch_size = 1
    train_dataiter = mx.io.ImageRecordIter(
        shuffle=True,
        path_imgrec="/home/erya/dhc/result/try_train.rec",
        rand_crop=True,
        rand_mirror=True,
        data_shape=(3,28,28),
        batch_size=batch_size,
        preprocess_threads=1)  # uses the data we created earlier; in short, set up your own iter and fill in the matching parameters
    test_dataiter = mx.io.ImageRecordIter(
        path_imgrec="/home/erya/dhc/result/try_val.rec",
        rand_crop=False,
        rand_mirror=False,
        data_shape=(3,28,28),
        batch_size=batch_size,
        round_batch=False,
        preprocess_threads=1)  # same idea, for the validation set
    data = mx.symbol.Variable(name="data")
    conv1 = ConvFactory(data=data, kernel=(3,3), pad=(1,1), num_filter=96, act_type="relu")
    in3a = SimpleFactory(conv1, 32, 32)
    fc = mx.symbol.FullyConnected(data=in3a, num_hidden=10)
    softmax = mx.symbol.SoftmaxOutput(name='softmax',data=fc)  # the lines above define a really, really simple network
    # For demo purposes, this model only trains 1 epoch
    # We will use the first GPU to do training
    num_epoch = 1
    model = mx.model.FeedForward(ctx=mx.gpu(), symbol=softmax, num_epoch=num_epoch,
                                 learning_rate=0.05, momentum=0.9, wd=0.00001)  # fixes the overall training setup, much like the solver in Caffe

    # we can add learning rate scheduler to the model
    # model = mx.model.FeedForward(ctx=mx.gpu(), symbol=softmax, num_epoch=num_epoch,
    #                              learning_rate=0.05, momentum=0.9, wd=0.00001,
    #                              lr_scheduler=mx.misc.FactorScheduler(2))
    model.fit(X=train_dataiter,
              eval_data=test_dataiter,
              eval_metric="accuracy",
              batch_end_callback=mx.callback.Speedometer(batch_size))  # start training

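4. DataIter

MXNet is designed with a C++ backend for computation and Python, R, and other languages as frontends, which gives both efficiency and convenience for users. Training on your own dataset with MXNet from start to finish requires understanding several pieces. Today we start with data iterators.

A data iterator in MXNet is very similar to an iterator in Python: each time its next method is called, it returns one data batch. A data batch is the input and label for the neural network, typically image input in (n, c, h, w) format with labels that are either (n, h, w) or scalar-like. Let's go straight to a simple example from the official site.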

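import mxnet as mx
import numpy as np

class SimpleBatch(object):
    """Minimal batch container; the post uses SimpleBatch without defining it, so this version is an assumption."""
    def __init__(self, data, label, pad=0):
        self.data = data
        self.label = label
        self.pad = pad

class SimpleIter:
    def __init__(self, data_names, data_shapes, data_gen,
                 label_names, label_shapes, label_gen, num_batches=10):
        # list() so the pairs can be iterated more than once on Python 3
        self._provide_data = list(zip(data_names, data_shapes))
        self._provide_label = list(zip(label_names, label_shapes))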
        self.num_batches = num_batches
        self.data_gen = data_gen
        self.label_gen = label_gen
        self.cur_batch = 0

    def __iter__(self):
        return self

    def reset(self):
        self.cur_batch = 0        

    def __next__(self):
        return self.next()

    @property
    def provide_data(self):
        return self._provide_data

    @property
    def provide_label(self):
        return self._provide_label

    def next(self):
        if self.cur_batch < self.num_batches:
            self.cur_batch += 1
            data = [mx.nd.array(g(d[1])) for d,g in zip(self._provide_data, self.data_gen)]
            assert len(data) > 0, "Empty batch data."
            label = [mx.nd.array(g(d[1])) for d,g in zip(self._provide_label, self.label_gen)]
            assert len(label) > 0, "Empty batch label."
            return SimpleBatch(data, label)
        else:
            raise StopIteration

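The code above is about the simplest data iterator possible: it does no preprocessing and does not even read data itself, but it gets the basic idea across. A data iterator must implement the methods shown above. provide_data returns (name, shape) pairs of the form (data_name, (batch_size, channel, height, width)), and provide_label returns pairs of the form (label_name, (batch_size, height, width)). reset() is called after each epoch to rewind the iterator, and it is the natural place to reshuffle the order in which images are read, since random sampling tends to train a bit better; this is usually done by shuffling your .lst (the .lst used to read images in the previous part). next() is self-explanatory: it returns your data batch, and when the data runs out or something goes wrong, remember to raise StopIteration (a try block might be better here). Note that the data in the returned batch must be of type mx.nd.NDArray.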
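The reset()/next() contract is easier to see without any framework in the way. The following is a minimal sketch under my own assumptions: IndexBatchIter is a hypothetical class that yields lists of sample indices instead of real data batches, so it runs without MXNet.

```python
import random


class IndexBatchIter(object):
    """Framework-free sketch of the reset()/next() contract.

    Hypothetical class: it yields lists of sample indices rather
    than NDArray batches.
    """

    def __init__(self, num_samples, batch_size, seed=0):
        self.order = list(range(num_samples))
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.reset()

    def __iter__(self):
        return self

    def reset(self):
        # Rewind to the start and reshuffle for the next epoch.
        self.cursor = 0
        self.rng.shuffle(self.order)

    def __next__(self):
        if self.cursor >= len(self.order):
            raise StopIteration  # signals the end of the epoch
        batch = self.order[self.cursor:self.cursor + self.batch_size]
        self.cursor += self.batch_size
        return batch

    next = __next__  # Python 2 style name, as used in the post


it = IndexBatchIter(num_samples=5, batch_size=2)
epoch1 = list(it)   # runs until StopIteration
it.reset()          # without this, a second pass would be empty
epoch2 = list(it)
print(epoch1, epoch2)
```

Each epoch visits every index once; only the order differs between epochs.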

Below is a somewhat more complex data iterator that I used recently for segmentation; it adds preprocessing, shuffling, and other steps:

# pylint: skip-file
import random

import cv2
import mxnet as mx
import numpy as np
import os
from mxnet.io import DataIter, DataBatch


class FileIter(DataIter):  # custom iterators usually subclass DataIter
    """FileIter object in fcn-xs example. Taking a file list file to get dataiter.
    in this example, we use the whole image training for fcn-xs, that is to say
    we do not need resize/crop the image to the same size, so the batch_size is
    set to 1 here
    Parameters
    ----------
    root_dir : string
        the root dir of image/label lie in
    flist_name : string
        the list file of image and label, every line owns the form:
        index \t image_data_path \t image_label_path
    cut_off_size : int
        if the maximal size of one image is larger than cut_off_size, then it will
        crop the image with the minimal size of that image
    data_name : string
        the data name used in symbol data(default data name)
    label_name : string
        the label name used in symbol softmax_label(default label name)
    """

    def __init__(self, root_dir, flist_name, rgb_mean=(117, 117, 117),
                 data_name="data", label_name="softmax_label", p=None):
        super(FileIter, self).__init__()

        self.fac = p.fac  # p is a user-defined config object
        self.root_dir = root_dir
        self.flist_name = os.path.join(self.root_dir, flist_name)
        self.mean = np.array(rgb_mean)  # (R, G, B)
        self.data_name = data_name
        self.label_name = label_name
        self.batch_size = p.batch_size
        self.random_crop = p.random_crop
        self.random_flip = p.random_flip
        self.random_color = p.random_color
        self.random_scale = p.random_scale
        self.output_size = p.output_size
        self.color_aug_range = p.color_aug_range
        self.use_rnn = p.use_rnn
        self.num_hidden = p.num_hidden
        if self.use_rnn:
            self.init_h_name = 'init_h'
            self.init_h = mx.nd.zeros((self.batch_size, self.num_hidden))
        self.cursor = -1

        self.data = mx.nd.zeros((self.batch_size, 3, self.output_size[0], self.output_size[1]))
        self.label = mx.nd.zeros((self.batch_size, self.output_size[0] // self.fac, self.output_size[1] // self.fac))
        self.data_list = []
        self.label_list = []
        self.order = []
        self.dict = {}
        with open(self.flist_name) as f:  # file() exists only in Python 2
            lines = f.read().splitlines()
        cnt = 0
        for line in lines:  # read the .lst to prepare for loading images later
            _, data_img_name, label_img_name = line.strip('\n').split("\t")
            self.data_list.append(data_img_name)
            self.label_list.append(label_img_name)
            self.order.append(cnt)
            cnt += 1
        self.num_data = cnt
        self._shuffle()

    def _shuffle(self):
        random.shuffle(self.order)

    def _read_img(self, img_name, label_name):
        # When running on the server, the dataset was small and colleagues often saturated the disk IO, so I cache all the data in memory
        if os.path.join(self.root_dir, img_name) in self.dict:
            img = self.dict[os.path.join(self.root_dir, img_name)]
        else:
            img = cv2.imread(os.path.join(self.root_dir, img_name))
            self.dict[os.path.join(self.root_dir, img_name)] = img

        if os.path.join(self.root_dir, label_name) in self.dict:
            label = self.dict[os.path.join(self.root_dir, label_name)]
        else:
            label = cv2.imread(os.path.join(self.root_dir, label_name),0)
            self.dict[os.path.join(self.root_dir, label_name)] = label


        # The preprocessing steps applied after reading the images follow
        if self.random_flip:
            flip = random.randint(0, 1)
            if flip == 1:
                img = cv2.flip(img, 1)
                label = cv2.flip(label, 1)
        # scale jittering
        scale = random.uniform(self.random_scale[0], self.random_scale[1])
        new_width = int(img.shape[1] * scale)  # 680
        new_height = int(img.shape[0] * scale)  # new_width * img.size[1] / img.size[0]
        img = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_NEAREST)
        label = cv2.resize(label, (new_width, new_height), interpolation=cv2.INTER_NEAREST)
        #img = cv2.resize(img, (900,450), interpolation=cv2.INTER_NEAREST)
        #label = cv2.resize(label, (900, 450), interpolation=cv2.INTER_NEAREST)
        if self.random_crop:
            start_w = np.random.randint(0, img.shape[1] - self.output_size[1] + 1)
            start_h = np.random.randint(0, img.shape[0] - self.output_size[0] + 1)
            img = img[start_h : start_h + self.output_size[0], start_w : start_w + self.output_size[1], :]
            label = label[start_h : start_h + self.output_size[0], start_w : start_w + self.output_size[1]]
        if self.random_color:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
            hue = random.uniform(-self.color_aug_range[0], self.color_aug_range[0])
            sat = random.uniform(-self.color_aug_range[1], self.color_aug_range[1])
            val = random.uniform(-self.color_aug_range[2], self.color_aug_range[2])
            img = np.array(img, dtype=np.float32)
            img[..., 0] += hue
            img[..., 1] += sat
            img[..., 2] += val
            img[..., 0] = np.clip(img[..., 0], 0, 255)
            img[..., 1] = np.clip(img[..., 1], 0, 255)
            img[..., 2] = np.clip(img[..., 2], 0, 255)
            img = cv2.cvtColor(img.astype('uint8'), cv2.COLOR_HSV2BGR)
            is_rgb = True
        #cv2.imshow('main', img)
        #cv2.waitKey()
        #cv2.imshow('maain', label)
        #cv2.waitKey()
        img = np.array(img, dtype=np.float32)  # (h, w, c)
        reshaped_mean = self.mean.reshape(1, 1, 3)
        img = img - reshaped_mean
        img[:, :, :] = img[:, :, [2, 1, 0]]
        img = img.transpose(2, 0, 1)
        # img = np.expand_dims(img, axis=0)  # (1, c, h, w)

        label_zoomed = cv2.resize(label, None, fx = 1.0 / self.fac, fy = 1.0 / self.fac)
        label_zoomed = label_zoomed.astype('uint8')
        return (img, label_zoomed)

    @property
    def provide_data(self):
        """The name and shape of data provided by this iterator"""
        if self.use_rnn:
            return [(self.data_name, (self.batch_size, 3, self.output_size[0], self.output_size[1])),
                    (self.init_h_name, (self.batch_size, self.num_hidden))]
        else:
            return [(self.data_name, (self.batch_size, 3, self.output_size[0], self.output_size[1]))]

    @property
    def provide_label(self):
        """The name and shape of label provided by this iterator"""
        return [(self.label_name, (self.batch_size, self.output_size[0] // self.fac, self.output_size[1] // self.fac))]

    def get_batch_size(self):
        return self.batch_size

    def reset(self):
        self.cursor = -self.batch_size
        self._shuffle()

    def iter_next(self):
        self.cursor += self.batch_size
        return self.cursor < self.num_data

    def _getpad(self):
        if self.cursor + self.batch_size > self.num_data:
            return self.cursor + self.batch_size - self.num_data
        else:
            return 0

    def _getdata(self):
        """Load data from underlying arrays, internal use only"""
        assert(self.cursor < self.num_data), "DataIter needs reset."
        data = np.zeros((self.batch_size, 3, self.output_size[0], self.output_size[1]))
        label = np.zeros((self.batch_size, self.output_size[0] // self.fac, self.output_size[1] // self.fac))
        if self.cursor + self.batch_size <= self.num_data:
            for i in range(self.batch_size):
                idx = self.order[self.cursor + i]
                data_, label_ = self._read_img(self.data_list[idx], self.label_list[idx])
                data[i] = data_
                label[i] = label_
        else:
            for i in range(self.num_data - self.cursor):
                idx = self.order[self.cursor + i]
                data_, label_ = self._read_img(self.data_list[idx], self.label_list[idx])
                data[i] = data_
                label[i] = label_
            pad = self.batch_size - self.num_data + self.cursor
            #for i in pad:
            for i in range(pad):
                idx = self.order[i]
                data_, label_ = self._read_img(self.data_list[idx], self.label_list[idx])
                data[i + self.num_data - self.cursor] = data_
                label[i + self.num_data - self.cursor] = label_
        return mx.nd.array(data), mx.nd.array(label)

    def next(self):
        """return one dict which contains "data" and "label" """
        if self.iter_next():
            data, label = self._getdata()
            data = [data, self.init_h] if self.use_rnn else [data]
            label = [label]
            return DataBatch(data=data, label=label,
                             pad=self._getpad(), index=None,
                             provide_data=self.provide_data,
                             provide_label=self.provide_label)
        else:
            raise StopIteration
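The end-of-epoch handling in _getdata and _getpad can be hard to follow inside the full class. Here is a minimal sketch of just that index logic, as I read it: when the last batch runs past the end of the data, it wraps around to the start of the order, and pad counts the wrapped slots. batch_indices is a hypothetical helper, not part of the original code.

```python
def batch_indices(order, cursor, batch_size):
    """Return (indices, pad) for the batch starting at cursor.

    Hypothetical helper mirroring the wrap-around in _getdata.
    """
    end = cursor + batch_size
    if end <= len(order):
        # The whole batch fits: no padding needed.
        return order[cursor:end], 0
    pad = end - len(order)  # slots past the end of the data
    # Fill the missing slots from the start of the order.
    return order[cursor:] + order[:pad], pad


order = [3, 0, 4, 1, 2]
full_batch = batch_indices(order, 0, 2)
last_batch = batch_indices(order, 4, 2)
print(full_batch, last_batch)
```

The pad value is what next() passes to DataBatch, so that code consuming the batch can tell which trailing samples are filler.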

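At this point ordinary training can begin. But once you have lots of new ideas, you run into new problems. For example: what about multiple inputs or outputs?

It is actually simple; only a few places need to change:

1. provide_label and provide_data. Note that our return value was already a list, so just add more entries to it in the same format as before.

2. next(). If you need to pass two inputs, data and depth, just pass input = sum([[data],[depth]], []) as the data of the DataBatch. The same goes for label.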
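The two changes can be sketched with placeholders, so the snippet runs without MXNet. The strings stand in for NDArrays, and the shapes and the single-channel depth input are my own assumptions. Note that sum needs an empty list as its start value to concatenate lists.

```python
data = "data_array"      # placeholder for the image NDArray
depth = "depth_array"    # placeholder for the depth NDArray

# sum(list_of_lists, []) concatenates the inner lists into one list.
inputs = sum([[data], [depth]], [])

# provide_data grows the same way: one (name, shape) pair per input.
# The shapes here are assumed values for illustration only.
batch_size, height, width = 1, 28, 28
provide_data = [
    ("data", (batch_size, 3, height, width)),
    ("depth", (batch_size, 1, height, width)),
]

print(inputs)
print([name for name, _ in provide_data])
```

The order of entries in provide_data should match the order of the arrays in the batch, since the names are how the inputs are matched to the symbol's variables.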

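It is worth mentioning that a multi-loss setup in MXNet needs one bit of care when you write the network symbol. Suppose you have softmax_loss and regression_loss; then at the end you only need to return mx.symbol.Group([softmax_loss, regression_loss]).

Once we have defined the symbol, written the data iterator, and prepared the data in MXNet, we can happily start training. There are two common ways to train a network: model-based and module-based. Let's look at how to use each.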

5. Model

As usual, let's look at code taken straight from the official documentation:

import mxnet as mx

# configure a two layer neural network
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(fc1, name='relu1', act_type='relu')
fc2 = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=64)
softmax = mx.symbol.SoftmaxOutput(fc2, name='sm')
# create a model using sklearn-style two-step way
# num_epoch and data_set are assumed to be defined by the caller
model = mx.model.FeedForward(
    softmax,
    num_epoch=num_epoch,
    learning_rate=0.01)
# start training
model.fit(X=data_set)

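See http://mxnet.io/api/python/model.html for the full API.

And with that, the model part is done. It goes this quickly for two main reasons:

1. There really isn't much to it; looking things up in the documentation is usually enough.

2. Model is not very customizable, so we rarely use it; Module is the usual choice.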

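6. Module

Module is a really great thing, although once you dig into it you may think, "Wow, impressive, but it doesn't seem to be good for much." I certainly did. Looking back, from the standpoint of code design and usability, Module really is very good: it provides intermediate-level and high-level interfaces for network computation, which leaves a lot of room for us to customize things ourselves.

A Module has four states:

1. Initial state. GPU memory has not been allocated yet, and essentially nothing has been done.

2. Binded. After the shapes of data and label have been passed to the bind function and it has run, GPU memory is allocated and the module is ready to compute.

3. Parameters initialized. The parameters have been given their initial values.

4. Optimizer installed. An optimizer such as SGD or Adam has been passed in, and training can proceed.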
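The four states can be pictured as a small state machine in which each step requires the previous one. The class below is a toy stand-in written for illustration, not the MXNet class; TinyModule and its checks are my own assumptions, and it allocates nothing.

```python
class TinyModule(object):
    """Toy stand-in for the Module life cycle; not the MXNet class."""

    def __init__(self):
        # State 1: nothing allocated yet.
        self.binded = False
        self.params_initialized = False
        self.optimizer_initialized = False

    def bind(self, data_shapes, label_shapes):
        # State 2: shapes are known, so memory could be allocated.
        self.data_shapes = data_shapes
        self.label_shapes = label_shapes
        self.binded = True

    def init_params(self):
        # State 3: parameters get their initial values.
        assert self.binded, "call bind() first"
        self.params_initialized = True

    def init_optimizer(self, optimizer="sgd"):
        # State 4: an optimizer such as SGD or Adam is installed.
        assert self.binded and self.params_initialized, \
            "call bind() and init_params() first"
        self.optimizer = optimizer
        self.optimizer_initialized = True


mod = TinyModule()
mod.bind(data_shapes=[("data", (1, 3, 28, 28))],
         label_shapes=[("softmax_label", (1,))])
mod.init_params()
mod.init_optimizer("sgd")
print(mod.binded, mod.params_initialized, mod.optimizer_initialized)
```

Calling the steps out of order, for example init_params() before bind(), trips an assertion in this toy version.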

Here is a simple piece of code first:

import mxnet as mx

# construct a simple MLP
data = mx.symbol.Variable('data')
fc1  = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(fc1, name='relu1', act_type="relu")
fc2  = mx.symbol.FullyConnected(act1, name = 'fc2', num_hidden = 64)
act2 = mx.symbol.Activation(fc2, name='relu2', act_type="relu")
fc3  = mx.symbol.FullyConnected(act2, name='fc3', num_hidden=10)
out  = mx.symbol.SoftmaxOutput(fc3, name = 'softmax')

# construct the module
mod = mx.mod.Module(out)

# train_dataiter, eval_dataiter and n_epoch are assumed to be defined elsewhere
mod.bind(data_shapes=train_dataiter.provide_data,
         label_shapes=train_dataiter.provide_label)

mod.init_params()
mod.fit(train_dataiter, eval_data=eval_dataiter,
        optimizer_params={'learning_rate':0.01, 'momentum': 0.9},
        num_epoch=n_epoch)

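To break it down: first we define a simple MLP whose symbol is named out, and then create a mod directly with mx.mod.Module. Next, mod.bind allocates the required memory on the GPU, which is why we have to pass it data_shapes and label_shapes. Then we initialize the network parameters, and finally mod.fit starts training. One more note: we have now seen fit twice. It is really a convenience function that bundles several steps; the core of mod.fit() looks like this internally: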

for epoch in range(begin_epoch, num_epoch):
    tic = time.time()
    eval_metric.reset()
    for nbatch, data_batch in enumerate(train_data):
        if monitor is not None:
            monitor.tic()
        self.forward_backward(data_batch)  # one forward and one backward pass through the network
        self.update()  # update the parameters
        self.update_metric(eval_metric, data_batch.label)  # update the metric

        if monitor is not None:
            monitor.toc_print()

        if batch_end_callback is not None:
            batch_end_params = BatchEndParam(epoch=epoch, nbatch=nbatch,
                                             eval_metric=eval_metric,
                                             locals=locals())
            for callback in _as_list(batch_end_callback):
                callback(batch_end_params)
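The three calls at the heart of that loop (forward_backward, update, update_metric) can be tried out on a toy problem without MXNet. The sketch below is an illustration under my own assumptions: ToyModule is a made-up class that fits y = w * x with plain SGD, and the metric is just a list of per-batch mean squared errors.

```python
class ToyModule(object):
    """Fits y = w * x with SGD; a toy stand-in for a bound Module."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.lr = learning_rate
        self.grad = 0.0
        self.outputs = []

    def forward_backward(self, batch):
        xs, ys = batch
        # Forward pass: predictions for this batch.
        self.outputs = [self.w * x for x in xs]
        # Backward pass: gradient of the mean squared error w.r.t. w.
        self.grad = sum(2 * (o - y) * x
                        for o, x, y in zip(self.outputs, xs, ys)) / len(xs)

    def update(self):
        # One SGD step.
        self.w -= self.lr * self.grad

    def update_metric(self, metric, labels):
        # Record the mean squared error of the last forward pass.
        metric.append(sum((o - y) ** 2
                          for o, y in zip(self.outputs, labels)) / len(labels))


# Two made-up batches of (inputs, labels) drawn from y = 2 * x.
batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 1.0], [6.0, 2.0])]
mod = ToyModule()
for epoch in range(20):
    metric = []  # reset the metric at the start of each epoch
    for xs, ys in batches:
        mod.forward_backward((xs, ys))
        mod.update()
        mod.update_metric(metric, ys)
print(mod.w, metric)
```

Writing the loop by hand like this is what makes it possible to insert custom steps between the forward pass, the update, and the metric.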

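Precisely because Module exposes many intermediate-level interfaces, there is a lot of room for improvement. The simplest example: what if the input size of the network we are training varies? We can implement a mutable module. Basically, every time the shape of the data changes, we rebind the symbol, and training carries on as usual.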
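The rebind-on-shape-change idea fits in a few lines. This is a toy sketch with made-up names: RebindingRunner is hypothetical, and bind_fn stands in for whatever real (re)bind call you would make.

```python
class RebindingRunner(object):
    """Toy sketch: rebind whenever the input shape changes.

    Hypothetical class; bind_fn stands in for a real (re)bind call.
    """

    def __init__(self, bind_fn):
        self.bind_fn = bind_fn
        self.bound_shape = None
        self.num_binds = 0

    def forward(self, batch_shape):
        if batch_shape != self.bound_shape:
            # Shape changed (or first call): bind for the new shape.
            self.bind_fn(batch_shape)
            self.bound_shape = batch_shape
            self.num_binds += 1
        return self.bound_shape


bound = []  # records every shape that triggered a bind
runner = RebindingRunner(bind_fn=bound.append)
for shape in [(1, 3, 32, 32), (1, 3, 32, 32), (1, 3, 64, 48)]:
    runner.forward(shape)
print(runner.num_binds, bound)
```

Batches that repeat the previous shape reuse the existing binding, so the cost of rebinding is only paid when the shape actually changes.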

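Summary: the key to learning a framework is really to use it, and if there is a trick, it is to read more of the source code and documentation. I write these posts partly to record things and partly to spare those who come later a few detours, so some topics are not covered in full.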
