MXNet's training process: from Python to C++
The Python interface of mxnet (github-mxnet) is complete enough that you can train a model without ever reading the C++ code. But if you want to study the C++ side, the Python training and prediction flow shows exactly how the C++ code gets called. In the previous post I explained how mshadow works (mshadow的原理--MXNet); in this post I walk through mxnet's training process and show which C++ interfaces Python calls. The C++ interfaces themselves are not explained in much depth here; you can read the source for details, and later posts may cover them.
Sample code
Below is a minimal mxnet training example. I debug the Python side with Wing Pro; for the C++ side I recommend Qt Creator, which requires a CMakeLists file, and the shared library must be built in Debug mode before you can step through it.
# -*- coding: utf-8 -*-
import mxnet as mx
import numpy as np
import logging
logging.getLogger().setLevel(logging.DEBUG)
# generate data
def productData(Dim, half_len):
'''
generate data for training or evaluation
Dim : dimension
half_len : 2*half_len is the number of training data
'''
data = np.append(np.random.uniform(-1, 0, [half_len, Dim]),
np.random.uniform(0, 1, [half_len, Dim]), axis = 0)
label = np.append(np.zeros(half_len), np.ones(half_len))
return data, label
#get the data
np.random.seed(1)
Dim = 3
train_data,train_label = productData(Dim, 1)
eval_data, eval_label = productData(Dim, 1)
#data iter
batch_size = 1
train_iter = mx.io.NDArrayIter(train_data,train_label, batch_size, shuffle=True)
eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size, shuffle=False)
#input variable
X = mx.sym.Variable('data')
Y = mx.symbol.Variable('softmax_label')
#network config
fc_1 = mx.sym.FullyConnected(data=X, name='fc1', num_hidden = 2)
fc_2 = mx.sym.FullyConnected(data=fc_1, name='fc2', num_hidden = 3)
fc_3 = mx.sym.FullyConnected(data=fc_2, name='fc3', num_hidden = 4)
lro = mx.sym.SoftmaxOutput(data=fc_3, label=Y, name="softmax")
#build the model
model = mx.mod.Module(
symbol = lro ,
data_names=['data'],
label_names = ['softmax_label']# network structure
)
#train the model
model.fit(train_iter, eval_iter,
optimizer_params={'learning_rate':0.5, 'momentum': 0.9},
num_epoch=1,
eval_metric='mse',
batch_end_callback = mx.callback.Speedometer(batch_size, 1))
#predict the result
pre = model.predict(eval_iter).asnumpy()
print np.argmax(pre, axis = 1)
The code above is straightforward; anyone who has trained a model with mxnet's Python API will follow it easily. I will not explain what each Python line means here, but rather how this code interacts with mxnet's underlying C++ code. Python talks to C++ through the ctypes library. I am using mxnet 0.7; the code layout of other versions should not differ much.
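As a quick illustration of that bridge, here is a minimal ctypes sketch (not mxnet's actual loading code; the library path below is an assumption, and mxnet's base.py resolves the real path itself):
import ctypes
# Hedged sketch: mxnet loads libmxnet.so once (see python/mxnet/base.py) and
# every Python-level API call goes through that handle via ctypes.
_LIB = ctypes.CDLL('libmxnet.so')  # illustrative path; point it at your own build
# The C API declared in c_api.h shows up as attributes of the handle.
print(hasattr(_LIB, 'MXSymbolCreateVariable'))  # True if the symbol is exported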
Create Variable
mx.io.NDArrayIter does not call into any C++ function. Creating a variable symbol (Symbol Variable), however, calls MXSymbolCreateVariable. Note that when a Python call stays inside the mxnet package it resolves to the package's own functions, and every C++ function it eventually calls is wrapped in c_api.h, with the implementations under ./src/c_api. The call chain is: Variable()_python --> MXSymbolCreateVariable()_C++ --> CreateVariable()_C++. Let's look at the Symbol class in C++ and the structs related to it:
/*!
* \brief Symbol is used to represent dynamically generated symbolic computation graph.
*
* This class is used as a tool to generate computation graphs(aka. configuration) of the network.
* Symbol is always composite, the head Node is the output node of the symbol.
* An atomic symbol can be seen as a special case of the composite symbol with only the head node.
*/
class Symbol {
public:
...
protected:
// Declare node, internal data structure.
struct Node;
/*! \brief an entry that represents output data from a node */
struct DataEntry {
/*! \brief the source node of this data */
std::shared_ptr<Node> source;
/*! \brief index of output from the source. */
uint32_t index;
/*! \brief enabled default copy constructor */
DataEntry() {}
/*! \brief constructor from index */
DataEntry(std::shared_ptr<Node> source, uint32_t index)
: source(source), index(index) {}
};
/*!
* \brief the head nodes of Symbols
* This head is only effective when
*/
std::vector<DataEntry> heads_;
...
};
/*!
* \brief Node is represents node of an operator in the symbolic graph.
*
* It stores connection to the inputs to function represented by OperatorProperty
* NOTE on data structure: there are three types of node:
* - Normal node: contains all the necessary elements of a graph.
* - OperatorProperty: the inputs_ is empty, represents an OperatorProperty that has not been applied.
* - Variable: the sym_ is nullptr, represents an named Variable of tensors that can be composed.
*/
struct Symbol::Node {
/*! \brief Operator of this node */
std::unique_ptr<OperatorProperty> op;
/*! \brief name of the node */
std::string name;
/*! \brief inputs to this node */
std::vector<DataEntry> inputs;
/*! \brief source node of the current node */
std::shared_ptr<Symbol::Node> backward_source_node;
/*!
* \brief additional attributes about the node,
* Use pointer to save space, as attr can be accessed in a slow way,
* not every node will have attributes.
*/
std::unique_ptr<std::map<std::string, std::string> > attr;
/*!
*\brief constructor
*\param op the OperatorProperty to construct the Node
*\param name the name of the symbol
*/
explicit Node(OperatorProperty *op,
const std::string& name)
: op(op), name(name) {}
/*!
*\brief copy constructor constructor
*/
explicit Node(const Node& other)
: name(other.name) {
if (other.op != nullptr) {
op.reset(other.op->Copy());
}
if (other.attr.get() != nullptr) {
attr.reset(new std::map<std::string, std::string>(*(other.attr)));
}
}
~Node() {
...
}
/*! \return Whether the symbol is atomic */
inline bool is_atomic() const {
return inputs.size() == 0 && op != nullptr;
}
/*! \return Whether it is unit variable */
inline bool is_variable() const {
return op == nullptr && !backward_source_node;
}
/*! \return Whether it is backward op */
inline bool is_backward() const {
return backward_source_node.get() != nullptr;
}
};
/*! \return whether the symbol is atomic */
inline bool Symbol::is_atomic() const {
return heads_[0].source->is_atomic();
}
The inline bool is_variable() function above shows what characterizes a variable. Creating one is also very simple: construct a Symbol and push the initial entry into its heads_ container, as follows:
Symbol Symbol::CreateVariable(const std::string &name) {
Symbol s;
s.heads_.push_back(DataEntry(std::make_shared<Node>(nullptr, name), 0));
return s;
}
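The Python side of this call is equally thin; roughly (a simplified paraphrase of Variable() in python/mxnet/symbol.py, not a verbatim copy):
def Variable(name):
    # create an empty handle, let the C++ side fill it in via ctypes,
    # then wrap the handle in the Python Symbol class
    handle = SymbolHandle()
    check_call(_LIB.MXSymbolCreateVariable(c_str(name), ctypes.byref(handle)))
    return Symbol(handle)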
In mxnet, layers (mx.sym.FullyConnected, mx.sym.SoftmaxOutput, etc.) and variables are both Symbols.
Dynamically loading functions in Python
The set of layers in mxnet can change over time. Whenever a new layer is written in C++, it must first be registered with the dmlc core; when Python imports the symbol module, it dynamically loads all registered layers. Let's first look at how dynamic loading works in plain Python, and then at how mxnet's Python code does it.
import sys
def fib(n):
a, b = 0, 1
result = []
while(b<n):
result.append(b)
a, b = b, a+b
print(result)
print("load function in here")
setattr(sys.modules[__name__], "FIBC", fib)
Suppose the code above is saved as load_test.py. When you import load_test, the module-level statements run: the import at the top and the last two lines. The last line binds the name FIBC to fib, so FIBC can be called as if it were its own function. The result:
>>> import load_test
load function in here
>>> load_test.fib(16)
[1, 1, 2, 3, 5, 8, 13]
>>> load_test.FIBC(16)
[1, 1, 2, 3, 5, 8, 13]
So how is this done in mxnet's Python code? When the symbol module is imported, _init_symbol_module() runs; it loads every Symbol registered in the mxnet core. Look at the following two functions:
def _init_symbol_module():
"""List and add all the atomic symbol functions to current module."""
plist = ctypes.POINTER(ctypes.c_void_p)()
size = ctypes.c_uint()
check_call(_LIB.MXSymbolListAtomicSymbolCreators(ctypes.byref(size),
ctypes.byref(plist)))
module_obj = sys.modules[__name__]
module_internal = sys.modules["mxnet._symbol_internal"]
for i in range(size.value):
hdl = SymbolHandle(plist[i])
function = _make_atomic_symbol_function(hdl)
if function.__name__.startswith('_'):
setattr(module_internal, function.__name__, function)
else:
setattr(module_obj, function.__name__, function)
def _make_atomic_symbol_function(handle):
"""Create an atomic symbol function by handle and funciton name."""
name = ctypes.c_char_p()
desc = ctypes.c_char_p()
key_var_num_args = ctypes.c_char_p()
num_args = mx_uint()
arg_names = ctypes.POINTER(ctypes.c_char_p)()
arg_types = ctypes.POINTER(ctypes.c_char_p)()
arg_descs = ctypes.POINTER(ctypes.c_char_p)()
ret_type = ctypes.c_char_p()
check_call(_LIB.MXSymbolGetAtomicSymbolInfo(
handle, ctypes.byref(name), ctypes.byref(desc),
ctypes.byref(num_args),
ctypes.byref(arg_names),
ctypes.byref(arg_types),
ctypes.byref(arg_descs),
ctypes.byref(key_var_num_args),
ctypes.byref(ret_type)))
param_str = ctypes2docstring(num_args, arg_names, arg_types, arg_descs)
key_var_num_args = py_str(key_var_num_args.value)
func_name = py_str(name.value)
desc = py_str(desc.value)
if key_var_num_args:
desc += '\nThis function support variable length of positional input.'
doc_str = ('%s\n\n' +
'%s\n' +
'name : string, optional.\n' +
' Name of the resulting symbol.\n\n' +
'Returns\n' +
'-------\n' +
'symbol: Symbol\n' +
' The result symbol.')
doc_str = doc_str % (desc, param_str)
extra_doc = "\n" + '\n'.join([x.__doc__ for x in type.__subclasses__(SymbolDoc)
if x.__name__ == '%sDoc' % func_name])
doc_str += re.sub(re.compile(" "), "", extra_doc)
def creator(*args, **kwargs):
"""Activation Operator of Neural Net.
The parameters listed below can be passed in as keyword arguments.
Parameters
----------
name : string, required.
Name of the resulting symbol.
Returns
-------
symbol: Symbol
the resulting symbol
"""
param_keys = []
param_vals = []
symbol_kwargs = {}
name = kwargs.pop('name', None)
attr = kwargs.pop('attr', None)
if key_var_num_args and key_var_num_args not in kwargs:
param_keys.append(c_str(key_var_num_args))
param_vals.append(c_str(str(len(args))))
for k, v in kwargs.items():
if isinstance(v, Symbol):
symbol_kwargs[k] = v
else:
param_keys.append(c_str(k))
param_vals.append(c_str(str(v)))
# create atomic symbol
param_keys = c_array(ctypes.c_char_p, param_keys)
param_vals = c_array(ctypes.c_char_p, param_vals)
sym_handle = SymbolHandle()
check_call(_LIB.MXSymbolCreateAtomicSymbol(
handle,
mx_uint(len(param_keys)),
param_keys, param_vals,
ctypes.byref(sym_handle)))
if len(args) != 0 and len(symbol_kwargs) != 0:
raise TypeError(
'%s can only accept input'
'Symbols either as positional or keyword arguments, not both' % func_name)
if key_var_num_args and len(symbol_kwargs) != 0:
raise ValueError('This function supports variable length of Symbol arguments.\n' +
'Please pass all the input Symbols via positional arguments' +
' instead of keyword arguments.')
s = Symbol(sym_handle)
attr = AttrScope.current.get(attr)
if attr:
s._set_attr(**attr)
hint = func_name.lower()
name = NameManager.current.get(name, hint)
s._compose(*args, name=name, **symbol_kwargs)
return s
creator.__name__ = func_name
creator.__doc__ = doc_str
return creator
- First, MXSymbolListAtomicSymbolCreators returns the array of OperatorPropertyReg objects that have been registered in the core.
- _make_atomic_symbol_function then queries the information of the corresponding Symbol and returns a creator object; note that creator.__name__ is set to the Symbol's name.
- setattr(module_obj, function.__name__, function) writes the returned creator into the module, so once the module is imported the corresponding creator(*args, **kwargs) can be called directly under that name.
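A quick way to see the result of this loading (assuming mxnet imports successfully) is to inspect one of the generated creators directly:
import mxnet as mx
# FullyConnected is not written anywhere in the Python sources; it is the
# creator object produced by _make_atomic_symbol_function at import time.
print(mx.sym.FullyConnected.__name__)                 # 'FullyConnected'
print('num_hidden' in mx.sym.FullyConnected.__doc__)  # True: doc built from the C++ registration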
As for how a layer is registered with the mxnet core, here is the fully connected layer as an example:
DMLC_REGISTER_PARAMETER(FullyConnectedParam);
MXNET_REGISTER_OP_PROPERTY(FullyConnected, FullyConnectedProp)
.describe("Apply matrix multiplication to input then add a bias.")
.add_argument("data", "Symbol", "Input data to the FullyConnectedOp.")
.add_argument("weight", "Symbol", "Weight matrix.")
.add_argument("bias", "Symbol", "Bias parameter.")
.add_arguments(FullyConnectedParam::__FIELDS__());
struct FullyConnectedParam : public dmlc::Parameter<FullyConnectedParam> {
int num_hidden;
bool no_bias;
DMLC_DECLARE_PARAMETER(FullyConnectedParam) {
// TODO(bing) add support for boolean
DMLC_DECLARE_FIELD(num_hidden).set_lower_bound(1)
.describe("Number of hidden nodes of the output.");
DMLC_DECLARE_FIELD(no_bias).set_default(false)
.describe("Whether to disable bias parameter.");
}
};
Create OperatorSymbol
I am not sure what title fits this step best; essentially it creates a Symbol for a layer, where the Symbol's head Node holds the layer's operator. The layers below all go through the same process, and each gets its own Symbol. As shown above, calling one of these functions actually invokes a creator object, so single-stepping through the Python code lands you directly in creator(*args, **kwargs). Let's look at what happens inside that function, taking fc_3 = mx.sym.FullyConnected(data=fc_2, name='fc3', num_hidden = 4) as the example.
#network config
fc_1 = mx.sym.FullyConnected(data=X, name='fc1', num_hidden = 2)
fc_2 = mx.sym.FullyConnected(data=fc_1, name='fc2', num_hidden = 3)
fc_3 = mx.sym.FullyConnected(data=fc_2, name='fc3', num_hidden = 4)
lro = mx.sym.SoftmaxOutput(data=fc_3, label=Y, name="softmax")
Inside creator(*args, **kwargs), the Symbol arguments (here fc_2) are first separated from the non-Symbol arguments (num_hidden, which is defined in FullyConnectedParam). The non-Symbol arguments are passed to the C++ function MXSymbolCreateAtomicSymbol, which creates a Symbol and hangs the operator on that Symbol's heads_[0].source.
After the Symbol is created, the previous layer's Symbol still has to be hung onto this one; this is done by s._compose(*args, name=name, **symbol_kwargs). It calls MXSymbolCompose --> Compose in C++, and Compose places the upstream Symbols into the matching slots of heads_[0].source->inputs; which slot each argument goes into is determined by heads_[0].source->op->ListArguments(). In this example, fc3.heads_[0].source->inputs[0] = fc2. FullyConnectedProp::ListArguments is shown below; the remaining empty slots are filled in (as the is_variable() check above suggests, what gets filled in are variables), and finally this operator Symbol is returned.
std::vector<std::string> ListArguments() const override {
if (!param_.no_bias) {
return {"data", "weight", "bias"};
} else {
return {"data", "weight"};
}
}
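The effect of this composition can be checked from Python (output assuming the example script at the top of this post):
# weight and bias were never passed in, so Compose filled those slots with
# auto-created variables, and they now appear as arguments of fc_3
print(fc_3.list_arguments())
# ['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'fc3_weight', 'fc3_bias']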
Once lro = mx.sym.SoftmaxOutput(data=fc_3, label=Y, name="softmax") has run, we end up with the network structure shown in Figure 1. This is not yet the computation graph. In the figure I divide the Symbols into two kinds: layers, labelled Symbol:OP, and variables, labelled Symbol:Var.
Bind: building the computation graph
#build the model
model = mx.mod.Module(
symbol = lro ,
data_names=['data'],
label_names = ['softmax_label']# network structure
)
This builds the model. The line of the constructor I want to focus on is arg_names = symbol.list_arguments(), which involves a depth-first search of the graph. It calls MXSymbolListArguments in C++, where mainly the following three functions perform the depth-first search and return the list of variables.
std::vector<std::string> Symbol::ListArguments() const {
std::vector<std::string> ret;
if (this->is_atomic()) {
return heads_[0].source->op->ListArguments();
} else {
this->DFSVisit([&ret](const std::shared_ptr<Node> &node) {
if (node->is_variable()) {
ret.push_back(node->name);
}
});
return ret;
}
}
template<typename FVisit>
inline void Symbol::DFSVisit(FVisit fvisit) const {
typedef const std::shared_ptr<Node>* GNode;
std::vector<GNode> head_nodes(heads_.size());
std::transform(heads_.begin(), heads_.end(), head_nodes.begin(),
[](const DataEntry& e)->GNode {
return &e.source;
});
graph::PostOrderDFSVisit<GNode, Node*>(
head_nodes,
[fvisit](GNode n) { fvisit(*n); }, // FVisit
[](GNode n)->Node* { return n->get(); }, // HashFunc
[](GNode n)->uint32_t { return (*n)->inputs.size() +
static_cast<int>((*n)->is_backward()); }, // InDegree
[](GNode n, uint32_t index)->GNode { // GetInput
if (index < (*n)->inputs.size()) {
return &(*n)->inputs.at(index).source;
} else {
return &(*n)->backward_source_node;
}
});
}
template <typename GNode, typename HashType, typename FVisit,
typename HashFunc, typename InDegree, typename GetInput>
void PostOrderDFSVisit(const std::vector<GNode>& heads, FVisit fvisit,
HashFunc hash, InDegree indegree, GetInput getinput) {
std::vector<std::pair<GNode, uint32_t> > stack;
std::unordered_set<HashType> visited;
for (auto& head : heads) {
HashType head_hash = hash(head);
if (visited.count(head_hash) == 0) {
stack.push_back(std::make_pair(head, 0));
visited.insert(head_hash);
}
while (!stack.empty()) {
std::pair<GNode, uint32_t>& back = stack.back();
if (back.second == indegree(back.first)) {
fvisit(back.first);
stack.pop_back();
} else {
const GNode& input = getinput(back.first, back.second++);
HashType input_hash = hash(input);
if (visited.count(input_hash) == 0) {
stack.push_back(std::make_pair(input, 0));
visited.insert(input_hash);
}
}
}
}
}
The first function, ListArguments(), shows that a node is appended to the output ret only if it is a variable. The second function, DFSVisit(FVisit fvisit), merely builds the lambdas that the third function, PostOrderDFSVisit(...), needs. The third function is the key one: the head we attached when constructing the model is lro, i.e. Symbol:OP--Out in Figure 1. The depth-first search (DFS) proceeds as follows:
- Put the Symbol attached at model construction into a container (it is used as a stack).
- If the container is empty, stop; otherwise let back refer to the element on top of the stack (the most recently pushed one, stack.back() in the code). back.second counts how many of its inputs have been visited so far.
- If that count equals the node's in-degree, pop back off the stack, and if back.first is a variable, append it to the output ret.
- If the count is smaller than the in-degree, take the input input[back.second] of back.first, push it onto the stack if it has not been visited yet, and increase back.second by one.
- Go back to step 2.
Running the DFS from the top of Figure 1 with the steps above yields the following result (note that this order is deterministic):
['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'fc3_weight', 'fc3_bias', 'softmax_label']
This order also shows why a DFS is used: the traversal order is exactly the order in which the forward pass is computed.
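To make the traversal concrete, here is a small standalone re-implementation of the post-order DFS in Python (a sketch for illustration only, not mxnet code); the toy graph mirrors the fc1/fc2/fc3/softmax network above:
def post_order_dfs(head, get_inputs):
    # mirrors PostOrderDFSVisit: a stack of [node, visited_input_count] pairs
    stack, visited, order = [[head, 0]], {id(head)}, []
    while stack:
        node, idx = stack[-1]
        inputs = get_inputs(node)
        if idx == len(inputs):      # in-degree reached: visit the node and pop it
            order.append(node)
            stack.pop()
        else:
            stack[-1][1] += 1       # same role as back.second++
            child = inputs[idx]
            if id(child) not in visited:
                visited.add(id(child))
                stack.append([child, 0])
    return order

# toy graph: a node is (name, inputs); variables have an empty input list
var = lambda name: (name, [])
data = var('data')
fc1 = ('fc1', [data, var('fc1_weight'), var('fc1_bias')])
fc2 = ('fc2', [fc1, var('fc2_weight'), var('fc2_bias')])
fc3 = ('fc3', [fc2, var('fc3_weight'), var('fc3_bias')])
out = ('softmax', [fc3, var('softmax_label')])
print([n[0] for n in post_order_dfs(out, lambda n: n[1]) if not n[1]])
# ['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'fc3_weight', 'fc3_bias', 'softmax_label']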
Training: fit
Binding the executor and initializing the computation graph
Before training, an executor is bound (Bind Executor) according to the device; if no device is given, the default is cpu(0). In general, one Executor corresponds to one hardware device, e.g. one CPU or one GPU. The Python call chain is as follows:
base_module.py : model.fit -->
module.py : bind -->
executor_group.py : DataParallelExecutorGroup.__init__ --> bind_exec --> _bind_ith_exec -->
symbol.py : bind -->
C++ : MXExecutorBindEX
_bind_ith_exec is the most important function on the Python side. Besides binding the executor, it allocates the memory needed for the forward pass (arg_arrays) and the backward pass (grad_arrays), records whether each Symbol needs a backward pass (grad_req), and performs shape inference (infer shape). infer shape also calls into C++, where iterators are used to build TShape objects and a topological sort is involved.
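Shape inference can also be exercised directly from Python on the network defined earlier; the shapes below follow from that toy network and are shown only as an illustration:
# infer_shape propagates the input shape through the graph and returns the
# shapes of all arguments in list_arguments() order
arg_shapes, out_shapes, aux_shapes = lro.infer_shape(data=(batch_size, Dim))
print(dict(zip(lro.list_arguments(), arg_shapes)))
# e.g. {'data': (1, 3), 'fc1_weight': (2, 3), 'fc1_bias': (2,), ...}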
The C++ call chain is as follows:
MXExecutorBindEX() --> Executor::Bind() --> GraphExecutor::Init()
Let's see what GraphExecutor::Init() actually does. InitGraph builds the computation graph, covering both the forward and the backward parts; InitDataEntryInfo initializes the variables passed in; InitDataEntryMemory allocates memory for the intermediate outputs, which is where two memory-saving strategies come in:
- inplace: we simulate traversing the graph and keep, for each variable, a count of how many other variables still need it. When a variable's count drops to zero we reclaim its memory. This requires the operator implementation to provide the corresponding ForwardInplaceOption and BackwardInplaceOption.
- co-share: we allow two variables to use the same piece of memory. This of course means the two variables cannot write to it at the same time, so we only co-share variables that cannot run in parallel. Each time we consider one path through the graph; every variable on the path depends on the previous one and so cannot be parallelized. We allocate memory for them and then remove them from the graph. This can be computed by an algorithm, but it needs a memory pool, GraphStoragePool.
There is actually one more memory-saving strategy, unrelated to the computation graph: the one described in my previous post (mshadow的原理--MXNet).
inline void Init(Symbol symbol,
const Context& default_ctx,
const std::map<std::string, Context>& ctx_map,
const std::vector<NDArray> &in_args,
const std::vector<NDArray> &arg_grad_store,
const std::vector<OpReqType> &grad_req_type,
const std::vector<NDArray> &aux_states,
Executor* shared_exec = nullptr) {
enable_inplace_allocation_ = dmlc::GetEnv("MXNET_EXEC_ENABLE_INPLACE", true);
prefer_bulk_execution_ = dmlc::GetEnv("MXNET_EXEC_PREFER_BULK_EXEC", true);
if (shared_exec != NULL) {
GraphExecutor* gexec = dynamic_cast<GraphExecutor*>(shared_exec);
CHECK(gexec) << "Input executor for sharing memory must have GraphExecutor type.";
shared_mem_ = gexec->shared_mem_;
} else {
shared_mem_ = std::make_shared<GraphStoragePool>();
}
CHECK_EQ(grad_req_type.size(), arg_grad_store.size());
bool need_backward = false;
for (auto req : grad_req_type) {
if (req != kNullOp) need_backward = true;
}
this->InitGraph(symbol, default_ctx, ctx_map,
in_args, arg_grad_store, grad_req_type,
need_backward);
this->InitDataEntryInfo(in_args, arg_grad_store, grad_req_type, aux_states);
this->InitOperators();
this->InitDataEntryMemory();
this->InitResources();
this->InitCachedOps();
this->InitOpSegs();
}
Figure 2 shows the effect of mxnet's memory-saving strategies.
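To make the reference-counting idea concrete, here is a toy sketch in Python (not mxnet's planner, just a simplified reuse scheme built on the same counting idea):
def plan_memory(ops):
    """ops: list of (output_name, input_names) in execution order."""
    remaining = {}                      # output -> number of future readers
    for _, inputs in ops:
        for x in inputs:
            remaining[x] = remaining.get(x, 0) + 1
    pool, buf_of, n_bufs = [], {}, 0
    for out, inputs in ops:
        if pool:                        # reuse a recycled buffer when possible
            buf_of[out] = pool.pop()
        else:
            buf_of[out] = 'buf%d' % n_bufs
            n_bufs += 1
        for x in inputs:                # this op consumes its inputs
            remaining[x] -= 1
            if remaining[x] == 0 and x in buf_of:
                pool.append(buf_of[x])  # nobody needs x any more: recycle it
    return buf_of

# A linear chain like fc1 -> fc2 -> fc3 -> softmax only needs two buffers,
# because each output can reuse the buffer freed by its grandparent.
chain = [('fc1', ['data']), ('fc2', ['fc1']), ('fc3', ['fc2']), ('softmax', ['fc3'])]
print(plan_memory(chain))  # {'fc1': 'buf0', 'fc2': 'buf1', 'fc3': 'buf0', 'softmax': 'buf1'}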
Training
Before training starts, all variables except the input data are initialized, and the optimization algorithm is set up; this happens in base_module.py:
self.init_params(initializer=initializer, arg_params=arg_params, aux_params=aux_params,
allow_missing=allow_missing, force_init=force_init)
self.init_optimizer(kvstore=kvstore, optimizer=optimizer,
optimizer_params=optimizer_params)
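Both of these can also be controlled explicitly through fit's keyword arguments, for example (a hedged illustration; initializer, optimizer and optimizer_params are existing parameters of fit):
model.fit(train_iter, eval_iter,
          initializer=mx.initializer.Uniform(0.5),  # what init_params uses to fill the weights
          optimizer='sgd',                          # what init_optimizer sets up
          optimizer_params={'learning_rate': 0.5, 'momentum': 0.9},
          num_epoch=1)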
The main training steps are forward_backward and update; the code is as follows:
################################################################################
# training loop
################################################################################
for epoch in range(begin_epoch, num_epoch):
tic = time.time()
eval_metric.reset()
for nbatch, data_batch in enumerate(train_data):
if monitor is not None:
monitor.tic()
self.forward_backward(data_batch)
self.update()
self.update_metric(eval_metric, data_batch.label)
if monitor is not None:
monitor.toc_print()
if batch_end_callback is not None:
batch_end_params = BatchEndParam(epoch=epoch, nbatch=nbatch,
eval_metric=eval_metric,
locals=locals())
for callback in _as_list(batch_end_callback):
callback(batch_end_params)
# one epoch of training is finished
for name, val in eval_metric.get_name_value():
self.logger.info('Epoch[%d] Train-%s=%f', epoch, name, val)
toc = time.time()
self.logger.info('Epoch[%d] Time cost=%.3f', epoch, (toc-tic))
if epoch_end_callback is not None:
arg_params, aux_params = self.get_params()
for callback in _as_list(epoch_end_callback):
callback(epoch, self.symbol, arg_params, aux_params)
#----------------------------------------
# evaluation on validation set
if eval_data:
res = self.score(eval_data, validation_metric,
batch_end_callback=eval_batch_end_callback, epoch=epoch)
for name, val in res:
self.logger.info('Epoch[%d] Validation-%s=%f', epoch, name, val)
# end of 1 epoch, reset the data-iter for another epoch
train_data.reset()
Both forward and backward eventually call void RunOps(bool is_train, size_t topo_start, size_t topo_end). This function is presumably the real core of training, but it involves the synchronous and asynchronous handling of the parameter server (PS), which is complex, so I will not go into it here.
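For reference, the per-batch work done inside the loop above can also be driven by hand through the same Module API (a sketch; it assumes the model has already been bound and its parameters and optimizer initialized, for instance by the earlier fit call):
train_iter.reset()
for batch in train_iter:
    model.forward(batch, is_train=True)  # forward pass through the executors
    model.backward()                     # backward pass, fills grad_arrays
    model.update()                       # optimizer step on the parameters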
[Link to the original post, kept here in case crawler reposts break the formatting]:
http://www.cnblogs.com/heguanyou/p/7604326.html