Caffe
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.
1. Framework of Caffe
Caffe is an open-source software framework. It provides a basic programming framework, or template framework, for implementing deep convolutional neural networks and other deep learning algorithms on GPU parallel architectures, with large gains in performance. Compared with the high-performance convnet of A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems, 2012, we can define all kinds of convolutional network structures within this framework, and we can add our own code under it to design new algorithms. One limitation of the framework is that it only supports convolutional networks: everything is built on the convolutional neural network model.
If you are interested in other deep learning algorithms, TensorFlow is recommended; it supports a wider range of models, such as RNNs and LSTMs.
2. Caffe's three fixed basic components
Caffe has three atomic structures. As the name suggests, atomic structures cannot be changed at will; Caffe's programming framework is built on these three atoms. They are: Blobs, Layers, and Nets.
- Blob
A Blob is a wrapper over the actual data being processed and passed along by Caffe, and also under the hood provides synchronization capability between the CPU and the GPU. Mathematically, a blob is a 4-dimensional array that stores things in the order of (Num, Channels, Height and Width), from major to minor, and stored in a C-contiguous fashion. Num is the major dimension (the name is due to legacy reasons, and is equivalent to the notation of "batch" as in minibatch SGD).
Caffe stores and communicates data in 4-dimensional arrays called blobs. Blobs provide a unified memory interface, holding data e.g. batches of images, model parameters, and derivatives for optimization.
Blobs conceal the computational and mental overhead of mixed CPU/GPU operation by synchronizing from the CPU host to the GPU device as needed. Memory on the host and device is allocated on demand (lazily) for efficient memory usage.
The conventional blob dimensions for data are number N x channel K x height H x width W. Blob memory is row-major in layout so the last / rightmost dimension changes fastest. For example, the value at index (n, k, h, w) is physically located at index ((n * K + k) * H + h) * W + w.
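The row-major offset formula above can be checked against NumPy, whose default C-contiguous layout matches the one described; this is a small verification sketch, not Caffe code:

```python
import numpy as np

# A blob of shape (N, K, H, W); arange gives each element a unique value
# so that physical positions are easy to verify.
N, K, H, W = 2, 3, 4, 5
blob = np.arange(N * K * H * W).reshape(N, K, H, W)

def offset(n, k, h, w):
    # Physical index of element (n, k, h, w): ((n * K + k) * H + h) * W + w
    return ((n * K + k) * H + h) * W + w

flat = blob.ravel()  # the underlying C-contiguous buffer
assert flat[offset(1, 2, 3, 4)] == blob[1, 2, 3, 4]
```

Because the last / rightmost dimension changes fastest, consecutive values of w are adjacent in memory.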
- Number / N is the batch size of the data. Batch processing achieves better throughput for communication and device processing. For an ImageNet training batch of 256 images, N = 256.
- Channel / K is the feature dimension e.g. for RGB images K = 3.
Note that although we have designed blobs with dimensions corresponding to image applications, the dimensions are named purely for notational purposes and it is totally valid to use blobs for non-image applications. For example, if you simply need fully-connected networks like the conventional multi-layer perceptron, use blobs of dimensions (Num, Channels, 1, 1) and call the InnerProductLayer (which we will cover soon).
Caffe operations are general with respect to the channel dimension / K. Grayscale and hyperspectral imagery are fine. Caffe can likewise model and process arbitrary vectors in blobs with singleton trailing dimensions. That is, the shape of a blob holding 1000 vectors of 16 feature dimensions is 1000 x 16 x 1 x 1.
Parameter blob dimensions vary according to the type and configuration of the layer. For a convolution layer with 96 filters of 11 x 11 spatial dimension and 3 inputs the blob is 96 x 3 x 11 x 11. For an inner product / fully-connected layer with 1000 output channels and 1024 input channels the parameter blob is 1 x 1 x 1000 x 1024.
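The parameter counts implied by these shapes follow directly from the product of the dimensions; a quick NumPy sketch (not Caffe code) makes this concrete:

```python
import numpy as np

# Parameter blob shapes from the text above.
conv_params = np.zeros((96, 3, 11, 11))     # 96 filters, 3 input channels, 11x11 kernels
fc_params = np.zeros((1, 1, 1000, 1024))    # 1000 output channels x 1024 input channels

print(conv_params.size)  # 96 * 3 * 11 * 11 = 34848
print(fc_params.size)    # 1000 * 1024 = 1024000
```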
For custom data it may be necessary to hack your own input preparation tool or data layer. However, once your data is in, your job is done. The modularity of layers accomplishes the rest of the work for you.
The above is the official documentation's introduction to the Blob. The point of all of it is that a Blob is a wrapper: in the Caffe pipeline, all data must be wrapped in blob format, and programming and processing then happen within Caffe's architecture. This is something we cannot change at will, because Caffe itself provides many ready-made functions and classes; if we changed the data wrapper arbitrarily, we could no longer use those functions, and we could no longer design deep neural networks within the Caffe framework.
The blob format (Number, Channel, Height, Width) stores data as a 4-tuple. Since image data is being processed here, the last three dimensions describe the image: Channel is the number of channels (a grayscale image has Channel = 1, an RGB image has Channel = 3), and Height and Width are the image's height and width. Number denotes the batch: since memory is limited, training must proceed in batches, and each batch is given an index. Later we will train the model with stochastic gradient descent (SGD), which operates on exactly these batches. A blob holds not only the data of a deep network's forward pass but also the gradient data computed during the backward pass.
Concrete usage:
const Dtype* cpu_data() const;
Dtype* mutable_cpu_data();
The two signatures above are the constant (read-only) and mutable access modes, respectively. A blob keeps both a CPU copy and a GPU copy of its data, and it encapsulates the exchange of data between CPU and GPU, handling synchronization itself, so we do not need to manage CPU/GPU data transfer ourselves.
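The const/mutable split can be illustrated with a toy Python sketch. The class and attribute names here are hypothetical, and the "sync head" flag only stands in for the real CPU/GPU synchronization that Caffe's Blob performs:

```python
import numpy as np

class ToyBlob:
    """Toy sketch of Caffe's Blob access pattern (not the real API):
    cpu_data() gives read-only access, while mutable_cpu_data() marks
    the CPU copy as the authoritative one, standing in for the lazy
    CPU/GPU synchronization the real Blob does behind the scenes."""

    def __init__(self, shape):
        self._cpu = np.zeros(shape)
        self._head = "cpu"  # which copy is current: "cpu" or "gpu"

    def cpu_data(self):
        # Read-only view: writing through it raises an error.
        view = self._cpu.view()
        view.flags.writeable = False
        return view

    def mutable_cpu_data(self):
        # Caller intends to write, so the CPU copy becomes authoritative.
        self._head = "cpu"
        return self._cpu

b = ToyBlob((2, 3, 4, 5))
b.mutable_cpu_data()[0, 0, 0, 0] = 1.0
print(b.cpu_data()[0, 0, 0, 0])  # 1.0
```

The design point is the same as in Caffe: callers declare whether they intend to write, and the wrapper uses that intent to decide when data must be synchronized.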
- Layer
Layers are the units of computation and connection from which the network structure is composed. A layer takes the data output by the layer below as its input and produces output through its internal computation. This is standard convolutional-network material and is not detailed here; instead we focus on how the layer structures defined in Caffe are used and programmed.
A layer takes input through bottom connections and makes output through top connections.
Each layer type defines three critical computations: setup, forward, and backward.
- Setup: initialize the layer and its connections once at model initialization.
- Forward: given input from bottom compute the output and send to the top.
- Backward: given the gradient w.r.t. the top output, compute the gradient w.r.t. the input and send to the bottom. A layer with parameters computes the gradient w.r.t. its parameters and stores it internally.
The above is the official definition of how network layers are used in Caffe. As in most deep learning libraries, there are three steps: 1) set up the layer, which includes establishing its connections and initializing some of its variables; 2) the forward pass, which takes the input data and computes the output; 3) the backward pass, which computes the gradients and stores them in the layer structure.
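The three steps can be sketched as a minimal inner-product layer in Python. This is a toy illustration with hypothetical names, not Caffe's actual C++ layer interface:

```python
import numpy as np

class ToyInnerProduct:
    """Toy sketch of a Caffe-style layer: setup allocates parameters once,
    forward maps bottom to top, and backward turns the top gradient into
    bottom and parameter gradients (the latter stored in the layer)."""

    def setup(self, n_in, n_out):
        rng = np.random.default_rng(0)
        self.W = rng.standard_normal((n_out, n_in)) * 0.01
        self.dW = np.zeros_like(self.W)   # parameter gradient kept internally

    def forward(self, bottom):
        self.bottom = bottom              # cache input for the backward pass
        return bottom @ self.W.T          # top = bottom * W^T

    def backward(self, top_diff):
        self.dW = top_diff.T @ self.bottom  # gradient w.r.t. parameters
        return top_diff @ self.W            # gradient w.r.t. bottom input

layer = ToyInnerProduct()
layer.setup(n_in=4, n_out=2)
top = layer.forward(np.ones((8, 4)))         # batch of 8 input vectors
bottom_diff = layer.backward(np.ones((8, 2)))
print(top.shape, bottom_diff.shape)          # (8, 2) (8, 4)
```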
- Net definition
A network is composed of layers; by defining the inputs, the outputs, and each of the layers, we define a network. In the official wording: The net is a set of layers connected in a computation graph – a directed acyclic graph (DAG) to be exact. As an example, here is the definition of a logistic regression network:
name: "LogReg"
layers {
  name: "mnist"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "input_leveldb"
    batch_size: 64
  }
}
layers {
  name: "ip"
  type: INNER_PRODUCT
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 2
  }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
In fact, the code has only three layers: an input layer, an intermediate layer, and an output layer, which is the most basic form of network. We can call Net::Init() to initialize and check the defined network. Initialization includes initializing variables such as weights; checking includes verifying the structural correctness of the network, since the connections between adjacent layers must match up and couple correctly.
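The kind of connectivity check Net::Init() performs can be sketched as a simple pass over the layer definitions: every bottom must have been produced as a top by an earlier layer, so the layers form a valid DAG. This is a hypothetical helper in Python, not Caffe's real code:

```python
# Toy connectivity check: each layer's bottoms must already have been
# produced as tops by earlier layers in the definition.
def check_net(layers):
    produced = set()
    for layer in layers:
        for bottom in layer.get("bottom", []):
            if bottom not in produced:
                raise ValueError(f"layer {layer['name']}: unknown bottom '{bottom}'")
        produced.update(layer.get("top", []))

# The LogReg net above, reduced to names and top/bottom connections.
logreg = [
    {"name": "mnist", "top": ["data", "label"]},
    {"name": "ip", "bottom": ["data"], "top": ["ip"]},
    {"name": "loss", "bottom": ["ip", "label"], "top": ["loss"]},
]
check_net(logreg)  # passes: every bottom is produced before it is consumed
```

If a layer referenced a blob no earlier layer produces, the check would fail, which is the sort of mismatch Net::Init() guards against.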