Using Convolutional Neural Networks for Regression Tasks

Reposted article. Published 2017-01-15 21:20, 19127 views.

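Caffe is probably one of the most widely used deep learning frameworks today, especially for computer vision. Most people who use Caffe work with classification networks, but sometimes a vision application calls for regression instead, and the official Caffe website has no ready-made example of this. This post uses a small, simple example to show how to build a regression application with Caffe and a convolutional neural network (CNN).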

Principle

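The most classic CNN architecture consists of several convolutional layers followed by fully connected (FC) layers, and finally a Softmax layer that outputs the predicted class probabilities. If an image matrix is also viewed as a vector, then every layer of a CNN, whether convolutional or FC, simply transforms one vector into another (in fact, for a single filter/feature channel, the most basic convolution implementation in Caffe is the multiplication of a vector by a matrix; see Convolution in Caffe: a memo). The final output is a probability vector whose dimension equals the number of target classes. Since a neural network is essentially a black-box learner, the natural idea is to use the values of the final output vector directly as the regression output, and to optimize an objective based directly on the real-valued error instead of cross entropy or a similar classification loss.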
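The statement that a single-channel convolution reduces to a vector-matrix product can be illustrated with the im2col trick, which is the approach the memo above describes for Caffe. The NumPy sketch below is only an illustration under simplifying assumptions: the helper names (`im2col`, `conv_as_matmul`, `conv_direct`) are my own, and it ignores padding, multiple channels and batching. Like Caffe, it computes cross-correlation (the kernel is not flipped).

```python
import numpy as np


def im2col(image, k, stride=1):
    """Unfold every k x k patch of a 2-D image into one column (valid positions only)."""
    h, w = image.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    cols = np.empty((k * k, out_h * out_w), dtype=np.float64)
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r * stride:r * stride + k, c * stride:c * stride + k]
            cols[:, r * out_w + c] = patch.ravel()
    return cols, out_h, out_w


def conv_as_matmul(image, kernel, stride=1):
    """Single-filter convolution as a (row vector) x (patch matrix) product."""
    k = kernel.shape[0]
    cols, out_h, out_w = im2col(image, k, stride)
    # Flattened filter of length k*k times the (k*k, out_h*out_w) patch matrix
    return (kernel.ravel() @ cols).reshape(out_h, out_w)


def conv_direct(image, kernel, stride=1):
    """Reference version: slide the kernel and take a dot product at each position."""
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = image[r * stride:r * stride + k, c * stride:c * stride + k]
            out[r, c] = np.sum(window * kernel)
    return out


rng = np.random.default_rng(0)
img = rng.standard_normal((100, 100))
ker = rng.standard_normal((5, 5))
# Same geometry as conv1 in the network used later: 100x100 input, 5x5 kernel, stride 2
out = conv_as_matmul(img, ker, stride=2)
print(out.shape, np.allclose(out, conv_direct(img, ker, stride=2)))
```

Real implementations stack all filters as the rows of one matrix, so the whole layer becomes a single matrix-matrix multiplication that a BLAS library can execute efficiently.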

EuclideanLossLayer

Caffe's built-in EuclideanLossLayer is one way to handle the real-valued regression described above. It computes the following error:

\begin{align}\notag \frac 1 {2N} \sum_{i=1}^N \| x^1_i - x^2_i \|_2^2\end{align}

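Here $x^1_i$ and $x^2_i$ are the $i$-th entries of the layer's two inputs (in this case the network's prediction and the label), and $N$ is the batch size. So the recipe is simple: feed the labeled values and the values computed by the network into the EuclideanLossLayer and let it measure the difference.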
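As a minimal NumPy sketch of what the formula above computes (an illustration, not Caffe's C++ code; the function names are my own), the forward pass averages the squared difference over the batch and halves it, and the backward pass gives the gradient with respect to the prediction. The labels are reshaped to match the predictions, since Caffe only requires both inputs to have the same number of values per sample, e.g. predictions of shape (N, 1) and labels of shape (N,).

```python
import numpy as np


def euclidean_loss(pred, label):
    """Forward pass: E = 1/(2N) * sum_i ||pred_i - label_i||^2, N = size of the first axis."""
    n = pred.shape[0]
    diff = pred.reshape(n, -1) - label.reshape(n, -1)
    return float(np.sum(diff ** 2) / (2.0 * n))


def euclidean_loss_grad(pred, label):
    """Backward pass: dE/dpred = (pred - label) / N, returned in the shape of pred."""
    n = pred.shape[0]
    return (pred - label.reshape(pred.shape)) / n


# N = 2 predictions of shape (N, 1), as produced by a final layer with one output,
# and labels of shape (N,), as stored in a 1-D dataset
pred = np.array([[0.5], [0.2]])
label = np.array([0.3, 0.2])
print(euclidean_loss(pred, label))
print(euclidean_loss_grad(pred, label))
```

Here the loss is (0.2² + 0²) / (2 × 2) = 0.01, and only the first prediction receives a nonzero gradient.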

A simple example: scoring how disordered an image is

This section uses a simple example, scoring how disordered an image is, to show how to do regression with Caffe and the EuclideanLossLayer.

Generating data with the Ising model

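The images are generated by simulating the Ising model, a classic model in statistical physics and probably one of the most studied. This post is not about physics, so the details are skipped; in short, simulating this model produces binary spin images whose degree of order depends on how long the simulation runs.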

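Each image is labeled with two fields: the first is its index, and the second is a score that can roughly be read as the image's degree of order, in the range 0 to 1. The task in this example is to train a CNN to learn and predict this degree of order.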

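The Python script that generates the images is adapted from Monte Carlo Simulation of the Ising Model using Python, which simulates the Ising model with the Metropolis algorithm. It was modified to run in parallel and to produce images at random: each simulation stops at a randomly chosen time (between 1e3 and 1e7 steps, drawn log-uniformly) and writes the lattice to an image. The code is as follows: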

import os
import sys
import datetime

from multiprocessing import Process

import numpy as np
from matplotlib import pyplot

LATTICE_SIZE = 100
SAMPLE_SIZE = 12000
STEP_ORDER_RANGE = [3, 7]
SAMPLE_FOLDER = 'samples'

#----------------------------------------------------------------------#
#   Check periodic boundary conditions
#----------------------------------------------------------------------#
def bc(i):
    # Wrap an index onto [0, LATTICE_SIZE): -1 -> LATTICE_SIZE-1,
    # LATTICE_SIZE -> 0, and every in-range index is returned unchanged
    return i % LATTICE_SIZE

#----------------------------------------------------------------------#
#   Calculate internal energy
#----------------------------------------------------------------------#
def energy(system, N, M):
    return -1 * system[N,M] * (system[bc(N-1), M] \
                               + system[bc(N+1), M] \
                               + system[N, bc(M-1)] \
                               + system[N, bc(M+1)])

#----------------------------------------------------------------------#
#   Build the system
#----------------------------------------------------------------------#
def build_system():
    # Random initial configuration: spins drawn uniformly from {0, 1}, then 0 -> -1
    system = np.random.randint(0, 2, (LATTICE_SIZE, LATTICE_SIZE))
    system[system == 0] = -1

    return system

#----------------------------------------------------------------------#
#   The Main monte carlo loop
#----------------------------------------------------------------------#
def main(T, index):

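    # score ~ U(0, 1) is the regression label; the number of Monte Carlo steps is
    # 10**order with order = 3 + 4*score, i.e. log-uniform between 1e3 and 1e7
    score = np.random.random()
    order = score*(STEP_ORDER_RANGE[1]-STEP_ORDER_RANGE[0]) + STEP_ORDER_RANGE[0]
    stop = int(np.round(np.power(10.0, order)))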
    print('Running sample: {}, stop @ {}'.format(index, stop))
    sys.stdout.flush()

    system = build_system()

    for step in range(stop):
        M = np.random.randint(0, LATTICE_SIZE)
        N = np.random.randint(0, LATTICE_SIZE)

        E = -2. * energy(system, N, M)

        if E <= 0.:
            system[N,M] *= -1
        elif np.exp(-1./T*E) > np.random.rand():
            system[N,M] *= -1

        #if step % 100000 == 0:
        #    print('.'),
        #    sys.stdout.flush()

    filename = '{}/'.format(SAMPLE_FOLDER) + '{:0>5d}'.format(index) + '_{}.jpg'.format(score)
    pyplot.imsave(filename, system, cmap='gray')
    print('Saved to {}!\n'.format(filename))
    sys.stdout.flush()

#----------------------------------------------------------------------#
#   Run the menu for the monte carlo simulation
#----------------------------------------------------------------------#

def run_main(index, length):
    # Reseed in each worker so the forked processes do not reuse the parent's random state
    np.random.seed(datetime.datetime.now().microsecond)
    for i in range(index, index+length):
        main(0.1, i)

def run():

    cmd = 'mkdir -p {}'.format(SAMPLE_FOLDER)
    os.system(cmd)

    n_processes = 8
    length = int(SAMPLE_SIZE/n_processes)
    processes = [Process(target=run_main, args=(x, length)) for x in np.arange(n_processes)*length]

    for p in processes:
        p.start()
    
    for p in processes:
        p.join()

if __name__ == '__main__':
    run()

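In this example, 12,000 grayscale images of 100x100 pixels were generated at random, named [index]_[order score].jpg. The order score is a random number between 0 and 1 rather than the number of simulation steps for a reason: although in theory a three-layer neural network can approximate any function, in practice the data should still be preprocessed. This matters especially when the objective is an L2 norm; keeping the data uniformly distributed improves both the convergence and the reliability of the model. The 0 to 1 range makes the target directly comparable with the output of the final Sigmoid layer and also makes it easy to estimate the model's error. Note also that because the images are produced by Monte Carlo simulation, images with the same order score can still differ in how ordered they look, both subjectively and objectively.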
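To make the relationship between the label and the simulation length concrete: the generator draws score uniformly from [0, 1] and runs round(10^(3 + 4 × score)) steps, so the label is a linear function of log10(steps). The sketch below (illustrative function names, reusing the STEP_ORDER_RANGE constant from the script) converts in both directions. The raw step count, by contrast, spans four orders of magnitude, so under an L2 loss its largest values would dominate the error.

```python
import math

STEP_ORDER_RANGE = (3, 7)  # exponent range used by the generator script


def score_to_steps(score, lo=STEP_ORDER_RANGE[0], hi=STEP_ORDER_RANGE[1]):
    """Label in [0, 1] -> number of Monte Carlo steps, round(10 ** (lo + score * (hi - lo)))."""
    return int(round(10.0 ** (lo + score * (hi - lo))))


def steps_to_score(steps, lo=STEP_ORDER_RANGE[0], hi=STEP_ORDER_RANGE[1]):
    """Inverse mapping: recover the label from a step count."""
    return (math.log10(steps) - lo) / (hi - lo)


for s in (0.0, 0.25, 0.5, 1.0):
    print(s, score_to_steps(s))
```

Equal steps in score correspond to equal ratios in simulation time: 0.25 more score means ten times as many steps.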

Generating the training/validation/test sets

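The 12,000 images produced by the Ising simulation are split into three parts: 10,000 for training, 1,000 for validation, and the remaining 1,000 for testing. The following Python code writes the file lists for the training, validation, and test sets: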

import os

# '00000_0.30434289374.jpg' -> '0.30434289374': the score is the last '_'-separated field before the extension
filename2score = lambda x: x[:x.rfind('.')].split('_')[-1]

img_files = sorted(os.listdir('samples'))

with open('train.txt', 'w') as train_txt:
    for f in img_files[:10000]:
        score = filename2score(f)
        line = 'samples/{} {}\n'.format(f, score)
        train_txt.write(line)

with open('val.txt', 'w') as val_txt:
    for f in img_files[10000:11000]:
        score = filename2score(f)
        line = 'samples/{} {}\n'.format(f, score)
        val_txt.write(line)

with open('test.txt', 'w') as test_txt:
    for f in img_files[11000:]:
        line = 'samples/{}\n'.format(f)
        test_txt.write(line)

Generating HDF5 files

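LMDB is fast and compact, but Caffe's default tool for creating LMDB databases (convert_imageset) does not support floating-point labels. The Datum definition in caffe.proto does seem to support float data, but the required code changes are fairly tedious. HDF5, by comparison, is slower and takes more space, but it is simple to use and a good choice when the dataset is not huge. Here HDF5 stores the training and validation data for regression; the following script generates an HDF5 file and the file list that Caffe reads: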

import sys
import numpy
from matplotlib import pyplot
import h5py

IMAGE_SIZE = (100, 100)
MEAN_VALUE = 128

filename = sys.argv[1]
setname, ext = filename.split('.')

with open(filename, 'r') as f:
    lines = f.readlines()

numpy.random.shuffle(lines)

sample_size = len(lines)
imgs = numpy.zeros((sample_size, 1,) + IMAGE_SIZE, dtype=numpy.float32)
scores = numpy.zeros(sample_size, dtype=numpy.float32)

h5_filename = '{}.h5'.format(setname)
with h5py.File(h5_filename, 'w') as h:
    for i, line in enumerate(lines):
        image_name, score = line.split()
        # JPEG is read as H x W x 3 uint8 (0-255); the image is gray, so channel 0 suffices
        img = pyplot.imread(image_name)[:, :, 0].astype(numpy.float32)
        img = img.reshape((1, )+img.shape)
        img -= MEAN_VALUE
        imgs[i] = img
        scores[i] = float(score)
        if (i+1) % 1000 == 0:
            print('processed {} images!'.format(i+1))
    h.create_dataset('data', data=imgs)
    h.create_dataset('score', data=scores)

with open('{}_h5.txt'.format(setname), 'w') as f:
    f.write(h5_filename)

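Note that Caffe's HDF5Data layer does not support data transformations, so the mean is subtracted before the data are stored. Save the script as gen_hdf.py and run it to generate the training set and the validation set: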

python gen_hdf.py train.txt
python gen_hdf.py val.txt

Training

A small, simple network is used to train this regression model. Its definition, train_val.prototxt, is as follows:

name: "RegressionExample"
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "score"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "train_h5.txt"
    batch_size: 64
  }
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "score"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "val_h5.txt"
    batch_size: 64
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 5
    stride: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    pad: 2
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc4"
  type: "InnerProduct"
  bottom: "pool3"
  top: "fc4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  inner_product_param {
    num_output: 192
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "fc4"
  top: "fc4"
}
layer {
  name: "drop4"
  type: "Dropout"
  bottom: "fc4"
  top: "fc4"
  dropout_param {
    dropout_ratio: 0.35
  }
}
layer {
  name: "fc5"
  type: "InnerProduct"
  bottom: "fc4"
  top: "fc5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "sigmoid5"
  type: "Sigmoid"
  bottom: "fc5"
  top: "pred"
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "pred"
  bottom: "score"
  top: "loss"
}
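The last two layers above, a Sigmoid followed by EuclideanLoss, determine the gradient that reaches fc5. The NumPy sketch below (my own helper names, not Caffe code) computes the loss and its gradient with respect to the fc5 output. The sigmoid contributes an extra factor p(1 - p), which is at most 0.25 and becomes small when a prediction saturates near 0 or 1; this is one known reason why learning can slow down for targets near the ends of the range.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_euclidean(z, y):
    """Loss and gradient for pred = sigmoid(z) followed by a Euclidean loss.

    z: fc5 outputs of shape (N, 1); y: targets in [0, 1] of shape (N,).
    Returns (E, dE/dz) with E = 1/(2N) * sum_i (sigmoid(z_i) - y_i)^2.
    """
    n = z.shape[0]
    p = sigmoid(z)
    diff = p - y.reshape(p.shape)
    loss = float(np.sum(diff ** 2) / (2.0 * n))
    # Chain rule: dE/dp = diff / N and dp/dz = p * (1 - p)
    grad = diff * p * (1.0 - p) / n
    return loss, grad


z = np.array([[0.0], [4.0]])   # one unsaturated and one nearly saturated output
y = np.array([0.8, 0.0])
loss, grad = sigmoid_euclidean(z, y)
print(loss, grad.ravel())
```

Although the second sample has the larger error, its gradient is much smaller because sigmoid(4) is already close to 1.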

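The regression itself is implemented by the EuclideanLoss layer, which compares the output of the last layer with the scores from train.txt/val.txt (stored as score in the HDF5 files) and uses their squared difference as the objective. Note that for real-valued regression with a squared-error objective, SGD is generally neither very fast nor very stable, a point the Caffe documentation also mentions. For this Caffe example, though, it works well enough. The solver.prototxt is as follows: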

net: "./train_val.prototxt"
test_iter: 2000
test_interval: 500
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 50000
display: 50
max_iter: 10000
momentum: 0.85
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "./example_ising"
solver_mode: GPU
type: "Nesterov"
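One way to sanity-check these solver settings is to compute the schedule they imply. Under the "step" policy, Caffe's learning rate is base_lr × gamma^floor(iter / stepsize). The sketch below (illustrative helper names) shows that with stepsize: 50000 and max_iter: 10000 the learning rate stays at base_lr for the whole run, and that test_iter: 2000 with batch_size: 64 evaluates 128,000 samples per test, roughly 128 passes over the 1,000 validation images (the HDF5 layer simply cycles through its data), whereas 16 batches would cover them once.

```python
import math


def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=50000):
    """Learning rate under Caffe's "step" policy: base_lr * gamma ** floor(iter / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


def batches_to_cover(num_samples, batch_size):
    """Smallest test_iter for which one test pass sees every validation sample at least once."""
    return math.ceil(num_samples / batch_size)


max_iter = 10000
print(step_lr(0), step_lr(max_iter - 1))  # no decay happens before max_iter
print(batches_to_cover(1000, 64))         # batches needed for the 1,000 validation images
print(2000 * 64 / 1000)                   # passes over the validation set per test with test_iter: 2000
```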

Then train:

/path/to/caffe/build/tools/caffe train -solver solver.prototxt

Testing

I trained for 10,000 iterations without any tuning, and the model converged.

To deploy the model, replace the two data layers in train_val.prototxt with an input_shape definition and remove the final EuclideanLoss layer. The input_shape definition is:

input: "data"
input_shape {
  dim: 1
  dim: 1
  dim: 100
  dim: 100
}

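Save the modified file as deploy.prototxt, then evaluate the trained model on the test set. pycaffe provides a very convenient interface; the following script prints the prediction for every file in a list: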

import sys
import numpy
sys.path.append('/opt/caffe/python')
import caffe

WEIGHTS_FILE = 'example_ising_iter_10000.caffemodel'
DEPLOY_FILE = 'deploy.prototxt'
IMAGE_SIZE = (100, 100)
MEAN_VALUE = 128

caffe.set_mode_cpu()
net = caffe.Net(DEPLOY_FILE, WEIGHTS_FILE, caffe.TEST)
net.blobs['data'].reshape(1, 1, *IMAGE_SIZE)

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', numpy.array([MEAN_VALUE]))
transformer.set_raw_scale('data', 255)

image_list = sys.argv[1]

with open(image_list, 'r') as f:
    for line in f:
        filename = line.strip()
        if not filename:
            continue
        image = caffe.io.load_image(filename, False)
        transformed_image = transformer.preprocess('data', image)
        net.blobs['data'].data[...] = transformed_image

        output = net.forward()
        score = output['pred'][0][0]

        print('The predicted score for {} is {}'.format(filename, score))
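Because mean subtraction is baked into the HDF5 files, inference must reproduce exactly the same preprocessing. The NumPy sketch below compares the two paths: the one in gen_hdf.py, and a re-implementation of the Transformer settings used in the test script. It assumes that caffe.io.load_image(filename, False) returns an H × W × 1 float image scaled to [0, 1], which, as far as I know, is how pycaffe's loader behaves, and that the saved JPEGs are gray (R = G = B), so that channel 0 and the gray value agree; JPEG compression may still cause small differences. The function names are illustrative.

```python
import numpy as np

MEAN_VALUE = 128  # same constant as in gen_hdf.py and the test script


def train_preprocess(rgb_uint8):
    """gen_hdf.py path: channel 0 of an H x W x 3 uint8 image -> (1, H, W) float32 minus the mean."""
    img = rgb_uint8[:, :, 0].astype(np.float32)
    return img[np.newaxis, :, :] - MEAN_VALUE


def deploy_preprocess(gray_01):
    """Mimics the test script's Transformer on an H x W x 1 float image in [0, 1]:
    transpose (2, 0, 1), raw_scale 255, then subtract the per-channel mean."""
    x = gray_01.astype(np.float32).transpose(2, 0, 1)
    x = x * 255.0
    return x - np.array([MEAN_VALUE], dtype=np.float32)[:, np.newaxis, np.newaxis]


# Synthetic gray image with R = G = B, like the output of pyplot.imsave(..., cmap='gray')
rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)
rgb = np.stack([gray, gray, gray], axis=-1)
a = train_preprocess(rgb)
b = deploy_preprocess(gray[:, :, np.newaxis] / 255.0)
print(a.shape, b.shape, np.abs(a - b).max())
```

If the two paths disagreed (for example, if raw_scale were omitted), the network would see inputs in a range it was never trained on.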

Running the script on test.txt gives these results for the first 21 files:

The predicted score for samples/11000_0.30434289374.jpg is 0.296356916428
The predicted score for samples/11001_0.865486910668.jpg is 0.823452055454
The predicted score for samples/11002_0.566940975024.jpg is 0.566108822823
The predicted score for samples/11003_0.447787648857.jpg is 0.443993896246
The predicted score for samples/11004_0.688095649282.jpg is 0.714970111847
The predicted score for samples/11005_0.0834013155212.jpg is 0.0675165131688
The predicted score for samples/11006_0.421206628337.jpg is 0.419887691736
The predicted score for samples/11007_0.579389741639.jpg is 0.58779758215
The predicted score for samples/11008_0.428772434501.jpg is 0.422569811344
The predicted score for samples/11009_0.188864264594.jpg is 0.18296033144
The predicted score for samples/11010_0.328103100948.jpg is 0.325099766254
The predicted score for samples/11011_0.131306426901.jpg is 0.119059860706
The predicted score for samples/11012_0.627027363247.jpg is 0.622474730015
The predicted score for samples/11013_0.0857273267817.jpg is 0.0735778361559
The predicted score for samples/11014_0.870007364446.jpg is 0.883266746998
The predicted score for samples/11015_0.0515036691772.jpg is 0.0575885437429
The predicted score for samples/11016_0.799989222638.jpg is 0.750781834126
The predicted score for samples/11017_0.22049410733.jpg is 0.208014890552
The predicted score for samples/11018_0.882973794598.jpg is 0.891137182713
The predicted score for samples/11019_0.686353385772.jpg is 0.671325206757
The predicted score for samples/11020_0.385639405472.jpg is 0.385150641203
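The results above are judged by eye. To quantify them, the true score can be recovered from each file name ([index]_[score].jpg) and compared with the prediction. The sketch below (hypothetical helper names) parses lines in the format printed by the test script and reports the mean absolute error (MAE) and root-mean-square error (RMSE). It assumes the script's output has been saved to a text file, for example by shell redirection; the file name predictions.txt used in the usage note is only an example.

```python
import math
import os
import re

# Matches the lines printed by the test script
LINE_RE = re.compile(r"The predicted score for (\S+) is (\S+)")


def true_score(path):
    """Recover the label from a file name of the form [index]_[score].jpg."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return float(stem.split("_")[-1])


def error_stats(lines):
    """Return (count, MAE, RMSE) over all lines that match the prediction format."""
    errors = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        errors.append(float(m.group(2)) - true_score(m.group(1)))
    n = len(errors)
    if n == 0:
        return 0, float("nan"), float("nan")
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return n, mae, rmse


sample = [
    "The predicted score for samples/11000_0.30434289374.jpg is 0.296356916428",
    "The predicted score for samples/11001_0.865486910668.jpg is 0.823452055454",
]
print(error_stats(sample))
```

For the full test set: `with open('predictions.txt') as f: print(error_stats(f))`.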

Looks pretty good. Here are a few of the test images:

[Figure: selected test images with their predicted scores]

Next, the first-layer convolution kernels:

[Figure: first-layer convolution kernels of the trained model]
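A kernel visualization like this can be produced by tiling the conv1 weights into one image. The sketch below is a NumPy-only helper with an illustrative name (`tile_filters`); it normalizes each filter to [0, 1] independently and lays the filters out on a near-square grid separated by a white border.

```python
import numpy as np


def tile_filters(weights, pad=1):
    """Arrange conv filters of shape (num_filters, 1, k, k) into one 2-D grid image in [0, 1]."""
    n, c, k, _ = weights.shape
    if c != 1:
        raise ValueError("expected single-channel filters of shape (n, 1, k, k)")
    filters = weights[:, 0]
    # Per-filter min-max normalization so that each kernel uses the full gray range
    lo = filters.min(axis=(1, 2), keepdims=True)
    hi = filters.max(axis=(1, 2), keepdims=True)
    filters = (filters - lo) / np.maximum(hi - lo, 1e-12)
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    grid = np.ones((rows * (k + pad) + pad, cols * (k + pad) + pad))  # white background
    for i in range(n):
        r, col = divmod(i, cols)
        top = pad + r * (k + pad)
        left = pad + col * (k + pad)
        grid[top:top + k, left:left + k] = filters[i]
    return grid


# Random weights with the same shape as conv1 here: 96 filters, 1 input channel, 5x5
demo = tile_filters(np.random.default_rng(0).standard_normal((96, 1, 5, 5)))
print(demo.shape)
```

With pycaffe, the learned conv1 weights are in `net.params['conv1'][0].data`, so `pyplot.imsave('conv1_filters.png', tile_filters(net.params['conv1'][0].data), cmap='gray')` writes the grid to a file.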

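The first-layer kernels have clearly learned both high-frequency and low-frequency components, and this is the key to judging the degree of order in this example: high-frequency images are disordered, while low-frequency ones are relatively ordered. Although the Ising spin images are all binary, the trained model can also be tried on arbitrary other images:

[Figure: predicted scores for a few non-Ising images]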

Well, qualitatively the results look about right.
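The intuition that high-frequency content means disorder can be made concrete with a simple hand-crafted statistic, which is not part of the method above: the fraction of neighboring spin pairs that disagree (the density of domain walls). The sketch assumes a ±1 lattice such as the `system` array in the generator script, not a decoded JPEG, and the function name is illustrative. At the low simulation temperature used here (T = 0.1), domains tend to coarsen as the simulation runs longer, so this fraction tends to fall as the score rises; it could serve as a baseline against which to compare the CNN's predictions.

```python
import numpy as np


def unlike_neighbor_fraction(spins):
    """Fraction of nearest-neighbor bonds whose two spins differ, with periodic boundaries.

    `spins` is a 2-D array of +1/-1 values: 0.0 for a uniform lattice, about 0.5 for
    independent random spins, 1.0 for a checkerboard.
    """
    horizontal = spins != np.roll(spins, -1, axis=1)
    vertical = spins != np.roll(spins, -1, axis=0)
    return float(horizontal.sum() + vertical.sum()) / (2 * spins.size)


rng = np.random.default_rng(0)
uniform = np.ones((100, 100), dtype=int)
random_spins = rng.choice([-1, 1], size=(100, 100))
checkerboard = np.indices((100, 100)).sum(axis=0) % 2 * 2 - 1
for name, s in [("uniform", uniform), ("random", random_spins), ("checkerboard", checkerboard)]:
    print(name, unlike_neighbor_fraction(s))
```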
