Theano 2.1.7 - Basics: Configuration Settings and Compiling Modes

Reposted article, 2015-06-16 17:14. Tag: Theano

Source: http://deeplearning.net/software/theano/tutorial/modes.html

Configuration Settings and Compiling Modes

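1. Configuration

The config module contains several attributes that modify Theano's behavior. Many of them are examined when the theano module is imported, and several are assumed to be read-only. As a rule, the attributes in the config module should not be modified inside user code.

Theano's code comes with default values for these attributes. You can override them from your .theanorc file, and the THEANO_FLAGS environment variable in turn overrides those values.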

The order of precedence, highest first, is:

1. an assignment to theano.config.<property>
2. an assignment in THEANO_FLAGS
3. an assignment in the .theanorc file (or the file indicated in THEANORC)
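
The precedence rule can be modelled in a few lines of plain Python. This is only a sketch, not Theano's implementation: `parse_flags`, `parse_rc`, `resolve_setting` and the sample values are hypothetical names invented for illustration. It assumes the comma-separated `key=value` syntax of THEANO_FLAGS and an INI-style `[global]` section in .theanorc.

```python
import configparser


def parse_flags(flags):
    """Parse a THEANO_FLAGS-style string such as 'floatX=float32,mode=FAST_RUN'."""
    pairs = (item.split("=", 1) for item in flags.split(",") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def parse_rc(text):
    """Read the [global] section of a .theanorc-style INI document."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive (floatX, not floatx)
    parser.read_string(text)
    return dict(parser["global"]) if parser.has_section("global") else {}


def resolve_setting(name, defaults, rc, flags, assigned):
    """Return the value from the highest-precedence source that defines `name`."""
    for source in (assigned, flags, rc, defaults):  # highest precedence first
        if name in source:
            return source[name]
    raise KeyError(name)


# Sample values, chosen only for the demonstration.
defaults = {"floatX": "float64", "mode": "FAST_RUN"}
rc = parse_rc("[global]\nfloatX = float32\nmode = FAST_COMPILE\n")
flags = parse_flags("mode=DebugMode")
assigned = {}  # stands for values set through theano.config.<property>

print(resolve_setting("floatX", defaults, rc, flags, assigned))  # taken from the rc file
print(resolve_setting("mode", defaults, rc, flags, assigned))    # taken from the flags
```

Here `floatX` comes from the rc text because nothing with higher precedence defines it, while `mode` comes from the flags string, which outranks the rc file.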

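You can print the current configuration at any time by printing theano.config. For example, to see a list of all active configuration variables, type the following at the command line: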

python -c 'import theano; print(theano.config)' | less

For more detail, see Configuration in the library.

2. Exercise

Consider the logistic regression:

import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
     rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])

# Compile expressions to functions
train = theano.function(
            inputs=[x,y],
            outputs=[prediction, xent],
            updates={w:w-0.01*gw, b:b-0.01*gb},
            name = "train")
predict = theano.function(inputs=[x], outputs=prediction,
            name = "predict")

# Inspect the compiled graph to see which BLAS ops were used.
# The loop variable is named `node` so it cannot shadow the symbolic input x.
if any([node.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for node in
        train.maker.fgraph.toposort()]):
    print('Used the cpu')
elif any([node.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for node in
          train.maker.fgraph.toposort()]):
    print('Used the gpu')
else:
    print('ERROR, not able to tell if theano used the cpu or the gpu')
    print(train.maker.fgraph.toposort())

for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print("Final model:")
#print(w.get_value(), b.get_value())

print("target values for D")
print(D[1])

print("prediction on D")
print(predict(D[0]))

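Modify this example and run it on the CPU (the default) with floatX=float32, and time it with the command time python file.py (the time command is not available on Windows 8). Save your code, as it will be needed later.

Note:

Apply the Theano flag floatX=float32 (through theano.config.floatX) in your code.
Cast inputs before storing them in a shared variable.
Avoid the automatic cast to float64 that happens when int32 is combined with float32:
- Insert a manual cast in your code, or use [u]int{8,16}.
- Insert a manual cast around the mean operator (it involves a division by the length, which is an int64).
- Note that a new casting mechanism is being developed.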
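
Why int32 combined with float32 ends up as float64, while [u]int{8,16} does not, can be illustrated with a simplified model of NumPy-style type promotion. The function below is a hypothetical sketch and not Theano's code. It assumes the result is the smallest float whose significand can hold every value of the integer type, with float64 as the widest type available.

```python
# Significand width in bits (including the implicit leading bit) per float type.
SIGNIFICAND_BITS = {"float32": 24, "float64": 53}


def promoted_float(int_bits, float_name):
    """Return the float type obtained when an integer type of `int_bits` bits
    is combined with `float_name`, under this simplified model."""
    if int_bits <= SIGNIFICAND_BITS[float_name]:
        return float_name   # every integer value is exactly representable
    return "float64"        # widest float available in this model


for bits in (8, 16, 32, 64):
    print("int%d with float32 -> %s" % (bits, promoted_float(bits, "float32")))
```

Under this model the 8-bit and 16-bit integer types stay in float32, whereas int32 and the int64 length used by the mean push the result to float64, which is why the notes recommend a manual cast.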

Solution:

#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Configuration Settings and Compiling Modes'

from __future__ import print_function
import numpy
import theano
import theano.tensor as tt

theano.config.floatX = 'float32'

rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
     rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000

# Declare Theano symbolic variables
x = tt.matrix("x")
y = tt.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + tt.exp(-tt.dot(x, w) - b))  # Probability of having a one
prediction = p_1 > 0.5  # The prediction that is done: 0 or 1
xent = -y * tt.log(p_1) - (1 - y) * tt.log(1 - p_1)  # Cross-entropy
cost = tt.cast(xent.mean(), 'float32') + \
       0.01 * (w ** 2).sum()  # The cost to optimize
gw, gb = tt.grad(cost, [w, b])

# Compile expressions to functions
train = theano.function(
            inputs=[x, y],
            outputs=[prediction, xent],
            updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
            name="train")
predict = theano.function(inputs=[x], outputs=prediction,
            name="predict")

if any([node.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for node in
        train.maker.fgraph.toposort()]):
    print('Used the cpu')
elif any([node.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for node in
          train.maker.fgraph.toposort()]):
    print('Used the gpu')
else:
    print('ERROR, not able to tell if theano used the cpu or the gpu')
    print(train.maker.fgraph.toposort())

for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()

print("target values for D")
print(D[1])

print("prediction on D")
print(predict(D[0]))

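3. Modes

Every time theano.function is called, the symbolic relationships between the input and output Theano variables are optimized and compiled. How this compilation is done is controlled by the mode parameter.

Theano defines the following modes by name:

'FAST_COMPILE': Apply only a few graph optimizations and use only Python implementations.
'FAST_RUN': Apply all optimizations and use C implementations where possible.
'DebugMode': Verify the correctness of all optimizations and compare the C and Python implementations. This mode can take much longer than the other modes, but it can identify several kinds of problems.
'ProfileMode' (deprecated): Same optimizations as FAST_RUN, but print some profiling information.

The default mode is FAST_RUN. It can be changed through the configuration variable config.mode, which can in turn be overridden by passing the mode keyword argument to theano.function.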

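Short name	Full constructor	What does it do?
`FAST_COMPILE`	`compile.mode.Mode(linker='py', optimizer='fast_compile')`	Python implementations only, quick and cheap graph transformations
`FAST_RUN`	`compile.mode.Mode(linker='cvm', optimizer='fast_run')`	C implementations where available, all available graph transformations
`DebugMode`	`compile.debugmode.DebugMode()`	Both implementations where available, all available graph transformations
`ProfileMode`	`compile.profilemode.ProfileMode()`	Deprecated. C implementations where available, all available graph transformations, prints profile information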
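
The relationship between a short mode name, its linker and optimizer pair, and the config.mode default can be sketched in plain Python. `PREDEFINED_MODES` and `select_mode` are hypothetical names used only for illustration. The pairs are copied from the table, and the sketch does not reproduce how Theano resolves modes internally.

```python
# (linker, optimizer) pairs as listed in the table for the two plain Mode entries.
PREDEFINED_MODES = {
    "FAST_COMPILE": ("py", "fast_compile"),
    "FAST_RUN": ("cvm", "fast_run"),
}


def select_mode(mode=None, config_mode="FAST_RUN"):
    """Pick a mode name: an explicit `mode` argument wins over the config default."""
    name = mode if mode is not None else config_mode
    linker, optimizer = PREDEFINED_MODES[name]
    return name, linker, optimizer


print(select_mode())                      # falls back to the configured default
print(select_mode(mode="FAST_COMPILE"))   # explicit argument overrides the default
```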

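Note: for debugging purposes there is also a MonitorMode. It can be used to step through the execution of a function; see the debugging FAQ for details.

4. Linkers

A mode is composed of two things: an optimizer and a linker. Some modes, such as ProfileMode and DebugMode, add logic around the optimizer and the linker. ProfileMode and DebugMode use their own linker.

You can select which linker to use with the Theano flag config.linker. The table below compares the different linkers: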

linker	gc [1]	Raise error by op	Overhead	Definition
cvm	yes	yes	“++”	As c|py, but the runtime algorithm that executes the code is in C
cvm_nogc	no	yes	“+”	As cvm, but without gc
c|py [2]	yes	yes	“+++”	Try C code; if an op has no C code, use Python
c|py_nogc	no	yes	“++”	As c|py, but without gc
c	no	yes	“+”	Use only C code (raises an error if an op has no C code available)
py	yes	yes	“+++”	Use only Python code
ProfileMode	no	no	“++++”	(Deprecated) Computes some extra profiling information
DebugMode	no	yes	VERY HIGH	Makes many checks on what Theano computes

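[1] Garbage collection of intermediate values during computation. Without it, the memory used by the ops is kept between Theano function calls, so that it does not have to be reallocated and overhead is lower (which means faster).

[2] Default.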
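
The difference between the c|py, c and py linkers comes down to which implementation is chosen for each op. The sketch below illustrates that selection rule with ordinary Python callables standing in for C and Python implementations. `make_thunk` here is a hypothetical helper, not Theano's linker API.

```python
def make_thunk(op_name, c_impls, py_impls, linker="c|py"):
    """Choose an implementation for one op following the table's definitions."""
    if linker in ("c", "c|py") and op_name in c_impls:
        return c_impls[op_name]
    if linker == "c":
        # The 'c' linker has no fallback: a missing C implementation is an error.
        raise NotImplementedError("no C implementation for op %r" % op_name)
    return py_impls[op_name]  # 'py', or the Python fallback of 'c|py'


# Stand-ins: each callable tags its result with the kind of implementation used.
c_impls = {"add": lambda a, b: ("c", a + b)}
py_impls = {"add": lambda a, b: ("py", a + b),
            "mul": lambda a, b: ("py", a * b)}

print(make_thunk("add", c_impls, py_impls)(2, 3))               # C stand-in is used
print(make_thunk("mul", c_impls, py_impls)(2, 3))               # falls back to Python
print(make_thunk("add", c_impls, py_impls, linker="py")(2, 3))  # Python only
```

Requesting `linker="c"` for `"mul"` raises an error in this sketch, mirroring the table entry for the c linker.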

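For more detail, see Mode in the library.

5. Using DebugMode

Normally you should use the FAST_RUN or FAST_COMPILE mode, but it is useful to run your code in DebugMode first (mode='DebugMode'), especially when you are defining new kinds of expressions or new optimizations. DebugMode is designed to run several self-checks and assertions that help diagnose programming errors leading to incorrect output. Note that DebugMode is slower than FAST_RUN or FAST_COMPILE, so use it only during development (not when you launch 1000 processes on a cluster).

DebugMode is used as follows: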

import theano
import theano.tensor as T

x = T.dvector('x')

f = theano.function([x], 10 * x, mode='DebugMode')

f([5])
f([0])
f([7])

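If any problem is detected, DebugMode raises an exception describing what went wrong, either at call time (f([5])) or at compile time (f = theano.function([x], 10 * x, mode='DebugMode')). These exceptions should not be ignored. Talk to your local Theano guru, or email the users list if you cannot make the exception go away.

Some kinds of errors can only be detected for certain combinations of input values. In the example above, there is no way to guarantee that a later call such as f([-1]) will not cause a problem. DebugMode is not a silver bullet (the phrase comes from Fred Brooks's software engineering essay "No Silver Bullet").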
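
The idea of comparing two implementations, and the reason some errors only show up for particular inputs, can be sketched without Theano. Everything below is hypothetical illustration code: `checked_call`, `MismatchError`, and the deliberately faulty `optimized` function are not part of Theano.

```python
import math


class MismatchError(Exception):
    """Raised when the two implementations disagree."""


def checked_call(fast_impl, reference_impl, x, tol=1e-9):
    """Run both implementations and compare their results, as a self-check."""
    fast, ref = fast_impl(x), reference_impl(x)
    if not math.isclose(fast, ref, rel_tol=tol, abs_tol=tol):
        raise MismatchError("input %r: got %r, expected %r" % (x, fast, ref))
    return fast


def reference(x):
    return math.sqrt(x * x)


def optimized(x):
    return x  # faulty rewrite of sqrt(x * x): only valid for x >= 0


for value in (5.0, 0.0, 7.0):
    print(checked_call(optimized, reference, value))  # these inputs pass the check

try:
    checked_call(optimized, reference, -1.0)
except MismatchError as exc:
    print("detected:", exc)  # the bug only surfaces for a negative input
```

The three calls that mirror f([5]), f([0]) and f([7]) succeed, and the faulty rewrite is caught only once a negative value is tried.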

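If you instantiate DebugMode with its constructor (see DebugMode) rather than using the keyword DebugMode, you can configure its behaviour through constructor arguments. The keyword version of DebugMode (which you get with mode='DebugMode') is quite strict.

For more detail, see DebugMode in the library.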

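6. ProfileMode

Note: ProfileMode is deprecated. Use config.profile instead.

Besides checking for errors, another important task is profiling your code. For this, Theano uses a special mode called ProfileMode, which is passed as an argument to theano.function. Using ProfileMode is a three-step process.

Note: to make ProfileMode the default, set the Theano flag config.mode to ProfileMode. In that case, the profiling information is printed automatically to standard output when the Python process exits.

The memory profile of the output of each apply node can be enabled with the Theano flag config.ProfileMode.profile_memory.

For more detail, see ProfileMode in the library.

7. Creating a ProfileMode Instance

First, create a ProfileMode instance: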

>>> import theano
>>> from theano import ProfileMode
>>> profmode = ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())

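The ProfileMode constructor takes an optimizer and a linker as input. Which optimizer and linker to use depends on the application. For example, a user who wants to profile only the Python implementation should use gof.PerformLinker (or "py" for short). A user who wants to profile the graph using C implementations should use gof.OpWiseCLinker (or "c|py"). To test the speed of your code, we recommend the fast_run optimizer and the gof.OpWiseCLinker linker.

8. Compiling your Graph with ProfileMode

Once the ProfileMode instance is created, compile your graph as you normally would, simply specifying the mode parameter: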

>>> # with functions
>>> f = theano.function([input1,input2],[output1], mode=profmode)

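9. Retrieving Timing Information

Once your graph is compiled, run the program or operation you wish to profile, then call profmode.print_summary(). This gives you the timing information that shows where your graph spends most of its time. This is best shown through an example, so let us continue with the logistic regression example.

Compiling the module with ProfileMode and calling profmode.print_summary() generates the following output: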

"""
ProfileMode.print_summary()
---------------------------

local_time 0.0749197006226 (Time spent running thunks)
Apply-wise summary: <fraction of local_time spent at this position> (<Apply position>, <Apply Op name>)
        0.069   15      _dot22
        0.064   1       _dot22
        0.053   0       InplaceDimShuffle{x,0}
        0.049   2       InplaceDimShuffle{1,0}
        0.049   10      mul
        0.049   6       Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
        0.049   3       InplaceDimShuffle{x}
        0.049   4       InplaceDimShuffle{x,x}
        0.048   14      Sum{0}
        0.047   7       sub
        0.046   17      mul
        0.045   9       sqr
        0.045   8       Elemwise{sub}
        0.045   16      Sum
        0.044   18      mul
   ... (remaining 6 Apply instances account for 0.25 of the runtime)
Op-wise summary: <fraction of local_time spent on this kind of Op> <Op name>
        0.139   * mul
        0.134   * _dot22
        0.092   * sub
        0.085   * Elemwise{Sub{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1779f10>}}[(0, 0)]
        0.053   * InplaceDimShuffle{x,0}
        0.049   * InplaceDimShuffle{1,0}
        0.049   * Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
        0.049   * InplaceDimShuffle{x}
        0.049   * InplaceDimShuffle{x,x}
        0.048   * Sum{0}
        0.045   * sqr
        0.045   * Sum
        0.043   * Sum{1}
        0.042   * Elemwise{Mul{output_types_preference=<theano.scalar.basic.transfer_type object at 0x17a0f50>}}[(0, 1)]
        0.041   * Elemwise{Add{output_types_preference=<theano.scala
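
The "Op-wise summary" in the listing reports, for each kind of op, the fraction of local_time spent in it. A minimal sketch of that bookkeeping, using plain Python callables as stand-ins for thunks, could look like the following. The `profile` function and the sample nodes are hypothetical and are not how ProfileMode is implemented.

```python
import time
from collections import defaultdict


def profile(nodes, runs=100):
    """Time each (op_name, thunk) pair; return local_time and per-op fractions."""
    per_op = defaultdict(float)
    for _ in range(runs):
        for op_name, thunk in nodes:
            start = time.perf_counter()
            thunk()
            per_op[op_name] += time.perf_counter() - start
    local_time = sum(per_op.values())
    fractions = {op: (spent / local_time if local_time else 0.0)
                 for op, spent in per_op.items()}
    return local_time, fractions


# Stand-in "apply nodes": two of them share the op name 'mul'.
nodes = [
    ("mul", lambda: [i * 2 for i in range(1000)]),
    ("Sum", lambda: sum(range(1000))),
    ("mul", lambda: [i * 3 for i in range(1000)]),
]

local_time, fractions = profile(nodes)
print("local_time %.6f (Time spent running thunks)" % local_time)
for op, fraction in sorted(fractions.items(), key=lambda item: -item[1]):
    print("        %.3f   * %s" % (fraction, op))
```

Times from nodes that share an op name are added together, which is why the op-wise figure for mul in the listing is larger than any single mul entry in the apply-wise summary.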

