Theano 2.1.16 - Basics: Debugging FAQ

Reposted article, 2015-06-19 14:27. Tag: Theano

Source: http://deeplearning.net/software/theano/tutorial/shape_info.html

Debugging Theano: FAQ and Troubleshooting

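Many kinds of bugs can come up in a computer program. This page is structured as a FAQ. It gives recipes for tackling common problems and introduces some of the tools we use to find problems in our own Theano code, and occasionally in Theano's internals: see Using DebugMode.

1. Isolating the problem / testing the Theano compiler

You can run your Theano function in DebugMode. This mode tests the Theano optimizations and helps to find where NaN, inf and other problems come from.

2. Interpreting error messages

Even in its default configuration, Theano tries to display useful error messages. Consider the following faulty code: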

import numpy as np
import theano
import theano.tensor as T

x = T.vector()
y = T.vector()
z = x + x
z = z + y
f = theano.function([x, y], z)
f(np.ones((2,)), np.ones((3,)))

Running the code above produces:

Traceback (most recent call last):
  File "test0.py", line 10, in <module>
    f(np.ones((2,)), np.ones((3,)))
  File "/PATH_TO_THEANO/theano/compile/function_module.py", line 605, in __call__
    self.fn.thunks[self.fn.position_of_error])
  File "/PATH_TO_THEANO/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: Input dimension mis-match. (input[0].shape[0] = 3, input[1].shape[0] = 2)
Apply node that caused the error: Elemwise{add,no_inplace}(<TensorType(float64, vector)>, <TensorType(float64, vector)>, <TensorType(float64, vector)>)
Inputs types: [TensorType(float64, vector), TensorType(float64, vector), TensorType(float64, vector)]
Inputs shapes: [(3,), (2,), (2,)]
Inputs strides: [(8,), (8,), (8,)]
Inputs scalar values: ['not scalar', 'not scalar', 'not scalar']

HINT: Re-running with most Theano optimization disabled could give you a back-traces when this node was created. This can be done with by setting the Theano flags 'optimizer=fast_compile'. If that does not work, Theano optimization can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.

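Arguably the most useful information sits about half-way through the error message, where the kind of error is displayed along with its cause (ValueError: Input dimension mis-match. (input[0].shape[0] = 3, input[1].shape[0] = 2)). Below it, further information is given: the apply node that caused the error, and the types, shapes, strides and scalar values of its inputs.

The two hints at the end are also helpful when debugging. The Theano flag optimizer=fast_compile or optimizer=None can often tell you the faulty line, while exception_verbosity=high displays a debugprint of the apply node. With these hints applied, the end of the error message becomes: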

Backtrace when the node is created:
  File "test0.py", line 8, in <module>
    z = z + y

Debugprint of the apply node:
Elemwise{add,no_inplace} [@A] <TensorType(float64, vector)> ''
 |Elemwise{add,no_inplace} [@B] <TensorType(float64, vector)> ''
 | |<TensorType(float64, vector)> [@C] <TensorType(float64, vector)>
 | |<TensorType(float64, vector)> [@C] <TensorType(float64, vector)>
 |<TensorType(float64, vector)> [@D] <TensorType(float64, vector)>

Here we can see that the error is traced back to the line z = z + y. For this example, optimizer=fast_compile was enough. If it had not been, you could set optimizer=None or use test values.
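The flags named in the hints are usually passed through the THEANO_FLAGS environment variable. The snippet below is a minimal sketch of setting them from Python instead of the shell. The helper name `format_theano_flags` is hypothetical (it only joins key=value pairs), and the sketch assumes that Theano reads THEANO_FLAGS when it is first imported, so the variable has to be set before that import.

```python
import os


def format_theano_flags(**flags):
    """Join keyword arguments into the comma-separated key=value form of THEANO_FLAGS."""
    return ",".join("%s=%s" % (name, value) for name, value in flags.items())


# Set the variable before the first `import theano`.
os.environ["THEANO_FLAGS"] = format_theano_flags(
    optimizer="fast_compile",
    exception_verbosity="high",
)

print(os.environ["THEANO_FLAGS"])
# import theano  # import only after the environment variable is set
```

Setting the flags in the script keeps the debugging configuration next to the code under investigation; for a one-off run, exporting THEANO_FLAGS in the shell has the same effect.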

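3. Using test values

As of v0.4.0, Theano has a mechanism by which graphs are executed on the fly, before a theano.function is ever compiled. Since optimizations have not been applied at this stage, it is easier for the user to locate the source of a bug. This functionality is enabled through the config flag theano.config.compute_test_value. Its use is best shown through the following example. Here we use exception_verbosity=high and optimizer=fast_compile, which in this example do not tell you the line at fault. optimizer=None would, so it could be used instead of test values. (Translator's note: this seems to contradict the previous section, where optimizer=fast_compile did locate the faulty statement; here the traceback only points at the function call. Left for later analysis.)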

import numpy
import theano
import theano.tensor as T

# compute_test_value is 'off' by default, meaning this feature is inactive
theano.config.compute_test_value = 'off' # Use 'warn' to activate this feature

# configure shared variables
W1val = numpy.random.rand(2, 10, 10).astype(theano.config.floatX)
W1 = theano.shared(W1val, 'W1')
W2val = numpy.random.rand(15, 20).astype(theano.config.floatX)
W2 = theano.shared(W2val, 'W2')

# input which will be of shape (5,10)
x  = T.matrix('x')
# provide Theano with a default test-value
#x.tag.test_value = numpy.random.rand(5, 10)

# transform the shared variable in some way. Theano does not
# know off hand that the matrix func_of_W1 has shape (20, 10)
func_of_W1 = W1.dimshuffle(2, 0, 1).flatten(2).T

# source of error: dot product of 5x10 with 20x10
h1 = T.dot(x, func_of_W1)

# do more stuff
h2 = T.dot(h1, W2.T)

# compile and call the actual function
f = theano.function([x], h2)
f(numpy.random.rand(5, 10))

Running the code above generates the following error message:

Traceback (most recent call last):
  File "test1.py", line 31, in <module>
    f(numpy.random.rand(5, 10))
  File "PATH_TO_THEANO/theano/compile/function_module.py", line 605, in __call__
    self.fn.thunks[self.fn.position_of_error])
  File "PATH_TO_THEANO/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
ValueError: Shape mismatch: x has 10 cols (and 5 rows) but y has 20 rows (and 10 cols)
Apply node that caused the error: Dot22(x, DimShuffle{1,0}.0)
Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(5, 10), (20, 10)]
Inputs strides: [(80, 8), (8, 160)]
Inputs scalar values: ['not scalar', 'not scalar']

Debugprint of the apply node:
Dot22 [@A] <TensorType(float64, matrix)> ''
 |x [@B] <TensorType(float64, matrix)>
 |DimShuffle{1,0} [@C] <TensorType(float64, matrix)> ''
   |Flatten{2} [@D] <TensorType(float64, matrix)> ''
     |DimShuffle{2,0,1} [@E] <TensorType(float64, 3D)> ''
       |W1 [@F] <TensorType(float64, 3D)>

HINT: Re-running with most Theano optimization disabled could give you a back-traces when this node was created. This can be done with by setting the Theano flags 'optimizer=fast_compile'. If that does not work, Theano optimization can be disabled with 'optimizer=None'.

If the above is not informative enough, instrumenting the code slightly gets Theano to reveal the exact source of the error.

# enable on-the-fly graph computations
theano.config.compute_test_value = 'warn'

...

# input which will be of shape (5, 10)
x  = T.matrix('x')
# provide Theano with a default test-value
x.tag.test_value = numpy.random.rand(5, 10)

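In the code above, we tag the symbolic matrix x with a test value. This allows Theano to evaluate symbolic expressions on the fly, as they are being defined, by calling the perform method of each op. Sources of error can therefore be identified with more precision and much earlier in the compilation pipeline. For example, running the code above yields the following error message, which correctly identifies line 24 as the culprit.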

Traceback (most recent call last):
  File "test2.py", line 24, in <module>
    h1 = T.dot(x, func_of_W1)
  File "PATH_TO_THEANO/theano/tensor/basic.py", line 4734, in dot
    return _dot(a, b)
  File "PATH_TO_THEANO/theano/gof/op.py", line 545, in __call__
    required = thunk()
  File "PATH_TO_THEANO/theano/gof/op.py", line 752, in rval
    r = p(n, [x[0] for x in i], o)
  File "PATH_TO_THEANO/theano/tensor/basic.py", line 4554, in perform
    z[0] = numpy.asarray(numpy.dot(x, y))
ValueError: matrices are not aligned

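The compute_test_value mechanism works as follows:

Theano constants and shared variables are used as is; there is no need to instrument them.
A Theano variable (e.g. dmatrix, vector, etc.) should be given a test value through the attribute tag.test_value.
Theano automatically instruments intermediate results, so any quantity derived from x is given a tag.test_value automatically.

compute_test_value can take the following values:

off: Default behavior. The debugging mechanism is inactive.
raise: Compute test values on the fly. Any variable for which a test value is required but not provided by the user is treated as an error, and an exception is raised accordingly.
warn: Same as raise, but a warning is issued instead of an exception.
ignore: Silently skip the computation of intermediate test values when a variable has no test value.

Note: this feature is currently incompatible with Scan, and with ops that do not implement a perform method.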
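To make the mechanism concrete, here is a toy model in plain Python. It is not Theano's implementation: `ToyVar`, `toy_dot`, `shape_of` and `zeros` are hypothetical names, matrices are lists of rows, and the exception types are chosen for illustration only. It shows why attaching a test value lets a shape error surface on the line that builds the node rather than when the compiled function is called.

```python
import warnings


class ToyVar:
    """Toy stand-in for a symbolic variable; test_value mimics tag.test_value."""

    def __init__(self, name, test_value=None):
        self.name = name
        self.test_value = test_value


def shape_of(matrix):
    """Return (rows, cols) of a matrix stored as a list of rows."""
    return (len(matrix), len(matrix[0]))


def zeros(rows, cols):
    """Build a rows x cols matrix of zeros."""
    return [[0.0] * cols for _ in range(rows)]


def toy_dot(a, b, compute_test_value="raise"):
    """Build a dot node and, unless disabled, evaluate it eagerly on the test values."""
    out = ToyVar("dot(%s, %s)" % (a.name, b.name))
    if compute_test_value == "off":
        return out  # purely symbolic: nothing is checked here
    if a.test_value is None or b.test_value is None:
        if compute_test_value == "raise":
            raise AttributeError("%s has an input without a test value" % out.name)
        if compute_test_value == "warn":
            warnings.warn("%s has an input without a test value" % out.name)
        return out  # 'warn' and 'ignore' skip the eager computation
    (rows, inner), (inner_b, cols) = shape_of(a.test_value), shape_of(b.test_value)
    if inner != inner_b:
        raise ValueError("shape mismatch: %s vs %s" % ((rows, inner), (inner_b, cols)))
    out.test_value = [
        [sum(a.test_value[i][k] * b.test_value[k][j] for k in range(inner))
         for j in range(cols)]
        for i in range(rows)
    ]
    return out


x = ToyVar("x", zeros(5, 10))
w = ToyVar("func_of_W1", zeros(20, 10))

try:
    h1 = toy_dot(x, w)  # fails here, on the line that builds the node
except ValueError as err:
    message = str(err)
    print(message)

# With the mechanism off, the same faulty node is built without complaint.
h1_off = toy_dot(x, w, compute_test_value="off")
```

The intermediate result carries its own test value when the shapes agree, which is the toy counterpart of Theano instrumenting intermediate results automatically.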

4. How do I print an intermediate value in a function?

Theano provides a 'Print' op for this:

import numpy
import theano

x = theano.tensor.dvector('x')

# wrap x in a Print op; the message is shown with the value at run time
x_printed = theano.printing.Print('this is a very important value')(x)

f = theano.function([x], x * 5)
f_with_print = theano.function([x], x_printed * 5)

# this runs the graph without any printing
assert numpy.all(f([1, 2, 3]) == [5, 10, 15])

# this runs the graph with the message and value printed
assert numpy.all(f_with_print([1, 2, 3]) == [5, 10, 15])

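Since Theano runs your program in topological order, you do not have precise control over the order in which multiple Print() ops are evaluated. For a more precise inspection of what is being computed where, when, and how, see "How do I step through a compiled function?".

Warning: using the Print op can prevent some Theano optimizations from being applied, including stability optimizations. So if you use Print and get NaN, try removing the Print ops to check whether they are the cause.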

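5. How do I print a graph before or after compilation?

Theano provides two functions, theano.pp() and theano.printing.debugprint(), to print a graph to the terminal before or after compilation. They print expression graphs in different ways: pp() is more compact and math-like, debugprint() is more verbose. Theano also provides theano.printing.pydotprint(), which creates a png image of the function.

For details, see: printing – Graph Printing and Symbolic Print Statement.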

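6. The function I compiled is too slow. What can I do?

First, make sure you are running in FAST_RUN mode. Even though FAST_RUN is the default mode, insist by passing mode='FAST_RUN' to theano.function (or theano.make) or by setting config.mode to FAST_RUN.

Second, try the Theano ProfileMode. It tells you which Apply nodes and which ops are eating up your CPU cycles.

Tips:

Use the flag floatX=float32 to request type float32 instead of float64, and use the Theano constructors matrix(), vector(), ... instead of dmatrix(), dvector(), ...: the former use the type set by floatX, while the latter are always float64.
When multiplying two matrices of the same type, check in profile mode that no Dot op remains in the compiled graph. Dot should be optimized to dot22 when the inputs are matrices of the same type. A leftover Dot can still occur with floatX=float32 if one of the graph's inputs is of type float64.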

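7. How do I step through a compiled function?

You can use MonitorMode to inspect the inputs and outputs of each node as it is executed when the function is called. The code below shows how to print all inputs and outputs: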

import theano

def inspect_inputs(i, node, fn):
    # end=' ' keeps the inputs and outputs of a node on the same line
    print(i, node, "input(s) value(s):", [input[0] for input in fn.inputs], end=' ')

def inspect_outputs(i, node, fn):
    print("output(s) value(s):", [output[0] for output in fn.outputs])

x = theano.tensor.dscalar('x')
f = theano.function([x], [5 * x],
                    mode=theano.compile.MonitorMode(
                        pre_func=inspect_inputs,
                        post_func=inspect_outputs))
f(3)

# The code will print the following:
#   0 Elemwise{mul,no_inplace}(TensorConstant{5.0}, x) input(s) value(s): [array(5.0), array(3.0)] output(s) value(s): [array(15.0)]

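When using these inspect_inputs and inspect_outputs functions with MonitorMode, you should see (potentially a lot of) printed output. Every Apply node is printed, along with its position in the graph, the arguments passed to perform or c_code, and the output it computed. Admittedly, this can be a huge amount of output to read through if you are using big tensors. You can add logic that prints only in certain cases, for instance when a certain kind of op is used, at a certain position in the program, or when a particular value shows up in one of the inputs or outputs. A typical example is detecting when NaN values enter the computation, which can be done as follows: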

import numpy

import theano

# This is the current suggested detect_nan implementation to
# show you how it work.  That way, you can modify it for your
# need.  If you want exactly this method, you can use
# ``theano.compile.monitormode.detect_nan`` that will always
# contain the current suggested version.

def detect_nan(i, node, fn):
    for output in fn.outputs:
        if (not isinstance(output[0], numpy.random.RandomState) and
            numpy.isnan(output[0]).any()):
            print('*** NaN detected ***')
            theano.printing.debugprint(node)
            print('Inputs : %s' % [input[0] for input in fn.inputs])
            print('Outputs: %s' % [output[0] for output in fn.outputs])
            break

x = theano.tensor.dscalar('x')
f = theano.function([x], [theano.tensor.log(x) * x],
                    mode=theano.compile.MonitorMode(
                        post_func=detect_nan))
f(0)  # log(0) * 0 = -inf * 0 = NaN

# The code above will print:
#   *** NaN detected ***
#   Elemwise{Composite{[mul(log(i0), i0)]}} [@A] ''
#    |x [@B]
#   Inputs : [array(0.0)]
#   Outputs: [array(nan)]

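To help understand what is happening in your graph, you can disable local_elemwise_fusion and all inplace optimizations. The first is a speed optimization that merges elementwise operations together, which makes it harder to know which particular elementwise op causes the problem. The second makes some ops' outputs overwrite their inputs, so if an op produces a bad output, you cannot see the overwritten input in the post_func function. To disable these optimizations (with a Theano version after 0.6rc3), define the MonitorMode like this: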

mode = theano.compile.MonitorMode(post_func=detect_nan).excluding(
    'local_elemwise_fusion', 'inplace')
f = theano.function([x], [theano.tensor.log(x) * x],
                    mode=mode)

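Note: the Theano flags optimizer_including, optimizer_excluding and optimizer_requiring are not used by MonitorMode; they are used only by the default mode. You cannot use the default mode with MonitorMode, because you need to define what you monitor.

To make sure all inputs of a node are still available when post_func is called, you must also disable the garbage collector. Otherwise, executing a node can garbage collect inputs that the Theano function no longer needs. This can be done with the Theano flag: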

allow_gc=False
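The reason for disabling fusion can be illustrated without Theano. The following toy executor is a sketch in plain Python, not MonitorMode itself: `run_graph`, `toy_log` and the node tuples are hypothetical, and the post-hook signature is simplified. It runs log(x) * x at x = 0 twice, once as two separate nodes and once as a single fused node, and records which node the hook blames for the NaN.

```python
import math


def toy_log(value):
    """log that returns -inf at 0, as NumPy does, instead of raising."""
    if value == 0:
        return float("-inf")
    return math.log(value)


def run_graph(nodes, env, post_func):
    """Execute nodes in order; each node is (name, input_keys, output_key, fn)."""
    for index, (name, input_keys, output_key, fn) in enumerate(nodes):
        inputs = [env[key] for key in input_keys]
        env[output_key] = fn(*inputs)
        post_func(index, name, inputs, env[output_key])
    return env


nan_reports = []


def detect_nan(index, name, inputs, output):
    """Post-hook: record the node whose output is NaN, with the inputs it saw."""
    if isinstance(output, float) and math.isnan(output):
        nan_reports.append((index, name, inputs))


unfused = [
    ("log", ["x"], "t", toy_log),
    ("mul", ["t", "x"], "out", lambda t, x: t * x),
]
fused = [
    ("Composite{mul(log(x), x)}", ["x"], "out", lambda x: toy_log(x) * x),
]

result = run_graph(unfused, {"x": 0.0}, detect_nan)
run_graph(fused, {"x": 0.0}, detect_nan)

for report in nan_reports:
    print(report)
```

In the unfused run the hook points at the mul node and shows its -inf input, so the log is easy to identify as the origin. In the fused run only the composite node is reported, with nothing but the original input to go on.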

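8. How do I use pdb?

In most cases you will not be executing from the interactive shell but from a set of Python scripts. The Python debugger then comes in handy, especially as your models become more complex. Intermediate results do not necessarily have a clear name, and you can get exceptions that are hard to decipher because of the "compiled" nature of the functions.

Consider this example script ("ex.py"):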

import theano
import numpy
import theano.tensor as T

a = T.dmatrix('a')
b = T.dmatrix('b')

f = theano.function([a, b], [a * b])

# matrices chosen so dimensions are unsuitable for multiplication
mat1 = numpy.arange(12).reshape((3, 4))
mat2 = numpy.arange(25).reshape((5, 5))

f(mat1, mat2)

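This is simple enough that it could be debugged without any tools, but it serves as an illustration. Because the matrices cannot be multiplied element-wise (unsuitable shapes), we get the following exception: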

File "ex.py", line 14, in <module>
  f(mat1, mat2)
File "/u/username/Theano/theano/compile/function_module.py", line 451, in __call__
File "/u/username/Theano/theano/gof/link.py", line 271, in streamline_default_f
File "/u/username/Theano/theano/gof/link.py", line 267, in streamline_default_f
File "/u/username/Theano/theano/gof/cc.py", line 1049, in execute
ValueError: ('Input dimension mis-match. (input[0].shape[0] = 3, input[1].shape[0] = 5)', Elemwise{mul,no_inplace}(a, b), Elemwise{mul,no_inplace}(a, b))

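The call stack contains useful information for tracing back the source of the error. It shows the script where the compiled function was called, but if you are using (improperly parameterized) prebuilt modules, the error may originate from ops in those modules rather than from this script. The last line tells us which op caused the exception: here a "mul" involving the variables named "a" and "b". But suppose we had instead an intermediate result to which we had not given a name.

Knowing a little about the graph structure in Theano, we can use the Python debugger to explore the graph and get runtime information about the error. Matrix dimensions are especially useful for pinpointing its source. The printout already shows 2 of the 4 dimensions of the matrices involved, but for the sake of the example, say we need the other dimensions to pinpoint the error. First, we re-launch with the debugger module and run the program with "c":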

python -m pdb ex.py
> /u/username/experiments/doctmp1/ex.py(1)<module>()
-> import theano
(Pdb) c

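We then get the same error printout as above, but the interpreter breaks in that state. Useful commands here are:

"up" and "down" (move up and down the call stack),
"l" (print the code around the line at the current stack position),
"p variable_name" (print the string representation of 'variable_name'),
"p dir(object_name)" (use the Python dir() function to print the list of an object's members).

Here, for example, typing "up" and then "l" shows that there is a local variable "node". This is the "node" from the computation graph, so by following the "node.inputs", "node.owner" and "node.outputs" links you can explore the graph.

That graph is purely symbolic (no data, just symbols to manipulate abstractly). To get information about the actual parameters, explore the "thunk" objects, which bind the storage for the inputs (and outputs) with the function itself (a "thunk" is a concept related to closures). Here, to get the shape of the current node's first input, type "p thunk.inputs[0][0].shape", which prints "(3, 4)".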
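The pdb commands above amount to moving between stack frames and printing local variables. The sketch below does the same thing programmatically on a plain-Python stand-in for the failing call. The functions `execute`, `call_function` and `frames_of` are hypothetical and only mimic the shape of the situation; no Theano frames or thunk objects are involved.

```python
import sys


def execute(thunk_inputs):
    """Toy stand-in for the frame that runs a node; raises on mismatched shapes."""
    shape_a, shape_b = thunk_inputs
    if shape_a != shape_b:
        raise ValueError("Input dimension mis-match: %s vs %s" % (shape_a, shape_b))


def call_function(shape_a, shape_b):
    """Toy stand-in for calling the compiled function."""
    execute([shape_a, shape_b])


def frames_of(traceback):
    """Yield (function_name, local_variables) for each frame, outermost first."""
    while traceback is not None:
        frame = traceback.tb_frame
        yield frame.f_code.co_name, dict(frame.f_locals)
        traceback = traceback.tb_next


try:
    call_function((3, 4), (5, 5))
except ValueError:
    frames = list(frames_of(sys.exc_info()[2]))

# The last frame is where the exception was raised; its locals hold the inputs.
innermost_name, innermost_locals = frames[-1]
print(innermost_name, innermost_locals["thunk_inputs"])
```

Walking tb_next corresponds to "down", and reading f_locals corresponds to "p variable_name". For interactive use, the standard library's pdb.post_mortem() opens the debugger on the same traceback.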

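9. Dumping a function to help debugging

If you are reading this, there is a good chance that you emailed our mailing list and we asked you to read this section. It explains how to dump all the parameters passed to theano.function(). This helps us reproduce a problem that occurs during compilation, and it does not require you to build a self-contained example.

For this to work, we need to be able to import the code for every op in the graph. So if you created your own op, we will need its code; otherwise we will not be able to unpickle the dump. We already have all the ops from Theano and Pylearn2: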

# Replace this line:
theano.function(...)
# with
theano.function_dump(filename, ...)
# Where filename is a string to a file that we will write to.

Then send us the file.

