TensorFlow Debug: InvalidArgumentError: Cannot assign a device for operation 'gradients/embedding_lookup_grad/ToInt32': Could not satisfy explicit device specification '' because the node was colocated


The original code was:

#Learning Algorithm for CADE
# config = tf.ConfigProto(allow_soft_placement = True)
sess = tf.InteractiveSession()
maxIter = 100
ite = int(0)
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
while ite<maxIter:
    t1 = time()
    print('Iteration%d start at %.4f...'%(ite,t1))
    for i in range(train_usernums):
        _loss,_ = sess.run((loss,train),feed_dict={x:trainuimat.todense(),v:trainuni,_drop_rate:drop_rate})
        print('\t loss:%f'%(_loss))
    out = sess.run(out2,feed_dict={x:validuimat.todense(),v:validuni,_drop_rate:0})
    out = out*(validuimat.todense()==0)
    out = np.argsort(out)[:,::-1]
    for _k in [1,5,10]:
        _MAP = MAP(testuidict,out,_k)
        print('Iteration%d :  MAP@%d %f'%(ite,_k,_MAP))
    print('Iteration%d used time:%.4f s'%(ite,time()-t1))
    ite+=1
    

It then raised this error:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1322     try:
-> 1323       return fn(*args)
   1324     except errors.OpError as e:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1301                                    feed_dict, fetch_list, target_list,
-> 1302                                    status, run_metadata)
   1303 

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: Cannot assign a device for operation 'gradients/embedding_lookup_grad/ToInt32': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices: 
Const: GPU CPU 
VariableV2: GPU CPU 
UnsortedSegmentSum: GPU CPU 
Identity: GPU CPU 
L2Loss: GPU CPU 
Shape: GPU CPU 
Mul: GPU CPU 
Gather: GPU CPU 
SparseApplyAdagrad: CPU 
Cast: GPU CPU 
Unique: GPU CPU 
StridedSlice: GPU CPU 
     [[Node: gradients/embedding_lookup_grad/ToInt32 = Cast[DstT=DT_INT32, SrcT=DT_INT64, _class=["loc:@EmbeddingParams"]](gradients/embedding_lookup_grad/Shape)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-6-01dec1a8f42a> in <module>()
     10     print('Iteration%d start at %.4f...'%(ite,t1))
     11     for i in range(train_usernums):
---> 12         _loss,_ = sess.run((loss,train),feed_dict={x:trainuimat.todense(),v:trainuni,_drop_rate:drop_rate})
     13         print('\t loss:%f'%(_loss))
     14     out = sess.run(out2,feed_dict={x:validuimat.todense(),v:validuni,_drop_rate:0})

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    887     try:
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:
    891         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1118     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1119       results = self._do_run(handle, final_targets, final_fetches,
-> 1120                              feed_dict_tensor, options, run_metadata)
   1121     else:
   1122       results = []

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1315     if handle is None:
   1316       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317                            options, run_metadata)
   1318     else:
   1319       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1334         except KeyError:
   1335           pass
-> 1336       raise type(e)(node_def, op, message)
   1337 
   1338   def _extend_graph(self):

InvalidArgumentError: Cannot assign a device for operation 'gradients/embedding_lookup_grad/ToInt32': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices: 
Const: GPU CPU 
VariableV2: GPU CPU 
UnsortedSegmentSum: GPU CPU 
Identity: GPU CPU 
L2Loss: GPU CPU 
Shape: GPU CPU 
Mul: GPU CPU 
Gather: GPU CPU 
SparseApplyAdagrad: CPU 
Cast: GPU CPU 
Unique: GPU CPU 
StridedSlice: GPU CPU 
     [[Node: gradients/embedding_lookup_grad/ToInt32 = Cast[DstT=DT_INT32, SrcT=DT_INT64, _class=["loc:@EmbeddingParams"]](gradients/embedding_lookup_grad/Shape)]]

Caused by op 'gradients/embedding_lookup_grad/ToInt32', defined at:
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 486, in start
    self.io_loop.start()
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 127, in start
    self.asyncio_loop.run_forever()
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
    handle._run()
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tornado/ioloop.py", line 759, in _run_callback
    ret = callback()
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 536, in <lambda>
    self.io_loop.add_callback(lambda : self._handle_events(self.socket, 0))
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 276, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2903, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-d4a6590ba166>", line 27, in <module>
    train = optimizer.minimize(loss)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 343, in minimize
    grad_loss=grad_loss)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 414, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py", line 367, in _GatherGrad
    params_shape = math_ops.to_int32(params_shape)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 826, in to_int32
    return cast(x, dtypes.int32, name=name)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 745, in cast
    return gen_math_ops.cast(x, base_type, name=name)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 892, in cast
    "Cast", x=x, DstT=DstT, name=name)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op 'embedding_lookup', defined at:
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
[elided 23 identical lines from previous traceback]
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-d4a6590ba166>", line 11, in <module>
    ve = tf.nn.embedding_lookup(embedding_params,v)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 328, in embedding_lookup
    transform_fn=None)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 150, in _embedding_lookup_and_transform
    result = _clip(_gather(params[0], ids, name=name), ids, max_norm)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/embedding_ops.py", line 54, in _gather
    return array_ops.gather(params, ids, name=name)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2486, in gather
    params, indices, validate_indices=validate_indices, name=name)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1834, in gather
    validate_indices=validate_indices, name=name)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/likewise-open/SENSETIME/liupengcheng/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/embedding_lookup_grad/ToInt32': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices: 
Const: GPU CPU 
VariableV2: GPU CPU 
UnsortedSegmentSum: GPU CPU 
Identity: GPU CPU 
L2Loss: GPU CPU 
Shape: GPU CPU 
Mul: GPU CPU 
Gather: GPU CPU 
SparseApplyAdagrad: CPU 
Cast: GPU CPU 
Unique: GPU CPU 
StridedSlice: GPU CPU 
     [[Node: gradients/embedding_lookup_grad/ToInt32 = Cast[DstT=DT_INT32, SrcT=DT_INT64, _class=["loc:@EmbeddingParams"]](gradients/embedding_lookup_grad/Shape)]]

Googling the message led to: https://github.com/tensorflow/tensorflow/issues/2292

According to that issue, it is a GPU device placement (configuration) problem:

I just follow mrry's suggestion here, adding "allow_soft_placement=True" as follows:

config = tf.ConfigProto(allow_soft_placement = True)
sess = tf.Session(config = config)

Then it works.

I reviewed the Using GPUs in tutorial. It mentions adding "allow_soft_placement" under the error "Could not satisfy explicit device specification '/gpu:X' ". But it not mentions it could also solve the error "no supported kernel for GPU devices is available". Maybe it's better to add this in tutorial text in order to avoid confusing future users.
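
To see why soft placement helps here, the "Colocation Debug Info" in the error above can be read as a set intersection: every op in a colocation group has to run on the same device, so the group can only go where all of its ops have a kernel. The sketch below is a simplified model of that rule in plain Python, not TensorFlow's actual placer. The helper names `feasible_devices` and `blocking_ops` are made up for illustration; the table is copied from the error message.

```python
# Kernel availability per op type, copied from "Colocation Debug Info" above.
COLOCATION_GROUP = {
    "Const": {"GPU", "CPU"},
    "VariableV2": {"GPU", "CPU"},
    "UnsortedSegmentSum": {"GPU", "CPU"},
    "Identity": {"GPU", "CPU"},
    "L2Loss": {"GPU", "CPU"},
    "Shape": {"GPU", "CPU"},
    "Mul": {"GPU", "CPU"},
    "Gather": {"GPU", "CPU"},
    "SparseApplyAdagrad": {"CPU"},  # the only op listed without a GPU kernel
    "Cast": {"GPU", "CPU"},
    "Unique": {"GPU", "CPU"},
    "StridedSlice": {"GPU", "CPU"},
}


def feasible_devices(group):
    """Return the device types on which every op in the group has a kernel."""
    devices = None
    for supported in group.values():
        devices = set(supported) if devices is None else devices & supported
    return devices or set()


def blocking_ops(group, device):
    """Return the op types that have no kernel for the requested device."""
    return sorted(op for op, supported in group.items() if device not in supported)


print(feasible_devices(COLOCATION_GROUP))     # {'CPU'}
print(blocking_ops(COLOCATION_GROUP, "GPU"))  # ['SparseApplyAdagrad']
```

Read this way, the group was required to be on GPU:0 while one of its members (SparseApplyAdagrad) can only run on the CPU. With allow_soft_placement = True, TensorFlow is allowed to fall back to a supported device instead of raising.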

After adding that statement (the commented-out line in the original code), I got a different error:

InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1322     try:
-> 1323       return fn(*args)
   1324     except errors.OpError as e:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1301                                    feed_dict, fetch_list, target_list,
-> 1302                                    status, run_metadata)
   1303 

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: AttrValue must not have reference type value of float_ref
     for attr 'tensor_type'
    ; NodeDef: EmbeddingParams/Adagrad/_41 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_470_EmbeddingParams/Adagrad", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^Adagrad/learning_rate/_43, ^Adagrad/update_EmbeddingParams/UnsortedSegmentSum, ^Adagrad/update_EmbeddingParams/Unique); Op<name=_Recv; signature= -> tensor:tensor_type; attr=tensor_type:type; attr=tensor_name:string; attr=send_device:string; attr=send_device_incarnation:int; attr=recv_device:string; attr=client_terminated:bool,default=false; is_stateful=true>
     [[Node: EmbeddingParams/Adagrad/_41 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_470_EmbeddingParams/Adagrad", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^Adagrad/learning_rate/_43, ^Adagrad/update_EmbeddingParams/UnsortedSegmentSum, ^Adagrad/update_EmbeddingParams/Unique)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-6-3171867edb26> in <module>()
     10     print('Iteration%d start at %s...'%(ite,t1))
     11     for i in range(train_usernums):
---> 12         _loss,_ = sess.run((loss,train),feed_dict={x:trainuimat.todense(),v:trainuni,_drop_rate:drop_rate})
     13         print('\t loss:%f'%(_loss))
     14     out = sess.run(out2,feed_dict={x:validuimat.todense(),v:validuni,_drop_rate:0})

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    887     try:
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:
    891         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1118     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1119       results = self._do_run(handle, final_targets, final_fetches,
-> 1120                              feed_dict_tensor, options, run_metadata)
   1121     else:
   1122       results = []

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1315     if handle is None:
   1316       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317                            options, run_metadata)
   1318     else:
   1319       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1334         except KeyError:
   1335           pass
-> 1336       raise type(e)(node_def, op, message)
   1337 
   1338   def _extend_graph(self):

InvalidArgumentError: AttrValue must not have reference type value of float_ref
     for attr 'tensor_type'
    ; NodeDef: EmbeddingParams/Adagrad/_41 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_470_EmbeddingParams/Adagrad", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^Adagrad/learning_rate/_43, ^Adagrad/update_EmbeddingParams/UnsortedSegmentSum, ^Adagrad/update_EmbeddingParams/Unique); Op<name=_Recv; signature= -> tensor:tensor_type; attr=tensor_type:type; attr=tensor_name:string; attr=send_device:string; attr=send_device_incarnation:int; attr=recv_device:string; attr=client_terminated:bool,default=false; is_stateful=true>
     [[Node: EmbeddingParams/Adagrad/_41 = _Recv[_start_time=0, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_470_EmbeddingParams/Adagrad", tensor_type=DT_FLOAT_REF, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^Adagrad/learning_rate/_43, ^Adagrad/update_EmbeddingParams/UnsortedSegmentSum, ^Adagrad/update_EmbeddingParams/Unique)]]

Googling this one led to: https://github.com/tensorflow/tensorflow/issues/13880

I used one of the fixes suggested there: replace InteractiveSession with a regular Session. That resolved the error:

#Learning Algorithm for CADE
config = tf.ConfigProto(allow_soft_placement = True)
with tf.Session(config=config) as sess:
    maxIter = 100
    ite = int(0)
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    while ite<maxIter:
        t1 = time()
        print('Iteration%d start at %.4f s...'%(ite,t1))
        for i in range(train_usernums):
            _loss,_ = sess.run((loss,train),feed_dict={x:trainuimat.todense(),v:trainuni,_drop_rate:drop_rate})
            print('\t loss:%f'%(_loss))
        out = sess.run(out2,feed_dict={x:validuimat.todense(),v:validuni,_drop_rate:0})
        out = out*(validuimat.todense()==0)
        out = np.argsort(out)[:,::-1]
        for _k in [1,5,10]:
            _MAP = MAP(testuidict,out,_k)
            print('Iteration%d :  MAP@%d %f'%(ite,_k,_MAP))
        print('Iteration%d used time:%.4f s'%(ite,time()-t1))
        ite+=1
    

But then a new problem appeared:

InvalidArgumentError: indices[69165,0] = 69166 is not in [0, 69166)
     [[Node: embedding_lookup = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@EmbeddingParams"], validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](EmbeddingParams/read, _arg_Placeholder_1_0_1)]]

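Looking at the Node, the problem is in the embedding lookup: index 69166 falls outside the valid range [0, 69166).

I remembered that Xianwen (獻文) mentioned yesterday that the embedding input dimension needs a +1, so I changed it: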

embedding_params = tf.get_variable('EmbeddingParams',shape=[train_usernums+1,K],dtype=tf.float32,
                                   initializer=tf.glorot_normal_initializer(),
                                   regularizer=tf.contrib.layers.l2_regularizer(lamda))
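
The Gather error above reports that index 69166 was looked up in a table whose valid rows are [0, 69166), which is what happens when IDs start at 1 (or otherwise reach the row count). Below is a minimal sketch of a check that could be run before feeding the IDs. It is plain Python; `out_of_range_ids` and `rows_needed` are hypothetical helper names, the sample IDs are made up apart from the 69166 taken from the error message, and it assumes train_usernums is 69166, as the range in the error suggests.

```python
def out_of_range_ids(ids, num_rows):
    """Return the ids that a lookup into a table with num_rows rows would reject."""
    return [i for i in ids if not 0 <= i < num_rows]


def rows_needed(ids):
    """Smallest row count that makes every non-negative id a valid index."""
    return max(ids) + 1


# Hypothetical 1-based user ids; 69166 is the index reported in the error.
user_ids = [1, 2, 69165, 69166]

print(out_of_range_ids(user_ids, 69166))  # [69166]
print(rows_needed(user_ids))              # 69167, i.e. train_usernums + 1
print(out_of_range_ids(user_ids, 69167))  # []
```

Another option with the same effect would be to shift the IDs so they start at 0 before feeding them, which avoids allocating an unused row 0.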

Then yet another problem:

InternalError: Dst tensor is not initialized.
     [[Node: embedding_lookup/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_38_embedding_lookup", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
     [[Node: add/_33 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_414_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

A search suggested that the GPU had run out of memory. I guessed it might be because I had not enabled:

config.gpu_options.allow_growth = True

I added it; it made no difference.

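Then I checked with nvidia-smi and found that more than 3 GB of GPU memory was in use. That is when it dawned on me: shouldn't I be feeding the placeholder one row at a time instead of passing the entire matrix in at once...

And that solved the problem... what a dumb mistake...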
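
A minimal sketch of the idea, assuming the fix amounts to feeding the placeholder a slice of rows per step rather than the whole dense matrix. `iter_row_batches` and `dense_bytes` are hypothetical helpers, and the matrix sizes in the example are made up. The commented-out `sess.run` call reuses names from the code above and assumes that `trainuimat` is a SciPy CSR matrix that supports row slicing and that `trainuni` can be sliced the same way.

```python
def iter_row_batches(num_rows, batch_size):
    """Yield (start, stop) row ranges that cover [0, num_rows) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_rows, batch_size):
        yield start, min(start + batch_size, num_rows)


def dense_bytes(num_rows, num_cols, itemsize=4):
    """Size in bytes of a dense array of this shape (itemsize=4 for float32)."""
    return num_rows * num_cols * itemsize


# Made-up sizes for illustration only.
num_users, num_items, batch_size = 1000, 500, 256

print(list(iter_row_batches(num_users, batch_size)))
# [(0, 256), (256, 512), (512, 768), (768, 1000)]
print(dense_bytes(num_users, num_items))   # 2000000 bytes for the full matrix
print(dense_bytes(batch_size, num_items))  # 512000 bytes per batch

# In the training loop this would look roughly like (untested sketch):
# for start, stop in iter_row_batches(train_usernums, batch_size):
#     sess.run((loss, train),
#              feed_dict={x: trainuimat[start:stop].todense(),
#                         v: trainuni[start:stop],
#                         _drop_rate: drop_rate})
```

Setting batch_size to 1 gives the one-row-at-a-time feeding described above; a larger batch trades memory for fewer `sess.run` calls.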

