Tracing the unsorted_segment_sum source code

Reposted article, originally published 2020-08-01 16:03. Tags: tensorflow, daily notes

unsorted_segment_sum

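While working in TensorFlow I ran into several operators that do much the same thing as unsorted_segment_sum, so I traced their source code and am noting the results here.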

`tf.math.unsorted_segment_sum` version

tf.math.unsorted_segment_sum(
    data,          # <tf.Tensor 'wide_deep/deep/mul_4:0' shape=(?, ?, 32) dtype=float32>
    segment_ids,   # <tf.Tensor 'wide_deep/deep/add_4:0' shape=(?, ?) dtype=int64>
    num_segments,  # <tf.Tensor 'wide_deep/deep/Cast_9:0' shape=() dtype=int64>
    name=None      # None
)

Parameters:

data :

A Tensor. Must be one of the following types: float32, float64, int32, uint8, int16, int8, complex64, int64, qint8, quint8, qint32, bfloat16, uint16, complex128, half, uint32, uint64.

segment_ids : the segment index array; its shape must be a prefix of data.shape.

A Tensor. Must be one of the following types: int32, int64. A tensor whose shape is a prefix of data.shape.

num_segments : the number of segments.

A Tensor. Must be one of the following types: int32, int64.

name :

A name for the operation (optional).

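Returns: a tensor of the same type as data, with shape (num_segments, data.shape[segment_ids.dims()], ..., data.shape[data.dims() - 1]); that is, the dimensions covered by segment_ids are replaced by a single num_segments dimension.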

A Tensor. Has the same type as data.
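The shape rule can be checked with a small helper. This is a minimal sketch in plain Python, not TensorFlow code; the function name `unsorted_segment_sum_shape` is hypothetical and shapes are passed as tuples.

```python
def unsorted_segment_sum_shape(data_shape, segment_ids_shape, num_segments):
    """Output shape, assuming segment_ids_shape is a prefix of data_shape."""
    rank = len(segment_ids_shape)
    if tuple(data_shape[:rank]) != tuple(segment_ids_shape):
        raise ValueError("segment_ids.shape must be a prefix of data.shape")
    # The prefix dimensions collapse into a single num_segments axis.
    return (num_segments,) + tuple(data_shape[rank:])


print(unsorted_segment_sum_shape((8, 20, 32), (8, 20), 100))  # (100, 32)
```

With data of shape (?, ?, 32) and segment_ids of shape (?, ?), as in the call above, the result therefore has shape (num_segments, 32).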

Behavior

import tensorflow as tf

c = tf.constant([[1, 2, 3, 4], [5, 6, 7, 8], [4, 3, 2, 1]])
tf.math.unsorted_segment_sum(c, tf.constant([0, 1, 0]), num_segments=2)
# ==> [[sum of rows in segment 0], [sum of rows in segment 1]]
# ==> [c[0] + c[2], c[1]]
# ==> [[5, 5, 5, 5], [5, 6, 7, 8]]
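The same semantics can be written out without TensorFlow. The following is a reference sketch in plain Python for 1-D segment ids over the rows of a 2-D list; the function is illustrative only and is not how TensorFlow implements the op.

```python
def unsorted_segment_sum(data, segment_ids, num_segments):
    """Reference semantics for 1-D segment_ids over the rows of a 2-D list."""
    width = len(data[0])
    out = [[0] * width for _ in range(num_segments)]
    for row, seg in zip(data, segment_ids):
        if seg < 0:
            continue  # negative segment ids are dropped
        for k, value in enumerate(row):
            out[seg][k] += value
    return out


c = [[1, 2, 3, 4], [5, 6, 7, 8], [4, 3, 2, 1]]
result = unsorted_segment_sum(c, [0, 1, 0], num_segments=2)
print(result)  # [[5, 5, 5, 5], [5, 6, 7, 8]]
```

Segments that receive no rows simply stay at zero, which is why the output needs an explicit num_segments.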

Implementation

Python side, TensorFlow 1.14.0:

tensorflow/python/ops/gen_math_ops.py(11767)unsorted_segment_sum()

gen_math_ops.py is a Python file generated at build time. It calls into the C++ code through _pywrap_tensorflow.TFE_Py_FastPathExecute:

tensorflow/core/ops/math_ops.cc(1252)REGISTER_OP("UnsortedSegmentSum")

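The UnsortedSegmentSum kernel is fairly involved and exists in several versions. Taking the GPU version as an example, it is first defined indirectly through REGISTER_GPU_KERNEL_UNSORTEDSEGMENT: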

tensorflow/core/kernels/segment_reduction_ops.cc(584)REGISTER_GPU_KERNEL_UNSORTEDSEGMENT("UnsortedSegmentSum", type, index_type, functor::Zero<type>, functor::SumOpGpu<type>)

The REGISTER_GPU_KERNEL_UNSORTEDSEGMENT macro ultimately uses REGISTER_KERNEL_BUILDER to register the UnsortedSegmentReductionOp class:

tensorflow/core/kernels/segment_reduction_ops.cc(467)class UnsortedSegmentReductionOp

The actual work happens in its Compute function:

tensorflow/core/kernels/segment_reduction_ops.cc(472)Compute()

REGISTER_GPU_KERNEL_UNSORTEDSEGMENT sets DeviceReductionFunctor to functor::UnsortedSegmentFunctor, which Compute calls directly:

tensorflow/core/kernels/segment_reduction_ops_gpu.cu.cc(176)struct UnsortedSegmentFunctor

UnsortedSegmentFunctor launches two CUDA kernels.

The first kernel, SetToValue, fills the output tensor with zeros (functor::Zero, as specified in REGISTER_GPU_KERNEL_UNSORTEDSEGMENT):

tensorflow/core/util/gpu_device_functions.h(472)SetToValue()
tensorflow/core/kernels/segment_reduction_ops.h(107)struct Zero

The second kernel, UnsortedSegmentCustomKernel, applies functor::SumOpGpu (also specified in REGISTER_GPU_KERNEL_UNSORTEDSEGMENT) to every element:

tensorflow/core/kernels/segment_reduction_ops_gpu.cu.cc(109)UnsortedSegmentCustomKernel()
tensorflow/core/kernels/segment_reduction_ops.h(72)struct SumOpGpu

In effect, it calls CudaAtomicAdd for every element.
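The two-kernel flow can be emulated sequentially on flat, row-major buffers. This is a rough Python sketch under assumptions: the index arithmetic (row = index // inner_dim, offset = index % inner_dim) and the skipping of out-of-range segment ids reflect my reading of the kernel, the function name is hypothetical, and a plain `+=` stands in for the atomic add that makes concurrent GPU threads safe.

```python
def unsorted_segment_sum_flat(data, segment_ids, num_segments, inner_dim):
    """Emulate the two GPU passes on flat, row-major buffers."""
    # Pass 1 (SetToValue with functor::Zero): initialise the output.
    output = [0] * (num_segments * inner_dim)
    # Pass 2: one logical thread per input element.
    for input_index, value in enumerate(data):
        segment_index = input_index // inner_dim  # which row of data
        offset = input_index % inner_dim          # position inside the row
        out_segment = segment_ids[segment_index]
        if out_segment < 0 or out_segment >= num_segments:
            continue  # assumption: out-of-range ids are skipped
        # Stands in for the atomic add performed by SumOpGpu.
        output[out_segment * inner_dim + offset] += value
    return output


flat = unsorted_segment_sum_flat(
    [1, 2, 3, 4, 5, 6, 7, 8, 4, 3, 2, 1], [0, 1, 0], num_segments=2, inner_dim=4)
print(flat)  # [5, 5, 5, 5, 5, 6, 7, 8]
```

Because several input rows can map to the same output segment, the accumulation has to be atomic on the GPU; the zero-fill pass is needed because the output buffer is freshly allocated.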

C++ source files such as .cc exist only in the source tree before compilation; after the build they are compiled into .so files.

`tf.scatter_add` version

tf.scatter_add(
    ref,     # <tf.Tensor 'wide_deep/deep/transpose:0' shape=(32, ?, 6) dtype=float32>
    indices, # <tf.Tensor 'wide_deep/deep/transpose_1:0' shape=(32, ?, ?) dtype=int64>
    updates, # <tf.Tensor 'wide_deep/deep/mul_4:0' shape=(?, ?, 32) dtype=float32>
    use_locking=False, #False
    name=None #None
)

Parameters

ref: the target, with the same type as updates. In this case the input is an all-zero tensor.

A Variable.

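indices: the index ids, in one-to-one correspondence with the entries of data (updates), indicating which position in ref each entry of updates is added to.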

A Tensor. Must be one of the following types: int32, int64. A tensor of indices into the first dimension of ref.

updates: the data to add; its leading dimensions match indices.

A Tensor. Must have the same type as ref. A tensor of updated values to store in ref.

use_locking: whether to take a lock while performing ref += updates.

An optional bool. Defaults to False. If True, the assignment will be protected by a lock; otherwise the behavior is undefined, but may exhibit less contention.

name:

A name for the operation (optional).

Returns: Same as ref. Returned as a convenience for operations that want to use the updated values after the update is done.

Constraint:

updates.shape = indices.shape + ref.shape[1:]

Behavior

# Scalar indices
ref[indices, ...] += updates[...]

# Vector indices (for each i)
ref[indices[i], ...] += updates[i, ...]

# High rank indices (for each i, ..., j)
ref[indices[i, ..., j], ...] += updates[i, ..., j, ...]
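The vector-indices case can be spelled out as a reference loop. This is a minimal sketch in plain Python on nested lists, with a hypothetical function name; it only illustrates the update rule and the fact that repeated indices accumulate.

```python
def scatter_add(ref, indices, updates):
    """In-place ref[indices[i]][...] += updates[i][...] for vector indices."""
    if len(indices) != len(updates):
        raise ValueError("updates must have one row per index")
    for i, target in enumerate(indices):
        for k, value in enumerate(updates[i]):
            ref[target][k] += value
    return ref


ref = [[0, 0], [10, 10], [20, 20]]
scatter_add(ref, [2, 0, 2], [[1, 1], [2, 2], [3, 3]])
print(ref)  # [[2, 2], [10, 10], [24, 24]]
```

Row 2 is targeted twice and receives both updates, while row 1 is never indexed and keeps its value.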

Implementation

Python side, TensorFlow 1.14.0:

tensorflow/python/ops/gen_state_ops.py(719)scatter_add()

gen_state_ops.py is a Python file generated at build time. It calls into the C++ code through _op_def_lib._apply_op_helper:

tensorflow/core/ops/state_ops.cc(146)REGISTER_OP("ScatterAdd")

The ScatterAdd kernel is also fairly involved. Its class is not defined directly but indirectly through REGISTER_SCATTER_KERNEL:

tensorflow/core/kernels/scatter_op.cc(256)REGISTER_SCATTER_KERNEL(type, dev, "ScatterAdd", scatter_op::UpdateOp::ADD);

That macro ultimately uses REGISTER_KERNEL_BUILDER to register the ScatterUpdateOp class:

tensorflow/core/kernels/scatter_op.cc(73)class ScatterUpdateOp

The actual work happens in Compute:

tensorflow/core/kernels/scatter_op.cc(84)Compute()

Compute only decides whether to take the lock and then calls DoCompute:

tensorflow/core/kernels/scatter_op.cc(97)DoCompute()
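The Compute/DoCompute split amounts to an optional lock around the same update routine. Below is a hypothetical Python stand-in, not the TensorFlow class: the names `ScatterUpdateSketch`, `compute` and `_do_compute` are made up for illustration, and the update itself is reduced to a 1-D add, whereas the real DoCompute delegates to a device functor.

```python
import threading


class ScatterUpdateSketch:
    """Hypothetical stand-in for the Compute/DoCompute split."""

    def __init__(self, use_locking):
        self.use_locking = use_locking
        self.ref_mutex = threading.Lock()  # stands in for the mutex guarding ref

    def compute(self, ref, indices, updates):
        if self.use_locking:
            with self.ref_mutex:  # serialise concurrent updates to ref
                self._do_compute(ref, indices, updates)
        else:
            self._do_compute(ref, indices, updates)

    def _do_compute(self, ref, indices, updates):
        for i, target in enumerate(indices):
            ref[target] += updates[i]


op = ScatterUpdateSketch(use_locking=True)
buf = [0, 0, 0]
op.compute(buf, [1, 1, 2], [5, 6, 7])
print(buf)  # [0, 11, 7]
```

Both branches run the same update; use_locking only changes whether other writers to the same variable are excluded while it runs.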

DoCompute itself mostly validates the arguments; the real work is done by functor::ScatterFunctor. Looking only at the GPU implementation:

tensorflow/core/kernels/scatter_functor_gpu.cu.h(118)struct ScatterFunctor

This operator launches a single CUDA kernel, scatter_op_gpu::ScatterOpCustomKernel:

tensorflow/core/kernels/scatter_functor_gpu.cu.h(73)ScatterOpCustomKernel()

The kernel applies ScatterOpKernelBody to every element; here it is the scatter_op::UpdateOp::ADD specialization (specified by REGISTER_SCATTER_KERNEL):

tensorflow/core/kernels/scatter_functor_gpu.cu.h(43)struct ScatterOpKernelBody

In effect, it performs a CudaAtomicAdd for every element.
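A sequential emulation of the single scatter kernel on flat, row-major buffers might look like the following. It is a sketch under assumptions: the index arithmetic is my reading of the kernel, the function name is hypothetical, and `+=` stands in for the atomic add. Unlike the segment-sum path there is no zero-fill pass, because the kernel accumulates into the existing contents of ref.

```python
def scatter_add_flat(params, indices, updates):
    """Emulate the single scatter kernel on flat, row-major buffers."""
    update_block = len(updates) // len(indices)  # elements per index
    for i, value in enumerate(updates):
        first_index = indices[i // update_block]  # target row in params
        params_i = first_index * update_block + i % update_block
        params[params_i] += value  # stands in for the atomic add
    return params


zeros = [0] * 8  # a ref of shape (2, 4), pre-filled with zeros
out = scatter_add_flat(zeros, [0, 1, 0], [1, 2, 3, 4, 5, 6, 7, 8, 4, 3, 2, 1])
print(out)  # [5, 5, 5, 5, 5, 6, 7, 8]
```

When ref starts as all zeros, as in the call discussed here, the result matches the unsorted_segment_sum example: rows 0 and 2 are summed into the first output row, and row 1 lands in the second.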

C++ source files such as .cc exist only in the source tree before compilation; after the build they are compiled into .so files.

`torch.scatter_add` version

torch.Tensor.scatter_add(
    dim,
    index,
    src
)

Parameters

self (Tensor) : the tensor on which scatter_add is called.

dim (int) : a single int, the dimension of self along which src is added.

the axis along which to index.

index (LongTensor) : the index ids; src is added at position index along dimension dim of self. It must be either empty or have the same number of dimensions as src.

the indices of elements to scatter and add, can be either empty or the same size of src. When empty, the operation returns identity.

src (Tensor) : the elements to add.

the source elements to scatter and add.

Returns: a tensor with the same shape as self.

Constraint:

index.size(d) <= src.size(d) for all dimensions d, and that index.size(d) <= self.size(d) for all dimensions d != dim.

Behavior

self[index[i][j][k]][j][k] += src[i][j][k]  # if dim == 0
self[i][index[i][j][k]][k] += src[i][j][k]  # if dim == 1
self[i][j][index[i][j][k]] += src[i][j][k]  # if dim == 2
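For two dimensions the rule reduces to a short loop. The following plain-Python sketch (hypothetical function name, nested lists instead of tensors) returns a new list rather than modifying its input, and shows both dim == 0 and dim == 1.

```python
def scatter_add_2d(self_rows, dim, index, src):
    """Return a copy of self_rows with src scattered and added along dim (0 or 1)."""
    out = [row[:] for row in self_rows]
    for i, index_row in enumerate(index):
        for j, target in enumerate(index_row):
            if dim == 0:
                out[target][j] += src[i][j]  # index picks the row
            else:
                out[i][target] += src[i][j]  # index picks the column
    return out


base = [[0, 0, 0], [0, 0, 0]]
src = [[1, 2, 3], [4, 5, 6]]
print(scatter_add_2d(base, 0, [[0, 1, 0], [0, 1, 1]], src))  # [[5, 0, 3], [0, 7, 6]]
print(scatter_add_2d(base, 1, [[0, 0, 2], [1, 1, 1]], src))  # [[3, 0, 3], [0, 15, 0]]
```

The key difference from tf.scatter_add is that index selects a position per element along one dimension, rather than selecting whole slices along the first dimension.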

Implementation

Python side, PyTorch 1.5.1:

torch/onnx/symbolic_opset9.py(1938)scatter_add()

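The Python source shows directly that scatter_add is expressed in three steps (note that this file defines the symbolic function used for ONNX export, not the native kernel):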

Step 1: create an all-zero tensor to_add with the same size as self.

sizes = self.type().sizes()

to_add = g.op("Constant", value_t=torch.zeros(sizes, dtype=dtype))

Step 2: use a scatter operation to write the elements of src into to_add at the positions given by index along dimension dim.

to_add = sym_help._scatter_helper(g, to_add, dim, index, src)

Step 3: add to_add to self.

add(g, self, to_add)
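The three steps can be emulated in one dimension to see what they compute. This is a plain-Python sketch with hypothetical function names, not the exporter code. It assumes the scatter step overwrites rather than accumulates and that the last write wins for duplicate indices, an ordering a real scatter may not guarantee.

```python
def scatter_add_via_scatter(self_vals, index, src):
    """1-D emulation of the three steps: zeros, scatter, add."""
    to_add = [0] * len(self_vals)  # step 1: zeros shaped like self
    for i, target in enumerate(index):
        to_add[target] = src[i]    # step 2: scatter overwrites
    return [a + b for a, b in zip(self_vals, to_add)]  # step 3: add


def scatter_add_accumulate(self_vals, index, src):
    """1-D accumulate semantics, for comparison."""
    out = list(self_vals)
    for i, target in enumerate(index):
        out[target] += src[i]
    return out


print(scatter_add_via_scatter([10, 20, 30], [2, 0], [1, 2]))  # [12, 20, 31]
print(scatter_add_accumulate([10, 20, 30], [2, 0], [1, 2]))   # [12, 20, 31]
print(scatter_add_via_scatter([10, 20, 30], [1, 1], [1, 2]))  # [10, 22, 30]
print(scatter_add_accumulate([10, 20, 30], [1, 1], [1, 2]))   # [10, 23, 30]
```

With unique indices the two agree; with repeated indices the scatter-then-add form keeps only one contribution per position, so it is not a drop-in replacement for an accumulating kernel.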

I did not find the concrete C++ and CUDA implementation in the PyTorch source.
