Rendezvous
|
1. Defined in core/framework/rendezvous.h.
2. "A Rendezvous is an abstraction for passing a Tensor from a producer to a consumer where the consumer may safely request the Tensor before or after it has been produced. A producer never blocks when using a Rendezvous. A consumer has the choice of making a blocking call or providing a callback: in either case, the consumer receives the Tensor as soon as it is available." (In short: nonblocking send, blocking receive; see the sketch after this list.)
3. A Rendezvous key encodes a single <producer, consumer> pair. It is an error to call Send() or Recv*() more than once with the same key.
4. In message-passing mechanisms, delivery raises the question of mailbox capacity. One extreme is a mailbox of capacity zero: if send executes before receive, the sending process blocks until the receive completes, and the message is then copied directly from sender to receiver with no intermediate buffering; likewise, if receive executes first, the receiver blocks until a send occurs. This strategy is known as the rendezvous principle.
5. TensorFlow's message passing is of the [nonblocking send, blocking receive] kind, realized in two scenarios:
> LocalRendezvous (local message passing)
> RpcRemoteRendezvous (distributed message passing)
> A further special form is IntraProcessRendezvous (rendezvous_mgr.h), used for communication between different local devices: "Buffering of Tensor values is delegated to a 'local' Rendezvous obtained from NewLocalRendezvous(). This class just adds functionality to coordinate multiple process-local devices."
6. Among the op kernels, the classes SendOp and RecvOp (kernels/sendrecv_ops.h) are used together with Rendezvous.
7. "Each node:port specified in inputs is replaced with a feed node, which will pick up the provided input tensor from specially-initialized entries in a Rendezvous object used for the Run call." (from the TensorFlow white paper)
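A minimal sketch of the Send/Recv contract, assuming the TF ~1.x header core/framework/rendezvous.h; the device names and the tensor name "tensor_a" are made up for illustration, and error handling is reduced to TF_CHECK_OK:

```cpp
#include "tensorflow/core/framework/rendezvous.h"
#include "tensorflow/core/framework/tensor.h"

void RendezvousDemo() {
  using tensorflow::Rendezvous;
  Rendezvous* rendez = tensorflow::NewLocalRendezvous();

  // A key encodes exactly one <producer, consumer> pair.
  const std::string key_str = Rendezvous::CreateKey(
      "/job:localhost/replica:0/task:0/device:CPU:0", /*src_incarnation=*/1,
      "/job:localhost/replica:0/task:0/device:CPU:1", "tensor_a",
      tensorflow::FrameAndIter(0, 0));
  Rendezvous::ParsedKey key;
  TF_CHECK_OK(Rendezvous::ParseKey(key_str, &key));

  // Producer side: Send never blocks.
  tensorflow::Tensor produced(tensorflow::DT_FLOAT, tensorflow::TensorShape());
  TF_CHECK_OK(rendez->Send(key, Rendezvous::Args(), produced, /*is_dead=*/false));

  // Consumer side: the blocking Recv overload returns as soon as the value
  // is available; RecvAsync is the callback-based alternative.
  tensorflow::Tensor received;
  bool is_dead = false;
  TF_CHECK_OK(rendez->Recv(key, Rendezvous::Args(), &received, &is_dead));
  rendez->Unref();  // Rendezvous objects are reference-counted.
}
```
|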
|||
Symbolic programming
|
Forward computation graph (explicit) + backward computation graph (implicit)
|
|||
Session
|
## Notes: A Session instance lets a caller drive a TensorFlow graph computation.
Extend() adds additional nodes and edges to the session's current graph.
Run() is the other important function in the Session interface. Its arguments include the names of the outputs to compute and, optionally, tensors to be fed into the graph.
To produce the requested outputs, TensorFlow executes the transitive closure of all required nodes during the run, ordered according to the dependencies between them (details are given in section 3.1 of the white paper).
In most TensorFlow applications, a Session is constructed once, and Run() is then invoked many times over the full graph or over independent subgraphs (see the sketch below).
Table (1): selected operations in the TensorFlow core library
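A hedged sketch of that lifecycle against the public C++ API in core/public (TF ~1.x); the GraphDef argument and the fetch name "y:0" are placeholders for illustration:

```cpp
#include <memory>
#include <vector>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/public/session.h"

tensorflow::Status RunOnce(const tensorflow::GraphDef& graph_def) {
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_RETURN_IF_ERROR(session->Create(graph_def));  // install the initial graph
  // session->Extend(extra_graph_def) could add further nodes and edges later.

  // Run() executes only the transitive closure needed for the fetches,
  // in dependency order.
  std::vector<tensorflow::Tensor> outputs;
  TF_RETURN_IF_ERROR(session->Run(/*inputs=*/{}, /*output_tensor_names=*/{"y:0"},
                                  /*target_node_names=*/{}, &outputs));
  return session->Close();
}
```
|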
|||
Devices and memory allocation
|
1. TensorFlow's device memory manager implements a best-fit with coalescing (BFC) algorithm (see the toy sketch after this list).
> BFC chooses a block by finding the smallest free chunk whose size is greater than or equal to the requested size x.
2. Each worker is responsible for one or more devices, and each device has a device type and a name. A device name is composed of the device type, the device's index within its worker, and, in a distributed setting, an identifier for the worker's job and task (or localhost when the device lives in the same process as the client). Examples: /job:localhost/device:cpu:0 or /job:worker/task:17/device:gpu:3. Each device object is responsible for allocating and deallocating device memory and for scheduling the execution of any kernels requested by higher layers of the TensorFlow implementation.
3. In TensorFlow, subclasses of the base class Device include GPUDevice, CPUDevice (i.e. ThreadPoolDevice), and GPUCompatibleCPUDevice.
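A toy sketch of the best-fit-with-coalescing policy itself. This is illustrative only and not the real BFCAllocator in core/common_runtime/ (which keeps free chunks in size-ordered bins rather than scanning linearly); the class ToyBfc and its names are made up:

```cpp
#include <cstddef>
#include <iterator>
#include <map>

class ToyBfc {
 public:
  static constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

  explicit ToyBfc(std::size_t pool_bytes) { free_[0] = pool_bytes; }

  // Best fit: take the smallest free chunk with size >= bytes and
  // split off the unused remainder as a new free chunk.
  std::size_t Alloc(std::size_t bytes) {
    auto best = free_.end();
    for (auto it = free_.begin(); it != free_.end(); ++it) {
      if (it->second >= bytes &&
          (best == free_.end() || it->second < best->second)) {
        best = it;
      }
    }
    if (best == free_.end()) return kNotFound;  // no chunk large enough
    const std::size_t offset = best->first;
    const std::size_t size = best->second;
    free_.erase(best);
    if (size > bytes) free_[offset + bytes] = size - bytes;  // split
    return offset;
  }

  // Coalescing: merge the freed chunk with free neighbours on both sides.
  void Free(std::size_t offset, std::size_t bytes) {
    auto right = free_.lower_bound(offset);
    if (right != free_.begin()) {
      auto left = std::prev(right);
      if (left->first + left->second == offset) {  // adjacent on the left
        offset = left->first;
        bytes += left->second;
        free_.erase(left);
      }
    }
    if (right != free_.end() && offset + bytes == right->first) {  // right
      bytes += right->second;
      free_.erase(right);
    }
    free_[offset] = bytes;
  }

 private:
  std::map<std::size_t, std::size_t> free_;  // start offset -> chunk size
};
```

Keying the free map by start offset makes the left/right neighbour checks during coalescing a pair of O(log n) lookups.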
|
|||
Graph
|
Graph describes a set of computations that are to be performed, as well as the dependencies between those computations. The basic model is a
DAG (directed acyclic graph) with
* internal nodes representing computational operations to be performed;
* edges representing dependencies, indicating that the target may only be executed once the source has completed;
> normal edges, along which data flows; a normal edge carries a tensor
> special edges, also called control dependencies: no data flows along them, they only constrain execution order (see the sketch after this list)
* predefined "source" (start) and "sink" (finish) nodes -- the source should be the only node that doesn't depend on anything, and the sink should be the only node that nothing depends on.
* graph optimization:
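A minimal sketch of the two edge kinds and the predefined endpoints, assuming the internal header core/graph/graph.h (TF ~1.x):

```cpp
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/graph/graph.h"

void EdgeKindsDemo() {
  tensorflow::Graph g(tensorflow::OpRegistry::Global());

  // Every Graph starts with the predefined _SOURCE and _SINK nodes.
  tensorflow::Node* src = g.source_node();
  tensorflow::Node* snk = g.sink_node();

  // A control edge is the "special" kind: it only orders execution,
  // no tensor flows along it.
  g.AddControlEdge(src, snk);

  for (const tensorflow::Edge* e : g.edges()) {
    // IsControlEdge() distinguishes control edges from normal
    // (tensor-carrying) edges, whose src_output()/dst_input()
    // identify the producing and consuming slots.
    if (e->IsControlEdge()) { /* ordering only */ }
  }
}
```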
|
|||
Gradients |
MXNet design notes: a comparison of deep-learning programming models (a case study of Backprop and AutoDiff)
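A hedged sketch of the "explicit forward, implicit backward" split, using the C++ helper AddSymbolicGradients from tensorflow/cc/framework/gradients.h (TF ~1.x); the op Square and the feed value 3.0f are arbitrary illustration choices:

```cpp
#include <vector>

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/framework/gradients.h"
#include "tensorflow/cc/ops/standard_ops.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;

  Scope scope = Scope::NewRootScope();
  auto x = Placeholder(scope, DT_FLOAT);
  auto y = Square(scope, x);  // forward graph: written explicitly

  // Backward graph: gradient nodes for dy/dx are added to the same
  // graph implicitly, mirroring the forward computation.
  std::vector<Output> grads;
  TF_CHECK_OK(AddSymbolicGradients(scope, {y}, {x}, &grads));

  ClientSession session(scope);
  std::vector<Tensor> out;
  TF_CHECK_OK(session.Run({{x, 3.0f}}, {grads[0]}, &out));
  // out[0] now holds dy/dx = 2 * x = 6.
  return 0;
}
```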
|
|||
Directory layout |
core/
----BUILD Bazel build file; the build macros it uses are defined in .bzl files
----client
----common_runtime
----debug
----distributed_runtime
----example
----framework
----graph DAG graph code
----kernels core op kernels, e.g. [matmul, conv2d, argmax, batch_norm]
----lib basic common libraries [core gif gtl hash histogram io jpeg monitoring png random strings wav]
> /lib/gtl: (Google template library), containing basic utilities such as array_slice, iterator_range, inlined_vector, map_util, stl_util
----ops all .cc files defining the basic ops, e.g. [array_ops, math_ops, image_ops, function_ops, random_ops, io_ops],
flow ops [control_flow_ops, data_flow_ops],
and the gradient definitions for ops: [array_grad, math_grad, functional_grad, nn_grad, random_grad]
----platform platform-specific code, e.g. device memory allocation
----protobuf all .proto files, used for structure serialization during data transfer
----public public header files defining the API for external callers, mainly [session.h, tensor_c_api.h]
----user_ops user-defined ops
----util
stream_executor/ see:
https://github.com/henline/streamexecutordoc
----cuda/ CUDA function wrappers (CUDA-specific support for BLAS functionality)
StreamExecutor is currently used as the runtime for the vast majority of Google's internal GPGPU applications,
and a snapshot of it is included in the open-source TensorFlow project, where it serves as the GPGPU runtime. (Google Stream Executor team)
|
|||
Register |
op kernel registration: REGISTER_KERNEL_BUILDER("MatMul", MatMulOp); (kernels/matmul_ops.cc)
op gradient registration: REGISTER_OP_GRADIENT("MatMul", MatMulGrad); (ops/math_grad.cc)
op registration: see the sketch below.
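A hedged sketch of all three registration points for a hypothetical op "MyMatMul" (TF ~1.x macros; the real registrations for MatMul live in the files cited above, and REGISTER_KERNEL_BUILDER in the source takes a KernelDef builder such as Name(...).Device(...) rather than a bare string):

```cpp
#include "tensorflow/core/framework/function.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"

using namespace tensorflow;

// 1. Op registration: declares the op's interface (inputs, outputs, attrs).
REGISTER_OP("MyMatMul")
    .Input("a: float")
    .Input("b: float")
    .Output("product: float");

// 2. Kernel registration: binds a concrete implementation to a device type.
class MyMatMulOp : public OpKernel {
 public:
  explicit MyMatMulOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
  void Compute(OpKernelContext* ctx) override { /* compute product here */ }
};
REGISTER_KERNEL_BUILDER(Name("MyMatMul").Device(DEVICE_CPU), MyMatMulOp);

// 3. Gradient registration: associates a function that emits the op's
//    gradient as a FunctionDef (this stub leaves the gradient empty).
Status MyMatMulGrad(const AttrSlice& attrs, FunctionDef* g) {
  return Status::OK();
}
REGISTER_OP_GRADIENT("MyMatMul", MyMatMulGrad);
```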
|