The tf.train.shuffle_batch function explained

If you find this useful, feel free to join the discussion so we can learn from each other.

`tf.train.shuffle_batch`

(tensor_list, batch_size, capacity, min_after_dequeue, num_threads=1, seed=None, enqueue_many=False, shapes=None, name=None)
Creates batches by randomly shuffling tensors.

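In short, it takes examples that were read from a file (or any other source), shuffles them in a queue, and returns batch_size of them at a time as one batched tensor.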

This function adds the following to the current Graph:

A shuffling queue into which tensors from tensor_list are enqueued.
A dequeue_many operation to create batches from the queue.
A QueueRunner, added to the QUEUE_RUNNER collection, to enqueue the tensors from tensor_list. Note that the QueueRunner drives the enqueue side, not the dequeue side.
If enqueue_many is False, tensor_list is assumed to represent a single example. An input tensor with shape [x, y, z] will be output as a tensor with shape [batch_size, x, y, z].
If enqueue_many is True, tensor_list is assumed to represent a batch of examples, where the first dimension is indexed by example, and all members of tensor_list should have the same size in the first dimension. If an input tensor has shape [*, x, y, z], the output will have shape [batch_size, x, y, z].
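The shape rule for the two enqueue_many settings can be written down as a small pure-Python helper. This is only a sketch of the rule stated above, not TensorFlow code; the function name `batched_shape` is hypothetical and is introduced here for illustration.

```python
def batched_shape(input_shape, batch_size, enqueue_many=False):
    """Return the output shape for one input tensor (illustrative helper).

    This mirrors the documented rule only; it is not part of TensorFlow.
    """
    input_shape = tuple(input_shape)
    if enqueue_many:
        if len(input_shape) < 1:
            raise ValueError("enqueue_many=True needs a leading example dimension")
        # The first dimension indexes examples, so it is replaced by batch_size.
        return (batch_size,) + input_shape[1:]
    # The whole tensor is a single example, so batch_size is prepended.
    return (batch_size,) + input_shape


# A single 28x28x3 image becomes a batch of 32 such images.
single_example = batched_shape((28, 28, 3), batch_size=32)

# A tensor already holding 1000 images is split along its first dimension.
many_examples = batched_shape((1000, 28, 28, 3), batch_size=32, enqueue_many=True)
```

Both calls yield the same per-example shape, which is the point of the rule: the batch dimension always ends up first, whichever way the examples were supplied.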

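enqueue_many controls how tensor_list is interpreted, not whether a sample may appear more than once. With the default False, each tensor in tensor_list is treated as one example. With True, each tensor is treated as a batch of examples and is split along its first dimension before being enqueued.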

The capacity argument controls how long the prefetching is allowed to grow the queues, that is, how many elements may be buffered ahead of consumption.
For example:

# Creates batches of 32 images and 32 labels.
image_batch, label_batch = tf.train.shuffle_batch([single_image, single_label], batch_size=32, num_threads=4, capacity=50000, min_after_dequeue=10000)

This code uses 4 threads to enqueue [single_image, single_label] into the shuffling queue, and each evaluation dequeues 32 examples as one batch.

Args:

tensor_list: The list of tensors to enqueue.
batch_size: The new batch size pulled from the queue, i.e. the number of examples in each batch.
capacity: An integer. The maximum number of elements in the queue.

It must be greater than min_after_dequeue, and it sets the upper bound on how many elements can be prefetched.
The recommended value is:

\[ \text{capacity} = \text{min\_after\_dequeue} + (\text{num\_threads} + \text{a small safety margin}) \times \text{batch\_size} \]

min_after_dequeue: Minimum number of elements in the queue after a
dequeue, used to ensure a level of mixing of elements.
It defines the size of the buffer that is sampled from at random: a larger value gives better mixing, but makes startup slower and uses more memory.
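num_threads: The number of threads enqueuing tensor_list.
When num_threads is greater than 1, several threads enqueue from the same tensor_list. If those tensors come from a single file reader, only one file is read at any moment (though still faster than with a single thread), instead of several files being read at once. This approach has two advantages:

It avoids two different threads reading the same example from the same file.
It avoids excessive disk operations.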

seed: Seed for the random shuffling within the queue.
enqueue_many: Whether each tensor in tensor_list is a single example. If False (the default), each tensor is one example; if True, each tensor is a batch of examples indexed along its first dimension.
shapes: (Optional) The shapes for each example. Defaults to the
inferred shapes for tensor_list.
This argument declares the per-example shapes; it does not reshape the tensors.
name: (Optional) A name for the operations.
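The capacity recommendation in the argument list can be checked with a little arithmetic. The sketch below reads it as min_after_dequeue + (num_threads + safety margin) * batch_size. The helper name `recommended_capacity` is hypothetical, and the default margin of 3 is an assumption chosen only for illustration, since the text says no more than "a small safety margin".

```python
def recommended_capacity(min_after_dequeue, num_threads, batch_size, safety_margin=3):
    """Rule-of-thumb queue capacity (illustrative helper, not a TensorFlow API).

    safety_margin is an assumed value; pick whatever small number suits you.
    """
    return min_after_dequeue + (num_threads + safety_margin) * batch_size


# Values taken from the example call earlier in this post.
suggested = recommended_capacity(min_after_dequeue=10000, num_threads=4, batch_size=32)
# 10000 + (4 + 3) * 32 = 10224
```

The example call uses capacity=50000, which is comfortably above this rule-of-thumb value, so the enqueuing threads have plenty of room to prefetch.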

Returns:

A list of tensors with the same number and types as tensor_list.
