TensorFlow's Data Interface tf.data.Dataset Explained in One Article

Reposted article, 2020-04-23 00:25

Importing the data

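import numpy as np
import pandas as pd
import tensorflow as tf
X = pd.read_csv('./datasets/housing/housing.csv')  # load the housing CSV
X = X.sample(n=10)  # keep 10 random rows, so the values differ between runs
X.drop(columns=X.columns.difference(['longitude']), inplace=True)  # keep only 'longitude'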

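To avoid type-mismatch errors later (pandas loads the column as float64), first convert it to a float32 NumPy array: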

X = np.asarray(X).astype(np.float32)

dataset = tf.data.Dataset.from_tensor_slices(X)  # one element per row of X
for element in dataset:
    print(element)

tf.Tensor([-118.75], shape=(1,), dtype=float32)
tf.Tensor([-119.25], shape=(1,), dtype=float32)
tf.Tensor([-118.18], shape=(1,), dtype=float32)
tf.Tensor([-118.13], shape=(1,), dtype=float32)
tf.Tensor([-118.2], shape=(1,), dtype=float32)
tf.Tensor([-117.25], shape=(1,), dtype=float32)
tf.Tensor([-117.93], shape=(1,), dtype=float32)
tf.Tensor([-122.96], shape=(1,), dtype=float32)
tf.Tensor([-121.77], shape=(1,), dtype=float32)
tf.Tensor([-121.24], shape=(1,), dtype=float32)
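`from_tensor_slices` slices its input along the first dimension. An array of shape (10, 1) therefore becomes 10 elements of shape (1,), and each element's dtype follows the array's dtype. The minimal sketch below checks this with `element_spec`. It uses a small made-up array: the values are hypothetical and do not come from the housing data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the longitude column: 4 rows, 1 column.
values = np.array([[-118.75], [-119.25], [-118.18], [-118.13]])

ds64 = tf.data.Dataset.from_tensor_slices(values)                     # NumPy default dtype: float64
ds32 = tf.data.Dataset.from_tensor_slices(values.astype(np.float32))  # after the float32 conversion

print(ds64.element_spec)  # each element is one row of shape (1,), dtype float64
print(ds32.element_spec)  # same shape, dtype float32
print(len(list(ds32)))    # number of elements equals number of rows
```

Skipping the `astype(np.float32)` step is what produces float64 elements.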

dataset = dataset.repeat(3).batch(10)
for element in dataset:
    print(element)

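Explanation:

`repeat(3)` repeats the dataset 3 times, and `batch(10)` groups every 10 consecutive elements into one batch. The 10 rows therefore become 30 elements, which are output as 3 batches of shape (10, 1):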

tf.Tensor(
[[-118.75]
 [-119.25]
 [-118.18]
 [-118.13]
 [-118.2 ]
 [-117.25]
 [-117.93]
 [-122.96]
 [-121.77]
 [-121.24]], shape=(10, 1), dtype=float32)
tf.Tensor(
[[-118.75]
 [-119.25]
 [-118.18]
 [-118.13]
 [-118.2 ]
 [-117.25]
 [-117.93]
 [-122.96]
 [-121.77]
 [-121.24]], shape=(10, 1), dtype=float32)
tf.Tensor(
[[-118.75]
 [-119.25]
 [-118.18]
 [-118.13]
 [-118.2 ]
 [-117.25]
 [-117.93]
 [-122.96]
 [-121.77]
 [-121.24]], shape=(10, 1), dtype=float32)
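The same grouping can be checked on a tiny dataset. The minimal sketch below assumes a hypothetical 10-element dataset built with `tf.data.Dataset.range` in place of the housing data. Instead of printing the values, it collects the batch shapes.

```python
import tensorflow as tf

# Hypothetical 10-element dataset standing in for the 10 sampled rows.
ds = tf.data.Dataset.range(10)

# repeat(3) yields 30 elements; batch(10) groups them into batches of 10.
batched = ds.repeat(3).batch(10)
shapes = [tuple(batch.shape) for batch in batched]
print(shapes)  # [(10,), (10,), (10,)]

# The batch size equals the dataset size, so each batch is one full pass.
first = next(iter(batched)).numpy().tolist()
print(first)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```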

If the number of elements is not evenly divisible by the batch size, for example:

dataset_9 = tf.data.Dataset.from_tensor_slices(X).repeat(3).batch(9)  # start again from X; `dataset` is already batched
for element in dataset_9:
    print(element)

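The last batch contains the remaining elements. With 10 × 3 = 30 elements and a batch size of 9, this gives three full batches plus a final batch of 3. The output below appears to come from a separate run: `sample` picked different rows and the float32 conversion was skipped, which is why the values differ and the dtype is float64.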

tf.Tensor(
[[-122.08]
 [-121.37]
 [-118.32]
 [-122.38]
 [-122.09]
 [-122.1 ]
 [-122.27]
 [-121.49]
 [-120.68]], shape=(9, 1), dtype=float64)
tf.Tensor(
[[-118.2 ]
 [-122.08]
 [-121.37]
 [-118.32]
 [-122.38]
 [-122.09]
 [-122.1 ]
 [-122.27]
 [-121.49]], shape=(9, 1), dtype=float64)
tf.Tensor(
[[-120.68]
 [-118.2 ]
 [-122.08]
 [-121.37]
 [-118.32]
 [-122.38]
 [-122.09]
 [-122.1 ]
 [-122.27]], shape=(9, 1), dtype=float64)
tf.Tensor(
[[-121.49]
 [-120.68]
 [-118.2 ]], shape=(3, 1), dtype=float64)
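Sometimes a short final batch is unwanted, for example when a model needs a fixed batch size. For that case, `batch` also accepts `drop_remainder=True`, which discards the leftover elements. Below is a minimal sketch on a hypothetical 10-element range dataset:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10).repeat(3)  # 30 elements, as in the example above

kept = [int(b.shape[0]) for b in ds.batch(9)]                          # default: keep the remainder
dropped = [int(b.shape[0]) for b in ds.batch(9, drop_remainder=True)]  # discard it

print(kept)     # [9, 9, 9, 3]
print(dropped)  # [9, 9, 9]

# With drop_remainder=True the batch dimension becomes static in element_spec.
print(ds.batch(9).element_spec.shape)                       # (None,)
print(ds.batch(9, drop_remainder=True).element_spec.shape)  # (9,)
```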

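The `map` function

`map` applies a function to every element of the dataset. Here each element is a batch of shape (10, 1) from `repeat(3).batch(10)`, so the absolute value is taken over whole batches: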

dataset = dataset.map(tf.abs)  # absolute value of every entry in each batch
for element in dataset:
    print(element)

tf.Tensor(
[[118.75]
 [119.25]
 [118.18]
 [118.13]
 [118.2 ]
 [117.25]
 [117.93]
 [122.96]
 [121.77]
 [121.24]], shape=(10, 1), dtype=float32)
tf.Tensor(
[[118.75]
 [119.25]
 [118.18]
 [118.13]
 [118.2 ]
 [117.25]
 [117.93]
 [122.96]
 [121.77]
 [121.24]], shape=(10, 1), dtype=float32)
tf.Tensor(
[[118.75]
 [119.25]
 [118.18]
 [118.13]
 [118.2 ]
 [117.25]
 [117.93]
 [122.96]
 [121.77]
 [121.24]], shape=(10, 1), dtype=float32)
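The function passed to `map` is traced into a TensorFlow graph, so it should be built from TensorFlow ops such as `tf.abs`. Python's built-in `abs` also works, because tensors implement `__abs__`. The minimal sketch below uses hypothetical values in batches of 2. It shows that a batched dataset's `map` function receives whole batches, and that a lambda can chain several ops.

```python
import numpy as np
import tensorflow as tf

# Hypothetical values: 4 rows, 1 column, batched in pairs.
values = np.array([[-118.75], [-119.25], [-122.96], [-121.77]], dtype=np.float32)
batched = tf.data.Dataset.from_tensor_slices(values).batch(2)

# tf.abs is applied to each batch of shape (2, 1) as a whole.
positive = batched.map(tf.abs)

# A lambda can combine several TensorFlow ops, e.g. abs followed by rounding.
rounded = batched.map(lambda x: tf.round(tf.abs(x)))

for batch in positive:
    print(batch.shape, batch.numpy().ravel())
print([b.numpy().ravel().tolist() for b in rounded])
```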

The `filter` function

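`filter` keeps only the elements for which the predicate returns `True`, and the predicate must return a scalar boolean. The batches here have shape (10, 1), so call `unbatch` first to get back individual elements: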

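dataset = dataset.unbatch()  # back to individual elements of shape (1,)
dataset = dataset.filter(lambda x: x[0] < 120)  # x[0] makes the predicate a scalar tf.bool
for element in dataset:
    print(element)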
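The scalar-predicate requirement is easiest to see on a small example. The minimal sketch below uses hypothetical positive values in batches of shape (2, 1), like the output of the `map` step. After `unbatch`, each element has shape (1,), so the predicate indexes `x[0]` to return a scalar `tf.bool`. It also checks that a non-scalar predicate is rejected when the pipeline is built.

```python
import numpy as np
import tensorflow as tf

# Hypothetical absolute longitudes, batched in pairs: shape (2, 1) per batch.
values = np.array([[118.75], [117.25], [122.96], [117.93]], dtype=np.float32)
batched = tf.data.Dataset.from_tensor_slices(values).batch(2)

# unbatch restores individual elements of shape (1,).
elements = batched.unbatch()

# The predicate must return a scalar tf.bool, so compare x[0] rather than x.
kept = elements.filter(lambda x: x[0] < 120)
print([float(x[0]) for x in kept])

# Comparing the whole (1,)-shaped element gives a (1,)-shaped boolean,
# which filter rejects when the pipeline is constructed.
rejected = False
try:
    elements.filter(lambda x: x < 120)
except (ValueError, TypeError):
    rejected = True
print("non-scalar predicate rejected:", rejected)
```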
