A One-Article Guide to TensorFlow's Data API: tf.data.Dataset


Importing the data

import numpy as np
import pandas as pd
import tensorflow as tf

X = pd.read_csv('./datasets/housing/housing.csv')
X = X.sample(n=10)  # draw 10 random rows
X = X[['longitude']]  # keep only the longitude column

To avoid errors, first convert the DataFrame to a float32 NumPy array:

X = np.asarray(X).astype(np.float32)  # DataFrame -> float32 array of shape (10, 1)
dataset = tf.data.Dataset.from_tensor_slices(X)  # one dataset element per row
for item in dataset:
    print(item)
tf.Tensor([-118.75], shape=(1,), dtype=float32)
tf.Tensor([-119.25], shape=(1,), dtype=float32)
tf.Tensor([-118.18], shape=(1,), dtype=float32)
tf.Tensor([-118.13], shape=(1,), dtype=float32)
tf.Tensor([-118.2], shape=(1,), dtype=float32)
tf.Tensor([-117.25], shape=(1,), dtype=float32)
tf.Tensor([-117.93], shape=(1,), dtype=float32)
tf.Tensor([-122.96], shape=(1,), dtype=float32)
tf.Tensor([-121.77], shape=(1,), dtype=float32)
tf.Tensor([-121.24], shape=(1,), dtype=float32)
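
The output above shows that from_tensor_slices cuts the array along its first axis, so each element is a single row of shape (1,). A minimal sketch that checks this without the CSV file follows; the `values` array is a hypothetical stand-in for the sampled longitude column, not data from the original run.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the sampled longitude column: 4 rows, 1 feature.
values = np.array([[-118.75], [-119.25], [-118.18], [-118.13]], dtype=np.float32)

dataset = tf.data.Dataset.from_tensor_slices(values)

# element_spec describes a single element: the leading (row) axis is sliced away.
spec = dataset.element_spec
print(spec.shape, spec.dtype)

# Materialise the elements as NumPy arrays to count them.
rows = list(dataset.as_numpy_iterator())
print(len(rows))
```

Inspecting element_spec is a cheap way to confirm shapes and dtypes before adding further transformations.
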
dataset = dataset.repeat(3).batch(10)
for batch in dataset:
    print(batch)

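repeat(3) repeats the dataset 3 times, giving 30 elements, and batch(10) then emits them in batches of 10 elements each, so the loop prints three batches: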

tf.Tensor(
[[-118.75]
 [-119.25]
 [-118.18]
 [-118.13]
 [-118.2 ]
 [-117.25]
 [-117.93]
 [-122.96]
 [-121.77]
 [-121.24]], shape=(10, 1), dtype=float32)
tf.Tensor(
[[-118.75]
 [-119.25]
 [-118.18]
 [-118.13]
 [-118.2 ]
 [-117.25]
 [-117.93]
 [-122.96]
 [-121.77]
 [-121.24]], shape=(10, 1), dtype=float32)
tf.Tensor(
[[-118.75]
 [-119.25]
 [-118.18]
 [-118.13]
 [-118.2 ]
 [-117.25]
 [-117.93]
 [-122.96]
 [-121.77]
 [-121.24]], shape=(10, 1), dtype=float32)
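
The order of repeat and batch matters once the batch size does not divide the dataset size. The sketch below uses tf.data.Dataset.range(10) as an assumed stand-in for the 10 sampled rows and compares the batch sizes produced by the two orderings.

```python
import tensorflow as tf

# Assumed stand-in for the 10 sampled rows: the integers 0..9.
base = tf.data.Dataset.range(10)

# repeat first: 30 elements are batched as one stream, so batches may span repetitions.
repeat_then_batch = base.repeat(3).batch(4)

# batch first: each pass over the data is batched on its own, then the batches are repeated.
batch_then_repeat = base.batch(4).repeat(3)

sizes_repeat_first = [int(b.shape[0]) for b in repeat_then_batch]
sizes_batch_first = [int(b.shape[0]) for b in batch_then_repeat]

print(sizes_repeat_first)
print(sizes_batch_first)
```

With repeat first there is only one short batch at the very end; with batch first every pass over the data ends in its own short batch.
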

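If the elements cannot be divided evenly into batches, for example:

# Rebuild from X so that batch(9) groups individual elements, not the batches created above
dataset = tf.data.Dataset.from_tensor_slices(X).repeat(3).batch(9)
for batch in dataset:
    print(batch)

the last batch contains the remaining elements (here 30 = 3 × 9 + 3, so it holds 3). The output below has different values and dtype float64, so it appears to come from a separate run with a different sample: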

tf.Tensor(
[[-122.08]
 [-121.37]
 [-118.32]
 [-122.38]
 [-122.09]
 [-122.1 ]
 [-122.27]
 [-121.49]
 [-120.68]], shape=(9, 1), dtype=float64)
tf.Tensor(
[[-118.2 ]
 [-122.08]
 [-121.37]
 [-118.32]
 [-122.38]
 [-122.09]
 [-122.1 ]
 [-122.27]
 [-121.49]], shape=(9, 1), dtype=float64)
tf.Tensor(
[[-120.68]
 [-118.2 ]
 [-122.08]
 [-121.37]
 [-118.32]
 [-122.38]
 [-122.09]
 [-122.1 ]
 [-122.27]], shape=(9, 1), dtype=float64)
tf.Tensor(
[[-121.49]
 [-120.68]
 [-118.2 ]], shape=(3, 1), dtype=float64)
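
If a short final batch is not wanted, batch accepts a drop_remainder argument. The sketch below again uses tf.data.Dataset.range(10) as an assumed stand-in for the 10 sampled rows and compares the batch sizes with and without it.

```python
import tensorflow as tf

# Assumed stand-in for the 10 sampled rows, repeated 3 times: 30 elements.
base = tf.data.Dataset.range(10).repeat(3)

# Default behaviour: the leftover elements form a smaller final batch.
sizes_default = [int(b.shape[0]) for b in base.batch(9)]

# drop_remainder=True discards the leftover elements instead.
sizes_dropped = [int(b.shape[0]) for b in base.batch(9, drop_remainder=True)]

print(sizes_default)
print(sizes_dropped)
```

Dropping the remainder gives every batch the same static shape, at the cost of discarding a few elements.
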

The map function

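# Start again from the repeat(3).batch(10) dataset, which is what the output below shows
dataset = tf.data.Dataset.from_tensor_slices(X).repeat(3).batch(10)
dataset = dataset.map(tf.abs)  # take the absolute value of every element
for batch in dataset:
    print(batch)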
tf.Tensor(
[[118.75]
 [119.25]
 [118.18]
 [118.13]
 [118.2 ]
 [117.25]
 [117.93]
 [122.96]
 [121.77]
 [121.24]], shape=(10, 1), dtype=float32)
tf.Tensor(
[[118.75]
 [119.25]
 [118.18]
 [118.13]
 [118.2 ]
 [117.25]
 [117.93]
 [122.96]
 [121.77]
 [121.24]], shape=(10, 1), dtype=float32)
tf.Tensor(
[[118.75]
 [119.25]
 [118.18]
 [118.13]
 [118.2 ]
 [117.25]
 [117.93]
 [122.96]
 [121.77]
 [121.24]], shape=(10, 1), dtype=float32)
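
map calls its function once per dataset element, and after batch an element is a whole batch. For an element-wise operation such as the absolute value this makes no difference to the result, as the sketch below suggests; the `values` tensor and the `to_abs` helper are hypothetical names introduced for illustration.

```python
import tensorflow as tf

# Hypothetical toy data: 4 rows with 1 feature each.
values = tf.constant([[-3.0], [2.0], [-1.0], [4.0]])
ds = tf.data.Dataset.from_tensor_slices(values)


def to_abs(x):
    """Element-wise absolute value; works on a single row or on a batch."""
    return tf.abs(x)


# map before batch: to_abs sees tensors of shape (1,).
mapped_then_batched = [b.numpy().tolist() for b in ds.map(to_abs).batch(2)]

# map after batch: to_abs sees tensors of shape (2, 1).
batched_then_mapped = [b.numpy().tolist() for b in ds.batch(2).map(to_abs)]

print(mapped_then_batched)
print(batched_then_mapped)
```

For functions that are not element-wise (for example a reduction over the element), the two orderings would generally give different results, so it is worth checking which shape the mapped function receives.
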

The filter function

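Before using filter, unbatch the dataset first, so that the predicate is evaluated on each element rather than on a whole batch: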

dataset = dataset.unbatch()
# After unbatch each element has shape (1,); filter needs a scalar boolean, so compare x[0]
dataset = dataset.filter(lambda x: x[0] < 120)
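
The unbatch-then-filter pattern can be tried on its own with a few values. This is a minimal sketch on a hypothetical `values` tensor (four of the absolute longitudes shown earlier); the predicate indexes x[0] so that it returns a scalar boolean.

```python
import tensorflow as tf

# Hypothetical toy data: 4 absolute longitudes, grouped into batches of 2.
values = tf.constant([[118.75], [119.25], [122.96], [121.77]])
ds = tf.data.Dataset.from_tensor_slices(values).batch(2)

# unbatch splits every batch back into individual elements of shape (1,).
ds = ds.unbatch()

# The predicate must return a scalar boolean, so compare the single value x[0].
kept = ds.filter(lambda x: x[0] < 120)

result = [float(x.numpy()[0]) for x in kept]
print(result)
```

If the filtered data is needed in batches again, batch can be applied once more after filter.
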

