TensorFlow 2.0 Tutorial 12: Regression


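  In a regression problem, the goal is to predict a continuous-valued output, such as a price or a probability.

  We use the classic Auto MPG dataset and build a model that predicts the fuel efficiency of cars from the late 1970s and early 1980s. To do this, we give the model descriptions of many cars from that period. Each description includes attributes such as cylinders, displacement, horsepower, and weight.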

  1. The Auto MPG dataset

  Get the data

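  import matplotlib.pyplot as plt
  import pandas as pd
  import seaborn as sns
  import tensorflow as tf
  from tensorflow import keras
  from tensorflow.keras import layers

  # Download the dataset file and print the local path it was saved to
  dataset_path = keras.utils.get_file('auto-mpg.data',
      'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data')
  print(dataset_path)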

  Downloading data from https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data

  32768/30286 [================================] - 1s 25us/step

  /home/czy/.keras/datasets/auto-mpg.data

  Read the data with pandas

  column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                  'Acceleration', 'Model Year', 'Origin']
  # '?' marks a missing value; everything after the tab on each line is skipped as a comment
  raw_dataset = pd.read_csv(dataset_path, names=column_names,
                            na_values='?', comment='\t',
                            sep=' ', skipinitialspace=True)
  dataset = raw_dataset.copy()
  dataset.tail()

        MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  Origin
  393  27.0          4         140.0        86.0  2790.0          15.6          82       1
  394  44.0          4          97.0        52.0  2130.0          24.6          82       2
  395  32.0          4         135.0        84.0  2295.0          11.6          82       1
  396  28.0          4         120.0        79.0  2625.0          18.6          82       1
  397  31.0          4         119.0        82.0  2720.0          19.4          82       1
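  The read_csv options above can be tried on a small in-memory sample. The sketch below is only an illustration: the two rows and the car names are hypothetical, written to imitate the file's layout (space-padded numeric columns, then a tab and a quoted name), and one row uses '?' for a missing horsepower value.

```python
import io

import pandas as pd

# Hypothetical rows imitating the auto-mpg.data layout (not real records)
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"car a"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"car b"\n'
)

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Same options as in the tutorial: '?' becomes NaN, the text after the tab
# is dropped, and runs of spaces are treated as a single separator
df = pd.read_csv(io.StringIO(sample), names=column_names,
                 na_values='?', comment='\t',
                 sep=' ', skipinitialspace=True)

print(df)
print(df['Horsepower'].isna().sum())  # number of missing horsepower values
```

  If the options behave as described, the car names never reach the DataFrame and the '?' shows up as NaN, which is what makes the later dropna() call meaningful.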

  2. Data preprocessing

  Clean the data

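  # Count missing values per column, then drop the rows that contain them
  print(dataset.isna().sum())
  dataset = dataset.dropna()
  # 'Origin' is categorical (1 = USA, 2 = Europe, 3 = Japan), so one-hot encode it
  origin = dataset.pop('Origin')
  dataset['USA'] = (origin == 1) * 1.0
  dataset['Europe'] = (origin == 2) * 1.0
  dataset['Japan'] = (origin == 3) * 1.0
  dataset.tail()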

  MPG             0
  Cylinders       0
  Displacement    0
  Horsepower      6
  Weight          0
  Acceleration    0
  Model Year      0
  Origin          0
  dtype: int64

        MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  Model Year  USA  Europe  Japan
  393  27.0          4         140.0        86.0  2790.0          15.6          82  1.0     0.0    0.0
  394  44.0          4          97.0        52.0  2130.0          24.6          82  0.0     1.0    0.0
  395  32.0          4         135.0        84.0  2295.0          11.6          82  1.0     0.0    0.0
  396  28.0          4         120.0        79.0  2625.0          18.6          82  1.0     0.0    0.0
  397  31.0          4         119.0        82.0  2720.0          19.4          82  1.0     0.0    0.0
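  The one-hot encoding of Origin is written out by hand above. As an alternative sketch (not the tutorial's method), pandas can build the same three columns with get_dummies. The toy Series is hypothetical; the code-to-name mapping follows the tutorial's column names (1 = USA, 2 = Europe, 3 = Japan).

```python
import pandas as pd

# Hypothetical origin codes for four cars
origin = pd.Series([1, 2, 3, 1], name='Origin')

# Map the numeric codes to the names used as column headers in the tutorial
names = origin.map({1: 'USA', 2: 'Europe', 3: 'Japan'})

# One column per category, with 1.0 where the row belongs to that category
one_hot = pd.get_dummies(names, dtype=float)[['USA', 'Europe', 'Japan']]
print(one_hot)
```

  Each row contains exactly one 1.0, so the three columns carry the same information as the original code without implying an ordering between regions.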

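  Split into training and test sets

  # Randomly sample 80% of the rows for training; the remaining rows form the test set
  train_dataset = dataset.sample(frac=0.8, random_state=0)
  test_dataset = dataset.drop(train_dataset.index)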
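  The sample/drop pattern can be checked on a toy frame. This is a minimal sketch with a hypothetical 10-row DataFrame; it only verifies that the two parts are disjoint and together cover every row.

```python
import pandas as pd

# Hypothetical 10-row frame standing in for the real dataset
toy = pd.DataFrame({'x': range(10)})

# Same pattern as the tutorial: sample 80% of the rows, drop them to get the rest
train = toy.sample(frac=0.8, random_state=0)
test = toy.drop(train.index)

print(len(train), len(test))
print(set(train.index) & set(test.index))  # rows present in both parts
```

  Because drop() removes rows by index label, the pattern relies on the index being unique, which holds for the default integer index used here.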

  Inspect the data

  Look at the joint distribution of a few pairs of columns from the training set.

  sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")

  


  Overall statistics:

  train_stats = train_dataset.describe()
  # MPG is the label, so leave it out of the feature statistics
  train_stats.pop("MPG")
  train_stats = train_stats.transpose()
  train_stats

                count         mean         std     min      25%     50%      75%     max
  Cylinders     314.0     5.477707    1.699788     3.0     4.00     4.0     8.00     8.0
  Displacement  314.0   195.318471  104.331589    68.0   105.50   151.0   265.75   455.0
  Horsepower    314.0   104.869427   38.096214    46.0    76.25    94.5   128.00   225.0
  Weight        314.0  2990.251592  843.898596  1649.0  2256.50  2822.5  3608.00  5140.0
  Acceleration  314.0    15.559236    2.789230     8.0    13.80    15.5    17.20    24.8
  Model Year    314.0    75.898089    3.675642    70.0    73.00    76.0    79.00    82.0
  USA           314.0     0.624204    0.485101     0.0     0.00     1.0     1.00     1.0
  Europe        314.0     0.178344    0.383413     0.0     0.00     0.0     0.00     1.0
  Japan         314.0     0.197452    0.398712     0.0     0.00     0.0     0.00     1.0

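  Separate the labels

  # Remove the target column from the features; it becomes the label the model predicts
  train_labels = train_dataset.pop('MPG')
  test_labels = test_dataset.pop('MPG')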

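  Normalize the data

  It is good practice to normalize features that use different scales and ranges. Although the model might converge without feature normalization, skipping it makes training more difficult and makes the resulting model dependent on the choice of units used in the input.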

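  def norm(x):
      # Standardize each column using the training-set mean and standard deviation
      return (x - train_stats['mean']) / train_stats['std']

  # The test set is normalized with the same training statistics
  normed_train_data = norm(train_dataset)
  normed_test_data = norm(test_dataset)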
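  To make the arithmetic of norm() concrete, here is a small NumPy sketch on hypothetical values (two features standing in for cylinders and weight). It uses ddof=1 because pandas describe() reports the sample standard deviation.

```python
import numpy as np

# Hypothetical training rows: [cylinders, weight]
train = np.array([[4.0, 2000.0],
                  [6.0, 3000.0],
                  [8.0, 4000.0]])
# Hypothetical test row
test = np.array([[4.0, 2500.0]])

# Statistics come from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0, ddof=1)  # ddof=1 matches pandas' sample std

normed_train = (train - mean) / std
normed_test = (test - mean) / std  # reuse the training statistics

print(normed_train)
print(normed_test)
```

  After normalization both features are on the same scale even though the raw weights are hundreds of times larger than the cylinder counts. The test row is shifted and scaled with the training statistics, so it is not guaranteed to have zero mean itself.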

  3. Build the model

  def build_model():
      # Two hidden layers of 64 units each and a single linear output for the predicted MPG
      model = keras.Sequential([
          layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
          layers.Dense(64, activation='relu'),
          layers.Dense(1)
      ])

      optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)

      # Train on mean squared error; also report mean absolute error
      model.compile(loss='mse',
                    optimizer=optimizer,
                    metrics=['mae', 'mse'])
      return model

  model = build_model()
  model.summary()

  Model: "sequential"
  _________________________________________________________________
  Layer (type)                 Output Shape              Param #
  =================================================================
  dense (Dense)                (None, 64)                640
  _________________________________________________________________
  dense_1 (Dense)              (None, 64)                4160
  _________________________________________________________________
  dense_2 (Dense)              (None, 1)                 65
  =================================================================
  Total params: 4,865
  Trainable params: 4,865
  Non-trainable params: 0
  _________________________________________________________________
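  The parameter counts in the summary can be reproduced by hand. A Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes 9 input features, which is the number of columns left after removing MPG (six numeric columns plus the three one-hot origin columns); dense_params is a helper name introduced here for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 9              # feature columns after popping 'MPG'
layer_sizes = [64, 64, 1]   # units in each Dense layer of the model

counts = []
n_in = n_features
for n_out in layer_sizes:
    counts.append(dense_params(n_in, n_out))
    n_in = n_out  # the next layer's input size is this layer's output size

print(counts)       # [640, 4160, 65]
print(sum(counts))  # 4865
```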

  # Try the untrained model on a batch of 10 training examples to check the output shape and type
  example_batch = normed_train_data[:10]
  example_result = model.predict(example_batch)
  example_result

  array([[0.18062565],
         [0.1714489 ],
         [0.22555563],
         [0.29366603],
         [0.69764495],
         [0.08851457],
         [0.6851174 ],
         [0.32245407],
         [0.02959149],
         [0.38945067]], dtype=float32)

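  4. Train the model

  # Show training progress by printing one dot per completed epoch
  class PrintDot(keras.callbacks.Callback):
      def on_epoch_end(self, epoch, logs=None):
          # Start a new line every 100 epochs
          if epoch % 100 == 0: print('')
          print('.', end='')

  EPOCHS = 1000

  # Hold out 20% of the training data for validation
  history = model.fit(
      normed_train_data, train_labels,
      epochs=EPOCHS, validation_split=0.2, verbose=0,
      callbacks=[PrintDot()])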

  ....................................................................................................

  ....................................................................................................

  ....................................................................................................

  ....................................................................................................

  ....................................................................................................

  ....................................................................................................

  ....................................................................................................

  ....................................................................................................

  ....................................................................................................

  ....................................................................................................

  Inspect the training history

  hist = pd.DataFrame(history.history)
  hist['epoch'] = history.epoch
  hist.tail()

           loss       mae       mse   val_loss   val_mae    val_mse  epoch
  995  2.191127  0.940755  2.191127  10.422818  2.594117  10.422818    995
  996  2.113679  0.903680  2.113679  10.723925  2.631320  10.723926    996
  997  2.517261  0.989557  2.517261   9.497868  2.379198   9.497869    997
  998  2.250272  0.931618  2.250272  11.017041  2.658538  11.017041    998
  999  1.976393  0.853547  1.976393   9.890977  2.491739   9.890977    999
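  The loss and mse columns above are identical because the model was compiled with loss='mse'. As a reminder of what the two reported metrics measure, here is a plain-Python sketch on hypothetical values; the function names are introduced here for illustration and are not Keras APIs.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |prediction - truth|, in the same unit as the label (MPG)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (prediction - truth)^2, in squared units (MPG^2)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted MPG values
y_true = [20.0, 30.0, 25.0]
y_pred = [22.0, 27.0, 25.0]

mae = mean_absolute_error(y_true, y_pred)  # (2 + 3 + 0) / 3
mse = mean_squared_error(y_true, y_pred)   # (4 + 9 + 0) / 3
print(mae, mse)
```

  MSE penalizes large errors more heavily than MAE, which is one reason it is a common training loss, while MAE is easier to read because it stays in MPG.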

  def plot_history(history):
      hist = pd.DataFrame(history.history)
      hist['epoch'] = history.epoch

      # Mean absolute error on the training and validation data per epoch
      plt.figure()
      plt.xlabel('Epoch')
      plt.ylabel('Mean Abs Error [MPG]')
      plt.plot(hist['epoch'], hist['mae'],
               label='Train Error')
      plt.plot(hist['epoch'], hist['val_mae'],
               label='Val Error')
      plt.ylim([0, 5])
      plt.legend()

      # Mean squared error on the training and validation data per epoch
      plt.figure()
      plt.xlabel('Epoch')
      plt.ylabel('Mean Square Error [$MPG^2$]')
      plt.plot(hist['epoch'], hist['mse'],
               label='Train Error')
      plt.plot(hist['epoch'], hist['val_mse'],
               label='Val Error')
      plt.ylim([0, 20])
      plt.legend()
      plt.show()

  plot_history(history)

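  Use early stopping

  In the table above, the validation loss ends at about 10 while the training loss is about 2, which suggests overfitting. Retrain with an EarlyStopping callback that stops training once val_loss has not improved for 10 consecutive epochs.

  model = build_model()
  # patience is the number of epochs without improvement to tolerate before stopping
  early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
  history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                      validation_split=0.2, verbose=0, callbacks=[early_stop, PrintDot()])
  plot_history(history)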

  .........................................................
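  The idea behind the patience setting can be sketched in a few lines of plain Python. This is a simplified illustration of the rule, not Keras's actual implementation (which also supports options such as a minimum improvement threshold); stopping_epoch and the loss values are hypothetical.

```python
def stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # new best value: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: improve for three epochs, then stall
val_losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.9]

print(stopping_epoch(val_losses, patience=3))  # 5
print(stopping_epoch(val_losses, patience=4))  # None
```

  With patience=3 the run stops at epoch 5, one epoch before the loss would have improved again. A larger patience tolerates such plateaus at the cost of training longer.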

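  Evaluate on the test set

  # evaluate() returns the loss followed by the metrics passed to compile(), in that order
  loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)
  print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))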

  Testing set Mean Abs Error: 1.85 MPG

  5. Prediction

  test_predictions = model.predict(normed_test_data).flatten()

  # Predictions against true values; points on the diagonal are perfect predictions
  plt.scatter(test_labels, test_predictions)
  plt.xlabel('True Values [MPG]')
  plt.ylabel('Predictions [MPG]')
  plt.axis('equal')
  plt.axis('square')
  plt.xlim([0, plt.xlim()[1]])
  plt.ylim([0, plt.ylim()[1]])
  _ = plt.plot([-100, 100], [-100, 100])
  plt.show()

  # Distribution of the prediction errors
  error = test_predictions - test_labels
  plt.hist(error, bins=25)
  plt.xlabel("Prediction Error [MPG]")
  _ = plt.ylabel("Count")
  plt.show()
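  The histogram can also be summarized numerically. The sketch below uses NumPy on synthetic data, since the real predictions depend on a trained model: the labels and the noise level are made up for illustration. np.histogram with bins=25 yields the same kind of counts that plt.hist draws.

```python
import numpy as np

# Synthetic stand-ins for test_labels and test_predictions (assumed values)
rng = np.random.default_rng(0)
test_labels = rng.uniform(10.0, 45.0, size=78)
test_predictions = test_labels + rng.normal(0.0, 2.5, size=78)

error = test_predictions - test_labels

# Bin the errors into 25 equal-width bins
counts, edges = np.histogram(error, bins=25)

print("mean error (bias):", error.mean())
print("mean absolute error:", np.abs(error).mean())
print("largest bin count:", counts.max())
```

  A mean error close to zero indicates that the model does not systematically over- or under-predict, while the mean absolute error corresponds to the figure reported by model.evaluate().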

