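"Deep Learning with Python" Notes --- 5.2-2: Cat vs. Dog Classification (Image Data Processing)
I. Summary
One-sentence summary:
[Sort the cat and dog images from the training data into training, validation, and test sets]: put simply, this step just copies image files into the right directories.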
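1. How do you join paths and create directories with Python's os module?
Joining paths: the join function of os.path: train_dir = os.path.join(base_dir, 'train')
Creating a directory: the mkdir function of os: os.mkdir(train_dir)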
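As a minimal sketch of the two calls above, the snippet below joins a path and creates the directory inside a throwaway temporary folder. The temporary folder is a stand-in for base_dir and is an assumption, not the path used in these notes. It also shows os.makedirs with exist_ok=True, a standard-library alternative that can be rerun safely, whereas os.mkdir raises FileExistsError if the directory already exists.

```python
import os
import tempfile

# Hypothetical stand-in for base_dir: a temporary folder, so nothing real is touched
base_dir = tempfile.mkdtemp()

# os.path.join inserts the platform's path separator between the parts
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)

# Calling os.mkdir again on an existing directory raises FileExistsError
try:
    os.mkdir(train_dir)
    second_mkdir_failed = False
except FileExistsError:
    second_mkdir_failed = True

# os.makedirs also creates missing parents; exist_ok=True makes reruns harmless
train_cats_dir = os.path.join(train_dir, 'cats')
os.makedirs(train_cats_dir, exist_ok=True)
os.makedirs(train_cats_dir, exist_ok=True)  # no error on the second call

print(os.path.isdir(train_cats_dir), second_mkdir_failed)
```

This matters in a notebook, where rerunning a cell that calls os.mkdir on an existing directory stops with an error.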
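2. What does this line mean: fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]?
'cat.{}.jpg'.format(i) evaluates to cat.0.jpg when i is 0, so the list comprehension builds the 1000 file names cat.0.jpg through cat.999.jpg.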
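To make the expression concrete, here is a small sketch that runs it on a shorter range. The f-string variant is an equivalent spelling added for comparison; it is not used in the notes.

```python
# Same expression as in the notes, with a smaller range so the result is easy to read
fnames = ['cat.{}.jpg'.format(i) for i in range(3)]
print(fnames)  # ['cat.0.jpg', 'cat.1.jpg', 'cat.2.jpg']

# Equivalent spelling with an f-string (Python 3.6+)
fnames_f = [f'cat.{i}.jpg' for i in range(3)]

# range(1000, 1500) starts at 1000 and stops before 1500, so it yields 500 names
validation_names = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
print(len(validation_names), validation_names[0], validation_names[-1])
```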
3. How do you copy an image to another directory in Python?
The copyfile function of shutil: shutil.copyfile(src, dst)
# Copy the first 1000 cat images to train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)
# print(fnames)
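One detail worth spelling out is that shutil.copyfile expects dst to be the complete target file name, which is why the loop above joins the directory with fname. The sketch below shows this on a placeholder file in temporary folders (both are assumptions made for the demo), next to shutil.copy, which also accepts a destination directory.

```python
import os
import shutil
import tempfile

# Hypothetical source and destination folders, created only for this demo
src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()

# A small placeholder file standing in for an image
src = os.path.join(src_dir, 'cat.0.jpg')
with open(src, 'wb') as f:
    f.write(b'not a real image')

# shutil.copyfile needs the full destination file path, not just the folder
dst = os.path.join(dst_dir, 'cat.0.jpg')
shutil.copyfile(src, dst)

# shutil.copy accepts a destination folder and keeps the source file name
other_dir = tempfile.mkdtemp()
copied_path = shutil.copy(src, other_dir)

print(os.listdir(dst_dir), os.path.basename(copied_path))
```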
4. How do you count the files in a folder in Python?
Take the length of the list returned by os.listdir: print('total training cat images:', len(os.listdir(train_cats_dir)))
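Note that os.listdir returns every entry in the folder, subdirectories included. That is fine for the image folders in these notes, which contain only files. For other cases, a minimal sketch of a file-only count might look like this; count_files and the demo folder are hypothetical names introduced for illustration.

```python
import os
import tempfile

def count_files(directory):
    """Count regular files directly inside `directory`, ignoring subdirectories."""
    return sum(
        1 for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )

# Hypothetical demo folder with three empty files and one subdirectory
demo_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(demo_dir, 'cat.{}.jpg'.format(i)), 'wb'):
        pass
os.mkdir(os.path.join(demo_dir, 'nested'))

print(len(os.listdir(demo_dir)))  # counts every entry, including 'nested'
print(count_files(demo_dir))      # counts files only
```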
II. Content (covered in the summary)
Note: what is shutil
The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal. For operations on individual files, see also the os module.
1. Copy the images into the training, validation, and test directories
In [5]:
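# Directory where the smaller dataset will be stored
import os  # needed by os.mkdir and os.path.join in this and the following cells
base_dir = 'E:\\78_recorded_lesson\\001_course_github\\AI_dataSet\\dogs-vs-cats\\cats_and_dogs_small'
os.mkdir(base_dir)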
In [6]:
# Directories for the training, validation, and test splits
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)
In [7]:
# Directory for the cat training images
train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)
In [8]:
# Directory for the dog training images
train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)
In [9]:
# Directory for the cat validation images
validation_cats_dir = os.path.join(validation_dir, 'cats')
os.mkdir(validation_cats_dir)
# Directory for the dog validation images
validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.mkdir(validation_dogs_dir)
In [10]:
# Directory for the cat test images
test_cats_dir = os.path.join(test_dir, 'cats')
os.mkdir(test_cats_dir)
# Directory for the dog test images
test_dogs_dir = os.path.join(test_dir, 'dogs')
os.mkdir(test_dogs_dir)
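The cells above create the same three-by-two directory tree one folder at a time. As a hedged alternative, the sketch below builds the tree in a loop. It is not the notes' own code: the temporary base_dir and the dirs dictionary are assumptions made for the example, and os.makedirs with exist_ok=True replaces os.mkdir so the cell can be rerun.

```python
import os
import tempfile

# Hypothetical stand-in for base_dir, placed in a temporary folder
base_dir = os.path.join(tempfile.mkdtemp(), 'cats_and_dogs_small')

# dirs[(split, label)] holds the path that the notes store in e.g. train_cats_dir
dirs = {}
for split in ('train', 'validation', 'test'):
    for label in ('cats', 'dogs'):
        path = os.path.join(base_dir, split, label)
        os.makedirs(path, exist_ok=True)  # also creates base_dir and the split folder
        dirs[(split, label)] = path

train_cats_dir = dirs[('train', 'cats')]
print(sorted(os.listdir(base_dir)))  # ['test', 'train', 'validation']
```

Keeping the separate variables, as the notes do, is easier to follow in later cells; the dictionary form mainly saves repetition.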
Note: file copy operations
=================================
'cat.{}.jpg'.format(i) evaluates to cat.0.jpg when i is 0
In [13]:
# Copy the first 1000 cat images to train_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_cats_dir, fname)
    shutil.copyfile(src, dst)
# print(fnames)
In [14]:
# Copy the next 500 cat images to validation_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_cats_dir, fname)
    shutil.copyfile(src, dst)
In [12]:
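# Run this cell before the copy cells (In [13] onward): they need shutil and original_dataset_dir
import os, shutil

# Path to the directory where the original dataset was uncompressed
original_dataset_dir = 'E:\\78_recorded_lesson\\001_course_github\\AI_dataSet\\dogs-vs-cats\\kaggle_original_data\\train'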
In [15]:
# Copy the next 500 cat images to test_cats_dir
fnames = ['cat.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_cats_dir, fname)
    shutil.copyfile(src, dst)
In [16]:
# Copy the first 1000 dog images to train_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(train_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Copy the next 500 dog images to validation_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1000, 1500)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(validation_dogs_dir, fname)
    shutil.copyfile(src, dst)

# Copy the next 500 dog images to test_dogs_dir
fnames = ['dog.{}.jpg'.format(i) for i in range(1500, 2000)]
for fname in fnames:
    src = os.path.join(original_dataset_dir, fname)
    dst = os.path.join(test_dogs_dir, fname)
    shutil.copyfile(src, dst)
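The six copy loops differ only in the file prefix, the index range, and the destination folder, so they can be folded into one helper. The sketch below is one possible refactoring, not the notes' code: copy_images is a hypothetical helper, and the demo runs on 20 placeholder files per class in temporary folders, with the ranges scaled down from 0-1000, 1000-1500, and 1500-2000.

```python
import os
import shutil
import tempfile

def copy_images(prefix, start, stop, src_dir, dst_dir):
    """Copy '<prefix>.<i>.jpg' for i in range(start, stop) from src_dir to dst_dir."""
    for i in range(start, stop):
        fname = '{}.{}.jpg'.format(prefix, i)
        shutil.copyfile(os.path.join(src_dir, fname), os.path.join(dst_dir, fname))

# Hypothetical stand-in for the original dataset: 20 placeholder files per class
original_dataset_dir = tempfile.mkdtemp()
for prefix in ('cat', 'dog'):
    for i in range(20):
        with open(os.path.join(original_dataset_dir, f'{prefix}.{i}.jpg'), 'wb') as f:
            f.write(b'x')

base_dir = tempfile.mkdtemp()
# (split name, start index, stop index), scaled down for the demo
splits = [('train', 0, 10), ('validation', 10, 15), ('test', 15, 20)]
for split, start, stop in splits:
    for prefix in ('cat', 'dog'):
        dst_dir = os.path.join(base_dir, split, prefix + 's')
        os.makedirs(dst_dir, exist_ok=True)
        copy_images(prefix, start, stop, original_dataset_dir, dst_dir)

print(len(os.listdir(os.path.join(base_dir, 'train', 'cats'))))  # 10
```

With the real dataset, the same helper would be called with the ranges from the cells above.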
Note: verifying the images
In [17]:
# As a sanity check, count how many images each split (training / validation / test) contains.
print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total validation cat images:', len(os.listdir(validation_cats_dir)))
print('total validation dog images:', len(os.listdir(validation_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))
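Given the index ranges used in the copy cells, each class should end up with 1000 training, 500 validation, and 500 test images. Counts alone do not show that the splits are disjoint, so a second check can look for file names that appear in more than one split. The sketch below is an optional addition, not part of the notes: find_overlaps is a hypothetical helper, and the demo folders deliberately contain one duplicate.

```python
import os
import tempfile

def find_overlaps(split_dirs):
    """Return the file names that occur in more than one directory of split_dirs."""
    first_seen = {}   # file name -> split where it was first found
    overlaps = set()
    for split, directory in split_dirs.items():
        for fname in os.listdir(directory):
            if fname in first_seen and first_seen[fname] != split:
                overlaps.add(fname)
            else:
                first_seen[fname] = split
    return overlaps

# Hypothetical demo: three folders, with cat.2.jpg placed in two of them on purpose
root = tempfile.mkdtemp()
layout = {
    'train': ['cat.0.jpg', 'cat.1.jpg', 'cat.2.jpg'],
    'validation': ['cat.2.jpg', 'cat.3.jpg'],
    'test': ['cat.4.jpg'],
}
split_dirs = {}
for split, names in layout.items():
    split_dirs[split] = os.path.join(root, split)
    os.mkdir(split_dirs[split])
    for name in names:
        with open(os.path.join(split_dirs[split], name), 'wb'):
            pass

print(find_overlaps(split_dirs))  # {'cat.2.jpg'}
```

On the directories built in these notes, one would pass the three cat folders (and separately the three dog folders) and expect an empty set.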