Partially updated on April 24 for torchvision 0.2.2.post3.
torchvision is a handy utility library for image operations that ships separately from pytorch.
A detailed introduction to torchvision is available at: https://pypi.org/project/torchvision/
torchvision mainly consists of the following packages:
- vision.datasets : several commonly used vision datasets, which can be downloaded and loaded; the main advanced use here is reading the source code to learn how to write your own Dataset subclass
- vision.models : popular model architectures, e.g. AlexNet, VGG, ResNet, and DenseNet, along with pretrained weights.
- vision.transforms : common image operations, e.g. random cropping, rotation, data type conversion, image to tensor, numpy array to tensor, tensor to image, etc.
- vision.utils : for saving tensors of shape (3 x H x W) to disk; given a mini-batch of images, it can produce an image grid.
Installation
Anaconda:
conda install torchvision -c pytorch
pip:
pip install torchvision
Since this package is indispensable for image processing with pytorch,
the one-stop Anaconda distribution is the first choice for anyone who will be working with torch; after all, life is short.
(anaconda + vscode + pytorch is a very pleasant combination) Highly recommended!
The following is translated from: https://pytorch.org/docs/master/torchvision/
Datasets: torchvision.datasets
The following datasets are included:
Every dataset exposes two API methods: __getitem__ and __len__. They are all subclasses of torch.utils.data.Dataset, so when we implement our own Dataset we must implement at least these two methods (see the sketch after the example below).
As a result, they can all use the multi-worker loading (python multiprocessing) provided by torch.utils.data.DataLoader.
For example:
torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)
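As a minimal sketch of what such a subclass looks like (the data here is a made-up in-memory example, not a real dataset):
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Toy dataset: sample i is the pair (i, i*i)."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index * index)])
        return x, y

    def __len__(self):
        return self.n

# works with DataLoader's batching and multiprocessing out of the box
loader = DataLoader(SquaresDataset(100), batch_size=10, shuffle=True, num_workers=2)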
The constructor API differs slightly from dataset to dataset, but they all accept the following arguments:
- transform - a function that takes an image and returns a transformed version
- common operations such as ToTensor and RandomCrop; they can be chained together with transforms.Compose (see the transforms section below)
- target_transform - a function that transforms the target. For example, it might take a caption and return an encoded tensor (a tensor of word indices).
MNIST
dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)
root: directory of the dataset, containing processed/training.pt and processed/test.pt
train: True - use the training set, False - use the test set.
transform: transform applied to the input image
target_transform: transform applied to the target (the class label)
download: whether to download the MNIST dataset
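A minimal usage sketch (the root path './data' is arbitrary, and the target_transform shown is just a placeholder to illustrate where it hooks in):
import torchvision.datasets as dset
import torchvision.transforms as transforms

mnist = dset.MNIST(root='./data', train=True,
                   transform=transforms.ToTensor(),   # PIL image -> FloatTensor in [0, 1]
                   target_transform=lambda y: y,      # identity placeholder; swap in a real label transform
                   download=True)
img, label = mnist[0]
print(img.size(), label)   # torch.Size([1, 28, 28]) and the digit label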
COCO
This requires the COCO API to be installed
Captions:
dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])
Example:
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())
print('Number of samples: ', len(cap))
img, target = cap[3]  # load 4th sample
print("Image Size: ", img.size())
print(target)
Output:
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.', u'A plane darts across a bright blue sky behind a mountain covered in snow', u'A plane leaves a contrail above the snowy mountain top.', u'A mountain that has a plane flying overheard in the distance.', u'A mountain view with a plume of smoke in the background']
Detection:
dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])
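A usage sketch analogous to the captions example above (paths are placeholders; here the target is the list of COCO annotation dicts for the image):
import torchvision.datasets as dset
import torchvision.transforms as transforms

det = dset.CocoDetection(root='dir where images are',
                         annFile='json annotation file',
                         transform=transforms.ToTensor())
img, target = det[0]            # target: list of annotation dicts (bbox, category_id, ...)
print(img.size(), len(target))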
LSUN
dset.LSUN(db_path, classes='train', [transform, target_transform])
- db_path = root directory for the database files
- classes =
- 'train' - all categories, training set
- 'val' - all categories, validation set
- 'test' - all categories, test set
- ['bedroom_train', 'church_train', …] : a list of categories to load
CIFAR
dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)
dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)
- root : root directory of dataset where there is folder cifar-10-batches-py
- train : True = Training set, False = Test set
- download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.
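For example (a sketch; './data' is an arbitrary root path):
import torchvision.datasets as dset
import torchvision.transforms as transforms

cifar = dset.CIFAR10(root='./data', train=True,
                     transform=transforms.ToTensor(), download=True)
print(len(cifar))            # 50000 training images
img, label = cifar[0]
print(img.size(), label)     # torch.Size([3, 32, 32]) and a class index in 0-9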
STL10
dset.STL10(root, split='train', transform=None, target_transform=None, download=False)
- root : root directory of dataset where there is folder stl10_binary
- split : 'train' = Training set, 'test' = Test set, 'unlabeled' = Unlabeled set, 'train+unlabeled' = Training + Unlabeled set (missing labels marked as -1)
- download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.
SVHN
dset.SVHN(root, split='train', transform=None, target_transform=None, download=False)
- root : root directory of dataset where there is folder SVHN
- split : 'train' = Training set, 'test' = Test set, 'extra' = Extra training set
- download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.
ImageFolder
A generic data loader where the images are arranged in this way:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
dset.ImageFolder(root="root folder path", [transform, target_transform])
ImageFolder has the following members:
- self.classes - the list of class names
- self.class_to_idx - a mapping from class name to class index, e.g. 'dog' --> 1
- self.imgs - a list of (image path, class-index) tuples.
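A short sketch against the directory layout shown above (the root path is a placeholder):
import torchvision.datasets as dset
import torchvision.transforms as transforms

imgs = dset.ImageFolder(root='root folder path',
                        transform=transforms.ToTensor())
print(imgs.classes)          # e.g. ['cat', 'dog']
print(imgs.class_to_idx)     # e.g. {'cat': 0, 'dog': 1}
print(imgs.imgs[0])          # e.g. ('root folder path/cat/123.png', 0)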
Imagenet-12
This is simply implemented with an ImageFolder dataset.
The data is preprocessed as described here
PhotoTour
Learning Local Image Descriptors Data http://phototour.cs.washington.edu/patches/default.htm
import torchvision.datasets as dset
import torchvision.transforms as transforms
dataset = dset.PhotoTour(root='dir where images are',
                         name='name of the dataset to load',
                         transform=transforms.ToTensor())
print('Loaded PhotoTour: {} with {} images.'
      .format(dataset.name, len(dataset.data)))
Models
The models subpackage contains the following model architectures:
Each architecture may come in several variants; ResNet, for example, has 18-, 34-, 50-, 101-, and 152-layer versions.
The value of these mature models is that you can find their source code under the torchvision installation path (locate it with print(torchvision.models.__file__), e.g. 'd:\\Anaconda3\\lib\\site-packages\\torchvision\\models\\__init__.py')
and study how these excellent models are built.
You can construct a model with randomly initialized parameters:
import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
Pretrained weights are provided for ResNet, SqueezeNet 1.0 and 1.1, and AlexNet through the PyTorch model zoo. Pass pretrained=True to the constructor:
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
All pretrained models expect input normalized in the same way: mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
The images must be scaled to the range [0, 1] and then normalized with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
A complete example is the imagenet example here: https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101
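Putting the two requirements together, a minimal inference sketch could look like this (the image path is a placeholder, and Scale follows the older transform naming used throughout this article):
import torch
from PIL import Image
import torchvision.models as models
import torchvision.transforms as transforms

preprocess = transforms.Compose([
    transforms.Scale(256),            # resize so the shorter edge is 256
    transforms.CenterCrop(224),       # crop the 224x224 center
    transforms.ToTensor(),            # PIL image -> FloatTensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True)
model.eval()                                     # inference mode
img = preprocess(Image.open('some_image.jpg'))   # placeholder path
output = model(img.unsqueeze(0))                 # add the batch dimension
print(output.max(1)[1])                          # predicted class index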
Transforms
Transforms are common image transformations. They can be chained together with transforms.Compose:
transforms.Compose
You can compose several transforms together, for example:
transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
Transforms on PIL.Image
Scale(size, interpolation=Image.BILINEAR)
Rescales the input PIL.Image to the given size, where 'size' is the size of the shorter edge.
For example, if height > width, the image will be rescaled to (size * height / width, size). - size: size of the shorter edge - interpolation: Default: PIL.Image.BILINEAR
CenterCrop(size) - crops the image at the center to the given size
Crops the given PIL.Image at the center to the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target is a square of shape (size, size).
RandomCrop(size, padding=0)
Crops the given PIL.Image at a random location to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size) If padding is non-zero, then the image is first zero-padded on each side with padding pixels.
RandomHorizontalFlip()
Randomly flips the given PIL.Image horizontally with a probability of 0.5.
RandomSizedCrop(size, interpolation=Image.BILINEAR)
Randomly crops the given PIL.Image to a random size of (0.08 to 1.0) of the original size and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio
This is popularly used to train the Inception networks - size: size of the smaller edge - interpolation: Default: PIL.Image.BILINEAR
Pad(padding, fill=0)
Pads the given image on each side with padding number of pixels, and the padding pixels are filled with pixel value fill. If a 5x5 image is padded with padding=1 then it becomes 7x7
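A quick check of the 5x5 -> 7x7 claim (using a blank synthetic image):
from PIL import Image
import torchvision.transforms as transforms

img = Image.new('RGB', (5, 5))      # blank 5x5 image
padded = transforms.Pad(1)(img)     # 1 pixel of zero padding on each side
print(padded.size)                  # (7, 7)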
Transforms on torch.*Tensor
Normalize(mean, std)
Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e. channel = (channel - mean) / std
Conversion transforms
- ToTensor() - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
- ToPILImage() - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and shape H x W x C to a PIL.Image of range [0, 255]
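A round-trip sketch of these two conversions (a blank synthetic image stands in for real data):
from PIL import Image
import torchvision.transforms as transforms

pil = Image.new('RGB', (640, 427))
t = transforms.ToTensor()(pil)       # shape (3, 427, 640), range [0.0, 1.0]
print(t.size())
back = transforms.ToPILImage()(t)    # back to a PIL.Image in range [0, 255]
print(back.size)                     # (640, 427) -- PIL reports (W, H)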
Generic transforms
Lambda(lambda)
Given a Python lambda, applies it to the input img and returns it. For example:
transforms.Lambda(lambda x: x.add(10))
Utility functions
make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False)
Given a 4D mini-batch Tensor of shape (B x C x H x W), or a list of images all of the same size, makes a grid of images
normalize=True will shift the image to the range (0, 1), by subtracting the minimum and dividing by the maximum pixel value.
if range=(min, max) where min and max are numbers, then these numbers are used to normalize the image.
scale_each=True will scale each image in the batch of images separately rather than computing the (min, max) over all images.
Example usage is given in this notebook <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>
save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False)
Saves a given Tensor into an image file.
If given a mini-batch tensor, will save the tensor as a grid of images.
All options after filename are passed through to make_grid. Refer to its documentation for more details.
Very handy for saving stitched grids of output images.
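For example (a sketch using a random fake batch in place of real images):
import torch
import torchvision.utils as vutils

batch = torch.rand(16, 3, 64, 64)                   # fake mini-batch of 16 RGB images
grid = vutils.make_grid(batch, nrow=4, padding=2)   # one (3, H, W) grid tensor
print(grid.size())                                  # torch.Size([3, 266, 266])
vutils.save_image(batch, 'grid.png', nrow=4)        # same grid, written straight to disk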
I didn't expect this post to get so many views, so I decided to update it.
Image backend: since images need to be read and processed, an imaging library is required. torchvision now supports several image-loading backends, and you can switch between them.
The relevant functions are:
torchvision.get_image_backend()         # get the name of the current image backend
torchvision.set_image_backend(backend)  # change the image backend
# backend (string) – the name of the image backend: one of {'PIL', 'accimage'}.
The accimage package uses the Intel IPP library. It is faster than PIL but does not support as many image operations.
Since this is a later addition with limited everyday use, just knowing it exists is enough.