Partially updated on April 24 for torchvision 0.2.2.post3.
torchvision is a handy utility library for image operations that ships separately from pytorch.
A detailed introduction to torchvision is available at: https://pypi.org/project/torchvision/
torchvision mainly consists of the following packages:
- vision.datasets : several commonly used vision datasets, which can be downloaded and loaded; the main advanced use here is reading the source code to learn how to write your own Dataset subclass
- vision.models : popular model architectures, e.g. AlexNet, VGG, ResNet, and DenseNet, along with pretrained weights.
- vision.transforms : common image operations, e.g. random cropping, rotation, data type conversion, image to tensor, numpy array to tensor, tensor to image, etc.
- vision.utils : for saving tensors of shape (3 x H x W) to disk; given a mini-batch of images, it can produce an image grid.
Installation
Anaconda:
conda install torchvision -c pytorch
pip:
pip install torchvision
Since this package is indispensable for image processing with pytorch,
the one-stop Anaconda distribution is the first choice for anyone who will be working with torch; after all, life is short.
(anaconda + vscode + pytorch is a very pleasant combination) Highly recommended!
The following is translated from: https://pytorch.org/docs/master/torchvision/
Datasets: torchvision.datasets
The following datasets are included:
Every dataset exposes two API methods: __getitem__ and __len__. They are all subclasses of torch.utils.data.Dataset, so when we implement our own Dataset we must implement at least these two methods (see the sketch after the example below).
As a result, they can all use the multi-worker loading (python multiprocessing) provided by torch.utils.data.DataLoader.
For example:
torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)
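As a minimal sketch of what such a subclass looks like (the data here is a made-up in-memory example, not a real dataset):
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Toy dataset: sample i is the pair (i, i*i)."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index * index)])
        return x, y

    def __len__(self):
        return self.n

# works with DataLoader's batching and multiprocessing out of the box
loader = DataLoader(SquaresDataset(100), batch_size=10, shuffle=True, num_workers=2)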
The constructor API differs slightly from dataset to dataset, but they all accept the following arguments:
- transform - a function that takes an image and returns a transformed version
- common operations such as ToTensor and RandomCrop; they can be chained together with transforms.Compose (see the transforms section below)
- target_transform - a function that transforms the target. For example, it might take a caption and return an encoded tensor (a tensor of word indices).
MNIST
dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)
root: directory of the dataset, containing processed/training.pt and processed/test.pt
train: True - use the training set, False - use the test set.
transform: transform applied to the input image
target_transform: transform applied to the target (the class label)
download: whether to download the MNIST dataset
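A minimal usage sketch (the root path './data' is arbitrary, and the target_transform shown is just a placeholder to illustrate where it hooks in):
import torchvision.datasets as dset
import torchvision.transforms as transforms

mnist = dset.MNIST(root='./data', train=True,
                   transform=transforms.ToTensor(),   # PIL image -> FloatTensor in [0, 1]
                   target_transform=lambda y: y,      # identity placeholder; swap in a real label transform
                   download=True)
img, label = mnist[0]
print(img.size(), label)   # torch.Size([1, 28, 28]) and the digit label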
COCO
This requires the COCO API to be installed
Captions:
dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])
Example:
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())
print('Number of samples: ', len(cap))
img, target = cap[3]  # load 4th sample
print("Image Size: ", img.size())
print(target)
Output:
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.', u'A plane darts across a bright blue sky behind a mountain covered in snow', u'A plane leaves a contrail above the snowy mountain top.', u'A mountain that has a plane flying overheard in the distance.', u'A mountain view with a plume of smoke in the background']
Detection:
dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])
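A usage sketch analogous to the captions example above (paths are placeholders; here the target is the list of COCO annotation dicts for the image):
import torchvision.datasets as dset
import torchvision.transforms as transforms

det = dset.CocoDetection(root='dir where images are',
                         annFile='json annotation file',
                         transform=transforms.ToTensor())
img, target = det[0]            # target: list of annotation dicts (bbox, category_id, ...)
print(img.size(), len(target))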
LSUN
dset.LSUN(db_path, classes='train', [transform, target_transform])
- db_path = root directory for the database files
- classes =
- 'train' - all categories, training set
- 'val' - all categories, validation set
- 'test' - all categories, test set
- ['bedroom_train', 'church_train', …] : a list of categories to load
CIFAR
dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)
dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)
- root : root directory of dataset where there is folder cifar-10-batches-py
- train : True = Training set, False = Test set
- download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.
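For example (a sketch; './data' is an arbitrary root path):
import torchvision.datasets as dset
import torchvision.transforms as transforms

cifar = dset.CIFAR10(root='./data', train=True,
                     transform=transforms.ToTensor(), download=True)
print(len(cifar))            # 50000 training images
img, label = cifar[0]
print(img.size(), label)     # torch.Size([3, 32, 32]) and a class index in 0-9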
STL10
dset.STL10(root, split='train', transform=None, target_transform=None, download=False)
- root : root directory of dataset where there is folder stl10_binary
- split : 'train' = Training set, 'test' = Test set, 'unlabeled' = Unlabeled set, 'train+unlabeled' = Training + Unlabeled set (missing labels marked as -1)
- download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.
SVHN
dset.SVHN(root, split='train', transform=None, target_transform=None, download=False)
- root : root directory of dataset where there is folder SVHN
- split : 'train' = Training set, 'test' = Test set, 'extra' = Extra training set
- download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.
ImageFolder
A generic data loader where the images are arranged in this way:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
dset.ImageFolder(root="root folder path", [transform, target_transform])
ImageFolder has the following members:
- self.classes - the list of class names
- self.class_to_idx - a mapping from class name to class index, e.g. 'dog' --> 1
- self.imgs - a list of (image path, class-index) tuples.
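A short sketch against the directory layout shown above (the root path is a placeholder):
import torchvision.datasets as dset
import torchvision.transforms as transforms

imgs = dset.ImageFolder(root='root folder path',
                        transform=transforms.ToTensor())
print(imgs.classes)          # e.g. ['cat', 'dog']
print(imgs.class_to_idx)     # e.g. {'cat': 0, 'dog': 1}
print(imgs.imgs[0])          # e.g. ('root folder path/cat/123.png', 0)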
Imagenet-12
This is simply implemented with an ImageFolder dataset.
The data is preprocessed as described here
PhotoTour
Learning Local Image Descriptors Data http://phototour.cs.washington.edu/patches/default.htm
import torchvision.datasets as dset
import torchvision.transforms as transforms
dataset = dset.PhotoTour(root='dir where images are',
                         name='name of the dataset to load',
                         transform=transforms.ToTensor())
print('Loaded PhotoTour: {} with {} images.'
      .format(dataset.name, len(dataset.data)))
Models
The models subpackage contains the following model architectures:
Each architecture may come in several variants; ResNet, for example, has 18-, 34-, 50-, 101-, and 152-layer versions.
The value of these mature models is that you can find their source code under the torchvision installation path (locate it with print(torchvision.models.__file__), e.g. 'd:\\Anaconda3\\lib\\site-packages\\torchvision\\models\\__init__.py')
and study how these excellent models are built.
You can construct a model with randomly initialized parameters:
import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
Pretrained weights are provided for ResNet, SqueezeNet 1.0 and 1.1, and AlexNet through the PyTorch model zoo. Pass pretrained=True to the constructor:
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
All pretrained models expect input normalized in the same way: mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
The images must be scaled to the range [0, 1] and then normalized with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
A complete example is the imagenet example here: https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101
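Putting the two requirements together, a minimal inference sketch could look like this (the image path is a placeholder, and Scale follows the older transform naming used throughout this article):
import torch
from PIL import Image
import torchvision.models as models
import torchvision.transforms as transforms

preprocess = transforms.Compose([
    transforms.Scale(256),            # resize so the shorter edge is 256
    transforms.CenterCrop(224),       # crop the 224x224 center
    transforms.ToTensor(),            # PIL image -> FloatTensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(pretrained=True)
model.eval()                                     # inference mode
img = preprocess(Image.open('some_image.jpg'))   # placeholder path
output = model(img.unsqueeze(0))                 # add the batch dimension
print(output.max(1)[1])                          # predicted class index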
Transforms
Transforms are common image transformations. They can be chained together with transforms.Compose:
transforms.Compose
You can compose several transforms together, for example:
transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
Transforms on PIL.Image
Scale(size, interpolation=Image.BILINEAR)
Rescales the input PIL.Image to the given size, where 'size' is the size of the shorter edge.
For example, if height > width, the image will be rescaled to (size * height / width, size). - size: size of the shorter edge - interpolation: Default: PIL.Image.BILINEAR
CenterCrop(size) - crops the image at the center to the given size
Crops the given PIL.Image at the center to the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target is a square of shape (size, size).
RandomCrop(size, padding=0)
Crops the given PIL.Image at a random location to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size) If padding is non-zero, then the image is first zero-padded on each side with padding pixels.
RandomHorizontalFlip()
Randomly flips the given PIL.Image horizontally with a probability of 0.5.
RandomSizedCrop(size, interpolation=Image.BILINEAR)
Randomly crops the given PIL.Image to a random size of (0.08 to 1.0) of the original size and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio
This is popularly used to train the Inception networks - size: size of the smaller edge - interpolation: Default: PIL.Image.BILINEAR
Pad(padding, fill=0)
Pads the given image on each side with padding number of pixels, and the padding pixels are filled with pixel value fill. If a 5x5 image is padded with padding=1 then it becomes 7x7
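A quick check of the 5x5 -> 7x7 claim (using a blank synthetic image):
from PIL import Image
import torchvision.transforms as transforms

img = Image.new('RGB', (5, 5))      # blank 5x5 image
padded = transforms.Pad(1)(img)     # 1 pixel of zero padding on each side
print(padded.size)                  # (7, 7)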
Transforms on torch.*Tensor
Normalize(mean, std)
Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e. channel = (channel - mean) / std
Conversion transforms
- ToTensor() - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
- ToPILImage() - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and shape H x W x C to a PIL.Image of range [0, 255]
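A round-trip sketch of these two conversions (a blank synthetic image stands in for real data):
from PIL import Image
import torchvision.transforms as transforms

pil = Image.new('RGB', (640, 427))
t = transforms.ToTensor()(pil)       # shape (3, 427, 640), range [0.0, 1.0]
print(t.size())
back = transforms.ToPILImage()(t)    # back to a PIL.Image in range [0, 255]
print(back.size)                     # (640, 427) -- PIL reports (W, H)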
Generic transforms
Lambda(lambda)
Given a Python lambda, applies it to the input img and returns it. For example:
transforms.Lambda(lambda x: x.add(10))
Utility functions
make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False)
Given a 4D mini-batch Tensor of shape (B x C x H x W), or a list of images all of the same size, makes a grid of images
normalize=True will shift the image to the range (0, 1), by subtracting the minimum and dividing by the maximum pixel value.
if range=(min, max) where min and max are numbers, then these numbers are used to normalize the image.
scale_each=True will scale each image in the batch of images separately rather than computing the (min, max) over all images.
Example usage is given in this notebook <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>
save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False)
Saves a given Tensor into an image file.
If given a mini-batch tensor, will save the tensor as a grid of images.
All options after filename are passed through to make_grid. Refer to its documentation for more details.
Very handy for saving stitched grids of output images.
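For example (a sketch using a random fake batch in place of real images):
import torch
import torchvision.utils as vutils

batch = torch.rand(16, 3, 64, 64)                   # fake mini-batch of 16 RGB images
grid = vutils.make_grid(batch, nrow=4, padding=2)   # one (3, H, W) grid tensor
print(grid.size())                                  # torch.Size([3, 266, 266])
vutils.save_image(batch, 'grid.png', nrow=4)        # same grid, written straight to disk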
I didn't expect this post to get so many views, so I decided to update it.
Image backend: since images need to be read and processed, an imaging library is required. torchvision now supports several image-loading backends, and you can switch between them.
The relevant functions are:
torchvision.get_image_backend()         # get the name of the current image backend
torchvision.set_image_backend(backend)  # change the image backend
# backend (string) – the name of the image backend: one of {'PIL', 'accimage'}.
The accimage package uses the Intel IPP library. It is faster than PIL but does not support as many image operations.
Since this is a later addition with limited everyday use, just knowing it exists is enough.