Introduction to the torchvision Library (Translation)


Partially updated on April 24 for torchvision 0.2.2.post3

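torchvision is a package, distributed separately from PyTorch, that provides convenient utilities for working with images.

A detailed description of torchvision is at: https://pypi.org/project/torchvision/

torchvision mainly consists of the following subpackages: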

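  • torchvision.datasets : several commonly used vision datasets that can be downloaded and loaded. The main advanced use is reading the source to learn how to write your own Dataset subclass.
  • torchvision.models : popular model architectures, e.g. AlexNet, VGG, ResNet and DenseNet, along with pretrained weights.
  • torchvision.transforms : common image operations, e.g. random cropping, rotation, data type conversion, image to tensor, numpy array to tensor, and tensor to image.
  • torchvision.utils : saves a tensor of shape (3 x H x W) to disk as an image, and builds an image grid from a mini-batch of images.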

Installation

Anaconda:

conda install torchvision -c pytorch

pip:

pip install torchvision

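This package goes hand in hand with PyTorch and is essential for image processing.
If you plan to keep using torch, the all-in-one Anaconda distribution is the first choice; life is short, after all.
(Anaconda + VS Code + PyTorch works very well together.) Highly recommended!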


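The following is translated from: https://pytorch.org/docs/master/torchvision/

Datasets: torchvision.datasets

The package includes the datasets described in the sections below.

All datasets expose the same API, __getitem__ and __len__, and all of them are subclasses of torch.utils.data.Dataset. When implementing your own Dataset, you must therefore implement at least these two methods.

Hence, they can all be passed to torch.utils.data.DataLoader, which can load samples in parallel using multiple worker processes (Python multiprocessing).

For example: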

torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)

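The constructors differ slightly from dataset to dataset, but they all accept the following arguments:

  • transform - a function that takes an image and returns a transformed version. Common choices are ToTensor, RandomCrop, etc., and they can be combined with transforms.Compose (see the Transforms section below).
  • target_transform - a function that takes the target and transforms it. For example, take a caption string and return an encoded tensor (a tensor of word indices).
Because every dataset takes similar arguments, once you know one of them the rest are easy to pick up.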
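
To make the two required methods and the two transform arguments concrete, here is a minimal sketch of a custom dataset. The class name CaptionPairs, the toy vocabulary, and the helper functions are hypothetical and exist only for illustration; the fallback base class is there so the sketch also runs where PyTorch is not installed.

```python
try:
    from torch.utils.data import Dataset  # the real base class, if PyTorch is installed
except ImportError:
    Dataset = object  # fallback so this sketch still runs without PyTorch


class CaptionPairs(Dataset):
    """Hypothetical in-memory dataset of (image, caption) pairs."""

    def __init__(self, samples, transform=None, target_transform=None):
        self.samples = samples
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        image, caption = self.samples[index]
        if self.transform is not None:
            image = self.transform(image)
        if self.target_transform is not None:
            caption = self.target_transform(caption)
        return image, caption

    def __len__(self):
        return len(self.samples)


# Toy vocabulary (an assumption) used to turn a caption into word indices.
VOCAB = {"a": 0, "plane": 1, "over": 2, "mountain": 3}


def encode(caption):
    return [VOCAB[word] for word in caption.lower().split()]


def scale_to_unit(image):
    # Stand-in for ToTensor-style scaling: pixel values from [0, 255] to [0.0, 1.0].
    return [pixel / 255.0 for pixel in image]


data = CaptionPairs(
    samples=[([0, 255], "a plane"), ([51, 102], "a mountain")],
    transform=scale_to_unit,
    target_transform=encode,
)
first_image, first_target = data[0]
```

With PyTorch installed, an object like this can be handed to torch.utils.data.DataLoader in the same way as the built-in datasets.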

MNIST

dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)

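root: the data directory, which contains processed/training.pt and processed/test.pt

train: True = use the training set, False = use the test set.

transform: a transform applied to the input image

target_transform: a transform applied to the target (the class label)

download: whether to download the MNIST dataset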

COCO

This requires the COCO API to be installed

Captions:

dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])

Example:

import torchvision.datasets as dset
import torchvision.transforms as transforms

cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3]  # load 4th sample

print("Image Size: ", img.size())
print(target)

Output:

Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']

Detection:

dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])

LSUN

dset.LSUN(db_path, classes='train', [transform, target_transform])

  • db_path = root directory for the database files
  • classes = one of:
      • 'train' - all categories, training set
      • 'val' - all categories, validation set
      • 'test' - all categories, test set
      • ['bedroom_train', 'church_train', …] : a list of categories to load

CIFAR

dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)

dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)

  • root : root directory of dataset where there is folder cifar-10-batches-py
  • train : True = Training set, False = Test set
  • download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.

STL10

dset.STL10(root, split='train', transform=None, target_transform=None, download=False)

  • root : root directory of dataset where there is folder stl10_binary

  • split : 'train' = Training set, 'test' = Test set, 'unlabeled' = Unlabeled set, 'train+unlabeled' = Training + Unlabeled set (missing label marked as -1)

  • download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.

SVHN

dset.SVHN(root, split='train', transform=None, target_transform=None, download=False)

  • root : root directory of dataset where there is folder SVHN

  • split : 'train' = Training set, 'test' = Test set, 'extra' = Extra training set

  • download : True = downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, does not do anything.

ImageFolder

A generic data loader where the images are arranged in this way:

root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png

dset.ImageFolder(root="root folder path", [transform, target_transform])

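ImageFolder has the following members:

  • self.classes - the class names as a list
  • self.class_to_idx - a dict mapping each class name to its integer class index, e.g. {'cat': 0, 'dog': 1} for the layout above
  • self.imgs - a list of (image path, class-index) tuples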
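
How these three members relate to the directory layout can be sketched with the standard library alone. This is not the ImageFolder implementation; the function name scan_image_folder is hypothetical, and the sketch assumes one level of class folders and does not filter by file extension.

```python
import os
import tempfile


def scan_image_folder(root):
    """Return (classes, class_to_idx, imgs) for a root/<class>/<file> layout."""
    classes = sorted(
        entry for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )
    class_to_idx = {name: index for index, name in enumerate(classes)}
    imgs = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for filename in sorted(os.listdir(class_dir)):
            imgs.append((os.path.join(class_dir, filename), class_to_idx[name]))
    return classes, class_to_idx, imgs


# Recreate a small version of the layout in a temporary directory,
# using empty placeholder files instead of real images.
with tempfile.TemporaryDirectory() as root:
    for name, files in {"dog": ["xxx.png", "xxy.png"], "cat": ["123.png"]}.items():
        os.makedirs(os.path.join(root, name))
        for filename in files:
            open(os.path.join(root, name, filename), "w").close()
    classes, class_to_idx, imgs = scan_image_folder(root)
```

Class names are sorted before indices are assigned, which is why 'cat' receives index 0 and 'dog' index 1.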

Imagenet-12

This is simply implemented with an ImageFolder dataset.

The data is preprocessed as described here

Here is an example.

PhotoTour

Learning Local Image Descriptors Data http://phototour.cs.washington.edu/patches/default.htm

import torchvision.datasets as dset
import torchvision.transforms as transforms

dataset = dset.PhotoTour(root='dir where images are',
                         name='name of the dataset to load',
                         transform=transforms.ToTensor())

print('Loaded PhotoTour: {} with {} images.'
      .format(dataset.name, len(dataset.data)))

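Models

The models subpackage contains definitions of several model architectures.

Each architecture may come in several variants; ResNet, for example, is available with 18, 34, 50, 101 and 152 layers.

Because these mature models ship with torchvision, you can find their source under the torchvision install path and study how well-designed models are built. To locate it, run:

print(torchvision.models.__file__)   # e.g. 'd:\\Anaconda3\\lib\\site-packages\\torchvision\\models\\__init__.py'

You can construct a model with random weights by calling its constructor: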

import torchvision.models as models

resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()

We provide pre-trained weights for the ResNet variants, SqueezeNet 1.0 and 1.1, and AlexNet, using the PyTorch model zoo. These can be loaded by passing pretrained=True to the constructor:

import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)

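All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.

The images have to be loaded into the range [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].

An example of such normalization can be found in the imagenet example here <https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101>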

Transforms

Transforms are common image transformations. They can be chained together using transforms.Compose:

transforms.Compose

You can compose several transforms together, for example:

transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

Transforms on PIL.Image

Scale(size, interpolation=Image.BILINEAR)

Rescales the input PIL.Image to the given 'size'. 'size' is the size of the smaller edge.

For example, if height > width, the image is rescaled to (size * height / width, size).

  • size: size of the smaller edge
  • interpolation: Default: PIL.Image.BILINEAR
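
The smaller-edge rule can be checked with a little arithmetic. The helper below is a sketch, not the library code: the name scaled_size is hypothetical, it returns sizes in PIL's (width, height) order, and truncating the longer edge to an integer is an assumption.

```python
def scaled_size(width, height, size):
    """Return (new_width, new_height) with the smaller edge equal to `size`."""
    if width <= height:
        # Width is the smaller edge: fix it to `size`, scale the height.
        return size, int(size * height / width)
    # Height is the smaller edge: fix it to `size`, scale the width.
    return int(size * width / height), size


portrait = scaled_size(width=300, height=600, size=224)   # height > width
landscape = scaled_size(width=640, height=427, size=224)  # width > height
```

For the 300 x 600 portrait image the result is 224 wide and 448 high, which matches (size * height / width, size) read as (height, width).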

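CenterCrop(size) - center-crops the image to the given size

Crops the given PIL.Image at the center to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size).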
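
As a sketch of how the two accepted forms of size translate into a crop region, the hypothetical helper below computes a (left, upper, right, lower) box in the style of PIL's Image.crop. The rounding of the offsets is an assumption.

```python
def center_crop_box(width, height, size):
    """Return the (left, upper, right, lower) box of a centered crop."""
    if isinstance(size, int):
        target_height, target_width = size, size  # integer means a square crop
    else:
        target_height, target_width = size        # tuple is (height, width)
    left = int(round((width - target_width) / 2.0))
    upper = int(round((height - target_height) / 2.0))
    return left, upper, left + target_width, upper + target_height


square_box = center_crop_box(width=640, height=480, size=224)
tuple_box = center_crop_box(width=640, height=480, size=(100, 200))
```

Note that the tuple form is ordered (target_height, target_width), the reverse of PIL's (width, height) convention.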

RandomCrop(size, padding=0)

Crops the given PIL.Image at a random location to have a region of the given size. size can be a tuple (target_height, target_width) or an integer, in which case the target will be of a square shape (size, size). If padding is non-zero, then the image is first zero-padded on each side with padding pixels.

RandomHorizontalFlip()

Randomly horizontally flips the given PIL.Image with a probability of 0.5.

RandomSizedCrop(size, interpolation=Image.BILINEAR)

Randomly crops the given PIL.Image to a random size of 0.08 to 1.0 of the original size and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio.

This is popularly used to train the Inception networks.

  • size: size of the smaller edge
  • interpolation: Default: PIL.Image.BILINEAR
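
The sampling rule can be sketched as follows. This follows the wording above literally (area fraction in 0.08 to 1.0, aspect ratio 3/4 to 4/3 of the original ratio) and is not taken from the library source; the function name, the number of attempts, and the centered-square fallback are all assumptions. The final resize to size is omitted.

```python
import math
import random


def sample_crop_box(width, height, rng, attempts=10):
    """Sample (left, upper, crop_width, crop_height) inside a width x height image."""
    area = width * height
    original_ratio = width / height
    for _ in range(attempts):
        target_area = rng.uniform(0.08, 1.0) * area
        aspect_ratio = original_ratio * rng.uniform(3.0 / 4.0, 4.0 / 3.0)
        crop_width = int(round(math.sqrt(target_area * aspect_ratio)))
        crop_height = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < crop_width <= width and 0 < crop_height <= height:
            left = rng.randint(0, width - crop_width)
            upper = rng.randint(0, height - crop_height)
            return left, upper, crop_width, crop_height
    # Fallback (an assumption of this sketch): a centered square crop.
    side = min(width, height)
    return (width - side) // 2, (height - side) // 2, side, side


rng = random.Random(0)  # seeded so the sketch is repeatable
boxes = [sample_crop_box(640, 427, rng) for _ in range(100)]
```

Candidates that do not fit inside the image are rejected and resampled, so every returned box lies within the image bounds.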

Pad(padding, fill=0)

Pads the given image on each side with padding number of pixels, and the padding pixels are filled with pixel value fill. If a 5x5 image is padded with padding=1 then it becomes 7x7
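
The 5x5 to 7x7 example can be reproduced on a plain nested list. This sketch stands in for a single-channel image; the name pad_image is hypothetical and no PIL objects are involved.

```python
def pad_image(rows, padding, fill=0):
    """Pad a 2-D list of pixel values on every side with `padding` pixels."""
    width = len(rows[0]) + 2 * padding
    top = [[fill] * width for _ in range(padding)]
    middle = [[fill] * padding + list(row) + [fill] * padding for row in rows]
    bottom = [[fill] * width for _ in range(padding)]
    return top + middle + bottom


image = [[1] * 5 for _ in range(5)]  # a 5x5 image of ones
padded = pad_image(image, padding=1)
```

Each side grows by padding pixels, so both dimensions grow by 2 * padding.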

Transforms on torch.*Tensor

Normalize(mean, std)

Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e. channel = (channel - mean) / std
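
The per-channel formula can be worked through on nested lists standing in for a C x H x W tensor. The function below is a sketch of the arithmetic only, not the library's Normalize.

```python
def normalize(channels, mean, std):
    """Apply channel = (channel - mean) / std to each channel of a C x H x W nested list."""
    return [
        [[(pixel - m) / s for pixel in row] for row in channel]
        for channel, m, s in zip(channels, mean, std)
    ]


# A 1 x 2 image with three channels; values are already in [0, 1].
image = [[[0.5, 1.0]], [[0.5, 1.0]], [[0.5, 1.0]]]

simple = normalize(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
imagenet = normalize(image,
                     mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225])
```

With mean and std both 0.5, values in [0, 1] map to [-1, 1]; the second call uses the mean and std quoted for the pre-trained models.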

Conversion Transforms

  • ToTensor() - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
  • ToPILImage() - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and shape H x W x C to a PIL.Image of range [0, 255]
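
The layout and range change performed by ToTensor can be illustrated on nested lists. This is a sketch of the idea only; the function name is hypothetical and no tensors or PIL images are created.

```python
def to_chw_unit_range(image_hwc):
    """Convert an H x W x C nested list in [0, 255] to C x H x W floats in [0.0, 1.0]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A 1 x 2 RGB image: one pure-red pixel and one white pixel.
image_hwc = [[[255, 0, 0], [255, 255, 255]]]
image_chw = to_chw_unit_range(image_hwc)
```

The channel axis moves to the front, and each value is divided by 255.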

Generic Transforms

Lambda(lambda)

Given a Python lambda, applies it to the input img and returns it. For example:

transforms.Lambda(lambda x: x.add(10)) 
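
Conceptually, a Lambda transform only wraps a callable so it can sit in a pipeline next to other transforms. The two classes below are minimal stand-ins written for illustration (LambdaSketch and ComposeSketch are hypothetical names, not the library classes), and they operate on plain numbers rather than tensors.

```python
class LambdaSketch:
    """Minimal stand-in for a transform that wraps an arbitrary callable."""

    def __init__(self, func):
        self.func = func

    def __call__(self, img):
        return self.func(img)


class ComposeSketch:
    """Minimal stand-in that applies a list of transforms in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for transform in self.transforms:
            img = transform(img)
        return img


add_ten = LambdaSketch(lambda x: x + 10)
pipeline = ComposeSketch([add_ten, LambdaSketch(lambda x: x * 2)])
result = pipeline(1)  # (1 + 10) * 2
```

Because every transform is just a callable, any function of one argument can be dropped into a pipeline this way.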

Utility Functions

make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False)

Given a 4D mini-batch Tensor of shape (B x C x H x W), or a list of images all of the same size, makes a grid of images

normalize=True will shift the image to the range (0, 1), by subtracting the minimum and dividing by the maximum pixel value.

if range=(min, max) where min and max are numbers, then these numbers are used to normalize the image.

scale_each=True will scale each image in the batch of images separately rather than computing the (min, max) over all images.
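
The interaction of normalize, range and scale_each can be sketched as a min-max rescaling on flat pixel lists. This illustrates the behaviour described above rather than the library code; the function names are hypothetical, and the sketch assumes the maximum is strictly greater than the minimum.

```python
def min_max_scale(values, low, high):
    """Clamp to [low, high], then shift and scale into [0, 1]."""
    return [(min(max(v, low), high) - low) / (high - low) for v in values]


def normalize_batch(images, value_range=None, scale_each=False):
    """`images` is a list of flat pixel lists; returns rescaled copies."""
    # With scale_each, every image forms its own group; otherwise one group.
    groups = [[img] for img in images] if scale_each else [images]
    out = []
    for group in groups:
        if value_range is not None:
            low, high = value_range
        else:
            pixels = [v for img in group for v in img]
            low, high = min(pixels), max(pixels)
        out.extend(min_max_scale(img, low, high) for img in group)
    return out


batch = [[0.0, 2.0], [4.0, 8.0]]
together = normalize_batch(batch)                       # one (min, max) for the batch
separately = normalize_batch(batch, scale_each=True)    # one (min, max) per image
fixed = normalize_batch(batch, value_range=(0.0, 4.0))  # caller-supplied (min, max)
```

In the last call the value 8.0 lies outside the given range, so it is clamped to 4.0 before scaling.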

Example usage is given in this notebook <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>

save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False)

Saves a given Tensor into an image file.

If given a mini-batch tensor, will save the tensor as a grid of images.

All options after filename are passed through to make_grid. Refer to its documentation for more details.

This is a convenient way to write out a tiled montage of images.





I did not expect this post to get so many views, so I am considering updating it.

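Image backend: reading and processing images requires an image library. torchvision now supports more than one image-loading library and lets you switch between them.

The relevant functions are:

torchvision.get_image_backend()   # get the name of the package used to load images

torchvision.set_image_backend(backend)   # set the package used to load images

# backend (string) - name of the image backend, one of {'PIL', 'accimage'}. The accimage package uses the Intel IPP library. It is generally faster than PIL, but does not support as many operations.

This is a low-level detail with little use in everyday work; knowing that it exists is enough.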

 

