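torchvision, part of the PyTorch ecosystem, has its own "three musketeers":
- datasets: many ready-to-use datasets
- models: many pretrained models
- transforms: methods for transforming data, including data augmentation
Today let's look at some of the methods in transforms: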
1.torchvision.transforms.RandomCrop()
Crops the image at a random location. The five parameters of RandomCrop mean the following:
Init signature:
torchvision.transforms.RandomCrop(
size,
padding=None,
pad_if_needed=False,
fill=0,
padding_mode='constant',
)
Docstring:
Crop the given PIL Image at a random location.
Args:
size (sequence or int): Desired output size of the crop. If size is an
int instead of sequence like (h, w), a square crop (size, size) is
made.
padding (int or sequence, optional): Optional padding on each border
of the image. Default is None, i.e no padding. If a sequence of length
4 is provided, it is used to pad left, top, right, bottom borders
respectively. If a sequence of length 2 is provided, it is used to
pad left/right, top/bottom borders, respectively.
pad_if_needed (boolean): It will pad the image if smaller than the
desired size to avoid raising an exception. Since cropping is done
after padding, the padding seems to be done at a random offset.
fill: Pixel fill value for constant fill. Default is 0. If a tuple of
length 3, it is used to fill R, G, B channels respectively.
This value is only used when the padding_mode is constant
padding_mode: Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.
- constant: pads with a constant value, this value is specified with fill
- edge: pads with the last value on the edge of the image
- reflect: pads with reflection of image (without repeating the last value on the edge)
padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
will result in [3, 2, 1, 2, 3, 4, 3, 2]
- symmetric: pads with reflection of image (repeating the last value on the edge)
padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
will result in [2, 1, 1, 2, 3, 4, 4, 3]
2.torchvision.transforms.RandomHorizontalFlip()
Randomly flips the image horizontally. Its single parameter, p, is the probability of flipping.
Init signature: torchvision.transforms.RandomHorizontalFlip(p=0.5)
Docstring:
Horizontally flip the given PIL Image randomly with a given probability.
Args:
p (float): probability of the image being flipped. Default value is 0.5
3.torchvision.transforms.RandomVerticalFlip()
Randomly flips the image vertically (upside down). Its single parameter, p, is again the probability of flipping.
Init signature: torchvision.transforms.RandomVerticalFlip(p=0.5)
Docstring:
Vertically flip the given PIL Image randomly with a given probability.
Args:
p (float): probability of the image being flipped. Default value is 0.5
4.torchvision.transforms.RandomRotation()
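Rotates the image by a random angle. The first parameter, degrees, sets the range the angle is drawn from: a single number d means the range [-d, +d], and a pair (min, max) is used as the range directly.
Init signature: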
torchvision.transforms.RandomRotation(
degrees,
resample=False,
expand=False,
center=None,
)
Docstring:
Rotate the image by angle.
5.torchvision.transforms.ColorJitter()
Randomly changes the image's color properties. Its four parameters are brightness, contrast, saturation and hue.
Init signature:
torchvision.transforms.ColorJitter(
brightness=0,
contrast=0,
saturation=0,
hue=0,
)
Docstring:
Randomly change the brightness, contrast and saturation of an image.
Args:
brightness (float or tuple of float (min, max)): How much to jitter brightness.
brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]
or the given [min, max]. Should be non negative numbers.
contrast (float or tuple of float (min, max)): How much to jitter contrast.
contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]
or the given [min, max]. Should be non negative numbers.
saturation (float or tuple of float (min, max)): How much to jitter saturation.
saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]
or the given [min, max]. Should be non negative numbers.
hue (float or tuple of float (min, max)): How much to jitter hue.
hue_factor is chosen uniformly from [-hue, hue] or the given [min, max].
Should have 0<= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
6.torchvision.transforms.RandomGrayscale()
Randomly converts the image to grayscale. Its single parameter, p, is the probability of converting.
Init signature: torchvision.transforms.RandomGrayscale(p=0.1)
Docstring:
Randomly convert image to grayscale with a probability of p (default 0.1).
Args:
p (float): probability that image should be converted to grayscale.
Returns:
PIL Image: Grayscale version of the input image with probability p and unchanged
with probability (1-p).
- If input image is 1 channel: grayscale version is 1 channel
- If input image is 3 channel: grayscale version is 3 channel with r == g == b
That's all for these six commonly used augmentation methods today. Thanks for reading.