Learning PyTorch from Scratch (Part 2): Linear Regression

Linear Regression

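Put simply, linear regression predicts the value of a variable Y (the dependent variable) from a variable X (the independent variable), provided that X and Y are linearly related. This linear relationship can be drawn as a straight line, called the regression line.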

Basic Elements of Linear Regression

Model

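Take predicting house prices in Boston as an example, in simplified form. Assume a house's price depends on only two factors: its area (in square meters) and its age (in years). We want to find out exactly how the price relates to these two factors. Linear regression assumes that the output is a linear function of each input:

$$\hat{y} = x_1 w_1 + x_2 w_2 + b$$

Here $x_1$ is the area, $x_2$ is the age, $w_1$ and $w_2$ are the weights, $b$ is the bias, and $\hat{y}$ is the model's estimate of the price.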
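Here is a minimal sketch of this model in PyTorch. The formula is evaluated for several houses at once as a matrix-vector product `X @ w + b`. The weights, bias and house data are made-up values for illustration only. They are not parameters fitted to real Boston prices.

```python
import torch

# Hypothetical parameters: price contribution per square meter, per year of age, and a bias
w = torch.tensor([0.5, -0.2])   # [w_1 (area), w_2 (age)]
b = torch.tensor(10.0)

# Three hypothetical houses, one per row: [area in m^2, age in years]
X = torch.tensor([[100.0, 10.0],
                  [ 80.0,  5.0],
                  [120.0, 30.0]])

# y_hat = x_1 * w_1 + x_2 * w_2 + b, computed for all rows at once
y_hat = X @ w + b

# The same prediction written out term by term for the first house
first = X[0, 0] * w[0] + X[0, 1] * w[1] + b
print(y_hat)
```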

Dataset

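We collect a set of real data, for example the actual sale prices of many houses together with their areas and ages. We then fit the model parameters on this dataset so that the error between predicted and actual prices is as small as possible. In machine learning terminology, the dataset is called the training set and each house is a sample. A house's actual sale price is its label, and the two factors used to predict the label are called features.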

Loss Function

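During training we need to measure the error between the predicted price and the true price. A common choice is the squared function. Consider the sample with index $i$, with features $x_1^{(i)}, x_2^{(i)}$, prediction $\hat{y}^{(i)} = x_1^{(i)} w_1 + x_2^{(i)} w_2 + b$ and label $y^{(i)}$. Its error is

$$\ell^{(i)}(w_1, w_2, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$

The loss over a training set of $n$ samples is the average of the per-sample errors:

$$\ell(w_1, w_2, b) = \frac{1}{n}\sum_{i=1}^{n} \ell^{(i)}(w_1, w_2, b)$$

The constant $\frac{1}{2}$ only simplifies the derivative of the squared term. It does not change where the minimum lies.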
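The two formulas translate directly into code. In this sketch, `per_sample_loss` is a hypothetical helper name and the predictions and labels are made-up numbers. Each entry of `losses` is one $\ell^{(i)}$, and their mean is $\ell$.

```python
import torch

def per_sample_loss(y_hat, y):
    # l^(i) = 1/2 * (y_hat^(i) - y^(i))^2, one value per sample
    return 0.5 * (y_hat - y) ** 2

# Hypothetical predictions and true labels for n = 4 samples
y_hat = torch.tensor([2.5, 0.0, 2.0, 8.0])
y = torch.tensor([3.0, -0.5, 2.0, 7.0])

losses = per_sample_loss(y_hat, y)   # l^(1), ..., l^(4)
total = losses.mean()                # l = (1/n) * sum_i l^(i)
print(losses, total)
```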

Optimization Function - Stochastic Gradient Descent

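When the model and the loss function are simple enough, the solution that minimizes the error can be written down directly as a formula. Such a solution is called an analytical solution. The linear regression model and squared loss used in this section fall into this category. Many other models have no analytical solution. For them, the loss can only be reduced as far as possible by running an optimization algorithm for a finite number of iterations, and the result is called a numerical solution.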
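For linear regression with squared loss, the analytical solution is given by the normal equations $\hat{\boldsymbol{\theta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$. A column of ones is appended to $\mathbf{X}$ so that the bias is learned as an extra weight. The sketch below applies this formula to synthetic data built with the same true parameters used later in this post ($\mathbf{w} = [2, -3.4]^\top$, $b = 4.2$). It is an illustration, not code from the original post. For ill-conditioned data, a least-squares solver is numerically preferable to an explicit inverse.

```python
import torch

torch.manual_seed(0)

# Synthetic data: 1000 samples, 2 features, small Gaussian noise
n = 1000
X = torch.randn(n, 2)
true_w = torch.tensor([2.0, -3.4])
true_b = 4.2
y = X @ true_w + true_b + 0.01 * torch.randn(n)

# Append a column of ones so the bias becomes a third weight
X_aug = torch.cat([X, torch.ones(n, 1)], dim=1)

# Normal equations: theta = (X^T X)^{-1} X^T y
theta = torch.inverse(X_aug.t() @ X_aug) @ X_aug.t() @ y
w_hat, b_hat = theta[:2], theta[2]
print(w_hat, b_hat)
```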

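Among the optimization algorithms for numerical solutions, mini-batch stochastic gradient descent is widely used in deep learning. It first sets initial values for the model parameters and then updates them over many iterations, each intended to reduce the value of the loss function. Each iteration runs in three steps:

1. Sample uniformly at random a mini-batch $\mathcal{B}$ made up of a fixed number of training samples.
2. Compute the derivative (gradient) of the average loss over that mini-batch with respect to the model parameters.
3. Subtract the product of this gradient and a preset positive number from the parameters:

$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} \ell^{(i)}(\mathbf{w}, b)$$

Here $\mathbf{w} = [w_1, w_2]^\top$. The update has two hyperparameters:

Learning rate ($\eta$): the size of the step taken in each optimization iteration.
Batch size ($|\mathcal{B}|$): the number of samples in each mini-batch.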
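To make the update rule concrete, the sketch below performs one mini-batch SGD step on a random batch of four samples. It also checks that autograd's gradient of the summed loss matches the hand-derived gradient: $\sum_{i \in \mathcal{B}} (\hat{y}^{(i)} - y^{(i)})\mathbf{x}^{(i)}$ for $\mathbf{w}$ and $\sum_{i \in \mathcal{B}} (\hat{y}^{(i)} - y^{(i)})$ for $b$. The data and parameter values are arbitrary illustrations.

```python
import torch

torch.manual_seed(0)

# A hypothetical mini-batch of |B| = 4 samples with 2 features each
X = torch.randn(4, 2)
y = torch.randn(4)
w = torch.zeros(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.03                      # learning rate eta
batch_size = X.shape[0]        # |B|

# Sum of the per-sample losses 1/2 * (y_hat - y)^2 over the mini-batch
y_hat = X @ w + b
l = (0.5 * (y_hat - y) ** 2).sum()
l.backward()

# The same gradient derived by hand
residual = (y_hat - y).detach()
grad_w_manual = X.t() @ residual          # sum_i (y_hat_i - y_i) * x_i
grad_b_manual = residual.sum()            # sum_i (y_hat_i - y_i)
assert torch.allclose(w.grad, grad_w_manual)
assert torch.allclose(b.grad, grad_b_manual.reshape(1))

# Update rule: (w, b) <- (w, b) - eta / |B| * (summed gradient)
with torch.no_grad():
    w -= lr / batch_size * w.grad
    b -= lr / batch_size * b.grad
print(w, b)
```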

Implementing Linear Regression from Scratch

# import packages and modules
%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random

print(torch.__version__)

Output: 1.3.0

Generating the Dataset

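We use a linear model to generate a dataset of 1000 samples. The feature matrix $\mathbf{X} \in \mathbb{R}^{1000 \times 2}$ is drawn from a standard normal distribution, and the labels are generated by the linear relation

$$\mathbf{y} = \mathbf{X}\mathbf{w} + b + \epsilon$$

The true weights are $\mathbf{w} = [2, -3.4]^\top$ and the true bias is $b = 4.2$. The noise $\epsilon$ is drawn from a normal distribution with mean 0 and standard deviation 0.01.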

# number of input features
num_inputs = 2
# number of examples
num_examples = 1000

# true weight and bias used to generate the labels
true_w = [2, -3.4]
true_b = 4.2

features = torch.randn(num_examples, num_inputs,
                      dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
# add Gaussian noise with mean 0 and standard deviation 0.01
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)
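A quick way to check that the features and labels belong together is to subtract the noise-free linear part from the labels. What remains should look like the added noise, with a mean near 0 and a standard deviation close to 0.01. This sketch repeats the generation code in a self-contained form, with fixed seeds that the original code does not set.

```python
import numpy as np
import torch

torch.manual_seed(0)
np.random.seed(0)

num_inputs, num_examples = 2, 1000
true_w, true_b = [2, -3.4], 4.2

features = torch.randn(num_examples, num_inputs, dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)

# Residual = label minus the noise-free linear part; it should behave like N(0, 0.01^2)
residual = labels - (true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b)
print(residual.mean().item(), residual.std().item())
```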

Plot the second feature against the labels to visualize the generated data:

plt.scatter(features[:, 1].numpy(), labels.numpy(), 1);

# print the features generated above; calling torch.randn again here would
# create new features that no longer match labels
print(features)

Output: tensor([[ 0.0908, -0.8646],
        [-1.6370,  1.6305],
        [-0.1965,  0.8613],
        ...,
        [-0.9776,  0.0575],
        [ 1.9371, -0.1497],
        [-0.1417, -1.0046]])

Reading the Dataset

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # read the samples in random order
    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)]) # the last batch may hold fewer than batch_size samples
        yield features.index_select(0, j), labels.index_select(0, j)
        
batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break

Output: tensor([[ 1.3591,  0.6950],
        [ 0.5206, -0.2726],
        [-0.6639,  0.9716],
        [ 2.7164, -0.6513],
        [-1.0642,  1.9331],
        [-2.2240, -0.3616],
        [-0.9094,  0.6691],
        [-0.2991,  0.2488],
        [ 1.8312,  0.2209],
        [ 0.2833, -1.1672]]) 
 tensor([6.9694, 6.0005, 9.5797, 0.6944, 4.1964, 6.8519, 2.5178, 4.4217, 5.4679,
        9.9754])
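The sketch below copies `data_iter` and runs it on 23 toy samples to show three properties. `data_iter` visits every sample exactly once per epoch, the final batch may be smaller than `batch_size`, and `index_select` keeps each feature row paired with its label. Each toy sample's single feature equals its label, a construction chosen only to make the pairing easy to check.

```python
import random
import torch

def data_iter(batch_size, features, labels):
    # Same logic as data_iter above: shuffle the indices, then yield consecutive slices
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)

# 23 toy samples with batch size 10 -> batches of 10, 10 and 3
features = torch.arange(23, dtype=torch.float32).reshape(23, 1)
labels = torch.arange(23, dtype=torch.float32)

batch_sizes = []
seen = []
for X, y in data_iter(10, features, labels):
    assert torch.equal(X[:, 0], y)   # rows and labels stay paired after shuffling
    batch_sizes.append(len(y))
    seen.extend(y.tolist())
print(batch_sizes)
```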

Initializing Model Parameters

w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)

w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

Defining the Model

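Define the model whose parameters will be trained. It is the linear model from above written in matrix form, $\hat{\mathbf{y}} = \mathbf{X}\mathbf{w} + b$: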

def linreg(X, w, b):
    # X: (batch_size, num_inputs), w: (num_inputs, 1), b: (1,) broadcast over the rows
    return torch.mm(X, w) + b
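`torch.mm(X, w)` has shape `(batch_size, 1)`, and the bias `b` of shape `(1,)` is broadcast across all rows. A quick shape check using random values for illustration:

```python
import torch

def linreg(X, w, b):
    return torch.mm(X, w) + b

batch_size, num_inputs = 10, 2
X = torch.randn(batch_size, num_inputs)
w = torch.randn(num_inputs, 1)
b = torch.zeros(1)

y_hat = linreg(X, w, b)
print(y_hat.shape)   # torch.Size([10, 1])
```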

Defining the Loss Function

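We use the squared loss defined earlier, including the factor 1/2. It returns one loss value per sample rather than their mean: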

def squared_loss(y_hat, y): 
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
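The `y.view(y_hat.size())` call matters. `data_iter` yields labels as a 1-D tensor of shape `(batch_size,)`, while `linreg` returns shape `(batch_size, 1)`. Without the reshape, broadcasting silently produces a `(batch_size, batch_size)` matrix of every prediction minus every label, as this small sketch shows:

```python
import torch

y_hat = torch.randn(10, 1)   # model output: one column
y = torch.randn(10)          # labels as produced by data_iter: 1-D

wrong = (y_hat - y) ** 2 / 2                       # broadcasts to shape (10, 10)
right = (y_hat - y.view(y_hat.size())) ** 2 / 2    # stays (10, 1)
print(wrong.shape, right.shape)
```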

Defining the Optimization Function

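Here the optimization function is mini-batch stochastic gradient descent. The training loop below sums the losses of a mini-batch before calling `backward`, so `sgd` divides the gradient by `batch_size` to get the average: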

def sgd(params, lr, batch_size): 
    for param in params:
        param.data -= lr * param.grad / batch_size
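Assigning through `param.data` changes the values without autograd tracking the update. A common alternative in newer PyTorch code does the same in-place update inside a `torch.no_grad()` block. The sketch below is one way to write it. The name `sgd_no_grad` is hypothetical, and the check at the end uses a parameter whose gradient is known in advance.

```python
import torch

def sgd_no_grad(params, lr, batch_size):
    # Update parameters in place without recording the operation in the autograd graph
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size

# Tiny check: one parameter with a known gradient
w = torch.ones(2, requires_grad=True)
(w * torch.tensor([3.0, -1.0])).sum().backward()   # gradient is [3, -1]
sgd_no_grad([w], lr=0.1, batch_size=1)
print(w)
```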

Training

Train the model:

# hyperparameters
lr = 0.03
num_epochs = 5

net = linreg
loss = squared_loss

# training
for epoch in range(num_epochs):  # repeat training num_epochs times
    # in each epoch, every sample in the dataset is used once
    
    # X is the feature and y is the label of a mini-batch
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()  
        # compute the gradient of the mini-batch loss
        l.backward()  
        # update the model parameters with mini-batch stochastic gradient descent
        sgd([w, b], lr, batch_size)  
        # reset the parameter gradients
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))
    
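Output (this run appears to have been recorded after `features` was regenerated by a second `torch.randn` call once `labels` had already been computed, so features and labels no longer matched; that is why the loss stays around 7.5 instead of approaching the noise level):
epoch 1, loss 7.605014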
epoch 2, loss 7.521966
epoch 3, loss 7.550967
epoch 4, loss 7.542496
epoch 5, loss 7.535208
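The loss recorded above barely changes across epochs. For comparison, the following self-contained sketch repeats the from-scratch pipeline with the features generated only once, so they stay paired with the labels. It adds fixed random seeds, which the original code does not set, and evaluates the epoch loss under `torch.no_grad()`. In this setup the loss should fall close to the noise level and the learned parameters close to `true_w` and `true_b`. Exact values depend on the seeds and PyTorch version, so no output is listed.

```python
import random
import numpy as np
import torch

torch.manual_seed(0)
np.random.seed(0)
random.seed(0)

# Data generated once; features are NOT regenerated afterwards
num_inputs, num_examples = 2, 1000
true_w, true_b = [2, -3.4], 4.2
features = torch.randn(num_examples, num_inputs, dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float32)

def data_iter(batch_size, features, labels):
    indices = list(range(len(features)))
    random.shuffle(indices)
    for i in range(0, len(features), batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, len(features))])
        yield features.index_select(0, j), labels.index_select(0, j)

def linreg(X, w, b):
    return torch.mm(X, w) + b

def squared_loss(y_hat, y):
    return (y_hat - y.view(y_hat.size())) ** 2 / 2

def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size

w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32, requires_grad=True)
b = torch.zeros(1, dtype=torch.float32, requires_grad=True)

lr, num_epochs, batch_size = 0.03, 5, 10
epoch_losses = []
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = squared_loss(linreg(X, w, b), y).sum()
        l.backward()
        sgd([w, b], lr, batch_size)
        w.grad.data.zero_()
        b.grad.data.zero_()
    # Loss over the whole dataset, computed without building an autograd graph
    with torch.no_grad():
        train_l = squared_loss(linreg(features, w, b), labels).mean().item()
    epoch_losses.append(train_l)
    print('epoch %d, loss %f' % (epoch + 1, train_l))
print(w.view(-1), b)
```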

Concise Implementation of Linear Regression with PyTorch

import torch
from torch import nn
import numpy as np
torch.manual_seed(1)
torch.set_default_tensor_type('torch.FloatTensor')

Generating the Dataset

The dataset is generated exactly as in the from-scratch implementation.

num_inputs = 2
num_examples = 1000

true_w = [2, -3.4]
true_b = 4.2

features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)

Reading the Dataset

import torch.utils.data as Data

batch_size = 10

# combine the features and labels of the dataset
dataset = Data.TensorDataset(features, labels)

# wrap the dataset in a DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,            # torch TensorDataset format
    batch_size=batch_size,      # mini-batch size
    shuffle=True,               # reshuffle the data every epoch
    num_workers=2,              # load data with 2 worker subprocesses
)
for X, y in data_iter:
    print(X, '\n', y)
    break
    
Output: tensor([[ 0.5584, -0.4995],
        [-0.1495, -1.6520],
        [-0.3280,  0.2594],
        [-0.4857, -1.2976],
        [ 1.8603,  0.4539],
        [-0.3628,  0.0064],
        [ 1.3235, -0.3536],
        [-2.3426, -0.5968],
        [-0.6290, -0.2948],
        [-0.0787,  0.2180]]) 
 tensor([7.0088, 9.5071, 2.6718, 7.6535, 6.3802, 3.4601, 8.0475, 1.5223, 3.9682,
        3.2977])
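Like the hand-written `data_iter`, `DataLoader` returns a smaller final batch when the number of samples is not a multiple of `batch_size`. Passing `drop_last=True` discards that batch instead. This sketch uses a hypothetical dataset of 1003 random samples and leaves `num_workers` at 0, so the data is loaded in the main process.

```python
import torch
import torch.utils.data as Data

features = torch.randn(1003, 2)
labels = torch.randn(1003)
dataset = Data.TensorDataset(features, labels)

# num_workers=0 loads data in the main process, which is simplest for small in-memory tensors
loader = Data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=0)
loader_drop = Data.DataLoader(dataset, batch_size=10, shuffle=True, drop_last=True)

sizes = [len(y) for _, y in loader]
print(len(loader), sizes[-1], len(loader_drop))
```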

Defining the Model

class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()      # call the parent class constructor
        self.linear = nn.Linear(n_feature, 1)  # function prototype: `torch.nn.Linear(in_features, out_features, bias=True)`

    def forward(self, x):
        y = self.linear(x)
        return y
    
net = LinearNet(num_inputs)
# ways to build the network with nn.Sequential (each one reassigns net)
# method one
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # other layers can be added here
    )

# method two
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))
# net.add_module ......

# method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
          ('linear', nn.Linear(num_inputs, 1))
          # ......
        ]))
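The `LinearNet` class and the three `nn.Sequential` variants describe the same one-layer model. One practical difference is that the `Sequential` versions support indexing such as `net[0]`, which the initialization code below relies on. A `LinearNet` instance instead exposes its layer as `net.linear`. The sketch below builds all four, copies identical weights into them, and checks that their outputs match. The variable names `net_a` to `net_d` exist only for this comparison.

```python
from collections import OrderedDict
import torch
from torch import nn

num_inputs = 2

class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(n_feature, 1)

    def forward(self, x):
        return self.linear(x)

net_a = LinearNet(num_inputs)
net_b = nn.Sequential(nn.Linear(num_inputs, 1))
net_c = nn.Sequential()
net_c.add_module('linear', nn.Linear(num_inputs, 1))
net_d = nn.Sequential(OrderedDict([('linear', nn.Linear(num_inputs, 1))]))

# Copy the same weights into every network so their outputs can be compared
reference = net_b[0].state_dict()
net_a.linear.load_state_dict(reference)   # LinearNet: access the layer by attribute
net_c[0].load_state_dict(reference)       # Sequential: access the layer by index
net_d[0].load_state_dict(reference)

X = torch.randn(5, num_inputs)
outputs = [net(X) for net in (net_a, net_b, net_c, net_d)]
print([o.shape for o in outputs])
```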

Initializing Model Parameters

from torch.nn import init

init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0)  # or you can use `net[0].bias.data.fill_(0)` to modify it directly

Output: Parameter containing:
tensor([0.], requires_grad=True)

for param in net.parameters():
    print(param)
Output: Parameter containing:
tensor([[-0.0142, -0.0161]], requires_grad=True)
Parameter containing:
tensor([0.], requires_grad=True)
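`nn.Linear(2, 1)` stores its weight with shape `(out_features, in_features) = (1, 2)`. This is the transpose of the `(2, 1)` tensor `w` used in the from-scratch version, which is why the weight printed above is a single row. The layer computes `X @ weight.t() + bias`, as this sketch checks on random input:

```python
import torch
from torch import nn
from torch.nn import init

linear = nn.Linear(2, 1)
init.normal_(linear.weight, mean=0.0, std=0.01)
init.constant_(linear.bias, val=0.0)

print(linear.weight.shape, linear.bias.shape)   # torch.Size([1, 2]) torch.Size([1])

# The layer's forward pass equals an explicit matrix product with the transposed weight
X = torch.randn(4, 2)
manual = X @ linear.weight.t() + linear.bias
assert torch.allclose(linear(X), manual)
```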

Defining the Loss Function

loss = nn.MSELoss()    # built-in mean squared error loss (no 1/2 factor)
                       # function prototype: `torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`
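With the default `reduction='mean'`, `nn.MSELoss()` averages $(\hat{y} - y)^2$ without the factor $\frac{1}{2}$. Its value is therefore twice the mean of `squared_loss` from the from-scratch version. The from-scratch `sgd` divides the gradient of the summed halved loss by `batch_size`, so the same `lr = 0.03` produces steps twice as large here. The labels are reshaped with `view(-1, 1)` to match the `(N, 1)` model output, as in the training loop below. A quick check with random tensors:

```python
import torch
from torch import nn

def squared_loss(y_hat, y):
    return (y_hat - y.view(y_hat.size())) ** 2 / 2

y_hat = torch.randn(10, 1)
y = torch.randn(10)

mse = nn.MSELoss()(y_hat, y.view(-1, 1))     # mean of (y_hat - y)^2
half_mean = squared_loss(y_hat, y).mean()    # mean of (y_hat - y)^2 / 2
print(mse.item(), 2 * half_mean.item())
```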

Defining the Optimization Function

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03)   # built-in stochastic gradient descent optimizer
print(optimizer)  # function prototype: `torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)`
Output: SGD (
Parameter Group 0
    dampening: 0
    lr: 0.03
    momentum: 0
    nesterov: False
    weight_decay: 0
)
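With `momentum`, `dampening` and `weight_decay` all at 0 as shown above, `optimizer.step()` reduces to the plain update `p = p - lr * p.grad` for every parameter. Unlike the hand-written `sgd`, it does not divide by the batch size, because `nn.MSELoss` already averages inside the loss. The sketch below checks one step against that formula on random data.

```python
import torch
from torch import nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 1))
optimizer = optim.SGD(net.parameters(), lr=0.03)

X, y = torch.randn(10, 2), torch.randn(10, 1)
loss = nn.MSELoss()

optimizer.zero_grad()
loss(net(X), y).backward()

# Expected result of one plain SGD step: p - lr * p.grad
expected = [(p - 0.03 * p.grad).detach().clone() for p in net.parameters()]
optimizer.step()
matches = [torch.allclose(p, e) for p, e in zip(net.parameters(), expected)]
print(matches)
```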

Training

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad() # reset gradients, equivalent to net.zero_grad()
        l.backward()
        optimizer.step()
    # l is the loss of the last mini-batch in this epoch
    print('epoch %d, loss: %f' % (epoch, l.item()))
# result comparison
dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)

Output: epoch 1, loss: 0.000103
epoch 2, loss: 0.000097
epoch 3, loss: 0.000079
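The printed value is the loss of the last mini-batch of each epoch, so it depends on which samples land in that batch. The sketch below is a self-contained variant with fixed seeds, with `num_workers` left at its default of 0. At the end of each epoch it also evaluates the loss over the whole dataset under `torch.no_grad()`. Exact numbers will vary with the seed and PyTorch version.

```python
import numpy as np
import torch
import torch.utils.data as Data
from torch import nn, optim

torch.manual_seed(1)
np.random.seed(1)

num_inputs, num_examples = 2, 1000
true_w, true_b = [2, -3.4], 4.2
features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)

data_iter = Data.DataLoader(Data.TensorDataset(features, labels), batch_size=10, shuffle=True)
net = nn.Sequential(nn.Linear(num_inputs, 1))
nn.init.normal_(net[0].weight, mean=0.0, std=0.01)
nn.init.constant_(net[0].bias, val=0.0)
loss = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.03)

for epoch in range(1, 4):
    for X, y in data_iter:
        l = loss(net(X), y.view(-1, 1))
        optimizer.zero_grad()
        l.backward()
        optimizer.step()
    # Evaluate on the whole dataset without building an autograd graph
    with torch.no_grad():
        full_loss = loss(net(features), labels.view(-1, 1)).item()
    print('epoch %d, last-batch loss: %f, full-dataset loss: %f' % (epoch, l.item(), full_loss))
```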
