Using ridge regression to overcome linear regression's shortfalls


In this recipe, we'll learn about ridge regression. It is different from vanilla linear regression; it introduces a regularization parameter to "shrink" the coefficients. This is useful when the dataset has collinear factors.

Getting ready

Let's load a dataset that has a low effective rank and compare ridge regression with linear regression by way of the coefficients. If you're not familiar with rank, it's the smaller of the number of linearly independent columns and the number of linearly independent rows. One of the assumptions of linear regression is that the data matrix is of full rank.

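As a quick illustration, NumPy's matrix_rank function (a small sketch, not from the recipe itself) shows the difference between a full-rank matrix and a rank-deficient one whose third column is the sum of the first two:

import numpy as np

full = np.array([[1., 0., 0.],
                 [0., 1., 0.],
                 [0., 0., 1.]])
deficient = np.array([[1., 0., 1.],
                      [0., 1., 1.],
                      [1., 1., 2.]])  # third column = first column + second column
print(np.linalg.matrix_rank(full))       # 3
print(np.linalg.matrix_rank(deficient))  # 2
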
How to do it...

First, use make_regression to create a simple dataset with three predictors, but an effective rank of 2.

Effective rank means that, while technically the matrix is of full rank, many of the columns have a high degree of collinearity:

from sklearn.datasets import make_regression

reg_data, reg_target = make_regression(n_samples=2000, n_features=3,
                                       effective_rank=2, noise=10)

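To see what a low effective rank looks like in practice, one quick check (a sketch assuming the block above has been run) is to inspect the numerical rank and the singular values of reg_data; the matrix is technically rank 3, but the singular value spectrum should drop off sharply:

import numpy as np

print(np.linalg.matrix_rank(reg_data))            # technically full rank: 3
print(np.linalg.svd(reg_data, compute_uv=False))  # singular values tail off quickly
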
First, let's take a look at regular linear regression:

import numpy as np
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
n_bootstraps = 1000
len_data = len(reg_data)
subsample_size = int(0.75 * len_data)
# draw a random 75% sample of the row indices (with replacement)
subsample = lambda: np.random.choice(np.arange(0, len_data), size=subsample_size)
coefs = np.ones((n_bootstraps, 3))
for i in range(n_bootstraps):
    subsample_idx = subsample()
    subsample_X = reg_data[subsample_idx]
    subsample_y = reg_target[subsample_idx]
    lr.fit(subsample_X, subsample_y)
    coefs[i][0] = lr.coef_[0]
    coefs[i][1] = lr.coef_[1]
    coefs[i][2] = lr.coef_[2]

The following output, a plot of the distribution of each bootstrapped coefficient, is generated (figure not reproduced here).

Follow the same procedure with Ridge, and have a look at the output:

from sklearn.linear_model import Ridge

r = Ridge()
n_bootstraps = 1000
len_data = len(reg_data)
subsample_size = int(0.75 * len_data)
subsample = lambda: np.random.choice(np.arange(0, len_data), size=subsample_size)
coefs_r = np.ones((n_bootstraps, 3))
# carry out the same bootstrap procedure as above, fitting the Ridge model instead
for i in range(n_bootstraps):
    subsample_idx = subsample()
    r.fit(reg_data[subsample_idx], reg_target[subsample_idx])
    coefs_r[i] = r.coef_

The following output, a plot of the distribution of each bootstrapped ridge coefficient, is generated (figure not reproduced here).

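To recreate the two figures, a minimal sketch (assuming matplotlib is installed) that draws histograms of both sets of bootstrapped coefficients side by side looks like this:

import matplotlib.pyplot as plt

# one row per model, one column per coefficient
fig, axes = plt.subplots(2, 3, figsize=(12, 6))
for j in range(3):
    axes[0, j].hist(coefs[:, j], bins=50)
    axes[0, j].set_title("LinearRegression coef {}".format(j))
    axes[1, j].hist(coefs_r[:, j], bins=50)
    axes[1, j].set_title("Ridge coef {}".format(j))
plt.tight_layout()
plt.show()
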
Don't let the similar width of the plots fool you; the coefficients for ridge regression are much closer to 0. Let's look at the average spread between the coefficients:

>>> np.mean(coefs - coefs_r, axis=0)  # coefs_r stores the ridge regression coefficients
array([13.24098749, 18.28340271, 61.73626459])

So, on average, the coefficients for linear regression are much higher than the ridge regression coefficients. This difference is the bias in the coefficients (forgetting, for a second, the potential bias of the linear regression coefficients). So then, what is the advantage of ridge regression? Well, let's look at the variance of our coefficients:

>>> np.var(coefs, axis=0)
array([255.01858444, 182.01195126, 218.14725252])
>>> np.var(coefs_r, axis=0)
array([19.87551666, 22.97529897, 20.99950272])

The variance has been dramatically reduced. This is the bias-variance trade-off that is so often discussed in machine learning. The next recipe will introduce how to tune the regularization parameter in ridge regression, which is at the heart of this trade-off.

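To get a feel for how the regularization parameter drives this shrinkage ahead of the next recipe, here is a small sketch (the alpha values below are arbitrary, chosen only for illustration) that fits Ridge with a few different alphas and prints the coefficients:

from sklearn.linear_model import Ridge

for alpha in (0.01, 1.0, 10.0, 100.0):
    r_alpha = Ridge(alpha=alpha).fit(reg_data, reg_target)
    print(alpha, r_alpha.coef_)  # the coefficients shrink towards 0 as alpha grows
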

How it works...

Speaking of the regularization parameter, let's go through how ridge regression differs from linear regression. As was already shown, linear regression works, but it finds the vector of betas that minimizes ||y - Xβ||^2.

Ridge regression instead finds the vector of betas that minimizes ||y - Xβ||^2 + ||Γβ||^2.

Γ is typically αI, that is, some scalar times the identity matrix. We actually used the default alpha when initializing ridge regression.

Now that we've created the object, we can look at its attributes:

>>> r  # notice the alpha parameter
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)

This minimization has the following solution:

β = (X^T X + Γ^T Γ)^(-1) X^T y

The previous solution is the same as the one for ordinary linear regression, except for the Γ^T Γ term. For a matrix A, A^T A is symmetric and thus positive semidefinite. So, thinking about the translation of matrix algebra from scalar algebra, we effectively divide by a larger number. Multiplication by an inverse is analogous to division. So, this is what squeezes the coefficients towards 0. This is a bit of a crude explanation; for a deeper understanding, you should look at the connections between SVD and ridge regression.

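To make the closed-form solution concrete, here is a hedged sketch (beta_manual is just an illustrative name, and Γ^T Γ is taken to be alpha times the identity) that computes the ridge solution directly and compares it with scikit-learn's Ridge with the intercept disabled:

import numpy as np
from sklearn.linear_model import Ridge

alpha = 1.0
A = reg_data.T @ reg_data + alpha * np.eye(reg_data.shape[1])   # X^T X + Γ^T Γ
beta_manual = np.linalg.solve(A, reg_data.T @ reg_target)       # (X^T X + Γ^T Γ)^(-1) X^T y

r_check = Ridge(alpha=alpha, fit_intercept=False).fit(reg_data, reg_target)
print(beta_manual)
print(r_check.coef_)  # should closely match beta_manual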