Using sparsity to regularize models


The least absolute shrinkage and selection operator (LASSO) method is very similar to ridge regression and LARS. It's similar to ridge regression in the sense that we penalize our regression by some amount, and it's similar to LARS in that it can be used for parameter selection, and it typically leads to a sparse vector of coefficients.
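As a quick, illustrative sketch of that last point (not part of the original recipe), fitting ridge and lasso on the same synthetic data shows the difference in sparsity; the exact counts vary from run to run:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Only 5 of the 500 features carry real signal.
X, y = make_regression(n_samples=200, n_features=500, n_informative=5, noise=5)

# Ridge shrinks coefficients toward zero but almost never zeroes them out;
# lasso's l1 penalty drives most of them to exactly zero.
print(np.sum(Ridge().fit(X, y).coef_ != 0))  # typically 500
print(np.sum(Lasso().fit(X, y).coef_ != 0))  # typically a handful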

Getting ready

To be clear, lasso regression is not a panacea. There can be computational consequences to using lasso regression. As we'll see in this recipe, we'll use a loss function that isn't differentiable, and therefore requires special, and more importantly, performance-impairing workarounds.
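The trouble is the absolute value in the l1 penalty, which has no derivative at zero. Coordinate descent, the solver scikit-learn uses for Lasso, works around this with the soft-thresholding operator; here is a minimal sketch of that operator (the helper name soft_threshold is ours, for illustration only):

import numpy as np

def soft_threshold(z, gamma):
    # Proximal operator of the l1 norm: shrinks z toward 0 and snaps
    # anything within gamma of 0 to exactly 0 -- the source of sparsity.
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

print(soft_threshold(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))
# [-2. -0.  0.  1.]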

How to do it...

Let's go back to the trusty make_regression function and create a dataset with the same parameters:


import numpy as np  # used below to count nonzero coefficients
from sklearn.datasets import make_regression
reg_data, reg_target = make_regression(n_samples=200, n_features=500, n_informative=5, noise=5)

Next, we need to import the Lasso object:

from sklearn.linear_model import Lasso
lasso = Lasso()

Lasso contains many parameters, but the most interesting parameter is alpha. It scales the penalization term of the Lasso method, which we'll look at in the How it works... section. For now, leave it as 1. As an aside, and much like ridge regression, if this term is 0, lasso is equivalent to linear regression:

lasso.fit(reg_data, reg_target)

Again, let's see how many of the coefficients remain nonzero:

np.sum(lasso.coef_ != 0)
10
lasso_0 = Lasso(alpha=0)
lasso_0.fit(reg_data, reg_target)  # scikit-learn warns: with alpha=0, use LinearRegression instead
np.sum(lasso_0.coef_ != 0)
500

None of our coefficients turn out to be 0 , which is what we expect. Actually, if you run this, you might get a warning from scikit-learn that advises you to choose LinearRegression .

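Between these two extremes, alpha acts as a dial on sparsity. A small illustrative sweep, reusing reg_data and reg_target from above (the exact counts depend on the random data):

# A larger alpha means a stronger l1 penalty and fewer surviving coefficients.
for alpha in (0.1, 0.5, 1.0, 5.0):
    model = Lasso(alpha=alpha).fit(reg_data, reg_target)
    print(alpha, np.sum(model.coef_ != 0))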

How it works...

For linear regression, we minimized the squared error. Here, we're still going to minimize the squared error, but we'll add a penalization term that will induce sparsity. The equation looks like the following:

$$ E = \sum_{i=1}^{n} (y_i - X_i \beta)^2 + \lambda \, \|\beta\|_1 $$

An alternate way of looking at this is to minimize the residual sum of squares, subject to a constraint on the l1 norm of the coefficients:

$$ \min_{\beta} \; \sum_{i=1}^{n} (y_i - X_i \beta)^2 \quad \text{subject to} \quad \|\beta\|_1 < c $$

This constraint is what leads to the sparsity. Lasso regression's l1 constraint creates a cross-polytope around the origin (a diamond in two dimensions, with the coefficients as the axes), which means that the most extreme points are the corners, which lie on the axes, where many of the coefficients are 0. Ridge regression creates a hypersphere due to the constraint of the l2 norm being less than some constant, and because a sphere has no corners, the coefficients are very unlikely to be exactly zero even though they are constrained.

Lasso cross-validation

Choosing the most appropriate lambda is a critical problem. We can specify the lambda ourselves or use cross-validation to find the best choice given the data at hand:


from sklearn.linear_model import LassoCV
lassocv = LassoCV()
lassocv.fit(reg_data, reg_target)

lassocv will have, as an attribute, the most appropriate lambda. scikit-learn mostly uses alpha in its notation, but the literature uses lambda:


lassocv.alpha_
0.532283861281302
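Beyond alpha_, LassoCV also records the grid of candidate alphas (alphas_) and the cross-validated errors along that grid (mse_path_), which is handy for checking that the chosen alpha isn't sitting at the edge of the grid. A brief sketch, continuing with the fitted lassocv:

# mse_path_ has one row per candidate alpha and one column per CV fold.
print(lassocv.alphas_.shape, lassocv.mse_path_.shape)

# The chosen alpha minimizes the mean CV error across folds.
best = np.argmin(lassocv.mse_path_.mean(axis=1))
print(lassocv.alphas_[best], lassocv.alpha_)  # these should agree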

The coefficients can be accessed in the regular manner:

lassocv.coef_[:5]
array([ 0., 42.41, 0., 0., -0.])

Letting lassocv choose the appropriate best fit leaves us with 33 nonzero coefficients:

np.sum(lassocv.coef_ != 0)
33
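As a quick sanity check, refitting a plain Lasso at the cross-validated alpha should reproduce the same nonzero count:

lasso_best = Lasso(alpha=lassocv.alpha_).fit(reg_data, reg_target)
print(np.sum(lasso_best.coef_ != 0))  # should match the LassoCV count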

Lasso for feature selection

Lasso can often be used for feature selection for other methods. For example, you might run lasso regression to get the appropriate number of features, and then use these features in another algorithm.

To get the features we want, create a masking array based on the columns that aren't zero, and then filter to keep the features we want:

mask = lassocv.coef_ != 0
new_reg_data = reg_data[:, mask]
new_reg_data.shape
(200, 33)
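From here, the filtered matrix can be fed to any downstream estimator. A minimal sketch, using LinearRegression as a stand-in for the other algorithm; scikit-learn's SelectFromModel can also package the same masking as a reusable transformer (the tiny threshold is our choice, so that anything not exactly zero is kept):

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression

# Fit a downstream model on only the lasso-selected features.
final_model = LinearRegression().fit(new_reg_data, reg_target)
print(final_model.score(new_reg_data, reg_target))

# The same selection, packaged as a transformer for use in a Pipeline.
selector = SelectFromModel(lassocv, prefit=True, threshold=1e-12)
print(selector.transform(reg_data).shape)  # should match new_reg_data.shape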
