Directly applying Bayesian ridge regression


In the Using ridge regression to overcome linear regression's shortfalls recipe, we discussed the constraint that ridge regression imposes from an optimization standpoint. We also discussed the Bayesian interpretation: a prior on the coefficients that pulls the mass of the density towards the prior mean, which is often 0.

So, now we'll look at how we can directly apply this interpretation through scikit-learn.

Getting ready

Ridge and lasso regression can both be understood through a Bayesian lens, as opposed to an optimization lens. Only Bayesian ridge regression is implemented in scikit-learn, but in the How it works... section we'll look at both cases.

First, as usual, let's create some regression data:

from sklearn.datasets import make_regression
# 1,000 samples, 10 features (only 2 informative), noise standard deviation of 20
X, y = make_regression(1000, 10, n_informative=2, noise=20)

How to do it...

We can just "throw" Bayesian ridge regression at the problem with a few simple steps:

from sklearn.linear_model import BayesianRidge
# instantiate the model with its default hyperparameters
br = BayesianRidge()

The two sets of hyperparameters of interest are alpha_1/alpha_2 and lambda_1/lambda_2. The alphas are the hyperparameters of the Gamma prior over the alpha parameter (the precision of the noise), and the lambdas are the hyperparameters of the Gamma prior over the lambda parameter (the precision of the weights).
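To see what these four hyperparameters default to, we can inspect the estimator directly; a quick check (scikit-learn's documented default for all four is 1e-06):

from sklearn.linear_model import BayesianRidge

# pull out only the four Gamma-prior hyperparameters
defaults = BayesianRidge().get_params()
print({k: defaults[k] for k in ("alpha_1", "alpha_2", "lambda_1", "lambda_2")})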

First, let's fit a model without modifying any of the hyperparameters:

br.fit(X, y)
br.coef_
array([-0.37073297,  0.16745965, -0.77672044, 29.24241894, -0.69319217,
        0.64905847, 86.9454228 , -0.24738249, -1.63909699,  1.43038709])
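Besides the coefficients, the fitted model also stores the estimated precision of the noise and of the weights (your numbers will differ, since the data is generated randomly):

# alpha_ is the estimated precision of the noise,
# lambda_ is the estimated precision of the weights
print(br.alpha_, br.lambda_)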

Now, if we modify the hyperparameters, notice the slight changes in the coefficients:

br_alphas = BayesianRidge(alpha_1=10, lambda_1=10)
br_alphas.fit(X, y)
br_alphas.coef_
array([-0.36917484,  0.16682313, -0.77961059, 29.21596299, -0.69730227,
        0.64425288, 86.86658136, -0.2477023 , -1.63266313,  1.42687844])
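To make the extra shrinkage easier to see, we can compare the overall magnitude of the two coefficient vectors; a small check using the br and br_alphas models fitted above:

import numpy as np

# the stronger priors typically pull the weights a little closer to 0,
# so the second sum is usually slightly smaller
print(np.abs(br.coef_).sum(), np.abs(br_alphas.coef_).sum())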

How it works...

For Bayesian ridge regression, we assume priors over the errors and over alpha; both of these priors are Gamma distributions. The Gamma distribution is a very flexible distribution: it can take quite different shapes depending on how its shape and scale parameters are set. 1e-06 is the default value of these hyperparameters in scikit-learn's BayesianRidge.
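To get a feel for those shapes, we can evaluate the Gamma density directly; a minimal sketch with scipy.stats.gamma (the shape values below are illustrative, with the scale left at 1):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma

x = np.linspace(0.01, 10, 500)
for shape in (1e-06, 0.5, 1.0, 2.0):
    # Gamma density for a given shape parameter; very small shapes
    # concentrate almost all of the mass near 0
    plt.plot(x, gamma.pdf(x, a=shape), label="shape=%s" % shape)
plt.legend()
plt.show()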

As you can see, the coefficients are naturally shrunk towards 0, especially with a very small shape parameter.

There's more...

As I mentioned earlier, there's also a Bayesian interpretation of lasso regression. Imagine we set priors over the coefficients; remember that, under this view, the coefficients are themselves random variables.

For lasso regression, we will choose a prior that naturally produces 0s, for example, the double exponential (Laplace) distribution.
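To see what this prior looks like, here's a minimal sketch that draws the density with scipy.stats.laplace (the location and scale values are arbitrary):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import laplace

x = np.linspace(-5, 5, 500)
# double exponential (Laplace) density centered at 0
plt.plot(x, laplace.pdf(x, loc=0, scale=1))
plt.show()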

Notice the peak around 0; this is what naturally leads to zero coefficients in lasso regression. By tuning the hyperparameters, it's also possible to create coefficients that are more or less likely to be exactly 0, depending on the setup of the problem.
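scikit-learn doesn't implement the Bayesian form of the lasso, but the effect is easy to see with the ordinary Lasso estimator, where alpha controls the regularization strength; a small sketch on the X, y generated earlier:

from sklearn.linear_model import Lasso

for alpha in (0.1, 1.0, 10.0):
    lasso = Lasso(alpha=alpha)
    lasso.fit(X, y)
    # a stronger penalty usually drives more coefficients exactly to 0
    print(alpha, (lasso.coef_ == 0).sum(), "zero coefficients")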
