Using boosting to learn from errors

2020-04-22

Gradient boosting regression is a technique that learns from its mistakes. Essentially, it tries to fit a bunch of weak learners. There are two things to note:

1. Individually, each learner has poor accuracy, but together they can have very good accuracy.

2. They're applied sequentially, which means that each learner becomes an expert in the mistakes of the prior learner, as the sketch after this list illustrates.

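To make point 2 concrete, here is a minimal hand-rolled sketch of the idea (a simplification, not scikit-learn's actual implementation, which also applies a learning rate and works with gradients of the loss): fit one weak tree, then fit a second weak tree on the first tree's errors.

# A two-stage illustration of boosting on residuals (simplified sketch).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(1000, 2, noise=10)

# Stage 1: a shallow, weak tree fit to the raw targets.
tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
stage1_errors = y - tree1.predict(X)

# Stage 2: another weak tree, fit to the errors of the first.
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, stage1_errors)
combined = tree1.predict(X) + tree2.predict(X)

print(np.mean((y - tree1.predict(X)) ** 2))  # stage 1 alone
print(np.mean((y - combined) ** 2))          # typically lower: stage 2 corrected stage 1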

Getting ready

Let's use some basic regression data and see how gradient boosting regression (henceforth, GBR) works:

from sklearn.datasets import make_regression
X, y = make_regression(1000, 2, noise=10)

How to do it...

GBR is part of the ensemble module because it's an ensemble learner. This is the name for the idea behind using many weak learners to simulate a strong learner:

from sklearn.ensemble import GradientBoostingRegressor as GBR
gbr = GBR()
gbr.fit(X, y)
gbr_preds = gbr.predict(X)

Clearly, there's more to fitting a usable model, but this pattern should be pretty clear by now. Now, let's fit a basic regression as well so that we can use it as a baseline:

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y)
lr_preds = lr.predict(X)

Now that we have a baseline, let's see how well GBR performed against linear regression. I'll leave it as an exercise for you to plot the residuals, but to get started, do the following:

gbr_residuals = y - gbr_preds
lr_residuals = y - lr_preds

The following will be the output:

[Figure: the residual distributions of GBR and linear regression]
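
The original recipe leaves the plotting code out; a minimal matplotlib sketch for reproducing a figure like this (matplotlib itself is an assumption here, not part of the recipe) could look like this:

# Overlay histograms of the two residual distributions for a quick
# visual comparison of GBR versus plain linear regression.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(7, 5))
ax.hist(gbr_residuals, bins=30, alpha=0.5, label='GBR residuals')
ax.hist(lr_residuals, bins=30, alpha=0.5, label='LR residuals')
ax.set_xlabel('Residual')
ax.legend()
plt.show()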

It looks like GBR has a better fit, but it's a bit hard to tell. Let's take the 95 percent CI and compare:

import numpy as np
np.percentile(gbr_residuals, [2.5, 97.5])
# array([-16.05443674, 17.53946294])
np.percentile(lr_residuals, [2.5, 97.5])
# array([-20.05434912, 19.80272884])

So, GBR clearly fits a bit better; we can also make several modifications to the GBR algorithm, which might improve performance. I'll show an example here, then we'll walk through the different options in the How it works... section:

n_estimators = np.arange(100, 1100, 350)
gbrs = [GBR(n_estimators=n_estimator) for n_estimator in n_estimators]
residuals = {}
for i, gbr in enumerate(gbrs):
    gbr.fit(X, y)
    residuals[gbr.n_estimators] = y - gbr.predict(X)

The following is the output:

[Figure: residual distributions for each n_estimators setting]
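
Rather than eyeballing the figure, you can also compare the middle 95 percent of the residuals for each setting numerically (exact values will vary from run to run):

# Print the 95 percent interval of the residuals for each model;
# the interval should tighten as n_estimators grows.
for n_estimator, resid in residuals.items():
    print(n_estimator, np.percentile(resid, [2.5, 97.5]))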

It's a bit muddled, but hopefully, it's clear that as the number of estimators increases, the error goes down. Sadly, this isn't a panacea; first, we don't test against a holdout set (a quick fix for this is sketched below), and second, as the number of estimators goes up, training takes longer. This isn't a big deal on the dataset we use here, but imagine a dataset one or two orders of magnitude larger.

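On the first caveat: the recipe measures error on the same data used for fitting. A quick holdout check, not part of the original recipe, might look like this (the n_estimators value is illustrative):

# Hold out a test split so that extra estimators can't flatter the
# model by simply memorizing the training data.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
gbr_holdout = GBR(n_estimators=800).fit(X_train, y_train)
holdout_residuals = y_test - gbr_holdout.predict(X_test)
print(np.percentile(holdout_residuals, [2.5, 97.5]))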

How it works...

The first parameter, and the one we already looked at, is n_estimators, the number of weak learners that are used in GBR. In general, if you can get away with more (that is, have enough computational power), it is probably better. There are more nuances to the other parameters.

You should tune the max_depth parameter before all others. Since the individual learners are trees, max_depth controls how many nodes are produced for the trees. There's a subtle line between using the appropriate number of nodes that can fit the data well and using too many, which might cause overfitting.

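To feel out where that line is, you can probe a few depths against the holdout split from the earlier sketch (the depth values here are illustrative, not recommendations):

# Shallow trees underfit; deep trees drive training error down but can
# overfit, which shows up as a wider residual spread on the test split.
for depth in [1, 3, 5, 8]:
    model = GBR(max_depth=depth).fit(X_train, y_train)
    test_resid = y_test - model.predict(X_test)
    print(depth, np.percentile(test_resid, [2.5, 97.5]))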

The loss parameter controls the loss function, which determines the error. The default is ls, which stands for least squares. Least absolute deviation, Huber loss, and quantile loss are also available.

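For example (a hedged note: these string names follow the older scikit-learn API this recipe uses; recent releases renamed 'ls' to 'squared_error' and least absolute deviation 'lad' to 'absolute_error'):

# Huber loss blends squared and absolute error, making the fit less
# sensitive to outliers; quantile loss targets a conditional quantile.
gbr_huber = GBR(loss='huber').fit(X, y)
gbr_q90 = GBR(loss='quantile', alpha=0.9).fit(X, y)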
