Hyperparameter tuning is an essential skill for any competent algorithm engineer. This article shows how to implement different learning rate decay strategies in Keras, so that the learning rate can be adjusted dynamically while training a neural network.
The effect of different learning rates on convergence (image source: cs231n)
1. Why Adjust the Learning Rate Dynamically
Gradient descent with a small learning rate (top) and a large learning rate (bottom). Source: Andrew Ng's Machine Learning course on Coursera
As the figure shows, a small learning rate makes gradient descent painfully slow, while a large learning rate makes it overshoot the minimum and can even prevent training from converging at all. Dynamically adjusting the learning rate during training is therefore a common strategy.
There are also scenarios, shown below, where the iterates enter a saddle point on the error surface and can get stuck there, making it hard to reach a local or global minimum.
A saddle point on the error surface. Image source: [1]
To address this, a periodic function f can be used to change the learning rate from iteration to iteration. This keeps the learning rate cycling between reasonable bounds, helping the optimizer escape saddle points.
A triangular wave as the periodic learning rate function. Image source: [1]
A cosine function as the periodic learning rate function. Image source: [1]
By varying the learning rate periodically, the optimizer can jump over "mountains" in the loss landscape and converge faster to a global or local optimum.
Fixed learning rate vs. cyclical learning rate. Image source: [1]
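As a minimal sketch of such a cyclical schedule (the triangular wave from the figures above; the step_size and rate bounds here are hypothetical values, not from the source), one could write:

import numpy as np

def triangular_lr(step, step_size=2000, base_lr=1e-4, max_lr=1e-2):
    # which cycle the current step falls into
    cycle = np.floor(1 + step / (2.0 * step_size))
    # distance from the peak of the current cycle, in [0, 1]
    x = np.abs(step / step_size - 2 * cycle + 1)
    # linearly interpolate between base_lr and max_lr
    return base_lr + (max_lr - base_lr) * np.maximum(0, 1 - x)

The learning rate rises linearly from base_lr to max_lr over step_size iterations, falls back down, and repeats every 2 * step_size steps.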
2. Learning Rate Implementations in Keras
2.1 Keras Standard Decay Schedule
Keras provides a simple learning rate scheduler through the decay parameter of its optimizers (SGD, Adam, etc.), as shown below:
# initialize our optimizer and model, then compile it
opt = SGD(lr=1e-2, momentum=0.9, decay=1e-2 / epochs)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
    (64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])
The learning rate at the start of each epoch then looks like this:
| Epoch | Learning Rate |
|---|---|
| 1 | 0.01000 |
| 2 | 0.00836 |
| 3 | 0.00719 |
| ... | ... |
| 37 | 0.00121 |
| 38 | 0.00119 |
| 39 | 0.00116 |
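These numbers follow from the rule behind Keras' standard decay, which is applied once per batch update: lr = init_lr * 1.0 / (1.0 + decay * iterations). A minimal sketch of the rule (the number of updates per epoch is an assumption here, since it depends on dataset and batch size) lands close to the first rows of the table:

def standard_decay_lr(init_lr, decay, iterations):
    # Keras' legacy optimizer decay: hyperbolic decay per batch update
    return init_lr * 1.0 / (1.0 + decay * iterations)

# epoch-start learning rates for init_lr=1e-2, decay=1e-2/40,
# assuming a placeholder 781 batch updates per epoch
for epoch in (1, 2, 3):
    print(standard_decay_lr(1e-2, 1e-2 / 40, (epoch - 1) * 781))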
2.2 Built-in Learning Rate Schedulers in Keras
Besides the standard decay, Keras ships with the following learning rate scheduler implementations:
- ExponentialDecay
- PiecewiseConstantDecay
- PolynomialDecay
- InverseTimeDecay
Let's look at how to adjust the learning rate using keras.optimizers.schedules.
1) Decay the learning rate by a factor of 0.96 every 100,000 steps:
import tensorflow as tf

initial_learning_rate = 0.1
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
model.fit(data, labels, epochs=5)
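Under the hood, ExponentialDecay computes initial_learning_rate * decay_rate ** (step / decay_steps), and staircase=True floors the exponent so the rate drops in discrete jumps. The schedule object is callable on a step number, which makes it easy to sanity-check:

print(float(lr_schedule(0)))       # 0.1
print(float(lr_schedule(99999)))   # still 0.1 (staircase has not stepped yet)
print(float(lr_schedule(100000)))  # 0.1 * 0.96 = 0.096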
2) Use a learning rate of 1.0 for the first 100,000 steps, 0.5 for steps 100,001 through 110,000, and 0.1 for any later step:
import tensorflow as tf
from tensorflow import keras

step = tf.Variable(0, trainable=False)
boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate_fn = keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)
# Later, whenever we perform an optimization step, we pass in the step.
learning_rate = learning_rate_fn(step)
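Note the boundary semantics: values[0] applies while step <= boundaries[0], values[1] while boundaries[0] < step <= boundaries[1], and values[2] afterwards. For example:

print(float(learning_rate_fn(100000)))  # 1.0 (the boundary itself is in segment 1)
print(float(learning_rate_fn(100001)))  # 0.5
print(float(learning_rate_fn(110001)))  # 0.1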
3) Decay the learning rate from 0.1 down to 0.01 over 10,000 steps:
...
starter_learning_rate = 0.1
end_learning_rate = 0.01
decay_steps = 10000
learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
    starter_learning_rate,
    decay_steps,
    end_learning_rate,
    power=0.5)
model.compile(optimizer=tf.keras.optimizers.SGD(
        learning_rate=learning_rate_fn),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
model.fit(data, labels, epochs=5)
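PolynomialDecay computes (starter - end) * (1 - step / decay_steps) ** power + end, clamping step to decay_steps, so with power=0.5 this is a square-root-shaped decay that settles at end_learning_rate:

print(float(learning_rate_fn(5000)))   # (0.1 - 0.01) * 0.5 ** 0.5 + 0.01 ≈ 0.0736
print(float(learning_rate_fn(10000)))  # 0.01, and it stays there afterwards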
4) Decay the learning rate as 1/t (inverse time decay):
initial_learning_rate = 0.1
decay_steps = 1.0
decay_rate = 0.5
learning_rate_fn = keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate, decay_steps, decay_rate)
model.compile(optimizer=tf.keras.optimizers.SGD(
        learning_rate=learning_rate_fn),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
model.fit(data, labels, epochs=5)
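InverseTimeDecay implements the 1/t curve as initial_learning_rate / (1 + decay_rate * step / decay_steps); with the settings above:

print(float(learning_rate_fn(0)))  # 0.1
print(float(learning_rate_fn(2)))  # 0.1 / (1 + 0.5 * 2) = 0.05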
2.3 Custom Keras Learning Rate Schedules
Now let's see how to define a custom learning rate schedule for a neural network in Keras. To keep the code clean and follow object-oriented best practices, we first define a base class:
# import the necessary packages
import matplotlib.pyplot as plt
import numpy as np

class LearningRateDecay:
    def plot(self, epochs, title="Learning Rate Schedule"):
        # compute the set of learning rates for each corresponding epoch
        lrs = [self(i) for i in epochs]
        # plot the learning rate schedule
        plt.style.use("ggplot")
        plt.figure()
        plt.plot(epochs, lrs)
        plt.title(title)
        plt.xlabel("Epoch #")
        plt.ylabel("Learning Rate")
The base class implements a plot method that graphs the learning rate as a function of the epoch, which is handy for visualizing a schedule before training.
2.3.1 Step-based Learning Rate Schedules
Keras learning rate step-based decay. The schedule in red is a decay factor of 0.5 and blue is a factor of 0.25.
Step-based decay reduces the learning rate by a fixed factor every specified number of epochs during training, and can be viewed as a piecewise constant function. As the figure above shows, the learning rate holds a fixed value for several consecutive epochs, drops to a smaller value, holds again, and so on until training ends.
The Python implementation is as follows:
class StepDecay(LearningRateDecay):
    def __init__(self, initAlpha=0.01, factor=0.25, dropEvery=10):
        # store the base initial learning rate, drop factor, and
        # number of epochs between drops
        self.initAlpha = initAlpha
        self.factor = factor
        self.dropEvery = dropEvery

    def __call__(self, epoch):
        # compute the learning rate for the current epoch
        exp = np.floor((1 + epoch) / self.dropEvery)
        alpha = self.initAlpha * (self.factor ** exp)
        # return the learning rate
        return float(alpha)
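For instance, we can inspect a schedule before training by reusing the plot method from the base class (a quick usage sketch):

schedule = StepDecay(initAlpha=0.01, factor=0.25, dropEvery=10)
schedule.plot(range(0, 100))   # the rate drops at epochs 9, 19, 29, ...
plt.show()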
2.3.2 Linear And Polynomial Learning Rate Schedule
Linear and polynomial decay reduce the learning rate a little every epoch of training, reaching zero at the final epoch; linear decay is simply the polynomial schedule with power=1.
The Python implementation is as follows:
class PolynomialDecay(LearningRateDecay):
    def __init__(self, maxEpochs=100, initAlpha=0.01, power=1.0):
        # store the maximum number of epochs, base learning rate,
        # and power of the polynomial
        self.maxEpochs = maxEpochs
        self.initAlpha = initAlpha
        self.power = power

    def __call__(self, epoch):
        # compute the new learning rate based on polynomial decay
        decay = (1 - (epoch / float(self.maxEpochs))) ** self.power
        alpha = self.initAlpha * decay
        # return the new learning rate
        return float(alpha)
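With power=1.0 the schedule is a straight line from initAlpha down to zero at maxEpochs; larger powers push the rate down faster early on. For example:

linear = PolynomialDecay(maxEpochs=100, initAlpha=0.01, power=1.0)
poly = PolynomialDecay(maxEpochs=100, initAlpha=0.01, power=5.0)
print(linear(50))  # 0.005 (halfway, half the initial rate)
print(poly(50))    # 0.01 * 0.5 ** 5 = 0.0003125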
2.3.3 Setting the Learning Rate via a model.fit Callback
During training, we can select among these learning rate strategies through a command-line argument:
# import the Keras callback that applies our schedule each epoch
from tensorflow.keras.callbacks import LearningRateScheduler

# store the number of epochs to train for in a convenience variable,
# then initialize the list of callbacks and learning rate scheduler
# to be used
epochs = args["epochs"]
callbacks = []
schedule = None

# check to see if step-based learning rate decay should be used
if args["schedule"] == "step":
    print("[INFO] using 'step-based' learning rate decay...")
    schedule = StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=15)

# check to see if linear learning rate decay should be used
elif args["schedule"] == "linear":
    print("[INFO] using 'linear' learning rate decay...")
    schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=1)

# check to see if a polynomial learning rate decay should be used
elif args["schedule"] == "poly":
    print("[INFO] using 'polynomial' learning rate decay...")
    schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=5)

# if the learning rate schedule is not empty, add it to the list of
# callbacks
if schedule is not None:
    callbacks = [LearningRateScheduler(schedule)]
With the callbacks in place, passing them to the model's fit function is all it takes to adjust the learning rate dynamically during training:
# initialize our optimizer and model, then compile it
# ("decay" is assumed to be defined earlier; it should be 0 when a
# schedule callback is used, so the two mechanisms don't stack)
opt = SGD(lr=1e-1, momentum=0.9, decay=decay)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
    (64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])

# train the network
H = model.fit(trainX, trainY, validation_data=(testX, testY),
    batch_size=128, epochs=epochs, callbacks=callbacks, verbose=1)