Hyperparameter tuning is an essential skill for any competent algorithm engineer. This article shows how to implement different learning rate decay strategies in Keras, so that the learning rate can be adjusted dynamically while training a neural network.
The effect of different learning rates on convergence (image source: cs231n)
1. Why Adjust the Learning Rate Dynamically
Gradient descent with a small learning rate (top) and a large learning rate (bottom). Source: Andrew Ng's Machine Learning course on Coursera
As the figure shows, a small learning rate makes gradient descent painfully slow, while a large learning rate makes it overshoot the minimum and can even prevent training from converging at all. Dynamically adjusting the learning rate during training is therefore a common strategy.
There are also scenarios, shown below, where the iterates enter a saddle point on the error surface and can get stuck there, making it hard to reach a local or global minimum.
A saddle point on the error surface. Image source: [1]
To address this, a periodic function f can be used to change the learning rate from iteration to iteration. This keeps the learning rate cycling between reasonable bounds, helping the optimizer escape saddle points.
A triangular wave as the periodic learning rate function. Image source: [1]
A cosine function as the periodic learning rate function. Image source: [1]
By varying the learning rate periodically, the optimizer can jump over "mountains" in the loss landscape and converge faster to a global or local optimum.
Fixed learning rate vs. cyclical learning rate. Image source: [1]
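As a minimal sketch of such a cyclical schedule (the triangular wave from the figures above; the step_size and rate bounds here are hypothetical values, not from the source), one could write:

import numpy as np

def triangular_lr(step, step_size=2000, base_lr=1e-4, max_lr=1e-2):
    # which cycle the current step falls into
    cycle = np.floor(1 + step / (2.0 * step_size))
    # distance from the peak of the current cycle, in [0, 1]
    x = np.abs(step / step_size - 2 * cycle + 1)
    # linearly interpolate between base_lr and max_lr
    return base_lr + (max_lr - base_lr) * np.maximum(0, 1 - x)

The learning rate rises linearly from base_lr to max_lr over step_size iterations, falls back down, and repeats every 2 * step_size steps.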
2. Learning Rate Implementations in Keras
2.1 Keras Standard Decay Schedule
Keras provides a simple learning rate scheduler through the decay parameter of its optimizers (SGD, Adam, etc.), as shown below:
# initialize our optimizer and model, then compile it
opt = SGD(lr=1e-2, momentum=0.9, decay=1e-2 / epochs)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
    (64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])
The learning rate at the start of each epoch then looks like this:
| Epoch | Learning Rate |
|---|---|
| 1 | 0.01000 |
| 2 | 0.00836 |
| 3 | 0.00719 |
| ... | ... |
| 37 | 0.00121 |
| 38 | 0.00119 |
| 39 | 0.00116 |
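These numbers follow from the rule behind Keras' standard decay, which is applied once per batch update: lr = init_lr * 1.0 / (1.0 + decay * iterations). A minimal sketch of the rule (the number of updates per epoch is an assumption here, since it depends on dataset and batch size) lands close to the first rows of the table:

def standard_decay_lr(init_lr, decay, iterations):
    # Keras' legacy optimizer decay: hyperbolic decay per batch update
    return init_lr * 1.0 / (1.0 + decay * iterations)

# epoch-start learning rates for init_lr=1e-2, decay=1e-2/40,
# assuming a placeholder 781 batch updates per epoch
for epoch in (1, 2, 3):
    print(standard_decay_lr(1e-2, 1e-2 / 40, (epoch - 1) * 781))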
2.2 Built-in Learning Rate Schedulers in Keras
Besides the standard decay, Keras ships with the following learning rate scheduler implementations:
- ExponentialDecay
- PiecewiseConstantDecay
- PolynomialDecay
- InverseTimeDecay
Let's look at how to adjust the learning rate using keras.optimizers.schedules.
1) Decay the learning rate by a factor of 0.96 every 100,000 steps:
import tensorflow as tf

initial_learning_rate = 0.1
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
model.fit(data, labels, epochs=5)
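Under the hood, ExponentialDecay computes initial_learning_rate * decay_rate ** (step / decay_steps), and staircase=True floors the exponent so the rate drops in discrete jumps. The schedule object is callable on a step number, which makes it easy to sanity-check:

print(float(lr_schedule(0)))       # 0.1
print(float(lr_schedule(99999)))   # still 0.1 (staircase has not stepped yet)
print(float(lr_schedule(100000)))  # 0.1 * 0.96 = 0.096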
2) Use a learning rate of 1.0 for the first 100,000 steps, 0.5 for steps 100,001 through 110,000, and 0.1 for any later step:
import tensorflow as tf
from tensorflow import keras

step = tf.Variable(0, trainable=False)
boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate_fn = keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)
# Later, whenever we perform an optimization step, we pass in the step.
learning_rate = learning_rate_fn(step)
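Note the boundary semantics: values[0] applies while step <= boundaries[0], values[1] while boundaries[0] < step <= boundaries[1], and values[2] afterwards. For example:

print(float(learning_rate_fn(100000)))  # 1.0 (the boundary itself is in segment 1)
print(float(learning_rate_fn(100001)))  # 0.5
print(float(learning_rate_fn(110001)))  # 0.1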
3) Decay the learning rate from 0.1 down to 0.01 over 10,000 steps:
...
starter_learning_rate = 0.1
end_learning_rate = 0.01
decay_steps = 10000
learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
    starter_learning_rate,
    decay_steps,
    end_learning_rate,
    power=0.5)
model.compile(optimizer=tf.keras.optimizers.SGD(
        learning_rate=learning_rate_fn),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
model.fit(data, labels, epochs=5)
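PolynomialDecay computes (starter - end) * (1 - step / decay_steps) ** power + end, clamping step to decay_steps, so with power=0.5 this is a square-root-shaped decay that settles at end_learning_rate:

print(float(learning_rate_fn(5000)))   # (0.1 - 0.01) * 0.5 ** 0.5 + 0.01 ≈ 0.0736
print(float(learning_rate_fn(10000)))  # 0.01, and it stays there afterwards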
4) Decay the learning rate as 1/t (inverse time decay):
initial_learning_rate = 0.1
decay_steps = 1.0
decay_rate = 0.5
learning_rate_fn = keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate, decay_steps, decay_rate)
model.compile(optimizer=tf.keras.optimizers.SGD(
        learning_rate=learning_rate_fn),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])
model.fit(data, labels, epochs=5)
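InverseTimeDecay implements the 1/t curve as initial_learning_rate / (1 + decay_rate * step / decay_steps); with the settings above:

print(float(learning_rate_fn(0)))  # 0.1
print(float(learning_rate_fn(2)))  # 0.1 / (1 + 0.5 * 2) = 0.05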
2.3 Custom Keras Learning Rate Schedules
Now let's see how to define a custom learning rate schedule for a neural network in Keras. To keep the code clean and follow object-oriented best practices, we first define a base class:
# import the necessary packages
import matplotlib.pyplot as plt
import numpy as np

class LearningRateDecay:
    def plot(self, epochs, title="Learning Rate Schedule"):
        # compute the set of learning rates for each corresponding epoch
        lrs = [self(i) for i in epochs]
        # plot the learning rate schedule
        plt.style.use("ggplot")
        plt.figure()
        plt.plot(epochs, lrs)
        plt.title(title)
        plt.xlabel("Epoch #")
        plt.ylabel("Learning Rate")
The base class implements a plot method that graphs the learning rate as a function of the epoch, which is handy for visualizing a schedule before training.
2.3.1 Step-based Learning Rate Schedules
Keras learning rate step-based decay. The schedule in red is a decay factor of 0.5 and blue is a factor of 0.25.
Step-based decay reduces the learning rate by a fixed factor every specified number of epochs during training, and can be viewed as a piecewise constant function. As the figure above shows, the learning rate holds a fixed value for several consecutive epochs, drops to a smaller value, holds again, and so on until training ends.
The Python implementation is as follows:
class StepDecay(LearningRateDecay):
    def __init__(self, initAlpha=0.01, factor=0.25, dropEvery=10):
        # store the base initial learning rate, drop factor, and
        # number of epochs between drops
        self.initAlpha = initAlpha
        self.factor = factor
        self.dropEvery = dropEvery

    def __call__(self, epoch):
        # compute the learning rate for the current epoch
        exp = np.floor((1 + epoch) / self.dropEvery)
        alpha = self.initAlpha * (self.factor ** exp)
        # return the learning rate
        return float(alpha)
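For instance, we can inspect a schedule before training by reusing the plot method from the base class (a quick usage sketch):

schedule = StepDecay(initAlpha=0.01, factor=0.25, dropEvery=10)
schedule.plot(range(0, 100))   # the rate drops at epochs 9, 19, 29, ...
plt.show()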
2.3.2 Linear And Polynomial Learning Rate Schedule
Linear and polynomial decay reduce the learning rate a little every epoch of training, reaching zero at the final epoch; linear decay is simply the polynomial schedule with power=1.
The Python implementation is as follows:
class PolynomialDecay(LearningRateDecay):
    def __init__(self, maxEpochs=100, initAlpha=0.01, power=1.0):
        # store the maximum number of epochs, base learning rate,
        # and power of the polynomial
        self.maxEpochs = maxEpochs
        self.initAlpha = initAlpha
        self.power = power

    def __call__(self, epoch):
        # compute the new learning rate based on polynomial decay
        decay = (1 - (epoch / float(self.maxEpochs))) ** self.power
        alpha = self.initAlpha * decay
        # return the new learning rate
        return float(alpha)
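With power=1.0 the schedule is a straight line from initAlpha down to zero at maxEpochs; larger powers push the rate down faster early on. For example:

linear = PolynomialDecay(maxEpochs=100, initAlpha=0.01, power=1.0)
poly = PolynomialDecay(maxEpochs=100, initAlpha=0.01, power=5.0)
print(linear(50))  # 0.005 (halfway, half the initial rate)
print(poly(50))    # 0.01 * 0.5 ** 5 = 0.0003125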
2.3.3 Setting the Learning Rate via a model.fit Callback
During training, we can select among these learning rate strategies through a command-line argument:
# import the Keras callback that applies our schedule each epoch
from tensorflow.keras.callbacks import LearningRateScheduler

# store the number of epochs to train for in a convenience variable,
# then initialize the list of callbacks and learning rate scheduler
# to be used
epochs = args["epochs"]
callbacks = []
schedule = None

# check to see if step-based learning rate decay should be used
if args["schedule"] == "step":
    print("[INFO] using 'step-based' learning rate decay...")
    schedule = StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=15)

# check to see if linear learning rate decay should be used
elif args["schedule"] == "linear":
    print("[INFO] using 'linear' learning rate decay...")
    schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=1)

# check to see if a polynomial learning rate decay should be used
elif args["schedule"] == "poly":
    print("[INFO] using 'polynomial' learning rate decay...")
    schedule = PolynomialDecay(maxEpochs=epochs, initAlpha=1e-1, power=5)

# if the learning rate schedule is not empty, add it to the list of
# callbacks
if schedule is not None:
    callbacks = [LearningRateScheduler(schedule)]
With the callbacks in place, passing them to the model's fit function is all it takes to adjust the learning rate dynamically during training:
# initialize our optimizer and model, then compile it
# ("decay" is assumed to be defined earlier; it should be 0 when a
# schedule callback is used, so the two mechanisms don't stack)
opt = SGD(lr=1e-1, momentum=0.9, decay=decay)
model = ResNet.build(32, 32, 3, 10, (9, 9, 9),
    (64, 64, 128, 256), reg=0.0005)
model.compile(loss="categorical_crossentropy", optimizer=opt,
    metrics=["accuracy"])

# train the network
H = model.fit(trainX, trainY, validation_data=(testX, testY),
    batch_size=128, epochs=epochs, callbacks=callbacks, verbose=1)