12 Quick-Start Tips for TensorFlow 2.0, Straight from the Creator of Keras

Compiled by Tongling | Produced by QbitAI (WeChat official account: QbitAI)

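How do you use TensorFlow 2.0 + Keras for machine learning research?

François Chollet, a deep learning researcher at Google and the creator of Keras, posted a Twitter thread that condenses deep learning research with TensorFlow 2.0 + Keras into a crash course.

In the thread, Chollet lays out 12 essential tips, each short and practical. It drew nearly 3K likes and about a thousand retweets.

Without further ado, here is how he keeps things simple: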

The 12 essential tips

1) The first thing to learn is the Layer. A Layer encapsulates a state (its weights) and some computation.

import tensorflow as tf
from tensorflow.keras.layers import Layer

class Linear(Layer):
  """y = w.x + b"""

  def __init__(self, units=32, input_dim=32):
      super(Linear, self).__init__()
      w_init = tf.random_normal_initializer()
      self.w = tf.Variable(
          initial_value=w_init(shape=(input_dim, units), dtype='float32'),
          trainable=True)
      b_init = tf.zeros_initializer()
      self.b = tf.Variable(
          initial_value=b_init(shape=(units,), dtype='float32'),
          trainable=True)

  def call(self, inputs):
      return tf.matmul(inputs, self.w) + self.b


# Instantiate our layer.
linear_layer = Linear(4, 2)

# The layer can be treated as a function.
# Here we call it on some data.
y = linear_layer(tf.ones((2, 2)))
assert y.shape == (2, 4)

# Weights are automatically tracked under the `weights` property.
assert linear_layer.weights == [linear_layer.w, linear_layer.b]

2) The add_weight method gives you a shortcut for creating weights.
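As a minimal sketch of this tip, the layer from tip 1 can be rewritten so that `add_weight` creates and registers each variable in one call. The class name `LinearWithAddWeight` is a hypothetical name chosen for illustration, and the bias is initialized to zeros to mirror the first example.

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer


class LinearWithAddWeight(Layer):
    """y = w.x + b, with weights created through add_weight."""

    def __init__(self, units=32, input_dim=32):
        super().__init__()
        # add_weight builds the variable and tracks it on the layer.
        self.w = self.add_weight(shape=(input_dim, units),
                                 initializer='random_normal',
                                 trainable=True)
        self.b = self.add_weight(shape=(units,),
                                 initializer='zeros',
                                 trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


# Same usage as the hand-written version.
layer = LinearWithAddWeight(units=4, input_dim=2)
y = layer(tf.ones((2, 2)))
```

Compared with calling `tf.Variable` and an initializer object by hand, this keeps the shape, initializer, and trainability in a single call.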

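3) It is good practice to create weights in a separate build method, which is called lazily with the shape of the first inputs seen by the layer. With this pattern, we no longer need to specify input_dim in the constructor.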

class Linear(Layer):
  """y = w.x + b"""

  def __init__(self, units=32):
      super(Linear, self).__init__()
      self.units = units

  def build(self, input_shape):
      self.w = self.add_weight(shape=(input_shape[-1], self.units),
                               initializer='random_normal',
                               trainable=True)
      self.b = self.add_weight(shape=(self.units,),
                               initializer='random_normal',
                               trainable=True)

  def call(self, inputs):
      return tf.matmul(inputs, self.w) + self.b


# Instantiate our lazy layer.
linear_layer = Linear(4)

# This will also call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))

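4) You can automatically retrieve the gradients of a layer's weights by calling the layer inside a GradientTape. With these gradients, you can update the weights either through an optimizer or manually. You can also modify the gradients before using them.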

# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# Instantiate our linear layer (defined above) with 10 units.
linear_layer = Linear(10)

# Instantiate a logistic loss function that expects integer targets.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# Iterate over the batches of the dataset.
for step, (x, y) in enumerate(dataset):

  # Open a GradientTape.
  with tf.GradientTape() as tape:

    # Forward pass.
    logits = linear_layer(x)

    # Loss value for this batch.
    loss = loss_fn(y, logits)

  # Get gradients of weights wrt the loss.
  gradients = tape.gradient(loss, linear_layer.trainable_weights)

  # Update the weights of our linear layer.
  optimizer.apply_gradients(zip(gradients, linear_layer.trainable_weights))

  # Logging.
  if step % 100 == 0:
    print(step, float(loss))
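The loop above hands the gradients to an optimizer. As a small sketch of the other two options in this tip, the snippet below modifies the gradients before use (here by clipping their global norm) and then applies a manual update with `assign_sub`. The toy data, the clip threshold of 1.0, and the learning rate of 0.1 are illustrative assumptions, not values from the original thread.

```python
import tensorflow as tf

# Toy linear model and data (illustrative values only).
w = tf.Variable([[1.0], [2.0]])
b = tf.Variable([0.5])
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.constant([[1.0], [2.0]])


def mse_loss():
    pred = tf.matmul(x, w) + b
    return tf.reduce_mean(tf.square(pred - y))


with tf.GradientTape() as tape:
    loss_before = mse_loss()

grads = tape.gradient(loss_before, [w, b])

# Modify the gradients before using them: rescale so that
# their combined (global) norm is at most 1.0.
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)

# Manual SGD step, without an optimizer object.
learning_rate = 0.1
for var, grad in zip([w, b], clipped):
    var.assign_sub(learning_rate * grad)

loss_after = mse_loss()
```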

5) Weights created by layers can be either trainable or non-trainable. They are exposed through trainable_weights and non_trainable_weights. Here is a layer with a non-trainable weight:

class ComputeSum(Layer):
  """Returns the sum of the inputs."""

  def __init__(self, input_dim):
      super(ComputeSum, self).__init__()
      # Create a non-trainable weight.
      self.total = tf.Variable(initial_value=tf.zeros((input_dim,)),
                               trainable=False)

  def call(self, inputs):
      self.total.assign_add(tf.reduce_sum(inputs, axis=0))
      return self.total  

my_sum = ComputeSum(2)
x = tf.ones((2, 2))

y = my_sum(x)
print(y.numpy())  # [2. 2.]

y = my_sum(x)
print(y.numpy())  # [4. 4.]

assert my_sum.weights == [my_sum.total]
assert my_sum.non_trainable_weights == [my_sum.total]
assert my_sum.trainable_weights == []

6) Layers can be recursively nested to create bigger computation blocks. Each layer tracks the weights of its sublayers, both trainable and non-trainable.

# Let's reuse the Linear class
# with a `build` method that we defined above.

class MLP(Layer):
    """Simple stack of Linear layers."""

    def __init__(self):
        super(MLP, self).__init__()
        self.linear_1 = Linear(32)
        self.linear_2 = Linear(32)
        self.linear_3 = Linear(10)

    def call(self, inputs):
        x = self.linear_1(inputs)
        x = tf.nn.relu(x)
        x = self.linear_2(x)
        x = tf.nn.relu(x)
        return self.linear_3(x)

mlp = MLP()

# The first call to the `mlp` object will create the weights.
y = mlp(tf.ones(shape=(3, 64)))

# Weights are recursively tracked.
assert len(mlp.weights) == 6

7) Layers can create losses during the forward pass. This is especially useful for regularization losses.

class ActivityRegularization(Layer):
  """Layer that creates an activity sparsity regularization loss."""

  def __init__(self, rate=1e-2):
    super(ActivityRegularization, self).__init__()
    self.rate = rate

  def call(self, inputs):
    # We use `add_loss` to create a regularization loss
    # that depends on the inputs.
    self.add_loss(self.rate * tf.reduce_sum(inputs))
    return inputs


# Let's use the loss layer in a MLP block.

class SparseMLP(Layer):
  """Stack of Linear layers with a sparsity regularization loss."""

  def __init__(self):
      super(SparseMLP, self).__init__()
      self.linear_1 = Linear(32)
      self.regularization = ActivityRegularization(1e-2)
      self.linear_3 = Linear(10)

  def call(self, inputs):
      x = self.linear_1(inputs)
      x = tf.nn.relu(x)
      x = self.regularization(x)
      return self.linear_3(x)


mlp = SparseMLP()
y = mlp(tf.ones((10, 10)))

print(mlp.losses)  # List containing one float32 scalar

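8) These losses are cleared by the top-level layer at the start of each forward pass, so they do not accumulate. layer.losses only ever contains the losses created during the last forward pass. When writing a training loop, you typically sum these losses before computing the gradients.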

# Losses correspond to the *last* forward pass.
mlp = SparseMLP()
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1
mlp(tf.ones((10, 10)))
assert len(mlp.losses) == 1  # No accumulation.

# Let's demonstrate how to use these losses in a training loop.

# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

# A new MLP.
mlp = SparseMLP()

# Loss and optimizer.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

for step, (x, y) in enumerate(dataset):
  with tf.GradientTape() as tape:

    # Forward pass.
    logits = mlp(x)

    # External loss value for this batch.
    loss = loss_fn(y, logits)

    # Add the losses created during the forward pass.
    loss += sum(mlp.losses)

  # Get gradients of weights wrt the loss (outside the tape context,
  # so the gradient computation itself is not recorded).
  gradients = tape.gradient(loss, mlp.trainable_weights)

  # Update the weights of our MLP.
  optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))

  # Logging.
  if step % 100 == 0:
    print(step, float(loss))

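9) Eager execution is great for debugging, but compiling your computation into a static graph generally gives better performance. Static graphs are a researcher's best friend. You can compile any function by wrapping it in the tf.function decorator.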

# Prepare our layer, loss, and optimizer.
mlp = MLP()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

# Create a training step function.

@tf.function  # Make it fast.
def train_on_batch(x, y):
  with tf.GradientTape() as tape:
    logits = mlp(x)
    loss = loss_fn(y, logits)
  gradients = tape.gradient(loss, mlp.trainable_weights)
  optimizer.apply_gradients(zip(gradients, mlp.trainable_weights))
  return loss

# Prepare a dataset.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices(
    (x_train.reshape(60000, 784).astype('float32') / 255, y_train))
dataset = dataset.shuffle(buffer_size=1024).batch(64)

for step, (x, y) in enumerate(dataset):
  loss = train_on_batch(x, y)
  if step % 100 == 0:
    print(step, float(loss))
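To make the "compile once, run many times" behavior concrete, here is a small sketch. The function `squared_sum` and the list `trace_log` are hypothetical names used only for illustration. The Python-level `append` runs while TensorFlow traces the function into a graph, not on every call, so a second call with the same input shape and dtype reuses the existing graph.

```python
import tensorflow as tf

trace_log = []  # Records each time the Python body is traced.


@tf.function
def squared_sum(x):
    # Python side effect: executed during tracing only.
    trace_log.append("traced")
    return tf.reduce_sum(tf.square(x))


first = squared_sum(tf.constant([1.0, 2.0]))
# Same shape and dtype as before, so the traced graph is reused.
second = squared_sum(tf.constant([3.0, 4.0]))
```

This is also why debugging with `print` or a debugger is easier in eager mode: inside a compiled function, ordinary Python statements are not re-run on each call.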

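10) Some layers, in particular BatchNormalization and Dropout, behave differently during training and inference. For such layers, the standard practice is to expose a training (boolean) argument in call.

Exposing this argument in call lets the built-in training and evaluation loops (for example fit) use the layer correctly in both training and inference.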

class Dropout(Layer):

  def __init__(self, rate):
    super(Dropout, self).__init__()
    self.rate = rate

  @tf.function
  def call(self, inputs, training=None):
    # Note that the tf.function decorator enables us
    # to use imperative control flow like this `if`,
    # while defining a static graph!
    if training:
      return tf.nn.dropout(inputs, rate=self.rate)
    return inputs


class MLPWithDropout(Layer):

  def __init__(self):
      super(MLPWithDropout, self).__init__()
      self.linear_1 = Linear(32)
      self.dropout = Dropout(0.5)
      self.linear_3 = Linear(10)

  def call(self, inputs, training=None):
      x = self.linear_1(inputs)
      x = tf.nn.relu(x)
      x = self.dropout(x, training=training)
      return self.linear_3(x)

mlp = MLPWithDropout()
y_train = mlp(tf.ones((2, 2)), training=True)
y_test = mlp(tf.ones((2, 2)), training=False)

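11) Plenty of built-in layers are available, from Dense, Conv2D, and LSTM to fancier ones such as Conv2DTranspose and ConvLSTM2D. Be smart about reusing built-in functionality.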
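As a rough sketch of this tip, the hand-written Linear and Dropout layers from the earlier tips can be swapped for the built-in `Dense` and `Dropout` layers. The class name `BuiltinMLP` and the layer sizes are assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers


class BuiltinMLP(layers.Layer):
    """Small MLP assembled from built-in Keras layers."""

    def __init__(self):
        super().__init__()
        # Dense bundles the linear transform and the activation.
        self.dense_1 = layers.Dense(32, activation='relu')
        self.dropout = layers.Dropout(0.5)
        self.dense_2 = layers.Dense(10)

    def call(self, inputs, training=None):
        x = self.dense_1(inputs)
        # Built-in Dropout already honors the `training` flag.
        x = self.dropout(x, training=training)
        return self.dense_2(x)


mlp = BuiltinMLP()
y = mlp(tf.ones((2, 16)), training=False)
```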

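12) To build deep learning models, you do not always have to use object-oriented programming. All the layers you have seen so far can also be composed functionally, like this: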

# We use an `Input` object to describe the shape and dtype of the inputs.
# This is the deep learning equivalent of *declaring a type*.
# The shape argument is per-sample; it does not include the batch size.
# The functional API focuses on defining per-sample transformations.
# The model we create will automatically batch the per-sample transformations,
# so that it can be called on batches of data.
inputs = tf.keras.Input(shape=(16,))

# We call layers on these "type" objects
# and they return updated types (new shapes/dtypes).
x = Linear(32)(inputs) # We are reusing the Linear layer we defined earlier.
x = Dropout(0.5)(x) # We are reusing the Dropout layer we defined earlier.
outputs = Linear(10)(x)

# A functional `Model` can be defined by specifying inputs and outputs.
# A model is itself a layer like any other.
model = tf.keras.Model(inputs, outputs)

# A functional model already has weights, before being called on any data.
# That's because we defined its input shape in advance (in `Input`).
assert len(model.weights) == 4

# Let's call our model on some data, for fun.
y = model(tf.ones((2, 16)))
assert y.shape == (2, 10)

# You can pass a `training` argument in `__call__`
# (it will get passed down to the Dropout layer).
y = model(tf.ones((2, 16)), training=True)

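This is the Functional API. It tends to be more concise and easier to use than subclassing, although it can only be used to define DAGs (directed acyclic graphs) of layers.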
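To illustrate what "a DAG of layers" allows beyond a plain stack, here is a minimal sketch with a skip connection, written with built-in layers. The layer sizes are arbitrary assumptions. The graph branches at `inputs` and merges again at `Add`, but it never loops back on itself.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(16,))

# First branch: a Dense layer that keeps the feature size at 16.
x = layers.Dense(16, activation='relu')(inputs)

# Merge the branch with the original inputs (skip connection).
merged = layers.Add()([inputs, x])

outputs = layers.Dense(10)(merged)
model = tf.keras.Model(inputs, outputs)

y = model(tf.ones((2, 16)))
```

Models whose control flow depends on the data, such as recursive or looping architectures, do not fit this pattern and are usually written by subclassing instead.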

Master these 12 tips and you can implement most deep learning research ideas. Pretty neat, right?

Links

Chollet's original Twitter thread: https://twitter.com/fchollet/status/1105139360226140160

Google Colab Notebook: https://colab.research.google.com/drive/17u-pRZJnKN0gO5XZmq8n5A2bKGrfKEUg#scrollTo=rwREGJ7Wiyl9
