[Relation Extraction: mre-in-one-pass] Building the Model

Model creation code

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings, extras):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings,
      extras=extras)

  output_layer = model.get_sequence_output()

  from_seq_length = output_layer.shape[1].value
  hidden_size = output_layer.shape[2].value

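  # Shape comments below: B = batch size, 10 = an example value of
  # FLAGS.max_num_relations, F = from_seq_length, 768 = hidden_size.
  # B 10 F 768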
  output_layer = tf.stack([output_layer] * FLAGS.max_num_relations, axis=1)
  # B 10 F 1
  e1_mas = tf.reshape(extras.e1_mas, [-1, FLAGS.max_num_relations, from_seq_length, 1])
  # B 10 F 768
  e1 = tf.multiply(output_layer, tf.to_float(e1_mas))
  # B 10 768
  e1 = tf.reduce_sum(e1, axis=-2) / tf.maximum(1.0, tf.reduce_sum(tf.to_float(e1_mas), axis=-2))
  # B*10 768
  e1 = tf.reshape(e1, [-1, hidden_size])
  # B 10 F 1
  e2_mas = tf.reshape(extras.e2_mas, [-1, FLAGS.max_num_relations, from_seq_length, 1])
  # B 10 F 768
  e2 = tf.multiply(output_layer, tf.to_float(e2_mas))
  # B 10 768
  e2 = tf.reduce_sum(e2, axis=-2) / tf.maximum(1.0, tf.reduce_sum(tf.to_float(e2_mas), axis=-2))
  # B*10 768
  e2 = tf.reshape(e2, [-1, hidden_size])
  # B*10 768*2
  output_layer = tf.concat([e1, e2], axis=-1)

  output_weights = tf.get_variable(
      "cls/entity/output_weights", [num_labels, hidden_size*2],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "cls/entity/output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
    # B*10 num_label
    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    # B*10 num_label
    logits = tf.nn.bias_add(logits, output_bias)
    # B*10 num_label
    probabilities = tf.nn.softmax(logits, axis=-1)
    # B*10 num_label
    log_probs = tf.nn.log_softmax(logits, axis=-1)
    # B*10
    labels = tf.reshape(labels, [-1])
    # B*10 num_label
    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
    # B*10
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    # B*10
    cls_mask = tf.reshape(tf.to_float(extras.cls_mask), [-1])
    # B*10
    per_example_loss = per_example_loss * cls_mask

    loss = tf.reduce_sum(per_example_loss) / tf.reduce_sum(cls_mask)

    return (loss, per_example_loss, logits, probabilities)

Explanation

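The output_layer returned by BERT has shape [4,128,768] (the token-level representation of each sentence), where 4 is the batch size, 128 is the maximum sequence length, and 768 is the hidden size of each token vector.
We fix the maximum number of relations per sentence at 12 in advance and copy output_layer 12 times along a new axis with tf.stack, giving [4,12,128,768]. (The shape comments in the code use 10 as the example value of FLAGS.max_num_relations.)
extras.e1_mas has shape [4,1536], where 1536 = 12 × 128; it is reshaped to [4,12,128,1].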
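To make the reshape concrete, here is a minimal framework-free sketch of how a flat mask of length max_num_relations * seq_length maps onto (relation slot, token position). It assumes each relation slot owns a contiguous block of seq_length entries, which is what the row-major reshape in the code implies. The helpers build_flat_mask and unflatten, the span format, and the toy sizes are made up for illustration and are not taken from the repository.

```python
def build_flat_mask(entity_spans, max_num_relations, seq_length):
    """Build a flat 0/1 mask of length max_num_relations * seq_length.

    entity_spans[r] is a hypothetical (start, end) token range (end
    exclusive) for the entity in relation slot r; unused slots stay zero.
    """
    flat = [0] * (max_num_relations * seq_length)
    for r, (start, end) in enumerate(entity_spans):
        for t in range(start, end):
            flat[r * seq_length + t] = 1  # row-major: slot r, token t
    return flat


def unflatten(flat, max_num_relations, seq_length):
    """Row-major reshape to [max_num_relations][seq_length][1]."""
    return [
        [[flat[r * seq_length + t]] for t in range(seq_length)]
        for r in range(max_num_relations)
    ]


# Toy sizes: 3 relation slots, 5 tokens; only the first 2 slots are used.
flat_mask = build_flat_mask([(1, 3), (4, 5)], max_num_relations=3, seq_length=5)
mask = unflatten(flat_mask, max_num_relations=3, seq_length=5)
print(len(flat_mask))           # 15
print([m[0] for m in mask[0]])  # [0, 1, 1, 0, 0]
```

With the sizes from the text, the same indexing turns one row of 1536 entries into 12 slots of 128 tokens each.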
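Next, output_layer [4,12,128,768] is multiplied element-wise by e1_mas [4,12,128,1] (the mask broadcasts over the last axis), giving e1 with shape [4,12,128,768]. Because e1_mas is a 0/1 mask, this zeroes out every token that is not part of the entity.
The entity representation is then averaged: the masked vectors are summed over the sequence axis and divided by the number of entity tokens (clamped to at least 1 to avoid division by zero), giving [4,12,768], which is then reshaped to [48,768].
The other entity in the sentence is processed the same way to obtain e2, also of shape [48,768].
e1 and e2 are concatenated to form the final output_layer with shape [48,1536].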
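The masking, pooling, and concatenation steps can be sketched for a single sentence with plain Python lists. This is only an illustration of the arithmetic, not the TensorFlow implementation: masked_mean and pair_features are hypothetical helper names, and the toy sizes (4 tokens, hidden size 2, 2 relation slots) are chosen so the result can be checked by hand.

```python
def masked_mean(token_vectors, mask):
    """Average the token vectors whose mask entry is 1.

    token_vectors: seq_length lists of hidden_size floats.
    mask: seq_length ints (0 or 1).
    The divisor is clamped to at least 1, mirroring tf.maximum(1.0, ...),
    so an all-zero mask yields a zero vector instead of a division by zero.
    """
    hidden_size = len(token_vectors[0])
    summed = [0.0] * hidden_size
    for vec, m in zip(token_vectors, mask):
        for i in range(hidden_size):
            summed[i] += vec[i] * m
    count = max(1.0, float(sum(mask)))
    return [s / count for s in summed]


def pair_features(token_vectors, e1_masks, e2_masks):
    """One concatenated [e1; e2] vector per relation slot of a sentence."""
    return [
        masked_mean(token_vectors, m1) + masked_mean(token_vectors, m2)
        for m1, m2 in zip(e1_masks, e2_masks)
    ]


# Toy sentence: 4 tokens, hidden_size 2, 2 relation slots (second is padding).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
e1 = [[1, 1, 0, 0], [0, 0, 0, 0]]
e2 = [[0, 0, 0, 1], [0, 0, 0, 0]]
features = pair_features(tokens, e1, e2)
print(features[0])  # [2.0, 3.0, 7.0, 8.0]
print(features[1])  # [0.0, 0.0, 0.0, 0.0]
```

The padding slot produces an all-zero feature vector; it still flows through the classifier, which is why the loss later needs its own mask.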
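This goes through a fully connected layer: [48,1536] is multiplied by the transpose of the [6,1536] weight matrix and the bias is added, giving logits of shape [48,6], where 6 is num_labels in this example.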
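Because the weights are stored with one row per label, the product uses the transposed weight matrix (transpose_b=True in the code). A small sketch of that shape arithmetic, using a hypothetical helper linear_transpose_b and toy numbers rather than the real [48,1536] and [6,1536] tensors:

```python
def linear_transpose_b(inputs, weights, bias):
    """Compute inputs @ weights^T + bias with plain lists.

    inputs:  [num_rows][in_dim]
    weights: [num_labels][in_dim]  (one row per label)
    bias:    [num_labels]
    Returns logits of shape [num_rows][num_labels].
    """
    return [
        [sum(x * w for x, w in zip(row, w_row)) + b
         for w_row, b in zip(weights, bias)]
        for row in inputs
    ]


inputs = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]    # 2 rows, in_dim 3
weights = [[1.0, 1.0, 1.0], [0.0, 2.0, -1.0]]  # 2 labels, in_dim 3
bias = [0.5, 0.0]
logits = linear_transpose_b(inputs, weights, bias)
print(logits)  # [[3.5, -2.0], [2.5, 1.0]]
```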
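Finally, the probabilities and the loss are computed. Note that relation slots that hold no actual relation must be ignored: cls_mask zeroes their per-example loss, and the total is divided by the number of real slots, so they do not take part in the computation.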
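The masked loss can be illustrated without TensorFlow. The sketch below assumes integer class labels and a 0/1 cls_mask per relation slot, as in the code above; log_softmax and masked_mean_loss are hypothetical helper names and the logits are toy values.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_norm for v in logits]


def masked_mean_loss(logits_rows, labels, cls_mask):
    """Mean cross-entropy over the rows whose cls_mask entry is 1."""
    per_example = []
    for row, label, keep in zip(logits_rows, labels, cls_mask):
        nll = -log_softmax(row)[label]  # negative log-likelihood of the label
        per_example.append(nll * keep)  # padding slots contribute 0
    # Like the original code, this divides by the number of real slots and
    # therefore assumes at least one cls_mask entry is 1.
    return sum(per_example) / sum(cls_mask), per_example


logits_rows = [[2.0, 0.0], [0.0, 0.0], [5.0, -5.0]]
labels = [0, 1, 1]
cls_mask = [1, 1, 0]  # third slot is padding and must not affect the loss
loss, per_example = masked_mean_loss(logits_rows, labels, cls_mask)
print(loss, per_example)
```

The third row would have a large loss on its own (its label disagrees strongly with its logits), but the mask removes it from both the numerator and the denominator.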

Reference code: https://sourcegraph.com/github.com/helloeve/mre-in-one-pass/-/blob/run_classifier.py#L379
