NAACL 2022 (Code Walkthrough): Good Visual Guidance Makes a Better Extractor for Multimodal Named Entity Recognition (Source Code Included)


Computer Vision Research Institute

WeChat official account ID: ComputerVisionGzq


Paper: https://arxiv.org/pdf/2205.03521.pdf

Code: https://github.com/zjunlp/HVPNeT

Computer Vision Research Institute column

Author: Edison_G


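Overview

Multimodal named entity recognition and relation extraction (MNER and MRE) form a fundamental and crucial branch of information extraction. However, existing MNER and MRE methods usually suffer from error sensitivity when the images attached to a text contain irrelevant objects.

To address this, the authors propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visually enhanced entity and relation extraction, aiming for more effective and more robust performance.

Specifically, visual representations are treated as pluggable visual prefixes that guide the textual representation toward error-insensitive predictions. A dynamic gated aggregation strategy is further proposed to obtain hierarchical multi-scale visual features that serve as the visual prefix for fusion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the method, which achieves state-of-the-art performance.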

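The Proposed Framework

Collection of Pyramidal Visual Feature

On the one hand, the image associated with a sentence contains multiple visual objects related to the entities in the sentence, which supplies additional semantic knowledge to assist information extraction. On the other hand, global image features may express abstract concepts and act as a weak learning signal. The authors therefore collect multiple visual clues for multimodal entity and relation extraction, using the regional (object) images as the primary information and the global image as a supplement.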
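The collection step can be pictured with a small sketch. It is only an illustration under stated assumptions: `toy_backbone` is a hypothetical stand-in for a real CNN (each "block" merely halves the resolution), feature maps are single-channel grids, and pooling every block output to a common size is an assumption made here so that the levels are comparable. None of the names come from the HVPNeT repository.

```python
from typing import List

Grid = List[List[float]]  # a single-channel square feature map


def avg_pool(grid: Grid, out: int) -> Grid:
    """Average-pool a square map down to out x out (side must divide evenly)."""
    n = len(grid)
    assert n % out == 0, "side length must be a multiple of the output size"
    k = n // out
    return [
        [
            sum(grid[bi * k + i][bj * k + j] for i in range(k) for j in range(k)) / (k * k)
            for bj in range(out)
        ]
        for bi in range(out)
    ]


def toy_backbone(image: Grid, num_blocks: int) -> List[Grid]:
    """Hypothetical stand-in for a CNN: every block halves the resolution."""
    feats, cur = [], image
    for _ in range(num_blocks):
        cur = avg_pool(cur, len(cur) // 2)
        feats.append(cur)
    return feats


def collect_pyramid(global_image: Grid, object_images: List[Grid],
                    num_blocks: int = 4, out: int = 2) -> List[List[Grid]]:
    """Return pyramid[b][i]: block-b feature of image i, pooled to out x out.

    Image 0 is the global image; images 1..m are the object regions.
    """
    per_image = [toy_backbone(img, num_blocks) for img in [global_image] + object_images]
    return [[avg_pool(feats[b], out) for feats in per_image] for b in range(num_blocks)]
```

With a 32 x 32 global image and two object crops of the same size, `collect_pyramid` returns four levels, each holding three 2 x 2 maps: one for the global image and one per object.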

Dynamic Gated Aggregation

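Although objects of different sizes can have appropriate feature representations at the corresponding scales, deciding which block of the visual backbone should supply the visual prefix for each Transformer layer is not trivial. To address this challenge, the authors propose building a densely connected routing space in which the hierarchical multi-scale visual features are connected to every Transformer layer.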

Dynamic Gate Module

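Routing is carried out by a dynamic gate module and can be viewed as a path-decision process. The dynamic gate is designed to predict a normalized vector that indicates how strongly the visual features of each block are used.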
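A minimal sketch of such a gate follows. It assumes that each backbone block has already been pooled to one vector, that the vectors are averaged, and that a single linear layer (one per Transformer layer) followed by a softmax produces the normalized weights. The function names and the exact layer shape are illustrative assumptions, not the repository's API.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_gate(block_vectors: List[List[float]],
                 weight: List[List[float]],
                 bias: List[float]) -> List[float]:
    """Predict one normalized weight per backbone block.

    block_vectors: c pooled block features, each of dimension d.
    weight, bias:  a hypothetical linear layer (c x d and c) owned by one
                   Transformer layer, so every layer can route differently.
    """
    c, d = len(block_vectors), len(block_vectors[0])
    # Average the pooled block features into a single summary vector.
    summary = [sum(v[k] for v in block_vectors) / c for k in range(d)]
    logits = [sum(w * x for w, x in zip(row, summary)) + b
              for row, b in zip(weight, bias)]
    return softmax(logits)
```

Because the output passes through a softmax, the gate values are positive and sum to one, which is what lets them be read as the degree to which each block is used.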

Aggregated Hierarchical Feature

Based on the dynamic gate g^(l) above, the final aggregated hierarchical visual feature V_gated that matches the l-th Transformer layer can be derived.
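One natural reading of this aggregation is a gate-weighted sum over the block features. The sketch below assumes exactly that, with every block feature already brought to the same n x d shape; it is not a reproduction of the paper's equation.

```python
from typing import List

Matrix = List[List[float]]


def aggregate(gate: List[float], block_features: List[Matrix]) -> Matrix:
    """Gate-weighted sum of c block features, each an n x d matrix.

    Assumed form: V_gated = sum_i gate[i] * block_features[i].
    """
    assert len(gate) == len(block_features), "one gate value per block"
    n, d = len(block_features[0]), len(block_features[0][0])
    return [
        [sum(g * feat[r][k] for g, feat in zip(gate, block_features)) for k in range(d)]
        for r in range(n)
    ]
```

For example, a gate of `[0.5, 0.5]` over the features `[[1.0, 2.0]]` and `[[3.0, 4.0]]` yields their element-wise mean.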

Visual Prefix-guided Fusion

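The hierarchical multi-scale image features serve as visual prefixes, and the visual prefix sequence is prepended to the text sequence in every self-attention layer of BERT.

With the hierarchical multi-scale visual features acting as the visual prefix of each fusion layer, multimodal attention is applied layer by layer to update all textual states. The final textual states therefore encode both contextual and cross-modal semantic information, which helps reduce error sensitivity to irrelevant object elements.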
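The fusion step can be sketched as single-head attention in which the visual prefix contributes extra keys and values while the queries come from the text only. This follows the usual prefix-style formulation and is an assumption about the details; in particular, the projection of the gated visual feature into `prefix_keys` and `prefix_values` is taken as given, and all names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def prefix_attention(queries: Matrix, text_keys: Matrix, text_values: Matrix,
                     prefix_keys: Matrix, prefix_values: Matrix) -> Matrix:
    """Single-head attention with a visual prefix prepended to keys and values."""
    keys = prefix_keys + text_keys        # visual prefix first, then text tokens
    values = prefix_values + text_values
    scale = math.sqrt(len(queries[0]))    # scaled dot-product attention
    updated = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        updated.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return updated
```

The output has one row per text token, so the sequence length seen by later layers is unchanged; the prefix only widens what each token can attend to.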

Experiments

Code Walkthrough

To run the code, install the requirements:

pip install -r requirements.txt

Data Collection:

The datasets used in the experiments are as follows:

Twitter2015 & Twitter2017: The text data follows the CoNLL format. You can download the Twitter2015 data via this link and the Twitter2017 data via this link. Please place them in data/NER_data. You can also put them anywhere and modify the path configuration in run.py.
MNRE: The MRE dataset comes from MEGA. You can download the MRE dataset with detected visual objects using the following commands:

cd data
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
mv data RE_data

Data Preprocess:

HMNeT
 |-- data  # conll2003, mit-movie, mit-restaurant and atis
 |    |-- NER_data
 |    |    |-- twitter2015  # text data
 |    |    |    |-- train.txt
 |    |    |    |-- valid.txt
 |    |    |    |-- test.txt
 |    |    |    |-- twitter2015_train_dict.pth  # {full-image-[object-image]}
 |    |    |    |-- ...
 |    |    |-- twitter2015_images       # full image data
 |    |    |-- twitter2015_aux_images   # object image data
 |    |    |-- twitter2017
 |    |    |-- twitter2017_images
 |    |-- RE_data
 |    |    |-- ...
 |-- models  # models
 |    |-- bert_model.py
 |    |-- modeling_bert.py
 |-- modules
 |    |-- metrics.py    # metric
 |    |-- train.py  # trainer
 |-- processor
 |    |-- dataset.py    # processor, dataset
 |-- logs     # code logs
 |-- run.py   # main
 |-- run_ner_task.sh
 |-- run_re_task.sh
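Before training, it can help to confirm that the data directories match the layout above. The helper below is a hypothetical convenience, not part of the repository; it checks only the directories named explicitly in the tree and assumes they are resolved relative to the project root.

```python
from pathlib import Path
from typing import List

# Directories named explicitly in the tree above, relative to the project root.
EXPECTED_DIRS = [
    "data/NER_data/twitter2015",
    "data/NER_data/twitter2015_images",
    "data/NER_data/twitter2015_aux_images",
    "data/NER_data/twitter2017",
    "data/NER_data/twitter2017_images",
    "data/RE_data",
]


def missing_dirs(root: str) -> List[str]:
    """Return the expected directories that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_DIRS if not (base / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_dirs("."):
        print(f"missing: {rel}")
```

Running the script from the project root prints one line per missing directory and prints nothing when the layout is complete.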

Train:

NER Task

The data path and GPU-related configuration are in run.py. To train the NER model, run one of these scripts:

bash run_twitter15.sh
bash run_twitter17.sh

Checkpoints can be downloaded via Twitter15_ckpt and Twitter17_ckpt.

RE Task

To train the RE model, run this script:

bash run_re_task.sh

Checkpoints can be downloaded via re_ckpt.

Test:

NER Task

To test the NER model, download the checkpoints provided via Twitter15_ckpt and Twitter17_ckpt, or use your own trained model. Set load_path to the model path, then run the following script:

python -u run.py \
      --dataset_name="twitter15/twitter17" \
      --bert_name="bert-base-uncased" \
      --seed=1234 \
      --only_test \
      --max_seq=80 \
      --use_prompt \
      --prompt_len=4 \
      --sample_ratio=1.0 \
      --load_path='your_ner_ckpt_path'

RE Task

To test the RE model, download the checkpoint provided via re_ckpt, or use your own trained model. Set load_path to the model path, then run the following script:

python -u run.py \
      --dataset_name="MRE" \
      --bert_name="bert-base-uncased" \
      --seed=1234 \
      --only_test \
      --max_seq=80 \
      --use_prompt \
      --prompt_len=4 \
      --sample_ratio=1.0 \
      --load_path='your_re_ckpt_path'
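If the test runs are launched from a Python script rather than a shell, the same flags can be assembled as an argument list. The helper below is a hypothetical wrapper: it uses only the flags shown in the commands above, and it assumes that "twitter15/twitter17" in the NER command means choosing one of the two names.

```python
from typing import List


def build_test_command(dataset_name: str, load_path: str) -> List[str]:
    """Assemble the evaluation command with the flags shown in the commands above.

    dataset_name: "MRE", or (by assumption) "twitter15" or "twitter17".
    load_path:    path to a downloaded or self-trained checkpoint.
    """
    return [
        "python", "-u", "run.py",
        f"--dataset_name={dataset_name}",
        "--bert_name=bert-base-uncased",
        "--seed=1234",
        "--only_test",
        "--max_seq=80",
        "--use_prompt",
        "--prompt_len=4",
        "--sample_ratio=1.0",
        f"--load_path={load_path}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the project root; passing a list rather than a single string avoids shell quoting problems when the checkpoint path contains spaces.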

Please contact this official account for permission before reposting.
