Blog | 32 Common NLP Tasks, Their Evaluation Metrics, and the Papers That Achieved SOTA on Them

2019-05-08 16:05:46

This article was originally published on the WeChat official account AI部落联盟 (AI_Tribe) and is reprinted by AI研习社 (AI Yanxishe) with permission.

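For NLP beginners, it is important to know what the various NLP techniques are. For those who want to go further, it is important to know each technique's evaluation metrics and datasets. For those doing academic research, it is important to know which papers achieved SOTA results on the corresponding benchmark datasets, because knowing the benchmarks, the metrics, and the current best results is the foundation of NLP research. This article therefore compiles 32 common NLP tasks together with their evaluation datasets, evaluation metrics, current SOTA results, and the corresponding papers.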

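1. First, a breakdown of NLP tasks by granularity (word level, phrase level, sentence level, and discourse level) and the main tasks at each level, to help beginners see how these basic NLP tasks relate to one another.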

2. Next, Ming Zhou's division of NLP into basic tasks and core tasks (http://zhigu.news.cn/2017-06/08/c_129628590.htm).

3. Next, Haifeng Wang's AAAI 2017 keynote on Baidu NLP (http://www.aaai.org/Conferences/AAAI/2017/aaai17inpractice.php), which mainly shows how the foundations, technologies, and applications of NLP relate to one another.

4. The 32 common NLP tasks, with their evaluation datasets, evaluation metrics, current SOTA results, and corresponding papers.

Each task below is listed with its evaluation dataset (corpus), evaluation metric(s), SOTA result, and the paper that reported it.

Chunking
Dataset: Penn Treebank | Metric: F1 | SOTA: 95.77
Paper: A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
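Chunking F1 is computed over whole chunks rather than individual tokens. A predicted chunk counts only if its type and both boundaries match a gold chunk. The sketch below assumes BIO-tagged input. `bio_to_spans` and `span_f1` are hypothetical helpers, and the official CoNLL `conlleval` script may treat some malformed tag sequences differently. The same span-level F1 is used for NER and slot filling.

```python
def bio_to_spans(tags):
    """Convert BIO tags (e.g. "B-NP", "I-NP", "O") to a set of (type, start, end) chunks; end is exclusive."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes any open chunk
        prefix, _, typ = tag.partition("-")
        # An open chunk ends unless this tag is I- with the same type.
        if label is not None and not (prefix == "I" and typ == label):
            spans.add((label, start, i))
            label = None
        # B- always opens a chunk; a stray I- (no open chunk) is treated like B-.
        if prefix == "B" or (prefix == "I" and label is None):
            start, label = i, typ
    return spans


def span_f1(gold_tags, pred_tags):
    """Chunk-level F1: a predicted chunk is correct only if its type and both boundaries match."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = ["B-NP", "I-NP", "B-VP", "B-NP"]
pred = ["B-NP", "I-NP", "B-VP", "O"]  # misses the last NP chunk
print(round(span_f1(gold, pred), 4))  # 0.8 (precision 1.0, recall 2/3)
```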

Common sense reasoning
Dataset: Event2Mind | Metric: cross-entropy (lower is better) | SOTA: 4.22
Paper: Event2Mind: Commonsense Inference on Events, Intents, and Reactions
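For a generation benchmark such as Event2Mind, cross-entropy is the average negative log-probability the model assigns to each token of the gold output. A lower value means the gold text was less surprising to the model. The sketch below assumes the probabilities of the gold tokens have already been computed. The function name is hypothetical. The natural-log base and per-token averaging are also assumptions; papers differ on both, so check the paper before comparing numbers.

```python
import math


def cross_entropy(gold_token_probs):
    """Mean negative log-likelihood (natural log) of the gold tokens under the model."""
    if not gold_token_probs:
        raise ValueError("need at least one token")
    return -sum(math.log(p) for p in gold_token_probs) / len(gold_token_probs)


# A model that gives every gold token probability 0.5 scores ln 2, about 0.693.
print(round(cross_entropy([0.5, 0.5, 0.5]), 3))  # 0.693
```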

Constituency parsing
Dataset: Penn Treebank | Metric: F1 | SOTA: 95.13
Paper: Constituency Parsing with a Self-Attentive Encoder

Coreference resolution
Dataset: CoNLL 2012 | Metric: average F1 | SOTA: 73
Paper: Higher-order Coreference Resolution with Coarse-to-fine Inference
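The CoNLL 2012 "average F1" is the unweighted mean of the F1 scores of three coreference metrics: MUC, B³, and CEAF_e. Implementing all three would make the example too long, so the sketch below implements only B³ and the final averaging. It assumes gold and predicted clusters contain exactly the same mentions. The official reference scorer also handles predicted mentions that are absent from the gold data. Function names are hypothetical.

```python
def b_cubed_f1(key_clusters, response_clusters):
    """B³ F1 over mentions, assuming key and response clusters cover the same mentions."""
    key_of = {m: frozenset(c) for c in key_clusters for m in c}
    resp_of = {m: frozenset(c) for c in response_clusters for m in c}
    if not key_of or set(key_of) != set(resp_of):
        raise ValueError("this sketch needs identical, non-empty mention sets")
    mentions = list(key_of)
    # Per mention: how much its gold cluster and its predicted cluster overlap.
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
    return 2 * precision * recall / (precision + recall)


def conll_f1(muc_f1, b3_f1, ceafe_f1):
    """CoNLL-2012 score: unweighted mean of the MUC, B³ and CEAF_e F1 scores."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3


key = [{"a", "b", "c"}, {"d"}]
response = [{"a", "b"}, {"c", "d"}]
print(round(b_cubed_f1(key, response), 4))  # 0.7059 (precision 0.75, recall 2/3)
```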

Dependency parsing
Dataset: Penn Treebank | Metrics: POS / UAS / LAS | SOTA: 97.3 / 95.44 / 93.76
Paper: Deep Biaffine Attention for Neural Dependency Parsing
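UAS (unlabeled attachment score) is the fraction of tokens whose predicted head is correct. LAS (labeled attachment score) also requires the correct dependency label. The sketch below uses a hypothetical `attachment_scores` helper. PTB evaluations commonly exclude punctuation tokens; this sketch leaves that filtering to the caller.

```python
def attachment_scores(gold, pred):
    """gold, pred: one (head_index, relation) pair per token, in sentence order (head 0 = root)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and predicted parses must cover the same non-empty token list")
    head_correct = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_correct = sum(g == p for g, p in zip(gold, pred))  # head and label both right
    return head_correct / len(gold), both_correct / len(gold)


# "She reads books": every head is right, but "books" gets the wrong label.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
uas, las = attachment_scores(gold, pred)
print(uas, round(las, 4))  # 1.0 0.6667
```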

Task-Oriented Dialogue/Intent Detection
Datasets: ATIS / Snips | Metric: accuracy | SOTA: 94.1 / 97.0
Paper: Slot-Gated Modeling for Joint Slot Filling and Intent Prediction

Task-Oriented Dialogue/Slot Filling
Datasets: ATIS / Snips | Metric: F1 | SOTA: 95.2 / 88.8
Paper: Slot-Gated Modeling for Joint Slot Filling and Intent Prediction

Task-Oriented Dialogue/Dialogue State Tracking
Dataset: DSTC2 | Metrics: Area / Food / Price / Joint accuracy | SOTA: 90 / 84 / 92 / 72
Paper: Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
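In DSTC2 the dialogue state is a set of slot-value constraints (area, food, price range). Per-slot accuracy checks each slot separately. Joint accuracy counts a turn as correct only if every slot matches, which is why it is the lowest of the four numbers. In the sketch below, the slot key names and the dict-per-turn representation are assumptions made for illustration.

```python
def dst_accuracies(gold_states, pred_states, slots=("area", "food", "pricerange")):
    """Per-slot and joint accuracy over turns; a missing slot means 'not mentioned yet'."""
    if len(gold_states) != len(pred_states) or not gold_states:
        raise ValueError("need the same non-empty list of turns")
    turns = list(zip(gold_states, pred_states))
    per_slot = {s: sum(g.get(s) == p.get(s) for g, p in turns) / len(turns) for s in slots}
    joint = sum(all(g.get(s) == p.get(s) for s in slots) for g, p in turns) / len(turns)
    return per_slot, joint


gold = [{"area": "north"}, {"area": "north", "food": "thai"},
        {"area": "north", "food": "thai", "pricerange": "cheap"}]
pred = [{"area": "north"}, {"area": "south", "food": "thai"},
        {"area": "north", "food": "thai", "pricerange": "cheap"}]
per_slot, joint = dst_accuracies(gold, pred)
print(per_slot["food"], round(joint, 4))  # 1.0 0.6667
```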

Domain adaptation
Dataset: Multi-Domain Sentiment Dataset | Metric: average accuracy | SOTA: 79.15
Paper: Strong Baselines for Neural Semi-supervised Learning under Domain Shift

Entity Linking
Dataset: AIDA CoNLL-YAGO | Metrics: Micro-F1-strong / Macro-F1-strong | SOTA: 86.6 / 89.4
Paper: End-to-End Neural Entity Linking
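Micro-F1 and macro-F1 differ only in how counts are aggregated. Micro-F1 pools true positives, false positives, and false negatives over the whole corpus before computing a single F1. Macro-F1 computes F1 for each unit (here, each document) and then averages. The sketch below uses hypothetical helper names and does not model the "-strong" matching criterion. Its example shows how one short, badly linked document pulls macro-F1 down more than micro-F1.

```python
def f1(tp, fp, fn):
    """F1 from raw counts (equivalent to 2PR / (P + R)); 0.0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def micro_macro_f1(doc_counts):
    """doc_counts: one (tp, fp, fn) triple per document."""
    tp = sum(c[0] for c in doc_counts)
    fp = sum(c[1] for c in doc_counts)
    fn = sum(c[2] for c in doc_counts)
    micro = f1(tp, fp, fn)  # pool the counts, then compute F1 once
    macro = sum(f1(*c) for c in doc_counts) / len(doc_counts)  # average the per-document F1
    return micro, macro


# A long, well-linked document and a short, badly linked one.
micro, macro = micro_macro_f1([(8, 2, 2), (1, 1, 3)])
print(round(micro, 4), round(macro, 4))  # 0.6923 0.5667
```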

Information Extraction
Dataset: ReVerb45K | Metrics: Macro-F1 / Micro-F1 / Pairwise-F1 | SOTA: 62.7 / 84.4 / 81.9
Paper: CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

Grammatical Error Correction
Dataset: JFLEG | Metric: GLEU | SOTA: 61.5
Paper: Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

Language modeling
Dataset: Penn Treebank | Metrics: validation perplexity / test perplexity (lower is better) | SOTA: 48.33 / 47.69
Paper: Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
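Perplexity is the exponential of the average per-token negative log-likelihood on held-out text. Intuitively, a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k words. The sketch below assumes natural-log token probabilities are already available; the function name is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """exp(mean negative log-likelihood); token_log_probs are natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that spreads probability uniformly over 4 choices has perplexity 4.
print(round(perplexity([math.log(0.25)] * 10), 6))  # 4.0
```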

Lexical Normalization
Dataset: LexNorm2015 | Metrics: F1 / Precision / Recall | SOTA: 86.39 / 93.53 / 80.26
Paper: MoNoise: Modeling Noise Using a Modular Normalization System
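F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Applying the formula to the MoNoise row confirms that its three numbers are consistent. The harmonic mean (86.39) sits below the arithmetic mean of P and R (about 86.90) because it penalizes the gap between them. The helper name below is hypothetical.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same scale in, same scale out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MoNoise on LexNorm2015: precision 93.53, recall 80.26.
print(round(f1_from_pr(93.53, 80.26), 2))  # 86.39
```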

Machine translation
Dataset: WMT 2014 EN-DE | Metric: BLEU | SOTA: 35.0
Paper: Understanding Back-Translation at Scale
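BLEU combines the clipped n-gram precisions for n = 1 to 4 through a geometric mean. It then multiplies by a brevity penalty so that overly short translations are not rewarded. The sketch below is corpus-level, with one tokenized reference per sentence and no smoothing; the function names are hypothetical. Published WMT numbers are normally produced with standard tools such as sacreBLEU or Moses' multi-bleu.perl. Their tokenization and multi-reference handling mean this sketch's scores are not directly comparable.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one tokenized reference per hypothesis and no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # n-grams in the hypotheses, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the unsmoothed geometric mean is zero
    log_mean_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_mean_precision)


ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [ref]))  # 100.0
print(round(corpus_bleu([ref[:5]], [ref]), 2))  # 81.87: all n-grams match, but BP = exp(1 - 6/5)
```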

Multimodal Emotion Recognition
Dataset: IEMOCAP | Metric: accuracy | SOTA: 76.5
Paper: Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling

Multimodal Metaphor Recognition
Datasets: verb-noun pairs / adjective-noun pairs | Metric: F1 | SOTA: 0.75 / 0.79
Paper: Black Holes and White Rabbits: Metaphor Identification with Visual Features

Multimodal Sentiment Analysis
Dataset: MOSI | Metric: accuracy | SOTA: 80.3
Paper: Context-Dependent Sentiment Analysis in User-Generated Videos

Named entity recognition
Dataset: CoNLL 2003 | Metric: F1 | SOTA: 93.09
Paper: Contextual String Embeddings for Sequence Labeling

Natural language inference
Dataset: SciTail | Metric: accuracy | SOTA: 88.3
Paper: Improving Language Understanding by Generative Pre-Training

Part-of-speech tagging
Dataset: Penn Treebank | Metric: accuracy | SOTA: 97.96
Paper: Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings

Question answering
Dataset: CliCR | Metric: F1 | SOTA: 33.9
Paper: CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension

Word segmentation
Dataset: VLSP 2013 (Vietnamese) | Metric: F1 | SOTA: 97.90
Paper: A Fast and Accurate Vietnamese Word Segmenter
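Word segmentation F1 treats each word as a span over the underlying units. The units are characters for Chinese and syllables for Vietnamese, where one word can span several space-separated syllables. A predicted word is correct only if both its boundaries match a gold word. The sketch below uses hypothetical helpers and gives each word as a list of units, so the same code works for either language.

```python
from itertools import chain


def word_spans(words):
    """Map a segmentation (list of words, each a list of units) to a set of (start, end) unit offsets."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def segmentation_f1(gold_words, pred_words):
    """Word-level F1: a predicted word counts only if it covers exactly the units of a gold word."""
    if list(chain.from_iterable(gold_words)) != list(chain.from_iterable(pred_words)):
        raise ValueError("both segmentations must cover the same unit sequence")
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# "Hà Nội là thủ đô" ("Hanoi is the capital"): the prediction splits "thủ đô" into two words.
gold = [["Hà", "Nội"], ["là"], ["thủ", "đô"]]
pred = [["Hà", "Nội"], ["là"], ["thủ"], ["đô"]]
print(round(segmentation_f1(gold, pred), 4))  # 0.5714
```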

Word Sense Disambiguation
Dataset: SemEval 2015 | Metric: F1 | SOTA: 67.1
Paper: Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison

Text classification
Dataset: AG News | Metric: error rate (lower is better) | SOTA: 5.01
Paper: Universal Language Model Fine-tuning for Text Classification

Summarization
Dataset: Gigaword | Metrics: ROUGE-1 / ROUGE-2 / ROUGE-L | SOTA: 37.04 / 19.03 / 34.46
Paper: Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
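ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between a generated summary and a reference. ROUGE-L instead uses the longest common subsequence, which rewards in-order matches without requiring them to be contiguous. The sketch below reports the F1 form of each score. It assumes whitespace tokens, no stemming, and a single reference; the function names are hypothetical. The official ROUGE toolkit adds options such as stemming and stopword removal, so its numbers will differ.

```python
from collections import Counter


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 from clipped n-gram overlap between candidate and reference token lists."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum count per n-gram
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(cand.values()), overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(candidate), lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(rouge_n(cand, ref, 1), 4), round(rouge_n(cand, ref, 2), 4), round(rouge_l(cand, ref), 4))
# 0.8333 0.6 0.8333
```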

Sentiment analysis
Dataset: IMDb | Metric: accuracy | SOTA: 95.4
Paper: Universal Language Model Fine-tuning for Text Classification

Semantic role labeling
Dataset: OntoNotes | Metric: F1 | SOTA: 85.5
Paper: Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling

Semantic parsing
Dataset: LDC2014T12 | Metrics: F1 Newswire / F1 Full | SOTA: 0.71 / 0.66
Paper: AMR Parsing with an Incremental Joint Model

Semantic textual similarity
Dataset: SentEval | SOTA by task: MRPC 78.6/84.4, SICK-R 0.888, SICK-E 87.8, STS 78.9/78.6
Paper: Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
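SICK-R and the STS tasks score a model by correlating its predicted similarity scores with human ratings. Pearson's r measures linear agreement. Spearman's ρ is Pearson's r computed on ranks, so it rewards any monotonic agreement. The sketch below is plain Python with hypothetical function names. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


gold = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 4.0, 9.0, 16.0]  # same order as gold, but not linear in it
print(round(pearson(gold, pred), 4), round(spearman(gold, pred), 4))  # 0.9844 1.0
```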

Relationship Extraction
Dataset: New York Times Corpus | Metrics: P@10% / P@30% | SOTA: 73.6 / 59.5
Paper: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information

Relation Prediction
Dataset: WN18RR | Metrics: Hits@10 / Hits@1 / MRR | SOTA: 59.02 / 45.37 / 49.83
Paper: Predicting Semantic Relations using Global Graph Properties
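For link prediction on WN18RR, the model ranks every candidate entity for each test query. Hits@k is the fraction of queries whose correct entity lands in the top k, and MRR (mean reciprocal rank) is the average of 1/rank. The row above reports both as percentages. The sketch below uses hypothetical helpers and ranks ties optimistically. Published WN18RR results normally use the "filtered" setting, where other known true triples are removed from the candidate list before ranking; this sketch leaves that step to the caller.

```python
def rank_of(scores, gold_index):
    """1-based rank of the gold candidate when sorted by descending score (ties resolved optimistically)."""
    return 1 + sum(s > scores[gold_index] for s in scores)


def ranking_metrics(gold_ranks, ks=(1, 10)):
    """Hits@k for each k and mean reciprocal rank, given the gold entity's rank for each query."""
    if not gold_ranks:
        raise ValueError("need at least one query")
    n = len(gold_ranks)
    hits = {k: sum(r <= k for r in gold_ranks) / n for k in ks}
    mrr = sum(1.0 / r for r in gold_ranks) / n
    return hits, mrr


print(rank_of([0.1, 0.9, 0.5], gold_index=2))  # 2
hits, mrr = ranking_metrics([1, 3, 12, 2])
print(hits[1], hits[10], round(mrr, 4))  # 0.25 0.75 0.4792
```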
