This article originally appeared on the WeChat official account AI部落联盟 (AI_Tribe) and is reprinted by AI研习社 with permission. You are welcome to follow the AI部落联盟 WeChat account, the Zhihu column AI部落, and the AI研习社 blog column.
For NLP beginners, it is important to get familiar with the range of NLP tasks; for those who want to go further, it is important to know the evaluation datasets and metrics for each task; and for those doing academic research, it is essential to know the papers that reach state-of-the-art (SOTA) results on the corresponding benchmarks, because familiarity with the evaluation datasets, the metrics, and the current best results is the foundation of NLP research. This article therefore collects 32 common NLP tasks together with their evaluation datasets and metrics, the current SOTA results, and the corresponding papers.
1. First, a breakdown of NLP tasks by granularity: word level, phrase level, sentence level, and document level, together with the main tasks at each level, so that beginners can see how these basic NLP tasks relate to one another.
2. Next, Zhou Ming's division of NLP into basic tasks and core tasks (http://zhigu.news.cn/2017-06/08/c_129628590.htm).
3. Then, Wang Haifeng's AAAI 2017 keynote on NLP at Baidu (http://www.aaai.org/Conferences/AAAI/2017/aaai17inpractice.php), which clarifies the relationship between NLP fundamentals, techniques, and applications.
4. Finally, the 32 common NLP tasks with their evaluation datasets, metrics, current SOTA results, and the corresponding papers. A few short code sketches of the recurring metrics (span-level F1, perplexity, MRR and Hits@K) follow the table.
Task | Description | Corpus / Dataset | Metric(s) | SOTA result | Paper |
---|---|---|---|---|---|
Chunking | Chunking (shallow parsing) | Penn Treebank | F1 | 95.77 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
Common sense reasoning | Commonsense reasoning | Event2Mind | Cross-entropy | 4.22 | Event2Mind: Commonsense Inference on Events, Intents, and Reactions |
Parsing | Constituency (syntactic) parsing | Penn Treebank | F1 | 95.13 | Constituency Parsing with a Self-Attentive Encoder |
Coreference resolution | Coreference resolution | CoNLL 2012 | Average F1 | 73 | Higher-order Coreference Resolution with Coarse-to-fine Inference |
Dependency parsing | Dependency parsing | Penn Treebank | POS / UAS / LAS | 97.3 / 95.44 / 93.76 | Deep Biaffine Attention for Neural Dependency Parsing |
Task-Oriented Dialogue / Intent Detection | Task-oriented dialogue: intent detection | ATIS / Snips | Accuracy | 94.1 / 97.0 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
Task-Oriented Dialogue / Slot Filling | Task-oriented dialogue: slot filling | ATIS / Snips | F1 | 95.2 / 88.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
Task-Oriented Dialogue / Dialogue State Tracking | Task-oriented dialogue: dialogue state tracking | DSTC2 | Area / Food / Price / Joint | 90 / 84 / 92 / 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
Domain adaptation | Domain adaptation | Multi-Domain Sentiment Dataset | Average accuracy | 79.15 | Strong Baselines for Neural Semi-supervised Learning under Domain Shift |
Entity Linking | Entity linking | AIDA CoNLL-YAGO | Micro-F1-strong / Macro-F1-strong | 86.6 / 89.4 | End-to-End Neural Entity Linking |
Information Extraction | Information extraction | ReVerb45K | Precision / Recall / F1 | 62.7 / 84.4 / 81.9 | CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information |
Grammatical Error Correction | Grammatical error correction | JFLEG | GLEU | 61.5 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation |
Language modeling | Language modeling | Penn Treebank | Validation perplexity / Test perplexity | 48.33 / 47.69 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
Lexical Normalization | Lexical normalization | LexNorm2015 | F1 / Precision / Recall | 86.39 / 93.53 / 80.26 | MoNoise: Modeling Noise Using a Modular Normalization System |
Machine translation | Machine translation | WMT 2014 EN-DE | BLEU | 35.0 | Understanding Back-Translation at Scale |
Multimodal Emotion Recognition | Multimodal emotion recognition | IEMOCAP | Accuracy | 76.5 | Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling |
Multimodal Metaphor Recognition | Multimodal metaphor recognition | Verb-noun pairs / adjective-noun pairs | F1 | 0.75 / 0.79 | Black Holes and White Rabbits: Metaphor Identification with Visual Features |
Multimodal Sentiment Analysis | Multimodal sentiment analysis | MOSI | Accuracy | 80.3 | Context-Dependent Sentiment Analysis in User-Generated Videos |
Named entity recognition | Named entity recognition | CoNLL 2003 | F1 | 93.09 | Contextual String Embeddings for Sequence Labeling |
Natural language inference | Natural language inference | SciTail | Accuracy | 88.3 | Improving Language Understanding by Generative Pre-Training |
Part-of-speech tagging | Part-of-speech tagging | Penn Treebank | Accuracy | 97.96 | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings |
Question answering | Question answering | CliCR | F1 | 33.9 | CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension |
Word segmentation | Word segmentation | VLSP 2013 | F1 | 97.90 | A Fast and Accurate Vietnamese Word Segmenter |
Word Sense Disambiguation | Word sense disambiguation | SemEval 2015 | F1 | 67.1 | Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison |
Text classification | Text classification | AG News | Error rate | 5.01 | Universal Language Model Fine-tuning for Text Classification |
Summarization | Summarization | Gigaword | ROUGE-1 / ROUGE-2 / ROUGE-L | 37.04 / 19.03 / 34.46 | Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization |
Sentiment analysis | Sentiment analysis | IMDb | Accuracy | 95.4 | Universal Language Model Fine-tuning for Text Classification |
Semantic role labeling | Semantic role labeling | OntoNotes | F1 | 85.5 | Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling |
Semantic parsing | Semantic parsing | LDC2014T12 | F1 (Newswire) / F1 (Full) | 0.71 / 0.66 | AMR Parsing with an Incremental Joint Model |
Semantic textual similarity | Semantic textual similarity | SentEval | MRPC / SICK-R / SICK-E / STS | 78.6/84.4; 0.888; 87.8; 78.9/78.6 | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning |
Relationship Extraction | Relation extraction | New York Times Corpus | P@10% / P@30% | 73.6 / 59.5 | RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information |
Relation Prediction | Relation prediction | WN18RR | H@10 / H@1 / MRR | 59.02 / 45.37 / 49.83 | Predicting Semantic Relations using Global Graph Properties |
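Several of the sequence-labeling rows above (Chunking, Named entity recognition, Task-Oriented Dialogue / Slot Filling) report span-level F1 in the CoNLL style. Below is a minimal Python sketch of how that score is computed; the `extract_spans` and `span_f1` helpers and the BIO-tagged example are purely illustrative, and real evaluations use the official scorers (e.g., the conlleval script).

```python
# Hedged sketch: span-level precision/recall/F1 for BIO-tagged sequences.
# The tag sequences at the bottom are invented for illustration only.

def extract_spans(tags):
    """Collect (label, start, end) spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel to flush the last span
        if tag.startswith("B-") or tag == "O":
            if label is not None:
                spans.append((label, start, i))
                label = None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
        elif tag.startswith("I-") and label != tag[2:]:
            # treat an I- tag with a new label as the start of a new span
            if label is not None:
                spans.append((label, start, i))
            start, label = i, tag[2:]
    return set(spans)

def span_f1(gold_tags, pred_tags):
    """Exact-match span precision, recall, and F1."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
print(span_f1(gold, pred))   # (0.5, 0.5, 0.5)
```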
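The Language modeling row reports validation/test perplexity, and the Common sense reasoning row reports cross-entropy; the two are directly related. A rough sketch of the relationship, with invented token probabilities:

```python
# Hedged sketch: perplexity as the exponential of average per-token cross-entropy.
import math

token_probs = [0.10, 0.05, 0.20, 0.01]        # model probability of each gold token (invented)
nll = [-math.log(p) for p in token_probs]     # per-token negative log-likelihood, in nats
cross_entropy = sum(nll) / len(nll)           # average per-token cross-entropy
perplexity = math.exp(cross_entropy)          # the quantity reported for PTB in the table
print(cross_entropy, perplexity)
```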
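The Relation Prediction row reports MRR and Hits@K (H@1, H@10). A minimal sketch of these ranking metrics, assuming we already have the rank of the correct answer for each test triple (the ranks below are invented):

```python
# Hedged sketch: MRR and Hits@K over a list of gold-answer ranks.

def mrr(ranks):
    """Mean Reciprocal Rank: average of 1/rank of the correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Hits@K: fraction of queries whose correct answer is ranked in the top K."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 12, 2, 50]    # invented rank of the gold entity for five test triples
print(mrr(ranks))            # MRR
print(hits_at_k(ranks, 1))   # H@1
print(hits_at_k(ranks, 10))  # H@10
```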