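To make AI better at conversation, Facebook has open-sourced yet another project. Let's start with ParlAI's three main features:
- A large collection of public datasets, covering everything from open-domain chit-chat to specialized visual question answering;
- A large zoo of reference models;
- Seamless integration with Amazon Mechanical Turk for data collection and human evaluation.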
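The project currently provides more than 100 popular datasets, all accessible through the same API. They include PersonaChat, DailyDialog, Wizard of Wikipedia, Empathetic Dialogues, SQuAD, MS MARCO, QuAC, HotpotQA, QACNN and QADailyMail, CBT, BookTest, the bAbI Dialogue tasks, Ubuntu Dialogue, OpenSubtitles, Image Chat, VQA, VisDial, and CLEVR.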
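The toy sketch below makes "the same API" concrete. Every dataset is reduced to a stream of dicts that share the same keys, so one training or display loop can consume any of them.

The key names `text`, `labels` and `episode_done` mirror the fields ParlAI uses in its messages. The `labels` field also appears in the `display_data` output later in this article. However, `toy_teacher` and the two sample datasets are hypothetical and far simpler than ParlAI's real teacher classes.

```python
from typing import Dict, Iterator, List

# Hypothetical turn format used only in this sketch: {"text": ..., "labels": [...]}.
Episode = List[Dict]


def toy_teacher(episodes: List[Episode]) -> Iterator[Dict]:
    """Yield one example dict at a time, flagging the last turn of each episode."""
    for episode in episodes:
        for i, turn in enumerate(episode):
            yield {
                "text": turn["text"],
                "labels": turn["labels"],
                "episode_done": i == len(episode) - 1,
            }


# Two very different toy "datasets", consumed by exactly the same loop.
qa_data = [[{"text": "Mary moved to the bathroom.\nWhere is Mary?", "labels": ["bathroom"]}]]
chat_data = [[
    {"text": "hi!", "labels": ["hello, how are you?"]},
    {"text": "great, thanks", "labels": ["glad to hear it"]},
]]

for name, data in [("qa", qa_data), ("chat", chat_data)]:
    for example in toy_teacher(data):
        print(name, example["text"].replace("\n", " | "), "->", example["labels"][0], example["episode_done"])
```

The payoff of this design is that swapping datasets only changes the data passed in, not the code that iterates over it.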
For more background on ParlAI, see the accompanying paper:
"ParlAI: A Dialog Research Software Platform", arXiv:1705.06476.
Installing ParlAI
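ParlAI currently requires Python 3.7+ and PyTorch 1.6 or later. The core modules' dependencies are listed in requirements.txt, and some of the bundled models (in parlai/agents) have additional requirements. Installing ParlAI in a venv or conda environment is strongly recommended.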
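Before installing, it can help to confirm that these minimums are met. One subtlety is that version strings must be compared as numbers, not as text: "1.10" sorts before "1.6" as a string, even though it is the newer release.

The sketch below is a minimal check under the stated requirements (Python 3.7, PyTorch 1.6). The helper names `parse_version` and `meets_minimum` are my own. The PyTorch check only runs if `torch` is already importable.

```python
import re
import sys
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.6.0+cu101' or '3.7.9'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: str) -> bool:
    # Integer tuples compare numerically, so (1, 10) >= (1, 6) is True.
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {py}:", "OK" if meets_minimum(py, "3.7") else "too old (need 3.7+)")
    try:
        import torch  # only checked if PyTorch is already installed
        tv = str(torch.__version__)
        print(f"PyTorch {tv}:", "OK" if meets_minimum(tv, "1.6") else "too old (need 1.6+)")
    except ImportError:
        print("PyTorch is not installed yet")
```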
First, create a working directory and switch into it:
```shell
mkdir parlAI
cd parlAI
```
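To follow the venv recommendation, you can create the environment with Python's standard-library `venv` module, which does the same thing as `python -m venv`. Below is a minimal sketch. The `create_env` helper and the `.venv` directory name are my own choices, not something ParlAI prescribes.

```python
import venv
from pathlib import Path


def create_env(project_dir: str, env_name: str = ".venv", with_pip: bool = True) -> Path:
    """Create an isolated virtual environment inside project_dir and return its path."""
    env_dir = Path(project_dir) / env_name
    # clear=False leaves an existing environment in place instead of wiping it.
    venv.create(env_dir, with_pip=with_pip, clear=False)
    return env_dir
```

Run from inside the new directory, `create_env(".")` creates `./.venv`. On Linux or macOS, activate it with `source .venv/bin/activate` before running the install commands.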
Next, clone the repository and install it:
```shell
git clone https://github.com/facebookresearch/ParlAI.git
cd ParlAI; python setup.py develop
```
The clone step prints output like this:
```text
Cloning into 'ParlAI'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 24850 (delta 18), reused 10 (delta 3), pack-reused 24803
Receiving objects: 100% (24850/24850), 28.80 MiB | 1.36 MiB/s, done.
Resolving deltas: 100% (17462/17462), done.
```
The download is quick: about 28.8 MiB at the reported 1.36 MiB/s takes roughly 20 seconds.
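Then, from inside the ParlAI directory, run the test script. It displays 10 examples from task 1 of the bAbI tasks, using the variant with 1k training examples: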
```shell
python examples/display_data.py -t babi:task1k:1
```
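The `-t` value packs several pieces into one string. Read as colon-separated parts, `babi:task1k:1` gives:

- the dataset (`babi`);
- the variant (`task1k`, i.e. 1k training examples);
- the task number (`1`).

ParlAI also accepts several tasks separated by commas. The helper below is a hypothetical illustration of that reading, not ParlAI's own task parser.

```python
from typing import List, Tuple


def parse_task_spec(spec: str) -> List[Tuple[str, List[str]]]:
    """Split a task string such as 'babi:task1k:1' into (task, options) pairs.

    Hypothetical helper for illustration; ParlAI resolves task strings internally.
    """
    tasks = []
    for part in spec.split(","):  # multiple tasks are comma-separated
        name, *options = part.strip().split(":")
        tasks.append((name, options))
    return tasks


print(parse_task_spec("babi:task1k:1,babi:task1k:2"))
```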
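Along the way you may need to install quite a few additional dependencies. These vary by environment, so they aren't covered one by one here.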
Here is the output of a successful run:
```text
[ optional arguments: ]
[ display_ignore_fields: agent_reply ]
[ max_display_len: 1000 ]
[ num_examples: 10 ]
[ Main ParlAI Arguments: ]
[ batchsize: 1 ]
[ datapath: /home/xxx/parlAI/ParlAI/data ]
[ datatype: train:stream ]
[ download_path: /home/xxx/parlAI/ParlAI/downloads ]
[ hide_labels: False ]
[ image_mode: raw ]
[ init_opt: None ]
[ multitask_weights: [1] ]
[ numthreads: 1 ]
[ show_advanced_args: False ]
[ task: babi:task1k:1 ]
[ ParlAI Model Arguments: ]
[ dict_class: None ]
[ init_model: None ]
[ model: None ]
[ model_file: None ]
[ PytorchData Arguments: ]
[ batch_length_range: 5 ]
[ batch_sort_cache_type: pop ]
[ batch_sort_field: text ]
[ numworkers: 4 ]
[ pytorch_context_length: -1 ]
[ pytorch_datapath: None ]
[ pytorch_include_labels: True ]
[ pytorch_preprocess: False ]
[ pytorch_teacher_batch_sort: False ]
[ pytorch_teacher_dataset: None ]
[ pytorch_teacher_task: None ]
[ shuffle: False ]
[ ParlAI Image Preprocessing Arguments: ]
[ image_cropsize: 224 ]
[ image_size: 256 ]
[ Current ParlAI commit: e8d0a75d291c7bb4b4e5565d60a899f794c10963 ]
[creating task(s): babi:task1k:1]
[building data: /home/xxx/parlAI/ParlAI/data/bAbI]
[ downloading: http://parl.ai/downloads/babi/babi.tar.gz to /home/xxx/parlAI/ParlAI/data/bAbI/babi.tar.gz ]
Downloading babi.tar.gz: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 19.2M/19.2M [00:05<00:00, 3.29MB/s]
unpacking babi.tar.gz
[loading fbdialog data:/home/xxx/parlAI/ParlAI/data/bAbI/tasks_1-20_v1-2/en-valid-nosf/qa1_train.txt]
[loading fbdialog data:/home/xxx/parlAI/ParlAI/data/bAbI/tasks_1-20_v1-2/en-valid-nosf/qa1_train.txt]
[babi:task1k:1]: Mary moved to the bathroom.
John went to the hallway.
Where is Mary?
[labels: bathroom]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
~~
[babi:task1k:1]: Daniel went back to the hallway.
Sandra moved to the garden.
Where is Daniel?
[labels: hallway]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
~~
[babi:task1k:1]: John moved to the office.
Sandra journeyed to the bathroom.
Where is Daniel?
[labels: hallway]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
~~
[babi:task1k:1]: Mary moved to the hallway.
Daniel travelled to the office.
Where is Daniel?
[labels: office]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
~~
[babi:task1k:1]: John went back to the garden.
John moved to the bedroom.
Where is Sandra?
[labels: bathroom]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
- - - - - - - - - - - - - - - - - - - - -
~~
[babi:task1k:1]: Sandra travelled to the office.
Sandra went to the bathroom.
Where is Sandra?
[labels: bathroom]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
~~
[babi:task1k:1]: Mary went to the bedroom.
Daniel moved to the hallway.
Where is Sandra?
[labels: bathroom]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
~~
[babi:task1k:1]: John went to the garden.
John travelled to the office.
Where is Sandra?
[labels: bathroom]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
~~
[babi:task1k:1]: Daniel journeyed to the bedroom.
Daniel travelled to the hallway.
Where is John?
[labels: office]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
~~
[babi:task1k:1]: John went to the bedroom.
John travelled to the office.
Where is Daniel?
[labels: hallway]
[label_candidates: office|hallway|kitchen|bathroom|bedroom|...and 1 more]
- - - - - - - - - - - - - - - - - - - - -
~~
[ loaded 180 episodes with a total of 900 examples ]
```
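The output also shows how bAbI episodes are structured:

- The dashed lines mark episode boundaries. Each episode here holds 5 examples, and 180 × 5 = 900 matches the final summary line.
- A question can depend on facts printed in an earlier example of the same episode. For instance, the third example asks where Daniel is, and the answer comes from the second example.

A hand-written rule that remembers each person's latest location across the whole episode captures exactly this. This rule-based baseline is my own illustration, not a ParlAI model. It assumes every story sentence has the shape `<Name> ... <place>.` and every question the shape `Where is <Name>?`.

```python
import re
from typing import Dict, List, Tuple

# Story sentence, e.g. "Mary moved to the bathroom." or "Daniel went back to the hallway."
MOVE = re.compile(r"^(\w+) .* (\w+)\.$")
# Question, e.g. "Where is Mary?"
QUESTION = re.compile(r"^Where is (\w+)\?$")


def answer_episode(examples: List[Tuple[List[str], str]]) -> List[str]:
    """Answer each question of one bAbI task-1 episode by tracking last known locations.

    `examples` holds (new_story_sentences, question) pairs in order. The location
    table is NOT reset between examples, because context accumulates within an episode.
    """
    location: Dict[str, str] = {}
    answers = []
    for sentences, question in examples:
        for sentence in sentences:
            m = MOVE.match(sentence)
            if m:
                person, place = m.groups()
                location[person] = place  # a later move overwrites an earlier one
        q = QUESTION.match(question)
        answers.append(location.get(q.group(1), "unknown") if q else "unknown")
    return answers


# The first episode from the display_data output above.
first_episode = [
    (["Mary moved to the bathroom.", "John went to the hallway."], "Where is Mary?"),
    (["Daniel went back to the hallway.", "Sandra moved to the garden."], "Where is Daniel?"),
    (["John moved to the office.", "Sandra journeyed to the bathroom."], "Where is Daniel?"),
    (["Mary moved to the hallway.", "Daniel travelled to the office."], "Where is Daniel?"),
    (["John went back to the garden.", "John moved to the bedroom."], "Where is Sandra?"),
]
print(answer_episode(first_episode))
```

On this episode the rule returns bathroom, hallway, hallway, office, bathroom, matching the five labels in the log. Resetting the table after every example would instead fail the third and fifth questions.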
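Of course, this sample data is in English. You can follow the same structure to put together a Chinese dataset, which is enough for typical small tasks.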
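To build a Chinese dataset, one option is to write your examples as plain text in the tab-separated "ParlAI Dialog Format" described in ParlAI's documentation. As I understand that format:

- each example sits on its own line;
- fields are written as `key:value`, here `text` and `labels`;
- the last line of each episode carries `episode_done:True`.

Below is a minimal sketch under that assumption; the helper names are my own. Recent ParlAI versions document a `fromfile:parlaiformat` task for loading such a file, but check that your installed version supports it.

```python
from pathlib import Path
from typing import List, Tuple


def escape(field: str) -> str:
    # Tabs separate fields and newlines separate examples, so escape them inside values.
    return field.replace("\t", "\\t").replace("\n", "\\n")


def write_parlai_format(episodes: List[List[Tuple[str, str]]], path: str) -> None:
    """Write (text, label) dialogues as tab-separated key:value lines, one example per line."""
    lines = []
    for episode in episodes:
        for i, (text, label) in enumerate(episode):
            fields = [f"text:{escape(text)}", f"labels:{escape(label)}"]
            if i == len(episode) - 1:
                fields.append("episode_done:True")  # marks the end of this episode
            lines.append("\t".join(fields))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # A tiny bAbI-style episode in Chinese; context carries over between the two examples.
    episodes = [[
        ("小明去了厨房。\n小红去了花园。\n小明在哪里?", "厨房"),
        ("小明走到了卧室。\n小红在哪里?", "花园"),
    ]]
    write_parlai_format(episodes, "zh_babi_task1.txt")
```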
Finally, here is the GitHub repository: https://github.com/facebookresearch/ParlAI. If you're interested, go check it out.