An Introduction to Open-Source Visual Question Answering (VQA) Projects

2018-07-20 16:45:48

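The third installment of the Keras Chinese docs ended with a very simple visual question answering (VQA) demo program. Today we look at a more complex VQA implementation written in TensorFlow.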

https://github.com/JamesChuanggg/VQA-tensorflow

Tensorflow Implementation of Deeper LSTM normalized CNN for Visual Question Answering
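The title names the architecture. CNN image features are L2-normalized. A deep (two-layer) LSTM encodes the question. Both are projected into a common space and fused by element-wise multiplication, and a classifier picks the answer from a fixed list of candidates. The following is a minimal pure-Python sketch of that forward pass under stated assumptions, not the repository's code. The question vector stands in for the LSTM's final state, the classifier is reduced to a single linear layer plus softmax for brevity, and all dimensions and random weights are hypothetical toy values.

```python
import math
import random


def l2_normalize(v):
    """Scale v to unit L2 norm (the 'normalized CNN' part)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def linear(W, b, v):
    """Affine map W v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def random_layer(out_dim, in_dim, rng):
    """Hypothetical random weights; a real model learns these."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return W, b


def vqa_forward(img_feat, q_feat, params):
    """Return a probability distribution over the candidate answers."""
    img = l2_normalize(img_feat)
    img_emb = [math.tanh(x) for x in linear(*params["img"], img)]
    q_emb = [math.tanh(x) for x in linear(*params["q"], q_feat)]
    fused = [a * b for a, b in zip(img_emb, q_emb)]  # element-wise product fusion
    return softmax(linear(*params["ans"], fused))


rng = random.Random(0)
IMG_DIM, Q_DIM, COMMON, N_ANSWERS = 8, 6, 4, 3  # toy sizes, not the real ones
params = {
    "img": random_layer(COMMON, IMG_DIM, rng),
    "q": random_layer(COMMON, Q_DIM, rng),
    "ans": random_layer(N_ANSWERS, COMMON, rng),
}
img_feat = [rng.random() for _ in range(IMG_DIM)]  # stand-in for CNN features
q_feat = [rng.random() for _ in range(Q_DIM)]      # stand-in for the LSTM final state
probs = vqa_forward(img_feat, q_feat, params)
print("predicted answer index:", max(range(N_ANSWERS), key=probs.__getitem__))
```

Element-wise multiplication forces the image and question embeddings to share a dimension. This is why both branches get their own projection before fusion.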

This TensorFlow implementation matches the accuracy of the original Torch code:

This current code can get 58.16 on Open-Ended and 63.09 on Multiple-Choice on test-standard split.

Example results: [figure]

Yet the code is only a little over 400 lines long, so anyone interested can read through it.

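An improved VQA model builds on this version by adding co-attention over the image and the question. It computes that attention hierarchically, at the word, phrase, and question levels: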

https://github.com/jiasenlu/HieCoAttenVQA

Hierarchical Question-Image Co-Attention for Visual Question Answering
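The paper proposes two ways to compute co-attention at each level of the hierarchy. One is "parallel co-attention", which links image and question through an affinity matrix. As we understand it, let V (d x N) hold the image region features and Q (d x T) the question features, and let W_b, W_v, W_q, w_hv, w_hq be learned. The method then computes C = tanh(Q^T W_b V), H^v = tanh(W_v V + (W_q Q) C), H^q = tanh(W_q Q + (W_v V) C^T), a^v = softmax(w_hv^T H^v), a^q = softmax(w_hq^T H^q). The attended features are the a-weighted sums of the columns of V and Q. The following is a toy pure-Python sketch of those equations with random weights. It is not the repository's Torch code, and the helper names are our own.

```python
import math
import random


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def tanh_sum(A, B):
    """Element-wise tanh(A + B) for two matrices of the same shape."""
    return [[math.tanh(a + b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def parallel_co_attention(V, Q, W_b, W_v, W_q, w_hv, w_hq):
    """V: d x N image regions, Q: d x T question words (features as columns)."""
    C = [[math.tanh(x) for x in row] for row in matmul(matmul(transpose(Q), W_b), V)]  # T x N
    WvV = matmul(W_v, V)  # k x N
    WqQ = matmul(W_q, Q)  # k x T
    H_v = tanh_sum(WvV, matmul(WqQ, C))             # k x N
    H_q = tanh_sum(WqQ, matmul(WvV, transpose(C)))  # k x T
    a_v = softmax(matmul([w_hv], H_v)[0])  # attention over the N regions
    a_q = softmax(matmul([w_hq], H_q)[0])  # attention over the T words
    v_hat = [sum(a * x for a, x in zip(a_v, row)) for row in V]  # attended image, length d
    q_hat = [sum(a * x for a, x in zip(a_q, row)) for row in Q]  # attended question, length d
    return v_hat, q_hat, a_v, a_q


def rand(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(42)
d, N, T, k = 4, 3, 2, 5  # toy sizes
V, Q = rand(d, N, rng), rand(d, T, rng)
W_b, W_v, W_q = rand(d, d, rng), rand(k, d, rng), rand(k, d, rng)
w_hv, w_hq = rand(1, k, rng)[0], rand(1, k, rng)[0]
v_hat, q_hat, a_v, a_q = parallel_co_attention(V, Q, W_b, W_v, W_q, w_hv, w_hq)
print("region weights:", [round(a, 3) for a in a_v])
print("word weights:", [round(a, 3) for a in a_q])
```

The affinity matrix C lets each modality's attention depend on the other. Image attention sees the question features through (W_q Q) C, and question attention sees the image features through (W_v V) C^T.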

The attention maps look like this: [figure]

For an application of attention to video, see:

https://github.com/tsenghungchen/SA-tensorflow
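In video, attention typically runs over time. When the decoder generates each output step, it weights the per-frame CNN features instead of simply averaging them. A common additive (Bahdanau-style) formulation computes e_i = w^T tanh(W_a h + U_a v_i + b), then alpha = softmax(e), then context = sum_i alpha_i v_i. Here h is the decoder's previous hidden state and v_i is the feature of frame i. We have not verified that SA-tensorflow uses exactly this form, so treat the following as a hedged sketch of the general technique. The function name, parameter names, and toy sizes are all hypothetical.

```python
import math
import random


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def temporal_soft_attention(frames, h, W_a, U_a, b, w):
    """Additive soft attention over per-frame features.

    frames: list of frame feature vectors v_i (one per time step)
    h: decoder hidden state from the previous step
    Returns (alpha, context) where context = sum_i alpha_i * v_i.
    """
    Wh = matvec(W_a, h)  # same for every frame, so compute it once
    scores = []
    for v in frames:
        Uv = matvec(U_a, v)
        hidden = [math.tanh(p + q + bi) for p, q, bi in zip(Wh, Uv, b)]
        scores.append(sum(wi * x for wi, x in zip(w, hidden)))
    alpha = softmax(scores)
    dim = len(frames[0])
    context = [sum(a * v[j] for a, v in zip(alpha, frames)) for j in range(dim)]
    return alpha, context


def rand(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(7)
n_frames, feat_dim, hid_dim, att_dim = 5, 6, 4, 3  # toy sizes
frames = rand(n_frames, feat_dim, rng)  # stand-in for per-frame CNN features
h = rand(1, hid_dim, rng)[0]            # stand-in for the LSTM decoder state
W_a, U_a = rand(att_dim, hid_dim, rng), rand(att_dim, feat_dim, rng)
b, w = [0.0] * att_dim, rand(1, att_dim, rng)[0]
alpha, context = temporal_soft_attention(frames, h, W_a, U_a, b, w)
print("frame weights:", [round(a, 3) for a in alpha])
```

Because h changes at every decoding step, the frame weights are recomputed for each generated word. This lets the model focus on different parts of the clip over time.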

Read the original post for the complete code.
