Attention on Attention: Architectures for Visual Question Answering (VQA)

Jasdeep Singh, Vincent Ying, Alex Nutkiewicz

2018 · ArXiv: 1803.07724
arXiv.org · 27 Citations

TLDR

This work builds upon the model that placed first in the VQA Challenge by developing thirteen new attention mechanisms and introducing a simplified classifier, outperforming the existing state-of-the-art single model's validation score.

Abstract

Visual Question Answering (VQA) is an increasingly popular topic in deep learning research, requiring coordination of natural language processing and computer vision modules into a single architecture. We build upon the model which placed first in the VQA Challenge by developing thirteen new attention mechanisms and introducing a simplified classifier. We performed 300 GPU hours of extensive hyperparameter and architecture searches and were able to achieve an evaluation score of 64.78%, outperforming the existing state-of-the-art single model's validation score of 63.15%.
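
The abstract does not describe the thirteen attention mechanisms, so the following is only a generic illustration of what question-guided attention over image regions can look like, not the authors' method. It is a minimal sketch of scaled dot-product soft attention in plain Python; the names `attend`, `question_vec`, and `region_feats`, and the toy two-dimensional vectors, are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question_vec, region_feats):
    """Question-guided soft attention over image regions (illustrative only).

    question_vec: list of d floats (hypothetical question embedding)
    region_feats: list of k region vectors, each a list of d floats
    Returns (weights, attended): one weight per region, and the
    weighted sum of the region vectors.
    """
    d = len(question_vec)
    # Scaled dot-product relevance score of each region to the question.
    scores = [
        sum(q * v for q, v in zip(question_vec, region)) / math.sqrt(d)
        for region in region_feats
    ]
    # Normalize the scores into a probability distribution over regions.
    weights = softmax(scores)
    # Attended image feature: convex combination of the region vectors.
    attended = [
        sum(w * region[i] for w, region in zip(weights, region_feats))
        for i in range(d)
    ]
    return weights, attended


# Toy inputs: the first region aligns with the question, the last opposes it.
question = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, attended = attend(question, regions)
```

In a full VQA model the attended image feature would typically be fused with the question representation and passed to a classifier over candidate answers; how the paper does this is not stated in the abstract.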