NLP Research - Visual Question Answering (VQA)
Abstract
Visual question answering (VQA) systems aim to answer natural language questions about visual content with a valid answer. Despite agreement or majority voting among crowd-workers, a significant portion of visual questions has been observed to be subjective and/or ambiguous. Previous work has analyzed many VQA examples from popular datasets and found that people provide multiple different answers for about half of the questions. This makes the evaluation of open-ended VQA tasks far more challenging. To address this challenge, we propose Alternative Answer Sets (AAS) for such visual questions, curated using existing NLP tools and techniques. We then modify the best VQA solvers to support multiple plausible answers for a visual question and show performance improvements on the GQA and VQA datasets.
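To make the evaluation idea concrete, here is a minimal sketch of how scoring against a set of plausible answers differs from scoring against a single label. It is only an illustration, not the paper's actual metric or curation pipeline: the function names and the tiny `ALTERNATIVES` mapping are hypothetical, and the alternative answers are hand-written rather than produced by any NLP tool.

```python
from typing import Dict, Set


def normalize(answer: str) -> str:
    """Lowercase and trim an answer so comparisons ignore case and padding."""
    return answer.strip().lower()


def exact_match(prediction: str, gold: str) -> float:
    """Single-label scoring: credit only if the prediction equals the gold answer."""
    return 1.0 if normalize(prediction) == normalize(gold) else 0.0


def aas_match(prediction: str, gold: str, alternatives: Dict[str, Set[str]]) -> float:
    """Set-based scoring: credit if the prediction is the gold answer or
    any answer listed as an acceptable alternative for it."""
    gold_key = normalize(gold)
    accepted = {gold_key} | {normalize(a) for a in alternatives.get(gold_key, set())}
    return 1.0 if normalize(prediction) in accepted else 0.0


# Hypothetical, hand-written alternative answer sets (assumption for illustration).
ALTERNATIVES: Dict[str, Set[str]] = {
    "couch": {"sofa", "settee"},
}

if __name__ == "__main__":
    print(exact_match("sofa", "couch"))              # single-label scoring
    print(aas_match("sofa", "couch", ALTERNATIVES))  # set-based scoring
```

Under single-label scoring a model that answers "sofa" when the label is "couch" gets no credit, while the set-based variant accepts it.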
For The Record...
I wanted the paper title to be "My Way or the Highway", which points out the issue of VQA datasets having only one label, but the other members had never heard that saying before and did not like it (English is not their primary language).