Abstract
Question answering is an important task for autonomous agents and virtual
assistants alike, and has been shown to help people with disabilities
efficiently navigate an overwhelming environment. Many existing methods focus
on observation-based questions, ignoring our ability to seamlessly combine
observed content with general knowledge. To study such interactions with a
knowledge base, a dataset was recently introduced, and keyword-matching
techniques were shown to yield compelling results on it despite being prone to
misinterpretations caused by synonyms and homographs. To address this issue, we
develop a learning-based approach that goes straight to the facts via a
learned embedding space. We demonstrate state-of-the-art results on the
challenging, recently introduced fact-based visual question answering dataset,
outperforming competing methods by more than 5%.
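The following is a rough, generic illustration of fact retrieval in an embedding space. It is not the authors' model. It assumes that a question and each candidate fact have already been mapped to vectors by some learned encoder, and it ranks the facts by cosine similarity. The function names, the fact strings, and the 2-D vectors are hypothetical placeholders standing in for real learned embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_facts(query: Vector,
                   fact_embeddings: Dict[str, Vector],
                   k: int = 1) -> List[Tuple[str, float]]:
    """Return the k facts whose embeddings are closest to the query embedding."""
    scored = [(fact, cosine(query, emb)) for fact, emb in fact_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical toy embeddings standing in for the output of a learned encoder.
FACTS: Dict[str, Vector] = {
    "(sofa, UsedFor, sitting)": [0.9, 0.1],
    "(bat, IsA, animal)": [0.1, 0.9],
}

# Hypothetical embedding of a question such as "What can you do with a couch?"
QUESTION = [0.95, 0.05]

if __name__ == "__main__":
    print(retrieve_facts(QUESTION, FACTS, k=1))
```

Because scoring operates on vectors rather than on surface strings, a question mentioning "couch" can still retrieve a fact about "sofa" whenever the encoder places the two close together. That is exactly the synonym case in which keyword matching fails.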