In open-domain dialogue, intelligent agents should exhibit the use of
knowledge; however, there are few convincing demonstrations of this to date. The
most popular sequence-to-sequence models typically "generate and hope" generic
utterances that can be memorized in the weights of the model when mapping from
input utterance(s) to output, rather than employing recalled knowledge as
context. Use of knowledge has so far proved difficult, in part because of the
lack of a supervised learning benchmark task that exhibits knowledgeable open
dialogue with clear grounding. To that end, we collect and release a large
dataset with conversations directly grounded in knowledge retrieved from
Wikipedia. We then design architectures capable of retrieving knowledge,
reading and conditioning on it, and finally generating natural responses. Our
best-performing dialogue models are able to conduct knowledgeable discussions
on open-domain topics, as evaluated by automatic metrics and human evaluations,
while our new benchmark allows for measuring further improvements in this
important research direction.