Abstract
Reference is the crucial property of language that allows us to connect
linguistic expressions to the world. Modeling it requires handling both
continuous and discrete aspects of meaning. Data-driven models excel at the
former, but struggle with the latter, and the reverse is true for symbolic
models.
We propose a fully data-driven, end-to-end trainable model that, while
operating on continuous multimodal representations, learns to organize them
into a discrete-like entity library. We also introduce a referential task,
cross-modal tracking, to test it. Our model beats standard neural network
architectures, but is outperformed by some parametrizations of Memory Networks,
another model with external memory.