Abstract
Neural networks can perform tasks that rely on compositional
structure even though they lack obvious mechanisms for representing this
structure. To analyze the internal representations that enable such success, we
propose ROLE, a technique that detects whether these representations implicitly
encode symbolic structure. ROLE learns to approximate the representations of a
target encoder E by learning a symbolic constituent structure and an embedding
of that structure into E's representational vector space. The constituents of
the approximating symbol structure are defined by structural positions ---
roles --- that can be filled by symbols. We show that when E is constructed to
explicitly embed a particular type of structure (string or tree), ROLE
successfully extracts the ground-truth roles defining that structure. We then
analyze a GRU seq2seq network trained to perform a more complex compositional
task (SCAN), for which no ground-truth role scheme is available. For this
model, ROLE successfully discovers an interpretable symbolic structure that the
model implicitly uses to perform the SCAN task, providing a comprehensive
account of the representations that drive the behavior of a frequently used but
hard-to-interpret type of model. We verify the causal importance of the
discovered symbolic structure by showing that, when we systematically
manipulate hidden embeddings based on this structure, the model's
resulting output is changed in the way predicted by our analysis. Finally, we
use ROLE to explore whether popular sentence embedding models are capturing
compositional structure and find evidence that they are not; we conclude by
discussing how insights from ROLE can be used to impart new inductive biases to
improve the compositional abilities of such models.
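To make the decomposition described in the abstract concrete, the sketch below shows one way a ROLE-style approximator could be set up: filler (symbol) embeddings are bound to learned role embeddings by a tensor product, summed across positions, and linearly mapped into the target encoder's vector space, and a learned scorer assigns a role to each input position. This is a minimal illustration under assumed module names, dimensions, and a soft role assignment; it is not the paper's exact architecture.

```python
# Hedged sketch of a ROLE-style tensor-product approximator. All names,
# dimensions, and the soft role assignment are illustrative assumptions.
import torch
import torch.nn as nn

class RoleApproximator(nn.Module):
    def __init__(self, vocab_size, n_roles, filler_dim, role_dim, enc_dim):
        super().__init__()
        self.fillers = nn.Embedding(vocab_size, filler_dim)  # symbol (filler) embeddings
        self.roles = nn.Embedding(n_roles, role_dim)         # role embeddings
        self.role_scorer = nn.LSTM(filler_dim, n_roles, batch_first=True)
        # Linear map from the flattened tensor product into E's vector space;
        # bias-free so that filler-role bindings compose additively.
        self.proj = nn.Linear(filler_dim * role_dim, enc_dim, bias=False)

    def forward(self, tokens):                               # tokens: (batch, seq)
        f = self.fillers(tokens)                             # (batch, seq, filler_dim)
        scores, _ = self.role_scorer(f)                      # (batch, seq, n_roles)
        attn = torch.softmax(scores, dim=-1)                 # soft role assignment
        r = attn @ self.roles.weight                         # (batch, seq, role_dim)
        # Bind each filler to its role with an outer product, sum over
        # positions, and project into the target encoder's space.
        tpr = torch.einsum('bsf,bsr->bfr', f, r).flatten(1)
        return self.proj(tpr)                                # approximates E(x)

# Hedged sketch of the causal test ("constituent surgery"): swap the filler
# bound to one role by subtracting the old binding and adding the new one
# in E's space. f_old, f_new, and role_vec are 1-D embedding vectors.
def swap_filler(model, h, f_old, f_new, role_vec):
    delta = torch.einsum('f,r->fr', f_new - f_old, role_vec).flatten()
    return h + model.proj(delta)  # proj is linear and bias-free, so bindings add
```

Under these assumptions, training would minimize the mean squared error between the approximator's output and E's hidden state for each input; keeping the final linear map bias-free is what lets filler-role bindings be added and subtracted independently, which the surgery function relies on.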