Abstract
Retrieval-augmented language models (LMs) use non-parametric memory to
substantially outperform their non-retrieval counterparts on perplexity-based
evaluations, but it is an open question whether they achieve similar gains in
few- and zero-shot end-task accuracy. We extensively study one such model, the
k-nearest neighbor LM (kNN-LM), showing that the gains marginally transfer. The
main challenge is to achieve coverage of the verbalizer tokens that define the
different end-task class labels. To address this challenge, we also introduce
kNN-Prompt, a simple and effective kNN-LM with automatically expanded fuzzy
verbalizers (e.g., to expand "terrible" to also include "silly" and other
task-specific synonyms for sentiment classification). Across nine diverse
end-tasks, using kNN-Prompt with GPT-2 large yields significant performance
boosts over strong zero-shot baselines (13.4% absolute improvement over the
base LM on average). We also show that other advantages of non-parametric
augmentation hold for end tasks; kNN-Prompt is effective for domain adaptation
with no further training, and gains increase with the size of the retrieval
model.
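
A minimal sketch of the scoring scheme the abstract describes, not code from the paper: interpolate the base LM's next-token distribution with a distribution aggregated from retrieved nearest neighbors (as in kNN-LM), then score each class label by summing probability mass over its expanded fuzzy-verbalizer tokens. All function and parameter names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def knn_prompt_label_scores(lm_logits, knn_probs, fuzzy_verbalizers, lam=0.5):
    """Hypothetical kNN-Prompt-style zero-shot scoring.

    lm_logits:         [vocab] next-token logits from the base LM
    knn_probs:         [vocab] distribution built from retrieved neighbors
    fuzzy_verbalizers: dict mapping each label to a list of verbalizer token ids,
                       e.g. {"positive": [good_id], "negative": [terrible_id, silly_id]}
    lam:               interpolation weight on the kNN distribution
    """
    lm_probs = F.softmax(lm_logits, dim=-1)
    # kNN-LM-style interpolation of the two next-token distributions.
    mixed = lam * knn_probs + (1.0 - lam) * lm_probs
    # Each label's score is the total probability of its fuzzy verbalizer tokens.
    return {label: mixed[torch.tensor(token_ids)].sum().item()
            for label, token_ids in fuzzy_verbalizers.items()}

# Usage (illustrative): predicted = max(scores, key=scores.get)
```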