The paper discusses the capabilities of large pre-trained language models and their limitations in accessing and precisely manipulating knowledge. The authors introduce retrieval-augmented generation (RAG) models, which combine pre-trained parametric memory (a seq2seq generator) with non-parametric memory (a retrieved document index) for language generation. The study evaluates RAG on a range of knowledge-intensive NLP tasks and compares it with other architectures.
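The retrieve-then-generate flow can be sketched as follows. This is a toy illustration of the pattern, not the paper's actual components: the word-overlap scorer stands in for a dense retriever (the paper uses DPR), and the string-formatting "generator" stands in for a seq2seq model.

```python
def retrieve(query, documents, k=1):
    """Toy non-parametric memory: rank documents by word overlap with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:k]

def generate(query, context):
    """Stand-in for a parametric generator conditioned on retrieved text."""
    return f"Q: {query} | context: {context[0]}"

docs = [
    "RAG combines parametric and non-parametric memory.",
    "The capital of France is Paris.",
]
q = "What is the capital of France?"
answer = generate(q, retrieve(q, docs))
print(answer)
```

The key design point is that knowledge lives in the swappable document index rather than only in model weights, so the system's knowledge can be updated without retraining the generator.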
The documentation describes the SubQuestionQueryEngine in the LlamaIndex library. This query engine breaks a complex query down into multiple sub-questions, routes each sub-question to the query engine best suited to answer it, and then synthesizes the sub-answers into a final response.
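The decompose-route-synthesize pattern can be sketched without the library itself. The engine names, the keyword-based decomposition rule, and the canned answers below are all illustrative assumptions, not LlamaIndex API:

```python
# Hypothetical per-topic engines; in LlamaIndex these would be real query
# engines over separate indexes.
def revenue_engine(sub_q):
    return "Revenue grew 10% year over year."

def headcount_engine(sub_q):
    return "Headcount rose from 500 to 650."

ENGINES = {"revenue": revenue_engine, "headcount": headcount_engine}

def decompose(query):
    """Toy decomposition: emit one sub-question per engine topic in the query."""
    return [(topic, f"What does the report say about {topic}?")
            for topic in ENGINES if topic in query.lower()]

def answer(complex_query):
    # Route each sub-question to its engine, then synthesize the sub-answers.
    sub_answers = [ENGINES[topic](sub_q) for topic, sub_q in decompose(complex_query)]
    return " ".join(sub_answers)

result = answer("Compare revenue growth with headcount changes.")
print(result)
```

In LlamaIndex itself the engine is typically built with `SubQuestionQueryEngine.from_defaults(...)` over a list of `QueryEngineTool` objects, with an LLM performing the decomposition and synthesis steps; consult the library documentation for the exact signatures.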