
PaperLM: A Pre-trained Model for Hierarchical Examination Paper Representation Learning

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 2178-2187. ACM, October 2023
DOI: 10.1145/3583780.3615003

Abstract

Representation learning of examination papers is crucial for online education systems, as it benefits various applications such as estimating paper difficulty and examination paper retrieval. Previous works mainly explore the representation learning of individual questions in an examination paper, with limited attention given to the examination paper as a whole. In fact, the structure of examination papers is strongly correlated with paper properties such as paper difficulty, which existing paper representation methods fail to capture adequately. To this end, we propose a pre-trained model, named PaperLM, to learn the representation of examination papers. Our model integrates both the text content and the hierarchical structure of examination papers within a single framework by converting the paths of the Examination Organization Tree (EOT) into embeddings. Furthermore, we specially design three pre-training objectives for PaperLM, namely EOT Node Relationship Prediction (ENRP), Question Type Prediction (QTP), and Paper Contrastive Learning (PCL), aiming to capture features from text and structure effectively. We pre-train our model on a real-world examination paper dataset, and then evaluate the model on three downstream tasks: paper difficulty estimation, examination paper retrieval, and paper clustering. The experimental results demonstrate the effectiveness of our method.
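The abstract does not include code; the sketch below is only a minimal illustration of two ideas it names: encoding a root-to-node path of the Examination Organization Tree (EOT) into an embedding, and an InfoNCE-style contrastive objective in the spirit of Paper Contrastive Learning (PCL). The class and vocabulary names (EOTPathEmbedding, NODE_TYPES), the embedding composition, and the temperature value are assumptions for illustration, not the authors' implementation.

```python
# Minimal, illustrative sketch (not the paper's released code): embedding an
# EOT root-to-node path and an InfoNCE-style contrastive loss akin to PCL.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical node-type vocabulary for an EOT: paper -> section -> question -> sub-question.
NODE_TYPES = {"paper": 0, "section": 1, "question": 2, "sub_question": 3}

class EOTPathEmbedding(nn.Module):
    """Embed a root-to-node path by summing node-type and depth embeddings."""

    def __init__(self, hidden_size: int = 128, max_depth: int = 8):
        super().__init__()
        self.type_emb = nn.Embedding(len(NODE_TYPES), hidden_size)
        self.depth_emb = nn.Embedding(max_depth, hidden_size)

    def forward(self, path_type_ids: torch.Tensor) -> torch.Tensor:
        # path_type_ids: (batch, path_len) node-type ids from the root to the target node.
        depth_ids = torch.arange(path_type_ids.size(1), device=path_type_ids.device)
        step_emb = self.type_emb(path_type_ids) + self.depth_emb(depth_ids)  # (B, L, H)
        return step_emb.sum(dim=1)  # one vector per path, shape (B, H)


def info_nce_loss(anchor: torch.Tensor, positive: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Each paper embedding should match its positive view; other papers in the
    batch act as in-batch negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    paths = torch.tensor([[0, 1, 2, 3], [0, 1, 2, 2]])    # two toy root-to-node paths
    encoder = EOTPathEmbedding()
    emb = encoder(paths)
    loss = info_nce_loss(emb, emb + 0.01 * torch.randn_like(emb))
    print(emb.shape, loss.item())
```

In the actual model these structural embeddings would be combined with the text content of each question before pre-training; the sketch only shows how a tree path can be turned into a fixed-size vector and contrasted across papers.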


Links and resources

Tags

community

  • @brusilovsky
  • @dblp