Article

Enhancing information retrieval through concept-based language modeling and semantic smoothing

L. Said Lhadj, M. Boughanem, and K. Amrouche.
Journal of the Association for Information Science & Technology, 67 (12): 2909--2927 (December 2016)
DOI: 10.1002/asi.23553

Abstract

Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have already shown good performance, the word independency assumption seems to be unrealistic from a natural language point of view, which considers that terms are related to each other. Therefore, such an assumption leads to two well-known problems in information retrieval (IR), namely, polysemy, or term mismatch, and synonymy. In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal-concepts, or word relationships, but such models are estimated using simple n-grams or concept counting. In this paper, we address polysemy and synonymy mismatch with a concept-based language modeling approach that combines ontological concepts from external resources with frequently found collocations from the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier).

BibTeX key: said_lhadj_enhancing_2016
entry type: article
year: 2016
month: dec
journal: Journal of the Association for Information Science & Technology
number: 12
pages: 2909--2927
volume: 67
issn: 23301635
DOI: 10.1002/asi.23553


Cite this publication

@article{said_lhadj_enhancing_2016,
  abstract = {Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have already shown good performance, the word independency assumption seems to be unrealistic from a natural language point of view, which considers that terms are related to each other. Therefore, such an assumption leads to two well-known problems in information retrieval (IR), namely, polysemy, or term mismatch, and synonymy. In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal-concepts, or word relationships, but such models are estimated using simple n-grams or concept counting. In this paper, we address polysemy and synonymy mismatch with a concept-based language modeling approach that combines ontological concepts from external resources with frequently found collocations from the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier).},
  added-at = {2018-11-04T17:02:36.000+0100},
  author = {Said Lhadj, Lynda and Boughanem, Mohand and Amrouche, Karima},
  biburl = {https://www.bibsonomy.org/bibtex/2f89eb1fc1e612d68f7e8adf2c5f967d1/lepsky},
  doi = {10.1002/asi.23553},
  interhash = {29e477806d6ecc54446f87fd11c1c1e8},
  intrahash = {f89eb1fc1e612d68f7e8adf2c5f967d1},
  issn = {23301635},
  journal = {Journal of the Association for Information Science \& Technology},
  keywords = {information_retrieval},
  month = dec,
  number = 12,
  pages = {2909--2927},
  timestamp = {2018-11-07T09:14:29.000+0100},
  title = {Enhancing information retrieval through concept-based language modeling and semantic smoothing},
  volume = 67,
  year = 2016
}
