Author of the publication

NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark.

O. Sainz, J. Campos, I. García-Ferrero, J. Etxaniz, O. de Lacalle, and E. Agirre. EMNLP (Findings), page 10776-10787. Association for Computational Linguistics, (2023)

Please choose a person to relate this publication to

To distinguish between persons with the same name, the academic degree and the title of an important publication are displayed. You can also use the button next to a name to display some publications already assigned to that person.

Elisabeth Ander

Christina Ander

Konstanze Ander

Albert Ander

Christiane Ander

Other publications of authors with the same name

Improving Code Generation by Training with Natural Language Feedback. A. Chen, J. Scheurer, T. Korbak, J. Campos, J. Chan, S. Bowman, K. Cho, and E. Perez. CoRR, (2023)

Unsupervised Domain Adaption for Neural Information Retrieval. C. Dominguez, J. Campos, E. Agirre, and G. Azkune. CoRR, (2023)

Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning. J. Campos, K. Cho, A. Otegi, A. Soroa, E. Agirre, and G. Azkune. COLING, page 2561-2571. International Committee on Computational Linguistics, (2020)

Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems. J. Deriu, D. Tuggener, P. von Däniken, J. Campos, Á. Rodrigo, T. Belkacem, A. Soroa, E. Agirre, and M. Cieliebak. EMNLP (1), page 3971-3984. Association for Computational Linguistics, (2020)

Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque. A. Otegi, A. Gonzalez-Agirre, J. Campos, A. Soroa, and E. Agirre. LREC, page 436-442. European Language Resources Association, (2020)

Training Language Models with Language Feedback at Scale. J. Scheurer, J. Campos, T. Korbak, J. Chan, A. Chen, K. Cho, and E. Perez. CoRR, (2023)

Learning from Natural Language Feedback. J. Scheurer, J. Campos, J. Chan, A. Chen, K. Cho, and E. Perez. CoRR, (2022)

IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition Using Knowledge Bases. I. García-Ferrero, J. Campos, O. Sainz, A. Salaberria, and D. Roth. SemEval@ACL, page 1335-1346. Association for Computational Linguistics, (2023)

DoQA - Accessing Domain-Specific FAQs via Conversational QA. J. Campos, A. Otegi, A. Soroa, J. Deriu, M. Cieliebak, and E. Agirre. ACL, page 7302-7314. Association for Computational Linguistics, (2020)

State-of-the-Art in Language Technology and Language-centric Artificial Intelligence. R. Agerri, E. Agirre, I. Aldabe, N. Aranberri, J. Arriola, A. Atutxa, G. Azkune, J. Campos, A. Casillas, A. Estarrona and 18 other author(s). European Language Equality, Springer, (2022)

BibSonomy

Disambiguation of "Campos, Jon Ander"
