Pollice Verso at SemEval-2024 Task 6: The Roman Empire Strikes Back
K. Kobs, J. Pfister, and A. Hotho. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1529--1536. Mexico City, Mexico, Association for Computational Linguistics, (June 2024)
Abstract
We present an intuitive approach for hallucination detection in LLM outputs that is modeled after how humans would go about this task. We engage several LLM "experts" to independently assess whether a response is hallucinated. For this we select recent and popular LLMs smaller than 7B parameters. By analyzing the log probabilities for tokens that signal a positive or negative judgment, we can determine the likelihood of hallucination. Additionally, we enhance the performance of our "experts" by automatically refining their prompts using the recently introduced OPRO framework. Furthermore, we ensemble the replies of the different experts in a uniform or weighted manner, which builds a quorum from the expert replies. Overall this leads to accuracy improvements of up to 10.6 p.p. compared to the challenge baseline. We show that a Zephyr 3B model is well suited for the task. Our approach can be applied in the model-agnostic and model-aware subtasks without modification and is flexible and easily extendable to related tasks.
%0 Conference Paper
%1 kobs-etal-2024-pollice
%A Kobs, Konstantin
%A Pfister, Jan
%A Hotho, Andreas
%B Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
%C Mexico City, Mexico
%D 2024
%E Ojha, Atul Kr.
%E Doğruöz, A. Seza
%E Tayyar Madabushi, Harish
%E Da San Martino, Giovanni
%E Rosenthal, Sara
%E Rosá, Aiala
%I Association for Computational Linguistics
%K author:hotho author:pfister from:janpf motiv myown
%P 1529--1536
%T Pollice Verso at SemEval-2024 Task 6: The Roman Empire Strikes Back
%U https://aclanthology.org/2024.semeval-1.219
%X We present an intuitive approach for hallucination detection in LLM outputs that is modeled after how humans would go about this task. We engage several LLM "experts" to independently assess whether a response is hallucinated. For this we select recent and popular LLMs smaller than 7B parameters. By analyzing the log probabilities for tokens that signal a positive or negative judgment, we can determine the likelihood of hallucination. Additionally, we enhance the performance of our "experts" by automatically refining their prompts using the recently introduced OPRO framework. Furthermore, we ensemble the replies of the different experts in a uniform or weighted manner, which builds a quorum from the expert replies. Overall this leads to accuracy improvements of up to 10.6 p.p. compared to the challenge baseline. We show that a Zephyr 3B model is well suited for the task. Our approach can be applied in the model-agnostic and model-aware subtasks without modification and is flexible and easily extendable to related tasks.
@inproceedings{kobs-etal-2024-pollice,
abstract = {We present an intuitive approach for hallucination detection in LLM outputs that is modeled after how humans would go about this task. We engage several LLM {``}experts{''} to independently assess whether a response is hallucinated. For this we select recent and popular LLMs smaller than 7B parameters. By analyzing the log probabilities for tokens that signal a positive or negative judgment, we can determine the likelihood of hallucination. Additionally, we enhance the performance of our {``}experts{''} by automatically refining their prompts using the recently introduced OPRO framework. Furthermore, we ensemble the replies of the different experts in a uniform or weighted manner, which builds a quorum from the expert replies. Overall this leads to accuracy improvements of up to 10.6 p.p. compared to the challenge baseline. We show that a Zephyr 3B model is well suited for the task. Our approach can be applied in the model-agnostic and model-aware subtasks without modification and is flexible and easily extendable to related tasks.},
added-at = {2024-07-01T03:27:07.000+0200},
address = {Mexico City, Mexico},
author = {Kobs, Konstantin and Pfister, Jan and Hotho, Andreas},
biburl = {https://www.bibsonomy.org/bibtex/2e2234c8b039b7b82e96fa381736be0ae/dmir},
booktitle = {Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)},
editor = {Ojha, Atul Kr. and Do{\u{g}}ru{\"o}z, A. Seza and Tayyar Madabushi, Harish and Da San Martino, Giovanni and Rosenthal, Sara and Ros{\'a}, Aiala},
interhash = {7e73d61fab4f1359294e4365e97d39d8},
intrahash = {e2234c8b039b7b82e96fa381736be0ae},
keywords = {author:hotho author:pfister from:janpf motiv myown},
month = jun,
pages = {1529--1536},
publisher = {Association for Computational Linguistics},
timestamp = {2024-11-06T14:19:51.000+0100},
title = {Pollice Verso at SemEval-2024 Task 6: The Roman Empire Strikes Back},
url = {https://aclanthology.org/2024.semeval-1.219},
year = 2024
}