Can large language models replace humans in systematic reviews? Evaluating GPT‐4's efficacy in screening and extracting data from peer‐reviewed and grey literature in multiple languages - Khraisha - Research Synthesis Methods - Wiley Online Library

https://onlinelibrary.wiley.com/doi/10.1002/jrsm.1715

Description

When screening full-text literature using highly reliable prompts, GPT-4's performance was more robust, reaching “human-like” levels. Although our findings indicate that, currently, substantial caution should be exercised if LLMs are being used to conduct systematic reviews, they also offer preliminary evidence that, for certain review tasks delivered under specific conditions, LLMs can rival human performance.
