Can large language models replace humans in systematic reviews? Evaluating GPT‐4's efficacy in screening and extracting data from peer‐reviewed and grey literature in multiple languages - Khraisha - Research Synthesis Methods - Wiley Online Library


Description

When screening full-text literature using highly reliable prompts, GPT-4's performance was more robust, reaching “human-like” levels. Although our findings indicate that, currently, substantial caution should be exercised if LLMs are being used to conduct systematic reviews, they also offer preliminary evidence that, for certain review tasks delivered under specific conditions, LLMs can rival human performance.