@inproceedings{Finnie_Ansley_2023,
abstract = {The introduction of OpenAI Codex sparked a surge of interest in the impact of generative AI models on computing education practices. Codex is also the underlying model for GitHub Copilot, a plugin which makes AI-generated code accessible to students through auto-completion in popular code editors. Research in this area, particularly on the educational implications, is nascent and has focused almost exclusively on introductory programming (or CS1) questions. Very recent work has shown that Codex performs considerably better on typical CS1 exam questions than most students. It is not clear, however, what Codex’s limits are with regard to more complex programming assignments and exams. In this paper, we present results detailing how Codex performs on more advanced CS2 (data structures and algorithms) exam questions taken from past exams. We compare these results to those of students who took the same exams under normal conditions, demonstrating that Codex outscores most students. We consider the implications of such tools for the future of undergraduate computing education.},
author = {Finnie-Ansley, James and Denny, Paul and Luxton-Reilly, Andrew and Santos, Eddie Antonio and Prather, James and Becker, Brett A.},
booktitle = {Proceedings of the 25th Australasian Computing Education Conference},
doi = {10.1145/3576123.3576134},
keywords = {llm programming},
month = jan,
pages = {97--104},
publisher = {ACM},
series = {ACE ’23},
title = {My AI Wants to Know if This Will Be on the Exam: Testing OpenAI’s Codex on CS2 Programming Exercises},
url = {https://doi.org/10.1145/3576123.3576134},
year = 2023
}