Journal of the European Society for Gynaecological Endoscopy


Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests

M. Pavone1,2,3,4, L. Palmieri1, N. Bizzarri1, A. Rosati1, F. Campolo1, C. Innocenzi1, C. Taliento5,6, S. Restaino7, U. Catena1, G. Vizzielli7, C. Akladios8, M.M. Ianieri9, J. Marescaux3, R. Campo10, F. Fanfani1, G. Scambia1

1 UOC Ginecologia Oncologica, Dipartimento di Scienze per la salute della Donna e del Bambino e di Sanità Pubblica, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy
2 IHU Strasbourg, Institute of Image-Guided Surgery, 67000 Strasbourg, France
3 IRCAD, Research Institute against Digestive Cancer (IRCAD) France, 67000 Strasbourg, France
4 ICube, Laboratory of Engineering, Computer Science and Imaging, Department of Robotics, Imaging, Teledetection and Healthcare Technologies, University of Strasbourg, CNRS, UMR 7357, Strasbourg, France
5 Department of Obstetrics and Gynecology, University Hospital Ferrara, 44121 Ferrara, Italy
6 Department of Obstetrics and Gynaecology, University Hospitals Leuven, 3000 Leuven, Belgium
7 Department of Medical Area (DAME), University of Udine, Clinic of Obstetrics and Gynecology, “Santa Maria Della Misericordia” University Hospital, Azienda Sanitaria Universitaria Friuli Centrale, 33100 Udine, Italy
8 University Hospitals of Strasbourg, Department of Gynecologic Surgery, 67091 Strasbourg, France
9 Gynecology and Breast Care Center, Mater Olbia Hospital, Olbia, Italy
10 Life Expert Centre, Schipvaartstraat 4, 3000 Leuven, Belgium

Keywords:

ChatGPT, Artificial intelligence, GESEA, laparoscopy, hysteroscopy, digital surgery


Published online: Dec 18 2024

https://doi.org/10.52054/FVVO.16.4.047

Abstract

Background: In 2022, OpenAI launched ChatGPT 3.5, which is now widely used in medical education, training, and research. Despite its value for generating information, concerns persist about its authenticity and accuracy. Its undisclosed information sources and outdated training dataset pose risks of misinformation. Although the tool is widely used, inaccuracies in AI-generated text raise doubts about its reliability. The ethical use of such technologies is crucial to uphold scientific accuracy in research.

Objective: This study aimed to assess the accuracy of ChatGPT in completing the GESEA Level 1 and Level 2 knowledge tests.

Materials and Methods: ChatGPT was presented with the 100 multiple-choice theoretical questions from the GESEA Level 1 and Level 2 certifications and asked to select the correct answer and provide an explanation. Expert gynaecologists evaluated and graded the explanations for accuracy.

Main outcome measures: ChatGPT showed a 59% accuracy in responses, with 64% providing comprehensive explanations. It performed better in GESEA Level 1 (64% accuracy) than in GESEA Level 2 (54% accuracy) questions.

Conclusions: ChatGPT is a versatile tool in medicine and research, offering knowledge and information and promoting evidence-based practice. Despite its widespread use, its accuracy had not yet been validated in this setting. This study found a 59% correct response rate, highlighting the need for accuracy validation and for consideration of its ethical use. Future research should investigate ChatGPT’s truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of the chatbot to track continuous improvement.

What is new? Artificial intelligence (AI) has great potential in scientific research. However, the validity of its outputs remains unverified. This study evaluates the accuracy of responses generated by ChatGPT to encourage critical use of this tool.