Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests
Original Articles
VOLUME: 16 ISSUE: 4
P: 449 - 456
December 2024


Facts Views Vis ObGyn 2024;16(4):449-456
1. UOC Ginecologia Oncologica, Dipartimento di Scienze per la salute della Donna e del Bambino e di Sanità Pubblica, Fondazione Policlinico Universitario A. Gemelli, IRCCS, 00168, Rome, Italy
2. IHU Strasbourg, Institute of Image-Guided Surgery, 67000 Strasbourg, France
3. IRCAD, Research Institute against Digestive Cancer (IRCAD) France, 67000 Strasbourg, France
4. ICube, Laboratory of Engineering, Computer Science and Imaging, Department of Robotics, Imaging, Teledetection and Healthcare Technologies, University of Strasbourg, CNRS, UMR 7357, Strasbourg, France
5. Department of Obstetrics and Gynecology, University Hospital Ferrara, 44121 Ferrara, Italy
6. Department of Obstetrics and Gynaecology, University Hospitals Leuven, 3000 Leuven, Belgium
7. Department of Medical Area (DAME), University of Udine, Clinic of Obstetrics and Gynecology, “Santa Maria Della Misericordia” University Hospital, Azienda Sanitaria Universitaria Friuli Centrale, 33100 Udine, Italy
8. University Hospitals of Strasbourg, Department of Gynecologic Surgery, 67091 Strasbourg, France
9. Gynecology and Breast Care Center, Mater Olbia Hospital, Olbia, Italy
10. Life Expert Centre, Schipvaartstraat 4, 3000 Leuven, Belgium

Abstract

Background

In 2022, OpenAI launched ChatGPT 3.5, which is now widely used in medical education, training, and research. Despite its value for generating information, concerns persist about its authenticity and accuracy: its undisclosed information sources and outdated training dataset pose risks of misinformation. Although the tool is widely used, inaccuracies in AI-generated text raise doubts about its reliability. The ethical use of such technologies is crucial to upholding scientific accuracy in research.

Objective

This study aimed to assess the accuracy of ChatGPT in completing the GESEA Level 1 and 2 knowledge tests.

Materials and Methods

The 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 were presented to ChatGPT, requesting the selection of the correct answer along with an explanation. Expert gynaecologists evaluated and graded the explanations for accuracy.

Main outcome measures

ChatGPT answered 59% of questions correctly, with 64% of responses providing comprehensive explanations. It performed better on GESEA Level 1 questions (64% accuracy) than on GESEA Level 2 questions (54% accuracy).

Conclusions

ChatGPT is a versatile tool in medicine and research, offering knowledge and information and promoting evidence-based practice. Despite its widespread use, its accuracy has not yet been validated. This study found a 59% correct response rate, highlighting the need for accuracy validation and for ethical-use considerations. Future research should investigate ChatGPT’s truthfulness in subspecialty fields such as gynaecologic oncology and compare different chatbot versions for continuous improvement.

What is new? Artificial intelligence (AI) has great potential in scientific research, but the validity of its outputs remains unverified. This study evaluates the accuracy of responses generated by ChatGPT in order to promote critical use of the tool.

Keywords: ChatGPT, artificial intelligence, GESEA, laparoscopy, hysteroscopy, digital surgery