Performance of ChatGPT in Israeli Hebrew OBGYN national residency examinations

Archives of Gynecology and Obstetrics (2023)

Abstract
Purpose Previous studies of ChatGPT performance on medical examinations have reached contradictory results. Moreover, the performance of ChatGPT in languages other than English is yet to be explored. We aim to study the performance of ChatGPT on the Hebrew OBGYN ‘Shlav-Alef’ (Phase 1) examination. Methods A performance study was conducted using a consecutive sample of text-based multiple-choice questions originating from authentic Hebrew OBGYN ‘Shlav-Alef’ examinations in 2021–2022. We constructed 150 multiple-choice questions from consecutive text-based-only original questions. We compared the performance of ChatGPT to the actual performance of OBGYN residents who completed the tests in 2021–2022. We also compared ChatGPT's Hebrew performance with previously published results on English medical tests. Results In 2021–2022, 27.8% of OBGYN residents failed the ‘Shlav-Alef’ examination, and the mean score of the residents was 68.4. Overall, 150 authentic questions were evaluated (one examination). ChatGPT correctly answered 58 questions (38.7%), a failing score. The performance of Hebrew ChatGPT was lower than the actual performance of residents: 38.7% vs. 68.4%, p < .001. Compared with ChatGPT's performance on 9,091 English-language questions in the field of medicine, the performance of Hebrew ChatGPT was lower (38.7% in Hebrew vs. 60.7% in English, p < .001). Conclusions ChatGPT correctly answered fewer than 40% of Hebrew OBGYN resident examination questions. Residents cannot rely on ChatGPT to prepare for this examination. Efforts should be made to improve ChatGPT performance in languages besides English.
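The abstract does not state which statistical test produced the reported p-values. A standard pooled two-proportion z-test, sketched below, reproduces the reported significance for the Hebrew vs. English comparison; the English question count of 5,518 is inferred from the reported 60.7% of 9,091 and is an assumption, not a figure from the paper.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hebrew ChatGPT: 58 of 150 correct; English benchmark: 60.7% of 9,091
# questions (count reconstructed from the percentage, an assumption).
z, p = two_proportion_z(58, 150, round(0.607 * 9091), 9091)
print(f"z = {z:.2f}, p = {p:.1e}")  # p is well below .001, as reported
```

This is only an illustration of the magnitude of the gap; the authors' own analysis may have used a chi-square or another equivalent test.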
Keywords
ChatGPT, Hebrew, OBGYN, Performance, Test