ISSN 2458-7834
 

Original Article 


Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination

Numan Mercan, Ahmet Yurteri, Ebubekir Eravşar, Ahmet Yıldırım.


Abstract
Objectives: This study examined how accurately three large language models (LLMs), ChatGPT-4o, DeepSeek-R1, and Gemini 2.0, answered the multiple-choice questions (MCQs) of the European Board of Hand Surgery (EBHS) examination, and why they answered some questions incorrectly. It was hypothesized that DeepSeek-R1 would achieve a higher accuracy rate than the other two models because of reported differences in their training datasets.
Materials and Methods: The study examined 150 true/false MCQs from 10 different examinations published in The Journal of Hand Surgery (European Volume) between 2022 and 2024. The MCQs were divided by content into five subheadings: anatomy, trauma, systemic-chronic diseases, microsurgery, and congenital disorders. The models' incorrect answers were classified by cause into four groups: data-related, semantic, algorithmic, and logical errors.
Results: The correct answer rate was 74% for ChatGPT-4o, 76.7% for DeepSeek-R1, and 73.3% for Gemini 2.0, with no significant difference among the three models (p = 0.572). The models gave the same answer to 103 of the 150 MCQs, and 84.5% of these shared answers were correct. Among the incorrect answers, data-related errors were the most frequent type.
Conclusion: The three LLMs did not differ significantly in overall accuracy, in accuracy across the content-based subcategories, or in the distribution of error types. The predominance of data-related errors points to gaps in the models' training; with accuracy of approximately 75% on this examination, further error analysis could help improve future model performance.

Key words: artificial intelligence; board exam; ChatGPT; DeepSeek; error analysis; Gemini; hand surgery; large language models
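The percentages in the Results can be checked against whole-question counts. The sketch below is a minimal Python consistency check, assuming that each model answered all 150 MCQs and that the reported figures are rounded to one decimal place; the variable and function names are illustrative and are not taken from the study. It does not attempt to reproduce p = 0.572, because the abstract does not name the statistical test used.

```python
# Consistency check (illustrative sketch): convert the reported percentages
# back into whole-question counts and recompute the rounded percentages.

TOTAL_MCQS = 150                 # number of MCQs stated in Materials and Methods
AGREED_MCQS = 103                # MCQs on which the models gave the same answer
REPORTED_AGREED_ACCURACY = 84.5  # % of those shared answers that were correct

# Correct-answer rates (%) as reported in the Results
reported_accuracy = {
    "ChatGPT-4o": 74.0,
    "DeepSeek-R1": 76.7,
    "Gemini 2.0": 73.3,
}


def implied_count(percent: float, denominator: int) -> int:
    """Nearest whole number of questions that matches a reported percentage."""
    return round(percent / 100 * denominator)


def as_percent(count: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * count / denominator, 1)


correct_counts = {m: implied_count(p, TOTAL_MCQS) for m, p in reported_accuracy.items()}
wrong_counts = {m: TOTAL_MCQS - c for m, c in correct_counts.items()}
agreed_correct = implied_count(REPORTED_AGREED_ACCURACY, AGREED_MCQS)

for model, count in correct_counts.items():
    print(f"{model}: {count}/{TOTAL_MCQS} correct "
          f"({as_percent(count, TOTAL_MCQS)}%), {wrong_counts[model]} wrong")
print(f"Shared answers: {agreed_correct}/{AGREED_MCQS} correct "
      f"({as_percent(agreed_correct, AGREED_MCQS)}%)")
```

Under these assumptions, the reported rates correspond to 111, 115, and 110 correct answers out of 150 (39, 35, and 40 incorrect answers), and 84.5% of the 103 shared answers corresponds to 87 correct answers; recomputing each percentage from these counts returns the reported value.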




How to Cite this Article
Pubmed Style

Mercan N, Yurteri A, Eravşar E, Yıldırım A. Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination. Hand Microsurg. 2025; 14(3): 97-105. doi:10.5455/handmicrosurg.249704


Web Style

Mercan N, Yurteri A, Eravşar E, Yıldırım A. Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination. https://handmicrosurgeryjournal.com/?mno=249704 [Access: January 28, 2026]. doi:10.5455/handmicrosurg.249704


AMA (American Medical Association) Style

Mercan N, Yurteri A, Eravşar E, Yıldırım A. Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination. Hand Microsurg. 2025; 14(3): 97-105. doi:10.5455/handmicrosurg.249704



Vancouver/ICMJE Style

Mercan N, Yurteri A, Eravşar E, Yıldırım A. Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination. Hand Microsurg. (2025), [cited January 28, 2026]; 14(3): 97-105. doi:10.5455/handmicrosurg.249704



Harvard Style

Mercan, N., Yurteri, A., Eravşar, E. & Yıldırım, A. (2025) Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination. Hand Microsurg, 14 (3), 97-105. doi:10.5455/handmicrosurg.249704



Turabian Style

Mercan, Numan, Ahmet Yurteri, Ebubekir Eravşar, and Ahmet Yıldırım. 2025. Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination. Hand and Microsurgery, 14 (3), 97-105. doi:10.5455/handmicrosurg.249704



Chicago Style

Mercan, Numan, Ahmet Yurteri, Ebubekir Eravşar, and Ahmet Yıldırım. "Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination." Hand and Microsurgery 14 (2025), 97-105. doi:10.5455/handmicrosurg.249704



MLA (The Modern Language Association) Style

Mercan, Numan, Ahmet Yurteri, Ebubekir Eravşar, and Ahmet Yıldırım. "Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination." Hand and Microsurgery 14.3 (2025), 97-105. Print. doi:10.5455/handmicrosurg.249704



APA (American Psychological Association) Style

Mercan, N., Yurteri, A., Eravşar, E. & Yıldırım, A. (2025) Accuracy and Error Analysis of ChatGPT-4o, DeepSeek-R1, and Gemini 2.0 in the European Board of Hand Surgery Examination. Hand and Microsurgery, 14 (3), 97-105. doi:10.5455/handmicrosurg.249704





Most Viewed Articles

  • Giant intramuscular lipoma of arm: A case report and review of the literature
    Muzaffer Durmus, Ahmet Demirhan Dal, Abdul Kerim Yapici, Sedat Avsar, Yalcin Bayram
    Hand Microsurg. 2014; 3(3): 87-90
    doi: 10.5455/handmicrosurg.173820

  • Cubital tunnel syndrome due to heterotrophic ossification caused by radial head fracture: A case report
    Seyitali Gumustas, Haci Bayram Tosun, Ismail Agir, Abuzer Uludag
    Hand Microsurg. 2014; 3(1): 24-28
    doi: 10.5455/handmicrosurg.1577

  • Munchausen’s syndrome or pure self-mutilation? A case of self-inflicted tendon injury
    Burak Kaya, Servet Elcin Alpat, Mehmet Sonmez, Cem Cerkez, Savas Serel
    Hand Microsurg. 2014; 3(3): 83-86
    doi: 10.5455/handmicrosurg.172607

  • Radial nerve grafting in high energy humeral fractures
    Yusuf Gürbüz, Tahir Sadık Sügün, Kemal Özaksar, Murat Kayalar, Tulgar Toros, İbrahim Kaplan
    Hand Microsurg. 2012; 1(2): 60-64
    doi: 10.2399/emd.12.58077

  • The utility of onion extract gel containing topical allantoin and heparin after surgical treatment of upper extremity burn scars
    Mehmet Ihsan Okur, Alpagan Mustafa Yildirim, Bilsev Ince
    Hand Microsurg. 2014; 3(3): 74-79
    doi: 10.5455/handmicrosurg.172436

Most Downloaded Articles

  • Love sign’s love for glomus tumor
    Ankit Shukla, Varun Verma, Roshni Shukla, Rajesh Chaudhary, Saurab Sharma, Sajid Mohammad
    Hand Microsurg. 2018; 7(2): 105-108
    doi: 10.5455/handmicrosurg.279749

  • Bone island and hand involvement – A short review
    Ganesh Singh Dharmshaktu, Binit Singh
    Hand Microsurg. 2018; 7(2): 93-97
    doi: 10.5455/handmicrosurg.258665

  • Lipofibromatous hamartoma of the digital nerve: A case report
    Emin Sir, Alper Aksoy
    Hand Microsurg. 2018; 7(1): 45-48
    doi: 10.5455/handmicrosurg.255406

  • Synovial chondromatosis of the wrist
    Murat Kayalar, Abuzer Uludag, Basak Doganavsargil Yakut, Ahmet Savran
    Hand Microsurg. 2018; 7(2): 109-111
    doi: 10.5455/handmicrosurg.269392

  • The three bony point relationship of the elbow-Why is there still a lack of consensus?
    Supreeth Nekkanti, Dama Kondaiah Naidu, Jasmine Sebastin, Kamal Prasad, Mohan Yeshwanth, Archana Meka
    Hand Microsurg. 2018; 7(2): 79-82
    doi: 10.5455/handmicrosurg.291476

Most Cited Articles

  • Ulnar nerve palsy after closed forearm fracture: a case report
    Levent Küçük, Oğuz Özdemir, Erhan Coşkunol
    Hand Microsurg. 2012; 1(1): 30-32
    doi: 10.2399/emd.12.09709
    Cited: 2 times

  • Evaluation of upper extremity traumatic amputations by means of etiology, demographics and therapy
    Dağhan Dağdelen, Dilgem Mammadov, Erkan Yüce, Kamuran Zeynep Sevim Aytuğ, Ayşin Karasoy Yeşilada, Semra Hacıkerim Karşıdağ
    Hand Microsurg. 2012; 1(3): 95-98
    doi: 10.2399/emd.12.24633
    Cited: 1 time