2025 Poster Presentations
P169: ASSESSING THE DIAGNOSTIC POWER AND MANAGEMENT RECOMMENDATIONS OF CHATGPT-4 VISION ON ORBITAL FRACTURE IN CT SCAN
Omar Sadat, MD; Kareem Ibrahim-Bacha, BS; Diane Wang, MD; Richard Cui, PhD; John N Nguyen, MD; West Virginia University
Purpose: Facial CT is essential in the evaluation and management of fractures. ChatGPT-4 Vision (GPT-4v), a multimodal large language model that lets a user upload an image as input and converse with the model, has shown promising results in diagnosing distal radius fractures, providing patient information, and assisting with decision-making. Here we evaluate the performance of ChatGPT-4 Vision in analyzing facial CT for the diagnosis of orbital fracture, along with its recommended management, compared with the assessments and recommendations of oculofacial plastic surgeons.
Methods: Nineteen cases of various orbital floor fractures with CT images were obtained from an open-source online image search; only cases with two or more views (axial and coronal) were included. Each case was assessed for identification of fractures, laterality, fracture size, likelihood of extraocular muscle entrapment, and treatment recommendations, including medical and surgical management. ChatGPT-4 Vision was given a prompt to assess the CT images and make recommendations (Figure 1). Separately, an attending physician and a fellow, both blinded to the image collection, were asked to assess the images and make recommendations based purely on the CT images. The performance of GPT-4v in CT analysis was compared with that of the surgeons, with the radiologist’s interpretation serving as the gold standard.
Results: GPT-4v and the surgeons correctly identified the presence of an orbital fracture in all nineteen cases. GPT-4v identified fracture laterality correctly in 47.37% of cases, significantly lower than the surgeons at 100% (χ²=24.26, p<0.01). Both GPT-4v and the surgeons identified the fractured bone in 100% of cases. Inferior rectus entrapment (5/19 cases) was correctly assessed by GPT-4v in 73.68% of cases, significantly lower than the surgeons at 100% (χ²=8.60, p<0.01). GPT-4v accurately described fracture size in 63.16% of cases, also significantly lower than the surgeons at 100% (χ²=15.96, p<0.01). Lastly, GPT-4v's recommendation regarding surgical management was accurate 68.42% of the time, significantly lower than the surgeons at 94.75% (χ²=7.27, p<0.01).
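The comparisons above are chi-square tests on 2x2 tables of correct versus incorrect assessments. As a minimal sketch of that arithmetic, the Python below computes an uncorrected Pearson chi-square statistic and its p-value (1 degree of freedom). The abstract does not report raw counts or the exact test variant, so everything here is an assumption: the counts are back-calculated from the reported percentages (for example, 47.37% of 19 is taken as 9/19), the two surgeons' ratings are pooled into 38 assessments, and no continuity correction is applied. The function name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Rows are raters (GPT-4v, surgeons); columns are correct / incorrect.
    Returns (chi2, p) where p is the upper-tail probability with 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("degenerate table: a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# ASSUMED counts, reconstructed from the reported percentages:
# (GPT-4v correct, GPT-4v incorrect, surgeons correct, surgeons incorrect)
comparisons = {
    "laterality": (9, 10, 38, 0),
    "fracture size": (12, 7, 38, 0),
    "surgical recommendation": (13, 6, 36, 2),
}

for name, counts in comparisons.items():
    chi2, p = pearson_chi2_2x2(*counts)
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these assumptions the sketch yields statistics of 24.26, 15.96, and 7.27, in line with the values reported for laterality, fracture size, and surgical recommendation. The same assumptions applied to entrapment (14/19 versus 38/38) give roughly 10.96 rather than the reported 8.60, so the authors' actual procedure may differ from this reconstruction.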
Conclusion: GPT-4v can accurately identify the presence of fractured bone on CT images but has significant deficiencies compared with oculofacial trauma surgeons in identifying fracture laterality, assessing the likelihood of muscle entrapment, and correctly recommending surgical intervention.