2025 Poster Presentations
P474: AUTOMATED OPERATIVE WORKFLOW RECOGNITION IN VESTIBULAR SCHWANNOMA RESECTION: DEVELOPMENT AND PRECLINICAL EVALUATION OF A DEEP LEARNING NEURAL NETWORK (IDEAL STAGE 0)
Simon C Williams, Mr1; Dorothée Duvaux, Miss2; Adrito Das, Mr2; Siddharth Sinha, Mr1; Hugo Layard Horsfall, Mr1; Jonathan P Funnell, Mr1; John G Hanrahan, Mr1; Danyal Z Khan, Mr1; William R Muirhead, Mr1; Neil Kitchen, Mr1; Francisco Vasconcelos, Dr2; Sophia Bano, Dr2; Danail Stoyanov, Prof2; Patrick Grover, Mr1; Hani J Marcus, Mr1; 1The National Hospital For Neurology and Neurosurgery; 2Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS)
Background and Objectives: Paradigm shifts in the operative management of vestibular schwannoma (VS) have significantly reduced the morbidity and mortality associated with surgical excision, yet resection of VS remains a high-risk operation. Furthermore, the low-volume, high-complexity nature of VS resection, coupled with the increasing transfer of expertise to centres of excellence, has raised concerns regarding training opportunities. Artificial intelligence (AI) provides opportunities to address these concerns; of particular note to surgeons is the opportunity for AI to interpret and process operative video. Machine learning (ML) in surgical video analysis offers promising prospects for training, audit, decision support, and prognostication in surgery. The past decade has seen key advances in ML-based operative workflow analysis, whereby ML platforms predict the phase and step of an operation, though existing applications mostly feature shorter surgeries (<2 hours). This study aimed to develop and evaluate an ML model capable of automated operative workflow recognition for vestibular schwannoma resection. In doing so, this study extends previous research in this field by applying workflow prediction platforms to lengthy (median duration >5 hours), data-heavy surgeries.
Methods: An operative video dataset of twenty-one microscopic retrosigmoid vestibular schwannoma resections was collected at a single institution over a three-year period, and underwent phase and step annotation according to a workflow previously agreed by expert consensus (Approach, Excision, and Closure phases, and Debulking or Dissection steps within the Excision phase) (Figure 1). Annotations were used to train an ML model consisting of a convolutional neural network (CNN) followed by a recurrent neural network (RNN) (Figure 2). Five-fold cross-validation was used, and performance metrics (accuracy, precision, recall, F1 score) were assessed for the phase and step prediction tasks.
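As an illustration of the cross-validation step, the following is a minimal sketch of one way to build five folds from twenty-one videos. It assumes that folds are formed at the level of whole videos (so frames from one case never appear in both training and test sets); the abstract does not state how the folds were constructed, and the function names, case identifiers, and random seed are hypothetical.

```python
import random


def video_level_folds(video_ids, k=5, seed=0):
    """Partition video IDs into k folds; each video lands in exactly one fold."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    # Deal the shuffled IDs round-robin so fold sizes differ by at most one.
    return [ids[i::k] for i in range(k)]


def train_test_splits(video_ids, k=5, seed=0):
    """Yield (train, test) lists of video IDs, one pair per fold."""
    folds = video_level_folds(video_ids, k, seed)
    for i, test in enumerate(folds):
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        yield train, test


# Hypothetical identifiers for the twenty-one cases.
videos = [f"case_{n:02d}" for n in range(1, 22)]
splits = list(train_test_splits(videos))
for i, (train, test) in enumerate(splits):
    print(f"fold {i}: {len(train)} training videos, {len(test)} test videos")
```

Splitting by video rather than by frame is a common choice for surgical video, because adjacent frames from the same case are highly correlated and would otherwise inflate test performance.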
Results: Median operative video time was 5 hours 18 minutes (IQR 3hr21min–6hr1min). The ‘Tumour Excision’ phase accounted for the majority of each case (median 4hr23min), whilst ‘Approach and Exposure’ (28min) and ‘Closure’ (17min) comprised shorter phases. The ML model accurately predicted operative phases (accuracy 81%, weighted F1 0.83) and dichotomised steps (accuracy 86%, weighted F1 0.86), but yielded reduced accuracy when predicting individual steps (accuracy 59%, weighted F1 0.58).
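For readers unfamiliar with the reported metrics, the sketch below shows how accuracy and a support-weighted F1 score can be computed from per-frame labels. It assumes the conventional definition of weighted F1 (per-class F1 averaged with weights proportional to the number of true frames in each class); the abstract does not specify the exact implementation used, and the toy labels are invented purely for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of frames whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true frame count."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy per-frame labels (illustrative only, not study data).
y_true = ["approach"] * 2 + ["excision"] * 6 + ["closure"] * 2
y_pred = ["approach"] + ["excision"] * 6 + ["closure"] * 3

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, weighted F1 = {wf1:.2f}")
```

Because the excision phase dominates each video, a support-weighted average is driven largely by performance on that phase, which is worth bearing in mind when comparing phase-level and step-level scores.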
Conclusion: This study demonstrates that our CNN-RNN model can accurately predict the surgical phases and intra-phase steps in retrosigmoid vestibular schwannoma resection. Nevertheless, there remains room for improvement in individual step classification. This work is of particular significance within the context of unique ML challenges: first, the analysis of extensive datasets, in contrast to previous clinical applications of computer vision, which have predominantly been conducted on shorter-duration procedures; and second, the navigation of surgeries lacking a linear progression of steps within a specific phase. Future applications of ML in low-volume, complex operations should prioritise collaborative video sharing to overcome early technical barriers to clinical translation.
Figure 1: Typical operative images for each surgical phase, and a typical time plot showing the duration of each phase and the interchange between steps within the ‘Tumour Excision’ phase.
Figure 2: Overview of the video processing architecture incorporating ResNet50 and Long Short-Term Memory (LSTM) layers.