NASBS

North American Skull Base Society


2026 Poster Presentations


P500: ARTIFICIAL INTELLIGENCE ALIGNMENT WITH EXPERT CONSENSUS IN VESTIBULAR SCHWANNOMA MANAGEMENT: A MULTI-PLATFORM EVALUATION
Shreya Vinjamuri1; KiChang Kang, MD2; Jay Trivedi1; Fox Ryker3; Anish Sathe, MD1; Roger Murayi, MD1; James Evans, MD1; 1Thomas Jefferson University Hospital; 2Montefiore Hospital; 3PCOM

Background: Treatment paradigms for vestibular schwannomas (VS) have evolved with advances in imaging and surgical technology, yet clinical guidelines remain non-standardized, leading to variable outcomes. Artificial intelligence (AI) shows promise for guiding clinical decision-making, but it is necessary to evaluate how AI alignment with expert consensus changes over time as these technologies rapidly evolve.

Methods: We evaluated four AI platforms (Google Bard/Gemini, GPT-4/GPT4o, SciteAI, and DeepSeek) across 2023 and 2025 datasets to assess agreement with expert consensus on VS management from Carlson et al. (2020). We tested 103 expert consensus statements across six categories: Hearing Preservation (Radiosurgery/Microsurgery), Tumor Control and Imaging Surveillance, Preferred Treatment, Operative Considerations, and Complications. Each statement was presented in two formats: direct consensus statements (Prompt 1) and rephrased yes/no questions (Prompt 2) to evaluate framing effects. Two independent evaluators categorized responses as "Agree," "Disagree," or "Neutral/Insufficient Information," with neutral responses grouped with disagreements for statistical analysis. We calculated accuracy as the proportion of "Agree" responses and used Cohen's Kappa statistics to assess within-AI agreement (between prompts) and between-AI agreement.
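The scoring described above (accuracy as the proportion of "Agree" responses after grouping neutral responses with disagreements, and Cohen's Kappa for within- and between-AI agreement) can be sketched as follows. This is a minimal illustration of the statistics, not the study's code, and the toy labels are not the study's data:

```python
from collections import Counter

def accuracy(responses):
    """Proportion of "Agree" responses ("Neutral" is grouped with "Disagree")."""
    return sum(r == "Agree" for r in responses) / len(responses)

def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences
    (e.g. one platform's verdicts under Prompt 1 vs. Prompt 2)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n              # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)  # chance agreement
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)

# Illustrative example only: binarized verdicts for one hypothetical platform.
prompt1 = ["Agree", "Agree", "Disagree", "Disagree"]
prompt2 = ["Agree", "Disagree", "Disagree", "Agree"]
print(accuracy(prompt1))               # 0.5
print(cohens_kappa(prompt1, prompt2))  # 0.0 (agreement no better than chance)
```

A Kappa of 0 here reflects that the 50% raw agreement between the two prompts is exactly what chance would predict given each sequence's label frequencies, which is why Kappa rather than raw agreement is used to assess prompt-framing consistency.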

Results: Overall AI agreement with expert consensus improved significantly from 2023 to 2025. In 2023, agreement ranged from 45.6% to 92.2%, with GPT-4 showing the highest accuracy (88.8%) and SciteAI the lowest (57.3%). In 2025, agreement ranged from 90.3% to 100%, with GPT4o and DeepSeek achieving perfect accuracy (100%), while Google Gemini reached 93.2% and SciteAI 91.7%. Within-AI consistency (Kappa values) increased from 0.039-0.371 in 2023 to 0.297-1.000 in 2025. Between-AI agreement remained low in 2023 (Kappa: -0.118 to 0.142) but improved moderately in 2025 (Kappa: 0.000 to 0.540). Category-specific analysis revealed consistent improvements across all treatment domains, with GPT4o showing near-perfect agreement in all categories by 2025.

Conclusions: AI platforms demonstrated variable but generally improving agreement with expert consensus on VS management, with significant gains between the 2023 and 2025 iterations. While newer models showed higher accuracy, inter-platform agreement remained limited, suggesting continued variability in AI interpretation of the medical literature. These findings highlight the need for comprehensive evaluation of AI tools before clinical implementation and underscore the importance of prompt standardization in AI research. This study provides a baseline for tracking AI progress in neurosurgical decision-making as these technologies continue to evolve.

* A re-analysis is currently being conducted to add information regarding 2025 models (Claude AI, Open Evidence, etc.) and local models.


Copyright © 2026 North American Skull Base Society · Managed by BSC Management, Inc · All Rights Reserved