Large language model-driven analysis and report generation of endoscopy videos - a pilot study
Poster Abstract

Aims

Multimodal large language models (MLLMs) can automatically analyze clinical video, but evidence from full esophagogastroduodenoscopy (EGD) examinations and the impact of on-screen computer-aided detection/diagnosis (CAD) overlays on MLLM behavior remain unclear. We tested whether an MLLM can produce clinically adequate EGD reports and whether a CAD overlay changes its performance.

Methods

We analyzed five complete EGD videos with Gemini 2.5 Pro in paired versions: 1) the clean video and 2) the same video with a CAD overlay. Five blinded endoscopists rated report adequacy in three domains (completeness, visualization, and lesion characteristics). MLLM accuracy for landmarks/lesions was further assessed by two blinded expert endoscopists using a time-window rule (a model detection counted as correct if it occurred within ±2 seconds of the expert-annotated timestamp).
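As an illustration of how such a rule can be applied, the sketch below counts model detections that fall within ±2 seconds of an expert-annotated timestamp. The function and variable names (match_detections, TOLERANCE_S) are illustrative assumptions rather than part of the study protocol, and matching each annotation at most once is one plausible reading of the rule.

```python
from typing import List

# Tolerance of the time-window rule described above (seconds); illustrative constant.
TOLERANCE_S = 2.0

def match_detections(model_times: List[float], expert_times: List[float],
                     tolerance: float = TOLERANCE_S) -> int:
    """Count model detections falling within +/- tolerance seconds of an
    expert-annotated timestamp; each annotation is matched at most once."""
    unmatched = sorted(expert_times)
    correct = 0
    for t in sorted(model_times):
        for i, e in enumerate(unmatched):
            if abs(t - e) <= tolerance:
                correct += 1
                del unmatched[i]  # prevent double-counting a single annotation
                break
    return correct

# Example: the first two detections fall within the +/- 2 s window, the third does not.
print(match_detections([10.5, 42.0, 90.0], [11.9, 43.0, 120.0]))  # -> 2
```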

Results

In this retrospective pilot study, five archived diagnostic EGD procedures from five patients were available as full-length videos. Across five raters, completeness was judged adequate in 56.0% of ratings (14/25) with the clean video versus 48.0% (12/25) with the overlay video (p=0.500). Visualization ratings were identical (36.0% [9/25] for both; p=1.000), as were lesion characteristics ratings (16.0% [4/25] for both; p=1.000). For landmark agreement, MLLM performance with the clean video versus the overlay video was: accuracy 0.55 [95% CI 0.43–0.67] vs 0.33 [0.23–0.46], p=0.029; sensitivity 0.53 [0.40–0.66] vs 0.35 [0.24–0.49], p=0.122; specificity 0.67 [0.35–0.88] vs 0.22 [0.06–0.55], p=0.125.
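For reference, the reported accuracy, sensitivity, and specificity follow the standard definitions over true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) derived from the time-window matching; the underlying per-landmark counts are not given in this abstract.

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}
```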

Conclusions

In its current form, Gemini 2.5 Pro cannot report upper endoscopy findings at a level adequate for clinical use, and substantial task-specific optimization and validation are required before deployment.