Aims
Capsule endoscopy (CE) is a minimally invasive examination that has emerged as a central modality for evaluating small bowel inflammatory disorders, particularly Crohn’s disease. Precise identification of ulcers and erosions is fundamental for assessing disease activity, informing therapeutic decisions, and monitoring treatment response. Although CE provides unparalleled visualization of mucosal pathology, manual interpretation remains time-consuming and susceptible to notable interobserver variability. Advances in artificial intelligence (AI), especially convolutional neural networks, now present a timely opportunity to enhance diagnostic accuracy, reduce variability, and improve workflow efficiency. Continued development and rigorous clinical validation are warranted to realize the full potential of these tools in routine practice. We aimed to develop and validate a robust AI model capable of detecting and distinguishing small bowel ulcers and erosions across multiple CE systems and international clinical centers.
Methods
We performed a prospective, multicenter study between January 2021 and September 2025 across six centers in Portugal, Spain, Brazil, Uruguay, Australia, and the United States. A total of 423 complete CE examinations were analyzed. For each examination, a deep learning model generated an AI report, which an expert then reviewed to produce the AI-assisted reading. The AI-assisted reading was compared with the standard-of-care (SoC) interpretation, using an expert consensus panel as the reference standard. Diagnostic performance was evaluated using sensitivity, specificity, overall accuracy, and area under the receiver operating characteristic curve (AUC-ROC). Reading time was recorded for the AI-assisted readings.
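As an illustration only, the performance measures named above can be computed from per-patient confusion counts as in the following minimal Python sketch. The class, the function names, and the example counts are hypothetical and are not taken from the study. The single-point AUC formula assumes that each reading yields one binary call per patient, which the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Per-patient counts against the reference standard (hypothetical container)."""
    tp: int  # lesion present, reading positive
    fp: int  # lesion absent, reading positive
    tn: int  # lesion absent, reading negative
    fn: int  # lesion present, reading negative


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of reference-positive patients flagged by the reading.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of reference-negative patients correctly cleared.
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all patients classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def single_point_auc(c: ConfusionCounts) -> float:
    # For a single binary operating point, the trapezoidal ROC area
    # reduces to the mean of sensitivity and specificity.
    return (sensitivity(c) + specificity(c)) / 2


# Made-up counts for demonstration; not study data.
example = ConfusionCounts(tp=40, fp=10, tn=90, fn=10)
print(f"sensitivity={sensitivity(example):.3f}")
print(f"specificity={specificity(example):.3f}")
print(f"accuracy={accuracy(example):.3f}")
print(f"single-point AUC={single_point_auc(example):.3f}")
```

Accuracy weights the two classes by their prevalence, whereas the single-point AUC weights them equally, so a reading can gain sensitivity and AUC while its overall accuracy stays roughly unchanged.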
Results
Expert consensus identified ulcers or erosions in 127 patients (30.0%). Compared with the reference standard, SoC achieved 68.5% sensitivity, 90.8% specificity, and 83.7% accuracy. AI-assisted reading demonstrated higher sensitivity (88.2%) with lower specificity (79.7%), resulting in comparable overall accuracy (82.4%). The AI model achieved a higher AUC-ROC than SoC (83.9% vs. 79.6%). AI-assisted reading identified lesions in 112 patients, compared with 87 patients for SoC. These results were consistent across devices and centers. Mean AI-assisted reading time was 318 seconds per exam.
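The patient counts and percentages reported above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative and assumes that the patients in whom each reading identified lesions all belong to the 127 reference-positive patients (that is, they are true positives); the abstract does not state this explicitly. The variable and function names are hypothetical.

```python
# Counts as reported in the Results.
TOTAL_EXAMS = 423
REFERENCE_POSITIVE = 127
DETECTED = {"AI-assisted": 112, "SoC": 87}  # assumed to be true positives


def percent(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


prevalence = percent(REFERENCE_POSITIVE, TOTAL_EXAMS)
sensitivities = {
    name: percent(count, REFERENCE_POSITIVE) for name, count in DETECTED.items()
}

print(f"prevalence: {prevalence}%")  # 30.0%
for name, value in sensitivities.items():
    print(f"{name} sensitivity: {value}%")  # 88.2% and 68.5%
```

Under that assumption, the counts reproduce the reported prevalence and both sensitivities.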
Conclusions
The AI-based software improved the detection of small bowel ulcers and erosions compared with SoC reading, showing higher sensitivity and comparable overall accuracy at the cost of lower specificity, with a mean reading time of about five minutes per exam. This validation indicates that artificial intelligence can deliver consistent diagnostic performance across different devices and clinical settings while keeping case review short. Short reading times have direct implications for clinical scalability, enabling higher throughput, reducing workload, and expanding access to capsule endoscopy in routine practice. These findings support further integration and evaluation of artificial intelligence to optimize endoscopic assessment and strengthen the management of inflammatory bowel disease.