Automated detection and segmentation of early gastric cancer in Western endoscopic images using a fine-tuned deep learning model
Poster Abstract

Aims

Endoscopic detection of gastric cancer is operator-dependent, and almost all existing artificial intelligence (AI) systems have been developed in high-incidence East Asian settings. We aimed to develop and preliminarily evaluate an AI model for detection and segmentation of gastric cancer in a Western population, where the disease has a low incidence, using endoscopic submucosal dissection (ESD) histopathology as the reference standard. The model was fine-tuned from a previously established semi-supervised network for Barrett’s neoplasia (Meinikheim et al., Endoscopy 2024). We considered transfer learning appropriate because esophageal and gastric cancers share key endoscopic characteristics and multimodal imaging features across WLI, NBI, and TXI, so the feature representations learned from large-scale Barrett’s datasets provide a strong initialization. Primary outcomes were the Dice similarity coefficient for tumor segmentation and image-level detection sensitivity.

Methods

In this retrospective single-centre study at a tertiary-care hospital in Germany (University Hospital Augsburg), we included 84 patients with histologically confirmed gastric adenocarcinoma (T1a or higher) treated by ESD. From these patients, 827 endoscopic images (Olympus systems; WLI, NBI, TXI, with or without indigo carmine chromoendoscopy) showing visible gastric cancer were extracted. Experts delineated tumor extent on the still images, guided by the corresponding ESD specimens and full pathology reports. All data were de-identified before analysis.

A convolutional neural network segmentation model was initialized with weights from the Barrett AI system, which had been pre-trained on 55,273 endoscopic images from 557 patients with Barrett’s esophagus and related neoplasia, and was then fine-tuned in a supervised manner for gastric cancer segmentation. Data were split patient-wise into five train/validation folds (80/20 per fold) so that no patient contributed images to both the training and validation sets of a fold, and a separate model was trained for each fold. Segmentation performance was assessed using the Dice coefficient between the predicted and ground-truth masks. For image-level detection, a tumor was considered detected in a given image if the Dice overlap between the entire predicted lesion region and the ground-truth mask was at least 75%; detection performance was summarized as sensitivity at this threshold and as the area under a recall–overlap curve obtained by varying the Dice threshold from 0 to 1 (0% to 100%).
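The evaluation steps described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the round-robin fold assignment, the 101-point threshold grid, the trapezoidal integration, and the handling of two empty masks are all assumptions made for illustration, and the toy masks and patient identifiers are invented.

```python
import numpy as np


def dice(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def sensitivity_at(dice_scores, threshold=0.75):
    """Fraction of images whose Dice score reaches the threshold."""
    scores = np.asarray(dice_scores, dtype=float)
    return float(np.mean(scores >= threshold))


def recall_overlap_auc(dice_scores, n_steps=101):
    """Area under sensitivity plotted against the Dice threshold on [0, 1]."""
    thresholds = np.linspace(0.0, 1.0, n_steps)
    recall = np.array([sensitivity_at(dice_scores, t) for t in thresholds])
    # Trapezoidal rule, written out explicitly.
    widths = np.diff(thresholds)
    return float(np.sum((recall[1:] + recall[:-1]) / 2.0 * widths))


def patient_wise_folds(patient_ids, n_folds=5, seed=0):
    """Give every image a validation-fold index so that all images of one
    patient land in the same fold (hypothetical round-robin assignment)."""
    ids = list(patient_ids)
    unique = sorted(set(ids))
    order = np.random.default_rng(seed).permutation(len(unique))
    fold_of = {unique[j]: i % n_folds for i, j in enumerate(order)}
    return [fold_of[pid] for pid in ids]


# Toy example with invented data: one image, prediction vs. ground truth.
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 1, 0], [0, 0, 0]]
example_dice = dice(pred_mask, true_mask)  # 2 * 2 / (3 + 2) = 0.8

# Invented per-image Dice scores for one validation fold.
fold_scores = [0.9, 0.8, 0.5, 0.7]
example_sensitivity = sensitivity_at(fold_scores, threshold=0.75)  # 2 of 4 images
example_auc = recall_overlap_auc(fold_scores)

# Invented patient identifiers, one per image.
image_patients = ["a", "a", "b", "c", "c", "d", "e", "f"]
image_folds = patient_wise_folds(image_patients)
```

Under this definition the area under the recall–overlap curve is close to the mean per-image Dice score, since integrating the fraction of scores above a threshold over all thresholds recovers their mean up to discretization error.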

Results

Across the five patient-wise folds, tumor Dice averaged 82.93% with a standard deviation of 1.39% [80.91–84.37%]. Image-level detection sensitivity at Dice ≥75% averaged 94.86% with a standard deviation of 3.12% [90.23–97.59%]. The area under the recall–overlap curve averaged 0.912 with a standard deviation of 0.009. The low variation between folds indicates consistent performance across the five validation sets.

Conclusions

This preliminary single-centre study shows that an AI model fine-tuned from a semi-supervised Barrett’s esophagus system can accurately segment and reliably detect gastric cancer in Western multimodal endoscopic data using ESD-based histopathology as ground truth. The model provided robust pixel-level delineation and high image-level detection sensitivity across a variety of imaging modes, indicating that pathology-anchored AI for gastric cancer is feasible even in Western cohorts, where the incidence of this disease is low. However, because the dataset included only images with visible cancer, the reported performance does not reflect specificity, and results should not be extrapolated to screening or mixed-pathology settings. Future work should include external validation in multicentre Western cohorts, assessment on images containing no lesions or other pathologies, and evaluation on full-length endoscopic videos to better approximate clinical performance.

To our knowledge, this is among the first endoscopic AI systems for gastric cancer developed in a Western population and one of the few to leverage ESD specimen–based ground truth for training and validation.