Aims
MAPS III recommends using AI, where available, to enhance exam quality, but surveillance decisions remain grounded in Kimura–Takemoto, EGGIM, OLGA/OLGIM, and histology/risk factors, not in AI outputs. Many gastric AI systems report strong performance, yet most have been validated only offline and rarely report patient-level outcomes aligned with MAPS.
Methods
We systematically searched PubMed and Embase (through October 2025) for studies of AI applied to adult EGD targeting (i) CAG detection/classification, (ii) IM detection or EGGIM scoring, and (iii) OLGA/OLGIM prediction, with histology or expert endoscopy as the reference standard. Because patient-level 2×2 data were uncommon, we performed a narrative synthesis, extracting per-patient accuracy metrics where available and mapping outputs to MAPS tasks.
Results
We screened 214 records, assessed 32 full texts, and included 20 studies. Evidence clustered into three MAPS-relevant areas:
1. CAG (per-patient): accuracy ≈91%, sensitivity ≈88%, specificity ≈94% vs histology.
2. IM/EGGIM (risk stratification): per-patient accuracy 88% (95% CI ≈80–96%), with sensitivity 100% for EGGIM ≥ 5 (no high-risk cases missed) and specificity ≈85%; per-image accuracy ≈87% on large NBI datasets.
3. Atrophy/histologic risk: AI correlated with Kimura–Takemoto and OLGA/OLGIM (AUC ~0.67–0.75), supporting proof-of-concept more than deployable staging.
Most studies were offline, and fewer than 15% reported MAPS-ready, patient-level outputs (confusion matrices or explicit CAG/EGGIM/OLGA results); real-time validation and routine use of virtual chromoendoscopy (VCE) with standardized scoring were inconsistent.
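The patient-level reporting called for above reduces to publishing the per-patient 2×2 confusion matrix, from which the metrics cited in Results follow directly. A minimal sketch in Python; the counts below are hypothetical and for illustration only, not drawn from any included study:

```python
def patient_level_metrics(tp: int, fp: int, fn: int, tn: int):
    """Per-patient diagnostic metrics from a 2x2 confusion matrix
    (e.g. AI-flagged high risk, such as EGGIM >= 5, versus the
    histologic or expert-endoscopy reference standard)."""
    sensitivity = tp / (tp + fn)               # high-risk patients correctly flagged
    specificity = tn / (tn + fp)               # low-risk patients correctly cleared
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration only -- not study data.
sens, spec, acc = patient_level_metrics(tp=20, fp=12, fn=0, tn=68)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
# sensitivity=1.00 specificity=0.85 accuracy=0.88
```

Note that fn=0 is what drives sensitivity to 100%: in MAPS terms, no high-risk patient is missed, which is the non-inferiority criterion the Conclusions emphasize.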
Conclusions
Gastric AI already reaches MAPS-relevant accuracy, but it must support, not replace, Kimura–Takemoto, EGGIM, and OLGA/OLGIM. Adoption hinges on multicentre real-time trials demonstrating (i) non-inferiority in detecting high-risk disease (EGGIM ≥ 5 / advanced OLGA/OLGIM), (ii) improved consistency of MAPS scoring, and (iii) standardized patient-level reporting aligned with MAPS III.