Evaluating ChatGPT-5 for the Classification of Colorectal Polyp Images: A Comparative Analysis with Two Expert Endoscopists

A. Ait errami

I. Zouaki

L. Fatima ezzahra

O. Nacir

K. Krati

Poster Abstract

Aims

Accurate characterization of colorectal polyps is essential for estimating neoplastic risk and guiding appropriate therapeutic management. However, endoscopic interpretation varies among clinicians, particularly in the classification of polyp morphology and surface patterns. Artificial intelligence (AI) systems may assist clinicians by improving diagnostic consistency. This study aimed to assess how closely ChatGPT-5 agrees with expert endoscopists when classifying colorectal polyp images.

Methods

This single-center, retrospective observational study evaluated the agreement between ChatGPT-5 and two expert endoscopists in interpreting colonoscopic images of colorectal polyps. A total of 45 images were analyzed. ChatGPT-5 was prompted to assign the Paris and NICE classifications and to predict neoplasia. The primary outcome was the level of agreement between ChatGPT-5 and the experts across classification systems. Secondary outcomes included diagnostic accuracy for neoplasia prediction.
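As an illustration of the agreement statistic, the following minimal sketch computes unweighted Cohen's kappa for two raters. It assumes a single reference label per image; the kappa variant and the way the two experts' readings were combined are assumptions, and the function name `cohen_kappa` and the sample labels are hypothetical, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of images given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical NICE types for six images (illustrative only, not study data).
model_labels = ["1", "2", "2", "1", "3", "2"]
expert_labels = ["1", "2", "1", "1", "3", "2"]
kappa = cohen_kappa(model_labels, expert_labels)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be much lower than the simple proportion of matching labels.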

Results

ChatGPT-5 demonstrated moderate agreement with the endoscopists for neoplasia prediction (κ = 0.42), but lower agreement for the Paris (κ = 0.31) and NICE (κ = 0.39) classifications. For neoplasia prediction, overall accuracy was 71.1 %, with a sensitivity of 76.0 %, a specificity of 66.7 %, a PPV of 72.0 %, an NPV of 71.0 %, and an AUC of 0.74. Notably, ChatGPT-5 struggled with flat lesions and subtle vascular patterns, leading to discrepancies in classification compared with the experts.
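For reference, the diagnostic metrics named above follow from a 2 × 2 table of predicted versus reference neoplasia status. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the counts are hypothetical and do not reproduce the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic metrics from true/false positive/negative counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # correct calls among all images
        "sensitivity": tp / (tp + fn),   # neoplastic lesions correctly flagged
        "specificity": tn / (tn + fp),   # non-neoplastic lesions correctly cleared
        "ppv": tp / (tp + fp),           # flagged lesions that are truly neoplastic
        "npv": tn / (tn + fn),           # cleared lesions that are truly non-neoplastic
    }


# Hypothetical counts for 20 images (illustrative only, not study data).
metrics = diagnostic_metrics(tp=9, fp=3, fn=1, tn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```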

Conclusions

ChatGPT-5 demonstrates moderate accuracy in the visual interpretation of colorectal polyp images but remains inferior to expert assessment, particularly for detailed morphological classifications. While its performance indicates potential as an adjunctive support tool, significant limitations persist, and further refinement is required before integration into routine colonoscopic decision-making.
