Aims
Artificial intelligence (AI)–assisted polyp characterization during colonoscopy may support real-time decisions on resection strategy and surveillance intervals. Several systems are now available, but head-to-head comparative data remain limited.
We aimed to compare the diagnostic performance of two computer-aided diagnosis (CADx) systems, MagentiQ and PolypBrain, in differentiating neoplastic from non-neoplastic colorectal polyps during colonoscopy, with histology as the reference standard.
Methods
We performed a retrospective single-centre analysis of 62 colorectal polyps with available histology and AI-based predictions from both systems. Histopathology was classified as neoplastic (adenomas, sessile serrated adenomas/lesions, adenocarcinoma, high-grade neoplasia) or non-neoplastic (hyperplastic and juvenile polyps, inflammatory/IBD-related changes, normal mucosa). For MagentiQ, outputs labelled “Neo” / “Neo–Neo” were considered neoplastic; “Non-Neo”, “Inconclusive” or “not found” were considered non-neoplastic. For PolypBrain, any output containing “adenoma” was classified as neoplastic; all other outputs (including “too small / too bloody / not classified”) were considered non-neoplastic. We calculated sensitivity, specificity, positive and negative predictive values (PPV, NPV) and overall accuracy, and compared paired classifier performance using McNemar’s test.
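The classification rules and performance measures described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study. The function names are hypothetical, the exact label strings in the exported data are assumed to match those quoted in the text, and case-insensitive matching of "adenoma" is an assumption. MagentiQ labels are matched exactly rather than by substring, because "Non-Neo" contains "Neo".

```python
# Hypothetical helpers; label strings are assumed to match those quoted in the text.
MAGENTIQ_NEOPLASTIC = {"Neo", "Neo–Neo"}


def magentiq_is_neoplastic(output: str) -> bool:
    # Exact match: "Non-Neo", "Inconclusive" and "not found" all map to False.
    return output.strip() in MAGENTIQ_NEOPLASTIC


def polypbrain_is_neoplastic(output: str) -> bool:
    # Any output containing "adenoma" counts as neoplastic
    # (case-insensitive matching is an assumption).
    return "adenoma" in output.lower()


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a 2x2 margin is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(predicted, truth):
    """Compute 2x2 metrics; both inputs are sequences of booleans
    (True = neoplastic), with truth taken from histology."""
    pairs = list(zip(predicted, truth))
    tp = sum(p and t for p, t in pairs)
    fp = sum(p and not t for p, t in pairs)
    fn = sum(not p and t for p, t in pairs)
    tn = sum(not p and not t for p, t in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }
```

Counting inconclusive or unclassified outputs as non-neoplastic, as the rules above do, means that every lesion contributes to the 2x2 table, so non-diagnostic outputs on neoplastic lesions lower sensitivity rather than being excluded.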
Results
Histology identified 46/62 (74.2%) lesions as neoplastic. For MagentiQ, sensitivity was 67.4% (31/46), specificity 68.8% (11/16), PPV 86.1% (31/36), NPV 42.3% (11/26), and accuracy 67.7% (42/62). For PolypBrain, sensitivity was 73.9% (34/46), specificity 62.5% (10/16), PPV 85.0% (34/40), NPV 45.5% (10/22), and accuracy 71.0% (44/62). PolypBrain correctly classified 14 lesions that MagentiQ misclassified, whereas MagentiQ correctly classified 12 lesions misclassified by PolypBrain (McNemar p = 0.85).
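The paired comparison depends only on the 26 discordant lesions (14 versus 12). The abstract does not state which variant of McNemar's test was applied; the sketch below assumes the exact (binomial) form, and the function name is hypothetical. Under that assumption the two-sided p-value is about 0.85, consistent with the reported value.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis, each discordant pair is equally likely
    to favour either system, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * lower_tail)


# Discordant counts reported in the Results:
# 14 lesions correct only with PolypBrain, 12 correct only with MagentiQ.
p_value = mcnemar_exact(14, 12)
print(f"McNemar exact p = {p_value:.2f}")
```

Lesions that both systems classified correctly, or both misclassified, do not enter the test, which is why the result is determined by the 14 and 12 alone.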
Conclusions
Both AI systems achieved high PPV (85.0% to 86.1%) but only moderate sensitivity and low NPV (below 50%) for identifying neoplastic colorectal polyps. PolypBrain showed slightly higher sensitivity and overall accuracy at the expense of somewhat lower specificity, but the difference between the two systems was not statistically significant. These findings support the potential of AI-assisted polyp characterization, but at the current level of performance expert assessment remains necessary.