Aims
To establish objective metrics for retrospective CADe evaluation and use them to directly compare Fujifilm CAD EYE with Maia Labs ColoMAIA-2 on identical colonoscopy recordings.
Methods
This retrospective single-center study analyzed video data from 150 colorectal cancer screening colonoscopies (1945 minutes) performed by three expert endoscopists (>10 years’ experience). A total of 189 clinically significant polyps were identified in the recordings. CAD EYE operated in real time during procedures, while ColoMAIA-2 was applied offline to the same recordings. Outcome measures were: 1) detection delay from first polyp appearance; 2) delay-specific sensitivities at 100–5000 ms; 3) false-alarm frequency, expressed as false detections per minute; 4) percentage of video time with false alarms. False alarms were assessed only during active mucosal inspection (excluding polyp examinations and polypectomies). Bootstrap resampling was used to compute confidence intervals and p-values.
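The outcome measures listed above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the toy input values, the percentile bootstrap, the choice of the polyp as resampling unit, and the 95% level (`alpha=0.05`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def sensitivity_at_delay(delays_ms, threshold_ms):
    """Fraction of polyps flagged within threshold_ms of first appearance.

    delays_ms holds one entry per polyp; None marks a polyp never flagged.
    """
    hits = sum(1 for d in delays_ms if d is not None and d <= threshold_ms)
    return hits / len(delays_ms)


def false_alarms_per_minute(n_false_alarms, inspection_seconds):
    """False detections per minute of active mucosal inspection."""
    return n_false_alarms / (inspection_seconds / 60.0)


def false_alarm_time_percent(false_alarm_seconds, inspection_seconds):
    """Share of inspection time during which a false alarm was displayed."""
    return 100.0 * false_alarm_seconds / inspection_seconds


def bootstrap_ci(values, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed variant; polyps are resampled)."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy data, invented for illustration only (not taken from the study).
toy_delays_ms = [80, 150, 450, 900, None, 300]
sens_500 = sensitivity_at_delay(toy_delays_ms, 500)  # 4 of 6 polyps
ci_500 = bootstrap_ci(toy_delays_ms, lambda s: sensitivity_at_delay(s, 500))
rate = false_alarms_per_minute(30, 600)       # 30 alarms in 10 minutes
time_share = false_alarm_time_percent(9, 600)  # 9 s of alarm in 10 minutes
```

Treating undetected polyps as `None` keeps them in the denominator at every delay threshold, so sensitivity can only rise as the allowed delay grows.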
Results
Across all short delays, ColoMAIA-2 achieved substantially higher sensitivity than CAD EYE. At 100 ms, sensitivity was 47.6% vs. 16.9%; at 200 ms, 67.2% vs. 33.9%; at 300 ms, 78.8% vs. 45.5%. At the clinically meaningful 500-ms delay, ColoMAIA-2 reached a sensitivity of 86.8% (CI 81.5–91.5) compared with 62.4% (CI 55.6–69.3) for CAD EYE (p<0.001). Mean detection delay was significantly shorter with ColoMAIA-2 (382 ms, CI 260–522) than with CAD EYE (814 ms, CI 650–985; p<0.001). False-alarm frequency also differed markedly: ColoMAIA-2 generated 3.09 false alarms/min, with false alarms present during 1.44% of video time, whereas CAD EYE produced 37.4 false alarms/min, with false alarms present during 4.67% of video time (both p<0.001). This represents roughly a twelve-fold reduction in the number of individual false-alarm events. Qualitatively, CAD EYE produced a large number of false alarms in the presence of certain stool residue and blood.
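The relative effect sizes follow directly from the reported figures. The short sketch below reproduces that arithmetic; the variable names are illustrative, and only values quoted in the Results are used.

```python
# Reported values (CAD EYE vs. ColoMAIA-2), copied from the Results.
cad_eye = {"fa_per_min": 37.4, "fa_time_pct": 4.67, "sens_500": 62.4, "delay_ms": 814}
colomaia = {"fa_per_min": 3.09, "fa_time_pct": 1.44, "sens_500": 86.8, "delay_ms": 382}

# Ratio of false-alarm events per minute: the "roughly twelve-fold" figure.
event_ratio = cad_eye["fa_per_min"] / colomaia["fa_per_min"]

# Ratio of video time occupied by false alarms.
time_ratio = cad_eye["fa_time_pct"] / colomaia["fa_time_pct"]

# Absolute differences at the 500-ms delay and in mean detection delay.
sens_gap_points = colomaia["sens_500"] - cad_eye["sens_500"]
delay_gap_ms = cad_eye["delay_ms"] - colomaia["delay_ms"]

print(f"false-alarm events: {event_ratio:.1f}-fold fewer")
print(f"false-alarm time:   {time_ratio:.1f}-fold less")
print(f"sensitivity at 500 ms: +{sens_gap_points:.1f} percentage points")
print(f"mean detection delay:  {delay_gap_ms} ms shorter")
```

The event ratio is much larger than the time ratio, which suggests that the two metrics capture different aspects of false-alarm burden and are worth reporting side by side.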
Conclusions
On identical colonoscopy recordings, ColoMAIA-2 delivered markedly faster polyp alerting and far fewer false alarms than CAD EYE. These differences concern parameters known to influence adenoma detection and procedural flow. The proposed metrics (delay-specific sensitivities, false-alarm frequency, and false-alarm time) provide a robust framework for clinically meaningful retrospective benchmarking of CADe systems.