Aims
Capsule endoscopy (CE) is an accepted modality for investigating the small bowel (SB). Reading and reporting of CE, however, can be time consuming, and previous studies have shown that even in the hands of experts, reader attention drops after just one study. Artificial intelligence has expanded rapidly in medicine, including within CE. TOP100 is a feature of the CE reading software that identifies the 100 most significant images from the entire video and may help to reduce physician reporting time. Previous studies (1-3) have shown good diagnostic performance, especially for active bleeding and inflammatory lesions, and good agreement between standard human reading (SR) and TOP100. The aim of this study was to evaluate agreement in findings and patient management between the same reader and between different readers using SR and TOP100.
Methods
A retrospective analysis was conducted to compare SR of the entire video with the TOP100 images identified by the software. The two expert physicians and the senior trainee had each read >500 CE videos. The two experts who reported the original videos were asked to review the TOP100 images, presented as a short video, in a blinded fashion. The senior trainee was asked to read the TOP100, and this read was subsequently compared with the expert standard read. Comparisons were made on significant findings, main diagnosis, and subsequent suggested management. Secondary outcomes were reading time for SR versus TOP100 and inter-observer variability between CE readers for SR versus TOP100 reads. A comparison was also made between the trainee's TOP100 read and the expert physician read.
Results
A total of 50 CE videos were reviewed by each expert with TOP100. Findings were recorded and patient management was suggested based on the TOP100 findings and the CE indication. Each expert reviewed 25 videos for suspected SB bleeding and 25 videos for known or suspected inflammatory bowel disease (IBD). Both expert readers had substantial intra-observer agreement (K 0.61-0.80) between SR and TOP100 for ulcers and active bleeding. For angiodysplasias, one reader had only fair intra-observer agreement (K 0.37, p <0.01) while the other had substantial agreement (K 0.68, p <0.01). Agreement on patient management based on TOP100 readings was similar for the two experts (75% and 78%). The inter-observer agreement between the senior trainee and the expert readers when using TOP100 was almost perfect for ulcers (K 0.90, p <0.01) and substantial for angiodysplasias (K 0.61, p <0.01) and overall diagnostic yield (K 0.69, p <0.01). Agreement was moderate for active bleeding (K 0.54, p <0.01) and erosions (K 0.58, p <0.01). The mean reading time per CE video was 1 min 48 s with TOP100 versus 40 min for SR.
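For reference, the agreement bands quoted above correspond to the commonly used Landis and Koch interpretation of Cohen's kappa (0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect). Below is a minimal sketch of how such a kappa could be computed for a per-video binary finding; the statistical software actually used in this study is not reported, and the reader labels in the example are purely illustrative, not study data.

```python
# Minimal sketch: Cohen's kappa for two readers' per-video calls on a binary
# finding (e.g., "ulcer present"). Illustrative only; not the study's data.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same set of cases."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: proportion of cases where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: SR vs TOP100 calls on 10 videos (1 = finding present).
sr     = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
top100 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
# Landis-Koch bands: 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial.
print(f"kappa = {cohens_kappa(sr, top100):.2f}")  # 0.60 -> moderate agreement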
Conclusions
TOP100 showed moderate intra-observer agreement with standard reading for the final CE diagnosis and the subsequent change in management for the two expert readers. Comparison between the senior trainee's TOP100 read and the expert read showed moderate to substantial agreement, suggesting that TOP100 could be an adjunct to trainees in the reporting of CE. Further studies with a larger pool of trainees of differing experience would help substantiate our findings.