Aims
Small bowel capsule endoscopy preparation quality is typically graded qualitatively, making assessment largely subjective and prone to inter-observer variability. Quantitative scoring systems, such as the Brotz scale, exist but are cumbersome to implement. Deep learning models are being developed to grade quality from capsule video frames; however, these tools are likely to be restricted to commercial software and may not be readily accessible. We aimed to develop a novel, open-access, reliable clinical tool to objectively grade capsule endoscopy cleanliness by analysing the colour progress bar displayed by the capsule reader software.
Methods
We developed an AI algorithm, deployed via ChatGPT, that processes screenshots of the colour progress bar automatically generated by capsule reader software. This bar represents the dominant colour of each video frame throughout small bowel transit, and we hypothesised that analysis of these colours could provide an objective surrogate for preparation quality. To train the algorithm, a human reader labelled multiple individual frames from real studies as having either good or poor mucosal visibility. For each labelled frame, the dominant mucosal colour was sampled, creating a library of colours associated with good or poor views. Using this library, the algorithm defined rules describing which colours typically represent good and which represent poor visibility. When a screenshot of a colour bar is uploaded, the algorithm applies these rules to classify each small portion of the bar as good or poor. While several metrics can be generated from this, we evaluated the performance of the overall fraction of poor-classified portions (the PoorFrac score, range 0–1). We selected a cohort of small bowel capsule studies with both human adequacy grading and Brotz score adequacy grading (inadequate defined as a Brotz score < 10). Each colour bar was processed by the algorithm and a PoorFrac score was generated for each study. The relationship between the PoorFrac score and both the human grade and the Brotz score was assessed using Pearson's r, point-biserial correlation, and ROC analysis.
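The classification step described above can be illustrated with a minimal sketch. This is not the study's actual algorithm; the reference colours, the nearest-colour decision rule, and all function names below are illustrative assumptions, shown only to make the PoorFrac idea concrete.

```python
# Hypothetical sketch: classify each colour-bar segment as good/poor by
# nearest labelled reference colour, then compute the fraction classified poor.
# The RGB values below are invented placeholders, not the study's colour library.

GOOD = [(200, 120, 90), (220, 150, 110)]   # assumed "good visibility" colours
POOR = [(60, 140, 60), (90, 90, 40)]       # assumed "poor visibility" colours

def dist2(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(rgb):
    """Label a segment 'poor' if its colour is nearer the poor library than the good one."""
    nearest_poor = min(dist2(rgb, p) for p in POOR)
    nearest_good = min(dist2(rgb, g) for g in GOOD)
    return "poor" if nearest_poor < nearest_good else "good"

def poor_frac(segments):
    """PoorFrac score (0-1): fraction of colour-bar segments classified poor."""
    labels = [classify(s) for s in segments]
    return labels.count("poor") / len(labels)
```

For example, `poor_frac([(205, 125, 95), (62, 138, 58), (218, 148, 112), (88, 92, 44)])` returns 0.5, since two of the four segments fall nearer the poor-visibility library.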
Results
A total of 54 studies were analysed, with human grading of adequate in 32 (59%) and inadequate in 22 (41%). The median Brotz score for this cohort was 12 (range 2–18). There was a strong correlation between human adequacy grading and Brotz score (r = 0.68, 95% CI 0.45–0.83; p < 0.001). The AI-automated PoorFrac score was strongly negatively correlated with the Brotz score (r = −0.69; 95% CI −0.83 to −0.46; p < 0.001) and strongly positively correlated with the human adequacy grade (r = 0.59; 95% CI 0.38 to 0.74; p < 0.001). On ROC analysis (AUC 0.836), a PoorFrac cutoff of 0.28 identified inadequate studies (Brotz score < 10) with 70% sensitivity and 88% specificity.
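The statistical workflow reported here can be sketched as follows. The data below are synthetic stand-ins, not the study's results; the sketch assumes the common SciPy/scikit-learn tooling (`pointbiserialr`, `roc_auc_score`, `roc_curve`) rather than whatever software the authors actually used.

```python
# Hedged sketch of the analysis: point-biserial correlation between the
# PoorFrac score and a binary adequacy label, plus ROC analysis.
# All values below are illustrative, not the study's data.
import numpy as np
from scipy.stats import pointbiserialr
from sklearn.metrics import roc_auc_score, roc_curve

poorfrac   = np.array([0.05, 0.10, 0.22, 0.30, 0.45, 0.60])  # synthetic scores
inadequate = np.array([0, 0, 0, 1, 1, 1])                     # 1 = Brotz score < 10

# Point-biserial r: correlation of a continuous score with a binary label
r, p = pointbiserialr(inadequate, poorfrac)

# ROC analysis: PoorFrac should rise with inadequacy, so it is used directly
auc = roc_auc_score(inadequate, poorfrac)
fpr, tpr, thresholds = roc_curve(inadequate, poorfrac)
```

Sensitivity and specificity at a chosen cutoff (such as the 0.28 reported above) correspond to `tpr` and `1 - fpr` at the nearest entry of `thresholds`.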
Conclusions
Our AI algorithm offers a simple method to objectively assess capsule endoscopy preparation quality from the colour progress bar, and its score correlated well with the established Brotz scale. Pending further refinement and validation, this approach could provide a useful and widely accessible clinical tool.