Aims
Contrast-enhanced endoscopic ultrasound (CE-EUS) is increasingly used for characterizing pancreatic and gastrointestinal lesions, yet its qualitative interpretation remains highly operator-dependent. Standardized frameworks for reading CE-EUS are lacking, and the reproducibility of key enhancement features across centers and operator profiles is largely unknown. This study evaluated interobserver agreement (IOA) for qualitative CE-EUS parameters among European endosonographers with different levels of experience and explored operator- and video-level factors associated with agreement.
Methods
This multicenter retrospective study included 24 endosonographers from 16 European centers who independently evaluated 60 anonymized CE-EUS video clips representing solid, cystic, and mixed lesions. Participants assessed lesion type, arterial enhancement, wash-in and wash-out timing, contrast homogeneity, and overall video quality. IOA was quantified with Fleiss’ kappa (κ). Univariate and multivariate logistic regression analyses examined associations between IOA and operator expertise, annual EUS volume, confidence with CE-EUS, familiarity with Pentax equipment, lesion type, and video quality. Qualitative interpretations were also compared with quantitative time–intensity curve (TIC) analysis, which served as the reference standard.
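Fleiss’ kappa compares the mean observed pairwise agreement among raters with the agreement expected by chance from the overall category proportions. The following is a minimal Python sketch of the standard formula. It assumes that every clip is rated by the same number of readers. The function name and input layout are illustrative and are not taken from the study. The 95% confidence intervals reported in the Results would require an additional variance-estimation step, which is not shown here.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N subjects rated by n raters into k categories.

    counts[i][j] is the number of raters who assigned subject i (e.g. one
    video clip) to category j. Every row must sum to the same n >= 2.
    """
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one subject")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    k = len(counts[0])

    # Share of all N*n assignments that fall into each category.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]

    # Observed agreement per subject: fraction of agreeing rater pairs.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N

    # Chance agreement expected from the marginal category proportions.
    P_e = sum(pj * pj for pj in p)
    if P_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (P_bar - P_e) / (1 - P_e)
```

Under this study's design, `counts` would be a 60-row matrix (one row per clip), and each row would sum to the 24 participating readers.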
Results
IOA varied markedly across CE-EUS parameters. The highest agreement, moderate, was observed for lesion type (κ = 0.47; 95% CI 0.44–0.50). Arterial enhancement (κ = 0.38; 95% CI 0.35–0.41), wash-in time (κ = 0.27; 95% CI 0.24–0.30), wash-out time (κ = 0.21; 95% CI 0.18–0.24), and contrast homogeneity (κ = 0.24; 95% CI 0.21–0.27) all achieved only fair agreement. Video quality showed the lowest reproducibility, with only slight agreement (κ = 0.14; 95% CI 0.11–0.17). In univariate analysis, higher agreement for wash-in and wash-out timing was associated with operator expertise (β = 0.149, p < 0.001, and β = 0.111, p = 0.0026, respectively) and with high-volume centers (β = 0.136, p < 0.001). In multivariate analysis, expertise remained an independent predictor of agreement for wash-in time (p = 0.040) and video quality (p = 0.011). Confidence with CE-EUS was associated with modestly higher agreement for wash-out time (p < 0.001). No operator- or video-level covariate significantly influenced agreement for arterial enhancement, contrast homogeneity, or lesion type. Compared with TIC-derived quantitative measures, the accuracy of qualitative CE-EUS interpretation ranged widely: video quality 79.6%, wash-in time 62.2%, contrast homogeneity 54.8%, wash-out time 54.6%, arterial enhancement 42.9%, and lesion type 35.6%. Accuracy did not differ significantly between experts and non-experts (all p > 0.05).
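The agreement labels used in these results are consistent with the conventional Landis and Koch bands for κ: slight up to 0.20, fair from 0.21 to 0.40, and moderate from 0.41 to 0.60. The following minimal Python sketch assumes those bands and maps the reported point estimates to their labels. The function and dictionary names are hypothetical. A band applies only to the point estimate. Several confidence intervals cross a boundary, such as wash-out time (0.18–0.24) and arterial enhancement (0.35–0.41).

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa point estimate to the Landis and Koch agreement band."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Point estimates as reported in the Results above.
REPORTED_KAPPA = {
    "lesion type": 0.47,
    "arterial enhancement": 0.38,
    "wash-in time": 0.27,
    "contrast homogeneity": 0.24,
    "wash-out time": 0.21,
    "video quality": 0.14,
}

for parameter, kappa in REPORTED_KAPPA.items():
    print(f"{parameter}: kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```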
Conclusions
Across a broad European cohort, qualitative CE-EUS interpretation demonstrated only fair or slight agreement for dynamic enhancement parameters, regardless of operator experience. These findings indicate that the visual, qualitative nature of CE-EUS inherently limits reproducibility and that expertise alone does not sufficiently mitigate variability. The lack of strong predictors of agreement underscores the need for structured interpretive frameworks and objective quantification tools, such as real-time TIC analysis or AI-assisted vascular profiling, to support more consistent CE-EUS evaluation and facilitate its integration into standardized diagnostic algorithms.