Predictive performance of radiomic models based on features extracted from pretrained deep networks

Objectives: In radiomics, generic texture and morphological features are often used for modeling. Recently, features extracted from pretrained deep networks have been used as an alternative. However, extracting deep features involves several decisions, and it is unclear how these affect the resulting models. Therefore, in this study, we investigated how such choices influence predictive performance.

Methods: On ten publicly available radiomic datasets, models were trained using feature sets that differed in the network architecture, the layer from which features were extracted, the set of slices used, whether the segmentation was used, and the aggregation method. The influence of these choices on predictive performance was measured using a linear mixed model. In addition, models based on generic features were trained and compared with the deep-feature models in terms of predictive performance and the correlation of their feature sets.
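
The five choices listed above span a grid of candidate feature sets, and the aggregation step pools slice-level network outputs into one vector per patient. The sketch below is illustrative only: the option levels in CHOICES are hypothetical placeholders (the abstract does not name the architectures, layers, or slice subsets), mean and max pooling are assumed aggregation methods, and the helpers configurations and aggregate are not taken from the study.

```python
from itertools import product
from statistics import mean

# Hypothetical option levels: the abstract names the five choices
# but not the concrete architectures, layers, or slice subsets used.
CHOICES = {
    "architecture": ["net_A", "net_B"],
    "layer": ["intermediate", "final"],
    "slices": ["all", "subset"],
    "segmentation": [True, False],
    "aggregation": ["mean", "max"],
}


def configurations(choices):
    """Yield every combination of the design choices as one dict per feature set."""
    keys = list(choices)
    for values in product(*(choices[k] for k in keys)):
        yield dict(zip(keys, values))


def aggregate(slice_features, method):
    """Pool per-slice feature vectors (all of equal length) into one patient-level vector."""
    if not slice_features:
        raise ValueError("need at least one slice")
    columns = zip(*slice_features)  # group the values of each feature across slices
    if method == "mean":  # assumed aggregation: average over slices
        return [mean(col) for col in columns]
    if method == "max":  # assumed aggregation: maximum over slices
        return [max(col) for col in columns]
    raise ValueError(f"unknown aggregation method: {method!r}")
```

With two levels per choice, list(configurations(CHOICES)) yields 2**5 = 32 candidate feature sets, and aggregate([[1.0, 4.0], [3.0, 2.0]], "mean") returns [2.0, 3.0].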

Results: No single choice consistently led to the best-performing models. In the mixed model, the choice of architecture (AUC + 0.016; p < 0.001), the layer of feature extraction (AUC + 0.016; p < 0.001), and using all slices (AUC + 0.023; p < 0.001) were highly significant; using the segmentation had a smaller effect (AUC + 0.011; p = 0.023), while the aggregation method was not significant (p = 0.774). Models based on deep features were not significantly better than those based on generic features (p > 0.05 on all datasets). Deep feature sets correlated only moderately with each other (r = 0.4), whereas generic feature sets correlated strongly (r = 0.89).
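
The abstract reports one correlation coefficient per family of feature sets but does not state how the correlation between two feature sets was summarized. One plausible summary, assumed here purely for illustration, is a symmetric best-match score: for each feature in one set, take its largest absolute Pearson correlation with any feature of the other set, average these values, and then average both directions. Feature sets are represented as lists of features, each a list of values over the same patients; pearson and feature_set_correlation are hypothetical helpers.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant feature; treated as uncorrelated here
    return sxy / (sxx * syy) ** 0.5


def feature_set_correlation(set_a, set_b):
    """Symmetric best-match correlation between two feature sets (an assumed summary)."""
    def one_way(src, dst):
        # for each feature in src, keep its strongest absolute correlation in dst
        return mean(max(abs(pearson(f, g)) for g in dst) for f in src)

    return (one_way(set_a, set_b) + one_way(set_b, set_a)) / 2
```

A feature set that is a rescaled or sign-flipped copy of another scores close to 1, while feature sets sharing no linear relationship score close to 0; other summaries (for example, canonical correlation) would give different numbers.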

Conclusions: The choices made when extracting deep features have a significant effect on the predictive performance of the resulting models. Because no single choice was consistently best, these choices should be optimized during cross-validation to obtain the highest performance.
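
Optimizing the extraction choices during cross-validation amounts to treating them as hyperparameters. The sketch below shows one standard way to do that, nested cross-validation; it is not necessarily the authors' protocol. The callable evaluate(config, train_idx, test_idx) is hypothetical: it would build the feature set described by config, train a model on train_idx, and return a score such as the AUC on test_idx. Folds are contiguous and unshuffled for brevity.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_score(config, indices, k, evaluate):
    """Mean score of one configuration under k-fold CV restricted to `indices`."""
    scores = []
    for val_pos in k_fold_indices(len(indices), k):
        val = [indices[p] for p in val_pos]
        val_set = set(val)
        fit = [i for i in indices if i not in val_set]
        scores.append(evaluate(config, fit, val))
    return mean(scores)


def nested_cv(n_samples, configs, evaluate, outer_k=5, inner_k=5):
    """Choose a configuration on each outer training split only, then score it on the held-out fold."""
    outer_scores, chosen = [], []
    all_idx = list(range(n_samples))
    for test_idx in k_fold_indices(n_samples, outer_k):
        test_set = set(test_idx)
        train_idx = [i for i in all_idx if i not in test_set]
        # inner CV sees only the outer training split, so the test fold never influences selection
        best = max(configs, key=lambda c: cv_score(c, train_idx, inner_k, evaluate))
        chosen.append(best)
        outer_scores.append(evaluate(best, train_idx, test_idx))
    return outer_scores, chosen
```

Because each configuration is selected without access to the outer test fold, the outer scores estimate the performance of the whole selection procedure rather than of a configuration picked with knowledge of the test data.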


Use and reproduction:
This work may be used under a Creative Commons Attribution 4.0 License (CC BY 4.0).