Ultrasound-based deep learning in the establishment of a breast lesion risk stratification system: a multicenter study

Authors: Gu, Yang; Xu, Wen; Liu, Ting; An, Xing; Tian, Jiawei; Ran, Haitao; Ren, Weidong; Chang, Cai; Yuan, Jianjun; Kang, Chunsong; Deng, Youbin; Wang, Hui; Luo, Baoming; Guo, Shenglan; Zhou, Qi; Xue, Ensheng; Zhan, Weiwei; Zhou, Qing; Li, Jie; Zhou, Ping; Chen, Man; Gu, Ying; Chen, Wu; Zhang, Yuhong; Li, Jianchu; Cong, Longfei; Zhu, Lei; Wang, Hongyan*; Jiang, Yuxin*
Source: European Radiology, 2023, 33(4): 2954-2964.
DOI: 10.1007/s00330-022-09263-8

Abstract

Objectives: To establish a breast lesion risk stratification system using ultrasound images to predict breast malignancy and assess Breast Imaging Reporting and Data System (BI-RADS) categories simultaneously.

Methods: This multicenter study prospectively collected a dataset of ultrasound images for 5012 patients at thirty-two hospitals from December 2018 to December 2020. A deep learning (DL) model was developed to conduct binary categorization (benign and malignant) and BI-RADS categories (2, 3, 4a, 4b, 4c, and 5) simultaneously. The training set of 4212 patients and the internal test set of 416 patients were from thirty hospitals. The remaining two hospitals with 384 patients were used as an external test set. Three experienced radiologists performed a reader study on 324 patients randomly selected from the test sets. We compared the performance of the DL model with that of three radiologists and the consensus of the three radiologists.

Results: In the external test set, the DL model achieved areas under the receiver operating characteristic curve (AUCs) of 0.980 and 0.945 for the binary categorization and six-way categorizations, respectively. In the reader study set, the DL BI-RADS categories achieved a similar AUC (0.901 vs. 0.933, p = 0.0632), sensitivity (90.98% vs. 95.90%, p = 0.1094), and accuracy (83.33% vs. 79.01%, p = 0.0541), but higher specificity (78.71% vs. 68.81%, p = 0.0012) than those of the consensus of the three radiologists.

Conclusions: The DL model performed well in distinguishing benign from malignant breast lesions and yielded outcomes similar to experienced radiologists. This indicates the potential applicability of the DL model in clinical diagnosis.
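To make the reader-study metrics concrete, the sketch below shows how sensitivity, specificity, and accuracy follow from a 2x2 confusion table. The abstract does not report the underlying counts; the integers used here (122 malignant and 202 benign lesions among the 324 reader-study patients) are an assumption, back-calculated because they reproduce the reported percentages. The `Confusion` class and variable names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 confusion table for a benign-vs-malignant decision."""
    tp: int  # malignant lesions called malignant
    fn: int  # malignant lesions called benign
    tn: int  # benign lesions called benign
    fp: int  # benign lesions called malignant

    def sensitivity(self) -> float:
        # Fraction of malignant lesions correctly flagged.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of benign lesions correctly cleared.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Fraction of all lesions classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# ASSUMED counts: back-calculated from the reported percentages,
# not stated in the abstract.
dl_model = Confusion(tp=111, fn=11, tn=159, fp=43)
consensus = Confusion(tp=117, fn=5, tn=139, fp=63)

for name, table in [("DL model", dl_model), ("Radiologist consensus", consensus)]:
    print(
        f"{name}: sensitivity={table.sensitivity():.2%}, "
        f"specificity={table.specificity():.2%}, "
        f"accuracy={table.accuracy():.2%}"
    )
```

Under these assumed counts, the DL model trades six additional missed malignancies for twenty fewer false positives, which is the pattern behind the higher specificity reported in the abstract.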

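  • Affiliations
    Shandong University; Shanghai Jiao Tong University; Sun Yat-sen University; Fudan University; Harbin Medical University; Jilin University; Huazhong University of Science and Technology; Peking Union Medical College Hospital, Chinese Academy of Medical Sciences; Wuhan University; China Medical University; Xi'an Jiaotong University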