In a recent prospective study published in The Lancet Digital Health, researchers evaluated whether artificial intelligence (AI) can serve as a viable alternative to one radiologist in the double reading of screening mammography. The study drew on data from 55,581 women who underwent mammography screening at a hospital in Sweden between April 1, 2021, and June 9, 2022, and assessed the effectiveness of the AI software Lunit Insight MMG (version 1.1.6, Lunit) when used as a second reader. The authors compared several reading strategies: double reading by two radiologists, double reading by one radiologist and AI, single reading by AI alone, and triple reading by two radiologists and AI.
Among the participants, the study identified 6,002 cases with abnormal mammography findings, and 1,716 women were recalled for further investigation following consensus discussions. Within this group, 269 cases were ultimately diagnosed as breast cancer, with 200 patients presenting with invasive breast cancer and 63 cases identified as ductal carcinoma in situ.
Comparatively, double reading by two radiologists detected 250 of the breast cancers, while double reading by one radiologist and AI identified 261. Single reading by AI alone detected 246 cancers, demonstrating non-inferiority to double reading by two radiologists. The most notable result came from triple reading with two radiologists and AI, which detected 269 cancers, making it the best-performing strategy among those the researchers examined.
According to the study, integrating AI as a second reader in mammography led to a 21 percent increase in abnormal interpretations. However, the subsequent consensus discussions, which took into account the patient's medical history along with the mammography and AI findings, reduced the recall rate by 4 percent compared with double reading by two radiologists.
The study's lead author, Dr. Karin Dembrower, head physician in the Department of Breast Radiology at Capio Sankt Gorans Hospital in Stockholm, Sweden, and her colleagues emphasized that consensus discussions mitigated the impact of the higher abnormal interpretation rate seen with one radiologist plus AI. In a hypothetical screening population of 100,000 women, replacing one radiologist with AI would save 100,000 radiologist reads while adding 1,562 consensus discussions. Even if each of those discussions took five times as long as an independent read, the net reduction in workload would be substantial.
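As a rough, back-of-the-envelope illustration of that workload argument, the sketch below uses only the figures cited above and treats one independent read as the unit of effort; the five-times-longer consensus discussion is the authors' illustrative assumption, not a measured quantity.

```python
# Back-of-the-envelope workload estimate for the hypothetical screening
# population of 100,000 women cited in the article. The 5x cost of a
# consensus discussion relative to an independent read is an assumption
# taken from the authors' own example.

reads_saved = 100_000      # radiologist reads avoided by replacing one reader with AI
extra_consensus = 1_562    # additional consensus discussions triggered
discussion_cost = 5        # effort of one discussion, in units of one independent read

added_effort = extra_consensus * discussion_cost   # 7,810 read-equivalents
net_savings = reads_saved - added_effort           # 92,190 read-equivalents

print(f"Added effort from extra discussions: {added_effort:,} read-equivalents")
print(f"Net workload reduction: {net_savings:,} read-equivalents "
      f"({net_savings / reads_saved:.0%} of the reads saved)")
```

Even under this pessimistic assumption about discussion length, roughly 92 percent of the saved reading effort is retained, which is the point the authors make about workload.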
The researchers also noted that the 11 radiologists involved in the study had a median of 17 years of experience. Although triple reading yielded a slightly higher breast cancer detection rate than double reading by two radiologists, the authors pointed to the accompanying 50 percent increase in consensus discussions and 5 percent higher recall rate. They cautioned that the additional workload for radiologists and heightened concern for patients must be balanced against the incremental increase in cancer detection.
Regarding study limitations, the researchers acknowledged that the threshold for AI abnormality detection was based on retrospective study data and may not be optimal, and they suggested that subsequent calibration may be necessary to establish a viable abnormality threshold in clinical practice. In addition, the single-arm, paired design of the study precluded a direct comparison of interval breast cancer rates among the reader strategies assessed.