Correcting the Record: No NEJM 2024 RCT Exists for Lunit INSIGHT MMG
A persistent search query circulates among radiologists and health system researchers: a "NEJM 2024 RCT" evaluating Lunit INSIGHT MMG in breast cancer screening. That publication does not exist. No randomized controlled trial of Lunit INSIGHT MMG has appeared in the New England Journal of Medicine or its affiliated journals, and no such trial result was identified in the primary literature as of mid-2026.
The confusion likely stems from two sources. First, the MASAI trial, the only published full RCT in breast cancer screening AI, appeared in The Lancet in January 2026 and attracted wide attention. It used Transpara AI (ScreenPoint Medical), not Lunit. Second, Lunit INSIGHT MMG's two prospective studies are genuinely significant: both were published in high-impact journals and cover large populations. But neither is an RCT, and neither appeared in NEJM.
The actual Lunit evidence base comprises the ScreenTrustCAD prospective paired-reader study (Lancet Digital Health, 2023, n=55,581) and the AI-STREAM preliminary analysis (Nature Communications, 2025, n=24,543). Both are prospective, multi-reader designs with meaningful clinical findings — but they are not RCTs, and their generalizability is bounded by single-country settings and specific equipment configurations. The remainder of this article covers what those studies actually show.
Clinical Problem: Workforce Shortages, Missed Cancers, and the Case for AI in Mammography Screening

Organized breast cancer screening programs face a structural tension. The clinical case for double reading — having two radiologists independently review each mammogram before consensus — is supported by evidence showing it increases cancer detection rates compared with single reading. But double reading is resource-intensive, and radiologist workforce shortages are acute in many health systems. Programs that cannot sustain double reading default to single reading, often by general radiologists rather than breast imaging specialists.
This creates several compounding problems that AI-assisted screening is designed to address:
- Interval cancers — tumors diagnosed between scheduled screens — remain a persistent quality indicator. Reducing interval cancer rates requires either more sensitive reading or earlier detection at the screening visit itself.
- Performance variability between breast imaging specialists and general radiologists is well documented. General radiologists reading lower volumes of mammograms tend to show lower cancer detection rates and higher recall rates than specialist readers.
- Workload pressure on specialist radiologists reduces reading time per case and may affect detection sensitivity, particularly for subtle findings.
- Double-reading programs, while clinically valuable, require approximately twice the radiologist time per case — a resource most screening programs cannot sustain as screening volumes grow.
AI-CAD systems like Lunit INSIGHT MMG are positioned to address these gaps — either by replacing one radiologist in a double-read workflow, augmenting a single reader's performance, or eventually triaging cases by risk level. Whether they actually deliver on that positioning is an evidence question, not a marketing one.
What Is Lunit INSIGHT MMG: AI Approach, Output, and Regulatory Status
Lunit INSIGHT MMG is a deep learning computer-aided detection and diagnosis (CADe/x) system for 2D digital screening mammography. It analyzes standard mammographic views and generates a per-lesion abnormality score on a 0–100 scale, with higher scores indicating greater likelihood of malignancy. The system also produces lesion localization heatmaps that overlay on the mammographic image, allowing radiologists to identify which regions the algorithm has flagged and at what confidence level.
The system is designed to integrate with existing PACS infrastructure and radiology viewers. It supports both single-read and double-read workflow configurations, functioning as a concurrent second reader or as a CAD overlay during standard reading sessions.
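To make the output format concrete, the sketch below models one exam's AI output as a list of scored findings and flags the exam when a score meets a cut-off (for example, the ≥10 positive threshold reported for AI-STREAM). It is a minimal Python sketch under stated assumptions: the class names, the view labels, and the rule that the exam-level score equals the highest per-lesion score are hypothetical illustrations, not Lunit's documented data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class LesionFinding:
    """One flagged region (hypothetical schema)."""
    view: str     # standard mammographic view label, e.g. "RCC" or "LMLO"
    score: float  # abnormality score on the 0-100 scale; higher = more suspicious


@dataclass
class ExamResult:
    """AI output for one screening exam (hypothetical schema)."""
    exam_id: str
    findings: list[LesionFinding] = field(default_factory=list)

    def exam_score(self) -> float:
        # ASSUMPTION: exam-level score = highest per-lesion score (0 if no findings).
        return max((f.score for f in self.findings), default=0.0)

    def is_flagged(self, threshold: float) -> bool:
        # Positive when the exam score meets or exceeds the cut-off.
        return self.exam_score() >= threshold


exam = ExamResult("exam-001", [LesionFinding("RCC", 4.0), LesionFinding("LMLO", 37.5)])
print(exam.exam_score(), exam.is_flagged(threshold=10))  # 37.5 True
```

An exam with no findings scores 0 under this assumption and is never flagged, while an exam scoring exactly at the cut-off is flagged.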
| Authorization / deployment | Status | Notes |
|---|---|---|
| FDA 510(k) | Cleared — K211678 (2021) | CADe/x for screening mammography |
| CE Mark | Confirmed | European conformity for medical device use |
| Health Canada | Authorized (2022) | Canadian market authorization |
| Australia (BreastScreen NSW) | State program deployment | Operational in the New South Wales screening program |
| Sweden | National program deployment | Operational in organized screening |
| Iceland, Singapore, Saudi Arabia, Qatar | National program deployment | Confirmed as of Q2 2026 |
Evidence Quality Overview: What Study Types Support This Product
As of mid-2026, the published evidence base for Lunit INSIGHT MMG includes two prospective studies, one large retrospective simulation study, and one ongoing real-world evaluation trial. No randomized controlled trial specific to this product has been published.
| Study | Design | N | Setting | Publication | Status |
|---|---|---|---|---|---|
| ScreenTrustCAD | Prospective paired-reader non-inferiority | 55,581 | Single Swedish center, double-read program | Lancet Digital Health, 2023 | Published — full results |
| AI-STREAM | Prospective multicenter single-read cohort | 24,543 | Six Korean academic hospitals | Nature Communications, 2025 | Preliminary analysis — final results pending post-2026 |
| Danish Population Simulation | Retrospective simulation | 249,402 | Danish national mammography database | Radiology: Artificial Intelligence, 2024 | Published (simulation only) |
| NCT06232070 | Real-world evaluation trial | Not disclosed | Ongoing | Not yet published | Active — no results available |
The two prospective studies differ in workflow context: ScreenTrustCAD examined AI in a double-read program (replacing one of two radiologists), while AI-STREAM examined AI as a CAD assistant in a single-read program. These are not equivalent deployment scenarios, and findings from one do not directly transfer to the other.
ScreenTrustCAD Trial (Lancet Digital Health, 2023): Double-Read Setting Evidence
ScreenTrustCAD is the foundational prospective study for Lunit INSIGHT MMG in double-read screening programs. The trial enrolled 55,581 women attending a population-based mammography screening program in Stockholm, Sweden, using Philips mammography equipment. Radiologists participating in the study had a median of 17 years of breast imaging experience — a notably experienced cohort relative to many real-world screening programs.
The study design was a prospective paired-reader non-inferiority trial. Each mammogram was read under three conditions: standard double reading by two radiologists, double reading by one radiologist plus Lunit INSIGHT MMG, and standalone AI reading. The primary outcome was cancer detection rate (CDR); recall rate was a key secondary outcome.
| Reading Configuration | Cancers Detected | CDR (per 1,000) | Recall Rate | Workload vs. Standard |
|---|---|---|---|---|
| Two radiologists (standard double reading) | 250 | ~4.50 | 2.93% | Baseline |
| One radiologist + Lunit AI | 261 | ~4.70 | 2.80% | ~50% reduction in radiologist reads |
| Standalone Lunit AI | Not reported here | Non-inferior to two-radiologist reading; not superior | Not reported here | All radiologist reads eliminated (not considered viable) |
The primary finding was that one radiologist plus Lunit AI was not only non-inferior but statistically superior to two-radiologist double reading in cancer detection: 261 versus 250 screen-detected cancers, a relative proportion of 1.04 (95% CI 1.00–1.09; p=0.017 for superiority). The recall rate was simultaneously about 4% lower in relative terms (2.80% vs. 2.93%). In a population of 100,000 screened women, replacing one of the two radiologists with AI would eliminate approximately 100,000 radiologist reads while adding approximately 1,562 consensus discussions.
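The headline ScreenTrustCAD figures can be reproduced from the reported counts. A minimal Python sketch, assuming the full cohort of 55,581 women is the denominator for both reading configurations (as the CDR column above implies); the helper names are illustrative:

```python
def cdr_per_1000(cancers: int, screened: int) -> float:
    """Cancer detection rate per 1,000 screened women."""
    return 1000 * cancers / screened


def radiologist_reads(screened: int, readers_per_exam: int) -> int:
    """Total human reads if every exam is read by the same number of radiologists."""
    return screened * readers_per_exam


N = 55_581
standard = cdr_per_1000(250, N)  # two radiologists
with_ai = cdr_per_1000(261, N)   # one radiologist + AI

print(f"CDR standard: {standard:.2f} per 1,000")              # 4.50
print(f"CDR one reader + AI: {with_ai:.2f} per 1,000")        # 4.70
print(f"Relative proportion: {261 / 250:.3f}")                 # 1.044
print(f"Relative recall change: {(2.80 - 2.93) / 2.93:.1%}")   # -4.4%

# Workload for a hypothetical population of 100,000 screened women.
saved = radiologist_reads(100_000, 2) - radiologist_reads(100_000, 1)
print(f"Radiologist reads eliminated: {saved:,}")  # 100,000
```

The workload line only reflects that double reading needs two reads per exam and the AI configuration needs one; it does not account for the additional consensus discussions the study reported.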
Standalone AI performance was non-inferior to two-radiologist double reading in CDR but was not superior. The authors noted that standalone AI raises unresolved questions around medical-legal responsibility, public acceptability, and the absence of a human clinical decision-maker — factors that preclude standalone deployment in current regulatory and clinical frameworks.
AI-STREAM Preliminary Analysis (Nature Communications, 2025): Single-Read Setting Evidence
AI-STREAM is the first large-scale, multicenter prospective study of AI-CAD in a single-read mammography setting. The preliminary analysis enrolled 24,543 women across six Korean academic hospitals, with Lunit INSIGHT MMG version 1.1.7.1 deployed at a positive threshold of ≥10. The study compared cancer detection rates and recall rates with and without AI-CAD assistance for breast radiologists reading in their standard single-read workflow.
The primary prospective finding for breast radiologists was a 13.8% higher cancer detection rate with AI-CAD (5.70 per 1,000 screened) compared with reading without AI (5.01 per 1,000; p<0.001). Critically, this improvement was achieved without a statistically significant change in recall rate (p=0.564) — a finding that addresses the central concern that AI assistance would increase false positives and unnecessary callbacks.
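Relative and absolute differences answer different questions: the 13.8% figure is relative, while the absolute gain is under one additional cancer per 1,000 women screened. A short Python check using the reported prospective CDRs (the function name is illustrative):

```python
def relative_change(new: float, baseline: float) -> float:
    """Relative change of new versus baseline, as a fraction."""
    return (new - baseline) / baseline


cdr_no_ai, cdr_ai = 5.01, 5.70  # AI-STREAM breast radiologists, per 1,000 screened

print(f"Relative increase: {relative_change(cdr_ai, cdr_no_ai):.1%}")              # 13.8%
print(f"Absolute increase: {cdr_ai - cdr_no_ai:.2f} per 1,000 screened")            # 0.69
print(f"Additional cancers per 10,000 screened: {(cdr_ai - cdr_no_ai) * 10:.1f}")  # 6.9
```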
| Reader Group / Condition | CDR (per 1,000) | Change vs. No AI | Recall Rate Change | Data Type |
|---|---|---|---|---|
| Breast radiologists — no AI | 5.01 | Baseline | Baseline | Prospective primary outcome |
| Breast radiologists — with Lunit AI-CAD | 5.70 | +13.8% (p<0.001) | No significant change (p=0.564) | Prospective primary outcome |
| General radiologists — no AI (simulation) | ~4.76 | Baseline | ~6.31% | Retrospective simulation sub-study |
| General radiologists — with Lunit AI-CAD (simulation) | ~6.02 | +26.4% | ~6.89% (significant increase) | Retrospective simulation sub-study |
| Standalone Lunit AI | 5.21 | Non-inferior to specialists (p=0.752) | Significantly higher than specialists | Prospective comparison arm |
The standalone AI arm showed CDR non-inferior to breast specialist radiologists (5.21 vs. 5.01 per 1,000; p=0.752), but with significantly higher recall rates than specialists. This pattern of adequate cancer detection with excess recalls is broadly consistent with ScreenTrustCAD, where standalone AI was non-inferior but not superior to double reading, and reinforces that standalone AI is not currently viable as a sole reader in organized screening programs.
Retrospective Evidence: Danish Population-Wide Simulation

A retrospective simulation study published in Radiology: Artificial Intelligence (2024) applied Lunit INSIGHT MMG v1.1.7.1 to 249,402 Danish mammograms to model how different AI integration positions within the double-read workflow would affect accuracy and workload compared with standard two-radiologist double reading.
Three AI-integrated configurations were modeled:
| Configuration | Description | Workload Reduction | Key Accuracy Finding |
|---|---|---|---|
| AIfirst | AI replaces the first radiologist; second radiologist reviews all cases plus AI output | ~49% | No significant difference in CDR, sensitivity, or specificity vs. standard double reading; higher arbitration rate |
| AIsecond | AI replaces the second radiologist; first radiologist reviews all cases, AI provides second opinion | ~49% | Recall rate reduced, but sensitivity decreased by 1.58% (p<0.001) — a meaningful accuracy penalty |
| AItriage | AI triages cases to single or double reading based on risk score; high-risk cases get two readers | ~50% | Higher CDR, sensitivity, and PPV than standard double reading |
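The AItriage configuration routes each exam by its AI score. The sketch below shows how such a rule translates into radiologist workload relative to standard double reading. It is a minimal Python illustration: the threshold, the score list, and the number of human reads assigned to each route are hypothetical parameters, not the routing rules or operating points used in the Danish study.

```python
def route(score: float, threshold: float) -> str:
    """Hypothetical triage rule: exams at or above the threshold go to double reading."""
    return "double" if score >= threshold else "single"


def human_reads(scores: list[float], threshold: float,
                reads_single: int = 1, reads_double: int = 2) -> int:
    """Count radiologist reads implied by routing each exam by its AI score."""
    per_route = {"single": reads_single, "double": reads_double}
    return sum(per_route[route(s, threshold)] for s in scores)


def workload_reduction(scores: list[float], threshold: float,
                       reads_single: int = 1, reads_double: int = 2) -> float:
    """Fractional reduction versus reading every exam twice (standard double reading)."""
    baseline = 2 * len(scores)
    return 1 - human_reads(scores, threshold, reads_single, reads_double) / baseline


scores = [2, 5, 8, 12, 40, 85, 3, 1, 9, 60]  # hypothetical AI scores for ten exams
print(human_reads(scores, threshold=10))                                      # 14
print(round(workload_reduction(scores, threshold=10), 2))                     # 0.3
print(round(workload_reduction(scores, threshold=10, reads_single=0), 2))     # 0.6
```

With these sample scores, four of ten exams go to double reading, giving 14 human reads instead of 20 (a 30% reduction); if low-risk exams received no human read, the reduction would be 60%. The achievable reduction depends directly on the share routed to double reading and on who reads the low-risk exams, so published workload figures are best read alongside the routing rule that produced them.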
Known Limitations Across the Evidence Base
The Lunit INSIGHT MMG evidence base is among the strongest for any commercial mammography AI system as of mid-2026, but it carries specific limitations that clinicians and administrators must weigh before deployment decisions.
- Single-country generalizability: ScreenTrustCAD was conducted at a single Swedish center using Philips mammography equipment. AI-STREAM was conducted at six Korean academic hospitals. Neither study has been replicated across diverse healthcare systems, equipment vendors, or screening program structures.
- Radiologist experience dependency: ScreenTrustCAD radiologists had a median 17 years of breast imaging experience — a highly specialized cohort. AI performance in the context of less experienced readers may differ from what ScreenTrustCAD demonstrated.
- Threshold calibration uncertainty: ScreenTrustCAD aimed for a 2% CDR increase but observed 4–6%. Retrospective calibration does not reliably predict prospective operating points. Ongoing calibration in clinical use is likely necessary.
- Automation bias (experience-dependent risk): A Korean Journal of Radiology editorial (June 2026) synthesizing MASAI and AI-STREAM findings noted that experienced radiologists in MASAI showed no automation bias (specificity unchanged), while general radiologists in the AI-STREAM simulation sub-study showed a statistically significant recall rate increase. This suggests that automation bias risk depends on reader experience and is higher in lower-volume or less specialized settings.
- Overdiagnosis concern — increased DCIS detection: Both ScreenTrustCAD and AI-STREAM showed increased detection of ductal carcinoma in situ (DCIS) with AI assistance. Increased DCIS detection raises unresolved questions about overdiagnosis — the detection and treatment of cancers that would not have caused clinical harm during a patient's lifetime. Long-term follow-up data sufficient to address this question are not yet available.
- No published mortality endpoint: Neither ScreenTrustCAD nor AI-STREAM has reported breast cancer mortality data. CDR and recall rate are process measures, not mortality outcomes. Whether improved CDR with AI translates to reduced mortality from breast cancer remains undemonstrated for this product.
- AI-STREAM is a preliminary analysis: Final AI-STREAM results with 2-year follow-up and National Cancer Registry linkage are expected after 2026. The preliminary findings should not be treated as the study's definitive conclusions.
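The threshold-calibration point can be illustrated with synthetic scores: a cut-off chosen to flag a target share of a retrospective cohort will flag a different share if the prospective score distribution shifts (for example, because of different equipment or case mix). The Python sketch below uses made-up, deterministic score lists and illustrative function names; none of the numbers come from ScreenTrustCAD or AI-STREAM.

```python
def calibrate_threshold(retro_scores: list[float], target_flag_rate: float) -> float:
    """Cut-off that flags roughly target_flag_rate of a retrospective cohort."""
    ranked = sorted(retro_scores, reverse=True)
    k = max(1, round(target_flag_rate * len(ranked)))  # number of exams to flag
    return ranked[k - 1]  # flag every exam scoring at least the k-th highest score


def flag_rate(scores: list[float], threshold: float) -> float:
    """Share of exams whose score meets or exceeds the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


# Synthetic data only: scores 0-99 in the retrospective set, and the same
# distribution shifted up by one point to mimic a hypothetical equipment change.
retrospective = [i % 100 for i in range(1000)]
prospective = [i % 100 + 1 for i in range(1000)]

threshold = calibrate_threshold(retrospective, target_flag_rate=0.02)
print(threshold, flag_rate(retrospective, threshold), flag_rate(prospective, threshold))
# 98 0.02 0.03
```

Here the cut-off calibrated to flag 2% of the retrospective cohort (a score of 98) flags 3% of a prospective cohort whose scores are shifted up by a single point, which is why ongoing calibration against live data is likely necessary.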
Clinical Implications by Setting: How Workflow Design and Reader Experience Shape AI Value
The evidence does not support a single universal recommendation for how Lunit INSIGHT MMG should be deployed. The clinical implications differ substantially based on screening program structure, reader experience, and where AI is positioned in the reading workflow.
| Deployment Scenario | Evidence Source | Expected Benefit | Key Risk to Monitor |
|---|---|---|---|
| Double-read program: AI replaces one of two radiologists | ScreenTrustCAD (Lancet Digital Health 2023) | Maintained or superior CDR, reduced recall rate, ~50% workload reduction | Threshold calibration drift; arbitration rate increase |
| Single-read program: breast specialist with AI-CAD | AI-STREAM prospective arm (Nature Communications 2025) | 13.8% higher CDR, no significant recall rate change | DCIS overdetection; final AI-STREAM results pending |
| Single-read program: general radiologist with AI-CAD | AI-STREAM simulation sub-study (Nature Communications 2025) | 26.4% CDR increase (simulation) | Significant recall rate elevation; automation bias risk; simulation data only — not prospectively validated |
| Standalone AI as sole reader | ScreenTrustCAD, AI-STREAM | CDR non-inferior to human reading in both studies | Significantly higher recall rates than specialists (AI-STREAM); unresolved medical-legal responsibility; not viable as sole reader in current frameworks |
For programs deploying AI alongside general radiologists, the KJR 2026 editorial recommends educational programs on correct AI use and systematic post-deployment monitoring of recall rates. The recall rate elevation observed in the AI-STREAM general radiologist simulation (from 6.31% to 6.89%, a relative increase of about 9%) may reflect automation bias: readers accepting AI flags without sufficient independent assessment. Although this signal comes from simulation data rather than prospective use, it appears to be reader-experience-dependent and warrants active management.
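One common way to operationalize recall-rate monitoring is a Shewhart p-chart: compare each monitoring window's recall rate with an upper control limit derived from a baseline rate. The Python sketch below is an assumed approach, not one prescribed by the KJR editorial; the 6.31% baseline is taken from the AI-STREAM simulation figure above, while the window sizes, recall counts, and the 3-sigma limit are hypothetical choices.

```python
import math


def recall_upper_limit(baseline_rate: float, n_exams: int, z: float = 3.0) -> float:
    """Upper control limit of a p-chart for a window of n_exams."""
    sigma = math.sqrt(baseline_rate * (1 - baseline_rate) / n_exams)
    return baseline_rate + z * sigma


def recall_alert(recalls: int, n_exams: int, baseline_rate: float, z: float = 3.0) -> bool:
    """True when the window's observed recall rate exceeds the upper control limit."""
    return recalls / n_exams > recall_upper_limit(baseline_rate, n_exams, z)


BASELINE = 0.0631  # pre-deployment recall rate (figure from the AI-STREAM simulation)

# Hypothetical monitoring windows: (recalls, exams read).
for recalls, n in [(345, 5_000), (1_378, 20_000)]:
    limit = recall_upper_limit(BASELINE, n)
    print(f"n={n:,}: observed {recalls / n:.2%}, limit {limit:.2%}, "
          f"alert={recall_alert(recalls, n, BASELINE)}")
```

A recall rate of about 6.9% stays below the limit in a 5,000-exam window (limit about 7.34%) but breaches it in a 20,000-exam window (limit about 6.83%), so window size determines how quickly a shift of the size seen in the simulation can be detected.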
Deployment Stage and Ongoing Research
Lunit INSIGHT MMG is in broad clinical use across multiple national and regional screening programs as of Q2 2026, including BreastScreen NSW in Australia, organized screening programs in Sweden and Iceland, and national programs in Singapore, Saudi Arabia, and Qatar. The product holds FDA 510(k) clearance (K211678), CE mark, and Health Canada authorization, and is reported to be deployed across more than 4,800 medical institutions globally.
A real-world evaluation trial (ClinicalTrials.gov: NCT06232070) is active but has not published results. No interim findings, enrollment targets, or primary endpoints from this trial are available for characterization at this time.
Final AI-STREAM results — including 2-year follow-up data linked to the National Cancer Registry — are expected after 2026. Those results will be critical for assessing whether the CDR improvements observed in the preliminary analysis translate to clinically meaningful outcomes, and whether the DCIS overdetection signal resolves into a net benefit or raises persistent overdiagnosis concerns.
- Evidence gaps that remain before mortality-endpoint conclusions can be drawn: no published randomized controlled trial specific to Lunit INSIGHT MMG; no long-term follow-up data resolving the DCIS overdetection question; no prospective validation of general radiologist performance with AI-CAD (current data are simulation-only); no published data from the NCT06232070 real-world evaluation.
- The MASAI trial (The Lancet, January 2026) — using Transpara AI, not Lunit — remains the only published full RCT in breast cancer screening AI and provides the only available RCT-level evidence that AI-supported screening can reduce interval cancer rates. That evidence does not transfer directly to Lunit INSIGHT MMG without product-specific RCT data.
- Institutions deploying Lunit INSIGHT MMG in organized screening programs should implement prospective performance monitoring from the outset, with attention to CDR, recall rate, DCIS detection proportion, and reader-level automation bias indicators — particularly where general radiologists are the primary readers.
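The monitoring metrics listed above can be computed per reader from routine screening records. A minimal Python sketch, assuming a hypothetical record schema with one row per screened exam that notes the reading radiologist, whether the exam was recalled, and whether a screen-detected cancer (and specifically DCIS) followed:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScreenRecord:
    """One screened exam (hypothetical schema)."""
    reader_id: str
    recalled: bool
    cancer: bool        # screen-detected cancer confirmed after recall
    dcis: bool = False  # the detected cancer was ductal carcinoma in situ


def reader_metrics(records: list[ScreenRecord]) -> dict[str, dict[str, float]]:
    """CDR per 1,000, recall rate, and DCIS share of detected cancers, per reader."""
    groups: dict[str, list[ScreenRecord]] = defaultdict(list)
    for r in records:
        groups[r.reader_id].append(r)

    out: dict[str, dict[str, float]] = {}
    for reader, recs in groups.items():
        n = len(recs)
        cancers = sum(r.cancer for r in recs)
        out[reader] = {
            "cdr_per_1000": 1000 * cancers / n,
            "recall_rate": sum(r.recalled for r in recs) / n,
            # NaN when the reader detected no cancers, so the share is undefined.
            "dcis_share": (sum(r.dcis for r in recs if r.cancer) / cancers)
                          if cancers else float("nan"),
        }
    return out
```

Tracking these figures per reader, rather than only program-wide, is what allows a rise in recalls among general radiologists to be separated from specialist performance.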
