AI for Diabetic Retinopathy Screening: RCT Evidence, FDA-Cleared Devices, and Real-World Deployment Gaps

Systematic review

A structured analysis of the randomized controlled trial evidence, diagnostic accuracy meta-analysis, and real-world performance data for autonomous AI diabetic retinopathy screening systems — covering the three FDA-cleared devices, key heterogeneity drivers, and unresolved deployment gaps as of Q2 2026. Intended for clinicians, researchers, and health system professionals evaluating or monitoring this technology.

The Diabetic Retinopathy Screening Care Gap

Diabetic retinopathy (DR) is the leading cause of preventable blindness in working-age adults in the United States. Annual fundus examination is the standard of care for people with diabetes, yet adherence to that recommendation falls below 50% nationally — and in some populations, care-gap closure rates are as low as 15%. The failure is not primarily clinical; it is structural. Access barriers — transportation, specialist availability, appointment wait times, and fragmented communication between primary care and ophthalmology — drive most of the gap.

The burden is not evenly distributed. Racial and ethnic minorities, adults covered by Medicaid, and patients receiving care in federally qualified health centers (FQHCs) or rural primary care settings face disproportionately higher rates of unscreened DR and vision loss from preventable disease. This equity dimension shapes much of the current research agenda for AI-based screening interventions.

Autonomous AI screening systems represent one intervention approach: moving the diagnostic step to the point of primary care contact, eliminating the referral-to-specialist bottleneck for the initial screening decision. For readers seeking a broader view of how AI is being applied across medical specialties, the site's overview of AI research evidence across medicine provides multi-specialty context. This brief stays anchored to the specific task of autonomous AI DR screening using point-of-care fundus photography.

How Autonomous AI Is Applied to This Screening Task

The clinical workflow for autonomous AI DR screening is distinct from traditional AI-assisted reading, where an algorithm flags images for physician review. In the autonomous model, no ophthalmologist or reading physician is involved in the screening decision. A medical assistant or trained clinic staff member operates a nonmydriatic fundus camera at the point of primary care. The camera captures retinal images, which are transmitted to the AI system. Within approximately 60 seconds, the system returns one of three outputs:

  • Diabetic eye disease (DED) detected — refer to an eye care provider.
  • DED not detected — rescreen in 12 months.
  • Insufficient image quality — the exam cannot be interpreted; the image is ungradable.

The referral or no-referral decision is issued by the AI system, not by a clinician reviewing the image. This is the defining regulatory and clinical feature of these devices: diabetic retinopathy screening is currently the only FDA-cleared indication where an autonomous AI system can issue a diagnosis without clinician interpretation.
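
The three-output logic described above can be sketched as a simple lookup. This is an illustrative sketch only: the names `ScreeningOutput`, `NEXT_STEP`, and `next_step` are hypothetical and do not reflect any vendor's software interface.

```python
from enum import Enum


class ScreeningOutput(Enum):
    """The three possible AI outputs described in the text (labels are hypothetical)."""
    DED_DETECTED = "DED detected"
    DED_NOT_DETECTED = "DED not detected"
    INSUFFICIENT_QUALITY = "insufficient image quality"


# Follow-up attached to each output, worded as in the list above.
NEXT_STEP = {
    ScreeningOutput.DED_DETECTED: "refer to an eye care provider",
    ScreeningOutput.DED_NOT_DETECTED: "rescreen in 12 months",
    ScreeningOutput.INSUFFICIENT_QUALITY: "exam cannot be interpreted; image is ungradable",
}


def next_step(output: ScreeningOutput) -> str:
    """Return the follow-up attached to a given AI output; no clinician read is involved."""
    return NEXT_STEP[output]


if __name__ == "__main__":
    for out in ScreeningOutput:
        print(f"{out.value}: {next_step(out)}")
```

The point of the sketch is that every output maps directly to a next step, which is what distinguishes the autonomous model from AI-assisted reading.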

Four-step workflow diagram showing point-of-care fundus imaging, AI analysis under 60 seconds, binary branching to no-DR rescreen or DR-detected referral, and follow-up appointment.
The autonomous AI DR screening workflow. No physician interprets the fundus image; the AI output drives the referral decision directly.

FDA Clearance Status: Three Authorized Devices

As of Q2 2026, three autonomous AI systems hold FDA authorization for DR screening in the United States. Each reached market through a different regulatory pathway and carries a distinct evidentiary profile. No head-to-head trial has compared these three systems on the same patient population; cross-study performance comparisons are methodologically non-equivalent due to differing populations, reference standards, and study enrichment strategies.

FDA-cleared autonomous AI diabetic retinopathy screening devices as of Q2 2026. Performance figures for LumineticsCore are from the published pivotal trial (Abramoff et al., npj Digital Medicine 2018). AEYE-DS figures are manufacturer-reported. No head-to-head trial comparing these devices exists.
| Device | Manufacturer | Pathway & Date | Pivotal Trial Performance | Labeled Population | Reimbursement |
| --- | --- | --- | --- | --- | --- |
| LumineticsCore (formerly IDx-DR) | Digital Diagnostics Inc. | De Novo DEN180001, April 2018 | Sensitivity 87.2% (95% CI 81.8–91.2%), specificity 90.7% (95% CI 88.3–92.7%), imageability 96.1%; n=900, 10 US primary care sites | Adults ≥22 years | CPT 92229; Medicare and Medicaid reimbursement; HEDIS care gap closure; MIPS-eligible |
| EyeArt | Eyenuk Inc. | 510(k) K200667, 2020 | Published pivotal trial figures available; approximately 5 academic adopters in the US | Adults | CPT 92229 |
| AEYE-DS | AEYE Health | 510(k), October 2022 | Manufacturer-reported sensitivity 93.0%, specificity 91.4% per clinical trial; one image per eye required; limited published independent peer-reviewed accuracy data in primary care settings | Adults | CPT 92229 |

LumineticsCore was the first autonomous AI diagnostic system authorized by the FDA in any field of medicine. Its pivotal trial enrolled 900 participants across 10 US primary care sites, with 28.6% African American and 16.1% Hispanic participants, using a level 1 prognostic reference standard (Wisconsin FPRC widefield stereoscopic photography plus macular OCT). The trial met all pre-specified superiority endpoints. Logistic regression found no significant effect of age, sex, race, or ethnicity on sensitivity; specificity was higher in subjects over 65.

FDA requires all subsequent 510(k) submissions for autonomous DR screening algorithms to demonstrate non-inferior performance to IDx-DR in a prospective, diverse cohort. National Medicare payment for CPT 92229 was $40.28 in 2024.

RCT Evidence: What Controlled Trials Show

Three randomized controlled trials are relevant to the current evidence base for autonomous AI DR screening. They address different clinical questions — care-gap closure, specialist productivity, and equity-focused FQHC deployment — and each carries specific generalizability constraints that must be understood before applying the findings to deployment decisions.

ACCESS Trial: Care-Gap Closure in a Pediatric Diabetes Cohort

The ACCESS trial (Wolf et al., Nature Communications 2024; NCT05131451) randomized 164 youth with diabetes, ages 8–21, at Johns Hopkins to autonomous AI examination at point-of-care (n=81) or augmented standard-care referral (n=83). The cohort was racially diverse: 35% Black, 47% Medicaid-covered. Diabetic eye exam completion was 100% (95% CI 95.5–100%) in the AI arm versus 22% (95% CI 14.2–32.4%) in the control arm — a 78 percentage point difference (p<0.001). Among the 25 AI-arm participants with abnormal results, 64% completed follow-through with an eye care provider versus 22% in the control arm (p<0.001).

B-PRODUCTIVE: Specialist Productivity in a Demand-Saturated Setting

The B-PRODUCTIVE trial (Abramoff et al., npj Digital Medicine 2023; NCT05182580) cluster-randomized 105 clinic days at the Deep Eye Care Foundation in Bangladesh: 51 intervention days (494 patients) and 54 control days (499 patients). The primary endpoint was met: specialist productivity was 40% higher on autonomous AI days (1.59 vs. 1.14 encounters per hour per specialist, p<0.001). Approximately 67% of intervention-arm patients with a 'DED absent' output completed their care without seeing a specialist. Complexity-adjusted specialist productivity increased by a factor of 2.65. Patient satisfaction with time to receive results was 100%.
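
The headline productivity figure is a relative increase derived from the two reported rates. The short calculation below reproduces it from the rounded values in the text; variable names are illustrative.

```python
ai_rate = 1.59       # encounters per hour per specialist, intervention days
control_rate = 1.14  # encounters per hour per specialist, control days

# Relative increase of the intervention rate over the control rate.
relative_increase = (ai_rate - control_rate) / control_rate
print(f"Relative increase: {relative_increase:.1%}")
```

Computed from the rounded rates, the increase is about 39.5%, which the trial summary reports as 40%.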

DRES-POCAI: Active FQHC Trial (Protocol Published, Results Pending)

DRES-POCAI is an active randomized controlled trial testing autonomous AI DR screening in federally qualified health centers serving underserved populations. Recruitment began in June 2024, and the trial protocol was published in JAMA Network Open in 2025. As of Q2 2026, full trial results have not been confirmed as publicly available. Readers and researchers should verify the current publication status before citing completed findings. DRES-POCAI represents the primary active evidence-generation effort for the equity-focused FQHC deployment context.

Summary of key RCTs in autonomous AI diabetic retinopathy screening. Each trial addresses a distinct clinical question; none are directly interchangeable for generalizing performance across the US primary care context.
| Trial | Design | N / Setting | Primary Finding | Key Generalizability Limit |
| --- | --- | --- | --- | --- |
| ACCESS (Nat Commun 2024) | RCT | 164 youth, ages 8–21, Johns Hopkins | 100% exam completion in AI arm vs. 22% control (p<0.001); 64% vs. 22% follow-through after positive result | Pediatric cohort; AI not FDA-labeled for ages <22; retina specialist overread required; adult gradability conditions differ |
| B-PRODUCTIVE (npj DM 2023) | Cluster-RCT | 105 clinic days, Bangladesh, 3 specialists | 40% higher specialist productivity in AI arm (1.59 vs. 1.14 encounters/hour, p<0.001) | Single health system; demand-saturated, no-appointment model; not generalizable to US scheduled outpatient clinics |
| DRES-POCAI (Protocol: JAMA Network Open 2025) | RCT (active) | FQHCs; recruitment started June 2024 | Results not confirmed as published as of Q2 2026 | Protocol only; verify results status before citing |

Diagnostic Accuracy Synthesis: Meta-Analysis of 82 Studies

The most comprehensive synthesis of diagnostic accuracy for regulator-approved deep learning DR screening systems is a systematic review and meta-analysis by Wang et al. published in npj Digital Medicine. The article was published online on 19 December 2025, with a version-of-record date of 02 February 2026 (DOI 10.1038/s41746-025-02223-8; volume 9, article number 110). The search covered PubMed, Embase, and ClinicalTrials.gov through 3 April 2025.

The meta-analysis identified 82 studies covering 887,244 examinations across 25 devices and 28 countries. Hierarchical bivariate meta-analysis yielded patient-level pooled sensitivity of 0.93 (95% CI 0.91–0.95) and pooled specificity of 0.90 (95% CI 0.87–0.92).
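
Pooled sensitivity and specificity do not by themselves say how often a positive or negative result is correct; that depends on disease prevalence in the screened population. The sketch below applies Bayes' rule to the pooled estimates. The prevalence values are hypothetical and are not taken from the meta-analysis.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per person screened
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


POOLED_SENS, POOLED_SPEC = 0.93, 0.90  # pooled patient-level estimates from the text

# Prevalence values are hypothetical, chosen only to show the trend.
for prevalence in (0.05, 0.10, 0.20):
    ppv, npv = predictive_values(POOLED_SENS, POOLED_SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At lower prevalence the positive predictive value falls even though sensitivity and specificity are unchanged, which is one reason pooled accuracy figures should not be read as referral yield.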

Heterogeneity Drivers and What They Mean for Interpretation

Pooled accuracy figures mask substantial heterogeneity. The meta-analysis identified four primary drivers:

  • DR severity threshold: Any-DR screening thresholds (detecting even mild disease) increase false-positive burden significantly compared to more-than-mild thresholds.
  • National income level: Low-income country settings showed different performance profiles, with higher false-positive burden. Dilated-pupil protocols, portable cameras, and adjudicated reference standards improved specificity.
  • Image gradability: Low image gradability significantly increased false-positive burden; studies with higher ungradable rates showed materially different operating characteristics.
  • Vendor involvement: Non-vendor-involved studies showed approximately 7 percentage points higher sensitivity but approximately 10 percentage points lower specificity compared to vendor-involved studies at the patient level (p<0.01 for both). This divergence is a critical methodology caveat when interpreting device-specific performance figures from any single study.

Key heterogeneity drivers from Wang et al. (npj Digital Medicine, version of record 02 February 2026). Each factor has direct implications for how published accuracy figures should be applied to specific deployment decisions.
| Heterogeneity Factor | Direction of Effect on Accuracy | Implication |
| --- | --- | --- |
| Any-DR (vs. more-than-mild) threshold | Increases false-positive rate | Threshold selection directly affects downstream referral burden |
| Low-income country setting | Increases false-positive burden | Pooled global figures may not reflect US deployment performance |
| Low image gradability | Increases false-positive burden; triggers safety-positive behavior | Ungradable image rates in real-world US settings are a key deployment variable |
| Non-vendor study involvement | +7 pp sensitivity, −10 pp specificity vs. vendor-involved | Study sponsorship must be considered when evaluating cited performance figures |
| Dilated pupil protocol | Improves specificity | Pivotal trial gradability (96.1%) reflects dilated-case inclusion; nondilated real-world rates are lower |

Real-World Deployment Performance vs. Pivotal Trial Conditions

A systematic review of health system adoption evidence, published in Ophthalmology Science (Teng et al., September 2025; PMC12553049), reviewed published implementations of the three FDA-cleared systems and conducted interviews with ophthalmologists at academic health systems. The findings document a consistent and clinically significant gap between pivotal trial conditions and real-world deployment performance.

A patient at a nonmydriatic fundus camera in a community primary care clinic, with a medical assistant and a wall-mounted AI output panel showing no diabetic retinopathy detected.
Point-of-care autonomous AI DR screening in a community health setting. The AI output panel — not a physician — issues the referral decision. Real-world image gradability in nonmydriatic settings falls substantially below pivotal trial figures.

The Gradability Gap

The LumineticsCore pivotal trial reported imageability of 96.1% — meaning 96.1% of submitted images were gradable by the AI system. That figure was achieved in a trial that included pharmacologically dilated cases (23.6% of participants required dilation). In real-world nonmydriatic deployments at academic health systems, gradability ranges from 49% to 75%:

Nonmydriatic image gradability across published real-world implementations versus the LumineticsCore pivotal trial. The pivotal trial figure includes dilated cases and is not directly comparable to nondilated real-world settings. Source: Teng et al., Ophthalmology Science, September 2025.
| Institution | Reported Gradability | Notes |
| --- | --- | --- |
| Johns Hopkins | 49% | Nonmydriatic; lowest reported in published implementations |
| Mayo Clinic | 55.1% | Nonmydriatic |
| University of Iowa | 73.4% | Nonmydriatic |
| Stanford | 71% | Nonmydriatic |
| Temple University | 75% | Nonmydriatic; highest reported |
| LumineticsCore Pivotal Trial | 96.1% | Included dilated cases (23.6% of participants); not directly comparable to nondilated deployments |

When images are ungradable, the AI system triggers a safety-positive response: it does not issue a negative (no DR) result. This behavior protects patients from false reassurance but creates a false-positive referral burden, because patients with ungradable images are referred to eye care even though the AI has not detected disease. In a deployment with 49% gradability, roughly half of all screened patients (51%) receive an ungradable result requiring follow-up.
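
The effect of gradability on total referral volume can be sketched with simple arithmetic. The example assumes that every ungradable exam leads to a referral, that accuracy on gradable images matches the LumineticsCore pivotal trial figures, and that disease prevalence is 10%. The prevalence is a hypothetical value, not a figure from the source, and the function name `referral_fraction` is illustrative.

```python
def referral_fraction(gradability: float, prevalence: float,
                      sensitivity: float, specificity: float) -> float:
    """Fraction of screened patients referred: ungradable plus AI-positive."""
    ungradable = 1 - gradability
    # Among gradable exams, positives are true positives plus false positives.
    positive_if_gradable = (sensitivity * prevalence
                            + (1 - specificity) * (1 - prevalence))
    return ungradable + gradability * positive_if_gradable


SENS, SPEC = 0.872, 0.907  # LumineticsCore pivotal trial figures from the text
PREVALENCE = 0.10          # hypothetical; not taken from the source

# Pivotal trial gradability, then the highest and lowest real-world figures.
for g in (0.961, 0.75, 0.49):
    total = referral_fraction(g, PREVALENCE, SENS, SPEC)
    print(f"gradability {g:.1%}: ungradable {1 - g:.1%}, total referred {total:.1%}")
```

Under these assumptions, the ungradable share dominates total referrals at low gradability, regardless of how accurate the algorithm is on the images it can read.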

Risk factors for ungradable results include Type 1 diabetes, active smoking, advanced age, and cataracts. In-practice sensitivity across published implementations ranges from 87% to 100%, and specificity from 60% to 91% — both wider ranges than pivotal trial figures suggest. No standardized post-market gradability reporting requirement currently exists.

Equity and Health Disparity Evidence

Two bodies of evidence address the equity implications of autonomous AI DR screening: the ACCESS trial's within-trial equity findings, and a subsequent retrospective analysis of specialist access patterns at Johns Hopkins.

The ACCESS trial cohort — 35% Black, 47% Medicaid-covered — achieved 100% diabetic eye exam completion in the AI arm with no documented racial or ethnic disparity in that completion rate. The trial demonstrated care-gap closure across demographic subgroups within the cohort. The generalizability caveat noted earlier applies: this was a pediatric cohort under investigational conditions, not an adult autonomous deployment.

A separate retrospective analysis by Leong, Wolf, Channa et al. (npj Digital Medicine, published 05 March 2026; version of record 13 April 2026; DOI 10.1038/s41746-026-02460-5) examined 3,745 patients referred to the Wilmer Eye Institute at Johns Hopkins via AI pathway or standard referral pathway between August 2020 and September 2022. After propensity-score weighting on social determinants of health and clinical variables, AI-assisted screening was associated with increased specialist eye care presentation by African-American patients (OR 1.15, 95% CI 1.02–1.29, p=0.022). No significant difference was observed for Medicaid coverage (OR 0.97, p=0.245).
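
A reported odds ratio and its confidence interval can be checked for internal consistency against the reported p-value by back-calculating the standard error on the log scale. This is a generic approximation, assuming a normal distribution for the log odds ratio; it is not the authors' analysis, and the function name `p_from_ratio_ci` is illustrative.

```python
from math import log
from statistics import NormalDist


def p_from_ratio_ci(estimate: float, lower: float, upper: float,
                    conf: float = 0.95) -> float:
    """Approximate two-sided p-value from a ratio estimate and its CI."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # CI width on the log scale spans 2 * z_crit standard errors.
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(estimate) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Odds ratio and 95% CI for African-American patients, as reported in the text.
p = p_from_ratio_ci(1.15, 1.02, 1.29)
print(f"Approximate p-value: {p:.3f}")
```

The result is close to 0.02, consistent with the reported p=0.022 given that the published estimate and interval are rounded to two decimals.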

FQHCs represent the active equity deployment frontier. DRES-POCAI is specifically designed to test whether autonomous AI DR screening can close care gaps in federally qualified health center populations — the underserved communities where the screening gap is most acute. Results from that trial, once published, will provide the most directly applicable equity evidence for the FQHC context.

Known Limitations of the Current Evidence Base

Clinicians, researchers, and procurement professionals should account for the following limitations when evaluating the autonomous AI DR screening evidence base:

  • Single pivotal trial per device: Each of the three FDA-cleared devices was authorized on the basis of one pivotal trial rather than independently replicated studies (the LumineticsCore pivotal trial itself spanned 10 US primary care sites). External validation in diverse health system settings remains limited.
  • No head-to-head device comparison: No published trial has compared LumineticsCore, EyeArt, and AEYE-DS on the same patient population. Cross-study comparisons are non-equivalent due to different populations, reference standards, and enrichment strategies.
  • No FDA pediatric label for any device: All three devices are labeled for adults. The ACCESS trial's use of LumineticsCore in ages 8–21 was investigational with retina specialist overread. Pediatric autonomous use is not currently supported by FDA authorization.
  • Vendor involvement as a heterogeneity driver: The Wang et al. meta-analysis found that non-vendor-involved studies reported approximately 7 percentage points higher sensitivity and approximately 10 percentage points lower specificity than vendor-involved studies. Device-specific performance figures must be evaluated with knowledge of study sponsorship.
  • Observational equity data: The Leong et al. retrospective analysis is single-institution, non-causal, and explicitly exploratory. No prospective equity-focused RCT results are yet available for US adult populations in primary care or FQHC settings.
  • DRES-POCAI results not yet confirmed published: As of Q2 2026, only the trial protocol has been confirmed published in JAMA Network Open 2025. Full results should be verified before citing.
  • No incidental finding detection: Current autonomous AI models are not trained to detect incidental retinal findings — such as choroidal melanoma or retinal detachment — that a human tele-retinal grader would identify. This represents a defined liability gap in the autonomous workflow.
  • Off-the-shelf generative AI models do not meet FDA thresholds: An evaluation of GPT-4o, GPT-4o-mini, Grok, and Gemini on the Messidor-2 dataset (level 3 reference standard) found the highest AUC was 0.83, compared to retina specialist AUC of approximately 0.94. None of the tested models met FDA regulatory thresholds for autonomous screening (>85% sensitivity, >82.5% specificity against a level 1 prognostic standard in a prospective trial). Performance on a level 3 reference standard typically overestimates level 1 performance by 10–30 percentage points.

Active Trials, Open Evidence Questions, and Reimbursement Context

Open Evidence Questions

  • DRES-POCAI results: The FQHC-focused RCT remains the most important pending evidence for equity-focused deployment decisions. Results status should be verified against the primary JAMA Network Open record before citing.
  • Head-to-head device trials: No trial comparing LumineticsCore, EyeArt, and AEYE-DS on the same patient population has been published or, to the knowledge of this record's sources, registered. Comparative adoption decisions currently must rely on cross-study inference with all the methodological limitations that entails.
  • Post-market gradability surveillance: The Wang et al. meta-analysis explicitly calls for standardized gradability reporting requirements. No such post-market standard currently exists. Health systems cannot systematically compare ungradable image rates across institutions or devices without a reporting framework.
  • Pediatric authorization: The ACCESS trial demonstrated proof-of-concept care-gap closure in youth, but no device holds FDA authorization for autonomous use in patients under 22. A pediatric indication would require a dedicated regulatory submission with a prospective pediatric trial.

Reimbursement and Utilization Context

All three FDA-cleared devices bill under CPT 92229, which covers point-of-care autonomous AI imaging for DR screening. National Medicare payment for CPT 92229 was $40.28 in 2024. LumineticsCore was the first of the three to secure Medicare and Medicaid reimbursement and to qualify for HEDIS care gap closure and MIPS quality measure credit.

A review of CPT 92229 utilization in a database comprising approximately 40% of all US CMS claims identified 15,097 total claims from 2021 through 2023, increasing year-over-year. Most claims originated from ZIP codes associated with metropolitan academic centers; top-volume cities included Philadelphia, Dayton (Ohio), and New Orleans. These figures are from a claims sample database and should be treated as indicative of trends rather than definitive national utilization counts. LumineticsCore's manufacturer reports use in over 1,000 US clinics, a figure that has not been independently verified.

On the question of liability: per AMA guidance and expert legal opinion cited in the clinical literature, liability for autonomous AI misdiagnosis in this context is held to fall on the device manufacturer rather than the ordering clinician or health system. This is a materially different liability allocation than exists for AI-assisted (non-autonomous) tools where a physician makes the final interpretive decision. Clinicians and health systems should verify this liability framework with their legal counsel before deployment, as this area of law continues to develop.
