The ED Chest X-Ray Challenge: Volume, Availability Gaps, and Triage Scale Limitations

Chest X-ray is one of the highest-volume imaging studies ordered in emergency departments. It is frequently the first imaging step for patients presenting with dyspnea, chest pain, fever, or trauma — conditions where time-to-diagnosis directly affects clinical outcomes. The volume pressure is compounded by a structural problem: radiologist availability is uneven across the day, and off-hours coverage in many health systems is provided by a reduced number of on-call readers handling a compressed queue.

Conventional triage systems such as the Korean Triage and Acuity Scale (KTAS) or the Emergency Severity Index (ESI) stratify patients using presenting symptoms, vital signs, and clinical assessment. These scales were not designed to incorporate imaging findings. A patient may be assigned a moderate acuity score on presentation and then found to have a significant radiographic abnormality — but the imaging result arrives after the triage decision has already shaped bed assignment and care sequencing.

This creates two distinct unmet needs. First, radiologists need a mechanism to ensure that the most urgent studies in a high-volume queue are read first — a workflow efficiency problem. Second, emergency physicians need imaging information integrated into triage decision-making earlier in the patient encounter — a clinical decision support problem. These are related but fundamentally different requirements, and the AI tools designed to address them are not interchangeable.

Two Distinct AI Roles: Worklist Prioritization vs. Adverse Event Prediction

The most consequential planning error in evaluating AI chest X-ray tools for the ED is treating these two functions as a single capability. They are not. Each role has its own evidence base, its own FDA-cleared products, and its own set of deployment requirements.

Figure: AI chest X-ray triage in the ED operates across two independent clinical functions (worklist prioritization for the radiologist, adverse event risk scoring for ED clinical decision support) with separate evidence bases and deployment requirements.

Role 1 — Worklist Prioritization: The AI system analyzes incoming chest X-rays and reorders the radiologist reading queue by urgency. Studies flagged as likely to contain critical or urgent findings are elevated to the top of the worklist. The radiologist still reads and reports every study; the AI changes the sequence, not the clinical decision. This is the Computer-Aided Triage (CADt) model.

Role 2 — Adverse Event Prediction: The AI system generates a risk score from the chest X-ray image that predicts the probability of a major adverse cardiopulmonary event — hospitalization, ED revisit, or death. This score is intended to inform the emergency physician's triage decision, functioning as a supplementary input alongside conventional symptom-based acuity scales. The end-user is the ED clinician, not the radiologist.

The two clinical roles of AI chest X-ray triage in the ED differ in end-user, function, integration point, and supporting evidence.
Dimension | Role 1: Worklist Prioritization | Role 2: Adverse Event Prediction
Primary end-user | Radiologist | Emergency physician
Function | Reorders reading queue by urgency | Generates risk score for triage decision support
Clinical output | Study prioritization signal | Probability of major adverse cardiopulmonary event
Integration point | PACS / radiology worklist | ED triage workflow / EHR
Key metric | Turnaround time (TAT) reduction | AUC vs. conventional triage scale
Primary evidence | Prospective real-world study (n=20,944) | Retrospective cohort study (n=3,576)
FDA-cleared example | Lunit INSIGHT CXR Triage (K211733) | No dedicated clearance for adverse event prediction role identified

Evidence for Role 1 — Worklist Prioritization: Real-World TAT Reduction and Its Limits

Figure: AI worklist triage reorders the radiologist reading queue based on detected urgency signals; the radiologist reads every study, but the sequence changes.

The strongest prospective evidence for AI CXR worklist prioritization comes from a real-world study at Changi General Hospital in Singapore involving 20,944 chest X-rays evaluated using Lunit INSIGHT CXR Triage. The study reported a 77% reduction in turnaround time across all patient subgroups. For urgent CXRs, the system achieved 82% sensitivity at 99% specificity. For normal CXRs, sensitivity reached 89% with 93% specificity. Subgroup analyses across age groups, genders, and four ethnic groups showed consistently high accuracy, with sensitivity, specificity, and AUC above 84% in each subgroup — a meaningful finding for health systems serving diverse patient populations.

The study used an observational design with 43 radiologists blinded to AI results, assessing a three-tier classification of normal, non-urgent, and urgent. The prospective design is a significant methodological strength relative to the retrospective studies that dominate this literature.

Why Workflow Parameters Predict Time Savings Better Than Model AUC

A 77% TAT reduction is a striking figure, but its applicability to any given ED depends heavily on local workflow conditions, not on the AI model's performance characteristics alone. Research from the FDA's Center for Devices and Radiological Health and the University of Chicago, which applied the QuCAD computational modeling framework to CT pulmonary angiography exams for pulmonary embolism, offers a way to reason about this dependency.

The QuCAD framework demonstrates that the primary determinants of AI triage time savings are not the model's AUC but rather: the number of radiologists on shift, the rate at which new exams arrive (inter-arrival time), how long radiologists take to read each study, and the prevalence of the target condition in the imaging population. When radiologist capacity is high relative to exam volume — as is more likely off-hours when both exam volume and staffing are reduced — AI-driven queue reordering produces smaller absolute gains. Adding a single radiologist to a shift can reduce AI-driven TAT savings to near zero.
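
The dependence on staffing and arrival rate can be illustrated with a toy queueing simulation. The sketch below is not the QuCAD framework; it is a minimal multi-reader queue that assumes exponential inter-arrival and read times and non-preemptive priority for AI-flagged exams. The function names and every parameter value (prevalence, sensitivity, specificity, read time) are hypothetical inputs chosen for illustration. It compares how long truly urgent exams wait under first-in-first-out reading and under AI-flag priority.

```python
import heapq
import random


def mean_urgent_wait(n_exams, n_radiologists, mean_interarrival, mean_read,
                     prevalence, sensitivity, specificity, use_ai, seed=0):
    """Mean minutes that truly urgent exams wait before a reader opens them."""
    rng = random.Random(seed)
    # Pre-generate the exam stream. It does not depend on use_ai, so FIFO and
    # AI-priority runs with the same seed see identical exams.
    exams, t = [], 0.0
    for _ in range(n_exams):
        t += rng.expovariate(1.0 / mean_interarrival)
        urgent = rng.random() < prevalence
        p_flag = sensitivity if urgent else 1.0 - specificity
        flagged = rng.random() < p_flag
        exams.append((t, urgent, flagged, rng.expovariate(1.0 / mean_read)))

    free_at = [0.0] * n_radiologists  # min-heap: when each reader is next free
    waiting = []                      # min-heap of (priority, arrival, index)
    waits, i = [], 0
    while i < n_exams or waiting:
        now = heapq.heappop(free_at)
        if not waiting:
            now = max(now, exams[i][0])  # idle reader waits for next arrival
        # Admit every exam that has arrived by the time this reader is free.
        while i < n_exams and exams[i][0] <= now:
            arrival, _, flagged, _ = exams[i]
            priority = 0 if (use_ai and flagged) else 1
            heapq.heappush(waiting, (priority, arrival, i))
            i += 1
        _, arrival, idx = heapq.heappop(waiting)
        if exams[idx][1]:  # record waits for truly urgent exams only
            waits.append(now - arrival)
        heapq.heappush(free_at, now + exams[idx][3])
    return sum(waits) / len(waits) if waits else 0.0


def ai_saving(n_radiologists, **kwargs):
    """Minutes of urgent-exam wait saved by AI priority versus FIFO."""
    fifo = mean_urgent_wait(n_radiologists=n_radiologists, use_ai=False, **kwargs)
    ai = mean_urgent_wait(n_radiologists=n_radiologists, use_ai=True, **kwargs)
    return fifo - ai


if __name__ == "__main__":
    # Hypothetical workload: one exam every 5 minutes, 4-minute mean read time.
    params = dict(n_exams=20000, mean_interarrival=5.0, mean_read=4.0,
                  prevalence=0.10, sensitivity=0.90, specificity=0.90)
    for readers in (1, 2):
        print(readers, "reader(s): saving", round(ai_saving(readers, **params), 1), "min")
```

With these assumed inputs, a single reader is busy about 80% of the time and a backlog forms, so reordering has room to help. Adding a second reader to the same exam stream leaves little queue to reorder, and the saving shrinks accordingly, even though the simulated AI accuracy is unchanged.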

This has a direct implication for ED deployment planning: the same AI tool may deliver meaningful TAT benefit during peak daytime hours when exam volume is high and the queue is genuinely congested, while producing minimal benefit during overnight shifts when the queue is short enough that reordering changes little, even with reduced staffing. Evaluating AI CXR triage on average TAT across all hours will obscure this asymmetry.

Broader evidence on AI efficiency in medical imaging reinforces caution. A 2024 systematic review and meta-analysis in npj Digital Medicine examined 48 real-world clinical AI imaging studies and found that while 67% of studies reporting time-for-task outcomes showed reductions, three separate meta-analyses of 12 comparable studies showed no statistically significant efficiency effects after AI implementation, with high heterogeneity across studies. More than half of the included studies had relevant conflicts of interest. The review's conclusion — that robust inferences about real-world AI efficiency benefits in imaging remain limited — is important context for any single-center TAT finding, including the Changi General Hospital result.

Evidence for Role 2 — Adverse Event Prediction: AI vs. KTAS for Major Cardiopulmonary Events

The evidence for using AI chest X-ray analysis to predict major adverse cardiopulmonary events in ED patients comes primarily from a 2025 study published in Korean Journal of Radiology by Rhee et al. at Seoul National University Hospital. The study enrolled 3,576 ED patients presenting with acute cardiopulmonary symptoms and assessed whether AI analysis of their chest X-rays could predict a composite endpoint of hospitalization, ED revisit within 30 days, or death.

The study's primary finding was a substantial AUC advantage for AI over KTAS: 0.795 versus 0.610 (P<0.001). At a 15% score threshold, the AI system achieved 92.4% sensitivity and 38.4% specificity for predicting major adverse events. Critically, AI results remained an independent predictor of adverse events after statistical adjustment for KTAS score, with an adjusted odds ratio of 6.913 for scores at or above the 15% threshold. A combination model incorporating both AI and KTAS achieved AUC 0.799, outperforming KTAS alone but similar to AI alone. Taken together, these results suggest that the AI score contributes information that symptom-based triage does not capture, while KTAS adds little discrimination beyond the AI score for this composite endpoint.

Performance was consistent across fixed and portable scanner subgroups, and in patients presenting with fever or dyspnea — a finding relevant to EDs where portable imaging is common and patient acuity varies widely.

Performance comparison for predicting major adverse cardiopulmonary events in 3,576 ED patients (Rhee et al., Seoul National University Hospital, KJR 2025). Study used Lunit INSIGHT CXR v3.1.4.1, not the dedicated Triage product.
Metric | AI (Lunit INSIGHT CXR) | KTAS | Combination Model
AUC | 0.795 | 0.610 | 0.799
Sensitivity (at 15% threshold) | 92.4% | - | -
Specificity (at 15% threshold) | 38.4% | - | -
Independent predictor after KTAS adjustment | Yes (adjusted OR 6.913) | Reference | -
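
To see what the 15% threshold figures mean operationally, the sensitivity and specificity above can be turned into expected alert counts. The sketch below is plain confusion-matrix arithmetic. The function name `alert_burden` and the prevalence values are hypothetical; the source does not state the study's adverse event rate, so local prevalence has to be supplied by the site.

```python
def alert_burden(n_patients, prevalence, sensitivity, specificity):
    """Expected alert counts for a cohort, given test characteristics."""
    events = n_patients * prevalence          # patients who have an adverse event
    non_events = n_patients - events
    true_pos = sensitivity * events           # correctly flagged
    false_pos = (1.0 - specificity) * non_events  # flagged without an event
    alerts = true_pos + false_pos
    return {
        "alerts": alerts,
        "true_positives": true_pos,
        "false_positives": false_pos,
        "missed_events": events - true_pos,
        # Positive predictive value: share of alerts that are true events.
        "ppv": true_pos / alerts if alerts else float("nan"),
    }


if __name__ == "__main__":
    # Sensitivity and specificity at the 15% threshold come from the table above.
    # The prevalence values are assumed, for illustration only.
    for prevalence in (0.10, 0.30, 0.50):
        result = alert_burden(1000, prevalence, sensitivity=0.924, specificity=0.384)
        print(prevalence, {k: round(v, 3) for k, v in result.items()})
```

Because specificity is fixed at 38.4%, the number of false alerts per 1,000 patients rises as local adverse event prevalence falls, which is why the same threshold can be tolerable in one ED and burdensome in another.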

The study's authors draw an explicit boundary around these findings: AI chest X-ray analysis cannot replace conventional triage because critical emergencies such as acute coronary syndrome may present with a normal or near-normal chest X-ray. The AI score reflects what is visible on the image; it cannot detect conditions that are not radiographically apparent. The authors position AI as a supplementary tool that adds predictive value on top of — not instead of — symptom-based assessment.

FDA Regulatory Context: CADt Clearances with ED-Specific Indications

Two vendors hold FDA authorizations with direct relevance to AI chest X-ray triage in the ED. Understanding what each clearance specifically covers — and what it does not — is essential for procurement and deployment planning. For readers unfamiliar with the regulatory pathway, the 510(k) substantial equivalence framework for AI-enabled SaMD is explained in the site glossary. For the broader landscape of FDA-cleared radiology AI devices, see FDA-Cleared Radiology AI: Mapping the Landscape and the Clinical Evidence Gap.

Lunit INSIGHT CXR Triage — K211733

Lunit received FDA 510(k) clearance for Lunit INSIGHT CXR Triage in November 2021 under submission number K211733. The clearance specifically covers the sorting of emergency cases found on chest X-rays — making it one of the few chest X-ray AI tools with an ED-specific indication rather than a general radiology reading assistance indication. The product was introduced to the US market at RSNA 2021 in partnership with GE Healthcare, Philips, and FujiFilm.

Qure.ai qXR-Detect — 26 FDA-Cleared Indications

Qure.ai's qXR-Detect holds 26 FDA-cleared indications as of February 2026, including triage and notification indications for pneumothorax and pleural effusion — findings with direct ED urgency implications. The product is described as a CADe solution supporting emergency room physicians, family medicine practitioners, and radiologists across six anatomical regions. Notably, qXR-Detect is currently the only chest X-ray CADe device cleared with a Predetermined Change Control Plan (PCCP), which enables algorithm updates within defined parameters without requiring a new 510(k) submission — a meaningful consideration for long-term software maintenance in clinical deployment.

Evidence Quality and Study Limitations

The current evidence base for AI CXR triage in the ED has meaningful strengths — particularly the prospective design and scale of the Changi General Hospital study — but carries significant limitations that any deployment decision must account for.

A 2024 scoping review of 29 US ED AI triage studies found that all 29 studies used retrospective data. None of the studies in that review specifically examined chest X-ray AI as the triage input — they covered clinical data ML models broadly — but the pattern of retrospective-only validation is consistent with the wider AI ED triage literature. Prospective validation in diverse, real-world ED populations remains limited.

  • No RCT evidence demonstrates improved patient-level outcomes — mortality, ED length of stay — from AI CXR triage in the emergency department specifically. This gap must be explicitly acknowledged in any deployment justification.
  • Single-center study concentration limits generalizability. The Changi General Hospital study is a prospective exception, but it represents one institution in one health system.
  • Demographic bias risk: documented racial and gender performance disparities have been identified on public CXR datasets. The Changi study's subgroup analysis across ethnic groups is encouraging but does not resolve the broader concern across different training populations.
  • Portable vs. fixed scanner performance gap: while the KJR 2025 study found consistent performance across scanner types, this cannot be assumed to generalize across all AI CXR tools or all ED environments.
  • High false-positive burden at high-sensitivity thresholds: achieving 92.4% sensitivity for adverse event prediction at a 15% score threshold comes with 38.4% specificity, meaning roughly 61.6% of patients who do not go on to have an adverse event are still flagged. Alert fatigue and unnecessary clinical escalation are real operational risks.
  • Disease prevalence effects: the magnitude of workflow benefit from worklist prioritization is sensitive to the prevalence of urgent findings in the local imaging population. A tool calibrated for a high-acuity tertiary ED may perform differently in a community ED with different case mix.
  • Meta-analytic null finding: despite 67% of individual studies reporting efficiency gains, three meta-analyses in the npj Digital Medicine 2024 review found no statistically significant effects across comparable studies, with high heterogeneity. Individual study results should be interpreted against this aggregate context.

Deployment Requirements: What Real-World ED Integration Actually Demands

FDA clearance and a favorable single-center study are not sufficient conditions for successful ED deployment. The npj Health Systems 2026 overview of AI chest radiography identifies PACS integration and EHR workflow connection as persistent barriers to routine clinical deployment, alongside generalizability concerns and black-box interpretability. Real-world deployment requires addressing each of these conditions before go-live.

Deployment requirements differ by AI role. Integration planning must be role-specific.
Deployment Requirement | Role 1: Worklist Prioritization | Role 2: Adverse Event Prediction
Primary integration point | PACS / radiology information system | ED triage workflow / EHR
Alert delivery mechanism | Radiologist worklist reordering | ED physician notification or EHR flag
Alert fatigue risk | Moderate (flagged studies at top of queue) | High (38.4% specificity at the 15% threshold = many false alerts)
Site calibration needed | Threshold tuning by exam volume, staffing | Threshold tuning by local adverse event prevalence
End-user training | Radiologists and PACS administrators | ED nurses, physicians, and triage staff
Post-deployment monitoring | TAT tracking, override rate, drift surveillance | Alert response rate, clinical outcome tracking

PACS Integration and Technical Barriers

Connecting an AI CXR triage tool to an existing radiology workflow requires the AI output to be routed into the PACS worklist in real time. This is technically non-trivial: PACS systems vary considerably in their support for third-party AI integrations, and the latency between image acquisition and AI output must be low enough to influence reading order before the radiologist has already begun working through the queue manually. Health systems with older PACS infrastructure or fragmented imaging environments face higher integration overhead.

Alert Design and Threshold Calibration

The operating threshold for any AI CXR triage tool is a deployment decision, not a fixed product specification. Setting a high sensitivity threshold captures more urgent cases but generates more false positives. In a worklist prioritization context, over-flagging degrades the radiologist's trust in the system and may lead to de facto ignoring of priority signals. In an adverse event prediction context, high alert rates create alert fatigue for ED nurses and physicians and may prompt inappropriate escalation for patients who do not require it. Site-specific threshold calibration — based on local disease prevalence, staffing patterns, and clinical risk tolerance — is required before routine use.
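
One way to make threshold calibration concrete is to pick the operating point from a local validation set rather than from the vendor default. The sketch below assumes a site has AI scores and confirmed outcomes for a sample of its own patients; `pick_threshold`, `scores`, and `outcomes` are hypothetical names, and the data in the usage example is made up. It returns the highest threshold that still meets a target sensitivity, together with the specificity that results.

```python
def pick_threshold(scores, outcomes, target_sensitivity):
    """Highest score threshold whose sensitivity meets the target.

    scores: AI risk scores from a local validation sample.
    outcomes: 1 if the patient had the target finding or event, else 0.
    Returns (threshold, sensitivity, specificity); a case is flagged when
    its score is greater than or equal to the threshold.
    """
    positives = sum(1 for y in outcomes if y)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("validation sample needs both outcome classes")

    # Walk candidate thresholds from strictest to most permissive.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, outcomes) if y and s >= threshold)
        false_pos = sum(1 for s, y in zip(scores, outcomes) if not y and s >= threshold)
        sensitivity = true_pos / positives
        if sensitivity >= target_sensitivity:
            return threshold, sensitivity, 1.0 - false_pos / negatives
    return None  # only reachable if target_sensitivity exceeds 1.0


if __name__ == "__main__":
    # Made-up validation sample, for illustration only.
    scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]
    outcomes = [0, 0, 1, 0, 1, 0, 1, 1]
    for target in (0.75, 1.0):
        print(target, pick_threshold(scores, outcomes, target))
```

Raising the target sensitivity pushes the threshold down and the specificity with it, which is the trade-off a site has to settle against its own staffing and risk tolerance. A real calibration would use a much larger sample and report confidence intervals.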

Site-Specific Workflow Calibration

The workflow parameter modeling framework described earlier has a direct operational implication: health systems should model expected TAT benefit under their own staffing and volume conditions before committing to a worklist prioritization deployment. A system with two overnight on-call radiologists covering a low-volume ED may see minimal benefit from AI queue reordering. The same tool at a high-volume academic ED with daytime imaging peaks may produce substantial gains during those peak periods. Deployment planning that assumes the Changi General Hospital TAT result will replicate locally is a planning risk.

Post-Deployment Monitoring

AI model performance can drift as patient population characteristics, imaging equipment, and clinical practices change over time. Post-deployment monitoring should track AI alert accuracy against confirmed clinical outcomes, radiologist override rates (a proxy for trust calibration), and any changes in case mix that may shift the tool's operating conditions outside the range of its training data. The presence of a PCCP in Qure.ai's clearance reflects FDA recognition that deployed AI SaMD is expected to receive ongoing algorithm updates, so validation is a continuing process rather than a one-time event.
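
As one illustration of override-rate tracking, the sketch below keeps a rolling window of recent AI-flagged studies and signals when the share of radiologist overrides moves away from a baseline. The class name, window size, baseline, and tolerance are all hypothetical; a production monitor would more likely use a formal control chart and would also track accuracy against confirmed outcomes.

```python
from collections import deque


class OverrideMonitor:
    """Rolling override-rate check for AI-flagged studies (illustrative only)."""

    def __init__(self, window, baseline_rate, tolerance):
        self.events = deque(maxlen=window)  # most recent override decisions
        self.baseline_rate = baseline_rate  # override rate seen at go-live
        self.tolerance = tolerance          # allowed absolute deviation

    def rate(self):
        """Override rate over the current window (0.0 if nothing recorded)."""
        return sum(self.events) / len(self.events) if self.events else 0.0

    def record(self, overridden):
        """Record one decision; return True if the rate is out of tolerance."""
        self.events.append(bool(overridden))
        if len(self.events) < self.events.maxlen:
            return False  # not enough data to judge yet
        return abs(self.rate() - self.baseline_rate) > self.tolerance


if __name__ == "__main__":
    monitor = OverrideMonitor(window=10, baseline_rate=0.10, tolerance=0.15)
    decisions = [True] + [False] * 9 + [True, True, True]  # made-up sequence
    for n, overridden in enumerate(decisions, start=1):
        if monitor.record(overridden):
            print("review needed after decision", n, "rate", monitor.rate())
```

A rising override rate does not by itself show that the model has degraded; it may equally reflect a change in case mix or in reader behavior, so a signal like this is a prompt for review rather than a verdict.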

Clinical Takeaways for ED and Radiology Decision-Makers

  • Establish which role the tool fills before evaluation. Worklist prioritization and adverse event prediction are separate functions with separate evidence bases, separate FDA-cleared products, and separate integration requirements. The pre-deployment question is not 'which AI CXR tool is best' but 'which clinical problem are we solving and which role addresses it.'
  • The strongest prospective evidence for worklist prioritization (77% TAT reduction, n=20,944) comes from a single institution in Singapore. It is the best available real-world data, but its applicability depends on local workflow parameters — radiologist count, exam volume, and disease prevalence — not on the AI model's AUC alone.
  • TAT savings from AI worklist prioritization are likely to be largest during high-volume daytime hours and minimal during off-hours when queue congestion is lower. Deployment planning should account for this asymmetry explicitly.
  • For adverse event prediction, AI CXR analysis significantly outperformed KTAS (AUC 0.795 vs. 0.610) as an independent predictor of major cardiopulmonary events in a 3,576-patient ED study. AI adds predictive information that KTAS does not capture — but it cannot substitute for conventional triage because CXR-negative emergencies exist.
  • The KJR 2025 adverse event prediction evidence used Lunit INSIGHT CXR v3.1.4.1 — the general interpretation tool — not Lunit INSIGHT CXR Triage (K211733), the FDA-cleared worklist prioritization product. These are distinct products and the evidence supporting each must be evaluated independently.
  • FDA clearance confirms regulatory authorization for a specific intended use. It does not confirm reimbursement coverage, real-world US ED deployment outcomes, or performance equivalence across health systems with different patient populations and imaging environments.
  • No RCT evidence currently demonstrates that AI CXR triage improves patient-level outcomes — mortality or ED length of stay — in the emergency department. The evidence supports workflow efficiency and predictive accuracy under specific conditions; patient outcome benefit remains unproven.
  • Successful deployment requires site-specific threshold calibration, PACS or EHR integration planning appropriate to the tool's role, clinician training for the relevant end-users (radiologists for Role 1, ED physicians and nurses for Role 2), and ongoing post-deployment performance monitoring.