AI Model Spots Hidden Heart Failure from ECGs

Important notice: This news article reports on an unedited version of the paper, which has been accepted and is awaiting final editing. The study should therefore not be regarded as conclusive or treated as established information.

A deep learning model detected heart failure from ECGs more accurately when trained on NT-proBNP-enhanced labels, identifying missed cases and preserved ejection fraction patterns in prospective and external validation cohorts drawn from routine clinical data.

In an article published in the Nature Portfolio journal npj Digital Medicine, researchers from Akershus University Hospital in Norway describe a deep learning model that detects heart failure (HF) directly from electrocardiograms (ECGs). By combining routine diagnostic codes with a blood biomarker during training, the team constructed a labelling strategy that improves detection accuracy across the full clinical spectrum of HF.

A Condition Hidden in Plain Sight

HF affects an estimated 1-2% of the global adult population and is linked to high mortality and substantial societal costs. Its symptoms (breathlessness, fatigue, and fluid retention) frequently mimic those of respiratory and other non-cardiac conditions, making accurate diagnosis difficult. Confirming HF typically demands echocardiography, invasive procedures, or specialist biomarker testing, all resources that remain inaccessible in many healthcare settings.

The ECG, by contrast, is cheap, non-invasive, and available even in resource-scarce environments. However, ECGs have historically offered limited utility in HF diagnosis, used mainly to identify secondary causes like atrial fibrillation (AF) rather than HF itself. A further obstacle is the absence of a reliable diagnostic ground truth.

Administrative International Classification of Diseases, 10th revision (ICD-10) diagnostic codes, the most readily available data source, are known to have low validity for HF, as patients are frequently miscoded or missed entirely. Training a supervised deep learning model on such noisy labels risks teaching it to replicate those same diagnostic errors.

To address this, the researchers introduced a pragmatic labelling strategy that cross-references ICD-10 codes with N-terminal pro-B-type natriuretic peptide (NT-proBNP), a well-established blood biomarker for HF. NT-proBNP levels below 125 ng/L effectively rule out HF, while levels above 1,000 ng/L provide strong confirmation. Together, the two thresholds act as a two-sided filter that cleans both the positive and the negative training labels.
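The two-sided filter can be illustrated with a minimal Python sketch. The thresholds (125 and 1,000 ng/L) come from the article, but the exact rule for combining the ICD-10 code with the biomarker is an assumption made here for illustration, as are the function name `pragmatic_label` and its parameters; the paper's actual inclusion logic may differ.

```python
from typing import Optional

# Thresholds quoted in the article (ng/L).
RULE_OUT_NG_L = 125.0
RULE_IN_NG_L = 1000.0


def pragmatic_label(has_hf_icd10: bool, nt_probnp_ng_l: Optional[float]) -> Optional[int]:
    """Return 1 (HF), 0 (non-HF), or None (too uncertain, excluded).

    ASSUMPTION: a positive label needs both an HF ICD-10 code and a high
    NT-proBNP value; a negative label needs no HF code and a low value.
    Anything else is dropped. This combination rule is hypothetical.
    """
    if nt_probnp_ng_l is None:
        return None  # no biomarker measurement, label cannot be checked
    if has_hf_icd10 and nt_probnp_ng_l > RULE_IN_NG_L:
        return 1
    if not has_hf_icd10 and nt_probnp_ng_l < RULE_OUT_NG_L:
        return 0
    return None  # code and biomarker disagree, or value is in the grey zone


if __name__ == "__main__":
    for coded, value in [(True, 1500.0), (False, 80.0), (True, 300.0), (False, None)]:
        print(coded, value, pragmatic_label(coded, value))
```

Under this sketch, ECGs whose code and biomarker disagree are excluded rather than relabelled, which trades dataset size for label certainty.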

Building the Model, Cleaning the Labels

The development dataset comprised 25,300 patients, drawn from ECG recordings collected at Akershus University Hospital between 2016 and 2022. After applying the pragmatic labelling strategy, 47,034 ECGs remained, with 10,692 (roughly 23%) labelled as HF-positive. Three deep learning architectures were evaluated using five-fold cross-validation.
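The article does not say how the five folds were constructed. Because the dataset contains more ECGs than patients, one common precaution is to keep all recordings from the same patient in the same fold, so that a patient never appears in both training and validation data. The sketch below shows that idea under this assumption; `patient_grouped_folds` is a hypothetical helper, not the authors' code.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def patient_grouped_folds(patient_ids: Sequence[Hashable], n_folds: int = 5) -> List[List[int]]:
    """Split ECG indices into folds so that no patient spans two folds.

    Patients are assigned to folds round-robin in order of first appearance.
    ASSUMPTION: patient-level grouping is illustrative; the source only
    states that five-fold cross-validation was used.
    """
    fold_of_patient = {}
    folds = defaultdict(list)
    for ecg_index, patient in enumerate(patient_ids):
        if patient not in fold_of_patient:
            fold_of_patient[patient] = len(fold_of_patient) % n_folds
        folds[fold_of_patient[patient]].append(ecg_index)
    return [folds[k] for k in range(n_folds)]


if __name__ == "__main__":
    # Toy example: eight ECGs from six patients.
    ids = ["a", "a", "b", "c", "b", "d", "e", "f"]
    for k, members in enumerate(patient_grouped_folds(ids)):
        print(f"fold {k}: ECG indices {members}")
```

Each fold in turn would serve as the validation set while the other four are used for training.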

The InceptionTime model consistently outperformed the other two architectures, and the final model was assembled as an ensemble of five independently trained InceptionTime networks, with predictions averaged across all five members to improve stability and generalisation.
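Averaging across ensemble members can be sketched as follows. The example assumes each of the five networks outputs one HF probability per ECG and that the ensemble takes a plain arithmetic mean; the function name `ensemble_predict` and the toy numbers are illustrative and not taken from the study.

```python
from typing import List, Sequence


def ensemble_predict(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-ECG probabilities across ensemble members.

    member_probs[m][i] is the HF probability that member m assigns to ECG i.
    """
    if not member_probs:
        raise ValueError("at least one ensemble member is required")
    n_ecgs = len(member_probs[0])
    if any(len(probs) != n_ecgs for probs in member_probs):
        raise ValueError("all members must score the same number of ECGs")
    n_members = len(member_probs)
    return [sum(probs[i] for probs in member_probs) / n_members for i in range(n_ecgs)]


if __name__ == "__main__":
    # Five hypothetical members scoring two ECGs.
    scores = [[0.2, 0.9], [0.4, 0.7], [0.3, 0.8], [0.1, 1.0], [0.5, 0.6]]
    print(ensemble_predict(scores))
```

Averaging dampens the run-to-run variation of individual networks, which is the stability benefit the article refers to.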

The model was evaluated against three progressively stricter labelling strategies. The first used ICD-10 codes alone, the second incorporated age- and sex-adjusted NT-proBNP thresholds to validate diagnoses, and the third applied strict cut-offs (below 125 ng/L for non-HF and above 1,000 ng/L for confirmed HF) to produce the highest-certainty labels.

On the prospective test cohort of 43,727 patients, the model achieved area under the curve (AUC) values of 0.86, 0.91, and 0.96 under the three strategies, respectively. The model trained with NT-proBNP-enhanced labels significantly outperformed the ICD-10-only model under strategies two and three, confirming that cleaner training labels translate directly into better detection.
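For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen HF-positive ECG receives a higher score than a randomly chosen HF-negative one, with ties counted as half. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy data are illustrative only and are not the study's evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Quadratic in the number of samples, so suited to small examples only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))  # prints 0.75
```

In the toy data, three of the four positive-negative pairs are ranked correctly, giving 0.75. On that scale, 0.5 corresponds to chance and 1.0 to perfect separation.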

Validated Across Populations and Phenotypes

External validation on the MIMIC-IV dataset, comprising 161,352 patients from a United States (US) hospital system, yielded AUC values of 0.87, 0.90, and 0.96, closely mirroring the prospective results and demonstrating strong generalisability across different clinical and demographic settings.

Beyond broad HF detection, the model showed meaningful sensitivity to cardiac function subtypes, which are historically the hardest to identify. Among patients with preserved ejection fraction (EF) above 50%, the model distinguished normal diastolic function from grade 2 or 3 dysfunction with an AUC of 0.800, and from all other diastolic grades with an AUC of 0.828. It also responded systematically to the H2FPEF risk score, assigning progressively higher predicted risk to patients in higher score categories.

A qualitative review by a senior cardiologist further found that among 30 high-risk patients flagged by the model despite having no formal diagnosis, 24 showed strong clinical signs of HF with preserved EF (HFpEF), suggesting the model is detecting genuinely undiagnosed cases that conventional labelling strategies completely miss.

Toward Scalable HF Screening

This study represents a meaningful step toward accessible, low-cost HF screening at scale. By demonstrating that a deep learning model trained on pragmatically labelled, routinely collected data can detect HF across its full phenotypic range, including the elusive HFpEF subtype, the work opens the door to earlier intervention in settings where specialist diagnostics are unavailable.

The authors acknowledge key limitations, including the predominance of Caucasian patients in both validation cohorts and the absence of a true gold-standard diagnosis. Future work aims to incorporate echocardiographic parameters into model training, explore additional biomarkers, and establish real-world clinical impact through prospective trials.

Journal Reference

Stenhede, E., Ravn, J., Schirmer, H., & Ranjbar, A. (2026). Heart failure detection in electrocardiograms using Artificial Intelligence and pragmatic labelling. npj Digital Medicine. DOI: 10.1038/s41746-026-02774-4, https://www.nature.com/articles/s41746-026-02774-4

Citation

Nandi, Soham. (2026, May 27). AI Model Spots Hidden Heart Failure from ECGs. AZoRobotics. Retrieved on May 27, 2026 from https://www.azorobotics.com/News.aspx?newsID=16414.
