Original Article
Rapid response system
Prospective external validation of a deep-learning-based early-warning system for major adverse events in general wards in South Korea
Acute and Critical Care 2025;40(2):197-208.
DOI: https://doi.org/10.4266/acc.000525
Published online: May 30, 2025

1AITRICS Corp., Seoul, Korea

2Division of Geriatrics, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea

3Department of Internal Medicine, Keimyung University Dongsan Hospital, Keimyung University School of Medicine, Daegu, Korea

4Division of Cardiology, Department of Internal Medicine, Keimyung University Dongsan Hospital, Keimyung University School of Medicine, Daegu, Korea

5Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Keimyung University Dongsan Hospital, Keimyung University School of Medicine, Daegu, Korea

6Department of Obstetrics and Gynecology, Keimyung University Dongsan Hospital, Keimyung University School of Medicine, Daegu, Korea

7Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Internal Medicine, Chuncheon Sacred Heart Hospital, Hallym University Medical Center, Chuncheon, Korea

Corresponding author: Ki-Byung Lee AITRICS Corp. and Department of Internal Medicine, Chuncheon Sacred Heart Hospital, Hallym University Medical Center, AP Tower 13F, 218, Teheran-ro, Gangnam-gu, Seoul 06221, Korea Tel: +82-2-569-5507 Fax: +82-2-569-5508 E-mail: hasej@aitrics.com
*These authors contributed equally to this work as co-first authors.
• Received: February 7, 2025   • Revised: March 25, 2025   • Accepted: April 15, 2025

© 2025 The Korean Society of Critical Care Medicine

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • Background
    Acute deterioration of patients in general wards often leads to major adverse events (MAEs), including unplanned intensive care unit transfers, cardiac arrest, or death. Traditional early warning scores (EWSs) have shown limited predictive accuracy, with frequent false positives. We conducted a prospective observational external validation study of an artificial intelligence (AI)-based EWS, the VitalCare - Major Adverse Event Score (VC-MAES), at a tertiary medical center in the Republic of Korea.
  • Methods
    Adult patients from general wards, including internal medicine (IM) and obstetrics and gynecology (OBGYN)—the latter a population rarely investigated in prior AI-based EWS studies—were included. The VC-MAES predictions were compared with National Early Warning Score (NEWS) and Modified Early Warning Score (MEWS) predictions using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and logistic regression on baseline EWS values. False-positives per true positive (FPpTP) were assessed at clinically relevant score thresholds.
  • Results
    Of 6,039 encounters, 217 (3.6%) had MAEs (IM: 9.5%, OBGYN: 0.26%). Six hours prior to MAEs, the VC-MAES achieved an AUROC of 0.918 and an AUPRC of 0.352 overall, and an AUROC of 0.964 and an AUPRC of 0.388 in the OBGYN subgroup, outperforming the NEWS (AUROC, 0.797; AUPRC, 0.124) and the MEWS (AUROC, 0.722; AUPRC, 0.079). The FPpTP was reduced by up to 71%. Baseline VC-MAES was strongly associated with MAEs (P<0.001).
  • Conclusions
    The VC-MAES significantly outperformed traditional EWSs in predicting adverse events in general ward patients. The robust performance and lower FPpTP suggest that broader adoption of the VC-MAES may improve clinical efficiency and resource allocation in general wards.
Acute deterioration of hospitalized patients in general wards poses a significant challenge, often resulting in unplanned transfers to intensive care units (ICUs), initiation of cardiopulmonary resuscitation (CPR), or even death before ICU transfer. These major adverse events (MAEs) increase mortality and impose substantial financial burdens on both patients and healthcare systems [1-3]. Previous national surveys indicated that 14%–28% of ICU admissions were unplanned transfers from general wards rather than originating from the emergency department (ED) or operating room [4]. Moreover, patients transferred to the ICU from general wards exhibit higher mortality rates than those admitted directly from the ED or following surgery [3].
The factors contributing to acute deterioration are complex and multifaceted, hindering early prediction and intervention; however, many MAEs are preceded by detectable physiological derangements within the previous 24 hours [5,6]. While some such events are inevitable owing to underlying conditions, many are preventable through closer monitoring and timely medical intervention. Observational studies suggest that early intervention can prevent approximately 44% of in-hospital adverse events and 15% of unplanned ICU transfers (UITs) [4,6].
The rapid response system (RRS) was established in the 1990s [7] to enhance surveillance and proactive interventions aimed at preventing deterioration of patients admitted to the general wards. Subsequently, various early warning scores (EWSs) were developed and validated for use together with the RRS when monitoring patient physiological signs and predicting impending adverse events. Although these scoring systems have demonstrated promise [8], their reliance on generic thresholds often fails to account for individual patient variability, resulting in suboptimal predictive performance [9,10]. In a randomized controlled trial by Haegdorens et al. [11], an RRS utilizing the National Early Warning Score (NEWS) did not significantly reduce the incidence of UITs, cardiac arrest, or unexpected death. Furthermore, concerns have been raised regarding the high false-alarm rates generated by these systems, which can disrupt the RRS workflow [9,12,13].
To address these limitations, there is growing interest in incorporating artificial intelligence (AI) technologies into EWSs [14]. The use of AI-based algorithms in EWS may improve predictive accuracy by analyzing large volumes of clinical data and detecting subtle patterns often overlooked by conventional rule-based methods [15]. These advanced systems offer tailored monitoring by adapting to the unique profiles of individual patients, potentially delivering more precise and timely alerts [16]. However, the performance and impact of AI-based EWSs in real-world clinical settings require validation in prospective clinical studies [17].
We developed an AI-based clinical decision support system, called the VitalCare - Major Adverse Event Score (VC-MAES), to predict in-hospital MAEs, including UITs, cardiac arrest, or death among patients admitted to the general wards. The VC-MAES is generated in real time by analyzing structured data from electronic health records (EHRs), facilitating prompt identification of and intervention for patients at high risk of clinical deterioration.
This study aimed to externally validate the performance of VC-MAES by prospectively collecting real-world medical data. Additionally, we sought to compare its predictive accuracy with those of two widely used and extensively studied traditional EWSs, the NEWS and Modified Early Warning Score (MEWS).
The study adhered to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Institutional Review Board of Keimyung University School of Medicine (No. 2022-12-081), which waived the requirement for informed consent due to the retrospective nature of this study. The study was registered with the Clinical Research Information Service (CRIS) operated by the National Institute of Health under the Korea Disease Control and Prevention Agency (CRIS Registration No. KCT0008466).
Study Design and Setting
This prospective observational external validation study was conducted at Keimyung University Dongsan Hospital in the Republic of Korea. The model was implemented in six medical-surgical general wards across two departments: internal medicine (IM) and obstetrics and gynecology (OBGYN). These two departments were selected because our primary aim was to validate the model's performance across markedly different patient populations, rather than simply splitting cases into medical versus surgical categories. Additionally, we sought to validate our AI-based EWS in OBGYN populations, which are often underrepresented in such studies. This was a non-interventional study in which the model generated real-time predictions that were neither disclosed to healthcare providers nor used to guide clinical decision-making. The study period was from June 3, 2023, to January 31, 2024, and patients were followed until discharge or April 30, 2024, whichever occurred first.
Patient Selection
Patients were eligible for inclusion if they met the following criteria: (1) 19 years of age or older and (2) had five initial vital signs (systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, and body temperature) recorded in their EHR during hospitalization. Patients were excluded if they had been admitted to the labor and delivery unit or directly to the ICU from the ED or operating room. All other patients, including those with do-not-resuscitate (DNR) orders, were included.
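As a minimal illustration, this eligibility screening can be expressed as a simple filter over encounter-level records; the sketch below uses synthetic data and hypothetical column names, not the study's actual data structure.

```python
import pandas as pd

# Hypothetical encounter-level records; all column names and values are illustrative only.
encounters = pd.DataFrame({
    "encounter_id": [1, 2, 3, 4],
    "age": [67, 17, 45, 72],
    "has_initial_vitals": [True, True, False, True],    # all five initial vital signs recorded
    "admit_unit": ["ward", "ward", "ward", "labor_delivery"],
    "direct_icu_admission": [False, False, False, False],
})

# Apply the stated inclusion/exclusion criteria: adults with complete initial vital signs,
# excluding labor and delivery admissions and direct ICU admissions.
eligible = encounters[
    (encounters["age"] >= 19)
    & encounters["has_initial_vitals"]
    & (encounters["admit_unit"] != "labor_delivery")
    & ~encounters["direct_icu_admission"]
]
print(eligible["encounter_id"].tolist())  # -> [1]
```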
Data Collection
Patient demographics and clinical information, including vital signs, chief complaints, admission and final diagnoses, code status, start and end times of surgery, medication orders, and laboratory data, were extracted from the EHR. Data addressing ICU admission and discharge times, time of death, and CPR initiation and termination times were also retrieved.
Outcomes and Objectives
The primary outcomes of interest were UITs, cardiac arrest, and death. Cardiac arrest was defined as the initiation of chest compressions, defibrillation, or both, as documented in the EHR [18]. ICU transfers were classified as unplanned (UITs) when they occurred unexpectedly from the general wards, as opposed to pre-arranged events such as scheduled postoperative admissions or direct transfers from the ED [11,19].
Algorithm: VC-MAES
The VC-MAES is a proprietary predictive system that applies a deep-learning approach to time-series clinical data. Built on a bidirectional long short-term memory (biLSTM) architecture, this binary classification model estimates the likelihood of MAEs in general ward patients 6 hours in advance. The VC-MAES uses two categories of input data: (1) dynamic features derived from hourly time-series data, including vital signs and laboratory test results, and (2) static features. The biLSTM component processes the time-series inputs, while the static features are handled by fully connected layers. The outputs from both networks are then merged and passed through additional classification layers to obtain the final prediction. A schematic of the VC-MAES model architecture is shown in Supplementary Figure 1, and additional information regarding the classification model can be found in Sung et al. [20].
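To make the fusion design concrete, the following PyTorch sketch combines a biLSTM over hourly time-series features with a fully connected branch for static features and a classification head. It is a minimal illustration only; the layer sizes, single-layer LSTM, and sigmoid output are assumptions, not the proprietary VC-MAES implementation.

```python
import torch
import torch.nn as nn

class MAEClassifier(nn.Module):
    """Minimal sketch of a biLSTM + static-feature fusion classifier (illustrative sizes)."""
    def __init__(self, n_dynamic: int, n_static: int, hidden: int = 64):
        super().__init__()
        # Bidirectional LSTM over hourly time-series inputs (vital signs, laboratory values)
        self.bilstm = nn.LSTM(input_size=n_dynamic, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        # Fully connected branch for static features (e.g., age)
        self.static_fc = nn.Sequential(nn.Linear(n_static, hidden), nn.ReLU())
        # Classification head over the merged representation
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x_seq, x_static):
        # x_seq: (batch, time, n_dynamic); x_static: (batch, n_static)
        _, (h_n, _) = self.bilstm(x_seq)                  # h_n: (2, batch, hidden)
        seq_repr = torch.cat([h_n[0], h_n[1]], dim=1)     # forward + backward final states
        merged = torch.cat([seq_repr, self.static_fc(x_static)], dim=1)
        return torch.sigmoid(self.head(merged))           # probability of MAE within 6 hours

# Example shapes: 4 encounters, 24 hourly steps, 18 dynamic features, 1 static feature
model = MAEClassifier(n_dynamic=18, n_static=1)
prob = model(torch.randn(4, 24, 18), torch.randn(4, 1))
```

A probability output of this kind could be rescaled to a 0–100 score, as the VC-MAES reports, but the actual scoring and calibration details are not described in the source.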
The model was trained on the complete inpatient dataset, comprising 334,185 hospitalizations of 209,825 adult patients between 2013 and 2017 at Severance Hospital, a 2,454-bed tertiary academic medical center in Seoul, Republic of Korea. The dataset included patients from more than 35 medical and surgical specialties. The model primarily uses five vital signs and patient age to generate a VC-MAES ranging from 0 to 100, with higher scores indicating a greater risk of MAEs within the next 6 hours. When used prospectively, the score is updated whenever a new input feature is recorded in the EHR, ensuring that it reflects the most recent data. If available, the model can also incorporate optional variables, such as oxygen saturation, Glasgow Coma Scale (GCS), total bilirubin, lactate, creatinine, platelets, pH, sodium, potassium, hematocrit, white blood cell count, bicarbonate, and C-reactive protein, to provide a more comprehensive risk score.
The VC-MAES imputes missing values using the last-observation-carried-forward (LOCF) method. In this study, if no previous values were available, normal values were assigned (Supplementary Table 1). Because mental status assessments were only performed in the ICUs and not in the general wards at this institution, GCS scores were assigned a value of 15 to calculate the VC-MAES, following the model’s missing value imputation method. Similarly, for the NEWS and MEWS calculations, a score of 0 was assigned for the Alert, Voice, Pain, Unresponsive (AVPU) scale, corresponding to an “alert” level of consciousness.
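A minimal pandas sketch of this imputation scheme is shown below. The observation values and the normal-value dictionary are hypothetical placeholders; the normal values actually used by the study are listed in Supplementary Table 1.

```python
import pandas as pd

# Hypothetical hourly observations for one encounter; NaN marks missing values.
obs = pd.DataFrame({
    "heart_rate": [88, None, 92, None],
    "lactate":    [None, None, 2.1, None],
})

# Assumed normal values used when no prior observation exists (placeholders only).
normal_values = {"heart_rate": 75, "lactate": 1.0}

# Last observation carried forward, then fall back to the normal value
# when no earlier measurement is available.
imputed = obs.ffill().fillna(value=normal_values)
print(imputed)
```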
Performance Evaluation and Statistical Analysis
We compared the demographic characteristics of patients in the non-event and event groups using chi-square tests for categorical variables and t-tests or Wilcoxon rank-sum tests for continuous variables, as applicable. The overall accuracy of the VC-MAES in predicting MAEs at different time intervals was assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The AUPRC was included because our dataset was highly imbalanced; this metric focuses on the positive class by jointly measuring precision and recall, and a larger AUPRC indicates better detection of rare events. A bootstrap approach with 1,000 replicate samples drawn with replacement was employed to compare the AUROC and AUPRC values of the predictive models, forming empirical distributions to calculate confidence intervals and derive P-values from the differences in performance metrics. Additionally, to evaluate threshold-based performance at each relevant NEWS and MEWS cutoff, we calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and number needed to evaluate (NNE). These metrics were used to benchmark the established scores against the VC-MAES. Finally, we compared the performance of the EWSs after excluding DNR patients to avoid selection bias arising from their markedly different demographics, characteristics, and care.
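The paired bootstrap comparison can be implemented roughly as follows. This is an illustrative scikit-learn-based sketch of one way to resample encounters and compare AUROCs; the study's own analyses used R (pROC) and Python, and the exact procedure may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_delta(y_true, score_a, score_b, n_boot=1000, seed=0):
    """Paired bootstrap of the AUROC difference between two scores on the same labels."""
    rng = np.random.default_rng(seed)
    y_true, score_a, score_b = map(np.asarray, (y_true, score_a, score_b))
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
        if y_true[idx].sum() in (0, len(idx)):            # skip resamples with one class only
            continue
        deltas.append(roc_auc_score(y_true[idx], score_a[idx])
                      - roc_auc_score(y_true[idx], score_b[idx]))
    deltas = np.array(deltas)
    ci = np.percentile(deltas, [2.5, 97.5])               # empirical 95% CI of the difference
    p = min(1.0, 2 * min((deltas <= 0).mean(), (deltas >= 0).mean()))  # two-sided bootstrap P
    return deltas.mean(), ci, p
```

For AUPRC comparisons, `average_precision_score` can be substituted for `roc_auc_score` in the same resampling loop.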
In addition to the time-point analyses, we assessed the model using the patient episode definition [21], in which each hospital admission was treated as a single comprehensive case drawing on one or more measurements of vital signs and other clinical parameters obtained throughout the hospital stay. Specifically, the first recorded VC-MAES, NEWS, and MEWS values upon admission were used, and binary logistic regression was performed to examine the association between these EWSs and in-hospital MAEs. Furthermore, we investigated whether the initial scores were associated with prolonged length of stay (pLOS) to explore broader clinical utility beyond early deterioration prediction, such as resource allocation and guiding early discharge planning. We defined pLOS as a length of stay at or above the 75th percentile for the entire cohort [22,23]. Demographic variables that were significantly associated with MAEs and pLOS in the univariate analyses were included as covariates in the multivariate logistic regression model. All statistical analyses were conducted using R (version 4.4.0) and Python (version 3.11.8). The pROC package in R was used for ROC curve analysis. Statistical significance was set at P<0.05.
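For the episode-level analysis, adjusted odds ratios of this kind can be obtained with a standard logistic regression. The sketch below uses statsmodels on synthetic data with hypothetical column names; it illustrates the adjustment set described above rather than the authors' actual code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(0)
# Synthetic episode-level data: one row per hospital admission (values are not real).
df = pd.DataFrame({
    "mae":            np.random.binomial(1, 0.04, 500),   # 1 = major adverse event
    "baseline_score": np.random.gamma(2.0, 2.0, 500),     # first EWS value at admission
    "age":            np.random.normal(55, 18, 500),
    "female":         np.random.binomial(1, 0.79, 500),
    "bmi":            np.random.normal(24, 4, 500),
})

# Multivariate logistic regression adjusted for age, sex, and BMI.
model = smf.logit("mae ~ baseline_score + age + female + bmi", data=df).fit(disp=0)
odds_ratios = np.exp(model.params)       # exponentiate coefficients to odds ratios
conf_int = np.exp(model.conf_int())      # 95% CIs on the odds-ratio scale
print(pd.concat([odds_ratios, conf_int], axis=1))
```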
Baseline Characteristics
This study initially screened 6,478 encounters, comprising 4,846 unique patients admitted to the general wards. After applying the exclusion criteria, 439 encounters were removed: 423 involved direct admission to the ICU from the ED or operating room, and 16 had insufficient data for score calculation. A flowchart of the patient selection procedure is provided in Figure 1. Ultimately, 6,039 encounters (4,447 patients) with data for 272,493 time points were included in the analyses. Of these, 217 encounters (3.6%) experienced MAEs, including 102 UITs, 13 cardiac arrests, and 102 deaths. The majority of MAEs occurred in the IM subgroup, where 207 of 2,177 encounters (9.5%) experienced MAEs. In contrast, only 10 of the 3,862 OBGYN encounters (0.26%), including 1,568 obstetric encounters, experienced MAEs.
Table 1 summarizes the baseline demographic characteristics and laboratory values of the event and non-event groups. The median age of the study population was 52.0 years. Patients who experienced MAEs had a significantly higher median age of 76.0 years compared to those without MAEs (51.0 years, P<0.001). The sex distribution revealed that 78.90% of the participants were female, largely because nearly 64% of the study population was drawn from the OBGYN department. At admission, VC-MAES, NEWS, and MEWS results were significantly elevated in the MAE group, reflecting more severe clinical presentations (VC-MAES, 6.2 vs. 1.4; NEWS, 3.0 vs. 1.0; and MEWS, 2.0 vs. 1.0; all P<0.001).
In a subanalysis stratified by specialty, the IM subgroup (n=2,177) had a median age of 70.0 years, which was older than that of the overall cohort. Among IM patients who experienced MAEs (n=207), the median age increased further to 78.0 years (P<0.001). Additionally, IM patients with MAEs exhibited significantly elevated baseline VC-MAES, NEWS, and MEWS compared to IM patients without MAEs (VC-MAES, 6.3 vs. 3.9; NEWS, 3.0 vs. 1.0; and MEWS, 2.0 vs. 1.0; all P<0.001).
In contrast, the OBGYN subgroup (n=3,862) consisted predominantly of younger females (median age, 41.0 years), with only 10 MAEs reported. Those who experienced MAEs were significantly older (60.5 years) than their counterparts without MAEs and also presented with significantly higher VC-MAES, NEWS, and MEWS at admission (mean VC-MAES, 4.9 vs. 0.9; NEWS, 2.5 vs. 1.0; MEWS, 2.5 vs. 1.0; all P<0.001). Baseline demographic characteristics and laboratory values for both the non-event and event groups within the IM and OBGYN cohorts are presented in Supplementary Tables 2 and 3, respectively.
Model Performance
The VC-MAES tool consistently demonstrated superior performance over the NEWS and MEWS, as shown by both AUROC and AUPRC results across all evaluated time points. Six hours prior to the MAEs, the VC-MAES achieved an AUROC of 0.918 (95% CI, 0.912–0.924) and an AUPRC of 0.352 (95% CI, 0.330–0.374). In comparison, the NEWS yielded an AUROC of 0.797 (95% CI, 0.784–0.810) and an AUPRC of 0.124 (95% CI, 0.110–0.139), whereas the MEWS attained an AUROC of 0.722 (95% CI, 0.707–0.737) and an AUPRC of 0.079 (95% CI, 0.069–0.090; bootstrap test, P<0.001) (Figure 2). This trend remained consistent for the 12- and 24-hour intervals preceding the event. A comprehensive comparison of the results is presented in Table 2 and Figure 3.
For individual events, such as UIT, cardiac arrest, or death, the VC-MAES again outperformed both the NEWS and MEWS. For UIT, the VC-MAES achieved an AUROC of 0.890, compared with 0.698 for the NEWS and 0.627 for MEWS (bootstrap test, P<0.001). For prediction of death, the VC-MAES attained an AUROC of 0.994, outperforming both the NEWS (0.937) and the MEWS (0.850; bootstrap test, P<0.001). For cardiac arrest, the VC-MAES demonstrated an AUROC of 0.758 compared with 0.699 for the NEWS and 0.641 for the MEWS (bootstrap test, P<0.001).
In the IM subgroup (n=2,177), the VC-MAES outperformed the NEWS and MEWS, achieving an AUROC of 0.852 (95% CI, 0.841–0.862) and an AUPRC of 0.350 (95% CI, 0.327–0.375). In contrast, the NEWS produced an AUROC of 0.762 (95% CI, 0.748–0.775) and AUPRC of 0.119 (95% CI, 0.104–0.135), whereas the MEWS produced an AUROC of 0.697 (95% CI, 0.682–0.712) and AUPRC of 0.076 (95% CI, 0.067–0.086) (Figure 4A and B).
Similarly, in the OBGYN subgroup (n=3,862), the VC-MAES demonstrated the highest discriminative ability, with an AUROC of 0.964 (95% CI, 0.943–0.982) and an AUPRC of 0.388 (95% CI, 0.303–0.472). In contrast, the NEWS yielded an AUROC of 0.928 (95% CI, 0.891–0.961) and an AUPRC of 0.299 (95% CI, 0.225–0.385), whereas the MEWS attained an AUROC of 0.864 (95% CI, 0.816–0.912) and an AUPRC of 0.196 (95% CI, 0.139–0.267) (Figure 4C and D).
After excluding the 418 patients with DNR orders, leaving 5,472 encounters, the VC-MAES maintained superior performance, with an AUROC of 0.862 (95% CI, 0.848–0.874) and an AUPRC of 0.032 (95% CI, 0.024–0.045). In comparison, the NEWS achieved an AUROC of 0.730 (95% CI, 0.702–0.758) and an AUPRC of 0.026 (95% CI, 0.018–0.036), while the MEWS demonstrated an AUROC of 0.700 (95% CI, 0.674–0.724) and an AUPRC of 0.020 (95% CI, 0.012–0.030).
Threshold-Based Performance and False-Alarm Reduction
Table 3 presents the performance metrics for the VC-MAES, NEWS, and MEWS across the key cutoff values. Compared with MEWS at a cutoff of 3 (specificity: 92.4%, sensitivity: 48.4%, PPV: 4.1%, NNE: 24.4), VC-MAES at a cutoff of 30 (specificity: 94.3%, sensitivity: 65.8%, PPV: 7.1%, NNE: 14.1) demonstrated improved specificity and sensitivity, with a 42% reduction in NNE. In terms of false-positives per true positive (FPpTP; calculated as NNE – 1), this improvement corresponds to a 44% reduction. Similarly, at a higher specificity range (approximately 97%), the MEWS at a cutoff of 4 (sensitivity: 36.5%, PPV: 7.3%, NNE: 13.7) was outperformed by the VC-MAES at a cutoff of 40 (sensitivity: 58.6%, PPV: 11.2%, NNE: 8.9), achieving a 35% reduction in NNE and a 38% reduction in FPpTP.
A similar trend was observed when comparing the VC-MAES and NEWS. At a specificity range of 96–98%, the NEWS at a cutoff of 5 (sensitivity: 45.2%, PPV: 7.8%, NNE: 12.8) was surpassed by the VC-MAES at a cutoff of 50 (sensitivity: 52.4%, PPV: 17.5%, NNE: 5.7), reflecting a 55% reduction in NNE and a 60% reduction in FPpTP. Even at the highest specificity level (≥99%), the NEWS at a cutoff of 7 (sensitivity: 28.5%, PPV: 16.3%, NNE: 6.1) was outperformed by the VC-MAES at a cutoff of 70 (sensitivity: 37.5%, PPV: 39.5%, NNE: 2.5). This represents a 59% reduction in NNE and a 71% reduction in FPpTP.
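The false-alarm arithmetic above follows directly from the PPVs in Table 3, since NNE = 1/PPV and FPpTP = NNE − 1. A short Python check, using the published PPVs for the MEWS cutoff of 3 and the VC-MAES cutoff of 30, reproduces the reported reductions:

```python
# NNE and FPpTP derived from PPV reported in percent (values taken from Table 3).
def nne(ppv_percent):
    return 100.0 / ppv_percent

def fpptp(ppv_percent):
    # False positives per true positive = NNE - 1
    return nne(ppv_percent) - 1.0

nne_mews, nne_vc = nne(4.1), nne(7.1)          # MEWS cutoff 3 vs. VC-MAES cutoff 30
print(round(nne_mews, 1), round(nne_vc, 1))    # ~24.4 and ~14.1
print(round(100 * (1 - nne_vc / nne_mews)))    # ~42% reduction in NNE
print(round(100 * (1 - fpptp(7.1) / fpptp(4.1))))  # ~44% reduction in FPpTP
```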
Associations between Baseline EWSs and MAEs or pLOS
Using the patient episode definition, we performed binary logistic regression with the first recorded (baseline) VC-MAES, NEWS, and MEWS to evaluate their associations with MAEs and pLOS. A pLOS was defined as a stay of ≥7 days, corresponding to the 75th percentile. Baseline VC-MAES showed a strong association with both MAEs and pLOS. Specifically, for MAEs, baseline VC-MAES had an odds ratio (OR) of 2.065 (95% CI, 1.871–2.280), whereas baseline NEWS and baseline MEWS had ORs of 1.051 (95% CI, 1.045–1.057) and 1.064 (95% CI, 1.054–1.073), respectively (Table 4).
After adjusting for age, sex, and body mass index, baseline VC-MAES continued to exhibit the strongest association with both MAEs and pLOS. For MAEs, baseline VC-MAES achieved an adjusted OR of 1.484 (95% CI, 1.333–1.651), surpassing those of the NEWS (adjusted OR, 1.032; 95% CI, 1.026–1.038) and MEWS (adjusted OR, 1.041; 95% CI, 1.032–1.052). Detailed comparisons of ORs and confidence intervals for pLOS are available in Supplementary Table 4.
In this prospective observational external validation study, we tested and validated the VC-MAES, a deep-learning-based EWS, in general ward patients, including a substantial proportion of OBGYN patients. Overall, the VC-MAES demonstrated superior predictive performance for MAEs, significantly outperforming both the NEWS and MEWS. Notably, this advantage was also pronounced in the OBGYN subgroup, highlighting its strong discriminative performance in a population seldom investigated in previous AI-based EWS studies. This finding underscores the potential applicability of the VC-MAES in both low- and high-risk patients admitted to the general wards, as demonstrated by its consistent performance in both the OBGYN and IM subgroups. Furthermore, the baseline VC-MAES at admission was a predictor of both MAEs and pLOS, suggesting its potential value as a prognostic and triage tool upon patient arrival.
External validation is essential to confirm the reliability and generalizability of AI models, as they frequently demonstrate diminished performance in external validation settings. This performance degradation often results from overfitting during training or shifts in the distribution of input features, a phenomenon referred to as “dataset shift” [24,25]. Recent external validation studies have demonstrated significant performance degradation in clinical deterioration prediction models [26,27], highlighting the challenges of maintaining model performance across settings and patient demographics. These findings emphasize the importance of rigorous external validation to identify potential biases and guarantee the robustness and generalizability of models across diverse clinical contexts.
The VC-MAES was originally trained on medical records from a tertiary academic medical center in the Republic of Korea. Although the current study was also conducted at a tertiary academic medical center in the Republic of Korea, the validation hospital is located in a different province with a distinct healthcare system and patient demographics. Despite this substantial dataset shift, the VC-MAES maintained strong performance, suggesting its potential for broader applicability across various clinical settings. In our study, the overall prevalence of MAEs was 3.6%, consistent with previously reported ranges of 3%–9% [28]. However, the incidence of MAEs differed markedly between the two subspecialties assessed: 0.26% for the OBGYN services and 9.5% for the IM services. Despite this difference, the VC-MAES achieved an AUROC of 0.964 and an AUPRC of 0.388 in the OBGYN subgroup and an AUROC of 0.852 and an AUPRC of 0.350 in the IM subgroup, demonstrating its applicability to both low- and high-risk patient populations. Notably, the VC-MAES was the best-performing EWS in the OBGYN subgroup, mirroring findings from a recent multicenter study of an AI-based EWS by Churpek et al. [29], who reported slightly higher performance among female patients than among male patients (AUROC, 0.844 vs. 0.824) and particularly strong results in obstetric encounters (AUROC, 0.909).
Previous studies have indicated that a significant barrier to healthcare provider adoption of AI-based algorithms is the concern about increased false alarms from continuous monitoring, which could lead to alarm fatigue and workflow disruption [30]. We found that the VC-MAES model could reduce false-positive MAE predictions by up to 71%, improving clinical efficiency by minimizing unnecessary evaluations. By curtailing superfluous alerts, healthcare facilities can optimize resource allocation, enabling providers to focus on critical patient care tasks.
This study has several limitations. First, as a non-interventional, single-center study, it was inherently susceptible to various biases and confounding variables. Nonetheless, the prospective design helped mitigate some biases commonly associated with retrospective validation studies, providing a useful foundation for future interventional research. Second, the validation datasets were derived from a single hospital in the Republic of Korea, which may limit the generalizability of our findings to other countries with different healthcare systems and patient demographics. Additionally, the study population was predominantly female, largely because of the substantial number of patients from the OBGYN department. This sex imbalance also limits the applicability of these results to other clinical settings. Third, although the VC-MAES outperformed the NEWS and MEWS, the low incidence of cardiac arrests and overall adverse events in the OBGYN population limited further subanalyses and more comprehensive verification of the model performance in that subgroup. Fourth, the model performance can be affected by the quality and completeness of EHR data, which vary across institutions. Although the VC-MAES uses the LOCF method and normal-value imputation for missing data, this approach may not fully capture the clinical context. For instance, in this study, most GCS scores were missing and were imputed as normal (GCS=15) to maintain consistency in the analyses, which may have influenced the model's overall performance. Finally, the study did not include an interventional component; therefore, we were unable to assess the impact of the VC-MAES on clinical workflow, patient outcomes, or physician engagement in a real-world environment.
In conclusion, our study demonstrated that the VC-MAES significantly outperformed traditional scoring systems in predicting MAEs in hospitalized patients. Despite challenges such as data shifts, the VC-MAES maintained strong performance across clinical settings, showing the potential to reduce false-positive predictions of MAEs and enhance patient outcomes. Future research should focus on validating these findings in diverse clinical settings to ensure the robustness and generalizability of the model. Interventional studies are required to assess the real-world impact of the VC-MAES on patient outcomes, clinical workflow, and physician engagement.
▪ In this prospective external validation study, VitalCare - Major Adverse Event Score (VC-MAES)—a deep-learning-based early warning system—demonstrated superior predictive accuracy for major adverse events compared with traditional early warning scores (National Early Warning Score and Modified Early Warning Score).
▪ Despite the markedly different demographic profiles of general ward patients in internal medicine and obstetrics and gynecology settings—representing high- and low-risk cohorts, respectively—VC-MAES demonstrated high predictive performance, underscoring its potential for broader generalizability.
▪ VC-MAES also reduced false-positives by up to 71%, suggesting the possibility of enhanced clinical efficiency; however, further studies are warranted to confirm its impact on workflow and patient outcomes.

CONFLICT OF INTEREST

TS, EYC, JHK, KHL, KJK, YJ, SH, JYW, BEA, EY, and KBL are employees of AITRICS. No other potential conflicts of interest relevant to this article were reported.

FUNDING

This study was financially and administratively supported by the Ministry of Health and Welfare, Korea Health Industry Development Institute, and Daegu Metropolitan City.

ACKNOWLEDGMENTS

None.

AUTHOR CONTRIBUTIONS

Conceptualization: TS, JHK, KJK, BEA, HC, KBL. Methodology: TS, EYC, KJK, HC, KBL. Formal analysis: EYC, KHL, EYH, EY, SH, ICK, SHP, CHC, GIY, YJ, JYW. Data curation: EYC, KHL, EYH, EY, SH, ICK, SHP, CHC, GIY, YJ, JYW. Visualization: EYC, SH. Project administration: YJ, JHK, BEA, KJK. Writing – original draft: TS. Writing – review & editing: EYC, JHK, KJK, BEA, HC, KBL, TS, KHL, EYH, EY, SH, ICK, SHP, CHC, GIY, YJ, JYW. All authors read and agreed to the published version of the manuscript.

Supplementary materials can be found via https://doi.org/10.4266/acc.000525.
Supplementary Table 1.
Normal values for missing value imputation
acc-000525-Supplementary-Table-1.pdf
Supplementary Table 2.
Baseline demographic characteristics and laboratory values for both the non-event and event groups within the Internal Medicine cohorts
acc-000525-Supplementary-Table-2.pdf
Supplementary Table 3.
Baseline demographic characteristics and laboratory values for both the non-event and event groups within the Obstetrics and Gynecology cohorts
acc-000525-Supplementary-Table-3.pdf
Supplementary Table 4.
Univariate and multivariate logistic regression analyses of baseline EWSs (VC-MAES, NEWS, MEWS) for predicting prolonged length of stay
acc-000525-Supplementary-Table-4.pdf
Supplementary Figure 1.
A schematic of the VitalCare - Major Adverse Event Score model architecture. LSTM: long short-term memory; DNN: deep neural networks.
acc-000525-Supplementary-Figure-1.pdf
Figure 1.
Flowchart of patient enrollment. ICU: intensive care unit.
Figure 2.
Comparison of predictive performance among VitalCare - Major Adverse Event Score (VC-MAES [MAES]), Modified Early Warning Score (MEWS), and National Early Warning Score (NEWS) models within a 6-hour timeframe before adverse events. (A) Receiver operating characteristic curves with area under the curves. (B) Precision-recall curves with area under the curves. AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve.
Figure 3.
Comparison of the predictive performance of the VitalCare - Major Adverse Event Score in identifying clinical deterioration or adverse events within a 6- to 24-hour timeframe before adverse events. (A) Receiver operating characteristic curves with area under the curves. (B) Precision-recall curves with area under the curves. AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve.
Figure 4.
Comparison of predictive performance among the VitalCare - Major Adverse Event Score (VC-MAES [MAES]), Modified Early Warning Score (MEWS), and National Early Warning Score (NEWS) models within a 6-hour timeframe before adverse events in Internal medicine (IM; A, B) and obstetrics and gynecology (OBGYN; C, D) cohorts. (A) Receiver operating characteristic curves (ROC) with area under the receiver operating characteristic (AUROC) in IM. (B) Precision–recall curves with area under the precision–recall curve (AUPRC) in IM. (C) AUROC in OBGYN. (D) AUPRC in OBGYN.
Table 1.
Baseline demographic characteristics and laboratory values of the non-event and event groups
Variable Overall Non-event Event P-value
Number of encounters 6,039 5,822 217 -
Demographics
 Age (yr) 52 (37–68) 51 (37–67) 76 (68–83) <0.001
 Sex <0.001
  Male 1,274 (21.1) 1,146 (19.7) 128 (59.0)
  Female 4,765 (78.9) 4,676 (80.3) 89 (41.0)
Department -
 Internal medicine 2,177 (36.0) 1,970 (33.8) 207 (95.4)
 Obstetrics and gynecology 3,862 (64.0) 3,852 (66.2) 10 (4.6)
Height (cm) 160.0 (155.2–165.0) 160.0 (155.3–165.0) 161.7 (155.0–168.0) 0.033
Weight (kg) 60.8 (53.3–70.7) 61.0 (53.6–71.0) 55.0 (48.0–62.0) <0.001
Body mass index (kg/m²) 23.9 (21.3–26.8) 24.0 (21.4–26.9) 21.5 (18.5–23.8) <0.001
First MAES 1.5 (0.7–3.5) 1.4 (0.7–3.2) 6.2 (3.2–16.0) <0.001
First NEWS 1.0 (0.0–2.0) 1.0 (0.0–2.0) 3.0 (2.0–6.0) <0.001
First MEWS 1.0 (1.0–2.0) 1.0 (1.0–2.0) 2.0 (1.0–4.0) <0.001
Vital sign
 Pulse (/min) 82.3±14.9 82.0±14.6 92.4±18.4 <0.001
 Respiratory rate (/min) 19.6±2.4 19.6±2.1 22.2±5.4 <0.001
 Systolic blood pressure (mm Hg) 124.6±17.3 124.5±17.2 127.7±20.7 0.028
 Diastolic blood pressure (mm Hg) 73.0±11.9 73.0±11.9 73.4±13.5 0.630
 Body temperature (°C) 36.8±0.4 36.8±0.4 37.0±0.6 <0.001
 Oxygen saturation (%) 97.4±2.2 97.5±2.1 96.5±4.3 0.001
Laboratory
 Total bilirubin (mg/dl) 0.5±0.4 0.5±0.4 0.7±0.4 <0.001
 Lactate (mmol/L) 1.6±1.5 1.5±1.3 2.1±1.9 <0.001
 pH 7.4±0.1 7.4±0.1 7.4±0.1 0.014
 Sodium (mmol/L) 136.4±3.9 136.5±3.7 134.5±6.5 <0.001
 Potassium (mmol/L) 4.3±0.5 4.3±0.5 4.4±0.8 0.012
 Creatinine (mg/dl) 1.0±1.1 0.9±0.9 1.9±2.3 <0.001
 Hematocrit (%) 35.6±5.2 35.7±5.2 34.6±6.6 0.015
 White blood cell count (10³/µl) 9.1±4.7 9.0±4.5 11.6±7.5 <0.001
 HCO3 (mmol/L) 24.0±5.2 24.1±4.9 23.6±6.0 0.272
 Platelets (10³/µl) 233.8±83.1 233.9±81.2 232.8±123.5 0.896
 C-reactive protein (mg/dl) 3.3±6.1 2.8±5.5 9.4±9.0 <0.001

Values are presented as median (interquartile range), number (%), or mean±standard deviation.

MAES (VC-MAES): VitalCare - Major Adverse Event Score; NEWS: National Early Warning Score; MEWS: Modified Early Warning Score; HCO3: bicarbonate.

Table 2.
AUROC and AUPRC comparison results within 6, 12, and 24 hours preceding the event
EWS Timeframe (hr) AUROC (95% CI) AUPRC (95% CI)
MAES 6 0.918 (0.912–0.924) 0.353 (0.330–0.375)
NEWS 6 0.797 (0.784–0.810) 0.124 (0.110–0.139)
MEWS 6 0.722 (0.707–0.737) 0.079 (0.069–0.090)
MAES 12 0.915 (0.910–0.920) 0.337 (0.319–0.355)
NEWS 12 0.777 (0.766–0.788) 0.143 (0.131–0.155)
MEWS 12 0.691 (0.678–0.704) 0.095 (0.085–0.105)
MAES 24 0.908 (0.904–0.912) 0.333 (0.320–0.346)
NEWS 24 0.758 (0.750–0.767) 0.159 (0.149–0.171)
MEWS 24 0.671 (0.660–0.681) 0.112 (0.103–0.120)

AUROC: area under the receiver operating characteristic curve; AUPRC: area under the precision-recall curve; EWS: early warning score; MAES (VC-MAES): VitalCare - Major Adverse Event Score; MEWS: Modified Early Warning Score.

Table 3.
Performance metrics for MAES, NEWS, and MEWS across key cutoffs
EWS Cutoff Metrics (95% CI), %
Sensitivity Specificity PPV NPV F1-score
MAES 30 65.8 (63.5–67.9) 94.3 (94.1–94.4) 7.1 (6.9–7.4) 99.8 (99.7–99.8) 12.9 (12.4–13.3)
40 58.6 (56.3–60.1) 97.9 (96.8–97.0) 11.2 (10.7–11.6) 99.7 (99.7–99.7) 18.8 (18.1–19.4)
50 52.4 (50.0–54.6) 98.3 (98.3–98.4) 17.5 (16.8–18.1) 99.7 (99.7–99.7) 26.2 (25.1–27.2)
70 37.5 (35.3–39.3) 99.6 (99.6–99.6) 39.5 (37.2–41.5) 99.6 (99.6–99.6) 38.5 (36.4–40.4)
NEWS 4 52.3 (50.1–54.7) 93.3 (93.2–93.4) 5.0 (4.8–5.1) 99.7 (99.6–99.7) 9.1 (8.7–9.5)
5 45.2 (42.9–47.5) 96.4 (96.3–96.5) 7.8 (7.4–8.2) 99.6 (99.6–99.6) 13.3 (12.6–14.0)
6 36.9 (34.8–39.3) 98.1 (98.0–98.1) 11.4 (10.8–12.0) 99.6 (99.6–99.6) 17.4 (16.4–18.4)
7 28.5 (26.3–30.4) 99.0 (99.0–99.1) 16.3 (15.1–17.3) 99.5 (99.5–99.5) 20.7 (19.2–22.0)
MEWS 3 48.4 (46.2–50.4) 92.4 (92.3–92.5) 4.1 (3.9–4.2) 99.6 (99.6–99.6) 7.5 (7.2–7.8)
4 36.5 (34.4–38.6) 96.9 (96.8–97.0) 7.3 (6.9–7.7) 99.6 (99.5–99.6) 12.2 (11.5–12.8)
5 26.4 (24.3–28.4) 98.8 (98.7–98.8) 12.5 (11.6–13.4) 99.5 (99.5–99.5) 16.9 (15.7–18.2)

MAES (VC-MAES): VitalCare - Major Adverse Event Score; NEWS: National Early Warning Score; MEWS: Modified Early Warning Score; EWS: early warning score; PPV: positive predictive value; NPV: negative predictive value.

Table 4.
Univariate and multivariate logistic regression analyses of baseline EWSs (MAES, NEWS, MEWS) for predicting major adverse events
EWS Univariate
Multivariate (adjusted for age, sex, and BMI)
OR (95% CI) P-value OR (95% CI) P-value
Baseline MAES 2.065 (1.871–2.280) <0.001 1.484 (1.333–1.651) <0.001
Baseline NEWS 1.051 (1.045–1.057) <0.001 1.032 (1.026–1.038) <0.001
Baseline MEWS 1.064 (1.054–1.073) <0.001 1.041 (1.032–1.052) <0.001

EWS: early warning score; MAES (VC-MAES): VitalCare - Major Adverse Event Score; NEWS: National Early Warning Score; MEWS: Modified Early Warning Score; BMI: body mass index; OR: odds ratio.

  • 1. Liu V, Kipnis P, Rizk NW, Escobar GJ. Adverse outcomes associated with delayed intensive care unit transfers in an integrated healthcare system. J Hosp Med 2012;7:224-30.
  • 2. Sykora D, Traub SJ, Buras MR, Hodgson NR, Geyer HL. Increased inpatient length of stay after early unplanned transfer to higher levels of care. Crit Care Explor 2020;2:e0103.
  • 3. Escobar GJ, Greene JD, Gardner MN, Marelich GP, Quick B, Kipnis P. Intra-hospital transfers to a higher level of care: contribution to total hospital and intensive care unit (ICU) mortality and length of stay (LOS). J Hosp Med 2011;6:74-80.
  • 4. Bapoje SR, Gaudiani JL, Narayanan V, Albert RK. Unplanned transfers to a medical intensive care unit: causes and relationship to preventable errors in care. J Hosp Med 2011;6:68-72.
  • 5. Le Lagadec MD, Dwyer T. Scoping review: the use of early warning systems for the identification of in-hospital patients at risk of deterioration. Aust Crit Care 2017;30:211-8.
  • 6. Cummings BC, Ansari S, Motyka JR, Wang G, Medlin RP Jr, Kronick SL, et al. Predicting intensive care transfers and other unforeseen events: analytic model validation study and comparison to existing methods. JMIR Med Inform 2021;9:e25066.
  • 7. Lee A, Bishop G, Hillman KM, Daffurn K. The medical emergency team. Anaesth Intensive Care 1995;23:183-6.
  • 8. Smith ME, Chiovaro JC, O'Neil M, Kansagara D, Quinones A, Freeman M, et al. Early warning system scores: a systematic review [Internet]. Department of Veterans Affairs (US); 2014 [cited 2025 Apr 1]. Available from: https://www.ncbi.nlm.nih.gov/pubmed/25506953
  • 9. Romero-Brufau S, Morlan BW, Johnson M, Hickman J, Kirkland LL, Naessens JM, et al. Evaluating automated rules for rapid response system alarm triggers in medical and surgical patients. J Hosp Med 2017;12:217-23.
  • 10. Downey CL, Tahir W, Randell R, Brown JM, Jayne DG. Strengths and limitations of early warning scores: a systematic review and narrative synthesis. Int J Nurs Stud 2017;76:106-19.
  • 11. Haegdorens F, Van Bogaert P, Roelant E, De Meester K, Misselyn M, Wouters K, et al. The introduction of a rapid response system in acute hospitals: a pragmatic stepped wedge cluster randomised controlled trial. Resuscitation 2018;129:127-34.
  • 12. Romero-Brufau S, Huddleston JM, Naessens JM, Johnson MG, Hickman J, Morlan BW, et al. Widely used track and trigger scores: are they ready for automation in practice? Resuscitation 2014;85:549-52.
  • 13. Holland M, Kellett J. The United Kingdom's National Early Warning Score: should everyone use it?: a narrative review. Intern Emerg Med 2023;18:573-83.
  • 14. Hong N, Liu C, Gao J, Han L, Chang F, Gong M, et al. State of the art of machine learning-enabled clinical decision support in intensive care units: literature review. JMIR Med Inform 2022;10:e28781.
  • 15. Greco M, Caruso PF, Cecconi M. Artificial intelligence in the intensive care unit. Semin Respir Crit Care Med 2021;42:2-9.
  • 16. Salehinejad H, Meehan AM, Rahman PA, Core MA, Borah BJ, Caraballo PJ. Novel machine learning model to improve performance of an early warning system in hospitalized patients: a retrospective multisite cross-validation study. EClinicalMedicine 2023;66:102312.
  • 17. Peelen RV, Eddahchouri Y, Koeneman M, van de Belt TH, van Goor H, Bredie SJ. Algorithms for prediction of clinical deterioration on the general wards: a scoping review. J Hosp Med 2021;16:612-9.
  • 18. Nolan JP, Berg RA, Andersen LW, Bhanji F, Chan PS, Donnino MW, et al. Cardiac arrest and cardiopulmonary resuscitation outcome reports: update of the Utstein Resuscitation Registry Template for In-Hospital Cardiac Arrest: a consensus report from a Task Force of the International Liaison Committee on Resuscitation (American Heart Association, European Resuscitation Council, Australian and New Zealand Council on Resuscitation, Heart and Stroke Foundation of Canada, InterAmerican Heart Foundation, Resuscitation Council of Southern Africa, Resuscitation Council of Asia). Resuscitation 2019;144:166-77.
  • 19. Reese J, Deakyne SJ, Blanchard A, Bajaj L. Rate of preventable early unplanned intensive care unit transfer for direct admissions and emergency department admissions. Hosp Pediatr 2015;5:27-34.
  • 20. Sung M, Hahn S, Han CH, Lee JM, Lee J, Yoo J, et al. Event prediction model considering time and input error using electronic medical records in the intensive care unit: retrospective study. JMIR Med Inform 2021;9:e26426.
  • 21. Fang AH, Lim WT, Balakrishnan T. Early warning score validation methodologies and performance metrics: a systematic review. BMC Med Inform Decis Mak 2020;20:111.
  • 22. Chen Y, Scholten A, Chomsky-Higgins K, Nwaogu I, Gosnell JE, Seib C, et al. Risk factors associated with perioperative complications and prolonged length of stay after laparoscopic adrenalectomy. JAMA Surg 2018;153:1036-41.
  • 23. Patel K, Diaz MJ, Taneja K, Batchu S, Zhang A, Mohamed A, et al. Predictors of inpatient admission likelihood and prolonged length of stay among cerebrovascular disease patients: a nationwide emergency department sample analysis. J Stroke Cerebrovasc Dis 2023;32:106983.
  • 24. Finlayson SG, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, et al. The clinician and dataset shift in artificial intelligence. N Engl J Med 2021;385:283-6.
  • 25. Cabitza F, Campagner A, Soares F, García de Guadiana-Romualdo L, Challa F, Sulejmani A, et al. The importance of being external: methodological insights for the external validation of machine learning models in medicine. Comput Methods Programs Biomed 2021;208:106288.
  • 26. Wong A, Otles E, Donnelly JP, Krumm A, McCullough J, DeTroyer-Cooley O, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med 2021;181:1065-70.
  • 27. Byrd TF 4th, Southwell B, Ravishankar A, Tran T, Kc A, Phelan T, et al. Validation of a proprietary deterioration index model and performance in hospitalized adults. JAMA Netw Open 2023;6:e2324176.
  • 28. Wu CL, Kuo CT, Shih SJ, Chen JC, Lo YC, Yu HH, et al. Implementation of an electronic national early warning system to decrease clinical deterioration in hospitalized patients at a tertiary medical center. Int J Environ Res Public Health 2021;18:4550.
  • 29. Churpek MM, Carey KA, Snyder A, Winslow CJ, Gilbert E, Shah NS, et al. Multicenter development and prospective validation of eCARTv5: a gradient-boosted machine-learning early warning score. Crit Care Explor 2025;7:e1232.
  • 30. Lambert SI, Madi M, Sopka S, Lenes A, Stange H, Buszello CP, et al. An integrative review on the acceptance of artificial intelligence among healthcare professionals in hospitals. NPJ Digit Med 2023;6:111.
