As stated in our original article, this work represents our institution's initial experience with clinical prediction modeling using machine learning. Our primary objective was not to establish a definitive clinical tool, but rather to explore the predictive potential of routinely available biomarkers, assess their relative importance, and propose preliminary threshold values that could guide future investigations.
We fully acknowledge that our sample size was limited and that the events-per-variable ratio was below the ideal threshold for stable model estimation. This limitation was explicitly discussed in our paper. We view our findings as hypothesis-generating and foundational for larger, multi-center studies. Such studies should aim to achieve more robust calibration and external validation, as recommended in our article.
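For readers less familiar with the events-per-variable (EPV) ratio, the short Python sketch below shows how it is computed. The counts are hypothetical and are not taken from our study; the function name is ours, chosen only for illustration. A commonly cited rule of thumb suggests roughly 10 events per candidate predictor, although the appropriate value depends on the setting.

```python
def events_per_variable(n_events: int, n_candidate_predictors: int) -> float:
    """Return the events-per-variable (EPV) ratio.

    n_events: number of patients who experienced the outcome.
    n_candidate_predictors: number of predictors considered for the model.
    """
    if n_candidate_predictors <= 0:
        raise ValueError("at least one candidate predictor is required")
    return n_events / n_candidate_predictors


# Hypothetical counts for illustration only (not the study's data).
epv = events_per_variable(n_events=30, n_candidate_predictors=10)
print(f"EPV = {epv:.1f}")
```

A low EPV of this kind signals that coefficient or split estimates may be unstable, which is why we regard the findings as hypothesis-generating.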
Regarding the balanced 50/50 sampling strategy, we recognize that this approach may affect calibration and could lead to optimistic accuracy estimates. Our intent was to explore model performance under controlled class balance, not to simulate real-world prevalence.
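To illustrate why balanced sampling affects calibration, the following Python sketch applies the standard prior-shift correction, which rescales a probability estimated at one outcome prevalence to another. This is not a step performed in our study. It assumes that the model's output is a well-calibrated probability on the balanced sample and that sampling depended only on the outcome; the function name and the 10% target prevalence are hypothetical.

```python
def recalibrate_to_prevalence(p_balanced: float,
                              sample_prev: float,
                              target_prev: float) -> float:
    """Rescale a predicted probability from the sampled prevalence
    to the prevalence expected in the target population."""
    if not (0.0 < sample_prev < 1.0 and 0.0 < target_prev < 1.0):
        raise ValueError("prevalences must lie strictly between 0 and 1")
    if not (0.0 <= p_balanced <= 1.0):
        raise ValueError("p_balanced must be a probability")
    # Reweight the positive and negative class by the ratio of priors.
    pos = p_balanced * target_prev / sample_prev
    neg = (1.0 - p_balanced) * (1.0 - target_prev) / (1.0 - sample_prev)
    return pos / (pos + neg)


# A prediction of 0.80 from a model trained on a 50/50 sample,
# re-expressed for a hypothetical population prevalence of 10%.
adjusted = recalibrate_to_prevalence(0.80, sample_prev=0.5, target_prev=0.10)
print(f"adjusted risk = {adjusted:.3f}")
```

Under these assumptions, a predicted risk from a balanced sample overstates the risk in a population where the event is uncommon, which is the calibration concern raised in the critique.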
Concerning the choice of algorithms, including the Probabilistic Data Association (PDA) classifier, our goal was to test a range of classification approaches during this exploratory phase. We fully agree that explainability and transparency, supported by methods such as SHAP or feature importance visualization, are critical for clinical translation. In addition to black-box methods, however, our study also included an explainable method, the decision tree (DT). The DT, which is transparent and explainable, was the most successful of the models tested, as shown in Figure 3.
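The transparency of a decision tree comes from the fact that each split is a readable threshold on a single variable. The Python sketch below is a simplified, single-split illustration of how such a threshold can be chosen by minimizing weighted Gini impurity. It is not the implementation used in our study, and the biomarker values and labels are invented for illustration.

```python
import math


def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best single split.

    Candidate thresholds are midpoints between consecutive distinct values;
    samples with value <= threshold go to the left branch.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best_t, best_score = None, math.inf
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical biomarker values and outcomes (1 = event), not study data.
values = [1.5, 1.6, 1.7, 1.8, 2.0, 2.1, 2.2, 2.3]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
threshold, impurity = best_threshold(values, labels)
print(f"split at <= {threshold:.2f}, weighted Gini = {impurity:.3f}")
```

A rule of this form can be read and checked directly by a clinician, which is the sense in which we describe the DT as explainable.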
We recognize the importance of the TRIPOD-AI reporting guidance. Accordingly, we highlighted this issue in the conclusion section of our study and recommended that researchers refine machine learning models in future studies.
In the last paragraph of our article, we acknowledged the limitations highlighted in the critique and stated: "Our study findings suggest that preoperative levels of magnesium, albumin, and total iron-binding capacity may help to predict POAF risk. However, it should be kept in mind that this study was conducted on a limited scale due to the low rate of atrial fibrillation in a selected population. Therefore, the findings of the study need to be strengthened by external validation for their applicability. Future studies should involve larger patient populations to validate these predictors and refine machine learning models."
In conclusion, despite its exploratory nature, we believe our study provides valuable early insights into the feasibility of applying machine-learning approaches to identify biomarkers predictive of POAF. We are grateful for this constructive feedback.
Data Sharing Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author Contributions: All authors contributed equally to this article.
Conflict of Interest: The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.
Funding: The authors received no financial support for the research and/or authorship of this article.