Towards Robust and Explainable Heart Disease Prediction: A Hybrid Feature Engineering and Cross-Dataset Validation Approach Using Machine Learning
DOI:
https://doi.org/10.70849/IJSCI
Keywords:
Heart disease, machine learning, hybrid feature engineering, explainable AI, SHAP, cross-dataset validation.
Abstract
Cardiovascular diseases (CVDs) remain the leading cause of mortality worldwide, claiming more than 20 million lives annually. Early detection is crucial to lowering mortality and reducing healthcare burdens. Machine learning (ML) has demonstrated significant potential for predicting CVD risk, yet most existing approaches suffer from three major shortcomings: dependence on small datasets (mostly Cleveland), reliance on single-method feature engineering, and poor interpretability that hinders clinical acceptance.
This paper proposes a hybrid, explainable ML framework that integrates multiple feature engineering methods (Chi-Square, Recursive Feature Elimination, LASSO regression, and Autoencoder-based reduction) with cross-dataset validation. Five models (Logistic Regression, Support Vector Machines, Random Forest, XGBoost, and Deep Neural Networks) were tested on the Cleveland dataset and validated on the Kaggle dataset. The Deep Neural Network achieved the highest accuracy (94.8%), followed by XGBoost (94.3%). SHAP-based interpretability analysis identified chest pain, cholesterol, maximum heart rate, ST depression, and vessel count as the most influential features.
Unlike earlier studies, our framework ensures accuracy, generalization across datasets, and transparency of predictions. These findings pave the way for trustworthy clinical decision support and telemedicine deployment.
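The abstract does not state how the outputs of the individual feature engineering methods are combined, so the following is only a minimal sketch of one plausible scheme, not the authors' procedure. It assumes a simple majority vote: each selector nominates a feature subset, and a feature is kept when at least `min_votes` selectors agree. It also shows how a Chi-Square score for one categorical feature can be computed from a contingency table. The function names, the `min_votes` parameter, and the example selections are hypothetical; the feature names follow the usual Cleveland column abbreviations and are used purely for illustration.

```python
from collections import Counter


def chi_square_score(feature, target):
    """Pearson chi-square statistic for a categorical feature vs. a categorical target."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # contingency-table cell counts
    feature_totals = Counter(feature)         # row marginals
    target_totals = Counter(target)           # column marginals
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for t_value, t_count in target_totals.items():
            expected = f_count * t_count / n  # count expected under independence
            stat += (observed[(f_value, t_value)] - expected) ** 2 / expected
    return stat


def consensus_features(selections, min_votes=2):
    """Keep features nominated by at least `min_votes` selectors (assumed voting rule)."""
    votes = Counter(name for chosen in selections.values() for name in set(chosen))
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Toy labels: a perfectly associated feature scores high, an independent one scores zero.
labels = [0, 0, 1, 1]
print(chi_square_score([0, 0, 1, 1], labels))  # 4.0
print(chi_square_score([0, 1, 0, 1], labels))  # 0.0

# Hypothetical selector outputs, invented for illustration (not results from the paper).
selections = {
    "chi_square": ["cp", "thalach", "oldpeak"],
    "rfe": ["cp", "chol", "ca"],
    "lasso": ["cp", "oldpeak", "ca"],
}
print(consensus_features(selections))  # ['ca', 'cp', 'oldpeak']
```

A voting rule like this keeps a feature only when independent criteria agree on it, which is one way a hybrid approach could reduce dependence on any single selection method. An autoencoder-based reduction produces new latent features rather than a subset, so it would need to be combined differently and is left out of this sketch.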
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.








