TY - JOUR
T1 - Machine Learning-Based Model to Predict Heart Disease in Early Stage Employing Different Feature Selection Techniques
AU - Biswas, Niloy
AU - Ali, Md Mamun
AU - Rahaman, Md Abdur
AU - Islam, Minhajul
AU - Mia, Md Rajib
AU - Azam, Sami
AU - Ahmed, Kawsar
AU - Bui, Francis M.
AU - Al-Zahrani, Fahad Ahmed
AU - Moni, Mohammad Ali
N1 - Publisher Copyright: © 2023 Niloy Biswas et al.
PY - 2023
Y1 - 2023
AB - Cardiovascular disease claims almost 17.9 million lives each year, accounting for 32% of all deaths worldwide, and is a global concern. Fortunately, mortality from heart disease can be reduced by early treatment, which makes early-stage detection crucial. This study aims to build a machine learning model that predicts heart disease at an early stage, using several feature selection techniques to identify significant features. Three feature selection approaches were applied: chi-square, ANOVA, and mutual information; the resulting feature subsets are denoted SF1, SF2, and SF3, respectively. Six machine learning models, namely logistic regression (C1), support vector machine (C2), K-nearest neighbor (C3), random forest (C4), Naive Bayes (C5), and decision tree (C6), were then applied to find the best-performing model together with the best-fit feature subset. Random forest gave the best performance on the SF3 feature subset, with 94.51% accuracy, 94.87% sensitivity, 94.23% specificity, 94.95 area under the ROC curve (AURC), and 0.31 log loss. The performance of the model with the selected features indicates that the proposed approach has high potential for clinical use in predicting heart disease at an early stage, at low cost and in less time.
UR - http://www.scopus.com/inward/record.url?scp=85159376797&partnerID=8YFLogxK
U2 - 10.1155/2023/6864343
DO - 10.1155/2023/6864343
M3 - Article
AN - SCOPUS:85159376797
SN - 2314-6133
VL - 2023
JO - BioMed Research International
JF - BioMed Research International
M1 - 6864343
ER -