TY - JOUR
T1 - A comparative study of different machine learning tools in detecting diabetes
AU - Ghosh, Pronab
AU - Azam, Sami
AU - Karim, Asif
AU - Hassan, Mehedi
AU - Roy, Kuber
AU - Jonkman, Mirjam
PY - 2021
Y1 - 2021
N2 - A significant proportion of people around the world currently suffer from the harmful effects of diabetes, and a considerable number of them are not identified at an early stage. Over time, this may result in serious health problems such as blindness and kidney failure. To accurately classify the disease, different machine learning (ML) approaches can be utilized. In this context, four separate ML algorithms, namely Gradient Boosting (GB), Support Vector Machine (SVM), AdaBoost (AB), and Random Forest (RF), are evaluated using the Pima Indians diabetes dataset, first based on all features, then on the features selected with the Minimal Redundancy Maximal Relevance (MRMR) Feature Selection (FS) approach. Seven different types of performance evaluation metrics were computed with a 10-fold cross-validation (CV) approach. Computational complexity is also evaluated. The best results were obtained with the Random Forest approach, achieving an accuracy of 99.35%.
AB - A significant proportion of people around the world currently suffer from the harmful effects of diabetes, and a considerable number of them are not identified at an early stage. Over time, this may result in serious health problems such as blindness and kidney failure. To accurately classify the disease, different machine learning (ML) approaches can be utilized. In this context, four separate ML algorithms, namely Gradient Boosting (GB), Support Vector Machine (SVM), AdaBoost (AB), and Random Forest (RF), are evaluated using the Pima Indians diabetes dataset, first based on all features, then on the features selected with the Minimal Redundancy Maximal Relevance (MRMR) Feature Selection (FS) approach. Seven different types of performance evaluation metrics were computed with a 10-fold cross-validation (CV) approach. Computational complexity is also evaluated. The best results were obtained with the Random Forest approach, achieving an accuracy of 99.35%.
KW - AdaBoost
KW - Gradient boosting
KW - MRMR
KW - Random forest
KW - Support Vector Machine (RBF kernel)
UR - http://www.scopus.com/inward/record.url?scp=85116911563&partnerID=8YFLogxK
U2 - 10.1016/j.procs.2021.08.048
DO - 10.1016/j.procs.2021.08.048
M3 - Article
AN - SCOPUS:85116911563
SN - 1877-0509
VL - 192
SP - 467
EP - 477
JO - Procedia Computer Science
JF - Procedia Computer Science
IS - 1
T2 - 25th KES International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2021
Y2 - 8 September 2021 through 10 September 2021
ER -