A comparative study of different machine learning tools in detecting diabetes

Pronab Ghosh, Sami Azam, Asif Karim, Mehedi Hassan, Kuber Roy, Mirjam Jonkman

Research output: Contribution to journal › Article › peer-review



A significant proportion of people around the world currently suffer from the harmful effects of diabetes, and a considerable number of them are not identified at an early stage. Over time this may result in serious health problems such as blindness and kidney failure. To classify the disease accurately, different machine learning (ML) approaches can be utilized. In this context, four ML algorithms, namely Gradient Boosting (GB), Support Vector Machine (SVM), AdaBoost (AB), and Random Forest (RF), are evaluated on the Pima Indians diabetes dataset, first using all features and then using only the features selected with the Minimal Redundancy Maximal Relevance (MRMR) Feature Selection (FS) approach. Seven performance evaluation metrics were computed with a 10-fold cross-validation (CV) approach. Computational complexity is also evaluated. The best results were obtained with the Random Forest approach, which achieved an accuracy of 99.35%.
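The MRMR idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which MRMR variant or relevance measure the authors used, so the following assumes discrete features, mutual information as the measure, and the "difference" form of the criterion (relevance to the class minus mean redundancy with already selected features). The function names, the feature names, and the 16-row toy data are hypothetical and synthetic; they are not taken from the paper or from the Pima Indians dataset.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(columns, target, k):
    """Greedy mRMR, difference form (an assumed variant, see lead-in).

    columns: dict mapping feature name -> list of discrete values
    target:  list of class labels
    k:       number of features to select
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in columns.items()}
    selected, remaining = [], list(columns)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(columns[name], columns[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic toy data (hypothetical): f_a_noisy is f_a with one flipped value,
# and f_b is independent of f_a but still carries some class information.
toy_target = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1]
toy_columns = {
    "f_a":       [0] * 8 + [1] * 8,
    "f_a_noisy": [1] + [0] * 7 + [1] * 8,
    "f_b":       [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
}

selected = mrmr_select(toy_columns, toy_target, 2)
print(selected)
```

On this toy data, ranking by relevance alone would place `f_a_noisy` second, whereas the redundancy penalty makes the greedy procedure prefer `f_b` after `f_a`. Continuous features, such as most columns of the Pima dataset, would need to be discretized or handled with a different relevance estimator before a sketch like this applies.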

Original language: English
Pages (from-to): 467-477
Number of pages: 11
Journal: Procedia Computer Science
Issue number: 1
Publication status: Published - 2021
Event: 25th KES International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2021 - Szczecin, Poland
Duration: 8 Sept 2021 to 10 Sept 2021

