TY - JOUR
T1 - Expert cancer model using supervised algorithms with a LASSO selection approach
AU - Ghosh, Pronab
AU - Karim, Asif
AU - Atik, Syeda Tanjila
AU - Afrin, Saima
AU - Saifuzzaman, Mohd
PY - 2021/6
Y1 - 2021/6
N2 - Breast cancer is currently one of the most critical contributors to mortality in the medical field. Each year, a large number of men and women die of cancer because of the lack of early diagnosis systems and proper treatment. To tackle this issue, various data mining approaches have been analyzed to build an effective model that helps identify the different stages of deadly cancers. This study proposes an early cancer diagnosis model based on five supervised algorithms: logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), and K-nearest neighbor (KNN). After appropriate preprocessing of the dataset, the least absolute shrinkage and selection operator (LASSO) was applied for feature selection (FS) with 10-fold cross-validation (CV). Employing LASSO with 10-fold CV is a novel step introduced in this research. Several performance evaluation metrics were then computed to assess the predictive accuracy of the proposed algorithms. The results indicate that the RF classifier combined with LASSO achieved the highest accuracy, approximately 99.41%. Finally, a comprehensive comparison was carried out on the Wisconsin breast cancer (diagnostic) dataset (WBCD) against several recent works that use all features.
AB - Breast cancer is currently one of the most critical contributors to mortality in the medical field. Each year, a large number of men and women die of cancer because of the lack of early diagnosis systems and proper treatment. To tackle this issue, various data mining approaches have been analyzed to build an effective model that helps identify the different stages of deadly cancers. This study proposes an early cancer diagnosis model based on five supervised algorithms: logistic regression (LR), decision tree (DT), random forest (RF), support vector machine (SVM), and K-nearest neighbor (KNN). After appropriate preprocessing of the dataset, the least absolute shrinkage and selection operator (LASSO) was applied for feature selection (FS) with 10-fold cross-validation (CV). Employing LASSO with 10-fold CV is a novel step introduced in this research. Several performance evaluation metrics were then computed to assess the predictive accuracy of the proposed algorithms. The results indicate that the RF classifier combined with LASSO achieved the highest accuracy, approximately 99.41%. Finally, a comprehensive comparison was carried out on the Wisconsin breast cancer (diagnostic) dataset (WBCD) against several recent works that use all features.
KW - Breast cancer
KW - Decision tree
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=85099595575&partnerID=8YFLogxK
U2 - 10.11591/ijece.v11i3.pp2631-2639
DO - 10.11591/ijece.v11i3.pp2631-2639
M3 - Article
AN - SCOPUS:85099595575
VL - 11
SP - 2631
EP - 2639
JO - International Journal of Electrical and Computer Engineering
JF - International Journal of Electrical and Computer Engineering
SN - 2088-8708
IS - 3
ER -