Breast cancer is currently one of the most critical contributors to mortality in the medical field. Each year, a large number of men and women die of cancer because of the lack of early diagnosis systems and proper treatment. To tackle the issue, various data mining approaches have been analyzed to build an effective model that helps to identify the different stages of deadly cancers. The study proposes a model for early cancer detection based on five supervised algorithms: logistic regression (henceforth LR), decision tree (henceforth DT), random forest (henceforth RF), support vector machine (henceforth SVM), and K-nearest neighbor (henceforth KNN). After appropriate preprocessing of the dataset, the least absolute shrinkage and selection operator (LASSO) was used for feature selection (FS) with a 10-fold cross-validation (CV) approach. Employing LASSO with 10-fold cross-validation is a novel step introduced in this research. Afterwards, different performance evaluation metrics were measured to assess the accuracy of the predictions made by the proposed algorithms. The results indicated that the RF classifier achieved the highest accuracy, approximately 99.41%, when integrated with LASSO. Finally, a comprehensive comparison was carried out on the Wisconsin breast cancer (diagnostic) dataset (WBCD) against some recent works that use all features.
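As a rough illustration of the pipeline described in the abstract, the following minimal sketch combines LASSO-based feature selection (tuned with 10-fold CV) with an RF classifier. It is not the authors' implementation. It assumes scikit-learn and its bundled copy of the Wisconsin diagnostic dataset (`load_breast_cancer`), treats the binary label as a numeric target for LASSO, and uses illustrative hyperparameters (`n_estimators=100`, `threshold=1e-5`, fixed random seeds) that do not come from the paper. It is not expected to reproduce the reported 99.41% accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wisconsin diagnostic data as shipped with scikit-learn (assumed stand-in for WBCD).
X, y = load_breast_cancer(return_X_y=True)

# LASSO whose regularization strength is chosen by 10-fold CV; features whose
# coefficient magnitude falls below the threshold are dropped.
# Assumption: the 0/1 class label is regressed directly as a numeric target.
lasso_selector = SelectFromModel(
    LassoCV(cv=10, max_iter=10000, random_state=0),
    threshold=1e-5,
)

# Scale -> select features with LASSO -> classify with RF.
# Putting all steps in one pipeline keeps feature selection inside each CV fold.
pipeline = make_pipeline(
    StandardScaler(),
    lasso_selector,
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# Outer 10-fold CV estimate of classification accuracy.
scores = cross_val_score(pipeline, X, y, cv=10, scoring="accuracy")
mean_accuracy = scores.mean()

# Fit once on all data to inspect how many features LASSO retains.
pipeline.fit(X, y)
n_selected = int(pipeline.named_steps["selectfrommodel"].get_support().sum())

print(f"mean 10-fold accuracy: {mean_accuracy:.4f}")
print(f"features kept by LASSO: {n_selected} of {X.shape[1]}")
```

Swapping `RandomForestClassifier` for `LogisticRegression`, `DecisionTreeClassifier`, `SVC`, or `KNeighborsClassifier` in the last pipeline step would give the same kind of comparison across the five classifiers named above.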
Journal: International Journal of Electrical and Computer Engineering
Published: Jun 2021