An effective ensemble machine learning approach to classify breast cancer based on feature selection and lesion segmentation using pre-processed mammograms

A. K. M. Rakibul Haque Rafid, Sami Azam, Sidratul Montaha, Asif Karim, Kayes Uddin Fahim, Md Zahid Hasan

    Research output: Contribution to journal > Article > peer-review


    Abstract

    Background: After skin cancer, breast cancer is the second most frequent malignancy among women, and it is initiated by unregulated cell division in breast tissue. Although early mammogram screening and treatment decrease mortality, differentiating cancer cells from surrounding tissue is often error-prone, which leads to incorrect diagnoses.

    Method: A mammography dataset is used to categorize breast cancer into four classes with low computational complexity, using a feature extraction-based approach with machine learning (ML) algorithms. After artefact removal and preprocessing of the mammograms, the dataset is augmented with seven augmentation techniques. The region of interest (ROI) is extracted by employing several algorithms, including a dynamic thresholding method. Sixteen geometrical features are extracted from the ROI, and eleven ML algorithms are investigated with these features. Three ensemble models are generated from these ML models using the stacking method: the first is built by stacking ML models with an accuracy of over 90%, and the accuracy thresholds for the other two are >95% and >96%. Five feature selection methods with fourteen configurations are applied to improve performance.

    Results: The Random Forest Importance algorithm, with a threshold of 0.045, yields 10 features that achieve the highest performance, 98.05% test accuracy, obtained by stacking the Random Forest and XGB classifiers, the models with accuracy above 96%. Furthermore, with K-fold cross-validation, consistent performance is observed across all K values from 3 to 30. Overall, the proposed strategy combining image processing, feature extraction and ML achieves high accuracy in classifying breast cancer.
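
    The importance-threshold feature selection and the K-fold evaluation over K = 3 to 30 mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature names, the importance scores, the sample count of 60, the inclusive comparison against the threshold and the unshuffled contiguous folds are all assumptions made for illustration. Only the 0.045 threshold and the K range come from the abstract.

```python
from typing import Dict, List, Tuple


def select_by_importance(importances: Dict[str, float], threshold: float) -> List[str]:
    """Keep features whose importance score is at least `threshold`.

    Whether the comparison should be inclusive is an assumption.
    """
    return [name for name, score in importances.items() if score >= threshold]


def kfold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Partition range(n_samples) into k contiguous folds (no shuffling).

    Returns one (train_indices, test_indices) pair per fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds take one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Hypothetical importance scores, for illustration only (not values from the paper).
example_importances = {
    "feature_a": 0.120,
    "feature_b": 0.044,
    "feature_c": 0.045,
    "feature_d": 0.010,
}
selected = select_by_importance(example_importances, threshold=0.045)

# One set of splits per K in the range reported in the abstract (3 to 30).
splits_by_k = {k: kfold_indices(n_samples=60, k=k) for k in range(3, 31)}
```

    In a full pipeline, each (train, test) pair would be used to fit and score the stacked classifier on the selected features, and the scores would be compared across K to check for consistency.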
    Original language: English
    Article number: 1654
    Pages (from-to): 1-26
    Number of pages: 26
    Journal: Biology
    Volume: 11
    Issue number: 11
    Publication status: Published - 11 Nov 2022

    Bibliographical note

    Publisher Copyright:
    © 2022 by the authors.
