Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability

Arefin Ittesafun Abian, Mohaimenul Azam Khan Raiaan, Mirjam Jonkman, Sheikh Mohammed Shariful Islam, Sami Azam

Research output: Contribution to journalArticlepeer-review

1 Downloads (Pure)

Abstract

Accurate and early identification of gastrointestinal (GI) lesions is crucial for treating and preventing GI diseases, including cancer. Automated computer-aided diagnosis methods can assist physicians in early and accurate detection. Video classification of GI endoscopic videos is challenging due to the complexity and variability of visual data. This research proposes a novel method for classifying GI diseases using endoscopic videos. Leveraging the public HyperKvasir dataset, we applied preprocessing algorithms to enhance GI frames by removing noise and artifacts with morphological opening and closing techniques, ensuring high-quality visuals. We addressed dataset imbalance by proposing a novel algorithm. Our hybrid model, Atrous Spatial Pyramid Pooling with Swin Transformer (ASPPST), combines advanced Convolutional Neural Networks and the Swin Transformer to classify GI videos into 30 distinct classes. We incorporated Gradient-Class Activation Mapping (Grad-CAM) in ASPPST's final layer to improve model explainability. The proposed model achieved 97.49 % accuracy in classifying 30 GI diseases, outperforming other transfer learning models and transformers by 8.04 % and 3.99 %, respectively. It also demonstrated a precision of 97.80 %, recall of 97.77 %, and an F1 score of 97.75 %, showcasing robustness across metrics. The high accuracy of ASPPST makes it suitable for real-world use, delivering fewer errors and more precise results in GI endoscopy video classification. Our approach advances artificial intelligence (AI) in computer vision and deep learning for biomedical engineering applications. Grad-CAM integration enhances transparency, boosting clinician trust and adoption of AI tools in diagnostic workflows.

Original languageEnglish
Article number110656
Pages (from-to)1-18
Number of pages18
JournalEngineering Applications of Artificial Intelligence
Volume150
DOIs
Publication statusPublished - 15 Jun 2025

Bibliographical note

Publisher Copyright:
© 2025 The Authors

Fingerprint

Dive into the research topics of 'Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability'. Together they form a unique fingerprint.

Cite this