TY - JOUR
T1 - Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability
AU - Abian, Arefin Ittesafun
AU - Raiaan, Mohaimenul Azam Khan
AU - Jonkman, Mirjam
AU - Islam, Sheikh Mohammed Shariful
AU - Azam, Sami
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/6/15
Y1 - 2025/6/15
N2 - Accurate and early identification of gastrointestinal (GI) lesions is crucial for treating and preventing GI diseases, including cancer. Automated computer-aided diagnosis methods can assist physicians in early and accurate detection. Video classification of GI endoscopic videos is challenging due to the complexity and variability of visual data. This research proposes a novel method for classifying GI diseases using endoscopic videos. Leveraging the public HyperKvasir dataset, we applied preprocessing algorithms to enhance GI frames by removing noise and artifacts with morphological opening and closing techniques, ensuring high-quality visuals. We addressed dataset imbalance by proposing a novel algorithm. Our hybrid model, Atrous Spatial Pyramid Pooling with Swin Transformer (ASPPST), combines advanced Convolutional Neural Networks and the Swin Transformer to classify GI videos into 30 distinct classes. We incorporated Gradient-Class Activation Mapping (Grad-CAM) in ASPPST's final layer to improve model explainability. The proposed model achieved 97.49 % accuracy in classifying 30 GI diseases, outperforming other transfer learning models and transformers by 8.04 % and 3.99 %, respectively. It also demonstrated a precision of 97.80 %, recall of 97.77 %, and an F1 score of 97.75 %, showcasing robustness across metrics. The high accuracy of ASPPST makes it suitable for real-world use, delivering fewer errors and more precise results in GI endoscopy video classification. Our approach advances artificial intelligence (AI) in computer vision and deep learning for biomedical engineering applications. Grad-CAM integration enhances transparency, boosting clinician trust and adoption of AI tools in diagnostic workflows.
AB - Accurate and early identification of gastrointestinal (GI) lesions is crucial for treating and preventing GI diseases, including cancer. Automated computer-aided diagnosis methods can assist physicians in early and accurate detection. Video classification of GI endoscopic videos is challenging due to the complexity and variability of visual data. This research proposes a novel method for classifying GI diseases using endoscopic videos. Leveraging the public HyperKvasir dataset, we applied preprocessing algorithms to enhance GI frames by removing noise and artifacts with morphological opening and closing techniques, ensuring high-quality visuals. We addressed dataset imbalance by proposing a novel algorithm. Our hybrid model, Atrous Spatial Pyramid Pooling with Swin Transformer (ASPPST), combines advanced Convolutional Neural Networks and the Swin Transformer to classify GI videos into 30 distinct classes. We incorporated Gradient-Class Activation Mapping (Grad-CAM) in ASPPST's final layer to improve model explainability. The proposed model achieved 97.49 % accuracy in classifying 30 GI diseases, outperforming other transfer learning models and transformers by 8.04 % and 3.99 %, respectively. It also demonstrated a precision of 97.80 %, recall of 97.77 %, and an F1 score of 97.75 %, showcasing robustness across metrics. The high accuracy of ASPPST makes it suitable for real-world use, delivering fewer errors and more precise results in GI endoscopy video classification. Our approach advances artificial intelligence (AI) in computer vision and deep learning for biomedical engineering applications. Grad-CAM integration enhances transparency, boosting clinician trust and adoption of AI tools in diagnostic workflows.
KW - Computer vision
KW - Gastrointestinal
KW - Interpretability
KW - Swin transformer
KW - Video classification
UR - http://www.scopus.com/inward/record.url?scp=105000581525&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2025.110656
DO - 10.1016/j.engappai.2025.110656
M3 - Article
AN - SCOPUS:105000581525
SN - 0952-1976
VL - 150
SP - 1
EP - 18
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 110656
ER -