TY - JOUR
T1 - Spatial distribution and machine learning prediction of sexually transmitted infections and associated factors among sexually active men and women in Ethiopia, evidence from EDHS 2016
AU - Kassaw, Abdul-Aziz Kebede
AU - Yilma, Tesfahun Melese
AU - Sebastian, Yakub
AU - Birhanu, Abraham Yeneneh
AU - Melaku, Mequannent Sharew
AU - Jemal, Sebwedin Surur
N1 - Funding Information:
The authors are grateful to the ethical review board of the University of Gondar College of Medicine and Health Sciences for granting ethical clearance, and we sincerely thank the MEASURE DHS Program and ICF International for granting us permission to use the EDHS data. Finally, we thank all our friends for their valuable advice and help.
Publisher Copyright:
© 2023, The Author(s).
PY - 2023/1/23
Y1 - 2023/1/23
N2 - Introduction: Sexually transmitted infections (STIs) are a major public health problem globally, affecting millions of people every day. The burden is high in sub-Saharan Africa, including Ethiopia. However, there is little evidence on the distribution of STIs across Ethiopian regions. A better understanding of these infections is therefore important for reducing their burden on society. This study aimed to assess the geographic distribution of STIs across Ethiopian regions and to identify their predictors using machine learning techniques. Assessing the predictors of STIs and their spatial distribution could help policymakers understand the problem better and design interventions accordingly. Methods: A community-based cross-sectional study was conducted using the 2016 Ethiopian Demographic and Health Survey (EDHS) dataset, whose data were collected from January 18 to June 27, 2016. We applied spatial autocorrelation analysis using the Global Moran's I statistic to assess whether STIs were spatially clustered. Spatial scan statistics based on the Bernoulli model were computed in SaTScan™ to identify significant local clusters. Supervised machine learning models, including a C5.0 decision tree, random forest, support vector machine, naïve Bayes, and logistic regression, were applied to the 2016 EDHS dataset to predict STIs, and their performance was compared. Association rules were mined using an unsupervised machine learning algorithm. Results: STIs were spatially clustered across Ethiopia (global Moran's I = 0.06, p = 0.04). The random forest algorithm performed best for STI prediction, with a balanced accuracy of 69.48% and an area under the curve (AUC) of 68.50%. The random forest model identified region, wealth, age category, educational level, age at first sex, working status, marital status, media access, alcohol drinking, khat chewing, and sex of the respondent as the top 11 predictors of STIs in Ethiopia.
Conclusion: The random forest machine learning algorithm is the proposed model for predicting STIs and identifying their predictors in Ethiopia.
KW - Ethiopia
KW - Machine learning
KW - Prediction
KW - Sexually transmitted infections
KW - Spatial distribution
UR - http://www.scopus.com/inward/record.url?scp=85146819652&partnerID=8YFLogxK
U2 - 10.1186/s12879-023-07987-6
DO - 10.1186/s12879-023-07987-6
M3 - Article
C2 - 36690950
AN - SCOPUS:85146819652
SN - 1471-2334
VL - 23
SP - 1
EP - 17
JO - BMC Infectious Diseases
JF - BMC Infectious Diseases
IS - 49
ER -