Application of gradient boosted trees to gender prediction based on motivations of masters athletes

Joe Walsh, Ian Heazlewood, Mik Climstein

Research output: Contribution to journal › Article

Abstract

Gradient boosted decision trees are statistical learning ensemble methods that iteratively refit decision tree sub-models to residuals. The aim of this research was to apply gradient boosted decision trees and investigate their ability to predict gender from psychological constructs measuring motivations to participate in masters sports. Previously published results obtained with logistic regression, discriminant function analysis, radial basis functions and multilayer perceptrons were compared with a selection of unboosted and boosted decision tree based models. The tree models selected were J48, C5.0, gradient boosted machine (GBM), XGBoost and LightGBM. The sample consisted of 3928 masters athletes (2010 males) from the World Masters Games, the largest sporting event in the world by participant numbers. Tree based models proved effective for prediction in this setting: even older baseline implementations gave higher prediction accuracy than any method used in prior research. The highest predictive accuracy was achieved using GBM (0.7134), exceeding the accuracies of models using XGBoost (0.7012) and LightGBM (0.6904). These two more recent implementations of boosting may have given lower predictive accuracy than GBM because of the high dimensionality of the data relative to the number of cases.
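
The idea of iteratively refitting tree sub-models to residuals can be illustrated with a minimal sketch. It is not the GBM, XGBoost or LightGBM implementation used in the study: it assumes a squared-error loss, uses single-split trees (stumps) on one feature, and runs on made-up toy data. All function names, the learning rate and the number of rounds are hypothetical choices made for illustration only.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error on the residuals."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that the right-hand side of the split is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]  # (threshold, left value, right value)


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def fit_boosted(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))


# Made-up toy data: one numeric feature and a binary label coded 0/1.
toy_x = [1, 2, 3, 4, 5, 6, 7, 8]
toy_y = [0, 0, 1, 0, 1, 1, 0, 1]

mse_0 = mse(fit_boosted(toy_x, toy_y, n_rounds=0), toy_x, toy_y)
mse_1 = mse(fit_boosted(toy_x, toy_y, n_rounds=1), toy_x, toy_y)
mse_20 = mse(fit_boosted(toy_x, toy_y, n_rounds=20), toy_x, toy_y)
```

Comparing `mse_0`, `mse_1` and `mse_20` shows the training error falling as more stumps are added. For a two-class outcome such as the one studied here, practical implementations typically replace the squared-error loss with a classification loss and fit each tree to the gradient of that loss; the sketch keeps squared error only to make the residual step visible.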

Original language: English
Pages (from-to): 235-252
Number of pages: 18
Journal: Model Assisted Statistics and Applications
Volume: 13
Issue number: 3
Publication status: Published - 1 Aug 2018
