Application of gradient boosted trees to gender prediction based on motivations of masters athletes

Joe Walsh, Ian Heazlewood, Mik Climstein

    Research output: Contribution to journal › Article › peer-review


    Gradient boosted decision trees are statistical learning ensemble methods that iteratively refit decision tree sub-models to residuals. The aim of this research was to apply gradient boosted decision trees and assess how well they predict gender from psychological constructs measuring motivations to participate in masters sports. Tree based models, both unboosted and boosted, were compared with previously published research that used logistic regression, discriminant function analysis, radial basis functions and multilayer perceptrons. The tree models selected were J48, C5.0, gradient boosted machine (GBM), XGBoost and LightGBM. The sample consisted of 3928 masters athletes (2010 males) from the World Masters Games, the largest sporting event in the world by participant numbers. Tree based models proved effective for prediction in this setting: even the older baseline implementations gave higher prediction accuracy than any method used in prior research. The highest predictive accuracy was achieved using GBM (0.7134), exceeding the accuracies of models using XGBoost (0.7012) or LightGBM (0.6904). These two more recent boosting implementations may have given lower predictive accuracy than GBM because of the high dimensionality of the data relative to the number of cases.
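The idea of iteratively refitting tree sub-models to residuals can be illustrated with a minimal sketch. It assumes a binary outcome, logistic loss, and depth-one trees (stumps), and it runs on synthetic data generated inside the script, not on the World Masters Games sample. The function names, learning rate, and number of rounds are hypothetical choices for illustration; this is not the GBM, XGBoost, or LightGBM implementation used in the study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y, scores):
    """Mean logistic loss for 0/1 labels and raw (log-odds) scores."""
    total = 0.0
    for yi, fi in zip(y, scores):
        p = sigmoid(fi)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)


def fit_stump(X, r):
    """Find the one-feature threshold split that best fits residuals r
    in the least-squares sense. Returns (feature, threshold, left, right)."""
    n, d = len(X), len(X[0])
    best = None
    for j in range(d):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]


def boost(X, y, rounds=30, lr=0.3):
    """Gradient boosting for logistic loss: each round fits a stump to the
    residuals y - p (the negative gradient) and adds a shrunken copy of it."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))           # constant initial score
    scores = [f0] * len(X)
    stumps, losses = [], [log_loss(y, scores)]
    for _ in range(rounds):
        resid = [yi - sigmoid(fi) for yi, fi in zip(y, scores)]
        j, t, ml, mr = fit_stump(X, resid)
        stumps.append((j, t, ml, mr))
        scores = [fi + lr * (ml if row[j] <= t else mr)
                  for fi, row in zip(scores, X)]
        losses.append(log_loss(y, scores))
    return f0, stumps, losses


def predict(f0, stumps, row, lr=0.3):
    score = f0 + sum(lr * (ml if row[j] <= t else mr) for j, t, ml, mr in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


# Synthetic data (an assumption for illustration): three features,
# with the label depending on the first two plus noise.
random.seed(0)
X = [[random.random() for _ in range(3)] for _ in range(120)]
y = [1 if row[0] + 0.5 * row[1] + random.gauss(0, 0.2) > 0.75 else 0 for row in X]

f0, stumps, losses = boost(X, y)
accuracy = sum(predict(f0, stumps, row) == yi for row, yi in zip(X, y)) / len(y)
baseline = max(sum(y), len(y) - sum(y)) / len(y)   # majority-class accuracy

print(f"training log loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print(f"training accuracy: {accuracy:.4f} (majority baseline {baseline:.4f})")
```

The accuracy printed here is measured on the training data and only shows that the loop reduces the loss; comparing models as in the study would require held-out data or cross-validation.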

    Original language: English
    Pages (from-to): 235-252
    Number of pages: 18
    Journal: Model Assisted Statistics and Applications
    Issue number: 3
    Publication status: Published - 1 Aug 2018

