Quantified Cluster Analysis techniques for IR Spectroscopy

  • Simon Crase

    Student thesis: Doctor of Philosophy (PhD) - CDU


    Improvised explosive devices (IEDs) and the terrorist networks that employ them pose a threat in many regions of the world. An essential step towards disrupting these networks is identifying the relationships and linkages between the individuals who form them. This research explores the novel concept of applying machine learning to analyse chemical test results from homemade explosives, with the aim of identifying matches and relationships between the samples (and hence between the bombmakers who made them). Because many of the secondary ingredients in these homemade explosives are unknown (unlabelled) and constantly evolving, we apply unsupervised techniques, in the form of cluster analysis, to match the spectroscopy samples.

    Our literature survey indicates that cluster analysis is rarely applied to spectroscopy in a quantified manner to match samples. Most mature chemometric techniques are for supervised learning, where labelled datasets are available. Hence, we pursue cluster analysis techniques from the machine learning domain but adapt them to the characteristics of IR spectroscopy data.

    In achieving this, we present an analysis model tailored for the quantified cluster analysis of IR spectroscopy. Chemometric preprocessing techniques and unsupervised cluster analysis algorithms are applied, along with quantitative metrics to evaluate the data’s tendency to cluster, predict the number of clusters, and assess the quality of clustering outcomes. To improve on initial results, we then:

    • present a framework for selecting appropriate cluster analysis algorithms,

    • evaluate the effect of spectral preprocessing techniques on cluster analysis,

    • investigate and present a novel feature selection technique based on clusterability metrics,

    • develop methods for matching samples at various levels of a clustering hierarchy,

    • and adapt multi-sensor fusion to cluster analysis.

    These techniques are demonstrated on three spectroscopy datasets of homemade explosives and on multiple public datasets from food and industry, to ensure their wider applicability.
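
    The general shape of such a workflow (preprocess the spectra, cluster them, then use a quantitative metric to predict the number of clusters and judge clustering quality) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name specific algorithms, so standard normal variate (SNV) scaling, k-means and the silhouette score are used here only as common stand-ins, and the synthetic "spectra", function names and parameter values are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def farthest_first_centres(X, k):
    """Deterministic initialisation: repeatedly pick the sample farthest from the chosen centres."""
    centres = [X[0]]
    for _ in range(k - 1):
        nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[nearest.argmax()])
    return np.array(centres)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means; returns one integer cluster label per sample."""
    centres = farthest_first_centres(X, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # An empty cluster keeps its previous centre.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient (Euclidean distance); higher means tighter, better-separated clusters."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # a singleton cluster scores 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Hypothetical synthetic data: three groups of spectra, each with one absorption
# band at a different wavenumber, plus random scale, baseline offset and noise.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 4000, 200)
spectra = []
for centre in (1000.0, 2000.0, 3000.0):
    band = np.exp(-0.5 * ((wavenumbers - centre) / 150.0) ** 2)
    for _ in range(15):
        scale = rng.uniform(0.8, 1.2)
        offset = rng.uniform(-0.1, 0.1)
        noise = rng.normal(0.0, 0.02, size=wavenumbers.size)
        spectra.append(scale * band + offset + noise)
spectra = np.array(spectra)

X = snv(spectra)  # removes per-spectrum offset and scale differences
scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)  # candidate k with the highest silhouette
print(scores, best_k)
```

    In this sketch the silhouette score serves both to suggest the number of clusters and to rate the resulting partition; a real analysis would substitute measured spectra and whichever preprocessing, clustering algorithm and metrics suit the data.
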
    Date of Award: 2022
    Original language: English
    Supervisors: Suresh Thennadil & Benjamin Hall
