minSNPs: An R package for the derivation of resolution-optimised SNP sets from microbial genomic data

Kian Soon Hoon, Deborah C. Holt, Sarah Auburn, Peter Shaw, Philip M. Giffard

Research output: Contribution to journal › Article › peer-review


Abstract

Here, we present the R package minSNPs, a re-development of a previously described Java application named Minimum SNPs. MinSNPs assembles resolution-optimised sets of single nucleotide polymorphisms (SNPs) from sequence alignments such as genome-wide orthologous SNP matrices. It can derive SNP sets optimised to discriminate any user-defined combination of sequences from all others. Alternatively, SNP sets may be optimised to discriminate every sequence from every other sequence, i.e., to maximise diversity. MinSNPs includes functions for rapid and flexible SNP mining and for clear, comprehensive presentation of the results. Its running time scales linearly with input data volume and with the numbers of SNPs and SNP sets specified for the output. MinSNPs was tested using a previously reported orthologous SNP matrix of Staphylococcus aureus and an orthologous SNP matrix of 3,279 genomes with 164,335 SNPs assembled from four S. aureus short-read genomic data sets. MinSNPs proved effective for deriving discriminatory SNP sets for potential surveillance targets and for identifying SNP sets optimised to discriminate isolates from different clonal complexes. MinSNPs was also tested with a large Plasmodium vivax orthologous SNP matrix, from which a set of five SNPs was derived that reliably indicated the country of origin among three South-East Asian countries. In summary, we report the capacity to assemble comprehensive SNP matrices that effectively capture microbial genomic diversity, and to mine these matrices rapidly and flexibly for optimised marker sets.
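The abstract names two optimisation targets (discriminating a user-defined group from all other sequences, and maximising overall diversity) but not the search procedure. The sketch below is a minimal illustration under stated assumptions. It assumes a greedy forward selection over the columns of an orthologous SNP matrix. For the "maximise diversity" objective it assumes Simpson's index of diversity, as used by the original Minimum SNPs. For the target-group objective it assumes the score is the fraction of non-target sequences whose allele profile matches no target. The function names (`profiles`, `simpson_d`, `fraction_excluded`, `greedy_select`) and the toy matrix are hypothetical and are not the minSNPs R API.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical toy input: sequence name -> string of SNP alleles, one
# character per column of an orthologous SNP matrix (all rows equal length).
SnpMatrix = Dict[str, str]


def profiles(seqs: SnpMatrix, positions: Sequence[int]) -> Dict[str, str]:
    """Allele profile of each sequence at the chosen (0-based) SNP columns."""
    return {name: "".join(row[p] for p in positions) for name, row in seqs.items()}


def simpson_d(seqs: SnpMatrix, positions: Sequence[int]) -> float:
    """Simpson's index of diversity over groups of identical profiles:
    D = 1 - sum_j n_j (n_j - 1) / (N (N - 1)); D = 1 means every sequence is resolved."""
    counts = Counter(profiles(seqs, positions).values())
    n = len(seqs)
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def fraction_excluded(seqs: SnpMatrix, positions: Sequence[int], targets: Set[str]) -> float:
    """Assumed target-mode score: fraction of non-target sequences whose
    profile matches no target profile (1.0 = targets fully discriminated)."""
    prof = profiles(seqs, positions)
    target_profiles = {prof[t] for t in targets}
    others = [name for name in seqs if name not in targets]
    return sum(prof[name] not in target_profiles for name in others) / len(others)


def greedy_select(ncols: int, score: Callable[[List[int]], float], max_snps: int) -> List[int]:
    """Greedy forward selection: repeatedly add the column that most improves
    `score`; stop at max_snps, at a perfect score of 1.0, or when no column helps."""
    chosen: List[int] = []
    current = score(chosen)
    while len(chosen) < max_snps and current < 1.0:
        remaining = [p for p in range(ncols) if p not in chosen]
        if not remaining:
            break
        # max() returns the first (lowest-index) column among tied scores.
        best = max(remaining, key=lambda p: score(chosen + [p]))
        best_score = score(chosen + [best])
        if best_score <= current:
            break
        chosen.append(best)
        current = best_score
    return chosen


if __name__ == "__main__":
    seqs = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCCA"}
    ncols = len(seqs["A"])
    # Diversity mode: resolve every sequence from every other.
    print(greedy_select(ncols, lambda pos: simpson_d(seqs, pos), max_snps=5))
    # Target mode: discriminate sequence "A" from all others.
    print(greedy_select(ncols, lambda pos: fraction_excluded(seqs, pos, {"A"}), max_snps=5))
```

On the toy matrix, diversity mode selects columns `[0, 2, 3]` (D rises from 2/3 to 5/6 to 1). Target mode selects the single column `[3]`, because only sequence "A" carries a T there. Each greedy step rescores every remaining column, so the work per added SNP grows with the size of the matrix. That is broadly consistent with the abstract's statement that running time scales with input volume and with the number of SNPs requested.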

Original language: English
Article number: e15339
Pages (from-to): 1-19
Journal: PeerJ
Volume: 11
Publication status: Published - 2023

Bibliographical note

Funding Information:
Kian Soon Hoon (as student) and Philip Giffard, Deborah Holt and Sarah Auburn (as the supervisory team) are recipients of a Charles Darwin University "Charles Darwin International PhD Scholarship" to pursue this project. The early stages of this project were supported by a Charles Darwin University Institute of Advanced Studies Rainmaker Startup Grant (ID 18916864), awarded to Philip Giffard and Peter Shaw. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Acknowledgements:
The authors thank Mariana Barnes and Tegan Harris from the Menzies School of Health Research for assistance with the installation of minSNPs onto the Charles Darwin University high performance computer (HPC) cluster and also with the associated software documentation tasks. The authors thank Kamil Braima, Angela Rumaseb and Aiden Webb for testing the documentation.

Publisher Copyright:
Copyright © 2023 Hoon et al.
