Large biodiversity datasets conform to Benford's law: Implications for assessing sampling heterogeneity

Judit K. Szabo, Lucas Rodriguez Forti, Corey T. Callaghan

    Research output: Contribution to journal › Article › peer-review

    12 Citations (Scopus)
    20 Downloads (Pure)

    Abstract

    Inadequate sampling can bias estimates of species diversity, because species occurrence generally follows a log-normal distribution with a long tail. Understanding this sampling bias is fundamental to informing biodiversity conservation actions. However, currently available tests of data quality, such as fitting species abundance distribution (SAD) models and rarefaction curves, are computationally costly and can still lead to erroneous conclusions. We evaluated Benford's law (the distribution of first digits) as a complementary method for assessing data heterogeneity and survey coverage in large biodiversity datasets, including eBird data for 157 countries and three non-avian GBIF datasets. We also tested the conformity to Benford's law of four simulated communities with different SAD models and of four corrupted datasets with a log-normal SAD. Finally, we evaluated how including rare species affected conformity to Benford's law in three datasets, and compared Benford fit with the results of traditional methods for estimating survey completeness in seven datasets. Species-rich datasets with a large number of observations tended to fit Benford's law well. Benford conformity can be a simple and sensitive measure of sampling evenness, complementing traditional methods for assessing data quality in large-scale studies. Benford's test can reflect heterogeneity in species abundances, especially in log-normally distributed data, but was not ideal for evaluating survey completeness, as its results diverged from those of traditional methods. As the contribution of citizen science to biodiversity monitoring continues to increase, this fast and efficient method can play a critical role in assessing the quality of datasets.
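    The first-digit test described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline. It scores conformity with the mean absolute deviation (MAD) between observed first-digit shares and the Benford proportions log10(1 + 1/d). It labels the result using Nigrini's commonly cited first-digit MAD cut-offs and applies the test to per-species abundance counts. The paper's actual statistic, thresholds and input units may differ. The helper names (`benford_mad`, `nigrini_label`, `lognormal_community`) and the simulated communities are hypothetical.

```python
import math
import random
from collections import Counter

# Benford's law: expected share of leading digit d (1-9) is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n: int) -> int:
    """Leading (most significant) decimal digit of a positive integer."""
    return int(str(n)[0])


def first_digit_proportions(counts):
    """Observed first-digit shares over integer counts >= 1 (zeros are skipped)."""
    digits = [first_digit(int(c)) for c in counts if int(c) >= 1]
    if not digits:
        raise ValueError("need at least one count >= 1")
    tally = Counter(digits)
    return {d: tally[d] / len(digits) for d in range(1, 10)}


def benford_mad(counts):
    """Mean absolute deviation between observed and Benford first-digit shares."""
    observed = first_digit_proportions(counts)
    return sum(abs(observed[d] - BENFORD[d]) for d in range(1, 10)) / 9


def nigrini_label(mad: float) -> str:
    # Commonly cited first-digit MAD cut-offs (Nigrini). Assumed here for
    # illustration; not necessarily the criterion used in the paper.
    if mad <= 0.006:
        return "close conformity"
    if mad <= 0.012:
        return "acceptable conformity"
    if mad <= 0.015:
        return "marginally acceptable conformity"
    return "nonconformity"


def lognormal_community(n_species, mu=5.0, sigma=2.0, seed=1):
    """Hypothetical species abundances drawn from a log-normal SAD, rounded to >= 1."""
    rng = random.Random(seed)
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(n_species)]


if __name__ == "__main__":
    heterogeneous = lognormal_community(5000)
    even = list(range(40, 61)) * 100  # every species has 40-60 individuals
    for name, counts in [("log-normal", heterogeneous), ("even", even)]:
        mad = benford_mad(counts)
        print(f"{name}: MAD={mad:.4f} ({nigrini_label(mad)})")
```

    In the simulated "even" community, every species has between 40 and 60 individuals. Almost all leading digits are therefore 4, 5 or 6, and the community fails the test. The log-normal community, by contrast, spans several orders of magnitude and comes much closer to the Benford proportions. This illustrates the abstract's point that Benford conformity reflects heterogeneity in species abundances.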

    Original language: English
    Article number: 109982
    Pages (from-to): 1-13
    Number of pages: 13
    Journal: Biological Conservation
    Volume: 280
    Publication status: Published, Apr 2023

    Bibliographical note

    Funding Information:
    LRF received a fellowship from the Coordination for the Improvement of Higher Education Personnel (CAPES, Finance Code 001). CTC was supported by a Marie Skłodowska-Curie Individual Fellowship (No. 891052). The authors are indebted to the two anonymous reviewers and the associate editor for their insightful comments, which greatly improved the article.

    Publisher Copyright:
    © 2023 Elsevier Ltd
