Analysis of erroneous data entries in paper based and electronic data collection

Benedikt Ley, Komal Raj Rijal, Jutta Marfurt, Naba Raj Adhikari, Megha Raj Banjara, Upendra Thapa Shrestha, Kamala Thriemer, Ric N. Price, Prakash Ghimire

    Research output: Contribution to journal › Article › Research › peer-review


    Abstract

    Objective: Electronic data collection (EDC) has become a suitable alternative to paper-based data collection (PBDC) in biomedical research, even in resource-poor settings. During a survey in Nepal, data were collected using both systems, and data entry errors were compared between the two methods. Collected data were checked for completeness, values outside realistic ranges, and internal logic, and date variables were checked for reasonable time frames. Variables were grouped into 5 categories, and the number of discordant entries was compared between the two systems, overall and per variable category.
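The comparison described in the Objective can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function names (`is_missing`, `compare_entries`), the record layout (one dict per participant, keyed by participant ID), the rule that an entry counts as an omission when exactly one system left it empty, and the toy records are all assumptions made for this example.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as an omitted entry (assumed rule)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def compare_entries(
    paper: Dict[int, Record],
    electronic: Dict[int, Record],
    variables: List[str],
) -> Dict[str, int]:
    """Count discordant entries between two data sets of the same participants."""
    summary = {
        "total": 0,
        "discordant": 0,
        "omitted_paper": 0,
        "omitted_electronic": 0,
    }
    # Only participants present in both data sets can be compared.
    for pid in sorted(paper.keys() & electronic.keys()):
        for var in variables:
            p = paper[pid].get(var)
            e = electronic[pid].get(var)
            summary["total"] += 1
            if is_missing(p) and is_missing(e):
                continue  # empty in both systems: treated as concordant here
            if is_missing(p):
                summary["discordant"] += 1
                summary["omitted_paper"] += 1
            elif is_missing(e):
                summary["discordant"] += 1
                summary["omitted_electronic"] += 1
            elif p != e:
                summary["discordant"] += 1
    return summary


# Toy records, invented for illustration only.
paper = {
    1: {"age": 34, "sex": "F", "visit_date": "2018-03-01"},
    2: {"age": None, "sex": "M", "visit_date": "2018-03-02"},
}
electronic = {
    1: {"age": 43, "sex": "F", "visit_date": "2018-03-01"},
    2: {"age": 27, "sex": "", "visit_date": "2018-03-02"},
}

result = compare_entries(paper, electronic, ["age", "sex", "visit_date"])
print(result)
```

Here participant 1 has a transposed age (34 vs 43), and participant 2 has one omission in each system, so three of the six entries are discordant.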

    Results: Data from 52 variables collected from 358 participants were available. Discrepancies between the two data sets were found in 12.6% of all entries (2352/18,616). Differences between data points were identified in 18.0% of continuous variables (643/3580), 15.8% of time variables (113/716), 13.0% of date variables (140/1074), 12.0% of text variables (86/716), and 10.9% of categorical variables (1370/12,530). Overall, 64% (1499/2352) of all discrepancies were due to data omissions, and 76.6% (1148/1499) of missing entries were among categorical data. Omissions in PBDC (n = 1002) were twice as frequent as in EDC (n = 497, p < 0.001). Data omissions, specifically among categorical variables, were identified as the greatest source of error. If designed accordingly, EDC can address this shortfall effectively.
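The reported figures are internally consistent, which a short Python check makes explicit. The counts below are copied from the Results paragraph; the variable names and the `percent` helper are assumptions for this sketch. It only recomputes the proportions and does not attempt to reproduce the reported p-value, since the abstract does not name the test used.

```python
# (discordant entries, total entries) per variable category, as reported.
by_category = {
    "continuous": (643, 3580),
    "time": (113, 716),
    "date": (140, 1074),
    "text": (86, 716),
    "categorical": (1370, 12530),
}

# Omission counts as reported for each collection system.
omissions_pbdc = 1002
omissions_edc = 497


def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total_discordant = sum(d for d, _ in by_category.values())
total_entries = sum(n for _, n in by_category.values())
total_omissions = omissions_pbdc + omissions_edc

print("entries:", total_entries, "expected:", 52 * 358)
print("overall discordance %:", percent(total_discordant, total_entries))
for name, (d, n) in by_category.items():
    print(f"{name}: {percent(d, n)}%")
print("omissions share of discrepancies %:", percent(total_omissions, total_discordant))
print("PBDC/EDC omission ratio:", round(omissions_pbdc / omissions_edc, 2))
```

The category totals add up to 52 × 358 = 18,616 entries and 2352 discrepancies. The omission share works out to 63.7%, which the abstract reports rounded as 64%.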

    Original language: English
    Article number: 537
    Pages (from-to): 1-6
    Number of pages: 6
    Journal: BMC Research Notes
    Volume: 12
    Issue number: 1
    DOIs: https://doi.org/10.1186/s13104-019-4574-8
    Publication status: Published - 22 Aug 2019

    Fingerprint

    Data acquisition
    Nepal
    Information Systems
    Biomedical Research
    Research Design

    Cite this

    Ley, B., Rijal, K. R., Marfurt, J., Adhikari, N. R., Banjara, M. R., Shrestha, U. T., ... Ghimire, P. (2019). Analysis of erroneous data entries in paper based and electronic data collection. BMC Research Notes, 12(1), 1-6. [537]. https://doi.org/10.1186/s13104-019-4574-8
    @article{a3fe84d561fe4996b15cb745a18922f1,
    title = "Analysis of erroneous data entries in paper based and electronic data collection",
    keywords = "AKVO, Electronic data entry, Epidata, Paper based data entry",
    author = "Benedikt Ley and Rijal, {Komal Raj} and Jutta Marfurt and Adhikari, {Naba Raj} and Banjara, {Megha Raj} and Shrestha, {Upendra Thapa} and Kamala Thriemer and Price, {Ric N.} and Prakash Ghimire",
    year = "2019",
    month = "8",
    day = "22",
    doi = "10.1186/s13104-019-4574-8",
    language = "English",
    volume = "12",
    pages = "1--6",
    journal = "BMC Research Notes",
    issn = "1756-0500",
    publisher = "BioMed Central",
    number = "1",

    }
