TY - JOUR
T1 - Analysis of erroneous data entries in paper based and electronic data collection
AU - Ley, Benedikt
AU - Rijal, Komal Raj
AU - Marfurt, Jutta
AU - Adhikari, Naba Raj
AU - Banjara, Megha Raj
AU - Shrestha, Upendra Thapa
AU - Thriemer, Kamala
AU - Price, Ric N.
AU - Ghimire, Prakash
PY - 2019/8/22
Y1 - 2019/8/22
AB - Objective: Electronic data collection (EDC) has become a suitable alternative to paper-based data collection (PBDC) in biomedical research, even in resource-poor settings. During a survey in Nepal, data were collected using both systems, and data entry errors were compared between the two methods. Collected data were checked for completeness, values outside realistic ranges, and internal logic, and date variables were checked for reasonable time frames. Variables were grouped into 5 categories, and the number of discordant entries was compared between the two systems, overall and per variable category. Results: Data from 52 variables collected from 358 participants were available. Discrepancies between the two data sets were found in 12.6% of all entries (2352/18,616). Differences between data points were identified in 18.0% (643/3580) of continuous variables, 15.8% (113/716) of time variables, 13.0% (140/1074) of date variables, 12.0% (86/716) of text variables, and 10.9% (1370/12,530) of categorical variables. Overall, 64% (1499/2352) of all discrepancies were due to data omissions; 76.6% (1148/1499) of missing entries were among categorical data. Omissions in PBDC (n = 1002) were twice as frequent as in EDC (n = 497, p < 0.001). Data omissions, specifically among categorical variables, were identified as the greatest source of error. If designed accordingly, EDC can address this shortfall effectively.
KW - AKVO
KW - Electronic data entry
KW - Epidata
KW - Paper based data entry
UR - http://www.scopus.com/inward/record.url?scp=85071230377&partnerID=8YFLogxK
U2 - 10.1186/s13104-019-4574-8
DO - 10.1186/s13104-019-4574-8
M3 - Article
C2 - 31439025
AN - SCOPUS:85071230377
SN - 1756-0500
VL - 12
SP - 1
EP - 6
JO - BMC Research Notes
JF - BMC Research Notes
IS - 1
M1 - 537
ER -