ON DATA QUALITY ASSURANCE AND CONFLATION ENTANGLEMENT IN CROWDSOURCING FOR ENVIRONMENTAL STUDIES
Keywords: Data Curation, Data Quality, Data Fusion, Data Conflation, Citizen Science, Crowdsourcing
Abstract. Volunteered geographical information (VGI), whether collected through citizen science, active crowdsourcing or even passive crowdsourcing, has proven useful in various societal domains such as natural hazards, health status, disease epidemics and biological monitoring. Nonetheless, the variable or unknown quality of these data, a consequence of the crowdsourcing setting, remains an obstacle to fully integrating them in environmental studies and potentially in policy making. The data curation process, in which quality assurance (QA) is needed, is often driven by the direct usability of the collected data within a data conflation or data fusion (DCDF) process that combines the crowdsourced data into one view, potentially using other data sources as well. Using two examples, namely land cover validation and inundation extent estimation, this paper discusses the close links between QA and DCDF in order to determine whether disentangling them would benefit the understanding of the data curation process and its methodology for crowdsourced data. Far from rejecting the usability quality criterion, the paper advocates decoupling the QA process and the DCDF step as much as possible while still integrating them within an approach analogous to a Bayesian paradigm.