AN ASSESSMENT OF CITIZEN CONTRIBUTED GROUND REFERENCE DATA FOR LAND COVER MAP ACCURACY ASSESSMENT

It is now widely accepted that an accuracy assessment should be part of a thematic mapping programme. Authoritative good or best practices for accuracy assessment have been defined but are often impractical to implement. Key reasons for this situation are linked to the ground reference data used in the accuracy assessment. Typically, it is a challenge to acquire a large sample of high quality reference cases in accordance with the sampling designs specified as conforming to good practice. Moreover, the data collected are normally imperfect to some degree, which limits their value to an accuracy assessment that implicitly assumes the use of a gold standard reference. Citizen sensors have great potential to aid aspects of accuracy assessment. In particular, they may be able to act as a source of ground reference data that may, for example, reduce sample size problems, although concerns with data quality remain. The relative strengths and limitations of citizen contributed data for accuracy assessment are reviewed in the context of the authoritative good practices defined for studies of land cover by remote sensing. The article highlights some of the ways that citizen contributed data have been used in accuracy assessment, identifies some of the problems that require further attention, and indicates some potential ways forward.


INTRODUCTION
The assessment of the accuracy of thematic maps, such as those depicting land cover obtained via remote sensing, has evolved considerably over the last four decades (e.g. Foody, 2002; Congalton and Green, 2009). It is now widely accepted that an accuracy assessment should be part of land cover mapping programmes. This is primarily because, without an accuracy assessment, each map produced is simply an untested hypothesis, one of many possible representations of the world which may or may not be fit for its intended purpose (Strahler et al., 2006). This is important as it is now very simple to produce thematic maps from remote sensing. Indeed there are, for example, numerous global land cover maps available, but they differ markedly in their representation, and without information on accuracy it is sometimes difficult to know which is the most suitable one to use in an application or how best to use the set (Giri et al., 2005; Jung et al., 2006; McCallum et al., 2006; Fritz and See, 2008). Critically, a map is not suited for scientific inference without a rigorous assessment of its quality, leaving the map as little more than a pretty picture (McRoberts, 2011).
An accuracy assessment may be used to do more than simply indicate the quality of a land cover map. Critically, an accuracy assessment may also be used to add value to the land cover map. By undertaking a rigorous accuracy assessment it may, for example, be possible to refine estimates of the areal extent of land cover classes that occur within the mapped region. The latter can have a major impact on, amongst other things, estimates of the magnitude and direction of land cover changes (Olofsson et al., 2013) and ecosystem services valuations (Foody, 2015).
Good practices for land cover map accuracy assessment have been established (Strahler et al., 2006; Olofsson et al., 2013, 2014). Additionally, adaptable resources have been made available to the community to facilitate rigorous accuracy assessment (e.g. Olofsson et al., 2012). However, map accuracy assessment remains a challenging task. One fundamental problem is that an accuracy assessment ideally requires a gold standard reference data set to compare against the map(s) being evaluated (Foody, 2010, 2013). Frequently, however, the ground reference data are flawed in relation to their quantity and quality, which can impact negatively on accuracy assessment. There is also often a negative relationship between data quantity and quality, making it difficult to acquire a large, high quality data set. It may, however, be possible for citizens to help reduce some of the problems by providing reference data.
Citizens have contributed to scientific research for centuries. However, recent technological developments, notably Web 2.0, which facilitates collaboration and interaction between people, including two-way data transfers and the growth of user-generated content, combined with the proliferation of inexpensive location-aware devices, have led to dramatic growth in citizen sensing and participation in collaborative volunteer projects (Arsanjani et al., 2015a). Citizen science research has grown enormously in recent years, revolutionizing parts of geography and forming a key component of future research priorities in the subject (CSDGSND, 2010).
It is now possible for almost anyone anywhere in the world to provide spatially located information that may be used to inform a diverse array of research and practical applications. This has seen the recent rise of citizen sensors and the provision of volunteered geographic information (VGI; Goodchild, 2007) to add to more conventional crowdsourcing activity. These various sources of citizen contributed data can differ greatly in detail, ranging from altruistic volunteering to paid crowdsourcing. In this article, attention is focused on data that originate from citizens who are typically, but not necessarily, amateurs and acting voluntarily. Moreover, the data contributed may have been provided unintentionally and normally for little if any reward (e.g. data mined from social media etc.) or deliberately in response to a call for information from, perhaps, an authoritative mapping body.
Citizens have the potential to become a major source of reference data for accuracy assessment. Issues of data quantity and quality may remain but, given the existence of authoritative best practices, it should be possible to gauge how suitable citizen contributed data are for some accuracy assessment tasks. This article first outlines the authoritative good practices for ground data collection in accuracy assessment. It then highlights the limitations of authoritative data sets before considering the relative merits of citizen contributed data for accuracy assessment. The focus is only on the use of the reference data in accuracy assessment, but it should be noted that citizen derived data could, of course, be used in other parts of a mapping programme (e.g. for use in training a supervised classification of remotely sensed imagery). Additionally, only conventional 'global' accuracy assessment is discussed, although it should be noted that the broad geographic coverage that can be provided by citizens can be used to indicate spatial variation in map quality (e.g. Comber et al., 2013).

GOOD PRACTICES FOR REFERENCE DATA ACQUISITION
Remote sensing has considerable potential for the provision of environmental information for thematic mapping applications at a range of spatial and temporal scales (Foody and Curran, 1994; Cihlar, 2000; Wulder et al., 2008). Thematic maps are typically derived from remotely sensed imagery through a digital image classification (Mather and Koch, 2011). In this type of analysis it is typically assumed that the classes are discrete and mutually exclusive as well as exhaustively defined. These assumptions are not always satisfied, often leading to negative impacts on land cover map accuracy (e.g. Foody, 1996; Foody, 2004; Rocchini et al., 2013). However, in many cases the problems can be addressed and useful representations of land cover obtained.
Frequently, supervised digital image classification analyses are used to obtain land cover maps from remotely sensed imagery. Beyond fundamental issues such as image pre-processing, such classification analyses comprise three stages: training, class allocation and testing. Issues connected with each stage can greatly impact upon the quality of the classification and hence the resulting map. Here, the focus is entirely on the reference data used in the final, testing, stage of the classification that seeks to indicate the quality of the classification, normally in terms of its accuracy.
Authoritative statements on good practice for land cover map accuracy assessment have been defined (Strahler et al., 2006; Olofsson et al., 2014). An accuracy assessment has three major components, namely the response design, the sampling design and the analysis (Stehman and Czaplewski, 1998). These apply to accuracy assessments using authoritative and/or citizen contributed data.
The response design sets out the protocol to determine if the class label depicted in the land cover map under evaluation is in agreement with the label contained in the ground reference data set. It includes issues such as the selection of the spatial unit (e.g. pixel, block or object) and the sources of information (e.g. reference data could come from field visits, inventories, aerial photograph analysis etc.). The effect of error and uncertainty should also be considered. It may, for example, be useful to have each case labelled by multiple interpreters to give a guide to the quality of the reference data and to aid the definition of agreement (e.g. should only cases for which all interpreters agree on a label be used in an accuracy assessment, should secondary labels and certainty information be used etc.). The reference labelling protocol must also be defined, which may be associated with challenges linked to the minimum mapping unit. Finally, while agreement may seem a simple concept there are many issues that require careful attention. These include problems linked to the ability to correctly locate a site geographically in both the land cover map and on the ground as well as the effects of inter-rater uncertainty in labelling and semantics. Further details on this, and the other parts, of an accuracy assessment are given in the literature (e.g. Stehman and Czaplewski, 1998; Strahler et al., 2006; Olofsson et al., 2014).
As the evaluation of classification accuracy cannot normally be undertaken for the entire map, it is usual to base the assessment on a sample of cases. To ensure a statistically rigorous and credible accuracy assessment it is important that the sample used for the accuracy assessment is acquired following an appropriate design. Good practice recommendations call for the assessment to be based on the use of probability sampling. A range of designs are available, with the choice between them often based on the accuracy objectives and key design criteria. Popular approaches include the use of simple random, stratified, systematic and cluster sampling. For each sampling design, recommendations on key factors such as suitable sample size may also be followed to ensure the sample meets the goals of a mapping project (e.g. Stehman, 1999, 2009, 2012).
As a crude summary, the required sample for an authoritative accuracy assessment can be defined following simple rules and recommendations (Stehman and Czaplewski, 1998; Strahler et al., 2006; Olofsson et al., 2014). The size of the sample, for example, may be estimated from sampling theory (Foody, 2009) or heuristics such as those that suggest at least 50 cases per class acquired via an appropriate sample design (Congalton and Green, 2009). For example, if a simple random sampling design was to be used, the required sample size, n, may be estimated from equation 1,
$$n = \frac{z_{\alpha/2}^{2}\, P\,(1-P)}{h^{2}} \qquad (1)$$
where P is a planning value for the population proportion of correctly allocated cases, h the half width of the desired confidence interval and $z_{\alpha/2}$ the critical value of the normal distribution for the two-tailed significance level α (Cochran, 1977).
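As a worked illustration, the calculation can be sketched in a few lines of Python, assuming equation 1 takes the standard form n = z²P(1−P)/h² associated with Cochran (1977). The function name and the planning values used (P = 0.85, h = 0.05, α = 0.05) are hypothetical choices made only for this sketch.

```python
import math
from statistics import NormalDist

def sample_size_srs(p, h, alpha=0.05):
    """Sample size for estimating a proportion under simple random sampling.

    p:     planning value for the proportion of correctly allocated cases
    h:     desired half width of the confidence interval
    alpha: two-tailed significance level
    Assumes the standard form n = z^2 * p * (1 - p) / h^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    return math.ceil(z ** 2 * p * (1 - p) / h ** 2)

# Hypothetical planning values: expected accuracy of 85%, estimated
# to within +/- 5 percentage points at the 95% level of confidence.
n_required = sample_size_srs(p=0.85, h=0.05)
print(n_required)
```

Because P(1−P) peaks at P = 0.5, using P = 0.5 as the planning value gives a conservative (largest) sample size when little is known about the likely accuracy.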
The approach can be adapted to meet specific project needs. If, for example, the objective is to test the statistical significance of differences in map accuracy, perhaps in evaluating a set of different mapping approaches, the required sample size can be estimated using the same basic principles. For this, however, it is also necessary to specify the effect size, which represents the minimum meaningful difference in classification accuracy, and the probability of detecting that effect, which is the power of the test, 1-β (Fleiss et al., 2003).
With α, 1-β and the effect size selected, the required sample size from each of the populations being compared may be estimated using equation 2.
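The calculation can be sketched as follows, assuming equation 2 takes the widely used normal-approximation form for comparing two independent proportions, n = [z_{α/2}·√(2P̄(1−P̄)) + z_β·√(P₁(1−P₁) + P₂(1−P₂))]² / (P₂−P₁)², with P̄ = (P₁+P₂)/2. The exact expression intended, including any continuity correction, may differ; the function name and the accuracy values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_compare(p1, p2, alpha=0.05, power=0.8):
    """Cases needed from EACH population to detect a difference p2 - p1.

    Assumed form (no continuity correction): a normal approximation for
    a two-tailed test of the difference between independent proportions.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    z_b = NormalDist().inv_cdf(power)           # z_{beta}, power = 1 - beta
    p_bar = (p1 + p2) / 2
    term_null = z_a * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil((term_null + term_alt) ** 2 / (p2 - p1) ** 2)

# Hypothetical comparison: is a map of 85% accuracy better than one of 80%?
n_each = sample_size_compare(0.80, 0.85)
print(n_each)
```

The required size grows rapidly as the effect size shrinks, since the difference in accuracy enters the denominator as a squared term.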
In this type of comparative study it is important to note that the sample size should be determined with care, as sizes that are too small or too large can be problematic (Foody, 2009). In many instances simple random sampling is not ideal. In such cases other probability designs may be used, notably stratified, systematic and cluster sampling designs. For each design, the sample size required may be calculated and this may be optimized to meet the specific objectives of a study (e.g. Stehman, 2012). Again the basis is straightforward with, for example, the size of the sample for stratum i, $n_i$, in a stratified random sample of fixed size n estimated using equation 3,
$$n_i = n\,\frac{N_i}{N} \qquad (3)$$
where N is the population size and $N_i$ the size of stratum i. This approach can be adapted to fit the specific circumstances of a study, such as variations in the cost per stratum or project objectives (Cochran, 1977; Barnett, 1991; Som, 1996; Stehman, 2012).
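A minimal sketch of this proportional allocation is given below, assuming equation 3 has the form n_i = n·N_i/N. The stratum names and sizes are invented for illustration, and the largest-remainder rounding used to make the allocations sum exactly to n is a convenience of this sketch, not part of the source method.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a fixed sample of size n to strata in proportion to N_i / N.

    stratum_sizes: dict mapping stratum name -> N_i (e.g. number of pixels)
    Fractional allocations are rounded with the largest-remainder rule so
    that the stratum sample sizes sum exactly to n.
    """
    total = sum(stratum_sizes.values())                       # N
    exact = {k: n * v / total for k, v in stratum_sizes.items()}
    alloc = {k: int(x) for k, x in exact.items()}             # round down
    leftover = n - sum(alloc.values())
    # Give the remaining cases to the strata with the largest remainders.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

# Hypothetical map with four land cover strata (sizes in pixels).
strata = {"forest": 600_000, "cropland": 300_000, "urban": 80_000, "water": 20_000}
allocation = proportional_allocation(strata, n=500)
print(allocation)
```

In this invented example the rare water stratum receives only 10 cases, well below the heuristic of 50 cases per class noted above, which illustrates why the allocation is often adapted to project objectives rather than applied strictly in proportion.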
In the analysis stage the aim is typically to obtain rigorous and credible accuracy information. This typically draws on analysis of the error matrix or confusion matrix that shows a cross-tabulation of the map and ground reference data labels for the sample of cases used. A range of quantitative measures of accuracy can be obtained from the matrix and it is important that the accuracy assessment takes into account the nature of the data used. Ideally, therefore, the error matrix, together with key information on issues such as the sample design used in its formation, should be reported in the output of an accuracy assessment. This allows other users to obtain information that they may need (e.g. for the calculation of standard errors and confidence intervals) and also allows the matrix to be used to help refine estimates of key properties such as the areal extent of classes and so add value to the map (Olofsson et al., 2013). The formulae used to estimate accuracy values and their associated variances need to be selected in relation to the sample design used to acquire the data. Formulae for popular designs such as simple random, stratified random and cluster sampling are provided in Stehman and Foody (2009).
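The kind of summary that can be drawn from an error matrix may be sketched as follows. The sketch assumes a simple random sample, a matrix with map labels in rows and reference labels in columns, and the conventional per-class measures usually termed user's and producer's accuracy; the counts are invented. Other sampling designs require different estimators, as noted above.

```python
import math

def accuracy_summary(matrix):
    """Summarise a square error matrix (rows = map, columns = reference).

    Assumes the cases were drawn by simple random sampling.
    Returns overall accuracy, its standard error, and per-class
    user's (row-based) and producer's (column-based) accuracies.
    """
    q = len(matrix)
    n = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(q))
    overall = correct / n
    std_err = math.sqrt(overall * (1 - overall) / (n - 1))
    users = [matrix[i][i] / sum(matrix[i]) for i in range(q)]
    producers = [matrix[j][j] / sum(matrix[i][j] for i in range(q))
                 for j in range(q)]
    return overall, std_err, users, producers

# Hypothetical three-class error matrix of sample counts.
error_matrix = [
    [45, 4, 1],
    [6, 40, 4],
    [2, 3, 45],
]
overall, std_err, users, producers = accuracy_summary(error_matrix)
half_width = 1.96 * std_err  # approximate 95% confidence half width
print(round(overall, 3), round(half_width, 3))
```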
A rigorous accuracy assessment provides not only information on map quality but also a means to enhance the value and usefulness of the map. The ability of an accuracy assessment to add value to a map can be illustrated with examples. In showing how a confusion matrix used for accuracy assessment can also aid accurate estimation of class areal extents, Olofsson et al. (2013) provide an example focused on the estimation of the extent of deforestation in a region. In this example, a highly accurate map, with an overall accuracy of ~94%, suggests that 22,353 ha of the study region had been deforested. However, after adjusting for even the low levels of error present, the estimated areal extent was roughly double what the map showed, at 45,651 ha. Not only is the difference large, it also has important implications for the carbon budget of the region, as outlined by Olofsson et al. (2013). Similarly, Foody (2015) shows how errors in a land cover classification can have a large impact on valuations of ecosystem services. For example, using the National Land Cover Data (NLCD) set for the conterminous USA, which is ~84% accurate (Wickham et al., 2013), directly in a basic transfer function approach to ecosystem services valuation provides an estimate of US$1118 billion yr⁻¹. Adjusting the estimate for the pattern of mis-classification evident in the confusion matrix used in the assessment of map accuracy, however, shows that the value of the ecosystem services is markedly lower, at US$600 billion yr⁻¹. The pattern of error and the differential value of the classes will determine the size and direction of the change in value that arises when adjusting for the effects of mis-classification bias. For example, at a global scale the value of wetlands estimated from the IGBP DISCover land cover map rises from US$1.92 trillion yr⁻¹ to US$2.79 trillion yr⁻¹ when adjustment is made for classification error (Foody, 2015).
Although the demands made by good practice documents may not seem onerous or problematic, it is often difficult to acquire a ground reference data set in strict accordance with the authoritative good practices. Consequently, it is often impractical to follow the good practices. The sample used is often of inappropriate size and/or quality, impacting negatively on the accuracy assessment (Foody, 2009, 2010, 2013).
Even if the concerns with issues such as the sampling design have been addressed satisfactorily, there are still other concerns, notably those linked to the quality of the reference data. Typically the ground reference data are used in an accuracy assessment as if perfect (i.e. as if they are a gold standard or ground truth). Sometimes it is recognized that the ground reference data are flawed but the analysis proceeds as if they are perfect. This can be a dangerous situation in an accuracy assessment. It is possible for even small errors in the ground reference data set to be a source of substantial error and misinterpretation in an accuracy assessment (Carlotto, 2009; Foody, 2013). For example, in a study of land cover change the effects of even very small reference data errors led to substantial mis-estimation of both classification accuracy and of the area of land undergoing change (Foody, 2013). As one example, in one simple scenario the area of a rare land cover change that actually occurs in 0.5% of the study area will be exaggerated by ~40 times if the ground data and land cover map used have accuracies of 80% and 70% respectively. Fortunately, however, it is sometimes possible to address the effects of ground reference data error and obtain accurate estimates of map accuracy and class extent if the ground reference data error is well-known and characterized (Foody, 2010).
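How an error matrix can be used to adjust area estimates may be sketched as follows. The sketch assumes a stratified design with the map classes as strata, in which the proportion of the region belonging to reference class j is estimated as the sum over map classes i of W_i·n_ij/n_i, where W_i is the mapped area proportion of class i. All counts and areas are invented for illustration and are not those of Olofsson et al. (2013).

```python
def adjusted_areas(matrix, mapped_area):
    """Error-adjusted area estimate for each class.

    matrix:      sample counts, rows = map class, columns = reference class
    mapped_area: area of each class as shown on the map (same class order)
    Assumes map classes were the strata of a stratified random sample.
    """
    total_area = sum(mapped_area)
    q = len(matrix)
    weights = [a / total_area for a in mapped_area]   # W_i
    row_totals = [sum(row) for row in matrix]         # n_i
    proportions = [
        sum(weights[i] * matrix[i][j] / row_totals[i] for i in range(q))
        for j in range(q)
    ]
    return [p * total_area for p in proportions]

# Hypothetical two-class example: [change, no change], areas in ha.
counts = [
    [90, 10],    # mapped as change
    [4, 196],    # mapped as no change
]
mapped = [20_000, 980_000]
areas = adjusted_areas(counts, mapped)
print([round(a) for a in areas])
```

In this invented case a small rate of omitted change within the very large no-change class is enough to raise the estimated area of change well above the mapped 20,000 ha, even though most sampled cases are correctly labelled.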
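The scale of such exaggeration can be illustrated with a deliberately simplified sketch. It considers only the reference data, and assumes that their errors are independent of class and symmetric, so that the probability of correctly labelling a change case (sensitivity) and a no-change case (specificity) both equal the stated accuracy. These assumptions, and the function names, are ours and need not match the scenario analysed by Foody (2013). The second function applies a standard correction that is possible when the error rates are known.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Proportion of cases LABELLED as the class by imperfect reference data."""
    return true_prev * sensitivity + (1 - true_prev) * (1 - specificity)

def corrected_prevalence(apparent, sensitivity, specificity):
    """Recover the true proportion when the error rates are known."""
    return (apparent + specificity - 1) / (sensitivity + specificity - 1)

true_prev = 0.005                      # change occupies 0.5% of the area
apparent = apparent_prevalence(true_prev, 0.8, 0.8)   # 80% accurate labels
inflation = apparent / true_prev
recovered = corrected_prevalence(apparent, 0.8, 0.8)
print(round(apparent, 3), round(inflation, 1), round(recovered, 4))
```

Under these assumptions roughly 20% of the abundant no-change cases are mislabelled as change, which swamps the small number of genuine change cases and inflates the apparent extent of change by a factor of about 40.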

POTENTIAL OF CITIZEN CONTRIBUTED DATA
Some of the problems commonly encountered with ground reference data sets, even from highly authoritative sources, can potentially be addressed in a variety of ways. At one extreme the effects of ground data error can, as noted above, be addressed directly if the error is well known and characterized. Alternatively, the problems of design-based accuracy assessment can, to some extent, be addressed by adopting model-based approaches. Standard components of design-based accuracy assessment, such as the confusion matrix and measures of overall accuracy, are not encountered with model-based inference. However, the latter can be useful in relation to issues such as area estimation that can be used in some accuracy assessments (McRoberts, 2011). Additionally, as will be noted later in this section, modelling approaches can also provide a means to analyze imperfect data sets such as those contributed by volunteers. Here, the main focus of attention is on how citizens could contribute to standard design-based accuracy assessment of land cover maps.
A key attraction of citizen sensors for ground reference data collection is their ability to contribute data at a range of spatial and temporal scales. Thus, citizen sensors could reduce or even possibly remove problems linked to ground data sample size, location and timing relative to image acquisition. Additionally, the data can arise in a range of different ways. Data could be contributed passively by exploiting information provided unintentionally, or actively in response to a request from a body that could steer the contributions to meet particular needs. An overview of the use of volunteered geographic information arising from citizen sensors in accuracy assessment is provided by Fonte et al. (2015a).
Although it may seem odd for citizens to contribute unknowingly to accuracy assessments, this type of passive citizen sensing has occurred when members of the general public have uploaded photographs to sites such as Flickr or Panoramio. The photographs may have been added to the sites to share with friends and family, but they may have additional uses. Critically, the photographs also provide geolocated images that can be interpreted to yield land cover data that might be used as ground reference data in an accuracy assessment (Antoniou et al., 2010; Estima and Painho, 2013). Additionally, volunteers to projects such as OpenStreetMap may provide land cover data that could be used as reference data (Arsanjani et al., 2015a; Estima and Painho, 2015). In a similar way, contributors to internet projects such as the Degree Confluence Project may unintentionally provide data that can be used in an accuracy assessment. These contributors visit the points of intersection of lines of latitude and longitude globally and take photographs of the site. The photographs acquired are available through the project website and may be interpreted to yield ground reference data for an accuracy assessment (Iwao et al., 2006; Foody and Boyd, 2013). Moreover, through the project the photographs of a site are updated, enabling use through time. The systematic sampling design used is also compatible with best practice recommendations for accuracy assessment.
With active sensing, the citizens contributing data often do so to contribute to scientific research or practical applications. Critically, a body seeking to assess the accuracy of a map can design key aspects of the accuracy assessment programme. For example, the sites to be visited for data collection could be specified following an appropriate probability sampling design for an accuracy assessment. The sampling approach can also be designed to fit with existing authoritatively defined data sets and resources, notably by blending the data sets and using explicitly adaptable resources such as the sample defined by Olofsson et al. (2012).
Moreover, given the recent growth of resources such as Google Earth that allow easy access to high quality and often fine spatial resolution imagery for the globe, the data collection need not involve fieldwork, although that can still be useful. A variety of internet based resources are available to help citizens label imagery that may be of anywhere on the planet from the comfort of their own home (e.g. Fritz et al., 2012; Bastin et al., 2013).
Concerns with data quality can also be addressed to some degree. It is, for example, possible to have each site interpreted and labelled by multiple citizens, which can aid some model-based analyses that can provide accuracy estimates (Foody et al., 2013). For example, latent class modelling allows the accuracy of the data contributed by citizens to be estimated from the data alone, without any reference data. The approach is based on the probability of observing the patterns of class allocation made by the set of citizens contributing to the task; each citizen need not label the exact same set of data as the approach can accommodate missing observations. The set of class labels provided by the citizens form the visible or manifest variables of the analysis and are used to provide information on the unobserved (latent) variable. In typical use, the set of citizens contributing, C, are each presented with a set of cases to label. The citizens may, for example, be presented with fine spatial resolution images for selected locations via an internet based system (e.g. Fritz et al., 2012). With $M_c$ representing one of the set of C manifest variables, indexed 1 ≤ c ≤ C, whose values are class labels $m_c$ lying in the range 1 to q, and using vector notation $\mathbf{M}$ and $\mathbf{m}$ to represent the complete response patterns (i.e. $\mathbf{M}$ denotes $(M_1, \ldots, M_C)$ and $\mathbf{m}$ denotes $(m_1, \ldots, m_C)$), the latent class model is that the probability of obtaining the response pattern $\mathbf{m}$, represented as Prob(M=m), is a weighted average of the q class-specific probabilities Prob(M=m|T=t) (Magidson and Vermunt, 2004). If the set of labels derived from each citizen can be assumed to be conditionally independent of those from all other citizens contributing labels, the latent class model may be written as equation 4,
$$\mathrm{Prob}(\mathbf{M}=\mathbf{m}) = \sum_{t=1}^{q} \mathrm{Prob}(T=t) \prod_{c=1}^{C} \mathrm{Prob}(M_c = m_c \mid T=t) \qquad (4)$$
in which Prob(T=t) is the proportion of cases belonging to latent class t (Yang and Becker, 1997; Vermunt and Magidson, 2003); the approach can often be readily adapted for situations in which there is dependence in the labelling. The quality of the model is generally illustrated by its fit to the data and this is commonly assessed with the likelihood ratio chi-squared statistic, with a model viewed as fitting the data if the calculated value of the statistic is sufficiently small to be attributable to the effect of chance (Magidson and Vermunt, 2004). Critically, this type of approach provides a means to assess the accuracy of maps without any reference data (Foody, 2012) and can also convey information on the quality of the citizens contributing data in terms of the accuracy of their labelling (Foody et al., 2013). The type of model may also allow the production of information on the confidence or certainty with which individual cases in the map have been classified, which would be of value to some users. This could, for example, potentially be used to help illustrate the spatial variation in the uncertainty or quality of the labelling in a land cover map.
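A minimal sketch of how such a model might be fitted is given below. It assumes conditional independence between citizens and uses a basic expectation-maximization (EM) routine written for illustration only; it is not the software or exact procedure used in the studies cited. The function names, the starting values and the simulated labels are all hypothetical. Missing labels are represented by None and are simply skipped.

```python
import random

def fit_latent_class(labels, q, n_iter=100, init_acc=0.7):
    """EM fit of a latent class model with conditionally independent citizens.

    labels: list of cases; each case holds one label (0..q-1) per citizen,
            or None where that citizen did not label the case.
    Returns prior[t] = Prob(T=t), cond[c][t][m] = Prob(M_c=m | T=t) and
    post[i][t] = Prob(T=t | labels of case i).
    """
    n_raters = len(labels[0])
    prior = [1.0 / q] * q
    off = (1.0 - init_acc) / (q - 1)
    # Start every citizen as better than chance so that latent class t
    # lines up with label t (fixes the otherwise arbitrary class order).
    cond = [[[init_acc if m == t else off for m in range(q)]
             for t in range(q)] for _ in range(n_raters)]
    post = []
    for _ in range(n_iter):
        # E-step: posterior class membership of each case.
        post = []
        for row in labels:
            w = list(prior)
            for c, m in enumerate(row):
                if m is not None:
                    for t in range(q):
                        w[t] *= cond[c][t][m]
            s = sum(w)
            post.append([x / s for x in w])
        # M-step: update class proportions and conditional probabilities.
        prior = [sum(p[t] for p in post) / len(labels) for t in range(q)]
        for c in range(n_raters):
            for t in range(q):
                counts = [1e-9] * q        # tiny constant avoids zeros
                for row, p in zip(labels, post):
                    if row[c] is not None:
                        counts[row[c]] += p[t]
                total = sum(counts)
                cond[c][t] = [x / total for x in counts]
    return prior, cond, post

def rater_accuracy(prior, cond, c):
    """Overall accuracy of citizen c implied by the fitted model."""
    return sum(prior[t] * cond[c][t][t] for t in range(len(prior)))

def simulate_labels(n_cases, prior, accuracies, seed=0):
    """Synthetic labels: each citizen is correct with a fixed probability."""
    rng = random.Random(seed)
    q = len(prior)
    data, truth = [], []
    for _ in range(n_cases):
        t = rng.choices(range(q), weights=prior)[0]
        row = [t if rng.random() < a
               else rng.choice([m for m in range(q) if m != t])
               for a in accuracies]
        data.append(row)
        truth.append(t)
    return data, truth

# Four hypothetical citizens of differing skill labelling two classes.
true_acc = [0.9, 0.8, 0.7, 0.85]
data, truth = simulate_labels(2000, [0.6, 0.4], true_acc)
prior, cond, post = fit_latent_class(data, q=2)
est_acc = [rater_accuracy(prior, cond, c) for c in range(len(true_acc))]
print([round(a, 2) for a in est_acc])
```

The simulated truth is used here only to check the estimates; the fitting routine itself sees nothing but the citizens' labels. The posterior probabilities returned for each case are one possible source of the per-case certainty information mentioned above.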

CHALLENGES AND OPPORTUNITIES
Citizen contributed data have considerable potential for use in accuracy assessment but a range of challenges exist. The tension between the wisdom and power of the crowd and mob rule is well known (Roman, 2009). Before citizen contributed data become accepted widely for use in accuracy assessment a variety of concerns will need to be addressed. The latter extend well beyond the basic concerns with data quality and trustworthiness, with problems connected with issues such as the location, timing and sustainability of data collection as well as a suite of legal and ethical concerns (Vandecasteele and Devillers, 2015; Arsanjani et al., 2015b). For example, the data sets obtained from citizens, especially those contributed unintentionally, may be acquired from highly unrepresentative samples.
A variety of approaches may be used to address the concerns with VGI. For example, markedly different approaches for assessing the quality of VGI are available (e.g. Goodchild and Li, 2012). Some approaches may simply follow a basic voting approach if there are multiple contributions on a particular case, others may have a hierarchy of contributors with some established and trusted people effectively acting as gatekeepers, while others may make use of geographical contextual information to sense-check contributions or actually seek to infer quality from the data themselves.
As awareness of the challenges in using VGI grows, there is increasing effort on methods to reduce problems, and tentative steps to the definition of good practices for VGI collection are emerging (Fonte et al., 2015b). The issues are also not always straightforward. For example, some citizen science projects allow multiple contributions for the same case while others actively discourage it. The former allows multiple labels to be available for each case, which can aid some analyses, but the latter would act to reduce duplication of effort and encourage a larger sample of cases to be labelled, albeit individually. The relative value of these approaches may differ between applications.
Finally, there is, of course, considerable scope for blending VGI with authoritative data sets, although some users may wish to ensure that the data sources used can be identified so that attention may focus on cases from just one source independent of the other. For example, the design used by Olofsson et al. (2012) is adaptable and it would be possible to direct citizens to sites to collect data in order to meet specific research priorities (e.g. to increase the precision of estimates in a stratum of interest).
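The simplest of these options, a basic vote among multiple contributions for a case, can be sketched as follows. The site identifiers, class labels and the decision to leave tied cases unlabelled are hypothetical choices made for this sketch.

```python
from collections import Counter

def majority_label(votes):
    """Return (label, share) for the most common label among the votes.

    If two or more labels tie for first place the label is None, so the
    case can be set aside or referred to a trusted contributor.
    """
    ranked = Counter(votes).most_common()
    label, top = ranked[0]
    share = top / len(votes)
    if len(ranked) > 1 and ranked[1][1] == top:
        return None, share
    return label, share

# Hypothetical contributions from citizens for three sites.
contributions = {
    "site_1": ["forest", "forest", "cropland"],
    "site_2": ["urban", "cropland"],
    "site_3": ["water"],
}
consensus = {site: majority_label(v) for site, v in contributions.items()}
print(consensus)
```

The share of votes supporting the winning label gives a crude indication of agreement, but a vote treats all contributors as equally reliable, which is one motivation for the alternative approaches described above.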

CONCLUSIONS
Good practices for authoritative accuracy assessment have been defined but may sometimes be impractical to implement. One key problem encountered commonly is the acquisition of a suitable ground reference data set on which to base the accuracy assessment.
Citizen sensing provides the potential to help address some of the problems encountered in the assessment of land cover map accuracy. It is not a panacea but does have the ability to provide reference data over a range of spatial and temporal scales.
Although numerous concerns exist with citizen contributed data, and especially their quality, these are also research opportunities. Means to work effectively with citizen sensor data and to enhance future data acquisitions by defining good practices are emerging and it is anticipated that such data will increasingly be used to inform assessments of land cover map accuracy.