HOW MUCH DO WE KNOW ABOUT THE CONTRIBUTORS TO VOLUNTEERED GEOGRAPHIC INFORMATION AND CITIZEN SCIENCE PROJECTS?

In recent years researchers have shown increased interest in investigating and understanding the characteristics and backgrounds of citizens who contribute to Volunteered Geographic Information (VGI) and Citizen Science (CS) projects. Much of the reluctance of stakeholders such as National Mapping Agencies and Environmental Ministries to use data and information generated and collected by VGI and CS projects stems from a lack of knowledge and understanding about who these contributors are. As they are drawn from the crowd there is a sense of the unknown about these citizens. Consequently there are justifiable concerns about these citizens' ability to collect, generate and manage high quality and accurate spatial, scientific and environmental data and information. This paper provides a meta review of some of the key literature in the domain of VGI and CS to assess whether these concerns are well founded and what efforts are ongoing to improve our understanding of the crowd.


INTRODUCTION
Every year nearly one billion people throughout the world volunteer through public, non-profit or for-profit organizations, or directly for friends or neighbours (Manatschal and Freitag, 2014). Volunteered Geographic Information (VGI) and Citizen Science (CS) both involve citizens volunteering for the collection, management and analysis of spatial and/or scientific data relating to the natural and built environment. In CS these groups of citizens are usually working as part of a collaborative project with professional scientists. Most VGI projects see citizens working and collaborating in self-organising groups and networks. Generally VGI and CS are seen as specific examples of Crowdsourcing (Estellés-Arolas and González-Ladrón-de-Guevara, 2012), where a group of individuals ('the crowd') of varying knowledge, heterogeneity, and number voluntarily becomes involved in undertaking a particular task. The task, of variable complexity and modularity, sees these individuals bring their work, money, knowledge, skills and/or experience to it. Coleman et al. (2009) explain that there are many reasons why citizens volunteer for VGI and CS projects. Individual informal volunteering, for example, coincides with high personal levels of altruistic reciprocity (Manatschal and Freitag, 2014).
With the advances in Internet technologies, smart devices, social networking, etc. over the past decade, VGI and CS are generating data and information which compare very favourably in quality and accuracy to those collected by professional surveyors, scientists, researchers and engineers (Tulloch et al., 2013). However there continues to be a severe reluctance from professional, industrial, governmental and other authoritative organisations to use data and information collected and generated from VGI and CS as a supplement to or replacement for their own data flows and collection activities. In areas such as National Mapping and Cadastral activities VGI has been shown to have particularly high potential for processes such as change detection, update of national spatial databases, supply of tacit local knowledge, etc. (Comber et al., 2013). Tulloch et al. (2013) argue that CS is often the only practical way to achieve the geographic extent required to document environmental patterns and address some scientific research questions at large scales. Volunteer groups are conducting many of the tasks that were previously the responsibility of governments and scientific organisations. Volunteers are fulfilling the role of low-cost service providers for state-run programs, especially given the necessity for monitoring under various national and international legal obligations (Measham and Barnett, 2008). Miller-Rushing et al. (2012) remark that prior to the professionalisation of science in the late 19th century nearly all scientific research was carried out by amateurs, that is, people who were not paid as scientific professionals.
Despite an abundance of literature reporting studies of VGI and CS there remains a lack of understanding of who the crowd of citizens involved actually are (Nov et al., 2014). This is one of the most often cited reasons why National Mapping Agencies, scientific institutions, government agencies, etc. remain reluctant to use VGI and CS data and information. This paper provides a meta review of some of the key literature in the domain of VGI and CS to assess whether these concerns are well founded and what efforts are ongoing to improve our understanding of the crowd. The paper discusses how much we know about contributors to VGI and CS projects in light of the current research work which has been reported in this area. What aspects of the crowd of contributors to VGI and CS are most closely related to the quality and accuracy of the data and information collected and generated? The paper addresses several of the key aims of this ISSDQ 2015 Special Session. In Section 2 we discuss the impact that the background of contributors has on what is recorded in crowdsourced data (see Section 2.1), the variations between different groups of contributors (see Section 2.2), and the possible impact and effect of the technologies used by contributors (see Section 2.3). This short paper closes in Section 3 with a discussion of some of the key messages from the paper and some suggestions for immediate future work and research direction.

UNDERSTANDING THE CROWD OF CONTRIBUTORS
Hunter et al. (2013) remark that there are some inherent weaknesses in citizen science and crowdsourcing projects. The limited training, knowledge and expertise of contributors and their relative anonymity can lead to poor quality, misleading or even malicious data being submitted. It can also lead to a reluctance from professional scientists or professional organisations to use data collected by citizens or crowdsourcing projects. In this section we discuss the current understanding of the crowd of contributors to VGI and CS projects under three different headings:
1. Impacts on data quality and data collection directly related to the background of the contributors involved (Section 2.1)
2. How variations within the crowd of contributors and how the crowd interacts with each other can affect data quality (Section 2.2)
3. How the types of technologies used by contributors can influence the types of data collected and ultimately the quality of the crowdsourced data (Section 2.3)
Figure 1 illustrates how these three headings influence the behaviour of the crowd. The purpose of Figure 1 is to emphasise that these three headings are related and are not mutually exclusive. The background of volunteers from the crowd can influence how they interact with the other members of the crowd or volunteer group. Their background also influences which technologies they have at their disposal and already have the skills or knowledge to use. The type of crowd (for example their area of interest) can influence which types of technologies are available for a VGI or CS project.

Impacts from the Background of Contributors
Contributors to VGI and CS projects can be selected from a specific subset of the crowd or from citizen groups already in existence. On the other hand VGI and CS projects can be formed and collect new contributors or members as time passes and the project grows in popularity. Generally, joining citizen scientist groups or projects does not require any previous scientific training or scientific background. There is not a great deal of literature which deals with the backgrounds of contributors to VGI and CS projects. Many studies focus on the motivations of those contributors but do not ask the volunteers about their socio-economic or professional skills backgrounds. Haklay (2013) remarks that there is a global growth in the population of well-educated individuals, with many millions of people who have advanced degrees or some level of higher education in science or engineering but do not use their scientific knowledge in their daily life. For many of these people, education provided a starting point for an interest in science which is not fulfilled in their daily activities. Thus, CS provides an opportunity to explore this dormant interest. Consequently scientists running a CS project can assume a basic understanding of scientific principles among the participants.
Often the background of contributors to VGI and CS is seen to be related to the types of citizen groups or organisations which those citizens are currently (or have been) involved in. Wehn et al. (2015) comment that in some cases local government authorities appear to prioritize selected groups of citizens regarded as more knowledgeable for CS projects. They illustrate this with an example from the UK where members of Civil Defence groups or those having specific professional skills are selected. This leads to the formation of networks of "qualified observers selected for participation in flood monitoring programmes" (Wehn et al., 2015, p. 232). Most work on the types of contributors to VGI and CS focuses on those actively involved in these projects. Interestingly, Manatschal and Freitag (2014) ask researchers to consider that most studies evaluating the motives of volunteering naturally analyse only active volunteers, thus neglecting the group of non-volunteers. What are the factors which prevent these non-volunteers becoming involved in a specific VGI or CS project? Is there something about the design, marketing, mission or governance of those projects which prevents these non-volunteers becoming involved? Or is it a different aspect of their background, education or lifestyle which creates a barrier to participation? Courter et al. (2013) argue that the professional and daily lives of citizens often prevent them from being actively involved in CS and VGI during normal working hours and working days. They comment that one potential source of bias in CS is the tendency for higher levels of reporting and activity to take place on weekends rather than on weekdays. Sundeen et al. (2007) report that the backgrounds of volunteers in activities such as crowdsourcing are heavily linked to three key obstacles: the lack of free time, the lack of interest in the subject or problem, and ill health of the citizens themselves or their family members. These three obstacles, individually or in combination, can have a great impact on whether and how citizens volunteer for activities such as VGI and CS. Arsanjani and Bakillah (2014) conclude in their study of OpenStreetMap (OSM) that involvement in the OSM project is dependent on a wide variety of factors. High levels of participation in OSM are strongly related to areas with high population density, a middle level of education, high income, a high rate of overnight stays and a high number of foreigners, and residents aged from 18 to 69 are more likely to be involved in OSM.
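The weekend reporting bias noted by Courter et al. (2013) can be checked on a project's own records with very little code. The following is a minimal sketch only, assuming each observation carries a calendar date; the function name `weekend_share` and the sample dates are hypothetical and are not drawn from any of the studies cited here. It compares the observed share of weekend records with the 2/7 share that would be expected if reporting effort were spread evenly across the week.

```python
from datetime import date


def weekend_share(observation_dates):
    """Return the fraction of observations recorded on a Saturday or Sunday."""
    if not observation_dates:
        return 0.0
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekend = sum(1 for d in observation_dates if d.weekday() >= 5)
    return weekend / len(observation_dates)


# Hypothetical observation dates, for illustration only.
observations = [
    date(2015, 6, 1),   # Monday
    date(2015, 6, 6),   # Saturday
    date(2015, 6, 6),   # Saturday
    date(2015, 6, 7),   # Sunday
    date(2015, 6, 10),  # Wednesday
    date(2015, 6, 13),  # Saturday
]

share = weekend_share(observations)
uniform_share = 2 / 7  # expected share if effort were even across all seven days
print(f"Weekend share: {share:.2f} (uniform expectation: {uniform_share:.2f})")
```

A share well above 2/7 suggests that reporting effort, rather than the phenomenon being observed, may be driving part of the temporal pattern in the data.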
Passive data collection using mobile devices is a popular means of collecting data in CS and VGI. This usually involves the citizen installing software or an 'app' on their mobile device and allowing this application to run continuously or at certain specified times. The application takes care of all data management and data transfer back to the CS or VGI project server infrastructure. In cases of passive data collection the citizens involved will often have to already own a specific model or make of mobile device. These devices can extend to other types of devices including audio-visual equipment, GPS sensors and other environmental sensors.

Variations in the Crowd and Group Dynamics
The structure and organisation of a crowd in a crowdsourcing project can influence the types of data collected and the quality of that data. This relates to the group dynamics and how these groups respond to different challenges. Measham and Barnett (2008) show that community education seems to be a major focus of environmental volunteer groups, while self-education as a reason for being involved in a volunteer group is less of a focus in urban areas. Miller-Rushing et al. (2012) suggest that citizen volunteer projects can be designed to tackle questions or tasks at geographic, temporal and thematic scales which are unachievable through professional scientific approaches. Involving citizen scientists in projects that professionals would not conduct on their own, either because of the type of question or task or because of the place of study, is also an important consideration. These questions or tasks must sufficiently interest or motivate volunteer groups to participate. The more interesting and motivating the questions or tasks are, the more likely the crowd is to grow in size.
Within a community group or crowd there are different dynamics at play. In a voluntary-based community structure or project there will always be some volunteers who perform more of the tasks, do more of the work and overall produce higher quality output. Nov et al. (2014) suggest that scientific organisations consider a more nuanced governance structure in which high-performing volunteers are more empowered. This would not only enhance citizen scientists' motivations but may also reduce the load on professional scientists, who could delegate some tasks, such as quality control, to these high-performing volunteers. In VGI projects such as OpenStreetMap, studies have shown that high frequency contributors perform over 90% of the work for a specific set of tasks and that these high frequency contributors tend to help each other rather than lower frequency contributors (Mooney and Corcoran, 2014). Arsanjani and Bakillah (2014) show that proximity to areas of high population is strongly related to participation in OSM and the subsequent dynamic updating of OSM data in those areas. In their work on using Citizen Science in the marine environment, Embling et al. (2015) conclude that it is important that CS projects consolidate volunteer effort at locations which are important to the study or project. This will then help to ensure sufficient survey data to achieve statistical rigour in identifying trends.
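The concentration of work among high frequency contributors can be quantified directly from an edit log. The following is a minimal sketch under stated assumptions: the log is taken to be a flat list with one contributor identifier per edit, and the function name `top_contributor_share`, the `top_fraction` parameter and the sample data are all hypothetical. It is not the method used by Mooney and Corcoran (2014).

```python
from collections import Counter


def top_contributor_share(edit_log, top_fraction=0.1):
    """Return the share of all edits made by the most active contributors.

    edit_log is an iterable with one contributor identifier per edit.
    top_fraction is the fraction of contributors treated as high frequency.
    """
    counts = Counter(edit_log)
    if not counts:
        return 0.0
    # Edit counts per contributor, most active first.
    ranked = sorted(counts.values(), reverse=True)
    # Always include at least one contributor in the top group.
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Hypothetical edit log: five contributors with very uneven activity.
edit_log = (
    ["alice"] * 90 + ["bob"] * 4 + ["carol"] * 3 + ["dave"] * 2 + ["erin"] * 1
)

share = top_contributor_share(edit_log, top_fraction=0.2)
print(f"The top 20% of contributors made {share:.0%} of all edits")
```

Tracking this share over time gives a project a simple indicator of how dependent it is on a small core of volunteers.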
Different members of the crowd or different groups within the crowd can have very different intrinsic approaches to the voluntary activities they are involved in. Niman (2013, p. 26) argues that in online gaming and related activities there has been the "emergence of a fun economy ... and what passes for fun, at times, looks to be a great deal of hard work". This extends to crowdsourcing, where the degree to which an individual can contribute towards the collective good in a VGI or CS project depends on the quality of their own skill set and their concept of achievement and evaluation of extrinsic rewards. Manatschal and Freitag (2014) contend that there are clear differences in the motivational structures of those involved in crowdsourcing activities such as VGI and CS. VGI and CS are types of "formal volunteering" and participation in these activities is often driven by hedonistic (having fun) or egoistic (recognition and rewards) reasons. The concept of altruistic motivations (simply helping others) appears to be more closely aligned with "informal volunteering". This has an effect on the group dynamics of crowd-based projects.
Crowd dynamics and behaviour can often be better understood if citizen groups are part of known groups or communities. Lawrence (2006) argues that in the domain of volunteer and citizen-based biological monitoring the most effective CS project design should be focused on including or engaging participant organisations and groups which were already in existence before the CS problem emerged. Participant organisations which have existed over a long period of time also provide improved opportunities for engagement and ultimately better data quality. In contrast, crowdsourced communities such as OpenStreetMap did not exist in a different guise before OpenStreetMap started. However many of the influential contributors in the early days of OpenStreetMap were already well established members of the Open Source Software community. With Open Source communities it is very often the case that members will bring with them significant software and technical skills before joining (Hertel et al., 2003).

Impact of Technology and Technological Effects
The final impact we shall discuss is that of technology on the quality of data and information collected and generated by crowdsourced projects such as VGI and CS. For the most part the crowd are not trained to use professional scientific equipment, methods and protocols. Newman et al. (2011) emphasise that there must be sufficient and effective technological, software and scientific support available to volunteers to provide them with the correct training to contribute effectively. The type of VGI or CS project will influence the type of technologies and skills required. In OpenStreetMap, for example, there are many different interfaces volunteers can use to contribute spatial data to the OpenStreetMap database. Figure 2 shows one of the web-based editing tools which can be used by OpenStreetMap contributors. This web-based tool is one of a wide variety of tools which also includes editing capabilities within GIS software such as QGIS and offline OSM editor software. OSM editing software which provides more detailed GIS-like functionality may be more useful and familiar to volunteers who are skilled in GIS or related disciplines.
In the early days of citizen science and volunteer monitoring, data capture was often performed by analogue means: pen and paper, phone calls, photographs, etc. Today citizen science projects and VGI projects usually make use of software and web-based technologies. The complexity of these technologies and the levels of skill required to understand and use them must be carefully considered by those developing VGI and CS projects. Kremen et al. (2011) show that in pollinator monitoring data, biases exist in the types of observations that citizen scientists make. This is related to the complexity of the observational tasks, but the authors show that these biases can be mitigated by better contributor training and technological support. Project design and technological design should avoid making the observational tasks very difficult. Overly difficult tasks can greatly increase the opportunities for data entry errors, observational bias and generally poor quality data.
Most VGI and CS projects do not evaluate how effective the technologies used actually are. Evaluations of how digital and internet technologies are being used in citizen engagement and crowdsourcing are few in the literature (Mandarano et al., 2010). Effective use of technologies and proper user-interface design can have a very positive influence on the quality of the data collected in crowdsourced projects such as CS (Preece and Bowser, 2014). Lawrence (2006) argues that technology must be used carefully. Technologies such as smart devices, tablets, etc. can greatly enhance data capture in CS and VGI and also greatly simplify the data collection and data entry processes. However Lawrence argues that some studies have shown that if tasks are very repetitive then this can have a negative impact on data quality regardless of the types of technologies used.

DISCUSSIONS AND CONCLUSIONS
In this short paper we have provided a discussion and meta review of some of the key literature in the domain of VGI and CS to investigate the level of understanding of the characteristics and dynamics of the citizens ("the crowd") involved in these activities.
How much do we really know about the contributors to VGI and CS projects? Despite an abundance of literature reporting studies of the data collected and generated by VGI and CS there still remains a lack of understanding of who the crowd of citizens involved actually are (Nov et al., 2014). In some examples the crowd of citizens is drawn from an existing community or from volunteer and citizen groups. In other examples the crowd grows as the VGI or CS project grows in popularity. In this sense our knowledge of the contributors to VGI and CS projects is still limited.
The paper has considered the impact on VGI and CS data quality related to: the background of contributors (their skills, their socio-economic status, etc.), the variations in crowd type and the structure of networks of citizens involved in VGI and CS, and finally the impact of the use of technology within these projects. We discussed how these three issues are tightly coupled and interrelated.
Perhaps one of the most interesting underlying issues regarding the quality of data produced by VGI and CS is that high quality data is collected, generated and managed by volunteers without any financial incentives to strive for good data quality. As outlined above, studies have demonstrated the quality of these data streams. Increasing financial investment in a volunteer monitoring programme will not necessarily lead to higher quality data and more outputs; the idea that injecting increased financing and capital into VGI and CS will improve data quality is simplistic (Tulloch et al., 2013; Sundeen et al., 2007). Rather there is an onus on organisations which instigate and manage VGI and CS projects to understand their likely specific pool of contributors, the level of enthusiasm for the project amongst these contributors and the likelihood of the programme providing data and information over the designed spatial, temporal and other thematic ranges. Within the context of Citizen Science, research on motivations for participation is still in an early phase and empirical evidence is very scarce (Nov et al., 2014). Certomà et al. (2014) emphasise the need for further empirical research in this area because of the novelty of the area of research and the expected and important future development of crowdsourcing for environmental governance and sustainability.
While there is not sufficient space in an abstract to fully explore this issue, our initial conclusions from this meta analysis are clear. The crowd can become involved in VGI and CS in many different manifestations: as individual citizens, as local community groups, as groups selected by a professional organisation, as groups bound by some environmental or political ideology, cause or issue, as socially connected groups, etc. In an extensive study by Sundeen et al. (2007) lack of time is the obstacle to becoming involved in volunteering most frequently mentioned by citizens. Those living in smaller cities are more likely to mention lack of time as an obstacle, with a possible explanation being that potential volunteers are called on with greater frequency in smaller populations. Other socio-economic barriers are also mentioned. Very often problems with the sustainability of participation by citizen volunteers in VGI and CS are emphasised by professional organisations as a major impediment. Such organisations cite the inability to predict how long citizens will remain involved as an obstacle to engagement with VGI and CS. Sundeen et al. (2007) believe that this is not wholly an issue for citizen volunteers. They conclude that organisations' volunteer recruitment strategies should consider these barriers and find appropriate ways to respond to these concerns. Organisations need to be creative in finding ways to generate volunteer interest in participation and to work with volunteers who are participating in their projects (Preece and Bowser, 2014; Lawrence, 2006). Nov et al. (2014) conclude that successful crowdsourcing projects should be designed such that the goals of citizens and of professional scientists or professional organisations are properly aligned.
So how much do we know about contributors to VGI and CS projects? The background of contributors is often the subject of much speculation in academic analyses of VGI and CS projects. There is no hard statistical evidence which indicates that the different backgrounds of contributors have an influence on what is recorded in crowdsourced data. Rather the literature appears to indicate that contributors who can allocate more voluntary time to their efforts in VGI and CS projects are more likely to contribute more frequently, in greater quantity and with higher quality. Targeted selection of crowds from specific groups or organisations can assist in assembling a crowd who are a known quantity, and this often eases the concerns of the professional or commercial organisations involved. VGI and CS projects have far greater flexibility and control over the types of technologies which are used by contributors to these projects. The projects can impose some specific requirements for the types of devices or flavours of technologies that must be used in order to contribute to a specific project. Good user-interface design in data capture software and applications can greatly assist in imposing quality control and checking as data is captured.

Figure 1: An overview of how the background of contributors, variations in the structure of citizen networks and the technologies used in VGI and CS interact with each other. The background of the contributors, the variations in how crowdsourced communities are structured and the technologies they are using for data/information collection are all interconnected and need to be considered when assessing the characteristics of a particular crowd or VGI/CS project.

Figure 2: A screenshot of the iD editor for OpenStreetMap. This is one of a wide range of web-based and desktop-based editing software for OpenStreetMap.

Indeed the precise influence of technologies on the quality of data produced by the crowd is open to debate and requires further research work and analysis. Nov et al. (2014) find that the overall quality of contributed data is positively affected only by collective motives and reputation and does not seem to be heavily influenced by the technologies used. Authors such as Spinsanti and Ostermann (2013) have developed approaches which use VGI generated by 'the crowd' for other purposes (Twitter feeds, Flickr photo uploads, etc.) and develop automated software processes for producing relevant, credible and actionable VGI usable for crisis events such as forest fires and environmental emergencies. In these cases there is no control or selection of the crowd performed. Rather these automated approaches must build in additional processes to clean and filter this potentially very large stream of VGI. Overall technology is a crowdsourcing enabler. Certomà et al. (2014) conclude that the current trend in participatory and crowdsourcing research is aimed at enabling people to collaborate with professional researchers using personal technological devices, information and communications technology, the sharing of collected items, social software of the Web 2.0, Creative Commons and open-access formats, etc. Therefore technology has a crucial role to play.