Page 1: Leveraging crowdsourcing techniques and technologies to generate better …gsars.org/wp-content/uploads/2017/10/AMIS-crowdsourcing_Final.pdf · 1 Leveraging crowdsourcing techniques

Leveraging crowdsourcing techniques and technologies to generate better agricultural information: three case studies, in Indonesia, Nigeria and the Philippines

Yakob Seid and François Fonteneau
Food and Agriculture Organization of the United Nations, FAO Statistics Division

June 2017

Executive Summary

Ensuring food security and market efficiency requires real-time information, particularly in highly volatile areas. Countries often fail to generate real-time data when applying traditional statistical approaches. The statistics they produce come with some delays and are mainly used for formulating long-term policies and time series analysis. Addressing these data gaps requires innovative approaches and new technologies. Recent advancements in data collection using mobile phones have closed the information gap significantly over the past decade in terms of the speed of data collection, dissemination and geographical coverage. Moreover, citizen engagement through crowdsourcing platforms has added to these advancements by moving data collection beyond traditional sources. These new developments provide opportunities to adopt alternative approaches to generate statistics at various degrees. They also open up opportunities for the private sector, government and development partners, including United Nations agencies, to play a significant role in increasing the adoption of these new approaches through advocacy and by demonstrating their efficiency.

Indonesia, Nigeria and the Philippines showed an interest in adopting these new approaches so that real-time data could be available to use in making better decisions to improve the well-being of their citizens. The Food and Agriculture Organization of the United Nations (FAO) provided technical assistance and financial support to these countries through a project funded by the Bill and Melinda Gates Foundation. The objective of the Strengthening Agricultural Market Information Systems globally and in selected countries (Nigeria, India, and Bangladesh) using innovative methods and digital technology project was to design and test innovative statistical methods based on digital and georeferenced technologies that increase the quality, timeliness, accuracy and comparability of agricultural data.

With the support of this project, pilots for collecting agricultural data using mobile phones based on crowdsourcing approaches were implemented in Indonesia, Nigeria and the Philippines. The pilot in Indonesia focused on data related to market prices, while the pilot in Nigeria aimed to capture more data on agricultural inputs and outputs. The pilot in the Philippines focused on assessing the effectiveness of crowdsourcing as a tool for forecasting palay production and analysing the crop situation in the country. For these pilots, crowdsourcing refers to citizen-provided data, a reflection of the information on the ground as known by the local producers, consumers and sellers trained to provide specific information, rather than spontaneous mass data input from volunteers with no training. No real sampling methodology or practice was performed before selecting the “crowd”.

Through the pilots, it was confirmed that these new approaches contribute towards efforts to deal with immediate data needs by providing real-time statistics on important agricultural variables with a minimum investment. The pilots also showed how such a system could be facilitated through the establishment of efficient public-private partnerships (PPPs). For these pilots, the following arrangements were made: privately owned information technology firms developed and managed the information technology infrastructure, government institutes developed the content and conceptual framework related to the data, and the citizens participated in providing the required data. In addition, FAO and the partners played a critical role by providing technical assistance. The pilot projects serve as working examples of the possibilities of experimenting with new approaches. They do, however, come with costs and capacity needs that national data agencies must deal with in order to manage the new data sources and adjust or reorganize the current data process in place. The present paper gives overviews of the three pilots and discusses the methodologies, results and lessons learned from these endeavours.

Part 1. Monitoring agricultural market prices with crowdsourcing techniques and technologies: Indonesia

1.1. Rationale and objective

In Indonesia, capturing data, particularly in rural areas, has always been a challenge. Official government statistics are usually collected by the central statistical agency and rely on field-level staff to visit markets and accurately capture a price point on a regular basis, in addition to their myriad other duties. The prices are often recorded on paper and then entered into a spreadsheet when the staff return to their offices. The spreadsheet is eventually submitted to the central office for compilation. Central statistical agencies are impeded by resource constraints and often face competing priorities between data accuracy and speed of publication. This ultimately leads to delays in the publication of price statistics.

United Nations agencies face many of the same constraints as central statistical agencies. Their field staff are also often overburdened with multiple responsibilities, despite investments made by the World Food Programme (WFP) and FAO Indonesia in mobile data technologies for price data collection. Accordingly, these two agencies share an interest in applying new approaches to collect data, including crowdsourcing, as a means to ease the burden of data collection experienced by their field staff.

In early 2015, WFP Indonesia and Pulse Lab Jakarta took initial steps to assess the potential of crowdsourcing price data in Nusa Tenggara Barat province in Eastern Indonesia. Following a competitive bid process, Premise Data Corporation, a San Francisco-based technology company, was awarded a contract to collect price data on 20 commodities in Nusa Tenggara Barat province for a pilot over a three-month period.

The Nusa Tenggara Barat pilot, which began in February 2015, successfully demonstrated the potential of crowdsourcing as a means to supplement and augment official price statistics by increasing the speed and coverage of data collection. For the pilot, three broad geographic areas within the province were considered, from which a sufficient volume of data was collected to generate statistically representative analysis on a weekly basis.

Following this pilot, the FAO Agricultural Market Information System (AMIS) project sought to determine whether crowdsourcing price data could be effective in a more rural province with less connectivity and a less developed road infrastructure. A key area of interest of the AMIS project is to conduct operational research on the use of innovative methodologies for agricultural market information systems. In addition, WFP began to explore the possibilities of using the crowdsourced approach for market-level monitoring (as opposed to the macro geographic focus of the Nusa Tenggara Barat pilot). With these objectives in mind, a second phase of the pilot in Indonesia began in October 2015. Phase two expanded monitoring to Nusa Tenggara Timur, the second poorest province in Indonesia, which suffers from significant constraints in infrastructure as it is spread over several islands. Monitoring continued in Nusa Tenggara Barat province but was focused specifically on Lombok Island. In the second phase, the focus of the pilot project shifted from broad geographic areas to specific markets recommended by FAO and WFP.

1.2. Methodology

Item and location specification

FAO, WFP, Pulse Lab Jakarta and Premise, a mobile information network, jointly defined a basket of 20 primary food staples for the region. In addition to specifying the basket of items, in the second phase of the pilot, they also jointly identified 17 markets in Nusa Tenggara Timur province and 17 in Nusa Tenggara Barat province. These markets were selected using the ground-level knowledge of WFP regarding these provinces and represent key markets in the area.

Data collection

The platform developed is designed to gather a broad array of data. At the heart of the platform is an iterative process by which a question is broken down into a set of tasks, which are then distributed to contributors. Contributors complete these tasks and submit the data for review. The data then goes through a stringent quality control mechanism. The approved data are then incorporated into a relevant set of analytics, which informs the survey’s iteration as shown in the figure below.

A key aspect of the platform is the ability to break down large, complex, possibly nebulous questions into smaller, structured, unambiguous tasks that a contributor who is not a professional enumerator can easily understand and complete. These tasks are pushed to contributors through a smartphone application developed to enable the rapid deployment of non-professional contributors at scale to answer questions that would otherwise require professional enumerators or subject matter experts to address.

Data processing

Every data point received by the system is subject to an extensive quality control process, which contains steps generic to all data captured and steps specific to the Indonesia pilot project. The process begins at the point of data capture, with a requirement that each observation be accompanied by some form of evidence that can be used to verify the data point. Each data point is then subjected to a set of fraud and error detection algorithms. The observations that are not eliminated by these algorithms are manually reviewed. Finally, rules are applied regarding packaging size, price normalization and outlier detection.
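The final rules mentioned above (packaging size, price normalization and outlier detection) can be illustrated with a minimal sketch. The source does not describe the actual rules used in the pilot, so everything here is an assumption: prices are normalized to a price per base unit, and outliers are flagged with a modified z-score based on the median absolute deviation. The function names and the threshold of 3.5 are hypothetical.

```python
from statistics import median


def unit_price(price, package_size, base_size=1.0):
    """Normalize a price to a common base size (e.g. price per 1 kg).

    Hypothetical helper: the pilot's real normalization rules are not published.
    """
    if package_size <= 0:
        raise ValueError("package_size must be positive")
    return price * base_size / package_size


def flag_outliers(values, threshold=3.5):
    """Return one boolean per value, True where the modified z-score
    (based on the median absolute deviation) exceeds the threshold.

    The threshold is an assumed default, not a value from the pilot.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values are identical, so no scale is available;
        # this sketch flags nothing in that case.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Illustrative (price, package size in kg) pairs, not pilot data.
raw = [(10000, 1.0), (21000, 2.0), (5200, 0.5), (9800, 1.0), (50000, 1.0)]
prices_per_kg = [unit_price(p, size) for p, size in raw]
flags = flag_outliers(prices_per_kg)
```

A median-based rule is used in this sketch because a handful of erroneous or fraudulent submissions would distort a mean and standard deviation far more than a median.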

The system enables the collection of a comprehensive set of metadata with each observation, including a picture of the item observed; these metadata are the foundation of the quality control process. As shown in the figure below, the metadata for each observation include timestamps, the contributor’s details and geographic coordinates, which can then be verified and cross-checked against other fields.

The quality control process put in place incorporates both automated and manual elements. In instances in which the automated quality control system suspects potential error or fraud but cannot make a definite determination, the submission is flagged for further evaluation by human quality control reviewers.

Signals used to identify incorrectly entered or fraudulent data include but are not limited to:

Metadata:

o Location: User locations are validated and cross-referenced at the point of price capture and during the quality control process. The recorded location of the contributor is examined and the system automatically discards observations that are outside of a configurable radius from the intended venue’s location. Additionally, the system identifies when users attempt to submit observations that appear to be from multiple venues (by manually changing the venue name), but have identical global positioning system (GPS) coordinates.

o Mock GPS software and location spoofing: During the pilot, contributors attempted to use third-party software to “spoof” their GPS in order to access additional tasks and/or complete tasks with less travel. The presence of these types of software is relatively easy to detect.

o Price: Item prices are flagged when the contributor submits a price observation significantly outside a reasonable range of prices for the currency, product, brand or packaging size in question.

o Additional manually entered fields: For all products, the size, quantity and units of measure are verified. This eliminates a common class of errors in which users incorrectly enter the size or the unit of measure associated with a product. These checks are intentionally performed outside of the application to account for the possibility of product packaging changes over time or by country.

o Image: Image analysis is one of the most important techniques used in a fraud and error detection system. Duplicate images, images that are very similar to one another and images that appear to have been photographed off a computer screen are used to flag observations.

Contributor behaviour: The platform is designed to detect a wide range of contributor behaviours that correlate with poor data quality and price anomalies. Warnings and behavioural nudges are surfaced in the application to discourage these behaviours.

o Overlapping contributors: A single data contributor operating multiple accounts, or groups of contributors capturing the same data are identified by session overlaps and other correlations in the venues visited. The graph structure of this data can be explored to identify clusters of fraudulent activity.

o Duplicate data: Another common fraud pattern entails users attempting to submit the same product multiple times at the same location, often accompanied by slight changes in price with each submission. Observations submitted under the same specification within the same place are flagged; these are usually rejected after fraud and error detection is performed.

o “Traveling gnome” fraud: A contributor attempts to submit observations of the same product multiple times, either by carrying the product to multiple locations or by carrying a picture of the product to multiple locations. Pattern detection and trend analyses can identify these attempts and flag them for manual review.
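The two location signals described in the list above can be sketched as follows. This is an illustration only, not the platform's implementation: the great-circle (haversine) distance, the 200 m default radius and all function names are assumptions introduced here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(observation, venue, radius_m=200.0):
    """True if the observation lies inside the configurable radius of the venue.

    Both arguments are (lat, lon) tuples; the 200 m default is an assumption.
    """
    return haversine_m(observation[0], observation[1], venue[0], venue[1]) <= radius_m


def same_coords_multiple_venues(observations):
    """Return the coordinates that were reported under more than one venue name.

    `observations` is an iterable of (venue_name, lat, lon) tuples.
    """
    venues_by_coord = {}
    for name, lat, lon in observations:
        venues_by_coord.setdefault((lat, lon), set()).add(name)
    return {coord for coord, names in venues_by_coord.items() if len(names) > 1}
```

In practice a check on identical coordinates would probably compare rounded or nearby coordinates rather than exact values; exact matching keeps the sketch short.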

Manual quality control

All observations are either accepted, rejected or flagged for manual review by the automated fraud and error detection algorithms. The observations that are flagged for manual review are then placed in a queue and distributed to the manual quality control team.
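The three-way outcome described above can be sketched as a small triage function. The mapping from signals to outcomes is an assumption made for illustration: the source states only that observations outside the venue radius are discarded automatically and that the other signals are used to flag observations. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    MANUAL_REVIEW = "manual_review"


@dataclass
class Signals:
    """Hypothetical outputs of the automated fraud and error checks."""
    outside_radius: bool = False
    price_out_of_range: bool = False
    suspicious_image: bool = False
    possible_duplicate: bool = False


def triage(signals: Signals) -> Decision:
    """Map check results to one of the three outcomes (assumed rule set)."""
    if signals.outside_radius:
        # The text says these observations are discarded automatically.
        return Decision.REJECTED
    if signals.price_out_of_range or signals.suspicious_image or signals.possible_duplicate:
        # Suspected but not certain: queue for the manual quality control team.
        return Decision.MANUAL_REVIEW
    return Decision.ACCEPTED
```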

A web-based platform called Scope was also developed, which enables users to view individual observations to simplify the manual quality control process. Scope allows quality control workers to quickly review large batches of observations and annotate individual observations as accepted or rejected. It is designed to not only present individual observations to quality control workers, but also to provide detailed descriptions of the tasks. Scope also provides mechanisms for quality control workers to give feedback to contributors about why their observations were rejected.

Data dissemination and access

In addition to providing a platform for quality control, each data point captured by the contributors was made available to FAO, WFP and Pulse Lab Jakarta through Scope. Observations can be filtered by time, commodity, location and other parameters. This allowed the United Nations team in Indonesia to access each observation through an intuitive interface. Each observation could be downloaded to a CSV file, including a filtered list of observations, if required.

While Scope enabled access to a large volume of data and made it possible to meet the objective of validating the robustness of crowdsourcing as an approach for data collection in Indonesian markets, it did not specifically provide access to aggregated data. Aggregated data, such as the median price of a commodity per market per week, are critical for WFP and FAO to perform analysis and glean insights on the current food security situation. The volume of data generated by observers and presented through Scope was beyond what would be used by a food security analyst. Rather than create a process for the United Nations team to generate weekly statistics from the individual observations presented in Scope, the system was able to directly generate weekly statistics per commodity and market.
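An aggregate such as the median price of a commodity per market per week could be derived from accepted observations roughly as follows. This is a sketch under assumptions: the tuple layout and the use of ISO calendar weeks are choices made here, not details given in the source.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def weekly_medians(observations):
    """Median price per (market, commodity, ISO year, ISO week).

    `observations` is an iterable of (market, commodity, observed_on, unit_price)
    tuples, where `observed_on` is a datetime.date. The layout is hypothetical.
    """
    groups = defaultdict(list)
    for market, commodity, observed_on, price in observations:
        iso = observed_on.isocalendar()
        groups[(market, commodity, iso[0], iso[1])].append(price)
    return {key: median(prices) for key, prices in groups.items()}


# Illustrative values, not pilot data.
sample = [
    ("Market A", "rice", date(2016, 2, 1), 9000),
    ("Market A", "rice", date(2016, 2, 3), 9500),
    ("Market A", "rice", date(2016, 2, 4), 10000),
    ("Market A", "rice", date(2016, 2, 8), 9800),
]
result = weekly_medians(sample)
```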

Initially, weekly price statistics were not disseminated directly through the Scope platform, whose primary function is observation-level data exposure. Instead, weekly price statistics were made accessible through a dashboard (later integrated into the Scope platform) and CSV files, which were generated and uploaded to an online repository on a weekly basis.

The dashboard allows for visualization of changes in food prices over time by commodities and markets. It also depicts changes over time of a weighted index, which was calculated based on the relative importance of each commodity to a household budget on average.
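The source does not give the formula behind the weighted index. One common construction, shown here purely as an assumption, is a weighted average of price relatives against a base period, with weights reflecting each commodity's share of an average household budget. The function name, weights and prices below are hypothetical.

```python
def weighted_price_index(current, base, weights):
    """Weighted average of price relatives, scaled so the base period equals 100.

    `current` and `base` map commodity -> price; `weights` maps commodity ->
    budget share. Commodities missing from either price mapping are skipped
    and the remaining weights are renormalized.
    """
    common = [c for c in weights if c in current and c in base]
    total_weight = sum(weights[c] for c in common)
    if total_weight <= 0:
        raise ValueError("no commodities with positive total weight")
    weighted_sum = sum(weights[c] * current[c] / base[c] for c in common)
    return 100.0 * weighted_sum / total_weight


# Illustrative values, not pilot data.
base_prices = {"rice": 10000, "sugar": 12000}
current_prices = {"rice": 9000, "sugar": 12000}
budget_shares = {"rice": 0.6, "sugar": 0.4}
index = weighted_price_index(current_prices, base_prices, budget_shares)
```

With these numbers, a 10 percent fall in the rice price and an unchanged sugar price give an index of about 94, since rice carries 60 percent of the weight.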

1.3. Report of activities and results

Highlights from phase 1

Network ramp-up

During the first phase of the project, it was possible to build a network of more than 500 contributors on Lombok Island, who visited more than 5,000 unique locations and submitted 66,902 observations. In extremely rural Sumbawa, a trusted network of 24 contributors was built, who visited nearly 50 unique locations and submitted 757 observations.

Numerous recruitment methods, including Facebook advertisements and Facebook Messenger, were used. Ultimately, BlackBerry Messenger (“BBM”) proved to be the most effective recruitment method for Indonesia. Locals began sharing the project’s PIN through numerous university groups and on various blogs. This made it possible to build an initial core group of 50 contributors. Within a month, the network of contributors began growing organically, by 30 percent each week, and data volume increased, on average, by 40 percent each week. The network grew so quickly that it became necessary to appoint a local community manager to moderate the Facebook group in which contributors had started crowdsourcing answers to each other’s questions.

After six weeks of data capture in phase 1, the network reached a steady state, supplying almost 8,000 accepted data points weekly, far exceeding the target of 1,200 specified in the engagement contract. Geographic coverage remained strong, and the constant addition of new contributors can be attributed to the discovery of more than 250 new markets each week, bringing the total to more than 5,000 unique markets across Lombok Island.

Price trends

At the beginning of the first phase of the project, a broad deflationary trend in the prices of various staple goods was observed. Between 1 April and 30 June 2015, general food prices on Lombok Island followed this deflationary trend, as shown in the figure below.

The decrease in prices was primarily driven by a decline in the price of low quality rice, which was then confirmed by WFP, and followed typical seasonal patterns for rice prices in Nusa Tenggara Barat province. In addition to low quality rice, strong deflationary trends were also observed in the prices of several other staples, such as flour, sugar and tofu.

Highlights from phase 2

Market-focused monitoring

The primary objective of the shift to market-based monitoring was to assess whether crowdsourcing as a data collection modality can work at a very specific locality and be reliable through a sufficient number of observations. This shift had two consequences for phase 2 of the project: a change in the volume of observations required from a market and a change in task assignment to be specific to a location.

The ramp-up time to reach those levels varied considerably by province and district. In Nusa Tenggara Barat province, where the network of contributors was already in place, 15 of the 17 markets were fully covered (five observations per commodity) within a month.
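The coverage criterion mentioned above (five observations per commodity) can be checked with a short sketch. The data layout and function name are assumptions; the reporting period over which observations are counted is left to the caller because the source does not state it precisely.

```python
from collections import Counter


def fully_covered_markets(observations, commodities, min_obs=5):
    """Return the markets with at least `min_obs` observations for every commodity.

    `observations` is a list of (market, commodity) pairs for one reporting
    period; `commodities` is the monitored basket.
    """
    counts = Counter(observations)
    markets = {market for market, _ in observations}
    return {
        market
        for market in markets
        if all(counts[(market, commodity)] >= min_obs for commodity in commodities)
    }


# Illustrative values, not pilot data.
basket = ["rice", "maize"]
obs = (
    [("Market A", "rice")] * 5
    + [("Market A", "maize")] * 6
    + [("Market B", "rice")] * 5
    + [("Market B", "maize")] * 2
)
covered = fully_covered_markets(obs, basket)
```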

The three islands monitored in Nusa Tenggara Timur province each presented different challenges and ramp-up times. On Timor Island, where Facebook advertisements were feasible, 468 observations across three markets were captured in November 2015. By January 2016, 15 markets were covered, nine of them with a sufficient number of observations per commodity. In February 2016, full commodity and market coverage was achieved, with nearly 10,000 observations captured by more than 100 contributors.

A single market was selected for monitoring on Flores Island, where neither FAO nor WFP has staff on the ground. In the last week of January 2016, the full volume of captures was achieved with 14 active contributors.

Monitoring Sumba Island, which is very rural, has limited road infrastructure and a sparse population, was particularly challenging. In January 2016, only four contributors were actively sending price captures. Recruitment in Sumba improved dramatically after WFP shared a list of potential contributors in January 2016. In February 2016, the four markets in Sumba had sufficient volume of price observations, with nearly 2,000 data points successfully captured by 39 contributors.

Overall, the shift to market-based monitoring brought new challenges, as tasks changed from being geographically unrestricted to being tied to specific locations. These challenges were further compounded by the very rural nature of the markets in Nusa Tenggara Timur province, particularly on Sumba Island. New recruitment methods were required, after which the approach was successful.

Availability survey

As part of the pilot, a methodology for capturing availability and quality information was tested. The table below shows the availability and quality survey questions posed to the contributors. From the deployment of the survey in mid-February 2016 to the end of data collection in April 2016, 2,161 responses were collected from the availability and quality surveys across 17 markets. Results indicated that low quality rice, maize, and tomatoes were mostly present in the markets studied in Nusa Tenggara Barat province, and that there were some fluctuations in the amount of tomatoes available.

Contributor responses regarding the number of products for sale at the markets also suggested that low quality rice, maize, and tomatoes were consistently available. In response to the question “How much of the product is being sold”, when data were aggregated by product throughout the study period, 80-86 percent of the observations indicated that a “normal amount” of the product was being sold. Additionally, the amount available did not appear to vary for any product.

There did seem to be some difference in availability reports between products: “less than usual” amounts available for sale at the markets were reported more often for tomatoes than for rice and maize. This finding matched the client team’s expectation that the availability of fresh vegetables was more likely to fluctuate than that of staple foods, such as maize and rice.

Quality survey

Survey results from both quality questions indicated that for the three products measured, there were seldom damaged or low quality food items on sale. In response to the question asking users to check off the different damage attributes, 85 percent, 97 percent, and 97 percent of the observations reported no damage for low quality rice, maize, and tomatoes, respectively. For low quality rice, “broken” was the only damage attribute with a substantial number of observations.

Consistent with the results from the damage attributes question, 98-99 percent of the observations for each product reported that products for sale were of either fair or good quality. Low quality rice had more “fair” responses rather than “good” responses relative to maize or tomatoes, but this may be influenced by perception of “low quality” rice in comparison to “medium or high quality” rice.

Overall, the availability and quality surveys appear to have generated consistent and stable results. However, neither quality nor availability results showed a great deal of change over the two-month data capture period, while prices for tomatoes and maize did increase over this time period.

Utilization of results

Data from phase 1 of the pilot project were not used operationally by WFP but rather illustrated the potential of the approach. The web-based interface to the underlying data (Scope) offered a highly detailed view of each price observation. Through Scope, the exact location and time of a price observation, along with a photograph of the commodity, were visible in real time to users of the system, including WFP and government staff. The pilot project was shared with partners in Government, specifically in the Food Security Agency within the Ministry of Agriculture in Jakarta and in Mataram, the capital of Nusa Tenggara Barat province. Government partners showed keen interest in the approach but wariness over the use of data generated by non-government staff.

During the second phase of the pilot project, food price monitoring in Indonesia became a key issue because of the impact of drought related to El Niño. Indonesia has historically been severely affected by El Niño events, which have led to spikes in rice prices and ultimately an increase in the levels of poverty and political instability. The availability of real-time price data, particularly in the drought-stricken Nusa Tenggara Barat province and Nusa Tenggara Timur province, was a significant asset for United Nations agencies. Data from the pilot project were incorporated into a wider drought-monitoring system, which combined drought exposure data derived from satellite imagery and economic vulnerability data from a national household expenditure survey.

The drought-monitoring system successfully demonstrated that real-time data could be used to support decision-making. It has been showcased to the Office of the President in Indonesia and is being considered for use within Government. This price data collection pilot project, which has ended, clearly shows that the use of innovative approaches to capture real-time data can be of high value to key decision-makers.

Lessons learned

Crowdsourcing has proved to be an accurate and reliable means to rapidly capture price data in markets not normally covered by traditional data collection modalities. This is largely attributable to the objectivity of the observations and the ability to verify data through automated and manual quality control mechanisms. Using a network of contributors to collect information on the quantity and quality of food present in markets was a new approach for Premise and the United Nations team. Results from this exercise are promising and the findings are in line with expectations in Indonesia.

Below is a discussion on the main lessons learned for future consideration.

Improving the validation process

Further testing of this approach is required, ideally in a scenario in which there are known and verifiable variations in availability and quality in specific markets and commodities over time to assure the quality of the data. With additional data and testing of the ability to detect change in availability and quality over time, a new dynamic monitoring capacity could be built for food security analysis.

Building and maintaining a crowd: challenge

Building and maintaining a network of contributors requires the establishment of trust between users and the operating firm, in the Indonesian case Premise. The Growth Team of Premise manages the recruitment of contributors and communications with users throughout a data collection campaign. Users are most likely to become reliable contributors of quality data if the recruitment and payment process is frictionless. To establish trust with users, the Growth Team provides simple tasks during the onboarding process and attempts to make payment for successful data captures as quickly as possible. With this initial trust in place, contributors are likely to remain actively engaged, provided that there is a consistent amount of work with a minimum level of compensation.

Fraud detection is a core function of the data control processes of Premise. However, in Nusa Tenggara Barat province, new forms of fraud were prevalent during phase 2 of the project. Two specific and new types of fraud uncovered in the province were the use of (a) fake camera applications and (b) fake location applications. Fake camera applications allowed contributors to bypass their internal camera when collecting an observation and, instead, use an image from another source and present it as an image they had captured at a market. Fake location applications allowed contributors to falsify their current location during recruitment or when making an observation, which, in turn, enabled them to fraudulently join a campaign outside of their location or to capture an observation away from the specified market, such as at home.

Rural recruitment

It was possible to maintain a steady volume of data during phase 2 by building up additional super-users. However, recruitment outside of urban areas and on smaller islands proved to be difficult without on-the-ground assistance. The main challenges identified are the following:

Low connectivity
o Sumba Islands and Rote Island only have Wi-Fi available in a few cafes

Lack of sufficient volume of Facebook users and smartphone penetration
o Facebook advertisements could not reach islands in Nusa Tenggara Timur province, with the exception of Timor. This is driven by the mechanism Facebook uses to target advertisements subnationally, which is biased towards urban areas with larger populations
o Potential users did not own an Android phone and could not be converted to active contributors

Language barriers
o The application in use is translated into Bahasa Indonesia, but many potential users in Nusa Tenggara Timur province only spoke local languages and could not interact with the application.


Part 2. Improving agricultural value chain transparency with crowdsourcing techniques and technologies: Nigeria

2.1. Rationale and objective

During the implementation of activities within the AMIS project plan in Nigeria, assessments of market prices were carried out in the pilot states: Abuja, Kaduna and Niger. The reports from the assessments show that most of the market price data producers in Nigeria face similar challenges, which include:

Inadequate funding;

Inadequate infrastructure;

Inefficient data collection methodology; and

Lack of sufficient skills and capacity.

Moreover, through the consultative assessments of data needs with farmers’ associations conducted between October and December 2015 in the three pilot states, the following gaps were identified:

Lack of market price data (wholesale and retail);

Insufficient access to markets (that is, the lack of a ready market or buyers for their outputs and, hence, the need to rely on middlemen); and

Data on access to crop production inputs, such as pesticides, fertilizers, farming equipment and loans.

To tackle these identified challenges on market price data, the Nigeria AMIS project proposed to create data exchange solutions for agricultural producers and market stakeholders to facilitate access to and the sharing of near real-time information. This solution will provide farmers (including small-scale farmers) with access to near real-time price data across Nigeria, which will enable them to improve decision-making processes and increase earnings. To this end, within the scope of the AMIS project, it was proposed to test the crowdsourcing approach in Kaduna state. This allowed the farmers to provide and access data, such as the prices of different commodities, crop production forecasts, stock data and certain important variables on the cost of crop production that serve as inputs to the farm gate value. Through insights from the pilot, the AMIS project in Nigeria was then intended to inform the Government of Nigeria on the usability and efficiency of technology-based crowdsourcing for data collection.

The primary aims of the crowdsourcing initiative in Nigeria were the following:

1. Test the responsiveness and reliability of a crowd composed of members of the local agricultural community;

2. Determine the sustainability of data collection through a crowdsourcing approach using a simple technology, such as short message service (SMS); and

3. Obtain individual and market-level data across several data categories, each with its own data periodicity.

The four-month pilot was undertaken in 10 local government areas and four trader markets identified in Kaduna state. The data source was the agricultural community itself: farmers, traders and suppliers (equipment, pesticide and fertilizer providers) submitted data directly through SMS. Data coverage included multiple topics, such as prices, stocks and forecasts. Reporting frequencies varied from three times per week to monthly, depending on the type of data. The data reported most frequently pertained to farm gate prices. Data for stocks and forecasts were covered monthly. For the implementation of the pilot, the AMIS project contracted Knoema to provide the supporting and data-related solutions. To meet the pilot requirements, comprehensive technological solutions were implemented, which included the following core components:

An SMS data collection platform managed by the local partner of Knoema, Digital connect Ltd (DCL);

Data processing and implementation of quality control protocols;

A data repository of raw and quality controlled, aggregated data; and

Data dissemination tools, including a web-based portal and SMS data to the citizen contributors.

2.2. Methodology

Identifying pilot citizen data contributors

Initially, the pilot project was expected to recruit the crowd through local farmers’ associations. However, the farmers’ associations are involved in only four of the AMIS crops, which limited advance knowledge of farmers’ representation for the items in the expanded food basket for Kaduna state. As such, several of the crops covered by the pilot scope have no corresponding farmers’ associations through which farmers could be recruited. It was therefore critical to engage and seek cooperation from government agencies that focus on rural agriculture. By doing this, it became possible to recruit the required number of contributors for the pilot.

Collection, management and dissemination of short message service data

Based on the market penetration rate for smartphones in Nigeria and among the targeted crowds, the project chose to test an SMS-based approach to data collection and dissemination. The overall data flow is presented in the picture below.


Data storage and validation

The SMS Messaging Platform received and stored the SMS sent by the citizen data contributors. Contributors used a standardized SMS short code reference guide to structure the SMS messages. The system used an automated validation process through which the short code messages were evaluated based on several parameters, including:

Number of elements. SMS that lacked any of the four required elements were immediately designated as “invalid” and a reply SMS was sent to the contributor to request resubmission: “Incomplete entries, provide information in the form: [SMS code] [crop id] [unit of measure] [value].”

Expected short codes. An SMS that contained all four elements but included an unexpected code also received an automated SMS to indicate which element of the SMS contained an error, such as “Crop ID or Commodity doesn’t seem to exist on selected list” or “Kindly provide the correct measurement code”.

Known phone number. The phone number of the SMS sender was compared to the crowd register to confirm the collector type and the local government area or market to which the contributor belonged. For an SMS from an unregistered phone number, the collector type was assigned according to the code specified in the SMS and the location was set to “Kaduna State”.

During the pilot, multiple types of SMS were sent to the crowd: reminders to submit data, confirmation of valid SMS, notification of invalid SMS, and SMS data from the pilot. The messaging for invalid SMS notifications, in particular, went through several revisions. The initial response a crowd participant received to an invalid SMS was “invalid data category or indicator code”. Later, the message was revised to provide a correctly formulated sample SMS with explicit, structured guidance so that the contributor could resend the SMS with valid content.

For example, a sample message to a farmer that submitted a price using the wrong unit of measure for a crop would be: Incomplete entries, provide information in the format [SMS code][crop id][unit of measure code][value].

In some cases, specific errors repeated by a contributor were flagged in the text message sent to the contributor to reduce errors.
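The element-count, short-code and phone-number checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the validation logic used in the pilot: the lookup tables, the phone register and the function name `validate_sms` are hypothetical. Only the four-element layout, the reply texts, the “Kaduna State” fallback and the sample message “TRP 11 3 200” are taken from the report.

```python
# Hypothetical lookup tables; the pilot's real short code guide is not reproduced here.
SMS_CODES = {"TRP": "Retail trader price"}
CROP_IDS = {"11": "soybean"}
UNIT_CODES = {"3": "per kg"}
# Hypothetical crowd register: phone number -> registered market or area.
CROWD_REGISTER = {"+2340000000001": "Sabon-Gari market"}

INCOMPLETE = ("Incomplete entries, provide information in the form: "
              "[SMS code] [crop id] [unit of measure] [value].")
BAD_CODE = "Invalid data category or indicator code"
BAD_CROP = "Crop ID or Commodity doesn’t seem to exist on selected list"
BAD_UNIT = "Kindly provide the correct measurement code"


def validate_sms(sender: str, text: str) -> dict:
    """Check one SMS against the three parameters described in the text."""
    parts = text.split()
    # Number of elements: exactly four are expected (assumption: extra
    # elements are rejected in the same way as missing ones).
    if len(parts) != 4:
        return {"valid": False, "reply": INCOMPLETE}
    sms_code, crop_id, unit_code, value = parts

    # Expected short codes: tell the sender which element is wrong.
    if sms_code not in SMS_CODES:
        return {"valid": False, "reply": BAD_CODE}
    if crop_id not in CROP_IDS:
        return {"valid": False, "reply": BAD_CROP}
    if unit_code not in UNIT_CODES:
        return {"valid": False, "reply": BAD_UNIT}
    if not value.isdigit():
        # Assumption: a non-numeric value triggers the generic format reminder.
        return {"valid": False, "reply": INCOMPLETE}

    # Known phone number: unregistered senders fall back to "Kaduna State".
    location = CROWD_REGISTER.get(sender, "Kaduna State")
    record = {
        "category": SMS_CODES[sms_code],
        "crop": CROP_IDS[crop_id],
        "unit": UNIT_CODES[unit_code],
        "value": int(value),
        "location": location,
    }
    return {"valid": True, "reply": "", "record": record}
```

Calling `validate_sms("+2340000000001", "TRP 11 3 200")` with these illustrative tables yields a valid record for the retail trader price of soybean per kg, while a three-element message returns the “Incomplete entries” reply.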

Short message service processing and decoding

An SMS processing tool (decoding algorithms) was developed during the course of the pilot to decode data from each SMS and validate the information against the pilot requirements, crowd registration logs and other mechanisms. The objective of the algorithms was to improve the validity of SMS messages by identifying and automatically correcting errors detected through pattern recognition. The valid SMS rate averaged 65 percent before use of the algorithms to improve SMS decoding. The initial algorithm (Annex 1) improved the SMS validity rate to 90 percent by identifying and revising SMS to correct the following types of user errors:

Use of miscellaneous characters, such as #29000, =N=21000, 51’000.

Substitution of alphabetic or other characters for numerals, such as the letter O instead of the number zero in “2O,OOO” or an exclamation point in place of a one in “!,000”

Submission of duplicate messages from the same data contributor-crop-category combination on the same day

Use of information other than required codes, such as using “100kg, (4 bags)” instead of the appropriate measurement code.

The final algorithm improvements, which fixed spacing errors and substituted periods for commas as the thousand separator, increased the valid SMS rate to about 90 percent. The manual review process further improved this rate to 95 percent and greater during the final weeks of the pilot. Accordingly, 1,279 valid SMS were processed during the final week of the pilot, 357 more than would have been possible without the latest algorithms and manual reviews. After running the algorithms, manual “decoding” was also performed on invalid SMS. The three most common errors adjusted manually were the incorrect assignment of a unit of measure, improper sequencing of a SMS and sending of multiple data contributions in a single SMS.
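The kinds of automatic corrections listed above can be illustrated with a short sketch. It is not the algorithm in Annex 1; the function name `normalize_value` is hypothetical, and the sketch assumes that the value element is always a whole number of naira, so that commas, periods and apostrophes can all be treated as thousand separators.

```python
import re
from typing import Optional

# Letter O (either case) stands in for zero, "!" for one, as in the examples in the text.
_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "!": "1"})
# Characters treated as noise or as thousand separators (assumption: no decimals).
_SEPARATORS = re.compile(r"[#,.'\u2019\s]")


def normalize_value(raw: str) -> Optional[int]:
    """Return the numeric value of an SMS value element, or None if it cannot be repaired."""
    text = raw.strip().replace("=N=", "")   # drop the naira marker
    text = text.translate(_LOOKALIKES)      # fix look-alike characters
    text = _SEPARATORS.sub("", text)        # drop separators and stray symbols
    # Anything still containing non-digits (for example "100kg") is left for manual review.
    return int(text) if re.fullmatch(r"\d+", text) else None
```

With this sketch, “#29000”, “=N=21000”, “2O,OOO” and “!,000” are repaired automatically, whereas a value such as “100kg” is returned as `None` and would need the manual decoding step described above.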


Statistical validation of short message service data

The discrete data points decoded from each SMS (for example, “TRP 11 3 200” became “Retail trader price, soybean, per kg, 200 naira”) were then further cross-checked against a set of factors, including:

• Price changes of more than 20 percent since the last report;

• Stock changes of more than 30 percent since the last report;

• Production forecast changes of more than 35 percent since the last report;

• Cost of production changes of more than 20 percent since the last report.

This statistical validation was automated using a custom-built data management tool. Data that failed to meet these validation rules were considered invalid and were excluded from the next step: aggregation and dissemination.
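The threshold rules above amount to a comparison of each new value with the contributor's previous report. A minimal sketch follows, assuming that the change is measured as an absolute percentage of the previous value and that a first report (with nothing to compare against) is accepted; the names `THRESHOLDS` and `passes_validation` are hypothetical and do not describe the custom-built tool used in the pilot.

```python
from typing import Optional

# Maximum accepted change since the last report, in percent, per data category.
THRESHOLDS = {
    "price": 20,
    "stock": 30,
    "forecast": 35,
    "cost_of_production": 20,
}


def passes_validation(category: str, previous: Optional[float], current: float) -> bool:
    """Return True if the change since the last report is within the category threshold."""
    if previous is None or previous == 0:
        # Assumption: with no usable previous report there is nothing to compare.
        return True
    change_pct = abs(current - previous) * 100 / previous
    # A change of "more than" the threshold fails; a change equal to it passes.
    return change_pct <= THRESHOLDS[category]
```

For example, a price moving from 200 to 240 naira (exactly 20 percent) passes, while a move from 200 to 241 naira would be excluded from aggregation.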

Data aggregation and dissemination

Validated data were aggregated and disseminated through two systems: a web-based data portal and SMS. Through the web portal, a comprehensive dataset and custom dashboards to display data were published with flexible options to sort by crops, contributors and locations. Similarly, SMS messages were carefully drafted to convey essential data and remain within the 160-character limit of SMS.

2.3. Report of activities and results

During this pilot exercise, it was possible to surpass the expected crowd size set under the scope of the pilot project inception document. As of the final week of the pilot, the crowd had reached 772 participants, including the freshly recruited large-scale farmers. Each contributor received mobile airtime credit in exchange for their participation. Throughout the pilot, additional participants were recruited to help close crop coverage gaps (farmers), replace non-responsive participants (traders) and meet scope demands of the pilot exercise that fell outside the crowd (production costs primarily relevant to large-scale farming enterprises). The community of Kaduna sent almost 20,000 SMS through this pilot. For some data categories and contributors, such as the prices of common fertilizers and crops produced in the state, the pilot yielded a high response. Gaps and reliability concerns persisted in other areas.

Farm gate prices were the most reliably reported and covered all 14 crops. Seventy percent of the targeted data volume was received despite challenges in registering 10 farmers for carrots, melons, rice paddy and sweet potatoes. The total volume of data reported for these crops ranged from 12 to 20 percent of the expected volume.

It was also possible to successfully gather data for almost all of the planned pesticides, farm equipment and fertilizers. Data were regularly received for 10 of the 15 types of pesticides, 4 of the 5 types of fertilizers and 3 of the 10 types of farm equipment. (The share for farm equipment is lower because of the nature of the rental equipment.)


Data coverage and volume were impeded in other areas because of a number of factors, including education level and literacy, willingness and reliability of desired participants, full localization of the expanded scope, development of influential local stakeholders and use of SMS-based technology.

One of the weakest collection areas of the pilot was traders. Retail traders were reluctant to disclose stock availability to what they perceived to be a government-related data collection initiative and only reported 25 percent of the expected volume for this category. Underrepresentation of wholesale traders in the selected markets strongly affected the volume of wholesale prices reported.

The pilot yielded representative data that could affect the daily lives of local consumers and producers. While not directly useful as a substitute for, or even a component of, official national statistics, such as the consumer price index, these data could eventually be used for a variety of purposes, such as evaluation of lending risk based on trends in output and market valuation, assessments of hyperlocal food security conditions affecting the average citizen, identification of opportunities to invest and improve production techniques and resilience to crop failures, and identification of pressure points in the food supply chain infrastructure.


During the pilot, farmers, in particular, expressed appreciation for the SMS data. For instance, a district representative speaking on behalf of bean farmers stated that SMS data were being used to decide which market to bring weekly production to for sale. The same contributor expressed interest in even more data by SMS, including prices of other crops and more markets. Farmers also expressed interest in fertilizer stock data, with an eye toward government assistance in increasing the local availability of fertilizers.

The online data portal supports analysis of the pilot data at the local government area, market and crop level with dynamic dashboards and easy export options. Some examples are discussed below:

Basic trend analysis and alerts

Pilot data could support identification of price trends and surges that would help alert agricultural extension officers to potential food shortages, inflation and grey economic situations affecting local consumers. The charts below show the average farm gate price for cassava during the pilot, which clearly trended upward for several weeks of the pilot before abruptly returning to December price levels during the final week of collection.
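A simple way to turn such price reports into a trend alert is to average them by week and flag any week whose average rose sharply compared with the previous week. The sketch below is illustrative only: the function names and the 10 percent alert threshold are assumptions, not features of the pilot's data portal, and any input figures used with it would be the analyst's own.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_averages(observations):
    """Average (date, price) observations by ISO year and week, in chronological order."""
    buckets = defaultdict(list)
    for day, price in observations:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(price)
    return {week: mean(prices) for week, prices in sorted(buckets.items())}


def surge_weeks(averages, threshold_pct=10.0):
    """Return the weeks whose average rose by more than threshold_pct over the prior week."""
    weeks = list(averages)
    flagged = []
    for prev, cur in zip(weeks, weeks[1:]):
        base = averages[prev]
        if base > 0 and (averages[cur] - base) * 100 / base > threshold_pct:
            flagged.append(cur)
    return flagged
```

The same weekly averages could also feed a chart of the kind described above; the threshold would need to be tuned to the normal volatility of each crop and market.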

Monitoring the price of a basket of goods

The cost of a selected basket of goods can be understood more rapidly through the use of ranking charts and calculated averages over weekly, monthly or other periods of time. This type of information is also valuable for analysing which crops maintain a higher relative market value over time or, conversely, with which potential substitute a consumer might replace an increasingly costly food staple.


Location-based profiles

Location-based profiles are beneficial to, for example, investors interested in a specific region, donors seeking to expand or improve supply chain infrastructure within a community and government planners trying to understand how adverse conditions in an area could affect food supply and availability.

Profiles can be built per market or Local government area based on the pilot data. The example below shows a sample profile for the Sabon-Gari market. It may be filtered by crop; maize is selected.

The sample visualization below also features: the latest retail and wholesale prices and stocks; long-term price trends; and summary tables by crop and for total data available for the market in Sabon-Gari.


Across-time price comparisons

The data portal easily provides time series data for different commodities in different markets. As shown in the following graph, farm gate prices for beans in Lere and carrots in Sabon Gari showed far more price stability, while yam farm gate prices in the Zaria market showed greater fluctuation during the time period.


Cross-dataset analysis

Data collected under the pilot may be examined along with other data sources that are uploaded to the portal. As shown in the graph below, a monthly dataset from the National Bureau of Statistics of Nigeria is compared with prices collected under the pilot.


2.4. Lessons learned

Based on the size of the achieved contributor crowd and the reliability and validity of the selected data types, the pilot showed the potential for using SMS-based data collection to supplement other agricultural data in Nigeria to provide higher periodicity, broader coverage and hyperlocal data. Data collected through a future extension of the pilot could be integrated with existing official agricultural data from the National Bureau of Statistics to better meet the data needs of the local communities.

To scale up such activities in Nigeria, the following points should be considered:

(Re)verify local needs.

What are the current knowledge gaps that, if filled, would influence decisions and actions of the local agricultural community? Farmers’ feedback at the conclusion of the pilot suggested that there might be an equal interest in fertilizer and pesticide pricing and comparative market prices for their crops.

Capacity building first.

Literacy is essential to this type of data collection, particularly to enforce a standardized collection methodology. Consider two facets: technological and written language. Pre-education of the crowd is needed to track stock, make forecasts and use the resulting data to improve farming and marketing operations.

Ready staffing, retraining capacity.

Collection of this nature requires supervisors in each crowd to direct market visits for retraining and daily surveillance and a ready response team to correct errors and retrain contributors on the fly by SMS or phone.

An eye to smartphones.

Develop a plan to integrate testing of smartphone-based applications. Smartphones reduce the burden of learning short codes and using them in the data being sent. In addition, automated alerts and the organization of data are more easily supported on a smartphone. A smartphone application can also include an explanation of the indicator to complement training and provide useful metadata, such as geo-coordinates.


Part 3. Monitoring agricultural crop production forecast with crowdsourcing techniques and technologies: the Philippines

Rationale and objective

The timely availability of crop production forecasts is an essential factor in preventing a breakdown in the food value chain. Addressing food needs and maintaining normal market behaviour require knowledge about the expected availability of production in a country. Necessary interventions can be made if data on expected production are available within a suitable time period. Accordingly, both the availability and the timeliness of crop production forecasts play an important role in taking necessary action to support the food value chain.

As a complement to other data sources and as a direct input into modelling techniques, unbiased farmers’ information can help in producing reliable crop production estimates. Farmers are the real practitioners of agriculture. Their experience with regard to agricultural practices, knowledge of the context and indigenous know-how can be very beneficial in providing information on the production prospects that they anticipate. With an appropriate introduction and explanation of the purpose of the information required from them, farmers can be great resources of real-time information, as they are in the field observing crop production and are themselves a contributing factor in determining any expected reduction in production. For this reason, even the classical statistical method of data collection in most countries relies on farmers through the following means: a direct interview; mail questionnaires; a computer-aided telephone interview; or a computer-aided personal interview. With a simplified approach, such as crowdsourcing, farmers can continue to serve as real-time and reliable sources, not only to generate the general picture of the production prospects, but also for ground truthing information from other sources.

Accordingly, the pilot implemented by the Philippine Statistics Authority is intended to test how useful crowdsourcing technology would be in the Philippines for collecting data on crop production forecasts from farmers. The purpose of the pilot is to assess the feasibility and the cost-benefit ratio of different data collection schemes to monitor real-time crop growth, in particular for palay, using crowdsourcing technologies.

Scheme 1: classic crowdsourcing without financial incentives for data contributors, but incentives in terms of receiving aggregated forecasts data from other contributors

Scheme 2: classic crowdsourcing with financial incentives for data contributors, in addition to receiving aggregated forecasts data from other contributors

Scheme 3: SMS surveys at regular intervals, on a sampled list of farmers (in connection with the sample of the agriculture production survey)


3.1 Methodology

Coverage and duration

The project was conducted in Pampanga province, from January to June 2017. The covered barangays and municipalities were grouped into two schemes, based on geographical locations. The contributors for scheme 1 were under the traditional crowdsourcing scheme, wherein they were recruited and sent palay-related data on a monthly basis. At the end of the month, they were given incentives in the form of yield and palay situation forecasts, if they were able to provide information. Scheme 2 was practically the same, but in addition to incentives in terms of forecast information, contributors were given 100 Philippine pesos (Pts) (US$1.96) of mobile load credits, if they were able to provide information. Scheme 3 comprised the samples from the Monthly Palay and Corn Situation Reporting System, a regular survey conducted by the Philippine Statistics Authority, which served as the control.

Crowdsourcing coverage map, January-March 2017, Pampanga (map legend: Scheme 1; Scheme 2; Scheme 3; Scheme 2 & 3 overlap)


Crowdsourcing coverage map, April-June 2017, Pampanga

Contributor recruitment, registration and data collection

Agricultural technicians, serving as agents, recruited palay farmers in each barangay. The criteria for a farmer to be qualified as a contributor are the following:

Active palay farmer who has a standing crop to be harvested within the quarter;

The farmer owns a mobile phone capable of sending SMS, or at least belongs to a household in which at least one member owns a mobile phone capable of sending SMS on a regular basis;

The palay area to be harvested is not less than 1,000 square meters (0.1 hectare);

The farmer is willing to share palay-related information on a monthly basis; and

The farmer is willing to register for identification.

On average, about 25 farmers were recruited in each municipality (12 municipalities were considered for this pilot). The number of recruited farmers in each barangay varied widely, ranging from 1 to 16. This became an issue as some barangays did not have farmers who would be harvesting within the quarter. Because of this, the quota on the number of contributors was set by the municipality (at least 24), instead of by the barangay (at least 12).

The contributors were requested to register in the recently developed Crowdsourcing Data Collection and Quality Control System (CrowdSS) to control the entry of data in the mobile server and for the monitoring of data submission.


CrowdSS is where the database for the registration and the information collected are processed, consolidated and stored. The figure below illustrates the data flow process.

Information on each farmer’s name and the barangay and municipality where he or she lives was collected during the registration and a unique security code was assigned to each recruited contributor for identification. For scheme 3, the researchers of the Monthly Palay and Corn Situation Reporting System served as the contributors and the actual information was collected from the sample farmers of the Monthly Palay and Corn Situation Reporting System.

Palay-related information was gathered from contributors on a monthly basis. The contributors were given the “crowdsourcing brochure”, which contains the project primer and a guide for sending information, and includes a section for the contributor’s field notes. The following information was gathered from the contributors:

Information on whether palay is harvested or to be harvested during the current month

Area to be harvested or harvested in hectares

Quantity harvested or to be harvested in sacks

Information on ecosystem (irrigated or rainfed)

Comparison of expected production with that of last year

Reason for the expected production increase or decrease

Data collection was conducted monthly for five days, from the first to the fifth day of the month, followed by field data validation by the central office experts of the Philippine Statistics Authority and experts from the Pampanga provincial statistics office.


Data consolidation and analysis were carried out monthly and quarterly, using Microsoft Excel and Stata software. The results were released through SMS, as an incentive. The overall process is shown in the diagram.


[Figure: data flow process of the crowdsourcing pilot. The steps shown in the diagram are:]

1. The agricultural technicians/agents recruited farmer contributors.

2. The recruited active palay farmers with mobile phones in the selected municipalities and barangays are the contributors.

3. The contributors monitored the palay that they planned to harvest during the quarter.

4. Based on their monitoring, the contributors sent the palay information through SMS, any time from the first to the fifth day of the month.

5. The information received from the contributors was initially processed in the SMS gateway and was reflected in CrowdSS.

6. The information received from the contributors was reviewed and verified. Questionable information was subjected to ground validations.

7. After the ground validations, the data were consolidated, processed, and analysed using Microsoft Excel and Stata.

8. Forecast information was generated after the processing. The information was released through the monthly and quarterly reports.

9. Through SMS, selected information, which includes the projected yield and production assessment, was shared with the contributors who sent the information. Scheme 2 contributors were given Pts 100 load credits in addition to the aggregated forecast. After the quarter, the contributors sent feedback regarding the project.


3.2 Report of activities and results

CrowdSS was developed to handle the data collection and quality control. It comprised two

components: the mobile application and the web-based application. The mobile application

component automatically processes a received text message from data contributors in the

SMS gateway. It acknowledges the receipt of a valid text message or informs the sender if the

text message is not valid. It also automatically sends reminder messages to registered data

contributors who have not sent data on time during the data collection period.
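The behaviour of the mobile application component described above can be sketched roughly as follows. This is not the CrowdSS code: the function names, the reply wording and the contributor identifiers are hypothetical, and the sketch simply assumes that reminders are only sent inside the collection window of the first to the fifth day of the month.

```python
from datetime import date

# Collection window stated in the report: first to fifth day of the month.
WINDOW_START_DAY, WINDOW_END_DAY = 1, 5


def reply_for(message_is_valid: bool) -> str:
    """Pick the automatic reply for a received text (wording is hypothetical)."""
    if message_is_valid:
        return "Received. Thank you."
    return "Invalid message. Please check the format and resend."


def contributors_to_remind(registered: set[str], submitted: set[str],
                           today: date) -> list[str]:
    """Return registered contributors with no submission yet.

    Outside the collection window nobody is reminded (an assumption).
    """
    if not (WINDOW_START_DAY <= today.day <= WINDOW_END_DAY):
        return []
    return sorted(registered - submitted)


# Illustrative identifiers and date only.
registered = {"C001", "C002", "C003"}
submitted = {"C002"}
print(contributors_to_remind(registered, submitted, date(2017, 3, 4)))
print(reply_for(False))
```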

On the other hand, the web-based application component consolidates the database, which

includes the contributor registration, data from contributors and feedback. It also has a

completeness check module for checking the completeness of the submissions and a

messaging module, which facilitates sending of text messages to the contributors.

During the span of the project, monthly reports containing general descriptive statistics were released on day 20 of the month. Similarly, quarterly reports with additional information on the issues and concerns encountered during the reference quarter were released during the last quarter of the month. Aggregated data on average yield and factors affecting production were shared with the contributors on day 25 of the month.

Two training sessions were given to project staff to help them implement the project. These covered the EViews and Stata software, as well as CrowdSS for the experts involved in the project.

Response rates

For the first quarter, the average monthly response rate was 91.98 percent. During the second quarter, the average monthly response rate was higher, at 95.27 percent. The highest response rate was recorded in April, at 98.62 percent, and the lowest in March, at 85.12 percent.

Yield statistics

The average yield for the first quarter was 4.08 metric tons per hectare. The yield from

contributors ranged from 0.57 to 7.48 metric tons per hectare. The average yield for scheme

3 was the highest among the schemes, at 4.27 metric tons per hectare. It was followed by

scheme 1, at 4.17 metric tons per hectare. The average yield for scheme 2 was 3.85 metric

tons per hectare. The table below contains the summary of the yields across all schemes for

the first quarter.


The average yield for the second quarter was 4.05 metric tons per hectare. The yields from

the contributors ranged from 0.15 to 6.74 metric tons per hectare. The average yield for

scheme 2 was the highest among the schemes, at 4.11 metric tons per hectare. It was closely

followed by the average yield in scheme 1, recorded at 4.09 metric tons per hectare. The

average yield for scheme 3 was 3.66 metric tons per hectare. The summary of the yields across

the schemes for the second quarter is presented below.


Production Assessment

Summarizing the results of the data collected for the first quarter, 45.85 percent of the

contributors who expected to harvest from January to March projected that their harvest

would be larger compared to the same quarter last year. This expected increase in production

is attributed to the improved usage of fertilizers.

The combined monthly results from April to June indicated that 41.81 percent of the contributors harvesting in the second quarter expected a larger harvest compared to the same quarter in the previous year. In this quarter, the expected increase was attributed to the use of high-yielding variety (HYV) seeds. The following chart contains a summary of the results.


Test to assess the effect of incentives

A statistical analysis of the response from schemes 1 and 2 indicated that financial incentives

indeed motivated participation. Notably, the majority of contributors who sent feedback

regarding the study were willing to send information again, with or without incentives.

Comparisons between crowdsourcing and the Philippine Statistics Authority surveys

Two of the Philippine Statistics Authority surveys monitor palay: the Palay Production Survey and

the Monthly Palay and Corn Situation Reporting System. The Palay Production Survey

produces palay production estimates on a quarterly basis, while the Monthly Palay and Corn

Situation Reporting System survey entails the monthly updating of the estimates of the

current quarter based on standing crop and forecast for the next quarter based on planting

intentions.

The following table compares the yield results from the Philippine Statistics Authority surveys mentioned above with the yield estimates from crowdsourcing. The comparison covers January to April because the Philippine Statistics Authority results for May and June were not ready. As indicated in the table, despite larger differences at the beginning of the quarter, the gap narrows towards the end of the quarter, as forecasts come closer to reality when the harvest period approaches. Over January to April, the crowdsourcing results differ from those of the Philippine Statistics Authority by about 18.82 percent on average. At the end of the first quarter (March), however, the difference between the two estimates is only 13.11 percent.

Yield (in mt/ha)

Month      Crowdsourcing   Philippine Statistics Authority   Percentage difference
                           (validated results)
January    3.98            4.71                              18.32%
February   3.64            4.71                              29.41%
March      4.16            4.71                              13.11%
April      4.08            4.68                              14.45%
May        4.10            n.a
June       3.90            n.a
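The percentage differences in the table above can be approximately reproduced with the short Python sketch below. It assumes that the difference is expressed relative to the crowdsourcing yield, which is the reading that comes closest to the published figures; the report does not state the formula. Because the yields printed in the table are rounded to two decimals, the recomputed values differ slightly from the published ones.

```python
# Yields in mt/ha as printed in the table (rounded to two decimals).
crowd = {"January": 3.98, "February": 3.64, "March": 4.16, "April": 4.08}
psa = {"January": 4.71, "February": 4.71, "March": 4.71, "April": 4.68}
published_pct = {"January": 18.32, "February": 29.41, "March": 13.11, "April": 14.45}


def pct_difference(crowd_yield: float, psa_yield: float) -> float:
    """Gap between the two yields as a percentage of the crowdsourcing yield.

    The choice of denominator is an assumption, not stated in the report.
    """
    return (psa_yield - crowd_yield) / crowd_yield * 100.0


for month in crowd:
    recomputed = pct_difference(crowd[month], psa[month])
    print(f"{month}: recomputed {recomputed:.2f}%, published {published_pct[month]:.2f}%")

# Simple mean of the four published monthly differences.
mean_published = sum(published_pct.values()) / len(published_pct)
print(f"Mean of published differences: {mean_published:.2f}%")
```

The mean of the four published monthly differences is about 18.82 percent, which matches the overall figure quoted in the text.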

3.3 Lessons learned

Based on the results and the problems encountered during the implementation of the project, as well as the comparison of the project with the existing Philippine Statistics Authority surveys, the following lessons were noted.

Effectiveness of crowdsourcing as a tool in forecasting palay production and yield

The study was able to produce yield statistics and palay production assessments and situations on a monthly basis from the information directly given by the farmers. Moreover, significant factors affecting the palay production assessment were identified using statistical analysis.


However, a substantial portion of the initial information sent by the farmers required

thorough validation, as most of it seemed to be erroneous. This issue is associated with the

perceived difficulties of some contributors in adhering to minor technical requirements, such

as SMS syntax, and confusion over some concepts, such as harvesting within the month or

quarter.

The generated results are sufficient to conclude that crowdsourcing may be a tool in

forecasting palay yield and situation. In particular, in addition to providing indicative

information on the expected yield well ahead of time, the results from the crowdsourcing may

be valuable inputs in validating the results from regular surveys.

Effectiveness of mobile data collection through short message service

The effectiveness of mobile data collection is clearly seen based on the amount of time it took

for data processing. It is more rapid and more efficient than manual writing of information

and electronic encoding afterwards. The time for consolidation of results was significantly

reduced relative to the existing surveys, as field interviews and encoding of survey returns

were not part of the process.

As explained earlier, one issue with mobile data collection is the difficulty some contributors have in following the SMS syntax. Even minor deviations from the prescribed SMS syntax automatically invalidate the submitted information.

Feasibility and cost and benefit of crowdsourcing and the Philippine Statistics Authority

surveys

Some extra costs associated with the crowdsourcing approach are deemed reasonable given

the added benefits in terms of obtaining indicative real-time crop production forecasts.

Moreover, the cost can also be justified since the crowdsourcing data can be used as

additional data to review and validate the results obtained from the existing surveys of the

Philippine Statistics Authority.

Possibility of handling the information technology component with internal capacity

The experience from the Philippines pilots showed that the information technology

component of this crowdsourcing approach can easily be handled with the existing statistical

system by dedicating some of the information technology experts to work on such activities.

This is a crucial component for the statistical agencies to own and sustain the newly proposed

approach on obtaining quick and indicative data on the domain of interest.


References

Philippine Statistics Authority, Final Report Assessing the Potential of Crowdsourcing Technologies in Crop Forecasting. (Unpublished report submitted to the project office)

Premise, Report and lesson learnt from the joint FAO/WFP/Pulse Lab Jakarta Pilot in Indonesia with Premise Data Cooperation. (Unpublished report submitted to the project office)

Knoema, FAO AMIS Nigeria, Final Report, Agricultural Data Exchange, Kaduna State. (Unpublished report submitted to the project office)

Campbell, C. 2016. Crowdsource your data collection. 30 June. Available at: https://blog.scistarter.com/2016/07/crowdsource-data-collection/#sthash.IGOjtJf6.dpbs

De Lathouwer, B., & Exler, R. 2015. How crowdsourcing feeds hungry big data apps. 15 January. Available at: www.baselinemag.com/messaging-and-collaboration/how-crowdsourcing-feeds-hungry-big-data-apps.html.

Fu, H. 2016. The data revolution and improving price statistics over the next decade. Presentation. Available at: http://pubdocs.worldbank.org/en/107731487106368064/I-01-Fu-Data-Revolution-and-Improving-Price-Statistics.pdf.

Hamadeh, N., Rissanen, M., & Yamanaka, M. 2013. Crowd-sourced price data collection through mobile phones. Collaboration in Research and Methodology for Official Statistics. World Bank. Available at: http://ec.europa.eu/eurostat/cros/sites/crosportal/files/NTTS2013fullPaper_82-v2.pdf.

Jones, M., & Kondylis, F. 2016. Lessons from a crowdsourcing failure. Development Impact, World Bank blog. Available at: https://blogs.worldbank.org/impactevaluations/lessons-crowdsourcing-failure.

Pydz, E.B. 2013. Knowing in time: how technology innovations in statistical data collection can make a difference in development. Discussion paper for session 3.2, 2013 OECD Forum on Development, 4-5 April. Available at https://www.oecd.org/site/oecdgfd/Session%203.2%20-%20GFD%20Background%20Paper%20-%20DRAFT.pdf


Annex 1: Algorithm – Kaduna, Nigeria Collection Pilot

To improve the SMS validity by overcoming identified user entry errors, we developed a custom algorithm (attached). Following is a more user-friendly summary of errors and rules.

0.1: If the message contains any of the following characters, replace them with a space. Character list: {/, \, - ,_,",=,>,}

0.11: Replace single quote ' with empty (NO SPACE).

0.2: After the above, replace double spaces with a single space, so that the message is ready to be split by space.

0.3: Replace 'O' with '0', '!' with '1', and 'I' with '1'.

1. After decoding and cleaning, if no valid numerical values found, mark the row as Invalid with comment 'Incomplete message'.

2. If no match found on commodity/machinery/fertilizer/pesticides code, mark row as Invalid with comment 'Incomplete message'.

3. If no match found within the Measure-List, mark row as invalid with comment 'Incomplete message'.

4. If value is given as '0' for following SMS short codes (FGP,TWP,TRP,SCPMC,SCPFC,SCPPC), mark row as invalid with comment 'Value cannot be zero'.

5. Unit code check: If unit of measure code is given as 11,12,13 for the SMS short codes OTHER than SCPMC, FCPMC, FCPW,FCPE, mark row as invalid with comment 'Invalid Unit of Measure'.

6. If no matching LGA/Market found for the given telephone number, mark LGA/Market column with 'N/A'.

7. If the phone number and key category does not exist in the crowd registry and if it is a valid message, insert relevant comment as 'Not a Valid Farmer' or 'Not a Valid Wholesale Trader', etc. Also, set LGA/Market as 'N/A'.

a. Wholesale trader must submit data only for Wholesale data and Retail Traders must submit retail data. Identify type by using the first two characters of keyword: 'TW' means wholesale; 'TR' means retail.

b. If there is no matching phone number, mark as 'Not a valid contributor'. Data row can be set as Valid if it is a valid SMS.

8. There are several messages where '.' used as thousand separator instead of ','. Example: FGP 9 6 22.000. Here the value is actually meant as 22000. Remove the '.' only if: after splitting the message, if the 4th element has '.' and after '.', if the length is 3, remove '.' to get the actual value.

9. Some messages combine the commodity and measure codes. Examples: TRP 84 600, TRP 23 520, FGP 56 #13000 FGP 61 1200. In such cases, check the second element in the array. If it exceeds the commodity code of 18, it means, possibly, the user combined the commodity code and measure code. Split the codes. Consider the first as the commodity code and the second as the measure code.

10. In others, the SMS short code and commodity codes are combined. Example: "2348028153733","NIG_AIRTEL","FGP3 6 38500","2017-01-09T14:57:53.548Z". Try to first match with SMS code list. Then take the remaining, assign to the commodity code, and refine the message.
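The cleaning rules (0.1 to 0.3) and the repair rules 8, 9 and 10 above can be sketched in Python as follows. This is an illustration under stated assumptions, not the custom algorithm attached to the report: the function names are hypothetical, the short-code list contains only the codes named in this annex, and fused commodity and measure codes are only split when they are exactly two digits long.

```python
import re

# Short codes named in the annex text; the pilot's full code list is not given here.
KNOWN_SMS_CODES = ["SCPMC", "SCPFC", "SCPPC", "FCPMC", "FCPW", "FCPE", "FGP", "TWP", "TRP"]
MAX_COMMODITY_CODE = 18   # upper bound quoted in rule 9
SEPARATORS = '/\\-_"=>'   # characters replaced by a space (rule 0.1)


def clean(message: str) -> str:
    """Apply rules 0.1 to 0.3 to a raw SMS body."""
    text = message.translate(str.maketrans({ch: " " for ch in SEPARATORS}))
    text = text.replace("'", "")                  # rule 0.11
    text = re.sub(r" {2,}", " ", text).strip()    # rule 0.2 (any run of spaces)
    # Rule 0.3, applied to the whole message as the rule is worded.
    return text.translate(str.maketrans({"O": "0", "!": "1", "I": "1"}))


def split_tokens(message: str) -> list[str]:
    """Clean a message, split it by space and repair the tokens (rules 10, 9, 8)."""
    tokens = clean(message).split(" ")
    # Rule 10: short code fused with the commodity code, e.g. "FGP3".
    head = tokens[0]
    for code in sorted(KNOWN_SMS_CODES, key=len, reverse=True):
        if head.startswith(code) and head[len(code):].isdigit():
            tokens[0:1] = [code, head[len(code):]]
            break
    # Rule 9: commodity and measure codes fused, e.g. "84" -> "8", "4".
    if (len(tokens) > 1 and tokens[1].isdigit() and len(tokens[1]) == 2
            and int(tokens[1]) > MAX_COMMODITY_CODE):
        tokens[1:2] = [tokens[1][0], tokens[1][1]]
    # Rule 8: "." used as a thousands separator in the 4th element.
    if len(tokens) > 3:
        whole, dot, frac = tokens[3].partition(".")
        if dot and len(frac) == 3:
            tokens[3] = whole + frac
    return tokens


print(split_tokens("FGP 9 6 22.000"))   # example from rule 8
print(split_tokens("TRP 84 600"))       # example from rule 9
print(split_tokens("FGP3 6 38500"))     # example from rule 10
```

Rule 10 is applied before rule 9 so that a commodity code detached from the short code can still be checked against the upper bound of 18. The validity checks in rules 1 to 7 would then operate on the repaired tokens.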