Post on 30-May-2018

Legend: green bold = +10% increase; green = 0-10% increase; red = 0-10% decrease; red bold = -10% decrease

OpenStreetMap and Wikipedia: A Method for Identifying Cultural Ecosystem Services

Programme: MSc Geographic Information Science

Name of Student: Christopher French

Supervisors: Muki Haklay and Gianfranco Gliozzo

UCL Department of Civil, Environmental and Geomatic Engineering, Gower St, London, WC1E 6BT

Introduction

The development of Web 2.0 technologies[1] has produced a valuable source of geographic content called volunteered geographic information (VGI)[2], making way for a multitude of open source applications. This project focuses on two of these, OpenStreetMap (OSM) and Wikipedia, with the aim of using them to identify and analyse distributions of cultural ecosystem services (CESs). Ecosystem services are the benefits, material and immaterial, that people obtain from environments[3]. This work targets CESs, which include religious and spiritual enrichment, recreational or aesthetic experiences, tourism, and benefits with cognitive, educational or scientific value. Research in the field is growing, which is of great importance because there are strong linkages between CESs and health, welfare and social relations; however, CES supply is declining while demand is rising.

Aims and Objectives

The aim was to carry out spatial and temporal analysis similar to the methods used in public participatory ecosystem service research[4], but using VGI data from OSM and Wikipedia instead, in order to build on current knowledge and test the credibility of these data sources in this research field. A further aim was to improve upon previous OSM data extraction and processing methods[5] in terms of time and computational intensity, while delivering equally robust and valuable results.

The chosen study area is South Wales. The extent is diverse, encompassing major coastal cities, The Valleys, the Brecon Beacons National Park and the Wye Valley, which is an Area of Outstanding Natural Beauty (AONB).

References

[1] O’Reilly, T. (2005). What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. http://www.oreilly.com/pub/a/web2/archive/what-is-web-20.html?page=1.

[2] Goodchild, M. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal. 69, 211-221.

[3] Millennium Ecosystem Assessment. (2005). Ecosystems and Human Well-Being. Millennium Ecosystem Assessment. Washington, DC: World Resources Institute.

[4] Brown, G., Fagerholm, N. (2015). Empirical PPGIS/PGIS mapping of ecosystem services: A review and evaluation. Ecosystem Services. 13, 119-133.

[5] Mooney, P., Corcoran, P. (2012a). Characteristics of Heavily Edited Objects in OpenStreetMap. Future Internet. 4, 285-305.

[6] Korner, P. (2011). GitHub: Source code for the OpenStreetMap history splitter. https://github.com/MaZderMind/osm-history-splitter.

[7] OSM (2016). Full OSM history dump. http://planet.openstreetmap.org/planet/full-history.

[8] Django (2016). Documentation: GeoDjango. https://docs.djangoproject.com/en/1.10/ref/contrib/gis.

[9] MEA (Millennium Ecosystem Assessment). (2005). Ecosystems and Human Well-being: Synthesis. Island Press, Washington, D.C., USA.

[10] Alessa, L., Kliskey, A., Brown, G. (2008). Social-ecological hotspots mapping: a spatial approach for identifying coupled social–ecological space. Landsc. Urban Plann. 85, 27-39.

Methodology

The osm-history-splitter extraction software[6] was compiled on the Elastic Compute Cloud (EC2) from Amazon Web Services, allowing faster download and processing of the full OSM history file[7]. Processing was sped up further by using Python code to split the edits into sub-bounding boxes based on their associated tag, grouping those with the same tag (e.g. natural=peak). This allowed faster importing into the database than loading the edits individually, as has been done previously. GeoDjango[8] was also used to create and populate the databases, to avoid writing lengthy SQL.
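The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `group_edits`, the 0.1 degree cell size and the sample coordinates are all assumptions made for the example.

```python
import math
from collections import defaultdict


def group_edits(edits, cell_deg=0.1):
    """Group point edits by (tag, grid cell) and return one record per group.

    edits: iterable of (lon, lat, tag) tuples.
    cell_deg: side length of the square grid cell in degrees (assumed value).
    """
    groups = defaultdict(int)
    for lon, lat, tag in edits:
        # Index of the square cell that contains the point.
        col = math.floor(lon / cell_deg)
        row = math.floor(lat / cell_deg)
        groups[(tag, col, row)] += 1

    records = []
    for (tag, col, row), count in sorted(groups.items()):
        # Bounding box as (min_lon, min_lat, max_lon, max_lat).
        bbox = (col * cell_deg, row * cell_deg,
                (col + 1) * cell_deg, (row + 1) * cell_deg)
        records.append({"tag": tag, "bbox": bbox, "edits": count})
    return records


# Made-up coordinates in South Wales, for illustration only.
sample = [
    (-3.43, 51.88, "natural=peak"),
    (-3.44, 51.89, "natural=peak"),
    (-3.18, 51.48, "amenity=place_of_worship"),
]
grouped = group_edits(sample)
```

Here the two `natural=peak` edits fall in the same cell and become a single record with an edit count of 2, so one database row is written instead of two.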

CES categories were selected by reviewing key papers to find the most commonly used categories. These were: recreation, aesthetics, spiritual and religion, education, and cultural heritage. Grouping tags into categories was challenging because no classification scheme exists, so it had to be carried out subjectively. This process would have benefited from a sensitivity analysis, principal component analysis or peer review to add validity to the decision making.

Methodological framework of the data preparation and analysis.
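As a rough illustration of the categorisation step, the sketch below assigns OSM tags to CES categories through a lookup table and reports each category's share of the categorised edits. The tag-to-category assignments are hypothetical examples only; the poster does not list the mapping that was actually used.

```python
from collections import Counter

# Hypothetical tag-to-category assignments, for illustration only;
# the poster does not list the mapping that was actually used.
TAG_TO_CES = {
    "leisure=park": "Recreation",
    "tourism=viewpoint": "Aesthetics",
    "amenity=place_of_worship": "Spiritual and Religion",
    "amenity=school": "Education",
    "historic=castle": "Cultural Heritage",
}


def category_shares(tags):
    """Return the percentage of categorised edits falling in each CES category."""
    counts = Counter(TAG_TO_CES[t] for t in tags if t in TAG_TO_CES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


shares = category_shares([
    "historic=castle", "historic=castle", "amenity=school",
    "leisure=park", "highway=residential",  # last tag is not a CES tag
])
```

Keeping the mapping in a single table makes the subjective choices explicit, which would also make a later sensitivity analysis or peer review easier to carry out.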

Kernel density estimation (KDE) of all cultural ecosystem services

Results

Figure 2 - Total yearly OSM edits relating to CESs, June 2006 - June 2016

Figure 4 - Total OSM edits relating to each of the CESs, split into 2 year time periods, June 2006 - June 2016

Figure 6 - Percentage of OSM edits relating to CESs in each category, June 2006 - June 2016

Figure 3 - Total Wikipedia article revisions relating to CESs, June 2006 - June 2016

Figure 5 - Total Wikipedia article revisions relating to each of the CESs, split into 2 year time periods, June 2006 - June 2016

Figure 7 - Percentage of Wikipedia article revisions relating to CESs in each category, June 2006 - June 2016

Figure 8 - Kernel density estimation raster surface displaying all OSM edits relating to CESs, June 2006 - June 2016

Figure 9 - Kernel density estimation raster surface displaying all Wikipedia article revisions relating to CESs, June 2006 - June 2016
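For readers unfamiliar with kernel density estimation, the sketch below shows how a weighted density surface of the kind shown in Figures 8 and 9 can be computed. It assumes a Gaussian kernel and projected coordinates in metres; the poster does not state which kernel or GIS tool was used, and the point data here are made up.

```python
import math


def kde(points, x, y, bandwidth):
    """Weighted Gaussian kernel density at location (x, y).

    points: iterable of (px, py, weight) in projected units (e.g. metres).
    bandwidth: kernel standard deviation, playing the role of a search radius.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py, w in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += w * norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total


def kde_surface(points, xs, ys, bandwidth):
    """Evaluate the density on a regular grid; rows follow ys, columns follow xs."""
    return [[kde(points, x, y, bandwidth) for x in xs] for y in ys]


# Made-up points: (easting, northing, number of edits).
pts = [(0.0, 0.0, 5), (1000.0, 0.0, 1)]
surface = kde_surface(pts, xs=[0.0, 500.0, 1000.0], ys=[0.0], bandwidth=500.0)
```

The weight lets a location with many edits or revisions contribute more to the surface than a location with few, and the bandwidth controls how far that influence spreads.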

KDE and hotspots of each cultural ecosystem service category - OSM (panels: Recreation, Aesthetics, Spiritual and Religion, Education, Cultural Heritage)

KDE and hotspots of each cultural ecosystem service category - Wikipedia (panels: Recreation, Aesthetics, Spiritual and Religion, Education, Cultural Heritage)

OSM

Area                              Recreation  Aesthetics  Spiritual and Religion  Education  Cultural Heritage
Overall study area (%)                  10.0        20.1                    18.5       24.2               27.2
Urban and suburban areas (%)            12.7        13.6                    41.8       19.5               12.4
Brecon Beacons National Park (%)         8.7        18.6                     5.4       20.4               46.9
Wye Valley (AONB) (%)                    9.4        18.8                     9.9       33.1               28.7

Wikipedia

Area                              Recreation  Aesthetics  Spiritual and Religion  Education  Cultural Heritage
Overall study area (%)                  20.4        22.8                    12.6       27.7               16.5
Urban and suburban areas (%)            25.8        18.7                    20.8       21.2               13.4
Brecon Beacons National Park (%)        25.3        27.4                    10.5       23.2               13.7
Wye Valley (AONB) (%)                   13.6        37.3                     1.7       39.0                8.5

Figure 10 - Kernel densities and hotspots of the individual CES categories for OSM, June 2006 to June 2016

Figure 12 - The proportion of CES-related OSM edits and Wikipedia revisions in urban areas, national parks and AONBs compared to the total study area.
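The comparison behind Figure 12 can be reproduced from the OSM values in the table above. The sketch below subtracts the overall study area share from each sub-area share and labels the result with the four legend bands. It assumes the bands apply to percentage-point differences, which the poster does not state explicitly.

```python
CATEGORIES = ["Recreation", "Aesthetics", "Spiritual and Religion",
              "Education", "Cultural Heritage"]

# OSM percentages copied from the table above.
OSM = {
    "Overall study area": [10.0, 20.1, 18.5, 24.2, 27.2],
    "Urban and suburban areas": [12.7, 13.6, 41.8, 19.5, 12.4],
    "Brecon Beacons National Park": [8.7, 18.6, 5.4, 20.4, 46.9],
    "Wye Valley (AONB)": [9.4, 18.8, 9.9, 33.1, 28.7],
}


def band(diff):
    """Label a percentage-point difference (assumed interpretation of the legend)."""
    if diff >= 10:
        return "+10% increase"
    if diff >= 0:
        return "0-10% increase"
    if diff > -10:
        return "0-10% decrease"
    return "-10% decrease"


def compare(table, baseline="Overall study area"):
    """Difference of each area from the baseline, per category."""
    base = table[baseline]
    result = {}
    for area, values in table.items():
        if area == baseline:
            continue
        result[area] = {
            cat: (round(v - b, 1), band(v - b))
            for cat, v, b in zip(CATEGORIES, values, base)
        }
    return result


diffs = compare(OSM)
```

For example, spiritual and religion edits make up 41.8% of OSM edits in urban and suburban areas against 18.5% overall, a difference of 23.3 percentage points, while in the Brecon Beacons the same category is 13.1 points below the overall share.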

• Population is not the main contributing factor for the number of edits/revisions; many other complex factors relating to CES values and benefits are involved.

• Correlations exist between the spatial and temporal trends of OSM edits relating to aesthetics and education (fig 10), and of Wikipedia revisions relating to aesthetics and recreation (fig 11). Comparing the results with the literature reveals mixed outcomes[9].

• Anomalous cultural heritage OSM edits made between June 2014 and June 2016 by an individual user have greatly skewed the results (figures 4 and 6).

• Differences between the datasets partly explain the contrasts in the spatial distributions of CESs. There were 10x as many OSM points (even after grouping into sub-bounding boxes) as Wikipedia articles, but the range in the number of article revisions was far higher. As a result, the inverse distance weighting (IDW) and KDE maps give very different results depending on the search radius used.

• KDE outperformed IDW*, producing clearer and more natural areas of high values, and is better suited to analysing social data of this kind, as expected[10].

• VGI data has great potential for further ecosystem service research in this direction, offering easy access to larger datasets than are achievable through PPGIS methods, but it lacks information on contributor demographics.
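The sensitivity to search radius noted above is easy to see with a small inverse distance weighting sketch. This is a generic textbook form of IDW with a hard search radius and a power of 2, not necessarily the configuration used for the poster; the sample values are made up.

```python
import math


def idw(points, x, y, radius, power=2.0):
    """Inverse distance weighted estimate at (x, y) from points within `radius`.

    points: iterable of (px, py, value). Returns None if no point is in range.
    """
    num = 0.0
    den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d > radius:
            continue  # outside the search radius, so ignored
        if d == 0.0:
            return float(value)  # exactly on a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den if den > 0 else None


# Made-up samples: (easting, northing, number of revisions).
pts = [(0.0, 0.0, 100.0), (3000.0, 0.0, 10.0)]
near = idw(pts, 1000.0, 0.0, radius=1500.0)  # only the first point is in range
wide = idw(pts, 1000.0, 0.0, radius=5000.0)  # both points contribute
```

With the smaller radius the estimate is simply the value of the one nearby point, while the larger radius pulls it towards the distant low value. A dataset with a wide range of values, such as article revision counts, is therefore more sensitive to this choice.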

Future work

• Focus on analysing trends in the number of contributors rather than in the edits themselves, and implement contributor edit limits or another method for dealing with vandalism and anomalous edits/revisions.

• Develop the Python code further to split data into square bounding boxes using different horizontal and vertical scale factors, for accurate overlay with external datasets.

• Apply a principal component analysis and/or sensitivity analysis to the categorisation of OSM tags and Wikipedia articles into the different CES groups.

• Use the additional Wikipedia article metadata provided by the GeoNames API (i.e. rank) to further the analysis. This was not possible here due to time constraints.
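The first suggestion above could be prototyped along these lines. The sketch counts distinct contributors per year and, separately, totals edits per year after capping each contributor's yearly count. The cap of 50 and the edit log are invented for the example; choosing a sensible limit would itself need investigation.

```python
from collections import Counter, defaultdict


def contributors_per_year(edits):
    """Count distinct contributors per year from (user, year) pairs."""
    users = defaultdict(set)
    for user, year in edits:
        users[year].add(user)
    return {year: len(names) for year, names in sorted(users.items())}


def capped_edits_per_year(edits, cap):
    """Total edits per year after limiting each contributor to `cap` edits a year."""
    per_user = Counter(edits)  # (user, year) -> number of edits
    totals = Counter()
    for (user, year), n in per_user.items():
        totals[year] += min(n, cap)
    return dict(sorted(totals.items()))


# Made-up edit log: one prolific user dominates 2015.
log = [("user_a", 2015)] * 500 + [("user_b", 2015)] * 3 + [("user_c", 2014)] * 2
people = contributors_per_year(log)
capped = capped_edits_per_year(log, cap=50)
```

In this toy log the raw 2015 total of 503 edits drops to 53 once the cap is applied, and the contributor count (two users in 2015) is unaffected by the single prolific user.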

Key Findings

• OSM edits and Wikipedia revisions relating to spiritual and religion are far lower in the Brecon Beacons and the Wye Valley (fig 12), most likely because they rely on religious buildings, which are mostly found in populated areas.

* The two IDW maps, which were inferior to figures 8 and 9, are not included in this poster.

Figure 11 - Kernel densities and hotspots of the individual CES categories for Wikipedia, June 2006 to June 2016