Individual movements and geographical data mining. Clustering algorithms for highlighting hotspots in personal navigation routes.
Giuseppe Borruso, Gabriella Schoier
Department of Economic, Business, Mathematic and Statistic Sciences - DEAMS, University of Trieste, Via Valerio, 4/1, 34127 Trieste, Italy
{gabriella.schoier,giuseppe.borruso}@econ.units.it
Geographical Analysis, Urban Modeling, Spatial Statistics GEOG-AN-MOD 11



Page 2

Introduction

• The rapid development in the availability of, and access to, spatially referenced information in a variety of areas has created the need for better analysis techniques to understand the various phenomena.

• In particular, spatial clustering algorithms, which group similar spatial objects into classes, can be used to identify areas sharing common characteristics.

• The aim of this paper is to present a density-based algorithm for the discovery of clusters of units in large spatial data sets.

• The analysis offers a first insight into a wealth of geographical data collected by individuals as activity diary data, representing the routes a set of individuals drive in their daily movements for reasons connected to work, picking up children, free time and rest.

• Attention is focused on point data sets corresponding to GPS traces recorded along the same route on different days.

• Our aim here is to explore the presence of clusters along the route and to understand their origins, in order to better understand the road network structure in terms of ‘dense’ spaces along the network, these representing road congestion rather than the presence of junctions or other factors affecting individual mobility.

• The paper therefore focuses on methods to highlight such clusters and to assess their impact on the network structure.

• A spatial clustering algorithm (DBSCAN) is examined and compared with another nonparametric density-based method (Kernel Density Estimation). A test is performed over the urban area of Trieste (Italy).

Page 3

Geographical Data Mining

• In recent years, geographic data collection devices linked to location-aware technologies such as the global positioning system have allowed researchers to collect huge amounts of data.

• Other devices such as cell phones, in-vehicle navigation systems and wireless Internet clients can capture data on individual movement patterns.

• The process of extracting information and knowledge from these massive georeferenced databases is known as Geographic Knowledge Discovery (GKD) or Geographical Data Mining.

• Real-time environmental monitoring systems such as intelligent transportation systems and location-based services are generating georeferenced data in the form of dynamic flows and space-time trajectories.

• The complexity of spatial objects and relationships in georeferenced data, as well as the computational intensity of many spatial algorithms, means that geographic background knowledge can play an important role in managing the GKD process.

Page 4

Spatial Clustering Algorithms

• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and its extensions judge the neighborhood of an object to be sufficiently dense if the number of points within a distance Epscoord of the object is at least MinPts.

• It classifies regions with sufficiently high density into clusters, and discovers clusters of arbitrary shape. The density-based notion of clustering states that within each cluster the density of points has to be significantly higher than the density of points outside the cluster.

• Here D denotes the data set of points.

• The distance function determines the shape of the neighborhood. MinPts is the minimum number of points that must be contained in the neighborhood of a point in the cluster.
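
The density test described on this slide can be sketched in a few lines of Python. This is a minimal illustration of the textbook DBSCAN procedure, not the implementation used in the study: the function names (`region_query`, `dbscan`), the plain Euclidean distance and the "at least MinPts, counting the point itself" rule are assumptions made for the example, and the brute-force neighborhood search is only suitable for small data sets.

```python
import math

NOISE = -1        # label for points that end up in no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, idx, eps):
    """Indices of all points of D within distance eps of points[idx] (itself included)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1            # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Four tightly packed points (a "stop") followed by two isolated points ("moving").
trace = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (10.0, 10.0)]
print(dbscan(trace, eps=0.5, min_pts=3))  # [0, 0, 0, 0, -1, -1]
```

Because clusters grow by chaining dense neighborhoods together, they can take any shape, which is what makes the method suitable for points strung along a road.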

Page 5

Kernel Density Estimation

• The function creates a density surface from a distribution of points (events) in space, weighting the events within its search radius according to their distance from the point where the estimate is computed.

• It can be described as a ‘moving three dimensional function that weights events within its sphere of influence according to their distance from the point at which the intensity is being estimated’ (Gatrell et al. 1996).

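Kernel estimate of the intensity at a location s in the study region R:

$$\hat{\lambda}_{\tau}(s) = \sum_{i=1}^{n} \frac{1}{\tau^{2}}\, k\!\left(\frac{s - s_i}{\tau}\right)$$

where s_i (i = 1, ..., n) are the events, s is the location of the estimate, k( ) is the kernel and τ is the bandwidth.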

Gatrell A., Bailey T., Diggle P., Rowlingson B. (1996): “Spatial Point Pattern Analysis and its Application in Geographical Epidemiology”. Transactions of the Institute of British Geographers 21: 256-74.
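
As a rough illustration of the estimator, the sketch below sums a kernel weight over the events that fall within the bandwidth of a location. It assumes the quartic kernel in its usual two-dimensional form, k(u) = (3/π)(1 − |u|²)² for |u| ≤ 1 and 0 otherwise, and omits any edge correction; the function names `quartic_kernel` and `kde_intensity` are hypothetical and chosen for this example only.

```python
import math


def quartic_kernel(u_sq):
    """Quartic kernel in 2-D, given the squared scaled distance |u|^2."""
    return (3.0 / math.pi) * (1.0 - u_sq) ** 2 if u_sq <= 1.0 else 0.0


def kde_intensity(s, events, tau):
    """Sum of (1 / tau^2) * k((s - s_i) / tau) over all events s_i."""
    sx, sy = s
    total = 0.0
    for ex, ey in events:
        u_sq = ((sx - ex) ** 2 + (sy - ey) ** 2) / tau ** 2
        total += quartic_kernel(u_sq) / tau ** 2
    return total


# Coordinates in metres: three events close together, one far away.
events = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (300.0, 300.0)]
near = kde_intensity((0.0, 0.0), events, tau=44.0)    # inside the dense group
far = kde_intensity((150.0, 150.0), events, tau=44.0)  # no event within 44 m
print(near > far, far)  # True 0.0
```

Evaluating `kde_intensity` at the centre of every cell of a regular grid yields the density surface; events farther than τ from a cell contribute nothing to it.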

Page 6

The Data and the Study Area

• The analysis was performed over an activity diary dataset registered by individual users relying on a commercial GPS receiver for personal car navigation. The full dataset consists of home-work, home-school and work to free-time location (gym, theatres, etc.) journeys collected over a 6-month period (July 2010 - January 2011) on a nearly daily basis and mainly based on car usage. The data were mainly collected in the Province of Trieste (Italy), although travels in the region, abroad or in other Italian cities were also recorded.

• In this initial stage of the research, attention is dedicated to highlighting hotspots in a linear dataset, such as that represented by GPS traces.

• Hotspots can be of interest for highlighting braking areas along a route, and therefore the critical nodes and junctions in an urban road network that are characterized by traffic or other forms of congestion.

• A further aim is to enable the production of maps of fluidity or accessibility based on the users' travel times in certain areas.

Page 7

A data sample: October data (5, 6, 7, 8, 11, 12)

Page 8

Preview: density analysis on different routes and days

Page 9

A data sample: October data (5, 6, 7, 8, 11, 12)

Sample POIs

University campus

Gym

Home (2)

Home (1)

Nursery

Page 10

• Selected route, travelled on different days

• 23, 24, 30 August

• 1 September

University campus

Home (1)

Nursery

Page 11

The Analysis

• DBSCAN

– Epscoord = 0.0004, chosen to match the 44 m bandwidth

– MinPts = 15, in order to obtain meaningful clusters (= 15 sec)

• KDE

– quartic kernel function

– 5 m cell resolution

– 44 m bandwidth

• Tests with other bandwidths:

– 25 m (good for highlighting hotspots at small scale)

– 100 m (oversmoothed the data)

• Euclidean (not network) density estimation

1,059 GPS points collected on 1 September 2010 (Home – Nursery journey)
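
The link between the DBSCAN parameters and physical quantities can be checked with some back-of-the-envelope arithmetic. The sketch below assumes that Epscoord is expressed in decimal degrees, that one degree of latitude is about 111.32 km, that the receiver logs one point per second (as "= 15 sec" suggests) and that the vehicle moves in a straight line at constant speed; the latitude of about 45.65° N for Trieste is also an approximation. None of these assumptions is stated on the slide, and the helper names are hypothetical.

```python
import math

METRES_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def eps_degrees_to_metres(eps_deg, latitude_deg=None):
    """Convert a radius in degrees to metres (north-south, or east-west if a latitude is given)."""
    metres = eps_deg * METRES_PER_DEGREE_LAT
    if latitude_deg is not None:
        metres *= math.cos(math.radians(latitude_deg))  # parallels shrink with latitude
    return metres


def max_speed_for_core_point(eps_m, min_pts, sample_interval_s=1.0):
    """Highest constant speed (m/s) at which a point still has min_pts points within eps_m.

    On a straight line the neighborhood holds 2k + 1 points (k on each side plus
    the point itself), so k must be at least ceil((min_pts - 1) / 2).
    """
    if min_pts < 2:
        raise ValueError("min_pts must be at least 2")
    steps_each_side = math.ceil((min_pts - 1) / 2)
    return eps_m / (steps_each_side * sample_interval_s)


eps_ns = eps_degrees_to_metres(0.0004)                      # north-south extent
eps_ew = eps_degrees_to_metres(0.0004, latitude_deg=45.65)  # east-west extent near Trieste
speed = max_speed_for_core_point(eps_ns, min_pts=15)

print(f"{eps_ns:.1f} m north-south")  # 44.5 m north-south
print(f"{speed * 3.6:.1f} km/h")      # 22.9 km/h
```

Under these assumptions the radius matches the 44 m bandwidth in the north-south direction but is shorter east-west (roughly 31 m), and a point can only become a core point where the vehicle travels below roughly 23 km/h, which is consistent with reading the clusters as slow segments.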

Page 12

Results from DBSCAN (right) and KDE (left)

Page 13

The results

• DBSCAN yields six clusters, identified by number on the map and shown in different colors: cluster 1 in red, cluster 2 in green, cluster 3 in blue, cluster 4 in light blue, cluster 5 in pink and cluster 6 in yellow.

• The results of the density analysis appear on the map as shades from light grey to black. Hotspots show up as black areas, while the denser dark grey areas can be interpreted as slower segments of the journey.

• Non-peak areas from the KDE analysis appear as the dark grey areas along the journey.

• These are the slow parts of the journey when compared with the rest of it.

• They occur where speed is lower than in the normal journey: in urbanized areas and also in other parts of the route.

Page 14
Page 15
Page 16

University campus – entrance and bus stop

Page 17

Junction – state roads

Page 18

Nursery – end of the journey

Page 19

Conclusions and further development

• Attention was focused on point datasets corresponding to GPS traces recorded along the same route on different days.

• The focus was on methods to highlight clusters and to assess their impact on the network structure in terms of ‘dense’ spaces along the network, these representing road congestion rather than the presence of junctions or other factors affecting individual mobility.

• The results from the DBSCAN algorithm and from KDE, using the same search radius, are comparable. Both show hotspots in the same places, where speed is lower than in the normal journey: in urbanized areas (beginning and end of the journey) as well as on out-of-town road segments at major bends.

• Further development: understanding the patterns of movement of a set of individuals on the same day, as well as differences in ‘city uses’ at different times of the day, thereby highlighting a city's ‘variable geometry’ in time and space.

• Also, hotspots that recur on different journeys can highlight the different speeds in an urban environment, and their directions, making it possible to understand urban metrics and shapes in a different way from that usually allowed by more simplified data sets.

Page 20

Thank you for your attention!

Questions?