Network Complexity and Spatio-Temporal Data Mining (STDM)

Dr Tao Cheng + STANDARD team {[email protected]}
Senior Lecturer in GeoInformatics
Department of Civil, Environmental and Geomatic Engineering (CEGE)
University College London




Outline

•  Nature of network complexity
•  Its challenges for STDM
•  Case studies from the STANDARD project
•  Future directions for NC and STDM


Challenges - Network Complexity

1) Heterogeneity (structure & performance)
- nonlinearity
- nonstationarity (the MAUP in GIS)

Great progress has been made in describing the structure of ‘what is’ (e.g. power laws), but how do we model and predict nonlinear and nonstationary performance?


Challenges - Network Complexity

2) Dynamics
- changes in physical structure (nodes & links)
  - implications for supply/capacity changes
- changes in movement patterns on the network (density/flow/speed; behaviour)
  - leads to changes in demand

Much progress in modelling supply-demand interactions at the macroscopic level, but
- lack of clarity about implications for individual behaviours and their collective effects;
- no readily available tools to demonstrate or capture the transition from free flow to congestion
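
The relation between density, flow and speed, and the switch from free flow to congestion, can be illustrated at the macroscopic level with a textbook fundamental diagram. The sketch below assumes the Greenshields linear speed-density relation, which the slides do not name; the free-flow speed, jam density and function names are hypothetical.

```python
# Assumption: Greenshields' linear speed-density model, v = v_f * (1 - k / k_j).
# The slides do not name a model; all parameter values here are illustrative.

FREE_FLOW_SPEED_KMH = 50.0   # v_f, hypothetical
JAM_DENSITY_VEH_KM = 120.0   # k_j, hypothetical


def speed(density, v_f=FREE_FLOW_SPEED_KMH, k_j=JAM_DENSITY_VEH_KM):
    """Mean speed (km/h) at a given density (veh/km)."""
    return v_f * (1.0 - density / k_j)


def flow(density, v_f=FREE_FLOW_SPEED_KMH, k_j=JAM_DENSITY_VEH_KM):
    """Flow (veh/h) is density multiplied by mean speed."""
    return density * speed(density, v_f, k_j)


def regime(density, k_j=JAM_DENSITY_VEH_KM):
    """Under this model flow peaks at k_j / 2; denser traffic is congested."""
    return "free flow" if density <= k_j / 2.0 else "congested"


for k in (20, 60, 100):
    print(k, round(speed(k), 1), round(flow(k), 1), regime(k))
```

Under these assumptions the flow at 20 veh/km equals the flow at 100 veh/km, so flow alone cannot tell the two regimes apart; density or speed is needed as well.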


Challenges - Network Complexity

3) Interactions & Associations
- spatial (upstream/downstream)
- temporal (past/present/future)
- spatio-temporal
- multiple factors (incidents, weather, big events, ...)
- multiple networks

We accommodate spatial or temporal associations (autocorrelations) separately, but
- fail to treat spatial and temporal autocorrelation simultaneously
- fail to consider multiple networks
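
One simple way to look at a spatial and a temporal association together is the cross-correlation between two neighbouring links at a time lag. The sketch below is only an illustration of that idea, not a method taken from the slides; the two series are hypothetical travel times, constructed so that `link_b` repeats the pattern of `link_a` one interval later.

```python
from math import sqrt


def cross_correlation(series_a, series_b, lag):
    """Pearson correlation between series_a[t] and series_b[t + lag]."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have equal length")
    if lag < 0 or lag >= len(series_a) - 1:
        raise ValueError("lag must leave at least two paired observations")
    x = series_a[:len(series_a) - lag]
    y = series_b[lag:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical travel times (seconds) on two adjacent links.
link_a = [60, 62, 75, 90, 80, 65, 61, 60]
link_b = [58, 60, 62, 75, 90, 80, 65, 61]

for lag in range(3):
    print(lag, round(cross_correlation(link_a, link_b, lag), 3))
```

The lag with the highest correlation indicates how long a disturbance on one link takes to show up on the other in this toy data.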


Research Frontiers in Network Complexity

1) Forecasting and prediction
- nonlinearity & nonstationarity
2) Tools to capture/illustrate the processes
- emergence and tipping points
- simulating behaviour (macroscopic properties alter because of accumulated microscopic changes)
3) Spatio-temporal dependence and interactions
- impact of activities on the network
- interactions between networks

Big data – empirical theory and testing


STANDARD – Spatio-Temporal Analysis of Network Data and Route Dynamics
Understanding traffic congestion in space-time

•  Short-term and long-term journey time prediction
   –  STARIMA; ANN; kernel-based approach
•  Early detection of traffic congestion
   –  clustering: STC; STSS
•  Interactive visualization of journey time reliability and traffic congestion
   –  2D (hotspot); 3D (wall map; isosurface)
•  Simulation of non-recurrent congestion
   –  agent-based simulation
•  Intervention analysis (weather, tube strike, road works)
   –  regression


Space-time prediction & forecasting

The challenge lies in the non-stationarity (heterogeneity) and non-linearity of space-time data.

Statistical approaches
•  STARIMA models
•  space-time geostatistical models
•  spatial panel data models
•  space-time GWR
Calibrating the spatio-temporal autocorrelations is the bottleneck.

Machine learning approaches
•  artificial neural networks (ANNs)
•  self-organizing maps
•  genetic algorithms
•  support vector machines (SVMs)
•  kernel-based approach
The interpretability of machine learning models is low.
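
To make the calibration problem concrete, the sketch below fits the simplest autoregressive member of the STARIMA family: one temporal lag, one spatial lag, no differencing and no moving-average terms. It is an illustration under stated assumptions, not the STANDARD team's model. The three-link weight matrix `W`, the coefficients and the noise-free simulated series are all hypothetical; with real data the spatial weights themselves have to be chosen, which is where the bottleneck noted above arises.

```python
def spatial_lag(weights, values):
    """Weighted average of neighbouring links: (W z)_i."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def simulate(z0, weights, phi0, phi1, steps):
    """Noise-free series from z(t) = phi0 * z(t-1) + phi1 * W z(t-1)."""
    series = [list(z0)]
    for _ in range(steps):
        prev = series[-1]
        lagged = spatial_lag(weights, prev)
        series.append([phi0 * a + phi1 * b for a, b in zip(prev, lagged)])
    return series


def fit_star(series, weights):
    """Least-squares estimates of (phi0, phi1) via the 2x2 normal equations."""
    s11 = s12 = s22 = s1y = s2y = 0.0
    for prev, curr in zip(series, series[1:]):
        lagged = spatial_lag(weights, prev)
        for x1, x2, y in zip(prev, lagged, curr):
            s11 += x1 * x1
            s12 += x1 * x2
            s22 += x2 * x2
            s1y += x1 * y
            s2y += x2 * y
    det = s11 * s22 - s12 * s12
    phi0 = (s22 * s1y - s12 * s2y) / det
    phi1 = (s11 * s2y - s12 * s1y) / det
    return phi0, phi1


# Hypothetical row-normalised weights for three links in a line: 0 - 1 - 2.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]

series = simulate([1.0, 0.0, 0.0], W, phi0=0.5, phi1=0.3, steps=10)
phi0_hat, phi1_hat = fit_star(series, W)
print(round(phi0_hat, 3), round(phi1_hat, 3))
```

Because the simulated series contains no noise, the fit should recover the coefficients used to generate it; the example only checks that the estimation step is wired correctly.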


Real-time traffic forecasting

James Haworth


Results – Root mean squared error (seconds/kilometre)

Interval      Naïve   ARIMA   STARIMA   LSTARIMA
5 minutes     49.4    47.4    55.9      46
15 minutes    74.7    68.7    89.1      67.3
30 minutes    93.2    82.1    109       80

James Haworth & Jaiqiu Wang: Space-Time Modelling and Prediction
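
For readers who want to compare the columns, the sketch below shows how a root mean squared error is computed and expresses each model's error relative to the naïve column. The RMSE values are copied from the results table; the helper names are hypothetical, and how the naïve forecast was constructed is not stated in the slides.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


# RMSE in seconds/kilometre, as reported in the results table.
rmse_table = {
    "5 minutes": {"Naive": 49.4, "ARIMA": 47.4, "STARIMA": 55.9, "LSTARIMA": 46.0},
    "15 minutes": {"Naive": 74.7, "ARIMA": 68.7, "STARIMA": 89.1, "LSTARIMA": 67.3},
    "30 minutes": {"Naive": 93.2, "ARIMA": 82.1, "STARIMA": 109.0, "LSTARIMA": 80.0},
}


def improvement_over_naive(row, model):
    """Percentage reduction in RMSE relative to the naive column."""
    return 100.0 * (row["Naive"] - row[model]) / row["Naive"]


for interval, row in rmse_table.items():
    for model in ("ARIMA", "STARIMA", "LSTARIMA"):
        print(interval, model, round(improvement_over_naive(row, model), 1))
```

A negative percentage means the model's error is larger than the naïve one, which is the case for the STARIMA column at every interval in the table.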


Space-time clustering

To extract meaningful patterns (clusters)

•  To detect outliers or emerging phenomena (epidemic outbreaks or traffic congestion)
•  The most difficult challenge in spatio-temporal clustering is to consider the spatial, temporal and thematic attributes seamlessly and simultaneously, together with the dynamics in the data
•  Spatio-temporal scan statistics (STSS) shed light on this aspect
•  Efforts are needed to improve the computational efficiency and to reduce the false alarm rate of STSS


Clusters of Congestion 25 May 2010 – State Opening of Parliament

Berk Anbaroglu - STSS for early detection of non-recurrent traffic congestion


Space-time visualisation

Explores the patterns hidden in large data sets

•  using advanced (analytical) visualization and animation
   –  static 2D maps
   –  3D wall maps and isosurfaces (hotspots in space-time)
•  Tools: “Visual Analytics” and “Geovisual Analytics”
•  Real-time visualization of dynamic processes is still very challenging because of the large volume and high dimensionality of the data.
•  Methods are needed to show evolution and dissipation in space and time simultaneously (e.g. crime or traffic congestion)


Space-Time Visualisation: data -> process, story

Traffic congestion in space-time (1)

Cheng, Emmonds, Tanaksaranond, Sonoiki (2010), Multi-Scale Visualisation of Inbound and Outbound Traffic Delays in London, The Cartographic Journal, 47: 323–329.


Visualization of traffic congestion in space-time (2)

3D Wall maps of inbound roads on 6th – 7th September 2010


[Figure panels: top view; side view; isosurface]


Visualising Congestion Build-up in London

3D Wall Map Travel Time Interactive Visualization Tool

Garavig Tanaksaranond – Space-Time Visualisation of Traffic Congestion


Space-Time Multi-Agent Simulation

•  Understanding formation of congestion through the behaviour of individual drivers
•  How do drivers react when faced with road closure?
•  Depends on the urban environment, individual knowledge of the network and conditions, and behaviour of others
•  Behaviour of individuals (microscopic behaviour) influences the formation and movement of congestion (macroscopic phenomena)

(Manley & Cheng, 2010)

SPREAD OF CONGESTION


[Figure: map with place labels Regent’s Park and Hyde Park. Legend, saturation classes: 0 – 0.2; 0.2 – 0.4; 0.4 – 0.5; 0.5 – 0.6; 0.6 – 0.7; 0.7 – 0.8; 0.8 – 0.9; 0.9 – 1.0; 1.0 – 1.2; 1.2 – 1.5; > 1.5]

Ed Manley – Agent-based Simulation
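
The saturation classes in the map legend can be reproduced as a simple lookup. The sketch below assumes saturation means flow divided by capacity, which the slide does not define, and assumes that a value falling exactly on a class boundary belongs to the higher class; the function names and the example flow and capacity are hypothetical.

```python
from bisect import bisect_right

# Class boundaries and labels as they appear in the map legend.
UPPER_BOUNDS = [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.5]
LABELS = ["0 – 0.2", "0.2 – 0.4", "0.4 – 0.5", "0.5 – 0.6", "0.6 – 0.7",
          "0.7 – 0.8", "0.8 – 0.9", "0.9 – 1.0", "1.0 – 1.2", "1.2 – 1.5",
          "> 1.5"]


def saturation(flow, capacity):
    """Assumed definition: demand flow divided by link capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return flow / capacity


def saturation_class(value):
    """Legend label for a saturation value (boundary values go to the higher class)."""
    if value < 0:
        raise ValueError("saturation cannot be negative")
    return LABELS[bisect_right(UPPER_BOUNDS, value)]


# Hypothetical link: 1900 vehicles/hour on a link with capacity 2000.
print(saturation_class(saturation(1900, 2000)))
```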


[Diagram labels: Machine Learning; Location Information; GPS; Mode of Transport & Stops]

http://www.homepages.ucl.ac.uk/~ucesadb/video.html

GPS testing data: 110 participants, 2 months per participant, 20-second collection rate. All participants based in Greater London.

Adel Bolbol Fernandez - Understanding Travel Behaviours from GPS Data Logs
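
Before any machine learning can infer transport mode or stops, GPS fixes have to be turned into features such as speed. The sketch below is a minimal illustration of that preprocessing step, not the method used in the study: it computes great-circle speeds between consecutive fixes recorded at the 20-second collection rate mentioned above and flags near-stationary segments. The coordinates and the 0.5 m/s stop threshold are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000   # mean Earth radius
SAMPLE_INTERVAL_S = 20       # collection rate stated on the slide


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def segment_speeds(fixes, interval_s=SAMPLE_INTERVAL_S):
    """Speed (m/s) between consecutive (lat, lon) fixes."""
    return [haversine_m(*a, *b) / interval_s for a, b in zip(fixes, fixes[1:])]


def flag_stops(speeds, threshold=0.5):
    """True where the segment speed is below a hypothetical stop threshold (m/s)."""
    return [s < threshold for s in speeds]


# Hypothetical fixes in Greater London: move north, pause, move north again.
fixes = [(51.5000, -0.1000),
         (51.5010, -0.1000),
         (51.5010, -0.1000),
         (51.5020, -0.1000)]

speeds = segment_speeds(fixes)
print([round(s, 2) for s in speeds], flag_stops(speeds))
```

Speed sequences like these, together with stop flags, are the kind of input a classifier could use to separate walking, cycling and motorised modes; real GPS logs would also need noise filtering first.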


Future Directions of STDM/NC (1)

•  New methods and theory are needed for mining crowd-sourced data contributed by citizens and volunteers, including social media data
   –  such data (e.g. trajectory data) are often extremely noisy, biased and nonstationary
   –  methods are needed to combine text mining with STDM
   –  this area is relevant to the recent development of citizen science, and VGI in particular
•  Theory and methods need to be developed to extract meaningful patterns from individual sensors and place them within the framework of networks and network complexity, such as the transport and social networks made up of those individuals.
•  On a network, interactions and dynamic flows should be considered when mining spatio-temporal patterns.
•  This aspect is relevant to complexity theory, and network dynamics in particular.


Future Directions (cont.)

•  STDM for emergence and tipping points: how to generate actionable knowledge, i.e. how to find the emergent patterns and tipping points of economics and epidemics?
•  It is important to find outliers, but more important to find the critical points before the system breaks down, so that mitigating action can be taken to avoid worst-case scenarios such as traffic congestion and epidemic transmission.
•  Another challenge of STDM is how to calibrate, explain and validate the knowledge extracted.
•  A good example is the calibration of spatial (or spatio-temporal) autocorrelation. Higher-order spatial autocorrelation models have been developed, but pitfalls have also been found (LeSage and Pace 2011).
•  This makes machine learning more promising for future STDM.


Future Directions (cont.)

•  Grid computing and cloud computing
   –  key for scaling the algorithms to large networks
•  Open sources (data + software + algorithms)
•  Online computation
•  Real-time computation
•  More systematic applications
   –  CPC
•  …


Acknowledgements

http://standard.cege.ucl.ac.uk

+  Dr Andy Chow
+  Colleagues in TfL