
Page 1

Online Maritime Abnormality Detection using Gaussian Processes and Extreme Value Theory

Mark Smith*, Steven Reece†, Stephen Roberts†, Iead Rezek†

* ISSG, Babcock Marine & Technology Division, Devonport Royal Dockyard, Plymouth, United Kingdom † Department of Engineering Science, University of Oxford, Oxford, United Kingdom

Page 2

Outline
•  Introduction
•  Current Techniques
•  The Gaussian Process Extreme Value Approach
   -  The Gaussian Process
   -  Sequential Gaussian Process Updates
   -  Extreme Value Theory
•  Applications
   -  Synthetic Data
   -  Vessel Track
•  Comparison with Kalman Filters
•  Conclusion

26/03/2013 Daniel García Ulloa

Page 3

Introduction

Page 4

Introduction

•  Maritime traffic is large and complex, consisting of dense volumes of (mostly legal) ship traffic.

•  Spotting abnormalities would help reduce smuggling, terrorism, illegal fishing, etc.

•  The goal is to detect anomalous vessels using an automated approach.

Page 5

Introduction

Fundamentally, an anomaly is a data point that stands out in contrast to the other data points around it. Anomalies could be:
•  deviations from a standard route,
•  unexpected port arrival,
•  close approach,
•  zone entry,
•  excessive speed.

Page 6

Current Techniques

An effective technique should recognize and model data points that arise from such anomalous events, and distinguish them from outliers in the tails of the reference distribution of non-anomalous data. Context is also important: whether a data point is anomalous can depend on where and when it occurs.

Page 7

Current Techniques

•  Existing techniques include neural networks, Bayesian networks, support vector machines and Kalman filters.

•  They all have two phases:
   -  Accommodation: creating a model of normality.
   -  Discordancy: using a metric to identify a point as abnormal.

Page 8

The GP-EVT Approach

Gaussian Process

$y = f(x)$, with $f \sim \mathcal{GP}(m(x), k(x, x'))$

where y is the dependent variable, x the independent variable, m(x) the mean function and k(x, x') the covariance function.

The covariance function k(x, x') is a function of the distance between the independent variables. In particular, let $r = |x_p - x_q|$ be the Euclidean distance. Possible choices for k include:
•  Squared Exponential
•  Matérn 3/2
•  Matérn 1/2
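As a minimal sketch, the three kernels can be written as functions of r in their common textbook forms. The amplitude σ0 and length scale λ (the hyperparameters discussed on the next slide) and the exact scaling inside each exponent are assumptions here and may differ from the parametrisation the authors used:

```python
import math

# Textbook stationary kernels as functions of r = |x_p - x_q|.
# sigma0 (amplitude) and lam (length scale) follow a common convention;
# the exact scaling used in the slides is an assumption.


def squared_exponential(r: float, sigma0: float = 1.0, lam: float = 1.0) -> float:
    """k(r) = sigma0^2 * exp(-r^2 / (2 lam^2))."""
    return sigma0 ** 2 * math.exp(-0.5 * (r / lam) ** 2)


def matern32(r: float, sigma0: float = 1.0, lam: float = 1.0) -> float:
    """k(r) = sigma0^2 * (1 + sqrt(3) r / lam) * exp(-sqrt(3) r / lam)."""
    s = math.sqrt(3.0) * r / lam
    return sigma0 ** 2 * (1.0 + s) * math.exp(-s)


def matern12(r: float, sigma0: float = 1.0, lam: float = 1.0) -> float:
    """k(r) = sigma0^2 * exp(-r / lam), the exponential (Ornstein-Uhlenbeck) kernel."""
    return sigma0 ** 2 * math.exp(-r / lam)


if __name__ == "__main__":
    for r in (0.0, 1.0, 2.0):
        print(r, squared_exponential(r, 1.0, 2.0), matern32(r, 1.0, 2.0), matern12(r, 1.0, 2.0))
```

All three equal σ0² at r = 0 and decay with distance. They differ in smoothness: Matérn 1/2 gives rough sample paths, Matérn 3/2 gives once-differentiable ones, and the squared exponential gives infinitely smooth ones.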

Page 9

The GP-EVT Approach

The output scale or amplitude (σ0) and the input (length) scale (λ) depend on the dynamics of the vessel, so they are learnt from anomaly-free training data. The authors also assume i.i.d. Gaussian noise with variance $\epsilon^2$, so the full covariance model is:

$V(x_p, x_q) = k(x_p, x_q) + \epsilon^2 \delta_{pq}$

where $\delta_{pq}$ is the Kronecker delta.

Predictions can be made about the function value f(x*) at any location x*.
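The standard GP predictive equations can be sketched as follows. This assumes a zero prior mean (m(x) = 0) and a Matérn 3/2 kernel; the function names and default hyperparameter values are hypothetical. A Cholesky factor of V is used instead of an explicit inverse:

```python
import numpy as np


def matern32(xp, xq, sigma0=1.0, lam=1.0):
    """Matérn 3/2 covariance matrix between 1D input arrays xp and xq."""
    r = np.abs(xp[:, None] - xq[None, :])
    s = np.sqrt(3.0) * r / lam
    return sigma0 ** 2 * (1.0 + s) * np.exp(-s)


def gp_predict(x, y, x_star, sigma0=1.0, lam=1.0, noise_var=1e-2):
    """Zero-mean GP predictive mean f* and variance Var[f*] at inputs x_star."""
    V = matern32(x, x, sigma0, lam) + noise_var * np.eye(len(x))  # V = K + eps^2 I
    L = np.linalg.cholesky(V)            # lower triangular, V = L L^T (so R = L^T)
    # alpha = V^{-1} y via two triangular systems (np.linalg.solve used for brevity)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    k_star = matern32(x, x_star, sigma0, lam)          # n x m cross-covariances
    mean = k_star.T @ alpha                             # f* = k*^T V^{-1} y
    w = np.linalg.solve(L, k_star)
    var = sigma0 ** 2 - np.sum(w ** 2, axis=0)          # k(x*,x*) - k*^T V^{-1} k*
    return mean, var


if __name__ == "__main__":
    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x)
    mean, var = gp_predict(x, y, np.array([5.0, 100.0]))
    print(mean, var)
```

Near the training data the predictive variance is small. Far away (x* = 100) the mean reverts to the prior mean of zero and the variance to σ0².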

Page 10

The GP-EVT Approach


Where:

Page 11

The GP-EVT Approach

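Sequential Updates

Inverting V is expensive ($O(n^3)$). However, $V(x_p, x_q) = V(x_q, x_p)$, so V is symmetric (Hermitian). Moreover, k is a valid covariance function and the noise term adds $\epsilon^2 > 0$ to the diagonal, so V is positive definite. Therefore V has a Cholesky factorization

$V = R^\top R$

where R is upper triangular.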

Page 12

The GP-EVT Approach

•  The Cholesky factorization can be computed iteratively:
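A minimal sketch of one iterative scheme follows. It assumes V grows by one row and column per new observation. If $V = R^\top R$ is already factored, the new column v and diagonal entry c give an extended factor whose new column s solves $R^\top s = v$ and whose new diagonal entry is $d = \sqrt{c - s^\top s}$. Each update costs $O(n^2)$ instead of $O(n^3)$ for refactorising. The helper names are hypothetical, and this is not necessarily the exact recursion on the slide:

```python
import numpy as np


def forward_substitution(L, b):
    """Solve L s = b for lower-triangular L in O(n^2) operations."""
    n = len(b)
    s = np.zeros(n)
    for i in range(n):
        s[i] = (b[i] - L[i, :i] @ s[:i]) / L[i, i]
    return s


def cholesky_append(R, v, c):
    """Given upper-triangular R with V = R^T R, return the factor of
    V_new = [[V, v], [v^T, c]], where v holds the covariances between the
    new point and the n existing points and c is its own variance."""
    n = R.shape[0]
    s = forward_substitution(R.T, v)    # solve R^T s = v (R^T is lower triangular)
    d = np.sqrt(c - s @ s)              # new diagonal entry
    R_new = np.zeros((n + 1, n + 1))
    R_new[:n, :n] = R                   # existing factor is reused unchanged
    R_new[:n, n] = s
    R_new[n, n] = d
    return R_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 5))
    V = A @ A.T + 5.0 * np.eye(5)       # a symmetric positive-definite test matrix
    R = np.zeros((0, 0))
    for k in range(5):                  # add one observation at a time
        R = cholesky_append(R, V[:k, k], V[k, k])
    print(np.allclose(R.T @ R, V))
```

Expanding $R_{new}^\top R_{new}$ block by block gives back $[[V, v], [v^\top, c]]$, which is why the old factor can be kept and only one new column computed.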

Page 13

The GP-EVT Approach

Extreme Value Theory

The theory for spotting anomalies focuses on the statistical behaviour of $M_n = \max\{X_1, \ldots, X_n\}$, where $X_1, \ldots, X_n$ is a sequence of i.i.d. random variables with distribution function F.

Page 14

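The GP-EVT Approach

The theory (the Fisher-Tippett theorem) states that the limiting distribution of the suitably normalised maximum $M_n$ must be Gumbel, Fréchet or Weibull. Because the GP's predictive distribution is Gaussian, and maxima of Gaussian samples converge to the Gumbel distribution, the authors used the Gumbel form:

$H(x) = \exp\left\{-\exp\left[-\frac{x - b}{a}\right]\right\}$

which has a scale parameter a and a location parameter b.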
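The following sketch computes the Gumbel distribution function together with the classical normalising constants for the maximum of m i.i.d. standard normal samples, $a_m = (2\ln m)^{-1/2}$ and $b_m = \sqrt{2\ln m} - (\ln\ln m + \ln 4\pi)/(2\sqrt{2\ln m})$. Using these textbook constants is an assumption; the slides do not say how a and b were obtained:

```python
import math


def gumbel_cdf(x, a, b):
    """Gumbel distribution function H(x) = exp(-exp(-(x - b) / a)),
    with scale a > 0 and location b."""
    return math.exp(-math.exp(-(x - b) / a))


def gaussian_max_constants(m):
    """Classical scale a_m and location b_m for the maximum of m >= 2
    i.i.d. standard normal samples (an assumed choice, not from the slides)."""
    t = math.sqrt(2.0 * math.log(m))
    a = 1.0 / t
    b = t - (math.log(math.log(m)) + math.log(4.0 * math.pi)) / (2.0 * t)
    return a, b


if __name__ == "__main__":
    a, b = gaussian_max_constants(1000)
    print(f"a = {a:.4f}, b = {b:.4f}, H(b) = {gumbel_cdf(b, a, b):.4f}")
```

As m grows, b_m increases slowly (roughly like $\sqrt{2\ln m}$) and a_m shrinks. The largest of many Gaussian samples is therefore both larger and more tightly concentrated than a single sample.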

Page 15

The GP-EVT Approach

Page 16

The GP-EVT Approach


We can now obtain the extreme quantiles:

The value of p acts as a novelty threshold: a test point whose extreme-value probability exceeds p, i.e. one lying beyond the corresponding extreme quantile, is classified “abnormal”. They set p = 0.95.
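Inverting the Gumbel distribution function gives the extreme quantile in closed form, $e = b - a\ln(-\ln p)$. The sketch below assumes the test value has already been standardised and is compared in absolute value. Both are assumptions made for illustration, and the names are hypothetical:

```python
import math


def gumbel_quantile(p, a, b):
    """Extreme quantile e with H(e) = p for the Gumbel distribution
    H(x) = exp(-exp(-(x - b) / a)), i.e. e = b - a * ln(-ln p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return b - a * math.log(-math.log(p))


def is_abnormal(z, a, b, p=0.95):
    """Flag a standardised deviation z whose magnitude exceeds the extreme
    quantile (assumed two-sided comparison)."""
    return abs(z) > gumbel_quantile(p, a, b)


if __name__ == "__main__":
    e = gumbel_quantile(0.95, a=0.5, b=3.0)   # illustrative a and b only
    print(f"e = {e:.4f}")
```

Raising p pushes e further into the tail, so fewer points are flagged. Lowering p makes the detector more sensitive.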

Page 17

The GP-EVT Approach

At some arbitrary future point x*, we can interrogate the GP and compute the predictive (Gaussian) distribution at that point, conditional on the trajectory’s past samples. This predictive distribution, which has a context (time) dependent mean f* and variance Var[f*], allows the extreme event quantile e to be rescaled:
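A minimal sketch of the rescaling follows. It assumes a symmetric bound $f_* \pm e\sqrt{\mathrm{Var}[f_*]}$ around the predictive mean; the function names are hypothetical:

```python
import math


def evt_bounds(f_star, var_f_star, e):
    """Rescale the standardised extreme quantile e by the GP predictive
    mean f* and standard deviation sqrt(Var[f*]) to get a (lower, upper) bound."""
    sd = math.sqrt(var_f_star)
    return f_star - e * sd, f_star + e * sd


def outside_evt_bounds(y, f_star, var_f_star, e):
    """True if observation y falls outside the rescaled EVT bound."""
    lower, upper = evt_bounds(f_star, var_f_star, e)
    return y < lower or y > upper


if __name__ == "__main__":
    # Illustrative numbers only: predictive mean 1.0, variance 4.0, quantile 3.0.
    print(evt_bounds(1.0, 4.0, 3.0))                 # (-5.0, 7.0)
    print(outside_evt_bounds(8.0, 1.0, 4.0, 3.0))
```

Because f* and Var[f*] change along the trajectory, the bound widens where the GP is uncertain and tightens where it is confident.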

Page 18

Application

Synthetic Data

Data were drawn from a GP with a Matérn 3/2 kernel (σ0 = 1, λ = 2) and 1% noise, giving 1000 samples. Anomalies were generated by adding a fixed offset to randomly selected samples that had previously been drawn from the GP.
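The synthetic set-up can be sketched as below. The unit input spacing, the reading of “1% noise” as a noise standard deviation of 0.01·σ0, the number of anomalies (20) and the offset value (3.0) are all hypothetical choices, not values from the slides:

```python
import numpy as np


def matern32_cov(x, sigma0=1.0, lam=2.0):
    """Matérn 3/2 covariance matrix for 1D inputs x."""
    r = np.abs(x[:, None] - x[None, :])
    s = np.sqrt(3.0) * r / lam
    return sigma0 ** 2 * (1.0 + s) * np.exp(-s)


def make_synthetic(n=1000, sigma0=1.0, lam=2.0, noise_frac=0.01,
                   n_anomalies=20, offset=3.0, seed=0):
    """Draw one GP sample path and offset randomly chosen samples."""
    rng = np.random.default_rng(seed)
    x = np.arange(n, dtype=float)                          # hypothetical unit spacing
    K = matern32_cov(x, sigma0, lam) + 1e-9 * np.eye(n)    # tiny jitter for stability
    f = np.linalg.cholesky(K) @ rng.standard_normal(n)     # f ~ GP(0, K)
    y = f + noise_frac * sigma0 * rng.standard_normal(n)   # "1% noise" read as noise sd
    idx = rng.choice(n, size=n_anomalies, replace=False)   # samples to corrupt
    y_anom = y.copy()
    y_anom[idx] += offset                                  # fixed-offset anomalies
    return x, y, y_anom, np.sort(idx)


if __name__ == "__main__":
    x, y, y_anom, idx = make_synthetic()
    print(len(x), idx[:5])
```

Keeping both `y` and `y_anom` gives a ground truth for scoring a detector: a point is a true anomaly exactly when its index is in `idx`.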

Page 19

Application


The continuous line shows the predicted mean function and the grey areas show the EVT bound of the GP predictive distribution for p = 0.95. The dashed line shows the bound obtained from the conventional 95% interval around the mean function.

Page 20

Application

Vessel Track Anomaly Detection
•  Data source: Marinetraffic.com
•  Feature extraction: x0 is the beginning of the vessel track. Subsequent data points take both distance and time into account, using the Haversine formula to convert the GPS data into a 1D vector.
•  Covariance function: after trying several kernels, they chose Matérn 3/2 for its goodness of fit and robustness.
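The sketch below shows the Haversine distance and one possible 1D track feature: cumulative along-track distance paired with elapsed time since x0. The Earth radius of 6371 km, the (t, lat, lon) input layout and the pairing of distance with time are assumptions; the slides do not specify how the two were combined:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def track_features(track):
    """Hypothetical 1D feature: elapsed time and cumulative distance from x0.

    track: list of (t_seconds, lat_deg, lon_deg) tuples sorted by time."""
    t0 = track[0][0]
    dist = [0.0]                                   # x0 = beginning of the track
    for (_, la1, lo1), (_, la2, lo2) in zip(track, track[1:]):
        dist.append(dist[-1] + haversine_km(la1, lo1, la2, lo2))
    times = [t - t0 for t, _, _ in track]
    return times, dist


if __name__ == "__main__":
    track = [(0.0, 50.0, -4.0), (60.0, 50.0, -3.9), (120.0, 50.1, -3.9)]
    print(track_features(track))
```

The GP could then model cumulative distance as a function of elapsed time. A vessel that stops or drifts shows up as a flattening or irregular segment of that curve.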

Page 21

Application

Page 22

Comparison with Kalman Filters

The traditional approach uses a Kalman filter (KF) to model the normal behaviour of the ship and flags data as anomalous if they lie more than a fixed number of standard deviations from the mean. The authors investigate both a traditional KF that uses this standard-deviation test to exclude anomalies and a KF that uses EVT in the same way as the GP.
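For comparison, the traditional baseline can be sketched as a 1D near-constant-velocity KF with an n-sigma innovation test. The noise levels, the initial covariance and the choice to skip the update for flagged points are hypothetical choices made for illustration:

```python
import numpy as np


def kf_ncv_detect(z, dt=1.0, q=0.01, r=0.01, n_sigma=3.0):
    """Flag measurements whose KF innovation exceeds n_sigma standard deviations.

    z: 1D array of positions sampled every dt; q: process-noise intensity;
    r: measurement-noise variance. Flagged points are excluded from the update."""
    F = np.array([[1.0, dt], [0.0, 1.0]])                 # near-constant-velocity dynamics
    Q = q * np.array([[dt ** 3 / 3, dt ** 2 / 2],
                      [dt ** 2 / 2, dt]])                  # white-acceleration process noise
    H = np.array([[1.0, 0.0]])                             # only position is observed
    x = np.array([z[0], 0.0])                              # initial state [position, velocity]
    P = np.diag([100.0, 100.0])                            # vague initial covariance
    flagged = []
    for k in range(1, len(z)):
        x = F @ x                                          # predict
        P = F @ P @ F.T + Q
        innov = z[k] - (H @ x)[0]                          # innovation
        S = (H @ P @ H.T)[0, 0] + r                        # innovation variance
        if abs(innov) > n_sigma * np.sqrt(S):
            flagged.append(k)                              # anomalous: skip the update
            continue
        K = (P @ H.T)[:, 0] / S                            # Kalman gain
        x = x + K * innov
        P = P - np.outer(K, H @ P)                         # (I - K H) P
    return flagged


if __name__ == "__main__":
    z = np.arange(20, dtype=float)   # a vessel moving at constant speed
    z[15] = 30.0                     # one injected jump
    print(kf_ncv_detect(z))
```

Replacing the fixed `n_sigma` test with a Gumbel quantile on the standardised innovation would give a KF-with-EVT variant of the kind compared on the next slide.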

Page 23

Comparison with Kalman Filters


Accuracy of the KF using the near-constant-velocity model (with and without EVT) and of the GP using a Matérn 3/2 kernel (again with and without EVT).

However, the GP's accuracy could be matched by replacing the KF's near-constant-velocity model with a Markovianised Matérn model.
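One standard way to Markovianise a Matérn 3/2 GP is to write it as a two-dimensional linear state-space model with state $[f, \dot f]$. Its transition matrix then has a closed form. This is a sketch of that generic construction, assuming amplitude σ0 and length scale λ, and not necessarily the authors' exact model. The resulting A and Q could take the place of the near-constant-velocity F and Q in a KF:

```python
import numpy as np


def matern32_state_space(sigma0=1.0, lam=2.0, dt=1.0):
    """Discrete-time Markovian form of a Matérn 3/2 GP with state [f, df/dt].

    Returns the transition matrix A = expm(F dt), the process-noise covariance Q
    and the stationary state covariance P_inf, where the continuous-time model is
    F = [[0, 1], [-nu^2, -2 nu]] with nu = sqrt(3) / lam."""
    nu = np.sqrt(3.0) / lam
    # F + nu*I is nilpotent, so expm(F dt) has this closed form.
    A = np.exp(-nu * dt) * np.array([[1.0 + nu * dt, dt],
                                     [-nu ** 2 * dt, 1.0 - nu * dt]])
    P_inf = np.diag([sigma0 ** 2, nu ** 2 * sigma0 ** 2])
    Q = P_inf - A @ P_inf @ A.T        # keeps the state covariance stationary
    return A, Q, P_inf


if __name__ == "__main__":
    A, Q, P_inf = matern32_state_space(sigma0=1.0, lam=2.0, dt=1.0)
    # Covariance between f(t + dt) and f(t) implied by the state-space model.
    print((A @ P_inf)[0, 0])
```

The entry (A P_inf)[0, 0] equals $\sigma_0^2(1 + \sqrt{3}\,dt/\lambda)e^{-\sqrt{3}\,dt/\lambda}$, the Matérn 3/2 covariance at lag dt. This is why a KF built on this model can reproduce the GP's behaviour at linear cost per step.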

Page 24

Conclusion
•  EVT is effective for anomaly detection.
•  The method can be updated online to improve accuracy.
•  The sequential update of the Gaussian Process covariance matrix avoids the need to invert large matrices.
•  The method can detect anomalies that resemble mooring or drifting, as well as unexpected departures from regular movements.
•  The choice of Gaussian Process kernel function becomes less critical as the amount of data increases, but for small sample sizes the choice of kernel is critical.

Page 25

Detecting anomalies could help prevent catastrophes…
