Post on 21-Jan-2018
Driving Style Analysis based on Trip Segmentation.
A Comparative Multi-Technique ApproachMarco Brambilla, Andrea Mauri, Paolo Mascetti
@marcobrambi
Agenda
IntroProblem DefinitionDatasetData Exploration and PreliminariesTrip Segmentation TechniquesValidation Conclusions
Intro: Relevance1.24 million traffic-related fatalities occur annually world wideCurrently the leading cause of death for people aged between 15 and 29 yearsMajority of cases due to improper or risky driving behavior
Source: World Health Organisation (WHO)
Intro: Driving Process
Driving Process: driving a car is a complex task that requires to take informed decisions based on information pertaining different levels such as his own state and other drivers’ behavior.
Intro: Relevant Information
Vehicle’s Status
Contextual Info• Road State• Weather
Conditions• Traffic Info• Road Risk• Traffic
Problem Statement
Data-driven driver profilingwith respect to driving risk
Essentially: Multivariate Time Series Segmentation
Application scenarios in insurance, promotingpay-how-you-drive (PHYD) business models
State of the Art and Challenges
State of the art: many works on identification and recognition of behavioural patterns (line following, accelerations, braking etc) and maneuversrecognition, behavioural scoring, prediction of driver intentions.
Supervised Learning techniques require intensive end expensive gathering process.
Proposed Solution
Unsupervised techniques to profile drivers behaviour based on identified recurrent patternson driving path segmentationComparison of 3 different approaches and use of all of them for consolidated results
1. Unsupervised Segmentation Based on Clustering2. Unsupervised Segmentation Based on HMM3. Unsupervised Topic Extraction
Contextual Scenes
Observed driving behaviours that are repeated in each driver's behaviour and also across different drivers.A reduced representation of the original Multivariate Time Series conveying a simplified characterizationFurther reasoning is then applied
ETL Process3 Steps:
Extract: read collected files and selection of candidate featuresTransform:
Filter and Grouping Features computation
Load: produce a unique dataset
PreProcessing
Transform
Globaldataset.csv
Load
TripFile.csv
Extract
DatasetsCollection Device : Xsens MTi-G-710 (27 users)And cell phones (10 users)
Retrieved Signals : Acceleration measurementsAltitudeGPS PositioningSpeedingOrientation
Mounted in-vehicle aligned with direction of movement.
No Ground truth knowledge
Pre-Analysis 2: Application of Driving Safety Existing AnalysesVaiana et.al. Propose a Driving Safety Diagram based on longitudinal and lateral accelerations analysis.
Aggressiveness Index formulation:(A = Aggressive, S = Safe points)
Graphical representation:
1. Unsupervised Segmentation Based on DP-Means Clustering
Problem: Bayesian nonparametric techniques require expensive sampling methods or variational techniques.
DP-means: proposed by Kulis et. al. revisiting k-means: K-means like objective function + penalty
A new cluster is created whenever a point is farther than λ away from every already existing centroid.
Note:Clustering results depends on data ordering.
Unsupervised Segmentation based on HMM
Goal: identify latent structure given observed data points, assuming existance of Gaussian hidden states.Assign to each observed point the corresponding hidden state.Hidden Markov Models (HMM):
Observation and hidden states
Markovian propertiesContinous observation
Unsupervised Segmentation based on HMM
Training:Baum-Welch EM algorithm to learn model parameters
Decoding:Viterbi decoding to assign to each observed point the most likely hidden state
HMM Results
Also a different variation applied: inertial HMM: lower transition probabilities enforcing state persistence. Sensible for driving.
Topic Extraction ApproachWhat is topic extraction ?
Model topical concepts belonging to a set of textual documents.Data are described as documents and the components are distributions of terms that reflect recurring patterns, name Topics.
Hierarchical Dirichlet Processes (HDPs)soft-clustering technique based on non-parametric Bayesian theory.number of topics is not set a priori, but learned from data.Posteriori probability approximated by Variational Inference algorithm by Wang et.al.
Results:Most relevant topics for each document and terms distribution in each topic.
Big Issue: How to Compare?
1) Point-to-point or point distribution2) Resulting grouping of trips3) Perceived user similarity of trips
Solution 2: Moving from Points to TripsCan we cluster trips based on how observation points have been clustered?
à Simple K-means clustering of trips for each approach. à Comparison of overlap of the different clusters
Coherent with original question: grouping of trips (and thus drivers) by driving behavior
Result of overlap analysis
K-means with K=6 clusters.
DP-means vs. HMM: 74% overlapDP-means vs. Topic: 44% HMM vs. Topic: 48%
Human Validation of Trip Groups
Experts (knowledgeable about driving styles and driving paths recorded) identify possible groups of trips in the dataset
Problem: - Unable to distinguish 6 categories of groups- Only 3 categories are feasible- Best matching 6à3 categories for each method
Conclusions
Three different clustering techniques of driving behavior over trips
-> segmentationClustering of trips based on behavior
-> up to 74% overlap over 6 clusters-> 100% overlap over 3 clusters
User Validation-> 96% precision over 3 clusters
Future Work
About collection process:Gathering process including contextual information (road risk, traffic status, weather conditions)Larger dataset to improve inference performance
About implemented methods:Smarter data ordering for DP-meansRelax independency assumption in HMMImprovements in data discretization process for HDP