Safety & Behavior Data & Analytics

Page 1: Safety & Behavior Data & Analytics...Jump start analytics and data science initiatives with curated data set NO data preparation –sourcing, profiling, cleansing, structuring, processing

Safety & Behavior Data & Analytics

Objective

Dataflix Inc. | Confidential & Proprietary | 2

1. Safety & Behavior Data Set Overview

2. SAB Data – What’s included

3. SAB Data Use Cases

4. SAB Analytics Demo

5. Next Steps

Safety & Behavior Data Overview

What is SAB Data set?

The Safety & Behavior (SAB) data set is a curated data set that contains accident data for the last 42 years, sourced from NHTSA and its sub-agencies.

Curated data set | In the cloud | Access via ODBC & API

Why SAB Data set?

Jump start analytics and data science initiatives with a curated data set.

No data preparation: sourcing, profiling, cleansing, structuring, processing and storing.

SAB Analytics

Analytics built on the SAB data set and delivered through the cloud. A demo follows in the coming slides.

SAB Data – What’s included

What’s included in SAB Data set?

Ø The SAB data set is a curated data set built on FARS (Fatality Analysis Reporting System) data from NHTSA.
Ø Data is profiled from Police Accident Reports, Death Certificates, Medical Examiners, State Driver and Vehicle Registration records, etc.
Ø PII is not included.
Ø Data includes only fatal accidents: at least one death within the first 30 days of the accident.
Ø Three main segments of data from 1975: accidents, vehicle and person.
Ø Core data in SAB:
    Ø Accidents and all of their attributes, including weather conditions
    Ø Manner of collision
    Ø By location: rural vs. urban, road type, work zone, etc.
    Ø Vehicle info before and after the accident: size, type, people involved, etc.
    Ø Person/driver behavior (e.g. alcohol, distraction, by age and gender)
    Ø And all other attributes

Ø Speed limit laws
Ø Air bag effectiveness
Ø Vehicle safety designs and many more

SAB Data – From Where

From where do we get SAB Analytics data?

Ø Every year NHTSA (National Highway Traffic Safety Administration, a unit of the US Department of Transportation) releases data about accidents in which there was at least one fatality within the first 30 days of the accident.

Ø FARS is a census of fatal motor vehicle crashes with a set of data files documenting all qualifying fatalities that occurred within the 50 States, the District of Columbia, and Puerto Rico since 1975.

Ø The data is available from https://www.nhtsa.gov/. The files can be downloaded from the FTP location ftp://ftp.nhtsa.dot.gov/fars/

Ø The files are available in various formats such as CSV and SAS. All files from 2015 onward are available in CSV format.

SAB Data – What kind of data?

What kind of data?

Ø NHTSA gets the data primarily from Police Accident Reports (PAR), death certificates, state medical examiners, state driver and vehicle registration records, and Emergency Medical Services records.

Ø To qualify as a FARS case, the crash had to involve a motor vehicle traveling on a trafficway customarily open to the public, and must have resulted in the death of a motorist or a non-motorist within 30 days of the crash.

Ø The database contains information about accidents and all of their attributes: manner of collision, location, vehicle information, person behavior, safety equipment, violations, speed laws and many more. There are 20 tables in total.

Ø NHTSA also provides attribute/master data for the data above. This attribute/master data is found only in the user manual, not in files or tables, so the attribute tables have to be created and filled with values manually. Example: the NHTSA CSV files hold the US state as a code (1, 2, 3 and so on), so 1 has to be mapped to Alabama, 6 to California, etc.
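
The manual attribute mapping described above can be sketched in Python as follows. This is a minimal illustration, not the actual Dataflix pipeline: the column names `STATE` and `FATALS`, the helper `add_state_name`, and the sample rows are assumptions made for the example, and only the two state codes named on this slide are filled in.

```python
import csv
import io

# Attribute (master data) table, keyed by the numeric code used in the CSV.
# Only the two mappings named on the slide are included; the remaining codes
# would be copied by hand from the NHTSA user manual.
STATE_NAMES = {1: "Alabama", 6: "California"}


def add_state_name(rows, code_column="STATE"):
    """Yield each row with a readable STATE_NAME column added.

    `code_column` is a hypothetical column name. Codes missing from the
    attribute table are labelled "Unknown" rather than dropped, so the
    source rows are never lost or altered.
    """
    for row in rows:
        enriched = dict(row)
        code = int(row[code_column])
        enriched["STATE_NAME"] = STATE_NAMES.get(code, "Unknown")
        yield enriched


# Made-up sample standing in for an NHTSA CSV file.
sample_csv = "STATE,FATALS\n1,2\n6,1\n99,1\n"
reader = csv.DictReader(io.StringIO(sample_csv))
enriched_rows = list(add_state_name(reader))
for row in enriched_rows:
    print(row["STATE"], row["STATE_NAME"], row["FATALS"])
```

Keeping the original code next to the readable name means the source value stays untouched and the mapping can be corrected later without reloading the raw files.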

SAB Data – How do we process data

Technology Stack used

Ø CSV files are stored in Google Cloud Storage, which acts as the persistent layer. These files are not modified.

Ø Data from the persistent layer is loaded into Google BigQuery tables. These initial tables act as the staging layer.

Ø From the staging layer, data is mapped to several attribute/master data tables and the resulting data is stored in the reporting layer. This is the layer customers can access: the curated data set.

Ø The process above does not involve any modification of the data. Data is as-is from NHTSA; attributes are only added to make reading and reporting easier.

Ø Security: All tables, from the persistent layer to the curated data set layer, are secured using Google Cloud Platform security standards.
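
The staging-to-reporting step can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: an in-memory SQLite database stands in for Google BigQuery, and every table and column name (`staging_accident`, `attr_state`, `reporting_accident`, `case_id`, `state_code`, `state_name`) is hypothetical rather than taken from the actual SAB schema.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here; all table and column
# names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_accident (case_id INTEGER, state_code INTEGER);
    CREATE TABLE attr_state (state_code INTEGER, state_name TEXT);
    INSERT INTO staging_accident VALUES (1001, 1), (1002, 6);
    INSERT INTO attr_state VALUES (1, 'Alabama'), (6, 'California');

    -- Reporting layer: staging rows are kept as-is, with the readable
    -- attribute value added alongside the original code.
    CREATE TABLE reporting_accident AS
    SELECT s.case_id, s.state_code, a.state_name
    FROM staging_accident AS s
    LEFT JOIN attr_state AS a ON a.state_code = s.state_code;
    """
)

reporting_rows = conn.execute(
    "SELECT case_id, state_code, state_name "
    "FROM reporting_accident ORDER BY case_id"
).fetchall()
print(reporting_rows)
```

A LEFT JOIN is used so that a staging row whose code has no attribute entry still reaches the reporting table, which matches the stated goal of not modifying or dropping source data.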

Google Cloud Platform | Google Cloud Storage | Google BigQuery | SQL

SAB Data – Data Volumes and Future Loads

Data History and Volumes

Ø There are 20 tables. Data is present from 1975 to 2017. NHTSA initially provided data for 3 tables (Accidents, Vehicle and People) and later added more tables, as well as more columns to these tables.

Ø The size of the curated data set is around 8.5 GB, and the total number of records across all 20 tables is around 9 million.

Ø Plan for 2018 and future data: 2018 data will be released in August 2019 along with the Analytical Document. Some columns might be added or modified. Based on the Analytical Document, the backend queries will be updated and the tables will be loaded with 2018 data. The same exercise follows for future years.
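
One way to spot added or removed columns before updating the backend queries is a simple comparison of column names between two yearly files. The Python sketch below is hypothetical: the helper `compare_columns` and the column lists are invented for illustration and do not reflect the real file layouts.

```python
def compare_columns(existing_columns, new_columns):
    """Return the columns added and removed in the new year's file."""
    existing = set(existing_columns)
    new = set(new_columns)
    return {
        "added": sorted(new - existing),
        "removed": sorted(existing - new),
    }


# Hypothetical column lists for two yearly releases of the same table.
columns_2017 = ["CASE_ID", "STATE", "WEATHER"]
columns_2018 = ["CASE_ID", "STATE", "WEATHER", "NEW_FIELD"]

diff = compare_columns(columns_2017, columns_2018)
print(diff)
```

A name-only comparison does not catch a column whose meaning or code list changed while its name stayed the same; those changes would still have to be read from the yearly documentation.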

Total number of tables: 20

Data from 1975 - 2017

SAB Data – Important and Key Data

Key Data

There are three tables that are most critical:

Ø Accidents: Contains the nature of the accident, such as the number of cars involved, weather conditions, timing and many more.

Ø Vehicle: Details about the vehicle, such as make, body, model year, impact area of the vehicle, fire/explosion from the vehicle, etc.

Ø Persons: Contains all the information about the persons in the vehicle, the driver's past history, alcohol consumption details, etc.

SAB Data Use case

What can the data be used for?

Ø Jump start analytics and data science initiatives
Ø Historical vehicle safety and driver behavior analysis
Ø Competitor analysis
Ø Product design and liability risk analysis
Ø Warranty and Service Parts implications
Ø Large-truck safety analysis
Ø Air bag effectiveness
Ø General use cases
    Ø Alcohol-related legislation
    Ø Motorcycle helmet legislation
    Ø Restraint usage legislation
    Ø Speed limit laws
    Ø Vehicle safety designs

Objective

1. Safety & Behavior Data Set Overview

2. SAB Data – What’s included

3. SAB Data Use Cases

4. SAB Analytics Demo

5. Next Steps

Geo & Demographics

Driver Behavior

Gender Analysis

Crash Analysis

Let’s Talk

Data-Driven Transformation, No Limits

Susan Cox
VP | Business Development

[email protected]

www.dataflix.com