FARS (Fatality Analysis Reporting System) Datamining

Traffic Fatality Data Mining Final Presentation For IS 383 Knowledge Discovery By Taimur Hassan


Some interesting facts about highway fatalities that I found using WEKA.


Page 1: FARS (Fatality Analysis Reporting System) Datamining

Traffic Fatality Data Mining

Final Presentation for IS 383 Knowledge Discovery

By Taimur Hassan

Page 2

Overview

1. Introduction

2. Understanding business data

3. Prepare and Process data

4. Transform data

5. Data Mining Results

6. Analysis and Interpretation

7. New Knowledge

Page 3

Introduction & Understanding Business Data

• The Knowledge Discovery (KD) assignment involved traffic safety.

• The National Highway Traffic Safety Administration (NHTSA) collects data about every traffic accident that involves one or more fatalities.

• FARS (Fatality Analysis Reporting System) has been available to the public since 1975.

• An FTP site allows the data to be downloaded in multiple formats, including database tables (DBF) and Excel spreadsheets.

Page 4

Prepare and Process Data

• The agency had already done all the preprocessing work.

• The dataset was available as three separate tables:

– Accident: details about the accident itself

– Person: most details about each person injured or killed in the accident

– Vehicle: data about the vehicle type, make/model, VIN, etc.

• No personal information was provided, only information useful for analysis.

• A total of 198 fields were distributed among the three tables, some appearing in more than one table.

• The data was ready to be analyzed.
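Attributes from different tables can only be mined together once the tables are linked. The sketch below is a minimal, standard-library illustration under stated assumptions: it assumes ST_CASE is the case number shared by the Accident and Person tables, and every row and value in it is a made-up placeholder rather than a real FARS record.

```python
# Minimal sketch: attach Accident fields to each Person row.
# Assumption: ST_CASE is the case number shared by both tables.
# All rows and values below are made-up placeholders.

accidents = [
    {"ST_CASE": 10001, "LGT_COND": 2},
    {"ST_CASE": 10002, "LGT_COND": 1},
]
persons = [
    {"ST_CASE": 10001, "VEH_NO": 1, "REST_USED": 3},
    {"ST_CASE": 10001, "VEH_NO": 2, "REST_USED": 0},
    {"ST_CASE": 10002, "VEH_NO": 1, "REST_USED": 1},
]


def join_on_key(left_rows, right_rows, key="ST_CASE"):
    """Inner-join two lists of dicts; right_rows must have unique keys."""
    index = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = index.get(row[key])
        if match is not None:
            # Left-row fields overwrite right-row fields with the same name.
            joined.append({**match, **row})
    return joined


person_accident = join_on_key(persons, accidents)
for row in person_accident:
    print(row)
```

Because several people can belong to one accident, the Person table is the "many" side of the join, so each person row is kept and the accident fields are copied onto it.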

Page 5

Transformation of Data

• Attribute values were coded in numeric form.

• A user guide was provided with the dataset to interpret the values of all the codes.

• Missing data values were explicitly encoded; e.g., ‘9’ meant ‘unknown’.

• Some interesting attributes were separated out to compare and run through the algorithms, saving memory and time.

• To consolidate results, some attribute values were grouped. For example:

– The AIR_BAG variable had about 33 distinct values; however, they could be consolidated into simple values such as NOT_DEPLOYED, DEPLOYED and NOT_AVAILABLE.

– This allowed a better look at the relationship between AIR_BAG and FATALITY (discussed later).
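The grouping step amounts to a lookup table from raw codes to coarse groups. This is a hypothetical sketch: the numeric AIR_BAG codes below are placeholders (the real codes are listed in the FARS user guide), and treating 99 as "unknown" mirrors the REST_USED coding rather than a confirmed AIR_BAG value.

```python
# Hypothetical sketch: the numeric codes below are placeholders,
# not the real AIR_BAG codes from the FARS user guide.
AIR_BAG_GROUPS = {
    0: "NOT_AVAILABLE",
    1: "DEPLOYED",
    2: "DEPLOYED",
    3: "DEPLOYED",
    20: "NOT_DEPLOYED",
    28: "NOT_DEPLOYED",
}
UNKNOWN_CODES = {99}  # assumed "unknown" code


def consolidate_air_bag(code):
    """Map a raw AIR_BAG code to a coarse group; None marks a missing value."""
    if code in UNKNOWN_CODES:
        return None
    return AIR_BAG_GROUPS.get(code, "OTHER")  # unmapped codes fall into OTHER


raw_codes = [1, 20, 0, 99, 3, 28, 7]
grouped = [consolidate_air_bag(c) for c in raw_codes]
print(grouped)
# ['DEPLOYED', 'NOT_DEPLOYED', 'NOT_AVAILABLE', None, 'DEPLOYED', 'NOT_DEPLOYED', 'OTHER']
```

Mapping "unknown" to None rather than to its own group keeps it from surfacing as a spurious association in the mined rules.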

Page 6

Results: Restraint and Fatality

• The attributes Seat Belt Usage (REST_USED) and Fatality were derived from the Person table and tested for associations with the Apriori algorithm in Weka.

• The Seat Belt Usage attribute had the following values:

– 00 None Used/Not Applicable
– 01 Shoulder Belt
– 02 Lap Belt
– 03 Lap and Shoulder Belt
– 04 Child Safety Seat
– 05 Motorcycle Helmet
– 06 Bicycle Helmet
– 08 Restraint Used - Type Unknown
– 13 Safety Belt Used Improperly
– 14 Child Safety Seat Used Improperly
– 15 Helmets Used Improperly
– 99 Unknown

• Fatality is a boolean value. The results revealed four interesting rules:


Conclusion: The results show that using a shoulder belt alone is linked to a higher fatality rate than using a lap and shoulder belt together or a lap belt alone. One hypothesis is that a shoulder belt without a lap belt may endanger occupants during an accident, for example by choking them when they are forced against the belt.
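Apriori reports each rule together with its support and confidence. As a minimal sketch (toy records, not FARS results; `rule_stats` is a hypothetical helper, not part of Weka), the confidence of REST_USED = x ==> FATALITY = YES is the fraction of people with that restraint value who were killed:

```python
# Toy records for illustration only; these are not FARS results.
records = [
    {"REST_USED": "01", "FATALITY": "YES"},
    {"REST_USED": "01", "FATALITY": "YES"},
    {"REST_USED": "01", "FATALITY": "NO"},
    {"REST_USED": "03", "FATALITY": "NO"},
    {"REST_USED": "03", "FATALITY": "NO"},
    {"REST_USED": "03", "FATALITY": "YES"},
    {"REST_USED": "03", "FATALITY": "NO"},
    {"REST_USED": "02", "FATALITY": "NO"},
]


def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) for antecedent ==> consequent.

    support    = rows matching both sides / all rows
    confidence = rows matching both sides / rows matching the antecedent
    """
    def matches(row, cond):
        return all(row.get(k) == v for k, v in cond.items())

    n_ante = sum(1 for r in rows if matches(r, antecedent))
    n_both = sum(1 for r in rows if matches(r, antecedent) and matches(r, consequent))
    support = n_both / len(rows)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


for code in ("01", "02", "03"):
    s, c = rule_stats(records, {"REST_USED": code}, {"FATALITY": "YES"})
    print(f"REST_USED={code} ==> FATALITY=YES  support={s:.3f} confidence={c:.3f}")
```

Comparing confidences across restraint codes, rather than raw counts, is what makes "shoulder belt only" comparable with the far more common "lap and shoulder belt" group.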

Page 7

Results: Rollover and Make/Model

• The MAKE_MODEL and ROLLOVER attributes, derived from the Vehicle table, were tested for associations using the Apriori algorithm.

• The values of the MAKE_MODEL variable were derived by merging the values of two separate variables, MAKE and MODEL.

• The ROLLOVER variable is a boolean value indicating the vehicle’s orientation after the accident.

• Five rules were found to be very helpful in understanding why certain make/models tend to roll over more than others during an accident.
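Below is a minimal sketch of the MAKE_MODEL merge followed by a rollover ranking. It assumes an underscore separator and uses a minimum group size as a stand-in for Apriori's minimum support; the make and model names are hypothetical placeholders.

```python
from collections import defaultdict

# Toy vehicle rows; MAKE/MODEL values are hypothetical placeholders.
vehicles = [
    {"MAKE": "MakeA", "MODEL": "Pickup1", "ROLLOVER": True},
    {"MAKE": "MakeA", "MODEL": "Pickup1", "ROLLOVER": True},
    {"MAKE": "MakeA", "MODEL": "Pickup1", "ROLLOVER": False},
    {"MAKE": "MakeA", "MODEL": "Sedan1", "ROLLOVER": False},
    {"MAKE": "MakeA", "MODEL": "Sedan1", "ROLLOVER": False},
    {"MAKE": "MakeB", "MODEL": "SUV2", "ROLLOVER": True},
    {"MAKE": "MakeB", "MODEL": "SUV2", "ROLLOVER": False},
    {"MAKE": "MakeB", "MODEL": "Coupe9", "ROLLOVER": True},
]


def make_model(row):
    """Merge MAKE and MODEL into a single MAKE_MODEL value."""
    return f"{row['MAKE']}_{row['MODEL']}"


def rollover_rates(rows, min_count=2):
    """Share of ROLLOVER=True per MAKE_MODEL, skipping groups below min_count."""
    totals = defaultdict(int)
    rolled = defaultdict(int)
    for row in rows:
        key = make_model(row)
        totals[key] += 1
        rolled[key] += int(row["ROLLOVER"])
    rates = {k: rolled[k] / totals[k] for k in totals if totals[k] >= min_count}
    # Highest rollover share first
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


for key, rate in rollover_rates(vehicles):
    print(f"{key}: {rate:.2f}")
```

Merging the two fields matters because a model name alone can be ambiguous across manufacturers, while the minimum count keeps a single rare vehicle (MakeB_Coupe9 here) from topping the list with a 100% rate.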

Conclusion: On closer inspection of these make/models, we find that they all belong to the light-truck category.

• Safety reviews indicate that the center of gravity affects the chance of rollover.

• These vehicles have a high center of gravity and a short wheelbase, making them more vulnerable to rollover than a passenger car.

Page 8

Results: Air Bag & Fatality

• The variables AIR_BAG and FATALITY produced very interesting results when checked for associations with Apriori. The AIR_BAG variable has the values:

– DEPLOYED (the airbag(s) deployed on the victim’s side)

– NOT_DEPLOYED (there was an airbag for the seat, but for various reasons it did not deploy)

– NOT_AVAILABLE (e.g., a motorcycle or a late-model car)

Conclusion: However, we see that for the age groups listed above, it produces an opposite result.

Page 9

Results: Pedestrian & State

• Apriori was run on the STATE and PEDESTRIAN attributes.

• PEDESTRIAN values: YES if pedestrians were injured or killed, otherwise NO.

• We find the six states most likely to have pedestrians injured or killed in a vehicle accident.
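Confidence alone does not show how far a state sits above the national baseline. One common way to express that is lift: the rule's confidence divided by the overall share of accidents involving pedestrians (Weka's Apriori can rank rules by lift). The sketch below uses invented counts and anonymous state labels S1 to S3, not FARS data.

```python
from collections import Counter

# Invented (STATE, PEDESTRIAN) pairs; S1-S3 are anonymous labels, not real states.
accidents = (
    [("S1", "YES")] * 3 + [("S1", "NO")] * 7
    + [("S2", "YES")] * 4 + [("S2", "NO")] * 16
    + [("S3", "YES")] * 3 + [("S3", "NO")] * 67
)


def state_rules(rows, target="YES"):
    """Confidence and lift of STATE = s ==> PEDESTRIAN = target for each state."""
    base = sum(1 for _, ped in rows if ped == target) / len(rows)
    totals = Counter(state for state, _ in rows)
    hits = Counter(state for state, ped in rows if ped == target)
    stats = {}
    for state, count in totals.items():
        confidence = hits[state] / count
        stats[state] = (confidence, confidence / base)  # lift = confidence / baseline
    return stats


ranked = sorted(state_rules(accidents).items(), key=lambda kv: kv[1][1], reverse=True)
for state, (conf, lift) in ranked:
    print(f"STATE={state} ==> PEDESTRIAN=YES  confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means pedestrians are involved in that state's accidents more often than in the dataset as a whole; a lift below 1 means less often.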

Conclusion: Washington, DC and New York have the highest rates of pedestrian injury per accident. Because DC is a city, it makes sense that pedestrians are involved in almost 29% of its accidents. The pedestrian accident rates in New York and Hawaii, however, leave room for further investigation into the causes.

Page 10

Results: Owner & Drunk Driving

• Apriori on the Owner (self-registered, business/government, rental, etc.) and Drunk Driving variables found an interesting but predictable rule.

• Only 6% of fatalities involving Biz/Gov vehicles involved drinking.

Conclusion: From the second statistic, we see that about 35% of all drinking-and-driving fatalities involved drivers of vehicles they did not own. This suggests that owners should lend their vehicles only to people they trust.
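The 6% and 35% figures are confidences of rules that point in opposite directions: OWNER = BIZ/GOV ==> DRINKING, and DRINKING ==> OWNER is not SELF. The sketch below uses hypothetical counts (not the FARS figures) to show that both come from the same contingency table but divide by different totals.

```python
# Hypothetical counts of fatal crashes keyed by (owner type, drinking involved).
counts = {
    ("SELF", True): 50, ("SELF", False): 350,
    ("BIZ_GOV", True): 4, ("BIZ_GOV", False): 36,
    ("OTHER", True): 26, ("OTHER", False): 134,
}


def conditional(table, given, event):
    """P(event | given) from a dict of (owner, drinking) -> count."""
    den = sum(c for key, c in table.items() if given(key))
    num = sum(c for key, c in table.items() if given(key) and event(key))
    return num / den


# Rule 1: OWNER = BIZ_GOV ==> DRINKING (divides by all BIZ_GOV crashes)
drink_given_biz = conditional(counts, lambda k: k[0] == "BIZ_GOV", lambda k: k[1])
# Rule 2: DRINKING ==> OWNER is not SELF (divides by all drinking crashes)
not_self_given_drink = conditional(counts, lambda k: k[1], lambda k: k[0] != "SELF")
print(f"{drink_given_biz:.3f} {not_self_given_drink:.3f}")
```

Swapping the antecedent and consequent changes the denominator, which is why a small share in one direction can sit alongside a large share in the other.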

Page 11

Results: State & Weather

• The association between state and weather (boolean) revealed that Nevada, New Mexico and California had the fewest fatalities involving adverse weather such as rain, snow or fog.

• The states that reported the most fatalities involving bad weather were New Jersey, Louisiana and Massachusetts.

• This is not to say that bad weather caused the fatality.

• Further research could incorporate other factors, such as time of day and highway type.

Conclusion: It makes sense that Nevada, New Mexico and California, being desert-like states, would have the lowest incidence of fatalities involving adverse weather.

• It is revealing that New Jersey had the most fatalities occurring in bad weather of any state.

• Further investigation could lead to initiatives that reduce the fatality rate in adverse weather conditions.

Page 12

Results: Pedestrian & Light

• Apriori was run on the variables HIT_RUN (hit and run), LGT_COND (light condition at the time of the accident) and PED (pedestrian injured/killed or not).

• The results reveal that pedestrians are most often involved when the light condition is dark or dark but lighted (almost 62% combined).

• Hit-and-runs also tend to happen in dark or dark-but-lighted conditions, especially when a pedestrian is involved (70-72%).
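Three-attribute rules such as HIT_RUN = YES, PED = YES ==> LGT_COND = DARK come from frequent itemsets. Below is a minimal level-wise Apriori sketch in plain Python. It is not Weka's implementation, which additionally lowers the minimum support step by step until it finds the requested number of rules. The transactions are toy data, and the item labels such as "LGT_COND=DARK" are hypothetical encodings.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every individual item
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join: merge pairs of frequent itemsets into size-k candidates.
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


# Toy transactions: one set of ATTRIBUTE=VALUE items per accident.
transactions = [
    {"HIT_RUN=YES", "PED=YES", "LGT_COND=DARK"},
    {"HIT_RUN=YES", "PED=YES", "LGT_COND=DARK"},
    {"HIT_RUN=YES", "PED=YES", "LGT_COND=DAYLIGHT"},
    {"HIT_RUN=NO", "PED=YES", "LGT_COND=DARK"},
    {"HIT_RUN=NO", "PED=NO", "LGT_COND=DAYLIGHT"},
    {"HIT_RUN=NO", "PED=NO", "LGT_COND=DAYLIGHT"},
    {"HIT_RUN=YES", "PED=NO", "LGT_COND=DARK"},
    {"HIT_RUN=NO", "PED=NO", "LGT_COND=DARK"},
]
freq = apriori(transactions, min_support=0.25)
antecedent = frozenset({"HIT_RUN=YES", "PED=YES"})
itemset = antecedent | {"LGT_COND=DARK"}
confidence = freq[itemset] / freq[antecedent]  # support(A and B) / support(A)
print(f"HIT_RUN=YES, PED=YES ==> LGT_COND=DARK  confidence={confidence:.2f}")
```

The prune step relies on the Apriori property: if any subset of an itemset is infrequent, the itemset itself cannot be frequent, so it is never counted.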

Conclusion: We can conclude that being a pedestrian during such hours puts one at great risk of being struck by a vehicle.

• Further research could identify the types of highways most likely to have pedestrian accidents.

• Another important finding, one that police authorities can use for prediction, is that hit-and-run accidents very likely involve dark conditions and a pedestrian being hit.

• One explanation for hit-and-run behavior is that, after such an accident, drivers may find it easy to leave the scene because they think (rightly so) that no one saw what happened or that witnesses would not be able to identify the driver or vehicle.

Page 13

Decision Tree: Rollover

• Attributes:

– Make of vehicle

– Body type of vehicle

– Travel speed at time of accident

– The method of avoidance: BRAKE, STEERING, BRAKE + STEERING, NOT_USED, OTHER

• Class: ROLLOVER (YES or NO)

• The decision tree would help predict which vehicle types are prone to rollover at certain speeds, and which maneuvers can be used to prevent a rollover.

• The decision tree classified 82% of instances correctly.

BODY_TYPE = CAR: NO (26167.0/4134.0)
BODY_TYPE = LIGHT_TRUCK
|   TRAV_SP = 30-59_MPH: NO (4164.0/854.0)
|   TRAV_SP = 75-96_MPH: YES (685.0/211.0)
|   TRAV_SP = 60-74_MPH
|   |   AVOID = BRAKES: NO (143.0/55.0)
|   |   AVOID = STEERING: YES (446.0/149.0)
|   |   AVOID = NOT_USED: NO (752.0/303.0)
|   |   AVOID = STEER_AND_BRAKES: NO (175.0/80.0)
|   TRAV_SP = PARKED: NO (423.0/14.0)
|   TRAV_SP = BELOW_30: NO (775.0/63.0)
|   TRAV_SP = 96+_MPH
|   |   AVOID = BRAKES: NO (371.0/94.0)
|   |   AVOID = STEERING: YES (837.0/387.0)
|   |   AVOID = NOT_USED: NO (2192.0/459.0)
|   |   AVOID = STEER_AND_BRAKES: NO (318.0/142.0)
BODY_TYPE = VAN
|   TRAV_SP = 30-59_MPH: NO (821.0/130.0)
|   TRAV_SP = 75-96_MPH: YES (85.0/22.0)
|   TRAV_SP = 60-74_MPH
|   |   AVOID = BRAKES: NO (33.0/10.0)
|   |   AVOID = STEERING: YES (54.0/20.0)
|   |   AVOID = NOT_USED: NO (139.0/45.0)
|   |   AVOID = STEER_AND_BRAKES: NO (29.0/13.0)
|   TRAV_SP = PARKED: NO (113.0/3.0)
|   TRAV_SP = BELOW_30: NO (260.0/14.0)
|   TRAV_SP = 96+_MPH: NO (1397.0/251.0)
BODY_TYPE = HEAVY/LARGE_TRUCK: NO (4194.0/542.0)
BODY_TYPE = MEDIUM_TRUCK: NO (438.0/59.0)
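In Weka's J48 output, the pair (n/e) after each leaf is the number of training instances that reach that leaf and the number of them misclassified. Summing those pairs gives the tree's training-set accuracy. The sketch below copies the leaf counts from the tree above. It comes out at about 82.1%, in line with the reported 82%, although the slide does not say whether that figure was measured on the training data or by cross-validation.

```python
# (n, e) leaf counts copied from the J48 tree above, grouped by BODY_TYPE:
# n = training instances reaching the leaf, e = instances misclassified there.
LEAVES = {
    "CAR": [(26167, 4134)],
    "LIGHT_TRUCK": [(4164, 854), (685, 211), (143, 55), (446, 149),
                    (752, 303), (175, 80), (423, 14), (775, 63),
                    (371, 94), (837, 387), (2192, 459), (318, 142)],
    "VAN": [(821, 130), (85, 22), (33, 10), (54, 20), (139, 45),
            (29, 13), (113, 3), (260, 14), (1397, 251)],
    "HEAVY/LARGE_TRUCK": [(4194, 542)],
    "MEDIUM_TRUCK": [(438, 59)],
}


def accuracy(leaf_counts):
    """Return (instances, errors, accuracy) summed over a list of (n, e) leaves."""
    total = sum(n for n, _ in leaf_counts)
    errors = sum(e for _, e in leaf_counts)
    return total, errors, (total - errors) / total


all_leaves = [leaf for group in LEAVES.values() for leaf in group]
total, errors, acc = accuracy(all_leaves)
print(f"overall: {total} instances, {errors} errors, accuracy {acc:.3f}")
for body_type, group in LEAVES.items():
    print(f"{body_type}: accuracy {accuracy(group)[2]:.3f}")
```

The per-body-type breakdown shows where the errors concentrate. For example, the light-truck branch, where most of the YES leaves sit, is classified less accurately than the single CAR leaf.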

Page 14

New Knowledge

• Hit-and-runs very frequently happen in low-light conditions

• Pedestrians are very likely to be injured/killed in low-light conditions

• Avoid using steering alone as an avoidance maneuver, since it is linked to rollover

• Always wear a seat belt, even if the vehicle has air bags

• Position your shoulder belt properly and always wear it together with a lap belt