1
Clustering NTSB Accidents Data
Lishuai Li, Rafael Palacios,
R. John Hansman
JUP Quarterly Meeting
Jan. 2010
2
Introduction
Aviation safety has improved significantly over the past 50 years.
For current systems, it is difficult to improve safety further by only correcting the problems found in individual accidents.
Each accident is often induced by several anomalies. Identifying patterns, correlations, and trends in large amounts of aviation accident data can help us understand problems and prevent future incidents.
Boeing, Statistical Summary of Commercial Jet Airplane Accidents, July 2009
[Figure: Accident Rates for Civil Aviation (1989-2008), Commercial Airlines vs. General Aviation. x-axis: Year; y-axis: Accidents per Million Flight Hours.]
Data Source: National Transportation Safety Board
3
Methodology
Research Method:
• Use data-mining techniques to identify patterns in accidents data
• Identify accidents with similar characteristics
• Incorporate findings with narratives to find causalities
Data:
• Subset of NTSB accident database system (ADMS2000)
  • Event Type: Accident only, excluding incident
  • FAR Part: Part 91 (General Aviation); Part 121 (Air Carriers)
  • Aircraft Type: Airplanes only
  • Year: from 2000 to 2005
• Other databases will be considered in future work
Data-mining tools:
• Clustering (e.g. k-means): use a distance function to search for a partitioning of records such that the intra-cluster distance is minimized and the inter-cluster distance is maximized
• Other data-mining techniques will be considered in future studies
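The clustering criterion in the bullet above is commonly formalized as a within-cluster sum of squared distances. The form below is the standard textbook one with Euclidean distance, which is an assumption here because the slides do not state which distance function was used. The symbols $x_i$ (one record), $C_j$ (the j-th cluster), and $\mu_j$ (its centroid) are introduced only for illustration.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

For a fixed data set the total scatter is constant, so minimizing this within-cluster term is equivalent to maximizing the scatter between cluster centroids.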
4
Clustering Method
K-means clustering is a partitioning method.
Data can be partitioned into k mutually exclusive clusters.
K-means clustering finds a partition in which objects within each cluster are as close to each other as possible, and as far from objects in other clusters as possible.
Each data point represents an accident. The attributes of that accident determine where the data point is. K-means clustering can be used to find accidents with similar attributes.
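The partitioning described above can be sketched with the usual alternating assign/update iteration (Lloyd's algorithm). This is a minimal illustration and not the implementation used in the study: the function names, the Euclidean distance, the explicit `initial_centroids` argument, and the toy 2-D points are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its members, until the
    assignment stops changing or max_iter is reached."""
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return labels, centroids


# Toy data (hypothetical): two tight groups, one near (-3, -3), one near (3, 3).
points = [(-3.0, -3.0), (-3.2, -2.9), (-2.9, -3.1),
          (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels, centroids = kmeans(points, initial_centroids=points[:2])
print(labels)
print(centroids)
```

Even though both starting centroids are taken from the same group, the iteration separates the two groups on this toy data. In general k-means only finds a local optimum, so results can depend on the starting centroids.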
[Figure: two scatter plots of example data on axes from -5 to 5, showing Cluster 1, Cluster 2, and Centroids.]
5
Preliminary Results of Clustering NTSB Accidents Data
For this preliminary study, we test whether k-means clustering can identify accidents that are similar in a specified set of attributes.
Apply the k-means clustering method to the subset of NTSB data (Part 91 & Part 121 accidents from 2000 to 2005)
Accident attributes used in clustering:
• Flight Plan Type, Injury Level, Visibility, Phase of Flight
• Location, Day of The Year
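K-means works on numeric points, so categorical attributes such as Flight Plan Type or Phase of Flight have to be mapped to numbers first. The sketch below shows one possible encoding of the first four attributes. The slides do not describe the encoding actually used, so the field names (`flight_plan`, `injury`, `visibility_sm`, `phase`), the 0/1 coding, the ordinal coding of flight phase, and the rescaling to the range 0 to 1 are all assumptions.

```python
# Flight phases in the order they appear in the slides' charts.
PHASES = [
    "Standing/Taxi/Other", "Takeoff", "Climb", "Cruise", "Descent",
    "Maneuver/Hover", "Approach", "Go-Around", "Landing",
]


def encode_accident(record):
    """Map one accident record (a dict with hypothetical field names)
    to a 4-dimensional numeric point with every coordinate in [0, 1]."""
    flight_plan = 1.0 if record["flight_plan"] == "IFR" else 0.0
    injury = 1.0 if record["injury"] == "Fatal" else 0.0
    # Assumes visibility is reported in statute miles on a 0-10 scale;
    # handling of values above 10 is deliberately left out of this sketch.
    visibility = float(record["visibility_sm"]) / 10.0
    # Ordinal position of the phase, rescaled so the last phase maps to 1.
    phase = PHASES.index(record["phase"]) / (len(PHASES) - 1)
    return (flight_plan, injury, visibility, phase)


# Hypothetical example record, not taken from the NTSB data.
example = {
    "flight_plan": "VFR/Other",
    "injury": "Non-Fatal",
    "visibility_sm": 10,
    "phase": "Landing",
}
point = encode_accident(example)
print(point)
```

Rescaling matters because a distance-based method otherwise lets the attribute with the largest numeric range dominate the clustering.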
6
Phase of Flight & Visibility Characteristics for Part 91 Accidents
(2000-2005)
The general characteristics of accidents with respect to individual variables are commonly known
• Accidents are more likely to happen in very low visibility conditions
• High rate of accidents during takeoffs and landings
All events with visibility > 10 are put into the same group as those with visibility = 0
[Figure: Visibility. No. of Events by visibility in Statute Miles.]
[Figure: Phase of Flight. No. of Events by phase: Standing/Taxi/Other, Takeoff, Climb, Cruise, Descent, Maneuver/Hover, Approach, Go-Around, Landing.]
7
Phase of Flight & Visibility Characteristics by Flight Plan Type
[Figure: Visibility Distribution of Part 91 Accidents (2000-2005), VFR/Other vs. IFR. Number of events by visibility (Statute Miles, 0-10), with separate count scales for VFR/Other and IFR.]
[Figure: Phase of Flight Distribution of Part 91 Accidents (2000-2005), VFR/Other vs. IFR. Number of events by phase (Standing/Taxi/Other through Landing), with separate count scales for VFR/Other and IFR.]
8
Phase of Flight & Visibility Characteristics by Injury Level
[Figure: Visibility Distribution of Part 91 Accidents (2000-2005), Non-Fatal vs. Fatal. Number of events by visibility (Statute Miles, 0-10), with separate count scales for Non-Fatal and Fatal.]
[Figure: Phase of Flight Distribution of Part 91 Accidents (2000-2005), Non-Fatal vs. Fatal. Number of events by phase (Standing/Taxi/Other through Landing), with separate count scales for Non-Fatal and Fatal.]
9
Clustering by Flight Plan Type, Injury Level, Flight Phase, and Visibility
[Figure legend: Clusters 1-6 arranged by Flight Plan Type (VFR/Other, IFR) and Injury Level (Non-Fatal, Fatal).]
Combine all the information in 4 dimensions to cluster similar accidents
Accidents are clearly separated into 4 categories by Flight Plan Type and Injury Level.
IFR accidents and Fatal accidents are more evenly spread over Phase of Flight and Visibility.
VFR/Non-Fatal accidents are concentrated in 3 regions: low visibility, high visibility in the initial phases of flight, and high visibility in landings.
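One way to check a statement such as "accidents are clearly separated into 4 categories" is to list which category combinations occur inside each cluster. The sketch below does this for a handful of made-up records. The field names, the helper names, and the toy cluster assignments are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def categories_by_cluster(cluster_ids, records):
    """Return {cluster id: set of (injury, flight_plan) pairs seen in it}."""
    seen = defaultdict(set)
    for cid, rec in zip(cluster_ids, records):
        seen[cid].add((rec["injury"], rec["flight_plan"]))
    return dict(seen)


def is_cleanly_separated(cluster_ids, records):
    """True if every cluster contains exactly one category combination."""
    table = categories_by_cluster(cluster_ids, records)
    return all(len(cats) == 1 for cats in table.values())


# Toy records (hypothetical) with hand-assigned cluster ids.
records = [
    {"injury": "Non-Fatal", "flight_plan": "VFR/Other"},
    {"injury": "Non-Fatal", "flight_plan": "VFR/Other"},
    {"injury": "Fatal", "flight_plan": "IFR"},
    {"injury": "Fatal", "flight_plan": "VFR/Other"},
    {"injury": "Non-Fatal", "flight_plan": "IFR"},
]
cluster_ids = [1, 3, 2, 4, 6]

table = categories_by_cluster(cluster_ids, records)
separated = is_cleanly_separated(cluster_ids, records)
print(table)
print(separated)
```

A category may still be split across several clusters, as long as no cluster mixes categories.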
[Figure: Phase of Flight (Standing/Taxi/Other through Landing) vs. Visibility (statute miles, 0-10), with accidents marked by cluster (1-6).]
10
Accidents Characteristics by Clusters
[Figure: four scatter panels of Phase of Flight (1-9) vs. Visibility (statute miles, 0-10), one per category: Non-Fatal, VFR/Other (Clusters 1, 3, and 5); Fatal, IFR (Cluster 2); Fatal, VFR/Other (Cluster 4); Non-Fatal, IFR (Cluster 6).]
[Figure: for each category, histograms of No. of Events by Phase of Flight (Standing/Taxi/Other through Landing) and by Visibility (Statute Miles, 0-10): Cluster 4; Cluster 2; Cluster 6; Clusters 1, 3, and 5.]
11
Locations and Day of The Year of Part 91 Accidents (2000-2005)
Total number of accidents included: 6819
[Figure: two panels, Location Distribution and Time Distribution (No. of Events by month, Jan-Dec).]
12
Clustering Part 91 Accidents by Location & Day of The Year
[Figure: Number of Events by month (Jan-Dec) for Cluster 9 and Cluster 10.]
Accidents are automatically classified by location and time of the year.
The two variables, location and day of the year, are not enough to create clusters with potential safety implications.
13
Locations and Day of The Year of Part 121 Accidents (2000-2005)
Total number of accidents included: 157
[Figure: two panels, Location Distribution and Time Distribution (No. of Events by month, Jan-Dec).]
14
Clustering Part 121 Accidents by Location & Day of the Year
Accidents sharing similar locations and time information are clustered together (12 clusters)
[Figure: Day of The Year (0-400) vs. Cluster ID (1-12), showing Centroid and Actual Data.]
15
Accidents in Cluster 2
Cluster 2 includes 5 Caribbean accidents
• Accidents on 4/22/2002, 2/25/2003, 4/6/2003, and 4/24/2003 were caused by turbulence
• Accident on 2/8/2003 was caused by a passenger stair handrail collapsing
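The "Day of The Year" attribute can be illustrated with the five dates listed above. The sketch below converts them to ordinal day numbers, assuming the dates are written as month/day/year and that day of year simply counts from 1 on January 1. The slides do not say how the attribute was computed or scaled, so this is only one plausible reading. The helper name `day_of_year` is hypothetical.

```python
from datetime import date


def day_of_year(d):
    """Ordinal day within the year: 1 for January 1."""
    return d.timetuple().tm_yday


# The five Cluster 2 accident dates from the slide, read as month/day/year.
cluster2_dates = [
    date(2002, 4, 22),
    date(2003, 2, 25),
    date(2003, 4, 6),
    date(2003, 4, 24),
    date(2003, 2, 8),
]
days = [day_of_year(d) for d in cluster2_dates]
print(days)  # [112, 56, 96, 114, 39]
```

All five values fall between day 39 and day 114, that is, from early February to late April.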
16
Summary & Future Work
Data-mining methods can combine multi-dimensional information at the same time.
Clustering methods can partition accidents according to specified attributes.
Future Work:
• Develop a systematic approach for including important variables in the clustering method
• Explore other data-mining techniques to review safety data in a new way
• Investigate other possible safety data sources, e.g. accidents, ATC operation errors
• Identify patterns in accidents, or various anomalies, that can reveal subtle causalities underlying the large amount of data