BI PPT Finale
Uploaded by: vipul-neema
Predict Aircraft Damage Upon Bird Strike
Group 01
Members: Ketan Bansal, Mayank Aterkar, Ritu Pandey, Roheen Chaturvedi, Vipul Neema
AGENDA
- Problem definition and background
- Data set description
- Variable description
- Preprocessing and predictive analysis
- Model comparison
- Conclusion
- Future scope
PROBLEM BACKGROUND AND DESCRIPTION
- Bird and other wildlife strikes with aircraft cause over $900 million in annual damage to U.S. civil and military aviation.
- These strikes put the lives of aircraft crew members and their passengers at risk.
- Over 250 people have been killed worldwide as a result of wildlife strikes since 1988.
- According to Bird Strike Committee USA, a recent bird strike accident involving a Transavia B738 at Girona on Jul 11th 2014 left the plane under repair for 10 days (http://avherald.com/h?article=4774b5a9&opt=0).
- The data cover incidents involving airplane bird strikes in the United States.
KEY FEATURES OF THE DATASET
- A total of 37 input variables
- Around 99,404 records
- Source: the Federal Aviation Administration website
- Target variable: Effect_Indicated_Damage
VARIABLE DESCRIPTION
PREPROCESSING AND PREDICTIVE ANALYSIS
FILE IMPORT
- Load the data into the data source
- Check each variable's summary, significance and role
- Relevant variables are selected by the Variable Selection node
- Target variable: Effect: Indicated Damage
VARIABLE SELECTION
- Used to identify the variables that are important for predicting the target variable
- A few variables, such as Flight Date and Record_ID, were rejected manually
- They didn't have any effect on the outcome of the target variable
IMPUTE
- Missing values are replaced instead of removing the record altogether.
- Mean, median and count are the most commonly used methods for this.
- Missing values in interval variables were replaced by the mean.
- Count was used to fill in the missing values in nominal variables.
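The imputation rule above can be sketched in a few lines of Python. This is only an illustration, not the SAS node itself: it assumes "count" means filling with the most frequent level, and the function name `impute` and the column names `altitude` and `wildlife_size` are hypothetical.

```python
from collections import Counter
from statistics import mean


def impute(records, interval_cols, nominal_cols):
    """Return a copy of records with None values filled in.

    Interval columns get the mean of the observed values; nominal columns
    get the most frequent observed level (our reading of "count").
    """
    fills = {}
    for col in interval_cols:
        observed = [r[col] for r in records if r[col] is not None]
        fills[col] = mean(observed)
    for col in nominal_cols:
        observed = [r[col] for r in records if r[col] is not None]
        fills[col] = Counter(observed).most_common(1)[0][0]
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in records
    ]


# Made-up toy records, not rows from the FAA data.
records = [
    {"altitude": 1000.0, "wildlife_size": "Small"},
    {"altitude": None, "wildlife_size": "Small"},
    {"altitude": 3000.0, "wildlife_size": None},
    {"altitude": 2000.0, "wildlife_size": "Large"},
]
filled = impute(records, ["altitude"], ["wildlife_size"])
```

The original records are left untouched, so the imputed copy can be compared against the raw data.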
SAMPLE
- Used to get a sample of the data that reflects the whole dataset
- Over-sampling was required in the dataset
- Needed to bias the classification toward the rare event
- Records with the positive target value were a rare event in the original dataset
- Stratified sampling is used with Equal as the criterion
- Puts a higher proportion of rare-event observations in the sample than in the original data
- After the node was applied, there were equal numbers of records for both classes, about 15,000 in total
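A rough Python sketch of equal-size stratified sampling follows. It assumes the Equal criterion keeps the same number of records per target class by down-sampling the larger class to the size of the rarest one; the function name `balanced_sample`, the field name `damage` and the toy data are all hypothetical.

```python
import random


def balanced_sample(records, target, seed=0):
    """Draw the same number of records from every target class."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[target], []).append(r)
    # The rarest class decides how many records each class contributes.
    n = min(len(group) for group in by_class.values())
    sample = []
    for group in by_class.values():
        sample.extend(rng.sample(group, n))
    rng.shuffle(sample)
    return sample


# Toy data: 1 in 10 records has the positive (rare) target value.
records = [{"id": i, "damage": 1 if i % 10 == 0 else 0} for i in range(1000)]
sample = balanced_sample(records, "damage")
```

With 100 positive and 900 negative toy records, the sample holds 100 of each, so the rare event makes up half of the sample instead of a tenth.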
DATA PARTITION
The dataset generated by the previous nodes is partitioned into training, validation and testing data.
- Training: model building
- Validation: avoid over-fitting
- Testing: final assessment of the model
- Ratio used: 50:30:20
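As a minimal sketch, the 50:30:20 split can be reproduced in Python as below. The helper `partition` is hypothetical and uses a plain random shuffle; the deck does not say how the Data Partition node assigns records.

```python
import random


def partition(records, percents=(50, 30, 20), seed=0):
    """Shuffle records and split them into train/validation/test parts."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * percents[0] // 100
    n_valid = n * percents[1] // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# 15,000 stands in for the balanced sample size mentioned in the deck.
train, valid, test = partition(range(15000))
```

Any remainder from integer division falls into the test set, so no record is dropped.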
Predictive Analysis
- Decision Tree
- Regression: logistic regression
- Selection methods: Forward, Backward, Stepwise, None
- No need to create a dummy variable for the target!
DECISION TREE
Tree ended up with 16 leaf nodes.
Variable importance, in the order given by SAS:
1. Aircraft Airline operator
2. Wildlife Size
3. Wildlife number struck
4. Altitude Bin
5. Phase of flight
Cumulative lift = 1.84
Accuracy = 73.8%
REGRESSION
Regression helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the others are held fixed.
Regression can be either linear or logistic; since our target variable is binary, we used logistic regression.
LOGIT REGRESSION
Cumulative lift = 1.92
The confusion matrix: the true positives and true negatives add up, as a share of all records, to give the accuracy rate for the model.
Accuracy = 79.6%
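To make the relationship concrete, here is a short Python sketch that tallies a confusion matrix and derives accuracy from it. The labels are made up for illustration and are not the deck's results; the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for a binary target (1/0)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of records classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def misclassification_rate(tp, tn, fp, fn):
    """Share of records classified incorrectly (1 - accuracy)."""
    return (fp + fn) / (tp + tn + fp + fn)


# Made-up labels: 10 records, 8 of them classified correctly.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(actual, predicted)
```

Accuracy and misclassification rate always sum to 1, which is why the model comparison can rank models on either measure.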
Other Logit Regression Methods
Backward Regression: Accuracy = 76.73%
Forward Regression: Accuracy = 76.73%
Stepwise Regression: Accuracy = 76.73%
The forward-selection technique begins with no variables in the model. For each of the independent variables, the FORWARD method calculates statistics that reflect the variable's contribution to the model if it is included.
All give the same accuracy! Why?
Optimizing the Results
- Tried different ratios in the Data Partition node
- Increased the maximum number of classes in the Variable Selection node
- Tried different stratification criteria in the Sample node
- Arrived at optimal levels of accuracy for all the models
So Which Model Is the Best Fit? The Model Comparison Node tries to answer that!
On what basis? Misclassification rate!
How to interpret the results? Correlation is not causation!
Comparison of Misclassification Rate for All Models

Model                  Misclassification rate
Logistic Regression    .22944
Regression stepwise    .23143
Regression backward    .23143
Regression forward     .23143
Decision Tree          .24182
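The ranking rule is simple enough to express directly. The sketch below uses the misclassification rates from the table and picks the model with the lowest one; it only mimics the selection criterion, not the Model Comparison node itself.

```python
# Misclassification rates as reported in the comparison table.
rates = {
    "Logistic Regression": 0.22944,
    "Regression stepwise": 0.23143,
    "Regression backward": 0.23143,
    "Regression forward": 0.23143,
    "Decision Tree": 0.24182,
}

# Lower misclassification rate is better; ties keep their table order.
ranked = sorted(rates.items(), key=lambda item: item[1])
best_model, best_rate = ranked[0]
print(f"Best fit: {best_model} ({best_rate:.5f})")
```

On these figures the plain logistic regression comes out ahead, with the three selection-method regressions tied behind it and the decision tree last.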
MODEL COMPARISON
Result of the Model Comparison Node
Comparison of cumulative lift charts for all the models. The curve for Regression has been highlighted.
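For readers unfamiliar with the measure, cumulative lift at a given depth is the event rate among the top-scored records divided by the overall event rate. The Python sketch below illustrates this on made-up scores; the depth at which the deck's lift figures were read is not stated, so the 20% used here is an assumption, as is the function name.

```python
def cumulative_lift(actual, scores, depth=0.2):
    """Event rate in the top `depth` share of scored records over the base rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


# Made-up outcomes (1 = damage) and model scores for ten records.
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
lift = cumulative_lift(actual, scores, depth=0.2)
```

A lift above 1 means the model concentrates the rare event in its highest-scored records better than random selection would.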
Shortcomings! Where the Model could have been Improved!
There were a large number of missing values originally present in the data set. The accuracy of all the models would have been better had there been fewer missing values.
Also, to account for the rare event, we had to compress our data set to only 15% of its original size, so the results were not that accurate.
Business Application
Problem:
- 250 people killed since 1988
- $900 million/year (by defense & civil aviation)
Causes/Reasons & Possible Solutions:

Cause/Reason: Intersection of aviation routes and bird migration routes
Solutions: Change the aviation routes at various points in the year. The Wildlife Management department at the airport should invest more in analyzing the changes in routes due to landing issues and air traffic.

Cause/Reason: Airports and aviation settlements near natural habitat and places like ponds
Solutions: In general, airports and aviation centers are away from the cities, in areas that are near the natural habitat of wildlife and places like ponds.

Cause/Reason: Height of travelling
Solutions: Heights of travel should be changed from month to month according to migration and changes in the flocks' natural flying heights.
Bird Strikes and its effects
Prevention Better than Cure
Existing Solutions:
- Spiral Marks on the Turbine Fans
The spiral appears to be dangerous
Preventive Measures
- Technological changes in the manufacturing of turbines and engines, because once birds are sucked in, the engine fails
- Increasing the number of engines, for the same reason
- Campaigns like Strike Out, started by the aviation industry in the USA
- Bird Strike Committee USA Steering Committee
Strike Out: helps planes detect birds and other planes nearby, preventing collisions with both
Steering Committee: contains members from the organizations below, dedicated to preventing bird strikes at the national level
- Federal Aviation Administration
- U.S. Department of Agriculture
- Department of Defense
- U.S. Airports
- Private Sector Services
- Airlines
- Aerospace Industry
Future Scope
- Measures of the areas most affected by airplane bird strikes
- Cost of repair, using cluster analysis, hidden patterns, causation and correlation
- Which of the existing solutions is best