Severe Weather Data Analysis
Nicholas Brooks STAT 4001 12/8/2015
Objective
Analyze severe weather data and identify patterns or trends that help explain the behavior of severe weather
The Problem
Over recent decades there have been noticeable changes in the behavior of severe weather. Severe weather appears to be becoming more violent and more frequent than in the past.
The Data
Original dataset had 1.3 million observations
Cleaning and aggregating this massive dataset was tedious
Final dataset consists of severe weather observations from 1950 to 2015
Includes variables such as Damage ($), Event type, Temperature, and Total Count of severe weather events
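The cleaning and aggregation step might look roughly like the sketch below. It is not the author's actual code. The toy rows, the parse_damage helper, and the assumption that damage is stored as text with K/M/B suffixes (with blanks treated as zero) are all illustrative.

```r
# Hypothetical raw rows, loosely shaped like storm event detail records.
raw <- data.frame(
  YEAR = c(1990, 1990, 1991, 1991, 1991),
  EVENT_TYPE = c("Tornado", "Hail", "Tornado", "Flood", "Hail"),
  DAMAGE_PROPERTY = c("10.00K", "2.5M", "", "1B", "500"),
  stringsAsFactors = FALSE
)

# Convert strings such as "10.00K" or "2.5M" to dollars (assumed encoding).
parse_damage <- function(x) {
  x <- toupper(trimws(x))
  suffix <- sub("^[0-9.]*", "", x)            # trailing letter, if any
  value <- as.numeric(sub("[A-Z]*$", "", x))  # numeric part; NA when blank
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[suffix]
  mult[is.na(mult)] <- 1                      # no suffix: plain dollars
  unname(ifelse(is.na(value), 0, value * mult))  # assumption: blank means 0
}

raw$damage <- parse_damage(raw$DAMAGE_PROPERTY)

# Aggregate to one row per year: total damage and total event count.
yearly <- aggregate(damage ~ YEAR, data = raw, FUN = sum)
counts <- aggregate(EVENT_TYPE ~ YEAR, data = raw, FUN = length)
names(counts)[2] <- "TotalCount"
yearly <- merge(yearly, counts, by = "YEAR")
print(yearly)
```

Treating blank damage as zero is a modeling choice; keeping it as NA and dropping those rows would be an equally defensible alternative.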
Descriptive Statistics I
Descriptive Statistics II
Descriptive Statistics Summary
Total event count plotted over time shows a rapid increase in the frequency of severe weather
Property damage ($) plotted over time shows a noticeable increase in damage dealt
Alarming increases begin in the 1990s
At this rate, once the next 5 years are recorded, 2010 to 2019 could prove to be the decade with the most severe weather activity
Time series forecast
The time series plot of Property Damage ($) is shown before log transformation on the left and after log transformation on the right
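A minimal sketch of the before-and-after comparison, using a simulated damage series in place of the real data (the growth rate, noise level, and variable names are assumptions):

```r
set.seed(1)
years <- 1950:2015

# Hypothetical series: exponential growth with multiplicative noise.
damage <- exp(15 + 0.08 * (years - 1950) + rnorm(length(years), sd = 0.5))
damage_ts <- ts(damage, start = 1950, frequency = 1)

# The log transformation turns multiplicative growth into a roughly linear
# trend and stabilizes the variance.
log_damage_ts <- log(damage_ts)

op <- par(mfrow = c(1, 2))  # two panels side by side
plot(damage_ts, main = "Before log transformation",
     ylab = "Property Damage ($)")
plot(log_damage_ts, main = "After log transformation",
     ylab = "log(Property Damage ($))")
par(op)
```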
Information for linear modeling
Data used in the following simple and multiple linear regression models are the April observations for each year from 1955 to 2015
Unfortunately, these models do not treat the yearly April data as a time series
We proceed with the simple and multiple linear models, ignoring the time ordering of the observations
Linear modeling I
lm(formula = DAMAGE_PROPERTY ~ TotalCount)

Coefficients:
(Intercept)   TotalCount
  -27213230       214440

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -27213230   78320213  -0.347    0.729
TotalCount     214440      26280   8.160 2.94e-11 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Multiple R-squared: 0.5302, Adjusted R-squared: 0.5222
F-statistic: 66.58 on 1 and 59 DF, p-value: 2.939e-11
Linear modeling I
Total Count of severe weather events is moderately correlated with Property Damage ($): the model explains about 53% of the variance (R-squared = 0.5302)
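> AIC(modelfff)
[1] 2607.031
AIC has no absolute scale, so this value is not high or low on its own. It is meaningful only in comparison with other models fit to the same response, and we use it to compare against a multiple linear regression model that extends this simple linear model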
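To see why the AIC of a model with a dollar-valued response comes out so large, it can help to compute it by hand. The sketch below uses simulated data (not the April dataset) and checks the standard Gaussian formula against R's AIC(). The simulated coefficients and noise level are assumptions.

```r
set.seed(42)
n <- 61

# Hypothetical stand-ins for the yearly April variables.
TotalCount <- round(runif(n, 100, 6000))
DAMAGE_PROPERTY <- 2e5 * TotalCount + rnorm(n, sd = 4e8)

fit <- lm(DAMAGE_PROPERTY ~ TotalCount)

# For a Gaussian linear model:
#   AIC = n * (log(2 * pi) + log(RSS / n) + 1) + 2 * k
# where k counts the regression coefficients plus the error variance.
rss <- sum(residuals(fit)^2)
k <- length(coef(fit)) + 1
aic_manual <- n * (log(2 * pi) + log(rss / n) + 1) + 2 * k

print(c(manual = aic_manual, builtin = AIC(fit)))
```

Because log(RSS / n) grows with the units of the response, a model of damage in dollars will show a large AIC regardless of how well it fits.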
Linear modeling I
> library(MASS)  # provides boxcox()
> b=boxcox(modelfff)
> y1<-DAMAGE_PROPERTY^.2  # power transformation with exponent 0.2
> modelffh=lm(y1~TotalCount)
> modelffh

Call: lm(formula = y1 ~ TotalCount)

Coefficients:
(Intercept)   TotalCount
  34.784599     0.004806
Linear modeling I
lm(formula = y1 ~ TotalCount)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.478e+01  1.824e+00  19.076  < 2e-16 ***
TotalCount  4.806e-03  6.119e-04   7.855 9.62e-11 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Multiple R-squared: 0.5112, Adjusted R-squared: 0.5029
F-statistic: 61.7 on 1 and 59 DF, p-value: 9.619e-11
> AIC(modelffh)
[1] 462.8133
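Note that this AIC is computed on the transformed scale (DAMAGE_PROPERTY^0.2), so it cannot be compared directly with the 2607.031 of the untransformed model. One standard way to put the two on the same scale is to add the log-Jacobian of the transformation to the log-likelihood. The sketch below illustrates this on simulated data; the data and variable names are assumptions, not the author's.

```r
set.seed(7)
n <- 61
x <- runif(n, 100, 6000)

# Hypothetical positive response that is linear on the y^0.2 scale.
y <- (30 + 0.005 * x + rnorm(n, sd = 3))^5

fit_raw <- lm(y ~ x)        # model on the original scale

lambda <- 0.2
z <- y^lambda               # power-transformed response
fit_pow <- lm(z ~ x)        # model on the transformed scale

# Change of variables: logLik_y = logLik_z + sum(log(dz/dy)),
# with dz/dy = lambda * y^(lambda - 1).
log_jacobian <- sum(log(lambda) + (lambda - 1) * log(y))
aic_pow_on_y_scale <- AIC(fit_pow) - 2 * log_jacobian

print(c(raw = AIC(fit_raw),
        transformed_unadjusted = AIC(fit_pow),
        transformed_adjusted = aic_pow_on_y_scale))
```

Only the adjusted value can be compared with the AIC of the untransformed model.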
Linear modeling II
The multiple regression model predicts Property Damage ($) using more independent variables than the simple linear model
lm(formula = DAMAGE_PROPERTY ~ DAMAGE_CROPS + Hail + Tornado + Heavy.Rain + Flash.Flood + Flood + Wildfire + High.Wind + Heavy.Snow + Drought + Heat + Tropical.Storm)

Coefficients:
   (Intercept)    DAMAGE_CROPS            Hail         Tornado      Heavy.Rain
    -3.389e+08      -9.500e-01       1.491e+05       3.922e+06      -8.237e+06
   Flash.Flood           Flood        Wildfire       High.Wind      Heavy.Snow
    -2.674e+06       3.911e+06      -1.542e+07       1.549e+06      -3.781e+06
       Drought            Heat  Tropical.Storm
     2.616e+06      -7.827e+06       9.985e+08
Linear modeling II
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)    -3.389e+08  7.462e+07  -4.541 3.77e-05 ***
DAMAGE_CROPS   -9.500e-01  4.109e-01  -2.312 0.025103 *
Hail            1.491e+05  8.191e+04   1.821 0.074903 .
Tornado         3.922e+06  5.065e+05   7.743 5.40e-10 ***
Heavy.Rain     -8.237e+06  3.769e+06  -2.185 0.033771 *
Flash.Flood    -2.674e+06  1.077e+06  -2.484 0.016540 *
Flood           3.911e+06  9.482e+05   4.124 0.000147 ***
Wildfire       -1.542e+07  5.306e+06  -2.906 0.005519 **
High.Wind       1.549e+06  5.276e+05   2.936 0.005085 **
Heavy.Snow     -3.781e+06  1.532e+06  -2.468 0.017187 *
Drought         2.616e+06  7.477e+05   3.499 0.001017 **
Heat           -7.827e+06  3.053e+06  -2.563 0.013553 *
Tropical.Storm  9.985e+08  3.893e+08   2.565 0.013504 *

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Linear modeling II
Multiple R-squared: 0.8522, Adjusted R-squared: 0.8153
F-statistic: 23.07 on 12 and 48 DF, p-value: 6.5e-16
> AIC(modelhhh)
2558.475 < 2607.031
The AIC of the multiple linear regression is lower than that of the simple linear regression, so the multiple model offers a better trade-off between fit and complexity for the same response. AIC does not measure multicollinearity, which needs a separate diagnostic
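One common diagnostic for multicollinearity is the variance inflation factor (VIF), computed by regressing each predictor on the others. The sketch below does this in base R on simulated event counts. The data are hypothetical, and Hail is deliberately built to be collinear with Tornado to show what a high VIF looks like.

```r
set.seed(3)
n <- 61

# Hypothetical yearly event counts.
Tornado <- rpois(n, 80)
Hail <- Tornado + rpois(n, 5)   # deliberately collinear with Tornado
Flood <- rpois(n, 40)           # independent of the other two
X <- data.frame(Tornado, Hail, Flood)

# VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing
# predictor j on all the other predictors.
vif <- sapply(names(X), function(v) {
  f <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif, 2))
```

Values near 1 indicate little collinearity; a frequently used rule of thumb flags values above 5 or 10 for closer inspection.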
Conclusion
The analysis of severe weather data supports the view that patterns and relationships exist within the data.
Although the plots show rapid increases in recorded observations over time, the catalyst or catalysts behind this behavior remain unexplained
Going forward, we should anticipate that severe weather may occur more frequently and more violently
Works Cited
Storm Events Database. National Oceanic and Atmospheric Administration. Web. 7 Dec. 2015. http://www.ncdc.noaa.gov/stormevents/ftp.jsp