Data Science: Can weather predict Bikeshare usage?

Mother Nature’s Impact on Bike Ridership Jackie Zajac Kays Fattal Naumaan Nasir


A project for the UMD data science course using freely available data from Capital Bikeshare.

Transcript of Data Science: Can weather predict Bikeshare usage?

Page 1: Data Science: Can weather predict Bikeshare usage?

Mother Nature’s Impact on Bike Ridership

Jackie Zajac

Kays Fattal

Naumaan Nasir

Page 2: Data Science: Can weather predict Bikeshare usage?

Does weather have a relationship with bike ridership?

Can we predict bike usage based on weather?

Page 3: Data Science: Can weather predict Bikeshare usage?

INTRODUCTION

• Our team

• Research questions

• Picking datasets

• Our audience

Page 4: Data Science: Can weather predict Bikeshare usage?

METHODOLOGY

• Why linear regression?

• How we manipulated the data

• MySQL engine aggregated the 3M-row table into sums of rental counts and duration

• Mashed up with 731 rows of weather data (2011, 2012)

• Added a Year field

• Tools: Excel, MySQL database, R (Rattle)
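The aggregate-then-join step above can be sketched in a few lines. This is a minimal illustration, not the team's actual MySQL query: the record layout, the field names (`rental_count`, `total_duration`, `max_temp`, `precipitation`), the sample values, and the choice of one row per day are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical trip records: (start date, duration in seconds).
trips = [
    (date(2011, 1, 1), 600),
    (date(2011, 1, 1), 900),
    (date(2012, 1, 1), 1200),
]

# Hypothetical daily weather rows keyed by date (values are made up).
weather = {
    date(2011, 1, 1): {"max_temp": 45.0, "precipitation": 0.0},
    date(2012, 1, 1): {"max_temp": 52.0, "precipitation": 0.1},
}


def aggregate_trips(trips):
    """Sum rental counts and total duration per day (the GROUP BY step)."""
    daily = defaultdict(lambda: {"rental_count": 0, "total_duration": 0})
    for day, duration in trips:
        daily[day]["rental_count"] += 1
        daily[day]["total_duration"] += duration
    return dict(daily)


def join_with_weather(daily, weather):
    """Inner-join daily totals with weather rows and add a Year field."""
    rows = []
    for day in sorted(daily):
        if day in weather:  # keep only days present in both datasets
            rows.append({"date": day, "year": day.year, **daily[day], **weather[day]})
    return rows


rows = join_with_weather(aggregate_trips(trips), weather)
```

Each resulting row holds one day's rental totals next to that day's weather, which is the shape a regression tool expects.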

Page 5: Data Science: Can weather predict Bikeshare usage?

METHODOLOGY

• Picking our best configuration

• Categoric vs. numeric variables

• Must decide how to measure bike usage

• Must pick best variables

• Error analysis

Page 6: Data Science: Can weather predict Bikeshare usage?

PHASE I

• Began with a broad study of six regressions

• Two target variables (rental counts, duration)

• Three temperature measures

• Minimum, Average, Maximum

• Chunked the day into three time ranges to reflect temperature during bike rides

• Evaluated multiple weather variables’ effect on regressions

• Ignored Date field
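One reading of the "six regressions" is every pairing of the two target variables with the three temperature measures. The sketch below only enumerates those pairings; the column names are hypothetical and no model is fitted here.

```python
from itertools import product

# Hypothetical column names for the two targets and three temperature measures.
TARGETS = ["rental_count", "total_duration"]
TEMPERATURE_MEASURES = ["min_temp", "avg_temp", "max_temp"]


def regression_configs(targets, temperature_measures):
    """Return one configuration per (target, temperature measure) pair."""
    return [
        {"target": target, "temperature": measure}
        for target, measure in product(targets, temperature_measures)
    ]


configs = regression_configs(TARGETS, TEMPERATURE_MEASURES)
for config in configs:
    print(config["target"], "~", config["temperature"])
```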

Page 7: Data Science: Can weather predict Bikeshare usage?

Plots

Page 8: Data Science: Can weather predict Bikeshare usage?

PHASE II

• Combining the data sets

• Picking best variables:

• Bike rental counts as sole target variable

• Maximum temperature

• Utilized date/year field

• Switched Snow to categoric variable

• Analyzed and refined our regression

• Higher accuracy – R-squared = .8374 or 83.74%
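Switching Snow to a categoric variable means replacing the snowfall number with a range label and giving each range its own indicator. A rough sketch follows, using the range edges listed on the final model slide. How the original tool treated values that fall exactly on an edge is not stated, so the rule used here (a shared edge goes to the lower range) is an assumption.

```python
from bisect import bisect_left

# Range edges in inches, as listed on the final model slide.
SNOW_EDGES = [0.0, 1.2, 2.0, 3.1, 3.9]
SNOW_LABELS = ["[0.0-1.2]", "[1.2-2.0]", "[2.0-3.1]", "[3.1-3.9]"]


def snow_category(inches):
    """Map a snowfall amount to its range label."""
    if not SNOW_EDGES[0] <= inches <= SNOW_EDGES[-1]:
        raise ValueError(f"snowfall {inches} is outside the known ranges")
    # Assumption: a value on a shared edge belongs to the lower range.
    index = max(bisect_left(SNOW_EDGES, inches) - 1, 0)
    return SNOW_LABELS[index]


def one_hot_snow(inches):
    """Return a 0/1 indicator per range, with exactly one indicator set."""
    chosen = snow_category(inches)
    return {label: int(label == chosen) for label in SNOW_LABELS}
```

With indicators like these, a regression can assign each snow range its own weight instead of forcing a single slope across all snowfall amounts.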

Page 9: Data Science: Can weather predict Bikeshare usage?

MSE and R-squared

• A measure of accuracy in one dataset predicting another

• Relationship between R-squared and MSE
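The two measures are linked by the standard identity R-squared = 1 - MSE / Var(actual), where the variance is the population variance of the observed values and both are computed on the same data. The sketch below uses made-up numbers rather than the project's data.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """R-squared computed as 1 - MSE / population variance of the observations."""
    mean_actual = sum(actual) / len(actual)
    variance = sum((a - mean_actual) ** 2 for a in actual) / len(actual)
    return 1.0 - mean_squared_error(actual, predicted) / variance


# Hypothetical rental counts and model predictions.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
```

A lower MSE on the same data always means a higher R-squared, because the variance of the observed values does not change.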

Page 10: Data Science: Can weather predict Bikeshare usage?


Page 11: Data Science: Can weather predict Bikeshare usage?

FINAL MODEL

Y = Intercept + sum of (Weight × Variable)

Weight       Variable
-4004.501    Intercept
62.118       Maximum Temperature
-132.741     Average Wind
93.162       Precipitation
416.818      Visibility
2063.069     Year
-161.038     Snow [0.0-1.2] inches
-4.945       Snow [1.2-2.0] inches
-588.349     Snow [2.0-3.1] inches
-5.390       Snow [3.1-3.9] inches
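To show how the weights combine into a prediction, here is a small sketch that evaluates the linear model for one day. The weights come from the slide. The feature names, the units, the 0/1 encoding of Year, and the treatment of snow as an optional range label are assumptions, since the slides do not state them. The input values are invented.

```python
INTERCEPT = -4004.501

# Numeric weights from the slide; the dictionary keys are hypothetical names.
NUMERIC_WEIGHTS = {
    "max_temp": 62.118,
    "avg_wind": -132.741,
    "precipitation": 93.162,
    "visibility": 416.818,
    "year": 2063.069,
}

# Snow is categoric: at most one range applies per day.
SNOW_WEIGHTS = {
    "[0.0-1.2]": -161.038,
    "[1.2-2.0]": -4.945,
    "[2.0-3.1]": -588.349,
    "[3.1-3.9]": -5.390,
}


def predict_rentals(features, snow_range=None):
    """Evaluate Y = intercept + sum(weight * value) for one day.

    Assumption: when snow_range is None, no snow indicator is set,
    so snow contributes nothing to the prediction.
    """
    total = INTERCEPT
    for name, weight in NUMERIC_WEIGHTS.items():
        total += weight * features[name]
    if snow_range is not None:
        total += SNOW_WEIGHTS[snow_range]
    return total


# Hypothetical day; Year is assumed to be encoded as 0 or 1.
example_day = {
    "max_temp": 80.0,
    "avg_wind": 5.0,
    "precipitation": 0.0,
    "visibility": 10.0,
    "year": 1,
}
estimate = predict_rentals(example_day)
```

Passing `snow_range="[2.0-3.1]"` for the same day lowers the estimate by 588.349, the weight of that range.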

Page 12: Data Science: Can weather predict Bikeshare usage?

LESSONS LEARNED

• Too many independent variables to incorporate crime dataset in addition to weather dataset

• Mean Squared Error (MSE), R-squared

• Only two years’ worth of data was available due to Bikeshare’s short history (2011, 2012)

• Final model would be even more accurate with additional historical data

Page 13: Data Science: Can weather predict Bikeshare usage?

CONCLUSION

• Our hypotheses proved true: weather does affect bike ridership

• Why is Maximum Temperature better?

• Why does the Year improve accuracy?

• The categorical range of snow inches

Page 14: Data Science: Can weather predict Bikeshare usage?

QUESTIONS?

Thanks!