Data Science: Not Just For Big Data
Revolution Confidential
Data Science: Not just for big data!
David Smith
Revolution Analytics
@revodavid
October 16, 2013
Big Data: the new oil?
Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0
Big Data is just raw material
- Data Distillation
  - Extract quantities of interest
  - Find complete cases
  - Derive missing information
- Big Data Pitfalls:
  - Data cleanliness & accuracy
  - Observational bias
- Do the data I have represent the population I’m interested in?
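The "find complete cases" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names, and the `complete_cases` helper are hypothetical and do not come from the talk.

```python
# Hypothetical raw records (not from the talk); None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 45, "income": None},
    {"id": 4, "age": 29, "income": 48000},
]

def complete_cases(rows, fields):
    """Keep only the rows in which every listed field is present."""
    return [r for r in rows if all(r.get(f) is not None for f in fields)]

distilled = complete_cases(records, ["age", "income"])
retained = len(distilled) / len(records)

print([r["id"] for r in distilled])   # ids of the rows that survive
print(f"retained fraction: {retained:.2f}")
```

Dropping incomplete rows shrinks the data and can itself introduce bias if values are not missing at random, which is why the retained fraction is worth checking.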
Surveys & Experiments
- Even with Big Data, the data you need isn’t always in the building!
- … so ask (survey)!
  - Survey design
  - Stratified sampling
- … or experiment!
  - A/B Testing
  - Experimental Design
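As one illustration of the survey side, stratified sampling with proportional allocation might look like the sketch below. The population, the `region` field, and the `stratified_sample` helper are assumptions made for the example, not material from the talk.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by row[key]."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # At least one unit per stratum, so small groups are never dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 900 urban and 100 rural customers.
population = (
    [{"id": i, "region": "urban"} for i in range(900)]
    + [{"id": 900 + i, "region": "rural"} for i in range(100)]
)

survey = stratified_sample(population, key="region", fraction=0.1)
counts = Counter(row["region"] for row in survey)
print(dict(counts))
```

Sampling within each stratum guarantees that the smaller group is represented in proportion, which a simple random sample of the same size only achieves on average.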
Data Exploration & Visualization
- Limited by pixels
  - Big data = a big black blob
- Extract signal from noise
  - Aggregations
  - Heat maps
  - Smoothing
  - Small multiples
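The aggregation idea behind a heat map can be sketched as follows: instead of plotting every point, count points per grid cell and display the counts. The synthetic points and the `bin_counts` helper are assumptions for illustration, and the sketch assumes all coordinates lie in [lo, hi).

```python
import random

def bin_counts(points, bins, lo=0.0, hi=1.0):
    """Count points in a bins x bins grid covering [lo, hi) on both axes."""
    grid = [[0] * bins for _ in range(bins)]
    width = (hi - lo) / bins
    for x, y in points:
        # min() guards against floating-point rounding at the upper edge.
        col = min(int((x - lo) / width), bins - 1)
        row = min(int((y - lo) / width), bins - 1)
        grid[row][col] += 1
    return grid

rng = random.Random(42)
# Synthetic data (assumption): 100,000 uniform points in the unit square.
points = [(rng.random(), rng.random()) for _ in range(100_000)]

grid = bin_counts(points, bins=4)
for row in grid:
    print(row)
```

One hundred thousand overplotted points reduce to sixteen numbers here, and those counts can then be mapped to colors by whatever plotting tool is at hand.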
Statistical Modeling & Forecasting
- You don’t always need big data
  - Sampling can help with observational bias
- Model selection
  - Feature extraction
  - Confounding? Interactions?
- Model validation
  - Overfitting
- Prediction
  - Extrapolation
  - Confidence
http://xkcd.com/605/
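A minimal sketch of why model validation matters: a model that memorizes its training data looks perfect in-sample and degrades on held-out data. Everything here is an assumption for illustration (synthetic data from y = 2x plus noise, a straight-line fit, and a 1-nearest-neighbour "memorizer"); none of it comes from the talk.

```python
import random

def mse(model, data):
    """Mean squared error of model (a function x -> y) on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_line(data):
    """Ordinary least squares for y = a + b*x, in closed form."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_memorizer(data):
    """1-nearest-neighbour lookup: returns the y of the closest training x."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

rng = random.Random(1)
# Synthetic data (assumption): y = 2x + Gaussian noise with sd 1.
points = []
for _ in range(400):
    x = rng.uniform(0, 10)
    points.append((x, 2 * x + rng.gauss(0, 1)))
train, valid = points[:200], points[200:]

line = fit_line(train)
memo = fit_memorizer(train)
results = {
    "line": (mse(line, train), mse(line, valid)),
    "memorizer": (mse(memo, train), mse(memo, valid)),
}
for name, (train_err, valid_err) in results.items():
    print(f"{name:10s} train MSE={train_err:.2f}  validation MSE={valid_err:.2f}")
```

The memorizer's training error is zero by construction, so training error alone cannot reveal overfitting; only the held-out comparison does. In a setup like this one, the simpler line fit would typically be expected to score better on the validation half.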
Summary
- Big Data is great, but think of it as the “raw materials” for data science
  - After refining, “big” isn’t always so “Big”
- Use statistical insight to avoid pitfalls:
  - Inferences: Observational bias / Sampling bias
  - Predictions: Confounding / Overfitting
  - Think about variances and means (risk!)
- Some data scientists may miss these issues
  - Look for statistical expertise
- Further reading: ComputerWorld: 12 predictive analytics screw-ups