Data Science with Hadoop at Opower
![Page 2: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/2.jpg)
What is Opower?
![Page 3: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/3.jpg)
A study:
![Page 4: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/4.jpg)
• $$$: Turn off AC & Turn on Fan
• Environment: Turn off AC & Turn on Fan
• Citizenship: Turn off AC & Turn on Fan
![Page 5: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/5.jpg)
Zero Impact on Consumption
• $$$: Turn off AC & Turn on Fan
• Environment: Turn off AC & Turn on Fan
• Citizenship: Turn off AC & Turn on Fan
![Page 7: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/7.jpg)
Zero Impact on Consumption
• $$$: Turn off AC & Turn on Fan
• Environment: Turn off AC & Turn on Fan
• Citizenship: Turn off AC & Turn on Fan
Neighbors: Turn off AC & Turn on Fan
![Page 8: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/8.jpg)
Zero Impact on Consumption
• $$$: Turn off AC & Turn on Fan
• Environment: Turn off AC & Turn on Fan
• Citizenship: Turn off AC & Turn on Fan
6% Drop in Consumption
• Neighbors: Turn off AC & Turn on Fan
![Page 9: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/9.jpg)
Opower Details: Customer Engagement Platform for Utilities
Company
• ~300 employees
• Cleantech Company of the Year 2012!
• 75 utility partners covering > 50M households
• > 1.5 Terawatt hours saved
Our DNA
• Data analytics
• Behavioral science
![Page 10: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/10.jpg)
What is Opower?
![Page 11: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/11.jpg)
What is Opower?
One giant big data problem
![Page 12: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/12.jpg)
Advanced Analytics
![Page 13: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/13.jpg)
Advanced Analytics provides consumer insights
Our charter is to provide consumers with insights that give context and control over how they use energy.
![Page 14: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/14.jpg)
We use machine learning and predictive modeling
Use machine learning, signal processing, and predictive modeling to provide energy usage insights.
![Page 15: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/15.jpg)
[Chart: two years of monthly energy use (Jan through Oct ticks), split into baseload, heating, and cooling]
We provide insights into individual energy use
![Page 16: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/16.jpg)
Data science
![Page 17: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/17.jpg)
Data scientists extract meaning
Data science is a discipline … with the goal of extracting meaning from data and creating data products.
Wikipedia: http://en.wikipedia.org/wiki/Data_science
![Page 18: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/18.jpg)
Data scientists are statisticians
In other words, machine learning, statistics, and pretty charts.
![Page 19: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/19.jpg)
Data scientists want to extract meaning
![Page 20: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/20.jpg)
Data scientists are data mungers
Data science is a discipline … of data munging.
![Page 21: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/21.jpg)
Data scientists prepare data
Data munging is the process of converting data from one form into another for more convenient consumption.
Wikipedia: http://en.wikipedia.org/wiki/Data_wrangling
![Page 22: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/22.jpg)
Data scientists are plumbers
Data science is a discipline … of plumbing.
Plumbing is difficult.
![Page 23: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/23.jpg)
It’s temporary, I swear!
http://funmeme.com/post/2009/08/02/Plumbing-FAIL-e28093-Funny-Pic.aspx
Move data from here to there.
Hack to get the data how you want it.
![Page 24: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/24.jpg)
It works. For now.
http://www.ontimeplumber.com.au/plumbing_disasters/plumbing_disasters.html
Multiple sources are tricky to handle.
Construct a series of tubes.
![Page 25: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/25.jpg)
Needs user testing
http://www.funnyjunk.com/funny_pictures/234485/Awkward/
Sometimes you have to start over when you think you’re done.
![Page 26: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/26.jpg)
Data science is mostly plumbing
![Page 27: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/27.jpg)
It’s where we spend all of our time
We spend 80% of our time on data munging and other infrastructure work.
![Page 28: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/28.jpg)
Fun stuff only 20% of the time
Sprinkle on some modeling and charts for the other 20%.
![Page 29: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/29.jpg)
Data science in practice
![Page 30: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/30.jpg)
Electric tankless water heater 10% off
http://www.homedepot.com/Plumbing-Water-Heaters-Tankless-Electric/h_d1/N-5yc1vZc1ty/R-203210874/h_d2/ProductDisplay?catalogId=10053&langId=-1&storeId=10051
![Page 31: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/31.jpg)
Who should get this promotion?
![Page 32: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/32.jpg)
Maximize take-up rate
![Page 33: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/33.jpg)
Minimize marketing cost
![Page 34: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/34.jpg)
Data science in practice
Identify likely purchasers
![Page 35: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/35.jpg)
Data science in the past
How would we have solved this before Hadoop?
![Page 36: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/36.jpg)
The past is the same as the present: construct a model
Construct a model of likely purchasers.
![Page 37: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/37.jpg)
Predict purchase behavior with a model
Probability(purchase) = β1 Electric Heat + β2 Similar Purchases + β3 Neighbors Purchased + β4 Response Rate + β5 Type Of Message
We can model purchase behavior at the consumer level.
Include predictors that indicate heavy winter electric usage, neighbor influences, and responsiveness to past communications.
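As a rough illustration of this model, the sketch below scores and ranks households. The slide writes the model as a plain weighted sum; this sketch assumes a logistic link so that the output stays between 0 and 1. The coefficient values, the intercept, the feature encoding, and the function names are all made-up placeholders, not fitted values from the talk.

```python
import math

# Hypothetical coefficients (β1..β5) and intercept; illustrative values only.
COEFFICIENTS = {
    "electric_heat": 1.2,
    "similar_purchases": 0.8,
    "neighbors_purchased": 0.5,
    "response_rate": 0.9,
    "type_of_message": 0.3,
}
INTERCEPT = -3.0


def purchase_probability(household: dict) -> float:
    """Return a purchase probability from a weighted sum of predictors.

    Assumes each predictor is already encoded as a number (for example
    0/1 for electric heat, a fraction for response rate).
    """
    score = INTERCEPT + sum(
        weight * household.get(name, 0.0)
        for name, weight in COEFFICIENTS.items()
    )
    # Logistic link (an assumption) maps the score into the range 0..1.
    return 1.0 / (1.0 + math.exp(-score))


def rank_households(households: dict) -> list:
    """Sort household ids from most to least likely to purchase."""
    return sorted(
        households,
        key=lambda hid: purchase_probability(households[hid]),
        reverse=True,
    )
```

Sending the promotion only to the top of this ranking is one way to trade off take-up rate against marketing cost.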
![Page 38: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/38.jpg)
Housing heat type correlates with water heat type
Does the consumer use electric heat?
Households with gas heat are unlikely to purchase an electric water heater. (Natural gas is cheap.)
![Page 39: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/39.jpg)
Willingness to invest in efficient products
Has the consumer participated in similar program promotions?
Past purchase behavior is a good predictor of future behavior.
![Page 40: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/40.jpg)
Neighbor effects can be powerful
Is the product popular among their neighbors?
Neighbor effects may influence purchase behavior.
![Page 41: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/41.jpg)
Responsiveness proxies engagement
Has the consumer responded to past communications?
Past responsiveness indicates high engagement.
![Page 42: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/42.jpg)
Home Energy Reports influence usage perceptions
What type of message has the consumer received on their Home Energy Reports?
The relative positioning of past energy usage may influence willingness to invest in future lower usage.
![Page 43: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/43.jpg)
We have a model. Let’s get the data.
![Page 44: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/44.jpg)
Disparate data sources
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams) and an analytics server]
![Page 45: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/45.jpg)
Let’s start plumbing
![Page 46: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/46.jpg)
Pipe utility data
![Page 47: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/47.jpg)
Pipe customer interaction data
![Page 48: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/48.jpg)
Finally, pipe Home Energy Report data
![Page 49: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/49.jpg)
Now we’re ready to model
![Page 50: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/50.jpg)
There’s a problem
We know these predictors: Similar Purchases, Neighbors Purchased, Response Rate, Type Of Message.
![Page 51: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/51.jpg)
Heat type is sparse and inaccurate
We know these predictors: Similar Purchases, Neighbors Purchased, Response Rate, Type Of Message.
This is harder: Electric Heat.
![Page 52: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/52.jpg)
Model electric heat to compensate for bad data
Parcel data coverage of heat type is sparse and inaccurate.
We need another data source for heat type.
![Page 53: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/53.jpg)
We construct a model to predict heat type
Electric Heat, the β1 predictor of Probability(purchase), gets its own model:
Pr(Electric Heat) = δ1 Weather Sensitivity + δ2 Neighbors Heat + δ3 Natural Gas Price
We can model the presence of electric heat.
Include predictors of weather sensitivity, area prevalence, and local natural gas price.
![Page 54: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/54.jpg)
Sensitivity of electricity usage to cold weather
How sensitive is the consumer’s electricity usage to cold weather?
High sensitivity to cold weather is our best indicator of electric heat.
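One simple way to put a number on this sensitivity, sketched here under assumptions, is the least-squares slope of electricity usage against heating degree days. The 65°F base temperature, the function names, and the idea of using a plain regression slope are illustrative choices, not details from the talk.

```python
BASE_TEMP_F = 65.0  # assumed balance point; a common convention, not from the talk


def heating_degree_days(mean_temp_f: float) -> float:
    """Degrees below the base temperature, floored at zero."""
    return max(0.0, BASE_TEMP_F - mean_temp_f)


def cold_weather_sensitivity(temps_f, usage_kwh) -> float:
    """Least-squares slope of usage (kWh) per heating degree day."""
    hdd = [heating_degree_days(t) for t in temps_f]
    n = len(hdd)
    mean_x = sum(hdd) / n
    mean_y = sum(usage_kwh) / n
    var_x = sum((x - mean_x) ** 2 for x in hdd)
    if var_x == 0:
        return 0.0  # no cold-weather variation to learn from
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hdd, usage_kwh))
    return cov_xy / var_x
```

A household with electric heat would be expected to show a clearly positive slope, while electricity use in a gas-heated home should stay roughly flat as it gets colder.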
![Page 55: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/55.jpg)
Heat type is related to geography
Is electric heat popular in the consumer’s area?
Heat type tends to have specific geographic distributions.
![Page 56: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/56.jpg)
Gas prices may affect heat type adoption
How expensive is the alternative?
Natural gas may be hard to get in certain areas.
![Page 57: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/57.jpg)
We have another model. Let’s get the data.
![Page 58: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/58.jpg)
Our plumbing so far
![Page 59: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/59.jpg)
Pipe neighbor heat type
![Page 60: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/60.jpg)
Pipe natural gas prices
![Page 61: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/61.jpg)
Now we’re ready to model (x2)
![Page 62: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/62.jpg)
There’s a problem (x2)
We know these predictors: Neighbors Heat, Natural Gas Price.
![Page 63: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/63.jpg)
We don’t know weather sensitivity
We know these predictors: Neighbors Heat, Natural Gas Price.
This is harder: Weather Sensitivity.
![Page 64: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/64.jpg)
[Chart: two years of monthly energy use (Jan through Oct ticks), split into baseload, heating, and cooling]
Luckily, we know how to do this
![Page 65: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/65.jpg)
We have a disaggregation algorithm. Let’s get the data.
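The talk does not describe the disaggregation algorithm itself. Purely to illustrate the idea behind the chart, the sketch below uses a naive stand-in: treat the lowest monthly usage as baseload, and attribute anything above it to heating in cold months or cooling in warm months. The month groupings, the function name, and the method are all assumptions.

```python
HEATING_MONTHS = {11, 12, 1, 2, 3}  # assumed cold months
COOLING_MONTHS = {6, 7, 8, 9}       # assumed warm months


def disaggregate(monthly_kwh: dict) -> dict:
    """Split {month_number: kWh} into baseload, heating, and cooling totals."""
    baseload_per_month = min(monthly_kwh.values())
    totals = {"baseload": 0.0, "heating": 0.0, "cooling": 0.0}
    for month, kwh in monthly_kwh.items():
        excess = kwh - baseload_per_month
        if month in HEATING_MONTHS:
            totals["heating"] += excess
            totals["baseload"] += baseload_per_month
        elif month in COOLING_MONTHS:
            totals["cooling"] += excess
            totals["baseload"] += baseload_per_month
        else:
            # Shoulder months: treat all usage as baseload.
            totals["baseload"] += kwh
    return totals
```

The three totals always add up to the household's total usage, which is a useful sanity check for any disaggregation method.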
![Page 66: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/66.jpg)
Disaggregate heating and cooling
Weather Sensitivity (the δ1 predictor of Pr(Electric Heat)) comes from the disaggregation chart of monthly usage.
Correlate electricity usage with weather.
Let’s grab the data.
![Page 67: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/67.jpg)
Our plumbing so far (x2)
![Page 68: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/68.jpg)
Pipe electricity usage data
![Page 69: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/69.jpg)
Pipe thermostat data
![Page 70: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/70.jpg)
Pipe weather data
![Page 71: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/71.jpg)
Starting to feel like Inception
http://www.chartgeek.com/wp-content/uploads/2012/04/inception-explained-chart.jpg
![Page 72: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/72.jpg)
Now we’re ready to model (finally)
Construct disaggregation algorithms.
Calculate sensitivity for all households.
![Page 73: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/73.jpg)
Disaggregate and store results
![Page 74: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/74.jpg)
We know each customer’s heating sensitivity
Let’s continue with our electric heat model.
![Page 75: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/75.jpg)
We have the data to finish our heat type model
Construct electric heat model.
Impute heat type for all households.
![Page 76: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/76.jpg)
Impute heat type and store results
![Page 77: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/77.jpg)
We know each customer’s heat type
Let’s continue with our water heater purchase model.
![Page 78: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/78.jpg)
We now have the data to finish our purchase model
Construct purchase behavior model.
Calculate likelihood to purchase for all households.
![Page 79: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/79.jpg)
Calculate likelihood to purchase and store results
![Page 80: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/80.jpg)
We have our desired result
![Page 81: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/81.jpg)
Data science is plumbing
http://www.ontimeplumber.com.au/plumbing_disasters/plumbing_disasters.html
![Page 82: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/82.jpg)
New request: Who would buy an efficient pool pump for 10% off?
http://www.amazon.com/Pentair-Intelliflo-Variable-230-Volt-16-Ampere/dp/B007E4VWNO/ref=sr_1_3?ie=UTF8&qid=1350601695&sr=8-3&keywords=variable+speed+pool+pump
![Page 83: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/83.jpg)
I remember what the last model took…
![Page 84: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/84.jpg)
… and I start searching the want-ads
http://careers.stackoverflow.com/jobs?searchTerm=data+scientist&location=
![Page 85: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/85.jpg)
But it gets better
Now we have Hadoop!
![Page 86: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/86.jpg)
The past is the same as the present: construct a model
How would we solve this with Hadoop?
Construct a model of likely purchasers.
![Page 87: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/87.jpg)
Hadoop has a key advantage
Integrated data warehousing and data crunching
![Page 88: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/88.jpg)
Data and analytical capabilities in a single place
[Diagram: utility usage data, thermostat data, weather data, customer interaction history, and additional data streams all sit in a Hadoop cluster alongside the algorithms]
![Page 89: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/89.jpg)
Hadoop solves plumbing problem
All the data
![Page 90: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/90.jpg)
Fully integrated data crunching
Analytical capabilities
![Page 91: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/91.jpg)
Our model is the same. Let’s start building it.
![Page 92: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/92.jpg)
Still need weather sensitivity
Calculating sensitivity is much easier with Hadoop.
Let’s get the data.
![Page 93: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/93.jpg)
Fetch your data with Hive views
Views
![Page 94: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/94.jpg)
Views provide fresh data on demand
Hive is a SQL-like interface to Hadoop.
Hive views are saved queries that you treat like a table.
Build views on top of views to set up complex analyses.
Querying a view takes longer than reading a stored table because the underlying query runs every time, but that is also what keeps the data fresh.
![Page 95: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/95.jpg)
View syntax is plain SQL
-- Join weather observations to meter reads by ZIP code.
CREATE VIEW analytics.disaggregation_inputs_view AS
SELECT w.temperature, r.usage_value
FROM analytics.weather w
JOIN analytics.reads r ON w.zip_code = r.zip_code;
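To try the "fresh data on demand" behavior without a cluster, the sketch below uses Python's built-in sqlite3 module as a stand-in for Hive. SQLite views are also saved queries that are re-evaluated on every read, so the behavior is comparable, but this is not Hive. Table and column names follow the slide, minus the `analytics.` prefix. The second view (`avg_usage_by_temperature_view`) and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weather (zip_code TEXT, temperature REAL);
    CREATE TABLE reads (zip_code TEXT, usage_value REAL);

    -- Same shape as the slide's view, minus the Hive database prefix.
    CREATE VIEW disaggregation_inputs_view AS
    SELECT w.temperature, r.usage_value
    FROM weather w
    JOIN reads r ON w.zip_code = r.zip_code;

    -- A view on top of a view (hypothetical follow-on analysis).
    CREATE VIEW avg_usage_by_temperature_view AS
    SELECT temperature, AVG(usage_value) AS avg_usage
    FROM disaggregation_inputs_view
    GROUP BY temperature;
""")

conn.execute("INSERT INTO weather VALUES ('22201', 30.0)")
conn.execute("INSERT INTO reads VALUES ('22201', 40.0)")
first = conn.execute("SELECT * FROM avg_usage_by_temperature_view").fetchall()

# New rows show up on the next query because the view stores no data.
conn.execute("INSERT INTO reads VALUES ('22201', 60.0)")
second = conn.execute("SELECT * FROM avg_usage_by_temperature_view").fetchall()
```

Nothing was refreshed between the two queries; the stacked views simply re-ran against the tables as they stood at query time.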
![Page 96: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/96.jpg)
Views are data on demand
More data without the storage
![Page 97: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/97.jpg)
Views point at data without storing it
![Page 98: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/98.jpg)
Views on top of views for complex analyses
![Page 99: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/99.jpg)
Set up a view to get disaggregation data
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
![Page 100: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/100.jpg)
We have our disaggregation data
Probability(purchase) = β1 Pr(Electric Heat) + ...
Pr(Electric Heat) = δ1 Weather Sensitivity + ...
Weather Sensitivity = [two plots with monthly axes: Jan, Apr, Jul, Oct]
We need to calculate the model and store the results.
Hadoop is built to do both.
![Page 101: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/101.jpg)
Set up a view to run disaggregation algorithms
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
![Page 102: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/102.jpg)
Hadoop streaming + Views = Power
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
Use Hadoop streaming within the view
![Page 103: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/103.jpg)
Hadoop streaming can calculate anything
Stream data through any script.
Pipe any data through standard input and send any data to standard output.
Integrate with any language: R, Python, Ruby, Bash, Java, etc.
The SELECT TRANSFORM clause in Hive is an easy way to use Hadoop streaming.
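A minimal sketch of what a streaming script looks like, written in Python. It assumes Hive's default TRANSFORM serialization (one row per line, columns separated by tabs, NULL written as the literal string \N). The derived "heating degrees" column and the 65-degree base temperature are hypothetical choices for illustration, not something taken from the slides.

```python
import io
import sys

HIVE_NULL = "\\N"   # Hive writes NULL values as the literal string \N
BASE_TEMP_F = 65.0  # assumed base temperature for heating degrees

def transform(stdin=sys.stdin, stdout=sys.stdout):
    """Read tab-separated (temperature, usage_value) rows; append heating degrees."""
    for line in stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or HIVE_NULL in fields:
            continue  # skip malformed rows and rows containing NULLs
        temperature, usage_value = fields
        heating_degrees = max(0.0, BASE_TEMP_F - float(temperature))
        stdout.write(f"{temperature}\t{usage_value}\t{heating_degrees:.1f}\n")

# Demo on an in-memory sample; run by Hive, the script would just call transform().
sample = io.StringIO("30.0\t52.1\n\\N\t40.0\n70.0\t18.3\n")
out = io.StringIO()
transform(sample, out)
print(out.getvalue(), end="")
```

Because the script only touches stdin and stdout, it can be tested locally with a text file before it is ever wired into a query.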
![Page 104: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/104.jpg)
Hadoop streaming is easy to implement in Hive
CREATE VIEW analytics.disaggregation_outputs_view AS
SELECT TRANSFORM (diw.temperature, diw.usage_value)
USING 'weather_disaggregation.R'
FROM analytics.disaggregation_inputs_view diw;
The executable reads from stdin and writes to stdout.
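The contents of weather_disaggregation.R are not shown in the slides. As a purely hypothetical stand-in, the sketch below shows one thing such an executable could do: fit an ordinary least-squares line of usage against temperature and emit the slope and intercept as a single tab-separated row. The function names and sample rows are assumptions.

```python
import io

def fit_slope(rows):
    """Ordinary least-squares slope and intercept of usage on temperature."""
    n = len(rows)
    mean_t = sum(t for t, _ in rows) / n
    mean_u = sum(u for _, u in rows) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in rows)
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in rows)
    slope = sxy / sxx  # assumes at least two distinct temperatures
    return slope, mean_u - slope * mean_t

def run(stdin, stdout):
    """Parse tab-separated (temperature, usage_value) rows and write the fit."""
    rows = [tuple(float(x) for x in line.split("\t"))
            for line in stdin if line.strip()]
    slope, intercept = fit_slope(rows)
    stdout.write(f"{slope:.3f}\t{intercept:.3f}\n")

# Demo on made-up rows where usage falls one unit per degree of warming.
sample = io.StringIO("30\t50\n40\t40\n50\t30\n60\t20\n")
out = io.StringIO()
run(sample, out)
print(out.getvalue(), end="")
```

One caveat for anything that aggregates: each script instance only sees the rows Hive routes to it, so a per-customer fit would likely need the rows grouped first, for example with Hive's CLUSTER BY or DISTRIBUTE BY clauses.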
![Page 105: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/105.jpg)
Simple SQL syntax to produce any result
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
Algorithm in any language
![Page 106: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/106.jpg)
We know each customer’s heating sensitivity
Probability(purchase) = β1 Pr(Electric Heat) + ...
Pr(Electric Heat) = δ1 Weather Sensitivity + ...
Weather Sensitivity = [two plots with monthly axes: Jan, Apr, Jul, Oct]
Let’s continue with our electric heat model.
![Page 107: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/107.jpg)
We’re ready to model electric heat
Probability(purchase) = β1 Pr(Electric Heat) + ...
Pr(Electric Heat) = δ1 Weather Sensitivity + δ2 Neighbors Heat + δ3 Natural Gas Price
Let’s get our data.
![Page 108: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/108.jpg)
Set up a view to fetch data for electric heat model
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
![Page 109: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/109.jpg)
Implement electric heat model in a view
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
![Page 110: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/110.jpg)
We know each customer’s heat type
Probability(purchase) = β1 Pr(Electric Heat) + ...
Pr(Electric Heat) = δ1 Weather Sensitivity + δ2 Neighbors Heat + δ3 Natural Gas Price
Let’s continue with our water heater purchase model.
![Page 111: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/111.jpg)
We’re ready to model purchase behavior
Probability(purchase) = β1 Electric Heat +
β2 Similar Purchases +
β3 Neighbors Purchased +
β4 Response Rate +
β5 Type Of Message
Let’s get our data.
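The slides write both models as weighted sums, with the output of the electric heat model feeding the purchase model as its first input. A rough sketch of that chaining is below. Every coefficient and feature value is made up for illustration, and the slides do not say how the coefficients are estimated or whether a link function (such as a logistic) is applied, so this shows only the plain linear form as written.

```python
def linear_score(coefficients, features):
    """Weighted sum of features, one coefficient per named feature."""
    return sum(coefficients[name] * features[name] for name in coefficients)

# Hypothetical coefficients (the deltas and betas on the slides).
DELTA = {"weather_sensitivity": 0.5, "neighbors_heat": 0.25, "natural_gas_price": 0.125}
BETA = {
    "electric_heat": 0.5,
    "similar_purchases": 0.25,
    "neighbors_purchased": 0.125,
    "response_rate": 0.0625,
    "type_of_message": 0.0625,
}

# Hypothetical feature values for one household.
heat_features = {"weather_sensitivity": 1.0, "neighbors_heat": 1.0, "natural_gas_price": 0.0}
pr_electric_heat = linear_score(DELTA, heat_features)

# Stage two: the electric heat score becomes an input to the purchase model.
purchase_features = {
    "electric_heat": pr_electric_heat,
    "similar_purchases": 1.0,
    "neighbors_purchased": 0.0,
    "response_rate": 1.0,
    "type_of_message": 0.0,
}
pr_purchase = linear_score(BETA, purchase_features)

print(pr_electric_heat, pr_purchase)
```

The structure mirrors the views: each stage consumes the previous stage's output, so rescoring a household means rerunning the chain rather than rebuilding it.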
![Page 112: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/112.jpg)
Set up a view to fetch data for purchase behavior model
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
![Page 113: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/113.jpg)
Implement purchase behavior model
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
![Page 114: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/114.jpg)
We have our desired result
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); Hadoop cluster with views and algorithms]
One query to get the list
![Page 115: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/115.jpg)
Major plumbing in the old world
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); analytics server]
![Page 116: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/116.jpg)
Some considerations on the past vs now
| | Past | Now |
| --- | --- | --- |
| Refresh data | | |
| Score new households | | |
| Add new data source | | |
| Build new model | | |
![Page 117: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/117.jpg)
Refreshing data is a breeze
| | Past | Now |
| --- | --- | --- |
| Refresh data | Major plumbing | Single query |
| Score new households | | |
| Add new data source | | |
| Build new model | | |
![Page 118: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/118.jpg)
Easy to calculate insights for new households
| | Past | Now |
| --- | --- | --- |
| Refresh data | Major plumbing | Single query |
| Score new households | Major plumbing | Single query |
| Add new data source | | |
| Build new model | | |
![Page 119: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/119.jpg)
New data? No problem.
| | Past | Now |
| --- | --- | --- |
| Refresh data | Major plumbing | Single query |
| Score new households | Major plumbing | Single query |
| Add new data source | Major plumbing | Couple lines of SQL |
| Build new model | | |
![Page 120: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/120.jpg)
Re-use previous work for new models
| | Past | Now |
| --- | --- | --- |
| Refresh data | Major plumbing | Single query |
| Score new households | Major plumbing | Single query |
| Add new data source | Major plumbing | Couple lines of SQL |
| Build new model | Major plumbing | Re-use views |
![Page 121: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/121.jpg)
Hadoop radically reduces plumbing
| | Past | Now |
| --- | --- | --- |
| Refresh data | Major plumbing | Single query |
| Score new households | Major plumbing | Single query |
| Add new data source | Major plumbing | Couple lines of SQL |
| Build new model | Major plumbing | Re-use views |
![Page 122: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/122.jpg)
Big data
![Page 123: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/123.jpg)
Big data
Quantity
![Page 124: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/124.jpg)
Big data
Variety + Quantity
![Page 125: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/125.jpg)
It doesn’t have to be like this
[Diagram: data sources (utility usage data, thermostat data, weather data, customer interaction history, additional data streams); analytics server]
![Page 126: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/126.jpg)
You could look for a new job
http://careers.stackoverflow.com/jobs?searchTerm=data+scientist&location=
![Page 127: Data Science with Hadoop at Opower](https://reader031.fdocuments.us/reader031/viewer/2022021307/62074c7a49d709492c3006c9/html5/thumbnails/127.jpg)
Hadoop
Big data plumbing