Approximate l-fold Cross-Validation with Least Squares SVM and Kernel Ridge Regression

Dr. Richard Edwards (UT, Amazon), Hao Zhang (UT), Dr. Joshua New (ORNL), Dr. Lynne Parker (UT)


Page 1: Approximate l -fold Cross-Validation with Least Squares ...web.eecs.utk.edu/~jnew1/presentations/2013_ICMLA... · 2 Presentation name Energy is the Defining Challenge of Our Time



Energy is the Defining Challenge of Our Time

• Buildings in U.S. – 41% of primary energy/carbon, 72% of electricity, 34% of gas
• Buildings in China – 60% of urban building floor space in 2030 has yet to be built
• Buildings in India – 67% of all building floor space in 2030 has yet to be built

Global energy consumption will increase 50% by 2030

“Upgrading the energy efficiency of America’s buildings is one of the fastest, easiest, and cheapest ways to save money, cut down on harmful pollution, and create good jobs…”

President Obama, December 2, 2011, while announcing Better Buildings Challenge


Figure 1. U.S. Primary energy consumption, 2006 Source: Building Energy Data Book, U.S. DOE, Prepared by D&R International, Ltd., September 2008.


The Autotune Idea

Making building energy models more useful by calibrating them to data


E+ Input Model

Presenter
Presentation Notes
First, let’s briefly discuss the cost-prohibitive, business-as-usual approach to building energy modeling and how the technology being developed aims to remove those barriers. A person builds an input file, runs it through a simulation engine, and compares the output to sensor data. They never match. So you have this expensive feedback loop in which an energy modeling expert tediously edits thousands of parameters, typically in a text file, for occupancy schedules, HVAC system configuration, material properties, and so on. This is so expensive that validation of energy models is rarely performed. What everyone really wants is an easy button, where we instead hand the problem to a computer to solve for us. The base E+ input file won’t be accurate at first, but our algorithms successively get better and better until the model is within an acceptable tolerance. We then return that tuned model to the user.
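The loop described in these notes (simulate, compare against sensor data, adjust, repeat until the error is within tolerance) can be sketched roughly as follows. This is only an illustration, not the Autotune implementation: `toy_simulate` is a hypothetical stand-in for an EnergyPlus run, the random-perturbation search is a placeholder for whatever algorithms the project actually uses, and every name and value here is assumed.

```python
import random

def toy_simulate(params):
    """Hypothetical stand-in for a simulation engine: 12 monthly outputs."""
    return [params["base"] + params["slope"] * month for month in range(12)]

def rmse(predicted, measured):
    """Root-mean-square error between simulated output and sensor data."""
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return (squared / len(measured)) ** 0.5

def calibrate(initial, measured, tolerance=0.05, max_iters=20000, step=0.1, seed=0):
    """Perturb parameters at random, keeping a change only if it lowers the error."""
    rng = random.Random(seed)
    best = dict(initial)
    best_err = rmse(toy_simulate(best), measured)
    for _ in range(max_iters):
        if best_err <= tolerance:
            break  # within the acceptable tolerance: stop tuning
        candidate = {k: v + rng.uniform(-step, step) for k, v in best.items()}
        err = rmse(toy_simulate(candidate), measured)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err

# Synthetic "sensor data" generated from known parameters (assumed values).
sensor_data = toy_simulate({"base": 10.0, "slope": 0.5})
tuned, error = calibrate({"base": 8.0, "slope": 0.0}, sensor_data)
```

The untuned starting model plays the role of the inaccurate base input file, and `tuned` is the model handed back once the error stops exceeding the tolerance or the iteration budget runs out.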

ORNL High Performance Computing Resources

Multi-million dollar cost share and infrastructure on 6 supercomputers, including the world’s fastest. We currently use 128,000+ cores to run over 530,000 EnergyPlus simulations and write 45 TB of data in 68 minutes.

• Jaguar: 224k cores, 360 TB memory, 10 PB of disk, 1.7 petaflops; cost: $104 million; DOE BTO: 500k hours granted (CY12)
• Nautilus: 1024 cores, shared-memory; DOE BTO: 30k hours granted (CY11), 200k hours granted (CY12), 150k hours (CY13)
• Frost: 2048 SGI Altix; 136 nodes; 200k hours granted (CY13)
• Lens cluster: 77 nodes – 45x 128GB, 32x 64GB with NVIDIA 880 and Tesla dual-GPU; EVEREST visualization (CY13)
• Gordon (12,608 cores): 250k hours (CY13)
• Kraken (112,896 cores): 100k hours (CY13)


Titan fully utilized


Computational Complexity

E+ Input Model

Problems/Opportunities:
• Thousands of parameters per E+ input file
• We chose to vary 156
• Brute-force = 5x10^52 simulations

main_Tot: 1172.5
None_Tot(1): 0
None_Tot(2): 0
HP1_in_Tot: 6.75
HP1_out_Tot: 18.75
HP1_back_Tot: 0
HP1_in_fan_Tot: 0
HP1_comp_Tot: 0
HP2_in_Tot: 6.75
HP2_out_Tot: 18
HP2_back_Tot: 0
HP2_in_fan_Tot: 0

E+ parameters

The Universe: 13.75 billion years?

Need 2.8x10^28 of those

Presenter
Presentation Notes
The problem is that E+ input files have thousands of parameters (think back to the lines in your *.inp or *.idf files). We’ve identified ~156 that need to be varied (humans rarely vary more than 6, primarily infiltration, equipment schedules, etc.). The experts have defined a min, max, and step size that they would like to permute, yielding a combinatorial explosion of 5x10^52 simulations. Thankfully we have Jaguar, but it would take all 299k processors running full tilt 4.5x10^31 lifetimes of the universe to brute-force it. This problem would bring the 6th fastest supercomputer in the world to its knees, and this is just for 1 building.
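To make the combinatorial explosion concrete, here is a small sketch of how a full factorial sweep grows when each parameter has a min, max, and step size. The ranges, the 60-second runtime per simulation, and the function names are all assumptions for illustration. The experts' actual ranges are not given here, so this does not reproduce the 5x10^52 figure.

```python
import math

def grid_levels(lo, hi, step):
    """Number of values visited when sweeping lo..hi inclusive in increments of step."""
    return int(math.floor((hi - lo) / step + 1e-9)) + 1

def brute_force_runs(ranges):
    """Simulations in a full factorial sweep: the product of per-parameter level counts."""
    total = 1
    for lo, hi, step in ranges:
        total *= grid_levels(lo, hi, step)
    return total

def universe_lifetimes(total_runs, cores, seconds_per_run, universe_years=13.75e9):
    """Wall-clock time on `cores` parallel workers, expressed in ages of the universe."""
    seconds_per_year = 365.25 * 24 * 3600
    wall_seconds = total_runs * seconds_per_run / cores
    return wall_seconds / (universe_years * seconds_per_year)

# Hypothetical sweep: 156 parameters with 3 levels each (min, midpoint, max).
ranges = [(0.0, 1.0, 0.5)] * 156
runs = brute_force_runs(ranges)

# Assumed runtime of 60 s per simulation on 299,000 processors.
lifetimes = universe_lifetimes(runs, cores=299_000, seconds_per_run=60.0)
```

Even three levels per parameter makes the count exponential in the number of parameters, which is why exhaustive search is off the table regardless of the exact ranges.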

MLSuite

Nautilus Supercomputer

• Matlab+packages, R, libSVM
• Support Vector Machines
• Genetic Algorithms
• FF/Recurrent Neural Networks
• (Non-)Linear Regression
• Self-Organizing Maps
• C/K-Means
• Ensemble Learning

Presenter
Presentation Notes
The godfather of modern AI once said, “the world is the best model of itself.” In that spirit, I proposed that we use sensor data to quantify the state of the world, then use machine learning algorithms to predict building characteristics (such as whole-building energy consumption). In partnership with UT, we have developed an XML-based ML_Suite with which we can easily define different AI tasks and run pattern recognition using UT’s Nautilus supercomputer.

Big Data Opportunities

• EnergyPlus – whole-building energy simulation – 600k lines of Fortran
• Input: 1,000-3,000 parameters for a standard building
  – Geometry, equipment, schedules, weather, ~8 properties/material
  – We vary a subset of these (~156)
• Output: annual at 15 min intervals
  – ~35 MB csv file (35k rows, 96 fields)
• Four types of buildings
  – Residential – ZEBRAlliance house #1: 5M simulations
  – Warehouses: 1M
  – Stand-alone retail: 1M
  – Medium office: 1M
• 8M simulations * 35 MB = 270 TB, http://autotune.roofcalc.com
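The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the TB conversion is shown both ways because the slide does not say which convention it uses.

```python
# Rows in one year of output at 15-minute intervals.
INTERVALS_PER_DAY = 24 * 60 // 15          # 96 steps per day
ROWS_PER_YEAR = 365 * INTERVALS_PER_DAY    # 35,040, i.e. the "35k rows"

# Simulation counts per building type, as listed on the slide.
simulations = {
    "Residential": 5_000_000,
    "Warehouse": 1_000_000,
    "Stand-alone retail": 1_000_000,
    "Medium office": 1_000_000,
}
MB_PER_SIMULATION = 35

total_runs = sum(simulations.values())
total_tb_decimal = total_runs * MB_PER_SIMULATION / 1_000_000     # 1 TB = 10^6 MB
total_tb_binary = total_runs * MB_PER_SIMULATION / (1024 * 1024)  # 1 TiB = 2^20 MiB
```

The decimal conversion gives 280 TB and the binary one about 267 TiB, so the slide's 270 TB is a rounded figure sitting between the two.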


Richard’s slides

• Theoretical contributions to learners


Autotune calibration of building energy models

MLSuite - HPC-enabled suite of 12+ machine learning algorithms for large data mining

                              ASHRAE G14 Requires   Autotune Results
Using Monthly utility data
  CV(RMSE)                    30%                   0.318%
  NMBE                        10%                   0.059%
Using Hourly utility data
  CV(RMSE)                    15%                   0.483%
  NMBE                        5%                    0.067%
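For readers unfamiliar with the two metrics, the sketch below shows one common way CV(RMSE) and NMBE are computed from measured and simulated energy use. The function names, the sample data, and the degrees-of-freedom adjustment `p` (defaulting to 1) are assumptions; consult ASHRAE Guideline 14 itself for the exact definitions it requires.

```python
import math

def cv_rmse(measured, simulated, p=1):
    """Coefficient of variation of the RMSE, as a percentage of the measured mean."""
    n = len(measured)
    mean = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / mean

def nmbe(measured, simulated, p=1):
    """Normalized mean bias error, as a percentage of the measured mean."""
    n = len(measured)
    mean = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * mean)

# Hypothetical three-period example (made-up values, arbitrary energy units).
measured = [100.0, 100.0, 100.0]
simulated = [90.0, 110.0, 100.0]
example_cv = cv_rmse(measured, simulated)   # over- and under-predictions both count
example_bias = nmbe(measured, simulated)    # they cancel here, so the bias is zero
```

CV(RMSE) penalizes every mismatch regardless of sign, while NMBE only captures systematic over- or under-prediction, which is why the two are reported together.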

Autotune could have saved 2+ man-months of effort (over 2 calendar years) modeling 1 field demonstration building

Within 30¢/day (actual use $4.97/day)

Residential Commercial

Hourly – 8% Monthly – 15%

Average error of each input parameter


The Autotune Team

Jibo Sanyal, Mahabir Bhandari, Som Shrestha, Joshua New, Aaron Garrett, Buzz Karpay, Richard Edwards

http://autotune.roofcalc.com