
DRAFT Workplan for Program Year 2019

HVAC Roadmap

CALIFORNIA PUBLIC UTILITIES COMMISSION

EM&V Group A

June 15, 2020

DNV GL - ENERGY

SAFER, SMARTER, GREENER

California Public Utilities Commission HVAC Roadmap Workplan


Table of Contents

1 OVERVIEW
Programs and measures
2 COMMON WORKPLAN DELIVERABLES
HVAC deliverable 1: Workplan and updates
HVAC deliverable 2: Progress reports and updates
HVAC deliverable 3: Kickoff meetings
HVAC deliverable 4: Monthly progress meeting
HVAC deliverable 5: Quarterly stakeholder workshops & webinars

HVAC deliverable 6: Annual EM&V Master Plan update—gaps & emerging

issues report
3 HVAC-SPECIFIC WORKPLAN DELIVERABLES
HVAC sector deliverable 7: Data collection and sampling approach
HVAC sector deliverable 8: Program analysis & recommendations
HVAC sector deliverable 9: Gross savings estimates
HVAC sector deliverable 10: Net savings estimates
HVAC sector deliverable 11: Impact evaluation reports
4 WORKPLAN SCHEDULE
5 APPENDIX A - SAMPLE DESIGN AND SELECTION
6 APPENDIX B - DATA COLLECTION FRAMEWORK DEVELOPMENT
7 APPENDIX C - GROSS METHODS
Approach
Gross savings methods
Determining the most appropriate baseline
Developing specific gross savings methods and approaches
Scope
8 APPENDIX D - TWO-STAGE BILLING ANALYSIS METHODOLOGY
Stage 1. Individual premise analysis
Stage 2. Cross-sectional analysis
Decomposition of whole-home savings
9 APPENDIX E - NET-TO-GROSS METHODS
Approach
Scope
Tasks
10 APPENDIX F - WORKPLAN COMMENTS

Table of Exhibits

Figure 1. Example of application to California upstream HVAC program
Table 1. PY2019 evaluated measure groups
Table 2. PY2019 first-year gross savings claims for HVAC ESPI and Non-ESPI measure groups
Table 3. Savings of selected commercial HVAC measure groups programs
Table 4. Savings of selected residential HVAC measure groups programs
Table 5. Data collection and sampling tasks
Table 6. Estimated population and sample sizes for the PTAC Controls measure groups
Table 7. List of residential HVAC measure groups with census approach
Table 8. Summary of data sources and applicable measure groups
Table 9. Residential HVAC measure evaluation groups and periods in PY2019 evaluation
Table 10. Summary of the residential HVAC measure savings analysis plan
Table 11. Residential HVAC measure groups NTG evaluation activities
Table 12. Summary of milestones and deliverables for the PY2019 HVAC workplan
Table 13. Typical frames and stratification variables for different populations sampled
Table 14. Standard survey modules to be developed
Table 15. Installation verification (deemed) gross savings strengths, limitations, and applications
Table 16. Basic rigor gross savings methodologies, strengths, limitations, and applications
Table 17. Enhanced rigor gross savings methodologies, strengths, limitations, and applications
Table 18. Methods applicable to established baselines
Table 19. Primary NTGR methods, limitations, and potential improvements
Table 20. Timing, efficiency, and quantity by measure
Table 21. HVAC Roadmap NTG evaluation activities by measure group
Table 22. Question themes across 3 causal pathways for distributors and buyers
Table 23. Workplan comments



1 OVERVIEW

This workplan describes the heating, ventilation, and air conditioning (HVAC) measure groups that

DNV GL will evaluate for Program Year 2019 (PY2019) and the methods we will use.

Our evaluation activities include:

1. Evaluating the gross and net peak demand (kW), electrical energy (kWh), and gas energy (therm)

savings for selected measure groups through energy consumption analysis of interval data,

targeted input parameter data collection, revision of California Database for Energy Efficiency

Resources (DEER) prototype measure analysis, and in-depth interviews with distributors,

contractors/installers, and end users.

2. Determining reasons for deviations from expected savings due to different-than-expected measure

potential or implementation effectiveness.

3. Using these results, and the primary data collected to support these efforts, to assist with

updating ex ante workpapers and the DEER values.

Table 1 lists the measure groups, their 2019 Efficiency Savings and Performance Incentive (ESPI)

status, and whether they will receive gross, net, or both treatments.

Table 1. PY2019 evaluated measure groups

| Measure Group | Sector | 2019 ESPI | Gross Savings Evaluation | Net Savings Evaluation |
|---|---|---|---|---|
| HVAC PTAC Controls | Commercial | Yes | Yes | Yes |
| HVAC Rooftop/Split System | Commercial | No | Yes | No |
| HVAC Motor Replacement | Residential | Yes | Yes | Yes |
| HVAC Duct Sealing | Residential | Yes | Yes | Yes |
| HVAC Refrigerant Charge Adjustment (RCA) | Residential | Yes | Yes | Yes |
| HVAC Maintenance | Residential | Yes | Yes | Yes |
| HVAC Controls Time Delay Relay | Residential | No | Yes | Yes |
| HVAC Coil Cleaning | Residential | No | Yes | Yes |
| HVAC Furnace | Residential | No | Yes | Yes |

Programs and measures

DNV GL consulted the most up-to-date program year (PY2019) tracking data available in the California
Energy Data and Reporting System (CEDARS) and the 2019 ESPI uncertain measure list to identify the
research priorities for the HVAC sector.

This section describes the programs and measures covered by this evaluation.



1.1.1 Savings by measure group

For PY2019, we will evaluate both gross and net savings impacts for five ESPI measure groups and

three non-ESPI measure groups. We will also evaluate only gross savings impacts from one non-ESPI

measure group. The measure groups selected for this evaluation effort were chosen based on several

considerations, primary among them:

ESPI status in PY2019 and, to a lesser extent, in subsequent years

The measure group’s ranked contribution to first year and lifetime savings

Year-over-year trends in savings contributions

Previous evaluation activity and findings

The ESPI measure groups being evaluated for the 2021 Bus Stop, by sector, are:

Commercial sector ESPI

PTAC Controls. These measures involve retrofit add-on controls to package terminal air conditioner

(PTAC) units in lodging guest rooms. The controls turn off or modify setpoints of the guest room PTAC

unit when the room is unoccupied.

Residential sector ESPI

Motor Replacement. These measures involve the replacement of existing permanent split

capacitor (PSC) supply (i.e., furnace, indoor, or air handler unit) fan motors with high-efficiency

brushless fan motors in residential applications that use central air-cooled direct expansion cooling

and/or furnace HVAC equipment.

Duct Sealing. These measures involve testing and sealing residential ductwork to reduce

leakage to specified levels.

Maintenance. This measure group used to be a bundle of individual Quality Maintenance

measures such as coil cleaning and RCA; in recent years it has been streamlined to include only

the initial ACCA 4 assessment and maintenance contracts. The measures that used to be included

are now separate measures with their own savings claims. Many of these separate measures are

part of this evaluation.

Refrigerant Charge Adjustment (RCA). This measure group involves optimizing an HVAC

system’s performance by adding or removing refrigerant from residential HVAC systems to meet

manufacturer recommendations.

The non-ESPI measure groups selected for evaluation (gross only, or gross and net) are as follows.

Commercial sector Non-ESPI

Rooftop or Split Systems. These measures, higher-efficiency packaged rooftop units (RTUs) or split HVAC

systems, are delivered primarily through upstream, distributor-focused programs and are generally a

one-to-one replacement of existing HVAC units. This measure group was selected for gross savings

evaluation due to its large contribution to the HVAC portfolio, recent ESPI status, and previous

evaluation findings.



Residential sector Non-ESPI

Time Delay Relay Controls. These measures are retrofit add-on devices that delay the

evaporator fan cycle off time to take advantage of the residual liquid refrigerant remaining in the

evaporator after the compressor cycles off, thus increasing the cooling efficiency of the HVAC

system. This measure group was selected for gross and net savings evaluation because it is

commonly claimed by residential-focused HVAC programs with ESPI measure groups.

Coil Cleaning. HVAC system coils, both evaporators (indoor) and condensers (outdoor),

accumulate debris on their surfaces, which reduces their convective heat transfer performance

through fouling. These measures involve HVAC technicians cleaning the coils to remove this

fouling, restoring the performance of the coils. This measure group was also selected for gross and
net savings evaluation because it is commonly claimed by residential-focused HVAC programs
with ESPI measure groups.

Furnace. These measures, higher efficiency residential furnaces, are delivered primarily through

upstream programs and are aimed at one-to-one replacements of existing furnaces. This measure

group was selected for gross and net savings evaluation because of its significant contribution to

gas energy savings for the HVAC portfolio.

Table 2 shows the HVAC ESPI and non-ESPI measure groups selected for evaluation in PY2019 and

the consolidated remaining measure groups. The table also shows the kW, kWh, and therm savings

claimed in PY2019 based on the available CEDARS data.

Table 2. PY2019 first-year gross savings claims for HVAC ESPI and Non-ESPI measure

groups

| ESPI Uncertain Measure List | Measure Groups | kW | % kW | kWh | % kWh | Therms | % Therms |
|---|---|---|---|---|---|---|---|
| ESPI | HVAC PTAC Controls | 6,280 | 21% | 17,831,593 | 27% | 0 | 0% |
| ESPI | HVAC Motor Replacement | 5,872 | 19% | 7,475,795 | 11% | -36,834 | -3% |
| ESPI | HVAC Refrigerant Charge Adjustment (RCA) | 2,386 | 8% | 2,381,667 | 4% | 727 | 0% |
| ESPI | HVAC Duct Sealing | 2,964 | 10% | 2,230,496 | 3% | 166,435 | 14% |
| ESPI | HVAC Maintenance | 0 | 0% | 0 | 0% | 0 | 0% |
| Non-ESPI | HVAC Rooftop/Split Systems | 5,614 | 19% | 10,866,530 | 17% | -51,830 | -4% |
| Non-ESPI | HVAC Controls Time Delay Relay | 3,699 | 12% | 7,455,723 | 11% | 0 | 0% |
| Non-ESPI | HVAC Coil Cleaning | 611 | 2% | 615,112 | 1% | -58 | 0% |
| Non-ESPI | HVAC Furnace | 0 | 0% | 0 | 0% | 355,146 | 31% |
| HVAC measure groups not evaluated | | 2,793 | 9% | 16,340,202 | 25% | 723,645 | 63% |
| Total Deemed HVAC | | 30,220 | 100% | 65,197,119 | 100% | 1,157,232 | 100% |

Note: Savings claims for PY2019 by measure group and program group will be included when final claims data become available in June 2020.

Measures prioritized for evaluation are of significant importance either because they are on the ESPI list (shaded in blue in Table 1) or

because they are significant contributors to HVAC energy efficiency portfolio claims.
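As a worked check on Table 2, each percentage column can be reproduced by dividing a measure group's claim by the Total Deemed HVAC row and rounding to the nearest whole percent. The short Python sketch below does this for the kW column only. The function and variable names are illustrative assumptions and are not part of any evaluation tooling described in this workplan.

```python
# First-year gross kW claims by measure group, copied from Table 2.
kw_claims = {
    "HVAC PTAC Controls": 6280,
    "HVAC Motor Replacement": 5872,
    "HVAC Refrigerant Charge Adjustment (RCA)": 2386,
    "HVAC Duct Sealing": 2964,
    "HVAC Maintenance": 0,
    "HVAC Rooftop/Split Systems": 5614,
    "HVAC Controls Time Delay Relay": 3699,
    "HVAC Coil Cleaning": 611,
    "HVAC Furnace": 0,
    "HVAC measure groups not evaluated": 2793,
}

# "Total Deemed HVAC" kW value as reported in Table 2.
TOTAL_KW = 30220


def share_pct(value, total):
    """Return value as a whole-number percentage of total."""
    return round(100 * value / total)


# Percentage share of the deemed HVAC kW claim for each measure group.
kw_shares = {group: share_pct(kw, TOTAL_KW) for group, kw in kw_claims.items()}

for group, pct in kw_shares.items():
    print(f"{group}: {pct}%")
```

The same calculation applies to the kWh and therm columns by substituting the corresponding claims and totals.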



1.1.2 Savings by program

Table 3 lists the programs offering the commercial HVAC measure groups being evaluated along with

the measures’ first year and lifecycle savings.

Table 3. Savings of selected commercial HVAC measure groups programs

| Program ID, Name | Measure Group | First Year Gross kW | First Year Gross kWh | Lifecycle Net kWh | First Year Gross Therm | Lifecycle Net Therm |
|---|---|---|---|---|---|---|
| PGE210112, School Energy Efficiency | HVAC Rooftop/Split Systems | 7 | 66,249 | 894,362 | 0 | 0 |
| PGE210143, Hospitality Program | HVAC PTAC Controls | 4,947 | 14,473,895 | 47,231,453 | 0 | 0 |
| PGE21015, Commercial HVAC | HVAC Rooftop/Split Systems | 3,116 | 5,942,247 | 71,306,964 | -32,320 | -387,845 |
| PGE2110051, Local Government Energy Action Resources (LGEAR) | HVAC PTAC Controls | 99 | 217,216 | 809,890 | 0 | 0 |
| PGE211007, Association of Monterey Bay Area Governments (AMBAG) | HVAC PTAC Controls | 116 | 293,185 | 952,851 | 0 | 0 |
| PGE211023, Silicon Valley | HVAC PTAC Controls | 66 | 212,107 | 689,348 | 0 | 0 |
| PGE211024, San Francisco | HVAC PTAC Controls | 633 | 1,484,175 | 4,823,569 | 0 | 0 |
| SCE-13-SW-002F, Nonresidential HVAC Program | HVAC Rooftop/Split Systems | 1,882 | 3,723,445 | 46,663,527 | -14,733 | -188,077 |
| SDGE3224, SW-COM-Deemed Incentives-HVAC Commercial | HVAC PTAC Controls | 420 | 1,151,015 | 3,740,799 | 0 | 0 |
| | HVAC Rooftop/Split Systems | 509 | 1,042,634 | 13,204,557 | -1,619 | -21,382 |
| Totals | | 11,794 | 28,606,168 | 190,317,319 | -48,673 | -597,304 |



Table 4 lists the programs offering the residential HVAC measure groups being evaluated along with

the measures’ first year and lifecycle savings.

Table 4. Savings of selected residential HVAC measure groups programs

| Program ID, Name | Measure Group | First Year Gross kW | First Year Gross kWh | Lifecycle Net kWh | First Year Gross Therm | Lifecycle Net Therm |
|---|---|---|---|---|---|---|
| BAYREN08, Single Family | HVAC Duct Sealing | 46 | 35,773 | 178,151 | 15,129 | 75,344 |
| | HVAC Furnace | 0 | 0 | 0 | 118,946 | 1,186,589 |
| PGE210011, Residential Energy Fitness program | HVAC Controls Time Delay Relay | 979 | 1,391,827 | 4,215,739 | 0 | 0 |
| | HVAC Motor Replacement | 1,146 | 1,293,193 | 2,353,087 | -18,858 | -34,194 |
| | HVAC RCA | 523 | 451,108 | 1,124,753 | -67 | -167 |
| | HVAC Coil Cleaning | 198 | 174,324 | 316,657 | -24 | -44 |
| | HVAC Duct Sealing | 2 | 1,370 | 3,411 | 260 | 648 |
| | HVAC Maintenance | 0 | 0 | 0 | 0 | 0 |
| PGE21006, Residential HVAC | HVAC Controls Time Delay Relay | 920 | 1,575,185 | 4,725,903 | 0 | 0 |
| | HVAC RCA | 590 | 641,915 | 1,598,368 | -60 | -150 |
| | HVAC Motor Replacement | 186 | 205,740 | 370,645 | -3,217 | -5,797 |
| PGE21008, Enhance Time Delay Relay | HVAC Motor Replacement | 354 | 635,982 | 1,196,748 | -4,702 | -8,847 |
| | HVAC RCA | 39 | 72,374 | 180,879 | -17 | -43 |
| | HVAC Controls Time Delay Relay | 2 | 2,708 | 8,347 | 0 | 0 |
| PGE21009, Direct Install for Manufactured and Mobile Homes | HVAC Motor Replacement | 1,336 | 1,397,449 | 3,080,241 | -10,057 | -20,743 |
| | HVAC Controls Time Delay Relay | 331 | 417,868 | 1,538,128 | 0 | 0 |
| | HVAC RCA | 269 | 265,622 | 699,545 | 1 | 3 |
| | HVAC Coil Cleaning | 12 | 11,135 | 27,239 | 0 | 0 |
| SCE-13-SW-001G, Residential Direct Install Program | HVAC Motor Replacement | 2,214 | 3,167,562 | 10,019,829 | 0 | 0 |
| | HVAC Controls Time Delay Relay | 1,057 | 2,934,386 | 9,659,657 | 0 | 0 |
| SCE-13-TP-001, Comprehensive Manufactured Homes | HVAC Controls Time Delay Relay | 410 | 1,133,751 | 4,076,630 | 0 | 0 |
| | HVAC Motor Replacement | 637 | 775,869 | 2,630,605 | 0 | 0 |
| | HVAC Duct Sealing | 634 | 441,368 | 1,136,128 | 19,346 | 50,026 |
| SCG3702, RES-Residential Energy Efficiency Program | HVAC Furnace | 0 | 0 | 0 | 23,972 | 287,662 |
| SCG3706, RES-Residential HVAC Upstream | HVAC Furnace | 0 | 0 | 0 | 210,823 | 2,529,880 |
| SCG3765, RES-Manufactured Mobile Home | HVAC Duct Sealing | 1,159 | 841,954 | 4,192,933 | 56,300 | 280,375 |
| SCG3820, RES-Direct Install Program | HVAC Duct Sealing | 413 | 315,903 | 4,719,597 | 25,353 | 378,781 |
| SDGE3207, SW-CALS-MFEER | HVAC RCA | 406 | 320,538 | 2,660,462 | -57 | -473 |
| SDGE3211, Local-CALS-Middle Income Direct Install (MIDI) | HVAC RCA | 16 | 10,588 | 87,880 | -4 | -37 |
| SDGE3212, SW-CALS-Residential HVAC-QI/QM | | | | | | |
| SDGE3279, 3P-Res-Comprehensive Manufactured-Mobile Home | HVAC RCA | 470 | 457,254 | 3,795,210 | -5 | -38 |
| SDGE3302, SW-CALS - Residential HVAC Upstream | HVAC Furnace | 0 | 0 | 0 | 1,405 | 16,859 |
| Totals | | 15,369 | 19,887,791 | 66,758,702 | 484,481 | 4,861,390 |

1.1.3 Workplan organization

This workplan is organized into four sections covering the following content:

Section 1 (this section) provides an overview of the workplan.

Section 2 describes the deliverables that are common to all four roadmaps.

Section 3 describes the HVAC-specific evaluation approach.

Section 4 describes the HVAC sector workplan schedule.



2 COMMON WORKPLAN DELIVERABLES

HVAC deliverable 1: Workplan and updates

The primary measure groups selected for this evaluation are from the statewide list of ESPI uncertain

measures. This evaluation will build on the methods from the 2010-2012, 2013-2015, and 2017-2018

program year HVAC evaluations. We will meet the 2021 EM&V Bus Stop for program year 2019 by

estimating gross savings by a combination of approaches as appropriate for each measure group.

These include billing and advanced metering infrastructure (AMI) data analysis, remote data
collection/verification, simulation modeling, and others.

We plan to reconsider the net-to-gross ratio (NTGR) estimation methods for the March 2021 EM&V

Bus Stop. Key considerations are making methodologies more consistent across the various measure
groups, responding to stakeholder comments related to the PY2017 and PY2018 methods, and
running diagnostics on the methods used in the PY2017 and PY2018 evaluations where applicable.

We will work with Commission staff to modify the methodology for estimating the NTGR and produce a

memo to Commission staff detailing the approach. DNV GL’s team will execute the methodology after

receiving Commission staff approval. DNV GL’s team expects to continue to use customer and

contractor survey responses as a core source of data for the NTGR estimates.

HVAC deliverable 2: Progress reports and updates

DNV GL’s team will provide monthly progress reports and updates that focus on milestone tracking

and all deliverables within this workplan.

HVAC deliverable 3: Kickoff meetings

DNV GL participated in a kickoff meeting involving key members of our team and Commission staff

during the week of May 8, 2020. The primary objectives of this meeting were to discuss and refine the

draft workplan, to reorient our team to the Commission’s administrative and technical expectations, to

affirm communications protocols, to review objectives and methods, to discuss task and subtask

prioritization, and to address other items.

HVAC deliverable 4: Monthly progress meeting

Key members of DNV GL’s team expect to participate in meetings with Commission staff and other

EM&V contractors on an ad-hoc basis or regular schedule. We also expect to participate in

administrative check-in discussions with Commission staff every two weeks (or per schedules

determined by the Energy Division Project Manager [EDPM]) to report on contract status, budget, and

other relevant matters. We will work closely with our EDPM to identify mutually agreeable dates and

times for these meetings.

HVAC deliverable 5: Quarterly stakeholder workshops &

webinars

Key members of DNV GL’s team will:

Conduct stakeholder workshops or webinars for deliverable milestones and collaborate via monthly

project coordination group (PCG) meetings

Address and document responses to stakeholder comments



Develop summaries and presentations for technical documents and other briefing materials in

layman’s terms per Commission staff instructions

Manage all communications and summarize work products for decision-makers and key

stakeholders per Commission staff requests

Summarize data and information from the completed stakeholder engagement activities

DNV GL’s team has committed at least two key team members to participate in all key stakeholder

engagement activities.

HVAC deliverable 6: Annual EM&V Master Plan update—

gaps & emerging issues report

Deliverable 2.6 involves preparation of a Gaps and Emerging Issues Report and providing support to

Commission staff for updating the Annual EM&V Master Plan.

Gaps and Emerging Issues Report. This report has been replaced by a series of memos connected

to Deliverable 8 that will identify and describe major changes, challenges, and emerging issues in

the industry and gaps in the current year’s EM&V activities and methods pertaining to the four

Group A sectors. The purpose is to simplify Commission staff decision-making processes regarding

future EM&V research by clearly identifying outstanding research questions and issues/challenges

that Commission staff may wish to address in the EM&V Master Plan.

The first document will be a memo that covers an analysis of the ABALs. The DNV GL team will

provide a draft to the CPUC by January 23, 2020.

The second document will be a report that consolidates the PY18 impact evaluations. The DNV GL

team will plan to provide a draft of this document by early June (barring any unforeseen

challenges).

The third document will be another memo on a yet-to-be-determined topic with a due date by the

end of 2020.

EM&V Master Plan Update. This subtask also involves assisting Commission staff with the plan
update, as needed.



3 HVAC-SPECIFIC WORKPLAN DELIVERABLES

This section describes the HVAC sector-specific deliverables (Deliverables 7 and 9-11) for this

evaluation.

HVAC sector deliverable 7. Data collection and sampling

approach

We will design the data collection and sampling work under Deliverable 7 to meet the needs of

Deliverable 1 (Research and Evaluation Workplans), Deliverable 8 (Program Analysis and

Recommendations), Deliverable 9 (Gross Savings Estimates) and Deliverable 10 (Net Savings

Estimates). As part of Deliverable 7, we will develop streamlined data collection strategies to serve the

needs of multiple deliverables at the required rigor levels.

Table 5 summarizes the subtasks to complete Deliverable 7. A more thorough discussion of each

subtask follows the table.

Table 5. Data collection and sampling tasks

| Task | Description | Coordination with Other Deliverables | Key Activities |
|---|---|---|---|
| 1 | Planning/Workplan Coordination | 1, 8, 9, 10 | Determine required data collection activities, sample design parameters, inter-dependencies, and level of coordination. |
| 2 | Data Management and Quality Control | 8, 9, 10 | Data requests to Program Administrators (PAs). Secure data access management. Data transfer to/from Data Management Contractor. Conduct cross-deliverable, cross-sector data collection. |
| 3 | Sample Design and Selection | 8, 9, 10 | Prepare gross and net sample frame(s) according to study objectives. Select samples to meet precision requirements. |
| 4 | Instrument Design Frameworks | 8, 9, 10 | Prepare guidance documents and templates. |
| 5 | Develop Data Collection Instruments | 8, 9, 10 | Prepare standard modules. Vet sector/program-specific modules. |
| 6 | Training | 8, 9, 10 | Train DNV GL staff on information protection and confidentiality procedures, customer contacts, instrument administration, and data management. The DNV GL PM and the EDPM will discuss the scope of field staff training. DNV GL staff will conduct the training and invite Commission staff. |
| 7 | Statistical Estimation | 8, 9, 10 | Generate sampling weights. Calculate sample-based estimates and precision. |



3.1.1 Subtask 1. Planning and coordination

The HVAC Data Collection and Analysis team will work together with the leads for Deliverables 1, 8, 9,

and 10 to identify what types of data collection are needed from which respondents and existing

sources. We will review our data collection objectives, identify opportunities to consolidate data

collection within and across subsectors, and identify competing objectives and needs. This step

includes:

Review of approved sector metrics

Review of tracking data and the uncertain measure list to determine savings contributions,

uncertainty contributions, and new programs and measures

Determination of study priorities and rigor levels

Review of the types of information needed by each deliverable

Review of the data collection methods needed by each deliverable

3.1.2 Subtask 2. Data management and quality control

As part of this task, the first step in each cycle will be to retrieve the program tracking data and

request consumption data for participants. For planning, we will obtain monthly and annual

consumption data. For consumption data analysis, we will use daily and hourly data. We will also

verify the parameters that contribute to uncertainty against the current uncertain measures list. Any

additional data requirements identified in Subtask 1 will be submitted and integrated into the database

during Subtask

3.1.3 Subtask 3. Sample design and selection

Of the nine selected measure groups, only the commercial PTAC/PTHP measure groups will use a

stratified ratio estimation approach for sample design. The remaining measure groups will use a

census approach where the entire program population will be evaluated.
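To make the stratified ratio estimation approach concrete, the Python sketch below shows one common form of the estimator: each sampled site is weighted by its stratum's population count divided by its sample count, and the ratio is the weighted evaluated (ex post) savings divided by the weighted tracked (ex ante) savings. This is an illustrative assumption about the general technique, not the specific procedure in Appendix A. The class, function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stratum:
    """One sampling stratum (hypothetical structure for illustration)."""
    population: int        # number of sites in the stratum (N_h)
    ex_ante: List[float]   # tracked savings for each sampled site
    ex_post: List[float]   # evaluated savings for each sampled site


def stratified_ratio(strata: List[Stratum]) -> float:
    """Weighted ex post savings divided by weighted ex ante savings."""
    numerator = 0.0
    denominator = 0.0
    for stratum in strata:
        # Sampling weight: population sites represented by each sampled site.
        weight = stratum.population / len(stratum.ex_ante)
        numerator += weight * sum(stratum.ex_post)
        denominator += weight * sum(stratum.ex_ante)
    return numerator / denominator


# Hypothetical example: many small sites in one stratum, a few large in another.
example_strata = [
    Stratum(population=100, ex_ante=[10.0, 20.0], ex_post=[8.0, 18.0]),
    Stratum(population=20, ex_ante=[100.0, 200.0], ex_post=[90.0, 150.0]),
]
ratio = stratified_ratio(example_strata)
print(f"Estimated ratio of ex post to ex ante savings: {ratio:.3f}")
```

Under a census approach no weighting is needed, because every site in the program population is analyzed.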

We will sample from program year 2019 claims to meet the March 2021 EM&V Bus Stop. Beginning

with program year 2019, we have the opportunity for quarterly or semi-annual sampling. We will

work with Commission staff to determine which measures and interventions will implement rolling

samples.

PTAC Controls measures

For the PTAC Controls measure group, DNV GL’s team will design the sample to achieve +/-10%

relative precision for each evaluated measure group at the 90% confidence level. We will stratify the

program population by PA program, and we will sample the participant population at the
measure, unit, or site level, depending on the granularity of the data. For this measure group, we will
use an error ratio of 0.6 based on our previous experience with similar studies.
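The relationship between the error ratio, the precision target, and the required sample size can be sketched with a commonly used planning formula: an initial size n0 = (z × error ratio / relative precision)², followed by a finite population correction. Applying that formula here is our assumption for illustration; this section does not state which formula will be used. The z-value of 1.645 corresponds to two-sided 90% confidence, the population of 200 is the preliminary total shown in Table 6, and the function name is hypothetical.

```python
import math


def planning_sample_size(error_ratio, relative_precision, z, population):
    """Sample size for ratio estimation with a finite population correction.

    n0 = (z * error_ratio / relative_precision) ** 2
    n  = n0 / (1 + n0 / population)
    """
    n0 = (z * error_ratio / relative_precision) ** 2
    n = n0 / (1 + n0 / population)
    return math.ceil(n)


# Inputs from this section: error ratio 0.6, +/-10% precision, 90% confidence.
# Population of 200 is the preliminary PTAC Controls total in Table 6.
required_n = planning_sample_size(
    error_ratio=0.6,
    relative_precision=0.10,
    z=1.645,
    population=200,
)
print(required_n)  # 66
```

This unstratified figure is of the same order as the anticipated total sample in Table 6. Stratification and allocation across PAs would change the final count.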

Table 6 shows the PY2019 populations and anticipated sampling sizes for the 2021 Bus Stop. These

figures are preliminary, as we await delivery of the finalized tracking data and the opportunity to develop a

stratified sample frame. The finalized populations, claims, and sample sizes will be published in the

sampling and data collection memo.



Table 6. Estimated population and sample sizes for the PTAC Controls measure groups

| Measure Group | PA | PY2019 Participant Population | Anticipated Participant Sample Sizes for 2021 Bus Stop |
|---|---|---|---|
| HVAC PTAC Controls | PG&E | 180 | 60 |
| | SDG&E | 20 | 10 |
| Totals | | 200 | 70 |

Rooftop-Split measures

The Rooftop-Split measure group was evaluated as part of the PY2018 evaluation. For PY2019, the
evaluation team will perform a discrepancy analysis between ex post and ex ante savings for this
measure group and true up its unit energy savings (UES) values. This work does not
require sampling.
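One simple way to picture a true-up is to scale the ex ante UES by a gross realization rate, defined as ex post savings divided by ex ante savings. The Python sketch below illustrates that arithmetic only. It is an assumption for illustration, since the actual true-up will rest on the simulation-based discrepancy analysis, and all names and values are hypothetical.

```python
def gross_realization_rate(ex_post_savings, ex_ante_savings):
    """Ratio of evaluated (ex post) to claimed (ex ante) savings."""
    if ex_ante_savings == 0:
        raise ValueError("ex ante savings must be non-zero")
    return ex_post_savings / ex_ante_savings


def trued_up_ues(ex_ante_ues, realization_rate):
    """Scale an ex ante unit energy savings value by a realization rate."""
    return ex_ante_ues * realization_rate


# Hypothetical values: 1,000 kWh claimed, 750 kWh evaluated, UES of 400 kWh per unit.
grr = gross_realization_rate(ex_post_savings=750.0, ex_ante_savings=1000.0)
revised_ues = trued_up_ues(ex_ante_ues=400.0, realization_rate=grr)
print(grr, revised_ues)
```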

The remaining HVAC measure groups (residential sector measure groups) will be evaluated using a

billing analysis approach where all sites in the program population will be evaluated. Table 7 shows

the HVAC measure groups that will use a census approach for PY2019.

Table 7. List of residential HVAC measure groups with census approach

Measure Group

HVAC Motor Replacement

HVAC Duct Sealing


HVAC Maintenance


HVAC Coil Cleaning

HVAC Furnace

The detailed methodology for sample design and selection is described in Section 5, Appendix A, of
the workplan.

3.1.4 Subtask 4. Data collection framework development

As part of this task the evaluation team will develop a data collection framework to improve

consistency, facilitate comparison of results across data collection efforts, reduce the time for survey

development, minimize review time, and facilitate quality assurance and quality control. The

framework will include:

Guidance and templates for instrument development

Standard question modules for common survey batteries

Recommendations on quality assurance/quality control (QA/QC) procedures

Guidance on data collection management

Guidance on sample management



Detailed guidance on developing the data collection framework is provided in Section 6, Appendix B, of
the workplan.

3.1.5 Subtask 5. Data collection instruments

Where appropriate, we will base data collection on our existing Commission-approved data collection

instruments. We will work with Commission staff and other stakeholders to assess, revise, and

approve these data collection instruments prior to collecting any data.

3.1.5.1 Commercial measure groups

Packaged Terminal Air Conditioner/Heat Pump (PTAC/PTHP) Controls

For the program year 2019 evaluation of PTAC/PTHP Controls measures, we will conduct interviews

with end users (primarily over the phone, supplemented with web-based interviews if required) of

participating facilities using utility-provided contact and equipment information. The phone interview

will include questions to verify measure installation and persistence and to establish the equipment’s

baseline control scheme. The information collected will be used to update installation rates and refine

gross savings estimates for PTAC/PTHP Controls measures.

The phone interview with contacts at participating end user facilities will be the primary mechanism for

data collection to assess gross savings. At the time of this writing, the evaluators assume that on-site

visits will not be feasible for PY2019 data collection, due to the ongoing COVID-19 pandemic. A

sample data collection plan for PTAC control measures will include:

Installation Characteristics: The most critical characteristics evaluators will inquire about

include the facility type, building vintage, and installed unit quantity per site. A list of additional

items to be recorded will be included in the sampling and data collection memo.

Equipment Nameplate: Evaluators will confirm the characteristics of the installed PTAC

controllers as well as the PTAC units being controlled. Evaluators will ask the contact to

provide photographs of the equipment and nameplates and/or submit documentation to

objectively verify installation and characteristics.

Operating Characteristics: Evaluators will ask the facility contact about typical room operation

and set-point schedules. Evaluators will request trended operating data directly from the site or

through the installation vendor. The evaluator will obtain the heating and cooling temperature

set-point schedules for weekdays, weekends, and holidays, as well as temperature set-points for

occupied and unoccupied periods. The evaluator will ask for a list of holidays observed at the

facility (if applicable), typical occupancy patterns, and any notable changes in operation between

the periods before and after the project took place.

Additional data: These include any documentation confirming measure installation or providing

additional insight into how the units are controlled before and after the project took place.

Rooftop/Split Systems

No data collection is proposed for the Rooftop/Split System measure group. The evaluation team will

leverage PY2018 evaluation data and simulation to address the discrepancy between the ex ante and

ex post savings estimates, and will then propose to true up the UES of this measure group based on

the simulation results.

3.1.5.2 Residential HVAC measure groups



Coil Cleaning, Time Delay Relay Controls, Furnaces, Maintenance, Fan Motor Replacement, & Duct Sealing

For program year 2019 we will use energy consumption analysis for estimating gross energy savings

for these measure groups. Gross savings estimates will be based on metered consumption data and

will not require data collection forms. See Section 3.3 for a discussion of our methodology for

producing gross savings estimates.

We will complete the gross savings estimates deliverable by January 2021 and incorporate the results

into the evaluation report. We will submit the draft gross savings deliverable to Commission staff prior

to finalization. Subsequent program periods will follow the same schedule for gross savings for the

measures discussed here.

3.1.5.3 Net attribution data collection

We will perform gross and net evaluations for the measure groups highlighted in green in Table 1.

To support our net savings estimates we propose to interview customers, contractors, and HVAC

distributors. Some of the specific efforts under this plan are:

Reviewing secondary sources for market share information pertaining to the upstream program

Conducting market actor interviews (participating distributors, contractors, customers, and end

users) focused on market structure for all units and participant distributor interviews to assess

program influence

Reviewing the program PIP and conducting interviews with program managers to discuss program

theory on influencing alternate equipment types where applicable

Conducting end-user interviews to assess free ridership for the downstream programs

DNV GL’s team has demonstrated effective stakeholder management in previous evaluation cycles by

including a review process for all data collection instruments—not only with the EDPM, but also with

PA program evaluation staff and other stakeholders. This process is particularly beneficial for

evaluations of newer programs or programs where there have been significant changes that

necessitate input from PA staff to refine and improve instruments. We will post data collection

instruments to Basecamp or another CPUC collaboration site.

3.1.5.4 Data sources

Data sources and applicable measure groups are summarized in Table 8 below.

This table shows some of the data sources and data collection activities across the measure groups for

this sector. Data will be used to provide a robust, accurate, and defensible ex post estimate of

measure impacts. Remote data collection efforts will focus on verifying the simulation model inputs

and short-term monitoring of critical equipment. We provide additional detail below the table.

Table 8. Summary of data sources and applicable measure groups

Data Sources | Description | Applicable Measure Group(s)

Program Tracking Data | IOU program data, including number of records, savings per record, program type, name, measure groups, measure description, incentives, etc. | PTAC Controls; Rooftop & Split System; Fan Motor Replacement; Duct Sealing; RCA; Maintenance; Time Delay Relay; Coil Cleaning; Furnace

Program Monthly Billing Data | PA billing data, including kWh | PTAC Controls; Duct Sealing; RCA; Maintenance; Time Delay Relay; Coil Cleaning; Furnace

Program Advanced Metering Infrastructure (AMI) Data | Detailed, time-based energy consumption information | PTAC Controls; Duct Sealing; RCA; Maintenance; Time Delay Relay; Coil Cleaning; Furnace

Project Specific Information | Project folders include scope of work, energy audit reports, equipment model and serial numbers, nominal efficiency, test results, project costs, etc. | PTAC Controls

Manufacturer Data Sheet | Data sheets include equipment specifications such as horsepower (HP), efficiency, capacity, etc. | PTAC Controls

Telephone/Web Surveys | Includes surveys of customers, distributors, other market actors, and PA program staff. | PTAC Controls; Duct Sealing; RCA; Maintenance; Time Delay Relay; Coil Cleaning; Furnace

On-site Surveys | Includes verifying measure installation and gathering measure performance parameters such as efficiency, schedules, setpoints, building characteristics, etc. | N/A

End-use Metering | Includes performing spot measurements, short-term metering with data loggers, and performance measurements | N/A

The following list defines the data sources identified above in Table 8:



Program tracking data. Each of the Program Administrators (PAs) will provide and upload

program tracking data onto a centralized server. We will then analyze, clean, re-categorize, and

reformat these datasets, if necessary. For programs and measures, the impact evaluation team

will review PA monthly reports and actual program tracking data to reconcile actual versus

reported claims, thereby validating PA tracking data uploads.

Project-specific information. The PAs maintain paper and/or electronic files for each

application or project in their energy efficiency programs. These files can contain various pieces of

information, such as email correspondence written by the utility’s customer representatives

documenting aspects of a given project, including the measure effective useful life (EUL),

incremental cost, and measure payback with and without the rebate. As part of the file review process,

we will thoroughly review these documents to assess their reasonableness.

Data sheets from equipment manufacturers. As part of the gross data collection, we will

request technical specifications of the evaluated equipment from manufacturers and equipment

vendors. These data sheets typically include performance parameters of the equipment such as

horsepower, efficiency, capacity, and energy efficiency ratio (EER).

Telephone/web surveys of participating customers and distributors. Both gross and net

deliverables will require telephone/web surveys. We will perform surveys with customers,

distributors, other market actors, and PAs.

On-site surveys. Because of the COVID-19 pandemic, DNV GL is not planning any on-site visits

during this evaluation period.

End-use metering. Because of the COVID-19 pandemic, DNV GL is not planning end-use

metering during this evaluation period.

3.1.6 Subtask 6. Training

DNV GL will conduct several kinds of training.

For data collection staff for each study, we will conduct training on customer contact procedures to

ensure professionalism in all interactions. We will review the purpose of each study and provide

training on each question. For data collection to be completed by staff who might not be familiar

with energy efficiency topics, the training includes high level explanations of these topics, and we

provide “cheat sheets” for those interviewers to reference during calls. For example, in the past,

we have found that subcontracted computer-aided telephone survey operators do not necessarily

understand the differences between an LED lamp, CFL, and incandescent lamp. Our cheat sheets

provide pictures of these different types of lamp to help the operators describe the differences to

respondents who might also be confused. We will correct any inconsistencies or confusing points

identified during the training.

For DNV GL team staff who are designing data collection instruments or using the results, we will

train on:

− Data formats required

− Quality control processes

− Data security procedures

− High-level sampling, weighting, and estimation methods



3.1.7 Subtask 7. Weighting and estimation

After data collection is completed, DNV GL’s team will develop revised sampling weights to be used to

expand the sample results to the population. The sampling weights will reflect the sample stratification,

population counts, and completed sample counts. The sampling weights may also incorporate

sample and population characteristics not used for explicit stratification. This approach allows us to

adjust more accurately for nonresponse, without requiring a deeply stratified sample.

As described above, response rates to all types of customer data collection have been declining, and even

with best-practice methods there is the potential for the responding sample to be systematically

different from the overall population of interest. DNV GL’s sample expansion procedures incorporate

advanced non-response adjustment methods into our weighting and calibration. These methods allow

us to make maximal use of available population characteristics to produce tailored case expansion

weights for each respondent, resulting in substantial bias reduction for the final population estimates.

We will calculate the sample case weight as the product of three factors:

The inverse of the probability of selection into the targeted sample

The nonresponse adjustment, accounting for the selected units that did not respond

Post-stratification adjustment, calibrating the full sample to known population totals not included

in stratification.

This approach is far more effective at mitigating nonresponse bias than relying only on the selection

probability (factor 1), or on a combination of selection probability and post-stratification to control

totals (factor 3).
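
The three-factor product described above can be sketched as follows. This is a minimal illustration, not the workplan's actual procedure: it assumes a simple within-stratum response-rate adjustment for factor 2 and a single ratio calibration for factor 3, whereas the advanced non-response methods referred to above are not specified in this section. The names `Stratum`, `case_weight`, and `post_strat_factor` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    population: int   # units in the sampling frame for this stratum
    selected: int     # units drawn into the targeted sample
    responded: int    # units that completed data collection


def post_strat_factor(control_total: float, weighted_total: float) -> float:
    """Ratio that calibrates weighted respondents to a known population total."""
    return control_total / weighted_total


def case_weight(stratum: Stratum, calibration: float = 1.0) -> float:
    """Expansion weight for one respondent in the given stratum."""
    base = stratum.population / stratum.selected        # factor 1: 1 / P(selection)
    nonresponse = stratum.selected / stratum.responded  # factor 2 (assumed simple form)
    return base * nonresponse * calibration             # factor 3 applied last


# Illustrative numbers only: 200 sites in the frame, 40 selected, 25 responded.
weight = case_weight(Stratum(population=200, selected=40, responded=25))
```

With these illustrative inputs each of the 25 respondents represents 8 sites, so the weights sum back to the 200 sites in the stratum before any calibration is applied.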

Analysis under other deliverables will use the collected data to determine values such as gross and net

savings for each sampled unit. Using the sample expansion weights and the design, work under

Deliverable 7 will develop estimates of the targeted population parameters, along with 90%

confidence intervals. For example, if verified gross savings is determined for each sampled customer

under Deliverable 9, and net savings for each customer under Deliverable 10, the overall realization

rate, NTGR, and confidence intervals for these will then be determined under Deliverable 7.
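
As a rough sketch of how a realization rate and its 90% confidence interval might be computed from weighted sample results, the example below uses a weighted ratio estimator with a Taylor-series (linearization) variance. Both the variance formula, which ignores stratification and the finite population correction, and the normal-approximation multiplier of 1.645 are assumptions made for illustration; the workplan does not state which variance estimator will be used. The function name `realization_rate` is hypothetical.

```python
import math


def realization_rate(ex_ante, ex_post, weights, z=1.645):
    """Weighted ratio estimate with an approximate two-sided 90% interval."""
    total_ante = sum(w * x for w, x in zip(weights, ex_ante))
    total_post = sum(w * y for w, y in zip(weights, ex_post))
    rate = total_post / total_ante

    # Linearized residuals: how far each site departs from the overall ratio.
    resid = [w * (y - rate * x) for w, x, y in zip(weights, ex_ante, ex_post)]
    n = len(resid)
    variance = n / (n - 1) * sum(e * e for e in resid) / total_ante ** 2
    half_width = z * math.sqrt(variance)
    return rate, rate - half_width, rate + half_width


# Illustrative site-level savings (kWh) and expansion weights.
rate, low, high = realization_rate(
    ex_ante=[100, 200, 150, 50],
    ex_post=[80, 150, 120, 45],
    weights=[10, 10, 20, 20],
)
```

The same function applies to the NTGR if verified gross savings are passed as the denominator and net savings as the numerator.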



HVAC sector deliverable 8: Program analysis & recommendations

DNV GL’s team will conduct analyses across programs to produce recommendations regarding

potential program improvements related to costs, innovation, participation, and/or operational

efficiencies. To generate these results, we will conduct original program-tailored analyses and leverage

ongoing impact evaluation efforts. Specifically, we will review budgets and program implementation

plans, coordinate with existing impact data collection efforts or field targeted surveys, and possibly

analyze customer application and project approval processes or business plan metrics. Subsequent

communications will provide further detail regarding the inputs to and outcomes from this process.

These efforts will culminate in a May 2021 memorandum that addresses the Deliverable 8 objectives.

HVAC sector deliverable 9: Gross savings estimates

The gross savings deliverable will be completed by January 2021 for the evaluation period and

incorporated into the final evaluation report and other deliverables. The draft gross savings deliverable

will be submitted to Commission staff for review prior to finalization. Subsequent program periods will

follow the same schedule for gross savings for the programs discussed here.

Below we review the subtasks associated with the gross savings estimates task for the HVAC sector.

3.3.1 Non-residential HVAC gross savings estimates

DNV GL’s HVAC team will calculate gross savings as energy savings and peak demand reduction by

using a combination of basic and enhanced rigor for the selected program year 2019 measure groups.

Of the two selected measure groups, the PTAC Controls measure group will be evaluated with enhanced

rigor, whereas the Rooftop/Split System measure group will be evaluated with basic rigor.

3.3.1.1 PTAC Controls Measure Group

The PTAC Controls measure group was included in the 2019 ESPI uncertain list and contributed over

19% of the total HVAC portfolio first year electric energy savings. For the program year 2019

evaluation, the evaluation team will use an enhanced rigor approach to evaluate the savings of this

measure group. The following section describes the aspects of determining gross savings estimates

that are specific to this measure group.

Unique analysis methods

Due to the COVID-19 pandemic, on-site data collection may not be feasible or allowed. Therefore, our

data collection activities will consist of remote verification of measure installation and key parameters,

as well as interviews to quantify basic program attribution.

We will conduct in-depth phone/web-based interviews with the site contact to verify the installation,

collect equipment-specific nameplate information (e.g., make, model numbers, capacity, and

cooling/heating efficiencies) from the affected PTAC/PTHP units, assess the baseline operation, and

obtain details about pre- and post-installation occupancy rates, equipment run times, and temperature

set-point schedules of the guest rooms. If necessary, we will also request data logged by on-site guest

room energy management systems (GREMS) from the vendor and facility contact.

The PG&E and SDG&E workpapers specify retrofit add-on (REA) as the event type. This will be the

default presumed basis for each measure. The workpapers document that for this REA event type, the

pre-existing HVAC units have no controls installed to modify the operation of the unit (compressor



runtime or fan speed) based on space occupancy or temperature set-points. The evaluation team will

verify that the site-specific pre-existing conditions are consistent with this approach before use.

Developing the baseline model: We will utilize the collected data to adjust critical measure-specific

operational input parameters in the baseline eQUEST DEER prototype models. The appropriate DEER

prototype model based on building type, building vintage, and climate zone will be selected for each

project for this exercise. A baseline model will be constructed that represents how the guest room

energy systems were operated in the pre-installation scenario, including HVAC, lighting, and

appliances. We will also use pre-installation monthly and AMI billing data obtained for the facility to

verify seasonality and daily occupancy/usage patterns of guest rooms estimated by the baseline

eQUEST models.

Developing the as-built model: Once an appropriate baseline model is developed for each project, we

will develop a similar site-specific as-built model in eQUEST by modifying independent variables such

as post-installation equipment set-point schedules and occupancy rates gathered from data collection

and requested EMS logs. Finally, we will use the post-installation (and pre-COVID-19 pandemic)

monthly and AMI billing data obtained for the facility to verify seasonality and daily occupancy/usage

patterns of guest rooms estimated by the as-built eQUEST models.

These two models will form the basis for evaluating the savings for this measure. For each project in

the sample, the adjusted baseline and as-built models will be simulated to produce ex post unit

energy savings (UES) estimates. These will be multiplied by the number of units installed (for PG&E

projects) or the capacity of PTAC/PTHP units affected by the measure (in tons, for SDG&E projects) to

estimate the ex post energy savings at the project level.
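
The project-level roll-up described above amounts to multiplying the simulated UES by the normalizing quantity each utility uses. A minimal sketch follows; the function name `project_savings_kwh` and the numeric values are hypothetical and serve only to show the arithmetic.

```python
def project_savings_kwh(utility: str, ues_kwh: float,
                        units: int = 0, tons: float = 0.0) -> float:
    """Ex post project savings = simulated UES x the utility's normalizing quantity."""
    if utility == "PG&E":
        return ues_kwh * units   # UES expressed per controlled unit
    if utility == "SDG&E":
        return ues_kwh * tons    # UES expressed per ton of affected capacity
    raise ValueError(f"no normalizing unit defined for {utility!r}")


# Illustrative values only: 120 units at 300 kWh per unit.
pge_project = project_savings_kwh("PG&E", ues_kwh=300.0, units=120)
```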

We will complete the gross savings deliverable by January 2021 and incorporate results into the final

evaluation report and other deliverables to meet the March 2021 bus stop. We will submit the draft

gross savings deliverable to Commission staff for their review prior to finalization.

3.3.1.2 Rooftop and Split Systems

For PY2019 we will use basic rigor to evaluate the savings of the Rooftop/Split System measure group.

We will use on-site data previously collected under the PY2018 evaluation, and further supporting data

such as from the workpaper archive, to develop ex post UES values by adjusting critical DEER eQUEST

model input parameters. The adjusted models will be simulated to produce ex post savings estimates

for each combination of climate zone, building type, and unit type.

3.3.2 Residential HVAC gross savings estimates

3.3.2.1 Coil Cleaning, Time Delay Relay Controls, Furnaces, Maintenance, Fan Motor Replacement, & Duct Sealing

For PY2019, we will use energy consumption analysis and simulation modeling to estimate savings of

the residential HVAC measure groups. Our analysis will use 12 months of pre- and post-installation

kWh and therms data. These energy use data will be weather normalized so that pre-

and post-installation normalized annual consumption (NAC) is analyzed to estimate savings for these

measures. We will use eQUEST simulation modeling of the DEER residential prototypes to generate

measure savings estimates that will inform the disaggregation of meter-level savings to measure

group savings.
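
To make the weather-normalization step concrete, the sketch below fits monthly electricity use to cooling degree days (CDD) and then predicts consumption for a typical weather year. It is deliberately simplified and rests on assumptions not stated in the workplan: a cooling-only model, a fixed degree-day base chosen in advance, and monthly rather than daily data. A full analysis would also handle heating degree days, therms, and balance-point selection. All function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def normalized_annual_consumption(monthly_kwh, monthly_cdd, normal_year_cdd):
    """Fit usage to observed degree days, then predict a typical weather year."""
    base, per_cdd = fit_line(monthly_cdd, monthly_kwh)
    return sum(base + per_cdd * cdd for cdd in normal_year_cdd)


def nac_savings(pre, post, normal_year_cdd):
    """pre and post are (monthly_kwh, monthly_cdd) tuples of 12 values each."""
    return (normalized_annual_consumption(*pre, normal_year_cdd)
            - normalized_annual_consumption(*post, normal_year_cdd))
```

Because both periods are evaluated against the same normal-year degree days, the difference reflects the change in the fitted usage relationship rather than a difference in weather between the two years.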



The NAC basic rigor method as described in the 2006 California Energy Efficiency Evaluation Protocols

(California Protocols) does not specify the use of a comparison group for aggregate program analysis.

We will use the recommended normalized metered energy consumption (NMEC) methods with a

comparison group to control for underlying trends when conducting consumption analysis. As a result,

our consumption analysis approaches will be high rigor.

3.3.2.1.1 Applicable protocol

Applicable protocols for the proposed residential HVAC measures evaluation are described in UMP

Chapter 8, Whole-Building Retrofit with Consumption Data Analysis Evaluation Protocol.1 The protocols

provide guidance on quasi-experimental designs including two-stage methods and pooled fixed-effects

modeling approaches. Furthermore, the site-level modeling part of the proposed approach will be

consistent with CalTrack methods that have been prescribed for pay-for-performance programs. These

approaches are also consistent with California Protocol Enhanced rigor.

3.3.2.1.2 Impact methodologies

As shown earlier in Table 4, HVAC measures for residential use were offered by 16 different residential

energy efficiency programs across five PAs in PY2018 and PY2019. These programs delivered the

measures they offered using different delivery channels (e.g., rebates/incentives, direct install, and

upstream distributor incentives). However, the non-smart thermostat HVAC residential measures,

such as fan motor replacements and coil cleaning, were primarily delivered through direct install and

upstream distributor incentives.

The disruptions to residential routines precipitated by the outbreak of COVID-19 are going to result in

a structural break in energy use in 2020, which is the post period for households that installed

residential HVAC measures in PY2019. The primary focus of DNV GL’s PY2019 evaluation will thus be

on estimating HVAC measure savings among homes that installed these measures in program year

2018 through direct install programs.

PY2019 evaluation (which will be based on installations of 2018 HVAC measures) will provide a

complete picture of residential HVAC measure savings per household available in different housing

types and program delivery channels. In this case, first year post periods cover 2018 and 2019.

Energy use from this period is unaffected by COVID-19 disruptions. DNV GL will extend the analysis of

residential HVAC measure savings by examining changes in a second-year post period, which covers

2020. DNV GL’s PY2019 evaluation will thus involve two different post periods.

Table 9 summarizes the groups and time periods DNV GL’s PY2019 residential HVAC measures

evaluation will involve.

Table 9. Residential HVAC measure evaluation groups and periods in PY2019 evaluation

Participant group | Installation period | Comparison group | Post period I | Post period II

Multifamily Direct Install | 2018 | Future (PY2019) participants, matched comparison group | 2019 | 2020

Manufactured Direct Install | 2018 | Future (PY2019) participants, matched comparison group | 2019 | 2020

All Residential Direct Install | 2018 | Future (PY2019) participants, matched comparison group | 2019 | 2020

Upstream Furnace | 2018 | Future (PY2019) participants, matched comparison group | 2019 | 2020

1 Agnew, K.; Goldberg, M. (2017). Chapter 8: Whole-Building Retrofit with Consumption Data Analysis Evaluation Protocol, The Uniform Methods Project: Methods for Determining Energy Efficiency Savings for Specific Measures. Golden, CO; National Renewable Energy Laboratory. NREL/SR-7A40-68564. http://www.nrel.gov/docs/fy17osti/68564.pdf

We will conduct a consumption data analysis to provide gross savings per unit separately for single

family, multifamily, and manufactured homes, and by climate zone to the extent available in the data.

We will combine PA data in the same climate zone in order to produce a single and consistent savings

per household estimate for the climate zone. We will extrapolate from these results to any climate

zone not robustly estimated directly in the consumption analysis, using methods similar to those that

have been applied in the ex ante process.

We will thoroughly review and assess tracking data to choose the homes that will be included in the

residential HVAC measures evaluation.

A summary of our savings analysis plan is presented in Table 10, below.

Table 10. Summary of the residential HVAC measure savings analysis plan

Workplan Component | Included in the Analysis | Output

Consumption data analysis using data from direct install programs | Customers participating in PY2018 direct install programs that deliver multiple measures | Gross savings per household for direct install participants by climate zone, in 2018/2019 and 2020 post periods

Gross savings extrapolation | Gross impacts for all PY2019 participants are estimated by applying results from PY2018 participants (extrapolating unit gross results from 2018 participants to the 2019 participants) to avoid interference from COVID-19 disruptions | Gross savings per residential HVAC measure by climate zone, in 2018/2019 and 2020 post periods

Surveys with customers | Samples of customers from each PY2019 program offering residential HVAC measures; samples of matched non-participants used as comparators | Verified installations PY2019; NTGR by program PY2019; prevalence of residential HVAC measures among the comparison groups; changes in household that impact energy use for all customers included in the billing analysis

3.3.2.1.3 Comparison groups



DNV GL is conducting billing analysis using data from PY2018 participants on the assumption that

gross savings per household is the same for both PY2018 and PY2019 participants within the same

dwelling type, climate zone and program delivery. As indicated earlier, this decision is motivated by

the disruptions in energy use precipitated by COVID-19 in 2020 (the post period for PY2019

participants), which are expected to make pre- to post-period energy use comparisons and analysis of

program measure savings inappropriate.

The billing analysis is based on a quasi-experimental design that uses energy consumption data from

PY2018 participants and matched comparison non-participants. We plan to use two different

comparison groups. First, DNV GL will use future (PY2019) participants as a comparison group, as they

are expected to be similar to current participants along dimensions that drive such households to

self-select into participating in programs offering residential HVAC measures. Even in the case of direct

install programs, where the decision to participate may be made by property managers rather than

occupants, current and future participants are likely to be similar in other unobservable

characteristics that affect the use of these measures.

DNV GL will also construct matched comparison groups from general population customers for the

two-stage consumption data analysis. This effort will involve matching algorithms that use

consumption data within strata defined by characteristics such as fuel type and geography. The

matching will also take trends in energy use into consideration.
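
The comparison-group logic can be illustrated with two small helpers: a nearest-neighbor match on pre-period usage and a difference-in-differences estimate of savings. This is a sketch under simplifying assumptions: candidates are taken to be pre-filtered to the participant's stratum (for example, the same fuel type and geography), matching uses squared Euclidean distance on monthly usage, and savings are computed from group means rather than from the two-stage regression models the evaluation will use. The function names are hypothetical.

```python
from statistics import mean


def nearest_match(participant_usage, candidates):
    """Return the id of the candidate whose pre-period usage profile is closest.

    candidates maps customer id -> list of monthly usage values, assumed to be
    limited already to the participant's stratum.
    """
    def distance(usage):
        return sum((a - b) ** 2 for a, b in zip(participant_usage, usage))
    return min(candidates, key=lambda cid: distance(candidates[cid]))


def did_savings(part_pre, part_post, comp_pre, comp_post):
    """Average household savings net of the change seen in the comparison group."""
    participant_change = mean(part_pre) - mean(part_post)
    comparison_change = mean(comp_pre) - mean(comp_post)
    return participant_change - comparison_change
```

Subtracting the comparison-group change removes trends that affect both groups, so only the additional reduction among participants is counted as savings.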

3.3.2.1.4 eQUEST modeling to inform disaggregation of household-level savings

We will use DEER prototypes in eQUEST to estimate the impacts of residential measures that the programs

installed at the same time. These estimates will inform statistically adjusted engineering (SAE)

models, which will be used to disaggregate savings per household to the measure level, as described

in Section 8.3 in Appendix D. The residential DEER prototype models will be adjusted using the best

data available from workpapers, past evaluation studies and previous evaluation findings. We will

develop impact estimates, by building type and climate zone, for the six residential HVAC measures

under evaluation in PY2019, and we may also model additional commonly installed measures if we

observe that their frequency and impacts are non-trivial. Applying eQUEST simulation results will provide

more realistic inputs to SAE models, which enables these models to separate the effects of different

measures more accurately.
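
As a simplified illustration of the SAE idea, the sketch below regresses household-level savings on the engineering (simulation) estimates for two measures and returns one adjustment factor per measure. The actual specification is in Appendix D and is not reproduced in this section, so the two-measure, no-intercept form used here is an assumption, as is the function name `sae_two_measures`. The engineering estimate for a household is taken to be zero when a measure was not installed.

```python
def sae_two_measures(household_savings, eng_a, eng_b):
    """Least-squares fit of savings = adj_a * eng_a + adj_b * eng_b (no intercept)."""
    saa = sum(a * a for a in eng_a)
    sbb = sum(b * b for b in eng_b)
    sab = sum(a * b for a, b in zip(eng_a, eng_b))
    say = sum(a * y for a, y in zip(eng_a, household_savings))
    sby = sum(b * y for b, y in zip(eng_b, household_savings))

    # Solve the 2x2 normal equations with Cramer's rule.
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("engineering estimates are collinear; measures cannot be separated")
    adj_a = (say * sbb - sby * sab) / det
    adj_b = (sby * saa - say * sab) / det
    return adj_a, adj_b
```

Multiplying each adjustment factor by the corresponding engineering estimate gives the measure-level share of a household's savings. The collinearity check reflects why separation is only possible when households differ in which measures they received or in the size of the engineering estimates.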

3.3.2.1.5 Load shapes

DNV GL will also estimate hourly load and savings shapes for residential HVAC measures. Such

estimates will provide an understanding of when demand savings (in kW) occur from the program.

The savings load shape will identify the average hourly and 8,760-hour load savings and, thus, the

periods during which program savings occur.

DNV GL will use customer-level regressions and difference-in-difference models to estimate savings

load shapes for the program. Details on the approach are provided in Section 7 Appendix C.
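
A savings load shape can be approximated by applying the difference-in-differences idea separately to each hour of the day. The sketch below does this with simple hourly means of interval data. It is an assumption-laden simplification of the customer-level regression approach described in Appendix C: it ignores weather and day type, and it assumes every hour appears in all four input series. The function names are hypothetical.

```python
from collections import defaultdict


def hourly_mean(readings):
    """readings: iterable of (hour_of_day, kw) pairs -> {hour: mean kW}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for hour, kw in readings:
        totals[hour] += kw
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}


def savings_load_shape(part_pre, part_post, comp_pre, comp_post):
    """Difference-in-differences of mean kW for each hour of the day."""
    pp, pq = hourly_mean(part_pre), hourly_mean(part_post)
    cp, cq = hourly_mean(comp_pre), hourly_mean(comp_post)
    return {h: (pp[h] - pq[h]) - (cp[h] - cq[h]) for h in sorted(pp)}
```

Keying the result by day of year and hour instead of hour of day alone would give the 8,760-value version of the same shape.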

The findings have the potential to inform program improvement and the extent to which program

energy savings can be used as a resource. Since the data requirement will be substantial, DNV GL will

conduct the study using data for a sample of households from each program offering residential HVAC

measures. These samples will be a subset of the sites used in the consumption data analysis and will

be selected to be representative of all usage quartiles and climate regions.

3.3.2.1.6 Effective Useful Life (EUL)/Remaining Useful Life (RUL)



The residential HVAC evaluation will use the ex-ante claimed EUL/RUL values for the evaluated

measures. We will coordinate with the cross-cutting ex-ante and EUL deliverable teams to determine

whether EUL/RUL update studies will be conducted in 2020.

We will complete the gross savings deliverable by January 2021 in the first evaluation period and will

incorporate results into the final evaluation report and other deliverables. We will submit the draft

gross savings deliverable to Commission staff prior to finalization.

HVAC sector deliverable 10: Net savings estimates

The net savings estimates for the PY2019 HVAC measure groups will be completed by January 2021.

The draft net savings results will be delivered to the Commission staff prior to finalization.

Commercial PTAC Controls measure group: The PTAC Controls measure group will receive standard

rigor treatment. This measure group was delivered to commercial customers via the PAs’ direct install

delivery mechanism, and discussions with program staff revealed the primary influencer to be the

end user. Therefore, our team will conduct end-user surveys to assess program

effects on key decision makers based on the program design.

Residential HVAC Measure Groups: Across the five PAs, residential HVAC measure groups were offered to

end users either via direct install or through downstream programs. Hence, for these measure

groups, we will conduct a combination of market actor (with installation contractors) and end-user

surveys. We will combine the NTG estimates for these different market streams to assess the program

effect on the market actors, the market actor effects on the end-users, and the product of those two

causal pathways. Table 11 shows the residential HVAC measure groups selected for NTG evaluation

along with their evaluation activities.

Table 11. Residential HVAC measure groups NTG evaluation activities

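Measure Group | Activities

HVAC Motor Replacement | Web-based surveys with end-users and phone-based surveys with property managers, where applicable

HVAC Duct Sealing | Web-based surveys with end-users and phone-based surveys with property managers, where applicable

HVAC Maintenance | Web-based surveys with end-users and phone-based surveys with property managers, where applicable

HVAC Coil Cleaning | Web-based surveys with end-users and phone-based surveys with property managers, where applicable

HVAC Furnace | Phone-based interviews with participating equipment distributors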

The details of our net-to-gross methodology are in Section 9 Appendix E of the workplan.

HVAC sector deliverable 11: Impact evaluation reports

In this section, we detail our approach to completing the impact evaluation reports. The primary

objective of this deliverable is to provide high quality, clearly written impact evaluation reports, which

include findings for the uncertain measures list each year for the HVAC sector by the deadlines the

Commission sets forth. Note that the HVAC sector will produce two stand-alone impact reports: one

report will cover the commercial PTAC Controls measure group, and the other will cover all HVAC

measure groups installed in residential structures.



To achieve the primary objective, we will:

Conduct a staged review process with key reporting deliverables spread out weeks apart to allow

for feedback and revisions from Commission staff, key stakeholders, and the public

Start reporting as early as possible in the evaluation cycle to stay on schedule and maintain high

quality in all reporting deliverables

Craft clearly written methodologies sections for each report, including sample design, data

collection, analysis, and any other methodologies required for each study

Report study results that thoroughly address each of the research questions set forth in the final

research plans

Write concise, clear executive summaries so that study results are accessible to non-technical

audiences and are available for public consumption

Produce informative graphics to allow readers to quickly and easily interpret results and key

findings

To successfully complete the impact evaluation reports, we propose a set of reporting deliverables that

allow for review and feedback from Commission staff, stakeholders, and the public. The key reporting

deliverables include the following:

Draft and final outlines for the impact evaluation reports

Draft impact evaluation reports due to Commission staff

Draft impact evaluation reports due to stakeholders and the public

Stakeholder presentations/workshops

Final impact evaluation reports

The outlines, draft reports, stakeholder presentations, and final impact evaluation reports are

due at distinct stages in the reporting process to allow adequate time for Commission and

stakeholder feedback and revisions. We provide further details on the reporting deliverables timeline

in the Workplan Schedule section.

3.5.1 Report layout and content

Each impact evaluation report will include, at minimum, the following sections:

Executive summary

Study approach and methodology

Data sources: document data sources used in report

Study results

Conclusions and recommendations

Appendices


The reports will thoroughly address each of the objectives defined in the final research plan for each

study. The overall report will follow overarching style guidelines in the CPUC’s most recent

Correspondence and Reference Guide.

Executive summaries will be accessible to non-technical audiences. Language in the executive

summary will be clear, concise, and easily understandable and will be approximately 10% of the

length of the report it describes. DNV GL’s internal reviewers will include staff not involved with the

study who will provide guidance and editing support on the readability of the executive summary and

other sections of the report. We will also ensure that each executive summary follows the Guide to Writing

an Effective Executive Summary, Navy and Marine Corps Public Health Center (updated June 2017).

Key stylistic elements we will apply in the executive summaries include:

Using clear language and minimizing the use of technical words or industry jargon

Keeping sentences short and to the point

Avoiding overly complex sentences with multiple ideas

Avoiding or minimizing the use of acronyms and clearly defining any acronyms used

Keeping the executive summary to 10 pages or less

For methodology sections, we will describe our study approach as simply as possible and ensure that

the description of our methodology is transparent and that our methodology can be replicated by

others. We will document the data sources used for each impact evaluation either in the main body of

the report or as a separate section in the appendices. The main body of each report will also include a

study results section that fully addresses the objectives laid out in the final research plan and end with

conclusions and recommendations. Appendices will include any data collection instruments used for

each impact evaluation and other key information relevant to the evaluation.

Appendices will conform to the guidelines in CPUC’s Energy Division and Program Administrator

Energy Efficiency Evaluation, Measurement and Verification Plan 2018-2020 (Version 9). These

sections will come from Deliverables 8, 9, and 10, respectively, and will be compiled into an overall

database for reporting purposes.

3.5.2 Report editing

Devoting adequate time and resources to report editing is critical for producing high-quality final

reports. Our editing will address the following key elements:

Readability, accessibility, flow, and logic

Grammar and style

Technical and peer review

Graphic design

Readability is essential for the reports to be accessible to non-technical audiences. The executive

summary will be clear, concise, and easily readable for non-technical audiences. A DNV GL

professional copyeditor will review and edit each draft and final report to ensure that Commission staff

and stakeholders can focus their reviews on the content of the reports rather than on grammatical

errors. The copyeditors at DNV GL have at least a decade of experience copyediting prior reports

delivered to the CPUC as well as reports delivered to other large clients. All draft reports will include


peer review from independent technical experts. All reports will also include graphic designs to allow

for data visualization and easier consumption of information.

3.5.3 Report format

DNV GL will electronically transmit an Adobe PDF to the Commission that can be uploaded by

Commission staff to the CPUC website for distribution to the public. This PDF will include the final

graphic design and layout of the evaluation report. The PDF of the report will be in a print-ready

format so that the Commission can submit the file for mass printing.


4 WORKPLAN SCHEDULE

Table 12 summarizes the milestones and deliverables for the HVAC sector workplan including all

subsectors.

Table 12. Summary of milestones and deliverables for the PY2019 HVAC workplan

| Month | 2020 | 2021 |
| --- | --- | --- |
| January | | Data Requests: monthly billing & AMI data; Gross and Net Analysis; Program Assessment |
| February | | PY2019 Draft Impact Evaluation Report |
| March | | PY2019 Final Impact Evaluation Report |
| April | | PY2019 Final Impact Evaluation Report CALMAC Posting |
| May | PY2019 Evaluation Measure Selection; Program Interviews | |
| June | Data Requests: program documentation (implementation plans, manuals, etc.); ESPI Savings; Update Workplan; Sample design; Data collection instrument development | |
| July | Sample design; Data collection instrument development; Data Requests: claim documentation (tracking data, project-specific information); NTG web surveys; Remote data collection | |
| August | NTG web & phone surveys | |
| September | Data Requests: monthly billing & AMI data; NTG web & phone surveys | |
| October | NTG web & phone surveys | |
| November | NTG web & phone surveys | |
| December | Gross and Net Analysis; Program Assessment | |


5 APPENDIX A - SAMPLE DESIGN AND SELECTION

The sampling process has two overarching steps:

1. Define the population frame(s). The target population for an evaluation study refers to the group

of entities (usually customers, projects, or measures) about which the study is designed to draw

inferences. This comes from a population database (or databases) listing each unit in the

population and providing relevant information for each unit. The population database can often be

extracted from program tracking systems, utility billing systems, or secondary data sources.

Secondary information may be appended to the population database and leveraged for sample

design or enhanced analysis. The project team will look for innovative opportunities to design,

coordinate and collect information serving multiple objectives to improve the overall efficiency of

the sampling and data collection effort.

2. Define and analyze the sampling frame. The sampling frame is the list from which the sampled

units will be selected. This may be the same as the population frame or may be a subset of the

population frame. Table 13 indicates the primary sampling frames we will use.

Table 13. Typical frames and stratification variables for different populations sampled

| Population/Respondent Type | Frame | Stratification Variables |
| --- | --- | --- |
| Participating customers | Program tracking data | IOU, climate zone (CZ), participation date, measure types, savings magnitude, CARE participation, dwelling unit type, neighborhood socio-demographics, consumption characteristics |
| Nonparticipating customers | Billing data records | IOU, CZ, participation date, CARE participation, dwelling unit type, neighborhood socio-demographics, consumption characteristics |
| Participating retailers or contractors | Program tracking data | IOU, savings magnitude, number of employees |
| Nonparticipating retail stores | California retail store databases | IOU, channel |
| Nonparticipating contractors | Info USA | IOU, business type, number of employees |
| Manufacturers | Contact lists from prior studies | Typically not stratified |


The following are the six main steps used to select the final sample:

1. Determine the values to be estimated. For impact evaluation, the key values to estimate are

usually a realization rate and an NTGR. For population characteristics, the key value of interest is

often the average response or the distribution of responses to the survey questions. For program

assessments, the values of interest are often categorical variables (e.g., satisfaction level) that for

the purposes of sampling can be represented as a Bernoulli distribution.

2. Determine the appropriate target precision. Target precision requirements for key variables are

defined by the Protocols and rigor levels assigned. There can be different target precision levels

for different variables, subgroups, or measures. For instance, we might target 90/10 (that is, a 10%

relative error with 90% confidence) for the statewide realization rate estimates, but only require

90/15 for each of the individual PA estimates. Likewise, while we might require 90/10 for net

savings, a looser standard may apply for program analysis characteristics.

3. Stratify the population. Stratification offers three main benefits: obtaining results for subgroups of

the population (for instance, we expect to stratify by the three PAs, by sector, and by program),

ensuring that certain subgroups are sufficiently represented in the sample (e.g., strata with large

savings but low participation such as CHP), and increasing the precision of estimates by reducing

the variance in the sample. The last reason usually involves stratifying on a measure of size or

other quantity that is correlated with the quantity we want to estimate. For impact estimation, the

verified savings are correlated with the reported savings. Thus, if we stratify the population based

on reported savings, we can get a more precise total verified savings, realization rate, and NTGR.

Table 13 indicates the key stratification variables we generally plan to use for each type of sample.

While many variables are available for stratification, it is not necessarily important to stratify

explicitly for all of these. Stratifying by too many dimensions can lead to fielding difficulties and

can increase rather than decrease variance. DNV GL’s team uses a combination of statistical

techniques to ensure that the sample is systematically distributed across dimensions of interest,

without applying explicit sampling quotas for all these dimensions.

4. Estimate variances. To design efficient samples and calculate sample sizes, the variance of the

estimate must be available, estimated, or assumed. For estimating population proportions, the

variance is dependent only on the value of the proportion itself. This value is maximized for a

proportion of one half (50%). However, to estimate a quantity or a ratio such as a realization rate,

the required sample size depends on the variance of the quantity being estimated, which can also

be expressed as an error ratio. DNV GL can apply our past California evaluation results to provide

very robust assumptions for error ratios for all the anticipated data collection.

5. Calculate sample sizes. Sample sizes will be calculated based on the variance or error ratio and

depend on the target precision. The error ratio is the ratio-based equivalent of a coefficient of

variation (CV). The CV measures the variability (standard deviation or root-mean-square

difference) of individual evaluated values around their mean value, as a fraction of that mean

value. Similarly, the error ratio measures the variability (root-mean-square difference) of

individual evaluated values from the ratio line, i.e., [Evaluated = (Ratio × Reported)], as a fraction

of the mean evaluated value. Thus, to estimate the precision that can be achieved by the planned

sample sizes, or conversely the sample sizes necessary to achieve a given precision level, it is

necessary to know (or estimate) the error ratio for the sample components.


In practice, we cannot determine error ratios until after we collect the data and evaluate savings.

We therefore will base the sample design and projected precision on error ratios estimated from

experience with similar work. A study looking to measure annual or peak consumption would have

a higher estimated error ratio based on past metering studies, somewhere between 0.7 and 1.0

depending on buildings and climates covered. A simple verification study may use an error ratio of

0.5.

We have access to the error ratios of the evaluated results in past studies that we will use to

inform the assumptions made during sampling for any specific program.

Sampling for cases where the primary variable will be a proportion is simpler. In these cases, the

precision depends only on the value of the evaluated proportion. Target sample sizes can be

calculated by assuming the “worst case” example, i.e., a proportion resulting in an estimate of

50%. As a simple example, to estimate a proportion (for instance, the percentage of self-reported

free riders, which we assume to be 50% for the sake of the sample plan) for a large population

with a 5% error, at 90% confidence, the sample size should be 271. If this is the target precision

for each of the three PAs, the sample size for each of the PAs would need to be 271, for a total of

813. If the observed proportion differs from 50%, the achieved absolute precision will be

better than planned, because 50% is the worst case.

Most of our samples will be designed to serve multiple objectives. For many purposes, it will be

sufficient to design the samples to meet a single design objective, such as designing for precision

of a proportion, and related objectives will also be satisfied. In other cases, we will need to design

explicitly for multiple objectives. The multiple objectives may be a precision target at more than

one level of aggregation (e.g., 90/10 statewide, 90/20 at the PA level) or precision targets on

more than one variable. DNV GL’s team has existing sampling tools that allow us to jointly

optimize on multiple criteria.

6. Select the sample. We will randomly choose primary sample points from the population in each

stratum, based on the sample sizes calculated in the previous step. We will select a sample large

enough to achieve the targeted number of completed cases, after the response rates are

considered. We will also select a backup sample at this point in case additional sample points are

needed to reach the target completes.
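
The sample-size arithmetic in steps 4 through 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not DNV GL's sampling tool: all function names are hypothetical, the error ratio follows the simplified unweighted description in step 5, the ratio-estimation sample size uses the common planning formula n = (z × error ratio / relative precision)², no finite population correction is applied, and the response rate passed to the last function is whatever the study team assumes.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value (about 1.645 for 90%)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def sample_size_proportion(abs_precision, confidence=0.90, p=0.5):
    """Completes needed to estimate a proportion in a large population.

    p=0.5 is the worst case because p * (1 - p) is largest there.
    """
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1.0 - p) / abs_precision ** 2)


def error_ratio(evaluated, reported):
    """Unweighted error ratio: RMS deviation of evaluated values from the
    ratio line (ratio * reported), as a fraction of the mean evaluated value."""
    ratio = sum(evaluated) / sum(reported)
    n = len(evaluated)
    rms = math.sqrt(
        sum((y - ratio * x) ** 2 for x, y in zip(reported, evaluated)) / n
    )
    return rms / (sum(evaluated) / n)


def sample_size_ratio(err_ratio, rel_precision, confidence=0.90):
    """Completes needed to estimate a ratio (e.g., a realization rate)
    to a relative precision, given an assumed error ratio."""
    z = z_value(confidence)
    return math.ceil((z * err_ratio / rel_precision) ** 2)


def sample_to_draw(target_completes, response_rate):
    """Sample points to release so that the expected completes hit the target."""
    return math.ceil(target_completes / response_rate)
```

Under these assumptions, `sample_size_proportion(0.05)` reproduces the 271 completes per PA cited in step 5, and `sample_size_ratio(0.5, 0.10)` gives 68 completes for a simple verification study designed to 90/10.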


6 APPENDIX B - DATA COLLECTION FRAMEWORK DEVELOPMENT

DNV GL will develop a data collection framework using the following guidance.

Guidance for survey development

Elements of the survey development guidance will include the following:

Survey design template requiring the designer to:

− Specify all survey objectives

− Specify data elements to be collected to meet the survey objectives

− Specify analysis that will meet the objective using the collected data elements

− Flag new data collection/analysis approaches that will require extra

Data format requirements per the Data Management Contractor

Templates for recording in-depth interview (IDI) responses systematically in a spreadsheet

Principles for effective instrument design, including elements such as:

− Framing

− Avoid double-barreled questions

− Avoid leading questions

Steps to complete the survey development, including:

− Draft instrument using established modules where applicable

− Identify purpose for each question included: framing, analysis, or consistency check

− Confirm all objectives have been met or identify tradeoffs made

− Confirm DMC format requirements are met

Pre-test procedures

Standard question modules

DNV GL’s team will prepare standard questionnaire modules to be used across surveys collecting the

same types of information. Standard modules will include:

Introductory scripts and contact screening

Demographics/firmographics, designed to align with the California RASS or CEUS, with U.S.

Census questions, and/or with the U.S. DOE’s Residential and Commercial Energy Consumption

Surveys

Standard coding for response categories such as not applicable, don’t know, refused, or skipped

Standardized Likert scales and guidance on how to write the question to avoid priming

respondents to answer toward one end or the other

Additional standard modules will be developed by the analysis Deliverables teams and vetted for

conformance to good survey practices and consistency with the general guidance. Table 14

summarizes the standard modules to be developed.


Table 14. Standard survey modules to be developed

| Deliverable | Title | Standard Modules Developed: Participants | Standard Modules Developed: Market Actors |
| --- | --- | --- | --- |
| 7 | Data Collection and Sampling | Introductory scripts; Contact screening; Demographics/firmographics | Introductory scripts; Contact screening |
| 8 | Program Analysis and Recommendations | Program awareness; Program experience; Motivations/barriers | Program awareness; Program experience; Motivations/barriers |
| 9 | Gross Savings | Installation verification | Δ Sales levels; Δ Sales practices; Δ High efficiency recommendations |
| 10 | Net Savings | Participant Self-Report Attribution and Scoring Algorithm | Program attribution of Δs in gross savings and algorithm for combining with participant self-report |

Quality control

Quality control procedures specified in the guidance will include survey design checks and checks

during data collection. Survey design checks include:

Checking data collection elements against the stated objectives: Is every objective satisfied and

does every included element serve a stated objective?

Reviewing questions for conformance to good question design standards.

Pre-testing to confirm programmed wording and skip patterns match approved instrument

For CATI phone surveys, we will conduct a “soft launch” of phone surveys by making calls for one or

two days against a very small initial sample with a goal of completing 10 to 20 calls. Our data

collection specialists will listen to the calls to check on operator delivery of the instrument, respondent

confusion and resistance, and skip patterns. Web surveys will follow a similar soft launch approach,

in which a data collection specialist will inspect the initial responses for skip patterns and signs of confusion.

For IDIs completed over the phone, we will have a check-in meeting with all callers and the project

manager to discuss similar matters that may have come up during the first day or two of calling. If

necessary, we will adjust the data collection instruments based on what we hear in these initial calls.

We will conduct daily monitoring of survey and IDI progress while they are in the field:

Are surveys completed consistently?

Are response frequencies in line with expectations?

Have respondents raised any issues that need to be immediately conveyed to the PAs (e.g., a

safety issue)?

Checks while fielding on-site data collection will include:


Weekly checks of filed on-site cases for completeness and consistency

Continuous feedback among the field staff to share best practices and issues

Review of utility bills against the consumption expected from the observed equipment

Spot checks of accuracy via phone follow up

Guidance on data collection management

Whatever the medium (email, phone calls, web surveys, or traditional mail), all methods of

contacting customers for surveys introduce some degree of sampling bias. Responses are limited to

customers who are willing to respond to that method of contact. In our experience, maintaining

response rates sufficient to provide statistically robust results is becoming more difficult. We

continuously work to refine our toolbox of data collection modes and approaches to maintain response

rates. Some strategies we use include sending advance letters and postcards with Commission

logos by postal mail, offering incentives for participation, making multiple calls or invitations across several weeks

and at different times of day, and identifying refusals and non-answers so that very

experienced callers can attempt to convert them by phone.

Guidance on data cleaning

The framework will also provide guidance on data cleaning procedures. These procedures will include:

Checking for missed skip logic

Verifying post-coding of open-ended questions

Checking for impossible values and outliers on numeric answers

Checking specific values that were assigned when interviewers bracketed

Dealing with responses of “don’t know” and refusals to answer

Consistency checks
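
The survey cleaning checks listed above can be sketched for a single record as follows. This is a minimal illustration, not the framework's actual procedure: the field names (`has_ac`, `ac_age_years`), the special codes for "don't know" and "refused," the skip rule, and the plausible age range are all hypothetical assumptions.

```python
# Hypothetical special codes for nonresponse; real instruments define their own.
DONT_KNOW, REFUSED = -8, -9


def check_record(record, max_age_years=60):
    """Return a cleaned copy of one survey record plus a list of flags.

    Assumed skip rule: ac_age_years is asked only when has_ac == "yes".
    """
    cleaned = dict(record)
    flags = []
    age = cleaned.get("ac_age_years")

    if cleaned.get("has_ac") != "yes":
        # Missed skip logic: a follow-up was answered that should have been skipped.
        if age is not None:
            flags.append("skip_logic")
    elif age is None:
        # Required follow-up was never asked or recorded.
        flags.append("missing_required")
    elif age in (DONT_KNOW, REFUSED):
        # Recode "don't know" and refusals to missing so they never enter averages.
        cleaned["ac_age_years"] = None
        flags.append("nonresponse")
    elif not 0 <= age <= max_age_years:
        # Impossible value or outlier on a numeric answer.
        flags.append("out_of_range")

    return cleaned, flags
```

Flagged records would then be reviewed rather than silently dropped, so that each correction can be traced back to the original response.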

For consumption data analysis and for hourly AMI data, DNV GL has established protocols and

software for data cleaning prior to analysis. We will use these tools for our analysis.


7 APPENDIX C - GROSS METHODS

Approach

This deliverable will provide estimates of the gross load savings impacts (kWh, kW, and therms) at the

measure and program level, consistent with the California Energy Efficiency Protocols. Key elements of

the work include establishing baselines including dual baselines, determining count adjustments and

hours of operation, and providing inputs to ex ante parameter updates.

The work will deploy the samples and data collection methods developed under Deliverable 7. Our

general approach to gross savings estimation follows the California and DOE UMP protocols.

Gross savings methods

In this section, we review the gross savings methodologies, the options and implications regarding

baselines, and the overall strategy for selecting the appropriate methodology. In the subsequent

section, we present the step-by-step approach for calculating gross savings.

Reviewing the available methods

The California Energy Efficiency Evaluation Protocol lays out minimum required methods at basic and

enhanced rigor.

The tables below summarize the strengths and limitations of verification-only approaches and of

evaluation approaches at different rigor levels.

Table 15 summarizes features of the approach based on verifying installation and passing through

deemed savings.2

Table 16 and Table 17 present strengths and limitations for methods that meet basic and enhanced

rigor standards.

Table 15. Installation verification (deemed) gross savings strengths, limitations, and

applications

| Method (Rigor) | Strengths | Limitations | Proposed Improvements |
| --- | --- | --- | --- |
| Installation Verification (Deemed) | Low cost; Broad scale assessment of less-complex measures; Fast: assurance toward meeting the March 2019 Bus Stop | Only updates installation rates; Relies on customer reports | Online and email methods to further improve costs and increase sample size; Collection of additional data to inform ex ante assumptions |

2 As stated in the Protocols, “Field-verified measure installation counts applied to deemed savings estimates do not

meet the requirements of this (IPMVP Option A, Basic Rigor) Protocol.”


Table 16. Basic rigor gross savings methodologies, strengths, limitations, and applications

| Method (Rigor) | Strengths | Limitations | Proposed Improvements |
| --- | --- | --- | --- |
| Simple Engineering Model Option A (Basic) | Moderate cost; Broad scale assessment of less-complex measures; Direct feedback loop between EM&V and ex ante estimates; Fast: assurance toward meeting the March 2019 Bus Stop | Rely on assumptions such as baseline characteristics, usage patterns | Measured parameter data collection and re-simulation of DEER or other models goes beyond options available in 2006 protocols; Coordinating with NTGR methods and activities, develop baseline in terms of alternative technology adopted if not the program measure; Utilize runtime data from advanced control measures, including setting up periods with and without the control system enabled |
| Normalized Annual Consumption Option C (Basic) | Low cost; Large sample size; Fast | Only applicable when savings are significant portion of usage (signal-to-noise); While allowed under California protocols, requires appropriate comparison group for best practice under DOE UMP, or explicit non-routine event identification and adjustment as IPMVP Option C; Typically provides savings relative to existing conditions, adjustments required for savings relative to replace on failure baseline | Analysis of daily-level consumption rather than monthly bills provides more robust options; Potential for hourly-level options, explored in 2013 by DNV GL leading to 2014 and 2015 papers, hourly measure load shapes then produced in Res QM analysis (March 2017 Bus Stop); Successful methods to adjust savings to appropriate baseline such as the method in DOE UMP for Furnaces led by Ken Agnew; “M&V Plus” methods (see next table) |

Table 17. Enhanced rigor gross savings methodologies, strengths, limitations, and

applications

| Method (Rigor) | Strengths | Limitations | Proposed Improvements |
| --- | --- | --- | --- |
| Calibrated Simulation Modeling Option D (Enhanced) | Site-specific or prototype based; Captures savings from complex, multiple, and/or interacting components under specific circumstances of the installation | High cost; Results often highly sensitive to small changes in modeling assumptions; Detailed data requirements | Hybrid approaches to calibrate prototype simulations; developing samples for site data collection and end use metering to develop and input modification and calibration of models to program population NAC |
| Retrofit Isolation Option B (Enhanced) | High level of accuracy for site-level savings estimates; Detailed information on discrepancies between tracking system and actual installations; Provides high-value inputs to estimates of deemed savings estimates & load shapes | High cost; High customer and program burden to coordinate pre- and post-install measurement; Must be deployed in waves (timing inflexible); Only appropriate for isolatable measures; Baseline equipment characteristics not observable in post-only data collection | Successful coordination for HVAC coil cleaning measures using new measurement suites in 2016-2017; Utilize data from advanced control measures, including setting up periods with and without the control system enabled |
| Fully Specified Consumption Regression for individual premises (Enhanced) | Captures complex, comprehensive, and operations/maintenance measures; Usually leverages existing data streams and does not require extensive measurement, metering, or on-site visits | Susceptible to bias and inaccuracy due to non-routine events, or independent variables not captured in the regression; Only applicable when savings are significant portion of usage (signal-to-noise); Difficult to assign savings results to specific measures; Directly applicable only if baseline is existing conditions | “MV Plus” tools to screen & adjust for non-routine events, identify best normalization model, and determine need for supplemental information; Successful methods to adjust savings to appropriate baseline such as the method in DOE UMP for Furnaces led by Ken Agnew |
| Fully Specified Consumption Regression across premises (Enhanced) | Moderate, Low cost; Large sample size; May allow isolation of effects for different measure groups and subgroups; May be applicable when savings are not a significant portion of usage (signal-to-noise); UMP Method using comparison groups well vetted | Susceptible to bias and inaccuracy due to sample attrition or activities that are not specified in the regression; Requires appropriate comparison group or data streams of additional model variables that may have unknown uncertainty; Typically provides savings relative to existing conditions, adjustments required for savings relative to replace on failure baseline | “MV Plus” tools to screen & adjust for non-routine events, separate premises into those where regression methods are meaningfully applied and those requiring additional information; Successful methods to adjust savings to appropriate baseline such as the method in DOE UMP for Furnaces led by Ken Agnew |
| True Program Experimental Design (Enhanced) | Low cost for evaluation once set up; Fast: assurance toward meeting the March 2019 Bus Stop | True experimental design relies on RCT during program implementation; Not applicable to most full-scale delivery methods; Does not provide measure-specific detail | Apply recently developed methods that have produced evaluation of all California HER programs with strong PA buy-in of results; Supplemental analysis to translate non-consumption parameter changes based on RCT into savings estimates; Oversight process to assure new RCT designs are correctly randomized and randomization is followed |

Determining the most appropriate baseline


In addition to rigor, the baseline is a key consideration of the gross savings analysis, as it establishes

the floor against which we calculate efficiency savings. Where the CPUC has issued

guidance in Ruling R.13-11-005 assigning baselines to specific measures, the DNV GL team will apply the

prescribed baseline. Where the Ruling leaves the baseline to discretion, the DNV GL team

will recommend the most appropriate baseline for each evaluated program and/or measure. DNV

GL’s team will then vet these decisions with CPUC staff and stakeholders. Table 18 provides an

overview of the methods that can calculate gross savings relative to each baseline.

Table 18. Methods applicable to established baselines

Baseline

Basic Enhanced

Simple

Engineering Model

Option A

Normalized Annual

Consumption

Building Simulation Option D

Retrofit Isolation Option B

Fully Specified

Consumption Regression

True Program Experimental

Design

Existing

condition baseline

X X X X X

Code baseline X X EBA*

Dual

baseline X X EBA X

* EBA: Engineering Baseline Adjustment methods to adjust to code or ISP efficiency

Note that in Table 18 we refer to Engineering Baseline Adjustment (EBA) methods. These methods

adjust regression-based savings relative to existing conditions to savings relative to code, based on

the difference between existing efficiency and code baselines.
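
One way to picture an engineering baseline adjustment is sketched below. This is an illustrative assumption, not the method specified in this workplan: it assumes the equipment serves a constant load, that consumption equals load divided by an efficiency rating, and that a higher rating means higher efficiency. The function and parameter names are hypothetical.

```python
def adjust_savings_to_code(savings_vs_existing, post_consumption,
                           eff_existing, eff_code, eff_installed):
    """Shift regression-based savings from an existing-conditions baseline
    to a code baseline (simplified constant-load sketch).

    savings_vs_existing: savings measured against pre-retrofit consumption
    post_consumption:    post-retrofit consumption, in the same energy units
    eff_*:               efficiency ratings on a common scale
    """
    # Service delivered by the installed equipment (assumed unchanged by the retrofit).
    load = post_consumption * eff_installed
    # Consumption the same load would require under each baseline.
    existing_baseline_use = load / eff_existing
    code_baseline_use = load / eff_code
    # Savings relative to code exclude the gap between the two baselines.
    return savings_vs_existing - (existing_baseline_use - code_baseline_use)
```

For example, with a hypothetical post-retrofit consumption of 600 units and efficiency ratings of 9 (existing), 12 (code), and 15 (installed), savings of 400 units relative to existing conditions become 150 units relative to code.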

Developing specific gross savings methods and approaches

Under this workplan, we will determine the programs and measures for which we will evaluate gross

savings, the respective evaluation rigor for each (basic or enhanced) and specific gross savings

estimation methods. These plans will be developed in coordination with Deliverables 7, 8, 9, and 10.

Our team planned the PY2019 program impact evaluations to complete all gross savings scope by

December 2020 and will deliver a draft report to the CPUC by March 1, 2021. Our workplans will utilize

CPUC-vetted methodologies, data collection tools, and analyses wherever applicable. Our team will

also continue to vet and pilot new methods, including opportunities to leverage our experience

with NMEC models and with calibrating Option D models to AMI data, and will dig deeper into the increasing

presence of third-party programs. The following section presents the steps that the DNV GL team will

take to complete the gross savings deliverables. Note that billing analyses and true experimental

design methodologies may or may not require that we collect field data to calculate gross savings.

Scope


This section outlines the scope of the process and methodologies that we propose to use in this

evaluation. The following subsections describe our approach to completing Deliverable 9.

7.5.1 Subtask 1. Develop samples for data collection

DNV GL’s team will develop survey sample designs under Deliverable 7 based on the planned gross

savings methodology. The approaches for data collection and sampling will be coordinated with

Deliverables 1, 8, 10, and 13.

7.5.2 Subtask 2. Develop survey instruments for data collection

Development of gross savings data collection instruments will follow the framework described in

Deliverable 7.

7.5.3 Subtask 3. Test the approach

See the Deliverable 7 section for our general approach to testing the data collection methodologies.

Specific to the gross savings deliverable, the DNV GL team will leverage existing analytic code to

check early data returns and verify that data are accurate and complete.

7.5.4 Subtask 4. Collect data

Following each data collection’s pre-test period, projects will run their field efforts in full. They will

communicate with Commission staff and stakeholder groups regarding progress toward sample targets,

and ongoing questions as they arise from the field.

DNV GL will collect primary data through multiple phone, online, and on-site field work efforts as

described in Deliverable 7. Each evaluation team will use these data as needed to calculate the ex post

gross savings.



8 APPENDIX D - TWO-STAGE BILLING ANALYSIS

METHODOLOGY

DNV GL will estimate energy savings from residential HVAC measures using a two-stage approach

detailed below. This approach is from the UMP which served as the primary basis for the CalTRACK

methodology. DNV GL will use daily data for the analysis, which is also consistent with the CalTRACK

consumption data analysis approach. Detailed step-by-step methods to perform the two-stage

approach are described below:

Stage 1. Individual premise analysis

For each premise in the analysis, whether in the participant or comparison group,

Fit a premise-specific degree-day regression model (as described in Steps 1 and 2, below) separately for

the pre and post periods.

For each period, pre and post, use the coefficients of the fitted model with CZ2018 degree-days to

calculate normalized annual consumption (NAC) for that period (as described in Step 3, below).

Calculate the difference between the premise’s pre- and post-period NAC (as described in Step 4,

below).

The site-level modeling approach was originally developed for the Princeton Scorekeeping Method

(PRISM™) software.3 The theory regarding the underlying structure is discussed at length in materials

for and articles about the software.4

Step 1. Fit the basic stage 1 model

The degree-day regression for each premise and year (pre or post) is modeled as:

$$E_m = \mu + \beta_H H_m + \beta_C C_m + \varepsilon_m$$

where:

$E_m$ = Consumption on day m, or average consumption per day during interval m;

$H_m$ = Specifically $H_m(\tau_H)$, average daily heating degree-days at the base temperature $\tau_H$ during meter read interval m, based on daily or daily average temperatures over those dates;

$C_m$ = Specifically $C_m(\tau_C)$, average daily cooling degree-days at the base temperature $\tau_C$ during meter read interval m, based on daily or daily average temperatures over those dates;

$\mu$ = Average daily baseload consumption estimated by the regression;

$\beta_H$, $\beta_C$ = Heating and cooling coefficients estimated by the regression;

3 PRISM (Advanced Version 1.0) Users’ Guide. Fels, M.F., K. Kissock, M.A. Marean, and C. Reynolds. Center for

Energy and Environmental Studies, Princeton, New Jersey. January 1995.

4 Energy and Buildings: Special Issue devoted to Measuring Energy Savings: The Scorekeeping Approach.

Margaret F. Fels, ed. Volume 9 Numbers 1&2, February/May 1986.



$\varepsilon_m$ = Regression residual.

Step 2. Select individual models: fixed versus variable degree-day base

In the simplest form of this model, the degree-day base temperatures $\tau_H$ and $\tau_C$ are each

pre-specified for the regression. For each site and time period, only one model is estimated, using these

fixed, pre-specified degree-day bases.
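
As a minimal sketch of the fixed-base case, the Python below fits daily consumption as baseload plus a heating coefficient times heating degree-days plus a cooling coefficient times cooling degree-days, using ordinary least squares. The function names, the 60 and 70 degree bases, and the synthetic data are illustrative assumptions rather than the evaluation team's production code.

```python
def degree_days(temps, base, mode):
    """Daily heating ("heat") or cooling ("cool") degree-days at a given base."""
    if mode == "heat":
        return [max(base - t, 0.0) for t in temps]
    return [max(t - base, 0.0) for t in temps]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_degree_day_model(temps, usage, heat_base, cool_base):
    """Least-squares fit of usage on an intercept, HDD, and CDD at fixed bases."""
    hdd = degree_days(temps, heat_base, "heat")
    cdd = degree_days(temps, cool_base, "cool")
    rows = [[1.0, h, c] for h, c in zip(hdd, cdd)]
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, usage)) for i in range(3)]
    mu, beta_h, beta_c = solve(xtx, xty)
    fitted = [mu + beta_h * h + beta_c * c for h, c in zip(hdd, cdd)]
    mean_y = sum(usage) / len(usage)
    ss_res = sum((y - f) ** 2 for y, f in zip(usage, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in usage)
    return {"mu": mu, "beta_h": beta_h, "beta_c": beta_c,
            "r2": 1.0 - ss_res / ss_tot}


# Synthetic daily data (hypothetical): baseload 10, heating slope 0.5,
# cooling slope 0.8, with bases of 60 and 70 degrees.
temps = [float(t) for t in range(40, 100, 2)]
usage = [10.0 + 0.5 * max(60.0 - t, 0.0) + 0.8 * max(t - 70.0, 0.0)
         for t in temps]
model = fit_degree_day_model(temps, usage, heat_base=60.0, cool_base=70.0)
```

Because the synthetic data are generated without noise at the same bases used in the fit, the estimated coefficients should recover the generating values; with real meter data the fit would be run separately for each premise and for each of the pre and post periods.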

The fixed base approach can provide reliable results if the savings estimation uses NAC only, and the

decomposition of usage into heating, cooling, and base components is not of interest. When data used

in the Stage 1 model span all seasons, NAC is relatively stable across a range of degree-day bases.

However, the decomposition of consumption into heating, cooling, or base load coefficients is highly

sensitive to the degree-day base.

The alternative is a variable degree-day approach. The variable degree-day approach entails the

following: (1) estimating each site-level regression and time period for a range of heating and cooling

degree-day base combinations, including dropping heating and/or cooling components; and (2)

choosing an optimal model (with the best fit, as measured by the coefficient of determination $R^2$) from

among all of these models.

The variable degree-day approach fits a model that reflects the specific energy consumption dynamics

of each site. In the variable degree-day approach, for each site and time period, the degree-day regression

model is estimated separately for all unique combinations of heating and cooling degree-day bases, $\tau_H$

and $\tau_C$, across an appropriate range. This approach includes a specification in which one or both

weather parameters are removed.

Degree-days and fuels

For the modeling of natural gas consumption, it is unnecessary to include a cooling degree-day term.

For the modeling of electricity, a model with heating and cooling terms should be tested, even if the

premise is believed not to have electric heat or air conditioning. Thus, the range of degree-day bases

must be estimated for each of these options:

Electricity Consumption Models

− Heating-Cooling model (HC)

− Cooling Only (CO)

− Heating Only (HO)

− No degree-day terms (mean value)

Gas Consumption Models

− Heating Only (HO)

− No degree-day terms (mean value)



Degree-days and set-points

If degree-days can vary, the estimated heating degree-day base $\tau_H$ will approximate the highest

average daily outdoor temperature at which the heating system is needed for the day. The estimated

cooling degree-day base $\tau_C$ will approximate the lowest average daily outdoor temperature at which

the house cooling system is needed for the day. These base temperatures reflect both average

thermostat set-points and building dynamics such as insulation, internal and solar heat gains, etc. The

average thermostat set-points may include variable behavior related to turning on the air conditioning

or secondary heat sources. If heating or cooling are not present or are of a magnitude that is

indistinguishable amidst the natural variation, then the model without a heating or cooling component

may be the most appropriate model, using the $R^2$ model selection rule.

For each premise, time, and model specification (HC, HO or CO), the final degree-day bases (values of

$\tau_H$ and $\tau_C$) that give the highest $R^2$, along with the coefficients $\mu$, $\beta_H$, $\beta_C$ estimated at those bases will be

selected. Models with negative parameter estimates should be removed from consideration, although

they rarely survive the optimal model selection process.
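
The variable degree-day search can be sketched for the simplest case, a heating-only (HO) specification such as a gas model. The Python below tries each candidate heating base, fits a one-variable regression, drops fits with negative parameter estimates, and keeps the fit with the highest coefficient of determination. The candidate range, function names, and synthetic data are assumptions for illustration; a full implementation would also search cooling bases jointly and compare against the no-degree-day (mean value) model.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x; return (intercept, slope, r_squared).

    Returns None when x or y has no variation, since no slope or R-squared
    can be computed in that case.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return None
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, (sxy * sxy) / (sxx * syy)


def select_heating_base(temps, usage, candidate_bases):
    """Pick the heating degree-day base whose regression has the highest R-squared."""
    best = None
    for base in candidate_bases:
        hdd = [max(base - t, 0.0) for t in temps]
        fit = simple_ols(hdd, usage)
        if fit is None:
            continue
        intercept, slope, r2 = fit
        if intercept < 0.0 or slope < 0.0:
            continue  # remove models with negative parameter estimates
        if best is None or r2 > best["r2"]:
            best = {"base": base, "mu": intercept, "beta_h": slope, "r2": r2}
    return best


# Synthetic gas usage (hypothetical) generated with a true base of 62 degrees.
temps = [float(t) for t in range(30, 90)]
usage = [1.0 + 0.3 * max(62.0 - t, 0.0) for t in temps]
best = select_heating_base(temps, usage, candidate_bases=range(50, 71))
```

On this noise-free example the search should return the generating base, since any other base leaves a systematic misfit around the change point.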

Step 3. Calculate NAC using stage 1 models

To calculate NAC for the pre- and post-installation periods for each premise and timeframe, combine

the estimated coefficients $\mu$, $\beta_H$, and $\beta_C$ with the annual normal-year or typical meteorological year (TMY)

degree-days $H_0$ and $C_0$ that have been calculated at the site-specific degree-day base(s), $\tau_H$ and $\tau_C$.

Thus, for each pre and post period at each individual site, use the coefficients for that site and period

to calculate NAC.

$$NAC = \mu * 365 + \beta_H * H_0 + \beta_C * C_0$$

This example puts all premises and periods on an annual and normalized basis. The same approach

can be used to put all premises on a monthly basis and/or on an actual weather basis. Using this

approach to produce consumption on a monthly and actual weather basis is an alternative approach to

calendarization that may be preferable to the simple pro-ration of billing intervals under some

circumstances.

Step 4. Calculate the change in NAC

For each site, the difference between pre- and post-program NAC values ($\Delta NAC$) represents the

change in consumption under normal weather conditions.
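
A short worked example of Steps 3 and 4, in Python: NAC is the daily baseload scaled to 365 days plus each weather coefficient multiplied by the normal-year degree-days at that period's base. All coefficient and degree-day values are hypothetical, and the sign convention (pre minus post, so that positive values are savings) is an assumption made for the illustration.

```python
def normalized_annual_consumption(mu, beta_h, beta_c, normal_hdd, normal_cdd):
    """Annualize the baseload and apply normal-year degree-days."""
    return mu * 365 + beta_h * normal_hdd + beta_c * normal_cdd


# Hypothetical fitted coefficients for one premise. Normal-year degree-days
# differ by period because each period has its own selected bases.
pre_nac = normalized_annual_consumption(
    mu=10.0, beta_h=0.5, beta_c=0.8, normal_hdd=2000.0, normal_cdd=1000.0)
post_nac = normalized_annual_consumption(
    mu=10.0, beta_h=0.45, beta_c=0.7, normal_hdd=1900.0, normal_cdd=1050.0)

# Assumed sign convention: pre minus post.
delta_nac = pre_nac - post_nac
```

With these inputs the pre-period NAC is 3,650 + 1,000 + 800 = 5,450 and the post-period NAC is 3,650 + 855 + 735 = 5,240, a weather-normalized change of 210 units for the premise.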

Stage 2. Cross-sectional analysis

Difference-in-difference whole house savings model

The first-stage analysis estimates the weather-normalized change in usage for each premise. The

second stage combines these to estimate the aggregate program effect by using a cross-sectional

analysis of the change in consumption relative to premise characteristics based on a difference-in-

difference model.

The difference-in-difference model is given by:

$$\Delta NAC_i = \alpha + \beta T_i + \varepsilon_i$$



In this model, $i$ subscripts a household and $T$ is a treatment indicator that is 1 for residential HVAC

measure households and 0 for comparison homes. The effect of the program is captured by the

coefficient estimate of the term associated with the treatment indicator, $\beta$.
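
When the only regressor is a binary treatment indicator, the least-squares intercept equals the comparison-group mean change and the treatment coefficient equals the difference between the participant and comparison mean changes. The Python sketch below uses that identity; the data values are hypothetical, and the change in NAC is assumed to be defined as pre minus post.

```python
def did_program_effect(delta_nac, treated):
    """Return (intercept, treatment coefficient) for a regression of the
    change in NAC on an intercept and a 0/1 treatment indicator.

    With a single binary regressor, ordinary least squares reduces to
    group means, so no matrix algebra is needed.
    """
    participants = [d for d, t in zip(delta_nac, treated) if t == 1]
    comparison = [d for d, t in zip(delta_nac, treated) if t == 0]
    if not participants or not comparison:
        raise ValueError("need at least one household in each group")
    intercept = sum(comparison) / len(comparison)
    effect = sum(participants) / len(participants) - intercept
    return intercept, effect


# Hypothetical change in NAC (pre minus post) for six households.
delta_nac = [400.0, 350.0, 450.0, 100.0, 50.0, 150.0]
treated = [1, 1, 1, 0, 0, 0]
intercept, effect = did_program_effect(delta_nac, treated)
```

Here the comparison homes reduce consumption by 100 on average without the program and participants by 400, so the estimated program effect is 300. Adding premise characteristics to the model would require a full regression routine rather than this shortcut.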

Decomposition of whole-home savings

Engineering models that simulate savings for measures and measure bundles offered by the direct

install programs will form the basis of the decomposition of whole home savings. The engineering

models will be based on DEER residential prototypes adjusted as appropriate from recent evaluation

results. These models will provide estimates of the percent reduction in cooling and heating load from

their respective baselines, for individual measures and for measure bundles offered by direct install

programs. Separate percent savings will be produced by climate zone and housing type.

The estimated reductions will be for measures offered both on a first in (standalone) basis and as part

of a bundle on a last in (incremental/marginal) basis. The following lists the types of relative measure

savings (in percent terms) the engineering simulation models provide that will be used to

disaggregate whole-home estimated savings:

First in (standalone) measure savings.

Bundle savings for the bundles claimed in the PY2019 DI programs, accounting for the majority of

savings (about 80%) across the included programs.

Marginal savings for each measure in a bundle re-scaling the last-in marginal savings so the total

matches the bundle savings.

Marginal savings for each measure in a bundle re-scaling the first-in marginal savings so the total

matches the bundle savings.
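
One way to read the re-scaling items above is as a proportional adjustment: each measure's marginal percent savings is multiplied by a common factor so that the measures sum to the simulated bundle savings. The Python below is a sketch under that assumption; the workplan does not state the re-scaling formula, and the measure names and percentages are hypothetical.

```python
def rescale_to_bundle(marginal_savings, bundle_savings):
    """Scale per-measure savings fractions proportionally so that they
    sum to the bundle-level savings fraction."""
    total = sum(marginal_savings.values())
    if total <= 0.0:
        raise ValueError("marginal savings must sum to a positive value")
    factor = bundle_savings / total
    return {measure: s * factor for measure, s in marginal_savings.items()}


# Hypothetical cooling-load savings fractions for a three-measure bundle.
marginal = {"duct_sealing": 0.06, "fan_motor": 0.04, "coil_cleaning": 0.02}
scaled = rescale_to_bundle(marginal, bundle_savings=0.10)
```

The unscaled fractions sum to 12 percent while the bundle saves 10 percent, so every measure is scaled by 10/12 and the relative ranking of the measures is preserved.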

Results from engineering simulations are used as inputs in statistically adjusted engineering (SAE)

models to decompose whole-home savings (obtained from the customer-level DID regression model) to

measure-level savings. Engineering simulation results provide more realistic inputs to SAE models,

which enables these models to separate the effects of different measures more accurately. The

common SAE model is specified as:

$$\Delta NAC_i = \sum_m \gamma_{Hm} S_{Him} + \sum_m \gamma_{Cm} S_{Cim} + \varepsilon_i$$

where $S_{Him}$ and $S_{Cim}$ are engineering or ex ante estimates of annual heating and cooling savings for

measure $m$ and customer $i$, and $\gamma_{Hm}$ and $\gamma_{Cm}$ are coefficients of the model that measure heating and

cooling saving realization rates.

In this study, the engineering-based savings estimates are developed as fractions of pre-program

annual cooling and heating load, because it’s not practical to develop simulation models for every

customer individually. To produce the energy savings quantities, $S_{Him}$ and $S_{Cim}$, for each customer, it’s

necessary to multiply the simulation-based savings fractions of each measure and load type by the

pre-installation heating and cooling usage estimated from the customer-level DID regression model.

However, if we make that basic substitution in the common SAE model, we would have pre-program

normalized annual heating and cooling included on both sides of the equation, on the left as part of

$PreNAC_i$ and on the right as scalar factors in $S_{Him}$ and $S_{Cim}$. This relationship creates an endogeneity



problem, that is, a built-in correlation between the dependent variable and the predictors.5 Endogeneity leads to

biased estimates of the coefficients. We estimate a log SAE model, described below, to circumvent this

endogeneity.

The log SAE model DNV GL intends to use is based on the savings percentages from the simulation

models described above to decompose whole-home, heating and cooling savings into measure savings.

This model differs from common SAE models in two ways:

1. Because the engineering estimates are on a percentage basis, rather than having customer-

specific estimates in energy units, the regression is of the change in log NAC against the

engineering estimates of percent savings.

2. Because the percent savings from the engineering models are most meaningful as percentages of

heating and cooling rather than whole-home load, the dependent variable uses heating or cooling

load with separate log regression models estimated for each. The model for each is given by:

$$\log(PostNAC_l(i)) - \log(PreNAC_l(i)) = \alpha_l - \sum_m \gamma_{ml} f_{ml}(i) + \varepsilon_i$$

where:

$\log(PostNAC_l(i))$ = Log of post period NAC for customer $i$ and load type $l$ ($l$ = heating or cooling load)

$\log(PreNAC_l(i))$ = Log of pre period NAC for customer $i$ and load type $l$ ($l$ = heating or cooling load)

$\alpha_l$ = Non-program related change

$f_{ml}(i)$ = The simulation-based savings fractions or percentages for load type $l$ (heating and cooling) of measure $m$, for the climate zone and building type and measure bundle type of customer $i$; note this term is 0 for non-participant households used as matches

$\gamma_{ml}$ = The heating or cooling realization rate for measure $m$

$\varepsilon_i$ = Regression residual

Total savings for measure $m$ and load type $l$ is given by:

$$S_{ml} = \sum_i \exp\left(\hat{\gamma}_{ml} f_{ml}(i) + se^2/2\right) * PreNAC_l(i)$$

where the summation is over all customers with the measure.6 Unit savings per measure then is this

estimated total saving divided by the number of customers with the measure. This approach will be

applied to direct install programs offering smart thermostats as well as other measures.

5 To see the endogeneity more clearly, we expand the basic SAE model as:

$$PostNAC_i - PreNAC_i = PostNAC_i - (PreNAC_{Base,i} + PreNAC_{H,i} + PreNAC_{C,i}) = \sum_m \gamma_{Hm} f_{Hm} PreNAC_{H,i} + \sum_m \gamma_{Cm} f_{Cm} PreNAC_{C,i}$$

Here $NAC_{Base}$ is normalized annual baseload; $NAC_H$ is normalized annual heating load; $NAC_C$ is normalized annual

cooling load; and $f_{Hm}$ and $f_{Cm}$ are simulation-based savings fractions for heating and cooling for measure m. Here

we see the components $NAC_H$ and $NAC_C$ on both sides of the equation.

6 Since the model used to estimate load savings is in log terms, it requires exponentiation to go from log scale back

to the original (energy) units. This back transformation requires the use of a bias correction factor $se^2/2$, where $se$ is the standard error of the regression.



Some of the details remaining to be resolved include:

If we have engineering estimates of fractional savings $f_{ml}$ based both on a first-in assumption and

on a last-in assumption, we will review how different these are after re-scaling. We may use only

one, only the other, or a blend.

It’s difficult to estimate separate realization rates for different measures or measure groups,

particularly if some of the estimated savings are small. We may group some measures together in

the SAE model to produce a common realization rate that will be applied to the engineering

estimates from each.



9 APPENDIX E - NET-TO-GROSS METHODS

Approach

DNV GL’s team’s general approach to NTGR estimation follows the California and UMP protocols. Our

general NTGR estimate principles include the following:

Maintain a flexible approach – Different programs require different methods. We choose the

methods best suited to the program design (e.g., include upstream market actor interviews when

vendor influence is important), needed rigor, available data sources, and the population of potential

respondents. This approach requires us to:

− Review program design and logic to understand how the program is intended to alter customer

choices

− Understand the market including program market effects, natural market evolution, and

external influences such as regulations

− Understand who can provide what data

− Integrate with Deliverables 1, 7, and 9

Draw on an array of methods that includes market and customer perspectives and experimental

design.

− Leverage and build on the data collection instruments that we’ve developed for California in

previous evaluation cycles

− Continue to develop and validate the other net savings approaches to provide an expanded

toolbox when self-report methods are not as applicable (e.g., in upstream designs).

Transparency and defensibility in methods – There is no single “right” way to estimate NTGRs for a

particular context. DNV GL’s team strives to make our methods transparent and provide clear

rationale for our choices. This includes:

− Identifying the methods’ limits and risks up front

− Testing and validating methods

− Building quality control and “sanity” checks into analyses

Utilize multiple methods and build a “preponderance of evidence” – We triangulate the results

from multiple methods when enhanced rigor is required. This includes combining supply-side and

demand-side perspectives when possible.

Provide segmented results (at least by measure within program). This level of detail allows NTG

research to inform program design and evolution, and also supports development of measure-level

ex ante NTGR.

Self-report surveys have been a major component of previous NTG methods for California HVAC

programs. Our plan for 2018 is to reuse data collection instruments we have used in past evaluations.

We will review and make minor modifications to these instruments as well as add additional questions

to be used to test our proposed new approaches. The net savings calculations for the survey



conducted in 2018 will follow the original methods. In early 2019, we will analyze the results of the

test questions to inform decisions about the changes to make for instruments that will be fielded in

2019 and 2020. Instrument (re)design in 2019 will begin in May, and training and fielding will begin in

June.

Table 19 lists the primary methods we propose to implement for the HVAC programs along with

limitations and how we propose improving on the methods. By themselves, each of the methods listed

is a standard rigor method, except for (non-enhanced) participant self-report surveys, which are basic

rigor.

Table 19. Primary NTGR methods, limitations, and potential improvements

| Candidate Method (Rigor) | Limitations | Proposed Improvements |
| --- | --- | --- |
| Participant Self-report (basic) | Long, complex surveys; Low response rates; Not as useful for upstream or behavioral programs | Use specific rather than general prompts of alternative efficiency levels; Adjust partial free-ridership wording to ask their most likely alternative and the influence of the program on moving from that to actual situation; Test alternative scoring algorithms that make different assumptions about intermediate efficiency levels |
| Enhanced Participant Self-report (standard) | Same as participant self-report; Expense | Same as participant self-report; Conduct cognitive interviews to assess consumer and vendor ability to provide standard practice baselines |
| Market Actor Surveys (standard) | Response bias (including gaming the system); Reluctance to provide “proprietary information” like detailed sales | Leverage RASS to determine absolute size of market; Specify explicit scoring algorithm |

Participant self-reports and enhanced self-reports ask about program awareness and the

decision-making process to get participants thinking about that time, then ask how much the program

affected the timing, efficiency, and quantity of the installed measure(s). Another way we use

self-report surveys to develop net savings estimates is through discrete choice methods, where customers

express their product characteristic preferences, including price. By combining the program’s effect on

prices and customers’ price preferences, these methods can calculate the likely market outcomes if

rebates did not exist. For programs designed to influence contractor sales practices, these surveys

also include questions on contractor influence. Key limitations of these methods include long, complex



surveys with questions that participants may find difficult to understand, increasing difficulty obtaining

viable response rates, and limited applicability to upstream program designs. DNV GL’s team will

improve on these methods:

Expand upon our work on the 2016 HUP impact evaluation survey by designing modular

questionnaire batteries to utilize a consistent theoretical and computational approach to self-report

methods across surveys and programs while customizing batteries to specific programs and

measures. In particular, these instruments will allow us to ask about timing, quantity, and

efficiency levels only when they are relevant.

Based on our recent Massachusetts work about how well participants interpret prompts about

efficiency levels, use precise language (e.g., 12-13 SEER, 14-15 SEER, 16+ SEER) rather than

general language (e.g., code, intermediate efficiency, high efficiency) when asking about

intermediate efficiency levels.

Expand on our Massachusetts work of adapting NTG methods when industry standard practice

(ISP) baselines are relevant. In particular, we will conduct additional tests on how different self-

report approaches interact with gross savings that are based on market-level and participant-level

ISP baselines.

Market actor surveys ask upstream (manufacturers, distributors, architects and design firms) and

midstream market actors (retailers and installation contractors) how the program affects the

availability, pricing, and sales approaches of high efficiency products on the market. A specific variety

of market actor survey is the shelf survey, which produces data about how retail stores stock and

organize the products on their shelves and how those practices affect the market. Our standard

approaches include:

Blending in the results from questionnaires targeting downstream market actors when program

designs affect both

Providing explicit scoring flowcharts for market actor surveys detailing how we will combine scores

when we have both market actor and participant surveys (e.g., Upstream HVAC, see Figure 1 for

example) and asking market actors to provide information in terms of percentage changes

Leveraging the RASS data to determine objective sales volumes.

Key challenges this method faces include reluctance to provide “proprietary” information such as

objective sales volume, as well as the potential for “gaming the system” by answering survey

questions in a way that inflates program attribution. Improvements include:

Refine survey wording based on critical review and QA activities applied to previous iterations

deployed in California (e.g., Upstream HVAC) and other jurisdictions to make sure that

respondents can answer questions as intended



Figure 1. Example of application to California upstream HVAC program

Scope

The steps for net savings methods based on primary data collection from customers and market actors

to be used for the HVAC programs include the following:

Sample selection - see Deliverable 7 for approaches.

Instrument Design and Testing – We will follow the data collection framework as described in

Deliverable 7 for general procedures. DNV GL’s team’s proposed enhancements to survey methods

specifically for net savings analysis are described in more detail below.

Survey fielding and data collection will follow the general procedures described in Deliverable 7.

Data cleaning steps specific to net savings sequences are described below.

Calculate net savings ratios as described below.

Summarize results, describe implications, and make recommendations



Tasks

9.3.1 Task 1: Survey development

Overview: DNV GL will ask downstream rebate recipients how the program affected the timing,

efficiency, and quantity of the installed measures. These surveys will also include a battery to capture

spillover. For programs designed to include mid-/upstream market actors, we will also conduct

surveys with those market actors to explore how the program affected their sales practices.

Detailed Description: DNV GL’s team’s standard approach to self-report surveys is to use questions

that explore how rebates and program services affected the timing, efficiency, and quantity of

installed measures:

In the absence of the services offered by the program, would you have installed the measure at

the same time, earlier, or later?

In the absence of the services offered by the program, would you have installed equipment of the

same efficiency, lesser efficiency, or greater efficiency?

In the absence of the services offered by the program, would you have installed the same quantity

(or size) of equipment, less, or more?

Existing instruments that DNV GL’s team has deployed in California modified these standard

sequences to collect the data applicable to each measure. These program attribution dimensions are

not always applicable for all measures. For example, some measures, such as air sealing, do not have

variable levels of efficiency – a customer either does them or doesn’t. DNV GL developed the existing

2016 HUP impact evaluation survey and the 2013-2014 VSD pool pump evaluations using these sorts

of customizations. We will expand upon this work to create modular, measure-specific batteries that

can be used across any program that incentivizes that measure. Table 20 provides an initial

assignment of timing, efficiency, and quantity sequences to each measure group within the HVAC

roadmap.

Table 20. Timing, efficiency, and quantity by measure

Roadmap Measure Group Timing Efficiency Quantity

HVAC PTAC Controls ● ●

HVAC Coil Cleaning ● ●

HVAC Time Delay Relay Controls ● ●

HVAC Duct Sealing ● ●

HVAC Fan Motor Replacement ● ●

HVAC Maintenance ● ●

HVAC Furnace ● ● ●



We have recently completed work in Massachusetts exploring specific wording options for alternative

efficiency levels. We have found that respondents can make more sense out of specific efficiency

levels, rather than more generalized wording. In particular, participants provided more believable

responses to questions that asked them which specific alternative efficiency levels they would have

installed if not the program-sponsored efficiency level. For example, for lighting, the program-specific

efficiency level was LED, and the alternatives were high-performance T8, T5, standard T8, and HID

(as opposed to general wording of “the efficiency you installed”, “standard or code minimum”, “or

something in between”). Boiler wording specified efficiency levels of 80-84% efficiency, 85-90%, and

90-95% (program efficiency levels were 95% or better). Using specific wording such as this aligns well

with measure-specific question batteries. However, determining the specific efficiency levels for every

measure can be costly. Thus, DNV GL recommends using the measure-specific efficiency levels for

measures requiring high rigor and retaining the general wording for standard rigor measures.

In past years, when participant surveys indicated that participating trade allies had an effect on their

equipment decisions, we conducted follow-up interviews with those trade allies to determine the

extent to which the program affected their recommendations. We will continue to conduct mid-

/upstream market actor interviews for programs that are designed to reach these actors. Those

programs include the quality maintenance program and the upstream program. The quality

maintenance questions focus on how often the contractor offers program eligible maintenance now,

compared to how often they offered them prior to participating in the program. The upstream market

actor surveys focus on changes to the actors’ stocking, upselling, and pricing practices due to the

program.

Improvements: We plan to update the survey instruments/approaches for PY2018 to address the

following concerns:

A concern raised in the previous cycle was that some of the vendor questions about stocking and

upselling described the efficient HVAC equipment in terms that were too generic and therefore

could not capture subtleties in response due to variations in equipment size, type, and project

application. Conducting such interviews is always a delicate balancing act between getting the

most precise information possible and not fatiguing the respondent. To capture some of these

nuances, without unduly lengthening the interview guides, we will ask the core questions in more

generic terms and then follow up with open-ended questions about possible exceptions due to

variations in equipment size, type, and project application.

Another concern from the previous evaluation cycle was that the interviewers did not probe to

better understand some of the responses of the participating vendors. For example, one

commenter wished the evaluators had probed further to find out why so many HVAC vendors

considered the Quality Maintenance (QM) services promoted by the program to be not

that much different than their typical maintenance practices. As noted, such vendor interviews

always must strike the right balance between getting more precise information and not fatiguing

the respondents. That being said, we agree that there is much value in probing further on

research questions of particular interest. Therefore, we will scan the previous evaluation reports to

identify researchable questions which could have benefited from additional exploration or probing in

the previous cycle, and then add new probes to the interview guides to make sure we can explore

these topics more deeply. These changes apply to PYs 2018 and later.



Based on work DNV GL recently completed in Massachusetts, we propose testing alternative scoring

algorithms that can serve to shorten surveys and simplify computations. We have recently tested an

alternative scoring method in Massachusetts that treats efficiency levels as “binary” – assigning either

a 0 or 1 free ridership value rather than a partial free ridership value for intermediate efficiency levels.

Our testing shows that the simpler question sequence makes very little difference in the efficiency

component of the free ridership score for some measures. We will investigate the feasibility of adding

some questions to the surveys conducted for PY2018 to test these approaches.

We furthermore propose investigating changes to the framing and free ridership questions to make

them more consistent across the various measure groups. Previous evaluations have used NTG

sequences taken from at least two different paradigms that partially align. We would like to improve

the alignment by modifying all sequences to capture information on various factors that affected the

decision as well as how the program affected timing, efficiency, and quantity of measures installed.

9.3.1.1 NTG methods by measure group

In summary, we intend to conduct the following NTG evaluation activities by measure group.

The PTAC controls measure group will receive standard rigor treatment consisting of enhanced

phone surveys with end-user decision makers

The direct install residential HVAC measure groups (Coil Cleaning, Time Delay Relay Controls, Duct

Sealing, Fan Motor Replacement, & Maintenance) will receive basic rigor treatment. These

measure groups will receive end-user surveys to assess program effects on the key decision

makers based on program designs.

The Furnace measure group will receive a standard rigor NTG evaluation. For this measure group,

we will conduct market actor interviews with the program participating equipment distributors.

Table 21. HVAC Roadmap NTG evaluation activities by measure group

| Measure Group | Rigor Level | Activities |
| --- | --- | --- |
| HVAC PTAC Controls | Standard | Enhanced Participant Self-report phone surveys |
| HVAC Coil Cleaning | Basic | Participant Self-report web and phone surveys |
| HVAC Time Delay Relay Controls | Basic | Participant Self-report web and phone surveys |
| HVAC Duct Sealing | Basic | Participant Self-report web and phone surveys |
| HVAC Fan Motor Replacement | Basic | Participant Self-report web and phone surveys |
| HVAC Maintenance | Basic | Participant Self-report web and phone surveys |
| HVAC Furnace | Standard | Enhanced Participant Self-report phone surveys |

9.3.2 Task 2. Test the approach

Our basic QA/QC procedures include reviewing completed instruments to confirm skip logic, readability,

reliability, internal validity, external validity, clarity, length, and flow. DNV GL’s team will provide draft


data collection instruments to Commission staff (and, as directed, to stakeholders) for review and

incorporate all feedback into a final version. We will not proceed with data collection until Commission

staff approve the final instruments. We also conduct “soft launches” as described in Deliverable 7.

During analysis, we will conduct sensitivity analyses. At a minimum, these include identification of

statistical outliers that have extreme influence on the final results. Where there is indication that

participants may have difficulty answering partial free ridership questions (for example, whether they

would have installed measures of “intermediate” efficiency), we will test the effects of different scoring

algorithms, such as how much final free ridership scores change if efficiency is scored as strictly

binary rather than allowing for partial free ridership.
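
As a minimal sketch of this kind of sensitivity test, the Python below compares mean free ridership under partial and binary efficiency scoring. It assumes the multiplicative scoring described under Task 5. The respondent scores are made up for illustration, and the `binarize` rule (anything short of full free ridership scores 0) is a hypothetical choice, not the evaluation team's actual algorithm.

```python
from statistics import mean

def total_free_ridership(timing, efficiency, quantity):
    """Multiply the three dimension scores (0 = no free ridership, 1 = full)."""
    return timing * efficiency * quantity

def binarize(score):
    """Hypothetical binary rule: only a full free rider (score 1) keeps a score of 1."""
    return 1.0 if score >= 1.0 else 0.0

# Made-up (timing, efficiency, quantity) scores, for illustration only.
responses = [(1.0, 0.5, 1.0), (1.0, 1.0, 1.0), (0.0, 0.5, 1.0), (1.0, 0.5, 0.5)]

# Mean free ridership when partial efficiency scores are kept as reported.
partial_mean = mean(total_free_ridership(t, e, q) for t, e, q in responses)

# Mean free ridership when efficiency is forced to 0 or 1.
binary_mean = mean(total_free_ridership(t, binarize(e), q) for t, e, q in responses)

# The difference shows how sensitive the result is to the scoring choice.
shift = binary_mean - partial_mean
print(f"partial: {partial_mean}, binary: {binary_mean}, shift: {shift}")
```

A large shift would suggest that the final result depends heavily on how intermediate efficiency answers are scored.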

9.3.3 Task 3. Survey fielding and data collection

The data collection will be conducted following procedures described under Deliverable 7, including

guidelines that will be developed under that deliverable for fieldwork management.

9.3.4 Task 4. Data cleaning

In a survey, two types of questions can generate verbatim responses: open-ended questions and

questions that include an “other” option to catch responses not covered by the pre-coded set.

Questions that include pre-codes and an “other” option go through two rounds of coding. The first

round is back-coding, which checks whether a verbatim response is a true “other” response or an

answer that belongs under an existing pre-code. For example, if there is a pre-code for “Electronics

Store” and a respondent’s “other” response is “Best Buy,” that response is back-coded into the

“Electronics Store” category. Once back-coding is complete, post-coding can occur. Post-coding is the

process of reviewing the remaining responses (either open-ended or “other”), clustering them to

create new response categories, and assigning a code to each.
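
The two coding rounds can be sketched in Python as follows. This is an illustration only: the lookup table `BACK_CODE_MAP`, the function names, and the sample verbatims are hypothetical, and in practice an analyst rather than a lookup table decides how each verbatim is coded.

```python
from collections import Counter

# Hypothetical lookup of verbatims that actually belong to an existing pre-code.
BACK_CODE_MAP = {"best buy": "Electronics Store"}

def back_code(verbatim):
    """Return the matching pre-code, or None if the verbatim is a true 'other'."""
    return BACK_CODE_MAP.get(verbatim.strip().lower())

def split_others(verbatims):
    """Round 1 (back-coding): separate miscoded answers from true 'other' responses."""
    back_coded, remaining = [], []
    for verbatim in verbatims:
        code = back_code(verbatim)
        if code is not None:
            back_coded.append((verbatim, code))
        else:
            remaining.append(verbatim)
    return back_coded, remaining

def post_code(remaining, assignments):
    """Round 2 (post-coding): count the analyst-assigned new categories."""
    return Counter(assignments[verbatim] for verbatim in remaining)

# Made-up verbatims and analyst assignments, for illustration only.
verbatims = ["Best Buy", "my contractor", "a friend's contractor"]
back_coded, remaining = split_others(verbatims)
new_codes = post_code(
    remaining,
    {"my contractor": "Contractor", "a friend's contractor": "Contractor"},
)
```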

We provide additional detail on our approach for a standard participant self-report survey. In our

method, each component of attribution (timing, efficiency, and quantity) has a question sequence

that follows the same pattern:

Xa. What would you have done without the program?

Xa_O. Why do you say that?

Xb. <If Xa=program effect> How different would the project have been?

Quality control for each component of attribution consists of comparing the final component attribution

score (t, e, q) to the open-ended response for the “Xa_O. Why do you say that?” question.

Interviewers are trained to probe if the response to the open-ended question is inconsistent with the

scored response to Xa.

During the analysis phase, the analyst will put measures into 3 bins: full attribution, partial attribution

and full free rider for each component. The analyst works a bin at a time to compare each verbatim

open-ended response to the score for the attribution component. Assessing verbatim responses by bin

reduces analyst error and speeds the review. If an open-ended response appears inconsistent with the

score received, the case is elevated to subject matter expert (SME) review.
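
A minimal Python sketch of the binning step follows. It assumes component attribution scores run from 0 (full free rider) to 1 (full attribution). That scale, the function names, and the sample cases are assumptions made for illustration.

```python
def attribution_bin(score):
    """Assign a component attribution score in [0, 1] to one of three review bins."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score == 1.0:
        return "full attribution"
    if score == 0.0:
        return "full free rider"
    return "partial attribution"

def group_for_review(cases):
    """Group (case_id, score, verbatim) records so each bin can be reviewed in turn."""
    bins = {"full attribution": [], "partial attribution": [], "full free rider": []}
    for case_id, score, verbatim in cases:
        bins[attribution_bin(score)].append((case_id, verbatim))
    return bins

# Made-up cases, for illustration only.
cases = [
    ("A1", 1.0, "We would not have done the project without the program."),
    ("A2", 0.5, "We would have done it, but a year or so later."),
    ("A3", 0.0, "We had already planned and budgeted for it."),
]
review_bins = group_for_review(cases)
```

Within each bin the analyst reads the verbatims against a single expected story, so a response that does not fit stands out and can be flagged for SME review.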


The attribution score calculated via the timing, efficiency, and quantity questions is also checked against

the following for consistency. Inconsistent scores are referred to SME review.

- The answer to a closed-ended overall attribution question

- The answer to an open-ended summary of the program’s influence question

- Answers to questions about timing of program awareness relative to the project timing

Analysts are instructed to have a low bar (“when in doubt flag for review”). SME review consists of

reviewing the entire survey, including all responses to all measures when the survey covers multiple

measures. If the SME determines that the flagged score (whether of a component or overall) is not

clearly contradicted by the overall story told by the respondent throughout the interview, the SME

makes no change. If the flagged score is clearly contradicted (approximately 1% of cases in DNV GL’s

experience), the SME decides among 3 options:

- Drop the measure from the sample (for very muddled responses, which are much more common with computer-aided telephone interviews [CATI] than with IDIs)

- Replace the inconsistent response with a “Don’t Know” (effectively using the average, for cases where there should be some attribution for the component but it is unclear how much)

- Adjust the flagged score to more accurately reflect the intent of the respondent (employed in cases where there is overwhelming evidence of intent, for instance when the open-ended response says clearly what the score should be)

9.3.5 Task 5. Score surveys

When we use surveys or IDIs as the basis for determining NTGR, we develop the scoring algorithm as

part of the survey design. This process lays out how we will score each response to each question,

and how those scores will be combined to generate the free ridership score (or another metric).

Following is a description of the scoring algorithm for a timing-quantity-efficiency self-report approach,

including a diagram or flowchart where it makes the explanation easier to understand.

Our basic self-report scoring algorithm follows: Each free ridership dimension (timing, efficiency,

quantity) receives a score between 0 (no free ridership) and 1 (complete free ridership). We combine

these scores by multiplying, then subtracting the product from 1 to compute program attribution.

FR_total = FR_timing × FR_efficiency × FR_quantity

Attribution = 1 − FR_total
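
The two formulas can be sketched in Python as below. The function name and the dictionary input are illustrative assumptions. Passing only the dimensions that apply to a measure is one way to handle measures for which some dimensions are not relevant.

```python
from math import prod

def program_attribution(fr_scores):
    """Return 1 minus the product of the applicable free ridership scores.

    fr_scores maps each applicable dimension (for example "timing") to a
    free ridership score between 0 (no free ridership) and 1 (complete).
    """
    if not fr_scores:
        raise ValueError("at least one applicable dimension is required")
    for name, fr in fr_scores.items():
        if not 0.0 <= fr <= 1.0:
            raise ValueError(f"{name} free ridership must be in [0, 1], got {fr}")
    fr_total = prod(fr_scores.values())
    return 1.0 - fr_total

# A partial free rider on efficiency only: attribution is 0.5.
example = program_attribution({"timing": 1.0, "efficiency": 0.5, "quantity": 1.0})
```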

The use of multiplication at the free ridership level means that if free ridership is zero for any of the

dimensions applicable to the measure, the total free ridership will also be zero and the program will

receive full credit for the measure. On the other hand, a respondent must be a full free rider along all

applicable dimensions to result in a total free ridership of one. The free ridership methodology

described above applies to the HVAC deemed savings measures in this category. In the previous

evaluation cycle, the CPUC EM&V Research Roadmap requested that the evaluation team develop

more customized NTG approaches for the upstream HVAC programs. We plan to continue using these

customized NTG approaches for those programs in the PY2019 evaluation, with some of the

methodological improvements described above.

Scoring method for the Upstream HVAC programs

This subsection describes the scoring method used in the previous Upstream HVAC program surveys.

To establish program attribution, we considered the pathways distributors take when selling a high

efficiency HVAC unit, and the related pathways buyers take when purchasing one. Our goal was to

develop an approach that considered these pathways in the context of the HVAC1 program design and

real-world complexity. We created the term “causal pathway” to identify how the program may cause

behavior change along these paths. We then used this approach to integrate NTG survey responses

from buyers and distributors into an overall NTG score.

Our methodology assumed that there were three main causal pathways of influence that impacted

both the HVAC equipment distributor and buyer. We derived these assumptions from the program

logic model provided by the PAs. Distributors and buyers are both important when evaluating

program attribution of this nature, and both were taken into consideration to formulate an overarching

attribution score. Table 22 shows the researchable questions which represent the 3 causal pathways

across distributors and buyers.

Table 22. Question themes across 3 causal pathways for distributors and buyers

| Causal Pathways | Distributor Questions | Buyer Questions |
|---|---|---|
| Stock | 1. What was the program influence on distributor stock? | 1. How did the mix of equipment in stock influence the buyer? |
| Promotion/Upsell | 2. What was the program influence on encouraging the distributor to promote or upsell the units? | 2. What was the influence that distributor upselling had on the buyer’s decision? |
| Price of Units | 3. Did the distributor pass on some or all of the incentive to buyers? | 3. What was the influence the price had on the buyer’s decision? |

To better understand program attribution, our survey instruments also had questions which focused

on the following topics:

- The distributors’ perspectives on sales and how sales may have differed in the absence of the program.

- The buyers’ perspectives on the factors that led them to select the specific efficiency level for the HVAC unit purchased.

We used the responses to these questions as consistency checks on the three main causal paths

described above.


Each of the three causal pathways was contingent on the distributor changing their behavior in

response to the program, and this change in behavior influencing the behavior of their buyers. We

surveyed distributors involved in the program and a sample of buyers from those distributors. We

believed that if the program failed to show attribution through the distributors or buyers, then the

program had failed to affect the equipment sale on this causal path. This did not

mean that the program had no influence on the sale, only that any influence it had was not through

this path. If another causal path did show program influence, then we determined the sale to be at

least partially program-attributable.

We evaluated each causal path for attribution at the level of the individual buyer and their associated

distributor. We then subtracted that attribution from 1 to get a free-ridership score on that pathway.

To calculate the total program attribution score, we multiplied these 3 free-ridership scores together

and subtracted the product from 1, as in the self-report algorithm above. This approach captures

multiple paths of attribution, as well as partial attribution when it exists.
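
A minimal Python sketch of the path combination follows. It reads the total attribution as 1 minus the product of the three path free-ridership scores, an interpretation consistent with the self-report algorithm earlier in this section. How distributor and buyer responses are combined into each path's attribution is not specified here, so the sketch takes path-level attribution as a given input. The function and path names are hypothetical.

```python
def upstream_attribution(path_attribution):
    """Combine per-path attribution scores (each in [0, 1]) into total attribution.

    Each path's free ridership is 1 minus its attribution. Total attribution
    is taken as 1 minus the product of the path free ridership scores.
    """
    fr_total = 1.0
    for path, attribution in path_attribution.items():
        if not 0.0 <= attribution <= 1.0:
            raise ValueError(f"{path} attribution must be in [0, 1], got {attribution}")
        fr_total *= 1.0 - attribution
    return 1.0 - fr_total

# Made-up case: influence shows up only on the promotion/upsell path.
one_path = upstream_attribution({"stock": 0.0, "promotion_upsell": 0.5, "price": 0.0})

# Made-up case: partial influence on two paths compounds.
two_paths = upstream_attribution({"stock": 0.5, "promotion_upsell": 0.5, "price": 0.0})
```

Under this reading, a sale with no influence on one path can still be partially attributable through another, and partial influence on several paths compounds.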

9.3.6 Task 6. Calculate net savings estimates

When using methods based on participant self-report surveys, we compute an attribution score for

each survey respondent, multiply their gross savings by that attribution to calculate net savings, then

use sample expansion as described in Deliverable 7 to produce population level net savings.

Population level NTGRs are computed by dividing population net savings by population gross savings.
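
The calculation can be sketched in Python as follows. Sample expansion is represented here by a simple case weight per respondent, which is an assumption made for illustration. The actual expansion procedure is the one described in Deliverable 7. The function name and the sample values are hypothetical.

```python
def population_net_savings(sample):
    """Expand respondent-level savings to the population and compute the NTGR.

    sample is a list of (gross_savings, attribution, weight) tuples, where
    weight is a hypothetical sample expansion weight for the respondent.
    Returns (population_net, population_gross, ntgr).
    """
    population_gross = sum(gross * weight for gross, _, weight in sample)
    population_net = sum(gross * attr * weight for gross, attr, weight in sample)
    if population_gross == 0:
        raise ValueError("population gross savings is zero; NTGR is undefined")
    return population_net, population_gross, population_net / population_gross

# Made-up respondents: (gross kWh, attribution, expansion weight).
sample = [(100.0, 0.5, 2.0), (200.0, 1.0, 1.0), (50.0, 0.0, 4.0)]
net, gross, ntgr = population_net_savings(sample)
```

Because the ratio is taken at the population level, respondents with larger weighted gross savings have more influence on the NTGR than a simple average of attribution scores would give them.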

Consumption regression analysis methods are described under Deliverable 9. These methods directly

provide net savings under RCT design, in some conditions under RED design, and under

quasi-experimental methods for some behavioral programs. Quasi-experimental analysis with the

survey-based adjustment described above is an alternative for producing net savings.

9.3.7 Task 7. Make recommendations for program improvements

Our approach to NTG takes the program design, logic, and mechanisms into consideration, and at its

core, NTG is about assessing the programs’ effect on the market. Thus, it is an inherent quantification

of the interaction of the programs and the market. Not only is it useful for assessing how well certain

elements work, it also provides insights into what is likely and unlikely to work given current and

future market conditions. As such, one output of our NTG analyses will be recommendations to the

programs about program design, where to set incentive levels, how to set ex ante NTGRs, and

which products to incentivize.


10 APPENDIX F - WORKPLAN COMMENTS

Table 23. Workplan comments

| Subject | Comment From | Page or Section | Question or Comment | Response |
|---|---|---|---|---|

