Our EW Experience is Ending, Our Experimentation Journey Continues

Experimentation Works
14 min read · May 16, 2019

Results and Reflections from Team NRCan

This is our third and final Experimentation Works (EW) report and blog post, and includes the results and lessons learned from two experiments. For further information about our EW context and journey, check out our first two posts here and here.

Natural Resources Canada’s Office of Energy Efficiency (OEE) was well-positioned when EW launched. The OEE had worked with partners in and out of government to surface insights and opportunities that would benefit from experimentation, including through randomized controlled trials. Of particular relevance to our two EW experiments was a cross-jurisdictional home energy labelling and reporting project that yielded a number of different experimentation opportunities related to the EnerGuide home evaluation service and related tools.

The OEE has also been working with Carrot Rewards to engage Canadians on energy efficiency and OEE’s content, services and tools (like ENERGY STAR and EnerGuide) via the Carrot Rewards app [1], where users can earn loyalty points for demonstrated learning and actions. When running field experiments, including randomized controlled trials, we need a touchpoint (i.e. a user interaction, in this case a questionnaire completed for reward points) with a large enough sample size (30K), and a way to randomize that touchpoint at the user level. Initially, the Carrot Rewards app only partially fulfilled these needs. As discussed in our second EW blog post, Carrot Rewards was not designed to be a platform for randomized controlled experiments, and we ran into a few challenges along the way. Through a strong partnership with the Carrot team, we were able to address our technical hurdles and randomize content at the user level in a way that met our experimental requirements. It turned out to be a novel use of the Carrot Rewards app!

Digital platforms such as Carrot Rewards offer a solid user base and varied interaction opportunities, and they make it simple to monitor and report on results at this scale. This was a valuable learning opportunity as we explore digital tools that enable us to reach Canadians in large numbers (when needed), reliably detect effects, and make inferences via experiments. We have included other information about Carrot Rewards and its user base relevant to each experiment’s methodology and results in the content and related footnotes below.

With a cross-functional team in place, including reps from OEE’s Social Innovation UnLab and Housing Division, NRCan’s Experimentation and Analytics Unit, Carrot Rewards, and a graphic designer, our EW team was ready to take experimentation to the next level.

Home Energy Label Experiment

The EnerGuide label aims to help homeowners become aware of their home’s energy performance (via a rating) and encourage energy efficiency actions, like retrofitting their home. Our user research and a literature review of residential labelling identified some opportunities to improve the label so that it resonates more with Canadians. The OEE is also exploring and transitioning to digital solutions to inform Canadians in more customized ways. The results of this experiment will inform the continuous improvement of the label and our digital innovation opportunities.

Research question (adapted[2]): Does the EnerGuide label effectively convey energy efficiency and consumption information to homeowners?

We used the Carrot Rewards mobile app as an online platform to evaluate whether homeowners in BC, Ontario, and Newfoundland were able to understand information about home energy efficiency and consumption depicted on a label graphic. Comprehension was measured by testing if users could correctly answer questions about energy efficiency and consumption after viewing various label scales[3] representing a fictional home’s efficiency rating. In total, approximately 30,000 Carrot users participated in the online experiment, with data collected between November and December 2018.[4]

Users completed three modules testing their understanding of energy efficiency using the EnerGuide label rating scale or one of its counterparts: the UK Energy Performance Certificate (EPC) and the US Home Energy Score (HES). Random assignment to the labels ensured that, on average, users in each group were identical. This allows us to attribute any observed differences in average user performance to features of the labels themselves rather than to the background demographic characteristics of our user sample.
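To make the design concrete, here is a minimal sketch of this kind of user-level random assignment in Python. The function name, user IDs, and seed are hypothetical illustrations, not Carrot’s actual implementation:

```python
import random

LABELS = ["EnerGuide", "EPC", "HES"]  # the three label rating graphics

def assign_labels(user_ids, seed=2018):
    """Assign each user to one label group, uniformly at random."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    return {uid: rng.choice(LABELS) for uid in user_ids}

# Hypothetical anonymized user IDs standing in for ~30,000 Carrot users
assignments = assign_labels(range(30_000))
```

Because each user’s group is determined by chance alone, any trait that might affect comprehension (age, region, prior knowledge) is balanced across groups in expectation.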

First Module: Interpreting energy efficiency from a single label

Carrot users were randomly assigned to receive one of three label rating graphics (EnerGuide, EPC or HES) representing either an energy efficient or an energy inefficient home. They were asked a single question: “Is the house represented by this label energy efficient?”

Results show disparities in each group’s ability to correctly interpret energy efficiency using the label presented to them. For EnerGuide, only 62% of users were able to correctly identify an energy efficient home, compared with 82% of users assigned to the US label and 74% of users who received the UK variant. Both the HES and EPC labels depict an efficiency score along a reference scale where a higher number or letter indicates higher efficiency. The EnerGuide label depicts a rating relative to the energy efficiency of a “typical new home” built to the current energy code for buildings, on a scale of gigajoules per year. Based on the results of this experiment and previous user research, the requirement to understand both the rating and its relation to the reference house, and what a “typical new home” means, could introduce confusion, making it more difficult for homeowners to assess the energy performance depicted on the label. The results of this experiment are significant[5] and indicate an opportunity for EnerGuide to improve user comprehension by up to 20 percentage points.
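For readers curious about the statistics behind that claim, a two-proportion z-test is one standard way to check whether a gap like 62% versus 82% could plausibly be due to chance. A minimal sketch follows, assuming (for illustration only) arms of 10,000 users each, roughly 30,000 users split three ways:

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                    # pooled rate under H0
    se = sqrt(pooled * (1 - pooled) * (1/n1 + 1/n2))  # standard error under H0
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))                     # statistic and p-value

# Assumed arm sizes of 10,000 each (illustrative, not our exact data)
z, p = two_proportion_ztest(6_200, 10_000, 8_200, 10_000)  # EnerGuide vs. HES
print(f"z = {z:.1f}, p = {p:.2g}")  # a 20-point gap at this scale gives p << 0.001
```

At sample sizes in the thousands, even gaps of a few percentage points are detectable; a 20-point gap is overwhelming.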

Second Module: Comparing energy efficiency across labels

Next, users were randomly assigned to receive one of the three label formats (EnerGuide, EPC or HES), this time presented as a pair of labels comparing the energy efficiency of two homes. Users were asked a single question: “Which home is more energy efficient?”

Results show that when asked to compare the energy efficiencies of two homes, all three labels performed comparably: viewers of the EnerGuide label were able to choose the correct answer almost 82% of the time, only slightly lower than for the US and UK labels.

In this module, the reference “typical new house” was constant between the two EnerGuide labels depicted, which may have simplified the comparison of energy efficiency values for the users, leading to more successful responses. If labels with different values for the “typical new house” were compared, as would be the case in the housing marketplace, it might be more difficult for users to compare labels. This is an area that may warrant further investigation.

Third Module: Evaluating EnerGuide’s added information about energy consumption

Unlike the UK EPC or the US HES, Canada’s EnerGuide label does more than communicate information about energy efficiency; it also gives users information about energy consumption.[6]

Our last module sought to evaluate whether users were able to interpret both energy consumption and energy efficiency from the EnerGuide label. Users were shown one of two pairs of labels: in one, a home consumed more energy but was also more efficient; in the other, a home consumed more energy but was equally energy efficient. They were then asked two questions: “Which home is more energy efficient?” and “Which home consumes more energy?” Their responses allowed us to evaluate whether the information provided by the EnerGuide label is effectively understood by the general public.

Results: When presented with one home that consumes more energy than the other, yet is more energy efficient, only 20% of users (one in five) could correctly assess which home was more energy efficient. When presented with a home that consumes more energy but is equally efficient, only 5% of users (one in twenty) answered correctly. We note that these results are worse than if users had chosen answers at random among the four possible choices (which would have yielded a score of 25%). Results were far better when it came to interpreting which home consumes more energy, with over 70% of users choosing correctly in either example. As consumption is an absolute quantity in gigajoules per year (with no relationship to a reference house value), correctly answering consumption questions appears more straightforward than assessing relative values of efficiency from the EnerGuide label. These results are significant, and suggest that users struggle to interpret relative energy efficiency as depicted on the label.
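To see why 20% is meaningfully worse than the 25% chance level, a one-sided binomial test can be run against the guessing rate. The sketch below assumes illustrative counts of 2,000 correct out of 10,000 respondents, not our exact data (requires SciPy 1.7+):

```python
from scipy.stats import binomtest

# Assumed counts: 20% correct among 10,000 respondents, tested against the
# 25% success rate expected from guessing among the four possible choices.
result = binomtest(k=2_000, n=10_000, p=0.25, alternative="less")
print(f"p = {result.pvalue:.2g}")  # well below 0.001: reliably worse than chance
```

A result reliably below chance is a strong signal that the label is not merely uninformative but actively misleading on this question.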

Conclusions: The EnerGuide label aims to provide home energy performance information to multiple users, including homeowners, government and utility officials, contractors, home energy service providers, home inspectors, and real estate agents. The results from this experiment show that, in some cases, homeowners understood the energy efficiency information that the label conveyed quite well (e.g. module 2). In other cases, however, they misinterpreted or did not understand what the label was communicating (e.g. in modules 1 and 3).

The EnerGuide label provides more information than its UK or US counterparts; however, its rating scale is not as clear. Our results strongly suggest that users have difficulty interpreting the home energy efficiency rating relative to a reference home. Users may also have difficulty interpreting energy efficiency when considering the energy consumption information also included.

The experiment identifies opportunities for further research and analysis, and for improvements to how the EnerGuide label communicates energy efficiency and consumption to homeowners. These findings will inform and advance our efforts to improve how we engage and inform Canadians via energy efficiency labelling in the digital age.

Message Framing Experiment

Behavioural insights research and practice suggests that the way we communicate and frame energy efficiency messaging to Canadians matters. How much though? This experiment leveraged our previous work and behavioural insights theory to test different message frames with Carrot users to learn what works when encouraging them to contact a home energy advisor in their area.

Research question (adapted[7]): Do cost- or comfort-specific messaging interventions nudge more homeowners to seek out a home evaluation service organization than generic energy efficiency messaging?

We used the Carrot Rewards app as an online testing platform to evaluate homeowners’ knowledge of the EnerGuide home evaluation process and to test whether more homeowners in British Columbia, Ontario, and Newfoundland would seek out a home energy evaluation service organization when prompted with different message frames related to cost, comfort, or NRCan’s conventional EnerGuide-related information (the control). Participants were randomized into three message frame groups, and all completed questions reflecting their knowledge of the EnerGuide home energy evaluation process. Uptake was measured by monitoring click-throughs and postal code entry on NRCan’s home evaluation service organization finder following the completion of the survey. In total, approximately 30,000 Carrot homeowners participated in the online experiment, with data collected between December 2018 and January 2019.

Users completed similar but subtly different reward offers related to the EnerGuide home evaluation process, which included one of three randomized message treatments:

  • Neutral-Framed Messaging (control) included language such as, “An EnerGuide evaluation is a powerful tool!”
  • Cost-Framed Messaging with the same information as the control but infused with “cost” framing such as, “An EnerGuide evaluation can cut your energy bills.”
  • Comfort-Framed Messaging with the same information as the control but infused with “comfort” framing such as, “An EnerGuide evaluation can keep you warm this winter.”
  • Both the cost and comfort-framed offers also included two additional treatment questions on cost information (reduction in monthly bills) or comfort information (heat loss in older homes) to further enhance the potential message framing treatment effect.

Results for all messages show minimal distinction in uptake, whether measured by click-throughs or by postal code entry to search for service organizations. Cost-framed messaging nudged click-through rates slightly higher (78.9%) compared to the control group (77.8%), but comfort-framed messaging (78.7%) did not.[8] The distinction between cost-framed and comfort-framed messaging is subtle, and this may be a limitation of this type of experiment when presented via mobile app questionnaires. The statistical power of the experiment was substantially reduced when only 23,551 of the 30,000 participants completed the click-through and only 16,131 continued to postal code entry. This attrition further limits what can be inferred about the different messaging treatments. Interestingly, this experiment highlighted a postal code entry uptake rate of roughly 54% when using Carrot Rewards as an outreach channel for encouraging searches for an EnerGuide evaluation service organization.
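To illustrate why this attrition matters, the sketch below estimates how many completions per arm would be needed to detect the observed 1.1-point lift (77.8% to 78.9%) with 80% power, assuming (for illustration) that completions split roughly evenly across the three arms, about 7,850 each:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Sample needed per arm to detect a 77.8% -> 78.9% lift with 80% power
effect = proportion_effectsize(0.789, 0.778)  # Cohen's h for the two rates
n_per_arm = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                         power=0.8, alternative="two-sided")
print(f"{n_per_arm:,.0f} users per arm")  # ~11,000, above the ~7,850 observed
```

Under these assumptions, the post-attrition sample falls short of what is needed to reliably detect an effect this small, which is why stronger interventions or larger samples are the natural next step.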

Conclusions: Roughly half of the Carrot homeowners were successfully nudged to search for a local EnerGuide home evaluation service organization following a messaging intervention through the Carrot Rewards application. This work has opened up opportunities for further message framing experimentation. For example, developing an experiment that compares the impact that message framing has on consumers will require additional research, a larger response sample size, and stronger interventions to observe substantive and statistically significant differences in uptake for an EnerGuide home evaluation.

Our EW Experience: Reflections and Lessons Learned

Beyond delivering the experiments, we were intentionally building our experimentation capacity and practice through our EW participation. What follows are our reflections on our experience and tips for those interested in public sector experimentation.

  • Beyond simply trying something new, consider experimentation as a systematic ‘learning by doing’ process to guide, inform, and evaluate policy and service actions. Doing it well, directly with users, sheds light on areas and possibilities to do more.
  • Know your and your users’ contexts. Know where you, your organization, potential partners, and users are at when it comes to what you’re trying to do, and how experimentation can help.
  • Do your research. We hit the ground running because we had been working on EnerGuide previously and knew where we could experiment to create value.
  • Be clear about your research question and problem definition. We kept coming back to this to course-check throughout our journey.
  • Identify existing touchpoints with users or create new ones. For example, the EnerGuide label is a touchpoint, and working with Carrot Rewards enabled us to reach Canadians at the scale required. We needed to engage the users of our policies, services, and tools directly. This may require working with partners.
  • Use experimentation to amplify the distinction between opinion and behaviour. When it comes to generating evidence and understanding what works, there’s a difference between (a) sharing something with a stakeholder and asking their opinion about it and (b) designing an experiment to see if and how they understand it, interact with it, and use it to accomplish a task. We’re doing the latter in this case.
  • Be ambitious, but don’t lose sight of feasibility while working within constraints. If working in a new way and within specific timelines, it’s likely better to have a clear, appropriately scaled experiment that will give you usable results than something super complex that is much more challenging to execute within your constraints (e.g. EW timelines).

Experimentation takes time. It requires a lot of iteration and enabling conditions, like:

  • Establishing the relationships that make experimentation work (shared understanding, partnering, co-creating);
  • The logistical components that need to be managed (content design, production, testing);
  • The procurement and approvals (e.g. data sharing, content approval); and,
  • Learning as we go and developing something new in our context (e.g. tools and practices).

Honestly assess the skills on the team, and don’t go it alone. If you haven’t conducted user research or run an RCT before, and that’s what is needed, then work with others who have done it. That’s the value of cross-functional and multi-disciplinary teams.

The following skills were particularly critical:

  • NRCan’s OEE Housing colleagues have the mandate and subject matter expertise for home energy labelling, a willingness to experiment, and they are the ones who will implement the findings.
  • OEE’s Social Innovation team brings a bias towards relationship-building, collaboration, and partnerships across the majority of our projects, along with innovation surge capacity, budget, and ways to understand contexts and needs from user perspectives.
  • NRCan’s Experimentation and Analytics Unit arrived with experience in designing and delivering RCTs just as we were ramping up the experiment. Having their expertise in the same building was a game-changer. They also have data analytics experience that was put to good use.
  • All of us brought curious mindsets, a willingness to surface and test assumptions, and a bias towards action learning.
  • While considering how things should be can be useful in surfacing assumptions and setting a shared vision (e.g. the public sector should be experimenting more), at some point we have to dig into how things actually are in our contexts and work with the right people on what could be. What could be is a place to start learning by doing and to build from through practical action and practice-based learning.

The NRCan EW team would like to thank our senior executives for the leadership and trust they have shown in us to work across boundaries and do new things in new ways in the open. We would also like to thank the EW crew at Treasury Board Secretariat and the opportunity they created to experiment and learn together. And a special thanks goes to our EW cohort colleagues from the other federal departments, who delivered and supported the experiments, as well as those who shared their expertise via the EW learning sessions and events.

Post by NRCan’s EW Team

This article is also available in French here: https://medium.com/@exp_oeuvre

Endnotes

[1]Carrot Rewards is an AI-driven wellness app and brand engagement platform that leverages behavioural economics and nudge theory to motivate and reward users for making better lifestyle choices. Carrot maximizes appeal and engagement by offering users a choice of rewards from the most popular consumer loyalty programs — in Canada users can earn SCENE® points, Petro-Points™, More Rewards® points, Drop points or RBC Rewards each time they interact with the app.

[2] Our initial research question focused solely on energy efficiency. We broadened it because the EnerGuide label also includes energy consumption information, which could be tested with Carrot users.

[3] Only the graphical rating scales of the EnerGuide label and its US and UK counterparts were tested in this experiment. The entire EnerGuide label was not.

[4] We note that our online Carrot sample skewed younger (average age 36 years, compared to a national average of 42 years) and included far more women than men (20,096 vs. 9,939) relative to the Canadian population. Nevertheless, statistical adjustment and weighting for age, gender, and region in our analysis do not alter our substantive findings.
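As a minimal sketch of what such weighting involves, the snippet below computes post-stratification weights on gender alone, assuming an approximately 50/50 national split; the actual analysis also adjusted for age and region:

```python
# Post-stratification weights on gender alone (illustrative sketch).
# Assumes an approximately 50/50 split in the Canadian population.
population_share = {"women": 0.50, "men": 0.50}
sample_counts = {"women": 20_096, "men": 9_939}
n = sum(sample_counts.values())

weights = {g: population_share[g] / (sample_counts[g] / n)
           for g in sample_counts}
print(weights)  # women ≈ 0.75, men ≈ 1.51: each man's answer counts more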

[5] Statistically speaking, the observed differences between label groups were very unlikely to be due to chance (p<0.001 for all group comparisons in all modules). For all figures, 95% confidence intervals are shown, indicating the uncertainty around each estimate. If we were to repeat the same experiment over and over, intervals constructed this way would contain the true population value 95 percent of the time.
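For readers who want to see the arithmetic, a normal-approximation interval for a proportion can be computed as in the sketch below; the group size of 10,000 is an assumption for illustration:

```python
from math import sqrt

def proportion_ci(correct, n, z=1.96):
    """95% normal-approximation confidence interval for a proportion."""
    p = correct / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Assumed group: 62% correct among 10,000 EnerGuide viewers (illustrative)
low, high = proportion_ci(6_200, 10_000)
print(f"62% +/- {100 * (high - low) / 2:.1f} points")  # roughly +/- 1.0 point
```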

[6] “Energy consumption” is the absolute amount of energy used, while “energy efficiency” is the amount used relative to a benchmark (or a collection of benchmarks) such as the size of the home, its location, or the number of occupants. These two concepts are distinct: homes considered “efficient” may still consume significant amounts of energy in absolute terms, and homes with a small energy footprint may not be particularly efficient.

[7] Our initial research question included a reference to rewards as an intervention to test. We chose to refine our research question and focus solely on message framing for this experiment.

[8] The difference was statistically significant only at p<0.1, and only for the cost-framed messaging compared with the control group.
