
Czech Technical University in Prague
Faculty of Electrical Engineering

BACHELOR THESIS

Modeling of Room Occupancy Patterns for Advanced Building Automation

Prague, 2013 Ondrej Svoboda

Abstract

The goal of the thesis is to create a prediction model of room occupancy in residential houses. The purpose of this model is its use in a room temperature control system. In older houses with significant heat losses caused by insufficient insulation or improperly sealed windows, potential savings can be achieved by controlling the temperature of each room independently. This can be done on the basis of the predicted room occupancy at a given time of day.

The thesis focuses on creating a model that learns room occupancy patterns for particular days of the week and times of day from motion sensor data. Using this model it will be possible to predict room occupancy in advance and thus to control the room temperature effectively. The primary goal of the thesis is not to develop the control system but rather to provide such a system with room occupancy prediction data.

Annotation

The aim of the thesis is to create a model for predicting room occupancy in a family house, to be used in a control system for heating regulation. In older houses with large heat losses related, for example, to insufficient insulation or leaky windows, savings can be achieved by regulating the heating of each room separately, depending on when the room is and is not used during the day.

The thesis focuses on creating a model that learns room occupancy patterns for individual days of the week and different times of day from motion sensor data. Based on this model it will then be possible to predict the occupancy of a room for a specific period some time in advance and to regulate the indoor temperature efficiently.

The primary aim of the thesis is not to create the temperature control system itself, but rather to provide that system with room occupancy prediction data.


Contents

1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Outline

2 Related Work
2.1 Neurothermostat (1997)
2.2 SmartHome (2000)
2.3 PreHeat (2011)

3 Description
3.1 Set Up
3.2 Data Format
3.3 Thoughts Before Creating Model
3.3.1 Separate User Profiles
3.3.2 Single Room versus Multi Room

4 Occupancy Prediction Model
4.1 Occupancy Representation
4.1.1 Smoothing of Occupancy Vectors
4.2 Occupancy Prediction
4.2.1 K-nearest Neighbour Algorithm
4.2.2 Model Description
4.2.3 Error Measurement
4.2.4 Similarity Measures
4.2.4.1 Hamming Distance
4.2.4.2 Jaccard Coefficient
4.2.4.3 Comparison of Similarity Measures
4.2.5 Dynamic versus Static Prediction
4.2.6 Length of the Blocks
4.2.7 Smoothing
4.2.8 missTime, heatLoss, dayError
4.2.9 Night Mode
4.2.10 Vacation Mode
4.3 Final Results
4.3.1 Error in Different Days
4.3.2 Scheduled Calendars versus Prediction Algorithm Error
4.3.3 Best Total Error

5 Conclusion
5.1 Achieved
5.2 Future work

6 References

7 Appendix
7.1 Appendix A - code
7.1.1 Prediction algorithm.m
7.1.2 Probabilistic module.m
7.1.3 jaccard.m
7.1.4 smoothing.m
7.1.5 vacation mode.m
7.1.6 hamming distance.m
7.1.7 my hamming distance.m
7.2 Appendix B - data

List of Figures

2.1 The Neurothermostat and its interaction with the environment [1]
2.2 Hidden Markov Model (taken from [3])

3.1 Climate data Amsterdam
3.2 Ground Floor Plan
3.3 Upper Floor Plan
3.4 Data example
3.5 Sensors used to obtain data

4.1 Example of k-nearest neighbour classification [taken from wikipedia]
4.2 Error hamming vs jaccard distance
4.3 Error static versus dynamic prediction
4.4 Error 15 vs 30 minute long blocks
4.5 Error smoothed occupancy vectors
4.6 Error difference between smoothing scenarios
4.7 Error with missTime and heatLoss (k=4, threshold=0.4)
4.8 MissTime versus heatLoss ratio
4.9 Error improvement with night mode
4.10 Comparison of day error with and without night mode
4.11 Error all days
4.12 Living room occupancy histogram averaged across all days (max 12 days)
4.13 Error scheduled calendars vs prediction algorithm

Chapter 1

Introduction

1.1 Motivation

The motivation for this thesis comes from a start-up idea developed during the Ready to Startup course at TU Delft. Ready to Startup is an entrepreneurial course taught by various experts in their fields and has been the first step for many successful companies. The idea of an intelligent, efficient thermostat first came up in this course. Many houses in the Netherlands suffer from enormous heat losses caused by insufficient insulation, improperly sealed windows, etc. One way to reduce such losses is to regulate the heating so that the temperature is maintained only when somebody is present in the room.

Most existing solutions are designed for newly built houses and are not meant to be retrofitted into older buildings. Therefore most of the thermostats used in these houses are either thermostats whose temperature can only be set manually or thermostats with schedule calendars. Neither approach allows comfortable control over the temperature. The schedules still have to be set manually and cannot adapt to the behaviour of the occupants.

Seeing such a gap in the existing technology, we decided to come up with a solution that does what we normally do manually or do not do at all. Our solution is to create an intelligent thermostat that can predict the occupancy patterns of the people in the house and control the heating accordingly.

Many people expressed interest in this topic, and the concept itself won second prize in the Climate KIC competition held in Berlin in November 2012. We have therefore started to work on this project, and this thesis is the first step in the creation of such a thermostat.


1.2 Objectives

The main goal of this thesis is to create and formalize an occupancy model that can be used in the heating control system of the intelligent thermostat. The first step is to obtain data from a test house, which is necessary for any further work. Another goal is to study the research already done by others and to summarize what has been achieved in this field.

The next goal is to measure how efficient the model is and to point out possible improvements that can be made later in the control system. The goal is not to develop the control system itself, since that requires background knowledge in many different fields. In addition, proving any results would require a live test run, which is not achievable in the time frame of this thesis.

1.3 Outline

The report is divided into five parts, with each chapter focusing on a different topic.

• Chapter 1 - describes the motivation behind this thesis and gives a general introduction to the problem

• Chapter 2 - presents related approaches to the same problem and describes the general trend in the field

• Chapter 3 - describes the development of the idea and the process of obtaining and editing the raw data

• Chapter 4 - describes in detail every step of the implementation of the algorithms used to build the occupancy model and measures the performance of the different approaches to creating the model

• Chapter 5 - summarizes the thesis and its outcomes


Chapter 2

Related Work

Several papers have already been published that tackle the problem of per-room heating. One of the first, published by M.C. Mozer et al. [1] in 1997, describes an experiment with a smart thermostat called the Neurothermostat and makes the highly valued statement that "An important finding of this work is that even highly nondeterministic schedule contains sufficient statistical regularity to be exploited by a predictive controller." [1, p.7].

2.1 Neurothermostat (1997)

The aim of their work [1] is to minimize the energy cost while taking into account the comfort of the occupants. The only output of the system is when to switch the house furnace on and off; as a result, the rooms are not controlled separately. Because of the delay of the whole control system, the controller must predict future states and anticipate occupancy rather than just react.

The system is explained in Figure 2.1.

Figure 2.1: The Neurothermostat and its interaction with the environment [1]


The occupancy predictor they used was a look-up table divided into fixed δ-minute blocks to encode the structure, combined with a neural network for the residual structure. The network had a three-layer architecture and was trained by back propagation. The inputs of the network were: the current time of day; the day of the week; the average proportion of time that the home was occupied in the 10, 20 and 30 minutes from the present time of day on the previous three days and on the same day of the week during the past four weeks; and the proportion of time the home was occupied during the past 60, 180 and 360 minutes. In the test they used δ = 10 minutes.

2.2 SmartHome (2000)

Another paper with a different approach is SmartHome [3]. This paper focuses on the sleeping patterns of the occupants, that is, on turning the whole system off and on based on those patterns. Like the previous approach, it does not tackle the per-room problem.

One of the things the paper introduced was three states of the home: (i) Away, when the home is not occupied; (ii) Active, when the home is occupied and the residents are not asleep; (iii) Sleep, when all residents are sleeping. These states were used in a Hidden Markov Model (Fig. 2.2) to decide whether to turn the heating, ventilation and cooling (HVAC) system off.

Figure 2.2: Hidden Markov Model (taken from [3])

In another chapter they tackle the problem of when to turn the HVAC system on. The problem is to set the trade-off between comfort and savings: how early before the arrival the house should be preheated. They introduce the so-called preheat time τ, which depends on (i) the capacity and efficiency of each stage of the home's HVAC system and (ii) the historical occupancy patterns of the home. The value of τ has to be determined by measurement; they again use Hidden Markov Models to estimate when to turn the HVAC system on.

2.3 PreHeat (2011)

This paper [2] addresses the problem of per-room heating. The authors created a system that models the occupancy of each room based on motion sensors and RFID tags placed on the house keys of each adult living in the household.

The occupancy was derived from two sensed events, with the time between them filled in as occupied or not occupied (2 minutes for RFID, 5 minutes for a motion sensor during the day, 30 minutes for a predefined sleep period). For the occupancy prediction they used a partial occupancy vector to compute the Hamming distance between the current day and the previous corresponding days. They then picked the K nearest past days for making the prediction. In their case, K=5 proved to give the best results.

As an improvement, the difference between weekdays and weekends is taken into account. The authors also took into account not only the similar past days but also a part of the previous day (4 hours).

Their paper suggests that further improvements, such as extra sources of data (location of the occupants from mobile phones), the ability to detect each person individually, sleep detection or a user-customizable trade-off between comfort and savings, would further improve the quality of the system.


Chapter 3

Description

3.1 Set Up

The data in the thesis were obtained from the test household and cover a period of 89 days from 30-11-2012 till 27-2-2013. This time frame was chosen because it covers most of the winter season in Central Europe, which is the main period when heating is used in ordinary households (Figure 3.1).

The raw data consist of 183 205 entries and capture all the events that happened in the house during this time frame. For the purpose of this project only motion events are taken from the data set, which results in 133 089 entries distributed over the ten measured rooms.

Figure 3.1: Climate data Amsterdam

The household has 4 human members, 2 adults and 2 children, and 2 pet cats. As discussed later in the report, the cats make the prediction of human occupancy in certain rooms slightly more difficult and in some cases distort it. The house consists of 10 rooms, 6 on the ground floor and 4 on the upper floor, as depicted in Figures 3.2 and 3.3.


Figure 3.2: Ground Floor Plan

Figure 3.3: Upper Floor Plan


3.2 Data Format

The entries have the format [timestamp; location; event; sensor id], as shown in Figure 3.4.

2013-02-27 18:16:20.100;Office, Motion motion;VPIR2.MOTION

Figure 3.4: Data example
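As an illustration only, one entry of this form could be reduced to the two fields the model needs as sketched below. The function name `parse_entry` is hypothetical, and the sketch assumes that fields are separated by semicolons and that the location is the text before the first comma of the second field, as in the single example line shown in Figure 3.4.

```python
from datetime import datetime

def parse_entry(line):
    """Return (timestamp, location) for one raw log line (hypothetical helper)."""
    fields = line.strip().split(";")
    # Timestamp format taken from the example entry, including milliseconds.
    timestamp = datetime.strptime(fields[0], "%Y-%m-%d %H:%M:%S.%f")
    # Assumption: the location precedes the first comma of the second field.
    location = fields[1].split(",")[0].strip()
    return timestamp, location

entry = parse_entry("2013-02-27 18:16:20.100;Office, Motion motion;VPIR2.MOTION")
```

Other log lines may be laid out differently, so a real parser would need to be checked against the full data set.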

Only the timestamp and the location are needed to create the model. Two types of motion sensors are used in the test house (Figure 3.5). The first type is a commercial motion sensor made by Visonic; the second is a home-made sensor based on Arduino components. The two types differ significantly in the way they capture motion and in the interval at which they transmit the motion signal.

Visonic PIR sensors go to sleep for 2 minutes after sending the motion signal. If there is still motion after these 2 minutes, nothing happens. If there is no more motion after these 2 minutes, it can take until the next heartbeat transmission (the heartbeat interval is about 800 seconds) before the sensors are armed again.

Figure 3.5: Sensors used to obtain data

The Arduino-based sensors behave differently. After a motion signal is sent, the sensor waits for a period of 10 seconds without any motion before sending 'no motion'. Once 'no motion' has been sent, the sensor is 'armed' again and will send a motion signal as soon as it detects motion.


3.3 Thoughts Before Creating Model

3.3.1 Separate User Profiles

There are many ways to tackle the problem of room occupancy prediction. The approach described in the thesis is only one of the approaches implemented.

The first point to be addressed was whether to predict room occupancy for each inhabitant of the house separately, creating an individual occupancy profile for each. This would allow us to define the set of rooms used by each specific person and would probably make the prediction much more precise. However, it would require much more information and a different type of sensor than those used here, preferably the RFID sensors used by James Scott et al. [2]. On the other hand, such sensors would limit the possibilities of the prediction: as Scott et al. [2] mention, parents would not let their children be equipped with such sensors. Another argument against this approach is that such a system would depend entirely on the inhabitants carrying the chip at every moment of the day.

The benefit of such a system would be the elimination of the negative influence of pets on motion sensors. However, it is much easier to tackle this problem on the side of the motion sensors, as is done in the pet-immune sensors by Visonic [5].

3.3.2 Single Room versus Multi Room

Another decision in creating the model was whether it should incorporate and predict all the rooms in one model or rather predict each room as a single unit.

Because multi-room prediction also depends on the structure of the house and the position of the sensors, it is more beneficial to treat each room as a separate unit and to take relations between the rooms into account at a higher level of the control system.

This can be especially interesting for the heating control system, which can be given additional information (for example GPS location, door events, RFID tag data, etc.) to improve it even further. However, this information is not essential for creating the model itself at this stage.


Chapter 4

Occupancy Prediction Model

4.1 Occupancy Representation

The first step in creating a prediction model is to represent the occupancy of each room in a way that can be used as an input for the prediction algorithm. The representation used in this method divides each day into α-minute-long time blocks.

Each block is represented by the number of motion events that occurred during the α-minute-long time block. Because this model is designed for heating control purposes, α is chosen between 15 and 60 minutes.

The lower bound is set because of the nature of the Visonic motion sensors, which can send motion signals at intervals of up to 800 seconds (about 13 minutes). The upper bound is set to 60 minutes because of the reaction time of the heating unit.

In the next step each occupancy vector is normalized to a binary occupancy vector by setting all blocks with more than 0 motion events to '1' and blocks with no motion events to '0'. In this way the whole day is represented by a vector of length 96 for 15-minute blocks, 48 for 30-minute blocks and 24 for 60-minute blocks.
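The construction of a binary occupancy vector can be sketched as follows. This is a minimal illustration rather than the thesis implementation; the function name `occupancy_vector` and the input convention (motion events given as minutes since midnight for one room and one day) are assumptions made for the example.

```python
def occupancy_vector(event_minutes, block_minutes=30):
    """Binary occupancy vector of one day for one room.

    event_minutes: minutes since midnight (0..1439) of each motion event
                   (assumed input convention).
    """
    n_blocks = 24 * 60 // block_minutes
    counts = [0] * n_blocks
    for minute in event_minutes:
        # Count the motion events falling into each alpha-minute block.
        counts[int(minute) // block_minutes] += 1
    # Normalize: any block with at least one event becomes '1'.
    return [1 if c > 0 else 0 for c in counts]

# Events at 00:00, 00:29, 00:30 and 18:16 with 30-minute blocks.
vec = occupancy_vector([0, 29, 30, 18 * 60 + 16], block_minutes=30)
```

With 30-minute blocks the event at 18:16 falls into block 36, so three blocks of the resulting 48-element vector are marked as occupied.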

4.1.1 Smoothing of Occupancy Vectors

Heating cannot react in a matter of minutes but rather in half an hour or even more, so it cannot react to one empty block between two occupied blocks. It is therefore logical to treat this block as occupied as well. This is beneficial for several reasons. First, the person has presumably just left the room for 30 minutes and is back again, so turning down the heating would not be the desired behaviour. Second, it improves the error rate of the prediction algorithm, since these minor changes cannot be caught by the algorithm. The smoothing is done as shown in the following equations.

The smoothing is applied in 3 different forms. In the first scenario the smoothing fills only a single empty block between two occupied ones (Formula 4.1). In the second scenario it additionally fills two unoccupied blocks surrounded by occupied blocks (Formula 4.2). The third scenario also fills a single block, but it is a stricter variant of the second scenario because it fills two empty blocks only when they are surrounded by two occupied blocks on each side (Formula 4.3).

A more complex example that follows the rules of Scenario 3 is shown in Formula 4.4.

{101} → {111} (4.1)

{1001} → {1111} (4.2)

{110011} → {111111} (4.3)

{11001100101} → {11111100111} (4.4)
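The three scenarios can be sketched in a few lines. This is an illustrative reading of the rules above, not the code from the appendix; the function name `smooth` is hypothetical, and the sketch assumes that all conditions are evaluated on the original (unsmoothed) vector, which is the reading that reproduces Formula 4.4.

```python
def smooth(vector, scenario=1):
    """Fill short gaps in a binary occupancy vector (scenario 1, 2 or 3)."""
    v = list(vector)      # original values, used for all conditions
    out = list(vector)    # smoothed result
    n = len(v)
    # All scenarios: a single empty block between two occupied blocks.
    for i in range(1, n - 1):
        if v[i] == 0 and v[i - 1] == 1 and v[i + 1] == 1:
            out[i] = 1
    if scenario in (2, 3):
        # Scenario 2 needs 1 occupied block on each side of a two-block gap,
        # scenario 3 needs 2 occupied blocks on each side.
        margin = 1 if scenario == 2 else 2
        for i in range(margin, n - 1 - margin):
            gap = v[i] == 0 and v[i + 1] == 0
            left = all(v[j] == 1 for j in range(i - margin, i))
            right = all(v[j] == 1 for j in range(i + 2, i + 2 + margin))
            if gap and left and right:
                out[i] = out[i + 1] = 1
    return out

example = smooth([1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1], scenario=3)
```

In the example the first two-block gap is filled because it has two occupied blocks on each side, the second gap is left empty because only one occupied block follows it in the original vector, and the single empty block near the end is filled.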

4.2 Occupancy Prediction

4.2.1 K-nearest Neighbour Algorithm

The core of this work is based on the k-nearest neighbour algorithm, one of the widely known methods in pattern recognition. It classifies objects based on the k nearest training samples in the feature space. A new object is assigned to the class that occurs most often among its k nearest neighbours, as seen in Figure 4.1.

The size of the circle in Figure 4.1 corresponds to different choices of the parameter k. The inner circle takes into account the 3 nearest neighbours and thus classifies the new unknown object (marked green) as red, because there are 2 red objects and 1 blue object within the inner circle.

However, if we look at the outer dotted circle, which corresponds to k = 5, the classification output is blue, as there are 3 blue objects and 2 red objects within the dotted circle.
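The majority vote can be illustrated with a small sketch. The list of neighbours below is hypothetical: it only mirrors the counts described for Figure 4.1 (two red and one blue object among the 3 nearest, three blue and two red among the 5 nearest) and is assumed to be already sorted by increasing distance from the unknown object.

```python
from collections import Counter

def knn_vote(labels_by_distance, k):
    """Return the most common label among the k nearest neighbours."""
    votes = Counter(labels_by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical neighbours of the unknown object, nearest first.
neighbours = ["red", "red", "blue", "blue", "blue"]
inner = knn_vote(neighbours, k=3)   # inner circle
outer = knn_vote(neighbours, k=5)   # outer dotted circle
```

The same unknown object is therefore classified differently depending on k, which is why the choice of k matters for the model described next.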

4.2.2 Model Description

The model is based on the k-nearest neighbour algorithm. As described in the previous section, occupancy is represented by binary occupancy vectors, with one vector for each day. The days are then filtered modulo 7, because we assume that the same days of the week (for example Mondays) are more similar to each other than the days within the same week. We therefore create a matrix of occupancy vectors for each day of the week.

Figure 4.1: Example of k-nearest neighbour classification [taken from Wikipedia]

In the next step this matrix is taken as the input for the learning algorithm, which works as follows. Once the new day starts, the prediction for that day is adjusted after each α-minute-long block. As the day goes by, the algorithm takes the blocks of time that have already passed and tries to find the X days (X = the k parameter of the nearest neighbour algorithm) that are nearest to the part of the day it has already received. It selects the X most similar days and, from the mean of their values, which can only be 0 or 1, calculates probability values for the whole day. Values higher than the threshold are then set to '1' (meaning occupied) and values lower than the threshold are set to '0' (meaning unoccupied).

Once it has created a candidate predicted day, it takes the next block and saves it to the final predicted day. Thus every α minutes it predicts a new block of the day. This is repeated until the end of the day is reached.
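The procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the appendix: the names `hamming` and `predict_day` are hypothetical, the Hamming distance is used as the similarity measure, ties between equally distant days are broken by their order in the history, and a probability exactly equal to the threshold is treated as unoccupied.

```python
def hamming(a, b):
    """Number of positions in which two equally long binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def predict_day(history, actual_day, k=4, threshold=0.4):
    """Predict a day block by block from past days of the same weekday.

    history:    list of binary occupancy vectors (same weekday, same length)
    actual_day: the day being predicted; only blocks before t are used at step t
    """
    if not history:
        raise ValueError("history must contain at least one day")
    predicted = []
    for t in range(len(actual_day)):
        observed = actual_day[:t]  # the part of the day already received
        # Rank past days by their distance on the observed part of the day.
        ranked = sorted(history, key=lambda day: hamming(day[:t], observed))
        nearest = ranked[:k]
        # Mean of the neighbours' values for the next block = probability.
        probability = sum(day[t] for day in nearest) / len(nearest)
        predicted.append(1 if probability > threshold else 0)
    return predicted

history = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
result = predict_day(history, [0, 0, 1, 1], k=2, threshold=0.4)
```

Computing only the next block at each step gives the same final predicted day as computing the whole candidate day and keeping its next block, since the later blocks of each candidate are overwritten by subsequent steps.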

4.2.3 Error Measurement

The error that is later used to report the results is calculated in several steps. First, the error of a single day prediction is calculated as the percentage of mismatched blocks between the actual new day and the day predicted by the algorithm. Then all instances of the same day of the week (for example all the Mondays) are tested and the error is averaged. Afterwards all the days of the week are tested in this way and the error is averaged for each room.

The day currently being tested is always withdrawn from the history so that it does not affect the results. It is predicted from the history of all the other days currently available. In this thesis we have data from 3 months, which means 12 whole weeks, so there is always one testing day and 11 training days.

Two further metrics, missTime and heatLoss, give more information about the nature of the misprediction. The variable missTime represents how many false negatives occurred, that is, how many times the algorithm predicted 'not occupied' while in reality the room was occupied. The variable heatLoss shows how many false positives occurred, that is, how many times the algorithm predicted the room to be occupied while in reality it was not.

In very simplified terms, missTime is how many times the algorithm failed to tell the heating to heat, so that the occupant is "cold", and heatLoss is how many times the heating was turned on without anyone being present.
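The three quantities for a single day can be sketched as below. The function name `day_metrics` is hypothetical, and expressing missTime and heatLoss as percentages of all blocks of the day is an assumption made so that the two values add up to the total error.

```python
def day_metrics(actual, predicted):
    """Error, missTime and heatLoss of one predicted day, in percent of blocks."""
    n = len(actual)
    pairs = list(zip(actual, predicted))
    # False negatives: room occupied but predicted 'not occupied'.
    miss = sum(1 for a, p in pairs if a == 1 and p == 0)
    # False positives: room empty but predicted 'occupied'.
    loss = sum(1 for a, p in pairs if a == 0 and p == 1)
    return {
        "error": 100.0 * (miss + loss) / n,
        "missTime": 100.0 * miss / n,
        "heatLoss": 100.0 * loss / n,
    }

metrics = day_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

Averaging these per-day values over all test days of a weekday, then over the weekdays, follows the procedure described in the text.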


4.2.4 Similarity Measures

The k-nearest neighbour method depends on how 'nearest' is defined, in this case on how to tell that one day is similar to another day while a second day is not similar to it at all. As the days are represented by binary vectors, we can apply distance metrics that measure how much two vectors differ. The distance metrics tested in this thesis are the Hamming distance and the Jaccard coefficient.

4.2.4.1 Hamming Distance

The Hamming distance counts how many bits in one binary vector do not match the corresponding bits of the second binary vector. This method is trivial but efficient. The main concern with it is that it is extremely sensitive to the exact structure of the day. This is mitigated by the design of the algorithm, which calculates the Hamming distance after each block, so it can be seen where the difference between the days comes from.

4.2.4.2 Jaccard Coefficient

Another metric tested in this thesis is the Jaccard distance. The main difference between the Hamming distance and the Jaccard distance is that the Jaccard distance takes into account the difference between a misplaced '1' and a misplaced '0'. The Jaccard coefficient is calculated as follows.

$$J = \frac{M_{11}}{M_{01} + M_{10} + M_{11}} \tag{4.5}$$

where, given two binary vectors A and B,

• $M_{11}$ is the number of bits where vectors A and B both have value 1

• $M_{10}$ is the number of bits where vector A has value 1 and vector B has value 0

• $M_{01}$ is the number of bits where vector A has value 0 and vector B has value 1

• $M_{00}$ is the number of bits where vectors A and B both have value 0 (these bits do not enter the coefficient)
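Both measures can be sketched for binary vectors as follows. The sketch uses the standard definition of the Jaccard coefficient, M11 / (M01 + M10 + M11), and takes the Jaccard distance as 1 minus the coefficient. The function names are hypothetical, and treating two completely empty days as identical (coefficient 1) is an assumption, since the coefficient is undefined in that case.

```python
def hamming_distance(a, b):
    """Number of bits that differ between two equally long binary vectors."""
    return sum(x != y for x, y in zip(a, b))

def jaccard_coefficient(a, b):
    """Standard Jaccard coefficient M11 / (M01 + M10 + M11)."""
    pairs = list(zip(a, b))
    m11 = sum(1 for x, y in pairs if x == 1 and y == 1)
    m10 = sum(1 for x, y in pairs if x == 1 and y == 0)
    m01 = sum(1 for x, y in pairs if x == 0 and y == 1)
    denominator = m01 + m10 + m11
    # Assumption: two all-empty vectors count as identical.
    return m11 / denominator if denominator else 1.0

def jaccard_distance(a, b):
    """Dissimilarity derived from the Jaccard coefficient."""
    return 1.0 - jaccard_coefficient(a, b)

a, b = [1, 1, 0, 0], [1, 0, 1, 0]
h = hamming_distance(a, b)       # two differing bits
j = jaccard_coefficient(a, b)    # one shared '1' out of three relevant bits
```

Unlike the Hamming distance, the Jaccard coefficient ignores blocks that are empty in both days, so long shared periods of absence do not make two days look more similar.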

4.2.4.3 Comparison of Similarity Measures

The resulting error in the setting (number of best days = 4, threshold = 0.4, smoothing = on) is 20.29% with standard deviation 3.8% for the Hamming distance, while the Jaccard distance results in an average error of 21.58% with standard deviation 3.44% (Figure 4.2).

For the purpose of this thesis the Hamming distance is used, as it provides sufficiently good results and a lower total error than the Jaccard distance.

However, the Jaccard distance is more precise when it comes to predicting true occupancy. Its larger total error is caused by false positives, when 'not occupied' is predicted to be occupied.


Figure 4.2: Error hamming vs jaccard distance

This can be seen in the missTime error, which is 7.08% with standard deviation 2.59% for the Hamming distance, while the Jaccard distance has a lower missTime error of 6.08% with standard deviation 2.33%.

4.2.5 Dynamic versus Static Prediction

The first assumption is that data received from the ongoing day make the prediction much more precise. To verify this, we have to test the quality of the prediction when the data from the ongoing day are not taken into account and the prediction is based solely on the history of the days.

The predicted day in this case is simply the average of all the same days of the week in the history available so far. In this thesis such a prediction is called static because it does not react to the ongoing day at all.

This prediction is then compared to the prediction that uses the k-nearest neighbour algorithm with the data from the ongoing day. Such a prediction is called dynamic in this thesis, because it reacts dynamically to the changes of the day and adjusts the prediction accordingly. The results are shown in Figure 4.3, where it can be seen that the static prediction has a higher error than the dynamic prediction. The average error of the static prediction is 22.74% with standard deviation 5.02%. The dynamic prediction results in an error of 19.19% with standard deviation 3.8%.

This was tested with 30-minute-long blocks, 4 best days and threshold 0.4 on all the available data across all rooms.

The difference between static and dynamic prediction becomes even more significant with the further improvements explained later in the thesis.
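For comparison with the dynamic prediction, the static variant can be sketched as a plain average over the history. The function name `static_prediction` is hypothetical, and applying the same threshold to the averaged values is an assumption made here so that the result is a binary day comparable with the actual one.

```python
def static_prediction(history, threshold=0.4):
    """Predict a day as the thresholded average of past days of the same weekday."""
    n_days = len(history)
    n_blocks = len(history[0])
    # Average occupancy of each block over all past days.
    means = [sum(day[t] for day in history) / n_days for t in range(n_blocks)]
    # Assumption: the same threshold as in the dynamic model is applied.
    return [1 if m > threshold else 0 for m in means]

past_days = [[1, 0, 0], [1, 1, 0], [0, 0, 0]]
static_day = static_prediction(past_days, threshold=0.4)
```

Because no block of the ongoing day enters the computation, the result is the same no matter how the day actually unfolds.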


Figure 4.3: Error static versus dynamic prediction

4.2.6 Length of the Blocks

The choice of the length α of the blocks depends on the goal of the thesis. As mentioned earlier, the upper bound for α has been set to 60 minutes and the lower bound to 15 minutes. The 60-minute blocks were ruled out, as they would significantly lower the ability of the heating to react to changes in occupancy. It was therefore decided to test 30-minute blocks against 15-minute blocks in the implementation, as both can still be resolved by the motion sensors and neither is long enough to cut away a significant part of the information the data carry.

The results in Figure 4.4 show that the difference in error between 15-minute and 30-minute blocks is small, 1.62 percentage points in the tested environment. The error with 15-minute blocks is 21.91% with standard deviation 4.39%, while 30-minute blocks resulted in 20.29% with standard deviation 3.8%.

The longer the block, the better the prediction, but we still take 30 minutes as a reasonable block length for an occupancy model intended for heating control.

4.2.7 Smoothing

As mentioned in Section 4.1.1, we assume that more continuous blocks of occupancy lead to better prediction results. Continuity is achieved by marking empty blocks that are surrounded by occupied blocks as occupied as well. The smoothing is also applied to the new incoming vectors that are used to measure the resulting error. This might seem misleading, because the test vectors are smoothed in advance, but the errors on these unsmoothed blocks are not of interest: we do not consider them mistakes of the prediction algorithm. The smoothing can also be applied to the already known part of the ongoing day.

Figure 4.4: Error, 15 vs 30 minute long blocks

Our algorithm would probably make an error on such blank spots, but in reality the person is still returning to the place, so we want the room to keep the occupied status even without any motion signals coming from it. The average error without smoothing is 22.93% with standard deviation 4.82%, while smoothing over a single block (Scenario 1, as described in Section 4.1.1) results in an average error of 20.29% with standard deviation 3.8% (Figure 4.5).

Figure 4.6 shows the difference in error between Scenario 3 and Scenario 2. In most cases Scenario 2 is simply a more lenient version of Scenario 3 and therefore results in lower errors. The reason is that Scenario 3 requires 2 occupied blocks on each side of the unoccupied ones, while Scenario 2 requires only 1 block on each side.
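The gap filling can be sketched on a short vector. The example vector is hypothetical, and the conversion to a '0'/'1' string is a choice made only for this illustration so that strfind can be used for the pattern search.

```matlab
% Hypothetical occupancy vector (1 = motion seen in the block).
x = [1 0 1 1 0 0 1 1 0 0 0 1];

% Scenario 1: a single empty block between two occupied blocks.
s = char(x + '0');                 % vector as a '0'/'1' string
gap1 = strfind(s, '101') + 1;      % position of the empty block
x1 = x;
x1(gap1) = 1;

% Scenario 2: two empty blocks with one occupied block on each side.
s1 = char(x1 + '0');
gap2 = strfind(s1, '1001') + 1;    % position of the first empty block
x2 = x1;
x2(gap2) = 1;
x2(gap2 + 1) = 1;

disp(x1);   % after Scenario 1
disp(x2);   % after Scenario 1 followed by Scenario 2
```

The gap of three empty blocks near the end of the vector is left untouched by both scenarios, so longer absences still appear as unoccupied.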

Figure 4.5: Error smoothed occupancy vectors

Figure 4.6: Error difference between smoothing scenarios

4.2.8 missTime, heatLoss, dayError

So far the error has been expressed only as the number of falsely predicted blocks in the occupancy vector. This says nothing about where and how the error occurs. It is important to know whether an error came from falsely predicted occupancy, which only adds extra heating time without lowering the comfort of the user. This case is called heatLoss in this thesis, while predicting an empty room where occupancy occurred, which leaves the occupant in an unheated environment, is called missTime. Even though the task is to minimize the total error, we have to take into account how the error is distributed between heatLoss and missTime, because missTime strongly affects the user's comfort.

This is closely connected to the setting of the parameter k, called number of best days, in the k-nearest neighbour method, and of the parameter threshold, which sets the boundary between occupied and not occupied in the calculated probabilities.

Another aspect is that the error during the night is of little interest, since people usually sleep during the night hours. This is especially important in bedrooms, where heating during the night is not desired and yet the motion sensors detect human presence. Detecting human presence at night is not an error of the model, but from the point of view of heating this information has little value. We can even say that we are not interested in human presence during some predefined night time at all. Therefore we also introduce a variable called dayError, which measures the error during the day hours, between 7am and midnight.
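The three error measures can be sketched on one day of 30 minute blocks. Both vectors are hypothetical, and the day-hour range used here (blocks 15 to 48, which cover 7:00 to 24:00) is an assumption made for the example.

```matlab
% Hypothetical 48-block day: actual vs. predicted occupancy.
actual    = zeros(48, 1);  actual(15:18) = 1;     actual(36:46) = 1;
predicted = zeros(48, 1);  predicted(14:17) = 1;  predicted(38:47) = 1;

heatLoss = sum(predicted > actual);   % predicted occupied, room was empty
missTime = sum(predicted < actual);   % predicted empty, room was occupied
totalErr = sum(predicted ~= actual);  % Hamming distance = heatLoss + missTime

% Day hours only. Assumption: blocks 15..48 cover 7:00 to 24:00.
dayBlocks = 15:48;
dayError  = sum(predicted(dayBlocks) ~= actual(dayBlocks));

fprintf('heatLoss %.2f%%, missTime %.2f%%, total %.2f%%, dayError %.2f%%\n', ...
    100*heatLoss/48, 100*missTime/48, 100*totalErr/48, ...
    100*dayError/numel(dayBlocks));
```

Splitting the Hamming distance by the direction of the mismatch is what separates wasted heating from lost comfort.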

Figure 4.7 shows the best case of setting the parameters number of best days and threshold. The figure clearly demonstrates the ratio between heatLoss and missTime in this prediction. As seen in the figure, the error rates, and especially the missTime and heatLoss error rates, differ significantly.

The main reason is the setting of the parameter threshold, which decides at which probability a block is labelled occupied and at which it is labelled empty. In Figure 4.7 the threshold is set to 0.4, a value from the interval (0.25, 0.5). Since we are using the 4 best days, the probabilities can only take values from the list (0, 0.25, 0.5, 0.75, 1). This setting means that the block has to be occupied in at least 2 of the 4 best days in order to be predicted as occupied. It results in a small average error of 20.29% with standard deviation 3.8%. The average missTime error is only 6.08% with standard deviation 2.33%, and the heatLoss error is 13.18% with standard deviation 3.96%. The missTime in this case makes up only 35% of the total error, as depicted in Figure 4.8.
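The relation between the threshold and the number of occupied best days can be checked with a few lines. This is only an illustration of the arithmetic; the variable names are chosen for this example.

```matlab
k = 4;                        % number of best days
threshold = 0.4;              % value used in the text
votes = 0:k;                  % in how many of the k days the block was occupied
probability = votes / k;      % 0, 0.25, 0.5, 0.75, 1
occupied = probability > threshold;
minVotes = votes(find(occupied, 1));   % smallest count that is predicted occupied

disp([votes; probability; occupied]);
fprintf('at least %d of %d days must be occupied\n', minVotes, k);
```

Any threshold inside the interval (0.25, 0.5) gives the same decision, because no probability value lies strictly between 0.25 and 0.5 when k = 4.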

By adjusting the parameters threshold and number of best days, the results of the algorithm can change significantly. The best results come from using 4 as the number of best days and a threshold of 0.4. Lowering the threshold increases the overall error but decreases the missTime error, which falls at the expense of the heatLoss error.
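A minimal sketch of one prediction step shows how both parameters enter the decision. The history matrix and the ongoing day are hypothetical toy data with 8 blocks per day, and the lower threshold of 0.2 is used only for comparison.

```matlab
% Hypothetical history: 8 blocks (rows) x 6 past days (columns).
history = [0 0 0 1 0 0;
           1 1 0 1 1 0;
           1 1 0 1 1 0;
           0 0 0 0 1 0;
           0 1 0 0 0 1;
           1 1 0 1 0 1;
           1 1 1 1 0 1;
           0 1 1 0 0 1];
today = [0; 1; 1; 0];          % blocks of the ongoing day observed so far
k = 4;                         % number of best days

nKnown = numel(today);
nDays  = size(history, 2);
dist   = zeros(1, nDays);
for d = 1:nDays
    % Hamming distance between today and the same part of a past day
    dist(d) = sum(history(1:nKnown, d) ~= today);
end
[~, order] = sort(dist, 'ascend');
best = order(1:k);             % the k most similar past days

% Occupancy probability of the next block among the k best days.
p = mean(history(nKnown + 1, best));
nextOccupied    = p > 0.4;     % threshold used in the thesis
nextOccupiedLow = p > 0.2;     % a lower threshold, for comparison
fprintf('p = %.2f, occupied at 0.4: %d, at 0.2: %d\n', ...
        p, nextOccupied, nextOccupiedLow);
```

In this toy case only one of the four best days is occupied in the next block, so the block is predicted empty at 0.4 and occupied at 0.2. This is the mechanism by which a lower threshold trades missTime for heatLoss.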

Figure 4.7: Error with missTime and heatLoss (k=4, threshold=0.4)

The day error differs slightly from the all day error in each room. Some rooms have a higher day error, some a higher all day error. The reason could lie in the type of the room. As seen in Tables 4.2 and 4.3, the rooms where the day error increased significantly are those where human presence usually has no pattern, or the pattern is too vague to be captured. The results show that Entrance, Washroom and Garage have a significantly worse day error than all day error. This is probably caused by the irregularity of visits to rooms that hold, for example, the washing machine.

On the other hand, rooms like Kitchen, Stairs (probably because it is adjacent to Kitchen), Living room, Room and Office have better or almost the same results as during the whole day. This is probably due to irregular visits to these places at night; another factor can be pets, which can move during the night and thus cause errors.

This night problem leads to an improvement that is described in the next section.

Figure 4.8: MissTime versus heatLoss ratio

4.2.9 Night Mode

Another improvement is the implementation of night mode. We are mostly interested in the prediction during the day, because at night the occupants sleep and the heating can be turned off. From the point of view of the algorithm, the data obtained from the movements of people during their sleep only make the prediction less precise and bring no added value.

The same holds for most of the rooms: the behaviour during the night is mostly not connected to the behaviour during the day.

The night mode introduced here ignores all movements that happen within the preferred sleep time, which is set by the occupants. In this case the occupants of the house set the sleep time from 0am till 6am.

As depicted in Figures 4.9 and 4.10, ignoring the data from the sleep time, or diminishing their weight, improved the prediction during the day, because the data obtained during the night only distorted the prediction.
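One simple way to ignore the night data is to clear the blocks of the sleep time in the history matrix. The sketch below is an assumption about how this could be done, not the implementation used for the results; the data and the variable names are hypothetical.

```matlab
% Hypothetical history: 48 half-hour blocks (rows) x 3 days (columns).
A = zeros(48, 3);
A(3:5, 1)   = 1;     % movement during sleep on day 1 (01:00 to 02:30)
A(15:20, :) = 1;     % morning occupancy on every day
A(40:46, 2) = 1;     % evening occupancy on day 2

% Assumed sleep time 0:00 to 6:00, i.e. blocks 1..12 for 30 minute blocks.
sleepStart = 0;
sleepEnd   = 6;                                  % hours
nightBlocks = (2*sleepStart + 1):(2*sleepEnd);

B = A;
B(nightBlocks, :) = 0;   % drop all movement recorded during the sleep time
```

After this step the days are compared only on their daytime behaviour, since the night blocks are identical in every column.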

4.2.10 Vacation Mode

Because the algorithm so far can only predict days that are similar to those already in the history, there was a need to react to unknown days. This mode is called 'vacation' mode, as the days that usually do not fit any pattern are vacation days, weekend days or days when the person is sick.

The approach to these situations is to bypass the prediction algorithm and base the prediction solely on the ongoing day. The algorithm keeps track of how many consecutive blocks of the ongoing day had the status 'not occupied', and once this number reaches a set threshold (in this thesis 12 blocks = 6 hours) it stops predicting and outputs 'not occupied' for all incoming blocks until a new occupied block occurs.

The same rule handles rooms that stay empty, for example when the occupants are on holiday.

Figure 4.9: Error improvement with night mode

Figure 4.10: Comparison of day error with and without night mode

A similar approach is applied to the opposite scenario, when the occupant stays at home. This has a smaller effect, because the user will usually leave a room for some time even when staying home all day, but it still improves the prediction.

The only drawback of this mode is that it increases the missTime error, since the algorithm becomes reactive rather than predictive. When the first rule applies and 'not occupied' is predicted after 12 'not occupied' blocks in a row, the algorithm keeps predicting 'not occupied' until another block appears occupied. This means it cannot predict when the occupant returns.
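The reactive behaviour of the 'not occupied' rule can be sketched as follows. The ongoing day is hypothetical and the variable names are chosen for this example only.

```matlab
% Hypothetical ongoing day: empty until block 30, then the occupant returns.
newDay = zeros(1, 48);
newDay(31:40) = 1;
limit = 12;                        % 12 blocks of 30 minutes = 6 hours

zerosInRow = 0;
reactive = false(1, 48);           % true where the next block is forced to 0
for block = 1:47
    if newDay(block) == 0
        zerosInRow = zerosInRow + 1;
    else
        zerosInRow = 0;            % an occupied block resets the counter
    end
    % After `limit` empty blocks in a row, the next block is predicted as
    % 'not occupied' without consulting the history.
    reactive(block + 1) = zerosInRow >= limit;
end

firstForced = find(reactive, 1);            % first block with a forced prediction
missed = find(reactive & newDay == 1);      % occupied blocks predicted as empty
fprintf('forced from block %d, missed return at block %d\n', firstForced, missed);
```

The first block after the return is always missed, which is the increase in missTime described above.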

Table 4.1: Day error with vacation mode using number of best days = 4, threshold = 0.4, smoothing = Scenario 3, night mode = off

day error (in %)    Room   Office  Living  Entrance  Washroom  Garage  Stairs  Kitchen
error               23.95  14.03   10.5    19.01     31.35     18.25   10.17   9.52

4.3 Final Results

4.3.1 Error in Different Days

So far the error was measured over all days of the week, which means that weekend days were included as well. Figure 4.11 shows how the error varies across the days of the week.

The graph shows that some rooms react more strongly to the change at the weekend than others. In particular 'Room', which is a child's room, shows that the patterns during the weekend, when the child is probably playing in the room, cannot be predicted, whereas regular days are predicted more precisely.
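Separating weekend days from working days can be sketched with the built-in weekday function. The start date and the number of days are taken from the code in Appendix A; using weekday for the grouping is an assumption made for this example.

```matlab
startDate = datenum(2012, 11, 30);      % first day of the recorded period
dayNums   = startDate + (0:88);         % 89 consecutive days
wd        = weekday(dayNums);           % 1 = Sunday ... 7 = Saturday
isWeekend = (wd == 1) | (wd == 7);

fprintf('%d weekend days, %d working days\n', ...
        sum(isWeekend), sum(~isWeekend));
```

Per-day errors can then be averaged separately over the two groups by indexing with isWeekend and ~isWeekend.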

4.3.2 Scheduled Calendars versus Prediction Algorithm Error

As most thermostats nowadays use schedule calendars to set the heating intervals, it is interesting to show how our prediction algorithm performs compared to these schedule calendars.

Because the error of a schedule calendar is extremely sensitive to the settings chosen by the occupants, it is hard to evaluate; every household is going to give different results. The assumption here is that people do not set these schedules based on a detailed analysis of their behaviour [14]. When it comes to occupancy, people are not likely to think in hours but rather in coarser blocks such as morning, noon, afternoon and night. People are more likely to set the thermostat to start heating sometime in the evening and to allow a generous overlap in order not to decrease their comfort.

Figure 4.11: Error all days

An even more likely scenario is that people set the thermostat to a constant temperature and do not deal with this at all.

To simulate the schedule calendar, we created a histogram (Figure 4.12) across all the days in the history and derived a typical day from it. We then measured the error of this scheduled day and compared it with the error of the days produced by the prediction algorithm.
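Deriving a typical day from the histogram can be sketched as follows. The toy history is hypothetical, and the rule that a block belongs to the schedule when it was occupied on at least half of the days is an assumption made for this example; the text does not state the rule that was used.

```matlab
% Hypothetical history: 6 blocks (rows) x 4 days (columns).
A = [1 1 1 0;
     1 0 1 1;
     0 0 1 0;
     0 0 0 0;
     1 1 0 1;
     1 1 1 1];

histogramCounts = sum(A, 2);              % days on which each block was occupied
share = histogramCounts / size(A, 2);     % fraction of days occupied
typicalDay = share >= 0.5;                % assumed cut-off for the schedule

% Error of the fixed schedule against each day (Hamming distance in %).
nDays = size(A, 2);
errSchedule = zeros(1, nDays);
for d = 1:nDays
    errSchedule(d) = 100 * sum(typicalDay ~= A(:, d)) / size(A, 1);
end
fprintf('mean schedule error: %.2f%%\n', mean(errSchedule));
```

The schedule is fixed once and does not react to the ongoing day, which is the main difference from the dynamic prediction.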

The results indicate that in most cases the prediction algorithm outperforms the schedule calendars by a wide margin.

However, in some cases, for example Stairs, which are occupied basically all the time, the schedule calendar performs quite well. This permanent occupancy is, however, probably not the result of human presence but rather of a pet, which makes the whole day look occupied and thus easy for the schedule calendar approach.

4.3.3 Best Total Error

As there are many ways to measure the error and several types of error were introduced, the most important one is how the prediction algorithm performs during the day hours, that is, approximately between 7am and midnight. The preferred sleep time is from midnight to 7am, when the heating is usually turned off. This period can be set by the occupants as their preferred sleep period.

The reason is that in rooms where occupants sleep, the movements people make during sleep are enough to trigger the motion sensors, depending on the phase of sleep and how deeply the person sleeps. It is therefore extremely difficult to tell precisely whether the person is sleeping or just occupying the room.

Figure 4.12: Living room occupancy histogram averaged across all days(max 12 days)

Figure 4.13: Error scheduled calendars vs prediction algorithm

As all the graphs above show, some rooms are easier to predict and some are more difficult. This is also visible in Table 4.3, where the easier rooms have a lower error.

The high error rate for Room can be traced in the previous graph to the problematic prediction of weekend days, where the algorithm returns very poor predictions. The day error rate is 26.9% on average, which is quite high. If the average day error is measured without the weekend days, it drops to 19.28%, which is already a fairly good prediction.

Table 4.2: Total error using number of best days = 4, threshold = 0.4, smoothing = Scenario 3, night mode = off

total error (in %)  Room   Office  Living  Entrance  Washroom  Garage  Stairs  Kitchen
error               25.84  16.21   14.54   20.11     24.65     18.41   18.11   14.34
missTime            10.96  5.12    4.61    8.49      9.29      4.43    5.49    4.69
heatLoss            14.91  11.09   9.93    11.63     15.35     13.98   12.74   9.65

Table 4.3: Day error using number of best days = 4, threshold = 0.4, smoothing = Scenario 3, night mode = off

day error (in %)    Room   Office  Living  Entrance  Washroom  Garage  Stairs  Kitchen
error               26.95  16.70   13.42   22.11     34.2      20.95   12.55   12.62
missTime            12.99  3.96    2.53    8.8       12.38     3.82    1.55    2.42
heatLoss            13.96  12.73   10.89   13.31     21.83     17.14   11.00   10.21

The final error with both vacation mode and night mode is shown in Table 4.5. As can be seen from the previous results, these modes further improve the prediction error. However, lowering the prediction error has a negative effect on the missTime error. The missTime error with vacation mode on is slightly higher, which is a result of vacation mode working as a reactive rather than a predictive mechanism.

Average error is 16.75% with standard deviation 7.93%. Without Washroom, which provedto be difficult to predict, the average error drops to 14.64% with standard deviation 5.66%.
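These averages can be reproduced from the per-room day errors in Table 4.5. The sketch assumes that the standard deviation is the sample standard deviation (the default of std), which gives 7.94% and 5.67%, close to the values quoted above.

```matlab
% Day error per room from Table 4.5 (night mode and vacation mode on), in %.
rooms = {'Room', 'Office', 'Living', 'Entrance', ...
         'Washroom', 'Garage', 'Stairs', 'Kitchen'};
err   = [24.24 13.85 10.28 18.51 31.49 17.23 9.45 8.95];

avgAll = mean(err);
stdAll = std(err);                       % sample standard deviation (N-1)

keep = ~strcmp(rooms, 'Washroom');       % leave out the hardest room
avgNoWash = mean(err(keep));
stdNoWash = std(err(keep));

fprintf('all rooms:        %.2f%% (std %.2f%%)\n', avgAll, stdAll);
fprintf('without Washroom: %.2f%% (std %.2f%%)\n', avgNoWash, stdNoWash);
```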

Table 4.4: Day error with night mode using number of best days = 4, threshold = 0.4, smoothing = Scenario 3, night mode = on from 0am to 6am

day error (in %)    Room   Office  Living  Entrance  Washroom  Garage  Stairs  Kitchen
error               26.37  16.63   11.98   22.25     34.31     20.49   11.18   11.08
missTime            12.81  4.04    2.24    7.79      12.19     3.42    0.87    1.91
heatLoss            13.56  12.59   9.74    14.47     21.11     17.06   10.32   9.16

Table 4.5: Day error with night mode and vacation mode using number of best days = 4, threshold = 0.4, smoothing = Scenario 3, night mode = on from 0am to 5am, vacation mode = on

day error (in %)    Room   Office  Living  Entrance  Washroom  Garage  Stairs  Kitchen
error               24.24  13.85   10.28   18.51     31.49     17.23   9.45    8.95
missTime            13.56  5.59    3.45    8.69      13.6      4.73    1.23    3.21
heatLoss            10.68  8.26    6.78    9.81      17.89     13.2    8.23    5.73

Chapter 5

Conclusion

5.1 Achieved

The goal of this thesis was to create a predictive model of room occupancy in a household. The results show that occupancy can be predicted with an average error of 16.75% and a missTime of 6.63%. We also found that not all rooms are equally suitable for prediction. Rooms that are used irregularly and only for short periods of time, like Washroom and Entrance, which are not the main rooms for the occupants, are predicted with higher errors.

The error in the main rooms, that is, the rooms that are used more than occasionally (for example Kitchen, Living room, Office and Stairs), is around 10%, which is a fairly good prediction error given the number of days available for testing.

Another finding is that the proportion of heatLoss and missTime errors can be changed by setting several parameters, which can significantly change the prediction results. The main parameters that change the output of the prediction in this scenario are number of best days, threshold, and the vacation mode and night mode settings. The two modes further improved the prediction, and each tackled a particular problem of the prediction.

To sum up, such systems definitely have potential, and as shown in this thesis, per-room prediction is possible.

The question that remains unanswered is whether such a model, when used with a heating regulation system, can achieve significant energy savings compared to whole-house prediction.

5.2 Future work

The model has been tested on one data set, and testing it on more varied data would be the next stage of this project.

A further improvement would be the creation of archetypal days for each room. This would make the model less demanding on data storage, as it would keep only the typical days and not the whole history. As the model was tested on one particular household, all the parameters were set according to the data obtained from that house. It is probable, however, that each house will need different parameter settings. Setting the parameters in real time, based on error feedback, would therefore allow each house to have its own parameters based on the needs of its occupants.

The connection between the days could also be made at a higher level, in the heating control system, which can interact with the user. The model was deliberately built without any user interaction, because it is only used to create data for the heating control system.

Further development would be a heating control system based on the data from the prediction model, and a live test of such a system.

As in the work of Krumm et al. [4], integrating more sensors, such as GPS from mobile phones, would further improve the handling of unexpected returns home, especially after a longer period of absence.

Chapter 6

References

[1] Michael C. Mozer, Lucky Vidmar, Robert H. Dodier, The Neurothermostat: Predictive Optimal Control of Residential Heating Systems, 1997, [source]

[2] James Scott, A.J. Bernheim Brush, John Krumm, Brian Meyers, Mike Hazas, Steve Hodges, Nicolas Villar, PreHeat: Controlling Home Heating Using Occupancy Prediction, 2011, [source]

[3] Jiakang Lu, Tamim Sookoor, Vijay Srinivasan, Ge Gao, Brian Holben, John Stankovic, Eric Field, Kamin Whitehouse, The Smart Thermostat: Using Occupancy Sensors to Save Energy in Homes, [date, source]

[4] John Krumm and A.J. Bernheim Brush, Learning Time-Based Presence Probabilities, Proc. of Pervasive 2011, 2011

[5] Pet-immune motion sensor, http://www.visonic.com/Products/Wired-Detectors/Next-k9-85

[6] Frauke Oldewurtel, David Sturzenegger, Manfred Morari, Importance of occupancy information for building climate control, Applied Energy, Zurich, 2012

[7] Tina Yu, Modeling Occupancy Behaviour for Energy Efficiency and Occupants Comfort Management in Intelligent Building, Canada, 2010

[8] A.J.N. van Breemen, T.J.A. de Vries, Design and implementation of a room thermostat using an agent-based approach, Twente, 2000

[9] S. Theodoridis and K. Koutroumbas, Pattern Recognition (2nd ed.), Elsevier, 2009, ISBN 978-1-59749-272-0

[10] Vishal Garg, N.K. Bansal, Smart occupancy sensors to reduce energy consumption, Elsevier, 1999

[11] Ian Richardson, Murray Thomson, David Infield, A high-resolution domestic building occupancy model for energy demand simulations, Elsevier, 2008

[12] M. Gupta, S.S. Intille, K. Larson, Adding GPS-Control to Traditional Thermostats: An Exploration of Potential Energy Savings and Design Challenges, Proc. of Pervasive, 2009

[13] U.S. E.P.A., Summary of Research Findings From the Programmable Thermostat Market. Available from: http://www.energystar.gov/ia/partners/proddevelopment/revisions/downloads/thermostats/Summary.pdf

[14] Alan Meier, Cecilia Aragon, Becky Hurwitz, Dhawal Mujumdar, Therese Peffer, Daniel Perry, Marco Pritoni, How People Actually Use Thermostats, ACEEE Summer Study on Energy Efficiency in Buildings, 2010

Chapter 7

Appendix

7.1 Appendix A - code

7.1.1 Prediction algorithm.m

%% Bachelor Thesis
% Title: Modeling of Room Occupancy Patterns for Advanced Building Automation
% Author: Ondrej Svoboda
% Czech Technical University in Prague
% Faculty of Electrical Engineering

%% Read the excel files containing the timestamps of the events converted
% in format number for better import
a = xlsread('C:\Users\Ondra\Dropbox\Thesis\Matlab\Kitchen.xls','Data');

%% Clear temporary variables
clearvars raw;

%Converting the timestamps numbers to Matlab time
dateNumbers = a+693960;

%% Creating the histogram
%Define the boundaries of the histogram algorithm and set the interval
%time (the minute size of the block) for the histogram algorithm
aggregationInterval = 1/24/2; %sets the size of the block to 30 minutes
aggregationStart = datenum('2012-11-30 00:00','yyyy-mm-dd HH:MM');
aggregationStop = datenum('2013-02-26 23:45','yyyy-mm-dd HH:MM');

%Create boundaries
aggregationBoundaries = aggregationStart:aggregationInterval:aggregationStop;

%The function histc creates the histogram and returns a vector of values
counts = histc(dateNumbers, aggregationBoundaries);

%% Support variables
size_counts = size(counts,1);
counts_new = zeros(size_counts,1);

%% Creating binary occupancy vectors from the histogram results
for i=1:size_counts
    if (counts(i)==0) %here we set all the counts to 0 or 1 so we create a binary vector
        counts_new(i)=0;
    else
        counts_new(i)=1;
    end
end

%Transforming the whole occupancy vector into separate days so each
%day is one column in the matrix
[mat,padded] = vec2mat(counts_new,48);
mat2=transpose(mat);

%% Separating the occupancy vector based on the days of the week and running
% the prediction algorithm for each day of the week to obtain averaged
% errors
counter =1;
for m=0:6
    for i=1:89
        if mod(i,7)==m
            res(:,counter)=mat2(:,i); %separates by days on mondays tuesdays etc.
            counter=counter+1;
        end
    end

    %% Running the prediction algorithm itself
    Probabilistic_30_vacation_mode

    %% Calculating error of the prediction algorithm
    error_fin(m+1)=error;
    error_fin_day(m+1)=error_day;
    heatLost_fin(m+1)=heatingL;
    missTime_fin(m+1)=missT;
    heatLost_day_fin(m+1) = heatingLDay;
    missTime_day_fin(m+1) = missTDay;
    error_fin_simple(m+1)=error_sim;
    counter=1;
    finalday(m+1,:)=final_predicted_day;

    %% Error scheduled day
    error_fin_scheduled(m+1)=error_sche;
    heatLost_fin_scheduled(m+1)=heatLostScheduled;
    missTime_fin_scheduled(m+1)=missTimeScheduled;

    %% Histogram of the occupancy of scheduled day
    BB(:,m+1) = B;
end

%% Calculating averaged errors across all the days of the week
% the final error of the whole prediction algorithm
%mean(error_fin_simple)
%mean(missTime_fin)
%mean(heatLost_fin)
%mean(error_fin)
mean(error_fin_day)
mean(heatLost_day_fin)
mean(missTime_day_fin)

%error_fin(3)=[]; %Subtracting weekends
%error_fin(3)=[];
%
% heatLost(3)=[]; %Subtracting weekends
% error_fin(3)=[];
%
%error_fin_day(3)=[];
%error_fin_day(3)=[];
%
% mean(error_fin)
% mean(error_fin_day)
% mean(error_fin_scheduled)
% for i=1:48
% C(i) =mean(BB(i,:));
% end
% bar(C)
% mean(heatLost_fin_scheduled)
% mean(missTime_fin_scheduled)

% plot 10 days
% x=1;
% for i=1:2,
% y=x+95;
% %%axis([0 96 0 2]);
% %%plot(x:y,counts_new(x:y))
% area(counts_new(x:y))
% figure();
% x=x+96;
% end

7.1.2 Probabilistic module.m

%% Description of the algorithm
% Bachelor Thesis
% Title: Modeling of Room Occupancy Patterns for Advanced Building Automation
% Author: Ondrej Svoboda
% Czech Technical University in Prague
% Faculty of Electrical Engineering

%% Import of the data from the Prediction_algorithm.m file
A = res(:,[1:12]);

%% Scheduled typical days for each room
%Garage
scheduled_day = [1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 ...
    1 1 1 1 1 1 1 1 1 1 1 1 0];
%Office
%scheduled_day = [1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 ...
%    1 1 1 1 1 1 1 1 1 1 1 1 1 1];
%Room
%scheduled_day = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 ...
%    0 0 0 0 0 0 0 0 1 1 1 1 1 1];
%Living
%scheduled_day = [1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
%    1 1 1 1 1 1 1 1 1 1 1 0 0 0];
%Entrance
%scheduled_day = [1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 ...
%    1 1 1 1 1 1 1 1 1 1 1 1 1 1];
%Washroom
%scheduled_day = [1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 0 ...
%    0 0 0 0 0 0 0 0 0 0 0 0 0 0];
%Stairs
%scheduled_day = [1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
%    1 1 1 1 1 1 1 1 1 1 1 1 1 1];
%Kitchen
%scheduled_day = [1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 ...
%    1 1 1 1 1 1 1 1 1 1 1 1 1 0];
scheduled_day = transpose(scheduled_day);

%% Support variables
rows = size(A,1); %number of rows
columns = size(A,2); %number of columns

%% Initialize variables to zero / preallocate space for them
mean_vals = zeros(48,1); %allocate the space for matrix for mean values
simple_predicted_day = zeros(48,1); %allocate the space for simple static predicted day
final_predicted_day = zeros(48,1); %allocate the space for final dynamic predicted day
means=zeros(48,1);
results=zeros(48,1);
threshold= 0.4; %Probability value when we decide between 0 and 1

%% Night mode
% Ignores the part of the night set in the parameters
A = night_mode(A,2,10);

%% Smoothing of the occupancy vectors
for k=1:columns
    A(:,k)=smoothing(A(:,k));
end

%% Static prediction
% Calculates means of all the past days
for i=1:rows
    mean_vals(i)=mean(A(i,:));
end

% Sets occupied/non occupied for the predicted day
for j=1:rows
    if (mean_vals(j)>= threshold)
        simple_predicted_day(j)=1;
    else
        simple_predicted_day(j)=0;
    end
end

%% Dynamic prediction as new day goes on
for new=1:12 %selecting which of the days is going to be the new ongoing day
    new_day = A(:,new); %selects the new day
    %new_day = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    %    0 0 0 0 0 0 1 1 1 1 0];
    A(:,new) = []; %removes the day from the matrix so it doesn't affect the results
    numberzeros=0;
    numberones=0;

    for block=1:47 %for each of the X minutes long block
        %% Vacation mode
        [numberzeros,numberones]=vacation_mode(new_day,block,numberzeros,numberones);
        if(numberzeros < 12 && numberones < 12) %how many equal blocks in a row will break the prediction
            for day=1:11 %for each day in the history (without the ongoing day)
                %calculate hamming distance from each day at each block
                distance(block,day)=hamming_distance(A([1:block],day),new_day([1:block]));
                %calculate the means and set the probabilities at the given moment
                %it means after every block (in this case we have 30 minute long blocks)
            end

            [~,index] = sort(distance(block,:),'ascend'); %sort the days by distance
            number_of_best_days = 4;
            %keep X best days for every time possible
            indexes(block,[1:number_of_best_days])= index(1:number_of_best_days);

            %calculate means for every time after every iteration
            %from the current n best days
            for i=1:rows
                mean_vals2(i,block)=mean(A(i,[index(1:number_of_best_days)]));
            end

            %make the predicted day from the mean values
            for j=1:rows
                if (mean_vals2(j,block)> threshold)
                    new_predicted_day(j,block)=1;
                else
                    new_predicted_day(j,block)=0;
                end
            end

            %% Building of the predicted day
            %create the final predicted day; always predicts one block from the day it created
            final_predicted_day(block+1)= new_predicted_day(block+1,block);
            %calculate the error so far
            results_continuos2(block)= hamming_distance(final_predicted_day([1:block]), ...
                new_day([1:block]));
        else
            if(numberzeros>11)
                final_predicted_day(block+1)= 0;
            else
                final_predicted_day(block+1)= 1;
            end
        end
    end

    A = [A(:,1:new-1) new_day A(:,new:end)];

    %% Error rates calculated for each predicted day
    result(new) = hamming_distance(simple_predicted_day,new_day); %number of errors based on mean
    error_simple(new) = 100*result(new)/47;
    result_final(new) = hamming_distance(final_predicted_day, new_day); %calculate total error
    [heatingLost(new),missTime(new)] = my_hamming_distance(final_predicted_day, new_day);
    heatLostPer(new)= 100*heatingLost(new)/47;
    missTimePer(new)= 100*missTime(new)/47;
    error_percentage(new) = 100*result_final(new)/47;

    %% Day error
    %note: blocks 14:47 span 34 blocks; the divisor (47-14) = 33 is kept as in the original
    day_error = hamming_distance(final_predicted_day([14:47]),new_day([14:47]));
    day_error_simple = hamming_distance(simple_predicted_day([14:47]),new_day([14:47]));
    [heatingLostDay(new),missTimeDay(new)]=my_hamming_distance( ...
        final_predicted_day([14:47]),new_day([14:47]));
    heatLostPerDay(new)= 100*heatingLostDay(new)/(47-14);
    missTimePerDay(new)= 100*missTimeDay(new)/(47-14);
    day_error_percentage(new) = 100*day_error/(47-14);

    %% Error rate scheduled day
    error_scheduled(new) = hamming_distance(scheduled_day,new_day);
    [hLScheduled(new),mTScheduled(new)] = my_hamming_distance(scheduled_day, new_day);
    hLSchePer(new)= 100*hLScheduled(new)/47;
    mTSchePer(new)= 100*mTScheduled(new)/47;
    error_percentage_scheduled(new) = 100*error_scheduled(new)/47;
end

%% Mean errors of the same days in the week (Mondays,..) used in
% create_histogram.m file to average the error
error_sim = mean(error_simple);
error = mean(error_percentage);
error_day = mean(day_error_percentage);
heatingL = mean(heatLostPer);
missT = mean(missTimePer);
heatingLDay =mean(heatLostPerDay);
missTDay = mean(missTimePerDay);

%% Error scheduled
error_sche= mean(error_percentage_scheduled);
heatLostScheduled = mean(hLSchePer);
missTimeScheduled = mean(mTSchePer);

%% Create sum bar of all the days in intervals of hours
B = sum(A,2);
% figure();
% bar(B)
%set(gca, 'XTick', 0:2:48);
%set(gca, 'XTickLabel', 0:24);

7.1.3 jaccard.m

function output = jaccard( x,y )
matrix(1,:)=x;
matrix(2,:)=y;
output = 1 - pdist(matrix,'jaccard'); %jaccard, correlation
end

7.1.4 smoothing.m

function x = smoothing( x )
%% Smoothing of the occupancy vectors as described in the report of this
% thesis

%% Scenario 1
% fill a single empty block between two occupied blocks
pos = strfind(x(:)', [1 0 1]) + 1;
x(pos)=1;

%% Scenario 2 & Scenario 3 (use one of them)
%pos = strfind(x(:)', [1 0 0 1]) + 1; %scenario 2
pos = strfind(x(:)', [1 1 0 0 1 1]) + 2; %scenario 3
% fill both empty blocks of every match
x(pos)=1;
x(pos+1)=1;
end

7.1.5 vacation mode.m

function [ numberzeros,numberones ] = vacation_mode( new_day,block,numberzeros,numberones)
% counts how many 'not occupied' blocks occurred in a row
if(new_day(block)==0)
    numberzeros = numberzeros + 1;
else
    numberzeros = 0;
end
% counts how many 'occupied' blocks occurred in a row
if(new_day(block)==1)
    numberones = numberones + 1;
else
    numberones = 0;
end
end

7.1.6 hamming distance.m

function y = hamming_distance(x,y)
% number of positions in which the two vectors differ
y = sum(x ~= y);
end

7.1.7 my hamming distance.m

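function [y1,y2] = my_hamming_distance(x,y)
% y1 counts blocks where x is 1 and y is 0 (heatLoss when x is the prediction),
% y2 counts blocks where x is 0 and y is 1 (missTime when x is the prediction)
y1=0;
y2=0;
k = size(x,1);
for i=1:k
    if (x(i)>y(i))
        y1=y1+1;
    elseif (x(i)<y(i))
        y2=y2+1;
    end
end
end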

7.2 Appendix B - data

The data are stored on the attached CD.
