Distributed Regression: an Efficient Framework for Modeling Sensor Network Data Carlos Guestrin Peter Bodik Romain Thibaux Mark Paskin Samuel Madden
Date posted: 19-Dec-2015

Page 1

Distributed Regression: an Efficient Framework for Modeling Sensor Network Data

Carlos Guestrin, Peter Bodik, Romain Thibaux, Mark Paskin, Samuel Madden

Page 2

Data collection paradigm

[Figure: base station and sensor network. Steps shown: query (SQL-style query); distribute query; collect data; new query; redo process]

Goal: Push beyond the "simple data gathering devices" paradigm

Page 3

Example: temperature data from 10 nearby sensors (4 hours of data):
• Slow changes over time
• Measurements correlated

Collect all measurements: send 500 numbers
VS
Using regression: approximate measurements as [equation] and send 5 numbers!! (yet very good approximation)

Data is highly correlated: Redundancy & Structure
• Build lower dimensional representation
• Compression for data transmission
• Provide nodes with local view of global state
• …

Page 4

The regression problem
• Given measurements and basis functions
• Find coeffs w = {w1, …, wk}

Precisely, minimize the residual error: [equation]

[Figure: matrix of basis function values (N sensors × K basis functions) times the vector of K weights approximates the vector of measurements from the N sensors]
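One standard way to write this least-squares objective is sketched below. The symbols are assumptions chosen for illustration: f(x_j) is the measurement at sensor j, h_i is the i-th basis function, and H is the N × K matrix with entries H_{ji} = h_i(x_j).

```latex
\hat{f}(x) = \sum_{i=1}^{K} w_i\, h_i(x), \qquad
w^{*} = \arg\min_{w} \sum_{j=1}^{N} \Bigl( f(x_j) - \sum_{i=1}^{K} w_i\, h_i(x_j) \Bigr)^{2}
      = \arg\min_{w} \lVert f - H w \rVert^{2}
```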

Page 5

Regression solution

w = A⁻¹b, where
• A: k×k matrix for k basis functions
• b: k×1 vector

Problems:
• Invert A: too expensive in one mote
• “Gather” matrix A: NK² messages
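As a centralized reference point, the solution can be sketched with the usual normal equations. This is a minimal sketch under assumptions: H is taken to be the N × k matrix of basis function values, A = HᵀH and b = Hᵀf are the standard least-squares definitions, and the function name, the quadratic basis and the data are made up for illustration.

```python
import numpy as np

def fit_weights(H, f):
    """Solve the normal equations A w = b with A = H^T H and b = H^T f."""
    A = H.T @ H   # k x k matrix
    b = H.T @ f   # length-k vector
    # Solving the linear system avoids forming the explicit inverse of A.
    return np.linalg.solve(A, b)

# Synthetic, noise-free example: 10 sensor positions, basis {1, x, x^2}.
x = np.linspace(0.0, 1.0, 10)
H = np.column_stack([np.ones_like(x), x, x ** 2])
true_w = np.array([20.0, 3.0, -1.0])   # hypothetical coefficients
f = H @ true_w                          # measurements generated from the model
w = fit_weights(H, f)
```

Because the synthetic measurements are generated exactly from the model, the recovered `w` matches `true_w` up to rounding. On a mote the concern raised on the slide remains: every row of H has to reach one place before A can be formed.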

Page 6

Global temperature is complex

[Figure: lab floor plan (server room, lab, kitchen, copy, elec, phone, quiet, storage, conference, offices) with sensor locations 1 to 54]

Temperature surface is complex
• Need complex basis functions?
• Lots of communication?

Page 7

What are we missing?

[Figure: lab floor plan (server room, lab, kitchen, copy, elec, phone, quiet, storage, conference, offices) with sensor locations 1 to 54]

Temperature surface is complex, but there is lots of local structure!
• Local temperature regions
• Do the right thing in the overlaps

Page 8

Kernel regression

[Figure: lab floor plan (server room, lab, kitchen, copy, elec, phone, quiet, storage, conference, offices) with sensor locations 1 to 54]

• Local basis functions for each region
• Kernels average between regions

Distributed algorithm for obtaining coefficients:
• Simple communication along a spanning tree
• Robust to lost messages

Need global optimization to find optimal coefficients

Page 9

Kernel regression: sparse matrices

Kernel basis functions have local support

[Figure: sparse basis matrix (sensors × basis functions) with zero blocks, giving a sparse matrix]

[Figure: lab floor plan with sensor locations 1 to 54, and a surface plot of basis function h1 over the floor plan (horizontal axes 0 to 100 and 0 to 40, value axis 0 to 1)]
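The link between local support and sparsity can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: sensors sit on a 1-D line, kernels are triangular with a fixed radius, kernel weights are normalized so that they average between overlapping regions, and each region uses the local basis {1, x − c}.

```python
import numpy as np

def kernel_basis_matrix(x, centers, radius):
    """Return the N x (2 * number of kernels) matrix of kernel-weighted basis values."""
    dist = np.abs(x[:, None] - centers[None, :])
    K = np.maximum(0.0, 1.0 - dist / radius)      # triangular kernels, zero outside the radius
    kappa = K / K.sum(axis=1, keepdims=True)      # normalized weights: average between regions
    columns = []
    for j, c in enumerate(centers):
        columns.append(kappa[:, j])               # local constant term for region j
        columns.append(kappa[:, j] * (x - c))     # local linear term for region j
    return np.column_stack(columns)

x = np.linspace(0.0, 10.0, 21)                    # hypothetical sensor positions
centers = np.array([2.0, 5.0, 8.0])               # hypothetical kernel centers
H = kernel_basis_matrix(x, centers, radius=3.0)
A = H.T @ H                                       # 6 x 6, block structure follows kernel overlaps
```

In this setup the first and third kernels never cover the same sensor, so the block of `A` coupling their coefficients is zero, while neighbouring kernels produce nonzero blocks.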

Page 10

Gaussian Elimination

A is sparse ⇒ efficient Gaussian elimination:

Complete system [A|b]

[Figure: rows of [A|b] are subtracted from one another to eliminate entries]

After Gaussian elimination, solve linear system by k simple divisions
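The elimination step can be sketched as follows. This is a minimal dense sketch, not the sparse or distributed variant: it assumes A is symmetric positive definite (so no pivoting is attempted), reduces the augmented system [A|b] to diagonal form by row subtractions, and then reads off the weights with k divisions. The function name and the example matrix are illustrative.

```python
import numpy as np

def solve_by_elimination(A, b):
    """Reduce [A|b] to diagonal form by row subtractions, then do k divisions."""
    k = A.shape[0]
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])  # complete system [A|b]
    for i in range(k):
        for r in range(k):
            if r != i and M[r, i] != 0.0:
                # Subtract a multiple of row i so that entry (r, i) becomes zero.
                M[r] -= (M[r, i] / M[i, i]) * M[i]
    # Only the diagonal of A is left: one division per weight.
    return M[:, -1] / np.diag(M[:, :k])

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
w = solve_by_elimination(A, b)
```

Rows whose entry in the pivot column is already zero are skipped, which is where a sparse A saves work.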

Page 11

Distributed regression

Complete system [A|b]

[Figure: nodes 1 and 2. Node 1 performs one step of Gaussian elimination and sends message M12; node 2 adds the message from node 1 and obtains the same matrices as in the complete system]

This subsystem is enough to compute w2, w3: sensor 2 can locally compute w2, w3

Page 12

Distributed Regression: solve global kernel regression problem with simple local communication

[Figure: sensors 1 2 3 4 5]

1. Specify regions.
2. Sensors compute small matrices that add up to [A|b].
3. Message passing.
4. Solve local systems.
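The statement that the sensors' small matrices add up to [A|b] can be checked with a short sketch. It assumes the standard least-squares definitions A = HᵀH and b = Hᵀf, so that sensor j contributes the outer product of its own row of H and that row times its measurement. The chain topology, function names and data are illustrative, and the sketch passes full k × k messages rather than the smaller messages that sparsity and elimination would allow.

```python
import numpy as np

def local_contribution(h_row, f_value):
    """One sensor's small matrices: h h^T (k x k) and h * f (length k)."""
    return np.outer(h_row, h_row), h_row * f_value

def pass_along_chain(H, f):
    """Each sensor adds its contribution to the running message and forwards it."""
    k = H.shape[1]
    A_msg, b_msg = np.zeros((k, k)), np.zeros(k)
    for h_row, f_value in zip(H, f):
        A_j, b_j = local_contribution(h_row, f_value)
        A_msg += A_j
        b_msg += b_j
    return A_msg, b_msg

# Five sensors in a chain, basis {1, x}; hypothetical measurements.
x = np.arange(5, dtype=float)
H = np.column_stack([np.ones(5), x])
f = np.array([20.0, 20.5, 21.0, 21.5, 22.0])
A, b = pass_along_chain(H, f)
w = np.linalg.solve(A, b)
```

The last sensor in the chain holds the complete system and can solve it locally, without any node having seen the raw measurements of the others.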

Page 13

Communication pattern

[Figure: network of nodes 1 to 7]

• Kernels form a tree structure: communication along a spanning tree
• Kernels may not form a tree structure, and high quality links may not align with kernel topology: communication along spanning tree using junction tree data structure

Page 14

Distributed junction trees

[Figure: spanning tree over nodes 1 to 6, each annotated with the kernels it involves (K1, K3; K1, K2; K3, K4; K4, K6; K3, K5; K5, K6), and the resulting junction tree, whose nodes and edges carry larger kernel sets such as K3, K5, K6 and K1, K3, K4, K5, K6]

• Any spanning tree can be transformed into a junction tree
• Communication along the junction tree is guaranteed to obtain optimal parameters
• Different spanning trees lead to different junction trees with different computation and communication complexity
• See Paskin and Guestrin ’04 for spanning tree optimization
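One way to see how a spanning tree becomes a junction tree is the running intersection property: if two nodes both involve a kernel, every node on the tree path between them must carry it too. The sketch below enforces that property centrally. It is an illustration under that standard definition, not the distributed algorithm of Paskin and Guestrin ’04, and the function name and example tree are made up.

```python
from collections import defaultdict

def make_junction_tree(tree_edges, node_vars):
    """Enlarge each node's variable set until the running intersection property holds."""
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj) | set(node_vars)
    cliques = {n: set(node_vars.get(n, ())) for n in nodes}
    all_vars = set().union(*node_vars.values())
    for var in all_vars:
        holders = [n for n in nodes if var in node_vars.get(n, ())]
        root = holders[0]
        # Depth-first search from one holder records each node's parent.
        parent = {root: None}
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    stack.append(v)
        # Add the variable along the path from every holder back to the root.
        for n in holders:
            while n is not None:
                cliques[n].add(var)
                n = parent[n]
    return cliques

# Chain 1 - 2 - 3: K1 appears at both ends, so node 2 must carry it as well.
edges = [(1, 2), (2, 3)]
node_vars = {1: {"K1"}, 2: {"K2"}, 3: {"K1", "K3"}}
cliques = make_junction_tree(edges, node_vars)
```

A different spanning tree over the same nodes would generally yield different, possibly larger, sets, which is the source of the differing computation and communication costs.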

Page 15

Robustness

Robustness is key in sensor networks:
• Nodes may be added to the network or fail
• Communication is unreliable
• Link qualities change over time

Distributed regression messages are robust:
• Lost messages correspond to lost measurements

Must make spanning tree and junction tree algorithms robust:
• See Paskin and Guestrin ’04 for details
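The claim that a lost message corresponds to lost measurements can be checked in a simplified setting. The sketch assumes each message carries a single sensor's additive contribution to A = HᵀH and b = Hᵀf, drops one of them, and compares the result with an ordinary least-squares fit on the remaining measurements. The data, the dropped sensor index and the function name are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
H = np.column_stack([np.ones(N), np.linspace(0.0, 1.0, N)])   # basis {1, x}
f = rng.normal(20.0, 1.0, size=N)                             # synthetic measurements

def solve_from_contributions(H, f, received):
    """Sum the contributions that arrived and solve the resulting system."""
    A = sum(np.outer(H[j], H[j]) for j in received)
    b = sum(H[j] * f[j] for j in received)
    return np.linalg.solve(A, b)

# The contribution of sensor 3 is lost in transit.
received = [j for j in range(N) if j != 3]
w_lossy = solve_from_contributions(H, f, received)

# Reference: least squares using only the measurements that were received.
w_ref, *_ = np.linalg.lstsq(H[received], f[received], rcond=None)
```

Both solutions agree, so the loss degrades the fit the same way a missing measurement would, rather than corrupting the system.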

Page 16

Locally, nodes obtain global view

[Figure: lab floor plan (server room, lab, kitchen, copy, elec, phone, quiet, storage, conference, offices) with sensor locations 1 to 54]

[Figure: surface plots of the temperature model over the floor plan (horizontal axes 0 to 100 and 0 to 40, vertical axis 18 to 28). Panels: view from node 1; view from node 17; view from node 46; global solution]

Page 17

Temperature model for lab data

Page 18

Convergence and robustness

[Figure: RMS error (in Celsius, axis 0 to 25) vs. epochs (axis 0 to 20). Curves: distributed regression with reliable communication; distributed regression with 50% packets lost; offline solution]

Page 19

Incremental changes

[Figure: RMS error (in Celsius) vs. epochs. Panels: initializing with noon temperatures; at 6pm, initializing from noon results. Curves: distributed regression with reliable communication; distributed regression with 50% packets lost; offline solution]

Page 20

Residual error varies over time

[Figure: residual error over time for the following models]
• Average over regions
• Regression with linear spatial components: constant in time, linear in time, quadratic in time

Page 21

Effect of time window

Page 22

Communication complexity

Page 23

Extensions and applications
• Adaptive sampling
• Outlier and faulty sensor detection
• Contour finding
• Adaptive data modeling
• Basis function selection
• Model-based bit compression
• Bounds on bit precision for Gaussian elimination applicable
• Hierarchical models
• Unifying with wavelet-based approaches
• Currently applying similar ideas to probabilistic inference, actuator control, …
• See Paskin and Guestrin ’04 for details

Page 24

Conclusions
• General distributed regression algorithm for sensor networks
• Robust to node and message losses
• Kernel regression is an effective model for a wide range of sensor network data
• Provides a basis for new, more complex sensor network applications
