Data Mining for Mobile and Situation Aware Adaptive...

ARK Data Mining: Strategic Tools and Techniques 1

8/5/2008 © Krishnaswamy and Loke 1

Data Mining for Mobile and Pervasive Applications

Shonali Krishnaswamy

Centre for Distributed Systems and Software Engineering, Monash University

Seng Loke

La Trobe University

2

Outline

Background, Rationale and Motivations

Architectural View – Situation Aware Adaptive Processing of Data Streams

Algorithms – Light-weight Suite of Data Stream Mining Techniques

Applications:

Road Safety and Intelligent Transportation Systems

Health Monitoring

Habitat Monitoring and Wireless Sensor Networks

Smart Wardrobe

3

Background

Technology Evolution

Pervasive Computing

Wireless Communications

Sensor Devices

Data Explosion in the Mobile Space

4 4

The scale of networking

The wide area networks of yesterday (eg: GSM)

> A Million nodes @ €50k

The Nomadic local area networks of today (eg: WiFi)

> Millions of Nodes @ €100

The Sensor and Personal area network of tomorrow

> Billions of Nodes @ €1

Challenges: Removing social, geographical, economic and capacity impediments through the provision of cost-effective infrastructures, allowing an “Always on” network existence.

Contributing to accrued facilities-based competition.

5 5

Storage Medium   2002 Terabytes   2002 Terabytes   1999-2000        1999-2000        % Change
                 Upper estimate   Lower estimate   Upper estimate   Lower estimate   Upper Estimates
Paper            1,634            327              1,200            240              36%
Film             420,254          76,69            431,690          58,209           -3%
Magnetic         4,999,230        3,416,230        2,779,760        2,073,760        80%
Optical          103              51               81               29               28%
Total            5,421,221        3,416,281        3,212,731        2,132,238        69%

No shortage of content, either from private, corporate or public sources

Aggregation of content, its structuring and indexing are key issues

Five exabytes of information is equivalent in size to the information contained in half a million new libraries the size of the Library of Congress print collections.

Scanned Compressed

Source: http://www.sims.berkeley.edu/research/projects/how-much-info-2003/printable_report.pdf

Content Explosion

Migrating to digital media

Exabyte (EB): 1,000,000,000,000,000,000 bytes, or 10^18 bytes

2 Exabytes: Total volume of information generated in 1999.

5 Exabytes: All words ever spoken by human beings.

6 6

Wireless Evolution

[Figure: wireless evolution stages, each labelled with a focus. Focus areas shown: Coverage, Growth, Bandwidth, User-content. Service and driver labels shown: Voice, Subscribers, Coverage, Mobility, Voice Quality, Portability, Capacity, Scalability, Ubiquity, Price, Broadband, New Services, Efficiency, QoE, Simplicity, Performance, Service Richness, Security/trust.]



7 7

Smart dust

http://www-bsac.eecs.berkeley.edu/~warneke/SmartDust/index.html

8

Mica Sensor Node

Left: Mica II sensor node, 2.0 x 1.5 x 0.5 cu. in.

Right: weather board with temperature, thermopile (passive IR), humidity, light, and accelerometer sensors, connected to a Mica II node

Single-channel 916 MHz radio for bi-directional communication at 40 kbps

4 MHz micro-controller

512 KB flash RAM

2 AA batteries (~2.5 Ah), DC boost converter (maintains voltage)

Sensors are pre-calibrated (±1-3%) and interchangeable

9 9

Explosion of Devices and Data

Information explosion and overload

Number of communicating data devices growing from 2.4 billion to 23 billion in 2008 and one trillion by 2012

Challenges: Designing and managing an information infrastructure where all devices communicate with and understand one another

Creating an advanced digital eco-system for the agile enterprise

[Chart: Amount of data received or transmitted (in Petabytes/Day), 2003 to 2008, y-axis from 0 to 1,200,000, by device category: Computers, Industrial, Automobile, Mobile, Entertainment]

10

Rationale and Motivations

Growth/ Proliferation of Mobile/Embedded Devices

Increasing Computational Capacity

Increasing Data Generation

Communication Overhead Vs. Processing Overhead Vs. Energy Consumption

Opportunity:

A new breed of intelligent pervasive applications

11

Real Applications, Real Challenges

Wireless Sensor Networks - Environment Monitoring and Disaster Management

Annual estimated cost of bushfires is $77 million on average

2003 Canberra bushfires alone cost over $300 million

Human Factors – over and above

Healthcare

Patient Monitoring

Emergency/Triage Management

Intelligent Transportation Systems

Total cost of road trauma in Australia is estimated at almost $15 billion per year

World Report on road traffic injury prevention - Intelligent Transportation System (ITS) technologies can reduce this by about 40%

Mobile Workforce

Gartner: mobile workforce spending “will grow faster than IT budgets”

CIOs need to look beyond “mobile workforce enablement projects to … innovative applications such as wireless enabled intelligent products and services”

12

Technical Challenges for Data Mining in

Mobile/Embedded Environments

Data as a Continuous Stream

Resource Constraints

Application Constraints Real-Time Decision Making Needs

Intermittent Connectivity

Iterative nature of learning algorithms


13

Data As A Continuous Stream – Theory

A data stream is a continuous, rapid flow of data that challenges our state-of-the-art processing and communication infrastructure

The general features of data streams are:

Very high rate input data (e.g., 1 Mb/second transmission rate of an oil drill)

Read only once by an algorithm

Real-time processing demand

Unbounded

Time varying

Scientific, Business, Web applications

Systems, techniques, and strategies have been proposed and implemented for data stream processing.
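The "read only once" and "unbounded" features listed above mean that a stream algorithm cannot store the records and revisit them. As a minimal sketch of what that constraint looks like in practice, the following keeps a running mean and variance in constant memory using Welford's method. The class name `RunningStats` and the sample readings are illustrative assumptions, not part of the systems described in these slides.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's method) for an unbounded stream."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        # Each record is read exactly once and then discarded.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one record has been seen.
        return self._m2 / self.count if self.count else 0.0


# Illustrative readings only.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
```

Memory use stays fixed no matter how long the stream runs, which is the property the resource-constrained setting requires.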

14

Data As A Continuous Stream - Application

SeeWhy.com

15

Data As A Continuous Stream - Application SeeWhy.com

16

Data Stream Mining in Mobile/Pervasive

Environments

Cost-efficient, Intelligent and Real-time Data Stream Mining techniques that can:

adapt to the context of diverse applications;

cope with and leverage distributed computational platforms ;

take into account available resources;

17

Systems and Architectures – for

Mobile/Embedded Data Stream Mining

MobiMine @ UMBC

VEDAS – Vehicle Data Stream Mining @ UMBC / Agnik

Situation-Aware Adaptive Data Stream Processing @ Monash University + Collaborators


Situation-Aware Adaptive Data Stream Processing


19

Research Team

Monash University – Centre for Distributed Systems and Software Engineering http://hercules.infotech.monash.edu.au/dsse

A/Prof. Arkady Zaslavsky

Dr Shonali Krishnaswamy

Dr Mohamed Gaber

Current PhD Students: Pari Delir Haghighi, Brett Gillick, Nomica Imran, Suan Khai Chong, Flora Dilys Salim

Several Masters/Honours Students

La Trobe University

Dr Seng Loke

20

Research Team

Centre for Accident Research and Road Safety (CARRS-Q) @ Queensland University of Technology

Prof. Mary Sheehan

A/Prof. Andry Rakotonirainy

PhD Student: Samantha Chen

Insurance Australia Group

Department of Primary Industries, Victoria

Dr Ian McCauley

IBM T.J. Watson Research Lab

Dr Phillip Yu

Current Ongoing Discussions with Prof. Andrew Tonkin, Head of Cardiovascular Research Unit, Monash University

21

Research Funding

Australian Research Council

Discovery Grant

Linkage Grant

Doctoral Internships

DPI

IBM Labs

Hewlett Packard Endowment

Situation-Aware Reasoning

22

Research Outputs – since 2003

One Book

Several Journals and Conference Papers

References Provided

23

Situation-Aware Adaptive Data Stream

Processing

24

Situation-Aware Adaptive Data Stream Processing

• Visualization

• Mining

• Adaptation

• Situation Inference

• Context Engine

• Sensory layer


25

Situation-Aware Reasoning

26 26

Context-Situation pyramid

Situations

Context

Sensory-originated data

27

Sensory Layer

Sensors: Berkeley motes, temperature sensors, light sensor, motion sensor, etc…

Broad definition of sensor: any device (hardware or software) that can provide context information

28 28

Context

The interrelated conditions in which something exists or occurs (Merriam Webster)

The situation within which something exists or happens, and that can help explain it (Cambridge Dictionary)

“Any information that can be used to characterize the situation of an entity” (Dey, 1999)

“The set of environmental states and settings that either determines an application’s behaviour or in which an application event occurs and is interesting to the user” (Chen and Kotz, 2000)

29 29

Categories of Context

Computing Context – computing information

Network context – networking information

User Context – user’s information

Physical Context – environmental information

Time Context – such as time of day, week, month

Etc, etc, etc

30 30

Categories of Context (cont’d)

In practice, some contexts are more important than others from a computational perspective:

Location

Identity

Activity

Time

Answer the questions of who, what, when and where

Primary Context Types

Form the basis for determining other contextual information known as Secondary Context Types


31 31

Context-Awareness

Research directions in context-aware computing:

Context Modeling – represent and use context in a

general way

Context Reasoning – Infer situations and reason

about context

Context Acquisition – gathering and dissemination

of contextual information

32

Situation Recognition

Situation? A part of the world that an individual manages to “carve out” (Devlin, ‘91)

A state of affairs

A current state of a part of the world when a given set of sensor readings have certain values

33

Situation Awareness - The Context Spaces (CS) Model

Situation: characterized by a set of regions

Each region: a set of acceptable values of a context attribute that satisfies a predicate

Example: situation ‘healthy’:

SBP: >85 and ≤135, DBP: >60 and ≤110, HR: >45 and ≤85
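The region definition above can be sketched as a simple containment test. This is a crisp simplification under stated assumptions: the CS model itself computes a confidence level rather than a yes/no answer, and the names `HEALTHY` and `in_situation` are hypothetical. Only the numeric bounds come from the slide.

```python
# Regions for the situation 'healthy', with bounds taken from the slide.
# Each entry is (lower bound, exclusive; upper bound, inclusive).
HEALTHY = {"SBP": (85, 135), "DBP": (60, 110), "HR": (45, 85)}


def in_situation(regions, reading):
    """Return True when every context attribute lies inside its region."""
    return all(
        low < reading[name] <= high
        for name, (low, high) in regions.items()
    )
```

For example, `in_situation(HEALTHY, {"SBP": 120, "DBP": 80, "HR": 70})` holds, while a reading with SBP exactly 85 falls outside the region because the lower bound is exclusive.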

34

Situation Awareness - The CS Model

The CS model provides heuristics developed specifically for addressing context-awareness under uncertainty:

1. Individual significance (i.e. weight) and contribution of context attributes in the situation space

2. Inaccuracies of sensory-originated information

3. Characteristics of context attributes and their effect on reasoning

4. Partial and complete containment of context attributes’ values in the situation space

These heuristics are integrated into reasoning formulae, which are utility-based data fusion algorithms that compute the confidence level in the occurrence of a situation

35

Situation Awareness - Fuzzy Situation Inference (FSI)

Issue of uncertainty related to:

sensors

human concepts and real life situations

The FSI model integrates fuzzy logic principles into the Context Spaces (CS) model, using the benefits of fuzzy logic for modeling and reasoning about vague and uncertain situations while incorporating the CS model’s underlying theoretical basis for supporting context-aware and pervasive computing environments

36

Situation Awareness - Fuzzy Situation Inference (FSI)

situation: a set of fuzzy sets that are expressed as an FSI rule

fuzzy set: takes a pair of numeric values (i.e. a value and its membership degree)

In a fuzzy set, unlike a region, membership of an item is gradual and is represented by a membership degree between 0 and 1

FSI rule: includes multiple conditions joined with the AND operator where each condition can itself be a disjunction of conditions

Example: situation ‘healthy’:

If SBP is normal and DBP is normal and HR is normal then situation is healthy
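A rule of this shape can be sketched with trapezoidal membership functions and the minimum as the AND operator. Both are common fuzzy-logic choices assumed here for illustration; the slides do not state which membership shape or conjunction FSI uses. The corner values in `NORMAL` and the function names are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x in a trapezoidal fuzzy set with corners a <= b <= c <= d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


# Hypothetical 'normal' terms: the four numbers are trapezoid corners
# chosen for illustration only, not clinical values from the slides.
NORMAL = {
    "SBP": (85, 95, 125, 135),
    "DBP": (60, 70, 100, 110),
    "HR": (45, 55, 75, 85),
}


def healthy_degree(reading):
    """Degree to which 'SBP is normal and DBP is normal and HR is normal' holds."""
    return min(
        trapezoid(reading[name], *corners)
        for name, corners in NORMAL.items()
    )
```

Unlike a crisp region, a reading near a boundary yields a partial degree: with these assumed corners, an SBP of 90 gives a membership of 0.5 instead of a hard in-or-out decision.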


37

Comparative Evaluation - FSI and CS for Situation Awareness

[Figures: three line charts comparing FSI and CS reasoning for Hypotension, Normal, and Hypertension. x-axis: time; y-axis: level of confidence (0 to 1.2); series: FSI and CS.]

38

Adaptive Data Stream Mining

39

Algorithm Output Granularity (AOG)

We have proposed adapting the algorithm output according to resource availability and the data stream generation/input rate.

The AOG approach is based on the following axioms:

a) The algorithm output rate (AR) is a function of the data rate (DR), i.e., AR = f(DR).

b) The time needed to fill the available memory with the algorithm results (TM) is a function of AR, i.e., TM = f(AR).

c) The algorithm accuracy (AC) is a function of TM, i.e., AC = f(TM).

40

AOG Typical Procedure

1- Determine the frequency of adaptation and integration.

2- According to the data rate, calculate the algorithm output rate and the algorithm threshold.

3- Mine the incoming stream using the calculated algorithm threshold.

4- Adjust the threshold after each time frame, using linear regression, to adapt to the change in the data rate.

5- Repeat the last two steps until the time interval threshold is reached.

6- Perform knowledge integration of the results.
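The threshold adjustment in step 4 can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes the regression relates past thresholds to the output rates they produced, and then picks the threshold predicted to hit a target output rate. The function names, the `history` format, and the notion of a target output rate are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def next_threshold(history, target_output_rate):
    """Pick the threshold whose predicted output rate meets the target.

    history: non-empty list of (threshold, observed output rate) pairs,
    one per past time frame. Falls back to the most recent threshold
    when no usable line can be fitted.
    """
    thresholds = [t for t, _ in history]
    rates = [r for _, r in history]
    if len(set(thresholds)) < 2:
        return thresholds[-1]
    slope, intercept = fit_line(thresholds, rates)
    if slope == 0:
        return thresholds[-1]
    return (target_output_rate - intercept) / slope
```

With a history of `[(1.0, 10.0), (2.0, 8.0), (3.0, 6.0)]`, a larger threshold has produced less output, so asking for a lower output rate yields a higher threshold for the next time frame.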

41

AOG Primitives

AOG parameters

TFi: The time frame i.

Di: Input data stream during the time frame i.

I(Di): Average data rate of the input stream Di.

O(Di): Average output rate resulting from mining the stream Di.

AOG operations

α(Di): Mining process of the Di stream.

β([I(D1), O(D1)],…,[I(Di), O(Di)]): Adaptation process of the algorithm threshold at the end of the time frame i.

Ω(Oi, ...,Ox): Knowledge integration process done on the outputs i to x.

AOG settings

D(TF): Time duration of each time frame.

D(Ω): Time duration between each two consecutive knowledge integration processes.

42

AOG-based Learning Algorithms

In the mining stage, there are three variations in using the threshold according to the mining technique:

LightWeight Clustering (LWC): the threshold is used to specify the minimum distance between the cluster center and the data element/record;

LightWeight Classification (LWClass): in addition to using the threshold to specify the distance, the class label is checked. If the stored item and the new item are similar (within the accepted distance) and have the same class label, the weight of the stored item is increased along with the weighted average of the other attributes; otherwise the weight is decreased and the new item is ignored;

LightWeight Frequent patterns (LWF): the threshold is used to determine the number of counters for the heavy hitters.
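The role of the distance threshold in LWC can be sketched as a one-pass clustering loop. This is a simplified illustration under stated assumptions, not the published algorithm: a record joins the nearest cluster when it lies within the threshold and otherwise starts a new cluster, and the function name `lwc` and the `[center, weight]` representation are hypothetical.

```python
import math


def lwc(stream, threshold):
    """Return a list of [center, weight] clusters built in a single pass."""
    clusters = []
    for record in stream:
        nearest = min(
            clusters,
            key=lambda c: math.dist(c[0], record),
            default=None,
        )
        if nearest is not None and math.dist(nearest[0], record) < threshold:
            center, weight = nearest
            # Move the center to the weighted average and bump the weight.
            nearest[0] = tuple(
                (c * weight + r) / (weight + 1) for c, r in zip(center, record)
            )
            nearest[1] = weight + 1
        else:
            clusters.append([tuple(record), 1])
    return clusters
```

Raising the threshold merges more records into fewer clusters, which lowers the output rate and slows memory consumption; this is the lever that output-granularity adaptation relies on.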


43

AOG-based VFKM

Very Fast K-Means (VFKM) by Domingos and Hulten (1) is a sequence of K-means runs. Unlike K-means, which uses all the data records in each of its iterations, VFKM uses only a calculated number of the data records, bounded by a probabilistic error bound.

Before checking the termination condition of VFKM, if the free memory is found to be less than the critical level of memory availability, the error bound is increased according to the ratio of free memory to total memory.

(1) P. Domingos and G. Hulten, A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering, Proceedings of the Eighteenth International Conference on Machine Learning, 2001, 106-113, Williamstown, MA, Morgan Kaufmann

44

AOG-based Querying

Blocking operators require the query results to be resident in memory over the period of query execution.

AOG adapts the sampling rate of the query according to the following:

Input rate

Output rate

Available memory

Available time to deliver final results

45

An Application of LWC in Change Detection

Using the change of clustering models over time to detect changes in data stream distributions and domains.

The changes are associated with the events for the classification purpose.

The classification is done by voting over the change in the attributes.

46

Application of LWC in Change Detection

(Cont’d)

47

Integrating AOG with other Adaptation Approaches

We term the input settings Algorithm Input Granularity (AIG).

AIG is represented by sampling, load shedding, and data synopsis creation techniques.

Algorithm Output Granularity (AOG) represents the output settings.

Strategies for AOG include the number of knowledge structures created or the level of output granularity.

Algorithm Processing Granularity (APG) is concerned with changing the processing settings of the algorithm itself to consume a smaller amount of resources.

Strategies include changing the error rate of approximation algorithms.

48

RA-Cluster

RA-Cluster is an incremental online clustering algorithm that has all the required parameters to enable resource-awareness.

Memory adaptation is done through threshold adaptation and outlier and inactive cluster elimination.

CPU adaptation is done through randomized assignment.

Battery adaptation is done through the change in sampling rate.


49

DRA-Cluster

DRA is an extended version of RA-Cluster that works in a distributed mode in a wireless sensor network.

DRA has been implemented and evaluated on the new SunSpot wireless sensor networks from Sun Microsystems.

The approach is to migrate current results from a near-dead node to another ‘best’ neighbor. This raises three main questions: Which neighbor to migrate to? When to migrate? How to migrate (and merge these clustered data)?

50

DRA-Cluster (Cont’d)

Three thresholds are used to answer these questions:

1. The adaptive threshold signals when to start the resource adaptation process.

2. The best-neighbor-finding threshold signals when to start broadcasting requests to direct neighbors.

3. The migrating threshold signals when to start migrating data.

We use a simple linear extrapolation model to estimate the first two thresholds dynamically.

The third threshold is a predefined value that depends on the hardware platform.
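How linear extrapolation might set the first two thresholds can be sketched as follows. This is a guess at the mechanism under loud assumptions: the slides do not say what quantity is extrapolated, so the sketch assumes battery level over time, and the parameters `search_time` and `adaptation_time` (how long neighbor search and adaptation are expected to take) are hypothetical.

```python
def drain_rate(samples):
    """Battery units consumed per time unit, from the two latest (time, level) samples."""
    (t0, level0), (t1, level1) = samples[-2], samples[-1]
    return (level0 - level1) / (t1 - t0)


def estimate_thresholds(samples, migrating_threshold, search_time, adaptation_time):
    """Return (adaptive_threshold, best_neighbor_threshold) as battery levels.

    Each threshold is placed far enough above the next one that, at the
    extrapolated drain rate, the preceding phase can finish in time.
    """
    rate = max(drain_rate(samples), 0.0)  # ignore apparent recharging
    best_neighbor = migrating_threshold + rate * search_time
    adaptive = best_neighbor + rate * adaptation_time
    return adaptive, best_neighbor
```

A node draining faster gets higher thresholds and therefore starts adapting and searching for a neighbor earlier, while the migrating threshold itself stays fixed by the hardware platform.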

51

Adaptive Rule Triggers for WSNs

Learn Energy Efficient Associations in WSNs

Rules discovered are then used by cluster heads to infer readings of their neighbours and control cluster members’ operations

52


53


54

Applications

Intelligent Transportation Systems

Patient Monitoring

Wireless Sensor Networks – Energy

Efficient Habitat Monitoring

Smart Wardrobe


55

Applications – ITS and Road Safety

Centre for Accident Research and Road Safety –Queensland

Insurance Australia Group

Intersection Safety

Crashes on Curves

Drunk Driving Behaviour

56


Approach/Methodology

Risk Understanding

We first identify and model high risk situations and high risk drivers using knowledge of the history of risks.

This knowledge is collated and formalised from:

organisational experience

driver psychology expertise

data mining/analysis of existing crash data (IAG policyholder and Queensland Transport WebCrash database)

analysis of data generated from driving simulators

57


Crashes on Curves

58


Crash Detection on Curves

Impact Factors on Claim Costs

Analysis of Crash Data – Building the Knowledge Base

Traditional Data Mining Exercise, with a few twists and turns

Text Mining

Cluster Analysis

Contributing Factors

Classificatory Analysis

59


Intersection Safety

60

The U & I Aware Framework (cont’d)

[Flowchart with three stages: COLLISION DETECTION, COLLISION WARNING, and COLLISION LEARNING. Elements shown: Preselection (collision pattern) against the Knowledge Base of Collision Patterns; Matching Vehicle Status Data (Speed, Angle, Position, Direction, Size, Maneuver); Calculating Collision Point (collision detection algorithm); Point of Collision Found?; Calculating Time To Collision (collision detection algorithm); Collision Predicted?; Time to Avoid < Time to Collision?; Issue Warning / Command; Collision Actually Happened?; Learning from collision and near collision events (data mining).]


61

Intersection Simulation

62

Collision Patterns Learning

Learning is performed by mining sensor and historical collision data

No existing research on learning whole sets of collision patterns at an intersection

Sensor and collision data are generated by the simulation

Veh1_Manouvre  Veh1_Direction  Veh1_angle  Veh2_Manouvre  Veh2_Direction  Veh2_angle  Coll_Type
STRAIGHT       RIGHT           0           STOPPED        DOWN            90          SideCollision
STRAIGHT       RIGHT           0           STRAIGHT       RIGHT           0           RearEndCollision
STRAIGHT       LEFT            0           STRAIGHT       LEFT            0           RearEndCollision
STRAIGHT       DOWN            90          STRAIGHT       DOWN            90          RearEndCollision
STRAIGHT       DOWN            90          STRAIGHT       DOWN            90          RearEndCollision
STRAIGHT       DOWN            90          STRAIGHT       DOWN            90          RearEndCollision
STRAIGHT       DOWN            90          STOPPED        LEFT            0           SideCollision


63

Preselection

Collision detection is only performed on pairs of vehicles that have the possibility of collisions based on the known intersection collision patterns.

Choosing only the vehicles that exhibit behaviours, locations, and driving manoeuvres that match the collision patterns in the knowledge base

Performance is improved by eliminating the need to check every pair of vehicles at the intersection for collision possibility.

64

Preselection Algorithm Implementation

Two types of side collision patterns: perpendicular left with straight manoeuvre and perpendicular right with straight manoeuvre.

Only cars that are located within a certain area and exhibiting certain manoeuvres are selected.

The pair-wise collision detection algorithm is applied only after preselection has been executed.
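The preselection idea can be sketched as a filter over vehicle pairs. This is a minimal illustration under stated assumptions: patterns are reduced to (manoeuvre, direction) pairs drawn from the collision pattern table, the location check mentioned above is omitted, and the names `PATTERNS`, `preselect`, and the vehicle record fields are hypothetical.

```python
from itertools import combinations

# Known collision patterns as ((manoeuvre, direction), (manoeuvre, direction)),
# taken from the rows of the collision pattern table.
PATTERNS = {
    (("STRAIGHT", "RIGHT"), ("STOPPED", "DOWN")),
    (("STRAIGHT", "RIGHT"), ("STRAIGHT", "RIGHT")),
    (("STRAIGHT", "LEFT"), ("STRAIGHT", "LEFT")),
    (("STRAIGHT", "DOWN"), ("STRAIGHT", "DOWN")),
    (("STRAIGHT", "DOWN"), ("STOPPED", "LEFT")),
}


def preselect(vehicles, patterns):
    """Return id pairs of vehicles whose behaviour matches a known pattern."""
    candidates = []
    for a, b in combinations(vehicles, 2):
        key_a = (a["manoeuvre"], a["direction"])
        key_b = (b["manoeuvre"], b["direction"])
        # A pattern matches regardless of which vehicle is listed first.
        if (key_a, key_b) in patterns or (key_b, key_a) in patterns:
            candidates.append((a["id"], b["id"]))
    return candidates
```

Only the pairs returned by `preselect` would then be passed to the more expensive pair-wise collision detection step.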

65

Collision Detection Evaluation

Speed of detection

Performance/accuracy: precision and coverage

66

Speed of Detection

Collision Detection Log File with attributes: registration number of both vehicles, collision point, time to collision, leg location of both vehicles, and collision type

Average detection time (time to collision) for each run is calculated

If preselection is ignored in collision detection, the average time to collision is 5.6 seconds

When preselection is used, the average time to collision is 8.7 seconds


67

Accuracy: Precision and Coverage

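true positive (x): valid detection

false positive (z): invalid detection

false negative (y): undetected collision

\[ \text{precision} = \frac{\text{no. of valid detections}}{\text{total collision detections}} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} = \frac{x}{x + z} \]

\[ \text{coverage} = \frac{\text{no. of valid detections}}{\text{total collisions}} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}} = \frac{x}{x + y} \]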
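The two measures can be computed directly from counts of valid detections, invalid detections, and undetected collisions. The function names and the choice to return `None` when a denominator is zero are assumptions made for this sketch.

```python
def precision(valid_detections, invalid_detections):
    """Share of all collision detections that were valid."""
    total_detections = valid_detections + invalid_detections
    if total_detections == 0:
        return None  # undefined when nothing was detected
    return valid_detections / total_detections


def coverage(valid_detections, undetected_collisions):
    """Share of all actual collisions that were detected."""
    total_collisions = valid_detections + undetected_collisions
    if total_collisions == 0:
        return None  # undefined when no collision occurred
    return valid_detections / total_collisions
```

For instance, 8 valid and 2 invalid detections give a precision of 0.8, and 8 valid detections with no undetected collisions give a coverage of 1.0.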

68

Accuracy: Evaluation Result

Side collision detection

100% precision when side collision detections present

100% coverage when side collisions present

Rear-end collision detection

No detection at this stage – most rear-end collisions happen as chain effects of side collisions

0% coverage

Overall

100% precision

< 100% coverage due to undetected rear-end collisions

69


Approach/Methodology

Situation Understanding

Adaptive Data Stream Mining techniques perform real-time on-board diagnostics, with an acceptable degree of accuracy, for the risk situations identified

Prototyping and Evaluation

70


Apply LWC onboard a moving vehicle.

Create a clustering model.

Annotate the clusters with their labels using expert knowledge base.

Apply the annotated clustering to infer whether the driver has been drinking.

System Overview

On-Board Device Classification models - T

UDM clusters – t

Central Server

71


72

Applications – Patient Monitoring

Cardiovascular Research Unit, Monash University

Focus on Cardiac Patients

Support for post hospital monitoring and recovery

Ageing population

Need For:

early diagnosis

for remote health monitoring

rural areas: hard to access hospitals, facilities and specialists

elderly people avoiding regular trips/visits

for mobile health monitoring

Provide a continuous and convenient way of monitoring

Increase patients’ confidence to continue daily activities

Provide patients with self-management and awareness of disease


73

Biosensors

Alive Technology

A & D Medical

74

Bio-sensors:

Alive Technology (QLD)

Alive Diabetes Management System:

- Bluetooth enabled

- $550

Alive Heart Monitor + Accelerometer + AliveECG (software) as a package:

- Bluetooth enabled

- $1200

75

A & D Medical (VIC)

UA-767PBT model

- Bluetooth enabled

- uses the oscillometric method

- price: $379

- accuracy: ±3 mmHg or 2%, whichever is greater (pressure); ±5% (pulse)

- measurement range: 20-280 mmHg (pressure); 40-200 pulse/minute (pulse)

- validation: clinically validated with an AA rating in accordance with British Hypertension Society and AAMI protocols

76

Vitaphone - Professional Telemedicine Solutions

Vitaphone Tele-Care-Monitor 3370

Blood pressure monitor

Bluetooth enabled

Vitaphone Tele-ECG-Loop-Recorder 3100 BT

Vitaphone Tele-ECG-Loop-Recorder 3300 BT

77

ActiveECG

ActiveECG with Bluetooth: includes the ActiveECG hardware, a Bluetooth adapter, software for Palm OS and companion software for the PC, ECG leadwires, battery, test cable, extra cover, and one set of ECG electrodes. US$899

78

Recent Projects

Larger scale:

EPI-MEDICS (Rubel et al. 2005) http://epi-medics.univ-lyon1.fr/flash/epimedics.html

European collaboration

intelligent personal ECG Monitor (PEM) for early detection of cardiac event

80 PEM prototypes finalized and tested on 697 patients/citizens

MobiHealth (Konstantas et al. 2007) http://www.mobihealth.org/

using 2.5G (GPRS) and 3G (UMTS) technologies

Smaller scale:

Personal Health Monitoring System (Leijidekkers et al. 2006a, 2006b, 2007) http://www.personalhealthmonitor.net/index.html


79

Recent Projects

EPI-MEDICS (Rubel et al. 2005)

Detecting cardiac ischemia and arrhythmia

Detecting serial changes with reference to the patient’s stored ECGs

Personal Health Monitoring System (Leijidekkers et al. 2006a, 2006b, 2007)

Detecting Ventricular Fibrillation and Ventricular Tachycardia

80

Limitations of Current Systems

context-awareness:

the need for a general and formal context modelling and reasoning approach

Situations as a higher level of abstraction over context

context: room temp, blood pressure and heart rate

situations: ‘healthy’ and ‘hypertension’

Learning: data stream mining on mobile devices

the need for light weight algorithms

the need for context-aware adaptation of algorithms

81

SAAP Mobile Monitoring

82

83

Defining Medical Situations

Enter variable names and their minimum and maximum values

1-Add terms for each variable

2- Enter 4 parameters for each term (more…)

Enter situation name and add conditions based on pre-defined variables and terms

(weight for conditions of a situation must add up to 1)

84

Situation-Aware

Adaptation Demo:

Video Here


85

Visualization of Situations

86

Applications – Context Aware Sensors and

Smart Wardrobe

Context-aware energy-efficient sensing for habitat monitoring: the Case of the Pig Farm and Data Mules

Building Profiles of Monitored Objects in Closed Environments: the Case of the Smart Wardrobe.


Context Aware Sensors

Suan (Khai) Chong, Seng Wai Loke, Shonali Krishnaswamy

88

Sensor Battery

Battery capacity is finite, and progress in battery technology is very slow.

Battery capacity expected to make little improvement in the near future.

89

Existing Techniques

Hardware approaches

Energy-efficient sensor designs (Chandrakasan and Brodersen, 1996)

Low-power wakeup radios (G. Guo and Rabaey, 2001)

Software approaches:

Energy-efficient architecture for a surveillance system. (He et al, 2004)

Prediction of neighbouring sensor readings to avoid sensors resending info. (Elnahrawy and Nath, 2004)

In Data Stream Management Systems, sensor proxies that can control sensor behaviour while answering user queries. (Madden and Franklin, 2002)

90

Existing Techniques…

Hardware approaches limited to specific sensor hardware.

Energy-efficient software designs are specific to sensor network applications.

(Elnahrawy and Nath, 2004) and (Madden and Franklin, 2002) focussed only on specific data correlations to control sensors.


91

Research Aims

General Aim:

To conserve energy in sensor networks by taking advantage of sensor data patterns to dynamically adapt sensor operations.

We address two issues :

Data Management in WSNs.

Energy Conservation.

92

Research Questions

(i) How do we determine the context from the sensor data?

(ii) Can we use the contextual knowledge to drive sensor operations?

93

Conserving Energy in WSN

We believe that:-

More energy could be conserved when sensors are fully aware of their environment.

94

Habitat Monitoring Example

Deployment of 32 sensor nodes using Mica motes coupled with Mica weather boards to monitor petrel nest activity.

Known context:

(i) Petrels enter or leave nests during light phase => little/no sensing during those times => reduce data sampling.

(ii) Outside temperatures constant => fewer sensors required to sense outside => sleep a few sensors.
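The two pieces of known context above can be sketched as rules that map context to sensor operations. This is a toy illustration, not the system described in these slides: the function names, the action labels, and the 0.5 degree tolerance used to decide that temperature is "constant" are all assumptions.

```python
def temperature_is_stable(readings, tolerance=0.5):
    """Treat temperature as constant when recent readings vary by at most `tolerance`."""
    return max(readings) - min(readings) <= tolerance


def choose_actions(is_light_phase, outside_temps):
    """Map the known context to a list of sensor operations."""
    actions = []
    if is_light_phase:
        # Rule (i): little or no sensing needed, so sample less often.
        actions.append("reduce_sampling_rate")
    if temperature_is_stable(outside_temps):
        # Rule (ii): fewer outside sensors are needed, so put some to sleep.
        actions.append("sleep_some_outside_sensors")
    return actions
```

For example, `choose_actions(True, [20.1, 20.3, 20.2])` triggers both rules, while a dark phase with widely varying temperatures triggers neither.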

95

Contextual information

A sensor’s context:

- its profile, such as its location in the sensor network, and a common situation the sensors face (e.g. weather is hot)

- sensor state, e.g. battery power, network connectivity

- history of readings

- time

- etc.

96

The Conceptual Model


97

Sensor roles Partitioning

98

Context Discovery Module

Purpose is to obtain contextual information.

Context based on custom scenarios.

Mining Data Stream Offline/Online.

99

Major Components of the System

(i) Context Discovery Module.

(ii) Context-Trigger Module

(iii) Communication

(iv) Sensor operations repository

100

Context-Trigger Module

101

Other Components

Communication

- handles data transfer between the sensors and the application, receiving and sending data packets between 2 parties.

Sensor Operations Repository

- storage of sensor operations that constitute action macros.

102

Implementation


103

Experimental Setup

*Note partitioning of sensors / bootstrapping

104

Experimental Tools

TinyOS, programming with nesC.

Mica2 motes with sensor board.

Simulations with PowerTOSSIM (Shnayder et al., 2004).

105

Experiments Performed

(i) Control Experiment

(ii) Transmission Rates Experiment

(iii) Message Size Experiment

(iv) Sleep Mote Experiment

106

Control Exp.

107

Transmission Rate Exp.

108

Sleep Mote



Context Aware Sensors

and Data Muling

Suan (Khai) Chong, Ian McCauley, Seng Wai Loke,

Shonali Krishnaswamy

110

Data Collection in WSNs

Efficient collection of data is one of the key challenges for sparse WSNs.

For sparsely deployed WSNs, the issues include:

Heavily burdening particular nodes with relaying data.

Draining sensor energy due to transmission over long distances.

=> Use of mobile nodes, Data Mules

111

Data Muling Example

*Mule A, B, C transporting data from S1/2/3/4 to Base

112

Research Aim

General Aim:

To conserve energy in sensor networks by taking advantage of mule data gathering patterns to dynamically adapt sensor operations.

We address two issues :

Data Management in WSNs.

Data Communication in Muling.

113

Mobility in Data Collection

Real-life Data Muling examples:

Data mule sensors mounted on spade used in vineyard [20].

AUV used as data mule to collect readings from underwater sensors [27].

External environmental conditions and sensors’ locations contribute to contextual information.

114

A sensor’s context can be:

Sensor’s profile, such as location

External Environment

Past readings

Time

…

Contextual Information for

a sensor


115

Context-Aware Framework

116

Main System Components

Communication Server

Context Locator Service

Context Trigger Engine.

117

Experimental Scenario

118

Polling Approach - Mule

119

Polling Approach - Base

120

Polling Approach - Sensor


121

Context-Aware Muling

RFID sensors as “Sensors that provide contextual info. for control”.

Sensors (to be data muled) as “Sensors that are to be controlled by triggers”.

Instead of polling, act only when data mules are “near enough” (location of data mule as context for triggering transmission).

122

Context-Aware Muling Approach

123

Context-Aware Muling: Steps

1. The PC connected to the RFID readers launches the CAP.

2. Based on the preprogrammed rules, the CAP sends the appropriate macros to sensors, depending on the contextual input received.

Note: not just RFID but any positioning technology can be employed…
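
The rule-driven step above can be pictured as a small table that maps contextual input to sensor macros. The following is a minimal sketch under stated assumptions, not the CAP implementation: the event fields, the rule format, the sensor identifier and the macro names (START_TRANSFER, STOP_TRANSFER) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ContextEvent:
    """Contextual input, e.g. a reader detecting a mule (hypothetical fields)."""
    mule_id: str
    location: str
    kind: str  # "enter" or "leave"


@dataclass(frozen=True)
class Rule:
    """A preprogrammed rule: if `condition` holds, send `macro` to `sensor_id`."""
    condition: Callable[[ContextEvent], bool]
    sensor_id: str
    macro: str


def fire_rules(rules: List[Rule], event: ContextEvent) -> List[Tuple[str, str]]:
    """Return the (sensor_id, macro) pairs triggered by one context event."""
    return [(r.sensor_id, r.macro) for r in rules if r.condition(event)]


# Hypothetical rule set: a sensor transmits only while a mule is nearby.
RULES = [
    Rule(lambda e: e.location == "shed" and e.kind == "enter",
         "sensor2", "START_TRANSFER"),
    Rule(lambda e: e.location == "shed" and e.kind == "leave",
         "sensor2", "STOP_TRANSFER"),
]
```

Because the rules only look at an abstract location event, the same table would work with any positioning technology that can produce such events, not just RFID.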

124

Context-Aware Muling (1)

(i) When mule C enters the shed, the RFID readers located at the shed entrance detect it.

(ii) The RFID readers send the detection information to the CAP.

125


(iii) The PC then signals sensor 2 to initiate and establish a data transfer connection with mule C.

(iv) Once mule C is within a safe distance of sensor 2, it receives data from sensor 2 and sends acknowledgements.

126


(v) Mule C remains in listening mode if there are no more packets from sensor 2. The CAP sends a trigger telling sensor 2 to stop communicating when mule C leaves the shed again.


127

Summarizing… Future work

(A) We avoid packet loss by automating mule detection through context triggers.

(B) We eliminate the need for sensors to broadcast signals continuously to establish a connection.

Future work

Working on further applications of our framework.

Developing the learning component of the framework.

128

Mining events from monitored/tracked objects within a “closed environment”, e.g., clothes within a wardrobe?

UNOBTRUSIVE USER PROFILING

The Use of RFID Technology to Create a “Smart Wardrobe”

Maria Indrawan, Sea Ling, Frida Samara, Seng Loke

129

Profiling Systems

Collecting individual data.

Ownership of the collected data belongs to the organization rather than the individual.

Requires users to interact directly with a computer panel.

130

Motivation

Return the ownership of collected data to users!

Unobtrusive collection process

Utilise existing technology

privacy convenience available

Smart Wardrobe

131

Smart Wardrobe

[Figure labels: RFID reader; events data; Smart Wardrobe application; profile]

132

Usage of Smart Wardrobe

Create a fashion profile for users.

The fashion profile:

helps users understand their fashion behaviour.

helps users make purchasing decisions (recommender system).


133

Main Components of the System

Hardware

RFID tags and RFID reader.

Software

Events detector and events database.

Profiles generator.

134

Physical Layout

[Figure labels: track; RFID reader; RFID tags embedded in clothes hangers]

135

Events

Item of clothes is out of the wardrobe:

Poll the RFID tags at every interval s.

‘Missing’ RFID tags are interpreted as items out of the wardrobe.

Item of clothes is being worn:

The item is detected to be out of the wardrobe for a given time t.

A pair of items is being worn together:

Both items are detected to be out of the wardrobe for a given time t.
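
The three event definitions above can be sketched as a pass over the poll results. This is an illustrative sketch only, not the Smart Wardrobe event detector: the poll format, the function names and the rule that two items are "worn together" when their worn intervals overlap are assumptions. Items that are still out at the last poll are ignored here.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Poll = Tuple[float, Set[str]]        # (poll time in seconds, tags the reader saw)
Interval = Tuple[float, float]       # (first poll missing, first poll back)


def worn_intervals(polls: Iterable[Poll], inventory: Set[str],
                   t: float) -> Dict[str, List[Interval]]:
    """Map each tag to the intervals during which its item counts as worn.

    A tag missing from a poll means 'out of the wardrobe'; an absence
    lasting at least t seconds means 'being worn'.
    """
    out_since: Dict[str, float] = {}
    worn: Dict[str, List[Interval]] = {tag: [] for tag in inventory}
    for time, seen in polls:
        for tag in inventory:
            if tag not in seen:
                out_since.setdefault(tag, time)   # first poll where tag is missing
            elif tag in out_since:
                left = out_since.pop(tag)         # tag is back in the wardrobe
                if time - left >= t:
                    worn[tag].append((left, time))
    return worn


def worn_together(worn: Dict[str, List[Interval]]) -> Set[Tuple[str, str]]:
    """Return pairs of tags whose worn intervals overlap (assumed pairing rule)."""
    pairs: Set[Tuple[str, str]] = set()
    for a, b in combinations(sorted(worn), 2):
        for start_a, end_a in worn[a]:
            for start_b, end_b in worn[b]:
                if min(end_a, end_b) > max(start_a, start_b):
                    pairs.add((a, b))
    return pairs
```

The choice of s and t trades responsiveness against false positives: a small t would count an item that was only briefly taken out and put back as worn.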

136

Profiles

Most and least frequently worn item.

Most and least frequently worn brand.

Most and least frequently worn colour.

Most and least frequently worn pattern (e.g. floral, plain).

Most frequently worn combination of items:

During the daytime.

During the evening.

During the weekend.
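
The profiles listed above amount to counting wear events per attribute and per time period. The sketch below illustrates one way to do this; the outfit record format, the catalogue layout and the 18:00 daytime/evening boundary are assumptions made for the example, not details from the prototype.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, FrozenSet, List, Tuple

Outfit = Tuple[datetime, FrozenSet[str]]   # (when worn, items worn together)
Catalogue = Dict[str, Dict[str, str]]      # item -> {"colour": ..., "brand": ...}


def period_of(when: datetime) -> str:
    """Bucket a timestamp; the 18:00 boundary is an assumption."""
    if when.weekday() >= 5:                # Saturday or Sunday
        return "weekend"
    return "daytime" if when.hour < 18 else "evening"


def attribute_extremes(outfits: List[Outfit], catalogue: Catalogue,
                       attribute: str) -> Tuple[str, str]:
    """Return (most worn, least worn) value of one attribute, e.g. 'colour'."""
    # Start every known value at zero so never-worn values can rank last.
    counts = Counter({attrs[attribute]: 0 for attrs in catalogue.values()})
    for _, items in outfits:
        counts.update(catalogue[item][attribute] for item in items)
    ranked = counts.most_common()          # ties keep first-seen order
    return ranked[0][0], ranked[-1][0]


def top_combination_by_period(outfits: List[Outfit]) -> Dict[str, FrozenSet[str]]:
    """Return the most frequently worn combination for each period."""
    per_period: Dict[str, Counter] = {}
    for when, items in outfits:
        per_period.setdefault(period_of(when), Counter())[items] += 1
    return {p: c.most_common(1)[0][0] for p, c in per_period.items()}
```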

137

Profile Generator

Profile generator algorithms were developed based on:

Frequency analysis.

Association rules.
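
As an illustration of the association-rule side, the sketch below derives rules of the form "if A is worn, B is worn too" from recorded outfits, using the standard support and confidence measures. The function name, the input format and the thresholds are hypothetical, and the sketch only considers pairs of items; the slides do not specify the algorithm actually used.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple

RuleRow = Tuple[str, str, float, float]    # (lhs, rhs, support, confidence)


def pair_rules(outfits: Iterable[Set[str]], min_support: float,
               min_confidence: float) -> List[RuleRow]:
    """Mine 'lhs worn => rhs worn' rules over item pairs.

    support    = fraction of outfits containing both items
    confidence = outfits containing both / outfits containing lhs
    """
    outfits = list(outfits)
    n = len(outfits)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for outfit in outfits:
        items = sorted(outfit)             # sorted so each pair has one key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules: List[RuleRow] = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

A rule with high confidence in only one direction (for example, jeans nearly always with a particular shirt, but that shirt often with other items) is the kind of asymmetry a recommender could exploit.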

138

Prototype

Simulation based on the following assumptions:

The inventory generator creates women’s clothing items.

We chose to generate women’s data because there is a broader range of women’s clothing items than of menswear.

All clothing items inside a single wardrobe belong to a single user.

All clothing items inside the wardrobe are tagged with RFID tags.

When one decides not to wear an item, one always puts it back into the wardrobe. Therefore, the application can be certain that the user wears clothing items that have been out of the wardrobe for a long period.


139

Interface

140

Sample Profiles

141

Summarizing…

RFID enables the creation of private and unobtrusive user profiles.

Design considerations:

Hardware:

The types of RFID and the placement of the RFID in the object.

The placement of the RFID reader.

The accuracy of the RFID reader.

Software:

What can be considered as an ‘event’ of interest?

How to map the RFID readings into an ‘event’ of interest?

142

Conclusions

Data mining for mobile and pervasive computing applications: data mining on mobile nodes or in computers embedded in the environment

Situated data mining anywhere, everywhere where enough data is generated

Data is, can be, and will be generated ubiquitously – e.g., picked up by an increasing number of ubiquitous sensors, etc…

Sensed information can be used to adapt mining, or mining of sensed information can be used to adapt applications

Data mining in everyday objects: data mining in your mobile phone, PDA, heart, blood stream, shoes, spectacles, wall of this room, car, table, coffee cup, jacket, …

Many open issues…

143

Resources

First International Workshop on Knowledge Discovery from Data Streams (IWKDDS) at ECML/PKDD 2004 on September 24th, 2004, in Pisa, Italy.

Organized by: Joao Gama, University of Porto, Portugal

Jesus S. Aguilar-Ruiz, University of Seville, Spain

Web: http://www.lsi.us.es/~aguilar/ecml2004/

Second International Workshop on Knowledge Discovery from Data Streams (IWKDDS) at ECML/PKDD 2005 on October 10th, 2005, in Porto, Portugal.

Organized by: Jesus S. Aguilar-Ruiz, University of Seville, Spain

Joao Gama, University of Porto, Portugal

Web: http://www.niaad.liacc.up.pt/~jgama/IWKDDS/

144

Resources (Cont’d)

Third International Workshop on Knowledge Discovery from Data Streams (IWKDDS) at ICML 2006 on June 29th, 2006, at Carnegie Mellon University (CMU) in Pittsburgh, PA, USA.

Organized by: Joao Gama, University of Porto, Portugal

Jesús S. Aguilar-Ruiz, University of Pablo de Olavide, Spain

Josep Roure, Carnegie Mellon University, US

Web: http://www.cs.cmu.edu/~jroure/iwkdds/iwkdds_icml06.html

ECML/PKDD 2006 Workshop on Knowledge Discovery from Data Streams

Organized by: João Gama, University of Porto, Portugal

Jesus S. Aguilar-Ruiz, University of Seville / University of Pablo de Olavide, Spain

Ralf Klinkenberg, University of Dortmund, Germany

Web: http://www.machine-learning.eu/iwkdds-2006/


145


International Workshop on Knowledge Discovery from Ubiquitous Data Streams

Organized by:

João Gama, University of Porto, Portugal

Mohamed Medhat Gaber, CSIRO ICT Centre, Australia

Jesus S. Aguilar-Ruiz, University of Seville and University of Pablo de Olavide, Spain

Web: http://www.niaad.liacc.up.pt/~iwkduds/

ACM SAC – Data Streams Track (2004 – 2008) – papers can be accessed via the ACM Portal

146


UCR Time Series Classification/Clustering Datasets

Maintained by:

Eamonn Keogh, UCR, US

Web: http://www.cs.ucr.edu/~eamonn/time_series_data/

Mining Data Streams Bibliography

Maintained by:

Mohamed Medhat Gaber, Monash University, Australia

Web: http://www.csse.monash.edu.au/~mgaber/WResources.htm

147

Resources

Books

Data Streams: Algorithms and Applications (Foundations and Trends in Theoretical Computer Science) by S. Muthukrishnan (Now Publishers)

Data Streams: Models and Algorithms (Advances in Database Systems) by Charu C. Aggarwal (Ed) (Springer)

Learning from Data Streams: Processing Techniques in Sensor Networks by Joao Gama and Mohamed Medhat Gaber (Eds) (Springer)

Seminal Surveys

B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems, in Proceedings of PODS, 2002.

Gaber, M, M., Zaslavsky, A., and Krishnaswamy, S., Mining Data Streams: A Review, in ACM SIGMOD Record, Vol. 34, No. 1, March 2005, ISSN: 0163-5808

S. Muthukrishnan, Data streams: Algorithms and Applications. Proceedings of the fourteenth annual ACM-SIAM symposium on discrete algorithms, 2003

148

Project Publications

Gaber M. M., Data Stream Processing in Sensor Networks, a book chapter in Learning from Data Streams: Processing Techniques in Sensor Networks (Eds., Gama J., Gaber M. M.), published by Springer, 2007.

Gaber, M, M., Zaslavsky, A., and Krishnaswamy, S., A Survey of Classification Methods in Data Streams, an invited chapter in the forthcoming book Data Streams: Models and Algorithms, (Eds.) Charu Aggarwal, Springer Verlag, 2007.

Gaber, M, M., Zaslavsky, A., and Krishnaswamy, S., Mining Data Streams: A Review, ACM SIGMOD Record, Vol. 34, No. 1, June 2005, ISSN: 0163-5808.

149

Project Publications

Horovitz, O., Krishnaswamy, S., and Gaber, M, M., A Fuzzy Approach for Interpretation of Ubiquitous Data Stream Clustering and Its Application in Road Safety, Accepted for publication in Intelligent Data Analysis - Special Issue on Knowledge Discovery from Data Streams, 2006, IOS Press.

Gillick B., Krishnaswamy S., Gaber M. M., and Zaslavsky A., Visualisation of Fuzzy Classification of Data Elements in Ubiquitous Data Stream Mining, Accepted for publication in Proceedings of the Third International Workshop on Ubiquitous Computing, to be held in conjunction with the 8th International Conference on Enterprise Information Systems (ICEIS 2006), ICEIS Press.

Horovitz O., Krishnaswamy S., and Gaber M. M., A Fuzzy Approach for Interpretation and Application of Ubiquitous Data Stream Clustering, Proceedings of Second International Workshop on Knowledge Discovery in Data Streams, to be held in conjunction with the 16th European Conference on Machine Learning (ECML 2005) and the 9th European Conference on the Principles and Practice of Knowledge Discovery in Databases (PKDD 2005), Porto, Portugal, October 3-7, 2005.

Horovitz O., Gaber M. M., and Krishnaswamy S., Making Sense of Ubiquitous Data Streams - A Fuzzy Logic Approach, Proceedings of the 9th International Conference on Knowledge-based Intelligent Information & Engineering Systems 2005, Special Session on Knowledge Discovery in Data Streams, 14 - 16 September, 2005, Springer-Verlag.

Krishnaswamy S., Loke S. W., Rakotonirainy A., Horovitz O., and Gaber M. M., Towards Situation-awareness and Ubiquitous Data Mining for Road Safety: Rationale and Architecture for a Compelling Application, Proceedings of Conference on Intelligent Vehicles and Road Infrastructure to be held at The University of Melbourne, 16-17 February 2005.

150

Project Publications

Gaber, M, M., Krishnaswamy, S., and Zaslavsky, A., Resource-Aware Mining of Data Streams, Journal of Universal Computer Science, Special Issue on Knowledge Discovery in Data Streams, edited by Jesus S. Aguilar-Ruiz and Joao Gama, August 2005.

Gaber, M, M., Krishnaswamy, S., and Zaslavsky, A., On-board Mining of Data Streams in Sensor Networks, a book chapter in Advanced Methods of Knowledge Discovery from Complex Data, (Eds.) Sanghamitra Bandyopadhyay, Ujjwal Maulik, Lawrence Holder and Diane Cook, Springer Verlag, 2005.

Gaber, M, M., Zaslavsky, A., and Krishnaswamy, S., Resource-Aware Knowledge Discovery in Data Streams, Proceedings of First International Workshop on Knowledge Discovery in Data Streams, to be held in conjunction with the 15th European Conference on Machine Learning (ECML 2004) and the 8th European Conference on the Principles and Practice of Knowledge Discovery in Databases (PKDD 2004), Pisa, Italy, 20-24 September 2004.


151

Project Publications

Gaber, M, M., Krishnaswamy, S., and Zaslavsky, A., Ubiquitous Data Stream Mining, Current Research and Future Directions Workshop Proceedings held in conjunction with The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Sydney, Australia, May 26, 2004.

Gaber, M, M., Zaslavsky, A., and Krishnaswamy, S., Towards an Adaptive Approach for Mining Data Streams in Resource Constrained Environments, Proceedings of Sixth International Conference on Data Warehousing and Knowledge Discovery - Industry Track (DaWaK 2004), Zaragoza, Spain, 30 August - 3 September, Lecture Notes in Computer Science (LNCS), Springer Verlag.

Gaber, M, M., Zaslavsky, A., and Krishnaswamy, S., A Cost-Efficient Model for Ubiquitous Data Stream Mining, Proceedings of the Tenth International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2004), Perugia Italy, July 4-9.

Chen, Samantha and Rakotonirainy, Andry and Loke, Seng Wai and Krishnaswamy, Shonali (2007) A crash risk assessment model for road curves. In Proceedings 20th International Technical Conference on the Enhanced Safety of Vehicles, pages Chen 1-Chen 8, Lyons, France.

Ling, S., Indrawan, M., & Loke, S. (2007), RFID-based User Profiling of Fashion Preferences: blueprint for a smart wardrobe, International Journal of Internet Protocol Technology, 2(3/4), 153-164.

152

Project Publications

Shah R., Krishnaswamy S., and Gaber M. M., Resource-Aware Very Fast K-Means for Ubiquitous Data Stream Mining, Proceedings of Second International Workshop on Knowledge Discovery in Data Streams, held in conjunction with the 16th European Conference on Machine Learning (ECML 2005) and the 9th European Conference on the Principles and Practice of Knowledge Discovery in Databases (PKDD 2005), Porto, Portugal, October 3-7, 2005.

Gaber M. M., Yu P. S., Detection and Classification of Changes in Evolving Data Streams, International Journal of Information Technology & Decision Making, Vol. 5, No. 4, World Scientific Publishing Company, 2006.

Gaber M. M., Yu P.S., Classification of Changes in Evolving Data Streams using Online Clustering Result Deviation, Proceedings of the third International Workshop on Knowledge Discovery in Data Streams June 29, 2006, Pittsburgh PA, USA.

Gaber, M, M., Krishnaswamy, S., and Zaslavsky, A., A Wireless Data Stream Mining Model, Proceedings of the Third International Workshop on Wireless Information Systems (WIS 2004), Held in conjunction with the Sixth International Conference on Enterprise Information Systems (ICEIS 2004), Porto, Portugal, ICEIS Press, ISBN.

Gaber, M, M., Krishnaswamy, S., and Zaslavsky, A., Cost-Efficient Mining Techniques for Data Streams, First Australasian Workshop on Data Mining and Web Intelligence DMWI 2004, Held in conjunction with the Australasian Computer Science Week (ACSW 2004), Dunedin, New Zealand.

Gaber, M, M., Krishnaswamy, S., and Zaslavsky, A., Adaptive Mining Techniques for Data Streams Using Algorithm Output Granularity, Australasian Data Mining Conference AusDM2003, Held in conjunction with the 2003 Congress on Evolutionary Computation (CEC 2003), December, Canberra, Australia.

153

Project Publications

Chong, S. K., McCauley, I., Loke, S. W., Krishnaswamy, S., 2007, Context-aware sensors and data muling, Proceedings of Context-Awareness for Self-Managing Systems (Devices, Applications and Networks) International Workshop (CASEMANS 2007), 13 May 2007, VDE VERLAG GMBH, Berlin Germany, pp. 103-117.

Haghighi, P. D., Gaber, M. M., Krishnaswamy, S., Zaslavsky, A., Loke, S. W., 2007, An architecture for context-aware adaptive data stream mining, Proceedings of the International Workshop on Knowledge Discovery from Ubiquitous Data Streams, 17 September 2007, IWKDUDS, http://www.ecmlpkdd2007.org/CD/workshops/IWKDUDS/workshop_print.pdf, pp. 117-128.

Haghighi, P. D., Zaslavsky, A., Krishnaswamy, S., 2006, An evaluation of query languages for context-aware computing, Proceedings of the Seventeenth International Workshop on Database and Expert Systems Applications (DEXA 2006), 04 September 2006 to 08 September 2006, IEEE Computer Society, Los Alamitos USA, pp. 455-459.

Chong, S. K., Krishnaswamy, S., Loke, S. W., 2005, A context-aware approach to conserving energy in wireless sensor networks, Proceedings of the 3rd International Conference on Pervasive Computing and Communications Workshops, 08 March 2005 to 12 March 2005, IEEE Computer Society, Los Alamitos CA USA, pp. 401-405.

Chong, S. K., Loke, S. W., Krishnaswamy, S., 2005, Wireless sensor networks: from data to context to energy saving, Proceedings of the International Workshop on Ubiquitous Data Management, 4 April 2005, IEEE Computer Society, Los Alamitos USA, pp. 33-40.

154

Project Publications

F. D. Salim, S. W. Loke, A. Rakotonirainy, S. Krishnaswamy, "U&I Aware: A Framework Using Data Mining and Collision Detection to Increase Awareness for Intersection Users", Proc. of AINA Workshops 2007, The 2007 IEEE International Symposium on Ubisafe Computing (UbiSafe-07), in conjunction with AINA 2007, Niagara Falls, Canada, May 21-23, 2007, IEEE Computer Society Press.

F. D. Salim, S. Krishnaswamy, S. W. Loke, A. Rakotonirainy, "Context-Aware Ubiquitous Data Mining Based Agent Model for Intersection Safety", Proc. of EUC Workshops 2005, The 2nd International Symposium on Ubiquitous Intelligence and Smart Worlds (UISW 2005), in conjunction with EUC 2005, 6-7 December 2005, LNCS, Springer-Verlag, pp. 61-70.

F. D. Salim, S. W. Loke, A. Rakotonirainy, S. Krishnaswamy, "Simulated Intersection Environment and Learning of Collision and Traffic Data in the U&I Aware Framework", accepted for publication in Proceedings of The 4th International Conference on Ubiquitous Intelligence and Computing (UIC-07), Hong Kong, China, July 11-13, 2007, LNCS, Springer-Verlag.

F. D. Salim, S. W. Loke, A. Rakotonirainy, S. Krishnaswamy, "U & I Aware (Ubiquitous Intersection Awareness): a Framework for Intersection Safety", Handbook on Mobile and Ubiquitous Computing: Innovations and Perspectives, E. Syukur, L. Yang, and S. W. Loke, Ed. American Scientific Publishers, to be published in 2007.

155

Project Publications

Gaber M. M., Yu P. S., A Holistic Approach for Resource-aware Adaptive Data Stream Mining, Journal of New Generation Computing, Special Issue on Knowledge Discovery from Data Streams, 2006.

Gaber M. M., Yu P. S., A Framework for Resource-aware Knowledge Discovery in Data Streams: A Holistic Approach with Its Application to Clustering, Proceedings of the 21st ACM Symposium on Applied Computing (ACM SAC 2006) - Data Streams Track, 23 - 27 April 2006, Dijon, France, ACM Press.

Phung N. D., Gaber M. M., and Roehm U., Resource-aware Online Data Mining in Wireless Sensor Networks, Proceedings of IEEE Symposium on Computational Intelligence and Data Mining, IEEE Press, 2007.

Phung N. D., Gaber M. M., and Roehm U., Resource-aware Distributed Online Data Mining for Wireless Sensor Networks, Proceedings of the International Workshop on Knowledge Discovery from Ubiquitous Data Streams (IWKDUDS07), in conjunction with ECML and PKDD 2007, September 17, Warsaw, Poland, 2007.

Roehm U., Gaber M. M., and Tse Q., Enabling Resource-Awareness for In-network Data Processing in Wireless Sensor Networks, Proceedings of the Nineteenth Australasian Database Conference (ADC2008), January 22-25, Wollongong, 2008.

Roehm U., Scholz B., and Gaber M. M., Integration of Data Stream Clustering into a Query Processor for Wireless Sensor Networks, Proceedings of the International Workshop on Data Intensive Sensor Networks (DISN07) held in conjunction with MDM 2007, IEEE Press.

Agarwal I., Krishnaswamy S., and Gaber M. M., Resource-Aware Ubiquitous Data Stream Querying, Proceedings of the International Conference on Information and Automation, December 15-18, 2005, Colombo, Sri Lanka.
