Continuous Data Stream Processing

MAKE Lab. Date: 2006/03/07. Post-Excellence Project, Subproject 6.

Transcript of Continuous Data Stream Processing

Page 1: Continuous Data Stream Processing

Continuous Data Stream Processing

MAKE Lab

Date: 2006/03/07

Post-Excellence Project, Subproject 6

Page 2: Continuous Data Stream Processing

Music Virtual Channel

[Figure: system architecture. Components: Internet, music collections, music metadata, music channel simulator, clustering engine, filtering engine, V.C. player, interface, profile monitor, channel monitor, favorite channel, cluster monitor, cluster coordinator, peer search engine, profile database, MusicXML database, XML filtering engine]

Page 3: Continuous Data Stream Processing

Research Directions

Streaming Data

Management

Mining

Filtering

Temporal Query Processing

Spatial Query Processing

Aggregate Query Processing

Frequent Tree Pattern Mining

Frequent Itemset Mining (sliding window)

Sequence Query Matching

Episode Query Matching

Range Search

KNN Search

Top-K Search

Closed Tree Pattern Mining

Frequent Itemset Mining (landmark model)

Page 4: Continuous Data Stream Processing

Sequence Query Matching

Given a set of sequence queries (SQs), how can we continuously monitor the event stream and, as soon as a segment arrives, report it if it is an approximate answer to some query within that query's error bound?

Event Stream: <a,b,c,d><c,e><a,b,c><b,d><a,d><e,f><a,e><a,b,c><e,f><a,b,c><e><b,c,e><d,f> ...

Sequence Query: <a,b,c><b,d><a,c,d><e,f><a,e>, ε=1
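A minimal Python sketch of the monitoring loop. The slide does not define the error measure, so this assumes the error of a segment is the number of positions where its itemset differs from the query's itemset. The name `match_stream` and the fixed-length sliding window are illustrative choices, not the lab's algorithm.

```python
from collections import deque

def match_stream(events, query, eps):
    """Yield (end_index, errors) for every window of len(query) itemsets
    whose number of mismatching positions is at most eps (assumed error measure)."""
    query = [frozenset(q) for q in query]
    window = deque(maxlen=len(query))      # the most recent len(query) itemsets
    for i, itemset in enumerate(events):
        window.append(frozenset(itemset))
        if len(window) < len(query):
            continue                       # not enough events yet
        errors = sum(1 for w, q in zip(window, query) if w != q)
        if errors <= eps:
            yield i, errors

# Event stream and sequence query from the slide, one string per itemset.
stream = ["abcd", "ce", "abc", "bd", "ad", "ef", "ae",
          "abc", "ef", "abc", "e", "bce", "df"]
query = ["abc", "bd", "acd", "ef", "ae"]
matches = list(match_stream(stream, query, eps=1))
```

On the slide's data the only window within ε=1 is the one ending at the seventh itemset, where <a,d> stands in for <a,c,d>.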

Page 5: Continuous Data Stream Processing

Episode Query Matching

Knowledge Discovery from Telecommunication Network Alarm Databases [ICDE96]

If an alarm of type A occurs, then an alarm of type B occurs within 30 seconds with probability 0.8.

If alarms of types A and B occur within 5 seconds, then an alarm of type C occurs within 60 seconds with probability 0.7.

If an alarm of type A precedes an alarm of type B, and C precedes D, all within 15 seconds, then E will follow within 4 minutes with probability 0.6.

[Figure: the three episodes: A alone; A and B within 5 seconds; A before B and C before D within 15 seconds]
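A rule of the first kind (one alarm type followed by another within a time window) can be checked against an alarm log with a short Python sketch. It assumes alarms arrive as time-ordered `(timestamp_in_seconds, type)` pairs and estimates the rule's probability as the fraction of antecedent alarms that are followed by the consequent within the window. The function `rule_confidence` and the sample log are hypothetical.

```python
def rule_confidence(alarms, antecedent, consequent, window):
    """Fraction of `antecedent` alarms followed by a `consequent` alarm
    within `window` seconds. `alarms` must be sorted by timestamp."""
    hits = total = 0
    for i, (t, kind) in enumerate(alarms):
        if kind != antecedent:
            continue
        total += 1
        for t2, kind2 in alarms[i + 1:]:
            if t2 - t > window:
                break                      # later alarms are outside the window
            if kind2 == consequent:
                hits += 1
                break                      # count each antecedent at most once
    return hits / total if total else 0.0

# Made-up alarm log: (seconds, alarm type)
alarms = [(0, "A"), (10, "B"), (50, "A"), (100, "B"),
          (120, "A"), (125, "C"), (140, "B")]
confidence = rule_confidence(alarms, "A", "B", window=30)
```

In this log two of the three A alarms are followed by a B within 30 seconds, so the estimate is 2/3.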

Page 6: Continuous Data Stream Processing

Top-K Query

Suppose two continuous queries are already registered. Then another continuous query is registered.

[Figure: a coordinator connected to Server 1, Server 2, Server 3, and Server 4]

Queries

Which two web documents are the most popular across the first and second servers?

Which two web documents are the most popular across the third and fourth servers?

Which two web documents are the most popular across the second and third servers?
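The three queries can be answered naively by having the coordinator sum the per-document hit counts of the servers each query names. The Python sketch below shows only that baseline; it ships every count to the coordinator, so it does nothing to reduce communication or share work between queries. The function `top_k` and all counts are made up for illustration.

```python
from collections import Counter

def top_k(server_counts, servers, k):
    """Return the k documents with the highest total count across `servers`."""
    total = Counter()
    for s in servers:
        total.update(server_counts[s])     # Counter.update adds counts
    return [doc for doc, _ in total.most_common(k)]

# Hypothetical hit counts per server.
server_counts = {
    1: {"a": 5, "b": 3, "c": 1},
    2: {"a": 1, "b": 4, "d": 8},
    3: {"c": 7, "d": 2},
    4: {"a": 2, "c": 1, "e": 9},
}
q_first = top_k(server_counts, [1, 2], 2)   # first and second servers
q_second = top_k(server_counts, [3, 4], 2)  # third and fourth servers
q_third = top_k(server_counts, [2, 3], 2)   # second and third servers
```

The third query reuses counts from servers 2 and 3 that the first two queries already touch, which is where sharing information between queries could pay off.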

Page 7: Continuous Data Stream Processing

Main Difficulties

Heavy Communication Cost

The server only updates its current data when necessary.

Multiple Continuous Queries

Most papers focus on one-time top-k queries or a single continuous top-k query.

Information sharing is necessary.
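One way a server might limit update traffic is to report a count only after it has drifted from the last reported value by more than a fixed slack. The slides do not say how "necessary" is decided, so the slack rule, the class `ReportingServer`, and its method names are all assumptions in this Python sketch.

```python
class ReportingServer:
    """Tracks local hit counts and decides when to notify the coordinator."""

    def __init__(self, slack):
        self.slack = slack        # allowed drift before an update is sent
        self.current = {}         # true local counts
        self.reported = {}        # counts the coordinator last heard about

    def observe(self, doc, hits=1):
        """Record hits; return (doc, count) if an update must be sent, else None."""
        self.current[doc] = self.current.get(doc, 0) + hits
        drift = abs(self.current[doc] - self.reported.get(doc, 0))
        if drift > self.slack:
            self.reported[doc] = self.current[doc]
            return doc, self.current[doc]
        return None

server = ReportingServer(slack=2)
messages = [server.observe("a") for _ in range(7)]
sent = [m for m in messages if m is not None]
```

Seven local hits produce two update messages here instead of seven, at the cost of the coordinator seeing counts that may be off by up to the slack.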

Page 8: Continuous Data Stream Processing

Spatial Query Processing

Continuous queries for moving objects in high-dimensional space

Range search

KNN search

[Figure: a search engine and several V.C. players, with labels: user profile, channel, recommended channel, selected channel, Vote Mechanism]

Page 9: Continuous Data Stream Processing

Problem Definition

Given a set of objects with their positions in an N-dimensional (N > 20) region. The set of objects is highly dynamic: each object can move in an unrestricted fashion, i.e., we do not assume any pattern of motion.

Continuously monitor the results of each query point: Range Query, KNN Query.
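For reference, both query types can be evaluated from scratch by a linear scan, as in the Python sketch below. This is only the brute-force baseline that continuous monitoring tries to avoid repeating on every movement. The object IDs, positions, and function names are hypothetical.

```python
import math

def range_query(objects, q, r):
    """IDs of all objects within distance r of query point q."""
    return sorted(oid for oid, pos in objects.items() if math.dist(pos, q) <= r)

def knn_query(objects, q, k):
    """IDs of the k objects nearest to q, closest first (ties broken by ID)."""
    ranked = sorted((math.dist(pos, q), oid) for oid, pos in objects.items())
    return [oid for _, oid in ranked[:k]]

N = 21  # the slides assume N > 20 dimensions
objects = {
    "o1": [0.1] * N,
    "o2": [0.5] * N,
    "o3": [1.0] * N,
    "o4": [-0.2] * N,
}
q = [0.0] * N
in_range = range_query(objects, q, r=1.0)
nearest = knn_query(objects, q, k=3)
```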

Page 10: Continuous Data Stream Processing

Main Difficulties

Heavy Communication Cost

Object updates occur only when the results for some queries might change.

• Safe Region [SIGMOD05]

Incremental Update

Efficiently maintain the effective results.

Multiple Continuous Queries

Decide the quarantine area for each query.

Mixed Types of Queries

Support both the range query and the KNN query.

[Figure: three configurations of two queries, Q1 and Q2]

Page 11: Continuous Data Stream Processing

Range Query

Query Q: (x,y), r

Cell C

A: max < r

B: min ≤ r ≤ max

C: min > r

max: maximum dis(query, cell)

min: minimum dis(query, cell)
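The three cell labels can be computed directly from the minimum and maximum distance between the query point and a cell: A when the whole cell lies inside the range (max < r), C when it lies entirely outside (min > r), and B otherwise. The Python sketch below assumes cells are axis-aligned boxes given by their lower and upper corners, which the slide does not state; the function names are illustrative.

```python
import math

def min_max_dist(q, lo, hi):
    """Minimum and maximum Euclidean distance from point q to the box [lo, hi]."""
    d_min = math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                          for x, l, h in zip(q, lo, hi)))
    d_max = math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2
                          for x, l, h in zip(q, lo, hi)))
    return d_min, d_max

def classify_cell(q, r, lo, hi):
    """Label a cell relative to range query (q, r): A inside, C outside, B boundary."""
    d_min, d_max = min_max_dist(q, lo, hi)
    if d_max < r:
        return "A"
    if d_min > r:
        return "C"
    return "B"

q, r = (0.0, 0.0), 5.0
labels = [
    classify_cell(q, r, (1, 1), (2, 2)),   # fully inside the circle
    classify_cell(q, r, (3, 3), (5, 5)),   # crossed by the circle
    classify_cell(q, r, (6, 6), (7, 7)),   # fully outside the circle
]
```

Objects in A cells are always results and objects in C cells never are, so only B cells need a per-object distance test.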

Page 12: Continuous Data Stream Processing

Range Query (Cont.)

Moving Query MQ

How to maintain the result for an MQ?

Page 13: Continuous Data Stream Processing

Range Query (Cont.)

When to update?

Q1  Q2  Q3  Action
A   A   A   No update and no recalculation
A   A   B   Update and recalculate for some queries
A   A   C   No update and no recalculation

We only need to consider those objects marked with B.

[Figure: Client and Server (Q1, Q2, Q3), with flag = 0/1]
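The update rule amounts to a per-object check of the label its current cell has for each query. The Python sketch below assumes the 0/1 flag means "this object must report its position", which the slide suggests but does not spell out; `needs_update` and `queries_to_recalculate` are hypothetical names.

```python
def needs_update(labels):
    """Return flag 1 if the object's cell is a boundary (B) cell for any query."""
    return 1 if "B" in labels.values() else 0

def queries_to_recalculate(labels):
    """Queries whose result may change when this object moves."""
    return sorted(q for q, label in labels.items() if label == "B")

# The three label combinations shown above.
row1 = {"Q1": "A", "Q2": "A", "Q3": "A"}
row2 = {"Q1": "A", "Q2": "A", "Q3": "B"}
row3 = {"Q1": "A", "Q2": "A", "Q3": "C"}
flags = [needs_update(row) for row in (row1, row2, row3)]
affected = queries_to_recalculate(row2)
```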

Page 14: Continuous Data Stream Processing

Range Query (Cont.)

For a range query Q

Result list: O3 O5 O7

Covered cells

A: C3 C4 C5

B: C2 C7 C9

For a cell C (here C2)

Affected queries

A: Q2 Q4 Q7

B: Q3 Q6 Q9

Query Motion
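The two bookkeeping structures on this slide, a per-query record and a per-cell record, each split by the A and B labels, might be laid out as in the following Python sketch. The class and function names are hypothetical, and the layout is one reading of the slide rather than the lab's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class QueryEntry:
    """Per-query record: current results and the cells the query covers."""
    result: list = field(default_factory=list)
    covered: dict = field(default_factory=lambda: {"A": set(), "B": set()})

@dataclass
class CellEntry:
    """Per-cell record: the queries that cover this cell, by label."""
    affected: dict = field(default_factory=lambda: {"A": set(), "B": set()})

def register(queries, cells, qid, label, cid):
    """Record that query `qid` covers cell `cid` with label 'A' or 'B'."""
    queries.setdefault(qid, QueryEntry()).covered[label].add(cid)
    cells.setdefault(cid, CellEntry()).affected[label].add(qid)

def queries_to_check(cells, cid):
    """Queries whose boundary crosses cell `cid`; only these need a distance test."""
    entry = cells.get(cid)
    return sorted(entry.affected["B"]) if entry else []

queries, cells = {}, {}
for qid in ("Q2", "Q4", "Q7"):
    register(queries, cells, qid, "A", "C2")
for qid in ("Q3", "Q6", "Q9"):
    register(queries, cells, qid, "B", "C2")
boundary_queries = queries_to_check(cells, "C2")
```

Keeping both directions of the mapping lets an object update look up affected queries by cell, and a moving query look up its covered cells by query.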

Page 15: Continuous Data Stream Processing

KNN Query

Query Q: (x,y), 3

[Figure: Object Update, with cases labeled "update the order", "re-computation", and "update the order"]

Page 16: Continuous Data Stream Processing

KNN Query (Cont.)

Query Q: (x,y), 3

Query Q’: (x’,y’), r

r = d’max

Page 17: Continuous Data Stream Processing

KNN Query (Cont.)

Query Q: (x,y), 3

Query Q’: (x’,y’), r

r = dmax + dquery
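Adding dquery to the radius is safe by the triangle inequality: each of the old k nearest objects is within dmax of Q, hence within dmax + dquery of Q’, so at least k objects fall inside the new range and the true k nearest neighbors of Q’ must be among them. A Python sketch of this conversion, with hypothetical names and made-up 2-D points:

```python
import math

def knn(objects, q, k):
    """IDs of the k objects nearest to q, closest first (ties broken by ID)."""
    return sorted(objects, key=lambda oid: (math.dist(objects[oid], q), oid))[:k]

def knn_after_query_move(objects, q_old, q_new, k):
    """Answer the KNN query at q_new via a range query of radius d_max + d_query."""
    old = knn(objects, q_old, k)
    d_max = max(math.dist(objects[o], q_old) for o in old)  # k-th neighbor distance
    d_query = math.dist(q_old, q_new)                       # how far the query moved
    r = d_max + d_query
    candidates = {o: p for o, p in objects.items() if math.dist(p, q_new) <= r}
    return knn(candidates, q_new, k), r

objects = {"o1": (1, 0), "o2": (0, 2), "o3": (3, 0), "o4": (6, 6), "o5": (5, 0)}
result, radius = knn_after_query_move(objects, (0, 0), (4, 0), k=3)
```

The range query only prunes candidates; the final ranking inside the range is still needed to pick the k nearest.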

Page 18: Continuous Data Stream Processing

KNN Query (Cont.)

Query Q: (x,y), 3

Query Q’: (x’,y’), r

r = dmax + dcell

Page 19: Continuous Data Stream Processing

Tree Pattern Mining

As the trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1.

[Figure: trees T1, T2, T3 stream into STMer, which outputs the frequent tree patterns]
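The θ·N threshold can be illustrated with exact counting in Python. This sketch is not STMer: it assumes each arriving tree has already been decomposed into canonical strings for its subtrees (the enumeration step is not shown) and keeps a count for every pattern ever seen, which a real stream algorithm would have to bound. The class name and the subtree strings are hypothetical.

```python
from collections import Counter

class FrequentPatternCounter:
    """Exact per-pattern counts over a stream of trees."""

    def __init__(self, theta):
        self.theta = theta
        self.n = 0                 # N: number of trees received so far
        self.counts = Counter()    # pattern -> number of trees containing it

    def add_tree(self, subtrees):
        """`subtrees`: canonical strings of the subtrees of one arriving tree."""
        self.n += 1
        self.counts.update(set(subtrees))   # count each pattern once per tree

    def frequent(self):
        """Patterns occurring in more than theta * N trees."""
        return sorted(p for p, c in self.counts.items() if c > self.theta * self.n)

counter = FrequentPatternCounter(theta=0.5)
counter.add_tree(["A", "B", "A(B)"])
counter.add_tree(["A", "B", "C", "A(B)", "A(C)"])
counter.add_tree(["B", "C", "B(C)"])
patterns = counter.frequent()
```

With N = 3 and θ = 0.5 the threshold is 1.5, so only patterns seen in at least two trees are reported.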

Page 20: Continuous Data Stream Processing

Closed Tree Pattern Mining

Mining closed frequent subtrees over data streams

A subtree is closed if none of its proper supertrees has the same support as it.

[Figure: example trees over nodes A, B, C, D; their frequent subtrees with supports of 2 or 3; the subtrees marked closed]
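The closedness test can be written as a filter over the frequent subtrees in Python. The sketch assumes two inputs are already available: the support of each frequent subtree and, for each one, the list of its proper supertrees among the frequent subtrees. Both inputs, the pattern strings, and the supports are made up for illustration.

```python
def closed_patterns(supports, supertrees):
    """Patterns none of whose proper supertrees has the same support."""
    return sorted(
        p for p in supports
        if all(supports[s] != supports[p] for s in supertrees.get(p, []))
    )

# Hypothetical frequent subtrees rooted at B, written as canonical strings.
supports = {"B": 3, "B(C)": 3, "B(D)": 2, "B(C,D)": 2}
supertrees = {
    "B": ["B(C)", "B(D)", "B(C,D)"],
    "B(C)": ["B(C,D)"],
    "B(D)": ["B(C,D)"],
    "B(C,D)": [],
}
closed = closed_patterns(supports, supertrees)
```

Here B is not closed because B(C) has the same support, and B(D) is not closed because B(C,D) does; the closed set is smaller but still determines every support.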