
KEYWORD SEARCH OVER RELATIONAL TABLES AND STREAMS

ALEXANDER MARKOWETZ

University of Bonn

YIN YANG and DIMITRIS PAPADIAS

Hong Kong University of Science and Technology

Doklea Meci (A.M 2152)

May 2012

University Of Crete

Department Of Computer Science


THE CHALLENGES OF ACCESSING STRUCTURED DATA

Query languages: numerous complex SQL statements

Schemas: complex or nontrivial schema

R-KWS queries:
  replace numerous complex SQL statements
  liberate users from studying a database schema
  allow querying for terms in unknown locations (tables/attributes)


INTRODUCTION

Keyword Search (KWS)
  each document/Web page constitutes one unit of information
  a document is a result if it contains a subset of the query's keywords
  has been applied to relational DBMSs, where it allows data retrieval without SQL

Relational Keyword Search (R-KWS)
  the basic unit of information is a record/tuple
  queries cannot be answered by inspecting records individually
  results have to be constructed by joining tuples


OUTLINE

Introduction
Relational Keyword Search On Tables
  Graph-Based Processing
  Operator-Based Processing
Optimizations For Continuous GB
  Predecessor-KL
  Time-KL
Optimizations For Continuous OB
  Operator Mesh
  Demand-Driven Operator Execution
  Partial-Mesh
Experimental Evaluation
  Snapshot R-KWS Queries over Tables
  Continuous R-KWS Queries over Streams
  Summary of Experimental Evaluation
Conclusion


RELATIONAL KEYWORD SEARCH ON TABLES

Goal: methods for GB (graph-based) and OB (operator-based) processing that
  avoid the shortcomings of prior systems
  improve performance of R-KWS in conventional databases


GRAPH-BASED PROCESSING

Basic Idea: given an inverted index I (on disk), GB traverses an undirected data graph G (in memory), searching for MTJNT (Minimal Total Join Networks of Tuples) results

Join Networks of Tuples (JNT) are connected acyclic components of G

A JNT is total if it contains all query keywords; it is a Minimal Total JNT (MTJNT) iff it is impossible to remove any node and find the remainder to be total
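The MTJNT definition can be made concrete with a small check. The following is a minimal sketch, not the authors' code: the names `is_total` and `is_mtjnt`, the adjacency-map representation, and the `keywords_of` mapping are all assumptions made for illustration. It also assumes that only leaves need to be tested for removal, since removing an inner node would disconnect the network.

```python
from typing import Dict, Hashable, Set


def is_total(nodes: Set[Hashable],
             keywords_of: Dict[Hashable, Set[str]],
             query: Set[str]) -> bool:
    """A node set is total if its tuples together contain every query keyword."""
    covered: Set[str] = set()
    for n in nodes:
        covered |= keywords_of.get(n, set()) & query
    return covered == query


def is_mtjnt(adj: Dict[Hashable, Set[Hashable]],
             keywords_of: Dict[Hashable, Set[str]],
             query: Set[str]) -> bool:
    """adj is the adjacency map of one JNT (a tree over tuples)."""
    nodes = set(adj)
    if not is_total(nodes, keywords_of, query):
        return False
    # Assumption: removing an inner node disconnects the tree, so only
    # leaves (degree <= 1) are candidates for removal.
    leaves = [n for n, nbrs in adj.items() if len(nbrs) <= 1]
    return all(not is_total(nodes - {leaf}, keywords_of, query)
               for leaf in leaves)
```

For a path a - b - c where a holds one keyword and c the other, the check succeeds; if c holds no keyword it fails, because c can be removed without losing totality.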


GSEARCH ALGORITHM

Basic Idea: the algorithm enumerates all possible trees in G rooted at sn

Result: a tree that corresponds to an MTJNT


GSearch maintains a queue Q of trees, each constituting a fraction of a potential MTJNT

Every tree is de-queued and expanded by adding one new node, resulting in a new tree

The new tree falls into one of three categories:
  it forms an MTJNT, and is included in the result set
  it has the potential to become an MTJNT, and is inserted in Q to be expanded later
  none of the previous, and the tree can be safely discarded

The algorithm terminates when Q becomes empty
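The queue-driven expansion described above can be sketched as follows. This is a simplified illustration rather than the GSearch algorithm of the paper: the function name `gsearch_sketch`, the `max_nodes` bound, and the `seen` set used to suppress duplicate trees are assumptions, and "has the potential to become an MTJNT" is approximated by "is still smaller than `max_nodes`".

```python
from collections import Counter, deque


def gsearch_sketch(graph, keywords_of, query, start, max_nodes):
    """Enumerate trees containing `start`; return those that are MTJNTs.

    graph:       dict node -> iterable of neighbour nodes (undirected)
    keywords_of: dict node -> set of keywords found in that tuple
    A tree is a pair (frozenset of nodes, frozenset of edges).
    """
    query = frozenset(query)

    def covered(nodes):
        found = set()
        for n in nodes:
            found |= keywords_of.get(n, set()) & query
        return found

    def is_mtjnt(nodes, edges):
        if covered(nodes) != query:
            return False
        degree = Counter(n for edge in edges for n in edge)
        leaves = [n for n in nodes if degree[n] <= 1]
        return all(covered(nodes - {leaf}) != query for leaf in leaves)

    first = (frozenset([start]), frozenset())
    if is_mtjnt(*first):
        return [first]
    results, seen = [], {first}
    queue = deque([first]) if max_nodes > 1 else deque()
    while queue:                                  # terminates when Q is empty
        nodes, edges = queue.popleft()
        for n in nodes:
            for m in graph.get(n, ()):
                if m in nodes:
                    continue
                tree = (nodes | {m}, edges | {frozenset((n, m))})
                if tree in seen:                  # duplicate suppression (assumption)
                    continue
                seen.add(tree)
                if is_mtjnt(*tree):
                    results.append(tree)          # category 1: a result
                elif len(tree[0]) < max_nodes:
                    queue.append(tree)            # category 2: expand later
                # category 3: otherwise the tree is discarded
    return results
```

An MTJNT is never re-queued here, because any larger tree containing it has a removable leaf and therefore cannot be minimal.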


GSearch computes the set of MTJNTs containing node sn; consequently, GB answers an R-KWS query q correctly, completely, and without duplicates.


OPERATOR-BASED PROCESSING

Basic Idea: query processing relies on Candidate Networks (CN)

Candidate Networks (CN) are projections of MTJNTs onto the expanded schema

A tuple s of relation S maps to node S{K} of EG(q) iff s contains all keywords in K but none of the terms in q \ K

An MTJNT projects to a unique CN
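The mapping from a tuple to its expanded-schema node can be illustrated in a few lines. This is a sketch under assumptions: the function names and the representation of a tuple as a set of terms are hypothetical, and real systems would tokenize attribute values first.

```python
def expanded_schema_node(relation, tuple_terms, query):
    """Return (relation, K), where K is exactly the set of query keywords
    contained in the tuple. The tuple therefore contains all of K and
    nothing from query \\ K, which is the mapping condition for S{K}."""
    k = frozenset(query) & frozenset(tuple_terms)
    return relation, k


def node_label(relation, k):
    """Render a node in the S{K} notation used on the slides."""
    return relation + "{" + ", ".join(sorted(k)) + "}"
```

For example, a Part tuple containing the terms "ivory" and "red" maps, for the query {"ivory", "1995"}, to the node Part{ivory}; a tuple with no query keyword maps to Part{}.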



OPTIMIZATIONS FOR CONTINUOUS GB

Basic Idea: keyword labeling, a simple and effective method to summarize reachable keywords for a given node

It improves performance by avoiding unnecessary calls to GSearch and by constraining graph traversals

A keyword label (KL) of format [k, h], stored at node n, indicates a path of h edges in the data graph, connecting n to an occurrence of keyword k
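One way to picture keyword labels is to compute them for a static graph with a breadth-first search per keyword. The sketch below is illustrative only: the function name, the `max_hops` bound, and the convention that a node holding keyword k gets the label [k, 0] are assumptions, and the paper maintains labels incrementally rather than recomputing them.

```python
from collections import deque


def keyword_labels(graph, keywords_of, max_hops):
    """Return labels[n][k] = h, the fewest edges from n to an occurrence of k.

    graph:       dict node -> iterable of neighbours (undirected)
    keywords_of: dict node -> set of keywords in that tuple
    Paths longer than max_hops are not labelled.
    """
    labels = {n: {} for n in graph}
    all_keywords = set().union(*keywords_of.values())
    for k in all_keywords:
        # Multi-source BFS starting from every node that contains k.
        frontier = deque()
        for n in graph:
            if k in keywords_of.get(n, ()):
                labels[n][k] = 0      # assumption: own keywords get h = 0
                frontier.append((n, 0))
        while frontier:
            n, h = frontier.popleft()
            if h == max_hops:
                continue
            for m in graph[n]:
                if k not in labels[m]:
                    labels[m][k] = h + 1
                    frontier.append((m, h + 1))
    return labels
```

On a path s - t - u where only u contains k, node s receives the label [k, 2].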


EXAMPLE

s:[k, 2] corresponds to a path connecting s to an occurrence of keyword k, via 2 edges


BENEFITS OF A MIN-COMPLETE LABELING

GSearch(G, q, s) is called only if node s can reach all query terms, that is, only if s stores a KL for every k ∈ q

In any other case, s is guaranteed not to participate in an MTJNT
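The filter in front of GSearch amounts to a membership test on the labels stored at s. A minimal sketch, assuming the labels of a node are kept in a dict from keyword to hop count (the function name and this representation are hypothetical):

```python
def should_call_gsearch(labels_of_s, query):
    """True only if node s stores a keyword label for every query keyword."""
    return all(k in labels_of_s for k in query)


def nodes_worth_searching(labels, query):
    """Keep only the nodes that can still participate in an MTJNT."""
    return [n for n, kls in labels.items() if should_call_gsearch(kls, query)]
```

A node whose labels cover only some of the query keywords is skipped without any graph traversal.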

KL-aware GSearch algorithm: inserts a tree into Q iff there exists a set NL of labels that meets the following criteria:

The KLs in NL can reach all missing keywords; that is, NL


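EXAMPLE - INTERMEDIATE TREES ABANDONED BY KL-AWARE GSEARCH (Tmax = 9)

The tree is lacking a query keyword

New nodes can only be added to one node, which can reach the missing keyword in four hops (its shortest path to that keyword)

The 2nd criterion is not satisfied: the tree already has size 6, and 6 + 4 exceeds Tmax, so the tree is abandoned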


PREDECESSOR-KL IMPLEMENTATION

Basic Idea: a predecessor-KL stored at node n is a triplet of the form [k, h, p]
  it indicates a path of length h, connecting n to an occurrence of keyword k
  p is n's predecessor on that path

Every node n must contain a predecessor-KL [k, h, p] for the shortest path leading from n through p to the occurrence of k

An arriving tuple s can itself contain a keyword, or create new paths between keywords and nodes
  both cases require KL insertions and updates
  each path contains at most Tmax edges
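The bookkeeping for predecessor-KLs at a single node can be sketched as a map keyed by (keyword, predecessor). This is an illustration under assumptions, not the paper's data structure: the function names, the dict layout, and the `max_hops` parameter standing in for the path-length bound are hypothetical.

```python
def offer_predecessor_kl(labels, k, h, p, max_hops):
    """Record [k, h, p] at a node if it is the shortest known path to k via p.

    labels: dict (keyword, predecessor) -> hop count, for one node.
    Returns True if the node's labels changed.
    """
    if h > max_hops:                 # paths beyond the bound are not labelled
        return False
    key = (k, p)
    if key in labels and labels[key] <= h:
        return False                 # an equal or shorter path via p is known
    labels[key] = h
    return True


def shortest_hops(labels, k):
    """Fewest hops to keyword k over all predecessors, or None if unreachable."""
    hops = [h for (kw, _), h in labels.items() if kw == k]
    return min(hops) if hops else None
```

Keeping one entry per predecessor means that two paths through different predecessors are both retained, while a second, longer path through the same predecessor is dropped.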


PREDECESSOR-KL EXAMPLE

A node must keep both of its KLs (one of them with h = 1) when they represent the shortest paths via two different predecessors

When both paths share the same predecessor, it suffices to keep a single KL through that node


TIME-KL

Basic Idea: more efficient labeling that does not require explicit removal

A time-KL is a triplet [k, h, t] indicating a path of length h to an occurrence of keyword k, which exists until time t

A KL [k, h1, t1] dominates another [k, h2, t2] iff (h1 ≤ h2 and t1 ≥ t2)

Result: the graph contains only the KLs that are not dominated by others
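The dominance rule keeps, for each keyword, only labels that are shortest for some period of time. The sketch below assumes that a label dominates another when it is no longer and expires no sooner; the comparison operators are not legible in this transcript, so that reading, the tuple layout (k, h, t), and the function names are assumptions.

```python
def dominates(a, b):
    """a = (k, h, t) dominates b if it is for the same keyword,
    is no longer, and expires no sooner (assumed reading of the rule)."""
    k1, h1, t1 = a
    k2, h2, t2 = b
    return k1 == k2 and h1 <= h2 and t1 >= t2


def insert_time_kl(labels, new):
    """Return the non-dominated label list after offering `new`."""
    if any(dominates(old, new) for old in labels):
        return labels                       # new label adds nothing
    return [old for old in labels if not dominates(new, old)] + [new]


def live_labels(labels, now):
    """Expired labels need no explicit removal; they are skipped on read."""
    return [kl for kl in labels if kl[2] > now]
```

With hypothetical expiry times 30 and 25 for a 2-hop and a 1-hop path, a 3-hop path expiring at 21 is dominated and never stored, while the first two are both kept.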


TIME-KL EXAMPLE

1) the node is connected to an occurrence of the keyword via 2 hops

2) the node is connected to an occurrence of the keyword via 1 hop

3) the node is connected to an occurrence of the keyword via 3 hops, and a node on this path expires at 21

Result:

(1) and (2) must be stored, as each indicates the shortest path for some period of time

(3) is not recorded, as it is longer and expires sooner than the other two


OPTIMIZATIONS FOR CONTINUOUS OB

Basic Idea:
  If a selection on a table (e.g., T{}) returns no tuples, all operator trees using this input can be discarded immediately
  For data streams, this is not permissible: even though the selection T{} does not currently produce tuples, it may do so in the future, and all operator trees must thus be maintained

Solution: optimizations that enable efficient OB R-KWS over data streams


OPERATOR MESH (1/3)

Basic Idea: sharing common subexpressions
  all operator trees are integrated into an operator mesh, reducing CPU cost (for evaluating joins) as well as memory overhead (for intermediate results)

The number of clusters in the mesh grows with |SR|, the number of streaming relations, and |K|, the number of query keywords

Each cluster contains the operator trees for all CN (Candidate Networks) discovered from a certain root node

The entire operator mesh has one leaf/source for each node of the extended schema

Maximum depth of the mesh is Tmax + 1

Number of edges depends on the schema complexity

Different clusters are interconnected only through their source operators

Joins from different clusters do not connect directly


OPERATOR MESH EXAMPLE

The example shows the shared execution of four operator trees


Algorithm:

The first node in a cluster corresponds to the root node, from which CNGen starts

Whenever the algorithm generates a new tree from an existing tree t (by adding a new child to a parent), a join operator for the new tree is added to the mesh
  its left child is t.op (the operator that was inserted when t was created)
  its right child is the source of the new child

For each tree t in CNGen, a pointer is maintained to the corresponding operator t.op, to decide where to place subsequent joins when t is expanded

The algorithm is initialized with the first tree's op pointing to the source of the root node
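The effect of placing each new join on top of t.op is that operator trees with a common prefix share the corresponding joins. The following sketch illustrates only that sharing: `build_mesh` is a hypothetical name, each operator tree is simplified to the ordered list of sources it joins, and join conditions are ignored.

```python
def build_mesh(operator_trees):
    """Merge left-deep operator trees into one mesh of shared joins.

    operator_trees: list of tuples of source names, e.g. ("S", "T", "U")
    meaning (S join T) join U.
    Returns a dict (left operator, right source) -> join operator.
    """
    joins = {}
    for sources in operator_trees:
        op = ("source", sources[0])        # initial pointer: the root's source
        for right in sources[1:]:
            key = (op, ("source", right))  # left child = t.op, right child = a source
            if key not in joins:
                joins[key] = ("join", len(joins))   # create the join only once
            op = joins[key]                # pointer for the expanded tree
    return joins
```

Four trees S-T, S-T-U, S-T-V and S-V need six joins when evaluated separately, but only four in the mesh, because S join T is created once and reused.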


PROBLEMS WITH OPERATOR MESH APPROACH

Example: assume that tuples from S{} and T{} join, and that V{}, U{, }, and V{, } are empty
  none of the parent joins requires the output of this join, because they do not receive right input

Worst case: the join's results expire before the arrival of any tuples from V{}, U{, }, or V{, }
  the join has then wasted CPU and memory, without any contribution to the query


DEMAND-DRIVEN OPERATOR EXECUTION (2/3)

The mesh is maintained in main memory throughout the lifespan of the query

A join is considered to be either
  running: the operator processes input
  sleeping: the operator ignores input

A join operator is sent to sleep if
  it has no input from the right child (a source), or
  all its parents are sleeping

Sending operators to sleep does not affect the result's correctness or completeness, because either
  the operator cannot produce output, or
  its output would not be consumed


DEMAND-DRIVEN OPERATOR EXECUTION - EXAMPLE

The example shows the state diagram for a join operator


States are characterized by two binary flags:
  d, indicating that at least one parent operator is running, and
  r, specifying that the operator's right input is not empty

An operator only runs in the topmost state (d/r)

When it leaves this state (transition 2 or 3), it goes to sleep (or halts), to wake up (or restart) later (transitions 9 and 10)

Operators exchange messages regarding their state, in order to ensure that all d and r flags are up to date

A join operator communicates changes (running/sleeping) to its left child, which adjusts its d flag
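The interplay of the d and r flags can be sketched with a small class. This is a simplified model, not the paper's state machine: the class and attribute names are hypothetical, the numbered transitions are not modelled, and it assumes that operators which emit final results always count as demanded.

```python
class JoinOp:
    """A join that runs only while it is demanded (d) and has right input (r)."""

    def __init__(self, name, left=None, emits_results=False):
        self.name = name
        self.left = left                    # left child join, or None for a source
        self.emits_results = emits_results  # assumption: result operators are always demanded
        self.r = False                      # right input is non-empty
        self.running_parents = 0            # counter behind the d flag
        self.running = False

    @property
    def d(self):
        return self.emits_results or self.running_parents > 0

    def set_right_input(self, non_empty):
        """Called when the right source starts or stops producing tuples."""
        self.r = non_empty
        self._refresh()

    def _refresh(self):
        should_run = self.d and self.r
        if should_run == self.running:
            return                          # no state change, nothing to report
        self.running = should_run
        if self.left is not None:
            # Tell the left child, which adjusts its count of running parents.
            self.left.running_parents += 1 if should_run else -1
            self.left._refresh()
```

With one join feeding two parents, the child keeps running when the first parent falls asleep, sleeps when the second one does, and wakes up again as soon as either parent restarts.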


DEMAND-DRIVEN OPERATOR EXECUTION - EXAMPLE

Assume U{, } stops producing output

Result: the join that reads from U{, }
  turns off its r flag
  goes to sleep (transition 2)
  calls its left child, which decreases its counter of running parents
  no further actions are needed for the left child, as it has other running parents


DEMAND-DRIVEN OPERATOR EXECUTION - EXAMPLE

If T{}, V{, } dries up too, then the corresponding join goes to sleep

Its left child decreases its counter of running parents (rParents = 0) and goes to sleep as well (transition 3)


EXAMPLE - ONLY TWO JOIN OPERATORS ARE RUNNING

One join does not generate results, due to lack of left input

When T{} begins producing output, it causes the join that reads from T{} to adjust its r flag, wake up (transition 9), and call Pstart on its left child

The left child restarts and informs its own left child


EXAMPLE - ALL JOINS RUN AGAIN EXCEPT TWO

Note: this method is not restricted to keyword search; it can equally benefit other data stream applications.


PARTIAL-MESH (3/3)

Basic Idea:

A Partial-Mesh (PM) is built at runtime and breaks the distinction between
  operator initialization
  tuple processing

The method maintains relatively few active operators in memory

It is each operator's responsibility to create its parents before it can produce output

An operator destroys its parents (and other operators up the tree) if it cannot supply them with input

In large meshes, many operators are idle; their absence does not affect the result's completeness, but dramatically reduces memory consumption
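The create-on-output and destroy-on-drought behaviour can be sketched with a set of active operators. This is an illustration only: the function names are hypothetical, and the `blueprint` mapping from an operator to its potential parents is assumed to be given, whereas the paper derives parents at runtime.

```python
def create_parents(active, blueprint, op):
    """Called when `op` is about to produce its first output."""
    for parent in blueprint.get(op, ()):
        active.add(parent)


def destroy_parents(active, blueprint, op):
    """Called when `op` can no longer supply its parents with input.
    Removes the parents and, recursively, operators further up the tree."""
    for parent in blueprint.get(op, ()):
        if parent in active:
            active.discard(parent)
            destroy_parents(active, blueprint, parent)
```

Starting from a single active source, parents appear only as output flows upward, and the whole sub-mesh above an operator disappears again once that operator runs dry.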


PARTIAL-MESH EXAMPLE

When the leftmost source S{} first produces output, it creates its two direct parents

When one of these parents generates results, it creates its own parents


When an operator outputs a first tuple t and instantiates a parent, this parent immediately probes t against T{}


PARTIAL-MESH ALGORITHM

Basic Idea: TreeGen is an algorithm for reconstructing the tree of an operator; it decides which parents to create

The algorithm checks the join condition of the operator

If a source is joined with a node of the left input's tree, the operator's tree is generated by adding that source as the rightmost child of that node


PARTIAL-MESH - EXAMPLES OF TREEGEN

TreeGen(S{}) returns a tree that contains a single node, S{}

Each parent is inserted in the mesh and connected to its left and right inputs

A call to TreeGen for a join operator returns the tree of that operator

The expansion of this tree reveals the parents of the operator


SNAPSHOT R-KWS QUERIES OVER TABLES (1/3)

Comparing the GB and OB implementations:

Experiments are focused on the tables Part (0.2M entries), Supplier (10K), PartSupp (0.8M), Customer (150K), Orders (1.5M), and LineItem (6M)

Two tables can join if and only if there is a foreign-key to primary-key relationship between them

The length of join sequences is restricted to Tmax, which ranges between 4 and 6


EXAMPLE - SEVEN SETS OF R-KWS QUERIES QS1-QS7

QS1, QS2: people's or companies' names (denoted as PeopleName), which appear in the columns Customer.Name, Supplier.Name, and Orders.Clerk (retrieve connections between multiple people)

QS3, QS4: terms from the name of a part, for example, "ivory", from the Part.Name attribute


QS5, QS6: years, which are present in LineItem.ShipDate, LineItem.CommitDate, LineItem.ReceiptDate, and Orders.OrderDate

QS7: terms from Part.Brand, Part.Mfgr, Part.Size, and Part.Container


EXAMPLE - PROCESSING TIME FOR QUERIES QS1-QS7

The figure depicts the total runtime (y-axis) of GB and OB, and the result set cardinality |R| (below the x-axis), for the seven query sets

The reported values are medians, after setting Tmax to 4, 5, and 6


SNAPSHOT R-KWS QUERIES OVER TABLES - CONCLUSION

GB
  (+) For conventional tables, GB is more efficient than OB
  (+) GSearch avoids duplicate results, which reduces the total cost
  (+) GB is preferable for datasets with frequent updates
  (-) Not efficient for queries involving numerous keywords and/or a large value of Tmax
  (-) Consumes a large amount of main memory to store the data graph
  Conclusion: on servers dedicated to R-KWS queries, GB is the best choice due to its high performance

OB
  (+) OB utilizes the functionality provided by a DBMS and thus can answer R-KWS queries using much less memory than GB
  Conclusion: on servers running multiple applications and only answering R-KWS queries infrequently, OB might be preferable due to its low memory footprint


CONTINUOUS R-KWS QUERIES OVER STREAMS (2/2)


CONTINUOUS R-KWS QUERIES OVER STREAMS - CONCLUSION

Full Mesh (FM) is usually the most CPU-efficient method for a single query

GB and Partial Mesh (PM) are more economical in terms of memory consumption


CONCLUSION - ADVANTAGES OF R-KWS

R-KWS handles broad query tasks whose complexity does not permit hand-coded structured queries

It presents considerable algorithmic challenges, because query processing has to explore a vast search space

The authors address these challenges through a series of contributions:
  they provide R-KWS semantics that are well defined and easily extensible to streaming environments
  they develop GB and OB processing techniques that match these semantics and remedy problems encountered in previous systems
  they adapt their framework to relational streams, and propose a wide range of optimizations
  they support their claims through an extensive set of experiments


CONCLUSION - FUTURE WORK

The authors plan to further improve R-KWS performance by means of indexing

They intend to integrate ranking into continuous R-KWS query processing

Example: if there is a sudden burst of results, it may be desirable to report only the top-k answers for the affected period