Senthil Gnanaprakasam System R Based Query Execution Optimization for Internet Information...

Post on 19-Dec-2015

Senthil Gnanaprakasam

http://tsangpo.eas.asu.edu

System R Based Query Execution Optimization for Internet Information Gathering

MS Committee

Dr Subbarao Kambhampati

Dr Chitta Baral

Dr Susan D Urban

MS Thesis Defense

Organization

Internet Information Gathering

Internet databases

Join ordering issues

Current methods and algorithms

Internet System R Algorithm

Implementation and conclusion

Internet Information Gathering

Need to order pizzas!

Information Gatherer

Other Sources

Information Integration

Uniform query interface

Uses mediated schema/virtual relations

Source descriptions & statistics

Query rewriting

Query plan optimization

Query execution engine

Wrapper

Wrapper

Wrapper

Global Data Model

Query Rewriting

• Sound: Does all returned data satisfy the given query?

• Complete: Does the query return all possible sound results?

Query Plan Optimization

• Subsumption of sources

• Quality of data

• Cost of accessing sources

• Ordering for optimized cost

Duplicate sources

Pay-per-use, network costs

Internet Databases

Types of internet databases

Form interfaced database

Text database

Intranet databases

Source Statistics

Access and transfer times vary widely with:

• Type of source: local, intranet, Internet

• Time of the day

• Number and speed of servers

• Reliability of connection

Binding Constraints

What attributes can be bound?

Books (isbn, title, author, publisher, price, pages^f)

YellowPages (lastName^f, firstName, zip, phone^b)

YellowPages (lastName^b, firstName, zip, phone^f)

(^b marks an attribute that must be bound; ^f marks an attribute that is free.)
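
A minimal sketch of how such binding constraints might be checked before a source is called. The `Source` class, the `is_callable` function, and the set-based representation are illustrative assumptions, not taken from the thesis implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A source relation plus the attributes that must be bound to call it.

    Hypothetical structure, for illustration only.
    """
    name: str
    attributes: tuple
    must_bind: frozenset


def is_callable(source, bound_attributes):
    """Return True when every required attribute already has a value."""
    return source.must_bind <= set(bound_attributes)


# The two YellowPages access patterns from the slide: one needs a phone
# number as input, the other needs a last name.
by_phone = Source("YellowPages", ("lastName", "firstName", "zip", "phone"),
                  frozenset({"phone"}))
by_name = Source("YellowPages", ("lastName", "firstName", "zip", "phone"),
                 frozenset({"lastName"}))

known = {"lastName"}          # values available so far (assumed)
callable_by_phone = is_callable(by_phone, known)
callable_by_name = is_callable(by_name, known)
```

With only a last name known, the name-keyed pattern is callable and the phone-keyed pattern is not.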

Join Ordering Issues

Importance of join ordering

Cost(P ⋈ Q) = a + 100t + 100a + 10t = 101a + 110t

Cost(Q ⋈ P) = a + 10t + 10a + 100t = 11a + 110t

Difference: 90a

a = Traditional: hard disk seek time; Internet: source access time
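
The two cost expressions can be reproduced with a small sketch. It assumes the slide's model is "fetch the outer source once, then call the inner source once per outer tuple", with `a` as the cost of one access and `t` as the cost of transferring one tuple. The function name and the numeric values are hypothetical.

```python
def dependent_join_cost(outer_tuples, inner_tuples, a, t):
    """Cost of fetching the outer source once, then calling the inner
    source once per outer tuple.

    a: cost of one source access; t: cost of transferring one tuple.
    inner_tuples is the total number of tuples returned by the inner calls.
    """
    fetch_outer = a + outer_tuples * t
    probe_inner = outer_tuples * a + inner_tuples * t
    return fetch_outer + probe_inner


# Assumed values; any positive numbers show the same gap of 90a.
a, t = 4.8, 0.025
cost_pq = dependent_join_cost(100, 10, a, t)   # a + 100t + 100a + 10t
cost_qp = dependent_join_cost(10, 100, a, t)   # a + 10t + 10a + 100t
saving = cost_pq - cost_qp
```

The transfer terms cancel, so the gap between the two orders depends only on the access cost, which is why a large access time makes join ordering matter.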

Internet Information Gathering

University (Intranet)

Administration

Student

Library (Machine)

Borrow

Information Gatherer

Lost

Books

Schema

• Books (isbn^b, title, author, publisher, price, pages)

• Student (id^b, firstName, lastName)

• Borrow (studentId, isbn, dateIssued)

• Lost (isbn)

SELECT *
FROM Student, Books, Lost, Borrow
WHERE Borrow.isbn = Lost.isbn AND Books.isbn = Lost.isbn AND Borrow.studentId = Student.id

Current Methods & Algorithms

Bound Is Easier

Binding values to attributes produces fewer results

• A valid heuristic in the absence of source statistics

Example:

|UniversityStudent(Name, Age, Sex, Dept)| = 50,000

|UniversityStudent(Name, Age, Sex, “CS”)| = 4,000

|UniversityStudent(Name, Age, “M”, “CS”)| = 2,000
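
A short sketch of the heuristic as a ranking rule: with no statistics, prefer the call with the most bound attributes. The dictionary representation (with `None` for a free attribute) and both function names are assumptions made for illustration.

```python
def bound_count(pattern):
    """Number of attributes bound to a constant (None marks a free one)."""
    return sum(value is not None for value in pattern.values())


def order_by_bound_is_easier(calls):
    """Sort candidate source calls so the most-bound call comes first."""
    return sorted(calls, key=bound_count, reverse=True)


# The three UniversityStudent calls from the example, in arbitrary order.
calls = [
    {"Name": None, "Age": None, "Sex": None, "Dept": None},
    {"Name": None, "Age": None, "Sex": "M", "Dept": "CS"},
    {"Name": None, "Age": None, "Sex": None, "Dept": "CS"},
]
ranked = order_by_bound_is_easier(calls)
```

The ranking matches the cardinalities in the example: the call with two bound attributes (2,000 results) comes first and the fully free call (50,000 results) comes last.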

Greedy Algorithm

• Based on the Bound-Is-Easier heuristic

• Gives importance to access costs, keeping the Internet scenario in mind

• Maintains a list of feasible binding patterns

• Views sources/binding patterns as either High Traffic Binding Pattern [HTBP] or Low Traffic

• At each iteration, attempts to access the most general feasible binding pattern, provided it is not an HTBP

• Sentinel checks ensure the algorithm proceeds to completion

Join Ordering Strategies

Bound is easier

UniversityStudent(Name, Age, Sex, Dept)

Student(Name, Id)

Greedy Algorithm

Student(Name, Id, Dept)

UniversityStudent(Age, Sex, Dept)

System R

Static query optimization algorithm

Exhaustive search

Dynamic programming approach

Retains candidate trees with smallest cost and prunes others

Shortcomings of System R

Binding restrictions not taken care of

Bushy trees not considered

Left linear: ((R1 ⋈ R2) ⋈ R3) ⋈ R4

Bushy: (R1 ⋈ R2) ⋈ (R3 ⋈ R4)
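
To give a feel for how much larger the bushy search space is, the sketch below counts join trees using two standard combinatorial formulas: n! left-linear orders, and (2n-2)!/(n-1)! ordered binary trees with n labelled leaves. The counts ignore binding restrictions and pruning, so they are upper bounds rather than the sizes any optimizer actually explores. The function names are hypothetical.

```python
from math import factorial


def left_linear_orders(n):
    """Left-linear trees over n relations: one per permutation."""
    return factorial(n)


def bushy_orders(n):
    """Ordered binary trees with n labelled leaves: (2n-2)! / (n-1)!.

    This count includes the left-linear and right-linear shapes.
    """
    return factorial(2 * n - 2) // factorial(n - 1)


# (relations, left-linear count, bushy count) for small queries.
table = [(n, left_linear_orders(n), bushy_orders(n)) for n in range(2, 7)]
```

For the four relations R1 to R4 this gives 24 left-linear orders against 120 trees once bushy shapes are allowed.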

Internet System R Algorithm

Internet System R

• Update bindings obtained from previous level of subplans

• Search all types of trees (left linear, bushy & right linear)

• Use full set of statistics to estimate sizes

• Preserves graceful degradation property

• Trade-off: planning vs. execution time

Algorithm

INPUTS
    S[1..m]: Array of all subgoals expanded w.r.t. binding patterns;
    Associated data structure along with the above, which will help calculate costs;
    Initialize NODE with PP = nil; Bindings = {φ}; Cost = 0.

IF S has a corresponding BestPlan
    return the corresponding join order
ENDIF

REPEAT
    FOR i = 1 TO number of feasible leaf nodes
        FOR j = 1 TO C(|Q|, i) DO
            LET LeftSubGoal = jth element in C(|Q|, i)
            LET RightSubGoal = S - LeftSubGoal
            Recursively call this algorithm with LeftSubGoal and RightSubGoal
            CurPlan = Make a new plan by joining the above resultant plans
            IF it has a lower cost than current BestPlan THEN
                update BestPlan
            ENDIF
        NEXT j
    NEXT i
UNTIL no child nodes are generated in an entire iteration
return join order of BestPlan
END.

Slide callouts: Perform feasibility check; Bushy trees; Pruning
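
The pseudocode can be approximated by a memoized recursion over subsets of subgoals. The sketch below is a simplified reading, not the thesis implementation: the `SOURCES` catalogue, its row counts and costs, the shared attribute names, and the "run the right subplan once per left row" cost model are all assumptions chosen to keep the example small.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical toy catalogue: name -> (attributes that must be bound,
# attributes produced, estimated rows per call, cost per call).
# "id" stands for the Student.id / Borrow.studentId join attribute.
SOURCES = {
    "Lost":    (frozenset(),         frozenset({"isbn"}),       20,  5.0),
    "Borrow":  (frozenset(),         frozenset({"isbn", "id"}), 500, 5.0),
    "Books":   (frozenset({"isbn"}), frozenset({"isbn"}),       1,   1.0),
    "Student": (frozenset({"id"}),   frozenset({"id"}),         1,   1.0),
}


@lru_cache(maxsize=None)   # plays the role of the stored BestPlan lookup
def best_plan(subgoals, bound=frozenset()):
    """Return (cost, rows, produced attributes, plan) or None if infeasible."""
    if len(subgoals) == 1:
        (name,) = subgoals
        must_bind, produces, rows, cost = SOURCES[name]
        if not must_bind <= bound:              # feasibility check
            return None
        return cost, rows, produces, name
    best = None
    members = sorted(subgoals)
    for size in range(1, len(members)):         # every split, so bushy trees too
        for left_names in combinations(members, size):
            left = best_plan(frozenset(left_names), bound)
            if left is None:
                continue
            # Bindings produced by the left subplan flow to the right one.
            right = best_plan(subgoals - frozenset(left_names), bound | left[2])
            if right is None:
                continue
            cost = left[0] + left[1] * right[0]  # toy cost model (assumed)
            rows = left[1] * right[1]
            if best is None or cost < best[0]:   # pruning: keep the cheapest
                best = (cost, rows, left[2] | right[2], (left[3], right[3]))
    return best


plan = best_plan(frozenset(SOURCES))
lost_books = best_plan(frozenset({"Lost", "Books"}))
```

In this toy catalogue `Books` alone is infeasible because `isbn` is unbound, while `Lost` followed by `Books` is feasible because `Lost` supplies the `isbn` values.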

Example

ISR (Student, Books, Borrow, Lost)

ISR (Student, Books, Borrow) ⋈ ISR (Lost)
ISR (Student, Books, Lost) ⋈ ISR (Borrow)
ISR (Student, Lost, Borrow) ⋈ ISR (Books)
ISR (Books, Lost, Borrow) ⋈ ISR (Student)
ISR (Lost) ⋈ ISR (Student, Books, Borrow)
ISR (Borrow) ⋈ ISR (Student, Books, Lost)
ISR (Books) ⋈ ISR (Student, Lost, Borrow)
ISR (Student) ⋈ ISR (Books, Lost, Borrow)
ISR (Student, Books) ⋈ ISR (Lost, Borrow)
ISR (Student, Lost) ⋈ ISR (Books, Borrow)
ISR (Student, Borrow) ⋈ ISR (Books, Lost)
ISR (Lost, Borrow) ⋈ ISR (Student, Books)
ISR (Books, Borrow) ⋈ ISR (Student, Lost)
ISR (Books, Lost) ⋈ ISR (Student, Borrow)
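
The fourteen top-level splits listed above are every ordered way to divide four relations into two non-empty groups (2^4 - 2 = 14). A minimal sketch that enumerates them, with the hypothetical helper name `ordered_splits`:

```python
from itertools import combinations


def ordered_splits(relations):
    """Yield every (left, right) pair of non-empty, disjoint subsets."""
    for size in range(1, len(relations)):
        for left in combinations(relations, size):
            right = tuple(r for r in relations if r not in left)
            yield left, right


splits = list(ordered_splits(("Student", "Books", "Borrow", "Lost")))

# Human-readable form of each split, e.g. "ISR (Student) x ISR (Books, ...)".
lines = [f"ISR ({', '.join(left)}) x ISR ({', '.join(right)})"
         for left, right in splits]
```

Each split appears in both orientations, which is what lets bindings flow from either side to the other.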

Feasibility Check

Books (isbn^b, title, author, publisher, price, pages)

Student (id^b, firstName, lastName)

Borrow (studentId, isbn, dateIssued)

Lost (isbn)

ISR (Student, Books) ⋈ ISR (Lost, Borrow)

Pruning

Plan                                      Cost

(Student ⋈ Borrow) ⋈ (Books ⋈ Lost)       2100

(Books ⋈ Lost) ⋈ (Student ⋈ Borrow)       2800   Pruned

Implementation & Conclusion

Implementation

• Java 2 on Sun Solaris

• Simulated sources with variable statistics

• Measured time-independent data

Experiments

• How important is access time?

• How big is the search space? How much is the overhead?

• Measure tradeoff between planning and execution time

• What if not all statistics are available?

Access & Transfer Times

Measured for an intranet & Internet source on a T1 line

Intranet: Access time ~ 92 ms, Transfer time ~ 25 ms/kb

Internet: Access time ~ 4.8 s, Transfer time ~ 25 ms/kb

Access time is a large enough cost to be worth spending time optimizing
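
A quick calculation with the measured figures shows how access time dominates once a plan makes several source calls. The `fetch_time` helper and the workload (10 calls moving 100 kb in total) are assumptions chosen for illustration.

```python
def fetch_time(access_s, transfer_s_per_kb, size_kb, calls=1):
    """Total time for `calls` source accesses that move `size_kb` overall."""
    return calls * access_s + size_kb * transfer_s_per_kb


INTRANET = (0.092, 0.025)   # access time (s), transfer time (s per kb)
INTERNET = (4.8, 0.025)

# Hypothetical workload: 10 calls returning 100 kb in total.
intranet_total = fetch_time(*INTRANET, size_kb=100, calls=10)
internet_total = fetch_time(*INTERNET, size_kb=100, calls=10)
```

Under these assumptions the transfer component is 2.5 s in both cases, while the access component grows from under 1 s on the intranet to 48 s on the Internet.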

ISR vs. SR

• Empirical evaluation of search space

• Trade-off between planning time and execution time

Larger search space → Higher planning cost

More optimal solution → Lower execution cost

Fast processors/slower network → Lower total cost

It is worth exploring a larger search space

Effect of Binding Patterns

• Search space increases with the number of sources

• Search space becomes more constrained and smaller as the number of binding restrictions increases

Size of search space is a non-trivial function of the given parameters

Graceful Degradation

Is searching a larger space a good idea when statistics are not fully available?

Yes: Preserves graceful degradation property of traditional System R

Related Work

Florescu, Levy et al

• Do not consider access costs important

• Bottom-up approach: build partial plans and check which ones lead to complete plans

• Consider planning time important and use a best-first method to produce plans before the algorithm runs to completion

Related Work

Kabra and DeWitt

• Generate a seemingly optimal plan

• Difficult to gather statistics

• Run time collection of statistics and modification of plan

Related Work

Urhan, Franklin et al

• Access costs are most important

• Concentrate on initial delays

• Change plan if delays exceed a limit

Contributions

• Analysis of importance of considering access costs

• Developed Internet System R Algorithm

– Included binding constraints over traditional System R

– Increased search space to include bushy trees

• Empirical evaluation of total cost compared to planning and execution costs

• Examined preservation of graceful degradation with a larger search space and partial statistics

Papers Published

• [LKG99] Eric Lambrecht, Subbarao Kambhampati and Senthil Gnanaprakasam. Optimizing Recursive Information Gathering Plans. In Proceedings of the IJCAI-99.

• [KG99] Subbarao Kambhampati and Senthil Gnanaprakasam. Optimizing source-call ordering in information gathering plans. Proceedings of the IJCAI-99 Workshop on Intelligent Information Integration.

Thank You

Questions?