Senthil Gnanaprakasam System R Based Query Execution Optimization for Internet Information...

Date posted: 19-Dec-2015

Page 1

Senthil Gnanaprakasam

http://tsangpo.eas.asu.edu

System R Based Query Execution Optimization for Internet Information Gathering

MS Committee

Dr Subbarao Kambhampati

Dr Chitta Baral

Dr Susan D Urban

MS Thesis Defense

Page 2

Organization

Internet Information Gathering

Internet databases

Join ordering issues

Current methods and algorithms

Internet System R Algorithm

Implementation and conclusion

Page 3

Internet Information Gathering


Page 4

Need to order pizzas!

Page 5

Need to order pizzas!

Page 6

Need to order pizzas!

Page 7

Need to order pizzas!

Page 8

Need to order pizzas!

Page 9

Need to order pizzas!

Information Gatherer

Other Sources

Page 10

Information Integration

• Uniform query interface

• Uses mediated schema/virtual relations

• Source descriptions & statistics

Query rewriting

Query plan optimization

Query execution engine

Wrapper

Wrapper

Wrapper

Global Data Model

Page 11

Query Rewriting

• Sound: Does all returned data satisfy the given query?

• Complete: Does the query return all possible sound results?

Page 12

Query Plan Optimization

• Subsumption of sources

• Quality of data

• Cost of accessing sources

• Ordering for optimized cost

Duplicate sources

Pay-per-use, network costs

Page 13

Internet Databases


Page 14

Types of internet databases

Form interfaced database

Text database

Intranet databases


Page 15

Source Statistics

Access and transfer time vary widely with

• Type of source: local, intranet, Internet

• Time of the day

• Number and speed of servers

• Reliability of connection

Page 16

Binding Constraints

What attributes can be bound?

Books [isbn, title, author, publisher, price, pages^f]

YellowPages (lastName^f, firstName, zip, phone^b)

YellowPages (lastName^b, firstName, zip, phone^f)

(b = bound, f = free)

Page 17

Join Ordering Issues

Page 18

Importance of join ordering

Cost(P ⋈ Q) = a + 100t + 100a + 10t

Cost(Q ⋈ P) = a + 10t + 10a + 100t

Difference: 90a

a = hard disk seek time (traditional) or Internet source access time
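The 90a gap between the two join orders can be checked with a few lines of Python. This is a sketch under one reading of the slide's formulas: `a` is the cost of one source access, `t` the cost of transferring one tuple, the outer relation is fetched with a single access, and the inner source is then accessed once per outer tuple. The function name `join_cost` and the sample values of `a` and `t` are illustrative assumptions.

```python
def join_cost(outer_size, inner_size, a, t):
    """Cost of outer ⋈ inner under the assumed model.

    One access plus outer_size transfers for the outer relation, then one
    access per outer tuple plus inner_size transfers for the inner relation.
    """
    return a + outer_size * t + outer_size * a + inner_size * t


# Illustrative units only: one access costs 1000, one tuple transfer costs 1.
a, t = 1000, 1
cost_pq = join_cost(100, 10, a, t)   # P has 100 tuples, Q has 10
cost_qp = join_cost(10, 100, a, t)
difference = cost_pq - cost_qp       # the transfer terms cancel, leaving 90 * a
```

The transfer terms are identical in both orders, so the whole difference comes from the number of source accesses, which is why a large access time makes join ordering matter.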

Page 19

Internet Information Gathering

University (Intranet)

Administration

Student

Library (Machine)

Borrow

Information Gatherer

Lost

Books

Page 20

Schema

• Books (isbn^b, title, author, publisher, price, pages)

• Student (id^b, firstName, lastName)

• Borrow (studentId, isbn, dateIssued)

• Lost (isbn)

SELECT *
FROM Student, Books, Lost, Borrow
WHERE Borrow.isbn = Lost.isbn AND Books.isbn = Lost.isbn AND Borrow.studentId = Student.id

Page 21

Current Methods & Algorithms

Page 22

Bound Is Easier

Binding values to attributes produces fewer results

• Valid heuristic in the absence of source statistics

Example:

|UniversityStudent(Name, Age, Sex, Dept)| = 50,000

|UniversityStudent(Name, Age, Sex, “CS”)| = 4,000

|UniversityStudent(Name, Age, “M”, “CS”)| = 2,000
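As a rough illustration of the heuristic, the sketch below orders source calls by how many of their arguments are bound to constants, most-bound first. The representation is an assumption made for this example: a call is a `(source, arguments)` pair and a bound argument is written as a quoted constant. It is not taken from the thesis implementation.

```python
def num_bound(args):
    """Count arguments bound to a constant (written here as quoted strings)."""
    return sum(1 for arg in args if arg.startswith('"'))


def bound_is_easier(calls):
    """Order calls so that the one with the most bound arguments comes first."""
    # sorted() is stable, so calls with equal counts keep their input order.
    return sorted(calls, key=lambda call: num_bound(call[1]), reverse=True)


calls = [
    ("UniversityStudent", ["Name", "Age", "Sex", "Dept"]),
    ("UniversityStudent", ["Name", "Age", '"M"', '"CS"']),
    ("UniversityStudent", ["Name", "Age", "Sex", '"CS"']),
]
ordered = bound_is_easier(calls)
```

On the slide's example this puts the call with two bound attributes (2,000 results) ahead of the call with one (4,000) and the unbound call (50,000), without consulting any statistics.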

Page 23

Greedy Algorithm

• Based on the Bound-Is-Easier heuristic

• Gives importance to access costs, keeping the Internet scenario in mind

• Maintains a list of feasible binding patterns

• Views sources/binding patterns as either High Traffic Binding Pattern [HTBP] or Low Traffic Binding Pattern

• At each iteration, attempts to access the most general feasible binding pattern if it is not an HTBP

• Sentinel checks ensure the algorithm proceeds to completion

Page 24

Join Ordering Strategies

Bound is easier

UniversityStudent(Name, Age, Sex, Dept)

Student(Name, Id)

Greedy Algorithm

Student(Name, Id, Dept)

UniversityStudent(Age, Sex, Dept)

System R

• Static query optimization algorithm

• Exhaustive search

• Dynamic programming approach

• Retains candidate trees with smallest cost and prunes others
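The dynamic programming idea behind System R can be sketched as follows: for every subset of relations, keep only the cheapest left linear order found so far and build larger subsets from those. This is a simplified illustration, not the original System R optimizer. The relation sizes in `SIZES` and the cost function `toy_join_cost` are invented for the example.

```python
from itertools import combinations
from math import prod

# Invented relation sizes, used only by the toy cost function below.
SIZES = {"A": 10, "B": 100, "C": 1000}


def toy_join_cost(left, right):
    """Toy cost: size of the (unfiltered) left result times the right relation size."""
    return prod(SIZES[r] for r in left) * SIZES[right]


def best_left_linear(relations, join_cost):
    """Return (cost, order) of the cheapest left linear join order."""
    # Base case: a single relation costs nothing to "join".
    best = {frozenset([r]): (0, [r]) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            group = frozenset(subset)
            candidates = []
            for last in subset:            # relation joined last onto the rest
                rest = group - {last}
                cost, order = best[rest]
                candidates.append((cost + join_cost(rest, last), order + [last]))
            # Retain the cheapest candidate for this subset and prune the others.
            best[group] = min(candidates, key=lambda c: c[0])
    return best[frozenset(relations)]


cost, order = best_left_linear(["A", "B", "C"], toy_join_cost)
```

With these toy sizes the largest relation ends up joined last. Only one plan per subset survives, which is the pruning the slide refers to.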

Page 25

Shortcomings of System R

• Binding restrictions not taken care of

• Bushy trees not considered

[Figure: left linear and bushy join trees over R1, R2, R3, R4]

Page 26

Internet System R Algorithm

Page 27

Internet System R

•Update bindings obtained from previous level of subplans

•Search all types of trees (left linear, bushy & right linear)

•Use full set of statistics to estimate sizes

•Preserves graceful degradation property

•Trade off: Planning vs. execution time

Page 28

Algorithm

INPUTS
    S[1..m]: array of all subgoals expanded w.r.t. binding patterns
    Associated data structure, kept alongside S, that helps calculate costs
    Initialize NODE with PP = nil; Bindings = {φ}; Cost = 0

IF S has a corresponding BestPlan
    return the corresponding join order
ENDIF
REPEAT
    FOR i = 1 TO number of feasible leaf nodes
        FOR j = 1 TO C(|Q|, i) DO
            LET LeftSubGoal = jth of the C(|Q|, i) combinations
            LET RightSubGoal = S - LeftSubGoal
            Recursively call this algorithm with LeftSubGoal and RightSubGoal
            CurPlan = new plan made by joining the two resulting plans
            IF CurPlan has a lower cost than the current BestPlan THEN
                update BestPlan
            ENDIF
        NEXT j
    NEXT i
UNTIL no child nodes are generated in an entire iteration
return join order of BestPlan
END.

Slide annotations: Perform feasibility check; Bushy trees; Pruning
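The recursive procedure above can be approximated in Python as a memoized search over every split of the subgoal set. This is a sketch under stated assumptions, not the thesis implementation. The source table `SOURCES`, the constants `ACCESS` and `TRANSFER`, and the cost estimate in `leaf_cost` are all invented for illustration. Attribute lists are trimmed, and Borrow.studentId is renamed `id` so that it can bind Student.id.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical source descriptions: attributes, attributes that must be bound,
# and a made-up result size used by the toy cost model below.
SOURCES = {
    "Books":   {"attrs": {"isbn", "title"},  "needs": {"isbn"}, "size": 1000},
    "Student": {"attrs": {"id", "lastName"}, "needs": {"id"},   "size": 500},
    "Borrow":  {"attrs": {"id", "isbn"},     "needs": set(),    "size": 200},
    "Lost":    {"attrs": {"isbn"},           "needs": set(),    "size": 20},
}
ACCESS, TRANSFER = 100, 1  # assumed cost per source call and per tuple


def leaf_cost(name, bound):
    src = SOURCES[name]
    # Toy estimate: a call with some attribute bound returns a tenth of the tuples.
    size = src["size"] // 10 if src["attrs"] & bound else src["size"]
    return ACCESS + TRANSFER * size


@lru_cache(maxsize=None)
def isr(subgoals, bound=frozenset()):
    """Return (cost, plan) for the cheapest feasible plan, or None if none exists."""
    if len(subgoals) == 1:
        (name,) = subgoals
        if not SOURCES[name]["needs"] <= bound:    # feasibility check
            return None
        return leaf_cost(name, bound), name
    best = None
    for k in range(1, len(subgoals)):              # all split sizes, so bushy trees too
        for left in map(frozenset, combinations(sorted(subgoals), k)):
            left_plan = isr(left, bound)
            if left_plan is None:
                continue
            # Bindings produced by the left subplan are passed to the right one.
            produced = frozenset().union(*(SOURCES[s]["attrs"] for s in left))
            right_plan = isr(subgoals - left, bound | produced)
            if right_plan is None:
                continue
            cost = left_plan[0] + right_plan[0]
            if best is None or cost < best[0]:     # pruning: keep only the cheapest
                best = (cost, (left_plan[1], right_plan[1]))
    return best


result = isr(frozenset(SOURCES))
```

The memoization plays the role of the BestPlan lookup at the top of the pseudocode: a subgoal set with the same incoming bindings is planned only once.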

Page 29

Example

ISR (Student, Books, Borrow, Lost)

ISR (Student, Books, Borrow) ⋈ ISR (Lost)
ISR (Student, Books, Lost) ⋈ ISR (Borrow)
ISR (Student, Lost, Borrow) ⋈ ISR (Books)
ISR (Books, Lost, Borrow) ⋈ ISR (Student)
ISR (Lost) ⋈ ISR (Student, Books, Borrow)
ISR (Borrow) ⋈ ISR (Student, Books, Lost)
ISR (Books) ⋈ ISR (Student, Lost, Borrow)
ISR (Student) ⋈ ISR (Books, Lost, Borrow)
ISR (Student, Books) ⋈ ISR (Lost, Borrow)
ISR (Student, Lost) ⋈ ISR (Books, Borrow)
ISR (Student, Borrow) ⋈ ISR (Books, Lost)
ISR (Lost, Borrow) ⋈ ISR (Student, Books)
ISR (Books, Borrow) ⋈ ISR (Student, Lost)
ISR (Books, Lost) ⋈ ISR (Student, Borrow)

Page 30

Feasibility Check

Books (isbn^b, title, author, publisher, price, pages)

Student (id^b, firstName, lastName)

Borrow (studentId, isbn, dateIssued)

Lost (isbn)

ISR (Student, Books) ⋈ ISR (Lost, Borrow)

University (Intranet)

Administration

Student

Library (Machine)

Borrow

Information Gatherer

Lost

Books
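A feasibility check for a given source order can be sketched as below: walk the order and make sure each source's required attributes have been bound by the sources called before it. The dictionaries `NEEDS` and `GIVES` are a hypothetical encoding of this slide's schema. One labeled assumption: Borrow.studentId is treated as the same attribute as Student.id.

```python
# Attributes each source needs bound before it can be called (the b marks).
NEEDS = {"Books": {"isbn"}, "Student": {"id"}, "Borrow": set(), "Lost": set()}

# Attributes each source binds once called.
# Assumption: Borrow.studentId is recorded as "id" so it can bind Student.id.
GIVES = {
    "Books": {"isbn", "title", "author", "publisher", "price", "pages"},
    "Student": {"id", "firstName", "lastName"},
    "Borrow": {"id", "isbn", "dateIssued"},
    "Lost": {"isbn"},
}


def feasible(order):
    """Return True if every source's required attributes are bound when it is called."""
    bound = set()
    for source in order:
        if not NEEDS[source] <= bound:
            return False
        bound |= GIVES[source]
    return True
```

Under this encoding, calling Student and Books before Lost and Borrow is infeasible because nothing has bound `id` or `isbn` yet, while the reverse order passes the check.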

Page 31

Pruning

Plan                                    Cost

(Student ⋈ Borrow) ⋈ (Books ⋈ Lost)     2100

(Books ⋈ Lost) ⋈ (Student ⋈ Borrow)     2800 (pruned)

Page 32

Implementation & Conclusion

Page 33

Implementation

• Java 2 on Sun Solaris

• Simulated sources with variable statistics

• Measured time-independent data

Page 34

Experiments

• How important is access time?

• How big is the search space? How much is the overhead?

• Measure tradeoff between planning and execution time

• What if all statistics are not available?

Page 35

Access & Transfer Times

Measured for an intranet & Internet source on a T1 line

Intranet: Access time ~ 92ms, Transfer time ~ 25ms/kb

Internet: Access time ~ 4.8s, Transfer time ~ 25ms/kb

Access time is a large enough cost to be worth spending time optimizing
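To see how much the access time dominates, the measured figures can be plugged into a simple linear model: one access plus a per-kb transfer charge. The model and the function name `fetch_time` are assumptions made for illustration, and the 10 kb payload is an arbitrary example size.

```python
def fetch_time(access_s, transfer_s_per_kb, size_kb):
    """Seconds to fetch size_kb from a source: one access plus linear transfer."""
    return access_s + transfer_s_per_kb * size_kb


# Figures from the slide, converted to seconds.
intranet = fetch_time(0.092, 0.025, 10)   # about 0.34 s for 10 kb
internet = fetch_time(4.8, 0.025, 10)     # about 5.05 s for 10 kb
```

For a small result the transfer component is the same in both cases, so nearly all of the difference comes from the access time.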

Page 36

ISR vs. SR

• Empirical evaluation of search space

• Trade off between planning time and execution time

Larger search space → Higher planning cost

More optimal solution → Lower execution cost

Fast processors/slower network → Lower total cost

It is worth exploring a larger search space

Page 37

Effect of Binding Patterns

• Search space increases with the number of sources

• Search space becomes smaller and more constrained as the number of binding restrictions increases

Size of search space is a non-trivial function of the given parameters

Page 38

Graceful Degradation

Is searching a larger space a good idea when statistics are not fully available?

Yes: Preserves graceful degradation property of traditional System R

Page 39

Related Work

Florescu, Levy et al

• Do not consider access costs to be important

• Bottom-up approach: build partial plans and check which ones lead to complete plans

• Consider planning time important and use a best-first method to produce plans before the algorithm runs to completion

Page 40

Related Work

Kabra and DeWitt

• Generate a seemingly optimal plan

• Difficult to gather statistics

• Run time collection of statistics and modification of plan

Page 41

Related Work

Urhan, Franklin et al

• Access costs are most important

• Concentrate on initial delays

• Change plan if delays exceed a limit

Page 42

Contributions

• Analysis of importance of considering access costs

• Developed Internet System R Algorithm

– Included binding constraints over traditional System R

– Increased search space to include bushy trees

• Empirical evaluation of total cost compared to planning and execution costs

• Examined preservation of graceful degradation with a larger search space and partial statistics

Page 43

Papers Published

• [LKG99] Eric Lambrecht, Subbarao Kambhampati and Senthil Gnanaprakasam. Optimizing Recursive Information Gathering Plans. In Proceedings of IJCAI-99.

• [KG99] Subbarao Kambhampati and Senthil Gnanaprakasam. Optimizing source-call ordering in information gathering plans. In Proceedings of the IJCAI-99 Workshop on Intelligent Information Integration.

Page 44

Thank You

Questions?