Date posted: 19-Dec-2015
Senthil Gnanaprakasam
http://tsangpo.eas.asu.edu
System R Based Query Execution Optimization for Internet Information Gathering
MS Committee
Dr Subbarao Kambhampati
Dr Chitta Baral
Dr Susan D Urban
MS Thesis Defense
Organization
Internet Information Gathering
Internet databases
Join ordering issues
Current methods and algorithms
Internet System R Algorithm
Implementation and conclusion
Internet Information Gathering
Need to order pizzas!
[Figure labels: Information Gatherer; Other Sources]
Information Integration
• Uniform query interface
• Uses mediated schema/virtual relations
• Source descriptions & statistics
[Figure labels: Query rewriting; Query plan optimization; Query execution engine; Wrapper (three shown); Global Data Model]
Query Rewriting
• Sound: Does all returned data satisfy the given query?
• Complete: Does the query return all possible sound results?
Query Plan Optimization
• Subsumption of sources (duplicate sources)
• Quality of data
• Cost of accessing sources (pay-per-use, network costs)
• Ordering for optimized cost
Internet Databases
Types of internet databases
• Form-interfaced databases
• Text databases
• Intranet databases
Source Statistics
Access and transfer times vary widely with:
• Type of source: local, intranet, Internet
• Time of day
• Number and speed of servers
• Reliability of connection
Binding Constraints
What attributes can be bound?
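Books (isbn, title, author, publisher, price, pages^f)
YellowPages (lastName^f, firstName, zip, phone^b)
YellowPages (lastName^b, firstName, zip, phone^f)
(Superscript b: the attribute must be bound to call the source; superscript f: the attribute is free.)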
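A minimal sketch of how such a binding constraint might be checked, assuming each access pattern is described by its attribute list plus the set of attributes that must be bound. The names `Source` and `can_call` are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One access pattern of a source (hypothetical representation)."""
    name: str
    attributes: tuple
    must_bind: frozenset  # attributes marked b: a value is required to call


def can_call(source, bound):
    """Return True if every required attribute already has a value."""
    return source.must_bind <= set(bound)


# The two YellowPages access patterns from the slide.
ATTRS = ("lastName", "firstName", "zip", "phone")
by_phone = Source("YellowPages", ATTRS, frozenset({"phone"}))
by_last_name = Source("YellowPages", ATTRS, frozenset({"lastName"}))

print(can_call(by_phone, {"phone"}))     # phone is known: callable
print(can_call(by_phone, {"lastName"}))  # phone is missing: not callable
```

Modelling the two patterns as separate `Source` objects keeps the check a plain subset test; a planner would try each pattern of a source in turn.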
Join Ordering Issues
Importance of join ordering
Cost(P ⋈ Q): a + 100t + 100a + 10t
Cost(Q ⋈ P): a + 10t + 10a + 100t
Difference: 90a
Traditional databases: a is the hard disk seek time
Internet: a is the source access time
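The two costs can be reproduced with a small calculation. This sketch assumes one reading of the slide's formulas: the outer source is accessed once and its tuples are transferred, then the inner source is accessed once per outer tuple and its matching tuples are transferred. The function name and the sample values a = 5, t = 1 are illustrative only.

```python
def dependent_join_cost(n_outer, n_inner, a, t):
    """Cost of calling the outer source once, then the inner source per outer tuple.

    a: cost of one source access, t: cost of transferring one tuple.
    """
    outer = a + n_outer * t            # one access, n_outer tuples shipped
    inner = n_outer * a + n_inner * t  # one access per outer tuple
    return outer + inner


a, t = 5, 1                                   # illustrative units
p_then_q = dependent_join_cost(100, 10, a, t)  # P yields 100 tuples, Q yields 10
q_then_p = dependent_join_cost(10, 100, a, t)

print(p_then_q, q_then_p, p_then_q - q_then_p)  # 615 165 450
```

The transfer terms cancel, so the difference is always 90a: the ordering matters exactly as much as access time is expensive.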
Internet Information Gathering
[Figure labels: University (Intranet); Administration; Student; Library (Machine); Borrow; Lost; Books; Information Gatherer]
Schema
• Books (isbn^b, title, author, publisher, price, pages)
• Student (id^b, firstName, lastName)
• Borrow (studentId, isbn, dateIssued)
• Lost (isbn)
SELECT *
FROM Student, Books, Lost, Borrow
WHERE Borrow.isbn = Lost.isbn AND Books.isbn = Lost.isbn AND Borrow.studentId = Student.id
Current Methods & Algorithms
Bound Is Easier
Binding values to attributes produces fewer results
• Valid heuristic in the absence of source statistics
Example:
|UniversityStudent(Name, Age, Sex, Dept)| = 50,000
|UniversityStudent(Name, Age, Sex, “CS”)| = 4,000
|UniversityStudent(Name, Age, “M”, “CS”)| = 2,000
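A small sketch of the heuristic applied to the example above: with no statistics available, prefer the call that binds the most attributes. The dictionary layout and the helper `bound_count` are hypothetical; the sizes are the ones on the slide and are included only to show that the ranking agrees with them.

```python
# Candidate calls to UniversityStudent, with the result sizes from the slide.
calls = [
    {"bound": {}, "size": 50000},
    {"bound": {"Dept": "CS"}, "size": 4000},
    {"bound": {"Sex": "M", "Dept": "CS"}, "size": 2000},
]


def bound_count(call):
    """Number of attributes given a value in this call."""
    return len(call["bound"])


# Bound-is-easier: rank calls by how many attributes they bind, most first.
ranked = sorted(calls, key=bound_count, reverse=True)
print([c["size"] for c in ranked])  # [2000, 4000, 50000]
```

The heuristic never looks at `size`; it only happens to match the true ordering here, which is why it is a fallback rather than a cost model.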
Greedy Algorithm
• Based on the Bound-Is-Easier heuristic
• Gives importance to access costs, keeping the Internet scenario in mind
• Maintains a list of feasible binding patterns
• Views sources/binding patterns as either High Traffic Binding Pattern [HTBP] or Low Traffic Binding Pattern
• At each iteration, attempts to access the most general feasible binding pattern if it is not in HTBP
• Sentinel checks ensure the algorithm proceeds to completion
Join Ordering Strategies
Bound is easier
UniversityStudent(Name, Age, Sex, Dept)
Student(Name, Id)
Greedy Algorithm
Student(Name, Id, Dept)
UniversityStudent(Age, Sex, Dept)
System R
• Static query optimization algorithm
• Exhaustive search
• Dynamic programming approach
• Retains candidate trees with smallest cost and prunes others
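The System R bullets describe dynamic programming over subsets of relations. The following is a generic sketch of that idea restricted to left-linear trees, not the thesis's code. The cardinalities in `SIZES` and the cross-product style `toy_join_cost` are made-up assumptions chosen only so the example runs.

```python
from itertools import combinations

SIZES = {"R1": 1000, "R2": 10, "R3": 100}  # hypothetical cardinalities


def toy_join_cost(order, r):
    """Toy cost: tuples on the left (as a cross product) times tuples in r."""
    left = 1
    for name in order:
        left *= SIZES[name]
    return left * SIZES[r]


def system_r_left_linear(relations, join_cost):
    """Return (cost, join order) of the cheapest left-linear plan."""
    # best[S] is the cheapest plan found for the subset S; others are pruned.
    best = {frozenset([r]): (0, (r,)) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            s = frozenset(subset)
            for r in subset:  # r is the last relation joined in
                cost, order = best[s - {r}]
                total = cost + join_cost(order, r)
                if s not in best or total < best[s][0]:
                    best[s] = (total, order + (r,))
    return best[frozenset(relations)]


cost, order = system_r_left_linear(["R1", "R2", "R3"], toy_join_cost)
print(cost, order)
```

Only one plan per subset survives each level, which is the pruning step the slide refers to; the largest relation ends up joined last under this toy cost.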
Shortcomings of System R
Binding restrictions not taken care of
Bushy trees not considered
[Figure: a left-linear join tree, ((R1 ⋈ R2) ⋈ R3) ⋈ R4, next to a bushy join tree, (R1 ⋈ R2) ⋈ (R3 ⋈ R4)]
Internet System R Algorithm
Internet System R
• Update bindings obtained from previous level of subplans
• Search all types of trees (left linear, bushy & right linear)
• Use full set of statistics to estimate sizes
• Preserves graceful degradation property
• Trade-off: Planning vs. execution time
Algorithm
INPUTS
  S[1..m]: Array of all subgoals expanded w.r.t. binding patterns
  Associated data structure along with the above, which will help calculate costs
  Initialize NODE with PP = nil; Bindings = {φ}; Cost = 0
IF S has a corresponding BestPlan
  return the corresponding join order
ENDIF
REPEAT
  FOR i = 1 TO number of feasible leaf nodes
    FOR j = 1 TO C(|Q|, i) DO
      LET LeftSubGoal = jth element in C(|Q|, i)
      LET RightSubGoal = S - LeftSubGoal
      Recursively call this algorithm with LeftSubGoal and RightSubGoal
      CurPlan = Make a new plan by joining the above resultant plans
      IF it has a lower cost than current BestPlan THEN
        update BestPlan
      ENDIF
    NEXT j
  NEXT i
UNTIL no child nodes are generated in an entire iteration
return join order of BestPlan
END.
Slide callouts: Perform feasibility check; Bushy trees; Pruning
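A rough, runnable sketch of the recursion described above, under heavy assumptions: it is not the thesis's implementation. The `SOURCES` table (variables, required bindings, access cost, size), the toy cost model in which the right subplan runs once per left row, the row estimate, and the choice to memoize on (subgoals, bound variables) are all hypothetical. Joined attributes share one variable name (`sid`, `isbn`).

```python
from functools import lru_cache
from itertools import combinations

# name: (variables, variables that must be bound, access cost, unbound size)
SOURCES = {
    "Student": ({"sid", "firstName", "lastName"}, {"sid"}, 1.0, 50000),
    "Books": ({"isbn", "title", "author"}, {"isbn"}, 5.0, 100000),
    "Borrow": ({"sid", "isbn", "dateIssued"}, set(), 1.0, 2000),
    "Lost": ({"isbn"}, set(), 1.0, 20),
}


def variables(names):
    """All variables produced by the given sources."""
    out = set()
    for n in names:
        out |= SOURCES[n][0]
    return frozenset(out)


@lru_cache(maxsize=None)
def isr(subgoals, bound=frozenset()):
    """Return (cost, rows, tree) of the cheapest feasible plan, or None."""
    if len(subgoals) == 1:
        (name,) = subgoals
        attrs, must_bind, access, size = SOURCES[name]
        if not must_bind <= bound:  # feasibility check
            return None
        rows = 1 if attrs & bound else size  # toy estimate
        return (access, rows, name)
    best = None
    members = sorted(subgoals)
    for i in range(1, len(members)):  # all split sizes: linear and bushy trees
        for left_names in combinations(members, i):
            left = frozenset(left_names)
            lp = isr(left, bound)
            if lp is None:
                continue
            # Bindings produced by the left subplan flow to the right one.
            rp = isr(subgoals - left, bound | variables(left))
            if rp is None:
                continue
            cost = lp[0] + lp[1] * rp[0]  # right side runs once per left row
            if best is None or cost < best[0]:  # pruning
                best = (cost, lp[1] * rp[1], (lp[2], rp[2]))
    return best


plan = isr(frozenset(SOURCES))
print(plan)
```

Under these assumptions, every cheapest plan starts from Lost, the small source with no binding requirement, and any plan that tries to call Student or Books before their keys are bound is rejected as infeasible.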
Example
ISR (Student, Books, Borrow, Lost)
ISR (Student, Books, Borrow) ⋈ ISR (Lost)
ISR (Student, Books, Lost) ⋈ ISR (Borrow)
ISR (Student, Lost, Borrow) ⋈ ISR (Books)
ISR (Books, Lost, Borrow) ⋈ ISR (Student)
ISR (Lost) ⋈ ISR (Student, Books, Borrow)
ISR (Borrow) ⋈ ISR (Student, Books, Lost)
ISR (Books) ⋈ ISR (Student, Lost, Borrow)
ISR (Student) ⋈ ISR (Books, Lost, Borrow)
ISR (Student, Books) ⋈ ISR (Lost, Borrow)
ISR (Student, Lost) ⋈ ISR (Books, Borrow)
ISR (Student, Borrow) ⋈ ISR (Books, Lost)
ISR (Lost, Borrow) ⋈ ISR (Student, Books)
ISR (Books, Borrow) ⋈ ISR (Student, Lost)
ISR (Books, Lost) ⋈ ISR (Student, Borrow)
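The fourteen top-level splits listed above can be generated mechanically. This sketch enumerates every ordered split of a set of subgoals into a non-empty left part and a non-empty right part; the function name `ordered_splits` is hypothetical.

```python
from itertools import combinations


def ordered_splits(subgoals):
    """Yield every (left, right) split with both sides non-empty."""
    items = sorted(subgoals)
    for i in range(1, len(items)):
        for left in combinations(items, i):
            right = tuple(x for x in items if x not in left)
            yield left, right


splits = list(ordered_splits({"Student", "Books", "Borrow", "Lost"}))
print(len(splits))  # 14
for left, right in splits:
    print(f"ISR{left} ⋈ ISR{right}")
```

For n subgoals there are 2^n - 2 such splits at the top level alone, which is why the search space grows quickly once bushy trees are allowed.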
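Feasibility Check
Books (isbn^b, title, author, publisher, price, pages)
Student (id^b, firstName, lastName)
Borrow (studentId, isbn, dateIssued)
Lost (isbn)
ISR (Student, Books) ⋈ ISR (Lost, Borrow)
Student and Books each require a bound attribute (id and isbn), so the left subplan cannot be executed before any bindings are available.
[Figure labels: University (Intranet); Administration; Student; Library (Machine); Borrow; Lost; Books; Information Gatherer]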
Pruning
Plan                                     Cost
(Student ⋈ Borrow) ⋈ (Books ⋈ Lost)      2100
(Books ⋈ Lost) ⋈ (Student ⋈ Borrow)      2800 (pruned)
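The pruning step can be sketched as keeping only the cheapest plan for each set of joined sources. The helper `prune` and the tuple layout are hypothetical; the two plans and their costs are the ones shown on the slide.

```python
def prune(candidates):
    """Keep only the cheapest plan for each set of joined sources."""
    best = {}
    for plan, sources, cost in candidates:
        key = frozenset(sources)
        if key not in best or cost < best[key][1]:
            best[key] = (plan, cost)
    return best


all_four = {"Student", "Borrow", "Books", "Lost"}
candidates = [
    ("(Student ⋈ Borrow) ⋈ (Books ⋈ Lost)", all_four, 2100),
    ("(Books ⋈ Lost) ⋈ (Student ⋈ Borrow)", all_four, 2800),
]

kept = prune(candidates)
print(kept[frozenset(all_four)])
```

Both candidates cover the same four sources, so only the plan costing 2100 is retained for later levels.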
Implementation & Conclusion
Implementation
• Java 2 on Sun Solaris
• Simulated sources with variable statistics
• Measured time-independent data
Experiments
• How important is access time?
• How big is the search space? How much is the overhead?
• Measure tradeoff between planning and execution time
• What if all statistics are not available?
Access & Transfer Times
Measured for an intranet source and an Internet source on a T1 line
Intranet: access time ~ 92 ms, transfer time ~ 25 ms/kb
Internet: access time ~ 4.8 s, transfer time ~ 25 ms/kb
Access time is a large enough cost that it is worth spending time to optimize
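To see how much the access term dominates, the measurements above can be plugged into a simple model in which one call costs the access time plus the transfer time for the data returned. The model, the function name, and the 10 kb response size are illustrative assumptions.

```python
def fetch_time_ms(access_ms, transfer_ms_per_kb, size_kb):
    """Time for one source call: fixed access time plus transfer time."""
    return access_ms + transfer_ms_per_kb * size_kb


size_kb = 10  # hypothetical response size
intranet = fetch_time_ms(92, 25, size_kb)
internet = fetch_time_ms(4800, 25, size_kb)

print(intranet, internet)  # 342 5050
```

For a response of this size, almost all of the Internet call is access time, so saving calls matters far more than saving transferred data.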
ISR vs. SR
• Empirical evaluation of search space
• Trade-off between planning time and execution time
Larger search space → Higher planning cost
More optimal solution → Lower execution cost
Fast processors/slower network → Lower total cost
It is worth exploring a larger search space
Effect of Binding Patterns
• Search space increases with the number of sources
• Search space becomes smaller and more constrained as the number of binding restrictions increases
Size of search space is a non-trivial function of the given parameters
Graceful Degradation
Is searching a larger space a good idea when statistics are not fully available?
Yes: Preserves graceful degradation property of traditional System R
Related Work
Florescu, Levy et al
• Do not consider access costs important
• Bottom-up approach: build partial plans and check which ones lead to complete plans
• Consider planning time important and use a best-first method to produce plans before the algorithm runs to completion
Related Work
Kabra and DeWitt
• Generate a seemingly optimal plan
• Difficult to gather statistics
• Run time collection of statistics and modification of plan
Related Work
Urhan, Franklin et al
• Access costs are most important
• Concentrate on initial delays
• Change plan if delays exceed a limit
Contributions
• Analysis of importance of considering access costs
• Developed Internet System R Algorithm
  – Included binding constraints over traditional System R
  – Increased search space to include bushy trees
• Empirical evaluation of total cost compared to planning and execution costs
• Examined preservation of graceful degradation with a larger search space and partial statistics
Papers Published
• [LKG99] Eric Lambrecht, Subbarao Kambhampati and Senthil Gnanaprakasam. Optimizing Recursive Information Gathering Plans. In Proceedings of the IJCAI-99.
• [KG99] Subbarao Kambhampati and Senthil Gnanaprakasam. Optimizing source-call ordering in information gathering plans. Proceedings of the IJCAI-99 Workshop on Intelligent Information Integration.
Thank You
Questions?