Querying Large Databases
Rukmini Kaushik
Purpose
• Research into efficient algorithms and software architectures for query execution engines.
Query Execution Engine Architecture
• Query processing algorithms – physical algebra
• Data Model – logical algebra
Sorting & Hashing
• Both are memory intensive.
• Memory concerns
- Merge efficiency and memory management
- Hash table overflow
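To make the merge-efficiency concern concrete, here is a minimal sketch of an external merge sort under a fixed memory budget. The function name `external_sort` and the `memory_limit` parameter are illustrative assumptions, not names from the slides, and the sorted runs are kept in Python lists where a real engine would write them to disk.

```python
import heapq
from itertools import islice


def external_sort(records, memory_limit):
    """Sort an iterable while holding at most `memory_limit` records per run.

    Assumption: runs stay in memory as lists; a real engine would spill
    each run to a temporary file and merge from disk.
    """
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, memory_limit))  # fill the memory budget
        if not chunk:
            break
        chunk.sort()                            # run generation
        runs.append(chunk)
    # Single merge step over all runs; with very many runs a real engine
    # would merge in several levels, limited by the merge fan-in.
    return list(heapq.merge(*runs))
```

A smaller `memory_limit` produces more runs, which is where merge fan-in and buffer management start to matter.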
Aggregation and Duplicate Removal
• Aggregation concept
- Describes a set of objects with one value.
• Algorithms (three types)
- Nested loops
- Sorting
- Hashing
Aggregation & Duplicate Removal
• Nested loops
- Easiest of the three
- Doesn't work well for large inputs
• Sorting
- Sort so that equal elements become adjacent, which makes duplicate removal simple.
- Remove duplicates as early as possible, ideally during the sort itself rather than in a final pass.
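The sort-based approach can be sketched as follows. This is an illustrative in-memory version only: the helper names `sort_distinct` and `sort_aggregate` are assumptions, and SUM is chosen arbitrarily as the aggregate function.

```python
from itertools import groupby
from operator import itemgetter


def sort_distinct(values):
    """Duplicate removal: after sorting, equal values are adjacent."""
    return [key for key, _group in groupby(sorted(values))]


def sort_aggregate(rows, key_index, value_index):
    """Group rows by one column and sum another, using sort order only."""
    ordered = sorted(rows, key=itemgetter(key_index))
    return [
        (key, sum(row[value_index] for row in group))
        for key, group in groupby(ordered, key=itemgetter(key_index))
    ]
```

For example, `sort_aggregate([("a", 1), ("b", 2), ("a", 3)], 0, 1)` collapses the two `"a"` rows into a single output row.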
Aggregation & Duplicate Removal
• Hashing
- Hash on group attributes.
- Can perform duplicate removal when creating hash table.
• Algorithm analysis
- For sorting and hashing, the number of passes over the input (merge levels or partitioning levels) grows logarithmically with input size.
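A hash-based counterpart might look like the sketch below. It assumes the whole hash table fits in memory (so hash table overflow is not handled), and the names `hash_aggregate` and `hash_distinct` are illustrative.

```python
def hash_aggregate(rows, key_index, value_index):
    """Sum one column per group; the group attribute is the hash key."""
    table = {}
    for row in rows:
        key = row[key_index]
        # Aggregate on insertion, so the table holds one entry per group.
        table[key] = table.get(key, 0) + row[value_index]
    return table


def hash_distinct(values):
    """Remove duplicates while the hash table is being built."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result
```

Because duplicates are dropped at insertion time, the table size depends on the number of distinct groups rather than on the input size.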
Complex Query Execution Plan
• Purpose
- Schedule a query made up of several operations as efficiently as possible
• Ideas
- Right-deep plans
- Left-deep plans
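The two plan shapes can be illustrated with nested tuples, where each pair `(left, right)` stands for one join. This is a sketch of the tree shapes only; the function names and the tuple encoding are assumptions made for illustration.

```python
from functools import reduce


def left_deep(tables):
    """Each join's left input is the result of the previous join."""
    return reduce(lambda left, right: (left, right), tables)


def right_deep(tables):
    """Each join's right input is the result of the next join."""
    return reduce(lambda right, left: (left, right), reversed(tables))


print(left_deep(["A", "B", "C", "D"]))   # ((('A', 'B'), 'C'), 'D')
print(right_deep(["A", "B", "C", "D"]))  # ('A', ('B', ('C', 'D')))
```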
Complex Query Execution Plan
• Prediction
- Use a decision tree of alternative sub-plans
- The choice among them is made at run time by choose-plan operators
• Major Concern
- Optimal resource allocation
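One way to picture a choose-plan operator is as a wrapper that holds two equivalent sub-plans and picks one when the query starts. The sketch below is hypothetical throughout: the sample data, the `estimated_selectivity` parameter, and the 0.3 threshold are invented for illustration.

```python
ROWS = [{"id": i, "city": "Oslo" if i % 4 == 0 else "Rome"} for i in range(8)]

# A tiny secondary index on city, built once.
CITY_INDEX = {}
for row in ROWS:
    CITY_INDEX.setdefault(row["city"], []).append(row)


def scan_plan(params):
    """Sub-plan 1: read every row and filter."""
    return [r for r in ROWS if r["city"] == params["city"]]


def index_plan(params):
    """Sub-plan 2: look the rows up through the index."""
    return list(CITY_INDEX.get(params["city"], []))


def choose_plan(condition, plan_if_true, plan_if_false):
    """Defer the choice between two equivalent sub-plans to run time."""
    def run(params):
        chosen = plan_if_true if condition(params) else plan_if_false
        return chosen(params)
    return run


# Hypothetical rule: use the index only for selective predicates.
plan = choose_plan(lambda p: p["estimated_selectivity"] < 0.3,
                   index_plan, scan_plan)
```

Both sub-plans return the same rows; only the cost of producing them differs, which is why the decision can safely be postponed until the parameter values are known.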
Parallel Query Execution Mechanism
• Goal
Obtain speed-up & scale-up
• Speed-up
- Uses extra hardware for constant size problem
- Linear speed-up is optimal
- Can be expressed as parallel efficiency
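As a small worked example of these measures, speed-up is commonly computed as the ratio of elapsed times, and parallel efficiency as speed-up divided by the number of processors. The function and parameter names below are illustrative, not taken from the slides.

```python
def speedup(time_one_processor, time_n_processors):
    """Elapsed time on one processor divided by elapsed time on N."""
    return time_one_processor / time_n_processors


def parallel_efficiency(time_one_processor, time_n_processors, n):
    """Fraction of linear speed-up actually achieved (1.0 is linear)."""
    return speedup(time_one_processor, time_n_processors) / n


# A query that takes 100 s on one processor and 40 s on four:
print(speedup(100.0, 40.0))                 # 2.5
print(parallel_efficiency(100.0, 40.0, 4))  # 0.625
```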
Parallel Query Execution Mechanism
• Scale-up
- Problem size grows in proportion to the resources; ideally the elapsed time stays constant
- Can be expressed as parallel efficiency.
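Scale-up can be expressed in the same style. The sketch assumes the usual definition, in which the small run uses one unit of resources and data and the large run uses N times both; the function name is illustrative.

```python
def scaleup(time_small_system, time_large_system):
    """Ratio of elapsed times when data and resources grow together.

    A value of 1.0 means linear scale-up: N times the data on N times
    the hardware finishes in the same time.
    """
    return time_small_system / time_large_system


# 60 s for 1 GB on one node, 80 s for 4 GB on four nodes:
print(scaleup(60.0, 80.0))  # 0.75
```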
Parallel Query Execution Mechanism
• Parallel vs. distributed systems
• Distributed
- Sites are locally autonomous
- Can also use parallelism
Parallel Query Execution Mechanism
• Parallel
- One center of control
- Three types: shared memory, shared disk, and distributed memory (shared nothing)
Parallel Query Execution Mechanism
• Three forms of parallelism
- Inter-query: servicing multiple requests at the same time
- Inter-operator: pipelining
- Intra-operator: executing a single operator on multiple processors
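Intra-operator parallelism can be sketched by splitting one operator's input into partitions and running the same selection on each. The sketch uses threads from the standard library purely to show the structure; in CPython, threads do not speed up CPU-bound work, so a real engine would use separate processes or machines. The names `parallel_select` and `workers` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_select(rows, predicate, workers=4):
    """Run one selection operator over several partitions at once."""
    # Round-robin partitioning of the input.
    partitions = [rows[i::workers] for i in range(workers)]

    def select(partition):
        return [row for row in partition if predicate(row)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(select, partitions))

    # Concatenate partial results; output order follows the partitions,
    # not the original input order.
    return [row for part in results for row in part]
```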
Parallel Query Execution Mechanism
• Implementation
- Bracket model
- Operator model
• Bracket model
- Goal: a generic process template that receives and sends data and performs one operation at a time
- Number of inputs is limited to two
- Can be run in parallel by having many template instances running in the system simultaneously
• Operator model
- Goal: insert operators that provide parallelism into an otherwise sequential plan
Parallel Query Execution Mechanism
• Uses the exchange operator
• Exchange operator
- Does not manipulate data
- Provides capabilities for parallel query processing
- Lets a complex query plan that would otherwise run in a single process be executed by multiple processes
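The idea of an operator that moves data between execution contexts without touching it can be sketched with an iterator interface. This is a minimal, hypothetical illustration: the `open`/`next`/`close` method names, the `Scan` helper, and the use of a thread plus a bounded queue are assumptions made here, and rows are assumed never to be `None`.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


class Scan:
    """Leaf operator that yields rows from a list."""

    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self._it = iter(self.rows)

    def next(self):
        return next(self._it, None)  # None signals end of input

    def close(self):
        self._it = None


class Exchange:
    """Runs its input in a producer thread; rows pass through unchanged."""

    def __init__(self, child, buffer_size=16):
        self.child = child
        self.buffer = queue.Queue(maxsize=buffer_size)

    def open(self):
        self.thread = threading.Thread(target=self._produce, daemon=True)
        self.thread.start()

    def _produce(self):
        self.child.open()
        while (row := self.child.next()) is not None:
            self.buffer.put(row)  # blocks when the buffer is full
        self.child.close()
        self.buffer.put(_END)

    def next(self):
        row = self.buffer.get()
        if row is _END:
            self.buffer.put(_END)  # keep the stream terminated
            return None
        return row

    def close(self):
        # Drain any unread rows so the producer can finish, then join.
        while self.buffer.get() is not _END:
            pass
        self.thread.join()


def run(plan):
    """Drive any operator tree through the common iterator interface."""
    plan.open()
    rows = []
    while (row := plan.next()) is not None:
        rows.append(row)
    plan.close()
    return rows
```

Because `Exchange` exposes the same interface as `Scan`, the consumer in `run` does not need to know whether its input is produced in the same thread or in another one.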
Parallel Algorithms
• Idea: More focus on algorithms and parallel execution
• Parallel selections and updates
- Disk input and output should be made parallel
- Selection: Maintain indices near stored data
- Updates: Use keys for partitioning attributes
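A minimal sketch of key-based partitioning, with each partition keeping its own local index next to its data. The class name `PartitionedTable` and its methods are hypothetical, and each "site" is just a Python dict standing in for a disk or node.

```python
class PartitionedTable:
    """Rows are routed to a partition by hashing the key."""

    def __init__(self, num_partitions):
        # One dict per partition acts as that partition's local index.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _site(self, key):
        return hash(key) % len(self.partitions)

    def upsert(self, key, value):
        """An update touches only the partition that owns the key."""
        self.partitions[self._site(key)][key] = value

    def lookup(self, key):
        """A selection on the key is answered by a single partition."""
        return self.partitions[self._site(key)].get(key)


table = PartitionedTable(4)
table.upsert(10, "first")
table.upsert(11, "other")
table.upsert(10, "second")   # overwrites in place, same partition
print(table.lookup(10))      # second
```

Partitioning on the key means that neither the update nor the lookup has to contact the other partitions.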
Parallel Algorithms
• Parallel sorting
- Classified by the number of parallel inputs and the number of parallel outputs
- Algorithms consist of a local sort and a data exchange step
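The two steps can be sketched as follows, here with the data exchange done first by range partitioning and the local sorts done afterwards. The code runs sequentially and only simulates the per-processor work; `parallel_sort` and `boundaries` are illustrative names, and the boundary values are assumed to be given in ascending order.

```python
from bisect import bisect_right


def parallel_sort(values, boundaries):
    """Range-partition the input, then sort each partition locally."""
    partitions = [[] for _ in range(len(boundaries) + 1)]

    # Data exchange step: route each value to the partition for its range.
    for value in values:
        partitions[bisect_right(boundaries, value)].append(value)

    # Local sort step: each partition could be sorted on its own processor.
    sorted_parts = [sorted(part) for part in partitions]

    # Ranges do not overlap, so concatenation yields a fully sorted result.
    return [value for part in sorted_parts for value in part]
```

With range partitioning no final merge across partitions is needed, which is one reason the choice of partitioning matters for the exchange step.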
Parallel Algorithms
• Major concern: deadlock, which can be avoided by
- using range partitioning
- having a sufficiently large data exchange buffer
- using a modified sort algorithm
Query Optimization
• Exploits the distinction between logical operators and the physical algorithms that implement them
• Must keep track of the properties of the inputs (for example, sort order)
• Cost models focus on throughput measures
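To illustrate how a tracked input property can change the choice of physical algorithm, here is a toy cost comparison for one logical aggregation. The cost formulas and the constant 1.5 are invented for illustration and do not come from the slides or from any real optimizer.

```python
import math


def choose_aggregation(input_rows, sorted_on_group_key):
    """Pick a physical algorithm for one logical aggregation."""
    if sorted_on_group_key:
        # Input already has the needed order: one pass suffices.
        sort_based = input_rows
    else:
        # Hypothetical cost of sorting first, then one pass.
        sort_based = input_rows * math.log2(max(input_rows, 2)) + input_rows

    costs = {
        "stream_aggregate": sort_based,
        "hash_aggregate": 1.5 * input_rows,  # hypothetical constant
    }
    return min(costs, key=costs.get)


print(choose_aggregation(1000, sorted_on_group_key=True))   # stream_aggregate
print(choose_aggregation(1000, sorted_on_group_key=False))  # hash_aggregate
```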
Tuning Query Performance
• Focus
- Guidelines for improving query performance
• Guidelines from three points of view
- Implementor and vendor
- Database administrator
- Application programmer
Tuning Query Performance
• Implementor
- System should support indexing and clustering
- Query optimizer should be reliable and accurate
• Administrator
- Ensure that system facilities are used
- Carefully choose the physical database design
- Provide available and efficient processing resources
• Application programmer
- Write queries at a high level