Post on 22-Feb-2016
description
EXTERNAL SORTING ALGORITHMSAND IMPLEMENTATIONS
05011004 Ayhan KARGIN05011027 Ahmet MERAL
Contents External Sorting Needs and Usage Areas External Sorting Algorithms Environments for Implementing External Sorting Algorithms Used Technologies Phases of External Sorting K-Way Merge Sort Multi-Step K-Way Merge Sort Replacement Selection Sort SimpleDB Layered Components of SimpleDB Classes of SimpleDB Query Layer of SimpleDB Relational Algebra, that SimpleDB Supports Relational Algebra Preparatory Work External Sorting on SimpleDB
External Sorting Needs and Usage Areas
DBMS Group By, Join, Order By
Data Warehouse (ETL) Data Mining Data Processing
External Sorting Algorithms
K-Way Merge Sort Multi-Step K-Way Merge Sort Replacement Selection Sort
Environments for ImplementingExternal Sorting Algorithms
MinSQL PosgreSQL SimpleDB
Used Technologies
Java VM Java SE JDBC Java RMI Eclipse
Phases of External Sorting
Run Generation Phase Merge Phase
K-Way Merge Sort
Multi-Step K-Way Merge Sort
Multi-Step K-Way Merge Sort
Replacement Selection
Replacement Selection
Test results depending on stage area size
Note: Main memory is 8x.
Stage Area Disc Access Count Run Time K (Run Count)
1x 3817 0.690 17
2x 420 0.742 8
3x 252 1.049 6
4x 258 1.420 4
5x 266 1.952 4
6x 279 2.333 3
SimpleDB
The Client Side That contains the JDBC interfaces and
implements the JDBC driver The Basic Server
Which provides complete funcionality of DB but ignores efficiency issues
Extensions To the basic server that support
efficient query processing.
Layered Components of SimpleDB
Remote Perform JDBC requests received from clients.
Parse Extract the tables, fields, and predicate mentioned in an SQL statement.
Planner Create an execution strategy for an SQL statement, and translate it to a
relational algebra plan. Query
Implement queries expressed in relational algebra. Metadata
Maintain metadata about the tables in the database, so that its records and fields are accessible.
Layered Components of SimpleDB
Record Provide methods for storing data records in pages
Transaction Support concurrency by restricting page access. Enable recovery by
logging changes to pages. Buffer
Maintain a cache of pages in memory to hold recently-accessed user data.
Log Append log records to the log file, and scan the records in the log file.
File Read and write between file blocks and memory pages.
Classes of SimpleDB
Classes of SimpleDB
Classes of SimpleDB
Query Layer of SimpleDB
Relational Algebra Select Project Product Sort
Relational Algebra, that SimpleDB Supports
Relational Algebra
SelectProjectProduct
Sort
SQL
WhereSelect
JoinOrder By
Relational Algebra
Select SId, SName, DName from STUDENT, DEPT where
MajorId=DId and DName='math'
Q1: Product (STUDENT, DEPT)Q2: Select (Q2, MajorId=DId )
Q3: Select (Q2, DName='math')Q4: Project (Q3, {SId, SName,
DName })
Relational Algebra
Preparatory Work We have to introduce order by operator (in SQL) to parser layer of
SimpleDB. The columns, which will be sort, must be presenced in QueryData
class in structural form. If sorting will be happened, sorting plans must be called by
BasicQueryPlanner class. Calculating number of random accesses per file on FileMgr layer. Calculating duration of transaction with adding start time and end
time to Transaction class. Calculating number of unsorted records on each transaction. We may have to compare, copy and exchange congeneric records
on different scans. So, RecordExchange class is written for these purposes.
StepCalculator class is written. It splits any number to two close integer multipliers. For example; from 29, 6 and 5; from 49, 7 and 7; from 50, 8 and 7; etc.
External Sorting on SimpleDB
Materialize Plan Edward Sciore’s Merge Sort Plan&Scan K-Way Merge Sort Plan&Scan Multi-Step K-Way Merge Sort Plan&Scan Replacement Selection Sort Plan&Scan Multi-Step Replacement Selection Sort
Plan&Scan
Algorithms Choice Dependency