Adaptive Data Structures
![Page 1: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/1.jpg)
Simon Zeltser
Towards Declarative Queries on Adaptive Data Structures
Based on the article by Nicolas Bruno and Pablo Castro
![Page 2: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/2.jpg)
Seminar in Database Systems Technion
Contents
1. Introduction
2. LINQ on Rich Data Structures
3. LINQ Query Optimization
4. Conclusions and Discussion
![Page 3: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/3.jpg)
Introduction
THE PROBLEM
• An increasing number of applications need to manage data outside the DBMS
• A solution is needed to simplify the interaction between objects and data sources
• Current solutions lack a rich declarative query mechanism
THE NEED
• A unified way to query various data sources
THE SOLUTION
• LINQ (Language Integrated Query)
![Page 4: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/4.jpg)
Introduction
LINQ: the Microsoft .NET 3.5 solution
• Accesses multiple data sources via the same API
• Technology integrated into the programming language
• Supports these operations:
  • Traversal: grouping, joins
  • Filter: which rows
  • Projection: which columns

    var graduates = from student in students
                    where student.Degree == "Graduate"
                    orderby student.Name, student.Gender, student.Age
                    select student;

BUT...
• The default implementation is simplistic
• It is appropriate only for small ad-hoc structures in memory
![Page 5: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/5.jpg)
Introduction
THE GOAL OF THIS SESSION
• Introduce LINQ key principles
• Show a model for customizing LINQ's execution model on rich data structures
• Evaluate the results
![Page 6: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/6.jpg)
LINQ - High Level Architecture
[Figure: layered architecture diagram]
• Languages: C# 3.0, Visual Basic, other languages...
• .NET Language Integrated Query (LINQ)
• LINQ-enabled data sources: LINQ To Objects, LINQ To Datasets, LINQ To SQL, LINQ To Entities, LINQ To XML
• Underlying data: objects, databases, XML
![Page 7: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/7.jpg)
Compare two approaches

Iteration

    List<string> matches = new List<string>();
    // Find the matches
    foreach (string item in data) {
        if (item.StartsWith("Eric")) {
            matches.Add(item);
        }
    }
    // Sort the matches
    matches.Sort();
    // Print out the matches
    foreach (string item in matches) {
        Console.WriteLine(item);
    }

LINQ

    // Find and sort matches
    var matches = from n in data
                  where n.StartsWith("Eric")
                  orderby n
                  select n;
    // Print out the matches
    foreach (var match in matches) {
        Console.WriteLine(match);
    }
![Page 8: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/8.jpg)
Language Integration

Function

    int StringLength(string s) { return s.Length; }

Lambda expression

    s => s.Length

Query syntax

    var matches = from n in data
                  where n.StartsWith("Eric")
                  orderby n
                  select n;

Extension methods

    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> predicate)

    var matches = data
        .Where(n => n.StartsWith("Eric"))
        .OrderBy(n => n)
        .Select(n => n);

Anonymous types

    var name = "Eric";
    var age = 43;
    var person = new { Name = "Eric", Age = 43 };
    var names = new[] { "Eric", "Ryan", "Paul" };
    foreach (var item in names) { /* loop body not shown on the slide */ }
![Page 9: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/9.jpg)
LINQ - Example

    // Retrieve all CS students with more than 105 points
    var query =
        from stud in students
        where (stud.Faculty == "CS" && stud.Points > 105)
        orderby stud.Points descending
        select new { Details = stud.Name + ":" + stud.Phone };

    // Iterate over results
    foreach (var student in query) {
        Console.WriteLine(student.Details);
    }

Language features used: lambda expressions, query syntax, extension methods, anonymous types.
![Page 10: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/10.jpg)
Customizing LINQ Execution Model

EXPRESSION TREES
• LINQ represents queries as in-memory abstract syntax trees
• Query description and implementation are not tied together
[Figure: expression tree with leaves 1, 5, 7 and operator nodes + and *]

THE PROBLEM
• The default implementation of the operations uses fixed, general-purpose algorithms

SUGGESTED SOLUTION
• Change how the query is executed without changing how it is expressed
• Analyze alternative implementations of a given query and dynamically choose the most appropriate version depending on the context
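To make the expression-tree idea concrete, here is a minimal sketch using the standard System.Linq.Expressions API. The expression (a + b) * c and the helper name Describe are illustrative assumptions, not taken from the slide or the article. The sketch shows that a lambda assigned to Expression<...> is kept as a data structure that can be inspected before it is compiled and run.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Illustrative expression (an assumption, not the slide's figure): (a + b) * c.
    // Assigning the lambda to Expression<...> keeps it as a tree instead of compiled code.
    public static readonly Expression<Func<int, int, int, int>> Expr =
        (a, b, c) => (a + b) * c;

    // Hypothetical helper: renders binary nodes as NodeType(left, right).
    public static string Describe(Expression e)
    {
        if (e is BinaryExpression bin)
            return $"{bin.NodeType}({Describe(bin.Left)}, {Describe(bin.Right)})";
        return e.ToString();
    }

    public static void Main()
    {
        // The query description is data: it can be walked (or rewritten) before execution.
        Console.WriteLine(Describe(Expr.Body));
        // Compiling turns the tree into a delegate that can be invoked.
        Console.WriteLine(Expr.Compile()(1, 5, 7));
    }
}
```

Because the tree is ordinary data, an optimizer can substitute a different implementation for a node without touching the query text, which is the separation the slide describes.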
![Page 11: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/11.jpg)
Customizing LINQ Execution Model (2)
PROBLEM EXAMPLE
• The WHERE operator is implemented by performing a sequential scan over the input and evaluating the selection predicate on each tuple!

Query:

    int[] A = {1, 2, 3, 10, 20, 30};
    var q = from x in A
            where x < 5
            select 2*x;
    foreach (int i in q)
        Console.WriteLine(i);

Step 1: the query syntax becomes method calls with lambda expressions:

    var q = A.Where(x => x < 5).Select(x => 2*x);

Step 2: the lambdas become functions passed to the standard operators:

    bool AF1(int x) { return x < 5; }
    int AF2(int x) { return 2*x; }
    IEnumerable<int> q = Enumerable.Select(Enumerable.Where(A, AF1), AF2);

Step 3: query implementation (a sequential scan):

    List<int> res = new List<int>();
    foreach (int a in A)
        if (AF1(a)) res.Add(AF2(a));
    return res;
![Page 12: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/12.jpg)
Rich Data Structures - DataSet
• In-memory cache of data
• Typically populated from a database
• Supports indexing of DataColumns via DataViews
[Figure: a DataSet object containing DataTable objects; each DataTable has DataRows, DataColumns, and a UniqueConstraint, and the tables are linked by a ForeignKeyConstraint]
We will use LINQ on DataSet to demonstrate query optimization techniques.
![Page 13: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/13.jpg)
LINQ on Rich Data Structures
Enable LINQ to work over DataSets.

EXAMPLE
Given R and S, two DataTables:

    from r in R.AsEnumerable()
    join s in S.AsEnumerable()
        on r.Field<int>("x") equals s.Field<int>("y")
    select new { a = r.Field<int>("a"), b = s.Field<int>("b") };

[Figure: compile-time and run-time phases of the prototype implementation. Stages shown: LINQ on DataSet, standard C# code, intermediate language, expression tree, optimized expression tree, intermediate language. The DataSet holds the self-tuning state.]
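The join above can be tried on two small in-memory tables. The following is a sketch under stated assumptions: the helper MakeTable, the class name, and the sample rows are made up for illustration, and the result is projected into a tuple rather than an anonymous type so that it can be returned from a method. It uses only the stock AsEnumerable() and Field<T>() extensions from System.Data, so it runs with the default (unoptimized) LINQ implementation.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataSetJoinDemo
{
    // Hypothetical helper: builds a table with one join column and one payload column.
    public static DataTable MakeTable(string name, string keyCol, string valCol,
                                      (int key, int val)[] rows)
    {
        var t = new DataTable(name);
        t.Columns.Add(keyCol, typeof(int));
        t.Columns.Add(valCol, typeof(int));
        foreach (var (key, val) in rows) t.Rows.Add(key, val);
        return t;
    }

    // Same join as on the slide: R.x = S.y, projecting R.a and S.b.
    public static (int a, int b)[] Join(DataTable R, DataTable S) =>
        (from r in R.AsEnumerable()
         join s in S.AsEnumerable()
             on r.Field<int>("x") equals s.Field<int>("y")
         select (a: r.Field<int>("a"), b: s.Field<int>("b"))).ToArray();

    public static void Main()
    {
        // Sample data is invented for this sketch.
        var R = MakeTable("R", "x", "a", new[] { (1, 10), (2, 20), (3, 30) });
        var S = MakeTable("S", "y", "b", new[] { (2, 200), (3, 300), (4, 400) });
        foreach (var row in Join(R, S))
            Console.WriteLine($"a={row.a}, b={row.b}");
    }
}
```

On older .NET Framework projects the two extension methods live in the System.Data.DataSetExtensions assembly, which may need to be referenced explicitly.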
![Page 14: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/14.jpg)
Expression Tree Optimizer
Our solution is built according to the following architecture:
[Figure: components of the optimizer: Query Cost Estimator, Cost Model, Statistics Manager, Query Analyzer, Self Tuning Organizer, Index Reorganizer, Oscillation Manager]
![Page 15: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/15.jpg)
Query Cost Estimator
![Page 16: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/16.jpg)
Query Estimation - Cost Model
• Follows the traditional database approach:
  COST: {execution plans} -> [expected execution time]
• Relies on:
  • a set of statistics maintained in DataTables for some of their columns
  • formulas to estimate the selectivity of predicates and the cardinality of sub-plans
  • formulas to estimate the expected cost of query execution for every operator
![Page 17: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/17.jpg)
Cardinality Estimation
• Returns an approximate number of rows that each operator in a query plan would output
• To reduce the overhead, we use only these statistical estimators:
  • maxVal: maximum value in a column
  • minVal: minimum value in a column
  • dVal: number of distinct values in a column
• If statistics are unavailable, rely on "magic numbers" until the statistics are created automatically
![Page 18: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/18.jpg)
Predicate Selectivity Estimation
Let σp(T) be an arbitrary expression. Its cardinality is defined as:
    Card(σp(T)) = sel(p) · Card(T)
Under this definition we define:
    COST_T(Execution Plan) = Σ COST(p), for each p in {operators of T}
EXAMPLE: Consider a full table scan of table T:
    COST(T) = Card(T) * MEM_ACCESS_COST
where MEM_ACCESS_COST is the average cost of a memory access.

    Predicate              Selectivity estimation
    sel(p1 ^ p2)           sel(p1) · sel(p2)
    sel(p1 v p2)           sel(p1) + sel(p2) - sel(p1 ^ p2)
    sel(c = c0)            (dVal(c))^-1
    sel(c0 <= c <= c1)     estimated from minVal(c) and maxVal(c)
![Page 19: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/19.jpg)
Predicate Selectivity Estimation - Cont.
Intuition: we model sel(c0 <= c <= c1) as the probability of getting a "c" value in the interval [c0, c1] among all possible "c" values, that is, those between minVal(c) and maxVal(c).
[Figure: the interval [c0, c1] marked on the axis of c, between minVal(c) and maxVal(c)]
![Page 20: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/20.jpg)
Predicate Selectivity Estimation - Cont.
Consider now a join predicate: T1 ⋈(c1=c2) T2

$$\mathrm{Card}(T_1 \bowtie_{c_1=c_2} T_2) = \min(\mathrm{dVal}(c_1), \mathrm{dVal}(c_2)) \cdot \frac{\mathrm{Card}(T_1)}{\mathrm{dVal}(c_1)} \cdot \frac{\mathrm{Card}(T_2)}{\mathrm{dVal}(c_2)}$$
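The estimation rules from the last few slides can be collected into a small calculator. This is a sketch, not the article's implementation: the class and method names are hypothetical, MemAccessCost is set to an arbitrary unit value, and the range-predicate formula assumes values are uniformly distributed between minVal and maxVal, which is one reading of the stated intuition rather than something the transcript spells out.

```csharp
using System;

// Per-column statistics named on the cardinality slide.
public sealed class ColumnStats
{
    public double MinVal, MaxVal, DVal;   // minimum, maximum, number of distinct values
}

public static class Estimator
{
    public const double MemAccessCost = 1.0;   // hypothetical unit cost of a memory access

    public static double SelAnd(double s1, double s2) => s1 * s2;
    public static double SelOr(double s1, double s2) => s1 + s2 - SelAnd(s1, s2);
    public static double SelEquals(ColumnStats c) => 1.0 / c.DVal;

    // ASSUMPTION: uniform distribution over [MinVal, MaxVal]; not stated in the transcript.
    public static double SelRange(ColumnStats c, double c0, double c1)
    {
        if (c.MaxVal <= c.MinVal)                        // single-valued column
            return (c0 <= c.MinVal && c.MinVal <= c1) ? 1.0 : 0.0;
        double lo = Math.Max(c0, c.MinVal), hi = Math.Min(c1, c.MaxVal);
        return hi < lo ? 0.0 : (hi - lo) / (c.MaxVal - c.MinVal);
    }

    // Card(sigma_p(T)) = sel(p) * Card(T)
    public static double CardSelect(double cardT, double sel) => sel * cardT;

    // Card(T1 join T2) = min(dVal(c1), dVal(c2)) * Card(T1)/dVal(c1) * Card(T2)/dVal(c2)
    public static double CardJoin(double cardT1, ColumnStats c1, double cardT2, ColumnStats c2) =>
        Math.Min(c1.DVal, c2.DVal) * (cardT1 / c1.DVal) * (cardT2 / c2.DVal);

    // COST(T) = Card(T) * MEM_ACCESS_COST for a full table scan
    public static double CostFullScan(double cardT) => cardT * MemAccessCost;

    public static void Main()
    {
        // Invented statistics, for illustration only.
        var c1 = new ColumnStats { MinVal = 0, MaxVal = 100, DVal = 100 };
        var c2 = new ColumnStats { MinVal = 0, MaxVal = 100, DVal = 50 };
        Console.WriteLine(CardSelect(1000, SelEquals(c1)));   // rows expected for c = c0
        Console.WriteLine(CardSelect(1000, SelRange(c1, 20, 40)));
        Console.WriteLine(CardJoin(1000, c1, 500, c2));
        Console.WriteLine(CostFullScan(1000));
    }
}
```

The conjunction rule multiplies selectivities, which treats the two predicates as independent; correlated columns would make this estimate too low.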
![Page 21: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/21.jpg)
Query Analyzer
![Page 22: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/22.jpg)
Execution Alternatives
• Rely on indexes on DataColumns when possible
Example: σ a=7 ∧ (b+c)<20

    c     b     a
    3     1     7
    76    3     2
    32    34    5
    8     14    7
    9     9     4
    23    4     7
    3     1     3

Alternative 1: full table scan, evaluating a=7 and b+c < 20 on every row.
Alternative 2: use the index on the "a" column (keys 2, 3, 4, 5, 7) to find the rows with a=7, then evaluate b+c < 20 on those rows only.
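The two alternatives can be compared directly on the slide's seven rows. The sketch below is illustrative only: Row, FullScan, and IndexSeek are hypothetical names, and a LINQ ILookup stands in for the DataView-based index on column "a". Each method also reports how many rows it had to touch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public readonly struct Row
{
    public readonly int C, B, A;
    public Row(int c, int b, int a) { C = c; B = b; A = a; }
}

public static class ExecutionAlternatives
{
    // The table from the slide, in (c, b, a) column order.
    public static readonly Row[] Table =
    {
        new Row(3, 1, 7), new Row(76, 3, 2), new Row(32, 34, 5), new Row(8, 14, 7),
        new Row(9, 9, 4), new Row(23, 4, 7), new Row(3, 1, 3)
    };

    // Alternative 1: full scan, both predicates evaluated on every row.
    public static List<Row> FullScan(out int rowsTouched)
    {
        rowsTouched = 0;
        var result = new List<Row>();
        foreach (var r in Table)
        {
            rowsTouched++;
            if (r.A == 7 && r.B + r.C < 20) result.Add(r);
        }
        return result;
    }

    // Alternative 2: index lookup for a = 7, then the residual predicate b + c < 20.
    // The ILookup is a stand-in for a real index (an assumption of this sketch).
    public static List<Row> IndexSeek(ILookup<int, Row> indexOnA, out int rowsTouched)
    {
        rowsTouched = 0;
        var result = new List<Row>();
        foreach (var r in indexOnA[7])
        {
            rowsTouched++;
            if (r.B + r.C < 20) result.Add(r);
        }
        return result;
    }

    public static void Main()
    {
        var index = Table.ToLookup(r => r.A);   // built once, reused across queries
        var scan = FullScan(out int scanned);
        var seek = IndexSeek(index, out int probed);
        Console.WriteLine($"scan: {scan.Count} result(s), {scanned} rows touched");
        Console.WriteLine($"seek: {seek.Count} result(s), {probed} rows touched");
    }
}
```

Both alternatives return the same answer; the index version only pays off when the cost of building and maintaining the index is amortized over enough queries, which is what the self-tuning component later decides.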
![Page 23: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/23.jpg)
Analyzing Execution Plans
Global vs. local execution plan - EXAMPLE:
[Figure: plan tree Join(Products, Join(Carts, Filter(Customers)))]
• Global execution plan: the whole operator tree
• Local execution plan: the implementation of a single operator (HashJoin? IndexJoin? MergeJoin?)
![Page 24: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/24.jpg)
Enumeration Architecture
Two phases:
• First phase: join reordering based on estimated cardinalities
• Second phase: choose the best physical implementation for each operator
EXAMPLE: Suppose we analyze a JOIN operator. We evaluate the following JOIN implementations:
• Hash Join
• Merge Join (inputs must be sorted on the join columns)
• Index Join (an index on the inner join column must be available)
• Other possible calculation options
Choose the alternative with the smallest cost.
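The second phase can be sketched as "enumerate the applicable implementations, estimate each, keep the cheapest". Everything specific below is an assumption made for illustration: the type and member names are hypothetical, and the three cost formulas are simple textbook-style placeholders, not the formulas used in the article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class JoinInput
{
    public double OuterCard, InnerCard;   // estimated cardinalities of the two inputs
    public bool InputsSorted;             // precondition for merge join
    public bool InnerIndexExists;         // precondition for index join
}

public static class JoinChooser
{
    // HYPOTHETICAL cost formulas, in units of one memory access.
    public static IEnumerable<(string Name, double Cost)> Alternatives(JoinInput j)
    {
        // Hash join: build a table on the inner input, then probe with the outer input.
        yield return ("HashJoin", 2 * j.InnerCard + j.OuterCard);
        // Merge join: one pass over both inputs, only valid if they are already sorted.
        if (j.InputsSorted)
            yield return ("MergeJoin", j.InnerCard + j.OuterCard);
        // Index join: one logarithmic lookup per outer row, only valid with an index.
        if (j.InnerIndexExists)
            yield return ("IndexJoin", j.OuterCard * Math.Log(j.InnerCard + 1, 2));
    }

    // Choose the applicable alternative with the smallest estimated cost.
    public static (string Name, double Cost) Choose(JoinInput j) =>
        Alternatives(j).OrderBy(a => a.Cost).First();

    public static void Main()
    {
        var small = new JoinInput { OuterCard = 10, InnerCard = 100000, InnerIndexExists = true };
        var large = new JoinInput { OuterCard = 100000, InnerCard = 100000 };
        Console.WriteLine(Choose(small).Name);
        Console.WriteLine(Choose(large).Name);
    }
}
```

Hash join is always applicable here, so Choose never sees an empty list; a fuller optimizer would also account for the cost of sorting inputs or building a missing index.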
![Page 25: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/25.jpg)
Query Analysis
![Page 26: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/26.jpg)
Self Tuning Organization
• We want to reach the smallest query execution time
• Indexes can be used to speed up query execution
PROBLEM:
• It is hard to forecast in advance which indexes to build for optimum performance
SOLUTION:
• A continuous monitoring/tuning component that addresses the challenge of choosing and building adequate indexes and statistics automatically
![Page 27: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/27.jpg)
Self Tuning Organization - Example
Consider the following execution plan:
[Figure: execution plan with three sub-plans enclosed in dotted lines]
• The three sub-plans enclosed in dotted lines might be improved if suitable indexes were present
• The selection predicate Name="Pam" over the Customers DataTable can be improved if an index on Customers(Name) is built
• Both hash joins can be improved if indexes I2 and I3 are available, since we can transform each hash join into an index join
![Page 28: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/28.jpg)
Algorithm for automatic index tuning
![Page 29: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/29.jpg)
Index Tuning
High-level description:
• Identify a good set of candidate indexes that would improve performance if they were available.
• Later, when the optimized queries are evaluated, aggregate the relative benefits of both candidate and existing indexes.
• Based on this information, periodically trigger index creations or deletions, taking into account storage constraints, the overall utility of the resulting indexes, and the cost of creating and maintaining them.
![Page 30: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/30.jpg)
![Page 31: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/31.jpg)
Index tuning algorithm
Notation:
• H: a set of candidate indexes to materialize (initially empty)
• T: the task set for query qi
• Ii: either a candidate or an existing index
• δIi: the amount by which Ii would speed up query qi
[Figure: the task set holds the pairs (I1, δI1), (I2, δI2), ..., (In, δIn)]
![Page 32: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/32.jpg)
![Page 33: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/33.jpg)
Index tuning algorithm
Notation:
• ΔI: a value maintained for each index I
• Materialized index: an index that has already been created
• SELECT query: ΔI = ΔI + δI
• UPDATE query: ΔI = ΔI - δI
[Figure: task set (I1, δI1), (I2, δI2), ..., (In, δIn); the entry (I1, δI1) is processed and I1 is placed in H]
![Page 34: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/34.jpg)
Index Tuning Algorithm
The purpose of ΔI:
• We maintain ΔI on every query evaluation.
• If the potential aggregated benefit of materializing a candidate index exceeds its creation cost, we should create it, since we have gathered enough evidence that the index is useful.
![Page 35: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/35.jpg)
![Page 36: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/36.jpg)
Index tuning algorithm
Remove "bad" indexes phase
Notation:
• Δmin: minimum Δ value for index I
• Δmax: maximum Δ value for index I
• BI: the cost of creating index I
• Residual(I) = BI - (Δmax - Δ)
  (the "slack" an index has before being deemed "droppable")
  IF (Residual(I) <= 0) THEN Drop(I)
• Net-Benefit(I) = (Δ - Δmin) - BI
  (the benefit from creating the index)
  IF (Net-Benefit(I) >= 0) THEN Add(I)
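The bookkeeping behind these rules is small enough to sketch. The class below is a hypothetical illustration, not the article's code: the type and member names are invented, Δ, Δmin and Δmax all start at 0, and the sketch does not reset them when an index is actually created or dropped (a real tuner would need some such policy).

```csharp
using System;

// Tracks Delta, DeltaMin and DeltaMax for one index I, following the slide's notation.
public sealed class IndexTuningState
{
    public double CreationCost { get; }            // B_I
    public double Delta { get; private set; }      // running benefit, starts at 0 (assumption)
    public double DeltaMin { get; private set; }
    public double DeltaMax { get; private set; }

    public IndexTuningState(double creationCost) { CreationCost = creationCost; }

    // SELECT query: Delta = Delta + delta_I
    public void OnSelect(double benefit) => Apply(+benefit);
    // UPDATE query: Delta = Delta - delta_I
    public void OnUpdate(double maintenanceCost) => Apply(-maintenanceCost);

    private void Apply(double change)
    {
        Delta += change;
        DeltaMin = Math.Min(DeltaMin, Delta);
        DeltaMax = Math.Max(DeltaMax, Delta);
    }

    // Residual(I) = B_I - (DeltaMax - Delta); drop when it reaches zero or below.
    public double Residual => CreationCost - (DeltaMax - Delta);
    public bool ShouldDrop => Residual <= 0;

    // Net-Benefit(I) = (Delta - DeltaMin) - B_I; create when it is non-negative.
    public double NetBenefit => (Delta - DeltaMin) - CreationCost;
    public bool ShouldCreate => NetBenefit >= 0;
}

public static class IndexTuningDemo
{
    public static void Main()
    {
        var state = new IndexTuningState(creationCost: 10);
        for (int i = 0; i < 3; i++) state.OnSelect(4);    // three queries that would benefit
        Console.WriteLine($"create? {state.ShouldCreate} (net benefit {state.NetBenefit})");
        for (int i = 0; i < 3; i++) state.OnUpdate(5);    // three updates that pay maintenance
        Console.WriteLine($"drop? {state.ShouldDrop} (residual {state.Residual})");
    }
}
```

Comparing against Δmin and Δmax rather than against 0 means the decision depends on how far Δ has moved since its last extreme, so an index is only created or dropped after the evidence outweighs one full creation cost.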
![Page 37: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/37.jpg)
![Page 38: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/38.jpg)
Index tuning algorithm
Notation:
• ITM: all the indexes from H whose creation is cost-effective
• ITD: a subset of existing indexes such that:
  • ITD fits in existing memory
  • it is still cost-effective to create the new index I after possibly dropping members of ITD
• If creating index I is more effective than maintaining the existing indexes in ITD: DROP(ITD) && CREATE(I)
• Remove I from H (the set of candidate indexes to materialize)
![Page 39: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/39.jpg)
Experimental Evaluation
Consider the following schema:
[Figure: schema diagram; the queries below reference the tables Products, Carts, Customers, and Categories]

    checkCarts($1) =
        from p in Products.AsEnumerable()
        join cart in Carts.AsEnumerable()
            on p.Field<int>("id") equals cart.Field<int>("p_id")
        join c in Customers.AsEnumerable()
            on cart.Field<int>("cu_id") equals c.Field<int>("id")
        where c.name == $1
        select new { cart, p }

    browseProducts($1) =
        from p in Products.AsEnumerable()
        join c in Categories.AsEnumerable()
            on p.Field<int>("ca_id") equals c.Field<int>("id")
        where c.par_id == $1
        select p

Possible indexes:
    I1    Categories(par_id)
    I2    Products(c_id)
    I3    Carts(cu_id)
    I4    Products(ca_id)
    I5    Customers(name)

Generated:
• 200,000 products
• 50,000 customers
• 1,000 categories
• 5,000 items in the shopping carts
![Page 40: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/40.jpg)
Execution plans for evaluation queries
![Page 41: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/41.jpg)
Experimental Evaluation – Cont.
Generated schedule when tuning was disabled
![Page 42: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/42.jpg)
Experimental Evaluation – Cont.
Generated schedule when tuning was enabled
![Page 43: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/43.jpg)
Summary
We've discussed:
• LINQ: for declarative query formulation
• DataSet: a uniform way of representing in-memory data
• A lightweight optimizer for automatically adjusting query execution strategies

Article's main contribution:
• NOT a new query processing technique
• BUT: careful engineering of traditional database concepts in a new context
![Page 44: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/44.jpg)
Simon Zeltser
![Page 45: Adaptive Data Structures](https://reader036.fdocuments.us/reader036/viewer/2022062815/56816936550346895de0960a/html5/thumbnails/45.jpg)
LINQ Execution Model
• Compiler finds a query pattern
• Query syntax is converted to function calls and lambda expressions (at compile time)
• Compiler merges LINQ extension methods: adds query operations to IEnumerable<T>, specialized or base
• Compiler infers types produced by queries: datasets and operations on data sets are strongly typed
• Lambda expressions are converted to expression trees: parsed and type-checked at compile time, evaluated at run time; the query can be optimized and rewritten; expressions and operations can execute remotely
• Query is executed lazily: at run time, when results are used; evaluation can be forced (ToArray())
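Lazy execution is easy to observe by counting how often the filter predicate runs. The sketch below reuses the array from the earlier WHERE example; the class name, the Run method, and the counter are hypothetical additions for illustration.

```csharp
using System;
using System.Linq;

public static class LazyExecutionDemo
{
    // Returns the predicate-call count before and after forcing the query, plus the result.
    public static (int before, int after, int[] result) Run()
    {
        int evaluations = 0;
        int[] data = { 1, 2, 3, 10, 20, 30 };

        // Defining the query does not run it: no element has been inspected yet.
        var q = data.Where(x => { evaluations++; return x < 5; })
                    .Select(x => 2 * x);
        int before = evaluations;

        // ToArray() forces evaluation: the predicate now runs once per input element.
        int[] result = q.ToArray();
        return (before, evaluations, result);
    }

    public static void Main()
    {
        var (before, after, result) = Run();
        Console.WriteLine($"predicate calls before ToArray: {before}");
        Console.WriteLine($"predicate calls after ToArray: {after}");
        Console.WriteLine(string.Join(", ", result));
    }
}
```

Enumerating q a second time would run the predicate again for every element, which is why forcing evaluation once with ToArray() can be worthwhile when results are reused.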