1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia...
![Page 1: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/1.jpg)
1
Evaluating Top-K Selection Queries
Surajit Chaudhuri, Microsoft Research
Luis Gravano, Columbia University
![Page 2: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/2.jpg)
2
Motivating Example
- Find 4-bedroom houses priced at $350,000
- Exact matches are often too restrictive
- Ranking the houses closest to the specification is more desirable
![Page 3: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/3.jpg)
3
Motivating Example (cont.)
- Find 4-bedroom houses priced at $350,000
  - House 1: 5 bedrooms; $400,000; Score=0.9
  - House 2: 4 bedrooms; $485,000; Score=0.8
  - House 3: 6 bedrooms; $785,000; Score=0.3
![Page 4: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/4.jpg)
4
Top-K Queries over Precise Relational Data
- Support approximate matches with minimal changes to the relational engine
- Initial focus: selection queries with “equality” conditions
![Page 5: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/5.jpg)
5
Outline
- Definition of top-k queries
- Execution alternatives
- Mapping of top-k queries to selection queries
- Experiments
![Page 6: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/6.jpg)
6
Top-K Selection Queries
- Specify an n-dimensional target point
- Define scoring function
- Specify k
- Answer: k objects with the best score for the target point (i.e., the “top k” objects)
![Page 7: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/7.jpg)
7
Specifying Top-K Queries using SQL
    Select *
    From R
    Order [k] By Scoring_Function
![Page 8: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/8.jpg)
8
Scoring Functions Measure Degree of Match
- Assume attributes defined over metric space
- Score on any one attribute is well defined
- How to aggregate scores across attributes?
![Page 9: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/9.jpg)
9
Scoring Functions
- Normalize attribute scores to be in [0,1] range
- Combine scores using popular aggregate functions
  - Min
  - Euclidean
  - Sum, Max, …
![Page 10: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/10.jpg)
10
Some Example Scoring Functions
Let $q = (q_1, \ldots, q_n)$ be the target point and $t = (t_1, \ldots, t_n)$ a tuple:

$$\mathrm{Min}(q, t) = \min\{\,1 - |q_1 - t_1|,\ \ldots,\ 1 - |q_n - t_n|\,\}$$

$$\mathrm{Euclidean}(q, t) = 1 - \sqrt{\frac{(q_1 - t_1)^2}{n} + \cdots + \frac{(q_n - t_n)^2}{n}}$$
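The two scoring functions translate directly into code. The following is a minimal sketch, assuming attribute values have already been normalized to [0, 1] as the previous slide requires; the function names `min_score` and `euclidean_score` are illustrative and do not come from the slides.

```python
import math


def min_score(q, t):
    """Min score: the worst per-attribute match, where each match is 1 - |q_i - t_i|."""
    return min(1.0 - abs(qi - ti) for qi, ti in zip(q, t))


def euclidean_score(q, t):
    """Euclidean score: 1 minus the root-mean-square attribute distance."""
    n = len(q)
    return 1.0 - math.sqrt(sum((qi - ti) ** 2 for qi, ti in zip(q, t)) / n)
```

Both functions return 1.0 for an exact match and decrease as the tuple moves away from the target point.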
![Page 11: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/11.jpg)
11
Executing Top-K Queries
- Known techniques require at least one sequential scan (or a functional index)
  - Evaluate Scoring_Function for each tuple
  - Sort tuples [Carey & Kossmann ’97; ’98]
- Question: How to avoid sequential scans?
  - Exploit implicit selectivity of top-k queries
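For reference, the scan-based baseline that this slide wants to avoid can be sketched as follows. This is an illustration only: `top_k_scan` is a hypothetical name, the relation is modeled as an in-memory list of normalized tuples, and the Min function stands in for Scoring_Function.

```python
import heapq


def min_score(q, t):
    """Min scoring function over attributes normalized to [0, 1]."""
    return min(1.0 - abs(qi - ti) for qi, ti in zip(q, t))


def top_k_scan(tuples, q, k, score):
    """Score every tuple once (a full scan) and keep the k best."""
    return heapq.nlargest(k, tuples, key=lambda t: score(q, t))


houses = [(0.5, 0.5), (0.75, 0.5), (0.0, 1.0)]
best = top_k_scan(houses, (0.5, 0.5), 2, min_score)
```

Every tuple is scored regardless of k, which is the cost the mapping to a selection query tries to avoid.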
![Page 12: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/12.jpg)
12
Mapping a Top-K Query to a Selection Query
- Determine a search score S such that:
  - Expected # of tuples with score > S is k
  - No false dismissals
- Turn the condition that score > S into one or more range selection conditions
- Evaluate selection query using existing query processor and access paths
![Page 13: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/13.jpg)
13
Mapping a Top-K Query to a Selection Query
- 4 bedrooms; $350,000; k=10
- Retrieve all tuples with score > 0.5 (at least k=10 tuples expected)
- Analyze scoring function to determine selection range:
  - Bedrooms: [3, 5] and Price: [$250K, $450K]
![Page 14: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/14.jpg)
14
Mapping a Search Score to a Selection Range
For search score $S$, target point $q = (q_1, q_2)$, and scoring function Min:

Selection range:

$$t_1 \in [\,q_1 - (1.0 - S),\ q_1 + (1.0 - S)\,]$$

$$t_2 \in [\,q_2 - (1.0 - S),\ q_2 + (1.0 - S)\,]$$
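The selection range for Min can be computed mechanically from the search score. The sketch below assumes normalized attributes and generalizes the two-attribute case on the slide to any number of attributes; `min_selection_range` and `in_range` are hypothetical helper names.

```python
def min_selection_range(q, s):
    """Per-attribute interval that encloses every tuple whose Min score is at least s."""
    slack = 1.0 - s
    return [(qi - slack, qi + slack) for qi in q]


def in_range(t, ranges):
    """True if tuple t satisfies every per-attribute range condition."""
    return all(lo <= ti <= hi for ti, (lo, hi) in zip(t, ranges))


ranges = min_selection_range((0.5, 0.5), 0.75)
# ranges == [(0.25, 0.75), (0.25, 0.75)]
```

Each interval would become one range predicate in the generated selection query, so an existing index on the attribute can serve it.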
![Page 15: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/15.jpg)
15
Determining a Search Score
- Monotonicity: Consider tuple t that is no further from target than t' on any attribute:
  - Score of t should be at least that of t'
- Therefore, score cannot be high “far away” from target
  - Sphere for Euclidean
  - Box for Min
  - …centered at target point
- “Tightness” of enclosing range varies with scoring functions
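The enclosing box for the Euclidean sphere can be worked out from the Euclidean definition given earlier. The derivation below is reconstructed from that formula and is not stated on the slides:

$$
\mathrm{Euclidean}(q, t) > S
\iff \sum_{i=1}^{n} (q_i - t_i)^2 < n\,(1 - S)^2
\implies |q_i - t_i| < \sqrt{n}\,(1 - S) \quad \text{for every } i
$$

Under this derivation the Euclidean box has half-width $\sqrt{n}\,(1 - S)$, while the Min box has half-width $(1 - S)$ and coincides with the region where the Min score is at least $S$. The corners of the Euclidean box lie outside the sphere, which is one way to read the remark that tightness varies with the scoring function.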
![Page 16: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/16.jpg)
16
The Min Scoring Function
![Page 17: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/17.jpg)
17
The Euclidean Scoring Function
![Page 18: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/18.jpg)
18
Comments on Mapping
- Search score determines efficiency, not correctness
- Issues in efficiency:
  - Avoid retrieving too many tuples
  - Avoid retrieving fewer than k top tuples (restarts)
- How to determine good search scores?
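The claim that the search score affects only efficiency can be illustrated with a restart loop. This is a sketch under stated assumptions: the slides do not say how a restart chooses its new search score, so lowering it by a fixed `step` is purely hypothetical, as are the function names. The Min scoring function and an in-memory list stand in for the real scoring function and the relational engine.

```python
def min_score(q, t):
    """Min scoring function over attributes normalized to [0, 1]."""
    return min(1.0 - abs(qi - ti) for qi, ti in zip(q, t))


def range_query(tuples, q, s):
    """Stand-in for the selection query: all tuples inside the Min box for score s."""
    slack = 1.0 - s
    return [t for t in tuples
            if all(abs(qi - ti) <= slack for qi, ti in zip(q, t))]


def top_k_with_restarts(tuples, q, k, s, step=0.1):
    """Return (top-k tuples, number of restarts) for an initial search score s."""
    restarts = 0
    while True:
        candidates = range_query(tuples, q, s)
        # Every tuple outside the box scores below s, so once the box holds
        # k tuples (or covers the whole normalized space) the answer is exact.
        if len(candidates) >= k or s <= 0.0:
            ranked = sorted(candidates, key=lambda t: min_score(q, t), reverse=True)
            return ranked[:k], restarts
        s = max(0.0, s - step)  # assumed restart policy: relax the score
        restarts += 1


data = [(0.5, 0.5), (0.6, 0.5), (0.9, 0.9), (0.1, 0.2)]
answer, restarts = top_k_with_restarts(data, (0.5, 0.5), 2, 0.95)
```

A search score that is too high costs extra round trips (restarts), and one that is too low retrieves more tuples than needed. In both cases the final answer is the same.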
![Page 19: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/19.jpg)
19
Determining Search Scores
- Find k points in data
- Compute their score
- Set search score to lowest score
- Challenges:
  - Determining the initial k points to optimize execution
  - Taking original query into account
![Page 20: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/20.jpg)
20
Using Histograms
[Figure: histogram buckets with frequencies 4, 20, 11, and 10, and query point Q]
![Page 21: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/21.jpg)
21
Picking K Representative “Tuples”
- Collapse histogram bucket to a single representative point
  - Furthest from Q in bucket (“NoRestarts”)
  - Closest to Q in bucket (“Restarts”)
- Assign bucket frequency to the single representative point
- Include closest representative points until we have k tuples
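The bucket-collapsing procedure can be sketched as follows. This is an illustration, not the authors' implementation: buckets are modeled as axis-aligned boxes given by per-attribute `(lo, hi)` bounds plus a frequency, the 2x2 bucket layout is invented (only the frequencies 4, 20, 11, and 10 echo the figure), and all function names are hypothetical.

```python
def min_score(q, t):
    """Min scoring function over attributes normalized to [0, 1]."""
    return min(1.0 - abs(qi - ti) for qi, ti in zip(q, t))


def closest_point(bounds, q):
    """Point of the bucket nearest to q: clamp q into the bucket on each attribute."""
    return tuple(min(max(qi, lo), hi) for qi, (lo, hi) in zip(q, bounds))


def furthest_point(bounds, q):
    """Corner of the bucket farthest from q on every attribute."""
    return tuple(lo if abs(qi - lo) >= abs(qi - hi) else hi
                 for qi, (lo, hi) in zip(q, bounds))


def search_score(buckets, q, k, score, strategy):
    """Collapse buckets to representatives and return the score that covers k tuples."""
    pick = furthest_point if strategy == "NoRestarts" else closest_point
    reps = sorted(((score(q, pick(bounds, q)), freq) for bounds, freq in buckets),
                  reverse=True)
    covered = 0
    for s, freq in reps:  # best-scoring representatives first
        covered += freq
        if covered >= k:
            return s
    raise ValueError("histogram holds fewer than k tuples")


buckets = [
    ([(0.0, 0.5), (0.0, 0.5)], 4),
    ([(0.5, 1.0), (0.0, 0.5)], 20),
    ([(0.0, 0.5), (0.5, 1.0)], 11),
    ([(0.5, 1.0), (0.5, 1.0)], 10),
]
q = (0.25, 0.25)
pessimistic = search_score(buckets, q, 10, min_score, "NoRestarts")  # 0.25
optimistic = search_score(buckets, q, 10, min_score, "Restarts")     # 0.75
```

With a monotonic scoring function, every tuple in a bucket scores at least as well as the furthest corner, so the NoRestarts score is safe if the bucket frequencies are accurate. The Restarts score assumes all tuples sit at the closest point and may therefore undershoot k.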
![Page 22: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/22.jpg)
22
Using Histograms: “NoRestarts”
[Figure: histogram buckets with frequencies 4, 20, 11, and 10, and query point Q]
![Page 23: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/23.jpg)
23
Using Histograms: “Restarts”
[Figure: histogram buckets with frequencies 4, 20, 11, and 10, and query point Q]
![Page 24: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/24.jpg)
24
Other Strategies for Determining Search Scores
- Calculate search score for:
  - n = NoRestarts (“pessimistic” extreme)
  - r = Restarts (“optimistic” extreme)
- Use intermediate scores:
  - Inter1 = (2n + r)/3
  - Inter2 = (n + 2r)/3

[Figure: search-score axis from 0 to 1 marking the NoRestarts and Restarts scores]
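The two intermediate strategies are plain weighted averages of the extreme search scores. A short sketch, with `intermediate_scores` as a hypothetical helper name:

```python
def intermediate_scores(no_restarts_score, restarts_score):
    """Return (Inter1, Inter2) at one third and two thirds of the way from n to r."""
    n, r = no_restarts_score, restarts_score
    inter1 = (2 * n + r) / 3
    inter2 = (n + 2 * r) / 3
    return inter1, inter2


inter1, inter2 = intermediate_scores(0.0, 0.75)
```

Inter1 stays closer to the NoRestarts score and Inter2 closer to the Restarts score.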
![Page 25: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/25.jpg)
25
Evaluating the Generated Selection Query
- Sequential scan
- Intersection of a set of indexes, followed by data access
  - Special case: index-only access
![Page 26: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/26.jpg)
26
Indexes and Statistics
- Indexes
  - n-dim (concatenated-key) B-trees
- Statistics
  - MaxDiff as base 1-dim histogram
  - Multidimensional histograms: AVI, Phased, MHist
![Page 27: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/27.jpg)
27
Experimental Evaluation
- Is mapping to selection queries an effective technique?
- Sensitivity to relevant parameters:
  - Scoring functions
  - Data skew and dimensionality
  - Statistics
![Page 28: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/28.jpg)
28
Data Generation
- Characterized by $Z = \langle z_1, \ldots, z_n \rangle$
- Generate $N$ tuples by Zipfian distribution $z_1$
- Group tuples by $attr_1$
- For a partition with $attr_1 = a$ with $N_1$ tuples:
  - Generate $N_1$ values $w_1, \ldots, w_{N_1}$ using Zipfian distribution $z_2$
  - Create pairs $(a, w_1), \ldots, (a, w_{N_1})$
- Repeat steps to fill in all attribute values
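The generation steps above can be sketched in a few lines. Several details are assumptions made for illustration and are not given on the slide: a finite integer domain of `domain_size` values per attribute, value `v` taking Zipf rank `v + 1`, and the names `zipf_weights` and `generate_skewed_tuples`.

```python
import random
from collections import Counter


def zipf_weights(domain_size, z):
    """Unnormalized Zipf weights: the value of rank r gets weight 1 / r**z."""
    return [1.0 / (rank ** z) for rank in range(1, domain_size + 1)]


def generate_skewed_tuples(num_tuples, skews, domain_size=10, seed=0):
    """Fill attributes one at a time, drawing each partition's values from a Zipfian."""
    rng = random.Random(seed)
    values = list(range(domain_size))
    partitions = {(): num_tuples}  # prefix of attribute values -> tuple count
    for z in skews:
        weights = zipf_weights(domain_size, z)
        extended = Counter()
        for prefix, count in partitions.items():
            # Draw one value of the next attribute for every tuple in the partition.
            for v in rng.choices(values, weights=weights, k=count):
                extended[prefix + (v,)] += 1
        partitions = extended
    return [t for t, count in partitions.items() for _ in range(count)]


data = generate_skewed_tuples(1000, [1.0, 0.5, 2.0])
```

A skew of 0 yields uniform weights, and larger values concentrate tuples on fewer attribute values.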
![Page 29: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/29.jpg)
29
Metrics for Comparison
- Fraction of data tuples accessed may be compared to:
  - Ideal: k
  - Worst case: size of data set
- % of restarts
![Page 30: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/30.jpg)
30
Exploring Limits
- Intrinsic limitations of range-query approach:
  - Enclose actual top-k tuples in tight n-rectangle
  - Retrieve all tuples in n-rectangle
- Less than 1% of database tuples in n-rectangle (k=10; 100,000 tuples)
- Effect of retrieving tuples with score > S using an n-rectangle
![Page 31: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/31.jpg)
31
Effect of Scoring Functions
- Min has little/no gap between target region and enclosing n-rectangle
- As k increases, fraction of retrieved tuples grows slowest for Min
- Euclidean performs worse
  - Less tight n-rectangle
![Page 32: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/32.jpg)
32
Tuples with Score > S v. Data Skew (Euclidean; PHASED histogram of 5KB; n=3)
![Page 33: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/33.jpg)
33
Effect of Mapping Strategies and Histograms
- Multidimensional histograms aid computation of tight search scores
- NoRestarts dominates at high data skew
![Page 34: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/34.jpg)
34
Tuples Retrieved v. Data Skew (PHASED histogram of 5KB; n=3)
![Page 35: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/35.jpg)
35
Restarts v. Data Skew (PHASED histogram of 5KB; n=3)
![Page 36: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/36.jpg)
36
Related Work (1)
- [Fagin ’96; ’98]
  - Multimedia attributes with query “subsystem”
  - Multiple index scans
  - Independence assumption
- [Chaudhuri & Gravano ’96]
  - Multimedia attributes with query “subsystem”
  - Map top-k queries to “selection” queries
  - Independence assumption
  - Limited scoring functions
![Page 37: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/37.jpg)
37
Related Work (2)
- [Carey & Kossmann ’97; ’98]
  - Optimized sorting phase using k
- Nearest-neighbor literature
- [Donjerkovic & Ramakrishnan ’99]
  - Probabilistic optimization framework
  - No multidimensional scoring functions
  - Independence assumptions
![Page 38: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/38.jpg)
38
Summary
- Defined mapping of top-k queries to traditional selection queries
- Exploit existing database statistics and query processors
- Studied effect of scoring functions, data skew, statistics on mapping
- Full experimental analysis forthcoming!
![Page 39: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/39.jpg)
39
Tuples Retrieved v. Histogram Size (Euclidean; n=3; Z21)
![Page 40: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/40.jpg)
40
Tuples Retrieved v. n (PHASED histogram of 5KB; Z21)
![Page 41: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/41.jpg)
41
Restarts v. n (PHASED histogram of 5KB; Z21)
![Page 42: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/42.jpg)
42
Tuples Retrieved v. k (PHASED histogram of 5KB; Z21; n=3)
![Page 43: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/43.jpg)
43
Restarts v. k (PHASED histogram of 5KB; Z21; n=3)
![Page 44: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/44.jpg)
44
Restarts v. Data Skew (Euclidean; PHASED histogram of 5KB; n=3)
![Page 45: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/45.jpg)
45
Tuples Retrieved v. Histogram Size (Census Database; PHASED)
![Page 46: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/46.jpg)
46
Tuples Retrieved v. Data Skew (Euclidean; PHASED histogram of 5KB; n=3)
![Page 47: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/47.jpg)
47
The Sum Scoring Function
![Page 48: 1 Evaluating Top-K Selection Queries Surajit Chaudhuri Microsoft Research Luis Gravano Columbia University.](https://reader033.fdocuments.us/reader033/viewer/2022051314/551b0324550346cf5a8b4877/html5/thumbnails/48.jpg)
48
The Max Scoring Function