MariaDB Optimizer in 10.3, where does it stand?

Santa Clara, California | April 23rd – 25th, 2018

Sergey Petrunia, MariaDB Project
Vicențiu Ciorbaru, MariaDB Foundation

Agenda

● New releases of MySQL and MariaDB
– MariaDB 10.2 and 10.3
– MySQL 8.0
● Optimizer related features
– Histograms
– Non-recursive CTEs
  ● Derived table optimizations
– Window Functions
● Let’s look and compare
– Also look at PostgreSQL and SQL Server


Histograms

Condition Selectivity

Query optimizer needs to decide on a plan to execute the query

Goal is to get the shortest running time
• Choose access method
- Index Access, Hash Join, BKA, etc.
• Choose correct join order to minimize the cost of reading rows
- Usually, minimizing rows read minimizes execution time
- Sometimes reading more rows is advantageous, if table / index is all in memory

Use a cost model to estimate how long an execution plan would take

For each condition in the WHERE clause (and HAVING) we compute
• Condition selectivity
- How many rows of the table is this condition going to accept? 10%, 20%, 90%?

Getting the estimates right is important!

Condition Selectivity

Suppose we have a query with 10 tables: T1, T2, T3, … T10

Query optimizer will:
• Estimate the number of rows that it will read from each table
• Based on the conditions in the WHERE (and HAVING) clauses

Assume estimates have an average error coefficient e
• Total number of estimated rows read is:
- (e * #T1) * (e * #T2) * (e * #T3) * … * (e * #T10)
• Where #T1..#T10 is the actual number of rows read for each table

The estimation error is amplified, the more tables there are in a join
• The combined error factor for 10 tables is e^10
• If we under/over estimate by a factor of 2, final error factor is 1024!
• If error is only 1.5 (off by 50%), final error factor is ~58
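The arithmetic behind these error factors can be checked with a few lines of Python. This is only an illustration of the formula on the slide; the function name `amplified_error` is made up for this sketch.

```python
def amplified_error(per_table_error: float, n_tables: int) -> float:
    """Error factor of the combined row estimate when each of the
    n_tables per-table estimates is off by the same factor."""
    return per_table_error ** n_tables


for e in (2.0, 1.5, 1.1):
    factor = amplified_error(e, 10)
    print(f"per-table error {e}: combined error over 10 tables = {factor:.1f}")
```

Even a 10% per-table error grows to a factor of about 2.6 over ten tables, which is why small improvements in selectivity estimates matter for large joins.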

Condition Selectivity

How does optimizer produce estimates?

• Condition analysis:
- Is it possible to satisfy conditions? t1.a > 10 and t1.a < 5
- Equality condition on a distinct column?
• Index dives to get number of rows in a range
• Guesstimates (MySQL)
• Histograms for non-indexed columns

Histograms

Histograms estimate a distribution

Multiple types of histograms
• Equi-Width Histograms
- Not uniform information
- Many values in one bucket (5)
- Other buckets take few values (1)
• Equi-Height Histograms
- All bins have same #values
- More bins where there are more values
• Most Common Values Histograms
- Useful for ENUM columns
- One bin per value
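The difference between the first two histogram types can be sketched in Python. This is a toy illustration, not the way any of the servers builds or stores histograms; the function names and the sample data are assumptions made for this example, and `equi_width` assumes at least two distinct values.

```python
def equi_width(values, n_buckets):
    """Split [min, max] into n_buckets ranges of equal width
    and count how many values fall into each range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # The maximum value is placed into the last bucket.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts


def equi_height(values, n_buckets):
    """Return the upper bound of each bucket so that every bucket
    holds the same number of values (up to rounding)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i + 1) * n // n_buckets - 1] for i in range(n_buckets)]


data = [1, 2, 2, 3, 3, 3, 4, 4, 50, 100]
print(equi_width(data, 4))   # value counts per equal-width range
print(equi_height(data, 5))  # upper bounds of equally filled buckets
```

On skewed data the equi-width buckets are very unevenly filled, while the equi-height bounds crowd together where the values are dense.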

Histograms in MariaDB

MariaDB histograms are collected by doing a full table scan
• Needs to be done manually using ANALYZE TABLE … PERSISTENT

Stored inside
• mysql.table_stats, mysql.column_stats, mysql.index_stats
• As a binary value (max 255 bytes), single / double precision
• Special function to decode, decode_histogram()

Can be manually updated
• One can run data collection on a slave, then propagate results

Not enabled by default, needs a few switches turned on to work

Histograms in MySQL

MySQL histograms are collected by doing a full table scan
• Needs to be done manually using ANALYZE TABLE … UPDATE HISTOGRAM
• Can collect all data or perform sampling by skipping rows, based on max memory allocation

Stored inside data dictionary
• Can be viewed through INFORMATION_SCHEMA.column_statistics
• Stored as Equi-Width (Singleton) or Equi-Height
• Visible as JSON

Cannot be manually updated
• No obvious easy way to share statistics

Enabled by default, will be used when available

Histograms in PostgreSQL

PostgreSQL histograms are collected by doing a true random read
• Can be collected manually with ANALYZE
• Also collected automatically when VACUUM runs

Stores equal-height and most common values at the same time
• Equal-height histogram doesn’t cover MCV

Can be manually updated
• One could import histograms from slave instances
• VACUUM auto-collection seems to cover the use case

Using Histograms

Histograms are useful for range conditions
• Equi-width or equi-height:
- COLUMN > constant
• Most Common Values (Singleton):
- COLUMN = constant

Problematic when multiple columns are involved:
• t1.COL1 > 100 AND t1.COL2 > 1000

Most optimizers assume column values are independent
• P(A ∩ B) = P(A) * P(B) vs P(A ∩ B) = P(A) * P(B | A)

PostgreSQL 10 has added support for multi-variable distributions.
MySQL assumes independent values.
MariaDB doesn’t handle multi-variable case well either.
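How a range condition such as COLUMN > constant can be estimated from an equi-height histogram is sketched below in Python. This is a simplified model, not the code of any of the three servers: the histogram is assumed to be a list of bucket upper bounds plus the column minimum, every bucket is assumed to hold the same share of rows, and values inside a bucket are assumed to be spread evenly. The name `selectivity_greater_than` is made up for this example.

```python
def selectivity_greater_than(min_value, bounds, constant):
    """Estimate the fraction of rows with COLUMN > constant.

    bounds    -- sorted upper bounds of the equi-height buckets
    min_value -- smallest value in the column (lower bound of bucket 0)
    """
    lower = min_value
    matching_buckets = 0.0
    for upper in bounds:
        if constant <= lower:
            # The whole bucket lies above the constant.
            matching_buckets += 1.0
        elif constant < upper:
            # The constant falls inside this bucket: interpolate linearly.
            matching_buckets += (upper - constant) / (upper - lower)
        lower = upper
    return matching_buckets / len(bounds)


print(selectivity_greater_than(0, [10, 20, 30, 40], 25))
```

For two such conditions on different columns, an optimizer that assumes independence would simply multiply the two fractions, which is where correlated columns cause trouble.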

Using Histograms

Sample database world:

select city.name
from city
where (city.population > 10000000 or city.population < 10000)

                          MariaDB   MySQL   PostgreSQL
Estimated Rows Filtered   1.95%     1.09%   1.05%
Actual Rows Filtered      1.05%

Using Histograms

Table with 2 columns A and B
• t1.a always equals t1.b
• 10 distinct values, each value occurs with 10% probability

select t1.A, t1.B
from t1
where t1.A = t1.B and t1.A = 5

                          MariaDB   MySQL   PostgreSQL
Estimated Rows Filtered   1.03%     1%      10%
Actual Rows Filtered      10%
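The gap between 1% and 10% can be reproduced with a small Python model. It assumes, as one plausible reading of the numbers above, that the condition is treated as A = 5 AND B = 5 and that the two selectivities are multiplied as if the columns were independent; the table contents are generated for this sketch.

```python
# 1000 rows, 10 distinct values, and column b always equals column a.
rows = [(v, v) for v in range(10) for _ in range(100)]


def fraction(predicate):
    """Share of rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


p_a = fraction(lambda r: r[0] == 5)  # selectivity of a = 5
p_b = fraction(lambda r: r[1] == 5)  # selectivity of b = 5

independent_estimate = p_a * p_b  # P(A) * P(B)
actual = fraction(lambda r: r[0] == 5 and r[1] == 5)  # P(A) * P(B | A)

print(f"independence assumption: {independent_estimate:.2%}, actual: {actual:.2%}")
```

Because b is fully determined by a, P(B | A) is 1 and the true selectivity stays at 10%, ten times the independent estimate.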

Conclusions

MariaDB
• Slightly less precise than MySQL, but smaller in size
• Same problem with correlated data as MySQL
• Performs full-table-scan, no sampling support
• Easy to share between instances

MySQL
• Histograms provide good estimates for real world data
• Poor performance with highly correlated data
• Performs full-table-scan, supports sampling

PostgreSQL
• Estimates on par with MySQL and MariaDB
• Support for multi-variable distributions!
• True sampling

Optimizations for derived tables and non-recursive CTEs

A set of related optimizations

Some are new, some are old:
● Derived table merge
● Condition pushdown
– Condition pushdown through window functions
● GROUP BY splitting

Background – derived table merge

● “VIP customers and their big orders from October”

select *
from vip_customer,
     (select *
      from orders
      where order_date BETWEEN '2017-10-01' and '2017-10-31'
     ) as OCT_ORDERS
where OCT_ORDERS.amount > 1000000 and
      OCT_ORDERS.customer_id = vip_customer.customer_id

Naive execution

select *
from vip_customer,
     (select *
      from orders
      where order_date BETWEEN '2017-10-01' and '2017-10-31'
     ) as OCT_ORDERS
where OCT_ORDERS.amount > 1000000 and
      OCT_ORDERS.customer_id = vip_customer.customer_id

1 – compute OCT_ORDERS from orders
2 – do join of vip_customer with OCT_ORDERS, checking amount > 1M

Derived table merge

select *
from vip_customer,
     (select *
      from orders
      where order_date BETWEEN '2017-10-01' and '2017-10-31'
     ) as OCT_ORDERS
where OCT_ORDERS.amount > 1000000 and
      OCT_ORDERS.customer_id = vip_customer.customer_id

The derived table is merged into its parent select:

select *
from vip_customer, orders
where order_date BETWEEN '2017-10-01' and '2017-10-31' and
      orders.amount > 1000000 and
      orders.customer_id = vip_customer.customer_id

Execution after merge

select *
from vip_customer, orders
where order_date BETWEEN '2017-10-01' and '2017-10-31' and
      orders.amount > 1000000 and
      orders.customer_id = vip_customer.customer_id

[Diagram: Join of vip_customer with orders (made in October, amount > 1M)]

● Allows the optimizer to join customer→orders or orders→customer
● Good for optimization
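That the merged form returns the same rows as the derived-table form can be checked on a toy data set. The sketch below uses Python's built-in sqlite3 module purely as a convenient SQL engine; the table contents and customer names are invented, and the check says nothing about the plans MariaDB or MySQL would choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table vip_customer (customer_id integer primary key, name text);
    create table orders (order_id integer primary key, customer_id integer,
                         order_date text, amount integer);
    insert into vip_customer values (1, 'Alice'), (2, 'Bob');
    insert into orders values
        (10, 1, '2017-10-05', 2000000),
        (11, 1, '2017-09-30', 5000000),
        (12, 2, '2017-10-20', 500),
        (13, 3, '2017-10-21', 3000000);
""")

# Query written with a derived table, as on the slides.
derived_form = """
    select * from vip_customer,
      (select * from orders
       where order_date between '2017-10-01' and '2017-10-31') as oct_orders
    where oct_orders.amount > 1000000
      and oct_orders.customer_id = vip_customer.customer_id
    order by oct_orders.order_id
"""
# The same query after the derived table has been merged by hand.
merged_form = """
    select * from vip_customer, orders
    where order_date between '2017-10-01' and '2017-10-31'
      and orders.amount > 1000000
      and orders.customer_id = vip_customer.customer_id
    order by orders.order_id
"""

derived_rows = conn.execute(derived_form).fetchall()
merged_rows = conn.execute(merged_form).fetchall()
print(derived_rows == merged_rows, merged_rows)
```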

What if the subquery has a GROUP BY ?

● Merging is only possible when the “final” operation of the subquery is a join
● Can’t merge if it’s a GROUP BY/DISTINCT/ORDER BY LIMIT/etc

create view OCT_TOTALS as
select customer_id, SUM(amount) as TOTAL_AMT
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
group by customer_id

select * from OCT_TOTALS where customer_id=1

Execution is inefficient

create view OCT_TOTALS as
select customer_id, SUM(amount) as TOTAL_AMT
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
group by customer_id

select * from OCT_TOTALS where customer_id=1

1 – compute all totals (Sum over orders into OCT_TOTALS)
2 – get customer_id=1 from OCT_TOTALS

Condition pushdown optimization

select * from OCT_TOTALS where customer_id=1

create view OCT_TOTALS as
select customer_id, SUM(amount) as TOTAL_AMT
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
group by customer_id

● Can push down conditions on GROUP BY columns
● … to filter out rows that go into groups we don’t care about

Condition pushdown

select * from OCT_TOTALS where customer_id=1

create view OCT_TOTALS as
select customer_id, SUM(amount) as TOTAL_AMT
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
group by customer_id

1 – find customer_id=1 in orders
2 – Sum only those rows into OCT_TOTALS, customer_id=1

● Looking only at groups you’re interested in is much more efficient
– Pushing into HAVING clause is useful, too.
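The effect of pushing a condition on a GROUP BY column below the aggregation can be modelled in Python. This is a sketch of the idea only; the sample rows are invented and the date filter from the view is left out for brevity.

```python
from collections import defaultdict

# (customer_id, amount) pairs standing in for the orders table.
orders = [(1, 100), (1, 250), (2, 40), (3, 75), (3, 5)]


def totals(rows):
    """GROUP BY customer_id with SUM(amount)."""
    sums = defaultdict(int)
    for customer_id, amount in rows:
        sums[customer_id] += amount
    return dict(sums)


# Without pushdown: aggregate every group, then filter the result.
no_pushdown = {c: t for c, t in totals(orders).items() if c == 1}

# With pushdown: filter first, so only rows of the wanted group are aggregated.
with_pushdown = totals([row for row in orders if row[0] == 1])

print(no_pushdown, with_pushdown)
```

Both variants give the same answer because the condition refers only to the grouping column; the pushed-down variant never touches the rows of other customers.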

Pushdown for inferred conditions (in MariaDB)

select customer.customer_name, TOTAL_AMT
from customer, OCT_TOTALS
where customer.customer_id=OCT_TOTALS.customer_id and
      customer.customer_id=1

create view OCT_TOTALS as
select customer_id, SUM(amount) as TOTAL_AMT
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
group by customer_id

Inferred condition: OCT_TOTALS.customer_id=1

Condition Pushdown through Window Functions

● “Customer’s biggest orders”

create view top_three_orders as
select *
from (
  select customer_id, amount,
         rank() over (partition by customer_id
                      order by amount desc) as order_rank
  from orders
) as ordered_orders
where order_rank<=3

+-------------+--------+------------+
| customer_id | amount | order_rank |
+-------------+--------+------------+
|           1 |  10000 |          1 |
|           1 |   9500 |          2 |
|           1 |    400 |          3 |
|           2 |   3200 |          1 |
|           2 |   1000 |          2 |
|           2 |    400 |          3 |
...

select * from top_three_orders where customer_id=1

Condition pushdown through Window Functions

select * from top_three_orders where customer_id=1

Without condition pushdown
● Compute top_three_orders for all customers
● Select rows with customer_id=1

With condition pushdown
● Only compute top_three_orders for customer_id=1
– This is much faster
– Can take advantage of index on customer_id
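Why a condition on the PARTITION BY column may be pushed below the window function can be shown with a small Python model of rank(). It is a sketch under the assumption that the view keeps ranks 1 to 3; the function name `top_three` and the sample rows are made up for this example.

```python
from itertools import groupby

# (customer_id, amount) pairs standing in for the orders table.
orders = [(1, 10000), (1, 9500), (1, 400), (1, 300),
          (2, 3200), (2, 1000), (2, 400), (2, 100)]


def top_three(rows):
    """rank() over (partition by customer_id order by amount desc),
    keeping only rows with rank <= 3."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    for customer_id, group in groupby(ordered, key=lambda r: r[0]):
        previous_amount, rank = None, 0
        for position, (_, amount) in enumerate(group, start=1):
            if amount != previous_amount:
                # rank() gives equal amounts the same rank.
                rank = position
                previous_amount = amount
            if rank <= 3:
                result.append((customer_id, amount, rank))
    return result


# Filter after the window function: ranks are computed for all customers.
filter_after = [row for row in top_three(orders) if row[0] == 1]

# Filter pushed down: ranks are computed for customer 1 only.
filter_before = top_three([row for row in orders if row[0] == 1])

print(filter_after == filter_before, filter_before)
```

The two results match because ranks are computed independently inside each partition, so removing other partitions early cannot change them.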

Summary so far

● Derived table merge
– Available since MySQL/MariaDB 5.1 and in most other databases
● Condition pushdown
– Available in PostgreSQL, MariaDB 10.2
– Not available in MySQL 5.7 or 8.0
– Limitations:
  ● MariaDB doesn’t push from HAVING into WHERE (MDEV-7486)
  ● PostgreSQL doesn’t push inferred conditions
● Condition pushdown through window functions
– Available in PostgreSQL, MariaDB 10.3


Split grouping optimization

Split grouping use case

select *
from customer, OCT_TOTALS
where customer.customer_id=OCT_TOTALS.customer_id and
      customer.customer_name IN ('Customer 1', 'Customer 2')

create view OCT_TOTALS as
select customer_id, SUM(amount) as TOTAL_AMT
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
group by customer_id

● Compute a table of groups (OCT_TOTALS)
● Join the groups to another table (customer)
● The other table has a selective restriction (only need two customers)
● But condition pushdown can’t be used

Execution, the old way

select *
from customer, OCT_TOTALS
where customer.customer_id=OCT_TOTALS.customer_id and
      customer.customer_name IN ('Customer 1', 'Customer 2')

create view OCT_TOTALS as
select customer_id, SUM(amount) as TOTAL_AMT
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
group by customer_id

[Diagram: orders of Customer 1 … Customer 100 are summed into OCT_TOTALS, then joined with the two selected rows of customer (Customer 1, Customer 2)]

● Inefficient, OCT_TOTALS is computed for *all* customers.

Split grouping execution

[Diagram: for each selected row of customer, the matching rows of orders are read and summed]

● Similar to “LATERAL DERIVED”
● Pick Customer1, compute part of OCT_TOTALS table for him
● Pick Customer2, compute part of OCT_TOTALS table for him
● ...

Split Grouping prerequisites

● There is a join condition that “selects” one GROUP BY group:
– OCT_TOTALS.customer_id=customer.customer_id
● The join order allows making “lookups” in the grouped temp table
– customer → OCT_TOTALS
● There is an index that allows reading only one GROUP BY group.
– INDEX(orders.customer_id)

Split grouping execution

● Available since MariaDB 10.3
● The optimizer makes a criteria + cost-based choice whether to use the optimization
● EXPLAIN shows “LATERAL DERIVED”
● @@optimizer_switch flag: split_materialization (ON by default)

select *
from customer, OCT_TOTALS
where customer.customer_id=OCT_TOTALS.customer_id and
      customer.customer_name IN ('Customer 1', 'Customer 2')

create view OCT_TOTALS as
select customer_id, SUM(amount) as TOTAL_AMT
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
group by customer_id

+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| id   | select_type     | table      | type | possible_keys | key         | key_len | ref                  | rows | Extra       |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
|    1 | PRIMARY         | customer   | ALL  | PRIMARY       | NULL        | NULL    | NULL                 | 1000 |             |
|    1 | PRIMARY         | <derived2> | ref  | key0          | key0        | 4       | customer.customer_id |   36 |             |
|    2 | LATERAL DERIVED | orders     | ref  | customer_id   | customer_id | 4       | customer.customer_id |  365 | Using where |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
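The per-customer execution that “LATERAL DERIVED” stands for can be imitated in Python. This is a rough model of the idea, not of MariaDB's implementation: a dictionary plays the role of INDEX(orders.customer_id), the sample rows are invented, and the date filter of the view is omitted.

```python
from collections import defaultdict

customers = {1: "Customer 1", 2: "Customer 2", 3: "Customer 3"}
orders = [(1, 100), (2, 30), (1, 50), (3, 999), (2, 70)]  # (customer_id, amount)

# Stand-in for INDEX(orders.customer_id): customer_id -> list of amounts.
orders_by_customer = defaultdict(list)
for customer_id, amount in orders:
    orders_by_customer[customer_id].append(amount)

wanted = ("Customer 1", "Customer 2")
result = []
for customer_id, name in customers.items():
    if name not in wanted:
        continue
    # Index lookup: read only the rows of this one GROUP BY group.
    amounts = orders_by_customer.get(customer_id, [])
    if amounts:  # a customer without orders has no group to join with
        result.append((name, sum(amounts)))

print(result)
```

Only the groups of the two selected customers are ever summed; the rows of Customer 3 are never read through the index.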

Summary so far

● Derived table merge
– Available since MySQL/MariaDB 5.1 and in most other databases
● Condition pushdown
– Available in PostgreSQL, MariaDB 10.2
– Not available in MySQL 5.7 or 8.0
● Condition pushdown through window functions
– Available in PostgreSQL, MariaDB 10.3
● Split grouping optimization
– MariaDB 10.3 only


Optimizations for non-recursive CTEs

CTE syntax

with engineers as (
  select * from employees
  where dept='Engineering'
)
select * from engineers
where ...

● WITH: keyword that starts the CTE
● CTE name: engineers
● CTE body: the select inside the parentheses
● CTE usage: the final select from engineers

Similar to DERIVED tables

“Query-local VIEWs”

CTEs are like derived tables

select *
from (
  select * from employees
  where dept='Engineering'
) as engineers
where ...

with engineers as (
  select * from employees
  where dept='Engineering'
)
select * from engineers
where ...

Use case #1: CTEs refer to CTEs

with engineers as (
  select * from employees
  where dept in ('Development','Support')
),
eu_engineers as (
  select * from engineers
  where country IN ('NL',...)
)
select ...
from eu_engineers;

More readable than nested FROM(SELECT …)

Use case #2: Multiple uses of CTE

Anti-self-join

with engineers as (
  select * from employees
  where dept in ('Development','Support')
)
select *
from engineers E1
where not exists (select 1
                  from engineers E2
                  where E2.country=E1.country and
                        E2.name <> E1.name);

Use case #2: example 2

Year-over-year comparisons

with sales_product_year as (
  select product,
         year(ship_date) as year,
         sum(price) as total_amt
  from item_sales
  group by product, year
)
select *
from sales_product_year CUR,
     sales_product_year PREV
where CUR.product=PREV.product and
      CUR.year=PREV.year + 1 and
      CUR.total_amt > PREV.total_amt

Optimizations for non-recursive CTEs

1. The same set as for derived tables
– Merge
– Condition pushdown
  ● through window functions
– Lateral derived
2. Compute CTE once if it is used multiple times

CTE Optimizations

                 Merge   Condition pushdown   Lateral derived   CTE reuse
MariaDB 10.3     ✔       ✔                    ✔                 ✘
MS SQL Server    ✔       ✔                    ?                 ✘
PostgreSQL       ✘       ✘                    ✘                 ✔
MySQL 8.0        ✔       ✘                    ✘                 ✔

Merge and Condition Pushdown are the most important

MariaDB supports them, like MS SQL.

PostgreSQL’s approach is *weird*: “CTEs are optimization barriers”

MySQL 8.0: “try merging, otherwise reuse”


Window functions optimizations

Window functions optimizations

● Window functions introduced in
– MariaDB 10.2
– MySQL 8.0
● Optimizations for window functions
– Condition pushdown
– Reduce the number of sorting passes
– Streamed computation
– ORDER BY-like optimizations

Reduce the number of sorting passes

select rank() over (order by col1),
       ntile(4) over (order by col2),
       rank() over (order by ...)
from tbl1 join tbl2 on ...

[Diagram: tbl, tbl, tbl → join → sort → compute window function]

● Each window function requires a sort
● Identical PARTITION/ORDER BY must share the sort step
● Compatible may share the sort step
● Supported by all: MariaDB, MySQL 8, PostgreSQL, ...

Streamed computation

win_func( ) over (partition by ... order by ...
                  rows between N1 preceding and N2 following)

● Window function is computed from rows in the window frame
– O (n_rows * frame_size)
● Frame moves down with the current row
● For most functions, one can update the value after the frame has moved – this is streamed computation
– SUM, COUNT, AVG
● For some, this doesn’t hold (e.g. MAX)

[Diagram: the frame slides down with cur_row; old_val is updated to new_val]
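The difference between recomputing each frame and updating the previous value can be sketched in Python for SUM over a frame of N1 preceding and N2 following rows. The function names are made up for this illustration, and a single partition that is already sorted is assumed.

```python
def frame_sums_naive(values, preceding, following):
    """Recompute the sum of every frame from scratch:
    O(n_rows * frame_size)."""
    n = len(values)
    return [sum(values[max(0, i - preceding): min(n, i + following + 1)])
            for i in range(n)]


def frame_sums_streamed(values, preceding, following):
    """Update the previous sum when the frame moves: O(n_rows)."""
    n = len(values)
    result = []
    # Frame of row 0 covers rows 0 .. following.
    current = sum(values[: min(n, following + 1)])
    for i in range(n):
        result.append(current)
        # Moving to row i + 1: one row enters at the bottom of the frame ...
        entering = i + 1 + following
        if entering < n:
            current += values[entering]
        # ... and one row leaves at the top.
        leaving = i - preceding
        if leaving >= 0:
            current -= values[leaving]
    return result


data = [1, 2, 3, 4, 5]
print(frame_sums_naive(data, 1, 1))
print(frame_sums_streamed(data, 1, 1))
```

The update works for SUM because a leaving row can be subtracted again. MAX has no such inverse: when the current maximum leaves the frame, this simple update cannot tell what the new maximum is.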

ORDER BY [LIMIT] like optimizations

● Skip sorting if the rows come already sorted
● ORDER BY … LIMIT and descending window function

select row_number() over (...) as RN
from ...
order by RN limit 10

● Restriction on ROW_NUMBER

select *
from (select row_number() over (...) as RN
      from ... ) as T
where RN < 10

Window functions optimization summary

                 Reuse compatible sorts   Streamed computation   Condition pushdown   ORDER BY LIMIT-like optimizations
MariaDB 10.3     ✔                        ~✔                     ✔                    ✘
MS SQL Server    ✔                        ~✔                     ✔                    ✔
PostgreSQL       ✔                        ~✔                     ✔                    ✘
MySQL 8.0        ✔                        ~✔                     ✘                    ✘

● Reuse compatible sorts: everyone has this since it’s mandatory for identical sorts
● Streamed computation: essential, otherwise O(N) computation becomes O(N^2)
● Condition pushdown: very nice to have for analytic queries
● ORDER BY LIMIT-like optimizations: sometimes used for TOP-n queries by those with “big database” background

Summary

● Both MariaDB and MySQL now have histograms
– MySQL’s are larger and more precise
– Both are lagging behind PostgreSQL, still
● Derived tables: MariaDB got condition pushdown
– MariaDB 10.3: Pushdown for window functions, Split grouping
– Caught up with PostgreSQL and exceeded it.
● Non-recursive CTEs
– See derived tables
– PostgreSQL and MySQL 8 have made a weird choice
● Window functions
– Similar optimizations in all three
– MySQL lacks condition pushdown (careful with VIEWs).


Thank You!
