New features-in-mariadb-and-mysql-optimizers

Sergei Petrunia, MariaDB New features in MariaDB/MySQL query optimizer

Transcript of New features-in-mariadb-and-mysql-optimizers

Page 1: New features-in-mariadb-and-mysql-optimizers

Sergei Petrunia, MariaDB

New features in MariaDB/MySQL query optimizer

Page 2: New features-in-mariadb-and-mysql-optimizers

12:49:092

MySQL/MariaDB optimizer development

● Some features have common heritage
● Big releases:
  – MariaDB 5.3/5.5
  – MySQL 5.6
  – (upcoming) MariaDB 10.0

Page 3: New features-in-mariadb-and-mysql-optimizers

New optimizer features

● Subqueries (FROM, IN, others)
● Batched Key Access (MRR)
● Index Condition Pushdown
● Extended Keys
● EXPLAIN UPDATE/DELETE
● PERFORMANCE_SCHEMA
● Engine-independent statistics
● InnoDB persistent statistics

Page 5: New features-in-mariadb-and-mysql-optimizers

Subqueries in MySQL

● Subqueries are practically unusable
● e.g. Facebook disabled them in the parser
● Reason: “naive execution”

Page 6: New features-in-mariadb-and-mysql-optimizers

Naive subquery execution

● For IN (SELECT ...) subqueries:

    select *
    from hotel
    where hotel.country='USA' and
          hotel.name IN (select hotel_stays.hotel
                         from hotel_stays
                         where hotel_stays.customer='John Smith')

● Naive execution:

    for (each hotel in USA) {
      if (john smith stayed here) {
        …
      }
    }

● Slow!
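
To make the cost of the naive loop concrete, here is a minimal Python sketch. It is an illustration only, not the server's code: the table and column names follow the slide, the rows are invented, and the "run the subquery once and probe a hash set" variant is just one possible remedy, assumed here for comparison.

```python
# Toy data: names follow the slide, the rows themselves are made up.
hotels = [
    {"name": "Plaza", "country": "USA"},
    {"name": "Ritz", "country": "France"},
    {"name": "Bellagio", "country": "USA"},
]
hotel_stays = [
    {"hotel": "Plaza", "customer": "John Smith"},
    {"hotel": "Ritz", "customer": "John Smith"},
    {"hotel": "Bellagio", "customer": "Jane Doe"},
]


def naive_in(hotels, hotel_stays):
    """Naive execution: re-run the subquery for every outer row."""
    result, inner_rows_scanned = [], 0
    for h in hotels:
        if h["country"] != "USA":
            continue
        stayed = False
        for s in hotel_stays:  # the subquery is executed again for this hotel
            inner_rows_scanned += 1
            if s["customer"] == "John Smith" and s["hotel"] == h["name"]:
                stayed = True
                break
        if stayed:
            result.append(h["name"])
    return result, inner_rows_scanned


def materialized_in(hotels, hotel_stays):
    """Alternative (assumed for illustration): run the subquery once, probe a set."""
    inner_rows_scanned = len(hotel_stays)  # one pass over the inner table
    stayed_at = {s["hotel"] for s in hotel_stays
                 if s["customer"] == "John Smith"}
    result = [h["name"] for h in hotels
              if h["country"] == "USA" and h["name"] in stayed_at]
    return result, inner_rows_scanned


print(naive_in(hotels, hotel_stays))
print(materialized_in(hotels, hotel_stays))
```

Both functions return the same hotels. The naive variant scans the inner table once per qualifying outer row, so its cost grows with the product of the two table sizes, while the second variant scans each table once.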

Page 7: New features-in-mariadb-and-mysql-optimizers

Naive subquery execution (2)

● For FROM (SELECT …) subqueries:

    select *
    from (select * from hotel
          where hotel.rooms > 500) as big_hotel
    where big_hotel.nearest_airport='AMS';

● Naive execution:

  1. Retrieve all hotels with > 500 rooms, store in a temporary table big_hotel;
  2. Search in big_hotel for hotels near AMS.

● Slow!
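
The two naive steps can be sketched in Python as follows. This is a toy model with invented rows; the second function shows one way to avoid the temporary table, by folding the outer condition into the inner scan. Whether the server does exactly this for a given query is not claimed here.

```python
# Toy rows; column names follow the slide's example.
hotels = [
    {"name": "A", "rooms": 800, "nearest_airport": "AMS"},
    {"name": "B", "rooms": 600, "nearest_airport": "JFK"},
    {"name": "C", "rooms": 100, "nearest_airport": "AMS"},
    {"name": "D", "rooms": 900, "nearest_airport": "CDG"},
]


def naive_derived(hotels):
    """Step 1: materialize big_hotel. Step 2: search it for AMS."""
    big_hotel = [h for h in hotels if h["rooms"] > 500]  # temporary table
    result = [h["name"] for h in big_hotel
              if h["nearest_airport"] == "AMS"]
    return result, len(big_hotel)  # rows written to the temporary table


def merged_derived(hotels):
    """Single pass: both conditions are checked while scanning hotel."""
    result = [h["name"] for h in hotels
              if h["rooms"] > 500 and h["nearest_airport"] == "AMS"]
    return result, 0  # nothing is materialized


print(naive_derived(hotels))
print(merged_derived(hotels))
```

The results are identical, but the naive plan first copies every big hotel into a temporary table that has no index on the airport column, and only then filters it.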

Page 8: New features-in-mariadb-and-mysql-optimizers

New subquery optimizations

● Handle IN (SELECT ...)
● Handle FROM (SELECT …)
● Handle a lot of cases
● Comparison with PostgreSQL
  – ~1000x slower before
  – ~same order of magnitude now
● Releases
  – MySQL 6.0
  – MariaDB 5.5
    ● Sheeri Kritzer @ Mozilla seems happy with this one
  – MySQL 5.6
    ● Subset of MariaDB 5.5's features

Page 9: New features-in-mariadb-and-mysql-optimizers

Subquery optimizations - summary

● Subqueries were generally unusable before MariaDB 5.3/5.5
● “Core” subquery optimizations are in
  – MariaDB 5.3/5.5
  – MySQL 5.6
● MariaDB has extra additions
● Further information: https://kb.askmonty.org/en/subquery-optimizations/

Page 11: New features-in-mariadb-and-mysql-optimizers

Batched Key Access - background

● Big, IO-bound joins were slow
  – DBT-3 benchmark could not finish*

● Reason?

● Nested Loops join hits the second table at random locations.

Page 12: New features-in-mariadb-and-mysql-optimizers

Batched Key Access idea

[Figure: access pattern on the second table, Nested Loops Join vs. Batched Key Access]

Speedup reasons
● Fewer disk head movements
● Cache-friendliness
● Prefetch-friendliness
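
The "fewer disk head movements" point can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not the server's implementation: the second table is modeled as row positions on a line, `index` is a hypothetical key-to-position map, and the batching is reduced to "collect up to `buffer_size` keys, sort the row positions, read them in order".

```python
def head_movement(positions):
    """Total distance travelled when rows are read in this order, starting at 0."""
    total, current = 0, 0
    for p in positions:
        total += abs(p - current)
        current = p
    return total


def nested_loops_order(lookup_keys, index):
    """Nested Loops: look up each key as soon as the outer row produces it."""
    return [index[k] for k in lookup_keys]


def bka_order(lookup_keys, index, buffer_size):
    """Batched: buffer keys, then read each batch in row-position order."""
    order = []
    for start in range(0, len(lookup_keys), buffer_size):
        batch = lookup_keys[start:start + buffer_size]
        order.extend(sorted(index[k] for k in batch))
    return order


# Hypothetical layout: join key k lives at position k * 10 in the second table.
index = {k: k * 10 for k in range(100)}
# Keys arrive from the first table in a scattered (but deterministic) order.
lookup_keys = [(i * 37) % 100 for i in range(100)]

nlj = nested_loops_order(lookup_keys, index)
bka = bka_order(lookup_keys, index, buffer_size=100)
print(head_movement(nlj), head_movement(bka))
```

Both orders read exactly the same rows. With a buffer large enough to hold all keys, the batched variant sweeps the table once from start to end, which is also why the buffer size matters in the benchmark that follows.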

Page 13: New features-in-mariadb-and-mysql-optimizers

Batched Key Access benchmark

    set join_cache_level=6; -- enable BKA

    select max(l_extendedprice)
    from orders, lineitem
    where l_orderkey=o_orderkey and
          o_orderdate between $DATE1 and $DATE2

Run with
● Various join_buffer_size settings
● Various sizes of the $DATE1...$DATE2 range

Page 14: New features-in-mariadb-and-mysql-optimizers

Batched Key Access benchmark (2)

[Figure: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2 and 3, each run as regular and as BKA. Annotations: "Performance without BKA" and "Performance with BKA, given sufficient buffer size".]

Page 15: New features-in-mariadb-and-mysql-optimizers

Batched Key Access summary

● Optimization for big, IO-bound joins
  – Orders-of-magnitude speedups
● Available in
  – MariaDB 5.3/5.5 (more advanced)
  – MySQL 5.6
● Not fully automatic yet
  – Needs to be manually enabled
  – Need to set buffer sizes

Page 17: New features-in-mariadb-and-mysql-optimizers

Index Condition Pushdown

● A new feature in MariaDB 5.3 / MySQL 5.6

    alter table lineitem add index s_r (l_shipdate, l_receiptdate);

    select count(*) from lineitem
    where
      l_shipdate between '1993-01-01' and '1993-02-01' and
      datediff(l_receiptdate,l_shipdate) > 25 and
      l_quantity > 40

+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
| id | select_type | table    | type  | possible_keys | key  | key_len | ref  | rows   | Extra                              |
+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
|  1 | SIMPLE      | lineitem | range | s_r           | s_r  | 4       | NULL | 158854 | Using index condition; Using where |
+----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+

1. Read index records in the range
     l_shipdate between '1993-01-01' and '1993-02-01'
2. Check the index condition                  ← New! Filters out records before table rows are read
     datediff(l_receiptdate,l_shipdate) > 25
3. Read full table rows
4. Check the WHERE condition
     l_quantity > 40
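
The four steps can be walked through with a small Python model. This is a sketch, not the storage engine's code: the index `s_r` is modeled as a sorted list of `(l_shipdate, l_receiptdate, rowid)` entries, the table as a dict from rowid to row, and all rows are invented.

```python
from datetime import date

# Index s_r on (l_shipdate, l_receiptdate); each entry also carries the rowid.
index_s_r = [
    (date(1993, 1, 5), date(1993, 1, 10), 1),
    (date(1993, 1, 10), date(1993, 2, 20), 2),
    (date(1993, 1, 20), date(1993, 2, 25), 3),
    (date(1993, 3, 1), date(1993, 4, 15), 4),
]
# Full table rows, keyed by rowid (only the column the query needs is shown).
table = {
    1: {"l_quantity": 50},
    2: {"l_quantity": 45},
    3: {"l_quantity": 10},
    4: {"l_quantity": 60},
}


def count_matching(index, table, lo, hi, use_icp):
    """Return (count(*), number of full table rows read)."""
    matches, table_reads = 0, 0
    for ship, receipt, rowid in index:
        if not (lo <= ship <= hi):              # 1. range on l_shipdate
            continue
        slow_delivery = (receipt - ship).days > 25
        if use_icp and not slow_delivery:       # 2. index condition (ICP only)
            continue
        row = table[rowid]                      # 3. read the full table row
        table_reads += 1
        if slow_delivery and row["l_quantity"] > 40:  # 4. rest of WHERE
            matches += 1
    return matches, table_reads


lo, hi = date(1993, 1, 1), date(1993, 2, 1)
print(count_matching(index_s_r, table, lo, hi, use_icp=False))
print(count_matching(index_s_r, table, lo, hi, use_icp=True))
```

The count is the same in both runs. With the index condition checked in step 2, fewer full rows are fetched, and that saved table access is where the speedup comes from.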

Page 18: New features-in-mariadb-and-mysql-optimizers

Index Condition Pushdown - conclusions

Summary
● Applicable to any index-based access (ref, range, etc)
● Checks parts of WHERE after reading the index
● Reduces number of table records to be read
● Speedup can be like in “Using index”
  – Great for IO-bound load (5x, 10x)
  – Some for CPU-bound workload (2x)

Conclusions
● Have a selective condition on column?
  – Put the column into index, at the end.

Page 19: New features-in-mariadb-and-mysql-optimizers

Extended keys

● Secondary indexes in InnoDB have an invisible “tail”

    CREATE TABLE tbl (
      pk1 sometype,
      pk2 sometype,
      ...
      col1 sometype,
      col2 sometype,
      ...
      KEY indexA (col1, col2)
      ...
      PRIMARY KEY (pk1, pk2)
    ) ENGINE=InnoDB

    indexA: col1 col2 pk1 pk2

● Before: optimizer has limited support for “tail” columns
  – 'Using index' supports it
  – ORDER BY col1, col2, pk1 supports it
● After MariaDB 5.5 / MySQL 5.6
  – all parts of optimizer (ref access, range access, etc) can use the “tail”
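
A rough Python picture of why the "tail" is useful: if every `indexA` entry is physically stored as `(col1, col2, pk1, pk2)`, then a lookup on `col1, col2, pk1` is just a longer prefix search in the same sorted structure. The sketch below assumes that layout and uses invented values; `ref_lookup` is a hypothetical helper, not a server function.

```python
from bisect import bisect_left, bisect_right

# indexA entries as stored: (col1, col2) followed by the primary key (pk1, pk2).
index_a = sorted([
    ("a", 1, 10, 1),
    ("a", 1, 20, 1),
    ("a", 1, 20, 2),
    ("a", 2, 5, 1),
    ("b", 1, 7, 1),
])


def ref_lookup(index, prefix):
    """Return all entries whose leading columns equal `prefix` (binary search)."""
    n = len(prefix)
    keys = [entry[:n] for entry in index]  # prefixes of a sorted list stay sorted
    return index[bisect_left(keys, prefix):bisect_right(keys, prefix)]


# Lookup on the declared columns only: col1='a' AND col2=1
print(ref_lookup(index_a, ("a", 1)))
# Lookup that also uses the tail: col1='a' AND col2=1 AND pk1=20
print(ref_lookup(index_a, ("a", 1, 20)))
```

The second call narrows three candidate entries down to two without touching the table, which is the kind of access the extended-keys feature lets the optimizer consider.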

Page 21: New features-in-mariadb-and-mysql-optimizers

Better EXPLAIN in MySQL 5.6

● EXPLAIN for UPDATE/DELETE/INSERT … SELECT
  – shows the query plan for finding the records to update/delete

mysql> explain update customer set c_acctbal = c_acctbal - 100 where c_custkey=12354;
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | customer | range | PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+

● EXPLAIN FORMAT=JSON
  – Produces [big] JSON output
  – Shows more information:
    ● Shows conditions attached to tables
    ● Shows whether “Using temporary; using filesort” is done to handle GROUP BY or ORDER BY.
    ● Shows where subqueries are attached
  – No other known additions
  – Will be in MariaDB 10.0

[Callout on the slide: The most useful addition!]

Page 22: New features-in-mariadb-and-mysql-optimizers

EXPLAIN FORMAT=JSON

What are the “conditions attached to tables”?

    explain
    select count(*)
    from orders, customer
    where
      customer.c_custkey=orders.o_custkey and
      customer.c_mktsegment='BUILDING' and
      orders.o_totalprice > customer.c_acctbal and
      orders.o_orderpriority='1-URGENT'

+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
| id | select_type | table    | type | possible_keys | key         | key_len | ref                         | rows    | Extra       |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
|  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                        | 1509871 | Using where |
|  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3sf10.customer.c_custkey |       7 | Using where |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+

?

Page 23: New features-in-mariadb-and-mysql-optimizers

EXPLAIN FORMAT=JSON (2)

{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {
        "table": {
          "table_name": "customer",
          "access_type": "ALL",
          "possible_keys": [
            "PRIMARY"
          ],
          "rows": 1509871,
          "filtered": 100,
          "attached_condition": "(`dbt3sf10`.`customer`.`c_mktsegment` = 'BUILDING')"
        }
      },
      {
        "table": {
          "table_name": "orders",
          "access_type": "ref",
          "possible_keys": [
            "i_o_custkey"
          ],
          "key": "i_o_custkey",
          "used_key_parts": [
            "o_custkey"
          ],
          "key_length": "5",
          "ref": [
            "dbt3sf10.customer.c_custkey"
          ],
          "rows": 7,
          "filtered": 100,
          "attached_condition": "((`dbt3sf10`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf10`.`orders`.`o_totalprice` > `dbt3sf10`.`customer`.`c_acctbal`))"
        }
      }
    ]
  }
}

+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
| id | select_type | table    | type | possible_keys | key         | key_len | ref                         | rows    | Extra       |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
|  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                        | 1509871 | Using where |
|  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3sf10.customer.c_custkey |       7 | Using where |
+----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
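
Because the JSON plan is machine-readable, the attached conditions can be pulled out with a few lines of Python. The sketch below assumes the structure shown on this slide (a `query_block` with a `nested_loop` list of `table` objects); other queries may produce a differently shaped document, and the embedded JSON is a trimmed copy of the slide's output. The helper name `attached_conditions` is made up for this example.

```python
import json

# Trimmed copy of the EXPLAIN FORMAT=JSON output shown on the slide.
explain_output = r"""
{
  "query_block": {
    "select_id": 1,
    "nested_loop": [
      {"table": {"table_name": "customer", "access_type": "ALL",
                 "attached_condition": "(`dbt3sf10`.`customer`.`c_mktsegment` = 'BUILDING')"}},
      {"table": {"table_name": "orders", "access_type": "ref",
                 "attached_condition": "((`dbt3sf10`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf10`.`orders`.`o_totalprice` > `dbt3sf10`.`customer`.`c_acctbal`))"}}
    ]
  }
}
"""


def attached_conditions(explain_json):
    """Return [(table_name, attached_condition or None), ...] in join order."""
    plan = json.loads(explain_json)
    result = []
    for item in plan["query_block"].get("nested_loop", []):
        tbl = item["table"]
        result.append((tbl["table_name"], tbl.get("attached_condition")))
    return result


for name, cond in attached_conditions(explain_output):
    print(name, "->", cond)
```

This answers the question that the tabular output leaves open: "Using where" on `customer` is the `c_mktsegment` check, while both `orders` conditions, including the cross-table comparison with `c_acctbal`, are evaluated when `orders` rows are read.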

Page 24: New features-in-mariadb-and-mysql-optimizers

EXPLAIN ANALYZE (kind of)

● Does EXPLAIN match the reality?
● Where is most of the time spent?
● MySQL/MariaDB don't have “EXPLAIN ANALYZE” ...

    select count(*)
    from orders, customer
    where
      customer.c_custkey=orders.o_custkey and
      customer.c_mktsegment='BUILDING' and
      orders.o_orderpriority='1-URGENT'

+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| id   | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra       |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
|    1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 149415 | Using where |
|    1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey |      7 | Using index |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+

Page 25: New features-in-mariadb-and-mysql-optimizers

Traditional solution: Status variables

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> {run query}

mysql> show status like 'Handler%';
+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| Handler_commit             | 1      |
| Handler_delete             | 0      |
| Handler_discover           | 0      |
| Handler_icp_attempts       | 0      |
| Handler_icp_match          | 0      |
| Handler_mrr_init           | 0      |
| Handler_mrr_key_refills    | 0      |
| Handler_mrr_rowid_refills  | 0      |
| Handler_prepare            | 0      |
| Handler_read_first         | 0      |
| Handler_read_key           | 30142  |
| Handler_read_last          | 0      |
| Handler_read_next          | 303959 |
| Handler_read_prev          | 0      |
| Handler_read_rnd           | 0      |
| Handler_read_rnd_deleted   | 0      |
| Handler_read_rnd_next      | 150001 |
| Handler_rollback           | 0      |
...

Problems:
● Only #rows counters
● All tables are counted together

Page 26: New features-in-mariadb-and-mysql-optimizers

Newer solution: userstat

● In Facebook patch, Percona, MariaDB:

mysql> set global userstat=1;
mysql> flush table_statistics;
mysql> flush index_statistics;

mysql> {query}

mysql> show table_statistics;
+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| dbt3sf1      | orders     |    303959 |            0 |                       0 |
| dbt3sf1      | customer   |    150000 |            0 |                       0 |
+--------------+------------+-----------+--------------+-------------------------+

mysql> show index_statistics;
+--------------+------------+-------------+-----------+
| Table_schema | Table_name | Index_name  | Rows_read |
+--------------+------------+-------------+-----------+
| dbt3sf1      | orders     | i_o_custkey |    303959 |
+--------------+------------+-------------+-----------+

● Counters are per-table
  – Ok as long as you don't have self-joins
● Overhead is negligible
● Counters are server-wide (other queries affect them, too)

Page 27: New features-in-mariadb-and-mysql-optimizers

Latest addition: PERFORMANCE_SCHEMA

● Allows measuring *time* spent reading each table
● Has some visible overhead (Facebook's tests: 7%)
● Counters are system-wide
● Still no luck with self-joins

mysql> truncate performance_schema.table_io_waits_summary_by_table;

mysql> {query}

mysql> select
         object_schema,
         object_name,
         count_read,
         sum_timer_read,                                        -- this is picoseconds
         sum_timer_read / (1000*1000*1000*1000) as read_seconds -- this is seconds
       from
         performance_schema.table_io_waits_summary_by_table
       where
         object_schema = 'dbt3sf1' and object_name in ('orders','customer');

+---------------+-------------+------------+----------------+--------------+
| object_schema | object_name | count_read | sum_timer_read | read_seconds |
+---------------+-------------+------------+----------------+--------------+
| dbt3sf1       | orders      |     334101 |  5739345397323 |       5.7393 |
| dbt3sf1       | customer    |     150001 |  1273653046701 |       1.2737 |
+---------------+-------------+------------+----------------+--------------+
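
The unit conversion in the query is easy to reproduce outside the server. The Python sketch below reuses the two rows from the output above; the per-read average is an extra derived figure added here for illustration, not something the slide reports.

```python
PICOSECONDS_PER_SECOND = 10**12

# (count_read, sum_timer_read in picoseconds), copied from the output above.
io_waits = {
    "orders": (334101, 5739345397323),
    "customer": (150001, 1273653046701),
}


def read_seconds(sum_timer_read):
    """Convert a picosecond timer sum to seconds."""
    return sum_timer_read / PICOSECONDS_PER_SECOND


def avg_read_microseconds(count_read, sum_timer_read):
    """Average time per read operation, in microseconds."""
    return sum_timer_read / count_read / 10**6


for table_name, (count_read, timer) in io_waits.items():
    print(table_name,
          round(read_seconds(timer), 4),
          round(avg_read_microseconds(count_read, timer), 2))
```

Incidentally, `count_read` for `orders` equals `Handler_read_key + Handler_read_next` from the status-variable run (30142 + 303959 = 334101), which suggests the two tools are counting the same index operations, although the slides do not state this.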

Page 29: New features-in-mariadb-and-mysql-optimizers

What is table/index statistics?

    select count(*)
    from customer, orders
    where
      customer.c_custkey=orders.o_custkey and
      customer.c_mktsegment='BUILDING';

+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
| id   | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra       |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
|    1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 148305 | Using where |
|    1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey |      7 | Using index |
+------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+

MariaDB > show table status like 'orders'\G
*************************** 1. row ***************************
           Name: orders
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1495152
.............

MariaDB > show keys from orders where key_name='i_o_custkey'\G
*************************** 1. row ***************************
        Table: orders
   Non_unique: 1
     Key_name: i_o_custkey
 Seq_in_index: 1
  Column_name: o_custkey
    Collation: A
  Cardinality: 212941
     Sub_part: NULL
.................

?

1495152 / 212941 = 7

“There are on average 7 orders for a given c_custkey”
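
The arithmetic behind the `rows=7` estimate, written out as a tiny Python function. The function name is made up; the inputs are the `Rows` and `Cardinality` values from the output above.

```python
def rows_per_key(table_rows, index_cardinality):
    """Average number of rows sharing one key value."""
    return table_rows / index_cardinality


estimate = rows_per_key(1495152, 212941)
print(round(estimate, 2))  # 7.02
print(round(estimate))     # 7, the value shown in the EXPLAIN rows column
```

Both inputs are themselves estimates, so any change in either one moves the per-key figure and, with it, the cost the optimizer assigns to the `ref` access on `orders`.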

Page 30: New features-in-mariadb-and-mysql-optimizers

The problem with index statistics and InnoDB

MySQL 5.5, InnoDB
● Statistics are calculated on-the-fly
  – When the table is opened (server restart, DDL)
  – When a sufficient number of records have been updated
  – ...
● Calculation uses random sampling
  – @@innodb_stats_sample_pages
● Result:
  – Statistics change without warning
    => Query plans change, without warning
● For example, DBT-3 benchmark
  – 22 analytics queries
  – Plans-per-query: avg=2.8, max=7.
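
Why sampling makes the numbers move can be shown with a toy estimator in Python. This is deliberately not InnoDB's actual algorithm: the index is modeled as four small pages, and the estimator (an assumption made for this sketch) scales the average number of distinct keys per sampled page up to the whole index. Two different samples stand in for two runs of a random sampler.

```python
# A toy index of 4 leaf pages, 4 key values per page (16 rows in total).
pages = [
    [1, 1, 1, 1],
    [2, 3, 4, 5],
    [6, 6, 7, 7],
    [8, 9, 10, 11],
]


def estimate_distinct(pages, sampled_page_ids):
    """Toy estimator: mean distinct keys per sampled page times page count."""
    per_page = [len(set(pages[i])) for i in sampled_page_ids]
    return len(pages) * sum(per_page) / len(per_page)


def rows_per_key(pages, sampled_page_ids):
    total_rows = sum(len(p) for p in pages)
    return total_rows / estimate_distinct(pages, sampled_page_ids)


# Two possible samples of two pages each.
print(estimate_distinct(pages, [0, 2]), rows_per_key(pages, [0, 2]))
print(estimate_distinct(pages, [1, 3]), rows_per_key(pages, [1, 3]))
```

The data never changed, yet one sample sees 6 distinct keys and the other 16, so the estimated rows-per-key differs by a factor of more than two. A shift of that size is enough to make an optimizer prefer a different plan.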

Page 31: New features-in-mariadb-and-mysql-optimizers

Persistent table statistics

Persistent statistics v1
● Percona Server 5.5 (ported to MariaDB 5.5)
  – Need to enable it: innodb_use_sys_stats_table=1
● Statistics is stored inside InnoDB
  – User-visible through information_schema.innodb_sys_stats (read-only)
● Setting innodb_stats_auto_update=OFF prevents unexpected updates

Persistent statistics v2
● MySQL 5.6
  – Enabled by default: innodb_stats_persistent=1
● Stored in regular InnoDB tables
  – mysql.innodb_table_stats, mysql.innodb_index_stats
● Setting innodb_stats_auto_recalc=OFF prevents unexpected updates
● Can also specify persistence/auto-recalc as a table option

Page 32: New features-in-mariadb-and-mysql-optimizers

Persistent table statistics - summary

● Percona, then MySQL
  – Made statistics persistent
  – Disallowed automatic updates
● Remaining issue #1: it's still random sampling
  – DBT-3 benchmark
  – scale=30
  – Re-ran EXPLAINs for benchmark queries
  – Counted different query plans
● Remaining issue #2: limited amount of statistics
  – Only on index columns
  – Only AVG(#different_values)

Page 33: New features-in-mariadb-and-mysql-optimizers

Upcoming: Engine-independent statistics

MariaDB 10.0: Engine-independent statistics
● Collected/used on SQL layer
● No auto updates, only ANALYZE TABLE
  – 100% precise statistics
● More statistics
  – Index statistics (like before)
  – Table statistics (like before)
  – Column statistics
    ● MIN/MAX values
    ● Number of NULL / not NULL values
    ● Histograms
● => Optimizer will be smarter and more reliable

Page 34: New features-in-mariadb-and-mysql-optimizers

Conclusions

● Lots of new query optimizer features recently
  – Subqueries now just work
  – Big joins are much faster
    ● Need to turn it on
  – More diagnostics
● Even more is coming
● Releases with features
  – MariaDB 5.5
  – MySQL 5.6
  – (upcoming) MariaDB 10.0

Page 36: New features-in-mariadb-and-mysql-optimizers

Thanks

Q & A