Sergei Petrunia, MariaDB
New features in MariaDB/MySQL query optimizer
MySQL/MariaDB optimizer development
● Some features have common heritage
● Big releases:
  – MariaDB 5.3/5.5
  – MySQL 5.6
  – (upcoming) MariaDB 10.0
New optimizer features
● Subqueries (FROM, IN, others)
● Batched Key Access (MRR)
● Index Condition Pushdown
● Extended Keys
● EXPLAIN UPDATE/DELETE
● PERFORMANCE_SCHEMA
● Engine-independent statistics
● InnoDB persistent statistics
Subqueries in MySQL
● Subqueries are practically unusable
● e.g. Facebook disabled them in the parser
● Reason: “naive execution”
Naive subquery execution
● For IN (SELECT ...) subqueries:
  select * from hotel
  where hotel.country='USA' and
        hotel.name IN (select hotel_stays.hotel
                       from hotel_stays
                       where hotel_stays.customer='John Smith')
● Naive execution:
  for (each hotel in USA) {
    if (John Smith stayed here) {
      …
    }
  }
● Slow!
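A minimal Python sketch of why the naive plan is slow, with the tables as in-memory lists. The second function runs the subquery once and probes a hash set. That is only one possible alternative, chosen for illustration; the slides do not say which strategies the new optimizers use. The table contents and the `rows_examined` counter are assumptions.

```python
from collections import namedtuple

Hotel = namedtuple("Hotel", "name country")
Stay = namedtuple("Stay", "hotel customer")

def naive_in_subquery(hotels, stays, country, customer):
    """Re-run the subquery for every qualifying outer row (the naive plan)."""
    rows_examined = 0
    result = []
    for h in hotels:
        if h.country != country:
            continue
        found = False
        for s in stays:  # inner scan repeated once per hotel
            rows_examined += 1
            if s.customer == customer and s.hotel == h.name:
                found = True
                break
        if found:
            result.append(h)
    return result, rows_examined

def materialized_in_subquery(hotels, stays, country, customer):
    """Run the subquery once, keep its distinct values in a set, then probe it."""
    rows_examined = 0
    stayed = set()
    for s in stays:
        rows_examined += 1
        if s.customer == customer:
            stayed.add(s.hotel)
    result = [h for h in hotels if h.country == country and h.name in stayed]
    return result, rows_examined

# Hypothetical data: 100 hotels (half in the USA), 1000 stays, one by John Smith.
hotels = [Hotel(f"hotel{i}", "USA" if i % 2 == 0 else "NL") for i in range(100)]
stays = [Stay(f"hotel{i % 100}", "John Smith" if i == 998 else "Someone Else")
         for i in range(1000)]

naive_result, naive_cost = naive_in_subquery(hotels, stays, "USA", "John Smith")
fast_result, fast_cost = materialized_in_subquery(hotels, stays, "USA", "John Smith")
print(naive_cost, fast_cost)
```

Both functions return the same hotel; the difference is how many hotel_stays rows each one has to examine.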
Naive subquery execution (2)
● For FROM (SELECT …) subqueries:
  select * from
    (select * from hotel where hotel.rooms > 500) as big_hotel
  where big_hotel.nearest_airport='AMS';
● Naive execution:
  1. Retrieve all hotels with > 500 rooms, store in a temporary table big_hotel;
  2. Search in big_hotel for hotels near AMS.
● Slow!
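The two naive steps can be sketched in Python and compared with a plan that folds the subquery into the outer query. The merged variant is an illustrative assumption about what a better plan could look like, not a statement of what MariaDB or MySQL does internally; the `airport_index` dictionary stands in for a hypothetical index on the airport column.

```python
def naive_derived_table(hotels):
    # Step 1: copy every hotel with > 500 rooms into a temporary table.
    big_hotel = [h for h in hotels if h["rooms"] > 500]
    # Step 2: scan the temporary table for hotels near AMS.
    result = [h for h in big_hotel if h["nearest_airport"] == "AMS"]
    return result, len(big_hotel)  # second value: rows written to the temp table

def merged_derived_table(hotels, airport_index):
    # Treat the query as one SELECT with both conditions; the (hypothetical)
    # index narrows the scan first and no temporary table is filled.
    candidates = airport_index.get("AMS", [])
    result = [hotels[i] for i in candidates if hotels[i]["rooms"] > 500]
    return result, 0

# Hypothetical data: 1000 hotels, 100 of them near AMS.
hotels = [
    {"name": f"h{i}",
     "rooms": 400 + (i % 5) * 100,
     "nearest_airport": "AMS" if i % 10 == 3 else "JFK"}
    for i in range(1000)
]
airport_index = {}
for i, h in enumerate(hotels):
    airport_index.setdefault(h["nearest_airport"], []).append(i)

naive_result, naive_temp_rows = naive_derived_table(hotels)
merged_result, merged_temp_rows = merged_derived_table(hotels, airport_index)
print(len(naive_result), naive_temp_rows, merged_temp_rows)
```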
New subquery optimizations
● Handle IN (SELECT ...)
● Handle FROM (SELECT …)
● Handle a lot of cases
● Comparison with PostgreSQL
  – ~1000x slower before
  – ~same order of magnitude now
● Releases
  – MySQL 6.0
  – MariaDB 5.5
    ● Sheeri Kritzer @ Mozilla seems happy with this one
  – MySQL 5.6
    ● Subset of MariaDB 5.5's features
Subquery optimizations - summary
● Subqueries were generally unusable before MariaDB 5.3/5.5
● “Core” subquery optimizations are in
  – MariaDB 5.3/5.5
  – MySQL 5.6
● MariaDB has extra additions
● Further information: https://kb.askmonty.org/en/subquery-optimizations/
Batched Key Access - background
● Big, IO-bound joins were slow
  – DBT-3 benchmark could not finish*
● Reason?
● Nested Loops join hits the second table at random locations.
Batched Key Access idea
[Figure: Nested Loops Join compared with Batched Key Access]
Speedup reasons
● Fewer disk head movements
● Cache-friendliness
● Prefetch-friendliness
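A rough Python sketch of the idea, assuming the inner table is a sequence of row positions on disk and that a backward jump is a fair stand-in for a disk head movement. The `index` dictionary, the key order and the buffer sizes are made up for illustration; a real storage engine is far more involved.

```python
def nested_loops_lookups(outer_keys, index):
    """Read inner-table positions in the order the outer rows arrive."""
    return [pos for key in outer_keys for pos in index.get(key, [])]

def batched_key_access_lookups(outer_keys, index, buffer_size):
    """Buffer a batch of keys, collect the matching positions, sort, then read."""
    order = []
    for start in range(0, len(outer_keys), buffer_size):
        batch = outer_keys[start:start + buffer_size]
        positions = [pos for key in batch for pos in index.get(key, [])]
        positions.sort()  # one forward sweep per batch instead of random reads
        order.extend(positions)
    return order

def count_backward_seeks(positions):
    """Count reads that move backwards relative to the previous read."""
    return sum(1 for a, b in zip(positions, positions[1:]) if b < a)

# Hypothetical index: join key -> positions of matching rows in the inner table.
index = {key: [key * 2, key * 2 + 1] for key in range(1000)}
# Outer rows arrive with keys in a scattered order.
outer_keys = [(i * 617) % 1000 for i in range(1000)]

nl = nested_loops_lookups(outer_keys, index)
bka_small = batched_key_access_lookups(outer_keys, index, buffer_size=100)
bka_large = batched_key_access_lookups(outer_keys, index, buffer_size=1000)
print(count_backward_seeks(nl),
      count_backward_seeks(bka_small),
      count_backward_seeks(bka_large))
```

A larger buffer means fewer batches and fewer backward jumps, which is why the buffer size matters for this optimization.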
Batched Key Access benchmark
  set join_cache_level=6; -- enable BKA
  select max(l_extendedprice)
  from orders, lineitem
  where l_orderkey=o_orderkey and
        o_orderdate between $DATE1 and $DATE2
Run with
● Various join_buffer_size settings
● Various sizes of the $DATE1...$DATE2 range
Batched Key Access benchmark (2)
[Figure: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2 and 3, each regular and BKA. Annotations: “Performance without BKA” and “Performance with BKA, given sufficient buffer size”.]
Batched Key Access summary
● Optimization for big, IO-bound joins
  – Orders-of-magnitude speedups
● Available in
  – MariaDB 5.3/5.5 (more advanced)
  – MySQL 5.6
● Not fully automatic yet
  – Needs to be manually enabled
  – Need to set buffer sizes.
Index Condition Pushdown
● A new feature in MariaDB 5.3/MySQL 5.6
  alter table lineitem add index s_r (l_shipdate, l_receiptdate);
  select count(*) from lineitem
  where l_shipdate between '1993-01-01' and '1993-02-01' and
        datediff(l_receiptdate,l_shipdate) > 25 and
        l_quantity > 40
  +----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
  | id | select_type | table    | type  | possible_keys | key  | key_len | ref  | rows   | Extra                              |
  +----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
  |  1 | SIMPLE      | lineitem | range | s_r           | s_r  | 4       | NULL | 158854 | Using index condition; Using where |
  +----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
1. Read index records in the range
     l_shipdate between '1993-01-01' and '1993-02-01'
2. Check the index condition (← New! Filters out records before table rows are read)
     datediff(l_receiptdate,l_shipdate) > 25
3. Read full table rows
4. Check the WHERE condition
     l_quantity > 40
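The four steps can be mimicked in a few lines of Python to show where the saving comes from. This is a sketch under simplifying assumptions: the "table" is a dictionary keyed by a made-up row id, the index is a sorted list of tuples, and the range is found by filtering rather than by an index seek.

```python
from datetime import date

# Hypothetical mini-table: rowid -> (l_shipdate, l_receiptdate, l_quantity)
table = {
    1: (date(1993, 1, 5), date(1993, 1, 10), 50),
    2: (date(1993, 1, 6), date(1993, 2, 15), 45),
    3: (date(1993, 1, 20), date(1993, 3, 1), 10),
    4: (date(1993, 3, 1), date(1993, 4, 30), 60),
}
# Index s_r holds (l_shipdate, l_receiptdate) plus a pointer to the row.
index = sorted((ship, rcpt, rowid) for rowid, (ship, rcpt, _) in table.items())

def count_rows(use_icp):
    """Return (matching rows, full table rows read)."""
    row_reads = 0
    matches = 0
    for ship, rcpt, rowid in index:
        # 1. Only index records in the l_shipdate range.
        if not (date(1993, 1, 1) <= ship <= date(1993, 2, 1)):
            continue
        # 2. Index condition, checked on index columns only.
        if use_icp and not (rcpt - ship).days > 25:
            continue
        # 3. Read the full table row (the expensive part).
        row_reads += 1
        row_ship, row_rcpt, qty = table[rowid]
        # 4. Check the whole WHERE condition on the full row.
        if (row_rcpt - row_ship).days > 25 and qty > 40:
            matches += 1
    return matches, row_reads

print(count_rows(use_icp=False))  # (1, 3)
print(count_rows(use_icp=True))   # (1, 2)
```

The result is the same either way; with the index condition checked first, one fewer full row is read.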
Index Condition Pushdown - conclusions
Summary
● Applicable to any index-based access (ref, range, etc)
● Checks parts of WHERE after reading the index
● Reduces number of table records to be read
● Speedup can be like in “Using index”
  – Great for IO-bound load (5x, 10x)
  – Some for CPU-bound workload (2x)
Conclusions
● Have a selective condition on column?
  – Put the column into index, at the end.
Extended keys
● Secondary indexes in InnoDB have an invisible “tail”
  CREATE TABLE tbl (
    pk1 sometype,
    pk2 sometype,
    ...
    col1 sometype,
    col2 sometype,
    ...
    KEY indexA (col1, col2)
    ...
    PRIMARY KEY (pk1, pk2)
  ) ENGINE=InnoDB
  indexA: col1 col2 pk1 pk2
● Before: optimizer has limited support for “tail” columns
  – 'Using index' supports it
  – ORDER BY col1, col2, pk1 supports it
● After MariaDB 5.5/MySQL 5.6
  – all parts of optimizer (ref access, range access, etc) can use the “tail”
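To make the “tail” concrete, here is a small Python sketch that models indexA entries as sorted tuples of the declared columns followed by the primary key columns. The rows and the `ref_lookup` helper are hypothetical; the point is only that a lookup can use pk1 as a third key part because it is physically part of every index entry.

```python
import bisect

# Hypothetical rows: (pk1, pk2, col1, col2)
rows = [
    (1, 1, "a", 10),
    (1, 2, "a", 10),
    (2, 1, "a", 10),
    (2, 2, "b", 20),
    (3, 1, "a", 10),
]
# Secondary index entry = declared columns (col1, col2) + primary key tail (pk1, pk2)
index_a = sorted((col1, col2, pk1, pk2) for pk1, pk2, col1, col2 in rows)

def ref_lookup(prefix):
    """Return index entries whose leading columns equal prefix (binary search + scan)."""
    start = bisect.bisect_left(index_a, prefix)
    result = []
    for entry in index_a[start:]:
        if entry[:len(prefix)] != prefix:
            break
        result.append(entry)
    return result

# Lookup on the declared columns only: col1='a' AND col2=10
print(len(ref_lookup(("a", 10))))
# Lookup that also uses the tail: col1='a' AND col2=10 AND pk1=1
print(ref_lookup(("a", 10, 1)))
```

Using the tail narrows the first lookup's four entries down to two without touching the table rows.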
Better EXPLAIN in MySQL 5.6
● EXPLAIN for UPDATE/DELETE/INSERT … SELECT
  – Shows the query plan for finding the records to update/delete
  mysql> explain update customer set c_acctbal = c_acctbal - 100 where c_custkey=12354;
  +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  | id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
  +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
  |  1 | SIMPLE      | customer | range | PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where |
  +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
● EXPLAIN FORMAT=JSON
  – Produces [big] JSON output
  – Shows more information:
    ● Shows conditions attached to tables
    ● Shows whether “Using temporary; Using filesort” is done to handle GROUP BY or ORDER BY
    ● Shows where subqueries are attached
  – No other known additions
  – Will be in MariaDB 10.0
The most useful addition!
EXPLAIN FORMAT=JSON
What are the “conditions attached to tables”?
  explain
  select count(*)
  from orders, customer
  where customer.c_custkey=orders.o_custkey and
        customer.c_mktsegment='BUILDING' and
        orders.o_totalprice > customer.c_acctbal and
        orders.o_orderpriority='1-URGENT'
  +----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
  | id | select_type | table    | type | possible_keys | key         | key_len | ref                         | rows    | Extra       |
  +----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
  |  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                        | 1509871 | Using where |
  |  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3sf10.customer.c_custkey |       7 | Using where |
  +----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
EXPLAIN FORMAT=JSON (2)
  {
    "query_block": {
      "select_id": 1,
      "nested_loop": [
        {
          "table": {
            "table_name": "customer",
            "access_type": "ALL",
            "possible_keys": [
              "PRIMARY"
            ],
            "rows": 1509871,
            "filtered": 100,
            "attached_condition": "(`dbt3sf10`.`customer`.`c_mktsegment` = 'BUILDING')"
          }
        },
        {
          "table": {
            "table_name": "orders",
            "access_type": "ref",
            "possible_keys": [
              "i_o_custkey"
            ],
            "key": "i_o_custkey",
            "used_key_parts": [
              "o_custkey"
            ],
            "key_length": "5",
            "ref": [
              "dbt3sf10.customer.c_custkey"
            ],
            "rows": 7,
            "filtered": 100,
            "attached_condition": "((`dbt3sf10`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf10`.`orders`.`o_totalprice` > `dbt3sf10`.`customer`.`c_acctbal`))"
          }
        }
      ]
    }
  }
EXPLAIN ANALYZE (kind of)
● Does EXPLAIN match the reality?
● Where is most of the time spent?
● MySQL/MariaDB don't have “EXPLAIN ANALYZE” ...
  select count(*)
  from orders, customer
  where customer.c_custkey=orders.o_custkey and
        customer.c_mktsegment='BUILDING' and
        orders.o_orderpriority='1-URGENT'
  +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
  | id   | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra       |
  +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
  |    1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 149415 | Using where |
  |    1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey |      7 | Using index |
  +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
Traditional solution: Status variables
  mysql> flush status;
  Query OK, 0 rows affected (0.00 sec)
  mysql> {run query}
  mysql> show status like 'Handler%';
  +----------------------------+--------+
  | Variable_name              | Value  |
  +----------------------------+--------+
  | Handler_commit             | 1      |
  | Handler_delete             | 0      |
  | Handler_discover           | 0      |
  | Handler_icp_attempts       | 0      |
  | Handler_icp_match          | 0      |
  | Handler_mrr_init           | 0      |
  | Handler_mrr_key_refills    | 0      |
  | Handler_mrr_rowid_refills  | 0      |
  | Handler_prepare            | 0      |
  | Handler_read_first         | 0      |
  | Handler_read_key           | 30142  |
  | Handler_read_last          | 0      |
  | Handler_read_next          | 303959 |
  | Handler_read_prev          | 0      |
  | Handler_read_rnd           | 0      |
  | Handler_read_rnd_deleted   | 0      |
  | Handler_read_rnd_next      | 150001 |
  | Handler_rollback           | 0      |
  ...
Problems:
● Only #rows counters
● All tables are counted together
Newer solution: userstat
● In Facebook patch, Percona, MariaDB:
  mysql> set global userstat=1;
  mysql> flush table_statistics;
  mysql> flush index_statistics;
  mysql> {query}
  mysql> show table_statistics;
  +--------------+------------+-----------+--------------+-------------------------+
  | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
  +--------------+------------+-----------+--------------+-------------------------+
  | dbt3sf1      | orders     |    303959 |            0 |                       0 |
  | dbt3sf1      | customer   |    150000 |            0 |                       0 |
  +--------------+------------+-----------+--------------+-------------------------+
  mysql> show index_statistics;
  +--------------+------------+-------------+-----------+
  | Table_schema | Table_name | Index_name  | Rows_read |
  +--------------+------------+-------------+-----------+
  | dbt3sf1      | orders     | i_o_custkey |    303959 |
  +--------------+------------+-------------+-----------+
● Counters are per-table
  – Ok as long as you don't have self-joins
● Overhead is negligible
● Counters are server-wide (other queries affect them, too)
Latest addition: PERFORMANCE_SCHEMA
● Lets you measure *time* spent reading each table
● Has some visible overhead (Facebook's tests: 7%)
● Counters are system-wide
● Still no luck with self-joins
  mysql> truncate performance_schema.table_io_waits_summary_by_table;
  mysql> {query}
  mysql> select
           object_schema,
           object_name,
           count_read,
           sum_timer_read, -- this is picoseconds
           sum_timer_read / (1000*1000*1000*1000) as read_seconds -- this is seconds
         from
           performance_schema.table_io_waits_summary_by_table
         where
           object_schema = 'dbt3sf1' and object_name in ('orders','customer');
  +---------------+-------------+------------+----------------+--------------+
  | object_schema | object_name | count_read | sum_timer_read | read_seconds |
  +---------------+-------------+------------+----------------+--------------+
  | dbt3sf1       | orders      |     334101 |  5739345397323 |       5.7393 |
  | dbt3sf1       | customer    |     150001 |  1273653046701 |       1.2737 |
  +---------------+-------------+------------+----------------+--------------+
What is table/index statistics?
  select count(*) from customer, orders
  where customer.c_custkey=orders.o_custkey and customer.c_mktsegment='BUILDING';
  +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
  | id   | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra       |
  +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
  |    1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 148305 | Using where |
  |    1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey |      7 | Using index |
  +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
  MariaDB > show table status like 'orders'\G
  *************************** 1. row ***************************
            Name: orders
          Engine: InnoDB
         Version: 10
      Row_format: Compact
            Rows: 1495152
  .............
  MariaDB > show keys from orders where key_name='i_o_custkey'\G
  *************************** 1. row ***************************
          Table: orders
     Non_unique: 1
       Key_name: i_o_custkey
   Seq_in_index: 1
    Column_name: o_custkey
      Collation: A
    Cardinality: 212941
       Sub_part: NULL
  .................
1495152 / 212941 ≈ 7
“There are on average 7 orders for a given c_custkey”
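The arithmetic behind the rows=7 estimate, written out as a short Python check. The numbers are the ones shown in the SHOW TABLE STATUS and SHOW KEYS output; the variable names are only illustrative.

```python
table_rows = 1495152         # Rows, from SHOW TABLE STATUS for orders
index_cardinality = 212941   # Cardinality of i_o_custkey, from SHOW KEYS

# Average number of orders rows per distinct o_custkey value
rows_per_key = table_rows / index_cardinality
print(round(rows_per_key, 2))
```

Rounded to a whole number, this is the 7 that EXPLAIN reports in the rows column for the ref access on orders.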
The problem with index statistics and InnoDB
MySQL 5.5, InnoDB
● Statistics are calculated on-the-fly
  – When the table is opened (server restart, DDL)
  – When a sufficient number of records have been updated
  – ...
● Calculation uses random sampling
  – @@innodb_stats_sample_pages
● Result:
  – Statistics change without warning
    => Query plans change without warning
● For example, DBT-3 benchmark
  – 22 analytics queries
  – Plans-per-query: avg=2.8, max=7.
Persistent table statistics
Persistent statistics v1
● Percona Server 5.5 (ported to MariaDB 5.5)
  – Need to enable it: innodb_use_sys_stats_table=1
● Statistics are stored inside InnoDB
  – User-visible through information_schema.innodb_sys_stats (read-only)
● Setting innodb_stats_auto_update=OFF prevents unexpected updates
Persistent statistics v2
● MySQL 5.6
  – Enabled by default: innodb_stats_persistent=1
● Stored in regular InnoDB tables
  – mysql.innodb_table_stats, mysql.innodb_index_stats
● Setting innodb_stats_auto_recalc=OFF prevents unexpected updates
● Can also specify persistence/auto-recalc as a table option
Persistent table statistics - summary
● Percona, then MySQL
  – Made statistics persistent
  – Disallowed automatic updates
● Remaining issue #1: it's still random sampling
  – DBT-3 benchmark
  – scale=30
  – Re-ran EXPLAINs for benchmark queries
  – Counted different query plans
● Remaining issue #2: limited amount of statistics
  – Only on index columns
  – Only AVG(#different_values)
Upcoming: Engine-independent statistics
MariaDB 10.0: Engine-independent statistics
● Collected/used on SQL layer
● No auto updates, only ANALYZE TABLE
  – 100% precise statistics
● More statistics
  – Index statistics (like before)
  – Table statistics (like before)
  – Column statistics
    ● MIN/MAX values
    ● Number of NULL / not NULL values
    ● Histograms
● => Optimizer will be smarter and more reliable
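As an illustration of what per-column statistics can look like, the Python sketch below computes MIN/MAX, NULL counts and a simple equi-height histogram from a full scan of one column. The histogram layout (upper bound of each bucket) and the function name are assumptions made for this example; the slides do not describe MariaDB's actual storage format.

```python
def column_statistics(values, buckets=4):
    """Full-scan statistics for one column; None stands for SQL NULL."""
    non_null = sorted(v for v in values if v is not None)
    stats = {
        "min": non_null[0] if non_null else None,
        "max": non_null[-1] if non_null else None,
        "nulls": len(values) - len(non_null),
        "not_nulls": len(non_null),
        "histogram": [],
    }
    if non_null:
        # Equi-height histogram: each bucket covers roughly the same number of
        # rows; store the largest value that falls into each bucket.
        for b in range(1, buckets + 1):
            last = max((len(non_null) * b) // buckets - 1, 0)
            stats["histogram"].append(non_null[last])
    return stats

# Hypothetical column contents
values = [None, 5, 1, 3, None, 8, 2, 9, 7, 4]
stats = column_statistics(values)
print(stats)
```

Because every value is read, the result does not depend on which pages happen to be sampled, unlike the sampling-based InnoDB statistics.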
Conclusions
● Lots of new query optimizer features recently
  – Subqueries now just work
  – Big joins are much faster
    ● Need to turn it on
  – More diagnostics
● Even more is coming
● Releases with features
  – MariaDB 5.5
  – MySQL 5.6
  – (upcoming) MariaDB 10.0
Thanks
Q & A