MySQL DW Breakfast

Transcript of MySQL DW Breakfast

Page 1: MySQL DW Breakfast

A Breakfast Seminar in London, 4th Feb 2010

Data Warehousing Solutions with MySQL

Sunday, 7 February 2010

Page 2: MySQL DW Breakfast

9:00 - Welcome Coffee and Tea

9:20 - Introduction

9:30 - MySQL for Data Warehousing

10:00 - Infobright

10:30 - Coffee/Tea Break

10:45 - Talend

11:30 - Seminar Ends.

Page 3: MySQL DW Breakfast

Introduction

Page 4: MySQL DW Breakfast

MySQL Market Segments

Open Source Powers the Web & the Network

• Web / Web 2.0
• OEM / ISVs
• On Demand, SaaS, Hosting
• Enterprise 2.0
• Telecommunications

Page 5: MySQL DW Breakfast

Timeline

• March 2008: Sun's acquisition of MySQL completed. A good acquisition; MySQL continues to grow.
• April 2009: Oracle (ORCL) agreement to acquire Sun
• January 2010: the EC gives full clearance to the acquisition
• February 2010: we continue to develop, maintain, market, sell and support MySQL!

Page 6: MySQL DW Breakfast

Oracle’s MySQL Strategy

• Becomes part of the Open Source GBU
  > Independent sales organisation, retained from Sun
  > Independent development organisation, retained from Sun
• Make MySQL better
  > Apply Oracle’s expertise and engineering processes
  > A natural extension of what Oracle has done with InnoDB
• Make MySQL support better
  > Leverage Oracle’s award-winning global support infrastructure
• Make MySQL part of the Oracle stack
  > Many customers use both MySQL and Oracle Database
  > Integrate with Enterprise Manager, Secure Backup, Audit Vault

http://www.oracle.com/ocom/groups/public/@ocom/documents/webcontent/044521.pdf

Page 7: MySQL DW Breakfast

Enjoy the event!

Page 8: MySQL DW Breakfast

Data Warehousing with MySQL

Page 9: MySQL DW Breakfast

MySQL Data Warehousing Strategy

• Strongly support common data warehouse use cases
• Offer modern technology that adheres to MySQL’s software priorities (reliability, performance, ease of use)
• Partner with major BI/ETL vendors
• Offer a highly attractive total cost of ownership

Page 10: MySQL DW Breakfast

The MySQL DW Ecosystem

[Diagram: the layers of the MySQL DW ecosystem: RDBMS, Storage Engine, Platform, ETL Integration, BI/Reporting Tools]

Page 11: MySQL DW Breakfast

Common Use Cases

1. Small, semi-real-time data marts
2. Continuous, real-time/query data warehousing
3. Traditional, standard reporting warehouse
4. Massive historical warehouse with ad-hoc queries
5. BI and analytics in OLTP applications (emerging…)

[Diagram: the use cases Real-Time, Data Mart, Traditional, Analytical and Historical, all accessed through SQL]

Page 12: MySQL DW Breakfast

MySQL Technical Strategy

• Provide an open source architecture to maximise innovation
• Offer a core data warehousing feature set
• Provide specialised data warehouse engines for key use cases
• Supply strategies for combating the mixed-workload challenge

Page 13: MySQL DW Breakfast

Pluggable Storage Engine Architecture

Page 14: MySQL DW Breakfast

MySQL Enterprise

Server
• MySQL Enterprise Server
• Monthly Rapid Updates
• Quarterly Service Packs
• Hot Fix Program
• Indemnification

Monitor
• Global Monitoring of All Servers
• Web-Based Central Console
• Built-in Advisors and Expert Advice
• MySQL Query Analyzer
• Replication Monitor

Support
• 24 x 7 x 365 Production Support
• Web-Based Knowledge Base
• Consultative Help
• High Availability and Scale Out

http://www.mysql.com/products/enterprise/

Page 15: MySQL DW Breakfast

MySQL Enterprise Monitor

• Single, consolidated view into the entire MySQL environment
• Auto-discovery of MySQL servers and replication topologies
• New Query Analyzer
• Customisable rules-based monitoring and alerts
• Identifies problems before they occur
• Reduces the risk of downtime
• Makes it easier to scale out without requiring more DBAs

“Your Virtual MySQL DBA” Assistant

http://www.mysql.com/products/enterprise/advisors.html

Page 16: MySQL DW Breakfast

MySQL Query Analyzer

“Finds code problems before your customers do.”

• Centralised monitoring of queries across all servers
• No reliance on slow query logs, SHOW PROCESSLIST, vmstat, etc.
• Aggregated view of query execution counts, time and rows
• Saves the time spent adding up individual executions to find a query's total cost
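The aggregation idea behind the Query Analyzer can be sketched in Python. Each executed statement is normalised by replacing its literals with `?`, and counts, time and rows are then summed per normalised statement. This is a minimal illustration under stated assumptions, not how MySQL Enterprise Monitor collects or groups queries. The regex normaliser and the `(sql, seconds, rows)` input format are simplifications, and `normalise` and `aggregate` are hypothetical names.

```python
import re
from collections import defaultdict

def normalise(sql):
    """Replace literals with '?' so executions of the same statement group together.

    Simplified on purpose (assumption): only quoted strings and bare numbers are handled."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)  # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # numeric literals
    return " ".join(sql.split())                  # collapse whitespace

def aggregate(executions):
    """Sum execution count, total time and rows per normalised statement.

    executions: iterable of (sql, seconds, rows) tuples (hypothetical input format)."""
    stats = defaultdict(lambda: {"count": 0, "total_time": 0.0, "rows": 0})
    for sql, seconds, rows in executions:
        entry = stats[normalise(sql)]
        entry["count"] += 1
        entry["total_time"] += seconds
        entry["rows"] += rows
    return dict(stats)

log = [
    ("SELECT * FROM sales WHERE id = 1", 0.5, 1),
    ("SELECT * FROM sales WHERE id = 42", 0.25, 1),
    ("SELECT * FROM sales WHERE region = 'North'", 2.0, 300),
]
for query, entry in aggregate(log).items():
    print(query, entry)
```

The two `id =` lookups collapse into one entry, so their combined cost is visible without adding up individual executions by hand.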

Page 17: MySQL DW Breakfast

The MySQL Technology behind a DW Strategy

[Diagram: the building blocks of a MySQL DW strategy: Replication, MySQL Proxy, Partitioning, Sharding, Memcached, Query Cache, Storage Engines]

Page 18: MySQL DW Breakfast

Warehouse use cases/mapping

[Diagram: each use case (Analytical, Historical, Traditional, Real-Time, Data Mart, all accessed through SQL) mapped against the same technologies: MyISAM, InnoDB, CSV, Archive, Federated, Query Cache, Replication, Sharding, Proxy, Memcached]

Page 19: MySQL DW Breakfast

MySQL Data Warehouse Cookbook

Page 20: MySQL DW Breakfast

Partitioning

• Partition pruning
• The partitioning expression must return an INT
• Check table locking with MyISAM
• Check the number of open files
• Foreign keys, fulltext and spatial indexes are not supported
• No MyISAM LOAD INDEX or INSERT DELAYED
• For DW, partitioning is mainly limited to InnoDB and MyISAM

[Diagram: vertical partitioning splits a table with columns Col1 to Col5 into (Col1, Col2) and (Col1, Col3, Col4, Col5), repeating the key Col1; horizontal partitioning keeps all five columns and splits the rows across partitions]
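Partition pruning can be illustrated with a small Python model of a RANGE-partitioned table. Given the `VALUES LESS THAN` upper bounds, only the partitions whose key range overlaps the query's range need to be read. The table, the `YEAR(sale_date)` key and the partition names are hypothetical. MySQL performs this step inside its optimizer, and this sketch only mirrors the logic.

```python
# Hypothetical RANGE-partitioned table, in the spirit of:
#   PARTITION BY RANGE (YEAR(sale_date)) (
#     PARTITION p2007 VALUES LESS THAN (2008),
#     PARTITION p2008 VALUES LESS THAN (2009),
#     PARTITION p2009 VALUES LESS THAN (2010),
#     PARTITION pmax  VALUES LESS THAN MAXVALUE)
PARTITIONS = [("p2007", 2008), ("p2008", 2009), ("p2009", 2010), ("pmax", None)]  # None = MAXVALUE

def partition_for(key, partitions=PARTITIONS):
    """Return the partition that stores a given integer partitioning key."""
    for name, upper in partitions:
        if upper is None or key < upper:
            return name
    raise ValueError(f"no partition for key {key}")

def prune(lo, hi, partitions=PARTITIONS):
    """Return only the partitions that may hold keys in the closed range [lo, hi]."""
    selected = []
    lower = None  # inclusive lower bound of the current partition (None = no bound)
    for name, upper in partitions:
        starts_before_hi = lower is None or lower <= hi
        ends_after_lo = upper is None or lo < upper
        if starts_before_hi and ends_after_lo:
            selected.append(name)
        lower = upper  # the next partition starts where this one ends
    return selected

print(prune(2008, 2009))  # ['p2008', 'p2009']
```

In MySQL 5.1, `EXPLAIN PARTITIONS SELECT ...` lists the partitions a query will actually read, which is a quick way to confirm that pruning happens.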

Page 21: MySQL DW Breakfast

SQL Generation

• Multipass SQL or subqueries
• Avoid complex queries
  > More efficient use of the query cache, key buffer and buffer pool
  > More shard-friendly
  > More scalable for the current version of MySQL
    – No parallel query
• Use temp tables and stored procedures
• Check with EXPLAIN
  > ALL (sequential scan)
  > Using filesort
  > Using temporary (for GROUP BY and ORDER BY)
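The EXPLAIN checks above can be automated. The sketch below assumes each EXPLAIN row is available as a Python dict keyed by the EXPLAIN column names (`table`, `type`, `Extra`), which is how a DB-API driver with a dictionary cursor would return them. The `sample` rows are made up for illustration, and `explain_warnings` is a hypothetical helper.

```python
def explain_warnings(rows):
    """Flag the EXPLAIN warning signs from the slide: type=ALL, filesort, temporary."""
    warnings = []
    for row in rows:
        table = row.get("table")
        if row.get("type") == "ALL":
            warnings.append((table, "full table scan (type=ALL)"))
        extra = row.get("Extra") or ""  # Extra can be NULL (None)
        if "Using filesort" in extra:
            warnings.append((table, "Using filesort"))
        if "Using temporary" in extra:
            warnings.append((table, "Using temporary"))
    return warnings

# Hypothetical EXPLAIN output for a fact table "f" joined to a dimension "d".
sample = [
    {"table": "f", "type": "ALL", "Extra": "Using temporary; Using filesort"},
    {"table": "d", "type": "eq_ref", "Extra": None},
]
for table, problem in explain_warnings(sample):
    print(table, problem)
```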

Page 22: MySQL DW Breakfast

Server Tuning

Thread Buffers
• join_buffer_size
• read_buffer_size
• read_rnd_buffer_size
• sort_buffer_size
• For large result sets and a high number of concurrent users, set them individually (per session) or by role rather than server-wide

Temporary Tables
• tmp_table_size
• max_heap_table_size
• Implicit temp tables can be tricky to control
• Store intermediate results
• Connect > Query > Disconnect

Query Cache
• SELECT ... SQL_NO_CACHE
• query_cache_type
• query_cache_limit
• query_cache_size
• Queries that use time functions such as NOW() are not cached
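The case for setting thread buffers per session or per role comes down to arithmetic. Each connection can allocate its own copy of every per-thread buffer, so a server-wide increase is multiplied by the number of connections. The Python sketch below compares the two approaches. The buffer sizes and connection counts are hypothetical. The estimate is rough, because some buffers (for example the join buffer) can be allocated more than once by a single query. In MySQL, a per-session override looks like `SET SESSION sort_buffer_size = ...`.

```python
# All sizes in MB; these values are hypothetical examples, not recommendations.
LARGE = {"join_buffer_size": 8, "read_buffer_size": 2,
         "read_rnd_buffer_size": 8, "sort_buffer_size": 16}
SMALL = {"join_buffer_size": 1, "read_buffer_size": 1,
         "read_rnd_buffer_size": 1, "sort_buffer_size": 1}

def per_thread_peak_mb(roles):
    """Rough peak if every connection allocates each per-thread buffer once.

    roles: list of (connection_count, buffer_sizes_mb) pairs."""
    return sum(count * sum(sizes.values()) for count, sizes in roles)

# Large buffers set server-wide for all 200 connections.
server_wide = per_thread_peak_mb([(200, LARGE)])
# Large buffers only for 10 reporting sessions, small ones for the other 190.
by_role = per_thread_peak_mb([(10, LARGE), (190, SMALL)])
print(server_wide, by_role)  # 6800 1100
```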

Page 23: MySQL DW Breakfast

Modelling

[Diagram: a star schema (a fact table of PK, dimension keys and metrics joined to Key/Desc dimension tables), a snowflake schema (dimensions normalised into Key Key Key Desc, Key Key Desc and Key Desc levels), and one wide fact table split into two fact tables that share the same PK]

• Multidimensional, but with care

• Snowflake vs star schema
  > Do not denormalise descriptions
  > Multiple fact tables with 1:1 relationships

• Queries
  > Query on Dimension N into a temp table
  > Query on Fact 1 into a temp table
  > Query on Fact 2, joined with the temp table
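The multipass query pattern above can be tried out with Python's built-in `sqlite3` module, used here only as a stand-in for MySQL. The schema and data are hypothetical: `dim_store` is a dimension, and `fact_sales_cost` is a second fact table with a 1:1 relationship to `fact_sales`. In MySQL, the temp tables would be created with `CREATE TEMPORARY TABLE ... SELECT ...` and dropped automatically when the connection closes.

```python
import sqlite3

# SQLite stands in for MySQL; table names and rows are hypothetical.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, store_id INTEGER, amount INTEGER);
CREATE TABLE fact_sales_cost (sale_id INTEGER PRIMARY KEY, cost INTEGER);  -- 1:1 with fact_sales
INSERT INTO dim_store VALUES (1, 'North'), (2, 'South'), (3, 'North');
INSERT INTO fact_sales VALUES (10, 1, 100), (11, 2, 50), (12, 3, 70), (13, 1, 30);
INSERT INTO fact_sales_cost VALUES (10, 60), (11, 20), (12, 40), (13, 10);
""")

# Pass 1: query the dimension into a temp table.
cur.execute("CREATE TEMP TABLE t_stores AS "
            "SELECT store_id FROM dim_store WHERE region = 'North'")
# Pass 2: query fact 1, restricted to those keys, into a second temp table.
cur.execute("CREATE TEMP TABLE t_sales AS "
            "SELECT s.sale_id, s.amount FROM fact_sales s "
            "JOIN t_stores t ON t.store_id = s.store_id")
# Pass 3: query fact 2, joined with the temp table.
cur.execute("SELECT SUM(t.amount), SUM(c.cost) FROM t_sales t "
            "JOIN fact_sales_cost c ON c.sale_id = t.sale_id")
total_amount, total_cost = cur.fetchone()
print(total_amount, total_cost)  # 200 110
con.close()
```

Each pass is a simple query that the server can execute and cache on its own, in line with the "avoid complex queries" advice.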

Page 24: MySQL DW Breakfast

Storage Engines

MyISAM
• Compressed tables
• Use different spindles for data and indexes
• Fast inserts: insert already sorted data (when possible)
• Key buffers
  > Multiple key buffers
  > SET GLOBAL <key_cache_name>.key_buffer_size ...
  > CACHE INDEX ... IN ...
  > key_cache_block_size
• bulk_insert_buffer_size
• Spatial and fulltext indexes
• All-active shared-disk cluster

InnoDB
• innodb_file_per_table
• innodb_flush_log_at_trx_commit
• innodb_buffer_pool_size
• The new InnoDB Plugin
  > Fast index creation
  > Data compression
• Do not use foreign keys or constraints

CSV
• A good ETL trick
• No partitioning, no indexing, no NULLs

Archive
• Data compression and fast retrieval
• INSERT and SELECT only
• No indexes (AUTO_INCREMENT column only)

Federated
• Limited indexing
• Tips:
  > Queries can be executed on multiple servers and the results collected
  > Use stored procedures to consolidate results and control access to the FEDERATED tables
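For MyISAM bulk loads, the "insert already sorted data" tip can be combined with multi-row INSERT statements. The Python sketch below sorts rows on the indexed key and builds one parameterised multi-row INSERT per batch. It uses `%s` placeholders, as Python MySQL drivers such as MySQLdb do. The table name, column names and batch size are hypothetical. Identifiers are assumed to be trusted, because the sketch does not escape them.

```python
def sorted_batches(rows, key_index, batch_size):
    """Sort rows on the indexed key column, then cut them into fixed-size batches."""
    ordered = sorted(rows, key=lambda row: row[key_index])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

def multi_row_insert(table, columns, batch):
    """Build one multi-row INSERT with %s placeholders and its flattened parameters.

    table and columns are trusted identifiers (assumption): they are not escaped here."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
           + ", ".join([row_placeholder] * len(batch)))
    params = [value for row in batch for value in row]
    return sql, params

rows = [(3, "c"), (1, "a"), (2, "b")]
for batch in sorted_batches(rows, key_index=0, batch_size=2):
    sql, params = multi_row_insert("fact_sales", ["sale_id", "note"], batch)
    print(sql, params)
    # With a real connection this would be: cursor.execute(sql, params)
```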

Page 25: MySQL DW Breakfast

Replication

• [For some] the easiest way to provide real-time data marts
• Tips:
  > Delayed replication
  > Rotating servers
  > Support for more power users

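[Diagram: a source master replicates (write) to rotating slaves, which alternate between updating and being queried (read) by the BI/report servers; the slaves are held at different delays behind the master: real time, -10 min, -30 min, -1 hour, -12 hours and yesterday]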
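With rotating, deliberately delayed slaves, each report can go to a server whose delay it can tolerate. The sketch below sends a query to the most-delayed slave that is still fresh enough. This choice keeps the near-real-time slaves free for users who need current data. The replica names and fixed delays are hypothetical. A real setup would measure lag instead, for example from `Seconds_Behind_Master` in `SHOW SLAVE STATUS`.

```python
# Hypothetical replica pool: name -> replication delay in minutes, mirroring the
# delays in the diagram (real time, 10 min, 30 min, 1 hour, 12 hours, 1 day).
REPLICAS = {"rt": 0, "d10": 10, "d30": 30, "d60": 60, "d720": 720, "d1440": 1440}

def pick_replica(max_staleness_min, replicas=REPLICAS):
    """Return the most-delayed replica whose delay is within the allowed staleness."""
    eligible = [(delay, name) for name, delay in replicas.items()
                if delay <= max_staleness_min]
    if not eligible:
        raise LookupError("no replica is fresh enough")
    return max(eligible)[1]  # the largest acceptable delay wins

print(pick_replica(45))  # d30
```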

Page 26: MySQL DW Breakfast

Sharding

• Great for distributing the workload
• Fantastic if the queries can be executed in parallel by a middle or client layer
• Tips:
  – Replicate the dimensions
  – Specialise shards on facts
  – Partition facts across shards

[Diagram: BI/report servers read from shards A1, A2, B, C1, C2 and D, while a dimensions master writes the dimension tables to every shard]
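The sharding tips can be sketched in plain Python: replicate the dimensions, partition the facts across shards, and run the queries in parallel from a middle or client layer. Each "shard" here is an in-memory dict that holds its slice of a hypothetical fact table plus a full copy of a small region dimension. The modulo placement, the names and the data are assumptions for illustration. A thread pool plays the role of the middle layer that sends the query to every shard and merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3
DIM_REGION = {1: "North", 2: "South"}  # small dimension, copied to every shard

def shard_for(customer_id):
    """Place a fact row on a shard by its customer key (simple modulo placement)."""
    return customer_id % NUM_SHARDS

def build_shards(facts):
    """facts: (customer_id, region_id, amount) tuples, partitioned across shards."""
    shards = [{"facts": [], "dim_region": dict(DIM_REGION)} for _ in range(NUM_SHARDS)]
    for fact in facts:
        shards[shard_for(fact[0])]["facts"].append(fact)
    return shards

def partial_sum_by_region(shard):
    """Runs on one shard: local join with the replicated dimension, local GROUP BY."""
    totals = Counter()
    for _customer_id, region_id, amount in shard["facts"]:
        totals[shard["dim_region"][region_id]] += amount
    return totals

def sum_by_region(shards):
    """Send the query to all shards in parallel, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_sum_by_region, shards))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds the partial sums together
    return dict(merged)

FACTS = [(1, 1, 100), (2, 2, 50), (3, 1, 70), (4, 2, 30), (5, 1, 10)]
print(sum_by_region(build_shards(FACTS)))
```

Partial results merge directly for SUM and COUNT. An average would need each shard to return both a sum and a count.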

Page 27: MySQL DW Breakfast

More Resources Available

• Webinars
  > http://www-it.mysql.com/news-and-events/web-seminars/
• Consulting
  > MySQL Architecture & Design
  > MySQL Performance Tuning
  > http://www.mysql.com/consulting/
• Training
  > MySQL 5.1 for Developers
  > MySQL 5.1 for DBAs
  > http://www.mysql.com/training/
• White Papers
  > http://www.mysql.com/why-mysql/white-papers/
