Making MySQL Great For Business Intelligence
2010 Calpont Corporation – Confidential & Proprietary
Making MySQL Great for Business Intelligence
Robin Schumacher, VP Products
Calpont
Agenda
• Quick overview of BI
• Looking at the right technology foundation
• General physical MySQL design decisions that impact success
• A look at row vs. column MySQL databases
• Conclusions
A Quick Overview of Business Intelligence
What is Business Intelligence?
Business Intelligence (BI) refers to skills, processes, technologies, applications and practices used to support decision making.
BI technologies provide historical, current, and predictive views of business operations. Common functions of Business Intelligence technologies are reporting, online analytical processing, analytics, data mining, business performance management, benchmarking, text mining, and predictive analytics.
Why Business Intelligence?
• All companies now recognize the need for BI
• Information is a weapon that both large and small companies use to better understand their customers, competitors, and marketplace
• Making poorly informed decisions can be disastrous
Overview of Most BI Frameworks
[Diagram; layers as labeled on the slide:]
• Operational Source Data: OLTP, Files/XML, Log Files
• ETL
• Staging or ODS: Staging Area
• Final ETL
• Data Warehouse, with Warehouse Archive (Purge/Archive)
• Reporting, BI, Notification Layer: Ad-Hoc, Dashboards, Reports, Notifications
• Users
• Data Warehouse and Metadata Management
Simple Reporting Databases
[Diagram; components: OLTP Database, Read Shard One, Reporting Database, Application Servers, End Users; links: ETL, Data Archiving Link, Replication]
Building the Right Technical Foundation
What is the Key Component for Success?
In other words, what you do with your MySQL Server – in terms of physical design, schema design, and performance design – will be the biggest factor in whether a BI system hits the mark…
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.
What Technology Decisions are Being Made?
* Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.
What General MySQL Design Decisions Help Success?
First – Get/Use a Modeling Tool
Horizontal Partitioning Model
Read Sharding / Horizontal Partitioning
Vertical Partitioning Model
General List of Top BI Design Decisions
• Storage Engine Selection
• Physical Table/Index Partitioning
• Index Creation and Placement
• Set proper amounts for memory caches, etc.
• Row vs. Column Engine / Database
Core BI Features for MySQL
• No practical storage limits (1 tablespace=110TB)
• Automatic storage management
• ANSI-SQL support for all datatypes (including BLOB and XML)
• Data/Index partitioning (range, hash, key, list, composite)
• Built-in Replication
• Main memory tables (for dimension tables)
• Variety of indexes (b-tree, fulltext, clustered, hash, GIS)
• Multiple, configurable data/index caches
• Pre-loading of index data into index caches
• Unique query cache (caches result set + query; not just data)
• Parallel data load (5.1 and higher – multiple files)
• Multi-insert DML
• Data compression (depends on engine)
• Read-only tables
• Fast connection pooling
• Cost-based optimizer
• Wide platform support
Storage Engines Internal to MySQL
MyISAM
• High-speed query/insert engine
• Non-transactional, table locking
• Good for data marts, small warehouses
Archive
• Compresses data by up to 80%
• Fastest for data loads
• Only allows inserts/selects
• Good for seldom accessed data
Memory
• Main memory tables
• Good for small dimension tables
• B-tree and hash indexes
CSV
• Comma separated values
• Allows both flat file access and editing as well as SQL query/DML
• Allows instantaneous data loads
Also: Merge for pre-5.1 partitioning
Partitioning and Performance (5.1+)
mysql> CREATE TABLE part_tab
-> ( c1 int ,c2 varchar(30) ,c3 date )
-> PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995),
-> PARTITION p1 VALUES LESS THAN (1996) , PARTITION p2 VALUES LESS THAN (1997) ,
-> PARTITION p3 VALUES LESS THAN (1998) , PARTITION p4 VALUES LESS THAN (1999) ,
-> PARTITION p5 VALUES LESS THAN (2000) , PARTITION p6 VALUES LESS THAN (2001) ,
-> PARTITION p7 VALUES LESS THAN (2002) , PARTITION p8 VALUES LESS THAN (2003) ,
-> PARTITION p9 VALUES LESS THAN (2004) , PARTITION p10 VALUES LESS THAN (2010),
-> PARTITION p11 VALUES LESS THAN MAXVALUE );
mysql> create table no_part_tab (c1 int,c2 varchar(30),c3 date);
*** Load 8 million rows of data into each table ***
mysql> select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
1 row in set (38.30 sec)
mysql> select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
|   795181 |
+----------+
1 row in set (3.88 sec)
About 90% Response Time Reduction (38.30 sec down to 3.88 sec)
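The gain in the example above comes from partition pruning: a query restricted to 1995 only needs to scan the partition that can hold 1995 rows. The Python sketch below illustrates the idea on toy data. The helper names (`partition_for`, `build_partitions`, `count_in_range`) and the data set are hypothetical and are not MySQL internals; only the partition bounds are taken from the `part_tab` definition.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds mirroring the VALUES LESS THAN list of part_tab.
# The final MAXVALUE partition is the index one past the end of this list.
BOUNDS = [1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2010]


def partition_for(year):
    """Return the index of the partition whose range contains `year`."""
    return bisect_right(BOUNDS, year)


def build_partitions(rows):
    """Distribute rows (each a date) into len(BOUNDS) + 1 partitions."""
    parts = [[] for _ in range(len(BOUNDS) + 1)]
    for d in rows:
        parts[partition_for(d.year)].append(d)
    return parts


def count_in_range(parts, lo, hi):
    """Count dates with lo < d < hi, scanning only partitions that can match.

    Returns (match_count, rows_scanned) so the pruning effect is visible.
    """
    first, last = partition_for(lo.year), partition_for(hi.year)
    matches = scanned = 0
    for part in parts[first:last + 1]:
        for d in part:
            scanned += 1
            if lo < d < hi:
                matches += 1
    return matches, scanned


# Toy data (an assumption for illustration): one row per month, 1990-2011.
rows = [date(y, m, 15) for y in range(1990, 2012) for m in range(1, 13)]
parts = build_partitions(rows)
matches, scanned = count_in_range(parts, date(1995, 1, 1), date(1995, 12, 31))
print(matches, scanned, len(rows))
```

In this sketch the pruned query scans only the rows of one partition, while an unpartitioned table would have to examine every row to answer the same question.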
Index Creation and Placement
• If query patterns are known and predictable, and data is relatively static, then indexing isn’t that difficult
• If the situation is a very ad-hoc environment, indexing becomes more difficult. Must analyze SQL traffic and index the best you can
• Over-indexing a table that is frequently loaded / refreshed / updated can severely impact load and DML performance. Test dropping and re-creating indexes vs. doing in-place loads and DML. Realize, though, that any queries run while the indexes are dropped will be slowed
• Index maintenance (rebuilds, etc.) can cause issues in MySQL (locking, etc.)
• Remember some storage engines don’t support normal indexes (Archive, CSV)
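The "drop and re-create vs. in-place load" test suggested above could be set up along the following lines. This is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere, so the timings say nothing about MySQL itself, and the table, index name, and data (`sales`, `idx_sales_day`) are made up for illustration.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory table with one secondary index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
    conn.execute("CREATE INDEX idx_sales_day ON sales (day)")
    return conn


def load_in_place(conn, rows):
    """Bulk-insert while the index exists, so it is maintained row by row."""
    conn.executemany("INSERT INTO sales (day, amount) VALUES (?, ?)", rows)
    conn.commit()


def load_with_index_rebuild(conn, rows):
    """Drop the index, bulk-insert, then re-create the index in one pass."""
    conn.execute("DROP INDEX IF EXISTS idx_sales_day")
    conn.executemany("INSERT INTO sales (day, amount) VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_day ON sales (day)")
    conn.commit()


def index_names(conn):
    """List the indexes currently defined in the database."""
    found = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'").fetchall()
    return [name for (name,) in found]


def timed(load, rows):
    """Run one load strategy against a fresh database and time it."""
    conn = make_db()
    start = time.perf_counter()
    load(conn, rows)
    return conn, time.perf_counter() - start


# Toy load of 16,800 rows (an assumption; use a realistic volume in practice).
rows = [("2010-%02d-%02d" % (m, d), m * d)
        for m in range(1, 13) for d in range(1, 29)] * 50

in_place_conn, in_place_secs = timed(load_in_place, rows)
rebuild_conn, rebuild_secs = timed(load_with_index_rebuild, rows)
print("in place: %.4f s, drop and re-create: %.4f s"
      % (in_place_secs, rebuild_secs))
```

Which approach wins depends on the engine, the data volume, and the number of indexes, which is why the slide recommends testing both rather than assuming.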
Row vs. Column Engines / Databases
Column vs. Row Orientation
A column-oriented architecture looks the same on the surface, but stores data differently than legacy/row-based databases…
Why a Column Database?
• Column databases only read the columns needed to satisfy a query vs. full rows
• If you are only selecting a subset of columns from a table and / or are using very wide tables, column DB’s are a great choice for BI
• Column databases (most of them…) remove the need for indexing because the column is the index
• Column databases automatically eliminate unnecessary I/O both logically and physically, so they also do away with the need for partitioning, materialized views, etc.
• As a rule of thumb, column databases provide 5-10x (or more) the query performance of legacy RDBMS’s
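The point about reading only the needed columns can be made concrete with a toy model. The Python sketch below stores the same table once row by row and once column by column, then counts how many values each layout has to read to sum a single column. The table, its column names, and the "values read" counter are assumptions made for illustration; real engines work in disk blocks and add compression, which this sketch ignores.

```python
# Toy table (hypothetical): 4 columns, 1,000 rows.
COLUMNS = ["order_id", "customer", "region", "amount"]
ROWS = [(i, "cust%d" % (i % 50), "region%d" % (i % 4), i % 100)
        for i in range(1000)]

# Row store: one tuple per row, so reading any column drags in the whole row.
row_store = list(ROWS)

# Column store: one list per column, so a query reads only the lists it needs.
column_store = {name: [row[i] for row in ROWS]
                for i, name in enumerate(COLUMNS)}


def sum_amount_row_store(store):
    """Sum `amount`, counting every value fetched along with each full row."""
    amount_pos = COLUMNS.index("amount")
    total = values_read = 0
    for row in store:
        values_read += len(row)
        total += row[amount_pos]
    return total, values_read


def sum_amount_column_store(store):
    """Sum `amount` by reading only that column's values."""
    amounts = store["amount"]
    return sum(amounts), len(amounts)


row_total, row_read = sum_amount_row_store(row_store)
col_total, col_read = sum_amount_column_store(column_store)
print(row_total, row_read)
print(col_total, col_read)
```

Both layouts return the same answer, but the row layout reads four values per row while the column layout reads one, and the gap widens as tables get wider.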
Why a Column Database?
"If you're bringing back all the columns, a column-store database isn't going to perform any better than a row-store DBMS, but analytic applications are typically looking at all rows and only a few columns. When you put that type of application on a column-store DBMS, it outperforms anything that doesn't take a column-store approach."
- Donald Feinberg, Gartner Group
Why Not a Column Database?
• If you routinely have SELECT * queries or queries that request the majority of columns in a table
• If you constantly are doing lots of singleton inserts and deletes. As these are row-based operations they will normally run somewhat slower on a column DB than a row-oriented DB (more block touches are needed). Updates tend to run OK as they are a column operation
• If you want to do pure OLTP work. Some column DB’s are transactional (so data integrity is ensured), but they are not suited for straight OLTP work
• If you have a small database: at that size, the benefits column databases offer over row DB’s largely disappear
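The remark about singleton inserts needing more block touches can be illustrated with a simplified model in which each separate storage structure written counts as one touch. The `RowTable` and `ColumnTable` classes below are hypothetical teaching aids, not how any particular engine is implemented, and the column names are invented.

```python
class RowTable:
    """Stores each record as one tuple; an insert writes one structure."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []
        self.touches = 0  # separate storage structures written so far

    def insert(self, record):
        self.rows.append(tuple(record[c] for c in self.columns))
        self.touches += 1


class ColumnTable:
    """Stores one list per column; an insert writes every column's list."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.data = {c: [] for c in self.columns}
        self.touches = 0

    def insert(self, record):
        for c in self.columns:
            self.data[c].append(record[c])
            self.touches += 1

    def update(self, index, column, value):
        """Change one column of one record; only that column is written."""
        self.data[column][index] = value
        self.touches += 1


columns = ["id", "name", "city", "state", "balance"]
records = [{"id": i, "name": "n%d" % i, "city": "c", "state": "s", "balance": 0}
           for i in range(10)]

row_table, col_table = RowTable(columns), ColumnTable(columns)
for rec in records:
    row_table.insert(rec)
    col_table.insert(rec)
print(row_table.touches, col_table.touches)

before = col_table.touches
col_table.update(3, "balance", 250)
print(col_table.touches - before)
```

Under this model, ten inserts cost ten touches in the row layout and fifty in the five-column layout, while a single-column update costs one touch, which matches the slide's observation that updates tend to run OK.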
What is Calpont’s InfiniDB?
InfiniDB is an open source, column-oriented database architected to handle data warehouses, data marts, analytic/BI systems, and other read-intensive applications. It delivers true scale up (more CPU’s/cores, RAM) and massively parallel processing (MPP) scale out capabilities for MySQL users. Linear performance gains are achieved when adding either more capabilities to one box or using commodity machines in a scale out configuration.
[Diagram: Scale Up and Scale Out]
InfiniDB vs. a Leading Row RDBMS
2 TB of raw data; 16 CPUs, 16 GB RAM, 14 SAS 15K RPM drives in RAID-0, 512 MB cache
Percona’s Test of Column Databases
610 GB of raw data; 8 core machine
http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
Calpont Solutions
Calpont Analytic Database Server Editions
• InfiniDB Community Server: Column-Oriented, Multi-threaded, Terabyte Capable, Single Server
• InfiniDB Enterprise Server: Scale out / Parallel Processing, Automatic Failover
Calpont Analytic Database Solutions
• InfiniDB Enterprise Solution: Monitoring, 24x7 Support, Auto Patch Management, Alerts & SNMP Notifications, Hot Fix Builds, Consultative Help
InfiniDB Community & Enterprise Server Comparison
| Core Database Server Features | InfiniDB Community | InfiniDB Enterprise |
|---|---|---|
| MySQL front end | Yes | Yes |
| Column-oriented | Yes | Yes |
| Logical data compression | Yes | Yes |
| High-Speed bulk loader w/ no blocking queries while loading | Yes | Yes |
| Crash-recovery | Yes | Yes |
| Transaction support (ACID compliant) | Yes | Yes |
| INSERT/UPDATE/DELETE (DML) support | Yes | Yes |
| Multi-threaded engine (queries/writes will use all CPU’s/cores on box) | Yes | Yes |
| No indexing necessary | Yes | Yes |
| Automatic vertical (column) and logical horizontal partitioning of data | Yes | Yes |
| MVCC support – snapshot read (readers don’t block writers) | Yes | Yes |
| Alter Table with online add column capability | Yes | Yes |
| High concurrency supported | Yes | Yes |
| Terabyte database capable | Yes | Yes |
| Multi-Node, MPP scale out capable w/ failover | No | Yes |
| Support | Forums Only | Formal Production Support |
For More Information
• Download InfiniDB Community Edition
• Download InfiniDB documentation
• Read InfiniDB technical white papers
• Read InfiniDB intro articles on MySQL dev zone
• Visit InfiniDB online forums
• Trial the InfiniDB Enterprise Edition: http://www.calpont.com
www.infinidb.org
www.calpont.com