Kickfire June 18

  • 8/14/2019 Kickfire June 18

    1/33

    2008 Kickfire, Inc. All rights reserved.

    Turbocharging MySQL Reporting and Data Warehousing

    Agenda

    MySQL intro

    Kickfire background

    Technology overview

    Demo

    Customer case studies

    Q&A

    MySQL Data Warehousing Market

    Growing use for data warehousing and BI

    MySQL DW deployed at 28% of MySQL customers

    Strong DW ecosystem support

    MySQL and Kickfire Break Records

    100 GB TPC-H records (1)

    #1 performance, non-clustered

    #1 price/performance

    300 GB TPC-H records (2)

    #1 performance, non-clustered

    #1 price/performance

    System tested is fully ACID compliant

    1. The Kickfire Database Appliance Series 2300 achieved 49,228 QphH@100GB. The price/performance is $0.70/QphH@100GB USD. The total Kickfire system price over three years is $34,425 USD.

    2. The Kickfire Database Appliance Series 2400 achieved 54,895 QphH@300GB. The price/performance is $0.89/QphH@300GB USD. The total Kickfire system price over three years is $48,790 USD.

    The appliances will be available October 14, 2008. TPC-H is a registered trademark of the TPC council.
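The price/performance figures in the two footnotes are consistent with dividing the three-year system price by the QphH result. The short check below reproduces them; the function name `price_per_qphh` is ours and purely illustrative.

```python
# Price/performance as quoted in the footnotes: total three-year system
# price divided by the benchmark throughput (QphH).
def price_per_qphh(total_price_usd: float, qphh: float) -> float:
    """Return USD per QphH, rounded to cents as in the footnotes."""
    return round(total_price_usd / qphh, 2)

series_2300 = price_per_qphh(34_425, 49_228)  # 100 GB result
series_2400 = price_per_qphh(48_790, 54_895)  # 300 GB result
print(series_2300, series_2400)
```

Both values round to the quoted $0.70 and $0.89.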

    DW and Reporting Problems We Hear About

    Technical issues

      Reports and queries take too long to run

      Lots of tuning and maintenance necessary

      Difficulty supporting ad-hoc queries or complex reports

    Key features needed

      Parallel query

      I/O subsystem performance

      Data warehouse-focused optimizer

    Business impact

      Difficulty in growing the system with the business

      Too much time, money, and effort spent on maintenance

      Value for business and for customers limited

    Who is Kickfire?

    World's first high-performance appliance for MySQL

    Makes MySQL rock for reporting and queries

    Affordable, low-power, load-and-go appliance

    Scalable from GBs to TBs

    Quick Poll #2

    Do you have a dedicated MySQL reporting and/or data warehousing server today?

    Yes

    No

    No, but plan to

    The Technology (1/2)

    SQL : MOPS = 1 : 10

    Query parallelization done on a SINGLE SQC, drastically reducing h/w footprint

    SQC uses column-store, compression, intelligent indexes and pre-fetches from storage

    SQC operates directly out of memory, NOT limited by registers

    SQL : Assembly code = 1 : 1,000,000,000,000

    Parallel query = Massive h/w buildout

    von Neumann bottleneck = CPUs idle

    I/O bottleneck = Massive storage

    The Technology (2/2)

    Dataflow Pipelined Parallel Processing (SQL Chip)

    [Diagram: Storage Array, Unified Database Memory, Dataflow Engine (SQL Chip), Host Memory, and two x86 processors]

    What's in the Box

    1. SQL execution

    2. Memory management

    3. Loader acceleration

    1. Connectivity

    2. Security

    3. Administration

    1. Optimizer

    2. Column store & cache

    3. Transactional engine

    KDB

    MySQL

    Deploying the Box

    Setup

      Pre-configured appliance to deliver high performance on a broad set of schemas and queries

    Data movement and loading

      Utility to migrate from existing MySQL data sources

      Fast Loader can load 100GB/hr; Incremental Loader for changes

      Works with ETL tools certified for MySQL

    Reporting and queries

      Supports full MySQL SQL syntax

      Works with business intelligence tools certified for MySQL

    Product Line Overview

    2000 Series

      Size: 2 RU

      Capacity: up to 0.8TB

      Power use: ~600 W

      Price: starting at ~$20k

    3000 Series

      Size: 3 RU

      Capacity: up to 3TB

      Power use: ~700 W

      Price: starting at ~$65k

    Quick Poll #3

    If you use MySQL for reporting or data warehousing, how much data are you reporting on today?

    1GB - 49GB

    50GB - 99GB

    100GB - 999GB

    1TB - 5TB

    > 5TB

    Demo

    Customer Case Study #1: Overview

    Customer

      Public company with large online communities

    Problems

      Canned reports are too slow, and report development is too slow because too many aggregate tables are needed

      Can't create a new revenue service based on ad hoc queries

    Customer environment

      1 TB MySQL DW with multiple customer DBs of clickstream data

      Two dual-CPU servers with 16GB RAM

    Kickfire system

      Kickfire Database Appliance 2200 with 32GB RAM

    Customer Case Study #1: Results

    Test platform

    One 45GB customer DB

    50 million rows in fact table

    Sample query below

    Results

    Average 35X improvement

    Customer Case Study #2: Overview

    Customer

      Successful high-tech startup in network management domain

    Problems

      Can't scale to larger data volumes of network data

      Revenue tied to scaling of network monitoring service

    Customer environment

      Up to 1 TB DW of customer network data

    Kickfire system

      Kickfire Database Appliance 2300 with 64GB RAM

    Customer Case Study #2: Results

    Test platform

    450M rows in each fact table

    Over 125 tables

    450GB total

    Sample query below

    Results

    Average 600X improvement over current system results

    In lab, were able to improve 60X with partitions etc., but required major app re-architecture

    Q&A

    Additional information

      Product is currently in beta; program currently oversubscribed

      If interested in trials, please email [email protected]

      For more info, download our white paper and data sheet

    Stay in touch

      Read our blog www.kickfire.com/blog

      Send us an email [email protected]

      Call us at 408.450.5400

    Appendix

    TPC-H Queries

    Multiple aggregations over large number of rows (Q1)

    Multiple tables in the query (Q2, Q5, Q7, Q8, Q9)

    Like predicate [over a large number of rows] and string functions (Q2, Q9, Q13, Q16, Q22)

    Order by limit (top 10-100 rows) (Q2, Q3, Q10, Q18, Q21)

    Correlated subqueries (Q4, Q17, Q20, Q21, Q22)

    Case expression (will highlight how well you can handle conditional expressions) (Q8, Q12, Q14)

    Large groupby key (Q10, Q18)

    Left Outer join (Q13)

    Queries over views (Q15)

    Complex filters involving AND-OR (Q19)

    Randomized data and query parameters

    100GB Performance: Kickfire vs. MyISAM

    Ran all 22 queries to completion

    MyISAM timed out after three hours on 12 of the queries

    Over 1000x faster on average

    Query      Kickfire (secs)   MyISAM (secs)   Speedup
    Query 22        3.7              100.72          27
    Query 21       23.6            10800            458
    Query 20        2.1             1450.42         691
    Query 19        2.2             4682.19        2128
    Query 18       26              10800            415
    Query 17        0.6             1895.55        3159
    Query 16        5.9              693.56         118
    Query 15        4.4            10800           2455
    Query 14        1.5             2345.68        1564
    Query 13       36.8            10800            293
    Query 12        5              10800           2160
    Query 11        2.7             2726.2         1010
    Query 10        4              10800           2700
    Query 9         9.9            10800           1091
    Query 8         0.8             4122.62        5153
    Query 7         5.1             4328.22         849
    Query 6         2.3            10800           4696
    Query 5         4.6            10800           2348
    Query 4         3.6            10800           3000
    Query 3         7.5            10800           1440
    Query 2         2.2            10800           4909
    Query 1        34.1             3737.54         110
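The Speedup column appears to be the MyISAM time divided by the Kickfire time, rounded to a whole number. The sketch below checks this for four rows. The pairing of values to query numbers is an assumption made from the order of the extracted columns, and the names `timings` and `speedup` are ours. A MyISAM time of 10800 seconds is the three-hour timeout, so the speedup for those rows is only a lower bound.

```python
# Timings copied from the slide (seconds); 10800 s is the three-hour timeout.
TIMEOUT_SECS = 10800

timings = {
    # query: (kickfire_secs, myisam_secs)
    "Query 22": (3.7, 100.72),
    "Query 21": (23.6, 10800),
    "Query 8": (0.8, 4122.62),
    "Query 1": (34.1, 3737.54),
}

def speedup(kickfire_secs: float, myisam_secs: float) -> int:
    """Ratio of MyISAM time to Kickfire time, rounded to a whole number."""
    return round(myisam_secs / kickfire_secs)

for query, (kf, my) in timings.items():
    # A timed-out MyISAM run only gives a lower bound on the true speedup.
    bound = " (lower bound, MyISAM timed out)" if my >= TIMEOUT_SECS else ""
    print(f"{query}: {speedup(kf, my)}x{bound}")
```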

    Chip Exploits Fine-Grained Parallelism

    Pipelined parallelism

    Data-partitioned parallelism

    Independent-operator parallelism

    Inter-query parallelism

    [Diagram: operator graphs applying predicates p0(a), p1(a,b), p2(c,d) and function f(.) to columns T1.a, T2.b, T2.c, T2.d]
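As a rough software analogy for the pipelined case only, the sketch below chains operators so that each row streams from one to the next without materializing intermediate results. It is not the chip's implementation: Python generators run sequentially, so it shows the dataflow structure rather than real parallelism. The table `t1`, the predicate `p0`, and the operator names are hypothetical.

```python
from typing import Callable, Iterable, Iterator

def scan(rows: Iterable[dict]) -> Iterator[dict]:
    """Source operator: emit rows one at a time."""
    for row in rows:
        yield row

def select(rows: Iterator[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Filter operator: pass on only rows that satisfy the predicate."""
    return (row for row in rows if predicate(row))

def project(rows: Iterator[dict], column: str) -> Iterator:
    """Projection operator: keep a single column."""
    return (row[column] for row in rows)

# Hypothetical table T1 with column a; p0 is a stand-in predicate on a.
t1 = [{"a": 1}, {"a": 5}, {"a": 9}]

def p0(row: dict) -> bool:
    return row["a"] > 3

# Rows flow scan -> select -> project one at a time.
pipeline = project(select(scan(t1), p0), "a")
result = list(pipeline)
print(result)
```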

    KDB: Transactional Storage Engine

    Fully ACID compliant

    KDB supports serializable isolation level

    TPC-H benchmark requires full ACID

    Automatic crash recovery using the write-ahead logs

    Automatic check-pointing for fast recovery from crashes

    Automatic deadlock detection and rollback

    Support for streaming DML for operational BI

    Future release to allow concurrent updates with long-running queries

    Similar to MVCC but without the expensive overhead

    Loading Utilities

    Fast loader for initial load

    Efficiently loads data using bulk processing - 100GB in 1hr

    Leverages the SQL Chip to execute the bulk operations

    Incremental Loader for periodic loading

    Bulk loading of Inserts, Deletes, and Updates SQL chip to offload

    Row store like efficiency on column store data

    Data Migration tools utilize Fast and Incremental Loaders

    Schema migration + data type mapping + index migration

    Foreign key migration or declaration
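For planning an initial load, the quoted Fast Loader rate gives a back-of-the-envelope estimate. The sketch below assumes the 100GB/hr rate holds at any data size, which the slides do not state, and the function name is ours.

```python
FAST_LOADER_GB_PER_HOUR = 100  # rate quoted on the slide

def estimated_load_hours(data_gb: float,
                         rate_gb_per_hour: float = FAST_LOADER_GB_PER_HOUR) -> float:
    """Estimate initial-load time, assuming the quoted rate holds for any size."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_gb / rate_gb_per_hour

# The 450GB test data set from case study #2 would take about 4.5 hours.
print(estimated_load_hours(450))
```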

    KDB supported Indexing

    Basic indexing is automatically generated

    All primary keys have a B+ tree index automatically

    All Foreign keys have a FK-index automatically

    The following indices can be created/deleted by user

    Any Date/Time column can have a Date-range index

    Any string column can have Word index

    B+ tree index for any data type

    Any automatically created indices can be deleted
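The slides do not describe how the Word index is built. As a generic illustration of the idea, the sketch below builds a simple inverted index that maps each word in a string column to the rows containing it. The column contents and function name are hypothetical, and this is not KDB's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set

def build_word_index(values: List[str]) -> Dict[str, Set[int]]:
    """Map each lowercase word to the set of row ids whose string contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, text in enumerate(values):
        for word in text.lower().split():
            index[word].add(row_id)
    return index

# Hypothetical string column with three rows.
comments = ["late delivery", "special requests", "late special packages"]
word_index = build_word_index(comments)
print(sorted(word_index["late"]))
print(sorted(word_index["special"]))
```

A lookup for a word then touches only the matching row ids instead of scanning every string.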

    Leveraging MySQL SE Architecture

    KDB Storage Engine

    Parsed query is intercepted by KDB & executed in Chip

    Answer is returned through MySQL

    MySQL Optimizer & execution is bypassed

    Source: MySQL 5.0 Pluggable Storage Engine Architecture document

    Typical Features in DW Queries

    Select: Many columns; Complex expressions; Aggregations

    From: In-line views (Subselect)

    Where: Complex selection conditions: Large IN lists; Case/If; OR conditions

      Functions: Extract, Substring, Like

      Complex Join conditions: FK joins; EQ joins

      Exists, NOT Exists, IN, NOT IN subselect (Correlated)

    Group By: Multi-column, Expressions

    Having: Condition on Aggregations

    Order By: Order by on multi-columns and Aggregations, with limit

    Proj. Optimization

    Primary key condition

    OrderBy limited values

    Functions/HW limits
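To make these features concrete, the sketch below runs one small query that combines several of them: an FK join, an IN list, a CASE expression inside an aggregate, a HAVING condition on an aggregate, and ORDER BY with LIMIT. It uses Python's built-in `sqlite3` module so it can run anywhere. SQLite's dialect differs from MySQL's, and the schema and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER REFERENCES customer(cust_id),
        status   TEXT,
        amount   REAL
    );
    INSERT INTO customer VALUES (1, 'EAST'), (2, 'WEST'), (3, 'EAST');
    INSERT INTO orders VALUES
        (10, 1, 'SHIPPED', 100.0),
        (11, 1, 'OPEN',     50.0),
        (12, 2, 'SHIPPED',  70.0),
        (13, 3, 'SHIPPED',  30.0);
""")

query = """
    SELECT c.region,
           SUM(o.amount) AS total,
           SUM(CASE WHEN o.status = 'SHIPPED' THEN o.amount ELSE 0 END) AS shipped
    FROM orders AS o
    JOIN customer AS c ON c.cust_id = o.cust_id   -- FK join
    WHERE o.status IN ('SHIPPED', 'OPEN')         -- IN list
    GROUP BY c.region
    HAVING SUM(o.amount) > 50                     -- condition on an aggregate
    ORDER BY total DESC
    LIMIT 10
"""
rows = conn.execute(query).fetchall()
print(rows)
```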