Introduction EsgynDB, based on Apache Trafodion, by Rao Kakarlamudi, Esgyn

Apache Trafodion™ (incubating) Push Hadoop Beyond Analytics trafodion.apache.org Speaker: Rao Kakarlamudi ([email protected])

Page 1:

Apache Trafodion™ (incubating)
Push Hadoop Beyond Analytics
trafodion.apache.org

Speaker: Rao Kakarlamudi (rao.kakar [email protected])

Page 2:

© 2015 Esgyn Corporation

Use Case: Internet of Things

Business Needs

◦ Enormous vehicle fleet
◦ Real-time capture, monitoring, and analysis at scale with high concurrency

Problem

◦ Optimize usage
◦ Understand scheduling
◦ Understand maintenance
◦ Real-time customer information

Challenge

◦ 559 million vehicle records per day
◦ Sub-second response time
◦ Sustained performance at >100 concurrent users
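As a rough sizing aid, the daily record volume quoted in the challenge can be converted to an average ingest rate. This is only a back-of-the-envelope sketch: it assumes the load is spread evenly over 24 hours, which the slide does not state, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(records_per_day: int) -> float:
    """Average records per second, assuming a perfectly even load over the day."""
    return records_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(559_000_000)
print(f"{rate:,.0f} records/second on average")  # prints "6,470 records/second on average"
```

Real peaks would sit above this average, so the figure is a lower bound on the ingest rate the cluster has to sustain.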

Solution

◦ Trafodion on standard x86 Linux cluster
◦ Data load, query, and extract in parallel
◦ Users can query both current and historical data

Page 3:

Use Case: Finance

Business Needs

◦ Customers need to query their recent balances and their transactions from months or even years ago.
◦ They also want more information than can easily be stored in a separated architecture (vendor name, ATM location, transfer location, etc.).

Problem

◦ Retail businesses receive transactions at a high overall volume from a wide variety of sources, such as credit card transactions, tellers, electronic transfers, and ATMs.

◦ Customers make queries about individual transactions in the last day, month, and year, but the storage and query performance required to give full information about all transactions is beyond the capacity of traditional architectures.

Challenge

◦ Query data from the current day’s transactions with high reliability and low latency, without impacting the performance of the primary transactional system

Solution

◦ EsgynDB initially provides an ODS for the mission-critical transaction system, offloading near-real-time queries there to allow the primary transactional system to meet its SLAs.

◦ The same data lake also includes the historical data, allowing for seamless connection of data over time, with no extra data replication. And with EsgynDB’s ability to integrate structured, semi-structured, and unstructured data, customers and employees have access to more information about each transaction.

Page 4:

Use Case: Telecommunications

Business Needs

◦ 24x7 ingest and analysis of voice, SMS, and data file business transactions

◦ Build new solutions for 100s of millions of users

Problem

◦ Up-to-date information within a few minutes
◦ Support and upsell
◦ Trust your data

Challenge

◦ Load gigabytes of data in minutes on an ongoing basis
◦ Comprehensive queries against historical and recent data
◦ Data quality and rapid analysis to engage customers

Solution

◦ Trafodion on standard x86 Linux cluster
◦ Ingest raw data at arrival, rate and load into Trafodion
◦ Transactional inquiries
◦ Detail reports

Page 5:

Use Case: E-Commerce

Business Need

◦ Ad-driven revenue model
◦ Need near-real-time decisions to optimize ad placement

Problem

◦ Log files → Hive → Traditional Database
◦ Too slow to meet business requirements

Challenge

◦ 2 TB of data daily, 42 GB/hour peak
◦ Misses critical data + lots of redundant data
◦ High-volume transactions and concurrency
◦ Produce account summaries in hours

Solution

◦ Query Hive data directly; store in Trafodion
◦ Same data lake, no ETL needed
◦ Near-real-time data access using SQL

Page 6:

Trafodion Brings:

• Open Source Apache Trafodion (Incubating) project and license

• Hadoop HBase scalability up to petabytes

• Full ANSI SQL support

• ACID Transactions (Atomic, Consistent, Isolated, Durable) across rows, tables, and servers

• Cost effective scale out

• Enterprise ready active-active replication across multiple data centers

• ODBC / JDBC / ADO.Net / Hibernate support

• Proven and hardened database engine with 20+ years of Tandem / Compaq / HP innovation

• Data federation (e.g. Kafka) and schema flexibility

• Optimized for real-time transaction processing, operational reporting, and operational data store (ODS) workloads that demand sub-second response times with high concurrency
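To make the ACID bullet concrete, here is a minimal sketch of atomicity: two updates that either both commit or both roll back. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; it is not Trafodion, it does not show distribution across tables or servers, and the `accounts` table and `transfer` helper are hypothetical.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # succeeds: balances become 70 and 80
failed = transfer(conn, 1, 2, 500)  # fails: both updates are rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)  # prints "True False {1: 70, 2: 80}"
```

The failed transfer leaves no partial debit behind, which is the property the slide claims at cluster scale.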

Page 7:

Trafodion Stack Overview – Running Queries

[Architecture diagram. Labels: Client (User and ISV Operational Applications); Database Connectivity (ODBC/JDBC driver); SQL layer with Master, CMP (Compiler and Optimizer), ESPs (SQL Parallelism), and DTM (Distributed Transaction Management); Data Store Integration (Storage Engine); Relational Schema over Trafodion Tables, Native HBase Tables (KVS, Columnar), and Native Hive Tables (Multi-Structured); HBase, Hive, and HDFS underneath.]

Page 8:

Why Apache Trafodion? Ingredients for a world-class relational database

1. Time, Money, and Talent
◦ 20+ years of investment
◦ $300+ million invested
◦ Database developers grew up on
  ◦ Shared-nothing Massively Parallel Architecture
  ◦ With a single system image across clusters
◦ 300+ years of database experience
  ◦ On building OLTP and BI engines

ANSI and non-ANSI functionality supported, performance, scalability, concurrency, throughput, stability, high availability, transactional, UDF, SPJ, OLAP, etc.

Page 9:

Why Apache Trafodion? Ingredients for a World-class Relational Database

2. World Class Optimizer
◦ Rule-driven and cost-based optimizer
◦ Based on Cascades & Large Scope Rules
  ◦ Reduces search space
  ◦ Recognizes patterns such as star joins
◦ Considers multiple join strategies
  ◦ Nested and nested cache for operational
  ◦ Merge and hybrid hash for large complex queries
◦ Optimizes inner, outer, & full outer joins
◦ Considers serial & parallel plans based on cardinality
◦ Uses equal-height histograms to indicate skew
◦ Leverages skew buster to eliminate skew
◦ Un-nests subqueries
  ◦ Converts correlated subqueries to joins
◦ Pushes down predicates to lowest operation
  ◦ Filters e.g. row selection (start-stop key)
  ◦ Coprocessors e.g. pre-aggregation
◦ Leverages Multi-Dimensional Access (MDAM)
  ◦ To avoid full scans when no predicates on leading key columns specified
◦ Considers sort avoidance strategies
  ◦ Uses hash group by to avoid sorts
  ◦ Leverages key order
  ◦ Does in-memory sort when possible
◦ Uses sophisticated plan caching techniques
◦ And a lot more …

Built & tuned to handle complexities & differences inherent in varied enterprise class workloads
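The optimizer list mentions equal-height histograms as a way to indicate skew. The sketch below shows the general idea only: each bucket holds roughly the same number of rows, so a value frequent enough to fill more than one bucket shows up as a repeated boundary. How Trafodion actually builds and consumes its histograms is not described on the slide; the function names and sample data are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def equal_height_boundaries(values: Sequence[int], num_buckets: int) -> List[int]:
    """Upper boundary of each bucket when every bucket holds about len(values)/num_buckets rows."""
    if num_buckets < 1 or len(values) < num_buckets:
        raise ValueError("need at least one row per bucket")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_buckets - 1] for i in range(1, num_buckets + 1)]


def skewed_values(boundaries: Sequence[int]) -> List[int]:
    """Values that close more than one bucket, a simple indicator of skew."""
    return [value for value, count in Counter(boundaries).items() if count > 1]


uniform = list(range(1, 13))             # every value occurs once
skewed = [7] * 6 + [1, 2, 3, 4, 5, 6]    # value 7 makes up half of the rows

print(equal_height_boundaries(uniform, 4), skewed_values(equal_height_boundaries(uniform, 4)))
# prints "[3, 6, 9, 12] []"
print(equal_height_boundaries(skewed, 4), skewed_values(equal_height_boundaries(skewed, 4)))
# prints "[3, 6, 7, 7] [7]"
```

A planner that sees the repeated boundary can estimate that joins or group-bys on that value will concentrate work on one process, which is the situation a skew-handling strategy is meant to address.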

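The MDAM bullet (avoiding full scans when no predicate is given on the leading key column) can be illustrated with a small in-memory sketch. It assumes a composite key `(region, day)` stored in sorted order, an integer leading column, and an equality predicate on the second column only; these are simplifications chosen for the example, not a description of Trafodion's implementation.

```python
import bisect
from typing import List, Tuple

Key = Tuple[int, int]  # (leading column, second column); names are illustrative


def mdam_probe(sorted_keys: List[Key], second_value: int) -> Tuple[List[Key], int]:
    """Find keys whose second column equals second_value without scanning every row.

    For each distinct leading value, jump straight to (lead, second_value),
    collect the matches, then jump to the next leading value.
    Returns the matches and the number of probes performed.
    """
    matches: List[Key] = []
    probes = 0
    pos = 0
    while pos < len(sorted_keys):
        lead = sorted_keys[pos][0]
        # Probe: seek to the first key >= (lead, second_value).
        i = bisect.bisect_left(sorted_keys, (lead, second_value), pos)
        probes += 1
        while i < len(sorted_keys) and sorted_keys[i] == (lead, second_value):
            matches.append(sorted_keys[i])
            i += 1
        # Skip the rest of this leading value. (lead + 1,) sorts before every
        # (lead + 1, x), so this lands on the first key with a larger leading value.
        pos = bisect.bisect_left(sorted_keys, (lead + 1,), i)
    return matches, probes


keys = sorted((region, day) for region in range(1, 4) for day in range(1, 1001))
found, probe_count = mdam_probe(keys, 500)
print(found, probe_count)  # prints "[(1, 500), (2, 500), (3, 500)] 3"
```

Here 3,000 keys are answered with three probes instead of a full scan. The approach pays off when the leading column has few distinct values; with many distinct values the per-value probes can cost more than scanning.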

Page 10:

[Diagram. Labels: Client Application; Ethernet; Node 1, Node 2, … Node n; HBase on each node, with Filters and Coprocessors; HDFS underneath.]

3. World Class Parallel Data Flow Execution Engine

◦ Data Flow pipeline parallel architecture
◦ Intermediate results materialized only for blocking operations like sorts
◦ Data overflow to disk only for large hash joins
◦ Adaptive Segmentation to use only needed resources
◦ Co-located joins & repartitioning when necessary
◦ Uses inner and outer child broadcasts
◦ Parallel secondary index maintenance
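The first two bullets (pipelined data flow, with materialization only for blocking operators such as sorts) can be sketched with Python generators. This is an illustration of the pipelining idea under simplifying assumptions: it runs in a single process, shows no parallelism, and the operator names, `trace` list, and sample rows are made up for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, int]
trace: List[Tuple[str, int]] = []  # records which operator handled which row, in order


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: hands rows to its parent one at a time."""
    for row in rows:
        trace.append(("scan", row["id"]))
        yield row


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Streaming operator: passes each qualifying row on immediately."""
    for row in rows:
        if predicate(row):
            trace.append(("select", row["id"]))
            yield row


def sort(rows: Iterable[Row], key: str) -> Iterator[Row]:
    """Blocking operator: must consume (materialize) all input before emitting anything."""
    yield from sorted(rows, key=lambda row: row[key])


TABLE = [{"id": 1, "speed": 95}, {"id": 2, "speed": 40}, {"id": 3, "speed": 80}]

# Pipelined: asking for one result row pulls exactly one row through scan and select.
pipelined = select(scan(TABLE), lambda row: row["speed"] > 50)
first = next(pipelined)
trace_after_first = list(trace)  # [("scan", 1), ("select", 1)]

# Adding a sort forces the whole input to be read before the first row comes out.
trace.clear()
ordered = list(sort(select(scan(TABLE), lambda row: row["speed"] > 50), "speed"))
print([row["id"] for row in ordered])  # prints "[3, 1]"
```

In the pipelined case the first result is available after touching a single row, while the sorted plan has to read the whole table first, which is why an engine would want to limit materialization to operators that genuinely need it.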

Why Apache Trafodion? Ingredients for a world-class relational database

[Diagram. Labels: Master processes and groups of ESPs; Multi-fragment.]

Supports salting of data across region servers
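Salting generally means deriving a partition prefix from a hash of the key so that sequential keys do not all land on one region server. The sketch below shows that general technique; the MD5-based hash, the `NN|key` row-key layout, and the function names are assumptions for illustration and are not taken from Trafodion.

```python
import hashlib
from collections import Counter


def salt_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition number using a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def salted_row_key(key: str, num_partitions: int) -> str:
    """Prefix the key with its partition number so stored rows group by partition."""
    return f"{salt_partition(key, num_partitions):02d}|{key}"


# Monotonically increasing keys, which would otherwise hit a single region.
keys = [f"vehicle-{n:06d}" for n in range(1000)]
counts = Counter(salt_partition(key, 4) for key in keys)
print(sorted(counts.items()))
print(salted_row_key(keys[0], 4))
```

Because the hash is deterministic, a point lookup can recompute the prefix from the key. The trade-off is that a range scan over the original key order has to visit every partition.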

Page 11:

Why Apache Trafodion? Ingredients for a World-class Relational Database

4. World Class Distributed Transaction Management system


Page 12:

Performance: YCSB and Order Entry scale linearly!

[Charts: throughput plots for Transactional Order Entry and for YCSB (Selects, Updates, and a 50/50 mix).]

Page 13:

Try and Contribute

Apache Trafodion Download:

◦ trafodion.apache.org

Try Trafodion on AWS:
◦ https://aws.amazon.com/marketplace/pp/B018RBMFG0

Documentation:
◦ trafodion.apache.org

Become a contributor: add a new feature, fix a bug, translate documentation, and more
◦ Discuss your changes on the dev mailing list
◦ Create a JIRA issue
◦ Set up your development environment
◦ Prepare a patch containing your changes
◦ Submit the patch

Page 14:

Thank You

Rao Kakarlamudi ([email protected])