Apache Trafodion ENTERPRISE CLASS OPERATIONAL SQL-ON-HADOOP
Table of contents
Introducing Trafodion
Trafodion overview
Targeted Hadoop workload profile
Transactional SQL application characteristics and challenges
Trafodion innovations built upon Hadoop software stack
Leveraging HBase for performance, scalability, and availability
Trafodion innovation – value add improvements over vanilla HBase
Salting of row keys
Trafodion feature overview
Full-functioned ANSI SQL language support
Trafodion software architecture overview
Integrating with native Hive and HBase data stores
Trafodion process overview and SQL execution flow
Trafodion’s optimizer technology
Extensible optimizer technology
Optimized execution plans based on statistics
Trafodion’s data flow SQL executor technology with optimized DOP
Trafodion optimizations for transactional SQL workloads
Trafodion innovation - Distributed Transaction Management
High availability and data integrity features
Summary of Trafodion benefits
Where to go for more information
Technical white paper | Trafodion
Apache Trafodion (incubating) is an open source initiative to deliver an enterprise-class
SQL-on-Hadoop DBMS engine that specifically targets transactionally
protected operational workloads. Trafodion combines Apache
HBase with transactional SQL technologies that build on
more than 20 years of investment in database technology and solutions.
Introducing Trafodion
Trafodion is an open source initiative to develop an enterprise class SQL-on-Hadoop DBMS engine that
specifically targets big data transactional or operational workloads as opposed to analytic workloads.
Transactional SQL encompasses workloads previously described as OLTP (online transaction processing)
workloads which were generated in support of traditional enterprise-level transactional applications (ERP,
CRM, etc.) and enterprise business processes. Additionally, transactions have evolved to include social and
mobile data interactions and observations using a mixture of structured and semi-structured data.
Trafodion overview
• Comprehensive and full-functioned SQL DBMS which allows companies to reuse and leverage existing SQL
skills to improve developer productivity.
• Extends Hadoop® HBase by adding support for ACID (atomic, consistent, isolated and durable) transaction
protection that guarantees data consistency across multiple rows, tables, SQL statements.
• Includes many optimizations for low-latency read and write transactions in support of the fast response
time requirements of the transactional SQL workloads.
• Hosted applications can seamlessly access and join data from Trafodion, native HBase, and Hive tables
without expensive replication or data movement overhead.
• Provides interoperability with new or existing applications and 3rd party tools via support for standard ODBC
and JDBC access.
• Designed to seamlessly fit within the existing IT infrastructure with no vendor lock-in by remaining neutral to
the underlying Linux and Hadoop distributions.
Targeted Hadoop workload profile
Hadoop workloads can be broadly categorized into four workload types, as shown in Figure 1: Operational,
Interactive, Non-Interactive, and Batch. These categories vary greatly in their response time expectations
and in the amount of data that is typically processed. The rightmost three categories are where the
marketplace (vendors and customers) has predominantly focused its attention, and they are therefore the most
mature in terms of development efforts and solution offerings. For the most part these categories represent
efforts centered around “analytics” and business intelligence processing on “big data” problems. These
workloads are well positioned to leverage Hadoop strengths and capabilities, map-reduce in particular.
In contrast, the leftmost workload, defined as “Operational”, is an emerging Hadoop market category and
therefore the least mature. In part, this is a direct result of Hadoop being perceived as having a
number of weaknesses (or gaps) in terms of addressing the requirements for transactional SQL workloads.
Traditionally these workloads have been relegated to the domain of relational databases, but there is growing
interest and pressure to embrace these workloads in Hadoop due to Hadoop’s perceived benefits of
significantly reduced costs, reduced vendor lock-in, and its ability to seamlessly scale to larger workloads and
data. This is exactly the workload that Trafodion is targeting. Let’s next look at the characteristics and
requirements of this workload to better understand Hadoop’s gaps and weaknesses and how Trafodion will
address them.
Figure 1. Hadoop Workload Profiles
Transactional SQL application characteristics and challenges
Transactionally protected operational workloads are typically deemed mission critical because they
help companies make money, touch their customers or prospects, or help them run and operate their
business. Typically they have very stringent requirements in terms of response time expectations
(sub-second), transactional data integrity, number of users, concurrency, availability, and data volumes. With
the advent of the growing “internet of things”, the number and types of access devices have driven tremendous
transaction and data growth, and also changes in the type of data that needs to be captured and utilized as
part of these transactions. These next generation operational applications often require multi-structured data
types, which implies that operational data is evolving rapidly to include a variety of data formats and types of
data, for example transactional structured data combined with visual images.
Combined, these requirements can expose Hadoop limitations in terms of transaction support, bulletproof
data integrity, real time performance, operational query optimization, and managing workloads comprised of a
complex mix of concurrently executing transactions all with varying priorities. Trafodion addresses each of
these limitations and as a result provides a differentiated DBMS capable of hosting these applications and
their data.
Trafodion innovations built upon Hadoop software stack
Trafodion is designed to build upon and leverage Apache Hadoop and HBase core modules. Operational
applications using Trafodion transparently gain Hadoop’s advantages of affordable performance, scalability,
elasticity, availability, etc. Figure 2 depicts a subset of the Hadoop software stack and those items colored in
orange are specifically leveraged by Trafodion, namely HBase, HDFS, and Zookeeper. To this stack, Trafodion
adds (items colored in green) ODBC/JDBC drivers, the Trafodion database software, and a new HBase
distributed transaction management (DTM) subsystem for distributed transaction protection across multiple
HBase regions.
Trafodion interfaces to Hadoop services using their standard APIs. By maintaining API compatibility, Trafodion
remains Hadoop distribution neutral, which eliminates vendor lock-in by giving customers a choice of
distributions.
Trafodion is initially targeted to deliver innovation on top of Hadoop in these key areas:
• A full-featured ANSI SQL implementation whose database services are accessible via a standard
ODBC/JDBC connection
• A SQL relational schema abstraction which makes Trafodion look and feel like any other relational
database
• Distributed ACID transaction protection
• Performant response times for transactions comprised of both reads and writes
• Parallel optimizations for both transactional and operational reporting workloads
Figure 2. Trafodion and Hadoop Ecosystem
Leveraging HBase for performance, scalability, and availability
As stated previously, Trafodion is able to leverage all of the features and thereby all the advantages attributed
to HBase including parallel performance, virtually unlimited scalability, elasticity, and availability/disaster
recovery protection.
These features are key to supporting operational workloads in production. For example:
• Fine grained load balancing, scalability, and parallel performance are provided via standard HBase services
such as autosharding Trafodion table data across multiple regions and region servers.
• Data availability and recovery in the event a server or disk fails or is decommissioned are provided by
standard Hadoop and HBase services such as replication and snapshots.
Additionally, Trafodion is able to transparently leverage Hadoop distribution (e.g. Cloudera, Hortonworks)
specific features and capabilities since it accesses these distribution services via native HBase APIs.
As a result, powerful features such as compression or encryption can be supplied “under the covers” for
Trafodion defined tables. Next let’s look at how Trafodion brings innovation and value add to vanilla HBase.
Trafodion innovation – value add improvements over vanilla HBase
Although Trafodion stores its database objects in HBase/HDFS storage structures, it differs from and brings
value add over vanilla HBase in a multitude of ways as described below:
• Trafodion provides a relational schema abstraction on top of HBase which allows customers to leverage
known and well tested relational design methodologies and SQL programming skills.
• From a physical layout perspective, Trafodion uses standard HBase storage mechanisms (column family
store using key-value pairs) to store and access objects. Trafodion currently stores all columns in a single
column family to improve access efficiency and speed for operational data. Additionally Trafodion
incorporates a column name encoding mechanism to save space on disk and to reduce messaging
overhead for the purposes of improving SQL performance.
• Unlike vanilla HBase that treats stored data as an uninterpreted array of bytes, Trafodion defined columns
are assigned specific data types that are enforced by Trafodion when inserting or updating its data
contents. This not only greatly improves data quality/integrity, it also eliminates the need to develop
application logic to parse and interpret the data contents.
• Vanilla HBase provides ACID transaction protection only at the row level. Trafodion extends ACID protection
to application defined transactions that can span multiple SQL statements, multiple tables, and rows. This
greatly improves database integrity by protecting the database against partially completed transactions i.e.
ensuring that either the whole transaction is completely materialized in the database or none of it.
• HBase’s native API is at a very low level and is not a commonly used programming API. In contrast,
Trafodion’s API is ANSI SQL which is a familiar and well known programming interface and allows
companies to leverage existing SQL knowledge and skills.
• Unlike HBase’s key structure that is comprised of a single uninterpreted array of bytes, Trafodion supports
the common relational practice of allowing the primary key to be a composite key comprised of multiple
columns.
• Finally unlike vanilla HBase, Trafodion supports the creation of secondary indexes that can be used to speed
transaction performance when accessing row data by a column value that is not the row key.
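To make the composite-key point concrete, here is a minimal Python sketch of how a multi-column primary key could be packed into the single byte array that HBase uses as a row key, while preserving the column-by-column sort order. The column names, the fixed string width, and the encoding itself are illustrative assumptions; the white paper does not describe Trafodion's actual key encoding.

```python
import struct

CODE_WIDTH = 8  # assumed fixed width for the string column (hypothetical)

def encode_key(customer_id: int, order_id: int, line_code: str) -> bytes:
    """Pack a three-column key into one byte string that sorts like the tuple."""
    code = line_code.encode("ascii")
    if len(code) > CODE_WIDTH:
        raise ValueError("line_code is too long for the fixed-width encoding")
    # Big-endian unsigned integers compare bytewise in numeric order;
    # NUL padding keeps a short string from running into the next column.
    return struct.pack(">II", customer_id, order_id) + code.ljust(CODE_WIDTH, b"\x00")

rows = [(2, 300, "A"), (256, 1, "A"), (1, 99, "A"), (1, 10, "Z"), (1, 10, "AA")]
by_tuple = sorted(rows)                                   # relational ordering
by_bytes = sorted(rows, key=lambda r: encode_key(*r))     # byte-array ordering
print(by_tuple == by_bytes)
```

The sketch only handles non-negative integers and ASCII strings; signed numbers, variable-length strings, and descending columns would each need an order-preserving encoding of their own.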
Salting of row keys
One known problematic area for HBase is supporting transactional workloads where data is inserted into a
table in row key order. When this happens, all of the I/O is concentrated on a single HBase region, which in
turn creates a server and disk hotspot and a performance bottleneck. To alleviate this problem, Trafodion
provides an innovative feature called “salting the row key”.
To enable this feature the DBA specifies the number of partitions (i.e. regions) the table is to be split over
when creating the table e.g. “SALT USING 4 PARTITIONS”. Trafodion creates the table pre-split with one region
per salt value. An internal hash value column, “_SALT_”, is added as a prefix to the row key. Salting is handled
automatically by Trafodion and is transparent to application written SQL statements. As data is inserted into
the table, Trafodion computes the salt value and directs the insert to the appropriate region. Likewise,
Trafodion calculates the salt value when data is retrieved from the table and automatically generates
predicates where feasible. MDAM technology (which is described in more detail in the section entitled
“Trafodion optimizations for transactional SQL workloads”) makes this process especially efficient. This is a
very lightweight operation with little overhead or impact to direct key access operations.
The benefits of salting are that you get more even data distributions across regions and improved
performance via hotspot elimination.
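The salting idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Trafodion's implementation: CRC32 stands in for the unspecified internal hash, the salt is stored as a single prefix byte, and the function names are hypothetical.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # mirrors the "SALT USING 4 PARTITIONS" example

def salt_of(row_key: bytes, partitions: int = NUM_PARTITIONS) -> int:
    """Map a row key to a salt value. CRC32 is an assumed stand-in hash."""
    return zlib.crc32(row_key) % partitions

def salted_row_key(row_key: bytes, partitions: int = NUM_PARTITIONS) -> bytes:
    """Prefix the key with its salt so that each salt value sorts into its own region."""
    return bytes([salt_of(row_key, partitions)]) + row_key

# Monotonically increasing keys: the insert pattern that hotspots one region when unsalted.
keys = [f"{n:08d}".encode("ascii") for n in range(1000)]
rows_per_partition = Counter(salted_row_key(k)[0] for k in keys)
print(sorted(rows_per_partition.items()))
```

Because the salt is a pure function of the key, a point lookup can recompute it and go directly to one region, which is why direct key access sees little overhead. Range scans on the original key are the costly case, since they have to visit every salt value.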
In summary, Trafodion incorporates a number of enhancements over vanilla HBase for the purposes of
improving transaction performance, data integrity, and DBA/developer productivity (i.e. by reducing application
complexity through the use of standard and well known relational practices and APIs).
Trafodion feature overview
Let’s now look at a high level overview of the Trafodion features. A more detailed drill down of each of these
features is provided in the sections below.
Trafodion includes:
• An enterprise-class SQL DBMS that provides all of the features you would expect from one of the merchant
relational database products that are on the market. The difference is that Trafodion leverages Hadoop
services i.e. HBase/HDFS for data storage.
• Full-functioned ANSI SQL language support including data definition, data manipulation, transaction control,
and database utilities.
• Linux and Windows ODBC/JDBC drivers.
• Distributed transaction management protection.
• Many SQL optimizations designed to improve operational workload performance.
All while retaining and extending expected Hadoop benefits! Now let’s dive into more details on these features.
Full-functioned ANSI SQL language support
Unlike most (if not all) NoSQL and other SQL-on-Hadoop products, Trafodion provides comprehensive ANSI
SQL language support including full-functioned data definition (DDL), data manipulation (DML), transaction
control (TCL) and database utility support.
• Unlike vanilla HBase, Trafodion provides support for creating and managing traditional relational database
objects including tables, views, secondary indexes, and constraints. Columns (table attributes) are assigned
Trafodion enforced data types including numeric, character, varchar, date, time, interval, etc.
Internationalization (I18N) support is provided via character set encodings including UTF-8, UCS2, and
ISO 8859-1 for both user data and the database metadata. Comparisons and data manipulation between differing
data encodings are transparently handled via implicit casting and translation support.
• Trafodion provides comprehensive and standard SQL data manipulation support including SELECT, INSERT,
UPDATE, DELETE, and UPSERT/MERGE syntax with language options including join variants, unions, where
predicates, aggregations (group by and having), sort ordering, sampling, correlated and nested sub-queries,
cursors, and many SQL functions.
• Utilities are provided for updating table statistics used by the optimizer for costing (i.e.
selectivity/cardinality estimates) plan alternatives, for displaying the chosen SQL execution plan, plan
shaping, and a command line utility for interfacing with the database engine.
• Explicit control statements are provided to allow applications to define transaction boundaries and to abort
transactions when warranted.
• Trafodion will support ANSI’s grant/revoke semantics to define user privileges in terms of managing and
accessing the database objects.
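The explicit transaction boundaries mentioned above follow the familiar begin/commit/rollback pattern. The Python sketch below shows that pattern using the standard library's SQLite module purely as a stand-in engine; the table, the constraint, and the `transfer` helper are hypothetical, and Trafodion's own statement syntax is not shown in this paper.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below define the transaction explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.execute("INSERT INTO account VALUES (2, 50)")

def transfer(src: int, dst: int, amount: int) -> bool:
    """Move funds between two rows; either both updates persist or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint: undo the credit as well.
        conn.execute("ROLLBACK")
        return False

def balances() -> list:
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()

ok = transfer(1, 2, 30)        # succeeds
failed = transfer(1, 2, 500)   # aborts and leaves both rows unchanged
print(ok, failed, balances())
```

The second call credits one row before the debit fails, so it demonstrates the protection against partially completed transactions: after the rollback, neither update is visible.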
Trafodion software architecture overview
The Trafodion software architecture consists of 3 distinct layers: the client layer; the SQL database services
layer; and the storage engine layer (see Figure 3).
Figure 3. Trafodion's 3-layer software architecture
The first layer is the Client Services layer where the operational application resides. The operational
application can be either customer written or enabled via a 3rd party ISV tool/application. Access to the
Trafodion database services layer is completed via a standard ODBC/JDBC interface using a Trafodion supplied
Windows or Linux client driver. Both type 2 and type 4 JDBC drivers are supported and the choice is dependent
on the application requirements for response times, number of connections, security, and other factors.
The second layer is the SQL layer which consists of all the Trafodion database services. This layer
encapsulates all of the services required for managing Trafodion database objects as well as efficiently
executing submitted SQL database requests. Services include connection management, SQL statement compilation
and optimized execution plan creation, SQL execution (both parallel and non-parallel) against Trafodion
database objects, transaction management, and workload management. Trafodion provides transparent parallel
SQL execution as warranted, thereby eliminating the need for complex map-reduce programming development.
The third layer is the Storage Engine layer which consists of standard Hadoop services that are leveraged by
Trafodion including HBase, HDFS, and Zookeeper. Trafodion database objects are stored into native Hadoop
(HBase/HDFS) database structures. Trafodion handles the mapping of SQL requests into native HBase calls
transparently on behalf of the operational application. Trafodion provides a relational schema abstraction on
top of HBase. In this way traditional relational database objects (tables, views, secondary indexes) are
supported using familiar DDL/DML semantics including object naming, column definition and data types
support, etc.
Integrating with native Hive and HBase data stores
One of the more powerful capabilities of Trafodion is its extensibility to also support and access data stored in
native Hive or HBase tables (non-Trafodion tables) using their native storage engines and data formats. The
benefits that can be realized include:
• Ability to run queries against native HBase or Hive tables without needing to copy them into a Trafodion
table structure
• Optimized access to HBase and Hive tables without complex map-reduce programming
• Data can be joined across disparate data sources (e.g. Trafodion, Hive, HBase)
• Ability to leverage HBase’s inherent schema flexibility capabilities
Trafodion process overview and SQL execution flow
The Trafodion SQL Layer is comprised of a number of services or processes used for the purposes of handling
connection requests and SQL execution.
• The process flow begins with the operational application or 3rd party client tool. The Windows or Linux client
accesses the Trafodion DBMS via supplied ODBC/JDBC drivers.
• When the client requests to open a connection, Trafodion’s database connection services (DCS) process the
request and assign the connection to a Trafodion Master SQL process. Trafodion uses Zookeeper to
coordinate and manage the distribution of connection services across the cluster for load-balancing
purposes as well as to ensure that a client can immediately reconnect in the event the assigned Master
process should fail.
• The Master process is responsible for coordinating the execution of SQL statements passed from the client
application.
• The Master calls upon the Compiler and Optimizer process (CMP) to parse, compile, and generate the
optimized execution plan for the SQL statements.
• If the optimized plan calls for parallel execution, the Master divides the work among Executor Server
Processes (ESPs) to perform the work in parallel on behalf of the Master process. The results are passed back
to the Master for consolidation. In some situations where a highly complex plan is specified (e.g. large
n-way joins or aggregations), multiple layers of ESPs may be requested. If a non-parallel plan is generated,
then the Master calls upon HBase services directly for optimal performance.
• For distributed transaction protection services the Trafodion DTM service is called upon to ensure the ACID
protection of transactions across the Hadoop cluster. The DTM calls upon a Trafodion supplied HBase TRX
service that provides transaction resource management on behalf of HBase.
• Last, but not least, vanilla HBase, HBase-trx, and HDFS services are called upon by either the Master or ESP
processes using standard and native APIs to complete the I/O requests i.e. retrieving and maintaining the
database objects. Where appropriate Trafodion will push down SQL execution into the HBase layer using Filters
or Coprocessors.
Trafodion’s optimizer technology
Optimizer technology represents one of Trafodion’s greatest sources of differentiation versus alternative
SQL-on-Hadoop projects or products. There are two primary areas to call out: the first is the extensible nature
of the optimizer to adapt to change and add improvements and the second is the sophistication and maturity
level of the optimizer to choose the best optimized plan for execution.
Extensible optimizer technology
Trafodion’s optimizer is based on the Cascades optimization framework. Cascades is recognized as one of the
most advanced and extensible optimizer frameworks available. The Cascades framework is a hybrid
optimization engine in that it combines logical and physical operator transformation rules with costing models
to generate the Trafodion Optimizer.
New rules or new costing models can be easily added or changed to generate an improved optimizer. In this
way, the optimizer can quickly evolve, and new operators can be rapidly added or changed to improve SQL
plan generation.
Optimized execution plans based on statistics
The second area of differentiation is the sophistication and maturity level of Trafodion’s optimizer technology.
First let’s explain the role of the various elements of the optimizer:
SQL Normalizer - the parsed SQL statement is passed to the normalizer which performs unconditional
transformations, including subquery transformations, of the SQL into a canonical form which renders the SQL
in a form that can be optimized internally.
SQL Analyzer - analyzes alternative join connectivity patterns, table access paths and methods, matching
partition information, etc. to be used by the optimizer’s rules. The results are passed to the plan
generator for consideration in costing various plan alternatives.
Table Statistics - captured equal-height histogram statistics identify data distributions for column data
and correlations between columns. Sampling is used for large tables to reduce the overhead of generating
the statistics.
Cardinality Estimator - cardinalities, data skew, and histograms are computed for intermediate results
throughout the operator tree.
Cost Estimator - estimates Node, I/O, and message cost for each operator while accounting for data skew at
the operator level.
Plan Generator - using cost estimates the optimizer considers alternative plans and chooses the plan which
has the lowest cost. Where feasible the optimizer will elect plans that incorporate SQL pushdown, sort
elimination, and in-memory storage vs. overflow to disk. It also determines the optimal degree of parallelism,
including non-parallel plans.
In summary, the optimizer is designed to choose the execution plan that minimizes the system resources used
and delivers the best response time. It provides optimizations for both operational transactions and reporting
workloads.
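As an illustration of the equal-height histogram statistics described above, the following Python sketch builds bucket boundaries so that each bucket holds roughly the same number of rows, then uses them for a coarse selectivity estimate. The function names and the whole-bucket estimation rule are assumptions made for this example; the paper does not specify how Trafodion stores or interpolates its histograms.

```python
from bisect import bisect_right

def equal_height_boundaries(values, buckets):
    """Return one upper boundary per bucket, each bucket holding about len(values)/buckets rows."""
    ordered = sorted(values)
    n = len(ordered)
    if n < buckets:
        raise ValueError("need at least one row per bucket")
    return [ordered[(i * n) // buckets - 1] for i in range(1, buckets + 1)]

def estimate_selectivity_le(boundaries, x):
    """Estimate the fraction of rows with value <= x at whole-bucket granularity."""
    # Every bucket whose upper boundary is <= x contributes 1/len(boundaries) of the rows.
    return bisect_right(boundaries, x) / len(boundaries)

# Skewed column: 90 small values plus ten rows with an outlier value of 1000.
column = list(range(1, 91)) + [1000] * 10
boundaries = equal_height_boundaries(column, buckets=4)
estimate = estimate_selectivity_le(boundaries, 50)
actual = sum(1 for v in column if v <= 50) / len(column)
print(boundaries, estimate, actual)
```

With this data the boundaries come out as 25, 50, 75, and 1000, so the outliers do not distort the estimate for small values. A histogram with four equal-width buckets over the range 1 to 1000 would instead place 90% of the rows in its first bucket.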
Trafodion’s data flow SQL executor technology with optimized DOP
Trafodion’s SQL executor uses a dataflow and scheduler-driven task model to execute the optimized query plan.
Each operator of the plan is an independent task and data flows between operators through in-memory queues
(up and down) or by interprocess communication. Queues between tasks allow operators to exchange multiple
requests or result rows at a time. A scheduler coordinates the execution of tasks and runs a task whenever
that task has data in one of its input queues. Trafodion’s executor model is starkly different from
alternative SQL-on-Hadoop DBMSs that store intermediate results on disk (for example, in spool space). In most
cases, the Trafodion executor is able to process queries with data flowing entirely through memory, providing
superior performance and reduced dependency on disk space and I/O bandwidth. The executor incorporates
several types of parallelism, such as:
• Partitioned parallelism which is the ability to work on multiple data partitions in parallel. In a partitioned
parallel plan, multiple operators all work on the same plan. Results are merged by using multiple queues, or
pipelines, enabling the preservation of the sort order of the input partitions. Partitioning is also called “data
parallelism” because the data is the unit that gets partitioned into independently executable fractions.
• Pipelined parallelism is an inherent feature of the executor resulting from its dataflow architecture. This
architecture interconnects all operators by queues with the output of one operator being piped as input to
the next operator, and so on. The result is that each operator works independently of any other operator,
producing its output as soon as its input is available. Pipelining occurs naturally and is engaged in almost
all query plans.
• Operator parallelism is also an inherent feature of the executor architecture. In operator parallelism, two or
more operators can execute simultaneously, that is, in parallel. Except for certain synchronization
conditions, the operators execute independently.
Trafodion naturally provides parallelism without special processing such as Hadoop map-reduce programming
or coding on the part of the application client. An individual query plan produced by the optimizer can contain
any combination of partitioned, pipelined, or operator parallelism. The degree of parallelism at any plan stage
may vary depending on the optimizer’s heuristics.
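Pipelined parallelism through in-memory queues can be sketched as follows in Python. This is a simplified analogy, not the Trafodion executor: it assigns one thread per operator instead of using a scheduler-driven task model, and the operator names and queue sizes are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a row stream

def scan(rows, out_q):
    """Leaf operator: emit rows downstream as they are read."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)

def filter_rows(predicate, in_q, out_q):
    """Pass through rows that satisfy the predicate as soon as they arrive."""
    while (row := in_q.get()) is not DONE:
        if predicate(row):
            out_q.put(row)
    out_q.put(DONE)

def sum_rows(in_q, result):
    """Root operator: aggregate rows while upstream operators are still producing."""
    total = 0
    while (row := in_q.get()) is not DONE:
        total += row
    result.append(total)

def run_pipeline(rows):
    # Small bounded queues keep all intermediate rows in memory and apply back-pressure.
    q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    result = []
    operators = [
        threading.Thread(target=scan, args=(rows, q1)),
        threading.Thread(target=filter_rows, args=(lambda r: r % 2 == 0, q1, q2)),
        threading.Thread(target=sum_rows, args=(q2, result)),
    ]
    for op in operators:
        op.start()
    for op in operators:
        op.join()
    return result[0]

pipeline_total = run_pipeline(range(10))
print(pipeline_total)  # 0 + 2 + 4 + 6 + 8 = 20
```

Each operator starts producing output as soon as its first input row arrives, and no intermediate result is ever fully materialized, which is the property the dataflow architecture relies on.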
Trafodion optimizations for transactional SQL workloads
Trafodion provides many compile and run-time optimizations for varying operational workloads ranging from
singleton row accesses for OLTP-like transactions to highly complex SQL statements used for operational
reporting purposes. Figure 4 depicts a number of these optimization features:
• A Type 2 JDBC driver may be used which provides the client direct JNI access to HBase services to
minimize service times.
• For many OLTP-like transactions, the Master can issue “directed” key access requests to HBase without
needing intermediate ESP processes.
• For transactions including highly complex SQL statements (e.g. n-way joins or aggregations requiring
rebroadcasting or redistribution of data), a parallel plan involving ESPs or multiple layers of ESPs can be
used to significantly reduce the service time.
Additional optimizations include:
• Masters and ESPs are retained after a connection is dropped and can be reused thereby eliminating the
startup and shutdown overhead.
• Compiled SQL plans are cached thereby eliminating unnecessary recompilation overhead.
• SQL pushdown using standard HBase services such as filters (e.g. start-stop key predicates) and
coprocessors (e.g. count aggregates).
• Secondary index support.
• A patented access method known as the Multidimensional Access Method (MDAM) to accelerate row retrieval
performance using “dimensional” predicates. For example, assume you have a table where the row key is Week,
Item, and Store, but the application supplies only Item and Store predicates. Without MDAM, the DBMS must
either perform a full table scan or rely on a secondary index created on Item and Store. In contrast, MDAM
uses the inherent clustering of HBase row keys to issue a series of probes and range jumps through the table,
reading only the minimal set of rows required to process the SQL statement. MDAM usage can be extended to a
broad range of data retrieval requests (e.g. IN lists on multiple key index columns, NOT equal (<>)
predicates, multivalued predicates, etc.), thus improving response times and reducing the need for
additional secondary indexes. It is also used to access tables with a “salted” row key efficiently.
• Rowset support, which is the ability to batch multiple SQL statements in a single request, thus reducing the
number of message exchanges between the client and the database engine.
• Availability enhancements including: service persistence (via Zookeeper) and automatic query
resubmission.
Figure 4. Optimized parallel execution
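The probe-and-range-jump idea in the MDAM bullet above can be sketched on an in-memory sorted key list. This is a simplified illustration, not Trafodion's actual MDAM implementation: the function name `mdam_scan` is hypothetical, the keys are assumed unique, and the leading Week column is assumed to be an integer so that the next week's range can be located with a single seek.

```python
import bisect


def mdam_scan(keys, item, store):
    """Find rows matching (item, store) in a list sorted by (week, item, store).

    Returns (matches, seeks). Week is not constrained by the caller, so the scan
    visits each distinct week once instead of reading every row.
    """
    matches, seeks = [], 0
    pos = 0
    while pos < len(keys):
        week = keys[pos][0]
        # Probe: seek directly to (week, item, store) inside the current week.
        seeks += 1
        i = bisect.bisect_left(keys, (week, item, store), pos)
        if i < len(keys) and keys[i] == (week, item, store):
            matches.append(keys[i])
        # Range jump: skip the rest of this week and land on the next week's first key.
        seeks += 1
        pos = bisect.bisect_left(keys, (week + 1,), i)
    return matches, seeks
```

With 3 weeks, 3 items, and 2 stores (18 rows), a query on one item and one store needs two seeks per week rather than a scan of all 18 rows. The saving grows with the number of rows per leading-key value.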
Figure 5 below summarizes many of the Trafodion optimizations discussed to this point. It shows that
Trafodion provides optimizations both for operational transaction workloads, which typically have very stringent
response time requirements (e.g. sub-second), and for operational query and reporting workloads,
which typically have more relaxed response time requirements (e.g. minutes to hours) and may include SQL
statements requiring highly complex operations that are best run in parallel.
Figure 5. Trafodion workload optimizations
Trafodion innovation - Distributed Transaction Management
Vanilla HBase provides only single-table, row-level ACID protection. Trafodion’s distributed transaction
management (DTM), in combination with the HBase-TRX service, extends transaction protection to
transactions spanning multiple SQL statements, multiple tables, or multiple rows of a single table. Additionally,
Trafodion DTM provides protection in a distributed cluster configuration across multiple HBase regions using
an inherent two-phase commit protocol. Transaction protection is automatically propagated across Trafodion
components and processes. Trafodion eliminates the two-phase commit protocol overhead for read-only
transactions and transactions updating only a single row. In the latter case, native HBase ACID protection is
used.
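The choice of commit path described above can be sketched as follows. This is a minimal illustration under stated assumptions: `Region` is a hypothetical stand-in for a transaction participant, and `commit_transaction` is an invented helper, not part of Trafodion DTM or HBase-TRX. It shows only the decision logic: no protocol for read-only work, a direct commit for a single-row update, and prepare/commit voting otherwise.

```python
class Region:
    """Hypothetical transaction participant (for example, one HBase region)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1 vote: True means this participant promises it can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def commit_transaction(writes):
    """Commit a transaction given its writes as (region, row_key) pairs.

    Returns a label naming the commit path that was taken.
    """
    if not writes:
        return "read-only: no commit protocol"
    if len({(region.name, key) for region, key in writes}) == 1:
        # A single-row update relies on the store's native row-level atomicity.
        writes[0][0].commit()
        return "single-row: native commit"
    regions = list({region.name: region for region, _ in writes}.values())
    # Phase 1: collect a vote from every participant (no short-circuiting).
    votes = [region.prepare() for region in regions]
    if all(votes):
        # Phase 2: every participant voted yes, so all commit.
        for region in regions:
            region.commit()
        return "two-phase: committed"
    for region in regions:
        region.abort()
    return "two-phase: aborted"
```

A production coordinator would also log its decision durably and handle participant timeouts; both are omitted here.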
The DTM provides support for implicit (auto-commit) and explicit (BEGIN, COMMIT, ROLLBACK WORK)
transaction control.
Using HBase’s Multi-Version Concurrency Control (MVCC) algorithm, Trafodion allows multiple transactions to
access the same rows concurrently. However, when concurrent transactions update the same row, the first
transaction to complete wins, and the others are notified at commit time that they failed due to an update conflict.
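The commit-time conflict rule can be illustrated with a small, single-threaded sketch. The classes `Store`, `Txn`, and `ConflictError` are hypothetical and are not Trafodion or HBase APIs. The sketch models only the "first to complete wins" check, not snapshot reads or durability.

```python
class ConflictError(Exception):
    """Raised at commit when another transaction already updated the same row."""


class Store:
    def __init__(self):
        self.data = {}      # row key -> current value
        self.version = {}   # row key -> logical timestamp of the last committed write
        self.clock = 0      # logical commit counter

    def begin(self):
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}    # buffered updates, invisible to others until commit

    def read(self, key):
        return self.writes.get(key, self.store.data.get(key))

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        store = self.store
        # Conflict check: fail if any row we wrote was committed by someone else
        # after this transaction started.
        for key in self.writes:
            if store.version.get(key, 0) > self.start_ts:
                raise ConflictError(key)
        store.clock += 1
        for key, value in self.writes.items():
            store.data[key] = value
            store.version[key] = store.clock
```

Two transactions that start together and update the same row illustrate the rule: the first `commit()` succeeds, and the second raises `ConflictError`, which the application can handle by retrying.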
High availability and data integrity features
Trafodion leverages the inherent availability and data integrity features of HBase and HDFS as shown in the
chart below.
Additionally, Trafodion can leverage any enterprise-class availability extensions that a given Hadoop
distribution may offer.
On top of the HBase and HDFS offered features, Trafodion provides a number of high availability features
including:
• Persistent connectivity services that ensure that a client is able to reestablish a connection in the event its
DCS service fails.
• Automatic query resubmission (AQR), which resubmits a SQL statement in the event the statement
fails in flight.
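The resubmission behavior can be pictured as a bounded retry loop. The sketch below is an assumption-laden illustration: the source does not say which failures qualify for resubmission or how many attempts are made, so `TransientError`, `run_with_resubmission`, and the `max_attempts` default are all hypothetical.

```python
class TransientError(Exception):
    """Hypothetical marker for a failure that is safe to retry."""


def run_with_resubmission(execute, sql, max_attempts=3):
    """Call execute(sql), resubmitting after transient failures.

    Non-transient exceptions propagate immediately; the last transient error is
    re-raised once max_attempts is exhausted.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return execute(sql)
        except TransientError as err:
            last_error = err  # remember the failure and resubmit
    raise last_error
```

Resubmission of this kind is only safe when re-running the statement cannot apply its effects twice, which is one reason it pairs naturally with transactional protection.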
Summary of Trafodion benefits
Trafodion delivers on the promise of a full featured and optimized transactional SQL-on-Hadoop DBMS
solution with full transactional data protection. This combination of HBase and an enterprise-class
transactional SQL engine overcomes Hadoop’s weaknesses in terms of supporting operational workloads.
Customers gain the following recognized benefits:
• Ability to leverage their in-house SQL knowledge and expertise versus having to learn complex map/reduce
programming.
• Seamless support for existing and new customer-written or ISV operational applications drives investment
protection and improved development productivity.
• Workload optimizations provide the foundation for the delivery of next-generation real-time transaction
processing applications.
• Guaranteed transactional consistency across multiple SQL statements, tables, and rows.
• Complements existing Hadoop investments and benefits: reduced cost, scalability, and elasticity.
• All with open source project sponsorship!
© Copyright 2015 Esgyn Corporation. August 2015
Where to go for more information
Learn more at http://www.esgyn.com
Email questions to [email protected]