Presented by Nanditha Thinderu
cis.csuohio.edu/~sschung/cis611/Nandithacis611termpaper.pdf


Page 1

Presented by Nanditha Thinderu

Page 2

• Enterprise systems are highly distributed and heterogeneous, which makes administration a complex task.

• Application Performance Management (APM) tools were developed to retrieve information about failure rates and resource utilization.

• This calls for an APM platform that monitors big data with a tight resource budget and fast response times.

Page 3

• APM refers to monitoring and managing enterprise software systems.

• The two approaches are:
  • Black-box approach
  • API-based approach

• By capturing every method invocation in an enterprise system, APM tools can generate a vast amount of data.

Page 4

• APM data consists of a metric name, a value, and a timestamp.

• In the storage system, queries fall into two major types:

• Single-value lookups to retrieve the most current value

• Small scans to retrieve system health information

Metric Name | Value | Min | Max | Timestamp | Duration
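The record layout and the two query types can be illustrated with a minimal in-memory sketch. It is not taken from the presentation: the class names `Measurement` and `MetricStore`, the units of `timestamp` and `duration`, and the one-list-per-metric layout are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    # Fields follow the column header: metric name, value, min, max, timestamp, duration.
    metric_name: str
    value: float
    minimum: float
    maximum: float
    timestamp: int  # start of the reporting interval in epoch seconds (assumed unit)
    duration: int   # interval length in seconds (assumed unit)


class MetricStore:
    """Toy append-only store that keeps one time-ordered list per metric name."""

    def __init__(self):
        self._series = {}

    def append(self, m):
        series = self._series.setdefault(m.metric_name, [])
        # Append-only data is expected to arrive in timestamp order.
        if series and m.timestamp < series[-1].timestamp:
            raise ValueError("out-of-order measurement")
        series.append(m)

    def latest(self, metric_name):
        """Single-value lookup: the most current measurement, or None."""
        series = self._series.get(metric_name)
        return series[-1] if series else None

    def scan(self, metric_name, start_ts, end_ts):
        """Small scan: measurements with start_ts <= timestamp <= end_ts."""
        series = self._series.get(metric_name, [])
        stamps = [m.timestamp for m in series]
        return series[bisect_left(stamps, start_ts):bisect_right(stamps, end_ts)]
```

A monitoring dashboard would call `latest` for the current value of a metric and `scan` for a short recent window, which matches the lookup and small-scan query types listed above.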

Page 5

• The Yahoo! Cloud Serving Benchmark (YCSB) is used to evaluate key-value stores against workloads with APM properties.

• We define five workloads (R, RW, W, RS, RSW); because APM data is append-only, writes are inserts rather than updates.

• YCSB comprises a data generator, a workload generator, and drivers for several key-value stores.
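A workload generator of this kind essentially draws operations according to a fixed mix. The sketch below is not YCSB code; it only illustrates the idea, using the ratios stated on the later workload pages of this deck. Two points are assumptions: the remainder of each mix is taken to be reads where only the write share is given, and the RSW split (25% read, 25% scan, 50% write) is one reading of "50% reads of which 25% are scans". The names `WORKLOADS` and `generate_operations` are hypothetical.

```python
import random

# Operation mixes per workload. Values not stated explicitly on the slides
# (for example the 1% reads in W, or the RSW split) are assumptions.
WORKLOADS = {
    "R":   {"read": 0.95, "scan": 0.00, "write": 0.05},
    "RW":  {"read": 0.50, "scan": 0.00, "write": 0.50},
    "W":   {"read": 0.01, "scan": 0.00, "write": 0.99},
    "RS":  {"read": 0.47, "scan": 0.47, "write": 0.06},
    "RSW": {"read": 0.25, "scan": 0.25, "write": 0.50},
}


def generate_operations(workload, count, seed=0):
    """Return `count` operation names drawn according to the workload's mix."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so that runs are repeatable
    ops = list(mix)
    weights = [mix[op] for op in ops]
    return rng.choices(ops, weights=weights, k=count)
```

Each generated operation name would then be dispatched to the driver of the store under test.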

Page 6

• The goal was not only to get a pure performance comparison but also a broad overview of available solutions.

• The data stores used can be classified into three categories:

• Key-value stores: Project Voldemort and Redis

• Extensible record stores: HBase and Cassandra

• Scalable relational stores: MySQL Cluster and VoltDB

Page 7

• We used HBase v0.90.4 running on top of Hadoop v0.20.205.0.

• Because HBase uses HDFS, it also requires the installation and configuration of Hadoop.

• Tables in HBase can be accessed through an API.

Page 8

• We used the recent Cassandra version 1.0.0.rc2 and the default RandomPartitioner, which distributes the data across the nodes randomly.

• We used the implemented Cassandra YCSB client, which only requires setting one column family to store all fields, each of them corresponding to a column.

• Cassandra is a symmetric system and employs consistent hashing to distribute the values across the nodes.
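Consistent hashing, as mentioned in the last bullet, can be sketched with a small hash ring. This is a generic illustration and not Cassandra's implementation: the class name `HashRing`, the virtual-node count, and the token scheme are assumptions chosen to keep the example short.

```python
import bisect
import hashlib


def _token(key):
    # MD5 serves here only as a stable, well-distributed hash (illustrative choice).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Maps keys to nodes by walking clockwise on a ring of hashed positions."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed at several ring positions to even out the load.
        self._ring = sorted(
            (_token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, key):
        # First ring position after the key's token, wrapping around at the end.
        idx = bisect.bisect_right(self._tokens, _token(key)) % len(self._ring)
        return self._ring[idx][1]
```

The useful property is that adding a node reassigns only the keys that now fall to the new node; all other keys stay where they were, which keeps data movement small when a cluster grows.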

Page 9

• We used Project Voldemort version 0.90.1 with the embedded BerkeleyDB storage and the already implemented Voldemort YCSB client; configuration was easy for the most part.

• It is a highly scalable storage system with a simpler design than a relational database.

Page 10

• We used Redis version 2.4.2 because the cluster version was in an unstable state and we could not run a complete test with it.

• We updated the default Redis YCSB client to use ShardedJedisPool.

• For data storage, YCSB uses a hash map as well as a sorted set.
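One plausible reason for combining the two structures is that a hash map gives direct lookups by key but has no order, while a sorted set keeps keys ordered so that scans can start at a given key. The sketch below models that combination in plain Python. It does not call Redis and is not the YCSB client code; the class name `HashAndSortedSetStore` and its methods are hypothetical.

```python
import bisect


class HashAndSortedSetStore:
    """In-memory model: a dict for record fields plus an ordered key index."""

    def __init__(self):
        self._records = {}  # key -> dict of fields (the hash map part)
        self._index = []    # keys kept in sorted order (stands in for the sorted set)

    def insert(self, key, fields):
        if key not in self._records:
            bisect.insort(self._index, key)  # keep the key index ordered
        self._records[key] = dict(fields)

    def read(self, key):
        """Point lookup through the hash map; returns None if the key is absent."""
        return self._records.get(key)

    def scan(self, start_key, count):
        """Return up to `count` records whose keys are >= start_key, in key order."""
        start = bisect.bisect_left(self._index, start_key)
        return [(k, self._records[k]) for k in self._index[start:start + count]]
```

Reads touch only the hash map, whereas scans consult the ordered index first and then fetch each record, so a scan costs one index lookup plus one read per returned key.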

Page 11

• We used VoltDB v2.1.3 and the default configuration.

• A YCSB client driver for VoltDB that connects to all servers was implemented.

Page 12

• We used MySQL v5.5.17 with InnoDB as the storage engine.

• The implemented RDBMS YCSB client connects to the databases using JDBC.

Page 14

• Workload R is the most read-intensive, with 95% reads and only 5% writes. We present latencies and throughput on a logarithmic scale.

• Redis has the highest throughput.

• HBase has the highest read latency.

• Cassandra has the highest write latency.

Page 15

• In the second experiment, workload RW is used, which has 50% writes.

• VoltDB achieves the highest throughput for one node, which is slightly lower than for workload R.

• In write latency, HBase and MySQL show important differences compared to workload R.

Page 16

• Workload W is the one closest to the APM use case.

• It has a 99% write rate.

• The throughput results are similar to those of workload RW.

• For read latency, the most apparent change is the high latency of HBase.

• The write latency of HBase has increased significantly.

Page 17

• Workload RS has 47% read, 47% scan, and 6% write operations.

• MySQL has the best throughput for a single node.

• Cassandra and HBase obtain a linear increase in throughput with the number of nodes.

Page 18

• Workload RSW has 50% reads, of which 25% are scans.

• Most of the results are similar to those of workload RS.

Page 19

• In this experiment, we used 8 nodes for each system.

• The results are reported for workload R.

• We observe varying latencies across the different key-value stores.

• The write latencies develop similarly for Cassandra, Voldemort, and Redis.

Page 20

• The least efficient system in terms of storage is HBase.

• Redis and VoltDB are omitted because they do not store data on disk.

• Cassandra stores the data most efficiently.

• Disk usage can be reduced by compression.

Page 21

• A series of tests was conducted on cluster D.

• The throughput increases for all systems with higher write ratios.

• Project Voldemort has the best read latency.

• HBase has a low write latency, which is best for workload RW.

Page 23

• Cassandra: It achieves the highest throughput for the maximum number of nodes, and its performance is best for high rates.

• HBase: Its throughput is the lowest for one node but increases linearly with the number of nodes. It has a low write latency; however, its read latency is much higher than in other systems.

• Project Voldemort: The read and write latencies are similar and stable at a low level.

• MySQL: It achieves high throughput; however, latency decreases with the number of nodes.

• Redis: It has a high throughput that exceeds all other systems for read-intensive workloads, but latencies decrease for both read and write operations.

• VoltDB: The performance is high for a single instance, but it never achieved a throughput increase with more than one node.

Page 25

• We optimized each system for our workload and tested it with a number of open connections that was 4 times higher than the number of cores in the host CPUs.

• Higher numbers of connections led to congestion and slowed down the systems considerably, while lower numbers did not fully utilize the systems.

• This configuration resulted in an average request-processing latency that was much higher than in previously published performance measurements.

• Since our use case does not have the strict latency requirements that are common in online applications and similar environments, the latencies in most results are still adequate.
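The connection sizing rule and its effect on measured latency can be sketched with a small closed-loop driver. This is an illustration only, not the benchmark harness used in the study: the function names are hypothetical, threads stand in for open connections, and `os.cpu_count()` reports the logical cores of the local machine, which may differ from the host CPUs the slide refers to.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

CONNECTIONS_PER_CORE = 4  # ratio stated on the slide


def connection_count(cores=None):
    """Number of open connections for a host with `cores` CPU cores."""
    cores = cores or os.cpu_count() or 1
    return CONNECTIONS_PER_CORE * cores


def run_closed_loop(operation, connections, ops_per_connection):
    """Call `operation` from `connections` threads; return (ops/s, mean latency in s)."""

    def worker(worker_id):
        latencies = []
        for _ in range(ops_per_connection):
            start = time.perf_counter()
            operation()
            latencies.append(time.perf_counter() - start)
        return latencies

    began = time.perf_counter()
    with ThreadPoolExecutor(max_workers=connections) as pool:
        results = pool.map(worker, range(connections))
        all_latencies = [lat for per_worker in results for lat in per_worker]
    elapsed = time.perf_counter() - began
    return len(all_latencies) / elapsed, sum(all_latencies) / len(all_latencies)
```

Running the driver with different values of `connections` against the same `operation` is one way to observe the trade-off described above: too few connections leave the system underused, while too many add queuing delay to every request.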
