Source: cis.csuohio.edu/~sschung/cis611/Nandithacis611termpaper.pdf

Presented by

Nanditha Thinderu

• Enterprise systems are highly distributed and heterogeneous, which makes administration a complex task.

• Application Performance Management (APM) tools were developed to retrieve information about failure rates and resource utilization.

• An APM platform has to monitor this big data with a tight resource budget and fast response times.

• APM refers to monitoring and managing enterprise software systems.

• There are two monitoring approaches:

• Black-box approach

• API-based approach

• By capturing every method invocation in an enterprise system, APM tools can generate a vast amount of data.

• An APM data record consists of a metric name, a value, and a timestamp.

• In the storage system, queries fall into two major types:

• Single-value lookups to retrieve the most current value

• Small scans for retrieving system health information

APM record structure: Metric Name, Value, Min, Max, Timestamp, Duration (a minimal Java sketch of such a record follows).
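Purely as an illustration, one APM sample with the fields above could be modeled as a small Java value class. All class, field, and metric names below are assumptions made for this sketch, not code from the paper.

    // Hypothetical model of one APM data record (names are illustrative only).
    public class ApmSample {
        final String metricName;  // e.g. "host42.cpu.utilization" (made-up metric)
        final double value;       // measured value
        final double min;         // minimum within the aggregation interval
        final double max;         // maximum within the aggregation interval
        final long timestamp;     // sample time in epoch milliseconds
        final long duration;      // length of the aggregation interval in ms

        ApmSample(String metricName, double value, double min, double max,
                  long timestamp, long duration) {
            this.metricName = metricName;
            this.value = value;
            this.min = min;
            this.max = max;
            this.timestamp = timestamp;
            this.duration = duration;
        }
    }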

• The Yahoo! Cloud Serving Benchmark (YCSB) is used to evaluate key-value stores with APM data properties.

• Because APM data is append-only, we define five workloads: R, RW, W, RS, and RSW.

• The benchmark comprises a data generator, a workload generator, and drivers for several key-value stores (a driver skeleton is sketched below).

• The goal was not only a pure performance comparison but also a broad overview of the available solutions.
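Each driver is a store-specific subclass of YCSB's abstract DB class. The skeleton below follows the signatures of the classic com.yahoo.ycsb.DB API as a rough sketch; the store-specific bodies (and the actual drivers used in the paper) are omitted.

    import java.util.HashMap;
    import java.util.Set;
    import java.util.Vector;
    import com.yahoo.ycsb.ByteIterator;
    import com.yahoo.ycsb.DB;

    // Skeleton of a YCSB driver; a return value of 0 signals success in this API.
    public class MyStoreClient extends DB {
        @Override
        public int read(String table, String key, Set<String> fields,
                        HashMap<String, ByteIterator> result) {
            // single-value lookup: fetch one record and copy its fields into 'result'
            return 0;
        }

        @Override
        public int scan(String table, String startkey, int recordcount,
                        Set<String> fields,
                        Vector<HashMap<String, ByteIterator>> result) {
            // small scan: read 'recordcount' records starting at 'startkey'
            return 0;
        }

        @Override
        public int insert(String table, String key,
                          HashMap<String, ByteIterator> values) {
            // append-only APM data maps naturally to inserts
            return 0;
        }

        @Override
        public int update(String table, String key,
                          HashMap<String, ByteIterator> values) {
            return 0;
        }

        @Override
        public int delete(String table, String key) {
            return 0;
        }
    }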

• The data stores used can be classified into three categories:

• Key-value stores: Project Voldemort and Redis

• Extensible record stores: HBase and Cassandra

• Scalable relational stores: MySQL Cluster and VoltDB

• We used HBase v0.90.4 running on top of Hadoop v0.20.205.0.

• Because HBase uses HDFS, it also requires the installation and configuration of Hadoop.

• Tables in HBase are accessed through its client API.
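As a minimal sketch of that client API (against the HBase 0.90 Java client; the table name, column family, and row-key layout here are assumptions, not the paper's schema):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
            HTable table = new HTable(conf, "usertable");      // table name assumed

            // Write one sample; row key = metric name + timestamp (assumed layout).
            byte[] rowKey = Bytes.toBytes("host42.cpu.utilization:1339600000000");
            Put put = new Put(rowKey);
            put.add(Bytes.toBytes("data"), Bytes.toBytes("value"), Bytes.toBytes("0.75"));
            table.put(put);

            // Single-value lookup of the same row.
            Result result = table.get(new Get(rowKey));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("data"), Bytes.toBytes("value"))));

            table.close();
        }
    }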

• We used the then-recent Cassandra 1.0.0.rc2 and the default RandomPartitioner, which distributes the data across the nodes randomly.

• The Cassandra YCSB client requires only a single column family to store all fields, each field corresponding to a column.

• Cassandra is a decentralized, symmetric system and employs consistent hashing to distribute values across the nodes.
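One way to illustrate that one-column-family layout from Java is the sketch below, which uses the Hector client library; the paper's YCSB client may use a different client interface, and the cluster, keyspace, column family, and key names are assumptions for this example.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class CassandraExample {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("apm", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("usertable", cluster);

            // All fields of a record become columns of one column family ("data").
            Mutator<String> mutator =
                    HFactory.createMutator(keyspace, StringSerializer.get());
            String key = "host42.cpu.utilization:1339600000000";
            mutator.addInsertion(key, "data", HFactory.createStringColumn("value", "0.75"));
            mutator.addInsertion(key, "data", HFactory.createStringColumn("min", "0.10"));
            mutator.execute();

            HFactory.shutdownCluster(cluster);
        }
    }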

• We used Project Voldemort 0.90.1 with the embedded BerkeleyDB storage engine; a Voldemort YCSB client was already implemented, and configuration was easy for the most part.

• Voldemort is a highly scalable storage system with a simpler design compared to a relational database.
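For reference, a put/get round trip against Voldemort's native Java client looks roughly like the sketch below (the bootstrap URL, store name, and key are assumptions for this example):

    import voldemort.client.ClientConfig;
    import voldemort.client.SocketStoreClientFactory;
    import voldemort.client.StoreClient;
    import voldemort.client.StoreClientFactory;
    import voldemort.versioning.Versioned;

    public class VoldemortExample {
        public static void main(String[] args) {
            StoreClientFactory factory = new SocketStoreClientFactory(
                    new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
            StoreClient<String, String> client = factory.getStoreClient("usertable");

            // Store and retrieve one value; Voldemort wraps values with version metadata.
            client.put("host42.cpu.utilization:1339600000000", "0.75");
            Versioned<String> value = client.get("host42.cpu.utilization:1339600000000");
            System.out.println(value.getValue());

            factory.close();
        }
    }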

• We used Redis v2.4.2, since the cluster version was in an unstable state and could not run a complete test.

• We updated the default Redis YCSB client to use a sharded Jedis connection pool (ShardedJedisPool).

• For data storage, the client uses a Redis hash map as well as a sorted set.
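The sketch below mirrors that design with the Jedis client: record fields go into a Redis hash, and the record key is additionally added to a sorted set so that scans are possible. For simplicity it uses a single-node JedisPool instead of the sharded pool, and all key and field names are assumptions.

    import java.util.HashMap;
    import java.util.Map;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPool;
    import redis.clients.jedis.JedisPoolConfig;

    public class RedisExample {
        public static void main(String[] args) {
            JedisPool pool = new JedisPool(new JedisPoolConfig(), "localhost");
            Jedis jedis = pool.getResource();
            try {
                String key = "host42.cpu.utilization:1339600000000";

                // Record fields are stored in a hash under the record key.
                Map<String, String> fields = new HashMap<String, String>();
                fields.put("value", "0.75");
                fields.put("timestamp", "1339600000000");
                jedis.hmset(key, fields);

                // The key is also indexed in a sorted set to support scans.
                jedis.zadd("_indices", key.hashCode(), key);

                System.out.println(jedis.hgetAll(key));
            } finally {
                pool.returnResource(jedis);
            }
            pool.destroy();
        }
    }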

• We used VoltDB v2.1.3 with the default configuration.

• A YCSB client driver for VoltDB that connects to all servers in the cluster was implemented.
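VoltDB is accessed through stored procedures; a call through its Java client looks roughly like this sketch (the server host names, procedure name, and parameters are assumptions for illustration):

    import org.voltdb.client.Client;
    import org.voltdb.client.ClientFactory;
    import org.voltdb.client.ClientResponse;

    public class VoltDbExample {
        public static void main(String[] args) throws Exception {
            Client client = ClientFactory.createClient();

            // Connect to every server in the cluster, as the driver above does.
            client.createConnection("server1");
            client.createConnection("server2");

            // Invoke a stored procedure that inserts one sample.
            ClientResponse response = client.callProcedure(
                    "InsertSample", "host42.cpu.utilization", 1339600000000L, 0.75);
            if (response.getStatus() == ClientResponse.SUCCESS) {
                System.out.println("insert acknowledged");
            }

            client.close();
        }
    }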

• We used MySQL v5.5.17 with InnoDB as the storage engine.

• A generic RDBMS YCSB client, which connects to the databases using JDBC, was used.
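A minimal JDBC round trip of the kind such a client performs is sketched below (the connection URL, credentials, table, and column names are assumptions, not the benchmark's actual schema):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class JdbcExample {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/ycsb", "ycsb", "secret");

            // Insert one record (append-only APM data means mostly inserts).
            PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO usertable (ycsb_key, field0) VALUES (?, ?)");
            insert.setString(1, "host42.cpu.utilization:1339600000000");
            insert.setString(2, "0.75");
            insert.executeUpdate();

            // Single-value lookup by key.
            PreparedStatement select = conn.prepareStatement(
                    "SELECT field0 FROM usertable WHERE ycsb_key = ?");
            select.setString(1, "host42.cpu.utilization:1339600000000");
            ResultSet rs = select.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString("field0"));
            }

            conn.close();
        }
    }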

• Workload R is the most read-intensive workload, with 95% reads and only 5% writes. Latencies and throughput are presented on a logarithmic scale.

• Redis has the highest throughput.

• HBase has the highest read latency.

• Cassandra has the highest write latency.

• In the second experiment, workload RW is used, which has 50% writes.

• VoltDB achieves the highest throughput for one node, which is slightly lower than for workload R.

• For write latency, HBase and MySQL show important differences compared to workload R.

• Workload W is the one closest to the APM use case.

• It has a 99% write rate.

• The throughput results are similar to workload RW.

• For read latency, the apparent change is the high latency of HBase.

• For write latency, HBase has increased significantly.

• Workload RS has 47% read, 47% scan, and 6% write operations.

• MySQL has the best throughput for a single node.

• Cassandra and HBase obtain a linear increase in throughput with the number of nodes.

• Workload RSW has 50% reads, of which 25% are scans.

• Most of the results are similar to workload RS.

• In this experiment we used 8 nodes of each system.

• The results are calculated for workload R.

• We observe varying latencies for the different key-value stores.

• The write latencies show a similar development for Cassandra, Voldemort, and Redis.

• The most storage-efficient system is HBase.

• Redis and VoltDB are omitted here, as they do not store data on disk.

• Cassandra stores the data most efficiently.

• The disk usage can be reduced by compression.

• A series of tests was conducted on cluster D.

• The throughput increases for all systems with higher ratios.

• Project Voldemort has the best read latency.

• HBase has a low write latency and is the best for workload RW.

• Cassandra: It achieves the highest throughput for the maximum number of nodes, and its performance is best for high write rates.

• HBase: HBase's throughput is the lowest for one node but increases linearly with the number of nodes. It has a low write latency; however, its read latency is much higher than that of the other systems.

• Project Voldemort: Its read and write latencies are low, similar to each other, and stable.

• MySQL: It achieved a high throughput; however, its latency decreases with the number of nodes.

• Redis: It has a high throughput that exceeds all other systems for read-intensive workloads, but its latencies decrease for both read and write operations.

• VoltDB: The performance is high for a single instance, but it never achieved a throughput increase with more than one node.

• We optimized each system for our workload and tested it with a number of open connections that was four times higher than the number of cores in the host CPUs.

• Higher numbers of connections led to congestion and slowed the systems down considerably, while lower numbers did not fully utilize the systems.

• This configuration resulted in an average request-processing latency that was much higher than in previously published performance measurements.

• Since our use case does not have the strict latency requirements that are common in online applications and similar environments, the latencies in most results are still adequate.