Page 1

Adaptation in distributed NoSQL data stores

Kostas Magoutis

Department of Computer Science and Engineering

University of Ioannina, Greece

Institute of Computer Science (ICS)

Foundation for Research and Technology – Hellas (FORTH)

Page 2

Workload variations lead to SLO violations

SummerSOC 2018

[Figure: Latency over Time, with the SLO marked. Annotations: "Impact of workload increase", "Adapt by adding a node"]

However, rebalancing has an impact

Page 3

Background tasks lead to SLO violations


Impact of background-task induced overload at leader node

[Figure: Throughput, with the SLO marked]

Adapt by reorganizing replica groups (changing leader)

Page 4

Agenda

• Workload or resource variations ⇒ SLO violations

– Need to adapt to maintain SLO

– Examples: Elasticity, rebalancing, reconfiguration

• Feedback-loop based adaptation

– Performance modeling via systematic measurements

– Importance of fast, light rebalancing actions

• Adapting via overhead-hiding operations

– Replica group leadership change

– Hide overhead at the leader

Page 5

NoSQL data stores: overview

[Diagram: the data model is mapped to horizontal partitions (shards), which are mapped to servers. Labels: Indexing (B+-tree, LSM); Replicas (primary, backup, backup)]
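The two mapping steps in the overview (keys to shards, shards to replica servers) can be sketched as follows. This is a minimal illustration only, assuming hash partitioning and a fixed replica group per shard; the shard count, the server names, and the helpers `shard_for` and `replicas_for` are hypothetical and not taken from the talk. Real data stores may use range partitioning or consistent hashing instead.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only

# HYPOTHETICAL shard -> replica group mapping: the first server acts as the
# primary, the remaining ones as backups.
REPLICA_GROUPS = {
    0: ["server-a", "server-b", "server-c"],
    1: ["server-b", "server-c", "server-d"],
    2: ["server-c", "server-d", "server-a"],
    3: ["server-d", "server-a", "server-b"],
}

def shard_for(key: str) -> int:
    """Map a row key to a horizontal partition via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def replicas_for(key: str):
    """Return (primary, backups) for the shard that owns the key."""
    group = REPLICA_GROUPS[shard_for(key)]
    return group[0], group[1:]
```

Calling `replicas_for("row-18")` returns the primary and the two backups of whichever shard the key hashes to.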

Page 6

QoS architecture for NoSQL data stores

Maria Chalkiadaki and Kostas Magoutis, Managing Service Performance in the Cassandra Distributed Storage System, in Proc. of 5th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2013)

Page 7

Provisioning methodology

• Prediction of service capacity requirements

• Tables of measured performance results

– Response time

– Throughput

Page 8

Provisioning methodology

QoS specification:

• 100% reads

• Zipf distribution

• Load: 512 threads

• Resp. time: 35 ms
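A table-driven prediction of this kind can be sketched as a lookup with interpolation between measured points. The sketch below is an assumption-laden illustration, not the method of the cited papers: the measurement values in `MEASURED_MS` are invented, and the helpers `interpolate` and `smallest_cluster` are hypothetical names.

```python
from bisect import bisect_left

# HYPOTHETICAL measurements (not from the talk): for each cluster size, a list
# of (load in client threads, measured response time in ms) pairs.
MEASURED_MS = {
    3: [(128, 12.0), (256, 24.0), (512, 52.0)],
    4: [(128, 10.0), (256, 19.0), (512, 41.0)],
    6: [(128, 8.0), (256, 14.0), (512, 30.0)],
}

def interpolate(points, x):
    """Piecewise-linear interpolation over sorted (x, y) points, clamped at both ends."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def smallest_cluster(load_threads, target_ms):
    """Smallest measured cluster size whose predicted response time meets the target."""
    for size in sorted(MEASURED_MS):
        if interpolate(MEASURED_MS[size], load_threads) <= target_ms:
            return size
    return None  # no measured configuration satisfies the QoS target
```

With these invented numbers, a specification of 512 threads and a 35 ms target would select the six-node configuration.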

Page 9

Exploring the accuracy of different regression approaches

Flora Karniavoura and Kostas Magoutis, A Measurement-based Approach to Performance Prediction in NoSQL Systems, in Proc. of 25th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017)

• Interpolation exhibits 70-80% (avg) prediction accuracy in most cases – can we improve on this?

• Evaluate prediction accuracy using more advanced regression methods

– Multivariate adaptive regression splines (MARS)

– Support vector regression (SVR)

– Artificial neural networks (ANN)

Page 10

Overall results

Predict performance for different cluster sizes

Flora Karniavoura and Kostas Magoutis, A Measurement-based Approach to Performance Prediction in NoSQL Systems, in Proc. of 25th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017)

Page 11

Overall results

Predict performance for different load levels

Flora Karniavoura and Kostas Magoutis, A Measurement-based Approach to Performance Prediction in NoSQL Systems, in Proc. of 25th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017)

Page 12

Overall results

Predict performance for different update settings

Flora Karniavoura and Kostas Magoutis, A Measurement-based Approach to Performance Prediction in NoSQL Systems, in Proc. of 25th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017)

Page 13

Overall results

MARS provides better accuracy in all test cases

• MARS provides excellent accuracy

• SVR, ANN involve tuning (kernel, activation function)

Flora Karniavoura and Kostas Magoutis, A Measurement-based Approach to Performance Prediction in NoSQL Systems, in Proc. of 25th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017)

Page 14

Feedback-based control

r(k): reference input, the desired value of the measured output (e.g., 66% CPU utilization)

e(k): control error, the difference between reference input and measured output

u(k): control input, the setting of the parameter(s) that manipulate the system

y(k): measured output, a measurable characteristic of the target system (e.g., CPU utilization)

Transducer: transforms the measured output so that it can be compared to the reference input (e.g., by smoothing)

T. Abdelzaher et al. “Introduction to control theory and its application to computing systems,” Performance Modeling and Engineering, Springer, 2008

Page 15

Behavior of a stable system

Reference input r_ss changes from 0 to 2

Measured output eventually converges to y_ss = 3

Settling time: k_s

Maximum overshoot

Steady-state error: e_ss = r_ss - y_ss = -1

T. Abdelzaher et al. “Introduction to control theory and its application to computing systems,” Performance Modeling and Engineering, Springer, 2008

Goals: Stability, Accuracy, Short settling times, does not Overshoot (SASO)
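As a worked reading of the annotated values, assuming the steady-state figures shown on the slide (reference input 2, measured output 3), the steady-state error is:

```latex
e_{ss} = r_{ss} - y_{ss} = 2 - 3 = -1
```

A negative steady-state error means the measured output settles above the reference input; the accuracy goal in SASO corresponds to driving e_ss toward 0.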

Page 16

Integral control

y(k): measured output, a measurable characteristic of the target system (e.g., CPU utilization)

Transducer: transforms the measured output so that it can be compared to the reference input (e.g., by smoothing)

T. Abdelzaher et al. “Introduction to control theory and its application to computing systems,” Performance Modeling and Engineering, Springer, 2008

Integral controller: provides incremental adjustments to u(k)
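The incremental adjustment can be illustrated with the standard integral control law u(k) = u(k-1) + K_I * e(k), where e(k) = r(k) - y(k). The sketch below is a toy simulation under stated assumptions: the first-order plant y(k+1) = a*y(k) + b*u(k), its parameters `a` and `b`, and the gain value are hypothetical and do not model any system from the talk.

```python
def simulate_integral_control(r, k_i, steps, a=0.5, b=1.0):
    """Simulate u(k) = u(k-1) + k_i * e(k) on a toy plant y(k+1) = a*y(k) + b*u(k)."""
    y, u = 0.0, 0.0
    trace = []
    for _ in range(steps):
        e = r - y          # control error e(k) = r(k) - y(k)
        u = u + k_i * e    # integral action: incremental adjustment of u(k)
        trace.append((y, u, e))
        y = a * y + b * u  # HYPOTHETICAL first-order plant response
    return trace
```

For example, `simulate_integral_control(r=0.66, k_i=0.2, steps=80)` uses the 66% utilization target as the reference input; with these assumed plant parameters the loop is stable and the error shrinks toward zero. A larger gain shortens settling time at the risk of overshoot or instability.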

Page 17

Reducing the impact of data rebalancing via incremental elasticity

[Figure: Processing capacity at joining node]

Results in smoother elasticity action


Antonis Papaioannou and Kostas Magoutis, Incremental elasticity for NoSQL data stores, in Proc. of 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017), Atlanta, GA, USA, June 5-8, 2017 (poster)

Antonis Papaioannou and Kostas Magoutis, Incremental elasticity for NoSQL data stores, in Proc. of 36th Symposium on Reliable Distributed Systems (SRDS 2017), Hong Kong, China, September 27-29, 2017 (full paper)

Page 18

Impact on response-time SLO

• YCSB Workload B (95% reads, 5% updates), SLO 50 ms

• Load surge: 20 -> 30 YCSB threads

• Elasticity action 5 minutes after the surge

[Figure annotations: "Further response-time increase"; "Smoother transition to new state"]

Page 19

Adapting to impact of background tasks


Impact of background-task induced overload at leader node

[Figure: Throughput, with the SLO marked]

Adapt by reorganizing replica groups (changing leader)

Page 20

Replica-group leadership change as a performance enhancing mechanism

• Proactive replica group reorganizations provide rapid remedy to upcoming performance issues

– Lightweight adaptation actions

• Replica group management increasingly possible via programmable APIs in NoSQL data stores

– Examples: MongoDB, RethinkDB (both primary-backup)


P. Garefalakis, P. Papadopoulos, K. Magoutis, “ACaZoo: A Distributed Key-Value Store based on Replicated LSM-Trees”, Proc. 33rd IEEE Symposium on Reliable Distributed Systems (SRDS’14) 2014, Nara, Japan, Oct 6-9, 2014. Best Student Paper

A. Papaioannou, K. Magoutis, “Replica-group leadership change as a performance enhancing mechanism in NoSQL data stores”, 38th IEEE International Conference on Distributed Computing Systems (ICDCS’18), Vienna, Austria, Jul 6-9, 2018

Page 21

High-level view of replicated data store

[Diagram: multiple nodes, each labeled LSM]

Page 22

Log-structured merge (LSM) trees

[Diagram: LSM-tree write path, split into Memory and Disk. Labels: put (write), commit, flush, Compactions; memtable, Write-ahead log (WAL), SSTables; components C0, C1, ..., CK]

[Example data: rows row-1 through row-18 with versioned cells (e.g., A18 - v1, B18 - v3, Foo18 - v1, XYZ18 - v2, foobar18 - v1) in Column Family 1 and Column Family 2, stored under a CF prefix]

Coordinates for a cell: Row Key, Column Family Name, Column Qualifier, Version
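The interplay of the labeled components (write-ahead log, memtable, SSTables, flush, compaction) can be sketched with a toy in-memory model. This is a simplified illustration under stated assumptions, not how any particular data store implements its LSM tree: the class `TinyLSM`, its thresholds, and the use of Python lists in place of durable files are all hypothetical.

```python
class TinyLSM:
    """Toy LSM tree: WAL and memtable, plus sorted runs standing in for SSTables."""

    def __init__(self, memtable_limit=2, compaction_trigger=3):
        self.wal = []        # write-ahead log (a list standing in for a durable file)
        self.memtable = {}   # in-memory component
        self.sstables = []   # newest run last; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit
        self.compaction_trigger = compaction_trigger

    def put(self, key, value):
        self.wal.append((key, value))  # commit to the log first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.wal.clear()               # flushed data no longer needs the log
        if len(self.sstables) >= self.compaction_trigger:
            self.compact()

    def compact(self):
        merged = {}
        for run in self.sstables:      # oldest to newest, so newer values win
            merged.update(run)
        self.sstables = [sorted(merged.items())]

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run first
            for k, v in run:
                if k == key:
                    return v
        return None
```

In this toy model compaction runs inline with the write that triggers it; in a real store it is a background task that competes with foreground requests for I/O and CPU on the same node.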

Page 23

High-level view of replicated data store

[Diagram repeated from Page 21: multiple nodes, each labeled LSM]

Key idea: Change leader before it starts a compaction

Page 24

When to change a leader, whom to elect


A. Papaioannou, K. Magoutis, “Replica-group leadership change as a performance enhancing mechanism in NoSQL data stores”, 38th IEEE International Conference on Distributed Computing Systems (ICDCS’18), Vienna, Austria, Jul 6-9, 2018
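The two questions in the slide title (when to change, whom to elect) can be illustrated with a simple decision function. The transcript does not reproduce the policy of the cited paper, so the sketch below is purely hypothetical: it assumes each replica reports a `compaction_pressure` metric (how close it is to triggering a compaction), and the threshold value and the names `Replica` and `plan_leader_change` are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    is_leader: bool
    # HYPOTHETICAL metric: fraction of the compaction trigger already reached (0.0 to 1.0).
    compaction_pressure: float

def plan_leader_change(replicas, threshold=0.9):
    """Return the replica to promote, or None if the current leader should stay."""
    leader = next(r for r in replicas if r.is_leader)
    if leader.compaction_pressure < threshold:
        return None  # when: act only shortly before the leader would compact
    followers = [r for r in replicas if not r.is_leader]
    if not followers:
        return None
    # whom: the follower furthest from its own compaction
    candidate = min(followers, key=lambda r: r.compaction_pressure)
    if candidate.compaction_pressure < leader.compaction_pressure:
        return candidate
    return None  # no follower is better placed than the current leader
```

The returned replica would then be promoted through whatever leadership-change mechanism the data store exposes.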

Page 25

Experimental results

[Figure: Standard MongoDB RocksDB. Panels: 90% reads, 10% writes; 50% reads, 50% writes]

Page 26

Experimental results

[Figure: MongoDB RocksDB with leadership changes. Panels: 90% reads, 10% writes; 50% reads, 50% writes]

Page 27

Data backup

[Figure panels: MongoDB RocksDB; MongoDB RocksDB with leadership changes]

Page 28

Cross-layer management of data stores

[Diagram: Cluster with a Management API, accessed over HTTP]

Page 29

Cross-layer management of data stores

[Diagram, extended: Management API (HTTP); Cluster; Pod; ReplicaSet; Autoscaler; Monitoring (monitor)]

Page 30

Cross-layer management of data stores

[Diagram, extended: Management API (HTTP); Cluster; Pod; ReplicaSet; Autoscaler; Monitoring (monitor hook); Container hooks; Manager receiving Events]

Events: Starting up, Going down, Reduced performance

Page 31

Cross-layer management of data stores

E. Bekas, K. Magoutis, “Cross-layer management of a containerized NoSQL data store”, 15th IFIP/IEEE International Symposium on Integrated Network Management (IM 2017), 8-12 May 2017

A. Papaioannou, D. Metallidis, K. Magoutis, “Cross-layer management of distributed applications on multi-clouds”, 14th IFIP/IEEE International Symposium on Integrated Network Management (IM 2015), Ottawa, Canada, May 11-15, 2015

[Diagram, complete: Management API (HTTP); Cluster; Pod; ReplicaSet; Autoscaler; Monitoring (monitor hook); Container hooks; Manager exchanging Events and Actions]

Events: Starting up, Going down, Reduced performance

Actions: Change primary, Migrate replica, Add server

Actions: Set ReplicaSet, Set Autoscaler
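The event and action labels on this slide suggest a manager that translates infrastructure-level events into data-store actions. The sketch below illustrates one possible shape of such a mapping. The event and action names come from the slide, but the `POLICY` table, the `handle_event` function, and its `hosts_primary` parameter are hypothetical and not taken from the cited papers.

```python
from enum import Enum

class Event(Enum):
    STARTING_UP = "Starting up"
    GOING_DOWN = "Going down"
    REDUCED_PERFORMANCE = "Reduced performance"

class Action(Enum):
    CHANGE_PRIMARY = "Change primary"
    MIGRATE_REPLICA = "Migrate replica"
    ADD_SERVER = "Add server"

# HYPOTHETICAL policy table: which data-store actions to issue per event.
POLICY = {
    Event.STARTING_UP: [],
    Event.REDUCED_PERFORMANCE: [Action.CHANGE_PRIMARY],
    Event.GOING_DOWN: [Action.CHANGE_PRIMARY, Action.MIGRATE_REPLICA],
}

def handle_event(event, node, hosts_primary):
    """Return the (action, node) pairs the manager would issue for an event."""
    actions = POLICY[event]
    if not hosts_primary:
        # Nothing to hand over if the affected node holds no primary replica.
        actions = [a for a in actions if a is not Action.CHANGE_PRIMARY]
    return [(a, node) for a in actions]
```

For instance, a "Reduced performance" event on a node that hosts a primary would yield a single "Change primary" action for that node.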

Page 32

Experimental testbed

2 vCPUs, 2.6 GHz Intel Xeon E5, 13 GB RAM, SSD

shard = horizontal partition

YCSB settings:

• 50-50 reads/writes

• 16 client threads

• Target: 1000 operations/sec

Page 33

Temporary offload via RG reorganization

• Move S1 primary out of Node 1

Page 34

Temporary offload via RG reorganization

Page 35

Summary

• Proactive replica group reorganizations provide rapid remedy to upcoming performance issues

– Lightweight adaptation actions

• Functionality previously unavailable, as infrastructure-level events were invisible to NoSQL middleware

– Richer feedback useful: How long is impact expected to last?

Page 36

References

• A. Papaioannou, K. Magoutis, “Replica-group leadership change as a performance enhancing mechanism in NoSQL data stores”, 38th IEEE International Conference on Distributed Computing Systems (ICDCS’18), Vienna, Austria, Jul 6-9, 2018

• Antonis Papaioannou and Kostas Magoutis, “Incremental elasticity for NoSQL data stores”, in Proc. of 36th Symposium on Reliable Distributed Systems (SRDS 2017), Hong Kong, China, Sep 27-29, 2017

• Flora Karniavoura and Kostas Magoutis, “A Measurement-based Approach to Performance Prediction in NoSQL Systems”, in Proc. of 25th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2017)

• Antonis Papaioannou and Kostas Magoutis, “Incremental elasticity for NoSQL data stores”, in Proc. of 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017), Atlanta, GA, USA, June 5-8, 2017

• E. Bekas, K. Magoutis, “Cross-layer management of a containerized NoSQL data store”, in Proc. of 15th IFIP/IEEE International Symposium on Integrated Network Management (IM 2017), 8-12 May 2017

• A. Papaioannou, D. Metallidis, K. Magoutis, “Cross-layer management of distributed applications on multi-clouds”, IFIP/IEEE International Symposium on Integrated Network Management (IM 2015), Ottawa, Canada, May 11-15, 2015

• P. Garefalakis, P. Papadopoulos, K. Magoutis, “ACaZoo: A Distributed Key-Value Store based on Replicated LSM-Trees”, Proc. 33rd IEEE Symposium on Reliable Distributed Systems (SRDS’14) 2014, Nara, Japan, Oct 6-9, 2014. Best Student Paper

• Maria Chalkiadaki and Kostas Magoutis, “Managing Service Performance in the Cassandra Distributed Storage System”, in Proc. of 5th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2013), Bristol, UK, December 2-5, 2013

Page 37

Questions?

H2020 GA no. 731846 EU project