Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge...

Low Latency for Cloud Data Management

Felix GessertDecember 18, 2018, Universität Hamburg, DBIS Group

Presentationis loading

+1% Revenue

Web PerformanceAn Open Challenge With a Huge Impact

100 ms faster → = $1.7 Billion

+20% Ad Sales→ = $19 Billion500 ms faster

3Greg Linden. Make Data Useful. 2006

Tammy Everts. Time Is Money: The Business Value of Web Performance. O’Reilly Media, 2016.

2. Network Delays

1. Backend Processing

4

3. Frontend Rendering

Cloud-Based Web ApplicationsThree Sources of Page Load Time

Latency is the ProblemThroughput vs. Latency

5

0

500

1000

1500

2000

2500

3000

3500

1 2 3 4 5 6 7 8 9 10

Pa

ge

Lo

ad

Tim

e (

ms)

Bandwidth in MBit/s (at 60ms latency)

0

500

1000

1500

2000

2500

3000

3500

4000

4500

240 220 200 180 160 140 120 100 80 60 40 20 0

Pa

ge

Lo

ad

Tim

e (

ms)

Latency in ms (at 5MBit/s bandwidth)

Mike Belshe. More Bandwidth Doesn’t Matter (much). Technical report, Google Inc., 2010.

Throughput Latency

VS

Latency is the ProblemThroughput vs. Latency

6Mike Belshe. More Bandwidth Doesn’t Matter (much). Technical report, Google Inc., 2010.

VS

2× Throughput=

Same Load Time

½ Latency≈

½ Load Time

Problem StatementFour Challenges

7

Latency ofDynamic Data

PolyglotPersistence

Transaction Abort Rates

Direct ClientAccess

1 2

3 4

Problem StatementResearch Question

8

How can the latency of retrieving dynamic datafrom cloud services be minimized in an

application- and database-independent way while maintaining strict consistency guarantees?

1. Latency

9

Problem StatementFour Challenges

2. Direct Access 3. Transactions

4. Polyglot Persistence

Cloud Data Manage-ment Middleware

Caching Dynamic Data

End-to-End Latency in Cloud-based Architectures

Background &Motivation

Outlook and Summary

10

Outline

Providing Modern NoSQL Systems as a Low-Latency DBaaS

1 2

Cache Sketches: Solving Staleness of Reads and Queries

3

The Future of Polyglot Data Management in the Cloud

4

Why is end-to-end latency an open problem?

Background & Motivation

Network Cloud Data ManagementFrontend

12



State of the Art: Web interface based on HTML, files,

APIs, and application logic Performance defined by critical

rendering path

Problems: No direct access or integration

to data management Storage maintained manually

13



State of the Art: Service interfaces use REST & HTTP Web caching can reduce

end-to-end latency

Problem: Web caching not compatible with

consistent dynamic data

13



State of the Art: SaaS, PaaS, and IaaS models Scalability & multi-tenancy

Problems: Combination of cloud services

entails high latency Common application building blocks

often re-implemented13



State of the Art: Scalability & high availability through

NoSQL systems Sharding & replication Database-as-a-Service (DBaaS) model

Problems: Lack of common data management

abstractions DBaaS model not supported Polyglot persistence manual & error-prone

13

Data Management

13


RDBMS

DocumentStore

Key-ValueStore

Wide-ColumnStore

DistributedFile System

Volatile Data

Large Data Sets

Critical Data

Static Files

Nested Data

Challenge:

mapping problem?

data database

How to tackle the

Data Management Techniques

18

LoggingUpdate-in-Place

CachingIn-Memory Storage

Append-Only Storage

Global Secondary IndexingLocal Secondary Indexing

Query PlanningAnalytics FrameworkMaterialized Views

Commit/Consensus ProtocolSynchronous

AsynchronousPrimary Copy

Update Anywhere

Range-ShardingHash-Sharding

Entity-Group ShardingConsistent Hashing

Shared-Disk

Query Processing

Sharding

Replication

Storage Management

Elasticity

Consistency

Read Latency

Write Throughput

Read Availability

Write Availability

Durability

Write Latency

Write Scalability

Read Scalability

Data ScalabilityScan Queries

ACID Transactions

Conditional or Atomic Writes

Joins

Sorting

Filter Queries

Full-text Search

Aggregation and Analytics

FunctionalRequirements

Non-FunctionalRequirements

[GWFR16]

19

LoggingUpdate-in-Place

CachingIn-Memory Storage

Append-Only Storage

Global Secondary IndexingLocal Secondary Indexing

Query PlanningAnalytics FrameworkMaterialized Views

Commit/Consensus ProtocolSynchronous

AsynchronousPrimary Copy

Update Anywhere

Range-ShardingHash-Sharding

Entity-Group ShardingConsistent Hashing

Shared-Disk

Query Processing

Sharding

Replication

Storage Management

Elasticity

Consistency

Read Latency

Write Throughput

Read Availability

Write Availability

Durability

Write Latency

Write Scalability

Read Scalability

Data ScalabilityScan Queries

ACID Transactions

Conditional or Atomic Writes

Joins

Sorting

Filter Queries

Full-text Search

Aggregation and Analytics

NoSQLToolbox

Access

Fast Lookups

RAM

RedisMemcache

Unbounded

AP CP

CassandraRiak

VoldemortAerospike

HBaseMongoDBCouchBaseDynamoDB

Complex Queries

HDD-Size Unbounded

Analytics

MongoDBRethinkDB

HBase, Accumulo

ElasticSearch, Solr

Hadoop, SparkParallel DWH

Cassandra, HBaseRiak, MongoDB

ACID Availability

RDBMSNeo4j

RavenDBMarkLogic

CouchDBMongoDBSimpleDB

Ad-hoc

CacheShopping-

basketOrder

HistoryOLTP Website

SocialNetwork

Big Data

VolumeVolume

CAP Query PatternConsistency

Example Applications

DecisionTree

[GWFR16]

Backend-as-a-Service

Database-as-a-Service

20

NoSQLToolbox

DecisionTree

Unified DataManagement API

QuerySupport

TransactionProcessing

Data Validation

Partial Updates

Schema Management

CodeExecution

AccessControl

Indexing & Configuration

ObjectPersistence

[GFW+14]

How can cloud data management be unified & combined with low latency?

Orestes: GoalsA Data Management Middleware for Low Latency

22

DatabaseIndependence

Low Latency withTunable Consistency

Scalable, Available, Multi-Tenant

DBaaS & BaaSFunctionality

Orestes ConceptOverview

23

Heterogeneous Data Stores

UnmodifiedDatabase Systems

Web and Mobile Applications

Scalable Data Management Platform(Multi-Tenancy, Scaling, Caching, Failover, …)

Data and Default Modules

Web Cachingfor Low Latency

DBaaS/BaaSMiddleware

Unified REST API

[GBR14, GB13]

How can dynamic data be accelerated through web caching?

The Web‘s Caching Model

25

Expiration-Based Caches:

An object x is consideredfresh for TTLx seconds

Server assigns TTLs for eachobject

Invalidation-Based Caches:

Expose object evictionoperation to the server

Client

Expiration-based Caches

Invalidation-based Caches

RequestPath

Server/DB

Invalidations,Objects

CacheHits

Browser Caches, Forward Proxies, ISP Caches

Content Delivery Networks, Reverse Proxies

[GSW+15]

ExpirationCache

InvalidationCache

Web Caching for Data ManagementOverview of Cache Sketch Method

26

0 4 020

invalidate

Add to Server Cache Sketch

30 1 110Compact Cache Sketch

ValidateFreshness

1

Data Cachedfor Fixed TTLWithout Cache Sketch:

Stale Cached Data

[GSW+15, GSW+17]

Client



RequestPath

Server/DB

Invalidations,Objects

CacheHits

Needs Invalidation? Server Cache Sketch

10201040

10101010

Counting Bloom Filter

Non-expiredObject Keys

Report Expirations and Writes

The Cache Sketch Approach

27

Needs Revalidation?

Client Cache Sketch

10101010Bloom filter

atconnect

Periodicevery Δ

seconds

attransaction

begin

MinimizeStaleness

MinimizeInvalidations

1

4

Initializationfrom Cache

1

2Δ-AtomicConsistency

3Cache-Aware Transactions

4InvalidationMinimization

32

[GSW+15]

Cache SketchMain Properties

28

To ensure Δ-atomicity the Cache Sketch at time t contains

key(x) of every object x that was written before it expired in

all caches.

t1

r(x)TTL

t1 + TTLt2

w(x)

t

ct

Retrieved Cache Sketch

t3

r(x) Δ

StalenessBound

timespan for which x c[GSW+15, GSW+17]

Cache SketchConstruction

29

To ensure compactness the Cache Sketch stores n keys

in a Bloom filter with m bits, k hash functions and a false

positive rate of 𝑓 ≈ 1 − exp𝑘⋅𝑛

𝑚

𝑘.

k hash functions m Bloom filter bits

1 0 0 1 1 0 1 1h1

hk

...keyfind(key)

Client Cache Sketch

Bits = 1

no

yes

GET request

Revalidation

Cache

Hit

Miss

key

key

Example

↓

11 KB in size

20 000 entries & 5% false positives

[GSW+15, GSW+17]

Writes Follow

Reads

Read Your

Writes

Monotonic

Reads

Monotonic

Writes

Δ-Atomicity

Linearizability

PRAM

Causal

Consistency

Sequential

Consistency

(Δ,p)-

Atomicity

Δ-Atomicity

(Δ,p)-

Atomicity

Read Your

Writes

Monotonic

Reads

Monotonic

Writes

PRAM

Writes Follow

Reads

Causal

Consistency

Linearizability

Sequential

Consistency

Controllable Consistency LevelsCache Sketch Guarantees

30

ControllableStaleness

DefaultGuarantees

Opt-in Guarantees

With CacheBypassing

P. Viotti and M. Vukoli´c. Consistency in Non-Transactional Distributed Storage Systems. ACM Computing Surveys, 2016.[WGW+18, GWR17, GWFR16, GR16, FWGR14, FWGR14]

Determining TTLsTrade-Off

31

Longer TTLsShorter TTLs

⇩ Cache Misses⇩ Invalidations⇩ False Positive Rate

VS

[GSW+17, GSW+15]

TTL EstimationOptimizing Expiration & Cacheability

32

1. Collect workload statistics for reads and writes

2. Estimate time to next write 𝐸[𝑇𝑤] or mark uncacheable

Constrained Adaptive TTL Estimator

C-LMModel

LWMAEstimator

EWMAEstimator

Ideal for PoissonProcesses

Quick Adaption to Changes

Converges forStatic Workloads

Highly Space-Efficient

[GSW+15]

Average throughput for YCSB workloadsA and B (YCSB benchmark):Average latency for YCSB workloadsA and B (YCSB benchmark):

Evaluation ResultsSimulation & Benchmarking

33

CDNClient MongoDBOrestesSetup:

Page load times with cachedinitialization (Simulation):

Ireland California

[GSW+15]

247 ms3837 ms

2763 ms1456 ms

4442 ms1576 ms

California

260 ms4226 ms

2645 ms2122 ms

753 ms1836 ms

266 ms5263 ms

3214 ms1622 ms

6573 ms1944 ms

277 ms9963 ms

6505 ms6325 ms

9321 ms4697 ms

BaqendAzureParse

FirebaseKinvey

Apiomat

Baqend

Baqend

Baqend

Frankfurt

Tokyo

Sydney

Evaluation ResultsIndustry Evaluation of Commercial Implementation

34[GSW+17, Ges17]

How can object caching be extended to query results?

Query CachingChallenges

36

Invalidation Detection

CacheCoherence

Query ResultRepresentation

When do queryresults change?

How to apply Cache Sketches to queries?

What is the best resultstructure for caching?

Q

[GSW+17]

Invalidation DetectionCache Coherence for Query Results

37

Update Orestes Cached Query Result

Cache Invalidation

0 1 110

Updated Cache Sketch

Real-TimeQueries

Add

Change

Remove

Query Events

Product A

Product B

Scalable Streaming System (InvaliDB)

Query Expression

↓Normalized

String

[WRG18, WGF+17, GSW+17, WGFR16]

Solution: Cost-based decision model weighsexpected round-trips vs. invalidations

[𝑢𝑟𝑙1, 𝑢𝑟𝑙2, 𝑢𝑟𝑙3]

Object ListsID Lists

[{𝑖𝑑: 𝑜𝑏𝑗1, 𝑛𝑎𝑚𝑒: "𝑎𝑙𝑖𝑐𝑒"},

{𝑖𝑑: 𝑜𝑏𝑗2, 𝑛𝑎𝑚𝑒: "𝑏𝑜𝑏"},{𝑖𝑑: 𝑜𝑏𝑗3, 𝑛𝑎𝑚𝑒: "𝑒𝑣𝑒"}]

⇩ Invalidations ⇩ Round-Trips

VS

38

Learning Result RepresenationsHandling Changes to Query Results

[GSW+17]

Evaluation ResultsQuery Caching for YCSB-Based Workloads

39

Throughput with growingrequest parallelsim:

Average end-to-end query andread latency:

11×

ThroughputImprovement

47×

Lower Query Latency

9×

Lower Read Latency

[GSW+17]

Can Cache Sketches improve transaction performance?

Problem of Optimistic TransactionsAbort Rates Depend on Latency

41

10 ms

50 ms100 ms150 ms

Transaction Abort Rates Increase

Exponentially withLatency

[GBR14]

Distributed Cache-Aware TransactionsDCAT Solves Latency Problem

42

Orestes Server

Orestes Server

Orestes Server

DB

Coordinator

Client Cache

Cache

Cache

Begin Transaction

Cache Sketch

Reads

WritesBuffer

Commit: read-set and updates

Committed OR aborted + stale objects

Mutual Exclusion

Writes

Read all

1. Cache Sketch: staleness barrier at transaction begin

2. Shorter duration through cached reads

[GBR14]

ResultsSimulation-Based Abort Analysis

43

15×

Faster Transactions

7×

More Objects BeforeExceeding 2 Seconds

Can polyglot data management be automated in the future?

VisionAutomated Choice of Databases

45

Latency < 20ms

AnnotatedSchema

Polyglot Persistence Mediator

Application

DB1 DB2 DB3

[SGR15]

Towards Automated Polyglot PersistenceThree-Step Process

46

Requirements specifiedas SLA annotations forschemas (based on NoSQL Toolbox)

Find or provision a suitable combination ofdatabases throughranking algorithm

Mediate data allocationand database operationsbetween applications and databases

1. Requirements 2. Resolution 3. Mediation

CounterTop-k Query

20 ms Write LatencyCounter

Redis

MongoDBCounterUpdate Redis

[SGR15]

Evaluation ResultsCase Study

47

Article

IDTitle…

Imp.

Imp.ID

Document Sorted Set

2.1×

Higher Throughput

10 ms

Predictable Write Latency

Scenario:News Articles With Impression Counts

[SGR15]

Future WorkThree Promising Areas

48

Proactive SLA Enforcement

Monitor & predict database behavior

Action: change routing, live migration, polyglot scaling

Reinforcement Learning of Caching Decisions

Learn best TTLs for any workload

Applications define goals

Polyglot Transaction Processing

Optimal choice of concurrency & commit protocol – across DBs

[SKE+18, SG16]

What are themain contributions?

Publications (1/4)

50

[SKE+18]Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, and Eiko Yoneki. LIFT: Reinforcement Learning in Computer Systems by Learning FromDemonstrations. arXiv preprint arXiv:1808.07903 (under submission), 2018.

[WRG18]Wolfram Wingerath, Norbert Ritter, and Felix Gessert. Real-Time & Stream Data Management: Push-Based Data in Research & Practice. Springer, book to be published in late 2018.

[WGW+18]

Wolfram Wingerath, Felix Gessert, Erik Witt, Steffen Friedrich, and Norbert Ritter. Real-time Data Management for Big Data. In Proceedings of the 21th International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, March 26-29, 2018. OpenProceedings.org, 2018.

[GSW+17]Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Erik Witt, Eiko Yoneki, and Norbert Ritter. Quaestor: Query Web Caching for Database- as-a-Service Providers. Proceedings of the VLDB Endowment, 2017.

[GWR17]Felix Gessert, Wolfram Wingerath, and Norbert Ritter. Scalable Data Management: An In-Depth Tutorial on Nosql Data Stores. In BTW (Workshops), volume P-266 of LNI, pages399–402. GI, 2017.

Publications (2/4)

51

[WGF+17]

Wolfram Wingerath, Felix Gessert, Steffen Friedrich, Erik Witt, and Norbert Ritter. The Case for Change Notifications in Pull-Based Databases. In Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2.-3. März 2017, Stuttgart, Germany, 2017.

[GR17]Felix Gessert and Norbert Ritter. SCDM 2017 - Vorwort. In BTW (Workshops), volume P-266 of LNI, pages 211–213. GI, 2017.

[Ges17]Felix Gessert. Lessons Learned Building a Backend-as-a-Service. Baqend Tech Blog, May 2017. (Accessed on 08/11/2017).

[GWFR16]Felix Gessert, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. NoSQL Database Systems: A Survey and Decision Guidance. Computer Science - Research and Development, November 2016.

[GR16]Felix Gessert and Norbert Ritter. Scalable Data Management: NoSQL Data Stores in Research and Practice. In 32nd IEEE International Conference on Data Engineering, ICDE, 2016.

[SG16]Michael Schaarschmidt and Felix Gessert. Learning Runtime Parameters in Computer Systems with Delayed Experience Injection. In Deep Reinforcement Learning Workshop, NIPS, 2016.

Publications (3/4)

52

[WGFR16]Wolfram Wingerath, Felix Gessert, Steffen Friedrich, and Norbert Ritter. Real- Time Stream Processing for Big Data. it - Information Technology, 58(4), January 2016.

[GSW+15]

Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. The Cache Sketch: Revisiting Expiration-based Caching in the Age of Cloud Data Management. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme". GI, 2015.

[GR15a]Felix Gessert and Norbert Ritter. Polyglot Persistence. Datenbank-Spektrum, 15(3):229–233, November 2015.

[GR15b]Felix Gessert and Norbert Ritter. Skalierbare NoSQL- und Cloud-Datenbanken in Forschung und Praxis. In Datenbanksysteme für Business, Technologie und Web (BTW2015) - Workshopband, 2.-3. März 2015, Hamburg, Germany, pages 271–274, 2015.

[Ges15]Felix Gessert. Low Latency Cloud Data Management through Consistent Caching and Polyglot Persistence. In Proceedings of the 9th Advanced Summer School on Service Oriented Computing, SummerSOC, 2015.

[SGR15]Michael Schaarschmidt, Felix Gessert, and Norbert Ritter. Towards Automated PolyglotPersistence. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", 2015.

Publications (4/4)

53

[WFGR15]

Wolfram Wingerath, Steffen Friedrich, Felix Gessert, and Norbert Ritter. Who Watches theWatchmen? On the Lack of Validation in NoSQL Benchmarking. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", 2015.

[GBR14]Felix Gessert, Florian Bücklers, and Norbert Ritter. ORESTES: a Scalable Database-as-a-Service Architecture for Low Latency. In CloudDB, Data Engineering Workshops (ICDEW), pages 215–222. IEEE, 2014.

[GFW+14]

Felix Gessert, Steffen Friedrich, Wolfram Wingerath, Michael Schaarschmidt, and Norbert Ritter. Towards a Scalable and Unified REST API for Cloud Data Stores. In 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data - Komplexität meistern, 22.-26. September 2014 in Stuttgart, Deutschland, volume 232 of LNI, pages 723–734. GI, 2014.

[FWGR14]

Steffen Friedrich, Wolfram Wingerath, Felix Gessert, and Norbert Ritter. NoSQL OLTP Benchmarking: A Survey. In 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data - Komplexität meistern, 22.-26. September 2014 in Stuttgart, Deutschland, volume 232 of LNI, pages 693–704. GI, 2014.

[GB13]Felix Gessert and Florian Bücklers. ORESTES: ein System für horizontal skalierbaren Zugriff auf Cloud-Datenbanken. In Informatiktage. GI, March 2013.

Client (Browser)



Cloud Backend(DBaaS/BaaS)

DatabaseSystems

Expiration (TTL)

Best CacheableStructure

Cached Data

Files

Records, Documents

Query Results

{}

Main ContributionsSummary

54

Cache Coherence for Files, Records & Queries

2

1

TTL Estimation & Result Structure

3

Unified Data Management Interface

4

Database-IndependentDBaaS and BaaS

Scalable Cache-AwareTransactions

Polylgot PersistenceMediation

5

6

Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge...

Documents

Transcript of Low Latency for Cloud Data Management+1% Revenue Web Performance An Open Challenge With a Huge...