Low Latency for Cloud Data Management
Felix Gessert, December 18, 2018, Universität Hamburg, DBIS Group
Web Performance: An Open Challenge With a Huge Impact
100 ms faster → +1% revenue = $1.7 billion
500 ms faster → +20% ad sales = $19 billion
Sources: Greg Linden. Make Data Useful. 2006. Tammy Everts. Time Is Money: The Business Value of Web Performance. O'Reilly Media, 2016.
Cloud-Based Web Applications: Three Sources of Page Load Time
1. Backend Processing
2. Network Delays
3. Frontend Rendering
Latency is the Problem: Throughput vs. Latency
[Figure: page load time (ms) as a function of bandwidth in MBit/s (1 to 10, at 60 ms latency), and page load time (ms) as a function of latency in ms (240 down to 0, at 5 MBit/s bandwidth).]
2× throughput = same load time
½ latency ≈ ½ load time
Source: Mike Belshe. More Bandwidth Doesn't Matter (much). Technical report, Google Inc., 2010.
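A rough way to see why latency dominates is a toy model in which a page needs a number of sequential round trips plus the raw transfer time. The sketch below is only an illustration under assumed numbers (30 round trips, a 320 KB page); the function name and parameters are hypothetical and not taken from the cited report.

```python
def page_load_time_ms(round_trips, latency_ms, page_kbit, bandwidth_mbit_s):
    """Toy model: sequential round trips plus raw transfer time, in milliseconds."""
    transfer_ms = page_kbit / bandwidth_mbit_s  # kbit / (Mbit/s) = ms
    return round_trips * latency_ms + transfer_ms


# Hypothetical page: 30 sequential round trips, 2560 kbit (320 KB) of content.
baseline = page_load_time_ms(30, 60, 2560, 5)
double_bandwidth = page_load_time_ms(30, 60, 2560, 10)
half_latency = page_load_time_ms(30, 30, 2560, 5)

print(baseline, double_bandwidth, half_latency)
```

In this model, doubling the bandwidth only shrinks the transfer term, while halving the latency shrinks every round trip, which is the qualitative effect the slide describes.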
Problem Statement: Four Challenges
1. Latency of Dynamic Data
2. Direct Client Access
3. Transaction Abort Rates
4. Polyglot Persistence
Problem Statement: Research Question
How can the latency of retrieving dynamic data from cloud services be minimized in an application- and database-independent way while maintaining strict consistency guarantees?
Problem Statement: Four Challenges
1. Latency
2. Direct Access
3. Transactions
4. Polyglot Persistence
Outline
1. Background & Motivation: End-to-End Latency in Cloud-based Architectures
2. Cloud Data Management Middleware: Providing Modern NoSQL Systems as a Low-Latency DBaaS
3. Caching Dynamic Data: Cache Sketches: Solving Staleness of Reads and Queries
4. Outlook and Summary: The Future of Polyglot Data Management in the Cloud
Why is end-to-end latency an open problem?
Background & Motivation
Architecture layers: Frontend, Network, Cloud, Data Management

Frontend
State of the art: Web interface based on HTML, files, APIs, and application logic. Performance defined by the critical rendering path.
Problems: No direct access or integration to data management. Storage maintained manually.

Network
State of the art: Service interfaces use REST & HTTP. Web caching can reduce end-to-end latency.
Problem: Web caching is not compatible with consistent dynamic data.

Cloud
State of the art: SaaS, PaaS, and IaaS models. Scalability & multi-tenancy.
Problems: Combination of cloud services entails high latency. Common application building blocks are often re-implemented.

Data Management
State of the art: Scalability & high availability through NoSQL systems. Sharding & replication. Database-as-a-Service (DBaaS) model.
Problems: Lack of common data management abstractions. DBaaS model not supported. Polyglot persistence is manual & error-prone.
Background & Motivation
Kinds of data: Volatile Data, Large Data Sets, Critical Data, Static Files, Nested Data
Kinds of databases: RDBMS, Document Store, Key-Value Store, Wide-Column Store, Distributed File System
Challenge: How to tackle the data → database mapping problem?
Data Management Techniques [GWFR16]
Functional requirements: Scan Queries, ACID Transactions, Conditional or Atomic Writes, Joins, Sorting, Filter Queries, Full-text Search, Aggregation and Analytics
Non-functional requirements: Data Scalability, Write Scalability, Read Scalability, Elasticity, Consistency, Write Latency, Read Latency, Write Throughput, Read Availability, Write Availability, Durability
Techniques:
Sharding: Range-Sharding, Hash-Sharding, Entity-Group Sharding, Consistent Hashing, Shared-Disk
Replication: Commit/Consensus Protocol, Synchronous, Asynchronous, Primary Copy, Update Anywhere
Storage Management: Logging, Update-in-Place, Caching, In-Memory Storage, Append-Only Storage
Query Processing: Global Secondary Indexing, Local Secondary Indexing, Query Planning, Analytics Framework, Materialized Views
NoSQL Toolbox: Decision Tree [GWFR16]
Access: Fast Lookups
  Volume: RAM → Redis, Memcache (example application: Cache)
  Volume: Unbounded
    CAP: AP → Cassandra, Riak, Voldemort, Aerospike (example application: Shopping Basket)
    CAP: CP → HBase, MongoDB, CouchBase, DynamoDB (example application: Order History)
Access: Complex Queries
  Volume: HDD-Size
    Consistency: ACID → RDBMS, Neo4j, RavenDB, MarkLogic (example application: OLTP)
    Consistency: Availability → CouchDB, MongoDB, SimpleDB (example application: Website)
  Volume: Unbounded
    Query Pattern: Ad-hoc → MongoDB, RethinkDB, HBase, Accumulo, ElasticSearch, Solr (example application: Social Network)
    Query Pattern: Analytics → Hadoop, Spark, Parallel DWH, Cassandra, HBase, Riak, MongoDB (example application: Big Data)
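The decision tree on this slide can be read as a simple lookup from access pattern, data volume, and one further criterion to a list of candidate systems. The sketch below encodes that reading; the branch assignment is reconstructed from the slide labels and should be treated as an assumption, and the names `DECISION_TREE` and `candidate_systems` are hypothetical.

```python
# Keys: (access pattern, data volume, further criterion or None).
DECISION_TREE = {
    ("fast lookups", "ram", None): ["Redis", "Memcache"],
    ("fast lookups", "unbounded", "ap"): ["Cassandra", "Riak", "Voldemort", "Aerospike"],
    ("fast lookups", "unbounded", "cp"): ["HBase", "MongoDB", "CouchBase", "DynamoDB"],
    ("complex queries", "hdd-size", "acid"): ["RDBMS", "Neo4j", "RavenDB", "MarkLogic"],
    ("complex queries", "hdd-size", "availability"): ["CouchDB", "MongoDB", "SimpleDB"],
    ("complex queries", "unbounded", "ad-hoc"): [
        "MongoDB", "RethinkDB", "HBase", "Accumulo", "ElasticSearch", "Solr"],
    ("complex queries", "unbounded", "analytics"): [
        "Hadoop", "Spark", "Parallel DWH", "Cassandra", "HBase", "Riak", "MongoDB"],
}


def candidate_systems(access, volume, criterion=None):
    """Return the candidate systems for one path through the tree."""
    key = (access.lower(), volume.lower(), criterion.lower() if criterion else None)
    return DECISION_TREE[key]


print(candidate_systems("Fast Lookups", "RAM"))
print(candidate_systems("Complex Queries", "HDD-Size", "ACID"))
```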
Database-as-a-Service and Backend-as-a-Service [GFW+14]
NoSQL Toolbox, Decision Tree
Unified Data Management API: Query Support, Transaction Processing, Data Validation, Partial Updates, Schema Management, Code Execution, Access Control, Indexing & Configuration, Object Persistence
How can cloud data management be unified & combined with low latency?
Orestes: Goals
A Data Management Middleware for Low Latency
Database Independence
Low Latency with Tunable Consistency
Scalable, Available, Multi-Tenant
DBaaS & BaaS Functionality
Orestes Concept: Overview [GBR14, GB13]
Web and Mobile Applications
Unified REST API
DBaaS/BaaS Middleware: Data and Default Modules
Web Caching for Low Latency
Scalable Data Management Platform (Multi-Tenancy, Scaling, Caching, Failover, …)
Heterogeneous Data Stores: Unmodified Database Systems
How can dynamic data be accelerated through web caching?
The Web's Caching Model [GSW+15]
Request path: Client → Expiration-based Caches → Invalidation-based Caches → Server/DB
Expiration-based caches (browser caches, forward proxies, ISP caches): An object x is considered fresh for TTLx seconds. The server assigns TTLs for each object.
Invalidation-based caches (content delivery networks, reverse proxies): Expose an object eviction operation to the server.
The server sends invalidations and objects; the caches answer requests with cache hits.
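The difference between the two cache types can be sketched in a few lines. This is a minimal illustration, not an implementation of any real browser or CDN: the class names and the explicit `now` parameter are assumptions made to keep the example deterministic.

```python
class ExpirationCache:
    """Serves an object until its TTL runs out; the server cannot reach it."""

    def __init__(self):
        self._entries = {}  # key -> (value, expiration time)

    def put(self, key, value, ttl, now):
        self._entries[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None or now >= entry[1]:
            self._entries.pop(key, None)  # expired or never cached
            return None
        return entry[0]


class InvalidationCache(ExpirationCache):
    """Additionally exposes an eviction operation to the server."""

    def invalidate(self, key):
        self._entries.pop(key, None)


browser, cdn = ExpirationCache(), InvalidationCache()
browser.put("x", "v1", ttl=10, now=0)
cdn.put("x", "v1", ttl=10, now=0)
cdn.invalidate("x")  # the server updates x at time 5 and purges the CDN
print(browser.get("x", now=6), cdn.get("x", now=6))
```

After the write, the invalidation-based cache misses and refetches, while the expiration-based cache keeps serving the old value until the TTL ends. This is the staleness problem that the rest of the talk addresses.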
Web Caching for Data Management: Overview of Cache Sketch Method [GSW+15, GSW+17]
Request path: Client → Expiration-based Caches → Invalidation-based Caches → Server/DB
Without Cache Sketch: data is cached for a fixed TTL, which leads to stale cached data.
Server Cache Sketch: a Counting Bloom Filter over non-expired object keys. Expirations and writes are reported to it, and it answers the question "Needs invalidation?".
On a write: invalidate the invalidation-based caches and add the key to the Server Cache Sketch.
Compact Cache Sketch: sent to the client, which uses it to validate freshness.
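The server-side bookkeeping on this slide can be sketched as follows: reads report the TTL that was handed out, writes to still-cached objects add the key to a counting Bloom filter, and the key is removed once the last cached copy has expired. This is a simplified sketch under assumptions (SHA-256 based hashing, explicit `now` timestamps, hypothetical class and method names), not the published implementation.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.counters[p] += 1

    def remove(self, key):
        for p in self._positions(key):
            self.counters[p] -= 1

    def __contains__(self, key):
        return all(self.counters[p] > 0 for p in self._positions(key))


class ServerCacheSketch:
    def __init__(self, m=64, k=3):
        self.filter = CountingBloomFilter(m, k)
        self.expires_at = {}   # key -> latest expiration handed to caches
        self.stale_until = {}  # key -> time at which it may leave the sketch

    def report_read(self, key, ttl, now):
        self.expires_at[key] = max(self.expires_at.get(key, 0), now + ttl)

    def report_write(self, key, now):
        """Return True if the object may still be cached (needs invalidation)."""
        expiration = self.expires_at.get(key, 0)
        if expiration <= now:
            return False
        if key not in self.stale_until:
            self.filter.add(key)
        self.stale_until[key] = expiration
        return True

    def expire(self, now):
        for key, until in list(self.stale_until.items()):
            if until <= now:
                self.filter.remove(key)
                del self.stale_until[key]
```

A counting filter is used on the server because keys have to be removed again; the flat bit vector sent to clients can be derived from the counters (counter > 0 means bit = 1).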
The Cache Sketch Approach [GSW+15]
Client Cache Sketch: a Bloom filter that answers "Needs revalidation?". The client fetches it at connect, periodically every Δ seconds, or at transaction begin.
Goals: minimize staleness, minimize invalidations.
1. Initialization from Cache
2. Δ-Atomic Consistency
3. Cache-Aware Transactions
4. Invalidation Minimization
Cache Sketch: Main Properties [GSW+15, GSW+17]
To ensure Δ-atomicity, the Cache Sketch at time t contains key(x) of every object x that was written before it expired in all caches.
Timeline: r(x) at t1 with a TTL that ends at t1 + TTL; w(x) at t2; Cache Sketch ct retrieved at time t; r(x) at t3; staleness bound Δ.
Label: "timespan for which x c…"
Cache Sketch: Construction [GSW+15, GSW+17]
To ensure compactness, the Cache Sketch stores n keys in a Bloom filter with m bits, k hash functions, and a false positive rate of $f \approx \left(1 - e^{-kn/m}\right)^k$.
Lookup: find(key) hashes the key with h1 … hk into the m Bloom filter bits (for example 1 0 0 1 1 0 1 1).
Client Cache Sketch: if all bits = 1 (yes), the GET request for the key is sent as a revalidation; if not (no), it is a normal GET request that the cache answers as a hit or a miss.
Example: 20 000 entries & 5% false positives → 11 KB in size
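The client-side check and the false positive formula can be sketched together. The code below is a minimal illustration under assumptions: the hash construction, the dictionary used as a cache, and the `fetch` callback standing in for a request to the server are all hypothetical.

```python
import hashlib
import math


def false_positive_rate(n, m, k):
    """Approximate Bloom filter false positive rate: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key):
        return all(self.bits[p] for p in self._positions(key))


def read(key, sketch, cache, fetch):
    """Return (value, how): revalidate if the sketch flags the key as stale."""
    if key in sketch:
        cache[key] = fetch(key)  # bypass cached copies
        return cache[key], "revalidation"
    if key in cache:
        return cache[key], "hit"
    cache[key] = fetch(key)
    return cache[key], "miss"


sketch = BloomFilter(m=1024, k=3)
sketch.add("a")
print(read("a", sketch, {"a": "old"}, lambda key: "new"))
print(false_positive_rate(n=1000, m=10_000, k=7))
```

A false positive only causes an unnecessary revalidation, never a stale read, which is why a probabilistic filter is acceptable here.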
Controllable Consistency Levels: Cache Sketch Guarantees [WGW+18, GWR17, GWFR16, GR16, FWGR14]
Consistency models shown: Linearizability, Sequential Consistency, Causal Consistency, PRAM, Writes Follow Reads, Read Your Writes, Monotonic Reads, Monotonic Writes, Δ-Atomicity, (Δ,p)-Atomicity
Legend: Controllable Staleness, Default Guarantees, Opt-in Guarantees, With Cache Bypassing
Source: P. Viotti and M. Vukolić. Consistency in Non-Transactional Distributed Storage Systems. ACM Computing Surveys, 2016.
Determining TTLs: Trade-Off [GSW+17, GSW+15]
Longer TTLs: ⇩ cache misses
VS
Shorter TTLs: ⇩ invalidations, ⇩ false positive rate
TTL Estimation: Optimizing Expiration & Cacheability [GSW+15]
1. Collect workload statistics for reads and writes
2. Estimate time to next write E[Tw] or mark uncacheable
Constrained Adaptive TTL Estimator
Estimators: C-LM Model, LWMA Estimator, EWMA Estimator
Properties shown: Ideal for Poisson Processes, Quick Adaptation to Changes, Converges for Static Workloads, Highly Space-Efficient
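The two steps above can be illustrated with an exponentially weighted moving average (EWMA) over the gaps between writes. This is a generic sketch of the EWMA idea, not the Constrained Adaptive TTL Estimator from the slide; the class name, `alpha`, `min_ttl`, and `max_ttl` are assumed parameters.

```python
class EwmaTtlEstimator:
    """Estimates the time to the next write per key from past write gaps."""

    def __init__(self, alpha=0.5, min_ttl=1.0, max_ttl=3600.0):
        self.alpha = alpha
        self.min_ttl = min_ttl
        self.max_ttl = max_ttl
        self.last_write = {}  # key -> time of the last write
        self.mean_gap = {}    # key -> smoothed gap between writes

    def observe_write(self, key, now):
        previous = self.last_write.get(key)
        self.last_write[key] = now
        if previous is None:
            return
        gap = now - previous
        old = self.mean_gap.get(key)
        if old is None:
            self.mean_gap[key] = gap
        else:
            self.mean_gap[key] = self.alpha * gap + (1 - self.alpha) * old

    def ttl(self, key):
        gap = self.mean_gap.get(key)
        if gap is None:
            return self.max_ttl  # no write pattern known yet
        if gap < self.min_ttl:
            return 0.0           # written too often: treat as uncacheable
        return min(gap, self.max_ttl)


estimator = EwmaTtlEstimator()
for t in (0, 10, 30):
    estimator.observe_write("article", now=t)
print(estimator.ttl("article"))
```

Only two numbers are stored per key, which is one reason moving averages are attractive when statistics have to be kept for many objects.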
Evaluation Results: Simulation & Benchmarking [GSW+15]
Setup: Client, CDN, Orestes, MongoDB (locations shown: Ireland, California)
[Figures: page load times with cached initialization (simulation); average latency for YCSB workloads A and B; average throughput for YCSB workloads A and B.]
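Evaluation Results: Industry Evaluation of Commercial Implementation [GSW+17, Ges17]
Providers compared: Baqend, Azure, Parse, Firebase, Kinvey, Apiomat
Client locations: California, Frankfurt, Tokyo, Sydney
[Figure: one chart per location. Extracted values in transcript order, six per chart; the pairing of values with providers and locations is not recoverable here.]
247 ms, 3837 ms, 2763 ms, 1456 ms, 4442 ms, 1576 ms
260 ms, 4226 ms, 2645 ms, 2122 ms, 753 ms, 1836 ms
266 ms, 5263 ms, 3214 ms, 1622 ms, 6573 ms, 1944 ms
277 ms, 9963 ms, 6505 ms, 6325 ms, 9321 ms, 4697 ms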
How can object caching be extended to query results?
Query Caching: Challenges [GSW+17]
Invalidation Detection: When do query results change?
Cache Coherence: How to apply Cache Sketches to queries?
Query Result Representation: What is the best result structure for caching?
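Invalidation Detection: Cache Coherence for Query Results [WRG18, WGF+17, GSW+17, WGFR16]
An update reaches Orestes and may change a cached query result (for example a result containing Product A and Product B).
The Scalable Streaming System (InvaliDB) matches updates against real-time queries and emits query events: Add, Change, Remove.
Query events trigger a cache invalidation and an updated Cache Sketch.
Queries are identified by their query expression, normalized to a string.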
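The idea of normalizing a query and classifying an update as an Add, Change, or Remove event can be sketched as follows. The sketch assumes queries are flat equality filters expressed as dictionaries; `normalize`, `matches`, and `query_event` are hypothetical helper names and do not reflect the InvaliDB interfaces.

```python
import json


def normalize(query):
    """Map equivalent query expressions to one canonical string (cache key)."""
    return json.dumps(query, sort_keys=True, separators=(",", ":"))


def matches(query, obj):
    """Assumption: a query is a conjunction of field == value conditions."""
    return all(obj.get(field) == value for field, value in query.items())


def query_event(query, before, after):
    """Classify an update by comparing the object's state before and after."""
    was_match = before is not None and matches(query, before)
    is_match = after is not None and matches(query, after)
    if not was_match and is_match:
        return "add"
    if was_match and not is_match:
        return "remove"
    if was_match and is_match and before != after:
        return "change"
    return None  # the cached result is unaffected


query = {"category": "shoes", "in_stock": True}
before = {"id": "A", "category": "shoes", "in_stock": True, "price": 50}
after = {"id": "A", "category": "shoes", "in_stock": True, "price": 40}
print(normalize(query), query_event(query, before, after))
```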
Learning Result Representations: Handling Changes to Query Results [GSW+17]
ID Lists: [url1, url2, url3] (⇩ invalidations)
VS
Object Lists: [{id: obj1, name: "alice"}, {id: obj2, name: "bob"}, {id: obj3, name: "eve"}] (⇩ round-trips)
Solution: Cost-based decision model weighs expected round-trips vs. invalidations
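A cost-based choice between the two representations could look like the sketch below. The cost formula is an assumption made for illustration (ID lists are invalidated only when membership changes but cost extra round trips; object lists are invalidated by any change); it is not the decision model from the cited paper, and all names and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    reads_per_s: float
    membership_changes_per_s: float    # add/remove events
    object_changes_per_s: float        # change events on result members
    extra_round_trips_per_read: float  # object fetches an ID list causes


def representation_costs(stats, invalidation_cost=1.0, round_trip_cost=1.0):
    id_list = (stats.membership_changes_per_s * invalidation_cost
               + stats.reads_per_s * stats.extra_round_trips_per_read * round_trip_cost)
    object_list = ((stats.membership_changes_per_s + stats.object_changes_per_s)
                   * invalidation_cost)
    return {"id-list": id_list, "object-list": object_list}


def choose_representation(stats, **weights):
    costs = representation_costs(stats, **weights)
    return min(costs, key=costs.get)


read_heavy = QueryStats(100, 1, 5, 0.5)
write_heavy = QueryStats(1, 0.1, 50, 1)
print(choose_representation(read_heavy), choose_representation(write_heavy))
```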
Evaluation Results: Query Caching for YCSB-Based Workloads [GSW+17]
Throughput with growing request parallelism: 11× throughput improvement
Average end-to-end query and read latency: 47× lower query latency, 9× lower read latency
Can Cache Sketches improve transaction performance?
Problem of Optimistic Transactions: Abort Rates Depend on Latency [GBR14]
[Figure: transaction abort rates for latencies of 10 ms, 50 ms, 100 ms, and 150 ms.]
Transaction abort rates increase exponentially with latency.
Distributed Cache-Aware Transactions: DCAT Solves Latency Problem [GBR14]
Components: client with client cache, caches, Orestes servers, coordinator, DB
Begin transaction: the client receives the Cache Sketch.
Reads go through the caches; writes are kept in a write buffer.
Commit: the client sends the read-set and updates and receives "committed" OR "aborted + stale objects".
Coordinator: Mutual Exclusion, Read all, Writes
1. Cache Sketch: staleness barrier at transaction begin
2. Shorter duration through cached reads
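The commit step can be illustrated with a small optimistic validation sketch: the client records the version of every object it reads, buffers its writes, and the coordinator aborts if any read version is outdated. This is a single-process simplification under assumptions (version counters, a set standing in for the Cache Sketch, hypothetical class names), not the distributed protocol itself.

```python
class Coordinator:
    def __init__(self, store):
        self.store = dict(store)  # key -> (version, value)

    def read(self, key):
        return self.store[key]

    def commit(self, read_set, writes):
        """Validate read versions, then apply the buffered writes."""
        stale = [k for k, v in read_set.items()
                 if self.store.get(k, (0, None))[0] != v]
        if stale:
            return False, stale  # aborted + stale objects
        for key, value in writes.items():
            version = self.store.get(key, (0, None))[0]
            self.store[key] = (version + 1, value)
        return True, []


class Transaction:
    def __init__(self, coordinator, cache, sketch):
        self.coordinator, self.cache, self.sketch = coordinator, cache, sketch
        self.read_set, self.writes = {}, {}

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        if key in self.sketch or key not in self.cache:
            self.cache[key] = self.coordinator.read(key)  # bypass stale copy
        version, value = self.cache[key]
        self.read_set[key] = version
        return value

    def write(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        return self.coordinator.commit(self.read_set, self.writes)
```

Fetching the sketch at transaction begin means that objects known to be stale are not read from caches, so fewer transactions fail validation, while all other reads stay fast.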
Results: Simulation-Based Abort Analysis
15× faster transactions
7× more objects before exceeding 2 seconds
Can polyglot data management be automated in the future?
Vision: Automated Choice of Databases [SGR15]
Application → Annotated Schema (for example "Latency < 20ms") → Polyglot Persistence Mediator → DB1, DB2, DB3
Towards Automated Polyglot Persistence: Three-Step Process [SGR15]
1. Requirements: Requirements specified as SLA annotations for schemas (based on NoSQL Toolbox)
2. Resolution: Find or provision a suitable combination of databases through a ranking algorithm
3. Mediation: Mediate data allocation and database operations between applications and databases
Example labels: Counter, Top-k Query, 20 ms Write Latency → Redis; Counter Update → MongoDB, Redis
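The resolution step can be sketched as filtering and ranking databases against annotated requirements. Everything concrete in this example is an assumption for illustration: the capability entries, the latency figures, and the names `Database` and `resolve` are hypothetical and not measured or taken from the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Database:
    name: str
    write_latency_ms: float  # hypothetical figure, for illustration only
    features: frozenset


def resolve(requirements, databases):
    """Return the databases that satisfy all annotations, fastest writes first."""
    max_latency = requirements.get("max_write_latency_ms", float("inf"))
    needed = set(requirements.get("features", ()))
    suitable = [db for db in databases
                if db.write_latency_ms <= max_latency and needed <= db.features]
    return sorted(suitable, key=lambda db: db.write_latency_ms)


databases = [
    Database("Redis", 1.0, frozenset({"counter", "top-k"})),
    Database("MongoDB", 30.0, frozenset({"documents", "counter"})),
]

# Annotation for a counter field: at most 20 ms write latency, top-k queries.
counter_requirements = {"max_write_latency_ms": 20, "features": ["counter", "top-k"]}
print([db.name for db in resolve(counter_requirements, databases)])
```

A mediator would then route operations on the annotated field to the top-ranked database and keep the remaining fields in their own store.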
Evaluation Results: Case Study [SGR15]
Scenario: News Articles With Impression Counts
Article (ID, Title, …) stored as a document; impressions (Imp., ID) stored in a sorted set
2.1× higher throughput
10 ms predictable write latency
Future Work: Three Promising Areas [SKE+18, SG16]
Proactive SLA Enforcement: Monitor & predict database behavior. Action: change routing, live migration, polyglot scaling.
Reinforcement Learning of Caching Decisions: Learn best TTLs for any workload. Applications define goals.
Polyglot Transaction Processing: Optimal choice of concurrency & commit protocol across DBs.
What are the main contributions?
Publications
[SKE+18] Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, and Eiko Yoneki. LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations. arXiv preprint arXiv:1808.07903 (under submission), 2018.
[WRG18] Wolfram Wingerath, Norbert Ritter, and Felix Gessert. Real-Time & Stream Data Management: Push-Based Data in Research & Practice. Springer, book to be published in late 2018.
[WGW+18] Wolfram Wingerath, Felix Gessert, Erik Witt, Steffen Friedrich, and Norbert Ritter. Real-time Data Management for Big Data. In Proceedings of the 21th International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, March 26-29, 2018. OpenProceedings.org, 2018.
[GSW+17] Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Erik Witt, Eiko Yoneki, and Norbert Ritter. Quaestor: Query Web Caching for Database-as-a-Service Providers. Proceedings of the VLDB Endowment, 2017.
[GWR17] Felix Gessert, Wolfram Wingerath, and Norbert Ritter. Scalable Data Management: An In-Depth Tutorial on NoSQL Data Stores. In BTW (Workshops), volume P-266 of LNI, pages 399–402. GI, 2017.
[WGF+17] Wolfram Wingerath, Felix Gessert, Steffen Friedrich, Erik Witt, and Norbert Ritter. The Case for Change Notifications in Pull-Based Databases. In Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2.-3. März 2017, Stuttgart, Germany, 2017.
[GR17] Felix Gessert and Norbert Ritter. SCDM 2017 - Vorwort. In BTW (Workshops), volume P-266 of LNI, pages 211–213. GI, 2017.
[Ges17] Felix Gessert. Lessons Learned Building a Backend-as-a-Service. Baqend Tech Blog, May 2017. (Accessed on 08/11/2017).
[GWFR16] Felix Gessert, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. NoSQL Database Systems: A Survey and Decision Guidance. Computer Science - Research and Development, November 2016.
[GR16] Felix Gessert and Norbert Ritter. Scalable Data Management: NoSQL Data Stores in Research and Practice. In 32nd IEEE International Conference on Data Engineering, ICDE, 2016.
[SG16] Michael Schaarschmidt and Felix Gessert. Learning Runtime Parameters in Computer Systems with Delayed Experience Injection. In Deep Reinforcement Learning Workshop, NIPS, 2016.
[WGFR16] Wolfram Wingerath, Felix Gessert, Steffen Friedrich, and Norbert Ritter. Real-Time Stream Processing for Big Data. it - Information Technology, 58(4), January 2016.
[GSW+15] Felix Gessert, Michael Schaarschmidt, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. The Cache Sketch: Revisiting Expiration-based Caching in the Age of Cloud Data Management. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme". GI, 2015.
[GR15a] Felix Gessert and Norbert Ritter. Polyglot Persistence. Datenbank-Spektrum, 15(3):229–233, November 2015.
[GR15b] Felix Gessert and Norbert Ritter. Skalierbare NoSQL- und Cloud-Datenbanken in Forschung und Praxis. In Datenbanksysteme für Business, Technologie und Web (BTW 2015) - Workshopband, 2.-3. März 2015, Hamburg, Germany, pages 271–274, 2015.
[Ges15] Felix Gessert. Low Latency Cloud Data Management through Consistent Caching and Polyglot Persistence. In Proceedings of the 9th Advanced Summer School on Service Oriented Computing, SummerSOC, 2015.
[SGR15] Michael Schaarschmidt, Felix Gessert, and Norbert Ritter. Towards Automated Polyglot Persistence. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", 2015.
[WFGR15] Wolfram Wingerath, Steffen Friedrich, Felix Gessert, and Norbert Ritter. Who Watches the Watchmen? On the Lack of Validation in NoSQL Benchmarking. In Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme", 2015.
[GBR14] Felix Gessert, Florian Bücklers, and Norbert Ritter. ORESTES: a Scalable Database-as-a-Service Architecture for Low Latency. In CloudDB, Data Engineering Workshops (ICDEW), pages 215–222. IEEE, 2014.
[GFW+14] Felix Gessert, Steffen Friedrich, Wolfram Wingerath, Michael Schaarschmidt, and Norbert Ritter. Towards a Scalable and Unified REST API for Cloud Data Stores. In 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data - Komplexität meistern, 22.-26. September 2014 in Stuttgart, Deutschland, volume 232 of LNI, pages 723–734. GI, 2014.
[FWGR14] Steffen Friedrich, Wolfram Wingerath, Felix Gessert, and Norbert Ritter. NoSQL OLTP Benchmarking: A Survey. In 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data - Komplexität meistern, 22.-26. September 2014 in Stuttgart, Deutschland, volume 232 of LNI, pages 693–704. GI, 2014.
[GB13] Felix Gessert and Florian Bücklers. ORESTES: ein System für horizontal skalierbaren Zugriff auf Cloud-Datenbanken. In Informatiktage. GI, March 2013.
Main Contributions: Summary
Architecture: Client (Browser) → Expiration-based Caches → Invalidation-based Caches → Cloud Backend (DBaaS/BaaS) → Database Systems
Cached data: Files; Records, Documents; Query Results
Cache Coherence for Files, Records & Queries
TTL Estimation & Result Structure (Expiration (TTL), Best Cacheable Structure)
Unified Data Management Interface
Database-Independent DBaaS and BaaS
Scalable Cache-Aware Transactions
Polyglot Persistence Mediation