CS6320 – Performance

L. Grewe

Number of requests a website receives is unpredictable

CNN, NY Times, ABC News unavailable from 9-10 AM (Eastern Time)

Content providers’ dilemma: how many resources to provision?

Need on-demand scalability

[Figure: CNN.com page views/day (in millions, axis 0 to 150), usual vs. 9/11*]

Content Delivery Network (CDN) Solution

Source: http://www.tcsa.org/lisa2001/cnn.txt http://www.akamai.com/en/html/about/press/press479.html

[Figure: CNN.com page views/day (in millions, axis 0 to 800) for a normal day, 12-Sep-01, and Election Day (Nov 2), 2004; bars annotated with page sizes 50k, 50k, and 1.2k]

Used Akamai on Election Day

Page was 1.2k instead of 50k on 12-Sep-01

Typical Web-Site Architecture

[Diagram: Users send a Request to the home server (Web Server, App Server, DB); the home server executes code, accesses the DB, and returns a Response]

CDN Architecture

[Diagram: Users, CDN nodes, Internet core, Content providers]

CDNs excel at delivering static content.

Advantages of CDNs

Large infrastructure handles load spikes

Clients charged on a per-usage basis
• no need to guess what resources to provision

Moves data closer to end-users
• decreases latency and increases throughput

CDN Application Services

CDNs can also run applications

but for data-intensive dynamic applications…

database server becomes the bottleneck!

[Diagram: Users, Internet, DB]

Methods to scale the database component

In-house database scalability: [DBCache, DBProxy, MTCache, NEC Cache Portal]
• Must provision for peak load

Database outsourcing: Database as a service [Hacigumus+ ICDE ’02, SIGMOD ’02]
• Have to cede control of data

Database Scalability Service (DBSS): Shared infrastructure that caches applications’ data [INRIA/LIP6, CIDR ’05, SIGMOD ’06, ICDE ’07]

S3 Database Scalability Service

CDN-like proxy nodes cache results of database queries
• reduces load on central database servers

All database updates sent to central server
• clients don’t cede ownership of their data

Uses publish/subscribe system to maintain data consistency
• avoids additional load at the central server

Content provider may encrypt database requests/responses to protect sensitive data

Database Scalability Service

[Diagram: users reach the DBSS over the Internet; the DBSS sits in front of the home server databases]

[Diagram: users reach the Content Delivery Network and the DBSS over the Internet; home server databases behind the DBSS]

[Diagram: web and application servers act as client apps of the DBSS; home server databases reached over the Internet]

Outline

Need for on-demand scalability
S3 invalidation mechanism
Security-scalability tradeoff
Reducing latency

Addressing consistency

TTL is wasteful:
• Often refresh cached data unnecessarily (workloads dominated by reads)
• Must set TTL=0 for strong consistency!

Solution: update or invalidate cached data only when affected by updates
• Naïve approach: home organizations notify proxy servers of relevant updates; not scalable

Our approach: fully distributed, proxy-to-proxy update notification mechanism
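The TTL argument above can be made concrete with a small simulation. This is only a sketch under assumed numbers: one cached value, 100 reads at one read per tick, a single update at tick 55, and a TTL of 10 ticks. The class names, the key `qty:29`, and the workload are illustrative and do not come from the slides.

```python
class TTLCache:
    """Cache that refetches an entry once it is at least `ttl` ticks old."""

    def __init__(self, backend, ttl):
        self.backend, self.ttl = backend, ttl
        self.entries = {}        # key -> (value, tick at which it was fetched)
        self.fetches = 0         # number of requests sent to the backend

    def get(self, key, now):
        hit = self.entries.get(key)
        if hit is None or now - hit[1] >= self.ttl:
            self.fetches += 1
            hit = (self.backend[key], now)
            self.entries[key] = hit
        return hit[0]


class InvalidatingCache:
    """Cache that refetches only after an explicit invalidation."""

    def __init__(self, backend):
        self.backend = backend
        self.entries = {}
        self.fetches = 0

    def get(self, key):
        if key not in self.entries:
            self.fetches += 1
            self.entries[key] = self.backend[key]
        return self.entries[key]

    def invalidate(self, key):
        self.entries.pop(key, None)


def simulate(reads=100, update_at=55, ttl=10):
    """Return (TTL fetches, invalidation fetches, stale reads under TTL)."""
    backend = {"qty:29": 7}
    ttl_cache = TTLCache(backend, ttl)
    inv_cache = InvalidatingCache(backend)
    stale_reads = 0
    for now in range(reads):
        if now == update_at:
            backend["qty:29"] = 6            # the only update in the workload
            inv_cache.invalidate("qty:29")   # notification reaches the proxy
        if ttl_cache.get("qty:29", now) != backend["qty:29"]:
            stale_reads += 1
        assert inv_cache.get("qty:29") == backend["qty:29"]
    return ttl_cache.fetches, inv_cache.fetches, stale_reads
```

With these assumed numbers the TTL cache contacts the backend 10 times and still serves 5 stale reads, while the invalidating cache contacts it twice and serves none. Setting `ttl=0` removes the stale reads but sends every read to the backend.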

Distributed Consistency Mechanism

[Diagram: users send an update to a proxy node; update notifications travel between proxy nodes through the multicast environment]

• Distributed app-level multicast environment, e.g., Scribe

• Forward all updates to backend home servers

Configuring Multicast Channels

Key observation: Web applications typically interact with DB via a small, fixed set of query/update templates (usually 10-100)

Example:

SELECT qty FROM inv WHERE id = ?
UPDATE inv SET qty = ? WHERE id = ?

Templates: natural way to configure channels

Options: Channel-by-query or Channel-by-update

Channel-by-Query Option

One channel per query template Q: C(Q)
• Few subscriptions/cached result
• Many invalidation notifications/update

Begin caching result(s) of query template Q: subscribe to C(Q)
Evict only query result for Q: unsubscribe from C(Q)
Issue update: determine which query templates Q1, …, Qn are affected; send notification on each C(Qi)

Conflicts determined lazily (upon update)

Channel-by-Update Option

One channel per update template U: C(U)
• Many subscriptions/cached result
• Few invalidation notifications/update

Begin caching result(s) of query template Q: determine which update templates U1, …, Un apply; subscribe to each C(Ui)
Evict only query result for Q: unsubscribe from all C(Ui) above
Issue update using template U: send notification on C(U)

Conflicts determined eagerly (when caching Q)
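The two channel options can be compared side by side in a short sketch. Everything named here is an assumption for illustration: the `Bus` class is an in-memory stand-in for a multicast layer (it is not Scribe's API), and the templates Q1, Q2, U1, U2 with their conflict table are hypothetical.

```python
from collections import defaultdict

# Assumed conflict table: update template -> query templates it may affect.
CONFLICTS = {"U1": {"Q1"}, "U2": {"Q1", "Q2"}}


class Bus:
    """In-memory stand-in for an application-level multicast layer."""

    def __init__(self):
        self.subs = defaultdict(set)   # channel -> subscribed proxies
        self.sent = 0                  # notifications published so far

    def publish(self, channel, update):
        self.sent += 1
        for proxy in list(self.subs[channel]):
            proxy.invalidate(update)


class Proxy:
    def __init__(self, bus, by_query):
        self.bus = bus
        self.by_query = by_query       # True: channel-by-query, False: channel-by-update
        self.cached = set()            # query templates with cached results

    def channels_for(self, q):
        if self.by_query:
            return {("Q", q)}                                        # the single C(Q)
        return {("U", u) for u, qs in CONFLICTS.items() if q in qs}  # every C(Ui)

    def start_caching(self, q):
        self.cached.add(q)
        for ch in self.channels_for(q):
            self.bus.subs[ch].add(self)

    def evict(self, q):
        self.cached.discard(q)
        needed = set()                 # channels still required by other cached queries
        for other in self.cached:
            needed |= self.channels_for(other)
        for ch in self.channels_for(q) - needed:
            self.bus.subs[ch].discard(self)

    def issue_update(self, u):
        if self.by_query:
            for q in sorted(CONFLICTS[u]):       # conflicts resolved lazily, per update
                self.bus.publish(("Q", q), u)
        else:
            self.bus.publish(("U", u), u)        # conflicts were resolved when caching

    def invalidate(self, u):
        for q in self.cached & CONFLICTS[u]:
            self.evict(q)


def run(by_query):
    """Proxy a caches Q1 and Q2, proxy b issues U2; return (notifications, a's cache)."""
    bus = Bus()
    a, b = Proxy(bus, by_query), Proxy(bus, by_query)
    a.start_caching("Q1")
    a.start_caching("Q2")
    b.issue_update("U2")
    return bus.sent, sorted(a.cached)
```

In this toy setup `run(True)` publishes two notifications for the one update, while `run(False)` publishes one; both leave proxy `a` with an empty cache. The cost moves to subscription time in the second case, where Q1 alone needs two channel subscriptions.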

Parameter-Specific Channels

Optimization: consider parameter bindings supplied at runtime … for example:

Q5: SELECT qty FROM inv WHERE id = ?
• When issued with id = 29, create extra parameter-specific channel C(5, 29)
• Subscribe to both C(5) and C(5, 29)

Upon update:
• If update affects a single item with id = X, send notification on channel C(5, X)
  Saves work if X ≠ 29
• Updates affecting multiple items sent to C(5)
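A minimal sketch of the routing rule for parameter-specific channels follows. The channel is modeled as a tuple, `(5,)` for C(5) and `(5, 29)` for C(5, 29); the function names, the proxy name, and the in-memory subscriber table are assumptions made for illustration only.

```python
from collections import defaultdict

subscribers = defaultdict(set)   # channel tuple -> names of subscribed proxies
delivered = []                   # (proxy, channel) pairs, recorded for inspection


def cache_query(proxy, template, param):
    """Subscribe to the template channel and to the parameter-specific channel."""
    subscribers[(template,)].add(proxy)          # C(5)
    subscribers[(template, param)].add(proxy)    # C(5, 29)


def notify_update(template, affected_ids):
    """Route a single-item update to C(template, X), anything else to C(template)."""
    if len(affected_ids) == 1:
        channel = (template, affected_ids[0])
    else:
        channel = (template,)
    for proxy in sorted(subscribers.get(channel, ())):
        delivered.append((proxy, channel))
    return channel
```

For example, after `cache_query("proxyA", 5, 29)`, an update to item 30 is published on `(5, 30)` and reaches no one, an update to item 29 reaches `proxyA` on `(5, 29)`, and an update touching items 29 and 30 together reaches it on `(5,)`.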

S3 Prototype

Tomcat as proxy web server/servlet container
Proxy database cache written in Java
Queries: access cached data when possible
• Cache JDBC query results (i.e., materialized views)
• Index results by JDBC query representation
MySQL4 as back-end database
Updates: sent to back-end database
Invalidation notifications delivered via Scribe
Experiments on Emulab (Utah) – Thanks!

Benchmark Applications

Bookstore (TPC-W, from UW-Madison)
• Online bookseller, a standard web benchmark
• Changed the popularity of books

Auction (RUBiS, from Rice)
• Modeled after eBay

Bulletin board (RUBBoS, from Rice)
• Modeled after Slashdot

Benchmarks model popular websites

Selective: cache queries only if subscribed to parameter-dependent groups

Impact of Cooperative Caching

[Figure: Throughput (WIPS), axis 0 to 250, for the bookstore browsing mix, bookstore shopping mix, and auction workloads; series: NoProxy, NoCache, SimpleCache, Ferdinand]

Outline

Need for on-demand scalability
S3 invalidation mechanism
Security-scalability tradeoff
Reducing latency

Guaranteeing security in a DBSS setting

Security-scalability tradeoff in the DBSS setting

Analyzing the code helps in managing this tradeoff

Limit ability to observe an application’s data by:
– DBSS administrator
– Unauthorized application through the DBSS

A simple solution for guaranteeing security

Outsource database scalability
• Home server: master copies of all data; handles updates directly

No query execution on the DBSS
• DBSS caches query results (read-only), kept consistent by invalidation

All data passing through the DBSS can be encrypted: query, update, query results

A Simple Example

Home server database, table toys (toy_id, toy_name):
11  Barbie
15  GI Joe

Q1: SELECT toy_id FROM toys WHERE toy_name='GI Joe'
Result cached at the DBSS (initially empty): Q1: toy_id=15

U1: DELETE FROM toys WHERE toy_id=5

Nothing is encrypted: no invalidations
Results are encrypted: invalidate the cached Q1 result

More encryption leads to more invalidations

Challenge: providing scalability while guaranteeing security

Application faces a dilemma in what data to encrypt (secure)

When updates occur, DBSS needs to invalidate

Security-scalability tradeoff:
More encryption: conservative invalidation, favors security
Less encryption: precise invalidation, favors scalability

Opportunity for managing the tradeoff

Not all data is equally sensitive

Data sensitivity:
Extremely sensitive: credit card information (secure at all costs)
Moderately sensitive: inventory records, customer records (care, but worried about scalability impact)
Completely insensitive: bestsellers list (don’t care)

But for most data, nontrivial to assess:
1. Data sensitivity
2. Scalability impact of securing the data

Key Insight: arbitrary queries and updates not possible

function get_toy_id ($toy_name) {
    $template := "SELECT toy_id FROM toys WHERE toy_name=?";
    $query := attach_to_template ($template, $toy_name);
    execute ($query);
}

Given templates: can statically identify data not needed for precise invalidation

Data not useful for invalidation: examples

No data is needed for precise invalidation

Q1: SELECT toy_id FROM toys WHERE toy_name=?

U1: DELETE FROM toys WHERE toy_id=?

Query parameters are not needed for precise invalidation (the query result is needed though)

Example 2:

Example 1: Q1: SELECT toy_id FROM toys WHERE toy_name=?

Q2: SELECT toy_name FROM toys WHERE toy_id=?

Security without hurting scalability

Security Conscious Scalability Approach [SIGMOD ’06]

Data not needed for invalidation: can secure “for free” (without hurting scalability)

As a result, tradeoff has to be managed only over the remaining data

Sample experiment: methodology

California Privacy Law determined sensitive data
Non-transactional invalidation
Start with a cold cache

[Diagram: Users, CDN and DBSS, Home server; link latencies 5 ms and 100 ms]

• Scalability: max # concurrent users with acceptable response times

• Security: # templates with encrypted results

Benchmark Applications

Bookstore (TPC-W, from UW-Madison)
• Online bookseller, a standard web benchmark
• Changed the popularity of books

Auction (RUBiS, from Rice)
• Modeled after eBay

Bulletin board (RUBBoS, from Rice)
• Modeled after Slashdot

Benchmarks model popular websites

Security-Scalability Tradeoff

Q1: SELECT toy_id FROM toys WHERE toy_name=?
Q2: SELECT qty FROM toys WHERE toy_id=?
Q3: SELECT cust_name FROM customers WHERE cust_id=?

U1: DELETE FROM toys WHERE toy_id=5

Strategy    Template  Parameters  Query result  Invalidations for U1
Blind       X         X           X             All Q1, Q2, Q3
Template              X           X             All Q1, Q2
Statement                         X             All Q1, Q2 with toy_id=5
View                                            Q1 with toy_id=5, Q2 with toy_id=5

X denotes encrypted; unmarked cells are visible. Security is highest at the top of the table (Blind) and scalability is highest at the bottom (View).
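The four strategies (Blind, Template, Statement, View) can be sketched as a single function that decides which cached entries to drop for U1, given only what each strategy is allowed to see. The `Entry` type, the cache contents (including the 'Robot' row and the quantities), and the `TABLE_OF` mapping are hypothetical and exist only to make the rule executable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    template: str    # "Q1", "Q2" or "Q3"
    param: object    # bound parameter value
    result: tuple    # cached rows (a single column, for brevity)


# Assumed mapping from query template to the table it reads.
TABLE_OF = {"Q1": "toys", "Q2": "toys", "Q3": "customers"}

CACHE = [
    Entry("Q1", "GI Joe", (15,)),   # toy_id looked up by name
    Entry("Q1", "Robot", (5,)),
    Entry("Q2", 5, (3,)),           # qty looked up by toy_id
    Entry("Q2", 15, (8,)),
    Entry("Q3", 1, ("Ann",)),
]


def invalidated(strategy, cache, deleted_toy_id):
    """Entries dropped for U1: DELETE FROM toys WHERE toy_id = deleted_toy_id."""
    dropped = []
    for e in cache:
        if strategy == "blind":
            hit = True                           # nothing visible: drop everything
        elif TABLE_OF[e.template] != "toys":
            hit = False                          # template visible: other table is safe
        elif strategy == "template":
            hit = True                           # any query on toys might conflict
        elif e.template == "Q2":
            hit = e.param == deleted_toy_id      # parameter visible: compare toy_id
        elif strategy == "statement":
            hit = True                           # Q1 result hidden: assume a conflict
        else:                                    # "view": the result is visible too
            hit = deleted_toy_id in e.result
        if hit:
            dropped.append(e)
    return dropped
```

On this assumed cache, deleting toy 5 drops 5, 4, 3, and 2 entries under Blind, Template, Statement, and View respectively, which mirrors the direction of the tradeoff: the more the DBSS can see, the fewer results it has to discard.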

[Figure: Magnitude of security-scalability tradeoff; y-axis: scalability (number of concurrent users supported); x-axis: benchmark applications]

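Security Results

[Figure: Query data that can be encrypted “for free” in each benchmark (Auction, Bboard, Bookstore); legend entries as extracted: “and result” (first part lost), Parameters, Result, Nothing; bar values as extracted, mapping to bars not recoverable: 18, 6, 4, 17, 7, 12, 14, 7, 7]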

Security Results in Detail

Auction: The historical record of user bids was not exposed

Bboard: The rating users give one another based on the quality of their posting

Bookstore: Book purchase association rules discovered by the vendor (customers who purchase book A also purchase book B)

Scalability Conscious Security Approach (SCSA) to managing the tradeoff

1. Easy to get either good scalability or good security
2. SCSA presents a shortcut to manage the tradeoff

[Figure: Scalability (number of concurrent users supported, axis 0 to 900) versus security (number of query templates with encrypted results, axis 0 to 30); points: Nothing encrypted, SCSA, Everything encrypted]

Outline

Need for on-demand scalability
S3 invalidation mechanism
Security-scalability tradeoff
Reducing latency


Contributors to User Latency

[Diagram: Traditional architecture (Web server, App server, Database) versus DBSS architecture (CDN, DBSS, Database); request and response links marked high latency; a single HTTP request leads to multiple database requests]


Sample Web Application Code

function find_comments ($user_id) {
    $template := "SELECT from_id, body FROM comments WHERE to_id=?"
    $query := attach_to_template ($template, $user_id)
    $result := execute ($query)
    foreach ($row in $result)
        print (get_body ($row), get_name (get_id ($row)))
}

(N+1) queries are issued because:

• Convenient for programmers to abstract database values

• No effect in the traditional setting

Found many examples in the benchmark applications


Reducing User Latency in a DBSS Setting

Transformations to reduce number of round-trips
1. Group execution of queries: MERGING transformation
2. Overlap execution of queries: NONBLOCKING transformation

[Diagram: Web application code (procedural program with embedded SQL) → holistic transformations using src-to-src compilers → transformed code (transformed program and SQL)]
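To illustrate what a MERGING-style rewrite buys, the sketch below runs the find_comments pattern two ways against an in-memory SQLite database and counts round trips. It is a hand-written illustration, not the output of the source-to-source compiler the slides refer to. The `users` table, the sample rows, and the query behind `get_name` are assumptions, since the slides do not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO comments VALUES (2, 1, 'hi'), (3, 1, 'hello');
""")

stats = {"round_trips": 0}


def execute(sql, params):
    """Run one query; each call stands for one round trip to the database."""
    stats["round_trips"] += 1
    return conn.execute(sql, params).fetchall()


def find_comments_n_plus_1(user_id):
    """Original shape: one query for the comments, then one per comment author."""
    rows = execute(
        "SELECT from_id, body FROM comments WHERE to_id = ? ORDER BY from_id",
        (user_id,))
    out = []
    for from_id, body in rows:
        name = execute("SELECT name FROM users WHERE id = ?", (from_id,))[0][0]
        out.append((body, name))
    return out


def find_comments_merged(user_id):
    """Merged shape: a single join returns bodies and author names together."""
    return execute(
        "SELECT c.body, u.name FROM comments c JOIN users u ON u.id = c.from_id "
        "WHERE c.to_id = ? ORDER BY c.from_id",
        (user_id,))
```

With two comments addressed to user 1, the first version makes three round trips and the merged version makes one, and both return the same rows. When every round trip crosses a high-latency link, that difference dominates the response time.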