Dynamic Memory Allocation (also see pointers lectures) -L. Grewe.
CS6320 – Performance. L. Grewe.
Number of requests a website receives is unpredictable
CNN, NY Times, ABC News unavailable from 9-10 AM (Eastern Time)
Content providers’ dilemma: how many resources to provision?
Need on-demand scalability
[Figure: CNN.com page views/day (in millions), scale 0 to 150, comparing a usual day with 9/11]
Content Delivery Network (CDN) Solution
Source: http://www.tcsa.org/lisa2001/cnn.txt http://www.akamai.com/en/html/about/press/press479.html
[Figure: CNN.com page views/day (in millions), scale 0 to 800, for Normal, 12-Sep-01, and Election day (Nov 2), 2004; bars annotated with page sizes 50k and 1.2k]
Used Akamai on Election day
Page was 1.2k instead of 50k on 12 Sep 01
Typical Web-Site Architecture
[Diagram: Users send a Request to the home server (Web Server, App Server, DB); the server executes code, accesses the DB, and returns a Response]
CDN Architecture
[Diagram labels: Users, CDN nodes, Internet core, Content providers]
CDNs excel at delivering static content.
Advantages of CDNs
Large infrastructure handles load spikes
Clients charged on a per-usage basis
• no need to guess what resources to provision
Moves data closer to end-users
• decreases latency and increases throughput
CDN Application Services
CDNs can also run applications
but for data-intensive dynamic applications…
database server becomes the bottleneck!
[Diagram labels: Users, Internet, DB]
Methods to scale the database component
In-house database scalability: [DBCache, DBProxy, MTCache, NEC Cache Portal]
• Must provision for peak load
Database outsourcing: Database as a service [Hacigumus+ ICDE ’02, SIGMOD ’02]
• Have to cede control of data
Database Scalability Service (DBSS): Shared infrastructure that caches applications’ data [INRIA/LIP6, CIDR ’05, SIGMOD ’06, ICDE ’07]
S3 Database Scalability Service
CDN-like proxy nodes cache results of database queries
• reduces load on central database servers
All database updates sent to central server
• clients don’t cede ownership of their data
Uses publish/subscribe system to maintain data consistency
• avoids additional load at the central server
Content provider may encrypt database requests/responses to protect sensitive data
[Diagram labels: Internet, Database Scalability Service (DBSS), home server databases, users]
[Diagram labels: Content Delivery Network, Database Scalability Service (DBSS), home server databases, users, Internet]
[Diagram labels: Web and application servers, Database Scalability Service (DBSS), home server databases, client apps, Internet]
Outline
Need for on-demand scalability
S3 invalidation mechanism
Security-scalability tradeoff
Reducing latency
Addressing consistency
TTL is wasteful:
• Often refresh cached data unnecessarily (workloads dominated by reads)
• Must set TTL=0 for strong consistency!
Solution: update or invalidate cached data only when affected by updates
• Naïve approach: home organizations notify proxy servers of relevant updates; not scalable
Our approach:
Fully-distributed, proxy-to-proxy update notification mechanism
Distributed Consistency Mechanism
[Diagram labels: users, update, proxy node, update notification, multicast environment]
• Distributed app-level multicast environment, e.g., Scribe
• Forward all updates to backend home servers
Configuring Multicast Channels
Key observation: Web applications typically interact with DB via a small, fixed set of query/update templates (usually 10-100)
Example:
SELECT qty FROM inv WHERE id = ?
UPDATE inv SET qty = ? WHERE id = ?
Templates: natural way to configure channels
Options: Channel-by-query or Channel-by-update
Channel-by-Query Option
One channel per query template Q: C(Q)
Few subscriptions per cached result
Many invalidation notifications per update
Begin caching result(s) of query template Q
• Subscribe to C(Q)
Evict only query result for Q
• Unsubscribe from C(Q)
Issue update
• Determine which query templates Q1, …, Qn affected; send notification on each C(Qi)
Conflicts determined lazily (upon update)
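The channel-by-query steps can be sketched in a few lines of Python. This is only an illustrative model, not the S3 implementation: a shared dictionary stands in for the multicast layer (Scribe in the slides), and the template names and the AFFECTED_QUERIES table are hypothetical.

```python
from collections import defaultdict

# Hypothetical conflict table (not from the slides): update template -> query
# templates whose cached results it may affect.
AFFECTED_QUERIES = {"U_set_qty": ["Q_qty_by_id", "Q_low_stock"]}


class QueryChannelProxy:
    """Proxy node that subscribes to one channel C(Q) per cached query template."""

    def __init__(self, channels):
        self.channels = channels  # shared map: channel name -> subscribed proxies
        self.cache = {}           # query template -> {parameters: result}

    def cache_result(self, q, params, result):
        if q not in self.cache:
            self.cache[q] = {}
            self.channels[q].add(self)      # subscribe to C(Q)
        self.cache[q][params] = result

    def evict(self, q):
        if self.cache.pop(q, None) is not None:
            self.channels[q].discard(self)  # unsubscribe from C(Q)

    def issue_update(self, u, params):
        """Notify every affected channel; return the number of channels used.

        params is unused here because this sketch works at template level only.
        """
        affected = AFFECTED_QUERIES.get(u, [])
        for q in affected:                  # conflicts determined lazily, at update time
            for proxy in list(self.channels[q]):
                proxy.evict(q)              # receiver drops its results for Q
        return len(affected)


channels = defaultdict(set)  # in-process stand-in for the multicast layer
reader, writer = QueryChannelProxy(channels), QueryChannelProxy(channels)
reader.cache_result("Q_qty_by_id", (29,), 7)
sent = writer.issue_update("U_set_qty", (3, 29))
print(sent, reader.cache)
```

The reader holds a single subscription for its cached template, while the writer has to publish on every channel its update template may affect, which is the trade-off the slide names.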
Channel-by-Update Option
One channel per update template U: C(U)
Many subscriptions per cached result
Few invalidation notifications per update
Begin caching result(s) of query template Q
• Determine which update templates U1, …, Un apply; subscribe to each C(Ui)
Evict only query result for Q
• Unsubscribe from all C(Ui) above
Issue update using template U
• Send notification on C(U)
Conflicts determined eagerly (when caching Q)
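A comparable Python sketch of the channel-by-update steps follows. Again this is an assumption-laden model rather than the prototype's code: the shared dictionary replaces the multicast layer, and AFFECTING_UPDATES and the template names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical conflict table (not from the slides): query template -> update
# templates that may affect its cached results.
AFFECTING_UPDATES = {"Q_qty_by_id": ["U_set_qty", "U_delete_item"]}


class UpdateChannelProxy:
    """Proxy node that subscribes to C(U) for every update template affecting Q."""

    def __init__(self, channels):
        self.channels = channels  # shared map: channel name -> subscribed proxies
        self.cache = {}           # query template -> {parameters: result}

    def cache_result(self, q, params, result):
        if q not in self.cache:
            self.cache[q] = {}
            for u in AFFECTING_UPDATES.get(q, []):  # conflicts determined eagerly
                self.channels[u].add(self)          # subscribe to each C(Ui)
        self.cache[q][params] = result

    def evict(self, q):
        if self.cache.pop(q, None) is None:
            return
        for u in AFFECTING_UPDATES.get(q, []):
            # Keep the subscription if another cached template still needs C(u).
            still_needed = any(u in AFFECTING_UPDATES.get(o, []) for o in self.cache)
            if not still_needed:
                self.channels[u].discard(self)

    def on_notification(self, u):
        for q in [q for q in self.cache if u in AFFECTING_UPDATES.get(q, [])]:
            self.evict(q)

    def issue_update(self, u):
        """Send a single notification on C(U); return the number of channels used."""
        for proxy in list(self.channels[u]):
            proxy.on_notification(u)
        return 1


channels = defaultdict(set)  # in-process stand-in for the multicast layer
reader, writer = UpdateChannelProxy(channels), UpdateChannelProxy(channels)
reader.cache_result("Q_qty_by_id", (29,), 7)
subscriptions = sum(reader in subs for subs in channels.values())
sent = writer.issue_update("U_set_qty")
print(subscriptions, sent, reader.cache)
```

Here the cost moves to the caching side: the reader needs two subscriptions for one cached template, and the writer publishes on exactly one channel.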
Parameter-Specific Channels
Optimization: consider parameter bindings supplied at runtime … for example:
Q5: SELECT qty FROM inv WHERE id = ?
• When issued with id = 29, create extra parameter-specific channel C(5, 29)
• Subscribe to both C(5) and C(5, 29)
Upon update:
• If update affects a single item with id = X, send notification on channel C(5, X)
  Saves work if X ≠ 29
• Updates affecting multiple items sent to C(5)
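The routing rule for parameter-specific channels can be sketched as follows. This is a minimal model under stated assumptions: channels are tuples such as (5,) and (5, 29), a shared dictionary replaces the multicast layer, unsubscription is left out for brevity, and the class and function names are hypothetical.

```python
from collections import defaultdict

QUERY_ID = 5  # Q5: SELECT qty FROM inv WHERE id = ?


class ParamChannelProxy:
    """Caches Q5 results and subscribes to both C(5) and C(5, id)."""

    def __init__(self, channels):
        self.channels = channels  # shared map: channel tuple -> subscribed proxies
        self.cache = {}           # item id -> cached qty
        self.received = 0         # notifications this proxy had to process

    def cache_qty(self, item_id, qty):
        self.cache[item_id] = qty
        self.channels[(QUERY_ID,)].add(self)          # template-level channel C(5)
        self.channels[(QUERY_ID, item_id)].add(self)  # parameter-specific C(5, id)

    def on_notification(self, channel):
        self.received += 1
        if len(channel) == 2:
            self.cache.pop(channel[1], None)  # single-item update: drop one entry
        else:
            self.cache.clear()                # multi-item update: drop all Q5 results


def notify_update(channels, item_ids):
    """Route an update: one affected item goes to C(5, X), several go to C(5)."""
    channel = (QUERY_ID, item_ids[0]) if len(item_ids) == 1 else (QUERY_ID,)
    for proxy in list(channels[channel]):
        proxy.on_notification(channel)


channels = defaultdict(set)
proxy = ParamChannelProxy(channels)
proxy.cache_qty(29, 7)
notify_update(channels, [30])      # X = 30: nobody listens on C(5, 30)
untouched = dict(proxy.cache)
notify_update(channels, [29])      # X = 29: delivered on C(5, 29)
print(untouched, proxy.received, proxy.cache)
```

The update to item 30 never reaches the proxy, which is the saved work the slide refers to for X ≠ 29.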
S3 Prototype
Tomcat as proxy web server/servlet container
Proxy database cache written in Java
Queries: access cached data when possible
• Cache JDBC query results (i.e., materialized views)
• Index results by JDBC query representation
MySQL4 as back-end database
Updates: sent to back-end database
Invalidation notifications delivered via Scribe
Experiments on Emulab (Utah) – Thanks!
Benchmark Applications
Bookstore (TPC-W, from UW-Madison)
• Online bookseller, a standard web benchmark
• Changed the popularity of books
Auction (RUBiS, from Rice)
• Modeled after eBay
Bulletin board (RUBBoS, from Rice)
• Modeled after Slashdot
Benchmarks model popular websites
Impact of Cooperative Caching
[Figure: throughput (WIPS), scale 0 to 250, for bookstore browsing mix, bookstore shopping mix, and auction; series: NoProxy, NoCache, SimpleCache, Ferdinand]
Outline
Need for on-demand scalability
S3 invalidation mechanism
Security-scalability tradeoff
Reducing latency
Guaranteeing security in a DBSS setting
Security-Scalability tradeoff in the DBSS setting
Limit ability to observe an application’s data by:
• DBSS administrator
• Unauthorized application through the DBSS
Analyzing the code helps in managing this tradeoff
A simple solution for guaranteeing security
Outsource database scalability
• Home server: master copies of all data; handles updates directly
No query execution on the DBSS
• DBSS caches query results (read-only); kept consistent by invalidation
All data passing through the DBSS can be encrypted:
Query, Update, Query results
A Simple Example
Home server database: toys (toy_id, toy_name) with rows (11, Barbie) and (15, GI Joe)
DBSS cache: initially empty
Q1: SELECT toy_id FROM toys WHERE toy_name='GI Joe'
• DBSS caches the result of Q1: toy_id=15
U1: DELETE FROM toys WHERE toy_id=5
• Nothing is encrypted: no invalidations
• Results are encrypted: invalidate Q1
More encryption leads to more invalidations
Challenge: providing scalability while guaranteeing security
Application faces a dilemma in what data to encrypt (secure)
When updates occur, DBSS needs to invalidate
• More encryption: conservative invalidation (security)
• Less encryption: precise invalidation (scalability)
Security-scalability tradeoff
Opportunity for managing the tradeoff
Not all data is equally sensitive
Data sensitivity:
• Extremely sensitive (credit card information): secure at all costs
• Moderately sensitive (inventory records, customer records): care, but worried about scalability impact
• Completely insensitive (bestsellers list): don’t care
But for most data, nontrivial to assess:
1. Data sensitivity
2. Scalability impact of securing the data
Key Insight: arbitrary queries and updates not possible
function get_toy_id ($toy_name) {
$template:="SELECT toy_id FROM toys
WHERE toy_name=?";
$query:=attach_to_template ($template, $toy_name);
execute ($query);
…
}
Given templates: can statically identify data not needed for precise invalidation
Data not useful for invalidation: examples
Example 1:
Q1: SELECT toy_id FROM toys WHERE toy_name=?
Q2: SELECT toy_name FROM toys WHERE toy_id=?
No data is needed for precise invalidation
Example 2:
Q1: SELECT toy_id FROM toys WHERE toy_name=?
U1: DELETE FROM toys WHERE toy_id=?
Query parameters are not needed for precise invalidation (the query result is needed though)
Security without hurting scalability
Security Conscious Scalability Approach [SIGMOD ’06]
Data not needed for invalidation
• Can secure “for free” (without hurting scalability)
As a result,
• Tradeoff has to be managed only over remaining data
Sample experiment: methodology
California Privacy Law determined sensitive data
Non-transactional invalidation
Start with a cold cache
[Diagram labels: Users, CDN and DBSS, Home server; link latencies 5 ms and 100 ms]
• Scalability: max # concurrent users with acceptable response times
• Security: # templates with encrypted results
Security-Scalability Tradeoff
Q1: SELECT toy_id FROM toys WHERE toy_name=?
Q2: SELECT qty FROM toys WHERE toy_id=?
Q3: SELECT cust_name FROM customers WHERE cust_id=?
U1: DELETE FROM toys WHERE toy_id=5
Invalidations caused by U1 (x denotes encrypted; other entries are visible):
Strategy  | Template | Parameters | Query result | Invalidations
Blind     | x        | x          | x            | All Q1, Q2, Q3
Template  | visible  | x          | x            | All Q1, Q2
Statement | visible  | visible    | x            | All Q1; Q2 with toy_id=5
View      | visible  | visible    | visible      | Q1 with toy_id=5; Q2 with toy_id=5
Scalability improves from Blind to View; security improves from View to Blind
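The four strategies can be compared on a toy cache with a short Python sketch. It is a simplified model for this one update (U1 deleting toy_id=5), not the authors' invalidation logic; the cache contents, the "Lego" toy, and all quantities are made up for illustration.

```python
from dataclasses import dataclass

STRATEGIES = ("blind", "template", "statement", "view")
TOY_TEMPLATES = {"Q1", "Q2"}  # templates that read the toys table


@dataclass(frozen=True)
class CachedQuery:
    template: str  # "Q1", "Q2" or "Q3"
    params: tuple  # bound parameters
    result: tuple  # cached result values


def must_invalidate(strategy, entry, deleted_toy_id):
    """Decide whether a DELETE on toys.toy_id forces eviction of a cached entry."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "blind":
        return True                       # nothing visible: evict everything
    if entry.template not in TOY_TEMPLATES:
        return False                      # template visible: Q3 never conflicts
    if strategy == "template":
        return True                       # parameters hidden: evict all Q1, Q2
    if entry.template == "Q2":            # parameters visible: Q2 is keyed by toy_id
        return entry.params[0] == deleted_toy_id
    if strategy == "statement":
        return True                       # Q1 result hidden: toy_id cannot be checked
    return deleted_toy_id in entry.result  # view: inspect the Q1 result itself


# Made-up cache contents for illustration.
CACHE = [
    CachedQuery("Q1", ("GI Joe",), (15,)),
    CachedQuery("Q1", ("Lego",), (5,)),
    CachedQuery("Q2", (5,), (3,)),
    CachedQuery("Q2", (15,), (8,)),
    CachedQuery("Q3", (7,), ("Ann",)),
]
counts = {s: sum(must_invalidate(s, e, 5) for e in CACHE) for s in STRATEGIES}
print(counts)
```

On this cache the number of evicted entries falls from 5 (blind) to 4, 3, and 2 (view), mirroring the rows of the table.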
Magnitude of Security-Scalability tradeoff
[Figure: scalability (number of concurrent users supported) for each of the benchmark applications]
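Security Results
[Figure: query data that can be encrypted “for free” in the Auction, Bboard, and Bookstore applications, by category: parameters and result, parameters, result, nothing. Bar values in extraction order, not assignable to bars: 18, 6, 4, 17, 7, 12, 14, 7, 7]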
Security Results in Detail
Auction: The historical record of user bids was not exposed
Bboard: The rating users give one another based on the quality of their posting
Bookstore: Book purchase association rules discovered by the vendor: customers who purchase book A also purchase book B
Scalability Conscious Security Approach (SCSA) to managing the tradeoff
1. Easy to either get good scalability or good security
2. SCSA presents a shortcut to manage the tradeoff
[Figure: scalability (number of concurrent users supported, scale 0 to 900) against security (number of query templates with encrypted results, scale 0 to 30); labeled points: Nothing encrypted, SCSA, Everything encrypted]
Outline
Need for on-demand scalability
S3 invalidation mechanism
Security-scalability tradeoff
Reducing latency
Contributors to User Latency
Traditional architecture: Web server, App server, Database
DBSS architecture: CDN, DBSS, Database
[Diagram labels: Request, high latency; Response, high latency; high latency]
A single HTTP request, multiple database requests
Sample Web Application Code
function find_comments ($user_id) {
$template:="SELECT from_id, body FROM comments
WHERE to_id=?"
$query:=attach_to_template ($template, $user_id)
$result:=execute ($query)
foreach ($row in $result)
print (get_body ($row), get_name (get_id ($row)))
}
(N+1) queries are issued because:
• Convenient for programmers to abstract database values
• No effect in the traditional setting
Found many examples in the benchmark applications
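The N+1 pattern is easy to reproduce with Python's built-in sqlite3 module. The sketch below is an assumption-based rendering of the pseudocode: the users table, the sample rows, and the CountingDB wrapper are hypothetical, and each executed statement is counted as one database round trip.

```python
import sqlite3


def make_db():
    """Build a small in-memory database; the schema and rows are made up."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO comments VALUES (2, 1, 'hi'), (3, 1, 'hello');
    """)
    return db


class CountingDB:
    """Wraps a connection and counts one round trip per executed query."""

    def __init__(self, db):
        self.db = db
        self.round_trips = 0

    def query(self, sql, params=()):
        self.round_trips += 1
        return self.db.execute(sql, params).fetchall()


def get_name(cdb, user_id):
    """Hides a database lookup behind a convenient helper, as in the slide."""
    return cdb.query("SELECT name FROM users WHERE id=?", (user_id,))[0][0]


def find_comments(cdb, user_id):
    rows = cdb.query(
        "SELECT from_id, body FROM comments WHERE to_id=? ORDER BY from_id",
        (user_id,),
    )
    # One extra query per returned row: N rows cost N+1 round trips in total.
    return [(body, get_name(cdb, from_id)) for from_id, body in rows]


cdb = CountingDB(make_db())
comments = find_comments(cdb, 1)
print(comments, cdb.round_trips)
```

With two comments the function makes three round trips, which costs little next to a local database but adds up when every round trip crosses a high-latency link.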
Reducing User Latency in a DBSS Setting
Transformations to reduce number of round-trips
1. Group execution of queries: MERGING transformation
2. Overlap execution of queries: NONBLOCKING transformation
[Diagram: procedural program with embedded SQL (web application code), holistic transformations using src-to-src compilers, transformed program and SQL (transformed code)]
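Both transformations can be illustrated by hand in Python, although the slides describe them as automated source-to-source rewrites. The sketch below assumes a hypothetical comments/users schema: the merged version folds the per-row name lookup into one JOIN, and the nonblocking version overlaps lookups with a thread pool, using time.sleep as a stand-in for a high-latency round trip.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor

NAMES = {2: "bob", 3: "cy"}  # made-up data served by the simulated remote lookup


def make_db():
    """Build a small in-memory database; the schema and rows are made up."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO comments VALUES (2, 1, 'hi'), (3, 1, 'hello');
    """)
    return db


def find_comments_merged(db, user_id):
    """MERGING: one JOIN replaces the comment query plus N name lookups."""
    sql = (
        "SELECT c.body, u.name FROM comments c "
        "JOIN users u ON u.id = c.from_id "
        "WHERE c.to_id = ? ORDER BY c.from_id"
    )
    return db.execute(sql, (user_id,)).fetchall()


def remote_name_lookup(from_id, latency=0.05):
    """Simulated remote query; the sleep stands in for network latency."""
    time.sleep(latency)
    return NAMES[from_id]


def names_overlapped(from_ids):
    """NONBLOCKING: issue all lookups at once so their latencies overlap."""
    with ThreadPoolExecutor(max_workers=max(1, len(from_ids))) as pool:
        return list(pool.map(remote_name_lookup, from_ids))  # keeps input order


merged = find_comments_merged(make_db(), 1)
overlapped = names_overlapped([2, 3])
print(merged, overlapped)
```

Merging removes round trips outright, while overlapping keeps the same number of queries but pays their latency concurrently; which one applies depends on whether the queries can be combined into a single statement.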