Query Processing in a WSMS
Utkarsh Srivastava
Stanford University
Joint work with Jennifer Widom, Kamesh Munagala, Rajeev Motwani, and Gene Pang
March 24, 2008 InfoLab Workshop 2
What are Web Services?
Highly standardized method of sharing data and functionality
Diagram: User/Client and Web Services
Discovery and Description: UDDI, WSDL
What are Web Services?
Highly standardized method of sharing data and functionality
Diagram: User/Client and Web Services
Communication: SOAP
Example
Result of Invocation
Query Across Web Services
WS1: symbol → company info.
WS2: symbol → stock price info.
NASDAQ
User/Client: Query, Results
Query: find info. about all companies whose stock had >10% change
Complex
Taking a Cue from DBMSs
Data
Database Management System
User/Client: Query, Results
Declarative Interface
Simple
All the complexity
Web Service Management System
WS1: symbol → company info.
WS2: symbol → stock price info.
NASDAQ
User/Client: Query, Results
Web Service Management System (WSMS)
Web Service Management System
Client and WSMS: Query and Input Data, Results (Declarative Interface)
WSMS and WS1, WS2, …, WSn: WS Invocations
Metadata Component
  WS Registration
  Schema Mapper
Query Processing Component
  Plan Selection
  Plan Execution
Profiling and Statistics Component
  Response Time Profiler
  Statistics Tracker
Query over Web Services – An Example
Credit card company wishes to send offers to those
  a) Who have a credit rating > 50, and
  b) Who have a payment history = “Good” on a prior card.
Company has at its disposal
  L: List of potential recipient SSNs
  WS1: SSN → credit rating
  WS2: SSN → card no(s)
  WS3: card no. → payment history
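Plan 1
Client → WSMS: L (SSN); WSMS → Client: Results
WS1: SSN → cr. Output: SSN, cr. Filter on cr, keep SSN.
WS2: SSN → ccn. Output: SSN, ccn.
WS3: ccn → ph. Output: SSN, ccn, ph. Filter on ph, keep SSN.
Note pipelined processing
Example data:
  WS1 (SSN, cr): (1, 60), (2, 70)
  WS2 (SSN, ccn): (1, xx1), (2, xx2)
  WS3 (ccn, ph): (xx1, B), (xx2, G)
  SSN after filter on cr: 1, 2
  SSN after filter on ph: 2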
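The pipelined flow of Plan 1 can be sketched with Python generators. This is only an illustration, not the WSMS prototype: the three dictionaries stand in for the remote web services and hold the example rows from the slide, the function names are hypothetical, and "G" is assumed to mean a "Good" payment history.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Local stand-ins for the three web services (assumption: real calls would be remote).
CREDIT_RATING: Dict[int, int] = {1: 60, 2: 70}                 # WS1: SSN -> cr
CARD_NUMBERS: Dict[int, List[str]] = {1: ["xx1"], 2: ["xx2"]}  # WS2: SSN -> ccn
PAYMENT_HISTORY: Dict[str, str] = {"xx1": "B", "xx2": "G"}     # WS3: ccn -> ph


def ws1_filter(ssns: Iterable[int], min_rating: int = 50) -> Iterator[int]:
    """Invoke WS1 per SSN; keep SSNs whose credit rating exceeds min_rating."""
    for ssn in ssns:
        if CREDIT_RATING[ssn] > min_rating:
            yield ssn


def ws2_lookup(ssns: Iterable[int]) -> Iterator[Tuple[int, str]]:
    """Invoke WS2 per SSN; emit one (SSN, ccn) tuple per card."""
    for ssn in ssns:
        for ccn in CARD_NUMBERS[ssn]:
            yield ssn, ccn


def ws3_filter(pairs: Iterable[Tuple[int, str]], wanted: str = "G") -> Iterator[int]:
    """Invoke WS3 per card; keep the SSN when the payment history matches."""
    for ssn, ccn in pairs:
        if PAYMENT_HISTORY[ccn] == wanted:
            yield ssn


def plan1(ssns: Iterable[int]) -> List[int]:
    """L -> WS1 -> WS2 -> WS3 -> Results; each tuple flows through as it is produced."""
    return list(ws3_filter(ws2_lookup(ws1_filter(ssns))))
```

Because each stage is a generator, a tuple can reach WS3 before WS1 has seen the whole list, which is the pipelining the slide points out. An SSN with several matching cards would appear more than once in this sketch.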
Simple Representation of Plan 1
L → WS1 → WS2 → WS3 → Results
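Plan 2
Client → WSMS: L (SSN); WSMS → Client: Results
WS1: SSN → cr. Output: SSN, cr. Filter on cr, keep SSN.
WS2: SSN → ccn. Output: SSN, ccn.
WS3: ccn → ph. Output: SSN, ccn, ph. Filter on ph, keep SSN.
Join on SSN
Example data:
  WS1 (SSN, cr): (1, 60), (2, 70)
  WS2 (SSN, ccn): (1, xx1), (2, xx2)
  WS3 (ccn, ph): (xx1, B), (xx2, G)
  SSN after filter on cr: 1, 2
  SSN after filter on ph: 2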
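Plan 2 can be sketched the same way: two branches run over the full list L and their results are joined on SSN. The sketch below is an assumption-laden illustration rather than the prototype's code: the dictionaries again stand in for the web services, the function names are hypothetical, "G" is assumed to mean "Good", and a thread pool stands in for concurrent remote invocations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set

# Local stand-ins for the three web services (assumption: real calls would be remote).
CREDIT_RATING: Dict[int, int] = {1: 60, 2: 70}                 # WS1: SSN -> cr
CARD_NUMBERS: Dict[int, List[str]] = {1: ["xx1"], 2: ["xx2"]}  # WS2: SSN -> ccn
PAYMENT_HISTORY: Dict[str, str] = {"xx1": "B", "xx2": "G"}     # WS3: ccn -> ph


def branch_ws1(ssns: List[int], min_rating: int = 50) -> Set[int]:
    """Branch L -> WS1: SSNs whose credit rating exceeds min_rating."""
    return {ssn for ssn in ssns if CREDIT_RATING[ssn] > min_rating}


def branch_ws2_ws3(ssns: List[int], wanted: str = "G") -> Set[int]:
    """Branch L -> WS2 -> WS3: SSNs holding a card with the wanted payment history."""
    return {
        ssn
        for ssn in ssns
        for ccn in CARD_NUMBERS[ssn]
        if PAYMENT_HISTORY[ccn] == wanted
    }


def plan2(ssns: List[int]) -> List[int]:
    """Run both branches on all of L, then join their outputs on SSN."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(branch_ws1, ssns)
        right = pool.submit(branch_ws2_ws3, ssns)
        return sorted(left.result() & right.result())
```

Here WS2 is invoked for every SSN in L, not only for those that pass the credit-rating filter, which is the difference from Plan 1 that the quiz slide asks about.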
Simple Representation of Plan 2
L → WS1 → Results
L → WS2 → WS3 → Results
Quiz
Plan 1: L → WS1 → WS2 → WS3 → Results
Plan 2: L → WS1 → Results; L → WS2 → WS3 → Results
In Plan 1, WS2 has to process only filtered SSNs
In Plan 2, WS2 has to process all SSNs
Which plan is better?
Cost Metric: Steady-state throughput
Query Planning Recap
Possible plans P1, …, Pn
Statistics S
Cost Metric cost(Pi, S)
Want to find least-cost plan
Class of Queries Considered
“Select-Project-Join” queries over input data and set of web services
Precedence constraints
  Input for WSi may be provided by the output of WSj
  e.g., WS2: SSN → ccn and WS3: ccn → ph
  Precedence constraints impose a DAG.
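The precedence constraints from the example can be written down as a small DAG and checked with Python's standard `graphlib` module (Python 3.9+). This is a minimal sketch: the helper names are hypothetical, and it only handles linear orders, whereas plans in the talk may themselves be DAGs.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each service maps to the services whose output it needs.
# WS3 (ccn -> ph) needs the card numbers produced by WS2 (SSN -> ccn).
CONSTRAINTS: Dict[str, Set[str]] = {"WS1": set(), "WS2": set(), "WS3": {"WS2"}}


def some_valid_order(constraints: Dict[str, Set[str]]) -> List[str]:
    """Return one linear order that respects the constraints (raises CycleError if none)."""
    return list(TopologicalSorter(constraints).static_order())


def respects(order: List[str], constraints: Dict[str, Set[str]]) -> bool:
    """True if every service appears after all of its required predecessors."""
    position = {ws: i for i, ws in enumerate(order)}
    return all(
        position[pred] < position[ws]
        for ws, preds in constraints.items()
        for pred in preds
    )
```

For example, `respects(["WS1", "WS2", "WS3"], CONSTRAINTS)` holds, while an order that places WS3 before WS2 does not. Constraints that contain a cycle cannot be satisfied by any plan, and `some_valid_order` raises `CycleError` in that case.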
Statistics: Response Time
ci: per-tuple response time of WSi from client
Assume independent response times
Diagram: Client sends SSN to WS1 (SSN → cr) and receives cr
Statistics: Selectivity
si: selectivity of WSi
  Average number of output tuples per input tuple to WSi
a) WS1: SSN → cr
  If 90% individuals have cr > 50, s1 = 0.9
b) WS2: SSN → ccn
  If on average each SSN holds 2 credit cards, s2 = 2
Assume independent selectivities
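The definition of selectivity as an average can be illustrated in a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, and the observation lists are made-up samples chosen to reproduce the two cases on the slide, not measurements.

```python
from typing import Sequence


def selectivity(outputs_per_input: Sequence[int]) -> float:
    """Average number of output tuples produced per input tuple."""
    if not outputs_per_input:
        raise ValueError("need at least one observed invocation")
    return sum(outputs_per_input) / len(outputs_per_input)


# WS1 followed by the cr > 50 filter: 9 of 10 SSNs survive (1 tuple out), 1 does not.
s1 = selectivity([1] * 9 + [0])

# WS2: a sample in which each SSN holds 2 credit cards on average.
s2 = selectivity([2, 2, 1, 3])
```

With these samples `s1` is 0.9 (a selective service) and `s2` is 2.0 (a proliferative service, since it emits more tuples than it receives).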
Bottleneck Cost Metric
Lunch Buffet: Dish 1, Dish 2, Dish 3, Dish 4
Overall per-item processing time = Response time of slowest or bottleneck stage in pipeline
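Cost Expression for Plan P
$R_i(P)$: Predecessors of WSi in plan P
Fraction of input tuples seen by WSi = $\prod_{j : WS_j \in R_i(P)} s_j$
Response time per original input tuple at WSi = $c_i \prod_{j : WS_j \in R_i(P)} s_j$
$cost(P) = \max_{1 \le i \le n} \left( c_i \prod_{j : WS_j \in R_i(P)} s_j \right)$
Assumption: WSMS cost is not the bottleneck
Contrast with sum cost metric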
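The bottleneck metric, and the sum metric it is contrasted with, can be computed directly from the response times, selectivities, and predecessor sets. The sketch below is illustrative only: the function names and every number are hypothetical, chosen so the arithmetic is easy to follow, and the predecessor sets encode Plan 1 and Plan 2 from the earlier slides.

```python
from math import prod
from typing import Dict, Set


def bottleneck_cost(c: Dict[str, float], s: Dict[str, float],
                    predecessors: Dict[str, Set[str]]) -> float:
    """Max over services of c_i times the product of its predecessors' selectivities."""
    return max(c[ws] * prod(s[p] for p in predecessors[ws]) for ws in c)


def sum_cost(c: Dict[str, float], s: Dict[str, float],
             predecessors: Dict[str, Set[str]]) -> float:
    """Same per-service terms, added up instead of maximized."""
    return sum(c[ws] * prod(s[p] for p in predecessors[ws]) for ws in c)


# Hypothetical statistics (not from the talk).
C = {"WS1": 2.0, "WS2": 10.0, "WS3": 4.0}   # per-tuple response times
S = {"WS1": 0.5, "WS2": 2.0, "WS3": 0.5}    # selectivities

# Plan 1: L -> WS1 -> WS2 -> WS3. Plan 2: WS1 in parallel with WS2 -> WS3.
PLAN1 = {"WS1": set(), "WS2": {"WS1"}, "WS3": {"WS1", "WS2"}}
PLAN2 = {"WS1": set(), "WS2": set(), "WS3": {"WS2"}}

plan1_cost = bottleneck_cost(C, S, PLAN1)   # terms: 2.0, 5.0, 4.0
plan2_cost = bottleneck_cost(C, S, PLAN2)   # terms: 2.0, 10.0, 8.0
```

With these made-up numbers the bottleneck is WS2 in both plans, and Plan 1 has the lower cost because WS2 only sees the fraction of tuples that WS1 lets through.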
Problem Statement
Input:
  Set of web services WS1, …, WSn
  Response times c1, …, cn
  Selectivities s1, …, sn
  Precedence constraints among web services
Output:
  Arrange web services into a plan P
  P respects all precedence constraints
  cost(P) by the bottleneck metric is minimized
No Precedence Constraints
All selectivities ≤ 1
  Theorem: Optimal to linearly order by increasing ci
General case
  Diagram: selective web services in increasing cost order, followed by proliferative web services, then a local join at the WSMS, then Results
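The ordering rule for the case where all selectivities are at most 1 can be tried out on a small instance. This sketch uses hypothetical names and made-up statistics, and the brute-force search only compares linear chains, so it is a sanity check on one example rather than a proof of the theorem.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Service = Tuple[str, float, float]  # (name, response time c, selectivity s)


def chain_bottleneck(chain: Sequence[Service]) -> float:
    """Bottleneck cost of a linear chain: max of c_i times the fraction of tuples reaching i."""
    fraction = 1.0   # fraction of original input tuples reaching the current service
    worst = 0.0
    for _name, c, s in chain:
        worst = max(worst, fraction * c)
        fraction *= s
    return worst


def order_by_cost(services: Sequence[Service]) -> List[Service]:
    """Order services by increasing response time, as the theorem suggests."""
    return sorted(services, key=lambda ws: ws[1])


# Hypothetical selective services (all selectivities <= 1).
SERVICES: List[Service] = [("A", 5.0, 0.5), ("B", 1.0, 0.5), ("C", 3.0, 0.25)]

ordered = order_by_cost(SERVICES)
best_by_search = min(chain_bottleneck(p) for p in permutations(SERVICES))
```

On this instance the order is B, C, A with bottleneck cost 1.5, and no other linear order of the three services does better.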
With Precedence Constraints
Sum cost metric
  Hard to approximate to within a factor O(n^θ)
Bottleneck cost metric
  Surprisingly, solvable in polynomial time
  Developed an O(n^5) algorithm
    Adds one WS at a time to the plan
    WS to be added is chosen by solving a linear program.
Isn’t this the same as …?
Web Service Composition
  Targeted towards workflow-oriented applications
  Don’t give provably optimal strategies
Parallel and Distributed Query Optimization
  Freedom to move query operators around
  Much larger space of execution plans
Data Integration, Mediators
  Integrate general sources of data
  Primarily optimize the cost at the integration system itself
Implementation
Building a prototype general-purpose WSMS
  Written in Java
  Uses Apache Axis, an open-source implementation of SOAP
  Implements query planning and execution
Future Directions
Monetary cost of invoking web services
  Optimize combination of response time and cost
Variations in web service response times
  Depends on provisioning, load, network conditions
  Consider adaptive plans and/or robust plans
Statistics Collection
  Self-tuning histograms are relevant
Extension to optimizing workflows
Conclusion
User/Client: Query, Results
Web Services
Questions?
http://infolab.stanford.edu/wsms