Post on 14-Dec-2015
Lightweight Service Oriented Parallelism
Paul Roe
Queensland University of Technology (QUT)
p.roe@qut.edu.au
2
QUT
• Queensland University of Technology (QUT)
• One of the largest universities in Australia: 40,000+ students (undergraduate, postgraduate, 10% international)
• Applied emphasis, strong links with industry
• Motto: “A university for the real world”
• Faculty of IT: 4000 students, 20% international
Brisbane
3
My Background
• Academic at QUT for 10 years
• I am a computer scientist with a background in
– Programming languages
– Distributed computing
• Practical / applied emphasis
• I lead a small research group interested in grid computing and eScience
4
Two Parts
• Introduction to web services and service orientation
• Lightweight Service Oriented Parallelism
5
Web services
6
Web services (WS)
• Computer to computer messaging using XML
• Typically SOAP for messaging protocol with WSDL (Web Services Description Language)
– Standard and platform neutral
• Designed for eCommerce and enterprise application integration
• Similarities with MPI
– Message passing
– Support for different message exchange patterns
• Web service principles and technologies are evolving
– Originally SOAP was for lightweight RPC between objects
– SOAP and WSDL support RPC and messaging encoding and styles
– Now strong move to XML centric messaging
7
Why Not CORBA, DCOM, Java RMI etc.?
• Distributed object models try to scale local OO model
– Ok for a LAN
– Breaks for Internet
• Too complex
– Assume an object model, virtual machine etc.
– Large investment for little return
• Poor interoperability
– WS designed for interoperability – primary goal
• Designed for local area networks rather than Internet
• Not standards based (except CORBA)
• Problems bootstrapping, ‘all or nothing’ approach
• Other attempts e.g. EDI
– Problem: fixed, not extensible
8
XML Basics
• XML is the basis for web services
• XML is a platform neutral data language
• XML is three things:
1. Family of specifications e.g. XSLT, XPath, …
2. Serialisation format (XML 1.0 with tags etc.)
3. Infoset: Model for data
• XML can be described by XML schema
9
Infoset
• Infoset is a model of XML
– Essence of XML
• XML is no longer just a syntax
• This is important – opens the way to other representations of XML
XML is very inefficient; it’s verbose, there’s lots of angle brackets, everything’s a Unicode string, there’s no binary format; you’ve always got to parse it first, and that’s why web services are slow …
Wrong!
10
SOAP
• Provides two key features for XML based messaging
– Separation of message header vs payload data (envelope with header and body)
– Standard way to report faults
• No further evolution of SOAP necessary!
• Extensible header mechanism supports modular and composable advanced services e.g. security, transactions and reliability
– Vital feature
11
SOAP
<envelope>
<header>: Message context
<body>: Message payload, data
<fault>: Soap error (optional)
12
SOAP Extensible Headers
<soap:Envelope
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <t:Transaction xmlns:t="some-URI" soap:mustUnderstand="1">5</t:Transaction>
  </soap:Header>
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/">
      <a>1</a>
      <b>2</b>
    </Add>
  </soap:Body>
</soap:Envelope>
Extensible header info: can be optional or mandatory
SOAP body, message payload
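The "optional or mandatory" distinction can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: it assumes the SOAP 1.1 rule that a receiver must reject a message carrying a header block marked mustUnderstand="1" that it does not support. The function name and the `understood` set are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# The message from the slide, trimmed to the namespaces it actually uses.
MESSAGE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <t:Transaction xmlns:t="some-URI" soap:mustUnderstand="1">5</t:Transaction>
  </soap:Header>
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/"><a>1</a><b>2</b></Add>
  </soap:Body>
</soap:Envelope>"""


def unsupported_mandatory_headers(envelope_xml, understood):
    """Return the tags of mandatory header blocks the receiver does not understand."""
    envelope = ET.fromstring(envelope_xml)
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    if header is None:
        return []  # no headers at all, nothing can be mandatory
    flag = f"{{{SOAP_NS}}}mustUnderstand"
    return [block.tag for block in header
            if block.get(flag) == "1" and block.tag not in understood]
```

A receiver that gets a non-empty list back would answer with a SOAP fault instead of processing the body; one that lists `{some-URI}Transaction` in `understood` can go ahead.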
13
WSDL (1.1)
<definitions>: root element
<types>: What data types will be transmitted? (Typically XML Schema)
<message>: What messages will be transmitted?
<portType>: What operations (functions) will be supported?
<binding>: How will messages be transmitted + SOAP specifics, encoding etc.
<service>: Where is the service located?
Abstract (c.f. interface): <types>, <message>, <portType>
Concrete: <binding>, <service>
WSDL is an XML document. Elements can be split across multiple files.
14
Web service invocation: The big picture
[Figure: the client program (sender) calls a web service proxy, which serialises the message and sends the XML message on the wire in SOAP format; on the server (receiver) a web service stub deserialises the message for the server program. The WSDL doc (contains/refs XML schema) describes the XML document, and the proxy and stub are generated from it using developer tools e.g. Visual Studio or Eclipse.]
15
Web Services Landscape
[Figure: layered stack, top to bottom]
Composable service assurances: Security, Reliable Messaging, Transactions
Description: WSDL, XML Schema, WS-Policy; Discovery: UDDI, WS-Discovery, MetaDataExchange
Messaging: XML, SOAP, WS-Addressing, MTOM
Transport: HTTP, HTTPS, SMTP, TCP, …
16
Service Orientation
17
Service Orientation (SO)
• Architectural view of software and systems inspired by web services
• Much hype!
• “Service-oriented development focuses on systems that are built from a set of autonomous services.” Don Box
• No flat space containing a sea of objects
• There are four tenets:
– Boundaries are explicit
– Services are autonomous
– Services share schema and contract, not class
– Service compatibility is determined based on policy
• Key idea: services are loosely coupled and autonomous
– Web services are one possible implementation
18
SO vs Distributed Objects
• CORBA, DCOM, Java RMI etc. try to present a uniform view of the world
– Common object model
– Set of objects all living in the same space
– Ok for a LAN: single admin domain, reliable, simple security, homogeneous
• Doesn’t work on the internet
• Can’t do business by dictation: you must use CORBA / RMI / DCOM etc.
• Increasingly doesn’t work in LAN
– Move to more structure, local firewalls and tiered admin within organisations
• Déjà vu?
– C.f. TCP sockets (no shared implementation)
• Policy => metadata
19
Parallelism
20
Motivation and Ideas
• Use SOAP instead of MPI
– Interoperability
– Leverage higher level WS specs e.g. security
• Service orientation decouples clients and servers, producers and consumers
• Simple producer consumer models of parallelism can benefit from SO
– E.g. when producers are legacy applications and consumers are modern e.g. WS enabled apps or modern scripts
21
Two Simple Models of Parallelism
• (Both producer consumer)
• Futures (Task-result)
– Lisp futures or Cilk etc.
• Linda
– Tuple space, JavaSpaces etc.
22
Futures
• Idea: spawn function calls asynchronously
– handle = Future (Add(1,2))
– Create a task to perform Add(1,2)
– Can interrogate the handle to enquire on result
• Web services can naturally express this form of communication
[Figure: the client sends two kinds of request to the cluster: handle Add(int,int) and int+ getAdd(handle)]
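The handle-and-poll pattern can be sketched in Python as follows. This is an in-process toy, not the system described in the talk: a background thread stands in for the cluster, and the class name `AddService`, the helper `wait_for` and the timings are assumptions made for illustration.

```python
import itertools
import threading
import time
from typing import Optional


class AddService:
    """Toy stand-in for the cluster-side web service."""

    def __init__(self):
        self._results = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def add(self, a: int, b: int) -> int:
        """handle Add(int,int): start the task and return a handle straight away."""
        with self._lock:
            handle = next(self._ids)
        threading.Thread(target=self._run, args=(handle, a, b)).start()
        return handle

    def _run(self, handle: int, a: int, b: int) -> None:
        time.sleep(0.05)  # pretend the computation takes a while
        with self._lock:
            self._results[handle] = a + b

    def get_add(self, handle: int) -> Optional[int]:
        """int+ getAdd(handle): the result, or None while it is not ready."""
        with self._lock:
            return self._results.get(handle)


def wait_for(service: AddService, handle: int, interval=0.01, timeout=5.0) -> int:
    """Client side: poll the handle until a result appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = service.get_add(handle)
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError(f"no result for handle {handle}")
```

The client is free to do other work between `add` and `wait_for`, which is the point of a future.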
23
Add Request
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Add xmlns="http://www.qut.edu.au/">
      <a>1</a>
      <b>2</b>
    </Add>
  </soap:Body>
</soap:Envelope>
24
Add Response
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddResult xmlns="http://www.qut.edu.au/">437643786432</AddResult>
  </soap:Body>
</soap:Envelope>
25
getResultAdd Request
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getAdd xmlns="http://www.qut.edu.au/">
      <handle>437643786432</handle>
    </getAdd>
  </soap:Body>
</soap:Envelope>
26
getResultAdd Response
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getAddResult xmlns="http://www.qut.edu.au/">3</getAddResult>
  </soap:Body>
</soap:Envelope>
If result not ready return null (empty)
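A client that does not use generated proxies could build the getAdd request and read the response by hand. The sketch below uses Python's standard ElementTree; the two function names are hypothetical, and treating an empty getAddResult element as "not ready" is an assumption based on the note above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
QUT_NS = "http://www.qut.edu.au/"


def build_get_add_request(handle: str) -> str:
    """Build a getAdd request envelope for the given handle."""
    ET.register_namespace("soap", SOAP_NS)  # serialise with the familiar prefix
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    get_add = ET.SubElement(body, f"{{{QUT_NS}}}getAdd")
    ET.SubElement(get_add, f"{{{QUT_NS}}}handle").text = handle
    return ET.tostring(envelope, encoding="unicode")


def parse_get_add_response(envelope_xml: str) -> Optional[int]:
    """Return the integer result, or None if the result element is missing or empty."""
    envelope = ET.fromstring(envelope_xml)
    result = envelope.find(f"{{{SOAP_NS}}}Body/{{{QUT_NS}}}getAddResult")
    if result is None or not (result.text or "").strip():
        return None
    return int(result.text.strip())
```

A polling client would call `parse_get_add_response` on each reply and retry while it returns None.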
27
Caching
• Assume computation is ‘functional’
• Cache results on server
• Sessionless
• Poll server until the result is available
• Need to match args to see if the result is already cached
• Can support both kinds of function in web service interface
[Figure: the client calls int+ Add(int,int) on the cluster]
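The sessionless variant can be sketched as a cache keyed on the function name and its arguments. This is a minimal in-memory illustration, assuming pure functions; `FunctionCache`, `call` and `run_pending` are hypothetical names, and `run_pending` stands in for whatever the cluster does to execute outstanding jobs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, tuple]


class FunctionCache:
    """Results are keyed by (function name, arguments); no handle or session is kept."""

    def __init__(self):
        self._results: Dict[Key, int] = {}
        self._pending: List[Key] = []

    def call(self, name: str, *args: int) -> Optional[int]:
        """int+ Add(int,int): return the cached result, or None and queue the job."""
        key = (name, args)
        if key in self._results:
            return self._results[key]
        if key not in self._pending:  # identical requests share one job
            self._pending.append(key)
        return None

    def run_pending(self, functions: Dict[str, Callable[..., int]]) -> None:
        """Stand-in for the cluster: execute every queued job and store its result."""
        while self._pending:
            name, args = self._pending.pop(0)
            self._results[(name, args)] = functions[name](*args)
```

Because the key is the argument list itself, any client repeating the same call picks up the stored result, which is what makes sharing results possible.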
28
Data Parallelism
• Problem: asynchronous programming model rather tricky
• Often want to invoke many functions en masse
• Can build data parallel abstractions in language to support data parallelism
– E.g. matrix add
• Also build into web service framework, automatically lift pointwise operations
[Figure: the client calls int+ [] Add(int[],int[]) on the cluster]
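Lifting a pointwise operation to arrays can be sketched with a higher-order function. The version below is an assumption-laden toy: a local thread pool stands in for the cluster, and `lift_pointwise` is a hypothetical name, not part of the framework described here.

```python
from concurrent.futures import ThreadPoolExecutor


def lift_pointwise(func, max_workers=4):
    """Turn a two-argument scalar function into one that works on two arrays."""
    def lifted(xs, ys):
        if len(xs) != len(ys):
            raise ValueError("argument arrays must have the same length")
        # Each element pair becomes an independent task; map keeps input order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(func, xs, ys))
    return lifted


def add(a, b):
    return a + b


add_all = lift_pointwise(add)  # plays the role of Add(int[],int[])
```

The caller writes one array call and the framework decides how to spread the element-wise tasks over workers.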
29
System Overview
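[Figure: the client talks to the server through web services, and the grid/cluster talks to the same server through web services. The server is a web server with a job repository (function cache). Client and cluster are decoupled and autonomous.]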
30
System Properties
• Job requestors poll for results and for creating tasks
• Job executors poll for jobs
• Decouple result requestors/consumers from result producers
• Result producers can be legacy code
• Result consumers can be different code
• Completely decoupled
• Can share results
• Also naturally fault tolerant if results are cached in a stable store
• (Service orientation: 1. Boundaries are explicit 2. Services are autonomous)
31
Result cache
• Need a stable store
• Need to efficiently store results and compare XML arguments
• Use an XML database e.g.
– Xindice, SQL Server 2005 etc.
• One table per job type e.g. table for Add
• Use stored procedures to perform operations
• Need facility to create tables
– Also a web service
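The interplay between requestors, executors and a per-job-type table can be sketched with a small relational store. SQLite is used here purely as a convenient stand-in for the XML database; the table layout, the state names and the method names are assumptions, and arguments and results are kept as plain text rather than XML.

```python
import sqlite3


class JobRepository:
    """One table for one job type (e.g. Add), shared by requestors and executors."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)  # pass a file path for a stable store
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "args TEXT PRIMARY KEY, state TEXT NOT NULL, result TEXT)")

    def request(self, args):
        """Requestor side: create the job if it is new, return its result or None."""
        self.db.execute(
            "INSERT OR IGNORE INTO jobs VALUES (?, 'pending', NULL)", (args,))
        self.db.commit()
        row = self.db.execute(
            "SELECT result FROM jobs WHERE args = ?", (args,)).fetchone()
        return row[0]

    def take_job(self):
        """Executor side: claim one pending job, or return None if there is none."""
        row = self.db.execute(
            "SELECT args FROM jobs WHERE state = 'pending' LIMIT 1").fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE jobs SET state = 'running' WHERE args = ?", row)
        self.db.commit()
        return row[0]

    def put_result(self, args, result):
        """Executor side: store the result so that any requestor can read it."""
        self.db.execute(
            "UPDATE jobs SET state = 'done', result = ? WHERE args = ?",
            (result, args))
        self.db.commit()
```

Neither side calls the other: both only poll the repository, which is where the decoupling comes from. A real deployment would also need to reclaim jobs whose executor disappeared, which this sketch leaves out.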
32
Jobs, Schema and Web Services
[Figure: the server's web services sit on either side of a job table. Labelled operations: Create job, Get result, Create table (Schema, WSDL), Put result, Data parallel. Job creators/consumers are on one side and job executors on the other.]
33
Database
34
WSDL, Schema etc
• Typed jobs: when a job type is created the schema must be provided for the inputs and outputs to the function.
• The WSDL, table, and web services are created automatically
• (Service orientation: 3. Services share schema and contract, not class 4. Service compatibility is determined based on policy)
35
Details
• Using SQL Server 2005
• Supports XML indexing, but not testing XML for equality
• Therefore need an efficient mechanism to compare web service call inputs with what is already in the database
• Use canonicalisation provided by XML security and generate a hash from this
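The canonicalise-then-hash idea can be sketched with Python's standard library. Note the swap: `xml.etree.ElementTree.canonicalize` (Python 3.8+) implements C14N 2.0, whereas the talk refers to the canonicalisation that comes with XML security; the choice of SHA-256, the `strip_text` option and the function name are also assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET


def argument_hash(xml_text: str) -> str:
    """Hash the canonical form, so equivalent serialisations give the same key."""
    # strip_text=True drops whitespace around text content, so that
    # pretty-printed and compact inputs canonicalise to the same string.
    canonical = ET.canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


compact = '<Add xmlns="http://www.qut.edu.au/"><a>1</a><b>2</b></Add>'
pretty = "<Add xmlns='http://www.qut.edu.au/'>\n  <a>1</a>\n  <b>2</b>\n</Add>"
other = '<Add xmlns="http://www.qut.edu.au/"><a>1</a><b>3</b></Add>'
```

Storing the hash in an indexed column turns "is this call already in the table?" into an ordinary equality lookup.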
36
User Interface
37
Utilising Idle Machines
• (Old project G2, g2.fit.qut.edu.au)
• System is amenable to cycle scavenging
• Extend the system to also support code caching and distribution for simple code
• Can be heterogeneous and support Java applets, .NET etc.
• Volunteer machines download jobs and code
• Extra table in database
38
Results
• BLAST application running on ten node test cluster
– Speedup of 9.96 times for 40 jobs of approx 1m57s duration
• The bioinformatics SVM application in 50 PC lab (cycle scavenging)
– Speedup of 46 times with 200 jobs of approx 1m44s duration (input and output were negligible)
• Works well for coarse grained parallelism
• To generate tasks simply send an XML doc to the server via a tool or DIY
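To put the speedups in perspective, parallel efficiency (speedup divided by the number of workers) can be worked out directly. The sketch below assumes that all ten nodes and all 50 PCs took part in each run, which the slide does not state explicitly.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / workers


# Assumption: every machine counted on the slide was busy for the whole run.
blast = parallel_efficiency(9.96, 10)  # ten node test cluster
svm = parallel_efficiency(46, 50)      # 50 PC lab, cycle scavenging

print(f"BLAST: {blast:.1%}  SVM: {svm:.1%}")
```

Under that assumption both runs stay above 90% efficiency, consistent with the remark that the approach suits coarse grained jobs.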
39
REST
• Many end user applications support binding to XML
– E.g. in Excel can simply import XML data
• REST – different style of web services based on HTTP verbs
• Expose results as XML through a URL e.g.
– eresearch.fit.qut.edu.au/g2x/Add/1/2
– Results in an XML doc
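How such a URL might map onto a job can be sketched as a small routing function. Everything beyond the URL shape is an assumption: the `FUNCTIONS` table, the `AddResult` element name and the function `handle_get` are hypothetical, and the result is computed on the spot rather than read from the result cache.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical registry of job types that can be reached through a URL.
FUNCTIONS = {"Add": lambda a, b: a + b}


def handle_get(url: str) -> str:
    """Map .../g2x/<job type>/<arg>/<arg>... to a small XML result document."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "g2x" or parts[1] not in FUNCTIONS:
        raise LookupError(f"no such resource: {url}")
    name, args = parts[1], [int(p) for p in parts[2:]]
    result = ET.Element(f"{name}Result")
    result.text = str(FUNCTIONS[name](*args))
    return ET.tostring(result, encoding="unicode")
```

Because the arguments are part of the URL, a spreadsheet or script only needs an HTTP GET and an XML import to consume a result.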
40
Linda
• (Work in progress)
• Alternative simple model of parallelism
• Linda has a tuple space and 4 operations:
– in, out, rd, eval
• Add and copy/remove tuples from tuplespace
• Remove and copy by associative matching on data
• Naturally asynchronous model
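The four operations can be sketched as a small in-memory tuple space. This is a teaching toy, not a distributed implementation: `in` is spelled `in_` because `in` is a Python keyword, the `ANY` wildcard and the timeout parameter are assumptions, and `eval` simply runs the function in a thread and adds its result as a tuple.

```python
import threading

ANY = object()  # wildcard field in a template


class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Add a tuple to the space and wake up any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def _find(self, template):
        # Associative matching: same length, and each field equal or a wildcard.
        for tup in self._tuples:
            if len(tup) == len(template) and all(
                    t is ANY or t == v for t, v in zip(template, tup)):
                return tup
        return None

    def _wait(self, template, remove, timeout):
        with self._cond:
            if not self._cond.wait_for(
                    lambda: self._find(template) is not None, timeout):
                raise TimeoutError("no matching tuple")
            tup = self._find(template)
            if remove:
                self._tuples.remove(tup)
            return tup

    def rd(self, template, timeout=None):
        """Copy a matching tuple, blocking until one exists."""
        return self._wait(template, remove=False, timeout=timeout)

    def in_(self, template, timeout=None):
        """Remove and return a matching tuple, blocking until one exists."""
        return self._wait(template, remove=True, timeout=timeout)

    def eval(self, func, *args):
        """Compute func(*args) in a new thread and add the result as a tuple."""
        thread = threading.Thread(target=lambda: self.out(func(*args)))
        thread.start()
        return thread
```

Producers and consumers never address each other; they only agree on the shape of the tuples they exchange.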
41
XML Databases and Linda
• Use XML instead of tuples
• XML databases store XML data and support querying data
• Build a Linda like system
• SQL Server supports XQuery (Xindice supports XPath)
• Use XQuery to query for data
– XQuery is a SQL like functional language for querying XML data
• Have a few simple web services to add and remove XML data
• (related work on XSpaces etc.)
42
Operations
• Like the functional case, support creation of typed XML tables, but each row holds just a single XML value
• Operations (web services)
URL CreateLindaTable(XML Schema)
void Put(XMLDoc[])
XMLDoc[] Take (XQuery-string)
XMLDoc[] Copy (XQuery-string)
43
Linda
[Figure: producers call Put(<foo> … </foo>) and consumers call Take(“for $v in / where $v/@val < 2000 return $v”) through web services on the cluster, which holds a table of XML documents (<foo></foo> entries)]
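The Put / Take / Copy operations can be sketched over an in-memory list of XML documents. Python's standard library has no XQuery engine, so a plain Python predicate is swapped in for the XQuery string; `XmlSpace` and `val_below` are hypothetical names, and `val_below(2000)` only mimics the `@val < 2000` condition from the figure.

```python
import xml.etree.ElementTree as ET


class XmlSpace:
    """A single table of XML documents with Linda-style operations."""

    def __init__(self):
        self._docs = []

    def put(self, docs):
        """void Put(XMLDoc[]): add documents to the table."""
        self._docs.extend(ET.fromstring(doc) for doc in docs)

    def copy(self, predicate):
        """XMLDoc[] Copy(query): return matching documents, leaving them in place."""
        return [ET.tostring(d, encoding="unicode")
                for d in self._docs if predicate(d)]

    def take(self, predicate):
        """XMLDoc[] Take(query): remove and return matching documents."""
        matched, kept = [], []
        for doc in self._docs:
            (matched if predicate(doc) else kept).append(doc)
        self._docs = kept
        return [ET.tostring(d, encoding="unicode") for d in matched]


def val_below(limit):
    """Stand-in for the XQuery condition on the val attribute."""
    def predicate(doc):
        value = doc.get("val")
        return value is not None and float(value) < limit
    return predicate
```

In the system described here the query would run inside the XML database rather than in client code, so matching scales with the database instead of the caller.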
44
Preliminary Results
• Preliminary results encouraging
• Sending around XQueries – some security issues e.g. DoS attacks etc.
• Model well suited to certain algorithms e.g. genetic algorithms where there is a set of improving values
• Producers and consumers tend to be the same program
– But just need to generate and send XML docs to server
• Can have multiple tables
– Locking?
45
Future Work
• Search on functional parallelism cache
• Notification interface
• WS Resource Framework
• Untyped jobs
• Security
• Connect to a proper job scheduler
• Server is a bottleneck – can we use database replication etc. to alleviate this?
46
Conclusions
• Web services and databases can support simple lightweight service oriented parallelism
• Service orientation very useful, particularly the decoupling
• Databases useful – highly tuned
• Need to support different paradigms