
Services and Mashups

Roy Williams

California Institute of Technology


Agenda

• VIM
  – Portal for VO mashup
• Scaling services
  – Asynchronous (batch)
  – Security
• Advanced services
  – AJAX
  – SOAP


Making a portal for a command line application

The command-line application:

$ mycode -apple 56 -banana 5346 -orange SDSS

(1) Make HTML form

<center><h4>Mycode Portal</h4></center>

Please fill in values<br/>
<form method="GET" action="http://localhost/cgi-bin/mycodeportal">
  Apple: <input name="apple"><br/>
  Banana: <input name="banana"><br/>
  Orange: <select name="orange">
    <option value="SDSS">Sloan Digital Sky Survey DR5</option>
    <option value="2MASS">2MASS All-Sky Catalog</option>
  </select><br/>
  <input type="submit" value="Run Mycode">
</form>


Making a portal for a command line application

(2) Make CGI wrapper

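import cgi
import os

form = cgi.FieldStorage()

# Build the command line from the form values.
# Warning: the values are not sanitized, so this is unsafe on a public server.
cmd = "mycode -apple %s -banana %s -orange %s" \
    % (form["apple"].value, form["banana"].value, form["orange"].value)

print "Content-type: text/plain\n"
print "Command %s" % cmd

pipe = os.popen(cmd)
print "Stdout %s" % pipe.read()
print "Exit status %s" % pipe.close()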
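The wrapper pastes form values straight into a shell command, so a visitor could append commands of their own. Here is a minimal sketch of a safer variant, written for Python 3 rather than the Python 2 of the slides. It assumes (the slide does not say so) that apple and banana are integers and that orange must be one of the two surveys offered by the form. The names build_command and run_mycode are hypothetical.

```python
import subprocess

# Assumption: the only valid surveys are the two options offered by the form.
ALLOWED_SURVEYS = {"SDSS", "2MASS"}


def build_command(apple, banana, orange):
    """Return the mycode argument list, rejecting unexpected input."""
    if orange not in ALLOWED_SURVEYS:
        raise ValueError("unknown survey: %r" % (orange,))
    # int() raises ValueError for anything that is not a plain integer.
    return ["mycode",
            "-apple", str(int(apple)),
            "-banana", str(int(banana)),
            "-orange", orange]


def run_mycode(apple, banana, orange):
    """Run mycode without a shell and return (stdout, exit status)."""
    args = build_command(apple, banana, orange)
    # Passing a list (no shell=True) means no shell ever parses the values.
    done = subprocess.run(args, capture_output=True, text=True)
    return done.stdout, done.returncode
```

Only build_command can be tried without mycode installed; run_mycode needs the real executable on the PATH.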


More VOTable

<VOTABLE version="v1.0">
  <RESOURCE type="results">
    <DESCRIPTION>Results from query to NASA/IPAC Extragalactic Database (NED) …. </DESCRIPTION>
    <TABLE ID="NED_MainTable" name="Searching NED within 0.3 arcmin of 178.542980, 10.796330">
      <DESCRIPTION>Main information about object (Cone Search results)</DESCRIPTION>
      <PARAM ucd="time.equinox" datatype="char" value="J2000.0" name="Equinox"/>
      <PARAM ucd="pos.system.coord" datatype="char" value="Equatorial" name="CoordSystem"/>
      <FIELD ucd="meta.id" datatype="int" ID="main_col1" name="No.">
        <DESCRIPTION>A sequential object number applicable to this list only.</DESCRIPTION>
      </FIELD>
      <FIELD ucd="meta.id;meta.main" datatype="char" arraysize="30" ID="main_col2" name="Object Name">
        <DESCRIPTION>NED preferred name for the object</DESCRIPTION>
      </FIELD>
      <FIELD ucd="pos.eq.ra;meta.main" datatype="double" ID="main_col3" unit="degrees" name="pos_ra_equ">
        <DESCRIPTION>Right Ascension in degrees (Equatorial J2000.0)</DESCRIPTION>
      </FIELD>
      <FIELD ucd="pos.eq.dec;meta.main" datatype="double" ID="main_col4" unit="degrees" name="pos_dec_equ">
        <DESCRIPTION>Declination in degrees (Equatorial J2000.0)</DESCRIPTION>
      </FIELD>
      …….
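The FIELD elements are what lets a client find the position columns without knowing the column names in advance. Here is a minimal sketch in Python 3, using only the standard library, that looks columns up by UCD. The sample document is a shortened, closed-off copy of the VOTable above; the function names are hypothetical. Like the slide's sample it has no XML namespace, so a namespaced VOTable would need the namespace added to the tag name.

```python
import xml.etree.ElementTree as ET

# A shortened, closed-off copy of the slide's VOTable (the slide's is truncated).
SAMPLE = """<VOTABLE version="v1.0">
 <RESOURCE type="results">
  <TABLE ID="NED_MainTable" name="Cone Search results">
   <FIELD ucd="meta.id;meta.main" datatype="char" arraysize="30"
          ID="main_col2" name="Object Name"/>
   <FIELD ucd="pos.eq.ra;meta.main" datatype="double"
          ID="main_col3" unit="degrees" name="pos_ra_equ"/>
   <FIELD ucd="pos.eq.dec;meta.main" datatype="double"
          ID="main_col4" unit="degrees" name="pos_dec_equ"/>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""


def field_metadata(votable_text):
    """Return one dict of attributes per FIELD element, in document order."""
    root = ET.fromstring(votable_text)
    return [dict(field.attrib) for field in root.iter("FIELD")]


def column_for_ucd(fields, ucd_prefix):
    """Return the name of the first column whose UCD starts with ucd_prefix."""
    for field in fields:
        if field.get("ucd", "").startswith(ucd_prefix):
            return field["name"]
    return None


fields = field_metadata(SAMPLE)
print(column_for_ucd(fields, "pos.eq.ra"))
print(column_for_ucd(fields, "pos.eq.dec"))
```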


VIM

Customer provides Sources table:

187.209, -1.938, NGC 4454
208.826, 59.506, NGC 5376
214.218, 10.807, NGC 5532
187.844, 57.964, NGC 4500
130.384, 4.971, NGC 2644
179.042, 60.522, NGC 3978
……

Resources
• catalog or other position-based dataset
• exposed by cone or skynode service
• example: SDSS
• example: Abell galaxy cluster catalog

Multicone resources provide data tables: sdss, gsc2, twomass

[Figure: three match tables, one per resource. Each lists the matched source IDs, with several matches for some sources: NGC 4454, NGC 4454, NGC 4454, NGC 5376, NGC 5532, NGC 4500, NGC 4500, NGC 4500, NGC 2644, NGC 3978]


Multicone

User gets sources elsewhere

Source = RA, Dec, ID

Multicone = N sources + radius

returns VOTable
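As a rough illustration of "N sources + radius", the sketch below (Python 3) runs one cone search per source and gathers the hits into a single match table. A real multicone would query a remote cone-search service for each source. Here an in-memory list of catalog rows stands in for that service, and the function names and row layout are assumptions made for the example.

```python
import math


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def multicone(sources, catalog, radius_deg):
    """For each (ra, dec, id) source, collect catalog rows within radius_deg.

    Returns a match table: a list of (source_id, catalog_row, separation).
    """
    matches = []
    for ra, dec, source_id in sources:
        for row in catalog:
            sep = separation_deg(ra, dec, row["ra"], row["dec"])
            if sep <= radius_deg:
                matches.append((source_id, row, sep))
    return matches
```

A source may appear several times in the result, or not at all, which is why the match tables carry the source ID on every row.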


Architecture

[Diagram labels: Customer; Vim; personal persistent storage; upload sources; HTML + JS; Catalogs; Spectra; Nesssi; coregistered image cutouts; batch jobs]

All the relevant information about your sources
– mashups from the VO
– kept for you in a workbench in the cloud
– view, mine, download


Sources and Matches

• Start with a source table
  – RA, Dec, ID for each source

[Diagram: "sources" table with columns RA, Dec, ID]


Sources and Matches

• Run VO services to get data
  – “Match” tables from each catalog
  – Multiple matches per source

[Diagram: "sources" table (RA, Dec, ID) with match tables Cat1 and Cat2]


Join to sources

• Closest or All
• Joins match table to source table

[Diagram: "sources" table (RA, Dec, ID) joined with match tables Cat1 and Cat2]
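The difference between "Closest" and "All" can be sketched in a few lines of Python 3. Tables are plain lists of dicts here, and each match row is assumed to carry the ID of the source it matched plus a separation column named sep. Dropping sources that have no match is also an assumption, since the slide does not say how Vim treats them.

```python
def join(sources, matches, mode="all"):
    """Join a match table to the source table.

    sources: list of dicts with an "id" key.
    matches: list of dicts with "id" (the source it matched), "sep"
             (separation from that source) and any catalog columns.
    mode:    "all" keeps every match; "closest" keeps the nearest per source.
    Sources with no match are dropped, as in an inner join.
    """
    if mode not in ("all", "closest"):
        raise ValueError("mode must be 'all' or 'closest'")

    # Group the matches by the source they belong to.
    by_source = {}
    for match in matches:
        by_source.setdefault(match["id"], []).append(match)

    joined = []
    for source in sources:
        found = by_source.get(source["id"], [])
        if mode == "closest" and found:
            found = [min(found, key=lambda m: m["sep"])]
        for match in found:
            row = dict(source)   # copy the source columns
            row.update(match)    # then add the catalog columns
            joined.append(row)
    return joined
```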


Table column metadata

Click to open/close

Toggle column display


Table display

Sources (user input)

Three match tables

Table with no columns displayed -- just the match count


Why Vim is best

• WebServer or Laptop install
  – Mac and Linux have personal webserver
• Scalable
  – Column operations only
  – Large operations can be Asynchronous (NESSSI)
  – Cannot select rows except by formula
  – Powered by Stilts (2,000,000,000 rows and up)
• Open and Secure
  – Bench ID = random string
  – Share your workbench with your colleagues


Why Vim is best

• Content
  – Any cone search (== all the main catalogs)
  – Cutouts from SIAP services
    • Co-registered to hyperatlas with Montage
  – Spectra via SSAP (from NRAO)
    • Thumbnails and images and FITS
• Display
  – Column selection, Row sort/select
  – Images small-hover-large
  – Tools and metadata by hide-click-expose


Cutout images

Hover mouse on cutout to see larger image


Spectra from SSAP

Spectral Collections brokered by NRAO:

• Arecibo Maser Catalog

• 2dFGRS

• SDSS DR5

Hover mouse on thumbnail to see larger image


Tools

• Multicone
  – Fetch cone/siap/ssap for each source
• Sort and Select
  – By any column value
• Compute new column
  – Expressions (e.g. 2MASS Jmag - SDSS Rmag)
• Join
  – Closest or All combinations
• Upload
  – From NESSSI service results
• Caching
  – Of dynamic/remote data
• Download
  – VOTable, CSV, KML, etc.


Asynchronous services: Waiting for Godot


Here They Are!

• Jpeg is linked to FITS

• Cutouts co-registered from different surveys


Asynchronous

• Drop source list into Nesssi
• Choose cutouts/cones
• Leave to run over lunch


Asynchronous

• Drop results as URLs uploaded to Vim


Usage

• Install
  – You will need Python 2.x
  – You will need a webserver, personal or on a server
  – Read and edit the unpack.py script
  – Execute it with "python unpack.py"
  – Point your browser to the URL
• Try the links to the collections called:
  • seven galaxies,
  • 20 pulsars, or
  • 338 Arp galaxies.
  – Once loading has stopped, the tiny images respond to mouse hover with bigger images
  – Click on a Tool to open its form, click again to close it
  – Click on a Table to see its metadata and choose display, click again to close it
  – Use Multicone to get data from the VO
• Upload sources
  – VOTable or CSV or VOTable-link


Crossmatch

Join (= crossmatch)


Computing and Plotting

Compute new column, e.g. Infrared-Optical color

Download VOTable and plot with Topcat
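Vim computes new columns on the server with Stilts. The plain Python 3 sketch below only illustrates the idea of a per-row expression such as an infrared-optical color. The function name addcol is borrowed from the Vim command list, while the column names and magnitudes are made up for the example.

```python
def addcol(rows, name, func):
    """Return new rows with an extra column computed from each row."""
    out = []
    for row in rows:
        new_row = dict(row)
        try:
            new_row[name] = func(row)
        except (KeyError, TypeError):
            # Missing or null magnitudes give a null result, not a crash.
            new_row[name] = None
        out.append(new_row)
    return out


# Made-up magnitudes, for illustration only.
rows = [
    {"id": "NGC 4454", "Jmag": 9.5, "Rmag": 12.0},
    {"id": "NGC 5376", "Jmag": None, "Rmag": 12.4},
]
colored = addcol(rows, "J_minus_R", lambda r: r["Jmag"] - r["Rmag"])
```

Because only column operations are involved, the input rows are left untouched and the result can be computed one row at a time.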


Current Resource List

Example:

http://nedwww.ipac.caltech.edu/cgi-bin/nph-objsearch?search_type=Near%%20Position%%20Search&of=xml_main&lon=%8.5f d&lat=%8.5f d&radius=%f

Resource = URLformat + descriptions

URLformat =
• “Generalized Cone Search”
• Unification of {cone OR siap OR ssap OR others}
• URL = URLformat % (lon,lat,radius)
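The rule "URL = URLformat % (lon,lat,radius)" is ordinary Python string formatting. A small sketch in Python 3 follows, using the NED template from the example with the space before each "d" removed, on the assumption that the space is an artifact of the slide layout. The function name resource_url is hypothetical.

```python
# Template copied from the slide, with the space before each "d" removed
# (assumption: the space is an extraction artifact). "%%" is a literal "%".
NED_URLFORMAT = (
    "http://nedwww.ipac.caltech.edu/cgi-bin/nph-objsearch"
    "?search_type=Near%%20Position%%20Search&of=xml_main"
    "&lon=%8.5fd&lat=%8.5fd&radius=%f"
)


def resource_url(urlformat, lon, lat, radius):
    """Fill a generalized cone-search template with one position and radius."""
    return urlformat % (lon, lat, radius)


url = resource_url(NED_URLFORMAT, 178.54298, 10.79633, 0.3)
print(url)
```

The doubled percent signs in the template are what let the URL-encoded "%20" survive the formatting step.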


Vim scripting language

– multicone: act on the source list with a resource
– view: change view and refresh display

• Implemented with Astrogrid Stilts
  – addcol: compute new column from others
  – select: keep rows where criterion is true
  – join: join a table of matches to the source table
  – sort: sort on any column

• Implemented with NVO VOTableLib
  – download: make an output product
  – cachelinks: copy images, dynamic links to cache
  – urltable: ingest external VOTable
  – upload: ingest text

These are the commands sent from client to server

(future) users will get Python/Perl script that can reproduce session


Screenshot: Arp galaxies


Building Compute Services

• Developer and Admin
  – Services should be built by developers
  – In a framework managed by an administrator
• Service developers must be careful
  – Services can be dangerous (e.g. “execute any command”)
• Service users authenticated with “graduated security”
  – Easy to start, but great power is possible
  – Or just keep it anonymous
• Asynchrony for compute-intensive jobs
  – Jobs submitted to batch queue
  – Unique benchID may be used to monitor job & return results
• From “clicking” to “scripting”
  – Services may be accessed by clicking on a web page or with scripted client codes
  – Authentication for web clicking comes from a certificate in browser
  – Scripted access requires a certificate

[Diagram labels: client; service container; services]


Persistent Storage (“workbench”)

Ceramics class meets each week for 8 weeks


Workbench

• Persistent storage
• Just a directory in the web space
  – Initiated by service
  – Tools operate on files in workbench
• http://……?bench=39840422 & action=PCA & (other params)


Workbench

• URL to workbench is obscure
  – http://localhost/cgi-bin/vim?benchID=16213077368925688004920409437160
  – Can send to your colleague
• Set up as
  – Read is free but URL is obscure
  – Using tools / write permission via password
• Reaping
  – Maybe 30-day lifetime for workbench storage?
    • Need cron process to delete old benches
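A workbench of this kind needs little more than a randomly named directory and a periodic cleanup. The Python 3 sketch below shows both steps under stated assumptions: the 32-digit ID mirrors the benchID in the example URL, the 30-day lifetime is the figure floated on the slide, and make_bench and reap are hypothetical names, not Vim's own functions.

```python
import os
import secrets
import shutil
import time


def make_bench(root):
    """Create a workbench directory with an unguessable 32-digit name."""
    bench_id = "".join(secrets.choice("0123456789") for _ in range(32))
    os.makedirs(os.path.join(root, bench_id))
    return bench_id


def reap(root, max_age_days=30, now=None):
    """Delete workbenches not modified for max_age_days; return their IDs."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # A directory's mtime changes when entries are added or removed.
        if os.path.isdir(path) and os.path.getmtime(path) < cutoff:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

A cron entry could call reap once a day. The age test uses the directory's own modification time, so a bench whose files are only rewritten in place would still look old.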


Keywords

• “bench”
  – If present, specifies workbench
• “action”
  – What should the server do?
    • Create workbench (provide password)
    • Upload data
    • Start algorithm
    • Monitor run (does the result exist?)
    • Download result
• Others:
  – Depends on action, specifies detail


VIM server

import re
import sys

if actionkey == "init":
    benchID = bench.makeBench()
elif form.has_key("bench"):
    benchID = form["bench"].value
else:
    print "No bench specified -- exiting"
    sys.exit(1)

# bench must be 32 decimal digits (NOT ../../precious)
if re.match(r'^[0-9]{32}$', benchID) is None:
    print "Sorry, but %s does not look like a valid benchID name" % benchID
    sys.exit(1)

bench.setBenchID(benchID)

# Dispatch to the handler for the requested action
if actionkey == "urltable": actions.urltable(bench)
elif actionkey == "deletetable": actions.deletetable(bench)
elif actionkey == "fetch": actions.fetch(bench)
elif actionkey == "addcol": actions.addcol(bench)
elif actionkey == "select": actions.select(bench)
elif actionkey == "join": actions.join(bench)
elif actionkey == "sort": actions.sort(bench)
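The same validate-then-dispatch pattern can also be written with a lookup table, which makes an unknown action an explicit error instead of a silent no-op. This is a sketch in Python 3, not the Vim server's actual code. The handlers dictionary and both function names are assumptions for illustration.

```python
import re

BENCH_ID = re.compile(r"[0-9]{32}")


def valid_bench_id(bench_id):
    """True only for exactly 32 decimal digits (so never ../../precious)."""
    return BENCH_ID.fullmatch(bench_id) is not None


def dispatch(actionkey, bench, handlers):
    """Call the handler registered for actionkey, or raise ValueError."""
    try:
        handler = handlers[actionkey]
    except KeyError:
        raise ValueError("unknown action: %r" % (actionkey,))
    return handler(bench)
```

For example, with handlers = {"sort": actions.sort, "join": actions.join}, the call dispatch("sort", bench, handlers) runs the sort action, and dispatch("rm", bench, handlers) raises ValueError.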


Making things easier

• Let them log in!
  – Keeps record of workbenches
  – Who owns which
  – Users can ask for “my workbenches”
  – Can make log for funders
    • Who is doing what
• BUT
  – Users *hate* to register at websites


Security and Certificates

• Stop attacks
• Access to secret data
• Access to big resources
• BUT
  – Lots of extra infrastructure
  – Users hate it


NESSSI: NVO Extensible Secure Scalable Service Infrastructure

• Services are science-oriented
• Services are made by trusted developers from the science community
• Web forms OR command line (Python API)
• Built-in security (X.509 certificates)
• Very large jobs can be run
• Easy to get a certificate
• No complex install needed by client
• Different levels of certificate get different service
• Is installed on Teragrid
• Services can be part of a workflow


Nesssi

[Diagram labels: client; nesssi; certificate policies; queue; workbench storage; cluster of four nodes; connections labeled "Secure SOAP", "certificate", and "open http"]


Clarens server

An open-source webserver based on OpenSSL.


A “Graduated Security” Model

Web form - anonymous access, small jobs
Some science....

Get NVO weak certificate - access logged, but identity not verified
More science....

Full Grid account - browser access
Big-iron computing....

Scripted access
Power user

Portal-Based


Traditional Grid Security

client

Show us your Certificate!

I will do exactly what you want.


Graduated Security

client

May I have your Request and your Certificate?


Authentication with Certificates

• A digital certificate proves who you are
• X.509
  – Private key is usually encrypted with a passphrase
• Certificate as login
  – Map from certificate to account


Certificates

The Virtual Observatory as a Virtual Organization

This is a US driver’s licence. In the US it proves identity strongly. It is like a strong certificate.

This is a loyalty card where I buy food. (You can put a false address on the application.) It is like a weak certificate.


How to be a Certificate Authority

In order for an RA to validate the identity of a person, the subject should contact the RA face-to-face and present photo-id and/or valid official documents showing that the subject is an acceptable end entity as defined in the CP/CPS document of the CA.

In case of host or service certificate requests, the RA should validate the identity of the person in charge of the specific entities using a secure method. The RA should ensure that the requestor is appropriately authorized by the owner of the FQDN or the responsible administrator of the machine to use the FQDN identifiers asserted in the certificate.


Bench ID

• Identify which job we are talking about
• 32 character hex string, e.g.
  cb28d0753a7fec9a485981f741d425ec
• Used to monitor a running job
  sessionID = nesssiServer.cutout.init()
  msg = nesssiServer.cutout.monitor(sessionID)
• Used to form URL where results appear, e.g.
  – http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/cb/cb28d0753a7fec9a485981f741d425ec/cutouts/index.html
• If you lose the sessionID, you lose your job
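Judging from the example URL, the results location is built from the session ID alone: the sandbox lives under a directory named for the first two characters of the ID. Here is a small Python 3 sketch of that rule. The base URL is the one on the slide, the lowercase-hex check is an assumption, and sandbox_url and result_url are hypothetical helper names, not part of the nesssi client.

```python
import re

# Base URL as it appears on the slide; other installations will differ.
BASE = "http://dtf-test1.sdsc.teragrid.org:8080/clarens"


def sandbox_url(session_id, base=BASE):
    """Return the sandbox URL for a 32-character hex session ID."""
    if not re.fullmatch(r"[0-9a-f]{32}", session_id):
        raise ValueError("not a 32-character hex string: %r" % (session_id,))
    # Sandboxes are grouped under the first two characters of the ID.
    return "%s/shell/%s/%s" % (base, session_id[:2], session_id)


def result_url(session_id, path, base=BASE):
    """Return the URL of a result file inside the sandbox."""
    return sandbox_url(session_id, base) + "/" + path.lstrip("/")
```

Since the URL can always be rebuilt from the ID, saving the session ID is enough to find the results later.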


<NesssiMonitor>

<Service>Cutout</Service>

<Uname>ux400560</Uname>

<SessionID>774daf5ef52facc68cb03db4b1fdc815</SessionID>

<Sandbox>http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/77/774daf5ef52facc68cb03db4b1fdc815</Sandbox>

<Result>http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/77/774daf5ef52facc68cb03db4b1fdc815/cutouts/index.html</Result>

<QueueStatus>149.envoy.cacr.calte roy batch C8845cb 11516 1 -- -- 60:00 R --

</QueueStatus></NesssiMonitor>

Monitoring a Nesssi job

• Service: service name
• Uname: running as this user
• SessionID: session ID
• Sandbox: sandbox URL
• Result: results URL
• QueueStatus: queue status (R = running)
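A scripted client would normally parse the monitor message rather than read it by eye. The Python 3 sketch below does that with the standard library, using the message from this slide as sample input. Reading the job state from the second-to-last field of the queue line is an assumption based on the status lines shown in these slides, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MONITOR = """<NesssiMonitor>
<Service>Cutout</Service>
<Uname>ux400560</Uname>
<SessionID>774daf5ef52facc68cb03db4b1fdc815</SessionID>
<Sandbox>http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/77/774daf5ef52facc68cb03db4b1fdc815</Sandbox>
<Result>http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/77/774daf5ef52facc68cb03db4b1fdc815/cutouts/index.html</Result>
<QueueStatus>149.envoy.cacr.calte roy batch C8845cb 11516 1 -- -- 60:00 R --
</QueueStatus>
</NesssiMonitor>"""


def parse_monitor(text):
    """Return the monitor message as a dict of tag -> stripped text."""
    root = ET.fromstring(text)
    return {child.tag: (child.text or "").strip() for child in root}


def queue_state(status_line):
    """Return the one-letter job state (e.g. R or Q) from a queue line.

    Assumption: the state is the second-to-last whitespace-separated field.
    """
    fields = status_line.split()
    return fields[-2] if len(fields) >= 2 else None


info = parse_monitor(MONITOR)
print(info["Service"], queue_state(info["QueueStatus"]))
```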


Example: SleepyAdd

import nesssi

nesssiServer = nesssi.client('https://dtf-test1.sdsc.teragrid.org:8443/clarens/', debug=0)

sessionID = nesssiServer.sleepyadd.init()
print "Your session ID is", sessionID

# Run: sleep 30 seconds then add 52 and 344
nesssiServer.sleepyadd.run(sessionID, "-time 30 -n 52 -m 344")

web portal

command line


Monitoring the Run

Key n is 52
Key m is 344
Key time is 30
Sleeping for 30 seconds
Waking up...
Sum of 52 and 344 is 396

<NesssiMonitor>
<Service>Sleepyadd</Service>
<Uname>ux400560</Uname>
<SessionID>a3a167a383111c0cbd6941325b8659aa</SessionID>
<Result>http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/a3/a3a167a383111c0cbd6941325b8659aa/batch.out</Result>
<Sandbox>http://dtf-test1.sdsc.teragrid.org:8080/clarens/shell/a3/a3a167a383111c0cbd6941325b8659aa</Sandbox>
<QueueStatus>305875.dtf-mgmt1.sds ux400560 dque Ca3a167 -- 1 -- -- 18:00 Q --</QueueStatus>
</NesssiMonitor>


Mosaic Service

import nesssi

nesssiServer = nesssi.client('https://envoy.cacr.caltech.edu:8443/clarens/', debug=0)

mosaic_loc = "-ra 49.1 -dec 60.1 -rawidth 0.5 -decwidth 0.5 -filt f -bgcorr 0"

session = nesssiServer.dpossMosaic.mosaic(mosaic_loc)
print "Your session ID is %s." % session

# Ask the same server for the job status
msg = nesssiServer.dpossMosaic.monitor(session)
print msg


nesssiServer.dpossMosaic.mosaic("-ra 49.1 -dec 60.1 -rawidth 0.5 -decwidth 0.5 -filt f -bgcorr 0")


Coadd Service

nesssiServer = nesssi.client('https://envoy.cacr.caltech.edu:8443/clarens/', debug=0)

# Initialize the service
sessionID = nesssiServer.hyperatlas.init()
print "Session id is ", sessionID

# Arguments for service, the coaddition to do
args = "-bandpass z1 -ra 170.08 -dec 13.275 -rawidth 1.0 -decwidth 1.0"


-bandpass z1 -ra 170.08 -dec 13.275 -rawidth 1.0 -decwidth 1.0


Cutout Service

import nesssi

nesssiServer = nesssi.client('https://envoy.cacr.caltech.edu:8443/clarens/', debug=0)
sessionID = nesssiServer.cutout.init()
print "Session id is ", sessionID

# Upload locations file (inputfile is the local path, set elsewhere)
remoteinputfile = "/shell/%2s/%s/inputfile.xml" % (sessionID[0:2], sessionID)
nesssiServer.upload_file(inputfile, remoteinputfile)

# Arguments for service, surveys to use and cutout size
args = "-surveys PQ:gr,PQ:gi,PQ:z1,PQ:z2,SDSS:r,SDSS:i,SDSS:z,2MASS:k,2MASS:h "
args += "-size 64"

# Run service
nesssiServer.cutout.run(sessionID, args)


Cutout Monitoring


Cutouts from Palomar-Quest, SDSS, 2MASS of sources from Veron quasar catalog


AJAX (Asynchronous JavaScript + XML)

• Uses browser’s XML support: DOM, XSLT
• XMLHttpRequest
• Google Maps is best-known AJAX application


What do GET/POST services lack?

• Formal method for describing interface contract
• Reliable messaging
• Digital signatures
• Message routing
• Resource life cycle management
• Asynchronous event notification
• Other capabilities captured by WS-* specs


What is SOAP?

• Simple Object/Service-Oriented Access Protocol (Snakes On A Plane?)

• An XML-based communication protocol and encoding format for exchanging structured information in a decentralized, distributed environment

• W3C specification (http://www.w3.org/TR/soap)


Anatomy of a SOAP message

• An envelope to encapsulate data which defines formatting conventions for describing the message contents and routing directions: header and body

• A message exchange pattern: request/response (RPC mechanism), fire-and-forget

• A transport or binding protocol

• Data encoding rules for describing the mapping of application-defined datatypes into an XML tag-based representation


SOAP example

Request:

<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ComovingLineOfSight xmlns="http://skyservice.pha.jhu.edu">
      <z>float</z>
      <hubble>float</hubble>
      <omega>float</omega>
      <lambda>float</lambda>
    </ComovingLineOfSight>
  </soap:Body>
</soap:Envelope>

Response:

<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ComovingLineOfSightResponse xmlns="http://skyservice.pha.jhu.edu">
      <ComovingLineOfSightResult>float</ComovingLineOfSightResult>
    </ComovingLineOfSightResponse>
  </soap:Body>
</soap:Envelope>
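To make the envelope concrete, here is a sketch in Python 3 that builds the request and reads the response with the standard library only. It does not contact the service: the transport step (an HTTP POST of the request text) is left out, and the parameter values in the usage note are arbitrary illustrations. The two function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "http://skyservice.pha.jhu.edu"


def build_request(z, hubble, omega, lam):
    """Return a ComovingLineOfSight SOAP request as a string."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}ComovingLineOfSight" % SERVICE_NS)
    # One child element per argument, in the order shown on the slide.
    for tag, value in (("z", z), ("hubble", hubble),
                       ("omega", omega), ("lambda", lam)):
        child = ET.SubElement(call, "{%s}%s" % (SERVICE_NS, tag))
        child.text = repr(float(value))
    return ET.tostring(envelope, encoding="unicode")


def parse_response(text):
    """Return the float inside ComovingLineOfSightResult."""
    root = ET.fromstring(text)
    result = root.find(".//{%s}ComovingLineOfSightResult" % SERVICE_NS)
    return float(result.text)
```

For instance, build_request(0.5, 70, 0.3, 0.7) yields an envelope whose z element contains 0.5. Compared with a GET query string, the arguments travel as typed, namespaced elements inside the body.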


Client Invocation Models

• Static: use generated stubs:
  java org.apache.axis.wsdl.WSDL2Java <wsdl url>

• Dynamic:
  – no generated code
  – a proxy dynamically generates a class at runtime that conforms to a particular interface, proxying all invocations to a single ‘generic’ method
  – Examples:
    • Java : use javax.xml.rpc.Service.getPort() and createCall()
    • .NET : use RealProxy class (must extend ContextBound) or Reflection.Emit

• Generic SOAP client: http://soapclient.com/soaptest.html


Why is SOAP better?

• Asynchrony
• Reliable messaging (e.g. once-and-only delivery, guaranteed or exact execution)
• Send and receive complex datatypes to invoke a particular method, not just key-value pairs
• Security
• Binds to other protocols
• Service description


Take a REST from SOAP?

• IVOA jumped into SOAP services in 2002

• But SOAP is perceived as “difficult”– WSDL (formal service description) is complex and not interoperable

• REST and GET are perceived as easier

• Where is the sophistication of SOAP really needed?