1 Information At Your Fingertips Web Services Jim Gray & Tom Barclay Microsoft Research Alex Szalay Johns Hopkins University

Information At Your Fingertips: Web Services

Jim Gray & Tom Barclay, Microsoft Research

Alex Szalay, Johns Hopkins University


Communications Excitement!!

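Point-to-point vs. broadcast, immediate vs. time-shifted:

• Immediate, point-to-point: conversation, money
• Immediate, broadcast: lecture, concert
• Time-shifted, point-to-point: mail
• Time-shifted, broadcast: book, newspaper

Immediate → Network + DB; Time-shifted → DataBase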

• It's ALL going electronic.
• Immediate is being stored for analysis (so ALL database).
• Analysis & automatic processing are being added.

Slide borrowed from Craig Mundie


Information Excitement!

• All information will be online (somewhere): text, speech, sound, vision, graphics, spatial, time…

• You might record everything
– read: 10 MB/day, 400 GB/lifetime (5 disks today)
– hear: 400 MB/day, 16 TB/lifetime (2 disks/year today)
– see: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (150 disks/year, maybe someday)

• Information at Your Fingertips
– Make it easy to capture & present
– Make it easy to store & organize & access
– Make it easy to analyze & summarize
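The daily and lifetime figures above agree with each other if a lifetime is about 40,000 days, which is roughly 110 years. The slide does not state that number, so this small Python check treats it as an assumption.

```python
# Back-of-envelope check of the "record everything" figures.
# Assumption (not stated on the slide): a lifetime of about 40,000 days (~110 years).
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15
LIFETIME_DAYS = 40_000

DAILY_RATES = {"read": 10 * MB, "hear": 400 * MB, "see": 40 * GB}


def lifetime_bytes(bytes_per_day: int, days: int = LIFETIME_DAYS) -> int:
    """Total bytes recorded at a constant daily rate over `days` days."""
    return bytes_per_day * days


def recording_hours_per_day(bytes_per_day: int, bytes_per_second: int) -> float:
    """Hours of capture per day that a daily budget allows at a given data rate."""
    return bytes_per_day / bytes_per_second / 3600


for activity, rate in DAILY_RATES.items():
    print(f"{activity}: {lifetime_bytes(rate) / TB:g} TB per lifetime")

# A 40 GB/day budget at 1 MB/s is about 11 hours of capture per day.
print(f"see: {recording_hours_per_day(40 * GB, 1 * MB):.1f} hours/day at 1 MB/s")
```

The loop prints 0.4, 16 and 1600 TB, that is, 400 GB, 16 TB and 1.6 PB. Recording at 1 MB/s around the clock would instead produce 86.4 GB/day.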


How much information is there?

• Soon everything can be recorded and indexed.
• Most bytes will never be seen by humans.
• Data summarization, trend detection, and anomaly detection are key technologies.

See Mike Lesk: How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html

See Lyman & Varian: How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/

[Figure: storage scale from Kilo to Yotta. A book and a photo: Mega. A movie: Giga. All Library of Congress books (words): Tera. All books (multimedia): Peta. Everything recorded: Exa. Zetta and Yotta complete the scale.]

Matching small prefixes: 10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli
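On the Kilo-to-Yotta scale, each step is a factor of 1,000. These are decimal SI prefixes, not binary 1,024 steps. The minimal Python sketch below places a byte count on that scale. The function name `si_bytes` is chosen here for illustration.

```python
# Decimal (SI) prefixes, one step per factor of 1,000, as on the Kilo-to-Yotta scale.
PREFIXES = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]


def si_bytes(n: float) -> str:
    """Format a byte count with the largest SI prefix that keeps the value >= 1."""
    step = 0
    while n >= 1000 and step < len(PREFIXES) - 1:
        n /= 1000
        step += 1
    return f"{n:g} {PREFIXES[step]}B"


for size in (1_000_000, 400 * 10**9, 16 * 10**12, 1.6e15):
    print(si_bytes(size))
```

This prints 1 MB, 400 GB, 16 TB and 1.6 PB, which are the book-sized and lifetime-sized quantities from the earlier slides.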


How do we get information today?

• Human searches web (with an index)

• Human browses pages


How do we get information tomorrow?

• Agents gather and digest it for us.

• Q: How?

• A (Microsoft): Dot Net
– Discovery: UDDI, WSDL
– Explore: SOAP

[Diagram: My Agents and a Digital Dashboard use Web Services through SOAP and WSDL.]


How do you publish information?

• Get the data.
• Conceptualize the data schema.
• Provide methods that return data subsets.
– Challenge: how much processing on your server?
• Publish the schema and methods.
• We are exploring these issues.

[Diagram labels: f, g, x, y…]
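The recipe above is to publish a schema plus methods that return subsets. It can be sketched in Python as follows. Everything here is hypothetical: the `Place` record, the `PlaceService` class, and `places_in_rect`, which only loosely echoes TerraService's GetPlaceListInRect and is not its real signature. Filtering inside the service is one answer to the "how much processing on your server?" question, because the client receives only the rows it asked for.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Place:
    """Hypothetical published record; its fields are the schema."""
    name: str
    lon: float
    lat: float


class PlaceService:
    """Toy publisher that exposes its schema and a subset-returning method."""

    def __init__(self, places):
        self._places = list(places)

    def schema(self):
        # Publish the schema as (field name, type name) pairs.
        return [(f.name, f.type.__name__) for f in fields(Place)]

    def places_in_rect(self, min_lon, min_lat, max_lon, max_lat):
        # The server filters, so only the requested subset crosses the network.
        return [p for p in self._places
                if min_lon <= p.lon <= max_lon and min_lat <= p.lat <= max_lat]


service = PlaceService([Place("Seattle", -122.33, 47.61),
                        Place("Baltimore", -76.61, 39.29)])
print(service.schema())
print(service.places_in_rect(-125.0, 45.0, -120.0, 50.0))
```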


TerraServer Example

• What is TerraServer?
– 3 TB Internet map DB, available since June 1998
– USGS photos and topo maps of the US
– Integrated with Home Advisor
– Shows off SQL Server availability & scalability
– Designed for basic computer systems and low-speed communications

• What is TerraService?
– A .NET web service
– Makes TerraServer data available to other apps


Application Goals

• AVAILABLE – always: 24x7x52, 99.99% of the time

• PROGRAMMABLE – .NET applications can integrate TerraServer data into their apps

• BIG – 1 TB of data, including catalog, temporary space, etc.

• PUBLIC – available on the World Wide Web

• INTERESTING – to a wide audience

• ACCESSIBLE – using standard browsers (IE, Netscape)

• REAL – a LOB (line-of-business) application (users can buy imagery)

• FREE – cannot require an NDA or payment for a user to access it

• FAST – usable at low speeds (56 kbps) and high speeds (T-1+)

• EASY – we do not want a large group to develop, deploy, or maintain the application

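Two of these goals are easy to quantify. The sketch below does two things:

- It converts the 99.99% availability target into a yearly downtime budget.
- It estimates how long one image takes to arrive over the two link speeds named under FAST.

The 10 KB image size is an assumption for illustration, not a figure from the slides. 1.544 Mbit/s is the standard T-1 line rate.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed outage per 365-day year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def transfer_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / bits_per_second


IMAGE_BYTES = 10_000  # hypothetical image size
print(f"99.99% -> {downtime_minutes_per_year(0.9999):.1f} min/year")
print(f"56 kbps -> {transfer_seconds(IMAGE_BYTES, 56_000):.2f} s")
print(f"T-1     -> {transfer_seconds(IMAGE_BYTES, 1_544_000):.3f} s")
```

At 99.99%, the budget is roughly 53 minutes of outage per year. Under the assumed image size, a modem user waits about 1.4 s per image and a T-1 user about 0.05 s.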


Demo http://terraserver.microsoft.com

Show: photo, topo, gazetteer, demographics


Hardware

[Diagram: database instances SQL\Inst1, SQL\Inst2, SQL\Inst3 and a spare server, attached to disk enclosures labeled 2200 (drive letters E through S).]

• One SQL database per rack
• Each rack contains 4.5 TB
• 261 total drives / 13.7 TB total
• Metadata stored on 101 GB of "fast, small disks" (18 x 18.2 GB)
• Imagery data stored on 4,339 GB of "slow, big disks" (15 x 73.8 GB)
• To add 90 72.8 GB disks in Feb 2001 to create an 18 TB SAN
• 8 Compaq DL360 "Photon" web servers
• Fiber SAN switches
• 4 Compaq ProLiant 8500 DB servers


TerraServer Experience

• Successful Web Site
– Met all 8 goals: interesting, big, real, public, fast, easy, accessible, and free
– High availability: Windows Data Center & Compaq SAN technology
– Top 1000 web site: continues to be popular

• New Feature Requests
– Programmable access to metadata
– User-selectable image sizes, i.e., "a map server"
– Permission to use TerraServer data within server applications


What is a Web Service?

• SOAP: Web Service consumers can send and receive messages using XML.

• SOAP Contract Language: Web Services are defined in terms of the formats and ordering of messages.

• SOAP Discovery: You can ask a site for a description of the Web Services it offers.

All these capabilities are built using open Internet protocols: XML & HTTP.

Web Service: A programmable application component accessible via standard Web protocols.

• UDDI (Universal Description, Discovery, and Integration): Provides a directory of services on the Internet.
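Concretely, a SOAP 1.1 call is an XML Envelope. Its Body carries one element named after the method, with one child element per parameter. Over HTTP the envelope is POSTed with a SOAPAction header.

The Python sketch below builds such an envelope with the standard library. The service namespace and the `placeName` parameter are hypothetical, not TerraService's published contract. In practice, the WSDL (the contract language) is what tells a client these names.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/hypothetical-service/"  # placeholder; a real WSDL defines this


def build_soap_request(method: str, params: dict) -> bytes:
    """Wrap a method call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        # Each parameter becomes a child element; values travel as text.
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


print(build_soap_request("GetPlaceList", {"placeName": "Seattle"}).decode("utf-8"))
```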


.NET TerraService Architecture

[Diagram: the existing DB servers (three SQL 2000 databases of 1.0 TB each, 705 M rows) are accessed via ADO.NET and OLE DB by the TerraServer Web Service, the Map Server HTTP handler, and the Map UI Web Forms. Standard browsers receive HTML and image/jpeg; smart clients built with Windows Forms on the .NET Framework exchange SOAP/XML with the web service and receive image/jpeg.]


TerraServer Web Services

• Terra-Tile-Service
– Query gazetteer
– Retrieve imagery metadata
– Retrieve imagery
– Simple projection conversions

• Landmark-Service
– Geo-coded places, e.g., schools, golf courses, hospitals, etc.
– Place polygons, e.g., zip codes, cities, etc.
– Allows "overlay" information for Terra-Tile-Service applications

Clients can present TerraServer imagery in new ways.


Web Service Methods

• Place Search
– GetPlaceFacts
– GetPlaceList
– GetPlaceListInRect
– CountPlacesInRect

• Projection
– ConvertLonLatPtToUtmPt
– ConvertUtmPtToLonLatPt
– ConvertLonLatToNearestPlace
– GetTheme
– GetLatLonMetrics

• Tile
– GetAreaFromPt
– GetAreaFromRect
– GetAreaFromTileId
– GetTileMetaFromLonLatPt
– GetTileMetaFromTileId
– GetTile (image)

• Landmark
– GetLandmarkTypes
– CountOfLandmarkPointsByRect
– GetLandmarkPointsByRect
– CountOfLandmarkShapesByRect
– GetLandmarkShapesByRect

http://terraservice.net
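To show what a projection method such as ConvertLonLatPtToUtmPt computes, here is a Python sketch of the standard forward transverse Mercator series on the WGS84 ellipsoid, as given in Snyder's USGS "Map Projections: A Working Manual". These slides do not describe TerraService's actual datum, algorithm, or method signature, so treat this as an illustration only. The function names are hypothetical, and the Norway/Svalbard zone exceptions are ignored.

```python
import math

# WGS84 ellipsoid and UTM constants.
A = 6378137.0                # semi-major axis (m)
F = 1 / 298.257223563        # flattening
E2 = F * (2 - F)             # first eccentricity squared
EP2 = E2 / (1 - E2)          # second eccentricity squared
K0 = 0.9996                  # scale factor on the central meridian


def utm_zone(lon: float) -> int:
    """UTM zone number 1-60 for a longitude in degrees (ignores Norway/Svalbard exceptions)."""
    return int((lon + 180) // 6) % 60 + 1


def lonlat_to_utm(lon: float, lat: float):
    """Forward transverse Mercator series; returns (zone, easting_m, northing_m)."""
    zone = utm_zone(lon)
    phi = math.radians(lat)
    # Offset from the zone's central meridian, wrapped into [-180, 180).
    dlam = math.radians((lon - (zone * 6 - 183) + 180) % 360 - 180)

    n = A / math.sqrt(1 - E2 * math.sin(phi) ** 2)   # prime-vertical radius of curvature
    t = math.tan(phi) ** 2
    c = EP2 * math.cos(phi) ** 2
    a = dlam * math.cos(phi)

    e4, e6 = E2 ** 2, E2 ** 3
    m = A * ((1 - E2 / 4 - 3 * e4 / 64 - 5 * e6 / 256) * phi      # meridian arc length
             - (3 * E2 / 8 + 3 * e4 / 32 + 45 * e6 / 1024) * math.sin(2 * phi)
             + (15 * e4 / 256 + 45 * e6 / 1024) * math.sin(4 * phi)
             - (35 * e6 / 3072) * math.sin(6 * phi))

    easting = 500_000 + K0 * n * (
        a + (1 - t + c) * a ** 3 / 6
        + (5 - 18 * t + t ** 2 + 72 * c - 58 * EP2) * a ** 5 / 120)
    northing = K0 * (m + n * math.tan(phi) * (
        a ** 2 / 2
        + (5 - t + 9 * c + 4 * c ** 2) * a ** 4 / 24
        + (61 - 58 * t + t ** 2 + 600 * c - 330 * EP2) * a ** 6 / 720))
    if lat < 0:
        northing += 10_000_000  # false northing for the southern hemisphere
    return zone, easting, northing


print(lonlat_to_utm(-75.0, 45.0))
```

As an example, take the central meridian of zone 18 (75° W) at 45° N. The easting there is exactly the 500,000 m false easting, and the northing is about 4,982,950 m.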


Soil Viewer Uses TerraService


Custom end products: Web Soil Data Viewer, XML soil report, soil interpretation map


What Tom Showed You

• Converted a Web Server
– HTML get/post
– Server returns pictures to people

• to a Web Service
– SOAP service
– Returns XML self-describing data
– Application integrates data (agriculture and geo data)
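"Self-describing" means each value arrives wrapped in a named element. A client program can therefore pick out fields by name instead of scraping an HTML page meant for people. The response below is invented for illustration, and its element names are not TerraService's schema. The Python sketch parses it with the standard ElementTree module.

```python
import xml.etree.ElementTree as ET

# Invented example response; element names are illustrative only.
RESPONSE = """<PlaceFacts>
  <Place><City>Seattle</City><State>Washington</State></Place>
  <Center><Lon>-122.33</Lon><Lat>47.61</Lat></Center>
</PlaceFacts>"""


def parse_place_facts(xml_text: str) -> dict:
    """Pick fields out by element name rather than by position on a page."""
    root = ET.fromstring(xml_text)
    return {
        "city": root.findtext("Place/City"),
        "state": root.findtext("Place/State"),
        "lon": float(root.findtext("Center/Lon")),
        "lat": float(root.findtext("Center/Lat")),
    }


facts = parse_place_facts(RESPONSE)
print(facts["city"], facts["lon"], facts["lat"])
```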


Rosetta Stone

• Distributed computing + basic services → Dot Net
• Yellow Pages → UDDI: Universal Description, Discovery, and Integration
• ? → Schema, XLANG
• RPC (remote procedure call): CORBA, DCOM, RMI → SOAP: Simple Object Access Protocol
• IDL (interface definition language) → WSDL: Web Services Description Language
• XDR (eXternal Data Representation) → XML: eXtensible Markup Language


Sky Server

– Like TerraServer: pictures of the sky.
– But also LOTS of data on each object: luminosity (multi-spectral), morphology, spectrum.

• So, it is a data mining application: a data mining web service.
• Cross-correlation is challenging because the data is
– multi-resolution
– dirty/fuzzy (error bars, cosmic rays, airplanes…)
– time varying

• 50 K spectroscopic objects: ~100 attributes + 30 lines
• 15 M photometric objects: ~400 attributes


Astronomy Data

• In the "old days" astronomers took photos.
• Starting in the 1960's they began to digitize.
• New instruments are digital (100s of GB/night).
• Detectors are following Moore's law.
• Data avalanche: doubles every year.

[Figure: total area of 3m+ telescopes in the world (m²) and total number of CCD pixels (megapixels) as a function of time. Growth over 25 years is a factor of 30 in glass and 3000 in pixels.]

Courtesy of Alex Szalay
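The caption's growth factors can be turned into doubling times with doubling time = 25 · ln 2 / ln(factor). This assumes steady exponential growth over the 25 years, and the actual curves need not be smooth. A Python sketch:

```python
import math


def doubling_time(growth_factor: float, years: float) -> float:
    """Years per doubling implied by a total growth factor, assuming constant exponential growth."""
    return years * math.log(2) / math.log(growth_factor)


print(f"CCD pixels (x3000 in 25 y): doubles every {doubling_time(3000, 25):.1f} years")
print(f"Glass area (x30 in 25 y):   doubles every {doubling_time(30, 25):.1f} years")
```

By this measure, CCD pixel counts double roughly every 2.2 years and telescope glass area roughly every 5.1 years.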


Astronomy Data

• Astronomers have a few petabytes now.
– 1 pixel (byte) per sq arcsecond ~ 4 TB
– Multi-spectral, temporal, … → 1 PB

• They mine it looking for new (kinds of) objects or more of the interesting ones (quasars), and for density variations and correlations in 400-D space.

• Data doubles every year.
• Data is public after a year.
• So, 50% of the data is public.
• Some have private access to 5% more data.
• So: 50% vs. 55% access for everyone.
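The 50% follows directly from two rules on the slide. If the archive doubles every year, everything collected more than a year ago is exactly half of the current total. The tiny Python sketch below walks through that reasoning. The one-unit starting size and the `embargo_years` parameter are illustrative.

```python
def public_fraction(years: int, embargo_years: int = 1) -> float:
    """Share of an archive that is public after `years` years, assuming the
    archive doubles every year and data is released `embargo_years` after collection."""
    # sizes[k] = archive size at the end of year k, starting from 1 unit.
    sizes = [2 ** k for k in range(years + 1)]
    released = sizes[years - embargo_years] if years >= embargo_years else 0
    return released / sizes[years]


print(public_fraction(10))       # one-year embargo
print(public_fraction(10, 2))    # a two-year embargo would halve it again
```

With a one-year embargo the public share is 0.5 whatever the archive's age. The 5% of extra private access on the slide then gives 55%.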


Astronomy Data

• But…
• How do I get at that 50% of the data?
• Astronomers have a culture of publishing.
– FITS files and many tools: http://fits.gsfc.nasa.gov/fits_home.html
– Encouraged by NASA.
• Publishing data "details" is difficult.

Astronomers want to do it, but it is VERY hard. (What programs were used? What were the processing steps? How were errors treated? …)


Virtual Observatory
http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/

• Premise: Most data is (or could be) online.

• So, the Internet is the world's best telescope:
– It has data on every part of the sky.
– In every measured spectral band: optical, x-ray, radio...
– As deep as the best instruments (of 1 year ago).
– It is up when you are up. The "seeing" is always great (no working at night, no clouds, no moons, no...).
– It's a smart telescope: it links objects and data to the literature on them.


Virtual Observatory: The Age of Mega-Surveys

• Large number of new surveys
– multi-TB in size, 100 million objects or more
– individual archives planned or under way
– data publication an integral part of the survey
– software bill a major cost in the survey

• Multi-wavelength view of the sky
– coverage at more than 13 wavelengths within 5 years

• Impressive early discoveries
– finding exotic objects by unusual colors
• L, T dwarfs, high-z quasars
– finding objects by time variability
• gravitational micro-lensing

Surveys: MACHO, 2MASS, DENIS, SDSS, PRIME, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...

Slide courtesy of Alex Szalay, modified by Jim


Virtual Observatory: Federating the Archives

• The next-generation mega-surveys are different
– top-down design
– large sky coverage
– sound statistical plans
– well-controlled/documented data processing
• Each survey has a publication plan.
• Data mining will lead to stunning new discoveries.
• Federating these archives → Virtual Observatory

Slide courtesy of Alex Szalay


The Multiwavelength Crab Nebula

• Supernova first sighted in 1054 A.D. by Chinese astronomers
• Now: the Crab Nebula in X-ray, optical, infrared, and radio

Slide courtesy of Robert Brunner @ CalTech.

[Figure label: Crab star 1053 AD]


Exploring Parameter Space

Given an arbitrary parameter space:
• Data clusters
• Points between data clusters
• Isolated data clusters
• Isolated data groups
• Holes in data clusters
• Isolated points

Nichol et al. 2001

Slide courtesy of Robert Brunner @ CalTech.
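Isolated points are the simplest of these features to express in code: a point is isolated if no other point lies within some radius of it. The Python sketch below is a brute-force O(n²) illustration, not the method of Nichol et al. The radius is an arbitrary assumption. Catalogs of millions of objects in hundreds of dimensions would need spatial indexing and scaled attributes.

```python
import math


def isolated_points(points, radius):
    """Indices of points with no other point within `radius` (Euclidean, brute force O(n^2))."""
    isolated = []
    for i, p in enumerate(points):
        if all(math.dist(p, q) > radius for j, q in enumerate(points) if j != i):
            isolated.append(i)
    return isolated


# Three points form a tight cluster; the fourth sits far away in parameter space.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(isolated_points(sample, radius=1.0))
```

`math.dist` accepts points of any dimension, so the same function applies unchanged to 400-attribute vectors.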


Virtual Observatory and Education

• In the beginning, science was empirical.
• Then theoretical branches evolved.
• Now we have a computational branch.
– The computational branch has been simulation.
– It is becoming data analysis/visualization.
• The Virtual Observatory can be used to
– teach astronomy: make it interactive, demonstrate ideas and phenomena
– teach computational science skills and the process of scientific discovery


Sloan Digital Sky Survey http://sdss.org/

• A group of astronomers has been building a telescope for the last 12 years (with $90M from the Sloan Foundation, NSF, and a dozen universities)!
• Now data is arriving:
– 250 GB/night (20 nights per year)
– 100 M stars, 100 M galaxies, 1 M spectra
• Public data at http://sdss.org/
– 5% of the survey: 600 sq degrees, 15 M objects, 60 GB
– This data includes most of the known high-z quasars.
– It has a lot of science left in it, but… that is just the start.


Demo of SkyServer

Alex built SkyServer (based on the TerraServer design).

http://skyserver.sdss.org/

Demo: famous places, navigator, data shopping cart, spectrum, SQL?


Virtual Observatory Challenges

• Size: multi-petabyte
– 40,000 square degrees is 2 trillion pixels
– One band (at 1 sq arcsec): 4 terabytes
– Multi-wavelength: 10-100 terabytes
– Time dimension: >> 10 petabytes
– Need auto-parallelism tools
• Unsolved metadata problem
– Hard to publish data & programs
– Hard to find/understand data & programs
• Current tools inadequate
– New analysis & visualization tools
• Transition to the new astronomy
– Sociological issues
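The size figures can be re-derived from the sky area. A square degree is 3600 × 3600 = 12,960,000 square arcseconds. At exactly one pixel per square arcsecond, 40,000 square degrees is therefore about 0.52 trillion pixels.

- The slide's 2 trillion corresponds to pixels about 0.5 arcsec on a side.
- At that pixel count, 4 TB corresponds to about 2 bytes per pixel.

The slides do not say which pixel size or depth they assumed, so both are assumptions in this Python sketch.

```python
import math

SQ_ARCSEC_PER_SQ_DEG = 3600 ** 2                      # 12,960,000
FULL_SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2  # ~41,253 square degrees


def pixel_count(area_sq_deg: float, pixel_arcsec: float) -> float:
    """Number of square pixels of side `pixel_arcsec` needed to cover an area."""
    return area_sq_deg * SQ_ARCSEC_PER_SQ_DEG / pixel_arcsec ** 2


def survey_bytes(pixels: float, bytes_per_pixel: float, bands: int = 1) -> float:
    """Raw image volume for one epoch, ignoring compression and overlaps."""
    return pixels * bytes_per_pixel * bands


print(f"full sky: {FULL_SKY_SQ_DEG:,.0f} sq deg")
print(f"1.0 arcsec pixels: {pixel_count(40_000, 1.0):.3g} pixels")
print(f"0.5 arcsec pixels: {pixel_count(40_000, 0.5):.3g} pixels")
print(f"2 bytes x 2e12 pixels: {survey_bytes(2e12, 2) / 1e12:g} TB")
```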


3 Steps to a Virtual Observatory

• Get SDSS and Palomar online
– Alex Szalay, Jan Vandenberg, Ani Thakar….
– Roy Williams, Robert Brunner, Julian Bunn
• Do queries and cross-ID matches with CalTech and SDSS to expose
– schema, units, …
– dataset problems
– the typical use scenarios
• Implement web services at CalTech and SDSS
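A cross-ID match pairs each object in one catalog with its counterpart in another, usually the nearest object within a small angular radius. The Python sketch below does this by brute force, using the haversine great-circle formula. The catalog tuples, the 2-arcsec radius in the example, and the function names are illustrative assumptions. A real federation across archives would also need the schema and unit reconciliation the slide mentions, plus a spatial index.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec):
    """Pair each (id, ra, dec) in cat_a with its nearest cat_b entry within radius_arcsec."""
    radius_deg = radius_arcsec / 3600
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600))  # separation in arcsec
    return matches


cat_a = [("a1", 180.0, 0.0), ("a2", 10.0, 20.0)]
cat_b = [("b1", 180.0, 0.0002), ("b2", 50.0, -10.0)]
print(cross_match(cat_a, cat_b, radius_arcsec=2.0))
```

In the example, a1 matches b1 at about 0.72 arcsec, and a2 has no counterpart within 2 arcsec.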


The Challenges

• How do we federate the archives to make a VO?
• The hope: XML is the answer.
• The reality: XML is syntax and tools. FITS on XML will be good, but… explaining the data will still be very difficult.
• Define astronomy objects and methods.
– Based on UDDI, WSDL, SOAP.
– Each archive is a service.
• http://TerraService.net/ shows the idea.
– Working with Caltech (Brunner, Williams, Djorgovski, Bunn)
– But how does data mining work?


SkyServer as a WebService: WSDL + SOAP

Just add details:

Archive ss = new VOService(SkyServer);
Attributes[] A = ss.GetObjects(ra, dec, radius);

?? What are the objects (attributes…)?
?? What are the methods (GetObjects()...)?
?? What query language? SQL, XQuery…?
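One way to make the slide's GetObjects(ra, dec, radius) concrete is a cone search, which returns every object within an angular radius of a sky position. The Python sketch below answers it with unit vectors and a dot-product threshold, a common technique for this query. The `SkyObject` and `VOService` classes, the `get_objects` method, and the choice of arcminutes for the radius are all hypothetical. The slide deliberately leaves the objects, methods, and query language open.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SkyObject:
    """Hypothetical catalog row: an id and a position in degrees."""
    obj_id: int
    ra: float
    dec: float


def unit_vector(ra_deg: float, dec_deg: float):
    """Cartesian unit vector for a position on the celestial sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


class VOService:
    """Hypothetical archive service; not a real API."""

    def __init__(self, objects):
        # Precompute unit vectors so each cone test is a single dot product.
        self._rows = [(obj, unit_vector(obj.ra, obj.dec)) for obj in objects]

    def get_objects(self, ra: float, dec: float, radius_arcmin: float):
        """Objects within radius_arcmin of (ra, dec): dot(v, center) >= cos(radius)."""
        center = unit_vector(ra, dec)
        min_dot = math.cos(math.radians(radius_arcmin / 60))
        return [obj for obj, v in self._rows
                if sum(x * y for x, y in zip(v, center)) >= min_dot]


ss = VOService([SkyObject(1, 185.0, 0.0), SkyObject(2, 185.0, 0.05), SkyObject(3, 200.0, 10.0)])
print([obj.obj_id for obj in ss.get_objects(185.0, 0.0, radius_arcmin=5.0)])
```

The dot-product test is cheap, but it loses precision for radii far below an arcsecond, where cos(radius) is indistinguishable from 1 in double precision. A haversine test is safer at those scales.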


Summary

• All information at your fingertips.
• How do we publish information so that our agents can digest it?
• Example: TerraServer -> TerraService
• The Virtual Observatory concept
– The Internet is the world's best telescope
• for astronomy,
• for teaching astronomy, and
• for teaching computational science.
