Virtual Science in the Cloud
Roy Williams, California Institute of Technology
humans, clouds, sensors
beginner to expert
sharing
logins and access
click to code to workflow
personal storage
big data and replication
compute and scaling
software as component
interoperability
survey and event
control or autonomous
The New Science
Registry
Getting Data
Compute Services
Service Oriented Architecture
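[Diagram: a service publishes its service contract to the registry (1. publish); a client finds the service in the registry (2. find) and binds to it (3. bind); client and service then exchange request and response.]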
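The publish, find, bind cycle can be sketched in a few lines of Python. This is only an in-memory toy under stated assumptions: the `Registry` class, the `ServiceRecord` fields, the "cone-search" contract label, and the echo service are all hypothetical names chosen for illustration, not part of any VO software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceRecord:
    """A registry entry: a name, a contract label, and a callable endpoint."""
    name: str
    contract: str                      # hypothetical contract label
    endpoint: Callable[[dict], dict]   # stands in for a network endpoint


class Registry:
    """Toy in-memory registry (an assumption, not a real VO registry)."""

    def __init__(self) -> None:
        self._records: Dict[str, ServiceRecord] = {}

    def publish(self, record: ServiceRecord) -> None:
        # 1. publish: the service advertises its contract
        self._records[record.name] = record

    def find(self, contract: str) -> List[ServiceRecord]:
        # 2. find: the client looks up services by contract
        return [r for r in self._records.values() if r.contract == contract]


def toy_cone_service(request: dict) -> dict:
    # Toy service: echoes the request; a real one would query a catalog.
    return {"status": "ok", "echo": request}


registry = Registry()
registry.publish(ServiceRecord("toy-catalog", "cone-search", toy_cone_service))

matches = registry.find("cone-search")
service = matches[0].endpoint          # 3. bind: the client picks a service
response = service({"ra": 180.0, "dec": -30.0, "sr": 0.1})
```

The client never hard-codes the service; it only knows the contract it needs, which is the point of the registry.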
Principle: Click or Code
VO Data Services
• Cone Search
  – radius + position gives list of objects
  – encoded as VOTable
• Simple Image Access Protocol
• Simple Spectrum Access Protocol
  – spectra have subtleties, so the protocol is more complicated
• Astronomical Data Query Language
  – For database queries
  – Core SQL functions plus astronomy-specific extensions
  – Sky region, Xmatch
• Table Access Protocol
  – Exposes relational databases
  – What tables
  – What table schema
  – Here is a query in ADQL
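As a rough illustration of "click or code" for the data services above, the sketch below builds a Cone Search request URL and the parameters of a synchronous Table Access Protocol query. The RA, DEC, and SR parameters and the REQUEST/LANG/QUERY parameters follow the IVOA protocols; the base URL, the table name `example_catalog`, and its column names are assumptions made up for the example.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Cone Search URL: position plus radius, all in decimal degrees."""
    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def tap_sync_params(adql):
    """Parameters for a synchronous TAP query carrying an ADQL string."""
    return {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}


# Hypothetical table and columns; CONTAINS, POINT and CIRCLE are the
# astronomy-specific sky-region functions that ADQL adds to core SQL.
ADQL_EXAMPLE = (
    "SELECT TOP 10 ra, dec FROM example_catalog "
    "WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
    "CIRCLE('ICRS', 180.0, -30.0, 0.1))"
)

# The base URL is a placeholder, not a real service.
url = cone_search_url("http://example.org/cone", 180.0, -30.0, 0.1)
tap_params = tap_sync_params(ADQL_EXAMPLE)
```

The same URL can be pasted into a browser (click) or generated in a loop over many positions (code).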
VO Compute Services
• Asynchronous
  – May not get immediate answer
  – just get a place to check back
• Security
  – Expensive resources, big requests, sequestered data
  – Strong or Weak or None
• Scalable
  – Graduated path to powerful computation and big data
• Cloud store
  – VOSpace
  – Sharable
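The asynchronous pattern ("just get a place to check back") might look like the following sketch. The service here is an in-memory stand-in, and the class name, the phase strings, and the summing "computation" are all assumptions for illustration; a real client would also sleep between polls.

```python
import itertools


class ToyAsyncService:
    """In-memory stand-in for an asynchronous compute service (hypothetical)."""

    def __init__(self, polls_to_finish=3):
        self._ids = itertools.count(1)
        self._jobs = {}
        self._polls_to_finish = polls_to_finish

    def submit(self, request):
        # No immediate answer: the caller only gets a job id to check back on.
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"request": request, "polls": 0}
        return job_id

    def status(self, job_id):
        job = self._jobs[job_id]
        job["polls"] += 1
        if job["polls"] >= self._polls_to_finish:
            return {"phase": "COMPLETED", "result": sum(job["request"])}
        return {"phase": "EXECUTING"}


def run_to_completion(service, request, max_polls=10):
    """Submit a request, then poll until the job reports completion."""
    job_id = service.submit(request)
    for _ in range(max_polls):
        state = service.status(job_id)
        if state["phase"] == "COMPLETED":
            return state["result"]
    raise TimeoutError(f"{job_id} did not finish within {max_polls} polls")


total = run_to_completion(ToyAsyncService(), [1, 2, 3])
```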
VO Registry
• publish -- find -- bind
• Registry Metadata
  – Descriptions of data collections, data delivery services, organizations, etc.
  – Based on Dublin Core with astronomy-specific extensions
  – Represented as XML schema; extensible
  – Contents stored in Resource Registries
  – Registries exchange metadata records through the Open Archives Initiative Protocol (OAI-PMH)
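Harvesting between registries uses plain HTTP requests defined by OAI-PMH. The sketch below builds a `ListRecords` request; the verb and the `metadataPrefix` and `from` arguments come from the OAI-PMH specification, and `oai_dc` is the Dublin Core format every OAI-PMH repository must support. The registry URL is a placeholder, and a VO registry would normally be harvested with its own metadata prefix, which is not shown here.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    from_date (YYYY-MM-DD) restricts the harvest to records changed since
    that date, which is how incremental harvesting is usually done.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, not a real registry.
harvest_url = oai_list_records_url(
    "http://registry.example.org/oai", from_date="2007-03-01"
)
```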
Distributed Registry
[Diagram: registries harvesting from one another: Caltech, NCSA, STScI/JHU, HEASARC, Astrogrid, CDS, JapanVO; ESO, CfA, and NOAO shown as upcoming]
Ongoing harvesting, March 07 (CfA, ESO, NOAO soon)
Semantics & Search
• Identifiers: ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_374875-722
• Free tags: beard Fred pudding
• Controlled Vocab (UCD): phot.flux;em.ir
• Controlled Vocab interop (SKOS)
• Ontology: Greek isA Man, Socrates isA Greek, so Socrates isA Man
• Data Models: Each sky position will have a circular positional error estimate ...
• Text markup: Outflows from <object>NGC 666</object> are irregular ...
• Schema: Columns are Magnitude, Position, Identifier, ...
• Metadata (registry) forms: Full Registry: true; ManagedAuthorities: authority, nasa.heasarc
• Formal service description
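Two of the items above are structured strings that code can take apart. A minimal sketch, using the identifier and UCD from the slide as inputs; the function names and the dictionary keys are labels chosen for this example, not an official API.

```python
from urllib.parse import urlsplit


def parse_ivo_identifier(identifier):
    """Split an ivo:// identifier into authority, resource key, local part."""
    parts = urlsplit(identifier)
    if parts.scheme != "ivo":
        raise ValueError(f"not an ivo:// identifier: {identifier!r}")
    return {
        "authority": parts.netloc,
        "resource_key": parts.path.lstrip("/"),
        "local_part": parts.fragment,
    }


def parse_ucd(ucd):
    """Split a UCD into words (separated by ';') and atoms (separated by '.')."""
    return [word.split(".") for word in ucd.split(";")]


ident = parse_ivo_identifier(
    "ivo://nasa.gsfc.gcn/SWIFT#BAT_GRB_Pos_374875-722"
)
ucd_words = parse_ucd("phot.flux;em.ir")
```

Here `ucd_words` separates the measured quantity (a photometric flux) from the qualifier (infrared part of the spectrum), which is what makes a controlled vocabulary searchable.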
Cloud Based Tools
code & presentation
data
Open SkyQuery.net
VO Astronomical Crossmatch Service
• Query builder
• Presentation
Execution
• Query planning
• Query execution
• Workflow
Microlensing
Optical transients
Radio transients
X-ray transients
Gamma transients
Follow-up Scheduler
Telescope
Telescope
Telescope
Authors
Subscribers
International
GCN Broker annotation from archives
Events and annotation disseminated to subscribers in real time with intelligence
skyalert.org
Astronomers
Amateurs
Students
Skyalert
• Push-based workflow
  – Can be cyclic
• Portfolio aggregation by citation
• Annotation as software components
• Stream owner builds template
• Django, Python, jQuery
• now 4 developers via SVN
Skyalert Stream Registry
... will be VO registry
Roles
1. browse: query, human computing, WWT/Google
2. subscribe: human or robot
3. author: human or robot
4. annotate: human or robot; contrib software components; archive, mining
[Diagram labels: web, portfolios db, triggers, actions, push, inject, IM/tweet/email/TCP]
Trigger
CRTS["Geometry"]["Moon angle"] > 30 and SDSS["Photoprimary"]["g-magnitude"] < 18
Action
dynamically loads module
run(triggerEvent, portfolio): <business logic>
can build event and inject recursively
Cyclic workflow graph
annotator
followup request
send message
Alerts and event cascade
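A minimal sketch of the trigger and action idea, assuming the portfolio is a nested dictionary keyed by stream name. The trigger expression and the `run(triggerEvent, portfolio)` signature come from the slide; the `process` helper, the event fields, and the follow-up request built as "business logic" are hypothetical. The real system loads the action module dynamically, which is replaced here by a plain function.

```python
def trigger(portfolio):
    """The slide's example trigger, written as a Python predicate."""
    return (
        portfolio["CRTS"]["Geometry"]["Moon angle"] > 30
        and portfolio["SDSS"]["Photoprimary"]["g-magnitude"] < 18
    )


def run(triggerEvent, portfolio):
    # <business logic>: here, just build a hypothetical follow-up request.
    return {"type": "followup request", "source": triggerEvent["id"]}


def process(event, portfolio, inject):
    """If the trigger fires, run the action and inject the new event."""
    if trigger(portfolio):
        inject(run(event, portfolio))


# Made-up portfolio values for illustration only.
portfolio = {
    "CRTS": {"Geometry": {"Moon angle": 45.0}},
    "SDSS": {"Photoprimary": {"g-magnitude": 17.2}},
}
injected = []
process({"id": "event-1"}, portfolio, injected.append)
```

Because the injected event can itself match other triggers, the workflow graph can be cyclic, so a real broker would need a guard against endless cascades.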
Skyalert-LSST
• Test run for LSST mobile app
• Data service from CRTS and Skyalert
  – gets JSON event list via http
• LSST building skyalert clone
  – Pasadena and Tucson both get events by Jabber/XMPP
• “Unknown” is now choice of Cataclysmic Variable, Supernova, Blazar Outburst, Active Galactic Nucleus Variability, UV Ceti Variable, Asteroid, Variable, Mira Variable, High Proper Motion Star, Comet, Eclipsing Variable, Gamma Ray Burst Afterglow, Microlensing, Nova, Planetary Microlensing, RR Lyrae Variable, Tidal Disruption Flare
Tier1 and Tier2 Event Nodes
Evolving in IVOA
• Tier1: Immediate Forwarding, Reliable?, Topology?
• Tier2: Subscription, Repository, Query, Portfolio, Registry, Machine Learning, Substreams, etc.
Tier1
Tier2
Brokering
Jabber/XMPP or raw socket
Authoring
Distribution
Registry:
• Stream definitions
• Event Servers
NSF Teragrid
• World’s largest open distributed cyberinfrastructure
• 11 Resource Provider sites, >2 Petaflop HPC & >27000 CPUs, >3 Petabyte disk, >60 PB tape
• Fast network, Visualization, experiments (VMs, GPUs, FPGAs)
• For US researchers and their collaborators through national peer-review process
Teragrid 2002
user
100s of nodes
purged /scratch
parallel file system/home
login node
job submission and queueing (Condor, PBS, ..)
metadata node
parallel I/O
global file system
Unix, Globus, C++, ssh, files, MPI, PBS, make
Architectures 2010
• Science Gateway (no architecture!)
• Node farm (condor)
• Parallel computing
  – Message-passing MPI
  – Shared memory
• Graphics Processing Units
  – 10^4 independent tiny threads
• Data Intensive
  – Flash memory (TG/UCSD)
  – Graywulf (JHU/Pan-STARRS)
• Immediate resources
Science Gateways
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• Open Science Grid (OSG)
• Special PRiority and Urgent Computing Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Arroyo Adaptive Optics
• Linked Environments for Atmospheric Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology and nanoHUB
• GIScience Gateway (GISolve)
• Gridblast Bioinformatics Gateway
• Earth Systems Grid
• Astrophysical Data Repository (Cornell)
Slide courtesy of Nancy Wilkins-Diehr
GPU for molecular modelling
Data valet: load/validate, merge, crawl, replicate, log
User facing: SQL/casjobs, workbench, privacy/share, stored queries
workflow
workflow
compute
data
head/slice
hot/warm/cold
Fault tolerance: multiple replication, fault workflow
Cost and energy carefully considered
Future: Hadoop/Mapreduce
Pan-STARRS PS1
Cloud Supercomputing?
• Teragrid/Globus vs Cloud/Amazon MI
  – Both ways to get wholesale computing
  – Both provide IaaS, Infrastructure as a Service
• Virtual Machine more popular than CTSS stack
• What about parallelism? I/O speed? GPUs? etc
  – Watch 3leaf and ScaleMP for these
Science and Web 2.0
• Easy for groups to form and collaborate
• Integrates with user workspace
  – iGoogle and OpenSocial
  – alongside other aspects of their lives
• Use existing tools
  – SlideShare, blogs, google gadgets, facebook, Gwave, Flickr, YouTube
• Sharing workspace
• Electronic log
• Provenance
• Virtual Data as “equivalent script”
Science and Web 2.0
• Server delivers only code
  – Browser makes presentation
  – Ajax and Ajaj and Http “long poll”
  – jQuery and Google toolkit
  – see WWT and GSky in Skyalert
• “Everything is a wiki”
  – or a wave?
• Visible/editable by group/s
Adaptive Optics Gateway
proposed upgrade of the Palomar AO system to a 56x56 subaperture system
• Adaptive optics simulations
  – 30-meter telescope
  – Planet finding coronagraph
• 4-day run for 4-sec!
• Parallel parameter sweeps
Arroyo
Arroyo Gateway Architecture
Django webserver
daemon
MySQL: job definitions and status
local space for results
remote space for results
wholesale computing
1. Use HTML/JS from webserver to create job definition.
2. Daemon is polling & sees new job, makes local space for it.
3. Start job on compute resource & update job status.
4. Fetch & update status of running job. Repeat.
5. Output to remote space.
6. Daemon copies output from remote to local, updates job status.
7. User fetches results from webserver.
retail, wholesale
RW and J. Bunn
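The daemon's polling loop (steps 2 to 6) can be sketched as a small state machine. This is an illustration only, not the Arroyo gateway code: the job table is a list of dictionaries instead of MySQL, `ToyCompute` stands in for the wholesale compute resource, and the status names and directory layout are assumptions.

```python
class ToyCompute:
    """Stand-in for the remote compute resource (hypothetical)."""

    def __init__(self, polls_needed=2):
        self._remaining = {}
        self._polls_needed = polls_needed

    def start(self, job_id):
        self._remaining[job_id] = self._polls_needed

    def is_finished(self, job_id):
        self._remaining[job_id] -= 1
        return self._remaining[job_id] <= 0

    def fetch_output(self, job_id):
        return f"output of {job_id}"


def poll_once(jobs, compute):
    """One pass of the daemon over the job table."""
    for job in jobs:
        if job["status"] == "new":
            job["local_dir"] = f"results/{job['id']}"   # step 2: local space
            compute.start(job["id"])                    # step 3: start job
            job["status"] = "running"
        elif job["status"] == "running":
            if compute.is_finished(job["id"]):          # step 4: check status
                # steps 5-6: output lands remotely, daemon copies it locally
                job["output"] = compute.fetch_output(job["id"])
                job["status"] = "done"


jobs = [{"id": "sweep-1", "status": "new"}]   # step 1: created via the web form
compute = ToyCompute()
history = []
for _ in range(4):
    poll_once(jobs, compute)
    history.append(jobs[0]["status"])
```

Keeping all state in the job table means the webserver never talks to the compute resource directly; it only reads the status the daemon writes.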
Pegasus workflow
E. Deelman
E. Deelman, G. Berriman, RW, et al
LIGO Grid
• Condor/DAGMan
• now 45,000 jobs per month
• Pegasus for load balancing?
Asynchronous services: User needs feedback
• AJAJ (AJAX but with JSON)
• Detailed progress reports during run
• Strong/weak security model with certificates
Wide-area Mosaicking
Griffith Observatory, Los Angeles
158 feet
Citizen Science
Human Volunteers
• Science Layer
  – Describe what you see in image
  – Each person has level of expertise
  – How to use results most effectively
  – Galaxyzoo.org, citizensky.org good models
• Game Layer
  – Makes people come back
  – Top 10 ranking etc
  – Anonymous partner a la gwap.com
Human Volunteer Evidence
Donalek et al., arXiv:0810.4945 [astro-ph]
4 of 10 say artifact artifact
RW and C. Donalek
Macromolecule Citizen Science
A. Cunha
Information Fusion
Classic Machine Learning
Metric in “Feature Space”
RW and J. Beck
Feature Vectors
Learning from Training set
Picking relevant lessons
Relevance Vector Machine (Tipping)
New Machine Learning: Information Fusion
• Data Portfolios
  – selected from known set of object types
• Evidence object
  – set of class/prob and prior assumptions
  – may be correlated priors
• Annotator builds evidence
  – from portfolio
  – may include other evidence
• Inference (= Expert System)
  – Combines evidence with cost-benefit
  – Builds Importance
• Alchemy
  – Logic handles complexity
  – Probability handles uncertainty
• Markov Logic Networks
• Matrix Completion
• Influence Diagrams
Automated Decision through Tripod of Data
• Archive
  – nearby radio source escalates p(blazar)
  – nearby galaxy escalates p(supernova)
• Human
  – Crowded field? Artifact present?
  – Can make follow-up observation
• Machine
  – Fuzzy center escalates p(host galaxy)
  – Moving source escalates p(asteroid)
  – Robotic follow-up observation
[Diagram: decision at the center of human, machine learning, and archive]
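One simple way to read "nearby radio source escalates p(blazar)" is as a Bayesian update of class probabilities. The sketch below is an assumption about how such evidence could be combined, not the method used in Skyalert: it treats each piece of evidence as independent, whereas the slides note that priors may be correlated. All class names and numbers are made up.

```python
def bayes_update(prior, likelihood):
    """Multiply prior class probabilities by relative likelihoods, renormalize.

    Classes missing from `likelihood` are treated as unaffected (factor 1.0).
    """
    unnormalized = {c: prior[c] * likelihood.get(c, 1.0) for c in prior}
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}


# Made-up prior over three classes.
prior = {"blazar": 0.2, "supernova": 0.3, "asteroid": 0.5}

# Made-up evidence from the archive: a nearby radio source is taken to be
# four times more likely if the object is a blazar than otherwise.
near_radio_source = {"blazar": 4.0}

posterior = bayes_update(prior, near_radio_source)
```

With these numbers the blazar probability rises from 0.2 to 0.5; further evidence from humans or machines would be applied by calling `bayes_update` again on the posterior.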
Lessons Learned
User Interface (wrong)
Register
Wait for account
Read about XML
Read about web services
Learn to use VO Registry
Translate VOTable format
Ask for help
Finally get some help
and now do some science....
User interface (right)
in Darwinian evolution every small change must give benefit
Anonymous
Web form
some science....
Register
more science....
Run bigger job
hey this is interesting ....
Learn the VO structure
Power user
be careful with complex authentication!
Steering the Ship
• Short term Pragmatism
  – useful tools now
  – simple protocols (eg cone search)
  – “just use RA and Dec”
vs
• Long term Architecture
  – modular suite of interoperable tools
  – sophisticated protocols (eg skynode)
  – sophisticated Space-Time coordinates
Building Information Standards
• Semantics
  – Meaning
  – Usefulness
  – Applicability
• Code
  – Services
  – Interfaces
• Documents
  – Agreements
  – Data Models
  – Tight Schema
  – Loose Schema
• UML
• XSD
• WSDL
A Data Model is a bridge from community to computers
What is a Data Center?
machines
services
doesn’t matter where or how
testing testing testing
do we have enough power and HVAC?
Complex science
Complex machines
• Separate science user from complexity
  – Must have domain science context
• Making simple things simple but
  – Power to scale up
  – Drill-down if wanted
• Machines are not the objective
  – Science through data, compute, sharing
eScience is for People, right?
Summer Schools
Forum
Documentation
Knowledge Base
Social Media
Blog/newsfeed
Help Desk
Education
Getting Started
Campus Champions
Contact Us
Calendar
Advanced Support for Developers