What Is Web 2.0? In plain English: automating tedious tasks using web technology, and tools to help people and software collaborate.
Slide 3
Scientific American, May 2008: "Science 2.0: The Risk and Reward of Web-Based Research"
"Our real mission isn't to publish journals but to facilitate scientific communication." (Timo Hannay, Head of Web Publishing at Nature Publishing Group)
Slide 4
ScienceStudio Elder Matias CLS 09-04-28
Slide 5
User Access to Synchrotrons
Who is the community that will use your platform? Synchrotrons are electron storage rings that emit high-intensity photons, which are used for experiments by a large scientific community (tens of thousands of users worldwide). Access is normally granted for single periods of 1-3 days in a half-year cycle.
What couldn't your community do without the platform? Physical distances and episodic access slow scientific progress and limit scientific collaboration.
Why was that a problem or limitation? Governments worldwide have invested more than $2B in these facilities, yet their scientific outcomes could be improved.
Slide 6
User Access to Synchrotrons
What middleware was needed to resolve the limitations?
- A workflow management engine for the User Office
- A web portal for remote data access (during and after the experiment)
- An Enterprise Service Bus and SOA to integrate internal and external data analysis services
How do your plans meet the needs? Users will have frequent remote access to the VESPERS beamline at the Canadian Light Source, under conditions where many collaborators can participate in the experiment.
Slide 7
Science Studio serves three purposes:
1. Management of all aspects of a scientific experiment, including data storage, collaboration with others, and processing of data;
2. Control of, or interaction with, remote experiments on the CLSI VESPERS beamline and the UWO Nanofabrication Laboratory; and
3. User services (sample management, scheduling, peer review, user training).
Slide 8
Team: People and Orgs
Remote Control, User Services, System Deployment, Integration, System Architecture, System Requirements, Testing, Data Analysis/Grid Computing, User Office Software, Scientific Workflow Engines
Slide 9
Team: People and Orgs
Dionisio Medrano, Dylan Maxwell, Daron Chabot, Elder Matias, Chris Armstrong, John Haley, Mike Bauer, Stewart McIntyre, Marina Suominen Fuller, Jinhui Qin, Nathaniel Sherry, Yuhong Yan, Zahid Anwar, Ludeng (Eric) Zhao, Dan Ni, Yaofeng Xu
Slide 10
System Architecture
[Diagram labels: Web Application, Beamline Control Module, DB, SAN, JMS, CA, VESPERS, HTTP]
1. VESPERS Beamline
2. EPICS control system
3. Beamline Control Module (BCM)
4. Web Application
5. Database
6. File Storage
7. Web Interface
Slide 11
VESPERS Beamline
VESPERS: Very Sensitive Elemental and Structural Probe Employing Radiation from a Synchrotron.
- A bending-magnet beamline on sector 6 at the Canadian Light Source synchrotron in Saskatoon, Saskatchewan.
- A hard X-ray microprobe with an energy range of 6 to 30 keV.
- Techniques: X-ray fluorescence (XRF) and X-ray diffraction (XRD).
EPICS Low-Level Control System
EPICS (Experimental Physics and Industrial Control System) is the standard control system at the CLS.
- EPICS consists of a network of Input/Output Controllers (IOCs), which are connected directly to devices.
- An IOC provides many Process Variables (PVs); each PV relates to an input or output of a device and has a unique name.
- Channel Access (CA) is used to read or write any PV without knowing which IOC provides it.
- There are more than 50,000 PVs in the CLS control system.
Slide 14
Beamline Control Module (BCM)
The BCM provides a high-level interface to the low-level control system (EPICS).
- Logical and physical separation of business logic and control logic.
- Virtual device abstraction that provides independence from the low-level control system.
- Virtual devices can be logically organized into a device hierarchy.
- Basic devices can be combined to build more functional devices.
- Communication with external applications uses two message queues (ActiveMQ).
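The virtual-device ideas above (independence from the control system, a device hierarchy, basic devices combined into more functional ones) can be sketched as follows. The class names, the dict standing in for EPICS, and the PV names are all hypothetical; this illustrates the abstraction, not the BCM's actual API.

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """A virtual device: callers see names and values, not control-system details."""

    def __init__(self, name):
        self.name = name
        self.children = []  # sub-devices, forming a device hierarchy

    def add(self, child):
        self.children.append(child)
        return child

    @abstractmethod
    def read(self):
        ...


class PVDevice(Device):
    """Basic device backed by one low-level channel (a dict stands in for EPICS)."""

    def __init__(self, name, backend, pv_name):
        super().__init__(name)
        self._backend = backend
        self._pv_name = pv_name

    def read(self):
        return self._backend[self._pv_name]

    def write(self, value):
        self._backend[self._pv_name] = value


class SampleStage(Device):
    """Composite device built from two basic motor devices."""

    def __init__(self, name, x, y):
        super().__init__(name)
        self.x = self.add(x)
        self.y = self.add(y)

    def read(self):
        return (self.x.read(), self.y.read())

    def move_to(self, x, y):
        self.x.write(x)
        self.y.write(y)


backend = {"SMTR:X:pos": 0.0, "SMTR:Y:pos": 0.0}  # hypothetical PV names
stage = SampleStage("sample-stage",
                    PVDevice("x-motor", backend, "SMTR:X:pos"),
                    PVDevice("y-motor", backend, "SMTR:Y:pos"))
stage.move_to(1.5, -0.5)
print(stage.read())  # (1.5, -0.5)
```

Only `PVDevice` touches the backend, so swapping the control system would mean replacing that one class while composite devices such as `SampleStage` stay unchanged.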
Slide 15
Web Application
A J2EE servlet application that provides a web-based interface to Science Studio.
Tools: Spring (MVC), iBATIS (ORM), JSecurity (Apache Ki), Apache Tomcat.
Divided into two parts: the Core application and the VESPERS beamline application.
- The Core application is responsible for providing access to the business objects.
- The VESPERS application is responsible for remote control of the VESPERS beamline.
Slide 16
Database
Stores the metadata associated with the operation of a remotely controlled beamline and the organization of the experimental data collected on that beamline.
- A project is the top-level organizational unit and is associated with a project team.
- A session defines a period of time allocated to a project team to conduct experiments.
- An experiment relates a sample to the technique being applied to that sample.
- A scan records the location of the acquired experimental data.
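The project, session, experiment, and scan hierarchy above maps naturally onto a few record types. This is a minimal sketch, assuming experiments are nested under sessions (the slide does not say so) and using hypothetical field names, sample names, and paths; it is not the actual Science Studio schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Scan:
    data_location: str  # where the acquired experimental data is stored


@dataclass
class Experiment:
    sample: str
    technique: str  # e.g. "XRF" or "XRD"
    scans: list = field(default_factory=list)


@dataclass
class Session:
    start: datetime
    end: datetime  # beam time allocated to the project team
    experiments: list = field(default_factory=list)  # assumed nesting


@dataclass
class Project:
    name: str
    team: list  # members of the associated project team
    sessions: list = field(default_factory=list)


def data_locations(project):
    """Walk project -> session -> experiment -> scan and collect data locations."""
    return [scan.data_location
            for session in project.sessions
            for experiment in session.experiments
            for scan in experiment.scans]


# Hypothetical example records.
project = Project("demo-project", team=["alice", "bob"])
session = Session(datetime(2009, 4, 28, 8), datetime(2009, 4, 29, 8))
session.experiments.append(Experiment("steel-sample-1", "XRF", [Scan("/data/demo/scan0001")]))
project.sessions.append(session)
print(data_locations(project))  # ['/data/demo/scan0001']
```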
Experimental Data Storage
- Experimental data is stored at the CLS.
- A common directory structure is shared with other beamlines.
- A large data storage facility is now operational at the University of Saskatchewan as part of WestGrid.
Slide 19
VESPERS Web Interface
- Rich web interface to Science Studio and the VESPERS beamline.
- Designed to be used over commodity broadband internet.
- Developed for the Firefox web browser without any additional plugins or extensions. Known to work with other browsers, but requires support for the HTML canvas element.
- AJAX is used in the VESPERS interface to provide device values in near real time.
- ExtJS, a JavaScript framework, provides many advanced GUI elements.
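Near-real-time updates over AJAX are typically achieved by polling the server at a fixed interval. The sketch below shows that pattern in Python rather than in the browser; `fetch` is a hypothetical stand-in for the AJAX request, the device names are invented, and forwarding only changed values is an assumption about how such an interface might reduce redraws.

```python
import time


def poll(fetch, interval_s, rounds):
    """Call fetch() every interval_s seconds; yield only the values that changed."""
    last = {}
    for _ in range(rounds):
        current = fetch()  # in the browser this would be an AJAX request
        changed = {k: v for k, v in current.items() if last.get(k) != v}
        if changed:
            yield changed
        last = current
        time.sleep(interval_s)


# Hypothetical device readings returned by three successive requests.
readings = iter([{"I0": 10, "pos": 0.0}, {"I0": 10, "pos": 0.5}, {"I0": 12, "pos": 0.5}])
updates = list(poll(lambda: next(readings), interval_s=0.0, rounds=3))
print(updates)  # [{'I0': 10, 'pos': 0.0}, {'pos': 0.5}, {'I0': 12}]
```

The polling interval trades responsiveness against server load, which is why such updates are only "near" real time.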
Slide 20
Beamline Setup
Slide 21
Experiment Setup
Slide 22
XRF (X-Ray Fluorescence)
Slide 23
Beamline Hutch Cameras
Slide 24
Experimental Data Viewer
Slide 25
User Office Workflow
Problem: there are many tasks in proposal and sample management at the CLS.
Goal: to develop a workflow management system that
- manages the ordering of tasks (e.g. training before shipping), and
- tracks the progression of manual tasks as well as Science Studio (SS) tasks.
[Diagram, 6-month CLS cycle (Mar): CLS call for proposals; proposal submission to CLS; CLS gathers proposals; CLS reviews proposals; CLS grants the scientist beamline time; scientist packs the sample ("I wonder if CLS received my sample yet?"); scientist must complete online SS training; CLS health and safety inspection; many other tasks; perform experiment; return sample; take survey.]
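Ordering rules such as "training before shipping" form a dependency graph, and a workflow system can offer a task only when all of its prerequisites are done. The sketch below uses Python's standard `graphlib` module (3.9+); the task names and every dependency edge are hypothetical simplifications of the diagram, not the CLS's actual rules.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each task maps to the tasks that must finish first.
DEPENDENCIES = {
    "submit_proposal": {"call_for_proposals"},
    "review_proposals": {"submit_proposal"},
    "grant_beamtime": {"review_proposals"},
    "complete_training": {"grant_beamtime"},
    "ship_sample": {"complete_training"},  # training before shipping
    "safety_inspection": {"ship_sample"},
    "perform_experiment": {"safety_inspection", "complete_training"},
    "return_sample": {"perform_experiment"},
    "take_survey": {"perform_experiment"},
}


def ready_tasks(done):
    """Return tasks that are not done yet and whose prerequisites are all done."""
    all_tasks = set(DEPENDENCIES) | set().union(*DEPENDENCIES.values())
    return sorted(t for t in all_tasks
                  if t not in done and DEPENDENCIES.get(t, set()) <= done)


# One valid end-to-end order; TopologicalSorter also detects cyclic rules.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)
print(ready_tasks({"call_for_proposals"}))  # ['submit_proposal']
```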
Slide 26
User Office Workflow Status
[Diagram labels: Workflow Management Engine, Beamline User, User Office; Task: Training Completed; Notify; Approved; Notify; Record Progress]
Features of the workflow engine (YAWL):
- Open source
- Based on Petri nets
- Direct support for workflow control-flow patterns
- Ability to interact with web services declared in WSDL
- Relies on XML standards (e.g. XPath and XQuery) for data and doesn't use proprietary languages
Architecture:
- System core: the YAWL engine, which instantiates specifications designed with the YAWL designer and managed by the YAWL repository.
- Environment: composed of YAWL services, inspired by the web services paradigm; end users, applications, and organizations are all services in YAWL.
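Since the engine is Petri-net based, it helps to see the basic firing rule: a transition is enabled when each input place holds a token, and firing moves tokens from inputs to outputs. This minimal place/transition net assumes unit arc weights and models a hypothetical fragment of the status diagram; YAWL adds its own extensions on top, and none of its API is shown here.

```python
from collections import Counter


class PetriNet:
    """Place/transition net with unit arc weights (each place listed once per arc set)."""

    def __init__(self, transitions, marking):
        self.transitions = transitions  # name -> (input places, output places)
        self.marking = Counter(marking)  # place -> number of tokens

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, name):
        """Consume one token from each input place and produce one in each output place."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] += 1


# Hypothetical net: after approval, notifying the user and recording progress
# run as two parallel branches (an AND-split).
WORKFLOW = {
    "notify_cls": (["training_completed"], ["cls_notified"]),
    "approve": (["cls_notified"], ["notify_pending", "record_pending"]),
    "notify_user": (["notify_pending"], ["user_notified"]),
    "record_progress": (["record_pending"], ["progress_recorded"]),
}

net = PetriNet(WORKFLOW, {"training_completed": 1})
for transition in ["notify_cls", "approve", "record_progress", "notify_user"]:
    net.fire(transition)
print(sorted(p for p, n in net.marking.items() if n > 0))
# ['progress_recorded', 'user_notified']
```

The two branches after `approve` may fire in either order, which is the kind of concurrency Petri nets express directly.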
Slide 27
Screenshot: User Training Test Creation
Slide 28
Screenshot: User Survey Taking Page
Slide 29
Screenshot: User Survey Edit Page
Slide 30
Screenshot: Workflow Sample Management
Slide 31
Screenshot: Workflow Call for Proposals
Slide 32
User Office Workflow Example: Prototype Implementation
1. The CLS issues a call for proposals and gives a deadline.
2. Beamline users submit proposals.
3. The User Office administrator ends registration or extends the deadline.
4. The User Office administrator assigns proposals to user office reviewers.
5. Reviewers look at the proposals and rank them.
6. The User Office looks at the ranking and chooses the proposals to accept.
7. The contact persons of accepted proposals are notified.
8. The beamline user completes training (web service).
9. After training is completed (simulated by a delay), the CLS is notified.
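The nine prototype steps form a mostly linear sequence with one choice at step 3 (end registration, or extend the deadline and stay open). A small state machine captures that; all state and event names are hypothetical, and a real deployment would express this as a YAWL specification rather than Python.

```python
# Each state maps an allowed event to the next state. The step 3 choice
# (end registration or extend the deadline) is the only branch.
TRANSITIONS = {
    "call_issued": {"proposals_submitted": "submission_open"},
    "submission_open": {"end_registration": "registration_closed",
                        "extend_deadline": "submission_open"},
    "registration_closed": {"assign_reviewers": "under_review"},
    "under_review": {"rankings_submitted": "ranked"},
    "ranked": {"select_proposals": "accepted"},
    "accepted": {"notify_contacts": "contacts_notified"},
    "contacts_notified": {"training_completed": "cls_notified"},
}


def run(events, state="call_issued"):
    """Apply events in order; reject any event not allowed in the current state."""
    for event in events:
        try:
            state = TRANSITIONS[state][event]
        except KeyError:
            raise ValueError(f"event {event!r} not allowed in state {state!r}") from None
    return state


final = run(["proposals_submitted", "extend_deadline", "end_registration",
             "assign_reviewers", "rankings_submitted", "select_proposals",
             "notify_contacts", "training_completed"])
print(final)  # cls_notified
```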
Slide 33
Scheduling Module
Goal: to automate the review process and the method by which beam time is allocated and scheduled to users, depending on the access mechanism chosen by the user and the stage of operation (construction, commissioning, or operation) of the beamline.
Side effects:
- Facilitate the management of cycles, runs, and modes of operation.
- Use automatic scheduling to handle more scheduling conditions and constraints than people can handle manually, and to identify optimal solutions.
Slide 34
Scheduling Module Features
[Diagram: users submit proposals; an integer programming and heuristic algorithm produces the schedule]
INPUT: Beamlines: 2; Experiments: 3; Release times: [1, 1, 2]; Deadlines: [8, 15, 5]; Weights: [4, 5, 1]; Processing times: [10, 4, 3]; Eligibility: [[0, 1, 0], [1, 0, 1]]
SEARCH AND CONSTRAINT SATISFIABILITY: constraints
1. One beamline per experiment
2. Start time after release time
3. Only eligible beamlines can be selected
7. No overlap of experiments per beamline
OUTPUT: Schedule
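The sketch below applies a simple greedy heuristic to the slide's example input and checks the listed constraints. It rests on several assumptions that are not from the slides: experiments and beamlines are 0-indexed, the eligibility matrix is read as `ELIGIBLE[beamline][experiment]`, experiments are ordered earliest-deadline-first, and weighted tardiness is used as the quality measure. The project's actual integer programme and heuristic may differ.

```python
# Slide data; experiments and beamlines are 0-indexed here.
RELEASE = [1, 1, 2]
DEADLINE = [8, 15, 5]
WEIGHT = [4, 5, 1]
PROCESSING = [10, 4, 3]
ELIGIBLE = [[0, 1, 0], [1, 0, 1]]  # assumed layout: ELIGIBLE[beamline][experiment]


def greedy_schedule():
    """Earliest-deadline-first list scheduling onto eligible beamlines."""
    free_at = [0] * len(ELIGIBLE)  # time at which each beamline becomes free
    schedule = {}  # experiment -> (beamline, start, end)
    for e in sorted(range(len(RELEASE)), key=lambda e: DEADLINE[e]):
        options = [b for b in range(len(ELIGIBLE)) if ELIGIBLE[b][e]]  # constraint 3
        beamline = min(options, key=lambda b: max(RELEASE[e], free_at[b]))
        start = max(RELEASE[e], free_at[beamline])  # constraint 2
        schedule[e] = (beamline, start, start + PROCESSING[e])
        free_at[beamline] = start + PROCESSING[e]  # constraint 7: no overlap
    return schedule


def satisfies_constraints(schedule):
    """Check constraints 1, 2, 3 and 7 as listed on the slide."""
    if sorted(schedule) != list(range(len(RELEASE))):  # 1: every experiment, one beamline
        return False
    for e, (b, start, _) in schedule.items():
        if start < RELEASE[e] or not ELIGIBLE[b][e]:  # 2 and 3
            return False
    intervals = {}
    for b, start, end in schedule.values():
        intervals.setdefault(b, []).append((start, end))
    for spans in intervals.values():  # 7: no overlap on a beamline
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start < prev_end:
                return False
    return True


def weighted_tardiness(schedule):
    """Assumed objective: sum of weight times lateness past the deadline."""
    return sum(WEIGHT[e] * max(0, end - DEADLINE[e])
               for e, (_, _, end) in schedule.items())


plan = greedy_schedule()
print(plan, satisfies_constraints(plan), weighted_tardiness(plan))
# {2: (1, 2, 5), 0: (1, 5, 15), 1: (0, 1, 5)} True 28
```

With this data each experiment has only one eligible beamline, so the only real decision is the order on beamline 1. Running experiment 0 before experiment 2 there would give a weighted tardiness of 21 (experiment 0 ends at 11, 3 units late with weight 4; experiment 2 ends at 14, 9 units late with weight 1), so the greedy rule is not optimal under this assumed objective. Closing that kind of gap is what an exact method such as integer programming is for.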
Slide 35
X-Ray Fluorescence (XRF): Reveals Elemental Composition
Characteristic element lines selected and mapped over a 2D scan area.
[Figure panels: S Kα; Cr Kα & Cr Kβ; Fe Kα & Fe Kβ; Ni Kα & Ni Kβ]
2D maps generated for selected elemental lines.
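A common way to turn per-pixel XRF spectra into an elemental map is to sum the counts in an energy window (region of interest) around each element's characteristic line. This sketch assumes a linear energy calibration of 10 eV per channel starting at 0 keV and a fixed ±0.15 keV window, both hypothetical. The Kα energies are rounded reference values, and the function name is illustrative, not the VESPERS software.

```python
# Approximate K-alpha line energies in keV (rounded reference values).
K_ALPHA_KEV = {"S": 2.31, "Cr": 5.41, "Fe": 6.40, "Ni": 7.48}


def roi_map(spectra, element, kev_per_channel=0.01, half_width_kev=0.15):
    """spectra[row][col] is a list of counts per channel; return a 2D list of ROI sums."""
    centre = K_ALPHA_KEV[element]
    lo = round((centre - half_width_kev) / kev_per_channel)  # first channel in window
    hi = round((centre + half_width_kev) / kev_per_channel)  # last channel in window
    return [[sum(spectrum[lo:hi + 1]) for spectrum in row] for row in spectra]


# Hypothetical 2x2 scan with 1000-channel spectra (0 to 10 keV).
spectra = [[[0] * 1000 for _ in range(2)] for _ in range(2)]
spectra[0][1][640] = 50  # counts near 6.40 keV (Fe K-alpha)
spectra[1][0][541] = 30  # counts near 5.41 keV (Cr K-alpha)
print(roi_map(spectra, "Fe"))  # [[0, 50], [0, 0]]
print(roi_map(spectra, "Cr"))  # [[0, 0], [30, 0]]
```

Real analysis usually also subtracts background and handles overlapping lines; a plain window sum ignores both.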
Slide 36
X-Ray Diffraction (XRD): Reveals Structural Information
Peak fitting and indexing of an image set to create a grain orientation map.
[Figure panels: Peak Search; Indexing Process; Grain Orientations. Legend: Old IDL Programme Matched Peak, New C Programme Matched Peak, New C Programme Expected Peak]
The XRD indexing programme examines the locations of peaks in an image in order to determine the kind of lattice structure in which the sample's constituent atoms are arranged. Shown here are the results of an older indexing programme written in IDL and of the new indexing programme written in C. The new programme is proving to be more versatile and more reliable than the old one, often indexing data sets on which the old programme failed.
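The first stage shown, peak search, finds candidate diffraction spots before indexing matches them to a lattice. A simple version keeps pixels that are above a threshold and strictly greater than all eight neighbours. This is a sketch of that stage only: it is neither the IDL nor the C programme, the function name is hypothetical, and real peak finders usually also fit peak shapes to obtain sub-pixel positions.

```python
def find_peaks(image, threshold):
    """Return (row, col) of pixels above threshold that exceed all 8 neighbours."""
    rows, cols = len(image), len(image[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = image[r][c]
            if v <= threshold:
                continue
            neighbours = [image[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Hypothetical 5x5 detector image: flat background of 1 with two spots.
image = [[1] * 5 for _ in range(5)]
image[1][1] = 9
image[3][3] = 7
print(find_peaks(image, threshold=5))  # [(1, 1), (3, 3)]
```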
Slide 37
High Performance Computing Elder Matias CLS 09-04-28
Slide 38
Is this about making processors faster? Moore's Law has limited us, and there are also other fundamental limits. We need to look at parallel computers.
Slide 39
What is High Performance Computing?
- Special-purpose machines configured to solve complex problems
- Usually multi-processor (tens to thousands of processors)
- Requires parallel programming
Models:
- Grid: multiple interconnected machines solving the same problem
- Supercomputer: multiple processors with shared memory
Slide 40
Limitation of Parallel Programming (Amdahl's Law and Gustafson's Law)
The degree to which a problem can be expressed as a parallel algorithm limits the speedup achieved on a multi-processor machine.
Amdahl's Law: $S = \frac{1}{(1 - P) + P/N}$, where P = fraction of the program that can run in parallel, S = speedup (relative to sequential execution), N = number of processors.
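To see how strongly the serial fraction caps speedup, the two laws in the slide title can be evaluated side by side. Amdahl's law assumes a fixed problem size. Gustafson's law, S = (1 - P) + P·N, assumes the problem grows with the number of processors, with P read as the parallel fraction of the scaled run. P = 0.95 is an arbitrary illustrative value.

```python
def amdahl_speedup(p, n):
    """Amdahl: fixed problem size; p = parallel fraction (0..1), n = processors."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p, n):
    """Gustafson: problem grows with n; p = parallel fraction of the scaled run."""
    return (1.0 - p) + p * n


for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.95, n), 2), round(gustafson_speedup(0.95, n), 2))
```

With 95% parallel code, Amdahl's speedup approaches 1/(1 - P) = 20 no matter how many processors are added. Under Gustafson's assumptions, speedup keeps growing roughly linearly, which is why large machines are usually given proportionally larger problems.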
Slide 41
Examples: LHC
The LHC at CERN is an example of a grid application where no single country has sufficient processing capability.
- 15 million gigabytes (15 PB) of data per year
- The LHC Tier 1 grid was tested in 2006
- TRIUMF is the Canadian Tier 1 centre for LHC experiments
Courtesy TRIUMF
Slide 42
How about in the synchrotron community?
- Many synchrotrons understand the need for HPC.
- Some CLS users make use of WestGrid for computation.
- The new WestGrid data storage facility is intended to support CLS experiments and is located on campus.
- UWO/ORNL/APS/CLS are working on a joint crystallography application on SHARCNET using the Cell environment.
Slide 43
Diamond - Racks layout Courtesy: Nick Rees Diamond Oct/08
Slide 44
Diamond - Current situation [Photo labels: water pipes, cable tray] Courtesy: Nick Rees, Diamond, Oct/08
Slide 45
How do I get access to an HPC machine?
Compute Canada is responsible for High Performance Computing in Canada. Each regional grid is a member of Compute Canada:
- ACEnet: Atlantic Canada
- CLUMEQ: Quebec
- SciNet: University of Toronto
- HPCVL: Queen's, Royal Military College, St. Lawrence, Carleton, Ottawa
- RQCHP: Quebec
- SHARCNET: Ontario
- WestGrid: Western Canada
Slide 46
Grid Data Storage?
- UofS is the host for the new WestGrid data storage facility.
- Cost: $3.2M
- Includes online and archival storage
- Two sites on campus
Photo: tape backup unit holding 6,000 tapes (1 TB each)
Slide 47
IBM Cell Processor (3.2 GHz)
Slide 48
Slide 49
ANISE Elder Matias CLS 09-04-28
Slide 50
ANISE: Active Network for Information from Synchrotron Experiments
- "Active" means near-instantaneous stream processing of complex data during transfer to the user or to storage.
- Cell processing using InfoSphere Streams software from IBM, and a lightpath provided by the CANARIE network.
- Distributed processing on facilities provided by SHARCNET and WestGrid.
Objective: develop such a network to provide processed results from experiments such as Laue diffraction at APS (34-ID) and VESPERS at CLS.
- The network would assist the integration of diffraction data from multiple large-area detectors.
- The network would allow research problems to be resolved faster and free up time for more users.
- The network would encourage common data formats and protocols, leading to closer collaboration.
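"Processing during transfer" can be pictured as a pipeline in which each frame is reduced as it passes through, rather than after the whole data set has landed on disk. The sketch uses plain Python generators to show that shape. It is not the InfoSphere Streams API, and the source, the per-frame summary, and the sink are all hypothetical.

```python
def frames_from_detector(n):
    """Hypothetical source: yields (frame_id, pixels) as frames are acquired."""
    for i in range(n):
        yield i, [i, i + 1, 10 * i]  # stand-in pixel values


def reduce_frames(frames):
    """In-flight step: turn each raw frame into a small summary as it passes."""
    for frame_id, pixels in frames:
        yield {"frame": frame_id, "max": max(pixels), "total": sum(pixels)}


def deliver(results, sink):
    """Final stage: hand processed results to the user view or to storage."""
    for result in results:
        sink.append(result)
    return sink


# Frames are pulled through the pipeline one at a time, so processing overlaps transfer.
delivered = deliver(reduce_frames(frames_from_detector(3)), [])
print(delivered)
```

Because each stage pulls one item at a time, results for early frames become available while later frames are still being acquired. A distributed system would place such stages on separate nodes connected by the network.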
Slide 51
ANISE: Active Network for Information from Synchrotron Experiments
Some project outcomes:
1. Laue diffraction methods could become accessible to a greater number and variety of users by reducing the time required to accumulate meaningful data.
2. The results of complex diffraction measurements covering a wider range of angles could be assessed rapidly.
3. The data and experiment management processes of Science Studio could enable very brief follow-up experiments to answer crucial questions some time later.
4. Distant collaborators could participate in, and learn from, experiments on samples of critical importance to a project.
5. User support software could mean more rapid publication.
6. Expansion to include APS and NSLS beamlines.
Slide 52
Slide 53
Slide 54
Slide 55
Slide 56
[Summary figure: VESPERS Beamline Experimental Setup (sample, beam, XRD area detector), with XRF output and XRD output]
X-Ray Fluorescence (XRF): Reveals Elemental Composition. Characteristic element lines selected and mapped over a 2D scan area (S Kα; Cr Kα & Kβ; Fe Kα & Kβ; Ni Kα & Kβ); 2D maps generated for selected elemental lines.
X-Ray Diffraction (XRD): Reveals Structural Information. Peak fitting and indexing of an image set to create a grain orientation map: peak search, indexing process, and grain orientations, applied to the entire data set.