Cyberinfrastructure Summer Institute for Geoscientists, August 14-18, 2006, San Diego Supercomputer Center

1

Cyberinfrastructure Summer Institute for Geoscientists

August 14-18, 2006

San Diego Supercomputer Center

2

WELCOME!

3

Acknowledgements

• Instructors
– Prof. Ramon Arrowsmith, Arizona State
– Dr. Steve Cutchin, SDSC
– Efrat Jaeger, SDSC/GEON
– Prof. Randy Keller, University of Oklahoma
– Dr. Kai Lin, SDSC/GEON
– Prof. Bertram Ludaescher, UC Davis
– Dr. Charles Meertens, UNAVCO
– Ashraf Memon, SDSC/GEON
– Prof. Krishna Sinha, VaTech
– Dr. David Valentine, SDSC
– Nancy Wilkins-Diehr, SDSC

4

Acknowledgements

• GEON Team at SDSC
– Margaret Banton
– Sandeep Chandra
– Ghulam Memon
– Vishu Nandigam
– Dogan Seber
– Nancy White
– Choonhan Youn

• Synthesis Center Staff
– Linda Ferri
– John Moreland

5

Acknowledgements

• Others at SDSC
– Ilkay Altintas
– Jeff Filliez
– Nancy Jensen
– Matt Kullberg
– Emilio Valente
– Peggy Wagner

• NSF
– CSIG is funded as a supplement to GEON

6

Schedule

• Monday – Introduction to Cyberinfrastructure, Data Integration, and Web Services

• Tuesday – Web Services, GIS

• Wednesday – GIS, Knowledge Representation

• Thursday – Workflow Systems

• Friday – Path Forward: Integration scenarios, Synthesis Center, TeraGrid Science Gateways

7

LOGISTICS

• Webcasting and video archives

• Machine userid/password
– Userid: 279user
– Password: 279Class

8

INTRODUCTIONS!

9

Distributed Systems for Geoinformatics

What is the need?

10

Geoinformatics

Ref: David Lambert, NSF EAR/GEO, presentation at GEON Annual Meeting, 2005

11

Geoinformatics

Ref: David Lambert, NSF EAR/GEO, presentation at GEON Annual Meeting, 2005

12

Role of Cyberinfrastructure in Geoinformatics

[Figure labels: Cyberinfrastructure; GEON]

13

What is Cyberinfrastructure?

• From NSF’s Cyberinfrastructure Vision for 21st Century Discovery, www.nsf.gov/od/oci/ci-v7.pdf, July 20, 2006:

“The comprehensive infrastructure needed to capitalize on dramatic advances in information technology has been termed cyberinfrastructure. Cyberinfrastructure integrates hardware for computing, data and networks, digitally-enabled sensors, observatories and experimental facilities, and an interoperable suite of software and middleware services and tools. Investments in interdisciplinary teams and cyberinfrastructure professionals with expertise in algorithm development, system operations, and applications development are also essential to exploit the full power of cyberinfrastructure to create, disseminate, and preserve scientific data, information, and knowledge…”

• p. 40 of the report: “In 1999, the PITAC released the seminal report ITR-Investing in our Future, prompting new and complementary NSF investments in CI projects, such as the Grid Physics Network (GriPhyN) and international Virtual Data Grid Laboratory (iVDGL) and the Geosciences Network, known as GEON.”

14

CI-TEAM: CI Training, Education, Advancement, and Mentoring

http://www.nsf.gov/crssprgm/ci-team/

15

Integrated Cyberinfrastructure System

Source: Dr. Deborah Crawford, Chair, NSF CI Working Committee

[Figure: layered diagram. Labels:]
• Applications: Geosciences, Environmental Sciences, Neurosciences, High Energy Physics, …
• Domain-specific Cybertools (software)
• Shared Cybertools (software)
• Distributed Resources (computation, storage, communication, etc.)
• Development Tools & Libraries
• Middleware Services
• Hardware
• Side labels: Education and Training; Discovery & Innovation

16

Data, Tools, & Computation

• Data
– Field observations
– Laboratory analyses
– Sensor-based data (land, airborne, satellite)

• Tools
– QA/QC, simple transformations and analyses
– Complex models

• Computation
– Community codes
– Access to high-performance computing
– Data Intensive Computing

17

Variety of Geoinformatics Efforts

• Data collection
– Digital data collection in the field
– When does it become “cyberinfrastructure”?

• Database curation
– E.g. EarthChem, Paleobiology, MorphoBank, Paleo Pollen, etc.
– When does it become “tools” and “community codes”?

• Software Development
– Tools: gravity and magnetics, paleogeography, geochemistry, seismic data products, …
– Community codes: SCEC-CME, CIG, …

18

Variety of Geoinformatics Efforts

• High Performance Computing
– LiDAR data management
– Seismic analyses
– Petascale initiative

• Data Integration
– E.g. CUAHSI HIS
– Also, a pressing need in projects like EarthScope

19

Cyberinfrastructure

To provide access to all of these “resources” and support “interoperability” among them

Cyberinfrastructure: The Common Platform Across Distributed Projects

• Data Collection
• Data Management and Curation
• Tool Development
• Modeling and Integration

20

Example: USArray Data Flow

• Deploy field sensor arrays
– Across US

• Collect data from sensor arrays and perform QA/QC
– One of the sites is SIO, San Diego

• Archive data for community access
– IRIS, Seattle

EarthScope/USArray: Single project, multiple participants.
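The collect, QA/QC, and archive steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sample` record, the station codes, the `max_abs` threshold, and the two checks (missing value, out-of-range value) are all hypothetical and do not describe the actual USArray, SIO, or IRIS software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    station: str             # hypothetical station code
    time_s: float            # seconds since start of record
    value: Optional[float]   # None marks a telemetry gap


def qa_qc(samples, max_abs=1.0e6):
    """Split samples into accepted and rejected lists using two simple checks."""
    accepted, rejected = [], []
    for s in samples:
        # Reject gaps and values outside the assumed plausible range.
        if s.value is None or abs(s.value) > max_abs:
            rejected.append(s)
        else:
            accepted.append(s)
    return accepted, rejected


def archive(samples):
    """Group accepted samples by station, ordered by time."""
    store = {}
    for s in sorted(samples, key=lambda s: (s.station, s.time_s)):
        store.setdefault(s.station, []).append(s)
    return store


# Made-up readings from two stations.
raw = [
    Sample("STA1", 0.0, 12.5),
    Sample("STA1", 1.0, None),    # gap
    Sample("STA2", 0.0, 5.0e7),   # out of range
    Sample("STA2", 1.0, -3.2),
]
good, bad = qa_qc(raw)
archived = archive(good)
```

The point of the sketch is the separation of stages: each stage could be run by a different participant at a different site, which is what makes this a distributed-systems problem.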

21

Example: LiDAR Workflow

Courtesy: Chris Crosby, ASU

• Survey (image credit: D. Harding, NASA)
• Point Cloud (x, y, z, …)
• Interpolate / Grid
• Analyze / “Do Science”

Single goal: multiple projects, multiple participants, e.g. NCALM, GEON, ASU, NASA, USGS, …
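To make the “Interpolate / Grid” step concrete, here is a minimal Python sketch that turns a point cloud of (x, y, z) tuples into a coarse elevation grid by averaging the z values that fall in each square cell. Cell averaging is a deliberately simple stand-in chosen for illustration; it is not claimed to be the interpolation method used by any of the projects named above, and the function name, cell size, and sample points are all assumptions.

```python
from collections import defaultdict


def grid_point_cloud(points, cell_size):
    """Average z of all points falling in each square cell.

    Returns a dict mapping (col, row) cell indices to mean elevation.
    Cells that contain no points are simply absent from the result.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [sum of z, point count]
    for x, y, z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell][0] += z
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Made-up point cloud: two returns in one cell, one each in two others.
points = [
    (0.2, 0.3, 10.0),
    (0.8, 0.9, 12.0),
    (1.5, 0.4, 20.0),
    (1.1, 1.7, 30.0),
]
dem = grid_point_cloud(points, cell_size=1.0)
```

Empty cells are left out rather than filled, which shows why real workflows need a true interpolation step: a production grid has to estimate elevations where no returns were recorded.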

22

The CI Challenge

• Support multiple science goals, each requiring access to and “integration” of resources from multiple projects and involving multiple participants and partners

Distributed systems interoperability, and creation of “Virtual Organizations”…

23

24

Community Cyberinfrastructure Projects

Adapted from: Prof. Mark Ellisman, UC San Diego

[Figure: layered diagram. Labels:]
• Science Domains: Biomedical Informatics (BIRN), High Energy Physics (GriPhyN), Geosciences (GEON), Ecological Observatories (NEON), Earthquake Engineering (NEES), Ocean Observing (ORION)
• Your Specific Tools & User Apps.
• Shared Tools
• Friendly Work-Facilitating Portals: Authentication - Authorization - Auditing - Workflows - Visualization - Analysis
• Development Tools & Libraries
• Middleware Services
• Distributed Computing, Instruments and Data Resources
• Hardware

25

GEON Cyberinfrastructure

• Funded by NSF IT Research program

• Multi-institution collaboration between IT and Earth Science researchers

• GEON Cyberinfrastructure provides:

– Authenticated access to data and Web services

– Registration of data sets, tools, and services with metadata

– Search for data, tools, and services, using ontologies

– Scientific workflow environment and access to HPC

– Data and map integration capability

– Scientific data visualization and GIS mapping

26

Key Informatics Areas

• Portals
– Authenticated, role-based access to cyber resources: data, tools, models, model outputs, collaboration spaces, …

• Data Integration
– Search, discovery and integration of data from heterogeneous information sources (“mediation” and “semantic integration”)

• Use of workflow systems, and access to HPC
– Ability to “program” at a higher level of abstraction
– Sharing of models, along with “provenance” information
– Gateways to HPC environments

• Management of Geospatial Information
– Using GIS capabilities, map services, geospatial data integration

• Visualization of 3D, 4D geospatial data and information
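As a rough illustration of the “mediation” idea under Data Integration, the Python sketch below answers one query against two sources that store the same kind of information under different field names. Everything in it is hypothetical: the source names, the field names, the shared schema, and the sample records are invented for the example and are not taken from GEON. Real semantic integration would also map differing vocabularies through an ontology, which this sketch does not attempt.

```python
# Hypothetical per-source mappings from local field names to a shared schema.
MAPPINGS = {
    "source_a": {"rock": "rock_type", "lat": "latitude", "lon": "longitude"},
    "source_b": {"lithology": "rock_type", "y": "latitude", "x": "longitude"},
}


def mediate(source, record):
    """Rewrite one source record into the shared schema."""
    mapping = MAPPINGS[source]
    return {common: record[local] for local, common in mapping.items()}


def integrated_query(sources, rock_type):
    """Return records of the given rock type from every source, in one schema."""
    results = []
    for source, records in sources.items():
        for record in records:
            unified = mediate(source, record)
            if unified["rock_type"] == rock_type:
                # Keep track of where each record came from.
                results.append({**unified, "source": source})
    return results


# Made-up records held by two independent sources.
sources = {
    "source_a": [{"rock": "granite", "lat": 33.1, "lon": -116.5}],
    "source_b": [
        {"lithology": "granite", "y": 36.4, "x": -118.2},
        {"lithology": "basalt", "y": 35.0, "x": -117.0},
    ],
}
hits = integrated_query(sources, "granite")
```

The user asks one question in one vocabulary; the mappings absorb the differences between sources, and the `source` field preserves a minimal form of provenance.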