Cyberinfrastructure Summer Institute for Geoscientists
August 14-18, 2006
San Diego Supercomputer Center
Acknowledgements
• Instructors
– Prof. Ramon Arrowsmith, Arizona State
– Dr. Steve Cutchin, SDSC
– Efrat Jaeger, SDSC/GEON
– Prof. Randy Keller, University of Oklahoma
– Dr. Kai Lin, SDSC/GEON
– Prof. Bertram Ludaescher, UC Davis
– Dr. Charles Meertens, UNAVCO
– Ashraf Memon, SDSC/GEON
– Prof. Krishna Sinha, Virginia Tech
– Dr. David Valentine, SDSC
– Nancy Wilkins-Diehr, SDSC
Acknowledgements
• GEON Team at SDSC
– Margaret Banton
– Sandeep Chandra
– Ghulam Memon
– Vishu Nandigam
– Dogan Seber
– Nancy White
– Choonhan Youn
• Synthesis Center Staff
– Linda Ferri
– John Moreland
Acknowledgements
• Others at SDSC
– Ilkay Altintas
– Jeff Filliez
– Nancy Jensen
– Matt Kullberg
– Emilio Valente
– Peggy Wagner
• NSF
– CSIG is funded as a supplement to GEON
Schedule
• Monday – Introduction to Cyberinfrastructure, Data Integration, and Web Services
• Tuesday – Web Services, GIS
• Wednesday – GIS, Knowledge Representation
• Thursday – Workflow Systems
• Friday – Path Forward: Integration scenarios, Synthesis Center, TeraGrid Science Gateways
LOGISTICS
• Webcasting and video archives
• Machine userid/password
– Userid: 279user
– Password: 279Class
What is Cyberinfrastructure?
• From NSF’s Cyberinfrastructure Vision for 21st Century Discovery, www.nsf.gov/od/oci/ci-v7.pdf, July 20, 2006:
“The comprehensive infrastructure needed to capitalize on dramatic advances in information technology has been termed cyberinfrastructure. Cyberinfrastructure integrates hardware for computing, data and networks, digitally-enabled sensors, observatories and experimental facilities, and an interoperable suite of software and middleware services and tools. Investments in interdisciplinary teams and cyberinfrastructure professionals with expertise in algorithm development, system operations, and applications development are also essential to exploit the full power of cyberinfrastructure to create, disseminate, and preserve scientific data, information, and knowledge…”
• p. 40 of the report:
“In 1999, the PITAC released the seminal report ITR-Investing in our Future, prompting new and complementary NSF investments in CI projects, such as the Grid Physics Network (GriPhyN) and international Virtual Data Grid Laboratory (iVDGL) and the Geosciences Network, known as GEON.”
CI-TEAM: CI Training, Education, Advancement, and Mentoring
http://www.nsf.gov/crssprgm/ci-team/
Integrated Cyberinfrastructure System
Source: Dr. Deborah Crawford, Chair, NSF CI Working Committee
[Figure: layered diagram. Labels: Applications (Geosciences, Environmental Sciences, Neurosciences, High Energy Physics, …); Domain-specific Cybertools (software); Shared Cybertools (software); Development Tools & Libraries; Middleware Services; Hardware; Distributed Resources (computation, storage, communication, etc.). Side labels: Education and Training; Discovery & Innovation.]
Data, Tools, & Computation
• Data
– Field observations
– Laboratory analyses
– Sensor-based data (land, airborne, satellite)
• Tools
– QA/QC, simple transformations and analyses
– Complex models
• Computation
– Community codes
– Access to high-performance computing
– Data Intensive Computing
Variety of Geoinformatics Efforts
• Data collection
– Digital data collection in the field
– When does it become “cyberinfrastructure”?
• Database curation
– E.g. EarthChem, Paleobiology, MorphoBank, Paleo Pollen, etc.
– When does it become “tools” and “community codes”?
• Software Development
– Tools: gravity and magnetics, paleogeography, geochemistry, seismic data products, …
– Community codes: SCEC-CME, CIG, …
Variety of Geoinformatics Efforts
• High Performance Computing
– LiDAR data management
– Seismic analyses
– Petascale initiative
• Data Integration
– E.g. CUAHSI HIS
– Also, a pressing need in projects like EarthScope
Cyberinfrastructure
To provide access to all of these “resources” and support “interoperability” among them
Cyberinfrastructure: The Common Platform Across Distributed Projects
[Figure labels: Data Collection; Data Management and Curation; Tool Development; Modeling and Integration]
Example: USArray Data Flow
• Deploy field sensor arrays
– Across US
• Collect data from sensor arrays and perform QA/QC
– One of the sites is SIO, San Diego
• Archive data for community access
– IRIS, Seattle
EarthScope/USArray: Single project, multiple participants.
Example: LiDAR Workflow
Courtesy: Chris Crosby, ASU
[Figure labels: Survey; Point Cloud (x, y, z, …); Interpolate / Grid; Analyze / “Do Science”. Image credit: D. Harding, NASA]
Single goal: Multiple projects, multiple participants, e.g. NCALM, GEON, ASU, NASA, USGS, …
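The “Interpolate / Grid” stage of the LiDAR workflow turns an irregular point cloud of (x, y, z) values into a regular elevation grid. The following is a minimal sketch in Python, assuming the simplest possible scheme: average the elevations that fall in each square cell. The function name `grid_point_cloud`, the cell size, and the sample points are illustrative assumptions and are not taken from the slides; production workflows may use other interpolation methods.

```python
from collections import defaultdict


def grid_point_cloud(points, cell_size):
    """Bin (x, y, z) points into square cells and average z per cell.

    Returns a dict mapping (column, row) cell indices to mean elevation.
    This is a hypothetical helper for illustration only.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [sum of z, point count]
    for x, y, z in points:
        # Floor division gives the cell index, also for negative coordinates.
        key = (int(x // cell_size), int(y // cell_size))
        sums[key][0] += z
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up sample points: two fall in cell (0, 0), one each in (1, 0) and (1, 1).
points = [(0.2, 0.3, 10.0), (0.8, 0.6, 12.0), (1.5, 0.1, 20.0), (1.9, 1.2, 30.0)]
dem = grid_point_cloud(points, cell_size=1.0)
```

Cells that receive no points are simply absent from the result; filling them is where real interpolation would come in.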
The CI Challenge
• Support multiple science goals, each requiring access to and “integration” of resources from multiple projects and involving multiple participants and partners
• Distributed systems interoperability and creation of “Virtual Organizations” …
Community Cyberinfrastructure Projects
Adapted from: Prof. Mark Ellisman, UC San Diego
[Figure: layered diagram. Labels: Hardware; Distributed Computing, Instruments and Data Resources; Middleware Services; Development Tools & Libraries; Friendly Work-Facilitating Portals (Authentication, Authorization, Auditing, Workflows, Visualization, Analysis); Shared Tools; Science Domains; Your Specific Tools & User Apps. Science domain columns: Biomedical Informatics (BIRN); High Energy Physics (GriPhyN); Geosciences (GEON); Ecological Observatories (NEON); Earthquake Engineering (NEES); Ocean Observing (ORION).]
GEON Cyberinfrastructure
• Funded by NSF IT Research program
• Multi-institution collaboration between IT and Earth Science researchers
• GEON Cyberinfrastructure provides:
– Authenticated access to data and Web services
– Registration of data sets, tools, and services with metadata
– Search for data, tools, and services, using ontologies
– Scientific workflow environment and access to HPC
– Data and map integration capability
– Scientific data visualization and GIS mapping
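To make “registration with metadata” and “search using ontologies” concrete, here is a minimal sketch in Python. It assumes a toy ontology of broader and narrower terms and a toy registry of keyword-tagged resources. Every term, resource name, and function here is hypothetical and does not reflect the actual GEON registry or its ontologies.

```python
# Hypothetical mini-ontology: each term maps to its direct narrower terms.
SUBCLASSES = {
    "rock": ["igneous rock", "sedimentary rock"],
    "igneous rock": ["granite", "basalt"],
    "sedimentary rock": ["sandstone"],
}

# Hypothetical registry: resource name -> metadata keywords given at registration.
REGISTRY = {
    "sierra_granite_ages": {"granite", "geochronology"},
    "columbia_basalt_map": {"basalt", "map"},
    "gravity_grid_sw_us": {"gravity"},
}


def expand(term):
    """Return the term together with all of its descendants in the ontology."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(SUBCLASSES.get(current, []))
    return found


def search(term):
    """Find resources tagged with the term or with any narrower term."""
    wanted = expand(term)
    return sorted(name for name, tags in REGISTRY.items() if tags & wanted)


# A query for the broad term also finds data sets tagged only "granite" or "basalt".
results = search("igneous rock")
```

The point of the ontology is that a user searching for a broad concept does not need to know which specific keywords each data provider used.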
Key Informatics Areas
• Portals
– Authenticated, role-based access to cyber resources: data, tools, models, model outputs, collaboration spaces, …
• Data Integration
– Search, discovery and integration of data from heterogeneous information sources (“mediation” and “semantic integration”)
• Use of workflow systems, and access to HPC
– Ability to “program” at a higher level of abstraction
– Sharing of models, along with “provenance” information
– Gateways to HPC environments
• Management of Geospatial Information
– Using GIS capabilities, map services, geospatial data integration
• Visualization of 3D, 4D geospatial data and information
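The “authenticated, role-based access” listed under Portals can be illustrated with a short Python sketch. The role names, actions, and the `is_allowed` function are assumptions made for this example only; they are not the GEON portal's actual roles or API.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "guest": {"search"},
    "member": {"search", "download"},
    "curator": {"search", "download", "register"},
}


def is_allowed(user, action):
    """Allow an action only for an authenticated user whose role grants it."""
    if not user.get("authenticated"):
        return False
    return action in ROLE_PERMISSIONS.get(user.get("role"), set())


# Made-up users for illustration.
alice = {"name": "alice", "role": "curator", "authenticated": True}
bob = {"name": "bob", "role": "member", "authenticated": True}
anon = {"name": "anonymous", "role": "curator", "authenticated": False}

alice_can_register = is_allowed(alice, "register")
bob_can_register = is_allowed(bob, "register")
anon_can_search = is_allowed(anon, "search")
```

Authentication is checked first, so a claimed role has no effect until the user's identity has been verified.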