AHM 2014: Crawling for EarthCube
Crawling for EarthCube
Ruth Duerr, Luis Lopez, Abeve Tayachow, Erik Mingo
Outline
• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community
NSIDC: An overview
NSIDC affiliations and sponsorship
• Cooperative Institute for Research in Environmental Sciences
• University of Colorado Boulder
Main sponsors:
• National Science Foundation
• NASA
• National Oceanic and Atmospheric Administration
The National Snow and Ice Data Center…
• Provides tools for data access
• Researches the cryosphere and data science
• Educates the public about the cryosphere
• Supports data users
• Manages and distributes scientific data
• Supports local and traditional knowledge
Why not let Google do it?
• What's their incentive?
• The schema.org route for data has extreme limitations
Ways to build a comprehensive catalog
• Ask folks to register their data and services
• Build your catalog by hand
• Automate discovery of data and services
What if...
Advertising your data so that everyone could find them were as simple as...
1 - Filling out a web form
2 - Saving it to your website
3 - Adding its link to your site
Well... It can be!
Crawler Big Picture
[Diagram components: BCube Crawler, BCube Broker, CINERGI]
Crawler Architecture
Things we are going to search for
• OpenSearch
• OAI-PMH, ESIP Data and service cast feeds
• THREDDS catalogs
• Web-enabled folders
• WADL/WSDL
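
As a rough illustration of how a crawler might recognize the items listed above, here is a minimal Python sketch that labels a fetched document by its XML root element and namespace. It is not the Libre or BCube implementation. The `classify` function, the `KNOWN_ROOTS` table, and the label strings are hypothetical names chosen for this example. Treating any Atom feed as a possible ESIP cast, and any page titled "Index of" as a web-enabled folder, are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: (namespace URI, root element name) -> label.
# The namespaces are the published ones for each format; the labels are ours.
KNOWN_ROOTS = {
    ("http://a9.com/-/spec/opensearch/1.1/", "OpenSearchDescription"): "OpenSearch",
    ("http://www.openarchives.org/OAI/2.0/", "OAI-PMH"): "OAI-PMH",
    # Assumption: ESIP casts are Atom-based, so an Atom feed is only a candidate.
    ("http://www.w3.org/2005/Atom", "feed"): "Atom feed (possible ESIP cast)",
    ("http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0", "catalog"): "THREDDS catalog",
    ("http://wadl.dev.java.net/2009/02", "application"): "WADL",
    ("http://schemas.xmlsoap.org/wsdl/", "definitions"): "WSDL",
}


def classify(body: str) -> str:
    """Return a coarse label for a fetched document, or "unknown"."""
    label = None
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        root = None  # not well-formed XML (e.g. ordinary HTML)

    if root is not None:
        # ElementTree writes qualified names as "{namespace}local".
        if root.tag.startswith("{"):
            namespace, local = root.tag[1:].split("}", 1)
        else:
            namespace, local = "", root.tag
        label = KNOWN_ROOTS.get((namespace, local))

    # Assumption: an Apache-style directory listing marks a web-enabled folder.
    if label is None and "<title>Index of" in body:
        label = "Web-enabled folder"

    return label or "unknown"
```

A real crawler would likely also use HTTP content types, URL patterns, and link relations. Looking only at the root element keeps the sketch short and avoids parsing each format in full.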
But what else should we look for?
Questions/Comments