AHM 2014: Crawling for EarthCube

Ruth Duerr, Luis Lopez, Abeve Tayachow, Erik Mingo


Presented by Ruth Duerr during the lunch & learn sessions on Day 2 (June 25) of the EarthCube All-Hands Meeting.

Transcript of AHM 2014: Crawling for EarthCube

Page 1

Crawling for EarthCube

Ruth Duerr, Luis Lopez, Abeve Tayachow, Erik Mingo

Page 2

Outline

• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community

Page 3

NSIDC: An overview

NSIDC affiliations and sponsorship

Affiliations:
• University of Colorado Boulder
• Cooperative Institute for Research in Environmental Sciences (CIRES)

Main sponsors:
• National Science Foundation
• NASA
• National Oceanic and Atmospheric Administration

Page 4

The National Snow and Ice Data Center…

• Provides tools for data access
• Researches the cryosphere and data science
• Educates the public about the cryosphere
• Supports data users
• Manages and distributes scientific data
• Supports local and traditional knowledge

Page 5

Outline

• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community

Page 6

Why not let Google do it?

• What's their incentive?
• The schema.org route for data has extreme limitations

Page 7

Ways to build a comprehensive catalog

• Ask folks to register their data and services
• Build your catalog by hand
• Automate discovery of data and services
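The third option, automating discovery, is the crawler's job. The following is a minimal sketch of the core loop, assuming a breadth-first traversal with a page budget. `crawl` and the `get_links` hook are hypothetical names, not the interface of the crawler described in this talk. Fetching and link extraction are stubbed out so the sketch runs offline.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first discovery of URLs, starting from seed pages.

    get_links is a hypothetical hook: a real crawler would fetch the page
    and extract its outgoing links there.
    """
    frontier = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep order
    seen = set(frontier)
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                frontier.append(link)
    return visited
```

With a toy link graph `{"a": ["b", "c"], "b": ["d"], "c": ["a", "d"], "d": []}`, `crawl(["a"], lambda u: graph.get(u, []))` visits `a, b, c, d`. A breadth-first order with a page budget keeps the crawl close to the seed sites, which are presumably the most trusted starting points.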

Page 8

From "Preparing Data for Ingest," presented 10/27/09 by R. Duerr, LIS590DCL Foundations of Data Curation

What if advertising your data so that everyone could find them were as simple as...

1 - Filling out a web form
2 - Saving it to your website
3 - Adding its link to your site

Well... It can be!
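The slide does not name the file format. OpenSearch is one of the crawler's targets, so one plausible reading is an OpenSearch description document together with an autodiscovery `<link>` tag. The sketch below assumes that reading. `build_description` and `autodiscovery_link` are hypothetical helper names. The namespace, element names, and `rel="search"` link come from the OpenSearch 1.1 specification.

```python
import xml.etree.ElementTree as ET
from html import escape

OSD_NS = "http://a9.com/-/spec/opensearch/1.1/"
# Serialize OpenSearch elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", OSD_NS)


def build_description(short_name: str, description: str, template: str) -> str:
    """Steps 1-2: turn web-form fields into a file to save on the website."""
    root = ET.Element(f"{{{OSD_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OSD_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OSD_NS}}}Description").text = description
    # The URL template tells clients (and crawlers) how to query the data.
    ET.SubElement(root, f"{{{OSD_NS}}}Url",
                  {"type": "application/atom+xml", "template": template})
    return ET.tostring(root, encoding="unicode")


def autodiscovery_link(href: str, title: str) -> str:
    """Step 3: the <link> element that points visitors and crawlers at the file."""
    return ('<link rel="search" type="application/opensearchdescription+xml" '
            f'href="{escape(href)}" title="{escape(title)}">')
```

The link goes in the `<head>` of the site's pages. A crawler that finds it can fetch the description document without any prior registration.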

Page 9

Why not let Google do it?

Page 10

Outline

• Very brief NSIDC intro
• Why crawl?
• Libre crawler architecture
• Questions for the community

Page 11

Crawler Big Picture

[Diagram: BCube Crawler, BCube Broker, CINERGI]

Page 12

Crawler Architecture

Page 13

Things we are going to search for

• OpenSearch
• OAI-PMH, ESIP data and service cast feeds
• THREDDS catalogs
• Web-enabled folders
• WADL/WSDL
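After the crawler has fetched a document, it must decide which of these endpoint types it has found. The following is a minimal sketch that matches the XML root element against the published namespaces for each specification. `classify_document` and `KNOWN_ROOTS` are hypothetical names. Two of the rules are heuristics and are assumptions: "web-enabled folders" are detected only as Apache-style "Index of /" listings, and an Atom feed is treated only as a *candidate* ESIP cast feed.

```python
import xml.etree.ElementTree as ET

# (namespace URI, root element name) -> endpoint type.
KNOWN_ROOTS = {
    ("http://a9.com/-/spec/opensearch/1.1/", "OpenSearchDescription"): "opensearch",
    ("http://www.openarchives.org/OAI/2.0/", "OAI-PMH"): "oai-pmh",
    ("http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0", "catalog"): "thredds",
    ("http://wadl.dev.java.net/2009/02", "application"): "wadl",
    ("http://schemas.xmlsoap.org/wsdl/", "definitions"): "wsdl",  # WSDL 1.1
    ("http://www.w3.org/ns/wsdl", "description"): "wsdl",  # WSDL 2.0
    ("http://www.w3.org/2005/Atom", "feed"): "atom-feed",  # possible ESIP cast
}


def split_tag(tag: str) -> tuple[str, str]:
    """Split an ElementTree tag of the form '{namespace}name'."""
    if tag.startswith("{"):
        namespace, name = tag[1:].split("}", 1)
        return namespace, name
    return "", tag


def classify_document(body: str) -> str:
    """Guess which kind of discoverable endpoint a fetched document is."""
    text = body.lstrip()
    # Heuristic for Apache-style auto-generated directory listings.
    if "<title>Index of /" in text:
        return "web-folder"
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "unknown"
    return KNOWN_ROOTS.get(split_tag(root.tag), "unknown")
```

Matching on namespace plus root element, rather than on file extension or URL pattern, also catches endpoints served from arbitrary paths. Anything that comes back as "unknown" is a candidate for the open question of what else the crawler should look for.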

Page 14

Things we are going to search for

• OpenSearch
• OAI-PMH, ESIP data and service cast feeds
• THREDDS catalogs
• Web-enabled folders
• WADL/WSDL

But what else should we look for?

Page 15

Questions/Comments